Re: Cassandra Data Archiving

2012-05-31 Thread Zhu Han
On Fri, Jun 1, 2012 at 12:28 PM, Harshvardhan Ojha harshvardhan.o...@makemytrip.com wrote: Problem statement: We keep daily generated data (user-generated content) in Cassandra, but our application only uses the last 15 days of data. So how can we archive data older than 15 days so

Re: will compaction delete empty rows after all columns expired?

2012-05-30 Thread Zhu Han
On Thu, May 31, 2012 at 9:31 AM, Curt Allred c...@mediosystems.com wrote: No, these were not wide rows. They are rows that formerly had one or two columns. The columns are deleted, but the empty rows don't go away, even after gc_grace_secs. The empty row goes away only during a compaction after

Re: Repair Process Taking too long

2012-04-14 Thread Zhu Han
On Sat, Apr 14, 2012 at 1:57 PM, Igor i...@4friends.od.ua wrote: Hi! What is the difference between 'repair' and '-pr repair'? Does a simple repair touch all token ranges (for all nodes), while -pr touches only the range the given node is responsible for? -pr only touches the primary range of the node. If
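The reply above says -pr only touches the node's primary range. A minimal sketch of the difference, written as a hypothetical Java toy model rather than Cassandra's code, assuming a single ring with SimpleStrategy-style placement (a key's replicas are the node owning its token plus the next rf - 1 nodes clockwise) and no datacenter awareness. The names RepairRangesSketch, primaryRange and replicatedRanges are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a single token ring with SimpleStrategy-style
// placement; not Cassandra's code.
class RepairRangesSketch {

    // A token range (left, right]; it wraps around the ring when left >= right.
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + ", " + right + "]";
        }
    }

    // Primary range of node i: from its predecessor's token (exclusive)
    // to its own token (inclusive). Tokens must be sorted ascending.
    static Range primaryRange(long[] sortedTokens, int i) {
        int n = sortedTokens.length;
        long predecessor = sortedTokens[(i - 1 + n) % n];
        return new Range(predecessor, sortedTokens[i]);
    }

    // All ranges node i stores with replication factor rf: its own primary
    // range plus the primary ranges of the rf - 1 nodes preceding it.
    // A plain repair on node i covers all of these; repair -pr covers only
    // the first one.
    static List<Range> replicatedRanges(long[] sortedTokens, int i, int rf) {
        int n = sortedTokens.length;
        List<Range> ranges = new ArrayList<>();
        for (int k = 0; k < Math.min(rf, n); k++) {
            ranges.add(primaryRange(sortedTokens, (i - k + n) % n));
        }
        return ranges;
    }
}
```

With tokens {0, 100, 200, 300} and rf = 3, node 1's primary range is (0, 100], while a plain repair on that node would also cover (300, 0] and (200, 300]. Under the same assumptions, running repair -pr once on every node covers each range exactly once, whereas a plain repair on every node repairs each range rf times.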

Re: sstable2json and resurrected rows

2012-03-31 Thread Zhu Han
Did you hit this bug? https://issues.apache.org/jira/browse/CASSANDRA-4054 best regards, 坚果云 (Jianguoyun) https://jianguopuzi.com/, the simplest, easiest-to-use cloud storage: unlimited space, file sync, backup and sharing! 2012/3/30 Jonas Borgström jo...@borgstrom.se Let me rephrase my question: Is it true that deleted rows will still be present in the

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Zhu Han
On Tue, Dec 27, 2011 at 2:31 PM, Kevin Burton burtona...@gmail.com wrote: I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework optimized for iterative and pipelined map reduce jobs. http://peregrine_mapreduce.bitbucket.org/ This originally started off with some internal

Re: split large sstable

2011-11-21 Thread Zhu Han
best regards, 韩竹(Zhu Han) 坚果铺子 (Jianguopuzi) https://jianguopuzi.com, the simplest, easiest-to-use cloud storage: sync files, share photos, back up documents! On Mon, Nov 21, 2011 at 11:07 PM, Dan Hendry dan.hendry.j...@gmail.com wrote: Pretty sure your argument about indirect blocks making large files inefficient only pertains to ext2/3 and not ext4. It seems ext4

Re: Upgrading to 1.0

2011-11-02 Thread Zhu Han
I'd also like to know whether it is possible to upgrade directly from 0.6.13 to 1.0.x. Is there anything we should watch out for that nodetool scrub might not fix? On Wed, Nov 2, 2011 at 7:46 PM, Jake Maizel j...@soundcloud.com wrote: Hello, We run a medium-sized cluster of 12 nodes on 0.6.13

Re: Planet Cassandra is now live

2011-08-12 Thread Zhu Han
On Sat, Aug 13, 2011 at 4:35 AM, Konstantin Naryshkin konstant...@a-bb.net wrote: Would you consider adding an RSS feed to the site for the benefit of those who like to use feed readers to keep track of unread posts and what not? Here it is: http://planetcassandra.org/aggregator/rss -

Re: migrating from 0.6 to 0.8, java.io.IOError: ... cannot extend file to required size

2011-08-11 Thread Zhu Han
On Wed, Aug 10, 2011 at 5:24 PM, aaron morton aa...@thelastpickle.com wrote: I remember seeing this once before upgrading a system from 0.6 to 0.7 on an Ubuntu EC2 (non-DataStax build) with EBS disks. I did the same thing and just assumed it was an EBS or 0.6 bug. From memory, after the upgrade

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-29 Thread Zhu Han
Chris, the patch has been deployed on the cluster for two days, and everything has been fine since then. Thank you! best regards, 韩竹(Zhu Han) On Sat, Jul 30, 2011 at 3:52 AM, Chris Burroughs chris.burrou...@gmail.com wrote: Thanks to everyone who responded (I think I learned a few new tricks from

Re: Cassandra 0.6.8 snapshot problem?

2011-07-28 Thread Zhu Han
On Thu, Jul 28, 2011 at 10:47 PM, Jian Fang jian.fang.subscr...@gmail.com wrote: Hi, We have an old production Cassandra 0.6.8 instance without replica, i.e., the replication factor is 1. Recently, we noticed that the snapshot data we took from this instance are inconsistent with the running

Re: Cassandra Storage Sizing

2011-07-21 Thread Zhu Han
Very helpful. Thank you! best regards, Zhu Han On Thu, Jul 21, 2011 at 12:24 PM, Todd Burruss bburr...@expedia.com wrote: I put together a blog post on Cassandra Storage Sizing so I don’t need to keep figuring it out again and again. Hope everyone finds it useful, and give feedback

Re: Commit log is not emptied after nodetool drain

2011-07-15 Thread Zhu Han
2011/7/15 Zhu Han schumi@gmail.com 2011/7/15 Jonathan Ellis jbel...@gmail.com If you have non-empty segments post-drain, that is a bug. Is it reproducible? I think it is always reproducible on the 0.6.x branch. Here is a simple experiment: Should I raise an issue ticket on it? 1

Commit log is not emptied after nodetool drain

2011-07-14 Thread Zhu Han
://issues.apache.org/jira/browse/CASSANDRA-2874 best regards, 韩竹(Zhu Han) 坚果铺子 (Jianguopuzi) https://jianguopuzi.com, the simplest, easiest-to-use cloud storage: sync files, share photos, back up documents!

Re: Commit log is not emptied after nodetool drain

2011-07-14 Thread Zhu Han
-1310702291383.log 2011/7/14 Zhu Han schumi@gmail.com: Jonathan, but all the old non-empty log segments are kept on disk, and Cassandra takes some time to apply the operations from these closed log segments after the process restarts. Is that expected? best regards, 韩竹(Zhu

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-13 Thread Zhu Han
On Wed, Jul 13, 2011 at 9:45 PM, Konstantin Naryshkin konstant...@a-bb.net wrote: Do you mean that it is using all of the available heap? That is the expected behavior of most long-running Java applications. The JVM will not GC until it needs memory (or you explicitly ask it to) and will only

Re: copy data from multi-node cluster to single node

2011-07-04 Thread Zhu Han
On Tue, Jul 5, 2011 at 8:58 AM, aaron morton aa...@thelastpickle.com wrote: How do you change the name of a cluster? The FAQ instructions do not seem to work for me - are they still valid for 0.7.5? Is the backup / restore mechanism going to work, or is there a better/simpler way to copy data

Re: compaction behaviour

2011-04-03 Thread Zhu Han
best regards, Zhu Han On Sun, Apr 3, 2011 at 9:21 AM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I have loaded data into Cassandra using batch processing. The response times for reads are in the range of 0.8 ms, but I am using SSDs, so I expect the read times to be even

Re: reduced cached mem; resident set size growth

2011-03-16 Thread Zhu Han
On Thu, Feb 3, 2011 at 1:49 AM, Ryan King r...@twitter.com wrote: On Wed, Feb 2, 2011 at 6:22 AM, Chris Burroughs chris.burrou...@gmail.com wrote: On 01/28/2011 09:19 PM, Chris Burroughs wrote: Thanks Oleg and Zhu. I swear that wasn't a new hotspot version when I checked, but that's

Re: reduced cached mem; resident set size growth

2011-03-16 Thread Zhu Han
On Thu, Mar 17, 2011 at 10:27 AM, Zhu Han schumi@gmail.com wrote: On Thu, Feb 3, 2011 at 1:49 AM, Ryan King r...@twitter.com wrote: On Wed, Feb 2, 2011 at 6:22 AM, Chris Burroughs chris.burrou...@gmail.com wrote: On 01/28/2011 09:19 PM, Chris Burroughs wrote: Thanks Oleg and Zhu. I

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Zhu Han
On Fri, Mar 11, 2011 at 10:40 AM, Erik Forkalsrud eforkals...@cj.com wrote: I see the same behavior with smaller batch sizes. It appears to happen when starting Cassandra with the defaults on relatively large systems. Attached is a script I created to reproduce the problem. (usage: mutate.sh

Re: Tombstone lifespan after multiple deletions

2011-01-19 Thread Zhu Han
On Wed, Jan 19, 2011 at 8:41 PM, Germán Kondolf german.kond...@gmail.com wrote: On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han schumi@gmail.com wrote: On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf german.kond...@gmail.com wrote: Yes, that's what I meant, but correct me if I'm

Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread Zhu Han
If the tombstone is older than the row or column inserted later, is the tombstone skipped entirely after compaction? best regards, hanzhu On Wed, Jan 19, 2011 at 11:16 AM, Jonathan Ellis jbel...@gmail.com wrote: If you mean that multiple tombstones for the same row or column should be merged

Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread Zhu Han
I'm not clear on this. Are you worried that the later-inserted tombstone prevents the whole row from being reclaimed, so the storage space cannot be freed? To my knowledge, after major compaction, only the row key and tombstone are kept. Is it a big deal? best regards, hanzhu On Tue, Jan

Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread Zhu Han
On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf german.kond...@gmail.com wrote: Yes, that's what I meant, but correct me if I'm wrong: when a deletion comes after another deletion for the same row or column, the gc-before counts against the last one, doesn't it? IIRC, after compaction, even
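The tombstone threads above ask how compaction treats repeated deletions and later writes. A minimal sketch under stated assumptions, not Cassandra's implementation: the versions of a single column are reconciled by timestamp (highest wins; the tombstone winning a tie is an assumption here), and a winning tombstone is dropped only once its local deletion time is older than gcBefore = now - gc_grace_seconds. It also assumes every version of the column takes part in the same compaction, and it models column-level tombstones only. TombstoneSketch, Cell, reconcile and compact are hypothetical names.

```java
import java.util.List;

// Hypothetical toy model of how compaction treats one column's versions.
// Not Cassandra's actual classes; the tie-break rule is an assumption.
class TombstoneSketch {

    static final class Cell {
        final long timestamp;             // client-supplied write timestamp
        final String value;               // null marks a tombstone
        final long localDeletionTimeSecs; // when the delete arrived (tombstones only)

        private Cell(long timestamp, String value, long localDeletionTimeSecs) {
            this.timestamp = timestamp;
            this.value = value;
            this.localDeletionTimeSecs = localDeletionTimeSecs;
        }

        static Cell live(long timestamp, String value) {
            return new Cell(timestamp, value, Long.MAX_VALUE);
        }

        static Cell tombstone(long timestamp, long localDeletionTimeSecs) {
            return new Cell(timestamp, null, localDeletionTimeSecs);
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Highest timestamp wins; on a tie the tombstone wins (assumed tie-break).
    static Cell reconcile(List<Cell> versions) {
        Cell winner = null;
        for (Cell c : versions) {
            if (winner == null
                    || c.timestamp > winner.timestamp
                    || (c.timestamp == winner.timestamp && c.isTombstone())) {
                winner = c;
            }
        }
        return winner;
    }

    // Returns the cell written to the compacted SSTable, or null if nothing
    // survives. Shadowed versions are always dropped; the winning tombstone
    // is dropped only after gc_grace_seconds have passed since its deletion.
    static Cell compact(List<Cell> versions, long nowSecs, long gcGraceSecs) {
        Cell winner = reconcile(versions);
        long gcBefore = nowSecs - gcGraceSecs;
        if (winner != null && winner.isTombstone()
                && winner.localDeletionTimeSecs < gcBefore) {
            return null;
        }
        return winner;
    }
}
```

In this model, when one deletion follows another, the newer tombstone wins reconciliation, so the grace period is measured from the later deletion; and a write newer than the tombstone shadows it, so the tombstone is not carried into the compacted output.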

Re: Which Java on Fedora? Sun's or GNU's?

2010-12-29 Thread Zhu Han
I ran into a native memory leak with OpenJDK. Still trying to figure it out... best regards, hanzhu On Wed, Dec 29, 2010 at 6:11 PM, Peter Schuller peter.schul...@infidyne.com wrote: Which is best? Which is preferred? If by GNU you mean the gcj stuff, then absolutely not. :) If you mean

Re: Which Java on Fedora? Sun's or GNU's?

2010-12-29 Thread Zhu Han
Eric, do you use the default GC settings? Can you show me the OpenJDK version reported by java -version? Thank you! If everything is the same, I suspect I need to upgrade the kernel. best regards, hanzhu On Wed, Dec 29, 2010 at 11:44 PM, Eric Evans eev...@rackspace.com wrote: On Wed, 2010-12-29 at

Re: complexity

2010-12-24 Thread Zhu Han
When the row is stored on disk in an SSTable, the complexity of getting a row is constant, as Cassandra always knows where to find the row from the in-memory indices. When the row is stored in memory in a memtable, it is stored in a skip list [1]. The complexity is O(log N), where N is the total number of rows in the skip

Re: complexity

2010-12-24 Thread Zhu Han
Yep. I forgot about the binary search part. Thank you! regards, hanzhu On Fri, Dec 24, 2010 at 9:35 PM, Jonathan Ellis jbel...@gmail.com wrote: On Fri, Dec 24, 2010 at 4:42 AM, Zhu Han schumi@gmail.com wrote: When the row is stored on disk as SSTable, the complexity of getting a row
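The two complexity messages above can be illustrated with a toy model. This hypothetical Java sketch is not Cassandra's code; it assumes the memtable is a ConcurrentSkipListMap (expected O(log N) lookups, matching the skip list in the first message) and that an SSTable keeps only every indexInterval-th key in memory. A lookup then binary-searches that in-memory sample (the step acknowledged in the second message) and scans at most indexInterval rows from the nearest sampled position. RowLookupSketch, Memtable and SSTable are illustrative names, and the small interval used in the check is an assumption chosen for readability.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical toy model of single-row lookups; not Cassandra's classes.
class RowLookupSketch {

    // Memtable stand-in: a concurrent skip list gives expected O(log N) get().
    static final class Memtable {
        final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

        void put(String key, String value) {
            rows.put(key, value);
        }

        String get(String key) {
            return rows.get(key);
        }
    }

    // SSTable stand-in: rows sorted by key (the "on-disk" part) plus a sparse
    // in-memory sample holding every indexInterval-th key.
    static final class SSTable {
        final String[] keys;
        final String[] values;
        final int indexInterval;
        final String[] sampleKeys;

        SSTable(String[] sortedKeys, String[] values, int indexInterval) {
            this.keys = sortedKeys;
            this.values = values;
            this.indexInterval = indexInterval;
            int samples = (sortedKeys.length + indexInterval - 1) / indexInterval;
            this.sampleKeys = new String[samples];
            for (int i = 0; i < samples; i++) {
                sampleKeys[i] = sortedKeys[i * indexInterval];
            }
        }

        String get(String key) {
            // Binary search over the in-memory sample: O(log(N / indexInterval)).
            int s = Arrays.binarySearch(sampleKeys, key);
            if (s < 0) {
                s = -s - 2; // index of the last sampled key smaller than key
                if (s < 0) {
                    return null; // key sorts before the first row
                }
            }
            // Scan at most indexInterval rows starting at the sampled position.
            int start = s * indexInterval;
            int end = Math.min(start + indexInterval, keys.length);
            for (int i = start; i < end; i++) {
                if (keys[i].equals(key)) {
                    return values[i];
                }
            }
            return null;
        }
    }
}
```

With keys a, c, e, g, i, k and an interval of 4, the in-memory sample holds only a and i; looking up g binary-searches to the sample a and then scans a, c, e, g. So the SSTable path is a logarithmic search over the sample plus a bounded scan, rather than strictly constant time.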

Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)

2010-12-20 Thread Zhu Han
Can anybody recommend a stable enough JDK for the 0.6.x branch on Ubuntu Server? Thank you! best regards, hanzhu On Sun, Dec 19, 2010 at 10:29 AM, Zhu Han schumi@gmail.com wrote: The problem still seems to be the JVM's C heap, which leaks 70MB every day. Here is the summary

Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)

2010-12-17 Thread Zhu Han
, hanzhu On Thu, Dec 16, 2010 at 9:28 PM, Zhu Han schumi@gmail.com wrote: I tried it this afternoon, but it did not work for me. Thank you! best regards, hanzhu On Thu, Dec 16, 2010 at 8:59 PM, Matthew Conway m...@backupify.com wrote: Thanks for debugging this, I'm running

Re: Memory leak with Sun Java 1.6 ?

2010-12-16 Thread Zhu Han
This bug is present in both the Sun JDK and OpenJDK because they share the same HotSpot VM. The Sun JDK got the fix earlier than OpenJDK. best regards, hanzhu On Thu, Dec 16, 2010 at 6:43 PM, Jedd Rashbrooke jedd.rashbro...@imagini.net wrote: Hi Peter, I've read through the Very high memory

Very high memory utilization (not caused by mmap on sstables)

2010-12-15 Thread Zhu Han
Hi, I have a test node running apache-cassandra-0.6.8 on Ubuntu 10.04. The hardware environment is an OpenVZ container. The JVM settings are: # java -Xmx128m -version java version 1.6.0_18 OpenJDK Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2) OpenJDK 64-Bit Server VM (build 16.0-b13, mixed

Re: Very high memory utilization (not caused by mmap on sstables)

2010-12-15 Thread Zhu Han
:50 AM, Zhu Han schumi@gmail.com wrote: Hi, I have a test node running apache-cassandra-0.6.8 on Ubuntu 10.04. The hardware environment is an OpenVZ container. The JVM settings are: # java -Xmx128m -version java version 1.6.0_18 OpenJDK Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2

Re: Very high memory utilization (not caused by mmap on sstables)

2010-12-15 Thread Zhu Han
the instance. best regards, hanzhu On Thu, Dec 16, 2010 at 1:00 PM, Zhu Han schumi@gmail.com wrote: After investigating it more deeply, I suspect it is a native memory leak in the JVM. The large anonymous mapping in the lower address space should be the JVM's native heap, not the Java object heap. Has anybody met
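The messages above suspect that the growth is in the JVM's native (C) heap rather than the Java object heap, which was capped with -Xmx128m. One hedged way to watch for that pattern, after ruling out mmapped SSTables as the thread did, is to log the heap and non-heap figures the JVM reports through its MemoryMXBean next to the resident set size Linux reports in the VmRSS line of /proc/self/status. If RSS keeps climbing while the committed heap stays flat, the extra memory is being allocated outside the Java heap; the JVM's non-heap figure roughly covers areas such as the code cache and permanent generation, not arbitrary native allocations. This sketch cannot say who allocates the memory, and it only reads RSS on Linux (returning -1 elsewhere). HeapVsRssSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: compares JVM-reported memory with the OS-reported RSS.
class HeapVsRssSketch {

    // Extract the kB value from a "VmRSS:   123456 kB" line; -1 if absent.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    // Resident set size of this process in kB; Linux only, -1 elsewhere.
    static long currentRssKb() {
        try {
            return parseVmRssKb(Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8));
        } catch (IOException e) {
            return -1;
        }
    }

    // One log line: committed heap, committed non-heap, and RSS, all in kB.
    static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        return String.format("heap committed=%d kB, non-heap committed=%d kB, RSS=%d kB",
                heap.getCommitted() / 1024, nonHeap.getCommitted() / 1024, currentRssKb());
    }
}
```

Calling report() periodically and comparing successive lines is the intended use; a steadily widening gap between RSS and the two JVM figures is the symptom described in this thread.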

Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)

2010-12-15 Thread Zhu Han
/bugdatabase/view_bug.do?bug_id=6824570 [2] http://blog.fuseyism.com/index.php/2010/09/10/icedtea6-19-released/ best regards, hanzhu On Thu, Dec 16, 2010 at 3:10 PM, Zhu Han schumi@gmail.com wrote: The test node is behind a firewall. So I took some time to find a way to get JMX diagnostic