Moving Cassandra data from one cluster to another cluster on a different network

2010-02-22 Thread Jon Graham
Hello Everyone, I have a cluster of 6 Cassandra nodes in the network range 192.168.1.10 to 192.168.1.15, with host names CAS1 .. CAS6. I want to move all the data from the existing cluster to a test Cassandra cluster in the network range 192.168.2.10 - 192.168.2.15, with host names CASTEST1 .. CASTEST6.

Re: Moving Cassandra data from one cluster to another cluster on a different network

2010-02-22 Thread Jonathan Ellis
Just scp the data files over, one per node. You just need to make sure that the token on the node you are copying to is the same as the source token. If you copy everything from data and commitlog this will Just Work, since the token is stored in data/system. And of course you will need to tweak the

Cassandra paging, gathering stats

2010-02-22 Thread Sonny Heer
Hey, We are in the process of implementing a Cassandra application service. We have already ingested TB of data using the Cassandra bulk loader (StorageService). One of the requirements is to get a data explosion factor as a result of denormalization. Since the writes are going to the memory ta

Re: Cassandra paging, gathering stats

2010-02-22 Thread Brandon Williams
On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer wrote:
> Hey,
>
> We are in the process of implementing a cassandra application service.
>
> we have already ingested TB of data using the cassandra bulk loader
> (StorageService).
>
> One of the requirements is to get a data explosion factor as a result

Re: Cassandra range scans

2010-02-22 Thread Peter Schüller
>> 1) would you consider Cassandra (0.5+) "safe enough" for a primary data
>> store?
>
> Yes.  Several companies are deploying 0.5 in production.  It's pretty
> solid.  (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
> beta.)

And I agree that it's significantly simpler to deploy (and

Re: Cassandra paging, gathering stats

2010-02-22 Thread Jonathan Ellis
On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer wrote:
> Hey,
>
> We are in the process of implementing a cassandra application service.
>
> we have already ingested TB of data using the cassandra bulk loader
> (StorageService).
>
> One of the requirements is to get a data explosion factor as a result

Re: Cassandra range scans

2010-02-22 Thread Jonathan Ellis
On Mon, Feb 22, 2010 at 2:23 PM, wrote:
>> You could use supercolumns here too (where the supercolumn name is the
>> thing type).  If you always want to retrieve all things of type A at a
>> time per user, then that is a more natural fit.  (Otherwise, the lack
>> of subcolumn indexing could be a
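
[Editor's note: to make the layout being discussed concrete, here is a minimal sketch against the 0.5-era Thrift API. The keyspace, super column family, key, and names ("Keyspace1", "UserThings", "user42", "typeA", "thing1") are all hypothetical, and the host/port are just the Thrift defaults; in 0.5 the generated classes live under org.apache.cassandra.service (0.6 moves them to org.apache.cassandra.thrift).]

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnPath;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SuperColumnSketch {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            tr.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));

            // One row per user: supercolumn name = thing type, subcolumn = thing id.
            ColumnPath path = new ColumnPath();
            path.setColumn_family("UserThings");       // hypothetical super CF
            path.setSuper_column("typeA".getBytes());  // the "thing type"
            path.setColumn("thing1".getBytes());       // one thing of that type
            client.insert("Keyspace1", "user42", path, "payload".getBytes(),
                          System.currentTimeMillis(), ConsistencyLevel.ONE);
            tr.close();
        }
    }

Reading back the whole "typeA" supercolumn returns every subcolumn at once, which is why this fits the retrieve-all-things-of-type-A pattern but hurts when you only want a few subcolumns, since subcolumns are not indexed.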

Re: Cassandra range scans

2010-02-22 Thread Jeremey.Barrett
On Feb 22, 2010, at 12:19 AM, ext Jonathan Ellis wrote:
>> 2) is the row key model I suggested above the best approach in Cassandra,
>> or is there something better? My testing so far has been using
>> get_range_slice with a ColumnParent of just the CF and SlicePredicate
>> listing the columns

Re: Cassandra paging, gathering stats

2010-02-22 Thread Sonny Heer
Jonathan, I could use the df command to find the size per column family. Although when inserting directly into Cassandra (not using StorageService) we were collecting the following information for each column family:

Total number of keys: 59557
Total number of columns (over all keys): 2171309
Total
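
[Editor's note: counts like these can be rebuilt after a bulk load by paging over rows with the Thrift API instead of relying on df. A minimal sketch, assuming the 0.5-era get_range_slice call (renamed get_range_slices, taking a KeyRange, in 0.6), an OrderPreservingPartitioner (0.5 range scans require it), and hypothetical names Keyspace1/MyCF.]

    import java.util.List;
    import org.apache.cassandra.service.*;   // org.apache.cassandra.thrift in 0.6
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class CfStats {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            tr.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));

            // Slice up to 10,000 columns per row; very wide rows would need
            // column paging on top of this (see the paging thread below).
            SlicePredicate pred = new SlicePredicate();
            pred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10000));

            long keys = 0, columns = 0;
            String start = "";
            final int PAGE = 1000;
            while (true) {
                List<KeySlice> page = client.get_range_slice("Keyspace1",
                        new ColumnParent("MyCF", null), pred, start, "", PAGE,
                        ConsistencyLevel.ONE);
                for (KeySlice ks : page) {
                    if (keys > 0 && ks.getKey().equals(start))
                        continue;               // pages overlap by one key
                    keys++;
                    columns += ks.getColumns().size();
                }
                if (page.size() < PAGE)
                    break;
                start = page.get(page.size() - 1).getKey();
            }
            System.out.println("Total number of keys: " + keys);
            System.out.println("Total number of columns (over all keys): " + columns);
            tr.close();
        }
    }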

Re: Cassandra paging, gathering stats

2010-02-22 Thread Sonny Heer
Is this a bug?

    ColumnParent columnParent = new ColumnParent(cp, null);
    SlicePredicate slicePredicate = new SlicePredicate();
    // Get all columns
    SliceRange sliceRange = new SliceRange(new byte[] {}, new byte[] {}, false, Integer.MAX_VALUE);
    slicePredicate.setSlice_range(sliceRange);

Re: Cassandra paging, gathering stats

2010-02-22 Thread Jonathan Ellis
Breaking sooner rather than later is a feature, of sorts. You really do need to give a sane max. Remember that Thrift must pull results into memory before giving them back to you, so allowing you to give it a max that cannot possibly fit in memory is not doing you a favor.

-Jonathan

On Mon, Feb
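
[Editor's note: in practice the "sane max" advice means paging within a row: slice a bounded number of columns, then restart from the last column name seen. A minimal sketch of that loop under the same 0.5-era assumptions as above, with hypothetical names Keyspace1/MyCF/user42.]

    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.service.*;   // org.apache.cassandra.thrift in 0.6
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ColumnPager {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            tr.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));

            final int PAGE = 1000;           // a sane max, per the advice above
            byte[] start = new byte[0];
            long total = 0;
            while (true) {
                SlicePredicate pred = new SlicePredicate();
                pred.setSlice_range(new SliceRange(start, new byte[0], false, PAGE));
                List<ColumnOrSuperColumn> cols = client.get_slice("Keyspace1",
                        "user42", new ColumnParent("MyCF", null), pred,
                        ConsistencyLevel.ONE);
                for (ColumnOrSuperColumn c : cols) {
                    byte[] name = c.getColumn().getName();
                    if (total > 0 && Arrays.equals(name, start))
                        continue;            // the start column repeats on later pages
                    total++;
                    start = name;
                }
                if (cols.size() < PAGE)
                    break;
            }
            System.out.println("Columns in row: " + total);
            tr.close();
        }
    }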

problem about bootstrapping when used in huge node

2010-02-22 Thread Michael Lee
Hi, guys: I have a 15-node cluster; each node has 12 SATA disks of 1TB each. I made a software RAID5 array over 11 of the disks to create a large data partition (md0):

    [r...@ ~]# mount
    /dev/sda2 on / type ext2 (rw)
    none on /proc type proc (rw)
    none on /sys type sysfs (rw)
    none on /dev/pts type devpts (rw