RE: Preventing an update of a CF row

2010-10-19 Thread Viktor Jevdokimov
Nice and simple! -Original Message- From: Oleg Anastasyev [mailto:olega...@gmail.com] Sent: Tuesday, October 19, 2010 9:00 AM To: user@cassandra.apache.org Subject: Re: Preventing an update of a CF row kannan chandrasekaran ckannanck at yahoo.com writes: Hi All,I have a query

Re: What is the correct way of changing a partitioner?

2010-10-19 Thread Dan Washusen
http://wiki.apache.org/cassandra/DistributedDeletes From the http://wiki.apache.org/cassandra/StorageConfiguration page: Achtung! Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format. So delete your data and commit log

R: Re: TimeUUID makes me crazy

2010-10-19 Thread cbert...@libero.it
I am using Pelops for Cassandra 0.6.x. The error raised is InvalidRequestException(why:UUIDs must be exactly 16 bytes). For the UUID I am using the UuidHelper class provided.

Re: Preventing an update of a CF row

2010-10-19 Thread Sylvain Lebresne
Always specify some constant value for timestamp. Only the 1st insertion with that timestamp will succeed. Others will be ignored, because they will be considered duplicates by cassandra. Well, that's not entirely true. When cassandra 'resolves' two columns having the same timestamp, it will compare

Re: TimeUUID makes me crazy

2010-10-19 Thread Sylvain Lebresne
In your first column family, you are using a UUID as a row key (your column names are strings apparently (phone, addres)). The CompareWith directive specifies the comparator for *column names*. So you are providing strings where you told Cassandra you would provide UUIDs, hence the exceptions. The
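A minimal sketch of the mismatch described above, in Python rather than Pelops. The function check_timeuuid_name is hypothetical: it only mimics the wording of the error in this thread and is not Cassandra's actual validation code. It shows why a string column name such as "phone" fails a 16-byte check while the binary form of a time-based UUID passes.

```python
import uuid


def check_timeuuid_name(name: bytes) -> None:
    """Hypothetical check modeled on the error text quoted in the thread."""
    if len(name) != 16:
        raise ValueError("UUIDs must be exactly 16 bytes")


def is_valid_name(name: bytes) -> bool:
    """Return True if the name would pass the hypothetical check."""
    try:
        check_timeuuid_name(name)
        return True
    except ValueError:
        return False


# A version-1 (time-based) UUID in its raw binary form is 16 bytes long.
time_uuid = uuid.uuid1()
binary_name = time_uuid.bytes

# Plain string column names, as in the message above, are not 16 bytes.
string_name = "phone".encode("utf-8")

# The textual form of a UUID is 36 characters, so it fails the check as well.
text_name = str(time_uuid).encode("ascii")

results = {
    "binary uuid": is_valid_name(binary_name),
    "string name": is_valid_name(string_name),
    "uuid as text": is_valid_name(text_name),
}
print(results)
```

If the column names really are strings, the alternative to converting them is to declare a string comparator for that column family and keep the UUID as the row key only.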

Re: Cassandra security model? ( or, authentication docs ?)

2010-10-19 Thread Yang
Thanks a lot On Mon, Oct 18, 2010 at 11:44 AM, Eric Evans eev...@rackspace.com wrote: On Sun, 2010-10-17 at 21:26 -0700, Yang wrote: I searched around, it seems that this is not clearly documented yet; the closest I found is: http://www.riptano.com/docs/0.6.5/install/auth-config

Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Yang
I read from the Facebook cassandra paper that zookeeper is used for certain things (membership and Rack-aware placement), but I pulled the 0.7.0-beta2 source and couldn't grep out anything with Zk or Zoo, nor any files with Zk/Zoo in the names. Is Zookeeper really used? docs/blog posts from

Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Norman Maurer
No, Zookeeper is not used in cassandra. You can use Zookeeper as some kind of add-on to do locking etc. Bye, Norman 2010/10/19 Yang tedd...@gmail.com: I read from the Facebook cassandra paper that zookeeper is used for certain things (membership and Rack-aware placement), but I

RE: Preventing an update of a CF row

2010-10-19 Thread Viktor Jevdokimov
Reverse timestamp. -Original Message- From: Sylvain Lebresne [mailto:sylv...@yakaz.com] Sent: Tuesday, October 19, 2010 10:44 AM To: user@cassandra.apache.org Subject: Re: Preventing an update of a CF row Always specify some constant value for timestamp. Only 1st insertion with that

Re: Thift version

2010-10-19 Thread Brayton Thompson
Go into the lib dir in Cassandra and look at the thrift jar. The name has in it the specific revision you need to use. Use svn to pull it down. Sent from my iPhone On Oct 18, 2010, at 10:50 PM, JKnight JKnight beukni...@gmail.com wrote: Dear all, Which Thrift version does Cassandra 0.66

Re: Read Latency

2010-10-19 Thread Wayne
The changes seem to do the trick. We are down to about 1/2 of the original quorum read performance. I did not see any more errors. More than 3 seconds on the client side is still not acceptable to us. We need the data in Python, but would we be better off going through Java or something else to

Dumping Cassandra into Hadoop

2010-10-19 Thread Mark
As the subject implies I am trying to dump Cassandra rows into Hadoop. What is the easiest way for me to accomplish this? Thanks. Should I be looking into pig for something like this?

Re: Cassandra/Pelops error processing get_slice

2010-10-19 Thread Frank LoVecchio
Aaron, It seems that we had a beta-1 node in our cluster of beta-2's. Haven't had the problem since. Thanks for the help, Frank On Sat, Oct 16, 2010 at 1:50 PM, aaron morton aa...@thelastpickle.com wrote: Frank, Things are a bit clearer now. Think I had the wrong idea to start with. The

Hadoop Word Count Super Column Example?

2010-10-19 Thread Frank LoVecchio
I have a Hadoop installation working with a cluster of 0.7 Beta 2 Nodes, and got the WordCount example to work using the standard configuration. I have been inserting data into a Super Column (Sensor) with TimeUUID as the compare type, it looks like this: get Sensor['DeviceID:Sensor'] =

Re: Hadoop Word Count Super Column Example?

2010-10-19 Thread Jeremy Hanna
It's relatively straightforward, the current mapper gets a map of column names to IColumns. The SuperColumn implements the IColumn interface. So you would probably need both the super column name and the subcolumn name to get at it, but you just need to cast the IColumn to a super column and

Re: Cassandra security model? ( or, authentication docs ?)

2010-10-19 Thread Jeremy Hanna
just as an fyi, I created something in the wiki yesterday - it's just a start though - http://wiki.apache.org/cassandra/ExtensibleAuth there's also a FAQ entry on it now - http://wiki.apache.org/cassandra/FAQ#auth just for going forward - on the wiki itself, just trying to help there. On Oct 19,

Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Yang
Thanks guys, but I feel it would probably be better to refactor out the hooks and make components like zookeeper pluggable, so users could use either zookeeper or the current config-file based seeds discovery. Yang On Tue, Oct 19, 2010 at 9:02 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Jeremy Hanna
That code has never existed in the public. It was taken out before it was open-sourced. On Oct 19, 2010, at 11:45 AM, Yang wrote: Thanks guys, but I feel it would probably be better to refactor out the hooks and make components like zookeeper pluggable, so users could use either

Re: Read Latency

2010-10-19 Thread Jonathan Ellis
I would expect C++ or Java to be substantially faster than Python. However, I note that Hector (and I believe Pelops) don't yet use the newest, fastest Thrift library. On Tue, Oct 19, 2010 at 8:21 AM, Wayne wav...@gmail.com wrote: The changes seem to do the trick. We are down to about 1/2 of

Re: Dumping Cassandra into Hadoop

2010-10-19 Thread aaron morton
Depends on what you mean by dumping into Hadoop. If you want to read them from a Hadoop Job then you can use either native Hadoop or Pig. See the contrib/word_count and contrib/pig examples. If you want to copy the data into a Hadoop File System install then I guess almost anything that can

Re: Read Latency

2010-10-19 Thread aaron morton
Wayne, I'm calling cassandra from Python and have not seen too many 3 second reads. Your last email with log messages in it looks like you are asking for 10,000,000 columns. How much data is this request actually transferring to the client? The column names suggest only a few. DEBUG

Re: Read Latency

2010-10-19 Thread Wayne
Our problem is not that Python is slow; our problem is that getting data from the Cassandra server is slow (while Cassandra itself is fast). Python can handle the result data a lot faster than whatever it is passing through now... I guess to ask a specific question what right now is the fastest

Re: Read Latency

2010-10-19 Thread Wayne
It is an entire row, which is 600,000 cols. We pass a limit of 10 million to make sure we get it all. Our issue is that it seems Thrift itself adds more overhead/latency to a read than Cassandra itself takes to do the read. If cfstats for the slowest node reports 2.25s to us it is not

Re: Read Latency

2010-10-19 Thread Aaron Morton
Just wondering how many bytes you are returning to the client to get an idea of how slow it is. The call to fastbinary is decoding the wire format and creating the Python objects. When you ask for 600,000 columns you are creating a lot of python objects. Each column will be a ColumnOrSuperColumn,

Re: Throttling ColumnFamilyRecordReader

2010-10-19 Thread Jonathan Ellis
(Moving to u...@.) Isn't reducing the number of map tasks the easiest way to tune this? Also: in 0.7 you can use NetworkTopologyStrategy to designate a group of nodes as your hadoop datacenter so the workloads won't overlap. On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores mmoo...@real.com

How to get all rows inserted

2010-10-19 Thread Wicked J
Hi, I inserted 500 rows (records) in Cassandra and I'm using the following code to retrieve all the inserted rows. However, I'm able to get only 100 rows (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving Partition on a single node/instance. How can I get all the rows inserted?

Re: How to get all rows inserted

2010-10-19 Thread Aaron Morton
KeyRange has a count on it, the default is 100. For the ordering, double check you are using the OrderPreserving partitioner. If it's still out of order send an example. Cheers Aaron On 20 Oct, 2010, at 09:39 AM, Wicked J wickedj2...@gmail.com wrote: Hi, I inserted 500 rows (records) in Cassandra and I'm

Re: How to get all rows inserted

2010-10-19 Thread Robert
I have a similar question. Is there a way to divide this into multiple requests? I am using Cassandra v0.6.4, RandomPartitioner, and the pycassa library. Can I use get_range_slices with a start_token=0, and then recalculate the token from the last value key returned until it equals it loops

Re: Read Latency

2010-10-19 Thread Wayne
I am not sure how many bytes, but we do convert the cassandra object that is returned in 3s into a dictionary in ~1s and then again into a custom python object in about ~1.5s. Expectations are based on this timing. If we can convert what thrift returns into a completely new python object in 1s why

Re: How to get all rows inserted

2010-10-19 Thread Tyler Hobbs
I don't think I understand what you're trying to do. Do you want to page over the whole column family X rows at a time? Does it matter if the rows are in order? - Tyler On Tue, Oct 19, 2010 at 5:22 PM, Robert keyboard.opera...@gmail.com wrote: I have a similar question. Is there a way to

Re: How to get all rows inserted

2010-10-19 Thread Robert
For this case, the order doesn't matter, I just need to page over all of the data X rows at a time. When I use column_family.get_range from pycassa and pass in the last key as the new start key, I do not get all of the results. I have found a few posts about this, but I did not find a

Re: How to get all rows inserted

2010-10-19 Thread Aaron Morton
The general pattern is to use get_range_slices as you describe, also http://wiki.apache.org/cassandra/FAQ#iter_world Note you should be using the key fields on the KeyRange, not the tokens. There have been a few issues around with using the RandomPartitioner so it may be best to get on 0.6.6 if you
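The paging pattern discussed in this thread can be sketched without a cluster. Everything below is a stand-in: fake_get_range is a hypothetical substitute for a get_range_slices call, the 500 rows are made-up data, and the MD5-based token function only imitates the fact that RandomPartitioner returns rows in token order rather than key order. The sketch assumes the start key is inclusive, so every page after the first repeats the last row of the previous page and that row is skipped.

```python
import hashlib


def token(key):
    # Stand-in for a RandomPartitioner token; only the resulting order matters here.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Made-up data set: 500 rows, stored in token order as the server would return them.
ROWS = {"row%03d" % i: {"col": i} for i in range(500)}
ORDERED_KEYS = sorted(ROWS, key=token)


def fake_get_range(start_key, count):
    """Hypothetical stand-in for get_range_slices.

    Returns up to `count` (key, columns) pairs in token order,
    starting at start_key inclusive ("" means from the beginning).
    """
    begin = ORDERED_KEYS.index(start_key) if start_key else 0
    return [(k, ROWS[k]) for k in ORDERED_KEYS[begin:begin + count]]


def iter_all_rows(page_size=100):
    """Yield every row once, fetching page_size rows per request."""
    start_key = ""
    first_page = True
    while True:
        page = fake_get_range(start_key, page_size)
        # Later pages start with the row that ended the previous page.
        rows = page if first_page else page[1:]
        for key, columns in rows:
            yield key, columns
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False


all_keys = [key for key, _ in iter_all_rows()]
print(len(all_keys))
```

The page size must be at least 2, otherwise each later page would contain only the repeated row and the loop would make no progress.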

Re: Read Latency

2010-10-19 Thread Nicholas Knight
On Oct 20, 2010, at 6:30 AM, Wayne wrote: I am not sure how many bytes, Then I don't think your performance numbers really mean anything substantial. Deserialization time is inevitably going to go up with the amount of data present, so unless you know how much data you actually have, there's

bootstrap question

2010-10-19 Thread Yang
From line 396 of StorageService.java in the 0.7.0-beta2 source, it looks like when I boot up a completely new node, if there is not any keyspace defined in its storage.yaml, it would not even participate in the ring? In other words, let's say the cassandra instance currently has 10 nodes, and

Re: Read Latency

2010-10-19 Thread Aaron Morton
Hard to say why your code performs that way; it may not be creating as many objects, for example strings may not be re-created, just referenced. Are you creating new objects for every column returned? Bringing 600,000 to 10M columns back at once is always going to take time. I think any python database

Re: bootstrap question

2010-10-19 Thread Jonathan Ellis
I think this code has had some changes since beta2. Here is what it looks like in trunk: if (DatabaseDescriptor.getNonSystemTables().size() > 0) { bootstrap(token); assert !isBootstrapMode; // bootstrap will block until finished

Re: Read Latency

2010-10-19 Thread Wayne
Thanks for all of the feedback. I may very well not be doing a deep copy, so my numbers might not be accurate. I will test with writing to/from the disk to verify how long native python takes. I will also check how large the data coming from cassandra is for comparison. Our high

Re: Read Latency

2010-10-19 Thread Aaron Morton
Not sure how pycassa does it, but it's a simple case of... - get_slice with start="", finish="" and count = 100,001 - pop the last column and store its name - get_slice with start as the last column name, finish="" and count = 100,001 - repeat. A On 20 Oct, 2010, at 03:08 PM, Wayne wav...@gmail.com
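The steps in this message can be sketched in plain Python. The function fake_get_slice is a hypothetical stand-in for a real get_slice call, the column names are made-up data, and the page size is reduced to 1,000 to keep the example small. Each request asks for one column more than the page size; the extra column is popped and its name becomes the start of the next request, so no column is returned twice.

```python
from bisect import bisect_left

# Made-up wide row: 2,500 column names, already in comparator (sorted) order.
COLUMN_NAMES = ["col%06d" % i for i in range(2500)]


def fake_get_slice(start, count):
    """Hypothetical stand-in for get_slice with finish="".

    Returns up to `count` column names >= start, in sorted order
    ("" means from the beginning of the row).
    """
    begin = bisect_left(COLUMN_NAMES, start) if start else 0
    return COLUMN_NAMES[begin:begin + count]


def iter_columns(page_size=1000):
    """Yield every column name once, page_size columns per request."""
    start = ""
    while True:
        # Ask for one extra column; it marks where the next page begins.
        chunk = fake_get_slice(start, page_size + 1)
        if len(chunk) <= page_size:
            yield from chunk  # last page: nothing beyond it
            return
        start = chunk.pop()   # pop the extra column and remember its name
        yield from chunk


columns = list(iter_columns())
print(len(columns))
```

Compared with asking for the whole row under a very large limit, this keeps each response bounded, at the cost of one request per page.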