Re: Pycassa vs YCSB results.

2013-02-05 Thread aaron morton
The first thing I noticed is that your script uses the Python threading library, which is hampered by the Global Interpreter Lock (http://docs.python.org/2/library/threading.html). You don't really have multiple threads running in parallel; try using the multiprocessing library. Cheers -
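Here is a minimal sketch of the multiprocessing suggestion. It is written for Python 3 and assumes the benchmark's per-worker work is CPU-bound Python code. `work` is a hypothetical stand-in for one worker's share of requests and is not part of the original script. Each pool worker is a separate process with its own interpreter and its own GIL, so CPU-bound tasks can run on several cores at once. Threads in one CPython process, by contrast, take turns holding the GIL.

```python
from multiprocessing import Pool


def work(n):
    """Hypothetical CPU-bound stand-in for one worker's share of a benchmark."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run(num_workers=4, n=200_000):
    # One process per worker: each has its own interpreter and its own GIL,
    # so the CPU-bound loops in `work` can execute in parallel.
    with Pool(processes=num_workers) as pool:
        return pool.map(work, [n] * num_workers)


if __name__ == "__main__":
    results = run()
    print(len(results), results[0])
```

The `__main__` guard matters on platforms that start workers with the `spawn` method, because each child re-imports the script.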

Re: Pycassa vs YCSB results.

2013-02-05 Thread Tim Wintle
On Tue, 2013-02-05 at 21:38 +1300, aaron morton wrote: The first thing I noticed is that your script uses the Python threading library, which is hampered by the Global Interpreter Lock (http://docs.python.org/2/library/threading.html). You don't really have multiple threads running in parallel, try

Re: neither 'nodetool repair' nor 'hinted handoff/read repair' work for secondary indexes

2013-02-05 Thread Alexei Bakanov
Made a dtest for easier reproduction and created https://issues.apache.org/jira/browse/CASSANDRA-5223 On 1 February 2013 15:14, Alexei Bakanov russ...@gmail.com wrote: Hi again, Once you start playing with CCM it's hard to stop, such a great tool. My issue with secondary indexes is the following:

Why do Datastax docs recommend Java 6?

2013-02-05 Thread Baron Schwartz
The Datastax docs repeatedly say (e.g. http://www.datastax.com/docs/1.2/install/install_jre) that Java 7 is not recommended, but they don't say why. It would be helpful to know this. Does anyone know? The same documentation is referenced from the Cassandra wiki, for example,

Re: Why do Datastax docs recommend Java 6?

2013-02-05 Thread Michael Kjellman
There have been tons of threads/convos on this. In the early days of Java 7 it was pretty unstable, and there was pretty much no convincing reason to use Java 7 over Java 6. Now that Java 7 has stabilized and Java 6 is EOL, it's a reasonable decision to use Java 7, and we do it in production with

where is the UTF8Comparator code for cassandra

2013-02-05 Thread Hiller, Dean
Our in-memory version has a slight difference, which we just found out about and want to fix, in the case where we are using UTF8 sorting and our column name is String.long.String. With our in-memory version, two different longs sometimes generate the same string, causing a clash and overwriting a

Re: where is the UTF8Comparator code for cassandra

2013-02-05 Thread Edward Capriolo
The comparator should be defined in the UTF8Type class. On Tue, Feb 5, 2013 at 10:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Our in-memory version has a slight difference, which we just found out about and want to fix, in the case where we are using UTF8 sorting and our column name is
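Here is a small Python sketch of the ordering question. It assumes that UTF8Type compares the raw UTF-8 bytes of two column names as unsigned bytes, lexicographically. That is my reading of the code, not something stated in the thread. For UTF-8, byte order matches code point order. It also means a long embedded as decimal text in a String.long.String name sorts as text, not as a number. The column names are made up for illustration.

```python
def utf8_sort_key(name):
    """Sort key mimicking a byte-wise UTF-8 comparator: Python compares
    bytes objects unsigned and lexicographically."""
    return name.encode("utf-8")


# Hypothetical String.long.String column names.
names = ["user.9.x", "user.100.x", "usér.1.x", "user.10.x"]
ordered = sorted(names, key=utf8_sort_key)
print(ordered)  # ['user.10.x', 'user.100.x', 'user.9.x', 'usér.1.x']

# UTF-8 byte order agrees with code point order (Python's default str order).
assert ordered == sorted(names)
```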

Clarification on num_tokens setting

2013-02-05 Thread Baron Schwartz
As I understand the num_tokens setting, it makes Cassandra do the following (pseudocode) when a new node is added:

    for 1...num_tokens do
        my_token = rand(0, 2^128-1)
        next_token = min(tokens in cluster where token > my_token)
        my_range = (my_token, next_token - 1)
    done

Now the new node owns
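The toy simulation below shows the idea behind the pseudocode and why many random tokens per node even out ownership. It rests on stated assumptions. Tokens are drawn uniformly from a 2^128 space, as in the pseudocode above; real partitioners use their own token ranges. A token owns the range from the preceding token (exclusive) up to itself (inclusive). That is Cassandra's Dynamo-style convention as I understand it, and it is the reverse of the direction in the pseudocode. `assign_tokens` and `ownership` are illustrative helpers, not Cassandra code.

```python
import random
from collections import defaultdict

RING_SIZE = 2**128  # assumption: matches the pseudocode's rand(0, 2^128-1)


def assign_tokens(nodes, num_tokens, seed=0):
    """Give each node num_tokens distinct random tokens; return a sorted
    list of (token, node) pairs describing the whole ring."""
    rng = random.Random(seed)
    taken = set()
    ring = []
    for node in nodes:
        for _ in range(num_tokens):
            token = rng.randrange(RING_SIZE)
            while token in taken:  # guard against (very unlikely) collisions
                token = rng.randrange(RING_SIZE)
            taken.add(token)
            ring.append((token, node))
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the ring owned by each node, assuming a token owns the
    range (previous_token, token]."""
    owned = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - prev_token) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}


if __name__ == "__main__":
    shares = ownership(assign_tokens(["node1", "node2", "node3"], num_tokens=256))
    for node, share in sorted(shares.items()):
        print(node, round(share, 3))
```

With 3 nodes and num_tokens = 256, each share should come out close to 1/3. With num_tokens = 1, the same code typically gives much more uneven shares.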

Re: where is the UTF8Comparator code for cassandra

2013-02-05 Thread Hiller, Dean
Thanks, I misread the code; got it now. Thanks, Dean On 2/5/13 9:22 AM, Edward Capriolo edlinuxg...@gmail.com wrote: The comparator should be defined in the UTF8Type class. On Tue, Feb 5, 2013 at 10:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Our in-memory version has a slight difference

Re: Pycassa vs YCSB results.

2013-02-05 Thread aaron morton
The simple thing to do would be to use the multiprocessing package and eliminate all shared state. On a multicore box, Python threads can run on different cores and battle over obtaining the GIL. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton
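One common way to "eliminate all shared state" is to have each worker process build its own client once, through the `initializer` argument of `multiprocessing.Pool`, rather than sharing one client across workers. The sketch below assumes Python 3. `make_client` is a hypothetical placeholder. Treating it as the spot where the original script would create its pycassa `ConnectionPool` is an assumption.

```python
import os
from multiprocessing import Pool

_client = None  # per-process global, set once in each worker by init_worker


def make_client():
    """Hypothetical stand-in for creating a connection
    (e.g. a pycassa ConnectionPool) inside the current process."""
    return {"pid": os.getpid()}


def init_worker():
    # Runs once in every worker process when the pool starts.
    global _client
    _client = make_client()


def do_request(key):
    # Uses only this process's own client; nothing is shared between workers.
    return (key, _client["pid"])


if __name__ == "__main__":
    with Pool(processes=2, initializer=init_worker) as pool:
        results = pool.map(do_request, range(6))
    print(results)
```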

Re: Pycassa vs YCSB results.

2013-02-05 Thread Pradeep Kumar Mantha
Thanks, I will use the multiprocessing package, since I need to scale it to multiple nodes. I will also try to optimize the function calls and use global variables. Thank you very much for your help. On Tue, Feb 5, 2013 at 9:12 AM, aaron morton aa...@thelastpickle.com wrote: The simple thing

Pycassa KEY read error.

2013-02-05 Thread Pradeep Kumar Mantha
Hi, I am trying to read fields using the pycassa API, but it seems like I am missing something and not getting the expected results:

    import pycassa

    pool = pycassa.ConnectionPool('usertable', server_list=['1.1.1.1'])
    cf = pycassa.ColumnFamily(pool, 'data')
    cf.get('7573657232323132333035343936323937363138343433')
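The key passed to `cf.get` looks hex-encoded. Decoded as ASCII, it reads `user2212305496297618443`, which matches the "user" prefix YCSB uses for its keys. The actual answer is in Tyler's linked pycassa-discuss thread and is not reproduced here. As a hypothesis only: if the stored row key is the plain string, this sketch recovers it from the hex form before calling `cf.get`. `binascii.unhexlify` is available in both Python 2 and 3.

```python
import binascii

# Hypothesis: the key was copied in hex form (e.g. from a tool that prints
# row keys as hex) while the stored row key is the decoded string.
hex_key = "7573657232323132333035343936323937363138343433"

# unhexlify turns each pair of hex digits into one byte.
raw_key = binascii.unhexlify(hex_key).decode("ascii")
print(raw_key)  # user2212305496297618443

# One would then try cf.get(raw_key) with the pool and cf from the snippet above.
```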

Operation Consideration with Counter Column Families

2013-02-05 Thread Drew Kutcharian
Hey Guys, Are there any specific operational considerations one should make when using counter column families? How are counter column families stored on disk? How do they affect compaction? -- Drew

Re: Pycassa vs YCSB results.

2013-02-05 Thread Edward Capriolo
Without stating the obvious: if you are interested in scale, why pick Python? I did want to point out that YCSB is not even the gold standard for benchmarks; using Cassandra's stress tool you can get more ops per sec than YCSB. On Tue, Feb 5, 2013 at 1:13 PM, Pradeep Kumar Mantha

Re: Pycassa KEY read error.

2013-02-05 Thread Tyler Hobbs
I answered this here: https://groups.google.com/forum/?fromgroups=#!topic/pycassa-discuss/9-GzSPEJqPU You may want to check your subscription to the pycassa mailing list; it seems like you're not getting my responses for some reason. On Tue, Feb 5, 2013 at 12:20 PM, Pradeep Kumar Mantha

unbalanced ring

2013-02-05 Thread Stephen.M.Thompson
So I have three nodes in a ring in one data center. My configuration has num_tokens: 256 set and initial_token commented out. When I look at the ring, it shows me all of the token ranges of course, and basically identical data for each range on each node. Here is the Cliff's Notes version of

Re: Pycassa KEY read error.

2013-02-05 Thread Pradeep Kumar Mantha
Hi Tyler, Thanks, I didn't get your response regarding this post on the pycassa group. I will check my subscription. Thanks, Pradeep On Tue, Feb 5, 2013 at 11:23 AM, Tyler Hobbs ty...@datastax.com wrote: I answered this here:

Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows

2013-02-05 Thread Alexei Bakanov
I tried to run with tracing, but it says 'Scanned 0 rows and matched 0'. I found an existing issue for this bug: https://issues.apache.org/jira/browse/CASSANDRA-4973 I made a dtest for reproducing it and attached it to the ticket. Alexei On 2 February 2013 23:00, aaron morton aa...@thelastpickle.com

Re: Operation Consideration with Counter Column Families

2013-02-05 Thread aaron morton
Are there any specific operational considerations one should make when using counter column families? Performance, as they incur a read and a write. There were some issues with overcounts in log replay (see CHANGES.txt). How are counter column families stored on disk? Same as

Re: unbalanced ring

2013-02-05 Thread aaron morton
Use nodetool status with vnodes (http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes). The different load can be caused by rack affinity; are all the nodes in the same rack? Another simple check: have you created some very big rows? Cheers - Aaron Morton

Re: Clarification on num_tokens setting

2013-02-05 Thread aaron morton
With N nodes, the ring is divided into N*num_tokens. Correct? There is always num_tokens tokens in the ring. Each node has (num_tokens / N) * RF ranges on it. so the ranges of keys are not uniform, although with enough nodes in the cluster there probably won't be any really large ranges.

Re: Clarification on num_tokens setting

2013-02-05 Thread Andrey Ilinykh
On Tue, Feb 5, 2013 at 12:42 PM, aaron morton aa...@thelastpickle.com wrote: With N nodes, the ring is divided into N*num_tokens. Correct? There is always num_tokens tokens in the ring. Each node has (num_tokens / N) * RF ranges on it. That means every node should have the same num_token

RE: Why do Datastax docs recommend Java 6?

2013-02-05 Thread Ilya Grebnov
Also, what is the particular reason to use the Oracle JDK over OpenJDK? Sorry, I could not find this information online. Thanks, Ilya From: Michael Kjellman [mailto:mkjell...@barracuda.com] Sent: Tuesday, February 05, 2013 7:29 AM To: user@cassandra.apache.org Subject: Re: Why do Datastax docs

Re: Operation Consideration with Counter Column Families

2013-02-05 Thread Drew Kutcharian
Thanks Aaron, so will there only be one value for each counter column per SSTable, just like regular columns? For some reason I was under the impression that Cassandra keeps a log of all the increments, not the actual value. On Feb 5, 2013, at 12:36 PM, aaron morton aa...@thelastpickle.com

Re: Clarification on num_tokens setting

2013-02-05 Thread aaron morton
There is always num_tokens tokens in the ring. I got this wrong. Each node *does* have num_tokens tokens. With N nodes, the ring is divided into N*num_tokens. Correct? Yes. In other words, it is a cluster-wide parameter. Correct? Yes. Cheers - Aaron Morton Freelance
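Here is a worked count for the corrected statement. It assumes every node uses the same num_tokens and each range is stored on RF replicas. The illustrative numbers come from the three-node, num_tokens: 256 cluster in the "unbalanced ring" thread. Its RF is not given there, so RF = 3 is an assumption.

```latex
\begin{aligned}
\text{ranges in the ring} &= N \cdot \text{num\_tokens} = 3 \cdot 256 = 768 \\
\text{primary ranges per node} &= \text{num\_tokens} = 256 \\
\text{ranges held per node (average)} &= \frac{RF \cdot N \cdot \text{num\_tokens}}{N} = RF \cdot \text{num\_tokens} = 3 \cdot 256 = 768
\end{aligned}
```

Under the RF = 3 assumption, N = RF, so every node holds a replica of every range and the same data, however the tokens fall.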

Re: Clarification on num_tokens setting

2013-02-05 Thread Eric Evans
On Tue, Feb 5, 2013 at 4:19 PM, aaron morton aa...@thelastpickle.com wrote: There is always num_tokens tokens in the ring. I got this wrong. Each node *does* have num_tokens tokens. With N nodes, the ring is divided into N*num_tokens. Correct? Yes In other words it is cluster wide

Re: Why do Datastax docs recommend Java 6?

2013-02-05 Thread jeffpk
Oracle now owns the Sun HotSpot team, which is inarguably the highest-powered Java VM team in the world. It's still really the epicenter of all Java VM development. -Original Message- From: Ilya Grebnov i...@metricshub.com Date: Tue, 5 Feb 2013

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-05 Thread Takenori Sato(Cloudian)
Hi, We found that this issue is specific to 1.0.1 through 1.0.8 and was fixed in 1.0.9: https://issues.apache.org/jira/browse/CASSANDRA-4023 So by upgrading, we will see reasonable performance no matter how large our rows are! Thanks, Takenori (2013/02/05 2:29), aaron morton wrote: Yes,

DataModel Question

2013-02-05 Thread Kanwar Sangha
Hi - We are designing Cassandra-based storage for the following use cases:
* Store SMS messages
* Store MMS messages
* Store chat history
What would be the ideal way to design the data model for this kind of application? I am thinking along these lines: Row-Key :

RE: DataModel Question

2013-02-05 Thread Rishabh Agrawal
Hello, Composite keys are always good, and the model looks clean to me. Run a pilot with around 10 GB or more of data, compare it with an RDBMS, and make changes accordingly. Thanks and Regards Rishabh Agrawal From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: Wednesday, February 06, 2013 7:10 AM

Re: DataModel Question

2013-02-05 Thread Vivek Mishra
Avoid super columns. If you need sorted, wide rows, then go for composite columns. -Vivek On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha kan...@mavenir.com wrote: Hi - We are designing a Cassandra based storage for the following use cases: · Store SMS messages
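Here is a rough in-memory model of the composite-column suggestion. It assumes a hypothetical layout that is not the poster's own, since that row-key design is cut off above. The layout is one wide row per user, with composite column names of the form (message type, timestamp, message id). Cassandra keeps a row's columns sorted by its comparator. Here a sorted Python list plus `bisect` stands in for that, so fetching all SMS messages in a time window is one contiguous slice. `WideRow`, `insert` and `column_slice` are illustrative names, not a Cassandra or pycassa API.

```python
import bisect


class WideRow:
    """Toy model of one wide row whose columns stay sorted by a composite name."""

    def __init__(self):
        self._names = []   # sorted composite column names (tuples)
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Writing an existing column name overwrites its value.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def column_slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Usage: the row for one hypothetical user.
row = WideRow()
row.insert(("sms", 1360000000, "m1"), "hello")
row.insert(("chat", 1360000100, "c1"), "hi there")
row.insert(("sms", 1360000200, "m2"), "see you")

# All SMS in a time window: bounds fix the leading components of the name.
sms = row.column_slice(("sms", 1360000000, ""), ("sms", 1360000300, "\uffff"))
print(sms)
```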

Re: DataModel Question

2013-02-05 Thread Tamar Fraenkel
Hi! I have a couple of questions regarding your model: 1. What Cassandra version are you using? I am still working with 1.0 and this seems to make sense, but 1.2 gives you much more power, I think. 2. Maybe I don't understand your model, but I think you need DynamicComposite columns, as user

RE: Why do Datastax docs recommend Java 6?

2013-02-05 Thread Viktor Jevdokimov
I would prefer Oracle to own Azul's Zing JVM, over any other (for its GC), and provide it for free to anyone :) Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio