Data Model Question

2012-01-20 Thread Tamar Fraenkel
Hi! I am a newbie to Cassandra and seeking some advice regarding the data model I should use to best address my needs. For simplicity, what I want to accomplish is: I have a system that has users (potentially ~10,000 per day) and they perform actions in the system (total of ~50,000 a day). Each

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
On 19.01.2012, at 20:15, Narendra Sharma wrote: I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting data in certain token ranges more than others. With three nodes, it was also imbalanced. What I don't
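With RandomPartitioner, each node owns the token range between its predecessor's token and its own, so the degree of imbalance can be read straight off the tokens. Below is a minimal Python sketch of that calculation. It assumes the 0..2**127 token space mentioned later in this thread and unique tokens. The helper name `ownership` is hypothetical, and the tokens would be whatever your ring actually reports.

```python
# Hypothetical helper: estimate what fraction of a RandomPartitioner ring
# each node owns, given the nodes' tokens (assumed unique).
RING_SIZE = 2**127  # assumed token space 0 .. 2**127 (RandomPartitioner)


def ownership(tokens):
    """Map each token to the fraction of the ring in (previous token, token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}  # a single node owns the whole ring
    result = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps around to the last token
        result[tok] = ((tok - prev) % RING_SIZE) / RING_SIZE
    return result


# Example: three unevenly placed tokens.
# Token 0 owns 50% (its range wraps around the ring); the other two own 25% each.
for token, share in ownership([0, 2**125, 2**126]).items():
    print(f"{token}: {share:.0%}")
```

Evenly spaced tokens give every node the same share. Any node whose share is far above 1/N is the one receiving "data in certain token ranges more than others".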

delay in data deleting in cassadra

2012-01-20 Thread Shammi Jayasinghe
Hi, I am experiencing a delay in delete operations in Cassandra. It is as follows. I am running a thread which contains the following three steps. Step 01: Read data from column family foo[1] Step 02: Process received data, e.g.: bar1,bar2,bar3,bar4,bar5 Step 03: Remove those processed data from

RE: Garbage collection freezes cassandra node

2012-01-20 Thread Rene Kochen
Thanks for this very helpful info. It is indeed a production site which I cannot easily upgrade. I will try the various GC knobs and post any positive results. -----Original Message----- From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller Sent: Friday, 20 January 2012

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
Thanks for all the responses! I found our problem: Using the Random Partitioner, the key range is from 0..2**127. When we added nodes, we generated the keys and, out of convenience, added an offset to the tokens because the move was easier like that. However, we did not execute the modulo
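The fix Marcel describes comes down to folding shifted tokens back into the ring. Below is a minimal Python sketch. It assumes evenly spaced initial tokens of the form i * 2**127 / N, which is a common convention and not necessarily the script used in this thread. `balanced_tokens`, its `offset` parameter and its `wrap` flag are hypothetical names. The sketch shows how adding an offset without the modulo pushes some tokens past 2**127.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner token space: 0 .. 2**127


def balanced_tokens(node_count, offset=0, wrap=True):
    """Evenly spaced tokens for node_count nodes, shifted by offset.

    With wrap=True the modulo folds shifted tokens back into the ring;
    wrap=False reproduces the mistake of adding an offset without it.
    """
    tokens = []
    for i in range(node_count):
        token = i * RING_SIZE // node_count + offset
        tokens.append(token % RING_SIZE if wrap else token)
    return tokens


shift = 2**126  # shift every token by half the ring
naive = balanced_tokens(8, offset=shift, wrap=False)
fixed = balanced_tokens(8, offset=shift)

# Four of the eight naive tokens land at or beyond 2**127.
print([t for t in naive if t >= RING_SIZE])
# The wrapped tokens are the same eight evenly spaced positions, all in range.
print(sorted(fixed))
```

The design choice is simply to apply `% RING_SIZE` after any arithmetic on tokens. The offset then only rotates the nodes around the ring instead of moving some of them outside it.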

two dimensional slicing

2012-01-20 Thread Bryce Allen
I'm storing very large versioned lists of names, and I'd like to query a range of names within a given range of versions, which is a two-dimensional slice, in a single query. This is easy to do using ByteOrderedPartitioner, but seems to require multiple (non-parallel) queries and extra CFs when

Cassandra to Oracle?

2012-01-20 Thread Brian O'Neill
I can't remember if I asked this question before, but we're using Cassandra as our transactional system, and building up quite a library of map/reduce jobs that perform data quality analysis, statistics, etc. ( 100 jobs now) But... we are still struggling to provide an ad-hoc query mechanism

Re: Garbage collection freezes cassandra node

2012-01-20 Thread Peter Schuller
> Thanks for this very helpful info. It is indeed a production site which I cannot easily upgrade. I will try the various gc knobs and post any positive results.
*IF* your data size, or at least hot set, is small enough that you're not extremely reliant on the current size of page cache, and

Re: delay in data deleting in cassadra

2012-01-20 Thread Peter Schuller
> The problem occurs when this thread is invoked for the second time. In that step, it returns some of the data that I already deleted in the third step of the previous cycle.
In order to get a guarantee about a subsequent read seeing a write, you must read and write at QUORUM (or LOCAL_QUORUM if
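Peter's advice follows from the usual overlap rule. A read is guaranteed to see a write when the number of replicas read plus the number of replicas written exceeds the replication factor (R + W > RF). QUORUM is a strict majority, RF // 2 + 1. A delete is a write of a tombstone, so the same rule covers reading after a delete. Below is a small Python model of that arithmetic. It is not Cassandra code, and the function names are illustrative only.

```python
def quorum(replication_factor):
    """Number of replicas that make up a QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def read_sees_write(replication_factor, write_replicas, read_replicas):
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 replicas out of 3
print(read_sees_write(rf, q, q))  # QUORUM write + QUORUM read -> True
print(read_sees_write(rf, 1, 1))  # ONE write + ONE read -> False: a read may miss the delete
```

With RF = 3, deleting at ONE and then reading at ONE can hit a replica that has not yet received the tombstone. This matches the "already deleted" rows reappearing in the next cycle.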

Re: Cassandra to Oracle?

2012-01-20 Thread Zach Richardson
How much data do you think you will need ad hoc query ability for? On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill b...@alumni.brown.edu wrote: I can't remember if I asked this question before, but we're using Cassandra as our transactional system, and building up quite a library of

Re: Cassandra to Oracle?

2012-01-20 Thread Brian O'Neill
Not terribly large: ~50 million rows, each with ~100-300 columns. But big enough that a map/reduce job takes longer than users would like. Actually, maybe that is another question... Does anyone have any benchmarks running map/reduce against Cassandra? (even a simple count / or copy CF

Ad Hoc Queries

2012-01-20 Thread Brian O'Neill
Interesting articles... (changing the subject line to broaden the scope) http://codemonkeyism.com/dark-side-nosql/ http://www.reportsanywhere.com/pebble/2010/04/16/127143774.html These articulate the exact challenge we're trying to overcome. -brian On Fri, Jan 20, 2012 at 12:57 PM, Brian

Encryption related question

2012-01-20 Thread A J
Hello, I am trying to use internode encryption in Cassandra (1.0.6) for the first time. 1. Followed the steps 1 to 5 at http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore Q. In cassandra.yaml, what value goes for keystore? I exported the

Triggers?

2012-01-20 Thread Brian O'Neill
Anyone know if there is any activity to deliver triggers? I saw this quote: http://www.readwriteweb.com/cloud/2011/10/cassandra-reaches-10-whats-nex.php Ellis says that he's just starting to think about the post-1.0 world for Cassandra. Two features do come to mind, though, that missed the boat

Re: Encryption related question

2012-01-20 Thread Vijay
I had the following writeup when I did the KS and TS creation... Hope this helps. *Step 1:* Download your Organisation Cert/Cert Chain/Generate one. *Step 2:* Log in to any one machine and do the following to create a p12: # openssl pkcs12 -export -in cassandra-app.cert -inkey cassandra-app.key

Re: delay in data deleting in cassadra

2012-01-20 Thread Maxim Potekhin
Did you run repairs within GC_GRACE all the time? On 1/20/2012 3:42 AM, Shammi Jayasinghe wrote: Hi, I am experiencing a delay in delete operations in Cassandra. It is as follows. I am running a thread which contains the following three steps. Step 01: Read data from column family foo[1]
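Maxim's question matters because tombstones are kept only for gc_grace_seconds (864000 seconds, i.e. 10 days, by default). If a replica that missed a delete is not repaired within that window, the deleted data can reappear once the tombstone is purged. Below is a minimal Python sketch of the check. It assumes you record when each node last finished a repair. The `repair_overdue` helper and the dates are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)


def repair_overdue(last_repair, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if the last repair is at least one gc_grace period old."""
    return now - last_repair >= timedelta(seconds=gc_grace_seconds)


now = datetime(2012, 1, 20)
print(repair_overdue(datetime(2012, 1, 5), now))   # 15 days ago -> True
print(repair_overdue(datetime(2012, 1, 15), now))  # 5 days ago -> False
```

Pass the column family's actual gc_grace_seconds if it was changed from the default. Running the check with some margin to spare is safer than waiting until the deadline.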

Re: Cassandra to Oracle?

2012-01-20 Thread Maxim Potekhin
What makes you think that an RDBMS will give you acceptable performance? I guess you will try to index it to death (because otherwise the ad hoc queries won't work well, if at all), and at this point you may be hit with a performance penalty. It may be a good idea to interview users and build

Re: Cassandra to Oracle?

2012-01-20 Thread Mohit Anchlia
I think the problem arises when you have data in a column that you need to run ad hoc queries on and that is not denormalized. In most cases it's difficult to predict the type of query that would be required. Another way of solving this could be to index the fields in a search engine. On Fri, Jan 20,

Re: Cassandra to Oracle?

2012-01-20 Thread Maxim Potekhin
I certainly agree with "difficult to predict". There is a Danish proverb which goes: "It's difficult to make predictions, especially about the future." My point was that it's equally difficult with NoSQL and RDBMS. The latter requires indexing to operate well, and that's a potential performance

Re: ideal cluster size

2012-01-20 Thread Maxim Potekhin
You can also scale not horizontally but diagonally, i.e. RAID SSDs and use multicore CPUs. This means that you'll get the same performance with fewer nodes, making the cluster far easier to manage. SSDs by themselves will give you an order-of-magnitude improvement in I/O. On 1/19/2012 9:17 PM, Thorsten