Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Tharindu Mathew
Thanks Jeremy. These will be really useful. On Wed, Aug 31, 2011 at 12:12 AM, Jeremy Hanna wrote: > I've tried to help out with some UDFs and references that help with our use > case: https://github.com/jeromatron/pygmalion/ > > There are some brisk docs on pig as well that might be helpful: > http://www.datastax.com/docs/0.8/brisk/about_pig

Re: Cassandra upgrading

2011-08-30 Thread Jonathan Ellis
NEWS.txt covers upgrading. [moving to user list.] On Tue, Aug 30, 2011 at 8:47 PM, 邓志远 wrote: > Hi All: > Now I use Cassandra 0.7.5 in the cluster. How do I upgrade to Cassandra 0.8.4? > There is a large amount of data in Cassandra 0.7.5. Can you tell me how to upgrade? > > Thanks! -- Jonathan Ellis Pr

Re: Disk usage for CommitLog

2011-08-30 Thread Derek Andree
Okay, I figured this out: the default for MemtableFlushAfterMins is not 60 minutes, as some here said and as the DataStax docs state (http://www.datastax.com/docs/0.8/configuration/storage_configuration); it's 24 hours (1440). I changed it to 60 for every CF and now commit logs only hang aro
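Derek's per-CF fix can be scripted from the 0.8-era cassandra-cli. This is only a sketch: `MyKeyspace` and `Standard1` are placeholder names, and the exact attribute name should be checked against `help update column family;` for your version.

```
use MyKeyspace;
update column family Standard1 with memtable_flush_after = 60;
```

Saved to a file and repeated for each CF, it can be replayed non-interactively with `bin/cassandra-cli -h localhost -f lower-flush.txt`.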

Re: Disk usage for CommitLog

2011-08-30 Thread Derek Andree
> > 86GB in commitlog and 42GB in data > > Whoa, that seems really wrong, particularly given your data spans 13 months. > Have you changed any of the default cassandra.yaml settings? What is the > maximum memtable_flush_after across all your CFs? Any warnings/errors in the > Cassandra log? >

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
Sorry, I misread your earlier email. I would log in to IRC and ask in #cassandra. Given the nature of nanotime, I would think you'll run into harder-to-track-down problems, but it may be fine. On Aug 30, 2011, at 2:06 PM, Jiang Chen wrote: > Do you see any problem with my approach to derive the

Re: Updates lost

2011-08-30 Thread Jiang Chen
Do you see any problem with my approach to derive the current time in nanoseconds, though? On Tue, Aug 30, 2011 at 2:39 PM, Jeremy Hanna wrote: > Yes - the reason why internally Cassandra uses milliseconds * 1000 is because > System.nanoTime javadoc says "This method can only be used to measure

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
I've tried to help out with some UDFs and references that help with our use case: https://github.com/jeromatron/pygmalion/ There are some brisk docs on pig as well that might be helpful: http://www.datastax.com/docs/0.8/brisk/about_pig On Aug 30, 2011, at 1:30 PM, Tharindu Mathew wrote: > Than

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
Yes - the reason why internally Cassandra uses milliseconds * 1000 is because System.nanoTime javadoc says "This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time." http://download.oracle.com/javase/6/docs/api/java/lang/System.htm

Re: Updates lost

2011-08-30 Thread Jiang Chen
Indeed it's microseconds. We are talking about how to achieve the precision of microseconds. One way is System.currentTimeMillis() * 1000. It's only precise to milliseconds: if there is more than one update in the same millisecond, the second one may be lost. That's my original problem. The oth
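The two ingredients in this thread (millis * 1000 for wall-clock alignment, plus a tie-breaker for same-millisecond collisions) can be combined into a strictly increasing microsecond clock. This is a sketch of the idea under discussion, not code from the thread:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: strictly increasing microsecond timestamps for mutations.
// The base is wall-clock millis * 1000; if two calls land in the same
// microsecond, bump by 1 so no two mutations ever share a timestamp.
public final class MicrosecondClock {
    private static final AtomicLong last = new AtomicLong();

    public static long micros() {
        while (true) {
            long now = System.currentTimeMillis() * 1000L;
            long prev = last.get();
            long next = (now > prev) ? now : prev + 1;
            if (last.compareAndSet(prev, next)) {
                return next;
            }
        }
    }
}
```

Unlike raw System.nanoTime(), this stays anchored to wall-clock time, so timestamps remain comparable across processes and restarts (to millisecond accuracy).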

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Tharindu Mathew
Thanks Jeremy for your response. That gives me some encouragement that I might be on the right track. I think I need to try out more stuff before coming to a conclusion on Brisk. For Pig operations over Cassandra, I could only find http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig. Ar

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
Ed, you're right: milliseconds * 1000, i.e. microseconds. The other points about nano time still stand. Sorry about that. On Aug 30, 2011, at 1:20 PM, Edward Capriolo wrote: > > > On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna > wrote: > I would not use nano ti

Re: Updates lost

2011-08-30 Thread Edward Capriolo
On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna wrote: > I would not use nano time with cassandra. Internally and throughout the > clients, milliseconds is pretty much a standard. You can get into trouble > because when comparing nanoseconds with milliseconds as long numbers, > nanoseconds will al

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
I would not use nano time with cassandra. Internally and throughout the clients, milliseconds is pretty much a standard. You can get into trouble because when comparing nanoseconds with milliseconds as long numbers, nanoseconds will always win. That bit us a while back when we deleted someth
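The hazard Jeremy describes falls out of plain long comparison; a quick sketch (the magnitudes are order-of-magnitude only):

```java
// Sketch: Cassandra resolves conflicting writes by comparing timestamps
// as raw longs, so a stamp written in a finer unit dwarfs any later
// stamp written in a coarser unit -- e.g. a nanosecond-stamped delete
// shadows millisecond-stamped writes for decades.
public final class TimestampUnits {
    public static void main(String[] args) {
        long millis = System.currentTimeMillis();   // ~1.3e12 in 2011
        long micros = millis * 1000L;               // ~1.3e15
        long nanos  = millis * 1000000L;            // ~1.3e18
        System.out.println(nanos > micros && micros > millis); // prints "true"
    }
}
```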

Re: Updates lost

2011-08-30 Thread Jiang Chen
Looks like the theory is correct for the java case at least. The default timestamp precision of Pelops is millisecond. Hence the problem as explained by Peter. Once I supplied timestamps precise to microsecond (using System.nanoTime()), the problem went away. I previously stated that sleeping for

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to potentially move to Brisk because of the simplicity of operations there. Not sure what you mean about the true power of Hadoop. In my mind the true power of Hadoop is the ability to parallelize jobs and send each task to wher

Re: Solandra error - spaces in search

2011-08-30 Thread Ashley Martens
Could you reproduce it?

Re: Querying a composite key with cassandra-cli

2011-08-30 Thread Anthony Ikeda
No problems. Anthony On Tue, Aug 30, 2011 at 9:31 AM, Jonathan Ellis wrote: > Sounds like a bug. Can you create a ticket on > https://issues.apache.org/jira/browse/CASSANDRA ? > > On Tue, Aug 30, 2011 at 11:28 AM, Anthony Ikeda > wrote: > > One thing I have noticed is that when you query via

Re: Querying a composite key with cassandra-cli

2011-08-30 Thread Jonathan Ellis
Sounds like a bug. Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? On Tue, Aug 30, 2011 at 11:28 AM, Anthony Ikeda wrote: > One thing I have noticed is that when you query via the cli with an invalid > "assume" you no longer get the MarshalException beyond 0.8.1, it j

Re: Querying a composite key with cassandra-cli

2011-08-30 Thread Anthony Ikeda
One thing I have noticed is that when you query via the cli with an invalid "assume" you no longer get the MarshalException beyond 0.8.1; it just states "null". Any chance this could be more user-friendly? It kind of stumped me when I switched to 0.8.4. Anthony On Mon, Aug 29, 2011 at 2:35 PM, A

Re: Cassandra 0.8 & schematool

2011-08-30 Thread Jenny
Thank you, problem solved. On 2011-8-30, at 9:12 PM, Jonathan Ellis wrote: > The right way to do this is to use a script of "create" commands: > > bin/cassandra-cli -f my-schema-creation-script > > On Tue, Aug 30, 2011 at 1:00 AM, Jenny wrote: >> Hi >> I notice that schematool was removed from the relea

RE: Disk usage for CommitLog

2011-08-30 Thread Dan Hendry
> 86GB in commitlog and 42GB in data Whoa, that seems really wrong, particularly given your data spans 13 months. Have you changed any of the default cassandra.yaml settings? What is the maximum memtable_flush_after across all your CFs? Any warnings/errors in the Cassandra log? > Out of curi

Re: Updates lost

2011-08-30 Thread Jiang Chen
It's a single node. Thanks for the theory. I suspect part of it may still be right. Will dig more. On Tue, Aug 30, 2011 at 9:50 AM, Peter Schuller wrote: >> The problem still happens with very high probability even when it >> pauses for 5 milliseconds at every loop. If Pycassa uses microseconds >

Re: Updates lost

2011-08-30 Thread Peter Schuller
> The problem still happens with very high probability even when it > pauses for 5 milliseconds at every loop. If Pycassa uses microseconds > it can't be the cause. Also I have the same problem with a Java client > using Pelops. You connect to localhost, but is that a single node or part of a clus

Re: Updates lost

2011-08-30 Thread Jiang Chen
The problem still happens with very high probability even when it pauses for 5 milliseconds at every loop. If Pycassa uses microseconds it can't be the cause. Also I have the same problem with a Java client using Pelops. On Tue, Aug 30, 2011 at 12:14 AM, Tyler Hobbs wrote: > > On Mon, Aug 29, 201

Re: asynchronous writes (aka consistency = 0)

2011-08-30 Thread Jonathan Ellis
On Tue, Aug 30, 2011 at 6:54 AM, Sylvain Lebresne wrote: > If you don't want to wait for the write to be applied by Cassandra before > doing something else, then you can do that easily[1] client side. Right. Also consider that if you did have local replicas in each DC you could get low-latency r

Re: Cassandra 0.8 & schematool

2011-08-30 Thread Jonathan Ellis
The right way to do this is to use a script of "create" commands: bin/cassandra-cli -f my-schema-creation-script On Tue, Aug 30, 2011 at 1:00 AM, Jenny wrote: > Hi > I notice that schematool was removed from the release of Cassandra 0.8. I > would like to know the reason of doing that and how i
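A minimal script for the workflow Jonathan describes might look like the following sketch (keyspace, CF, and file names are placeholders; 0.8-era cli syntax, worth checking against `help;` in your cli):

```
create keyspace Demo with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];
use Demo;
create column family Users with comparator = UTF8Type;
```

Saved as my-schema-creation-script, it is replayed with `bin/cassandra-cli -h localhost -f my-schema-creation-script`.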

Re: Cassandra 0.8 & schematool

2011-08-30 Thread B R
Look for a file called schema-sample.txt under the conf folder. You'll find a sample schema and the command to load the same. On Tue, Aug 30, 2011 at 11:30 AM, Jenny wrote: > Hi > > I notice that schematool was removed from the release of Cassandra 0.8. I > would like to know the reason of doin

Re: asynchronous writes (aka consistency = 0)

2011-08-30 Thread Sylvain Lebresne
There used to be a ZERO consistency level but it was removed because it was harming more people than it was helping. If what you want is very high availability, i.e. being able to write even if the sole replica (in your RF=1 case) is down, then what you want to use is CL ANY. If you don't want to
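The client-side variant Sylvain alludes to is just a thread pool in front of an ordinary blocking write. A sketch, where `Mutator` is a hypothetical stand-in for whichever client API (Pelops, Hector, ...) is actually in use:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical client interface; a real application would wrap its
// Pelops/Hector mutation call here.
interface Mutator {
    void writeRow(String key, String value) throws Exception;
}

// Sketch: fire-and-forget writes done client side. The caller never
// blocks on Cassandra; failures surface only inside the pool.
public final class FireAndForgetWriter {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Mutator mutator;

    public FireAndForgetWriter(Mutator mutator) {
        this.mutator = mutator;
    }

    public void writeAsync(final String key, final String value) {
        pool.submit(new Runnable() {
            public void run() {
                try {
                    mutator.writeRow(key, value);
                } catch (Exception e) {
                    // Log and drop, or re-queue: with no ack, durability
                    // is the client's problem -- the trade-off CL ZERO had.
                }
            }
        });
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

This reproduces the latency behavior of the removed ZERO consistency level without hiding the durability trade-off inside the server.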

asynchronous writes (aka consistency = 0)

2011-08-30 Thread Eric tamme
Is there any mechanism that would allow me to write to Cassandra with no blocking at all? I spent a long time figuring out a problem I encountered with one node in each datacenter, LA and NY, using SimpleStrategy with RF=1 and write consistency 1. My row keys are yyyy-mm-dd-h, so basically for every hour a row woul