Re: Regarding Cassandra Scalability

2010-04-19 Thread dir dir
Hi Paul, I am under no pressure to build software with Cassandra right now. I am studying and exploring Cassandra, so I am very curious about it. OK, I will continue my study and wait for better documentation. Dir. On Mon, Apr 19, 2010 at 1:44 PM, Paul Prescod

cassandra monitoring

2010-04-19 Thread Simeonov, Daniel
Hi, What is the preferred way of monitoring Cassandra clusters? Is Cassandra integrated with Ganglia? Thank you very much! Best regards, Daniel.

0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Masood Mortazavi
I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest,

RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Mark Jones
I'm seeing some issues like this as well; in fact, I think seeing your graphs has helped me understand the dynamics of my cluster better. Using some ballpark figures for inserting single-column objects of ~500 bytes onto individual nodes (not when combined as a cluster): Node1: Inserts 12000/s

Re: Regarding Cassandra Scalability

2010-04-19 Thread Gary Dusbabek
On Sun, Apr 18, 2010 at 11:14, dir dir sikerasa...@gmail.com wrote: Hi Gary, The main reason is that the compaction operation (removing deleted values) currently requires that an entire row be read into memory. Thank you for your explanation. But I still do not understand what you mean.

RE: Cassandra Java Client

2010-04-19 Thread Dop Sun
May I take this chance to share this link here: http://code.google.com/p/jassandra/ It is currently based on the Cassandra 0.6 Thrift APIs. The classes ThriftCriteria and ThriftColumnFamily use the Thrift API directly. Also, the site itself has test code, which actually works on Jassandra

Re: Cassandra Java Client

2010-04-19 Thread Jonathan Ellis
How is Jassandra different from http://github.com/rantav/hector ? On Mon, Apr 19, 2010 at 9:21 AM, Dop Sun su...@dopsun.com wrote: May I take this chance to share this link here: http://code.google.com/p/jassandra/ It is currently based on the Cassandra 0.6 Thrift APIs. The class

RE: Cassandra Java Client

2010-04-19 Thread Dop Sun
Well, there are a couple of points behind the creation of Jassandra: 1. First of all, I wanted to create something like this because I come from a JDBC background and am familiar with the Hibernate API. The ICriteria (which is created for querying) is inspired by the Criteria API from Hibernate. Actually,

tcp CLOSE_WAIT bug

2010-04-19 Thread Ingram Chen
Hi all, We have observed several connections between nodes in CLOSE_WAIT after several hours of operation: At node 87: netstat -tn | grep 7000 tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:57625 CLOSE_WAIT tcp 0 0 :::192.168.2.87:7000
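To make the observation above easier to track over time, here is a minimal Python sketch that counts CLOSE_WAIT connections on a given local port from `netstat -tn` text. The function name `count_close_wait` and the `SAMPLE` data are hypothetical; the sample lines are only modeled on the output quoted in the message, and the parser assumes the usual six-column `netstat -tn` layout.

```python
from collections import Counter


def count_close_wait(netstat_output, port):
    """Count CLOSE_WAIT connections per remote host for one local port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local address, remote address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # The port is whatever follows the last colon of the local address.
        if local.rsplit(":", 1)[-1] == str(port):
            # Drop the remote port and any leading colons of the IPv6-style prefix.
            counts[remote.rsplit(":", 1)[0].lstrip(":")] += 1
    return counts


# Hypothetical sample, shaped like the lines in the report above.
SAMPLE = """\
tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:57625 CLOSE_WAIT
tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:57626 CLOSE_WAIT
tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:57627 ESTABLISHED
tcp 0 0 :::192.168.2.87:9160 :::192.168.2.90:40000 CLOSE_WAIT
"""

if __name__ == "__main__":
    print(count_close_wait(SAMPLE, 7000))
```

Running something like this periodically on each node would show whether the CLOSE_WAIT count on port 7000 keeps growing or stays flat.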

Re: tcp CLOSE_WAIT bug

2010-04-19 Thread Ingram Chen
Thank you for the information. We do use connection pools with the Thrift client, and ThriftAddress is on port 9160. The problematic connections we found are all on port 7000, which is the internal communication port between nodes. I guess this is related to StreamingService. On Mon, Apr 19, 2010 at 23:46,

RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Daniel Kluesing
We see this behavior as well with 0.6, heap usage graphs look almost identical. The GC is a noticeable bottleneck, we've tried jdku19 and jrockit vm's. It basically kills any kind of soft real time behavior. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010

Map/Reduce Cassandra Output

2010-04-19 Thread Sonny Heer
Unlike the wordcount example, my input source is a directory, and I have a split class and record reader defined. Also unlike wordcount, I need to insert into Cassandra during reduce. I notice that for the wordcount input it retrieves a handle on a Cassandra client like this: TSocket

Re: [RELEASE] 0.6.0

2010-04-19 Thread Ted Zlatanov
On Wed, 14 Apr 2010 13:09:13 -0500 Ted Zlatanov t...@lifelogs.com wrote: TZ On Wed, 14 Apr 2010 12:23:19 -0500 Eric Evans eev...@rackspace.com wrote: EE On Wed, 2010-04-14 at 10:16 -0500, Ted Zlatanov wrote: Can it support a non-root user through /etc/default/cassandra? I've been patching

Modelling assets and user permissions

2010-04-19 Thread tsuraan
Suppose I have a CF that holds some sort of assets that some users of my program have access to, and that some do not. In SQL-ish terms it would look something like this: TABLE Assets ( asset_id serial primary key, ... ); TABLE Users ( user_id serial primary key, user_name text );

Re: Cassandra Java Client

2010-04-19 Thread Ran Tavory
Hi Dop, you may want to look at hector as a low level cassandra client on which you build jassandra, adding hibernate style magic etc like other ppl have done with ORM layers on top of it. Hector's main features include extensive jmx counters, failover and connection pooling. It's available for

Re: [RELEASE] 0.6.0

2010-04-19 Thread Eric Evans
On Mon, 2010-04-19 at 12:02 -0500, Ted Zlatanov wrote: EE It's the first item on debian/TODO, but, you know, patches welcome and EE all that. TZ The appended patch has been sufficient for me. Eric, do you need me to open a ticket for this, too, or is what I posted sufficient? Feel

PropertyFileEndPointSnitch

2010-04-19 Thread Erik Holstad
When building the PropertyFileEndPointSnitch into the jar cassandra-propsnitch.jar the files in the jar end up on src/java/org/apache/cassandra/locator/PropertyFileEndPointSnitch.class instead of org/apache/cassandra/locator/PropertyFileEndPointSnitch.class. Am I doing something wrong , is this

RE: Map/Reduce Cassandra Output

2010-04-19 Thread Stu Hood
If you used that snippet of code, all connections would go through the same seed: the input code does additional work to determine which nodes are holding particular key ranges, and then connects directly. For outputting from Hadoop to Cassandra, you may want to consider using a Java
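The idea of finding the node that holds a particular key range, rather than sending everything through one seed, can be illustrated with a toy token ring. This is only a sketch under stated assumptions: `ToyRing`, its methods, and the sample tokens are hypothetical, the MD5-based token is a simplified stand-in for a hash partitioner, and none of this is the actual Cassandra or Hadoop input code.

```python
import bisect
import hashlib


class ToyRing:
    """Toy model: each node owns the token range ending at its own token."""

    def __init__(self, token_to_node):
        self.tokens = sorted(token_to_node)
        self.nodes = [token_to_node[t] for t in self.tokens]

    @staticmethod
    def token_for(key):
        # Assumption: hash the key with MD5 and fold it into a 127-bit token space.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % (2 ** 127)

    def node_for_token(self, token):
        # First node whose token is >= the given token, wrapping around the ring.
        i = bisect.bisect_left(self.tokens, token)
        return self.nodes[i % len(self.nodes)]

    def node_for_key(self, key):
        return self.node_for_token(self.token_for(key))


if __name__ == "__main__":
    ring = ToyRing({100: "node-a", 200: "node-b", 300: "node-c"})
    print(ring.node_for_token(150))
    print(ring.node_for_key("some-row-key"))
```

A client that knows the ring layout can group its writes by owning node and connect to each node directly, which spreads the connection load instead of concentrating it on the seed.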

restore with snapshot

2010-04-19 Thread Lee Parker
I am working on finalizing our backup and restore procedures for a cassandra cluster running on EC2. I understand based on the wiki that in order to replace a single node, I don't actually need to put data on that node. I just need to bootstrap the new node into the cluster and it will get data

Re: Data model question - column names sort

2010-04-19 Thread Jonathan Ellis
On Thu, Apr 15, 2010 at 6:01 PM, Sonny Heer sonnyh...@gmail.com wrote: Need a way to have two different types of indexes. Key: aTextKey ColumnName: aTextColumnName:55 Value: Key: aTextKey ColumnName: 55:aTextColumnName Value: All the valuable information is stored in the column name
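The two column-name layouts in the question sort differently under a plain lexical comparator, which a short Python sketch can show. The helper names, the sample entries, and the zero-padding width are hypothetical, and the sketch assumes column names are compared as plain strings; the comparator configured for the column family decides the real order.

```python
def by_name_then_count(name, count, width=10):
    """Column name laid out as 'name:count', with the count zero-padded."""
    return f"{name}:{count:0{width}d}"


def by_count_then_name(name, count, width=10):
    """Column name laid out as 'count:name', with the count zero-padded."""
    return f"{count:0{width}d}:{name}"


# Hypothetical (text, number) pairs standing in for the real column data.
entries = [("apple", 55), ("banana", 7), ("cherry", 100)]

# Index 1: ordered by the text part.
name_index = sorted(by_name_then_count(n, c) for n, c in entries)

# Index 2: ordered by the numeric part; padding keeps 7 < 55 < 100.
count_index = sorted(by_count_then_name(n, c) for n, c in entries)

# Without padding, string comparison puts "100" before "55" before "7".
unpadded = sorted(f"{c}:{n}" for n, c in entries)

if __name__ == "__main__":
    print(name_index)
    print(count_index)
    print(unpadded)
```

The unpadded list illustrates why a numeric prefix stored as text usually needs a fixed width (or a numeric comparator) before it can serve as a sort key.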

RE: Cassandra Java Client

2010-04-19 Thread Dop Sun
Hi Ran: Yep, looks like there is possibility that I can add dependencies to hector, and enhance the functionality to Jassandra. I would take this chance to extend the discussion about “xxx Client for Cassandra” a little bit: In short, Cassandra may need a kind of sub-project to

Re: Clarification on Ring operations in Cassandra 0.5.1

2010-04-19 Thread Jonathan Ellis
On Thu, Apr 15, 2010 at 6:10 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: 1) shutdown cassandra on instance I want to replace 2) create a new instance, start cassandra with AutoBootstrap = true 3) run nodeprobe removetoken against the token of the instance I am replacing Then

Re: effective modeling for fixed limit columns

2010-04-19 Thread Jonathan Ellis
Limiting by number of columns in a row will perform very poorly. Limiting by the time a column has existed can perform quite well, and was added by Sylvain for 0.7 in https://issues.apache.org/jira/browse/CASSANDRA-699 On Fri, Apr 16, 2010 at 1:50 PM, Chris Shorrock ch...@shorrockin.com wrote:

Re: why read operation use so much of memory?

2010-04-19 Thread Jonathan Ellis
(Moving to users@ list.) Like any Java server, Cassandra will use as much memory in its heap as you allow it to. You can request a GC from jconsole to see what its approximate real working set is. http://wiki.apache.org/cassandra/SSTableMemtable explains why reads are slower than writes. You

Re: cassandra monitoring

2010-04-19 Thread Jonathan Ellis
Anything that can consume JMX. On Mon, Apr 19, 2010 at 5:34 AM, Simeonov, Daniel daniel.simeo...@sap.com wrote: Hi, What is the preferred way of monitoring Cassandra clusters? Is Cassandra integrated with Ganglia? Thank you very much! Best regards, Daniel.

Re: tcp CLOSE_WAIT bug

2010-04-19 Thread Jonathan Ellis
Is this after doing a bootstrap or other streaming operation? Or did a node go down? The internal sockets are supposed to remain open, otherwise. On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen ingramc...@gmail.com wrote: Thank you for the information. We do use connection pools with the Thrift client

Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Jonathan Ellis
It's hard to tell from those slides, but it looks like the slowdown doesn't hit until after several GCs. Perhaps this is compaction kicking in, not GCs? Definitely the extra I/O + CPU load from compaction will cause a drop in throughput. On Mon, Apr 19, 2010 at 6:14 AM, Masood Mortazavi

Re: Map/Reduce Cassandra Output

2010-04-19 Thread Sonny Heer
Thanks Stu. I will take a look at Hector. Do you know where the input code does the additional work? On Mon, Apr 19, 2010 at 11:20 AM, Stu Hood stu.h...@rackspace.com wrote: If you used that snippet of code, all connections would go through the same seed: the input code does additional

Re: busy thread on IncomingStreamReader ?

2010-04-19 Thread Rob Coli
On 4/17/10 6:47 PM, Ingram Chen wrote: after upgrading jdk from 1.6.0_16 to 1.6.0_20, the problem is solved. FYI, this sounds like it might be: https://issues.apache.org/jira/browse/CASSANDRA-896 http://bugs.sun.com/view_bug.do;jsessionid=60c39aa55d3666c0c84dd70eb826?bug_id=6805775 Where

get_range_slices in hector

2010-04-19 Thread Chris Dean
Is there a version of hector that has an interface to get_range_slices ? or should I provide a patch? Cheers, Chris Dean

Re: Help with MapReduce

2010-04-19 Thread Jesse McConnell
most likely means that the count() operation is taking too long for the configured RPCTimeout counts get unreliable after a certain number of columns under a key in my experience jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk

Re: Help with MapReduce

2010-04-19 Thread Jesse McConnell
err not count in your case, but same symptom, cassandra can't return the answer to your query in the configured rpctimeout time cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Mon, Apr 19, 2010 at 19:40, Jesse McConnell jesse.mcconn...@gmail.com wrote: most likely means that

Re: Help with MapReduce

2010-04-19 Thread Joost Ouwerkerk
hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns? On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis jbel...@gmail.com wrote: Possibly you are asking it to retrieve too many columns per row.

Re: Help with MapReduce

2010-04-19 Thread Jonathan Ellis
The latter, if you are retrieving multiple supercolumns. On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk jo...@openplaces.org wrote: hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns? On Mon,

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Schubert Zhang
Please also post your JVM heap and GC options, i.e. the settings in cassandra.in.sh. And what about your node hardware? On Tue, Apr 20, 2010 at 9:22 AM, Ken Sandney bluefl...@gmail.com wrote: Hi I am doing an insert test with 9 nodes, the command: stress.py -n 10 -t 1000 -c 10 -o insert

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Schubert Zhang
It seems you should configure a larger JVM heap. On Tue, Apr 20, 2010 at 9:32 AM, Schubert Zhang zson...@gmail.com wrote: Please also post your JVM heap and GC options, i.e. the settings in cassandra.in.sh. And what about your node hardware? On Tue, Apr 20, 2010 at 9:22 AM, Ken Sandney

Re: Help with MapReduce

2010-04-19 Thread Joost Ouwerkerk
And when retrieving only one supercolumn? Can I further specify which subcolumns to retrieve? On Mon, Apr 19, 2010 at 9:29 PM, Jonathan Ellis jbel...@gmail.com wrote: the latter, if you are retrieving multiple supercolumns. On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Brandon Williams
On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang zson...@gmail.com wrote: 2. Reject the request when short of resources, instead of throwing an OOME and exiting (crashing). Right, that is the crux of the problem. It will be addressed here: https://issues.apache.org/jira/browse/CASSANDRA-685 -Brandon

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Ken Sandney
I am just running Cassandra on normal boxes, and I think granting Cassandra 1GB of the total 2GB is reasonable. Can this problem be resolved by tuning the thresholds described on this page http://wiki.apache.org/cassandra/MemtableThresholds , or just by waiting for the 0.7 release as Brandon

Re: busy thread on IncomingStreamReader ?

2010-04-19 Thread Ingram Chen
Ouch! I spoke too soon! We still suffer the same problems after upgrading to 1.6.0_20. In JMX StreamingService, I see several weird incoming/outgoing transfers: On Host A, 192.168.2.87 StreamingService Status: Done with transfer to /192.168.2.88 StreamingService StreamSources: [/192.168.2.88]

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Jonathan Ellis
Ken, I linked you to the FAQ answering your problem in the first reply you got. Please don't hijack my replies to other people; that's rude. On Mon, Apr 19, 2010 at 9:32 PM, Ken Sandney bluefl...@gmail.com wrote: I am just running Cassandra on normal boxes, and grants 1GB of total 2GB to

Re: Clarification on Ring operations in Cassandra 0.5.1

2010-04-19 Thread Schubert Zhang
You can have a look at org.apache.cassandra.service.StorageService public void initServer() throws IOException 1. If AutoBootstrap=false, it means the node is already bootstrapped (not a new node). Usually, the first new node is set to false. (1) check the system table to find the saved token, if found

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Ken Sandney
Sorry I just don't know how to resolve this :) On Tue, Apr 20, 2010 at 10:37 AM, Jonathan Ellis jbel...@gmail.com wrote: Ken, I linked you to the FAQ answering your problem in the first reply you got. Please don't hijack my replies to other people; that's rude. On Mon, Apr 19, 2010 at 9:32

Re: busy thread on IncomingStreamReader ?

2010-04-19 Thread Jonathan Ellis
I don't see csArena-tmp-6-Index.db in the incoming files list. If it's not there, that means that it did break out of that while loop. Did you check both logs for exceptions? On Mon, Apr 19, 2010 at 9:36 PM, Ingram Chen ingramc...@gmail.com wrote: Ouch ! I talk too early ! We still suffer

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-19 Thread Schubert Zhang
Jonathan, Thanks. Yes, the scale of the GC graph is different from the throughput one. I will do more checking and tuning in our next test immediately. On Tue, Apr 20, 2010 at 10:39 AM, Ken Sandney bluefl...@gmail.com wrote: Sorry I just don't know how to resolve this :) On Tue, Apr 20, 2010 at

Re: tcp CLOSE_WAIT bug

2010-04-19 Thread Ingram Chen
this happened after several hours of operations and both nodes are started at the same time (clean start without any data). so it might not relate to Bootstrap. In system.log I do not see any logs like xxx node dead or exceptions. and both nodes in test are alive. they serve read/write well, too.

Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Schubert Zhang
Since the scale of the GC graph in the slides is different from the throughput ones, I will do another test for this issue. Thanks for your advice, Masood and Jonathan. --- Here, I just post my cassandra.in.sh. JVM_OPTS= \ -ea \ -Xms128M \ -Xmx6G \

Re: why read operation use so much of memory?

2010-04-19 Thread Brandon Williams
On Mon, Apr 19, 2010 at 10:28 PM, dir dir sikerasa...@gmail.com wrote: Hi Jonathan, I see this page (http://wiki.apache.org/cassandra/SSTableMemtable) does not exist yet. I think he meant: http://wiki.apache.org/cassandra/MemtableSSTable -Brandon

Re: Help with MapReduce

2010-04-19 Thread Joost Ouwerkerk
Ok. This should be ok for now, although not optimal for some jobs. Next issue is node stability during the insert job. The stacktrace below occurred on several nodes while inserting 10 million rows. We're running on 4G machines, 1G of which is allocated to cassandra. What's the best config to

Re: Help with MapReduce

2010-04-19 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts On Tue, Apr 20, 2010 at 12:48 AM, Joost Ouwerkerk jo...@openplaces.org wrote: Ok. This should be ok for now, although not optimal for some jobs. Next issue is node stability during the insert job. The stacktrace below