Re: NPE in apache cassandra
the config format changed. now you need to specify the cfname as an
attribute:

-Jonathan

On Wed, Mar 11, 2009 at 3:52 PM, Jiansheng Huang wrote:
>
> -- Forwarded message --
> From: Jiansheng Huang
> Date: Wed, Mar 11, 2009 at 2:49 PM
> Subject: NPE in apache cassandra
> To: cassandra-user@incubator.apache.org, Avinash Lakshman, Prashant Malik
> Cc: agu...@rocketfuelinc.com
>
> Hi folks, I checked out the new code from apache and compiled it. When I
> start up the server with a clean installation base (i.e., without using
> any system/user data from a previous installation), I got the following:
>
> UNCAUGHT EXCEPTION IN main()
> java.lang.NullPointerException
>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
>     at org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:254)
>     at org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:244)
>     at org.apache.cassandra.db.Table$TableMetadata.apply(Table.java:209)
>     at org.apache.cassandra.db.DBManager.storeMetadata(DBManager.java:150)
>     at org.apache.cassandra.db.DBManager.<init>(DBManager.java:102)
>     at org.apache.cassandra.db.DBManager.instance(DBManager.java:61)
>     at org.apache.cassandra.service.StorageService.start(StorageService.java:465)
>     at org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:110)
>     at org.apache.cassandra.service.CassandraServer.main(CassandraServer.java:1078)
> Disconnected from the target VM, address: '127.0.0.1:45693', transport: 'socket'
>
> I did some debugging and found that in the following code, the first entry
> in cfNames is always null. Is it safe to say that if cfName is null, then
> we don't want to do the writes?
>
> public void serialize(TableMetadata tmetadata, DataOutputStream dos) throws IOException
> {
>     int size = tmetadata.cfIdMap_.size();
>     dos.writeInt(size);
>     Set<String> cfNames = tmetadata.cfIdMap_.keySet();
>
>     for ( String cfName : cfNames )
>     {
>         dos.writeUTF(cfName);
>         dos.writeInt( tmetadata.cfIdMap_.get(cfName).intValue() );
>         dos.writeUTF(tmetadata.getColumnFamilyType(cfName));
>     }
> }
>
> A related question I have is: what's the procedure for us to check in
> code? I have made some changes adding latency counters in the server and
> exposing them through http. It would be good to check in the changes and
> minor fixes so that I don't risk losing them.
>
> Thanks,
>
> Jiansheng
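To make the question concrete, here is a standalone sketch of that serialize loop with a null guard added. This is illustrative only, not the project's actual fix; the class and field names are simplified stand-ins for Cassandra's TableMetadata internals. Note the count written up front must match the entries actually emitted, since the deserializer reads exactly that many.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class MetadataSerializerSketch {
    // Serialize only non-null column family names, writing the real count first.
    static byte[] serialize(Map<String, Integer> cfIdMap) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(buf);
        long valid = cfIdMap.keySet().stream().filter(Objects::nonNull).count();
        dos.writeInt((int) valid);          // count must match entries actually written
        for (Map.Entry<String, Integer> e : cfIdMap.entrySet()) {
            if (e.getKey() == null)
                continue;                   // writeUTF(null) is what throws the NPE
            dos.writeUTF(e.getKey());
            dos.writeInt(e.getValue());
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put(null, 0);                   // the problematic first entry
        map.put("Standard1", 1);
        byte[] out = serialize(map);
        // 4 (count) + 2 (UTF length) + 9 ("Standard1") + 4 (id) = 19 bytes
        System.out.println(out.length);     // 19
    }
}
```

Skipping null entries silences the NPE, but the deeper question is why a null name got into cfIdMap_ in the first place (here, the changed config format).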
Re: NPE in apache cassandra
Also, it will not work AT ALL with data from the old version. You need to
start fresh.

-Jonathan

On Wed, Mar 11, 2009 at 4:00 PM, Jonathan Ellis wrote:
> the config format changed. now you need to specify the cfname as an
> attribute:
Re: OPHF vs. Random
Use Random for now. The OPHF is the same as the old one, i.e., not
actually OP. :)

I'm pretty convinced at this point that it's impossible to have an
order-preserving hash that doesn't either (a) impose a relatively short
key length past which no partitioning is done (i.e., all keys with the
same prefix go to the same node) or (b) be very sensitive to key length,
such that the keys with a given length N will not be evenly distributed
across all nodes. Or both.

So I am working on migrating from pluggable hash functions (key ->
BigInteger) to pluggable partitioning algorithms (key -> EndPoint).
Without the requirement to transform to a numeric value first, I think I
can create an order-preserving distribution that performs well. (I need
this for range queries.)

So far I have just laid the foundation, here:
https://issues.apache.org/jira/browse/CASSANDRA-3

I hope to finish the rest tomorrow.

-Jonathan

On Wed, Mar 11, 2009 at 5:28 PM, Jiansheng Huang wrote:
>
> Which one is better to use? The default is Random.
>
> In Avinash's announcement mail, we have
> (1) Ability to switch between a random hash and an OPHF. We still have the
> old (wrong) OPHF in there. I will update it to the corrected one tomorrow.
>
> Is the correct OPHF in? Thanks.
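Jonathan's case (a) is easy to demonstrate: any order-preserving map from variable-length keys into a fixed-width number must truncate, so keys sharing a long prefix collapse onto the same token. A toy illustration (not Cassandra code; the 8-byte width is an arbitrary choice):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixTokenDemo {
    static final int PREFIX = 8; // bytes of the key that survive into the token

    // A fixed-width, order-preserving key -> BigInteger mapping is forced
    // to truncate: only PREFIX leading bytes fit in the token.
    static BigInteger token(String key) {
        byte[] b = Arrays.copyOf(key.getBytes(StandardCharsets.US_ASCII), PREFIX);
        return new BigInteger(1, b); // unsigned: byte order equals numeric order
    }

    public static void main(String[] args) {
        // Order is preserved while keys differ inside the prefix...
        System.out.println(token("apple").compareTo(token("banana")) < 0);   // true
        // ...but keys sharing a long prefix all get one token, i.e., they
        // all land on the same node -- case (a) above.
        System.out.println(token("userid.msg001").equals(token("userid.msg999"))); // true
    }
}
```

Widening the token just moves the cliff; it reappears at whatever the new prefix length is, which is why the patch drops the numeric intermediate entirely.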
Re: OPHF vs. Random
The order-preserving partitioner code (not hash-based anymore) is up now
at https://issues.apache.org/jira/browse/CASSANDRA-3.

-Jonathan

On Wed, Mar 11, 2009 at 6:48 PM, Jonathan Ellis wrote:
> Use Random for now. The OPHF is the same as the old one, i.e., not
> actually OP. :)
Re: OPHF vs. Random
I think that key -> endpoint might still be simpler long term, but short
term there is far too much code that depends on being able to compare
both nodes and keys transformed to tokens.

Previously the token was hardcoded to be BigInteger, but I introduced the
abstraction Token<T> defining compareTo(Token), so you can have
Token<BigInteger> as well as Token<String>. The OrderPreservingPartitioner
then uses Token<String> to do lexicographic comparisons.

-Jonathan

On Mon, Mar 16, 2009 at 3:30 PM, Sandeep Tata wrote:
> I like the idea of supporting more general/sophisticated strategies.
>
> Let me see if I understand the issues at play here:
>
> OPHFs are tricky to design, and leaning on one for load-balancing and
> data locality will require incredibly good OPHFs that might not exist.
> (I learned this with a bunch of the experiments we ran on our
> relatively small test cluster.)
>
> RANDOM of course is going to be great for load-balancing, but we're
> completely giving up locality, so range queries are shot.
>
> If we want to support clever placement strategies, we'll need to make
> some changes. Take for instance a key like "userid.messageid". I
> might want:
> a) all the keys with the same userid on the same node, and
> b) all the messageids stored in order so I can do simple range queries
> like "get messages 1 to 100"
>
> OPHF might break a) and RANDOM will break b).
>
> The claim is that the simplest (best :-) ) way to guarantee a) and b)
> is to map the key to an end-point instead of merely an integer.
>
> What if I changed the hash function to do RANDOM on just the "userid"
> part, and each node still stores the keys in "<" order on the entire
> key ("userid.messageid")? Would this solve the problem? What is this
> approach missing?
>
> Do we just need to decouple the hash used for routing from the key
> used in the end-point for storage? Is this essentially what the series
> of patches does?
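A minimal sketch of the Token<T> abstraction Jonathan describes, heavily simplified (the real class hierarchy in the CASSANDRA-3 patch differs): one comparable Token type parameterized on its value, with a String-valued token giving the lexicographic comparisons an order-preserving partitioner needs.

```java
import java.math.BigInteger;

// Simplified sketch: one comparable Token type parameterized on its value.
abstract class Token<T extends Comparable<T>> implements Comparable<Token<T>> {
    final T value;
    Token(T value) { this.value = value; }
    public int compareTo(Token<T> other) { return value.compareTo(other.value); }
}

// Used by a random (hash-based) partitioner: numeric comparisons.
class BigIntegerToken extends Token<BigInteger> {
    BigIntegerToken(BigInteger v) { super(v); }
}

// Used by an order-preserving partitioner: lexicographic comparisons.
class StringToken extends Token<String> {
    StringToken(String v) { super(v); }
}

public class TokenDemo {
    public static void main(String[] args) {
        StringToken a = new StringToken("apple");
        StringToken b = new StringToken("banana");
        System.out.println(a.compareTo(b) < 0); // true: key order is preserved
    }
}
```

The point of the abstraction is that ring code comparing nodes and keys only needs compareTo, so both partitioner styles plug into the same machinery.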
some "getting started" information
Hi all,

There's a bunch of useful material about getting started with Cassandra,
but it's rather scattered. So until we get our wiki going, I wrote a blog
post pulling some of that together:
http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html

HTH,
-Jonathan
cassandra-20
Just a heads up that I committed Eric Evans's patch from #20, which
replaces bin/start-server with bin/cassandra and bestows it with magical
shell kung-fu to background the server cleanly by default. Should work
out of the box on Linux, OS X, and Cygwin.

Use the -f flag to put it in foreground mode (the way it used to be) and
-p to log the process id to a file where it can be used for shutdown.

-Jonathan
Cassandra at OSCON
My proposal to present on Cassandra at OSCON this year was accepted.
OSCON will be July 22 to 24 in San Jose. My talk will be on Thursday:
http://en.oreilly.com/oscon2009/public/schedule/grid/2009-07-23

I covered similar material at my PyCon open space talk last week
(standing room only); it went very well. There is a lot of interest in
scalable systems, and Cassandra is one of the very few of these that can
handle structured data. This will help get Cassandra a lot more
visibility as it reboots as an ASF project.

(I'm also giving a talk on Friday, "What every developer should know
about database scalability." I'm mostly going to focus on relational
databases there, but non-relational options will be mentioned, including
Cassandra.)

-Jonathan
Re: Cassandra at OSCON
It looks like last year they took a bunch of videos, but only for
selected talks as far as I can tell: http://oscon.blip.tv/

So I don't know. :)

-Jonathan

On Thu, Apr 2, 2009 at 2:53 PM, Johan Oskarsson wrote:
>
> Congrats!
> Will this talk be recorded for those of us who can't make it?
>
> /Johan
Re: Sample Client Code
That looks reasonable. How are you reading the data back out? The web
interface only hits the local machine, so it is not very useful in a
clustered situation.

-Jonathan

On Thu, Apr 9, 2009 at 4:02 PM, Sam D wrote:
> Hi,
>
> I am new to Cassandra, just installed the latest version on my machine.
> I am able to insert rows using the web interface (@7002), but I am not
> able to get a Java client to insert rows into a table. Below is the
> piece of code I am using. The insert call goes through fine without any
> exceptions, but I am not able to see the row in the table, so I assume
> it's not being inserted properly.
>
> socket = new TSocket(machine, port);
> TProtocol tp = new TBinaryProtocol(socket);
> cl = new Cassandra.Client(tp);
> socket.open();
> cl.insert("xmls", "x1", "content:xml", "xyz", 0);
>
> Can you please point me to any sample code available which I can refer to?
>
> Thanks
> Sam.
Re: Sample Client Code
Is content a supercolumn? Otherwise specifying a subcolumn isn't going
to work.

Did you check your log file for exceptions?

On Thu, Apr 9, 2009 at 4:19 PM, Sam D wrote:
> Thanks for the quick response,
>
> I have only one node, so the web client also should see the data, right?
> Below is the code which I am using to read.
>
> socket = new TSocket(machine, port);
> TProtocol tp = new TBinaryProtocol(socket);
> cl = new Cassandra.Client(tp);
> socket.open();
> column_t u1 = cl.get_column("xmls", "x1", "content:xml");
> System.out.println("xml : " + u1.value);
>
> Sam.
Re: Sample Client Code
So content:xml is your ColumnFamily:column tuple. That looks right.

That exception is from the client side, right? That looks to me like it
can't connect to the server.

Your connection code looks okay... port should be the thrift port, 9160
if you haven't changed it.

On Thu, Apr 9, 2009 at 4:31 PM, Sam D wrote:
> No, it's not a supercolumn. How do I retrieve it if it's not a
> supercolumn?
>
> I didn't notice it earlier, but yes, I am seeing the following exception
> in the log:
>
> Exception in thread "main"
> com.facebook.thrift.transport.TTransportException: Cannot write to null
> outputStream
>     at com.facebook.thrift.transport.TIOStreamTransport.write(Unknown Source)
>     at com.facebook.thrift.protocol.TBinaryProtocol.writeI32(Unknown Source)
>
> Thanks
Re: Sample Client Code
For now you'll have to encode it somehow. We have a ticket
(https://issues.apache.org/jira/browse/CASSANDRA-29) to switch to binary
data as column values, and that's high on my list to get done.

-Jonathan

On Thu, Apr 9, 2009 at 7:40 PM, Sam D wrote:
> Thanks Jonathan, the issue was due to some connectivity issues. It's
> working fine now.
>
> I had one more question.
>
> Can we insert byte arrays as values for the columns? I am trying to
> store JPEG images.
>
> Thanks
change to client API
All column values that were declared `string` in thrift are now `binary`.
(See https://issues.apache.org/jira/browse/CASSANDRA-29.)

For Java that means byte[] instead of String.

For Python, because thrift treatment of `string` is broken, that
actually means no change -- values were str before and remain str.

I don't know the details of the other thrift generators, but it probably
follows one of those two patterns. :)

-Jonathan
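For Java clients the practical change is that text values need an explicit encode/decode step, while raw payloads (like the JPEG bytes asked about earlier) now pass through untouched. A minimal illustration; the helper names here are mine, not part of the Thrift binding:

```java
import java.nio.charset.StandardCharsets;

public class BinaryValueDemo {
    // With `binary` column values, the client owns the text encoding.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = encode("xyz");        // what a client would now pass as the value
        System.out.println(value.length);    // 3
        System.out.println(decode(value));   // xyz
        // A JPEG or other blob needs no conversion at all: it is already byte[].
    }
}
```

Pinning the charset (UTF-8 here) matters: relying on the platform default encoding makes values unreadable across machines with different defaults.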
Re: Questions around API changes
On Fri, May 1, 2009 at 5:59 AM, Jonas Bonér wrote:
> Hi there.
>
> First, should I use this ML or the google forum?

This one.

> * What does the new timestamp arg in
>     public boolean remove(String tablename, String key, String
>     columnFamily_column, long timestamp, boolean block)
> specify?

It's compared against the timestamp in insert, to make sure remove
doesn't get applied to newer data than it was intended to.

> * Any reason for making ctor in CassandraServer protected? I am
> embedding Cassandra and now I have to use reflection to create the
> instance. No big deal, just checking why?

No particular reason I know of. We can make that public.

> * I get this exception when invoking batch_update (in the previous
> release, haven't tried with the latest trunk yet):

Yeah, that's a long-standing bug. I have a patch to fix it here
(https://issues.apache.org/jira/browse/CASSANDRA-120) that is waiting
for review.

-Jonathan
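The timestamp-guarded remove semantics can be sketched with a toy single-node column store. This is illustrative only -- Cassandra's real delete path uses tombstones and is considerably more involved -- but it shows why the argument exists: a stale remove must not clobber a newer insert.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampRemoveDemo {
    static class Column {
        final String value;
        final long timestamp;
        Column(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    final Map<String, Column> store = new HashMap<>();

    void insert(String key, String value, long ts) {
        Column existing = store.get(key);
        if (existing == null || ts >= existing.timestamp)
            store.put(key, new Column(value, ts));
    }

    // A remove only applies to data at or before its timestamp; newer
    // inserts survive, so a delayed or replayed delete cannot erase
    // fresher data than it was intended for.
    void remove(String key, long ts) {
        Column existing = store.get(key);
        if (existing != null && existing.timestamp <= ts)
            store.remove(key);
    }

    public static void main(String[] args) {
        TimestampRemoveDemo db = new TimestampRemoveDemo();
        db.insert("k", "new-value", 200);
        db.remove("k", 100);                           // stale remove: ignored
        System.out.println(db.store.containsKey("k")); // true
        db.remove("k", 300);                           // newer remove: applied
        System.out.println(db.store.containsKey("k")); // false
    }
}
```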
Re: Questions around API changes
On Fri, May 1, 2009 at 11:19 AM, Jonas Bonér wrote:
> Thanks for the answers.
>
> Btw, is the CQL in usable state?

No idea. Probably not. :)

> If not, any plans?

The third cassandra committer from FB who mostly remains silent (forget
his name atm) is supposedly planning to work on it more.

> What about the CLI interface?

That is working. In fact, Eric just wrote a new wrapper script for it
and a README:
https://svn.apache.org/repos/asf/incubator/cassandra/trunk/README.txt

-Jonathan
Re: Some questions.
On Sat, May 2, 2009 at 6:22 AM, Manuel Crotti wrote:
> Now I have some questions:
> 1. should each "storage-conf.xml" contain just one of the above
> ip-addresses (obviously not the localhost's IP address) in the seed
> section to let cassandra learn the whole topology? Or must it contain
> the whole list?

Just pick one of the public IPs to be the seed.

> 2. how can I see if the nodes of the cluster are "talking" (some
> logfile, ...)? (I expected to find it in the localhost:7002 interface,
> but I see just one host -- localhost -- so I suppose the hosts are not
> "talking")

If it is working, each :7002 will show all the nodes.

> 3. What should differ between the "storage-conf.xml" files of each node?
> 3.1. should the "storage-conf.xml" of each node contain the table
> structure to replicate/propagate the information of the data of a table?

Right. The only thing that should be different is the ListenAddress
section. (You can try leaving that out and Cassandra will pick an
interface to use, but it often guesses a non-public interface, which is
not helpful. :)

> 3.2 finally: should I start a cluster with an empty DB, or can I
> replicate an existing DB?

You can start from an existing one if it's really legitimate for all
nodes to have copies of that data, but it probably is not.

> I also submit a couple of errors that were raised using the
> command-line client:

Okay, so the problems are that (1) it thinks it is connected when it is
not, and (2) it allows you to run commands when it is not connected.
Right? Can you file those in the issue tracker?
https://issues.apache.org/jira/browse/CASSANDRA

thanks,
-Jonathan
last api change for 0.3
I committed the patch for CASSANDRA-131, which (a) enables exception
throwing on the insert methods (so you don't have to explicitly check
the return value to see if something worked), and (b) moves the
_blocking methods into the nonblocking ones as a flag. So instead of
insert_blocking, use insert with block=True.

The block flags default to false, so your nonblocking calls will work as
before. (Assuming you are using a thrift binding that actually generates
default values correctly. I haven't seen one yet, but I assume they're
out there. :)

-Jonathan
Re: Non relational db meetup - San Francisco, June 11th
That's true, but 100 people is about the largest space you're going to
find for free, so past that you'd have to start charging people and
worrying about taxes and such. Messy. Maybe next year... :)

-Jonathan

On Tue, May 12, 2009 at 2:02 PM, Jonas Bonér wrote:
> Great initiative.
> Just sad that it is not the week before (during JavaOne). Then I think
> a lot of people (including me) could go.
>
> 2009/5/12 Johan Oskarsson:
>> Cassandra will be represented by Avinash Lakshman at a free full-day
>> meetup covering "open source, distributed, non relational databases"
>> on June 11th in San Francisco.
>>
>> The idea is that the event will give people interested in this area a
>> great introduction and an easy way to compare the different projects
>> out there, as well as the opportunity to discuss them with the
>> developers.
>>
>> Registration
>> The event is free but space is limited; please register if you wish
>> to attend: http://nosql.eventbrite.com/
>>
>> Preliminary schedule, 2009-06-11
>> 09.45: Doors open
>> 10.00: Intro session (Todd Lipcon, Cloudera)
>> 10.40: Voldemort (Jay Kreps, Linkedin)
>> 11.20: Short break
>> 11.30: Cassandra (Avinash Lakshman, Facebook)
>> 12.10: Free lunch (sponsored by CBSi)
>> 13.10: Dynomite (Cliff Moon, Powerset)
>> 13.50: HBase (Ryan Rawson, Stumbleupon)
>> 14.30: Short break
>> 14.40: Hypertable (Doug Judd, Zvents)
>> 15.20: Panel discussion
>> 16.00: End of meetup, relocate to a pub called Kate O'Brien's nearby
>>
>> Location
>> Magma room, CBS interactive
>> 235 Second Street
>> San Francisco, CA 94105
>>
>> Sponsor
>> A big thanks to CBSi for providing the venue and free lunch.
>>
>> /Johan Oskarsson, developer @ last.fm
>
> --
> Jonas Bonér
>
> twitter: @jboner
> blog: http://jonasboner.com
> work: http://crisp.se
> work: http://scalablesolutions.se
> code: http://github.com/jboner
Cassandra 0.3 RC is out
Short version: http://incubator.apache.org/cassandra/cassandra-0.3.0-rc.tgz
Long version: http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html

Release Candidate means "we fixed all the bugs we could find; help us
find more so the release is even more solid." :)

I've created a 0.3 branch for bugfixes; trunk will now be for 0.4
development. Now that the RC is out, I'll start to look at the patches
I've been postponing; thanks for your patience, Jun and Sandeep.

-Jonathan
Re: Cassandra 0.3 RC is out
Oops, fat-fingered the url:
http://incubator.apache.org/cassandra/releases/cassandra-0.3-rc.tgz

:)
Re: Cassandra 0.3 RC is out
I've been asked to change the download url to
http://people.apache.org/%7Ejbellis/cassandra/cassandra-0.3-rc.tgz
to avoid incorrectly implying that this is An Official Release, which it
is not.

-Jonathan
Re: Cassandra 0.3 RC is out
Thanks! And it is probably worth repeating that although I am the only
active committer at the moment, this represents the work of many people,
especially (alphabetically :) Eric Evans, Johan Oskarsson, Jun Rao, and
Sandeep Tata -- hopefully we will get more committers from this group
soon. Lots of others also contributed patches, bug reports, and testing.

-Jonathan

On May 14, 2009, at 8:34 AM, Jonas Bonér wrote:
> Awesome job Jonathan.
> Just getting into the codebase so fast is admirable.
> Churning out code like this (and releases) is amazing. Keep it up.
Re: Node Recovery
That's the price you pay for (a) eventual consistency in general and (b) doing read repair in the background specifically. Cassandra also has functionality (called "strong read") to do a quorum read in the foreground and repair if necessary but that is not exposed in Thrift yet -- but even with that there are scenarios where you could get back "no data" for a write that has been acked. The only way to avoid it entirely is to require acking all writes from all replicas and checking all replicas on all reads, which (in a large cluster) is going to hurt from the availability standpoint. Most apps are ok trading off some consistency for availability. -Jonathan On Mon, May 18, 2009 at 12:24 PM, Chris Goffinet wrote: > Scenario: if i setup a 2 node cluster, with replicationfactor of 2. Inserted > a new key (1) into a table. Its replicated to both nodes. I shutdown node > (2), delete all data, then bring it back up. I noticed that if i make a > request to that node the first time for that key, it will return back an > empty result (was using get_slice), then that node will pull the data from > other node. On next request to that node its there. How does one really know > if the data isn't there (should I retry) vs it was never there to begin > with? > > --- > Chris Goffinet > goffi...@digg.com > > > > > >
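To make the tradeoff above concrete, here is a toy Python sketch of background read repair versus a foreground quorum ("strong") read. This is purely illustrative (replicas modeled as dicts mapping key to a (value, timestamp) pair), not Cassandra's actual code:

```python
def repair(replicas, key):
    """Copy the newest (value, timestamp) for key to every replica."""
    values = [r[key] for r in replicas if key in r]
    if not values:
        return
    newest = max(values, key=lambda v: v[1])
    for r in replicas:
        r[key] = newest

def weak_read(replicas, key):
    """Answer from one replica, then repair in the background.
    The first read after a node rejoins empty can return None
    even though other replicas have the data."""
    answer = replicas[0].get(key)
    repair(replicas, key)  # read repair runs after the answer is chosen
    return answer

def strong_read(replicas, key):
    """Quorum read: consult a majority and return the newest value seen."""
    quorum = len(replicas) // 2 + 1
    seen = [r.get(key) for r in replicas[:quorum]]
    hits = [v for v in seen if v is not None]
    repair(replicas, key)
    return max(hits, key=lambda v: v[1]) if hits else None

# The scenario from this thread: node2 wiped its data and rejoined.
node1 = {'key1': ('value1', 100)}
node2 = {}
print(weak_read([node2, node1], 'key1'))   # None -- stale, but repair ran
print(weak_read([node2, node1], 'key1'))   # ('value1', 100) on the retry
```

As the reply notes, even a real quorum read can miss an acked write in some scenarios; only reading and ack'ing against all replicas avoids that entirely, at a cost to availability.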
Re: multi-table
Different apps will have different performance characteristics (and different key domains, which can also be important). So there are operational reasons to prefer cluster-per-app. That said, multi table support is high on my priority list. The changes required are straightforward so I'd love to help someone dive in as opposed to just doing it myself. :) -Jonathan On May 18, 2009, at 7:56 PM, Chris Goffinet wrote: Has anyone here needed multi-table support yet in Cassandra? Anyone willing to share use cases where you felt maybe you didn't need multi-table support? Seems just a bit odd it isn't there yet :) --- Chris Goffinet goffi...@digg.com
schema example
Does anyone have a simple app schema they can share? I can't share the one for our main app. But we do need an example here. A real one would be nice if we can find one. I checked App Engine. They don't have a whole lot of examples either. They do have a really simple one: http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html The most important thing in Cassandra modeling is choosing a good key, since that is what most of your lookups will be by. Keys are also how Cassandra scales -- Cassandra can handle effectively infinite keys (given enough nodes obviously) but only thousands to millions of columns per key/CF (depending on what API calls you use -- Jun is adding one now that does not deserialize everything in the whole CF into memory. The rest will need to follow this model eventually too). For this guestbook I think the choice is obvious: use the name as the key, and have a single simple CF for the messages. Each column will be a message (you can even use the mandatory timestamp field as part of your user-visible data. win!). You get the list (or page) of users with get_key_range and then their messages with get_slice. Anyone got another one for pedagogical purposes? -Jonathan
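A pure-Python sketch of the guestbook layout just described: key = the guestbook owner's name, one simple CF whose columns are the individual messages. The CF name 'Messages' is invented for the example, and this models the data layout only, not the Thrift API:

```python
# key -> {cf_name -> {column_name: (value, timestamp)}}
guestbook = {}

def leave_message(owner, msg_id, text, timestamp):
    # each column is one message; the timestamp is user-visible data too
    cf = guestbook.setdefault(owner, {}).setdefault('Messages', {})
    cf[msg_id] = (text, timestamp)

def get_key_range():
    """Analogue of get_key_range: the list of guestbook owners."""
    return sorted(guestbook)

def get_slice(owner, count=10):
    """Analogue of get_slice on a time-sorted CF: newest messages first."""
    cf = guestbook.get(owner, {}).get('Messages', {})
    return sorted(cf.items(), key=lambda kv: kv[1][1], reverse=True)[:count]

leave_message('alice', 'msg1', 'hi alice!', 100)
leave_message('alice', 'msg2', 'great site', 200)
leave_message('bob', 'msg1', 'hello bob', 150)
print(get_key_range())     # ['alice', 'bob']
print(get_slice('alice'))  # [('msg2', ...), ('msg1', ...)] -- newest first
```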
Re: schema example
Mail storage, man, I think pretty much anything I could come up with would look pretty simplistic compared to what "real" systems do in that domain. :) But blogs, I think I can handle those. Let's make ours multiuser, or there isn't enough scale to make it interesting. :) The interesting thing here is we want to be able to query two things efficiently: - the most recent posts belonging to a given blog, in reverse chronological order - a single post and its comments, in chronological order At first glance you might think we can again reasonably do this with a single CF, this time a super CF: The key is the blog name, the supercolumns are posts and the subcolumns are comments. This would be reasonable BUT supercolumns are just containers; they have no data or timestamp associated with them directly (only through their subcolumns). So you cannot sort a super CF by time. So instead what I would do would be to use two CFs: For the first, the keys used would be blog names, and the columns would be the post titles and body. So to get a list of most recent posts you just do a slice query. Even though Cassandra currently handles large groups of columns sub-optimally, even with a blog updated several times a day you'd be safe taking this approach (i.e. we'll have that problem fixed before you start seeing it :). For the second, the keys are blog names. The columns are the comment data. You can serialize these a number of ways; I would probably use title as the column name and have the value be the author + body (e.g. as a json dict). Again we use the slice call to get the comments in order. (We will have to manually reverse what slice gives us since time sort is always reverse chronological atm, but the overhead of doing this in memory will be negligible.) Does this help? -Jonathan On Tue, May 19, 2009 at 11:49 AM, Evan Weaver wrote: > Even if it's not actually in real-life use, some examples for common > domains would really help clarify things. 
> > * blog > * email storage > * search index > > etc. > > Evan > > On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis wrote: >> Does anyone have a simple app schema they can share? >> >> I can't share the one for our main app. But we do need an example >> here. A real one would be nice if we can find one. >> >> I checked App Engine. They don't have a whole lot of examples either. >> They do have a really simple one: >> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html >> >> The most important thing in Cassandra modeling is choosing a good key, >> since that is what most of your lookups will be by. Keys are also how >> Cassandra scales -- Cassandra can handle effectively infinite keys >> (given enough nodes obviously) but only thousands to millions of >> columns per key/CF (depending on what API calls you use -- Jun is >> adding one now that does not deseriailze everything in the whole CF >> into memory. The rest will need to follow this model eventually too). >> >> For this guestbook I think the choice is obvious: use the name as the >> key, and have a single simple CF for the messages. Each column will >> be a message (you can even use the mandatory timestamp field as part >> of your user-visible data. win!). You get the list (or page) of >> users with get_key_range and then their messages with get_slice. >> >> >> >> Anyone got another one for pedagogical purposes? >> >> -Jonathan >> > > > > -- > Evan Weaver >
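The two-CF blog design above, sketched with plain Python dicts. The CF names are invented for the example, the model follows the description (one comment column per post title, which is a simplification; a real schema would need unique column names per comment), and this illustrates the layout only, not the Thrift API:

```python
import json

# Two CFs as described above (names invented for the sketch):
#   posts:    key = blog name, time-sorted columns = title -> (body, ts)
#   comments: key = blog name, columns = title -> (json author+body, ts)
posts = {}
comments = {}

def add_post(blog, title, body, ts):
    posts.setdefault(blog, {})[title] = (body, ts)

def add_comment(blog, title, author, body, ts):
    # serialize author + body as a json dict, per the suggestion above
    value = json.dumps({'author': author, 'body': body})
    comments.setdefault(blog, {})[title] = (value, ts)

def recent_posts(blog, count=10):
    """Slice of the time-sorted posts CF: reverse chronological."""
    cols = posts.get(blog, {}).items()
    return sorted(cols, key=lambda kv: kv[1][1], reverse=True)[:count]

def post_comments(blog):
    """Slice gives reverse-chronological order; re-reverse in memory
    (negligible overhead) to get chronological order."""
    cols = sorted(comments.get(blog, {}).items(),
                  key=lambda kv: kv[1][1], reverse=True)
    return list(reversed(cols))

add_post('myblog', 'first-post', 'hello world', 100)
add_post('myblog', 'second-post', 'more news', 200)
add_comment('myblog', 'first-post', 'reader1', 'nice!', 150)
print([t for t, _ in recent_posts('myblog')])  # ['second-post', 'first-post']
```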
Re: Ingesting from Hadoop to Cassandra
Have you benchmarked the batch insert apis? If that is "fast enough" then it's by far the simplest way to go. Otherwise you'll have to use the binarymemtable stuff which is undocumented and not exposed as a client api (you basically write a custom "loader" version of cassandra to use it, I think). FB used this for their own bulk loading so it works at some level, but clearly there is some assembly required. -Jonathan On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares wrote: > Hi all, > > I'm trying to find the most optimal way to ingest my content from Hadoop to > Cassandra. Assuming I have figured out the table representation for this > content, what is the best way to do go about pushing from my cluster? What > Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm > sure this is a common pattern, I'm curious to see how it has been > implemented. Assume millions of of rows and 1000s of columns. > > Thanks in advance, > -Alex > >
Re: Ingesting from Hadoop to Cassandra
No, batch APIs are per CF, not per row. Several people have asked Avinash for sample code using BinaryMemtable but to my knowledge nothing ever came of that. The high level description of the BMT is that you give it serialized CFs as values instead of raw columns so it can just sort on key and write directly to disk. So then you would do something like this: Table table = Table.open(mytablename); ColumnFamilyStore store = table.getColumnFamilyStore(mycfname); for cf : mydata store.applyBinary(cf.key, toByteArray(cf)) There's no provision for doing this over the network that I know of, you have to put the right keys on the right nodes manually. -Jonathan On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares wrote: > Jonathan, > > Thanks for your thoughts. > > I've done some simple benchmarks with the batch insert apis and was looking > for something slightly more performant. Is there a batch row insert that I > missed? > > Any pointers (at all) to anything related to FB's bulk loading or the > binarymemtable? I've attempted to do this by writing a custom IVerbHandler > for ingestion and interfacing with the MessagingService internally but it's > not that clean. > > Thanks again, > -Alex > > > From: Jonathan Ellis > To: cassandra-user@incubator.apache.org > Sent: Thursday, May 21, 2009 7:44:59 AM > Subject: Re: Ingesting from Hadoop to Cassandra > > Have you benchmarked the batch insert apis? If that is "fast enough" > then it's by far the simplest way to go. > > Otherwise you'll have to use the binarymemtable stuff which is > undocumented and not exposed as a client api (you basically write a > custom "loader" version of cassandra to use it, I think). FB used > this for their own bulk loading so it works at some level, but clearly > there is some assembly required. > > -Jonathan > > On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares > wrote: >> Hi all, >> >> I'm trying to find the most optimal way to ingest my content from Hadoop >> to >> Cassandra. 
Assuming I have figured out the table representation for this >> content, what is the best way to do go about pushing from my cluster? >> What >> Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm >> sure this is a common pattern, I'm curious to see how it has been >> implemented. Assume millions of of rows and 1000s of columns. >> >> Thanks in advance, >> -Alex >> >> > >
Re: Ingesting from Hadoop to Cassandra
waiting on <0x92a26e30> (a java.lang.ref.Reference$Lock) > at java.lang.Object.wait(Object.java:485) > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) > - locked <0x92a26e30> (a java.lang.ref.Reference$Lock) > > "main" prio=10 tid=0x0805a800 nid=0x4c47 runnable [0xb7fea000..0xb7feb288] >java.lang.Thread.State: RUNNABLE > at java.net.SocketOutputStream.socketWrite0(Native Method) > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) > - locked <0x92ac9578> (a java.io.BufferedOutputStream) > at > org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:139) > at > org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:184) > at org.apache.cassandra.service.column_t.write(column_t.java:321) > at > org.apache.cassandra.service.superColumn_t.write(superColumn_t.java:291) > at > org.apache.cassandra.service.batch_mutation_super_t.write(batch_mutation_super_t.java:365) > at > org.apache.cassandra.service.Cassandra$batch_insert_superColumn_args.write(Cassandra.java:9776) > at > org.apache.cassandra.service.Cassandra$Client.send_batch_insert_superColumn(Cassandra.java:546) > at > com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.pushDocuments(CassandraImport.java:168) > at > com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.sendOut(CassandraImport.java:146) > at > com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:127) > at > com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:1) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > > ] > > It looks like the client is waiting on a response from Cassandra but never > gets it. Any ideas? 
I had seen similar behavior in the Cassandra code prior > to the 0.3 release candidate, b/c of a race condition in SelectorManager. > It looks like this was taken care of in 0.3-rc, so I'm not sure what's going > on here. > > Thanks, > -Alex > > > From: Jonathan Ellis > To: cassandra-user@incubator.apache.org > Sent: Thursday, May 21, 2009 9:42:29 AM > Subject: Re: Ingesting from Hadoop to Cassandra > > No, batch APIs are per CF, not per row. > > Several people have asked Avinash for sample code using BinaryMemtable > but to my knowledge nothing ever came of that. > > The high level description of the BMT is that you give it serialized > CFs as values instead of raw columns so it can just sort on key and > write directly to disk. So then you would do something like this: > > Table table = Table.open(mytablename); > ColumnFamilyStore store = table.getColumnFamilyStore(mycfname); > for cf : mydata > store.applyBinary(cf.key, toByteArray(cf)) > > There's no provision for doing this over the network that I know of, > you have to put the right keys on the right nodes manually. > > -Jonathan > > On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares > wrote: >> Jonathan, >> >> Thanks for your thoughts. >> >> I've done some simple benchmarks with the batch insert apis and was >> looking >> for something slightly more performant. Is there a batch row insert that >> I >> missed? >> >> Any pointers (at all) to anything related to FB's bulk loading or the >> binarymemtable? I've attempted to do this by writing a custom >> IVerbHandler >> for ingestion and interfacing with the MessagingService internally but >> it's >> not that clean. >> >> Thanks again, >> -Alex >> >> >> From: Jonathan Ellis >> To: cassandra-user@incubator.apache.org >> Sent: Thursday, May 21, 2009 7:44:59 AM >> Subject: Re: Ingesting from Hadoop to Cassandra >> >> Have you benchmarked the batch insert apis? If that is "fast enough" >> then it's by far the simplest way to go. 
>> >> Otherwise you'll have to use the binarymemtable stuff which is >> undocumented and not exposed as a client api (you basically write a >> custom "loader" version of cassandra to use it, I think). FB used >> this for their own bulk loading so it works at some level, but clearly >> there is some assembly required. >> >> -Jonathan >> >> On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares >> wrote: >>> Hi all, >>> >>> I'm trying to find the most optimal way to ingest my content from Hadoop >>> to >>> Cassandra. Assuming I have figured out the table representation for this >>> content, what is the best way to do go about pushing from my cluster? >>> What >>> Cassandra client batch APIs do you suggest I use to push to Cassandra? >>> I'm >>> sure this is a common pattern, I'm curious to see how it has been >>> implemented. Assume millions of of rows and 1000s of columns. >>> >>> Thanks in advance, >>> -Alex >>> >>> >> >> > >
Re: Ingesting from Hadoop to Cassandra
On Wed, May 27, 2009 at 6:39 PM, Alexandre Linares wrote: > So it actually doesn't look blocked, but it's crawling. Of course, in > Hadoop, it always timed out (10 mins), before I could tell that it was > crawling (I think) So, back to the original hypothesis: you need to increase the memory you are giving to the JVM, (in bin/cassandra.in.sh) or increase the flush frequency (by lowering the memtable object count threshold). > Can you reproduce with a non-hadoop client program that you can share here? BTW, I meant share the client code, not a client thread dump. And please use attachments for thread dumps or source files; it's really impossible to read this thread on my phone with everything jammed into the body. :) -Jonathan
Re: Ingesting from Hadoop to Cassandra
I can't reproduce with this, there is too much unspecified. (What is a Document? How do I get one?) Attached is a short program that successfully does 100k supercolumn inserts against a default configuration. Can you create a program like this for me to run? (Java is fine; Python is just more concise.) -Jonathan On Thu, May 28, 2009 at 11:03 AM, Alexandre Linares wrote: > Jonathan, sorry for the lengthy emails! Hope this one's more readable. > > So I'm fairly convinced it's not a Cassandra-side configuration problem; at > least not one that entails tweaking the object count threshold or the > memtable size. > > Given the client code at http://pastie.org/492753 :
>
> from thrift.transport import TTransport
> from thrift.transport import TSocket
> from thrift.transport import THttpClient
> from thrift.protocol import TBinaryProtocol
> from cassandra import Cassandra
> from cassandra.ttypes import batch_mutation_t, batch_mutation_super_t, superColumn_t, column_t, NotFoundException, InvalidRequestException
>
> socket = TSocket.TSocket('localhost', 9160)
> transport = TTransport.TBufferedTransport(socket)
> protocol = TBinaryProtocol.TBinaryProtocol(transport)
> client = Cassandra.Client(protocol)
> transport.open()
>
> for i in xrange(10):
>     doc_id = str(i)
>     columns = [column_t('header', 'x'*1024, 0)]
>     cfmap = {'Super1': [superColumn_t(doc_id, columns)]}
>     client.batch_insert_superColumn(batch_mutation_t('Table1', doc_id, cfmap), True)
>     print i
Re: cassandra's performance?
We're basically in a roll-your-own benchmark state. Johan can probably give some pointers: http://blog.oskarsson.nu/2009/05/vpork.html. Also see the "how fast is it" section here: http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html -Jonathan On Wed, Jun 3, 2009 at 3:06 AM, lichun li wrote: > I can't find cassandra's performance data, such as throughput. Does > anyone know where to find these data? > > -- > Sincerely yours, > > Lichun Li > Mobile Life New Media Lab, BUPT >
Re: cassandra's performance?
Cassandra is not designed to work memory-only. It's designed to use disk for durability and to accommodate large sets of data, letting the OS use memory as a huge cache for that. For typical data use patterns (where a relatively small amount is "hot") this will be a much better use of hardware than memory-only. On Wed, Jun 3, 2009 at 7:44 PM, lichun li wrote: > Thank you! > The "how fast is it" section says:"In a nutshell, Cassandra is much > faster than relational databases, and much slower than memory-only > systems or systems that don't sync each update to disk." > Can Cassandra work in a memory-only mode? Can it be done by just > changing configuration? > > On Wed, Jun 3, 2009 at 10:38 PM, Jonathan Ellis wrote: >> We're basically in a roll-your-own benchmark state. Johan can >> probably give some pointers: >> http://blog.oskarsson.nu/2009/05/vpork.html. Also see the "how fast >> is it" section here: >> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html >> >> -Jonathan >> >> On Wed, Jun 3, 2009 at 3:06 AM, lichun li wrote: >>> I can't find cassandra's performance data, such as throughput. Does >>> anyone know where to find these data? >>> >>> -- >>> Sincerely yours, >>> >>> Lichun Li >>> Mobile Life New Media Lab, BUPT >>> >> > > > > -- > Sincerely yours, > > Lichun Li > Mobile Life New Media Lab, BUPT >
Re: cassandra's performance?
You'd still be hitting the transaction log, though. (I assume the logging you were talking about was the log4j kind, because you can't turn off the xlog without hacking at the code right now.) -Jonathan On Wed, Jun 3, 2009 at 8:11 PM, Sandeep Tata wrote: > Apart from logging, given enough memory, you could get Cassandra to > behave almost like an in-memory system. > > Turning off logging is relatively straightforward. > If you turn off periodic flushing of memtables and have the thresholds > high enough (a little more tricky), you're done -- chances are the > read path and the write path will never hit disk. > > > On Wed, Jun 3, 2009 at 5:48 PM, Jonathan Ellis wrote: >> Cassandra is not designed to work memory-only. It's designed designed >> to use disk for durability and to accommodate using large sets of >> data, letting the OS use memory as a huge cache for that. For typical >> data use patterns (where a relatively small amount is "hot") this will >> be a much better use of hardware than memory-only. >> >> On Wed, Jun 3, 2009 at 7:44 PM, lichun li wrote: >>> Thank you! >>> The "how fast is it" section says:"In a nutshell, Cassandra is much >>> faster than relational databases, and much slower than memory-only >>> systems or systems that don't sync each update to disk." >>> Can Cassandra work in a memory-only mode? Can it be done by just >>> changing configuration? >>> >>> On Wed, Jun 3, 2009 at 10:38 PM, Jonathan Ellis wrote: >>>> We're basically in a roll-your-own benchmark state. Johan can >>>> probably give some pointers: >>>> http://blog.oskarsson.nu/2009/05/vpork.html. Also see the "how fast >>>> is it" section here: >>>> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html >>>> >>>> -Jonathan >>>> >>>> On Wed, Jun 3, 2009 at 3:06 AM, lichun li wrote: >>>>> I can't find cassandra's performance data, such as throughput. Does >>>>> anyone know where to find these data? 
>>>>> >>>>> -- >>>>> Sincerely yours, >>>>> >>>>> Lichun Li >>>>> Mobile Life New Media Lab, BUPT >>>>> >>>> >>> >>> >>> >>> -- >>> Sincerely yours, >>> >>> Lichun Li >>> Mobile Life New Media Lab, BUPT >>> >> >
Re: questions about operations
On Thu, Jun 4, 2009 at 12:33 AM, Thorsten von Eicken wrote: > I'm looking at the cassandra data model and operations and I'm running into > a number of questions I have not been able to answer: > > - what does get_columns_since do? I thought there's only one version of a > column stored. I'm puzzled about the "since" aspect. this is for use with time-sorted CFs or supercolumns -- it's like a slice by time. > - is the Thrift interface for get_superColumn correct? It seems to me that > "3:string columnFamily" should really be "3:string > columnFamily_superColumnName" (I know this doesn't have any functional > impact, just makes it hard to understand what the operation does) > > - is the Thrift interface for get_slice_super correct? It seems to me that > "3:string columnFamily_superColumnName" should really be "3:string > columnFamily" I think you're right. > - what does get_key_range do? It looks like it returns a list of keys, but > why does one have to specify a list of column family names? The CF is the unit of data storage, so it will be more efficient if you can narrow down which CFs you are interested in keys from. But if you pass an empty list it will scan all of them. > - what does touch do? It's intended to force the index information for the key in question into an explicit LRU cache to save a seek on the next lookup, and also get the row data into the OS fs cache. But the first part is buggy and the second part works poorly with large rows so it's going to be removed in trunk RSN. -Jonathan
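A toy model of the "slice by time" answer above (plain Python, not the Thrift API; whether the boundary is inclusive is glossed over here):

```python
# get_columns_since on a time-sorted CF is effectively a slice by time:
# return the columns written after `since`, newest first.
# Toy model -- a CF's columns are a {name: (value, timestamp)} dict.
def get_columns_since(columns, since):
    hits = [(name, vt) for name, vt in columns.items() if vt[1] > since]
    return sorted(hits, key=lambda kv: kv[1][1], reverse=True)

cf = {'a': ('v1', 100), 'b': ('v2', 200), 'c': ('v3', 300)}
print(get_columns_since(cf, 150))  # [('c', ('v3', 300)), ('b', ('v2', 200))]
```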
Re: questions about operations
On Thu, Jun 4, 2009 at 10:01 AM, Thorsten von Eicken wrote: > Ah, got it, I forgot about the time-sorted CFs. So does this mean that if I > call get_columns_since on a name-sorted CF I will get an invalid request > exception? And also if I call get_slice_by_name_range or get_slice_by_names > on a time-sorted CF? Or does the sorting only affect performance and not > whether the operations are allowed or not? My best guess from looking at the code (I haven't tested it) is that it will try to fulfil the request on the "wrong" kind of CF, but I don't think it actually handles that case correctly. If you could verify that there is a bug here and file a JIRA ticket if so, that would be helpful. :) > Also, is there no get_slice_super_since and get_slice_super_by_name_range? Right -- currently supercolumns are always name-sorted, and their subcolumns are always time-sorted. -Jonathan
Re: Database backstore
I suppose you could do that either directly from your client or with a proxy, but if your rdbms can handle the write volume then just use replication to handle the reads. Typically people move to Cassandra and other distributed dbs when they need to scale more writes than you can do on an rdbms. If possible, I think a better approach to "I don't trust this new technology" is to keep a separate (distributed) log of your writes somehow, such that if you absolutely had to you could rebuild your Cassandra data from it. Risk of corruption with Cassandra is much lower than with most systems since SSTables are immutable once written. -Jonathan On Thu, Jun 11, 2009 at 6:53 PM, testn wrote: > > Is it possible to persist the data into the database and using cassandra as a > cache writethrough? I wonder this because many organizations don't really > quite believe in the reliability of disk storage (i.e. can be corrupted). If > Cassandra can load data from Database on the fly while persisting it into > the database when writing, it would be perfect.. > -- > View this message in context: > http://n2.nabble.com/Database-backstore-tp3065200p3065200.html > Sent from the cassandra-user@incubator.apache.org mailing list archive at > Nabble.com. > >
Re: Viability of running on EC2
IMO the biggest downside to running on EC2 is that IO is terrible. I haven't done benchmarks, but anecdotally disk performance in particular seems like an order of magnitude slower than you'd get on non-virtual disks. So that is worth investigating before assuming that the price/performance on EC2 is what you think it is. Other than that, Cassandra is designed to emphasize availability so it should work fine in the situations you describe. Hinted handoff in particular will get writes to the right nodes quickly when machines come back online. (However, Cassandra is not yet good at dealing with machines becoming permanently dead.) Of course if _all_ of some keys' replicas are temporarily partitioned off from you you won't be able to read that data until they are visible again. -Jonathan On Sat, Jun 13, 2009 at 11:20 AM, Anthony Molinaro wrote: > Hi, > > I was wondering what the viability of running cassandra on ec2 was. > I believe that it currently runs on some pretty hefty hardware at > facebook, so I'm wondering what the minimum hardware config is > (in other words can I run it on a cluster of 2core 4GB machines)? > Also, running on Amazon means no multicast, network partitions and > machines just disappearing. How does cassandra deal with these > constraints/failures? > > Thanks for information, > > -Anthony > > -- > > Anthony Molinaro >
Re: Viability of running on EC2
https://issues.apache.org/jira/browse/CASSANDRA-208 is probably the issue you are referring to. It is fixed in trunk. Our goal is to run most workloads fine with 1GB of heap out of the box, which should be fine even on a small EC2 instance iirc. See http://wiki.apache.org/cassandra/MemtableThresholds for tuning memory use. -Jonathan On Sat, Jun 13, 2009 at 3:10 PM, Anthony Molinaro wrote: > And any problems with small memory boxes? I see some chatter on the > cassandra development list about OOM errors. Are they more prevalent > on smaller footprint boxes? > > Thanks again, > > -Anthony > > On Sat, Jun 13, 2009 at 11:33:21AM -0500, Jonathan Ellis wrote: >> IMO the biggest downside to running on EC2 is that IO is terrible. I >> haven't done benchmarks, but anecdotally disk performance in >> particular seems like an order of magnitude slower than you'd get on >> non-virtual disks. So that is worth investigating before assuming >> that the price/performance on EC2 is what you think it is. >> >> Other than that, Cassandra is designed to emphasize availability so it >> should work fine in the situations you describe. Hinted handoff in >> particular will get writes to the right nodes quickly when machines >> come back online. (However, Cassandra is not yet good at dealing with >> machines becoming permanently dead.) >> >> Of course if _all_ of some keys' replicas are temporarily partitioned >> off from you you won't be able to read that data until they are >> visible again. >> >> -Jonathan >> >> On Sat, Jun 13, 2009 at 11:20 AM, Anthony >> Molinaro wrote: >> > Hi, >> > >> > I was wondering what the viability of running cassandra on ec2 was. >> > I believe that it currently runs on some pretty hefty hardware at >> > facebook, so I'm wondering what the minimum hardware config is >> > (in other words can I run it on a cluster of 2core 4GB machines)? >> > Also, running on Amazon means no multicast, network partitions and >> > machines just disappearing. 
How does cassandra deal with these >> > constraints/failures? >> > >> > Thanks for information, >> > >> > -Anthony >> > >> > -- >> > >> > Anthony Molinaro >> > > > -- > > Anthony Molinaro >
Re: Querying columns return strange characters
byte[].toString is not the inverse of String.getBytes; you need to use new String(byte[]) for that. fyi, the characters you see are "[" (this is an array), "B" (of bytes), and "dcb03b" (a memory address); this will let you recognize such output in the future :) -Jonathan On Mon, Jun 15, 2009 at 11:26 AM, Ivan Chang wrote: > I modified some test cases in the Cassandra distribution. Specifically in > the unit test package I modified ServerTest.java, basically just tried to > insert some columns and retrieve them. Here's part of the code: > > RowMutation rm = new RowMutation("Table1", "partner0"); > ColumnFamily cf = new ColumnFamily("Standard1", "Standard"); > long now = Calendar.getInstance().getTimeInMillis(); > System.out.println(now); > cf.addColumn("firstname", "John".getBytes(), now); > cf.addColumn("lastname", "Doe".getBytes(), now); > rm.add(cf); > try { > rm.apply(); > } catch (Exception e) { > } > > Table table = Table.open("Table1"); > > try { > Row result = table.getRow("partner0", "Standard1"); > System.out.println(result.toString()); > ColumnFamily cres = result.getColumnFamily("Standard1"); > Map cols = cres.getColumns(); > System.out.println(cols.size()); > Set c = cols.keySet(); > Iterator it = c.iterator(); > while (it.hasNext()) { > String cn = (String) it.next(); > System.out.println(cn); > //ByteArrayOutputStream baos = new ByteArrayOutputStream(); > //DataOutputStream dos = new DataOutputStream(baos); > //cres.getColumnSerializer().serialize(cres.getColumn(cn), > dos); > //dos.flush(); > //System.out.println(dos.size()); > //System.out.println(dos.toString()); > System.out.println(cres.getColumn(cn).value().toString()); > } > > //System.out.println(cres.getColumn("firstname").value().toString()); > } catch (Exception e) { > System.out.println(e.getMessage()); > } > > In summary, it's very simple code that inserts a row (key "partner0") with > two columns: firstname (value "John"), lastname (value "Doe") to the > Standard1 column family.
When I execute the test, I got the following > output: > > [testng] 1245082940509 > [testng] Row(partner0 [ColumnFamily(Standard1 > [firstname:false:4@1245082940509, lastname:false:3@1245082940509]))] > [testng] 2 > [testng] lastname > [testng] [B@dcb03b > [testng] firstname > [testng] [B@b60b93 > > Everything looks fine, the columns were inserted. However, the retrieved > values were [B@dcb03b for lastname and [B@b60b93 for firstname, instead of > what's inserted by the code ("Doe", "John"). > > Could anyone give a clue as to why this happened? > > Thanks! > > Ivan >
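The fix Jonathan describes is a one-liner; here is a minimal, self-contained illustration (plain Java, no Cassandra classes needed):

```java
// byte[] inherits Object.toString(), which prints "[B@<hashcode>" --
// a type tag plus an identity hash, not the contents. To decode stored
// bytes back into text, use the String(byte[]) constructor instead.
public class BytesToString {
    public static void main(String[] args) {
        byte[] value = "Doe".getBytes();
        System.out.println(value.toString().startsWith("[B@")); // type tag + address
        System.out.println(new String(value));                  // the actual value
    }
}
```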
Re: Distributed filtering / aggregation
There's some preliminary support for running server-side filters (see CalloutManager.java) but basically the first person who needs this functionality gets to finish coding it up. :) I'm happy to help you get started but it's not something we're going to need soon. -Jonathan On Wed, Jun 17, 2009 at 10:46 AM, testn wrote: > > I don't see much documentation yet. But is there any chance that it can > perform filtering (apart from Range Query) or aggregation remote? > -- > View this message in context: > http://n2.nabble.com/Distributed-filtering---aggregation-tp3093626p3093626.html > Sent from the cassandra-user@incubator.apache.org mailing list archive at > Nabble.com. > >
Re: Data persistency
You're using internal APIs. Don't do that unless you know what you're doing. :) The client API is in Cassandra.Client. We have some sample code here: http://wiki.apache.org/cassandra/ClientExamples (although none in Java yet, it should still be pretty clear.) -Jonathan On Wed, Jun 17, 2009 at 3:54 PM, Ivan Chang wrote: > I tried to insert and retrieve data from a standalone Java program. While I > am able to insert and retrieve the correct data from within the Java > session, after I terminate the session and rerun only the data retrieval > part, the previously inserted data does not exist anymore, throwing a null > exception. Here's the code: > > // Get storage-config file location > > System.out.println("storage-config="+DatabaseDescriptor.getConfigFileName()); > > // Insert some data with key "partner1" > RowMutation rm = new RowMutation("Table1", "partner1"); > ColumnFamily cf = new ColumnFamily("Standard1", "Standard"); > long now = Calendar.getInstance().getTimeInMillis(); > System.out.println(now); > cf.addColumn("firstname", "John1".getBytes(), now); > cf.addColumn("lastname", "Doe1".getBytes(), now); > rm.add(cf); > try { > rm.apply(); > } catch (Exception e) { > } > > // Retrieve data for key "partner1" > Table table = Table.open("Table1"); > > try { > Row result = table.getRow("partner1", "Standard1"); > System.out.println(result.toString()); > ColumnFamily cres = result.getColumnFamily("Standard1"); > Map cols = cres.getColumns(); > System.out.println(cols.size()); > Set c = cols.keySet(); > Iterator it = c.iterator(); > while (it.hasNext()) { > String cn = (String) it.next(); > System.out.println(cn); > System.out.println(new String(cres.getColumn(cn).value())); > } > } catch (Exception e) { > System.out.println("Ex: " + e.getMessage()); > } > > the print out from above is > > storage-config=~/Cassandra/trunk/conf/storage-conf.xml > 1245270260114 > Row(partner1 [ColumnFamily(Standard1 [firstname:false:5@1245270260114, > 
lastname:false:4@1245270260114]))] > 2 > lastname > Doe1 > firstname > John1 > > However, when I commented out the insert part of the above code and tried to > retrieve data again by rerunning the main code, I got an exception: > > Row(partner1 [)] > Ex: null > > So the data doesn't seem to persist across sessions. > > Could someone explain what's wrong with the code? > > Thanks, > Ivan >
Re: Data persistency
You don't. Supercolumns are not arbitrarily nestable. A columnfamily is either super or normal; a super columnfamily contains supercolumns, which in turn contain Columns. A normal columnfamily contains Columns directly. You can't mix-and-match supercolumns and normal columns (at the same level of nesting) in a single columnfamily. -Jonathan On Thu, Jun 18, 2009 at 12:12 PM, Ivan Chang wrote: > Using Cassandra.Client works. However, more questions arise, specifically > regarding Super Columns. While the following code persists the super column > "sc1" with 3 simple columns, how do I create nested super columns? A super > column with multiple super columns and standard columns? Thanks, Ivan > > // Super Column > batch_mutation_super_t bt = new batch_mutation_super_t(); > bt.key = "testkey"; > bt.table = tablename_; > bt.cfmap = new HashMap<String, List<superColumn_t>>(); > List<superColumn_t> superColumn_arr = new > ArrayList<superColumn_t>(); > List<column_t> column_arr2 = new ArrayList<column_t>(); > column_arr2.add(new column_t("c1", "v1".getBytes(), now)); > column_arr2.add(new column_t("c2", "v2".getBytes(), now)); > column_arr2.add(new column_t("c3", "v3".getBytes(), now)); > superColumn_arr.add(new superColumn_t("sc1", column_arr2)); > bt.cfmap.put("Super1", superColumn_arr); > peerstorageClient.batch_insert_superColumn(bt, false); > > On Wed, Jun 17, 2009 at 5:01 PM, Jonathan Ellis wrote: >> >> You're using internal APIs. Don't do that unless you know what you're >> doing. :) >> >> The client API is in Cassandra.Client. >> >> We have some sample code here: >> http://wiki.apache.org/cassandra/ClientExamples >> >> (although none in Java yet, it should still be pretty clear.) >> >> -Jonathan >> >> On Wed, Jun 17, 2009 at 3:54 PM, Ivan Chang wrote: >> > I tried to insert and retrieve data from a standalone Java program. >> > While I >> > am able to insert and retrieve the correct data from within the Java >> > session. 
After I terminate the session, and rerun only the data >> > retrieval >> > part, the previous inserted data does not exist anymore, throwing a null >> > exception. Here's the code: >> > >> > // Get storage-config file location >> > >> > >> > System.out.println("storage-config="+DatabaseDescriptor.getConfigFileName()); >> > >> > // Insert some data with key "partner1" >> > RowMutation rm = new RowMutation("Table1", "partner1"); >> > ColumnFamily cf = new ColumnFamily("Standard1", "Standard"); >> > long now = Calendar.getInstance().getTimeInMillis(); >> > System.out.println(now); >> > cf.addColumn("firstname", "John1".getBytes(), now); >> > cf.addColumn("lastname", "Doe1".getBytes(), now); >> > rm.add(cf); >> > try { >> > rm.apply(); >> > } catch (Exception e) { >> > } >> > >> > // Retrieve data for key "partner1" >> > Table table = Table.open("Table1"); >> > >> > try { >> > Row result = table.getRow("partner1", "Standard1"); >> > System.out.println(result.toString()); >> > ColumnFamily cres = result.getColumnFamily("Standard1"); >> > Map cols = cres.getColumns(); >> > System.out.println(cols.size()); >> > Set c = cols.keySet(); >> > Iterator it = c.iterator(); >> > while (it.hasNext()) { >> > String cn = (String) it.next(); >> > System.out.println(cn); >> > System.out.println(new >> > String(cres.getColumn(cn).value())); >> > } >> > } catch (Exception e) { >> > System.out.println("Ex: " + e.getMessage()); >> > } >> > >> > the print out from above is >> > >> > storage-config=~/Cassandra/trunk/conf/storage-conf.xml >> > 1245270260114 >> > Row(partner1 [ColumnFamily(Standard1 [firstname:false:5@1245270260114, >> > lastname:false:4@1245270260114]))] >> > 2 >> > lastname >> > Doe1 >> > firstname >> > John1 >> > >> > However, when I commented out the insert part of the above code and try >> > retrieve data again by rerunning the main code, I got an exception: >> > >> > Row(partner1 [)] >> > Ex: null >> > >> > So the data doesn't seem to persist across sessions. 
>> > >> > Could someone explain what's wrong with the code? >> > >> > Thanks, >> > Ivan >> > > >
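The nesting rule Jonathan describes can be sketched with plain maps (a conceptual sketch only; the real structures live in org.apache.cassandra.db, and the key/column names here are just illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// A standard CF is exactly one level of columns under a row key.
// A super CF adds exactly one more level -- supercolumns contain
// columns, never other supercolumns.
public class CfNesting {
    public static void main(String[] args) {
        // standard CF: rowKey -> columnName -> value
        Map<String, Map<String, String>> standard = new HashMap<>();
        standard.computeIfAbsent("testkey", k -> new HashMap<>()).put("c1", "v1");

        // super CF: rowKey -> superColumnName -> columnName -> value
        Map<String, Map<String, Map<String, String>>> superCf = new HashMap<>();
        superCf.computeIfAbsent("testkey", k -> new HashMap<>())
               .computeIfAbsent("sc1", k -> new HashMap<>())
               .put("c1", "v1");

        System.out.println(standard.get("testkey").get("c1"));
        System.out.println(superCf.get("testkey").get("sc1").get("c1"));
    }
}
```

There is no deeper map to put a supercolumn into, which is exactly why "nested super columns" have nowhere to go.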
Re: Database backstore
You have to give up a lot of optimizations when you say "we're going to plug into any generic backend." That is not something we are interested in doing. -Jonathan On Mon, Jun 22, 2009 at 7:15 AM, testn wrote: > > It would be nice if we can plug in different backstore to it. Voldemort seems > to be quite extensible that way and I think it's quite suitable for an > application that has high read/write ratio. > > > Jonathan Ellis wrote: >> >> I suppose you could do that either directly from your client or with a >> proxy, but if your rdbms can handle the write volume then just use >> replication to handle the reads. Typically people move to Cassandra >> and other distributed dbs when they need to scale more writes than you >> can do on an rdbms. >> >> If possible, I think a better approach to "I don't trust this new >> technology" is to keep a separate (distributed) log of your writes >> somehow such that if you absolutely had to you could rebuild your >> cassandra data from. >> >> Risk of corruption with Cassandra is much lower than most systems >> since SSTables are immutable once written. >> >> -Jonathan >> >> On Thu, Jun 11, 2009 at 6:53 PM, testn wrote: >>> >>> Is it possible to persist the data into the database and using cassandra >>> as a >>> cache writethrough? I wonder this because many organizations don't really >>> quite believe in the reliability of disk storage (i.e. can be corrupted). >>> If >>> Cassandra can load data from Database on the fly while persisting it into >>> the database when writing, it would be perfect.. >>> -- >>> View this message in context: >>> http://n2.nabble.com/Database-backstore-tp3065200p3065200.html >>> Sent from the cassandra-user@incubator.apache.org mailing list archive at >>> Nabble.com. >>> >>> >> >> > > -- > View this message in context: > http://n2.nabble.com/Database-backstore-tp3065200p3135134.html > Sent from the cassandra-user@incubator.apache.org mailing list archive at > Nabble.com. > >
Re: New table and column families
you'll need to (a) make sure you have the latest trunk (b) wipe your data, commitlog, and system directories, since adding new tables or columnfamilies non-destructively is not yet supported (see https://issues.apache.org/jira/browse/CASSANDRA-44) -Jonathan On Tue, Jun 23, 2009 at 8:55 AM, Ivan Chang wrote: > I modified storage-config.xml to add a new table and a couple of column families > (see excerpt below). The new table added is identified by the name > "NewTable" and associated column families "Standard3", "Super3", and > "Super4". > > <Table Name="Table1"> > <ColumnFamily ColumnSort="Name" Name="Standard1" > FlushPeriodInMinutes="60"/> > <ColumnFamily ColumnType="Super" > Name="Super1"/> > <ColumnFamily ColumnType="Super" > Name="Super2"/> > </Table> > <Table Name="NewTable"> > <ColumnFamily ColumnSort="Name" Name="Standard3"/> > <ColumnFamily ColumnType="Super" > Name="Super3"/> > <ColumnFamily ColumnType="Super" > Name="Super4"/> > </Table> > > Here comes some code to insert some data, the goal is to feed Cassandra with > data from an xml file. > When I execute the code, I got an exception. What I don't understand is why > this code failed even though I have configured the super column families and new > table etc. > > InvalidRequestException(why:Column Family Super3 is invalid.) > at > org.apache.cassandra.service.Cassandra$get_column_result.read(Cassandra.java:3604) > at > org.apache.cassandra.service.Cassandra$Client.recv_get_column(Cassandra.java:202) > at > org.apache.cassandra.service.Cassandra$Client.get_column(Cassandra.java:178) > ... 
> > // New Table Sample > String docID = ""; > try { > batch_mutation_super_t bt = new batch_mutation_super_t(); > bt.table = "NewTable"; > bt.cfmap = new HashMap<String, List<superColumn_t>>(); > > // Read sample xml > XMLUtils xmlUtils = new XMLUtils( > System.getProperty("samples-xml-dir") > + System.getProperty("file.separator") > + "Sample.xml"); > > /* docID from xml file */ > docID = xmlUtils.getNodeValue("/Document/docID"); > bt.key = docID; > > // Collect all nodes that match /Document/node1 > NodeList nl = xmlUtils.getRequestedNodeList("/Document/node1"); > > StringWriter sw = new StringWriter(); > Transformer t = > TransformerFactory.newInstance().newTransformer(); > t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); > t.transform(new DOMSource(nl.item(0)), new StreamResult(sw)); > sw.flush(); > System.out.println(sw.toString()); > sw.close(); > > /* nodes */ > // see populate function below > List<column_t> nodes_arr = populate("node", "/Document/node1", > xmlUtils, t); > > List<superColumn_t> S3 = new ArrayList<superColumn_t>(); > > S3.add(new superColumn_t("sc1", nodes_arr)); > > bt.cfmap.put("Super3", S3); > > List<superColumn_t> S4 = new ArrayList<superColumn_t>(); > > S4.add(new superColumn_t("sc1_replicate", nodes_arr)); > > bt.cfmap.put("Super4", S4); > > peerstorageClient.batch_insert_superColumn(bt, false); > > } catch (Exception e) { > e.printStackTrace(); > } > > // Returns columns of XML data matching xpath on given xml doc (via > xmlUtils) > private static List<column_t> populate(String column_prefix, String > xpath, XMLUtils xmlUtils, Transformer t) throws Exception { > StringWriter sw = new StringWriter(); > List<column_t> c = new ArrayList<column_t>(); > NodeList nl = xmlUtils.getRequestedNodeList(xpath); > long now = Calendar.getInstance().getTimeInMillis(); > if (nl != null) { > for (int i = 0; i < nl.getLength(); i++) { > sw = new StringWriter(); > t.transform(new DOMSource(nl.item(i)), new > StreamResult(sw)); > sw.flush(); > System.out.println(sw.toString()); > c.add(new column_t(column_prefix+i, > sw.toString().getBytes(), now)); > sw.close(); > 
} > } > return c; > } > > Thanks for checking this issue out. > > -Ivan
Re: Question about cassandra (replication)
Rather than post the same question verbatim, it would be more useful if you explained what you still don't understand after Alexander and Sandeep's explanations on the google group. (http://groups.google.com/group/cassandra-user/browse_thread/thread/4330e415e959e9d9) On Thu, Jun 25, 2009 at 9:11 AM, Harold Lim wrote: > > Hi All, > > I posted a similar message on the google groups page. Hopefully, I'll get > more feedback here. > > > I just started reading about dynamo and Cassandra and I am thinking > about possibly using cassandra for my system. > > I was reading the dynamo paper and they mentioned about a preference > list for a particular key. Is this preference list configurable? > > How does Cassandra choose which nodes are in the preference list? > Also, is the number of replicas for each key/column configurable? For > example, can I set the replication factor per key/value? > > I read that Cassandra has optimistic replication. What exactly does > that mean? Underneath the hood, how does cassandra maintain/detect the > number of replicas? Does it aggressively replicate an item, when it > detects that the number of replicas of a particular item goes below the > specified replication factor? > > Is the replication strategy (when to replicate, aggressiveness, etc) > configurable too? > > > > > > > Thanks, > Harold > > > >
Re: Question about cassandra (replication)
On Thu, Jun 25, 2009 at 10:10 AM, Harold Lim wrote: > > Hi, > > Is the replication factor configurable? For example, Can I configure the > replication factor per column-family (e.g., 5 for column-family a and 3 for > column-family b). It is currently only configurable globally. It may make sense to configure on a table/namespace basis. IMO it does not make sense on a CF basis. > Also, I am interested about the replication details. Sandeep wrote: > "When there's a failure and the #of replicas for a given key goes down, > Cassandra does not aggressively create a new copy for the data. The > assumption is that the failed node will be replaced soon enough, and work > can continue with the other 2 replicas." > > When and how does cassandra replicate when the replication count of a > particular data goes below the replication factor? How does it monitor the > replication count of a particular data? Currently it re-replicates (repairs) lazily. This is called "read repair" and we follow essentially the model given in the Dynamo paper. Non-lazy repair is being worked on at https://issues.apache.org/jira/browse/CASSANDRA-193 -Jonathan
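The lazy "read repair" Jonathan describes boils down to: read from every replica, answer with the newest version, and push that version back to any replica holding stale or missing data. A toy sketch of the idea (not Cassandra's actual code; replicas here are just in-memory maps, and the class and field names are illustrative):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadRepairSketch {
    // each stored value carries a timestamp, as Cassandra columns do
    static class Versioned {
        final String value;
        final long timestamp;
        Versioned(String v, long t) { value = v; timestamp = t; }
    }

    static String readWithRepair(String key, List<Map<String, Versioned>> replicas) {
        // 1. find the newest version across all replicas
        Versioned newest = null;
        for (Map<String, Versioned> r : replicas) {
            Versioned v = r.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp))
                newest = v;
        }
        if (newest == null) return null;
        // 2. lazily repair any replica that is stale or missed the write
        for (Map<String, Versioned> r : replicas) {
            Versioned v = r.get(key);
            if (v == null || v.timestamp < newest.timestamp)
                r.put(key, newest);
        }
        return newest.value;
    }

    public static void main(String[] args) {
        Map<String, Versioned> r1 = new HashMap<>(), r2 = new HashMap<>(), r3 = new HashMap<>();
        r1.put("k", new Versioned("old", 1));
        r2.put("k", new Versioned("new", 2)); // r3 missed the write entirely
        List<Map<String, Versioned>> replicas = Arrays.asList(r1, r2, r3);
        System.out.println(readWithRepair("k", replicas)); // newest wins
        System.out.println(r3.get("k").value);             // r3 has been repaired
    }
}
```

Nothing happens until the key is read, which is why a dead replica's data is not re-replicated aggressively; CASSANDRA-193 (linked above) is the non-lazy counterpart.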
Re: schema example
get_columns_since On Fri, Jul 3, 2009 at 7:21 PM, Evan Weaver wrote: > This helps a lot. > > However, I can't find any API method that actually lets me do a > slice query on a time-sorted column, as necessary for the second blog > example. I get the following error on r789419: > > InvalidRequestException: get_slice_from requires CF indexed by name > > Evan > > On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis wrote: >> Mail storage, man, I think pretty much anything I could come up with >> would look pretty simplistic compared to what "real" systems do in >> that domain. :) >> >> But blogs, I think I can handle those. Let's make ours multiuser >> or there isn't enough scale to make it interesting. :) >> >> The interesting thing here is we want to be able to query two things >> efficiently: >> - the most recent posts belonging to a given blog, in reverse >> chronological order >> - a single post and its comments, in chronological order >> >> At first glance you might think we can again reasonably do this with a >> single CF, this time a super CF: >> >> >> >> The key is the blog name, the supercolumns are posts and the >> subcolumns are comments. This would be reasonable BUT supercolumns >> are just containers, they have no data or timestamp associated with >> them directly (only through their subcolumns). So you cannot sort a >> super CF by time. >> >> So instead what I would do would be to use two CFs: >> >> >> >> >> For the first, the keys used would be blog names, and the columns >> would be the post titles and body. So to get a list of most recent >> posts you just do a slice query. Even though Cassandra currently >> handles large groups of columns sub-optimally, even with a blog >> updated several times a day you'd be safe taking this approach (i.e. >> we'll have that problem fixed before you start seeing it :). >> >> For the second, the keys are blog name. The
You can serialize these a number of >> ways; I would probably use title as the column name and have the value >> be the author + body (e.g. as a json dict). Again we use the slice >> call to get the comments in order. (We will have to manually reverse >> what slice gives us since time sort is always reverse chronological >> atm, but the overhead of doing this in memory will be negligible.) >> >> Does this help? >> >> -Jonathan >> >> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver wrote: >>> Even if it's not actually in real-life use, some examples for common >>> domains would really help clarify things. >>> >>> * blog >>> * email storage >>> * search index >>> >>> etc. >>> >>> Evan >>> >>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis wrote: >>>> Does anyone have a simple app schema they can share? >>>> >>>> I can't share the one for our main app. But we do need an example >>>> here. A real one would be nice if we can find one. >>>> >>>> I checked App Engine. They don't have a whole lot of examples either. >>>> They do have a really simple one: >>>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html >>>> >>>> The most important thing in Cassandra modeling is choosing a good key, >>>> since that is what most of your lookups will be by. Keys are also how >>>> Cassandra scales -- Cassandra can handle effectively infinite keys >>>> (given enough nodes obviously) but only thousands to millions of >>>> columns per key/CF (depending on what API calls you use -- Jun is >>>> adding one now that does not deserialize everything in the whole CF >>>> into memory. The rest will need to follow this model eventually too). >>>> >>>> For this guestbook I think the choice is obvious: use the name as the >>>> key, and have a single simple CF for the messages. Each column will >>>> be a message (you can even use the mandatory timestamp field as part >>>> of your user-visible data. win!). 
You get the list (or page) of >>>> users with get_key_range and then their messages with get_slice. >>>> >>>> >>>> >>>> Anyone got another one for pedagogical purposes? >>>> >>>> -Jonathan >>>> >>> >>> >>> >>> -- >>> Evan Weaver >>> >> > > > > -- > Evan Weaver >
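The "most recent posts via a slice query" idea in this thread can be modeled with a reverse-ordered map (a sketch of the time-sorted CF behavior only; the thread's actual CF definitions were lost in archiving, so names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class BlogSchemaSketch {
    public static void main(String[] args) {
        // one row of a time-sorted CF: key = blog name, column name =
        // post timestamp, value = post body; iteration order is newest
        // first, matching Cassandra's reverse-chronological time sort
        NavigableMap<Long, String> posts = new TreeMap<>(Comparator.reverseOrder());
        posts.put(100L, "first post");
        posts.put(200L, "second post");
        posts.put(300L, "third post");

        // a "slice" of the two most recent posts
        int count = 2;
        List<String> recent = new ArrayList<>();
        for (String body : posts.values()) {
            if (recent.size() == count) break;
            recent.add(body);
        }
        System.out.println(recent);
    }
}
```

Comments would live in a second CF keyed per post, iterated the same way and reversed in memory for chronological display, as the thread suggests.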
Re: schema example
On Fri, Jul 3, 2009 at 8:53 PM, Evan Weaver wrote: > (From talking on IRC): > > I think this boils down to the offset/limit vs. token/limit debate. > > Token/limit is fine in all cases for me, but you still have to be able > to query the head of the list (with a limit, but no token) to get > started. Right now there is no facility for that on time-sorted column > families: > > list<column_t> get_columns_since(1:string tablename, 2:string key, > 3:string columnParent, 4:i64 timeStamp) basically we need _since to add the kind of functionality we have in Slice (or will, after 261 is committed). it's probably better to get 240 (and 185 + 189) done sooner than later though instead of wasting effort on an API we know is broken. (the old get_slice could do basically anything since it deserialized the entire CF into memory. we're moving away from that to support larger-than-memory CFs.) -Jonathan
Re: [Announce] CassandraClient 0.1 for Ruby released
Nice! On Sat, Jul 4, 2009 at 4:59 AM, Evan Weaver wrote: > I am pleased to release: > > cassandra_client 0.1 > > A Ruby client for the Cassandra distributed database. > > http://blog.evanweaver.com/files/doc/fauna/cassandra_client/ > http://github.com/fauna/cassandra_client/ > > Evan > > -- > Evan Weaver >
Re: cassandra Cli example from wiki error
This is a known problem in trunk. It's fixed by the patch in issue 272, which should be applied tonight or tomorrow. -Jonathan On Mon, Jul 6, 2009 at 7:27 PM, Kevin Castiglione wrote: > hi > i just got cassandra compiled. > but the cli example from wiki is not working. the conf files are untouched. > can you help me out here! > thanks > > CLI output: > ./cassandra-cli --host localhost --port 9160 > Connected to localhost/9160 > Welcome to cassandra CLI. > > Type 'help' or '?' for help. Type 'quit' or 'exit' to quit. > cassandra> set Table1.Standard1['jsmith']['first'] = 'John' > Statement processed. > cassandra> set Table1.Standard1['jsmith']['last'] = 'Smith' > Statement processed. > cassandra> set Table1.Standard1['jsmith']['age'] = '42' > Statement processed. > cassandra> get Table1.Standard1['jsmith'] > Error: CQL Execution Error > cassandra> > > > > > > cassandra output > sudo ./bin/cassandra -f > Listening for transport dt_socket at address: > DEBUG - Loading settings from ./bin/../conf/storage-conf.xml > DEBUG - adding Super1 as 0 > DEBUG - adding Standard2 as 1 > DEBUG - adding Standard1 as 2 > DEBUG - adding StandardByTime1 as 3 > DEBUG - adding LocationInfo as 4 > DEBUG - adding HintsColumnFamily as 5 > DEBUG - Starting to listen on 127.0.0.1:7001 > INFO - Cassandra starting up... > DEBUG - Compiling CQL query ... > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'first') > 'John') > DEBUG - Executing CQL query ... > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 > DEBUG - Compiling CQL query ... > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'last') > 'Smith') > DEBUG - Executing CQL query ... > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 > DEBUG - Compiling CQL query ... > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'age') '42') > DEBUG - Executing CQL query ... > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 > DEBUG - Compiling CQL query ... 
> DEBUG - AST: (A_GET (A_COLUMN_ACCESS Table1 Standard1 'jsmith')) > DEBUG - Executing CQL query ... > DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1', > key='jsmith', columnFamily='Standard1', isAscending='true', limit='-1', > count='2147483647') > ERROR - Exception was generated at : 07/06/2009 17:21:30 on thread > pool-1-thread-1 > 1 > java.lang.ArrayIndexOutOfBoundsException: 1 > at org.apache.cassandra.db.Table.getSliceFrom(Table.java:612) > at > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57) > at > org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:600) > at > org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:303) > at > org.apache.cassandra.cql.common.ColumnRangeQueryRSD.getRows(ColumnRangeQueryRSD.java:101) > at > org.apache.cassandra.cql.common.QueryPlan.execute(QueryPlan.java:41) > at > org.apache.cassandra.cql.driver.CqlDriver.executeQuery(CqlDriver.java:45) > at > org.apache.cassandra.service.CassandraServer.executeQuery(CassandraServer.java:491) > at > org.apache.cassandra.service.Cassandra$Processor$executeQuery.process(Cassandra.java:1323) > at > org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:839) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > > > > > > > svn version : Revision: 791656 > > java -version > java version "1.6.0_14" > Java(TM) SE Runtime Environment (build 1.6.0_14-b08) > Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode, sharing) > > > >
Re: cassandra Cli example from wiki error
Sorry, 277 is the right issue. Just one patch. Once it's applied it will be in svn trunk. On Mon, Jul 6, 2009 at 7:35 PM, Kevin Castiglione wrote: > thanks for this: > http://issues.apache.org/jira/browse/CASSANDRA-272 > > do i need to apply all 3 patches? > > or can you tell me which svn version i can use so that it is working? > thanks again! > On Mon, Jul 6, 2009 at 5:31 PM, Jonathan Ellis wrote: >> >> This is a known problem in trunk. It's fixed by the patch in issue >> 272, which should be applied tonight or tomorrow. >> >> -Jonathan >> >> On Mon, Jul 6, 2009 at 7:27 PM, Kevin >> Castiglione wrote: >> > hi >> > i just got cassandra compiled. >> > but the cli example from wiki is not working. the conf files are >> > untouched. >> > can you help me out here! >> > thanks >> > >> > CLI output: >> > ./cassandra-cli --host localhost --port 9160 >> > Connected to localhost/9160 >> > Welcome to cassandra CLI. >> > >> > Type 'help' or '?' for help. Type 'quit' or 'exit' to quit. >> > cassandra> set Table1.Standard1['jsmith']['first'] = 'John' >> > Statement processed. >> > cassandra> set Table1.Standard1['jsmith']['last'] = 'Smith' >> > Statement processed. >> > cassandra> set Table1.Standard1['jsmith']['age'] = '42' >> > Statement processed. >> > cassandra> get Table1.Standard1['jsmith'] >> > Error: CQL Execution Error >> > cassandra> >> > >> > >> > >> > >> > >> > cassandra output >> > sudo ./bin/cassandra -f >> > Listening for transport dt_socket at address: >> > DEBUG - Loading settings from ./bin/../conf/storage-conf.xml >> > DEBUG - adding Super1 as 0 >> > DEBUG - adding Standard2 as 1 >> > DEBUG - adding Standard1 as 2 >> > DEBUG - adding StandardByTime1 as 3 >> > DEBUG - adding LocationInfo as 4 >> > DEBUG - adding HintsColumnFamily as 5 >> > DEBUG - Starting to listen on 127.0.0.1:7001 >> > INFO - Cassandra starting up... >> > DEBUG - Compiling CQL query ... 
>> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'first') >> > 'John') >> > DEBUG - Executing CQL query ... >> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 >> > DEBUG - Compiling CQL query ... >> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'last') >> > 'Smith') >> > DEBUG - Executing CQL query ... >> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 >> > DEBUG - Compiling CQL query ... >> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'age') >> > '42') >> > DEBUG - Executing CQL query ... >> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000 >> > DEBUG - Compiling CQL query ... >> > DEBUG - AST: (A_GET (A_COLUMN_ACCESS Table1 Standard1 'jsmith')) >> > DEBUG - Executing CQL query ... >> > DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1', >> > key='jsmith', columnFamily='Standard1', isAscending='true', limit='-1', >> > count='2147483647') >> > ERROR - Exception was generated at : 07/06/2009 17:21:30 on thread >> > pool-1-thread-1 >> > 1 >> > java.lang.ArrayIndexOutOfBoundsException: 1 >> > at org.apache.cassandra.db.Table.getSliceFrom(Table.java:612) >> > at >> > >> > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57) >> > at >> > >> > org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:600) >> > at >> > >> > org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:303) >> > at >> > >> > org.apache.cassandra.cql.common.ColumnRangeQueryRSD.getRows(ColumnRangeQueryRSD.java:101) >> > at >> > org.apache.cassandra.cql.common.QueryPlan.execute(QueryPlan.java:41) >> > at >> > >> > org.apache.cassandra.cql.driver.CqlDriver.executeQuery(CqlDriver.java:45) >> > at >> > >> > org.apache.cassandra.service.CassandraServer.executeQuery(CassandraServer.java:491) >> > at >> > >> > org.apache.cassandra.service.Cassandra$Processor$executeQuery.process(Cassandra.java:1323) >> > at >> > >> > 
org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:839) >> > at >> > >> > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252) >> > at >> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> > at >> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> > at java.lang.Thread.run(Thread.java:619) >> > >> > >> > >> > >> > >> > >> > svn version : Revision: 791656 >> > >> > java -version >> > java version "1.6.0_14" >> > Java(TM) SE Runtime Environment (build 1.6.0_14-b08) >> > Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode, sharing) >> > >> > >> > >> > > >
Re: problems with python client
you want start='' finish='' offset=0 On Tue, Jul 7, 2009 at 8:01 AM, Kevin Castiglione wrote: > i have inserted a row into the table Table1 and Standard1 column family. And > this works with the cassandra-cli > > cassandra> get Table1.Standard1['1'] > COLUMN_TIMESTAMP = 1246942866; COLUMN_VALUE = 24; COLUMN_KEY = age; > COLUMN_TIMESTAMP = 1246943353; COLUMN_VALUE = Chris Goffinet; COLUMN_KEY = > name; > Statement processed. > > > but if i try to get this data using the python client I get an empty list: client.get_slice(tablename='Table1', key='1', columnParent='Standard1', start='0', finish='100', isAscending=True, offset=-1, count=1000) > [ ] > > this is the output from cassandra > DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1', key='1', > columnFamily='Standard1', isAscending='true', limit='-1', count='1000') > DEBUG - clearing > > > also notice that the argument 'offset' in the python client is actually > passed to cassandra as 'limit'. > > > is there something im missing here? > thanks >
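A sketch of why start=''/finish=''/offset=0 works where start='0'/finish='100' returned nothing (this is an interpretation of the slice semantics, not the server code): column names are compared as strings, and empty bounds act as "unbounded".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SliceSketch {
    // simplified get_slice over one row: '' means unbounded on that side
    static List<String> getSlice(SortedMap<String, String> row,
                                 String start, String finish, int count) {
        SortedMap<String, String> range =
              start.isEmpty() && finish.isEmpty() ? row
            : start.isEmpty()                     ? row.headMap(finish)
            : finish.isEmpty()                    ? row.tailMap(start)
            :                                       row.subMap(start, finish);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : range.entrySet()) {
            if (out.size() == count) break;
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("age", "24");
        row.put("name", "Chris Goffinet");
        // empty bounds: the whole row, in column-name order
        System.out.println(getSlice(row, "", "", 1000));
        // '0'..'100' looks numeric but is a string range, and "age" and
        // "name" both sort after "100" -- hence the empty result
        System.out.println(getSlice(row, "0", "100", 1000));
    }
}
```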
Re: problems with python client
On Tue, Jul 7, 2009 at 8:19 AM, Kevin Castiglione wrote: > thanks a lot for this! it works. > can you pl. explain what start, finish, isAscending are?

start = column name to start with
finish = column name to stop with
ascending = order to return columns in

> also the value i pass to offset gets passed to cassandra as limit, is this expected?

not sure what you mean.
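To make the start/finish semantics concrete, here is a rough Python model of a column slice. A plain dict stands in for a row's columns — this is illustrative, not the actual Thrift client API — and it shows why the earlier call with start='0', finish='100' came back empty:

```python
# Illustrative model of get_slice semantics (not the real Thrift API):
# a row's columns are kept sorted by name; start/finish bound the slice
# by column name, and '' means "unbounded" on that side.

def slice_columns(columns, start='', finish='', ascending=True, count=1000):
    names = sorted(columns)
    if start:
        names = [n for n in names if n >= start]
    if finish:
        names = [n for n in names if n <= finish]
    if not ascending:
        names.reverse()
    return [(n, columns[n]) for n in names[:count]]

row = {'age': '24', 'name': 'Chris Goffinet'}

# start='0', finish='100' matches nothing: names are compared as strings,
# and 'age'/'name' both sort after '100' -- hence the empty result.
print(slice_columns(row, start='0', finish='100'))   # []

# start='', finish='' means "all columns", which is what was wanted.
print(slice_columns(row))   # [('age', '24'), ('name', 'Chris Goffinet')]
```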
Re: problems with python client
On Tue, Jul 7, 2009 at 8:31 AM, Kevin Castiglione wrote: > you can see that i passed the value -1 to offset and in the cassandra server > log, it is received as the argument limit. > offset and limit mean different things right? is this a problem in python > client? or am i missing something here? ah, that just means I forgot to update toString on the java side. :)
Re: Up and Running with Cassandra
Before 0.4 is released. :) The user-facing API is more of an immediate pain point (tickets 139, 185, 240), but the disk format change would be next in my mind. -Jonathan On Tue, Jul 7, 2009 at 1:06 PM, Kevin Castiglione wrote: > any ideas when this will happen? > thanks > > On Tue, Jul 7, 2009 at 10:52 AM, Evan Weaver wrote: >> >> It will; I don't think the change is committed yet. >> >> Evan >> >> On Tue, Jul 7, 2009 at 10:50 AM, Kevin >> Castiglione wrote: >> > thanks for this post! >> > >> > you have said that: >> > the on-disk storage format is expected to change in version 0.4.0. >> > >> > >> > im using svn latest revision 791696. will the on-disk storage format >> > change >> > affect this version? >> > >> > On Mon, Jul 6, 2009 at 11:18 PM, Evan Weaver wrote: >> >> >> >> In case you missed it, a big introductory post: >> >> >> >> >> >> >> >> http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/ >> >> >> >> Evan >> >> >> >> -- >> >> Evan Weaver >> > >> > >> >> >> >> -- >> Evan Weaver > >
Re: problem running cassandra
what version are you trying to run? on what platform? On Thu, Jul 9, 2009 at 12:04 PM, wrote: > I did set it up as the readme file instructed but i encountered this error, > Can you please suggest how i fix this > thanks > > cassandra]$ bin/cassandra -f > Listening for transport dt_socket at address: > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/cassandra/service/CassandraDaemon > Caused by: java.lang.ClassNotFoundException: > org.apache.cassandra.service.CassandraDaemon > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) > Could not find the main class: > org.apache.cassandra.service.CassandraDaemon. Program will exit. > >
Re: problem running cassandra
for 0.3 you can connect to the web interface on port 7002 (configurable). In trunk we have removed the web interface in favor of JMX and nodeprobe; see http://wiki.apache.org/cassandra/GettingStarted, http://wiki.apache.org/cassandra/NodeProbe, and http://wiki.apache.org/cassandra/MemtableThresholds On Thu, Jul 9, 2009 at 1:00 PM, wrote: > Hey jonathan > thanks a lot > fedora > I searched and found that the problem was i hadnt setup JAVA_HOME > once i set it up > it worked immediately > But i m trying to setup the cassandra web inerface. Can you show me how to > setup cassandra > Thanks a lot > > On Thu, Jul 9, 2009 at 10:27 AM, Jonathan Ellis wrote: >> >> what version are you trying to run? on what platform? >> >> On Thu, Jul 9, 2009 at 12:04 PM, wrote: >> > I did set it up as the readme file instructed but i encountered this >> > error, >> > Can you please suggest how i fix this >> > thanks >> > >> > cassandra]$ bin/cassandra -f >> > Listening for transport dt_socket at address: >> > Exception in thread "main" java.lang.NoClassDefFoundError: >> > org/apache/cassandra/service/CassandraDaemon >> > Caused by: java.lang.ClassNotFoundException: >> > org.apache.cassandra.service.CassandraDaemon >> > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) >> > at java.security.AccessController.doPrivileged(Native Method) >> > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) >> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) >> > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) >> > Could not find the main class: >> > org.apache.cassandra.service.CassandraDaemon. Program will exit. >> > >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: problem running cassandra
because it was (a) buggy and (b) trying to do too many things at once, all of which html was a poor fit for. you can generate a bare-bones python client with thrift; see http://wiki.apache.org/cassandra/ThriftInterface the Digg guys are working on a more idiomatic python client. On Thu, Jul 9, 2009 at 3:20 PM, wrote: > why was the web interface removed? > Is there a simple python client for cassandra like python-couchdb > thanks a lot > > On Thu, Jul 9, 2009 at 12:25 PM, Jonathan Ellis wrote: >> >> for 0.3 you can connect to the web interface on port 7002 (configurable). >> >> In trunk we have removed the web interface in favor of JMX and >> nodeprobe; see http://wiki.apache.org/cassandra/GettingStarted, >> http://wiki.apache.org/cassandra/NodeProbe, and >> http://wiki.apache.org/cassandra/MemtableThresholds >> >> On Thu, Jul 9, 2009 at 1:00 PM, wrote: >> > Hey jonathan >> > thanks a lot >> > fedora >> > I searched and found that the problem was i hadnt setup JAVA_HOME >> > once i set it up >> > it worked immediately >> > But i m trying to setup the cassandra web inerface. Can you show me how >> > to >> > setup cassandra >> > Thanks a lot >> > >> > On Thu, Jul 9, 2009 at 10:27 AM, Jonathan Ellis >> > wrote: >> >> >> >> what version are you trying to run? on what platform? 
>> >> >> >> On Thu, Jul 9, 2009 at 12:04 PM, wrote: >> >> > I did set it up as the readme file instructed but i encountered this >> >> > error, >> >> > Can you please suggest how i fix this >> >> > thanks >> >> > >> >> > cassandra]$ bin/cassandra -f >> >> > Listening for transport dt_socket at address: >> >> > Exception in thread "main" java.lang.NoClassDefFoundError: >> >> > org/apache/cassandra/service/CassandraDaemon >> >> > Caused by: java.lang.ClassNotFoundException: >> >> > org.apache.cassandra.service.CassandraDaemon >> >> > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) >> >> > at java.security.AccessController.doPrivileged(Native Method) >> >> > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) >> >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) >> >> > at >> >> > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) >> >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) >> >> > at >> >> > java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) >> >> > Could not find the main class: >> >> > org.apache.cassandra.service.CassandraDaemon. Program will exit. >> >> > >> >> > >> > >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: How to answer queries of form "Give me the top 10 messages"
Have you read this? http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/ On Fri, Jul 10, 2009 at 4:43 PM, wrote: > Hey guys > how do we answer queries of type - give me the top 10 messages > or top 10 users and so on > thanks > > Example: SuperColumns for Search Apps > > You can think of each supercolumn name as a term and the columns within as > the docids with rank info and other attributes being a part of it. If you > have keys as the userids then you can have a per-user index stored in this > form. This is how the per user index for term search is laid out for Inbox > search at Facebook. Furthermore since one has the option of storing data on > disk sorted by "Time" it is very easy for the system to answer queries of > the form "Give me the top 10 messages". For a pictorial explanation please > refer to the Cassandra powerpoint slides presented at SIGMOD 2008.
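The "top 10" trick the quoted text describes can be sketched in a few lines. The helper below is illustrative, not Cassandra API: it just models columns kept in time order, so that "top 10 most recent messages" is the first ten entries of a descending slice:

```python
# Sketch of the "columns sorted by Time" idea: if a user's messages are
# stored as (timestamp, message_id) columns ordered by timestamp, the
# top N most recent messages are simply the first N of a reverse slice.

def top_n_messages(messages, n=10):
    # messages: list of (timestamp, message_id). Cassandra would keep
    # these sorted on disk; the sort here models that ordering.
    return [mid for _, mid in sorted(messages, reverse=True)[:n]]

inbox = [(1247000000, 'msg-a'), (1247000300, 'msg-b'), (1247000100, 'msg-c')]
print(top_n_messages(inbox, 2))   # ['msg-b', 'msg-c']
```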
Re: Can we connect to every node in cassandra ?
Every node assumes each other node listens on the same ports. (This might seem inflexible but it is actually a good policy to enforce.) So just make sure those numbers are consistent across the cluster. On Sun, Jul 12, 2009 at 5:31 PM, wrote: > Yes. There are more ports than just '9160' to consider. Gossip, Storage, > UDP, etc. So as long as the other nodes have similar configs, just setting > the IP's in the seed section is good enough. > Thanks chris > How do we specify 9160, gossip,storage, udp in the seeds xml section > > On Sun, Jul 12, 2009 at 1:37 PM, Chris Goffinet wrote: >> >> On Jul 12, 2009, at 1:34 PM, mobiledream...@gmail.com wrote: >> >> Say there are 4 nodes in cassandra, is it possible that we can send >> insert/delete/update queries to any of the nodes? >> >> >> >> Yes. Using the default partitioner, its designed to connect to any nodes. >> >> >> Do all the data stores in other nodes need to be runnin on port 9160 as >> there is not a way to specify port in the list of seeds >> >> >> Yes. There are more ports than just '9160' to consider. Gossip, Storage, >> UDP, etc. So as long as the other nodes have similar configs, just setting >> the IP's in the seed section is good enough. >> >> Thanks a lot >> -- >> Bidegg worlds best auction site >> http://bidegg.com >> > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
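For concreteness, the ports in question are set in storage-conf.xml. The element names below are as I recall them from the configs of that era — verify against your own file — and the values must match on every node in the cluster:

```xml
<!-- these must be identical across the cluster -->
<ThriftPort>9160</ThriftPort>     <!-- client (Thrift) port -->
<StoragePort>7000</StoragePort>   <!-- inter-node storage (TCP) port -->
<ControlPort>7001</ControlPort>   <!-- gossip (UDP) port -->
```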
Re: cassandra slows down after inserts
On Mon, Jul 13, 2009 at 12:37 AM, Sandeep Tata wrote: > What hardware are you running one? How long does the slowdown last ? > There are a few reasons for temporary slowdowns ... perhaps the JVM > started GCing? Every time someone has reported this symptom, that has been the problem. The object count tunable is the most direct way to ameliorate this. http://wiki.apache.org/cassandra/MemtableThresholds
Re: cassandra slows down after inserts
See the wiki page I linked. On Mon, Jul 13, 2009 at 8:06 AM, rkmr...@gmail.com wrote: > > how do i find out if JVM is GCing? > > On Sun, Jul 12, 2009 at 10:37 PM, Sandeep Tata > wrote: >> >> What hardware are you running one? > > dual quadcore intel xeon 2.0 ghz, 32GB ram, and hardware raid config > operating system is fedora core 9 > > >> How long does the slowdown last ? > > i stopped inserting data after slowdown starts, and it is still slow now > after over 10 hours. > however if i stop cassandra and start it, it becomes super fast immediately. > till i insert another 100k or so rows when it becomes really slow again. > > >> >> There are a few reasons for temporary slowdowns ... perhaps the JVM >> started GCing? > > how do i find out if this is the cause? > > >> >> Cassandra spends time cleaning up the on-disk SSTables >> in a process called compaction. This could cause the client to observe >> a slowdown. >> >> Things you could try -- >> Reduce the Memtable size in the config files. (If GCing was the problem) >> Increasing the number of SSTables written before compaction kicks in. > > can you tell me what numbers i should use? > > thanks! > >
Re: cassandra slows down after inserts
Cassandra is replaying the transaction log and preloading SSTable indexes. This is normal. On Mon, Jul 13, 2009 at 8:10 AM, rkmr...@gmail.com wrote: > when i stop cassandra and start it again, this is what is printed. it takes > just a couple of seconds for this to run. > and after that it becomes really fast. > > > Listening for transport dt_socket at address: > DEBUG - Loading settings from ./../conf/storage-conf.xml > DEBUG - adding Super1 as 0 > DEBUG - adding Standard2 as 1 > DEBUG - adding Standard1 as 2 > DEBUG - adding StandardByTime1 as 3 > DEBUG - adding LocationInfo as 4 > DEBUG - adding HintsColumnFamily as 5 > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-9-Data.db: 400 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-52-Data.db: 300 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-92-Data.db: 300 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db: 751 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db: 100 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db: 50 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db: 100 ms. > INFO - Compacting > [/home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db] > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db: 0 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db: 50 ms. > DEBUG - INDEX LOAD TIME for > /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db: 0 ms. 
> INFO - Replaying > /home/mark/local/var/cassandra/commitlog/CommitLog-1247454203796.log > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db : 73600 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db : 84224 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db : 94848 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db : 105472 > DEBUG - Expected bloom filter size : 105472 > INFO - Compacted to > /home/mark/local/var/cassandra/data/Table1-Super1-139-Data.db. 0/28831084 > bytes for 104856/104860 keys read/written. Time: 8119ms. > INFO - Flushing Memtable(Super1)@552364977 > DEBUG - Submitting Super1 for compaction > INFO - Completed flushing Memtable(Super1)@552364977 > INFO - Flushing Memtable(Standard1)@1290243769 > DEBUG - Submitting Standard1 for compaction > INFO - Completed flushing Memtable(Standard1)@1290243769 > INFO - Compacting > [/home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db] > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db : 256 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db : 512 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db : 768 > DEBUG - index size for bloom filter calc for file : > /home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db : 1024 > DEBUG - Expected bloom filter size : 1024 > INFO - Compacted to > 
/home/mark/local/var/cassandra/data/Table1-Standard1-3-Data.db. 0/210 bytes > for 0/1 keys read/written. Time: 301ms. > DEBUG - Starting to listen on 127.0.0.1:7001 > INFO - Cassandra starting up... > > > > On Mon, Jul 13, 2009 at 6:06 AM, rkmr...@gmail.com > wrote: >> >> how do i find out if JVM is GCing? >> >> On Sun, Jul 12, 2009 at 10:37 PM, Sandeep Tata >> wrote: >>> >>> What hardware are you running one? >> >> dual quadcore intel xeon 2.0 ghz, 32GB ram, and hardware raid config >> operating system is fedora core 9 >> >> >>> How long does the slowdown last ? >> >> i stopped inserting data after slowdown starts, and it is still slow now >> after over 10 hours. >> however if i stop cassandra and start it, it becomes super fast >> immediately. till i insert another 100k or so rows when it becomes really >> slow again. >> >> >>> >>> There are a few reasons for temporary slowdowns ... perhaps the JVM >>> started GCing? >> >> how do i find out if this is the cause? >> >> >>> >>> Cassandra spends time cleaning up the on-disk SSTables >>> in a process called compaction. This could c
Re: cassandra slows down after inserts
decrease On Mon, Jul 13, 2009 at 8:53 AM, rkmr...@gmail.com wrote: > On Mon, Jul 13, 2009 at 6:03 AM, Jonathan Ellis wrote: >> >> On Mon, Jul 13, 2009 at 12:37 AM, Sandeep Tata >> wrote: >> > What hardware are you running one? How long does the slowdown last ? >> > There are a few reasons for temporary slowdowns ... perhaps the JVM >> > started GCing? >> >> Every time someone has reported this symptom, that has been the problem. >> >> The object count tunable is the most direct way to ameliorate this. >> >> http://wiki.apache.org/cassandra/MemtableThresholds > > this is my current setting: > > 0.02 > > should i increase or decrease it? >
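For reference, a storage-conf.xml fragment with the tunable under discussion (the element name is from trunk at the time; the value is illustrative — "decrease" here just means something below the quoted 0.02):

```xml
<!-- flush a memtable once it holds this many objects (in millions);
     smaller values mean smaller, more frequent flushes and less GC
     pressure at the cost of more SSTables to compact -->
<MemtableObjectCountInMillions>0.01</MemtableObjectCountInMillions>
```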
Re: Scaling from 1 to x (was: one server or more servers?)
On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote: > Cassandra doesn't provide the guarantees about the latest changes being > available from any given node, so you can't really use it in such an > application. > > I don't know if the "blocking" variants of the write operations make any > more guarantees, if they do then it might be suitable. Yes, quorum write/read would work just fine here. -Jonathan
Re: Scaling from 1 to x (was: one server or more servers?)
There are several interesting values you can pass to block_for:

0: fire-and-forget. minimizes latency when that is more important than robustness

1: wait for at least one node to fully ack the write before returning (the other replicas will be finished in the background)

N/2 + 1, where N is the number of replicas: this is a quorum write; combined with quorum reads, it means you can tolerate up to N - (N/2 + 1) nodes failing before you can get inconsistent results. (which is usually better than no results at all.)

N: guarantees consistent reads without having to wait for a quorum, so you trade write latency and availability (since the write will fail if one of the target nodes is down) for 100% consistency and reduced read latency

-Jonathan On Tue, Jul 14, 2009 at 9:18 AM, Mark Robson wrote: > > > 2009/7/14 Jonathan Ellis >> >> On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote: >> > Cassandra doesn't provide the guarantees about the latest changes being >> > available from any given node, so you can't really use it in such an >> > application. >> > >> > I don't know if the "blocking" variants of the write operations make any >> > more guarantees, if they do then it might be suitable. >> >> Yes, quorum write/read would work just fine here. > > Are those the type of writes which you get by setting the "block" parameter > to 1? > > Mark >
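The quorum arithmetic above, in executable form (a sketch only — block_for itself is a parameter of the Thrift write call, not shown here):

```python
# Quorum math for N replicas, per the block_for discussion above.

def quorum(n):
    # N/2 + 1 with integer division
    return n // 2 + 1

def tolerated_failures(n):
    # nodes that can fail while quorum reads/writes stay consistent
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
```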
Re: one server or more servers?
gossip distributes the cluster status. the seeds are there to be an initial contact point. On Tue, Jul 14, 2009 at 10:04 AM, wrote: > Hey mark > thanks for the detailed reply explaining the example of Seeds > > How do we add servers other than Seeds as there is no such place in conf > file > > thanks
Re: one server or more servers?
the new servers contact the seeds, not the other way around On Tue, Jul 14, 2009 at 10:10 AM, wrote: > Mark and Jonathan > I m lost here > Dont we need to specify atleast the server ip address in the conf file. How > would cassandra know which ips they are running in ie the other servers. > I can see there is a way to specify seed but how would the seeds pick up the > other servers if they do not know their ip address > Also given the unlimited # of ips it cannot jus go thru each one of the ips > and ping 7001 > Servers other than seeds are automatically picked up by the cluster when > they start up; the nodes talk amongst themselves to figure out who's there. > On Tue, Jul 14, 2009 at 8:06 AM, Mark Robson wrote: >> >> >> 2009/7/14 >>> >>> How do we add servers other than Seeds as there is no such place in conf >>> file >> >> Servers other than seeds are automatically picked up by the cluster when >> they start up; the nodes talk amongst themselves to figure out who's there. >> >> Only the seeds need to be explicitly configured. >> >> This is a Good Thing :) >> >> Mark > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: replica on in the beginning or added later
although the repair code Stu is working on (https://issues.apache.org/jira/browse/CASSANDRA-193) could handle increasing the replica count, IMO there's little sense in relying any more on features that don't yet exist than necessary. :) On Tue, Jul 14, 2009 at 10:17 AM, wrote: > as a followup question > the items we are storing are extremely valuable and we are using cassandra > as a sql replacement tool.. ie no more postgres and all data from cassandra, > given cassandra scalability > as we hit limits on postgres and found pgpool-II horizontal partitioning too > clunky and skype, plproxy requires too much rewiring the client code. > should we start with a replica factor 1 and then increase replica factor to > 2 > or is is prudent to start with a replica factor of 2 > Can cassandra replicate even after running for a long time with a replica > factor of 1, if we change the replica factor to say 2 after 2months when we > add more nodes and figure there is enough space now to replicate > thanks > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: replica on in the beginning or added later
Note that for N=2, quorum write is the same as block-for-all. That is why N=3 is more popular, because it allows for one node to be down but still give you a quorum for any key. -Jonathan On Tue, Jul 14, 2009 at 10:22 AM, wrote: > starting with replica count 2 is more prudent thanks > > On Tue, Jul 14, 2009 at 8:21 AM, Jonathan Ellis wrote: >> >> although the repair code Stu is working on >> (https://issues.apache.org/jira/browse/CASSANDRA-193) could handle >> increasing the replica count, IMO there's little sense in relying any >> more on features that don't yet exist than necessary. :) >> >> On Tue, Jul 14, 2009 at 10:17 AM, wrote: >> > as a followup question >> > the items we are storing are extremely valuable and we are using >> > cassandra >> > as a sql replacement tool.. ie no more postgres and all data from >> > cassandra, >> > given cassandra scalability >> > as we hit limits on postgres and found pgpool-II horizontal partitioning >> > too >> > clunky and skype, plproxy requires too much rewiring the client code. >> > should we start with a replica factor 1 and then increase replica factor >> > to >> > 2 >> > or is is prudent to start with a replica factor of 2 >> > Can cassandra replicate even after running for a long time with a >> > replica >> > factor of 1, if we change the replica factor to say 2 after 2months when >> > we >> > add more nodes and figure there is enough space now to replicate >> > thanks >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
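The arithmetic behind that note: a quorum of N replicas is N/2 + 1 (integer division), so:

```python
# With N=2 a quorum is both replicas, so a quorum write degenerates
# into block-for-all; with N=3 a quorum survives one node being down.

def quorum(n):
    return n // 2 + 1

assert quorum(2) == 2   # both replicas required: no node may be down
assert quorum(3) == 2   # any two of three suffice: one node may be down
```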
Re: problem running cassandra
the bind to port was successful; the ones to the messagingservice ports were not On Tue, Jul 14, 2009 at 10:59 PM, wrote: > http://pastie.org/546395 > > get this eror but > > cassandra]$ sudo netstat -apn | grep |wc -l > > is empty > > i wonder if this is a known issue > > thanks >
Re: Best way to use a Cassandra Client in a multi-threaded environment?
IIRC thrift makes no effort to generate threadsafe code. which makes sense in an rpc-oriented protocol really. On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer wrote: > Hello, > Are there any recommendations on how to use Cassandra Clients in a > multi-threaded front-end application (java)? Is the Client thread-safe or is > it best to do a client per thread (or object pool of some sort)? > Thanks, > Joel
Re: Best way to use a Cassandra Client in a multi-threaded environment?
What I mean is, if you have client.rpc1() it doesn't really matter if you can do client.rpc2() from another thread or not, since it's dumb. :) On Wed, Jul 15, 2009 at 7:41 PM, Ian Holsman wrote: > > On 16/07/2009, at 10:35 AM, Jonathan Ellis wrote: > >> IIRC thrift makes no effort to generate threadsafe code. >> >> which makes sense in an rpc-oriented protocol really. > > hmm.. not really. you can have a webserver calling a thrift backend quite > easily, and then you would have 100+ threads all calling the same code. >> >> On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer wrote: >>> >>> Hello, >>> Are there any recommendations on how to use Cassandra Clients in a >>> multi-threaded front-end application (java)? Is the Client thread-safe or >>> is >>> it best to do a client per thread (or object pool of some sort)? >>> Thanks, >>> Joel > > -- > Ian Holsman > i...@holsman.net > > > >
Re: Best way to use a Cassandra Client in a multi-threaded environment?
On Wed, Jul 15, 2009 at 8:13 PM, Ian Holsman wrote: > ugh. > if this is a byproduct of thrift it is. > we should have another way of getting to > the backend. > serialization is *not* a desired feature for most people ;-0 maybe not, but that's how every single database client works that I can think of, so it shouldn't exactly be surprising. you want multiple commands executing in parallel, you open multiple connections. not a Big Deal imo. -Jonathan
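One common shape for the connection-per-thread approach is a simple blocking pool; the sketch below is generic, and make_client is a hypothetical stand-in for however you construct your Thrift client:

```python
# Minimal connection pool for non-thread-safe clients: each thread
# checks a client out, uses it exclusively, and returns it.

import queue

class ClientPool:
    def __init__(self, make_client, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_client())

    def acquire(self):
        return self._pool.get()    # blocks if all clients are in use

    def release(self, client):
        self._pool.put(client)

# Usage sketch (object() stands in for a real Thrift client):
pool = ClientPool(lambda: object(), size=2)
c = pool.acquire()
# ... issue RPCs on c from exactly one thread ...
pool.release(c)
```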
Re: one server or more servers?
the FAQ talks about using listenaddress: http://wiki.apache.org/cassandra/FAQ On Thu, Jul 16, 2009 at 1:49 AM, wrote: > if i make listenaddress blank > i get in oneserver > binding to 127.0.0.1 > in 2nd server > sometimes to the ip address of the server > in 3rd server > WARN - Exception was generated at : 07/16/2009 02:39:37 on thread GMFD:1 > Network is unreachable > java.net.SocketException: Network is unreachable > at sun.nio.ch.DatagramChannelImpl.send0(Native Method) > at > sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:319) > at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:299) > at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:268) > at > org.apache.cassandra.net.UdpConnection.write(UdpConnection.java:88) > at > org.apache.cassandra.net.MessagingService.sendUdpOneWay(MessagingService.java:469) > at > org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(Gossiper.java:984) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > > On Wed, Jul 15, 2009 at 11:31 PM, wrote: >> >> some one in the group said a min of 2 seeds is necessary. >> >> i ll set the listenAddress to blank >> >> but i think it might be a problem of ports being blocked by fedora >> >> Can someone please list the ports used by cassandra to access the outside >> seeds and find the ring network? >> >> And if there are any users using fedora - can you show me how to open >> those ports so cassandra can gossip its way into a ring network >> >> right now i have 4 island cassandra nodes :( >> >> On Wed, Jul 15, 2009 at 9:24 PM, Evan Weaver wrote: >>> >>> Oh, yeah, definitely set ListenAddress to blank. 0.0.0.0 doesn't mean >>> "all interfaces" for some reason I forget. 
>>> >>> Evan >>> >>> On Wed, Jul 15, 2009 at 9:23 PM, Evan Weaver wrote: >>> > Try with only one seed. Not every host has to be in the seeds. >>> > >>> > Evan >>> > >>> > On Wed, Jul 15, 2009 at 8:52 PM, wrote: >>> >> in Seeds >>> >> can we specify domain name instead of ip address >>> >> right now seeds is specifying ip address >>> >> >>> >> On Wed, Jul 15, 2009 at 4:49 PM, Evan Weaver >>> >> wrote: >>> >>> >>> >>> I sometimes have to use 127.0.0.1, at least when ListenAddress is >>> >>> blank (auto-discover). Dunno if that has changed. >>> >>> >>> >>> Looks like this if you're successful: >>> >>> >>> >>> $ bin/nodeprobe --host 10.224.17.13 ring >>> >>> Token(124007023942663924846758258675932114665) 3 10.224.17.13 |<--| >>> >>> Token(106858063638814585506848525974047690568) 3 10.224.17.19 | ^ >>> >>> Token(141130545721235451315477340120224986045) 3 10.224.17.14 |-->| >>> >>> >>> >>> Evan >>> >>> >>> >>> On Wed, Jul 15, 2009 at 4:24 PM, Michael >>> >>> Greene >>> >>> wrote: >>> >>> > The port you're looking for is typically 8080, but if you only >>> >>> > specify >>> >>> > the host and not the port it shoudl work just fine. >>> >>> > >>> >>> > bin/nodeprobe -host localhost >>> >>> > >>> >>> > Michael >>> >>> > >>> >>> > On Wed, Jul 15, 2009 at 6:18 PM, wrote: >>> >>> >> bin]$ ./nodeprobe -host localhost -port >>> >>> >> Error connecting to remote JMX agent! 
>>> >>> >> java.io.IOException: Failed to retrieve RMIServer stub: >>> >>> >> javax.naming.CommunicationException [Root exception is >>> >>> >> java.rmi.ConnectIOException: error during JRMP connection >>> >>> >> establishment; >>> >>> >> nested exception is: >>> >>> >> java.io.EOFException] >>> >>> >> at >>> >>> >> >>> >>> >> javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:342) >>> >>> >> at >>> >>> >> >>> >>> >> >>> >>> >> javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267) >>> >>> >> at >>> >>> >> org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:149) >>> >>> >> at >>> >>> >> org.apache.cassandra.tools.NodeProbe.(NodeProbe.java:111) >>> >>> >> at >>> >>> >> org.apache.cassandra.tools.NodeProbe.main(NodeProbe.java:470) >>> >>> >> Caused by: javax.naming.CommunicationException [Root exception is >>> >>> >> java.rmi.ConnectIOException: error during JRMP connection >>> >>> >> establishment; >>> >>> >> nested exception is: >>> >>> >> java.io.EOFException] >>> >>> >> at >>> >>> >> >>> >>> >> >>> >>> >> com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118) >>> >>> >> at >>> >>> >> >>> >>> >> >>> >>> >> com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:203) >>> >>> >> at >>> >>> >> javax.naming.InitialContext.lookup(InitialContext.java:409) >>> >>> >> at >>> >>> >> >>> >>> >> >>> >>> >> javax.management.remote.rmi.RMIConnect
Re: WARN - Unable to find a live Endpoint we might be out of live nodes , This is dangerous !!!!
Please don't repeat your question separately on -user, -dev, and irc. If nobody answers it's either because we're busy or we don't know the answer. In this case it's probably a bit of both. :) I've never heard of anyone running into this before so my guess is it's something weird with your network configuration. What happens if you try to connect to 226.129.12.117:7001 from 226.229.123.185 (e.g. with netcat), for instance? If you want to get into the code, set your log to TRACE and it will spit out a _ton_ of messages about gossip. On Fri, Jul 17, 2009 at 2:23 AM, wrote: > if i kill and start the cassandras again they are able to find each other > but if they are left alone they go down on each other! ie they are unable to > find each other > > On Fri, Jul 17, 2009 at 12:20 AM, wrote: >> >> Why do the other nodes go down, in each of the nodes if i run nodprobe the >> results show that the other nodes are down >> 226.229.123.185:7001 up >> 226.129.12.117:7001 down >> 226.229.123.116:7001 down >> 226.229.112.134:7001 down >> Token(165434480505148814142836593307761304854) >> >> On Fri, Jul 17, 2009 at 12:18 AM, wrote: >>> >>> What does this mean? >>> >>> DEBUG - clearing >>> DEBUG - remove >>> WARN - Unable to find a live Endpoint we might be out of live nodes , >>> This is dangerous >>> WARN - Unable to find a live Endpoint we might be out of live nodes , >>> This is dangerous >>> DEBUG - locally writing writing key tofu to 11.12.13.0:7000 >>> -- >>> Bidegg worlds best auction site >>> http://bidegg.com >> >> >> >> -- >> Bidegg worlds best auction site >> http://bidegg.com > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: Concurrent updates
This is the kind of inconsistency that vector clocks can handle but the more simplistic timestamp-based resolution cannot. Of test-and-set vs vector clocks, vector clocks fit cassandra much better. -Jonathan On Fri, Jul 17, 2009 at 9:59 AM, Jun Rao wrote: > This is a case where a test-and-set feature would be useful. See the > following JIRA. We just don't have it nailed down yet. > https://issues.apache.org/jira/browse/CASSANDRA-48 > > Jun > IBM Almaden Research Center > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > jun...@almaden.ibm.com > > Ivan Chang > > > Ivan Chang > > 07/17/2009 07:14 AM > > Please respond to > cassandra-user@incubator.apache.org > > To > cassandra-user@incubator.apache.org > cc > > Subject > Concurrent updates > I have the following scenario that would like a best solution for. > > Here's the scenario: > > Table1.Standard1['cassandra']['frequency'] > > it is used for keeping track of how many times the word "cassandra" > appeared. > > Let's say we have a bunch of articles stored in Hadoop, a Map/Reduce greps > all articles throughout the Hadoop cluster that match the pattern > ^cassandra$ > and updates Table1.Standard1['cassandra']['frequency']. Hence > Table1.Standard1['cassandra']['frequency'] will be updated concurrently. > > One of the issues I am facing is that > Table1.Standard1['cassandra']['frequency'] > stores the count as a String (I am using Java), so in order to update the > frequency > properly, the thread that's running the Map/Reduce will have to retrieve > Table1.Standard1['cassandra']['frequency'] in its native String format and > hold > that in temp (java String), convert into int, then add the new counts in, > and finally > "SET Table1.Standard1['cassandra']['frequency']. = '" + temp.toString() + > ''" > > During the entire process, how do we guarantee concurrency. The Cql SET > does > not allow something like > > SET Table1.Standard1['cassandra']['frequency']. = > Table1.Standard1['cassandra']['frequency']. 
+ newCounts > > since there's only one String type. > > What would be the best solution in this situation? > > Thanks, > Ivan >
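The lost-update hazard Ivan describes — two workers reading the same string-valued counter, parsing, adding, and writing back under timestamp-based last-write-wins resolution — can be sketched in a few lines of plain Python. This is a simulation only; `reconcile` is a stand-in for Cassandra's conflict resolution, not its API:

```python
# Stand-in for timestamp-based conflict resolution: newest write wins.
def reconcile(a, b):
    # Each value is a (string_payload, timestamp) pair.
    return a if a[1] >= b[1] else b

stored = ("10", 0)              # frequency stored as a string, per Ivan

# Two map/reduce workers read the same snapshot concurrently...
read1 = read2 = stored

# ...each parses the string, adds its own counts, and writes back.
write1 = (str(int(read1[0]) + 5), 1)   # worker 1 adds 5 at t=1
write2 = (str(int(read2[0]) + 3), 2)   # worker 2 adds 3 at t=2

stored = reconcile(write1, write2)
print(stored[0])   # "13" -- worker 1's increment is silently lost (want "18")
```

Timestamp resolution keeps one whole write and discards the other, which is exactly the case vector clocks (or test-and-set, per CASSANDRA-48) are meant to detect.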
Re: WARN - Unable to find a live Endpoint we might be out of live nodes , This is dangerous !!!!
7000 is tcp 7001 is udp On Fri, Jul 17, 2009 at 12:34 PM, wrote: > Jonathan > > tmp]$ nc -v 226.129.12.117 7001 > nc: connect to 226.129.12.117 port 7001 (tcp) failed: Connection refused > tmp]$ nc -v 226.129.12.117 7001 > nc: connect to 226.129.12.117 port 7001 (tcp) failed: Connection refused > > I get a connect refused but is tcp the way to connect or is there a > different way to use nc command ie using udp mode? > > if i do > nc -u -v 226.129.12.117 7001 > it just hangs there > > /etc/hosts has the following in our servers > # Do not remove the following line, or various programs > # that require network functionality will fail. > 127.0.0.1 localhost.localdomain localhost localhost > ::1 localhost6.localdomain6 localhost6 > > On Fri, Jul 17, 2009 at 6:09 AM, Jonathan Ellis wrote: >> >> Please don't repeat your question separately on -user, -dev, and irc. >> If nobody answers it's either because we're busy or we don't know the >> answer. >> >> In this case it's probably a bit of both. :) >> >> I've never heard of anyone running into this before so my guess is >> it's something weird with your network configuration. What happens if >> you try to connect to 226.129.12.117:7001 from 226.229.123.185 (e.g. >> with netcat), for instance? >> >> If you want to get into the code, set your log to TRACE and it will >> spit out a _ton_ of messages about gossip. >> >> On Fri, Jul 17, 2009 at 2:23 AM, wrote: >> > if i kill and start the cassandras again they are able to find each >> > other >> > but if they are left alone they go down on each other! 
ie they are >> > unable to >> > find each other >> > >> > On Fri, Jul 17, 2009 at 12:20 AM, wrote: >> >> >> >> Why do the other nodes go down, in each of the nodes if i run nodprobe >> >> the >> >> results show that the other nodes are down >> >> 226.229.123.185:7001 up >> >> 226.129.12.117:7001 down >> >> 226.229.123.116:7001 down >> >> 226.229.112.134:7001 down >> >> Token(165434480505148814142836593307761304854) >> >> >> >> On Fri, Jul 17, 2009 at 12:18 AM, wrote: >> >>> >> >>> What does this mean? >> >>> >> >>> DEBUG - clearing >> >>> DEBUG - remove >> >>> WARN - Unable to find a live Endpoint we might be out of live nodes , >> >>> This is dangerous >> >>> WARN - Unable to find a live Endpoint we might be out of live nodes , >> >>> This is dangerous >> >>> DEBUG - locally writing writing key tofu to 11.12.13.0:7000 >> >>> -- >> >>> Bidegg worlds best auction site >> >>> http://bidegg.com >> >> >> >> >> >> >> >> -- >> >> Bidegg worlds best auction site >> >> http://bidegg.com >> > >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
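Since `nc -u` gives no feedback either way (UDP has no handshake, so a "hang" proves nothing about the port), a loopback sketch in Python shows what a successful datagram exchange to a gossip-style UDP port looks like. The port here is whatever the OS assigns for the demo, not Cassandra's real 7001:

```python
import socket

# Loopback demo: a UDP socket accepts datagrams with no handshake,
# which is why "nc -u" appearing to hang proves nothing about the port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
port = server.getsockname()[1]          # stand-in for Cassandra's UDP 7001

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", ("127.0.0.1", port))

data, addr = server.recvfrom(1024)      # arrives without accept()/connect()
client.close()
server.close()
```

If the datagram were dropped by a firewall, `recvfrom` would simply time out — the sender gets no "connection refused" the way TCP's `nc -v` does.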
Re: Scaling from 1 to x (was: one server or more servers?)
HH is a mechanism to reduce inconsistency, but a node holding a HH row while waiting for the "right" node to recover won't be part of the group that is queried for it (since it could be anywhere). So if you set block_for to M and less than M of the actual replica destinations are up, Cassandra will fail the write. If you set block_for to zero, then writes will indeed never fail (unless the node the client is talking to dies mid-action, of course). -Jonathan On Fri, Jul 17, 2009 at 3:01 PM, Vijay wrote: > "since the write will fail if one of the target nodes is down" > I thought Hinted handoff will take care of this Right? Write will never fail > instead it will write to another node right? > > correct me if i am wrong. > > Thanks and Regards, > > > > > > On Tue, Jul 14, 2009 at 7:26 AM, Jonathan Ellis wrote: >> >> N: guarantees consistent reads without having to wait for a quorum, so >> you trade write latency and availability (since the write will fail if >> one of the target nodes is down) for 100% consistency and reduced read >> latency >
Re: Scaling from 1 to x (was: one server or more servers?)
On Fri, Jul 17, 2009 at 3:58 PM, Vijay wrote: > Still confused, > > If i have a Quorum Write with block for to be 3 and 2 of them are alive i > will write to 3 nodes with HH right? yes, but only 2 will be available for reads, so the 3rd can't count towards fulfilling block_for. the semantics of block_for on read (R) and write (W) are that you have strong consistency if R + W > N where N is number of replicas. (see http://www.allthingsdistributed.com/2007/12/eventually_consistent.html) For this to hold in cassandra, we need to provide consistency where (for instance) W = N and R = 1. Remember that a HH write is not available for reads. This means that we need to fail the write if we can't write the full N replicas to the right nodes. (This is why quorum write + quorum read is often a better tradeoff in practice since you can tolerate node failures w/o losing availability.) -Jonathan > During query it will fail if i only have block for to be 3? > > Regards, > > > > > > On Fri, Jul 17, 2009 at 1:36 PM, Jonathan Ellis wrote: >> >> ck_for to zero, then writes will indeed never fail >> (unless the node the client is ta >
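The overlap rule Jonathan cites reduces to a single inequality; a throwaway helper (my naming, not Cassandra's) makes the W = N, R = 1 and quorum cases concrete:

```python
def strongly_consistent(n, w, r):
    # Reads and writes must overlap in at least one replica: R + W > N.
    return r + w > n

N = 3
assert strongly_consistent(N, w=3, r=1)       # W = N: any single read suffices
assert strongly_consistent(N, w=2, r=2)       # quorum write + quorum read
assert not strongly_consistent(N, w=1, r=1)   # reads can hit a stale replica
```

The quorum/quorum row is the tradeoff mentioned above: it satisfies the inequality while still surviving one node being down on either the read or the write path.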
Re: python thrift cassandra: get_slice_super vs get_slice_super_by_names
I would guess because kw != 'tofu' On Sun, Jul 19, 2009 at 12:24 AM, wrote: > Why doesnt res return ColumnFamily Related whereas res2 works just fine > thanks? > > timestamp = time.time() > res = client.get_slice_super('Table1', kw, 'Super1','','',True,0,1000) > print res > [] > res2 = client.get_slice_super_by_names('Table1', 'tofu', 'Super1', > ['Related',])[0] > > > print res2 > [superColumn_t(name='Related', columns=[column_t(columnName='tofu calories', > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), > column_t(columnName='tofu festival', value="(dp1\nS'count'\np2\nI1\ns.", > timestamp=1247980687), column_t(columnName='tofu marinade', > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), > column_t(columnName='tofu recipe', value="(dp1\nS'count'\np2\nI1\ns.", > timestamp=1247980687), column_t(columnName='tofu recipes', > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), > column_t(columnName='tofu recipes easy', value="(dp1\nS'count'\np2\nI1\ns.", > timestamp=1247980687), column_t(columnName='tofu scramble', > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), > column_t(columnName='tofu stir fry', value="(dp1\nS'count'\np2\nI1\ns.", > timestamp=1247980687), column_t(columnName='tofurkey', > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), > column_t(columnName='tofutti', value="(dp1\nS'count'\np2\nI1\ns.", > timestamp=1247980687)])] > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: ì¤ ìì§ ì¤ ì thrift.Thrift.TApplicationException: Internal error processing insert
That should be partially solved in trunk now that 139 is committed, and more solved when we commit 185 soon. On Sun, Jul 19, 2009 at 3:43 AM, wrote: > Any utf-8 keyword causes cassandra to crash! >
Re: how to delete an entire column family
iterate through the keys with get_key_range, and delete the row associated with each key On Sun, Jul 19, 2009 at 3:51 AM, wrote: > In Super-column family Super1 there is a column family Related > How do i delete the entire related column family > thanks
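A sketch of that loop, using a toy in-memory client — the 2009-era Thrift signatures for get_key_range/remove were in flux, so treat the method shapes here as assumptions, not the real API:

```python
class FakeClient:
    """In-memory stand-in for a Cassandra Thrift client (not the real API)."""
    def __init__(self, rows):
        self.rows = dict(rows)

    def get_key_range(self, table, start, finish, count):
        # Return up to `count` keys in sorted order.
        return sorted(self.rows)[:count]

    def remove(self, table, key, column_family, timestamp):
        # Delete this row's data; the real call scopes to the column family.
        self.rows.pop(key, None)

def delete_column_family(client, table, cf, timestamp):
    # Jonathan's recipe: fetch the keys, then delete the row associated
    # with each key, one at a time.
    for key in client.get_key_range(table, "", "", 1000):
        client.remove(table, key, cf, timestamp)

client = FakeClient({"tofu": 1, "seitan": 2})
delete_column_family(client, "Table1", "Super1", timestamp=1)
print(client.rows)   # {}
```

With more than one page of keys, the real loop would have to repeat get_key_range starting after the last key seen until it comes back empty.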
Re: python thrift cassandra: get_slice_super vs get_slice_super_by_names
Strange. If you can post a script showing how to reproduce the problem from a fresh database then I can debug it. On Sun, Jul 19, 2009 at 11:23 AM, wrote: > Jon i should have mntioned kw is 'tofu' > that is why it looks quite not right > > On Sun, Jul 19, 2009 at 6:08 AM, Jonathan Ellis wrote: >> >> I would guess because kw != 'tofu' >> >> On Sun, Jul 19, 2009 at 12:24 AM, wrote: >> > Why doesnt res return ColumnFamily Related whereas res2 works just fine >> > thanks? >> > >> > timestamp = time.time() >> > res = client.get_slice_super('Table1', kw, 'Super1','','',True,0,1000) >> > print res >> > [] >> > res2 = client.get_slice_super_by_names('Table1', 'tofu', 'Super1', >> > ['Related',])[0] >> > >> > >> > print res2 >> > [superColumn_t(name='Related', columns=[column_t(columnName='tofu >> > calories', >> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), >> > column_t(columnName='tofu festival', value="(dp1\nS'count'\np2\nI1\ns.", >> > timestamp=1247980687), column_t(columnName='tofu marinade', >> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), >> > column_t(columnName='tofu recipe', value="(dp1\nS'count'\np2\nI1\ns.", >> > timestamp=1247980687), column_t(columnName='tofu recipes', >> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), >> > column_t(columnName='tofu recipes easy', >> > value="(dp1\nS'count'\np2\nI1\ns.", >> > timestamp=1247980687), column_t(columnName='tofu scramble', >> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), >> > column_t(columnName='tofu stir fry', value="(dp1\nS'count'\np2\nI1\ns.", >> > timestamp=1247980687), column_t(columnName='tofurkey', >> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687), >> > column_t(columnName='tofutti', value="(dp1\nS'count'\np2\nI1\ns.", >> > timestamp=1247980687)])] >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')
Don't run trunk if you're not going to read "svn log." The api changed with the commit of the 139 patches (and it will change again with the 185 ones). look at interface/cassandra.thrift to see what arguments are expected. On Sun, Jul 19, 2009 at 3:31 PM, wrote: > Hey Gasol wu > i regenerated the new thrift interface using > thrift -gen py cassandra.thrift > > > > client.insert('Table1', 'tofu', 'Super1:Related:tofu stew', > pickle.dumps(dict(count=1)), time.time(), 0) > --- > AttributeError Traceback (most recent call last) > > /home/mark/work/cexperiments/ in () > > /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key, > column_path, value, timestamp, block_for) > 358 - block_for > 359 """ > --> 360 self.send_insert(table, key, column_path, value, timestamp, > block_for) > 361 self.recv_insert() > 362 > > /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table, > key, column_path, value, timestamp, block_for) > 370 args.timestamp = timestamp > 371 args.block_for = block_for > --> 372 args.write(self._oprot) > 373 self._oprot.writeMessageEnd() > 374 self._oprot.trans.flush() > > /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) > 1923 if self.column_path != None: > 1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) > -> 1925 self.column_path.write(oprot) > 1926 oprot.writeFieldEnd() > 1927 if self.value != None: > > AttributeError: 'str' object has no attribute 'write' > > > On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu wrote: >> >> hi, >> the cassandra.thrift has changed. >> u need to generate new python client and compile class again. 
>> >> >> On Mon, Jul 20, 2009 at 1:18 AM, wrote: >>> >>> Hi guys >>> the new trunk cassandra doesnt work for a simple insert, how do we get >>> this working >>> client.insert('Table1', 'tofu', 'Super1:Related:tofu >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >>> >>> --- >>> AttributeError Traceback (most recent call >>> last) >>> /home/mark/work/cexperiments/ in () >>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key, >>> column_path, value, timestamp, block_for) >>> 358 - block_for >>> 359 """ >>> --> 360 self.send_insert(table, key, column_path, value, timestamp, >>> block_for) >>> 361 self.recv_insert() >>> 362 >>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table, >>> key, column_path, value, timestamp, block_for) >>> 370 args.timestamp = timestamp >>> 371 args.block_for = block_for >>> --> 372 args.write(self._oprot) >>> 373 self._oprot.writeMessageEnd() >>> 374 self._oprot.trans.flush() >>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >>> 1923 if self.column_path != None: >>> 1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >>> -> 1925 self.column_path.write(oprot) >>> 1926 oprot.writeFieldEnd() >>> 1927 if self.value != None: >>> AttributeError: 'str' object has no attribute 'write' >>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >>> >>> -- >>> Bidegg worlds best auction site >>> http://bidegg.com >> > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
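The traceback itself explains the break: generated Thrift code serializes struct-typed arguments by calling `.write(protocol)` on them, so passing the old colon-delimited *string* where a struct is now expected dies at exactly that call. A toy reconstruction (not the real generated code):

```python
class ColumnPathStruct:
    """Stand-in for the generated column-path struct the new API expects."""
    def write(self, oprot):
        oprot.append("column_path serialized")

def send_insert(column_path, oprot):
    # Mirrors line 1925 of the generated Cassandra.py in the traceback.
    column_path.write(oprot)

oprot = []
send_insert(ColumnPathStruct(), oprot)              # new-style struct: fine

try:
    send_insert("Super1:Related:tofu stew", oprot)  # old-style string
    err = None
except AttributeError as exc:
    err = str(exc)   # "'str' object has no attribute 'write'"
```

So the fix is not regenerating the bindings alone — the caller must build the new struct argument instead of the colon-delimited path string.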
Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')
For the record, this is not actually a bug, and if you're not sure, asking on the list before filing a report in JIRA is probably a good thing. On Sun, Jul 19, 2009 at 6:45 PM, Ian Holsman wrote: > hi mobile. > is it possible to put these as JIRA bugs ? instead of just mailing them on > the list ? > > that way people can give them a bit more attention. and other people who > have the same issue will be easily see what is going on. > > the URL is here :- https://issues.apache.org/jira/browse/CASSANDRA > regards > Ian > > On 20/07/2009, at 6:36 AM, mobiledream...@gmail.com wrote: > >> ok >> so which is the version where cassandra python thrift works out of the box >> thanks >> >> On 7/19/09, Jonathan Ellis wrote: Don't run trunk if >> you're not going to read "svn log." >> >> The api changed with the commit of the 139 patches (and it will change >> again with the 185 ones). >> >> look at interface/cassandra.thrift to see what arguments are expected. >> >> >> On Sun, Jul 19, 2009 at 3:31 PM, wrote: >> > Hey Gasol wu >> > i regenerated the new thrift interface using >> > thrift -gen py cassandra.thrift >> > >> > >> > >> > client.insert('Table1', 'tofu', 'Super1:Related:tofu stew', >> > pickle.dumps(dict(count=1)), time.time(), 0) >> > >> > --- >> > AttributeError Traceback (most recent call >> > last) >> > >> > /home/mark/work/cexperiments/ in () >> > >> > /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, >> > key, >> > column_path, value, timestamp, block_for) >> > 358 - block_for >> > 359 """ >> > --> 360 self.send_insert(table, key, column_path, value, timestamp, >> > block_for) >> > 361 self.recv_insert() >> > 362 >> > >> > /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, >> > table, >> > key, column_path, value, timestamp, block_for) >> > 370 args.timestamp = timestamp >> > 371 args.block_for = block_for >> > --> 372 args.write(self._oprot) >> > 373 self._oprot.writeMessageEnd() >> > 374 self._oprot.trans.flush() >> > >> > 
/home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >> > 1923 if self.column_path != None: >> > 1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >> > -> 1925 self.column_path.write(oprot) >> > 1926 oprot.writeFieldEnd() >> > 1927 if self.value != None: >> > >> > AttributeError: 'str' object has no attribute 'write' >> > >> > >> > On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu wrote: >> >> >> >> hi, >> >> the cassandra.thrift has changed. >> >> u need to generate new python client and compile class again. >> >> >> >> >> >> On Mon, Jul 20, 2009 at 1:18 AM, wrote: >> >>> >> >>> Hi guys >> >>> the new trunk cassandra doesnt work for a simple insert, how do we get >> >>> this working >> >>> client.insert('Table1', 'tofu', 'Super1:Related:tofu >> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >>> >> >>> >> >>> --- >> >>> AttributeError Traceback (most recent call >> >>> last) >> >>> /home/mark/work/cexperiments/ in () >> >>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, >> >>> key, >> >>> column_path, value, timestamp, block_for) >> >>> 358 - block_for >> >>> 359 """ >> >>> --> 360 self.send_insert(table, key, column_path, value, >> >>> timestamp, >> >>> block_for) >> >>> 361 self.recv_insert() >> >>> 362 >> >>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, >> >>> table, >> >>> key, column_path, value, timestamp, block_for) >> >>> 370 args.timestamp = timestamp >> >>> 371 args.block_for = block_for >> >>> --> 372 args.write(self._oprot) >> >>> 373 self._oprot.writeMessageEnd() >> >>> 374 self._oprot.trans.flush() >> >>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >> >>> 1923 if self.column_path != None: >> >>> 1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >> >>> -> 1925 self.column_path.write(oprot) >> >>> 1926 oprot.writeFieldEnd() >> >>> 1927 if self.value != None: >> >>> AttributeError: 'str' object has no attribute 'write' >> >>> In [4]: 
client.insert('Table1', 'tofu', 'Super1:Related:tofu >> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >>> >> >>> -- >> >>> Bidegg worlds best auction site >> >>> http://bidegg.com >> >> >> > >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > >> >> >> >> -- >> Bidegg worlds best auction site >> http://bidegg.com > > -- > Ian Holsman > i...@holsman.net > > > >
Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')
It works fine, it's just not the same as it was two weeks ago. On Sun, Jul 19, 2009 at 3:36 PM, wrote: > ok > so which is the version where cassandra python thrift works out of the box > thanks > > On 7/19/09, Jonathan Ellis wrote: >> >> Don't run trunk if you're not going to read "svn log." >> >> The api changed with the commit of the 139 patches (and it will change >> again with the 185 ones). >> >> look at interface/cassandra.thrift to see what arguments are expected. >> >> >> On Sun, Jul 19, 2009 at 3:31 PM, wrote: >> > Hey Gasol wu >> > i regenerated the new thrift interface using >> > thrift -gen py cassandra.thrift >> > >> > >> > >> > client.insert('Table1', 'tofu', 'Super1:Related:tofu stew', >> > pickle.dumps(dict(count=1)), time.time(), 0) >> > >> > --- >> > AttributeErrorTraceback (most recent call >> > last) >> > >> > /home/mark/work/cexperiments/ in () >> > >> > /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, >> > key, >> > column_path, value, timestamp, block_for) >> > 358 - block_for >> > 359 """ >> > --> 360 self.send_insert(table, key, column_path, value, timestamp, >> > block_for) >> > 361 self.recv_insert() >> > 362 >> > >> > /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, >> > table, >> > key, column_path, value, timestamp, block_for) >> > 370 args.timestamp = timestamp >> > 371 args.block_for = block_for >> > --> 372 args.write(self._oprot) >> > 373 self._oprot.writeMessageEnd() >> > 374 self._oprot.trans.flush() >> > >> > /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >> >1923 if self.column_path != None: >> >1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >> > -> 1925 self.column_path.write(oprot) >> >1926 oprot.writeFieldEnd() >> >1927 if self.value != None: >> > >> > AttributeError: 'str' object has no attribute 'write' >> > >> > >> > On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu wrote: >> >> >> >> hi, >> >> the cassandra.thrift has changed. 
>> >> u need to generate new python client and compile class again. >> >> >> >> >> >> On Mon, Jul 20, 2009 at 1:18 AM, wrote: >> >>> >> >>> Hi guys >> >>> the new trunk cassandra doesnt work for a simple insert, how do we get >> >>> this working >> >>> client.insert('Table1', 'tofu', 'Super1:Related:tofu >> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >>> >> >>> >> >>> --- >> >>> AttributeErrorTraceback (most recent call >> >>> last) >> >>> /home/mark/work/cexperiments/ in () >> >>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, >> >>> key, >> >>> column_path, value, timestamp, block_for) >> >>> 358 - block_for >> >>> 359 """ >> >>> --> 360 self.send_insert(table, key, column_path, value, >> >>> timestamp, >> >>> block_for) >> >>> 361 self.recv_insert() >> >>> 362 >> >>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, >> >>> table, >> >>> key, column_path, value, timestamp, block_for) >> >>> 370 args.timestamp = timestamp >> >>> 371 args.block_for = block_for >> >>> --> 372 args.write(self._oprot) >> >>> 373 self._oprot.writeMessageEnd() >> >>> 374 self._oprot.trans.flush() >> >>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >> >>>1923 if self.column_path != None: >> >>>1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >> >>> -> 1925 self.column_path.write(oprot) >> >>>1926 oprot.writeFieldEnd() >> >>>1927 if self.value != None: >> >>> AttributeError: 'str' object has no attribute 'write' >> >>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu >> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >>> >> >>> -- >> >>> Bidegg worlds best auction site >> >>> http://bidegg.com >> >> >> > >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com
Re: AttributeError: 'str' object has no attribute 'write'
Building the java interface is part of the build, but ant has no way to guess which additional client interfaces you want to use, if any. On Sun, Jul 19, 2009 at 6:46 PM, Ian Holsman wrote: > hi Gasol. > shouldn't regeneration of the interface be part of the build process? > > On 20/07/2009, at 3:29 AM, Gasol Wu wrote: > >> hi, >> the cassandra.thrift has changed. >> u need to generate new python client and compile class again. >> >> >> On Mon, Jul 20, 2009 at 1:18 AM, wrote: >> Hi guys >> the new trunk cassandra doesnt work for a simple insert, how do we get >> this working >> >> client.insert('Table1', 'tofu', 'Super1:Related:tofu >> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >> --- >> AttributeError Traceback (most recent call >> last) >> >> /home/mark/work/cexperiments/ in () >> >> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key, >> column_path, value, timestamp, block_for) >> 358 - block_for >> 359 """ >> --> 360 self.send_insert(table, key, column_path, value, timestamp, >> block_for) >> 361 self.recv_insert() >> 362 >> >> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table, >> key, column_path, value, timestamp, block_for) >> 370 args.timestamp = timestamp >> 371 args.block_for = block_for >> --> 372 args.write(self._oprot) >> 373 self._oprot.writeMessageEnd() >> 374 self._oprot.trans.flush() >> >> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot) >> 1923 if self.column_path != None: >> 1924 oprot.writeFieldBegin('column_path', TType.STRUCT, 3) >> -> 1925 self.column_path.write(oprot) >> 1926 oprot.writeFieldEnd() >> 1927 if self.value != None: >> >> AttributeError: 'str' object has no attribute 'write' >> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu >> stew',pickle.dumps(dict(count=1)), time.time(), 0) >> >> >> -- >> Bidegg worlds best auction site >> http://bidegg.com >> > > -- > Ian Holsman > i...@holsman.net > > > >
Re: a talk on building an email app on Cassandra
Nice! On Mon, Jul 20, 2009 at 12:43 PM, Jun Rao wrote: > Last Friday, I gave an IEEE talk on an email app that we built on top of > Cassandra. Below is the link to the slides. I thought some of the people > here might find this interesting. > > http://ewh.ieee.org/r6/scv/computer//nfic/2009/IBM-Jun-Rao.pdf > > Jun > IBM Almaden Research Center > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > jun...@almaden.ibm.com >
Fwd: thrift API changes
Oops, I sent this to the old google -user list by mistake the first time. Now that that's gone, I realized the error. -- Forwarded message -- From: Jonathan Ellis Date: Mon, Jul 20, 2009 at 10:10 PM Subject: Re: thrift API changes To: cassandra-u...@googlegroups.com, cassandra-...@incubator.apache.org Well, that was a long "week." My fault -- as I commented on IRC, I underestimated how long 185 would take as badly as I can remember doing anywhere. We're done with the big ones now. (185, 240, 303, and 304). Two more and then I think we can call it good for 0.4 from the client's perspective: 232 and 300 (dealing with specifying the number of replicas to wait for when reading/writing, respectively) -Jonathan On Wed, Jul 8, 2009 at 1:47 PM, Jonathan Ellis wrote: > Hi all, > > Just a heads up that this is going to be The Week Of Breaking Things > in the client api. There are a bunch of long-standing problems that > can't be fixed without making fundamental changes in the API so we are > going to bite the bullet and get those done now. We've already done > CASSANDRA-261, -277, and -280; next up will be CASSANDRA-139, and > eventually CASSANDRA-240 and friends. > > If you're on a version of trunk that Works For You, you might want to > resist the urge to svn up until the dust settles. > > -Jonathan
Re: trunk
the internals should be solid but we are in the middle (towards the end of, actually) changing the thrift api pretty drastically. (the colons had to go, and the sooner we bit the bullet, the better. :) see this thread -- http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200907.mbox/%3ce06563880907202024l49ce9ack3a10ead5d0e97...@mail.gmail.com%3e On Tue, Jul 21, 2009 at 8:11 AM, Jonas Bonér wrote: > Hi guys. > > How stable is trunk? > I have been on trunk for pretty long now and no issues so far but > Thanks. > > -- > Jonas Bonér > > twitter: @jboner > blog: http://jonasboner.com > work: http://crisp.se > work: http://scalablesolutions.se > code: http://github.com/jboner >
Re: keys and column names cannot be utf-8
did you read the new section in the config xml explaining how to use a UTF8 comparator? also: thrift itself is just plain broken for unicode support in some languages; see THRIFT-395 I think the short version is that when you have a java server, unicode will work with java or C# clients but not with anything else (so if you are using a python client for instance switching to jython might be a workaround) On Tue, Jul 21, 2009 at 4:00 PM, wrote: > Not fixed > The following utf8 key names and column names still give an error. > cass: 2009-07-21 13:55:35,597 error 98. ìµì§ > ì¤ ìì§ > Ûïº] (1)icasso's, instruments de musique sur un guéridon] (1)Ûïº[irancel] > (1)ïº > cass: 2009-07-21 13:55:55,093 error 377. friday night lights > s03e01[âmegaupload..50 error 321. instruments de musique sur un guéridon[[ > comâ > cass: 2009-07-21 13:56:12,341 error 637. asuka izumi photos[u15 ç«¥æãçé] > (1) > cass: 2009-07-21 13:56:39,380 error 1118. dragonball z games for pc[dragon > balĺz pc games download] (1) > cass: 2009-07-21 13:56:48,976 error 1301. ï»ïºïºïºï» ﺳ[ï»ïºïºïºï» ﺳ ï» > > 导æç³æµ·è¯±å¥¸å¯¼è´å¥³çèªæ] ((1)2009-07-21 13:56:55,352 error 1430. > æç³æµ·[大å > cass: 2009-07-21 13:56:59,287 error 1510. cinquième république[définition > de république?] (1) 导æç³æµ·] (1) > cass: 2009-07-21 13:59:38,783 error 1842. navaratri kolu[doll festival in > navratt > ri golu] (1) > cass: 2009-07-21 13:59:39,069 error 1846. tn lottery winning > numbers[www.tnlottery] (1) > cass: 2009-07-21 13:59:39,274 error 1850. www.buildabearville.com cheats[all > the buildabear.com cheats and codes] (1) > cass: 2009-07-21 13:59:39,773 error 1860. shippuuden 78[naruto shippuuden 78 > subbed torrent] (1) > > On Tue, Jul 21, 2009 at 10:34 AM, Eric Evans wrote: >> >> On Tue, 2009-07-21 at 09:18 -0700, mobiledream...@gmail.com wrote: >> > Is there any timeline on when commit 185 will be done as the utf8 >> > error still exists >> >> 185 was committed yesterday. 
>> >> https://issues.apache.org/jira/browse/CASSANDRA-185 >> >> -- >> Eric Evans >> eev...@rackspace.com >> > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: keys and column names cannot be utf-8
On Tue, Jul 21, 2009 at 4:06 PM, Jonathan Ellis wrote: > (so if you are using a python client for instance switching to jython > might be a workaround) that is, using the java thrift client, not the python ones.
Re: keys and column names cannot be utf-8
On Tue, Jul 21, 2009 at 4:18 PM, wrote: > Hey jonathan > this is not in the wiki or any documentation. this is trunk. i wrote it a couple days ago. feel free to step in and update the wiki. > does this work in python thrift probably not, given the thrift utf8 bugs. (but you could use BytesType and at least you will get the right data back.) > if it does - that would be perfect > but this doesnt explain why keys cannot be utf8 because FB didn't write it and so far neither has anyone else. -Jonathan
Re: keys and column names cannot be utf-8
On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis wrote: >> does this work in python thrift > > probably not, given the thrift utf8 bugs. to correct myself: now that we are using binary data in the thrift api it can't screw us over. so yes, UTF8Type should be fine.
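A minimal round-trip sketch of that pattern from a Python client — encode unicode to UTF-8 bytes before handing keys/values to Thrift, and decode on the way back (plain Python, no Cassandra involved):

```python
# With the API taking raw binary data, the client owns the encoding step.
key = "guéridon"                 # a non-ASCII key like those in the error log
wire_key = key.encode("utf-8")   # what actually crosses the Thrift transport

# ...insert/get_slice would send and return wire_key as opaque bytes...

assert isinstance(wire_key, bytes)
assert wire_key.decode("utf-8") == key   # lossless round trip
```

Since the server now treats the bytes as opaque (comparing them with UTF8Type or BytesType), nothing in between can mangle the encoding.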
Re: keys and column names cannot be utf-8
you may also want to specify CompareSubcolumnsWith. On Tue, Jul 21, 2009 at 4:27 PM, wrote: > thanks jonathan > trying this > > > On Tue, Jul 21, 2009 at 2:24 PM, Jonathan Ellis wrote: >> >> On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis wrote: >> >> does this work in python thrift >> > >> > probably not, given the thrift utf8 bugs. >> >> to correct myself: now that we are using binary data in the thrift api >> it can't screw us over. so yes, UTF8Type should be fine. > > > > -- > Bidegg worlds best auction site > http://bidegg.com >
Re: keys and column names cannot be utf-8
guarantee? in a pre-alpha trunk? no, that is too strong a word. but that's what's *supposed* to work, so I will fix it if it doesn't. :) On Tue, Jul 21, 2009 at 4:32 PM, wrote: > if this would be the conf/storage-conf.xml > <ColumnFamily Name="Standard1" CompareWith="UTF8Type" FlushPeriodInMinutes="60"/> > > <ColumnFamily ColumnSort="Time" CompareWith="UTF8Type" Name="StandardByTime1"/> > <ColumnFamily CompareSubcolumnsWith="UTF8Type" Name="Super1"/> > Jonathan can you clarify if this will guarantee proper python thrift utf8 > behavior thanks > On Tue, Jul 21, 2009 at 2:29 PM, Jonathan Ellis wrote: >> >> you may also want to specify CompareSubcolumnsWith. >> >> On Tue, Jul 21, 2009 at 4:27 PM, wrote: >> > thanks jonathan >> > trying this >> > >> > >> > On Tue, Jul 21, 2009 at 2:24 PM, Jonathan Ellis >> > wrote: >> >> >> >> On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis >> >> wrote: >> >> >> does this work in python thrift >> >> > >> >> > probably not, given the thrift utf8 bugs. >> >> >> >> to correct myself: now that we are using binary data in the thrift api >> >> it can't screw us over. so yes, UTF8Type should be fine. >> > >> > >> > >> > -- >> > Bidegg worlds best auction site >> > http://bidegg.com >> > > > > > -- > Bidegg worlds best auction site > http://bidegg.com >