Re: NPE in apache cassandra

2009-03-11 Thread Jonathan Ellis
the config format changed.  now you need to specify the cfname as an attribute:
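
(A sketch of the attribute form -- element and attribute names here are
illustrative, not necessarily the exact shipped config:)

<Table Name="Table1">
    <ColumnFamily ColumnType="Standard" Name="Standard1"/>
</Table>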



-Jonathan

On Wed, Mar 11, 2009 at 3:52 PM, Jiansheng Huang  wrote:
>
>
> -- Forwarded message --
> From: Jiansheng Huang 
> Date: Wed, Mar 11, 2009 at 2:49 PM
> Subject: NPE in apache cassandra
> To: cassandra-user@incubator.apache.org, Avinash Lakshman
> , Prashant Malik 
> Cc: agu...@rocketfuelinc.com
>
>
> Hi folks, I checked out the new code from apache and compiled it. When I
> start up the server with a clean installation base (i.e., without using any
> system/user data from previous installation),
> I got the following.
>
> UNCAUGHT EXCEPTION IN main()
> java.lang.NullPointerException
>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
>     at
> org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:254)
>     at
> org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:244)
>     at org.apache.cassandra.db.Table$TableMetadata.apply(Table.java:209)
>     at org.apache.cassandra.db.DBManager.storeMetadata(DBManager.java:150)
>     at org.apache.cassandra.db.DBManager.(DBManager.java:102)
>     at org.apache.cassandra.db.DBManager.instance(DBManager.java:61)
>     at
> org.apache.cassandra.service.StorageService.start(StorageService.java:465)
>     at
> org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:110)
>     at
> org.apache.cassandra.service.CassandraServer.main(CassandraServer.java:1078)
> Disconnected from the target VM, address: '127.0.0.1:45693', transport:
> 'socket'
>
> I did some debugging and found that in the following code, the first entry
> in cfNames is always null. Is it safe to say that if cfName is null, then we
> don't want to do the writes?
>
> public void serialize(TableMetadata tmetadata, DataOutputStream dos) throws IOException
> {
>     int size = tmetadata.cfIdMap_.size();
>     dos.writeInt(size);
>     Set<String> cfNames = tmetadata.cfIdMap_.keySet();
>
>     for (String cfName : cfNames)
>     {
>         dos.writeUTF(cfName);
>         dos.writeInt(tmetadata.cfIdMap_.get(cfName).intValue());
>         dos.writeUTF(tmetadata.getColumnFamilyType(cfName));
>     }
> }
>
> A related question I have is what's the procedure for us to check in code? I
> have made some changes for adding latency counters in the server and
> exposing them through http. Would be good to check in the changes and minor
> fixes so that I don't have to risk of losing them ...
>
> Thanks,
>
> Jiansheng
>
>
>
>


Re: NPE in apache cassandra

2009-03-11 Thread Jonathan Ellis
Also, it will not work AT ALL with data from the old version.  you
need to start fresh.

-Jonathan

On Wed, Mar 11, 2009 at 4:00 PM, Jonathan Ellis  wrote:
> the config format changed.  now you need to specify the cfname as an 
> attribute:
>
>            
>
> -Jonathan
>
> On Wed, Mar 11, 2009 at 3:52 PM, Jiansheng Huang  
> wrote:
>>
>>
>> -- Forwarded message --
>> From: Jiansheng Huang 
>> Date: Wed, Mar 11, 2009 at 2:49 PM
>> Subject: NPE in apache cassandra
>> To: cassandra-user@incubator.apache.org, Avinash Lakshman
>> , Prashant Malik 
>> Cc: agu...@rocketfuelinc.com
>>
>>
>> Hi folks, I checked out the new code from apache and compiled it. When I
>> start up the server with a clean installation base (i.e., without using any
>> system/user data from previous installation),
>> I got the following.
>>
>> UNCAUGHT EXCEPTION IN main()
>> java.lang.NullPointerException
>>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
>>     at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
>>     at
>> org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:254)
>>     at
>> org.apache.cassandra.db.Table$TableMetadataSerializer.serialize(Table.java:244)
>>     at org.apache.cassandra.db.Table$TableMetadata.apply(Table.java:209)
>>     at org.apache.cassandra.db.DBManager.storeMetadata(DBManager.java:150)
>>     at org.apache.cassandra.db.DBManager.(DBManager.java:102)
>>     at org.apache.cassandra.db.DBManager.instance(DBManager.java:61)
>>     at
>> org.apache.cassandra.service.StorageService.start(StorageService.java:465)
>>     at
>> org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:110)
>>     at
>> org.apache.cassandra.service.CassandraServer.main(CassandraServer.java:1078)
>> Disconnected from the target VM, address: '127.0.0.1:45693', transport:
>> 'socket'
>>
>> I did some debugging and found that in the following code, the first entry
>> in cfNames is always null. Is it safe to say that if cfName is null, then we
>> don't want to do the writes?
>>
>> public void serialize(TableMetadata tmetadata, DataOutputStream dos) throws IOException
>> {
>>     int size = tmetadata.cfIdMap_.size();
>>     dos.writeInt(size);
>>     Set<String> cfNames = tmetadata.cfIdMap_.keySet();
>>
>>     for (String cfName : cfNames)
>>     {
>>         dos.writeUTF(cfName);
>>         dos.writeInt(tmetadata.cfIdMap_.get(cfName).intValue());
>>         dos.writeUTF(tmetadata.getColumnFamilyType(cfName));
>>     }
>> }
>>
>> A related question I have is what's the procedure for us to check in code? I
>> have made some changes for adding latency counters in the server and
>> exposing them through http. Would be good to check in the changes and minor
>> fixes so that I don't have to risk of losing them ...
>>
>> Thanks,
>>
>> Jiansheng
>>
>>
>>
>>
>


Re: OPHF vs. Random

2009-03-11 Thread Jonathan Ellis
Use Random for now.  The OPHF is the same as the old one, i.e., not
actually OP. :)

I'm pretty convinced at this point that it's impossible to have an
order-preserving hash that doesn't either (a) impose a relatively
short key length past which no partitioning is done (i.e., all keys w/
the same prefix go to the same node) or is (b) very sensitive to key
length such that the keys with a given length N will not be evenly
distributed across all nodes. Or both.

So I am working on migrating from pluggable hash functions key ->
BigInteger, to pluggable partitioning algorithms key -> EndPoint.
Without the requirement to transform to a numeric value first I think
I can create an order-preserving distribution that performs well.  (I
need this for range queries.)

So far I have just laid the foundation, here:
https://issues.apache.org/jira/browse/CASSANDRA-3

I hope to finish the rest tomorrow.

-Jonathan

On Wed, Mar 11, 2009 at 5:28 PM, Jiansheng Huang  wrote:
>
> Which one is better to use? The default is Random.
>
> In Avinash's announcement mail, we have
> (1) Ability to switch between a random hash and an OPHF. We still have the
> old (wrong) OPHF in there. I will update it to the corrected one tomorrow.
>
> Is correct OPHF in? Thanks.
>


Re: OPHF vs. Random

2009-03-16 Thread Jonathan Ellis
The order-preserving partitioner code (not hash-based anymore) is up
now at https://issues.apache.org/jira/browse/CASSANDRA-3.

-Jonathan

On Wed, Mar 11, 2009 at 6:48 PM, Jonathan Ellis  wrote:
> Use Random for now.  The OPHF is the same as the old one, i.e., not
> actually OP. :)
>
> I'm pretty convinced at this point that it's impossible to have an
> order-preserving hash that doesn't either (a) impose a relatively
> short key length past which no partitioning is done (i.e., all keys w/
> the same prefix go to the same node) or is (b) very sensitive to key
> length such that the keys with a given length N will not be evenly
> distributed across all nodes. Or both.
>
> So I am working on migrating from pluggable hash functions key ->
> BigInteger, to pluggable partitioning algorithms key -> EndPoint.
> Without the requirement to transform to a numeric value first I think
> I can create an order-preserving distribution that performs well.  (I
> need this for range queries.)
>
> So far I have just laid the foundation, here:
> https://issues.apache.org/jira/browse/CASSANDRA-3
>
> I hope to finish the rest tomorrow.
>
> -Jonathan
>
> On Wed, Mar 11, 2009 at 5:28 PM, Jiansheng Huang  
> wrote:
>>
>> Which one is better to use? The default is Random.
>>
>> In Avinash's announcement mail, we have
>> (1) Ability to switch between a random hash and an OPHF. We still have the
>> old (wrong) OPHF in there. I will update it to the corrected one tomorrow.
>>
>> Is correct OPHF in? Thanks.
>>
>


Re: OPHF vs. Random

2009-03-16 Thread Jonathan Ellis
I think that key -> endpoint might still be simpler long term but
short term there is far too much code that depends on being able to
compare both nodes and keys transformed to tokens.  Previously token
was hardcoded to be BigInteger but I introduced the abstraction
Token<T> defining compareTo(Token<T>), so you can have Token<BigInteger>
as well as Token<String>.  The OrderPreservingPartitioner then uses
Token<String> to do lexicographic comparisons.
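
(A minimal Java sketch of that abstraction -- names are simplified and
assumed, not the actual patch:)

// Sketch only: a comparable wrapper so partitioners can plug in
// BigInteger tokens (random) or String tokens (order-preserving).
abstract class Token<T extends Comparable<T>> implements Comparable<Token<T>> {
    protected final T token;
    protected Token(T token) { this.token = token; }
    public int compareTo(Token<T> other) { return token.compareTo(other.token); }
}

class BigIntegerToken extends Token<java.math.BigInteger> {
    BigIntegerToken(java.math.BigInteger t) { super(t); }
}

// compareTo on Token<String> is what makes the
// OrderPreservingPartitioner's comparisons lexicographic.
class StringToken extends Token<String> {
    StringToken(String t) { super(t); }
}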

-Jonathan

On Mon, Mar 16, 2009 at 3:30 PM, Sandeep Tata  wrote:
> I like the idea of supporting more general/sophisticated strategies.
>
> Let me see if I understand the issues at play here:
>
> OPHFs are tricky to design and leaning on it for load-balancing and
> data locality will require incredibly good OPHFs that might not exist.
> (I learned this with a bunch of the experiments we ran on our
> relatively small test cluster)
>
> RANDOM of course is going to be great for load-balancing, but we're
> completely giving up locality, so range queries are shot.
>
> If we want to support clever placement strategies, we'll need to make
> some changes. Take for instance a key like "userid.messageid" . I
> might want:
> a) all the keys with the same userid on the same node, and
> b) all the messageids stored in order so I can do simple range queries
> like "get message 1 to 100"
>
> OPHF might break a) and RANDOM will break b)
>
> The claim is that the simplest (best :-) ) way to guarantee a) and b)
> is to map the key to an end-point instead of merely an integer.
>
> What if I changed the hash-function to do RANDOM on just the "userid"
> part. And each node still stores the keys in "<" order on the entire
> key ("userid.messageid") Would this solve the problem? What is this
> approach missing?
>
> Do we just need to decouple the hash used for routing from the key
> used in the end-point for storage? Is this essentially what the series
> of patches does?
>
>
> On Mon, Mar 16, 2009 at 1:36 PM, Jonathan Ellis  wrote:
>> The order-preserving partitioner code (not hash-based anymore) is up
>> now at https://issues.apache.org/jira/browse/CASSANDRA-3.
>>
>> -Jonathan
>>
>> On Wed, Mar 11, 2009 at 6:48 PM, Jonathan Ellis  wrote:
>>> Use Random for now.  The OPHF is the same as the old one, i.e., not
>>> actually OP. :)
>>>
>>> I'm pretty convinced at this point that it's impossible to have an
>>> order-preserving hash that doesn't either (a) impose a relatively
>>> short key length past which no partitioning is done (i.e., all keys w/
>>> the same prefix go to the same node) or is (b) very sensitive to key
>>> length such that the keys with a given length N will not be evenly
>>> distributed across all nodes. Or both.
>>>
>>> So I am working on migrating from pluggable hash functions key ->
>>> BigInteger, to pluggable partitioning algorithms key -> EndPoint.
>>> Without the requirement to transform to a numeric value first I think
>>> I can create an order-preserving distribution that performs well.  (I
>>> need this for range queries.)
>>>
>>> So far I have just laid the foundation, here:
>>> https://issues.apache.org/jira/browse/CASSANDRA-3
>>>
>>> I hope to finish the rest tomorrow.
>>>
>>> -Jonathan
>>>
>>> On Wed, Mar 11, 2009 at 5:28 PM, Jiansheng Huang  
>>> wrote:
>>>>
>>>> Which one is better to use? The default is Random.
>>>>
>>>> In Avinash's announcement mail, we have
>>>> (1) Ability to switch between a random hash and an OPHF. We still have the
>>>> old (wrong) OPHF in there. I will update it to the corrected one tomorrow.
>>>>
>>>> Is correct OPHF in? Thanks.
>>>>
>>>
>>
>


some "getting started" information

2009-03-28 Thread Jonathan Ellis
Hi all,

There's a bunch of useful material about getting started with
Cassandra but it's rather scattered.  So until we get our wiki going I
wrote a blog post pulling some of that together:
http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html

HTH,

-Jonathan


cassandra-20

2009-03-31 Thread Jonathan Ellis
Just a heads up that I committed Eric Evans's patch from #20, which
replaces bin/start-server with bin/cassandra and bestows it with
magical shell kung-fu to background the server cleanly by default.
Should work out of the box on linux, OS X, and cygwin.

Use the -f flag to put it in foreground mode (the way it used to be)
and -p <pidfile> to log the process id to a file where it can be used
for shutdown.
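
(Typical invocations, then; the pid path is just an example:)

bin/cassandra                        # start, backgrounded by default
bin/cassandra -f                     # foreground, the old behavior
bin/cassandra -p /var/run/cass.pid   # background and record the pid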

-Jonathan


Cassandra at OSCON

2009-04-02 Thread Jonathan Ellis
My proposal to present on Cassandra at OSCON this year was accepted.
OSCON will be July 22 to 24 in San Jose.  My talk will be on Thursday:
http://en.oreilly.com/oscon2009/public/schedule/grid/2009-07-23

I covered similar material at my PyCon open space talk last week
(standing room only); it went very well.  There is a lot of interest
in scalable systems and Cassandra is one of a very few of these that
can handle structured data.  This will help get Cassandra a lot more
visibility as it reboots as an ASF project.

(I'm also giving a talk on Friday, "What every developer should know
about database scalability."  I'm mostly going to focus on relational
databases there but non-relational options will be mentioned,
including Cassandra.)

-Jonathan


Re: Cassandra at OSCON

2009-04-02 Thread Jonathan Ellis
It looks like last year they took a bunch of videos, but it looks like
only selected talks to me: http://oscon.blip.tv/

So I don't know. :)

-Jonathan

On Thu, Apr 2, 2009 at 2:53 PM, Johan Oskarsson  wrote:
>
> Congrats!
> Will this talk be recorded for those of us who can't make it?
>
> /Johan
>
> Jonathan Ellis wrote:
>> My proposal to present on Cassandra at OSCON this year was accepted.
>> OSCON will be July 22 to 24 in San Jose.  My talk will be on Thursday:
>> http://en.oreilly.com/oscon2009/public/schedule/grid/2009-07-23
>>
>> I covered similar material at my PyCon open space talk last week
>> (standing room only); it went very well.  There is a lot of interest
>> in scalable systems and Cassandra is one of a very few of these that
>> can handle structured data.  This will help get Cassandra a lot more
>> visibility as it reboots as an ASF project.
>>
>> (I'm also giving a talk on Friday, "What every developer should know
>> about database scalability."  I'm mostly going to focus on relational
>> databases there but non-relational options will be mentioned,
>> including Cassandra.)
>>
>> -Jonathan
>>
>> >
>
>
>
>


Re: Sample Client Code

2009-04-09 Thread Jonathan Ellis
That looks reasonable.  How are you reading the data back out?  The
web interface only hits the local machine so it is not very useful in
a clustered situation.

-Jonathan

On Thu, Apr 9, 2009 at 4:02 PM, Sam D  wrote:
> Hi,
>
> I am new to Cassandra, just installed the latest version on my machine.  I
> am able to insert rows using the web (@7002), but I am not able to get a
> java client to insert rows into a table. Below the piece of code I am using,
> the insert call goes through fine without any exceptions, but I am not able
> to see the row in the table, so I assume its not being inserted properly.
>
>         socket = new TSocket(machine,port);
>         TProtocol tp = new TBinaryProtocol(socket);
>         cl = new Cassandra.Client(tp);
>         socket.open();
>         cl.insert("xmls", "x1", "content:xml", "xyz", 0);
>
> Can you please point me to any sample code available which I can refer to ?.
>
> Thanks
> Sam.
>


Re: Sample Client Code

2009-04-09 Thread Jonathan Ellis
is content a supercolumn?  otherwise specifying a subcolumn isn't going to work.

did you check your log file for exceptions?

On Thu, Apr 9, 2009 at 4:19 PM, Sam D  wrote:
> Thanks for the quick response,
>
> I have only one node. So the web client also should see the data, right ?.
> Below is the code which I am using to read.
>
>        socket = new TSocket(machine,port);
>         TProtocol tp = new TBinaryProtocol(socket);
>         cl = new Cassandra.Client(tp);
>         socket.open();
>     column_t u1 = cl.get_column("xmls","x1","content:xml");
>         System.out.println("xml : " + u1.value);
>
> Sam.
>
> On Thu, Apr 9, 2009 at 2:07 PM, Jonathan Ellis  wrote:
>>
>> That looks reasonable.  How are you reading the data back out?  The
>> web interface only hits the local machine so it is not very useful in
>> a clustered situation.
>>
>> -Jonathan
>>
>> On Thu, Apr 9, 2009 at 4:02 PM, Sam D  wrote:
>> > Hi,
>> >
>> > I am new to Cassandra, just installed the latest version on my machine.
>> > I
>> > am able to insert rows using the web (@7002), but I am not able to get a
>> > java client to insert rows into a table. Below the piece of code I am
>> > using,
>> > the insert call goes through fine without any exceptions, but I am not
>> > able
>> > to see the row in the table, so I assume its not being inserted
>> > properly.
>> >
>> >         socket = new TSocket(machine,port);
>> >         TProtocol tp = new TBinaryProtocol(socket);
>> >         cl = new Cassandra.Client(tp);
>> >         socket.open();
>> >         cl.insert("xmls", "x1", "content:xml", "xyz", 0);
>> >
>> > Can you please point me to any sample code available which I can refer
>> > to ?.
>> >
>> > Thanks
>> > Sam.
>> >
>
>


Re: Sample Client Code

2009-04-09 Thread Jonathan Ellis
So content:xml is your ColumnFamily:column tuple.  That looks right.

That exception is from the client side, right?  That looks to me like
it can't connect to the server.

Your connection code looks okay... port should be the thrift port,
9160 if you haven't changed it.

On Thu, Apr 9, 2009 at 4:31 PM, Sam D  wrote:
> No, it's not a supercolumn, how do I retrieve it if it's not a supercolumn?
>
>  
>    
>  
>
> I didn't notice it earlier, but yes, I am seeing the following exception in
> the log
>
> Exception in thread "main"
> com.facebook.thrift.transport.TTransportException: Cannot write to null
> outputStream
>     at com.facebook.thrift.transport.TIOStreamTransport.write(Unknown
> Source)
>     at com.facebook.thrift.protocol.TBinaryProtocol.writeI32(Unknown Source)
>
> Thanks
>
> On Thu, Apr 9, 2009 at 2:24 PM, Jonathan Ellis  wrote:
>>
>> is content a supercolumn?  otherwise specifying a subcolumn isn't going to
>> work.
>>
>> did you check your log file for exceptions?
>>
>> On Thu, Apr 9, 2009 at 4:19 PM, Sam D  wrote:
>> > Thanks for the quick response,
>> >
>> > I have only one node. So the web client also should see the data, right
>> > ?.
>> > Below is the code which I am using to read.
>> >
>> >        socket = new TSocket(machine,port);
>> >         TProtocol tp = new TBinaryProtocol(socket);
>> >         cl = new Cassandra.Client(tp);
>> >         socket.open();
>> >     column_t u1 = cl.get_column("xmls","x1","content:xml");
>> >         System.out.println("xml : " + u1.value);
>> >
>> > Sam.
>> >
>> > On Thu, Apr 9, 2009 at 2:07 PM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> That looks reasonable.  How are you reading the data back out?  The
>> >> web interface only hits the local machine so it is not very useful in
>> >> a clustered situation.
>> >>
>> >> -Jonathan
>> >>
>> >> On Thu, Apr 9, 2009 at 4:02 PM, Sam D  wrote:
>> >> > Hi,
>> >> >
>> >> > I am new to Cassandra, just installed the latest version on my
>> >> > machine.
>> >> > I
>> >> > am able to insert rows using the web (@7002), but I am not able to
>> >> > get a
>> >> > java client to insert rows into a table. Below the piece of code I am
>> >> > using,
>> >> > the insert call goes through fine without any exceptions, but I am
>> >> > not
>> >> > able
>> >> > to see the row in the table, so I assume its not being inserted
>> >> > properly.
>> >> >
>> >> >         socket = new TSocket(machine,port);
>> >> >         TProtocol tp = new TBinaryProtocol(socket);
>> >> >         cl = new Cassandra.Client(tp);
>> >> >         socket.open();
>> >> >         cl.insert("xmls", "x1", "content:xml", "xyz", 0);
>> >> >
>> >> > Can you please point me to any sample code available which I can
>> >> > refer
>> >> > to ?.
>> >> >
>> >> > Thanks
>> >> > Sam.
>> >> >
>> >
>> >
>
>


Re: Sample Client Code

2009-04-09 Thread Jonathan Ellis
For now you'll have to encode it somehow.
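
(One illustrative workaround in Java, not the eventual API: base64 the
bytes into the string value and decode after reading.  java.util.Base64
is used here for brevity; in 2009 a codec such as commons-codec would
have played the same role.)

import java.util.Base64;

public class BinaryColumnWorkaround {
    public static void main(String[] args) {
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0}; // sample bytes
        // encode before insert: the column value travels as a plain string
        String encoded = Base64.getEncoder().encodeToString(jpeg);
        // decode after get_column to recover the original bytes
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(encoded + " -> " + decoded.length + " bytes");
    }
}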

We have a ticket (https://issues.apache.org/jira/browse/CASSANDRA-29)
to switch to binary data as column values and that's high on my list
to get done.

-Jonathan

On Thu, Apr 9, 2009 at 7:40 PM, Sam D  wrote:
> Thanks Jonathan, the issue was due to some connectivity issues.  It's working
> fine now.
>
> I had one more question.
>
> Can we insert byte arrays as values for the columns ?. I am trying to store
> JPEG images.
>
> Thanks
>
> On Thu, Apr 9, 2009 at 2:38 PM, Jonathan Ellis  wrote:
>>
>> So content:xml is your ColumnFamily:column tuple.  That looks right.
>>
>> That exception is from the client side, right?  That looks to me like
>> it can't connect to the server.
>>
>> Your connection code looks okay... port should be the thrift port,
>> 9160 if you haven't changed it.
>>
>> On Thu, Apr 9, 2009 at 4:31 PM, Sam D  wrote:
>> > No, its not a supercolumn, how do I retrieve it if its not a supercolumn
>> > ?.
>> >
>> >  
>> >    
>> >  
>> >
>> > I didn't notice it earlier, but yes, I am seeing the following exception
>> > in
>> > the log
>> >
>> > Exception in thread "main"
>> > com.facebook.thrift.transport.TTransportException: Cannot write to null
>> > outputStream
>> >     at com.facebook.thrift.transport.TIOStreamTransport.write(Unknown
>> > Source)
>> >     at com.facebook.thrift.protocol.TBinaryProtocol.writeI32(Unknown
>> > Source)
>> >
>> > Thanks
>> >
>> > On Thu, Apr 9, 2009 at 2:24 PM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> is content a supercolumn?  otherwise specifying a subcolumn isn't going
>> >> to
>> >> work.
>> >>
>> >> did you check your log file for exceptions?
>> >>
>> >> On Thu, Apr 9, 2009 at 4:19 PM, Sam D  wrote:
>> >> > Thanks for the quick response,
>> >> >
>> >> > I have only one node. So the web client also should see the data,
>> >> > right
>> >> > ?.
>> >> > Below is the code which I am using to read.
>> >> >
>> >> >        socket = new TSocket(machine,port);
>> >> >         TProtocol tp = new TBinaryProtocol(socket);
>> >> >         cl = new Cassandra.Client(tp);
>> >> >         socket.open();
>> >> >     column_t u1 = cl.get_column("xmls","x1","content:xml");
>> >> >         System.out.println("xml : " + u1.value);
>> >> >
>> >> > Sam.
>> >> >
>> >> > On Thu, Apr 9, 2009 at 2:07 PM, Jonathan Ellis 
>> >> > wrote:
>> >> >>
>> >> >> That looks reasonable.  How are you reading the data back out?  The
>> >> >> web interface only hits the local machine so it is not very useful
>> >> >> in
>> >> >> a clustered situation.
>> >> >>
>> >> >> -Jonathan
>> >> >>
>> >> >> On Thu, Apr 9, 2009 at 4:02 PM, Sam D 
>> >> >> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > I am new to Cassandra, just installed the latest version on my
>> >> >> > machine.
>> >> >> > I
>> >> >> > am able to insert rows using the web (@7002), but I am not able to
>> >> >> > get a
>> >> >> > java client to insert rows into a table. Below the piece of code I
>> >> >> > am
>> >> >> > using,
>> >> >> > the insert call goes through fine without any exceptions, but I am
>> >> >> > not
>> >> >> > able
>> >> >> > to see the row in the table, so I assume its not being inserted
>> >> >> > properly.
>> >> >> >
>> >> >> >         socket = new TSocket(machine,port);
>> >> >> >         TProtocol tp = new TBinaryProtocol(socket);
>> >> >> >         cl = new Cassandra.Client(tp);
>> >> >> >         socket.open();
>> >> >> >         cl.insert("xmls", "x1", "content:xml", "xyz", 0);
>> >> >> >
>> >> >> > Can you please point me to any sample code available which I can
>> >> >> > refer
>> >> >> > to ?.
>> >> >> >
>> >> >> > Thanks
>> >> >> > Sam.
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>


change to client API

2009-04-20 Thread Jonathan Ellis
All column values that were declared `string` in thrift are now
`binary`.  (See https://issues.apache.org/jira/browse/CASSANDRA-29.)

For Java that means byte[] instead of String.
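
(Mirroring the insert fragment quoted earlier in this archive, the
change looks roughly like this -- illustrative, not a verified
signature:)

// before CASSANDRA-29, the value was a String:
// cl.insert("xmls", "x1", "content:xml", "xyz", 0);
// after, it is binary:
cl.insert("xmls", "x1", "content:xml", "xyz".getBytes(), 0);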

For Python, because thrift treatment of `string` is broken, that
actually means no change -- values were str before and remain str.

I don't know the details of the other thrift generators but it
probably follows one of those two patterns. :)

-Jonathan


Re: Questions around API changes

2009-05-01 Thread Jonathan Ellis
On Fri, May 1, 2009 at 5:59 AM, Jonas Bonér  wrote:
> Hi there.
>
> First, should I use this ML or the google forum?

This one.

> * What does the new timestamp arg in
> public boolean remove(String tablename, String key, String
> columnFamily_column, long timestamp, boolean block)
> specify?

It's compared against the timestamp in insert, to make sure remove
doesn't get applied to newer data than it was intended to.

> * Any reason for making ctor in CassandraServer protected? I am
> embedding Cassandra and now I have to use reflection to create the
> instance. No big deal, just checking why?

No particular reason I know of.  We can make that public.

> * I get this exception when invoking batch_update (in the previous
> release, haven't tried with the latest trunk yet):

Yeah, that's a long-standing bug.  I have a patch to fix it here
https://issues.apache.org/jira/browse/CASSANDRA-120 that is waiting
for review.

-Jonathan


Re: Questions around API changes

2009-05-01 Thread Jonathan Ellis
On Fri, May 1, 2009 at 11:19 AM, Jonas Bonér  wrote:
> Thanks for the answers.
>
> Btw, is the CQL in usable state?

No idea.  Probably not. :)

> If not, any plans?

The third cassandra committer from FB who mostly remains silent
(forget his name atm) is supposedly planning to work on it more.

> What about the CLI interface?

That is working.  In fact, Eric just wrote a new wrapper script for it
and a README: 
https://svn.apache.org/repos/asf/incubator/cassandra/trunk/README.txt

-Jonathan


Re: Some questions.

2009-05-02 Thread Jonathan Ellis
On Sat, May 2, 2009 at 6:22 AM, Manuel Crotti
 wrote:
> Now I have some questions:
> 1. each "storage-conf.xml" should contain just one of the above
> ip-addresses (obviously not the localhost's IP address) in the <Seeds>
> section to let cassandra learn the whole topology? Or it must contain the
> whole list?

Just pick one of the public IPs to be seed.
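
(A sketch, with element names assumed from that era's storage-conf.xml:)

<!-- same on every node: one public IP as the seed -->
<Seeds>
    <Seed>192.168.1.10</Seed>
</Seeds>
<!-- per-node: this node's own address -->
<ListenAddress>192.168.1.11</ListenAddress>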

> 2. how can I see if the nodes of the cluster are "talking" (some logfile,
> ...)? (I supposed to find it into the localhost:7002 interface but i see
> just a host -localhost -  and I suppose hosts are not "talking")

If it is working each :7002 will show all the nodes.

> 3. What should differ between the  "storage-conf.xml" files of each node?
> 3.1. the "storage-conf.xml" of each node should contain the table structure
> to replicate/propagate the information of the data of a table?

Right.  The only thing that should be different is the
ListenAddress section.  (You can try leaving that out and Cassandra
will pick an interface to use but it often guesses a non-public
interface which is not helpful. :)

> 3.2 finally: should I start a cluster with an empty DB or I can replicate an
> existing DB?

You can start from an existing one if it's really legitimate for all
nodes to have copies of that data but it probably is not.

> I also submit a couple of errors that raised using the command-line client:

Okay, so the problems are that (1) it thinks it is connected when it
is not, and (2) it allows you to run commands when it is not
connected.  Right?  Can you file those in the issue tracker?
https://issues.apache.org/jira/browse/CASSANDRA

thanks,

-Jonathan


last api change for 0.3

2009-05-05 Thread Jonathan Ellis
I committed the patch for CASSANDRA-131 which (a) enables exception
throwing on the insert methods (so you don't have to explicitly check
return value to see if something worked), and (b) moves the _blocking
method as a flag into the nonblocking ones.  so instead of
insert_blocking use insert with block=True.  The block flags default
to false so your nonblocking calls will work as before.  (Assuming you
are using a thrift binding that actually generates default values
correctly.  I haven't seen one yet but I assume they're out there. :)
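
(Roughly, then -- an illustrative fragment, not a verified signature:)

// before: checked the return value of
// cl.insert_blocking(table, key, cfColumn, value, timestamp);
// after: one method; block=true waits, and failures throw
cl.insert(table, key, cfColumn, value, timestamp, true);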

-Jonathan


Re: Non relational db meetup - San Francisco, June 11th

2009-05-12 Thread Jonathan Ellis
That's true, but 100 people is about the largest space you're going to
find for free, so past that you'd have to start charging people and
worrying about taxes and such.  Messy.

Maybe next year... :)

-Jonathan

On Tue, May 12, 2009 at 2:02 PM, Jonas Bonér  wrote:
> Great initiative.
> Just sad that it is not the week before (during JavaOne). Then I think
> a lot of people (including me) could go.
>
> 2009/5/12 Johan Oskarsson :
>> Cassandra will be represented by Avinash Lakshman on a free full day
>> meetup covering "open source, distributed, non relational databases" on
>> June 11th in San Francisco.
>>
>> The idea is that the event will give people interested in this area a
>> great introduction and an easy way to compare the different projects out
>> there as well as the opportunity to discuss them with the developers.
>>
>> Registration
>> The event is free but space is limited, please register if you wish to
>> attend: http://nosql.eventbrite.com/
>>
>>
>> Preliminary schedule, 2009-06-11
>> 09.45: Doors open
>> 10.00: Intro session (Todd Lipcon, Cloudera)
>> 10.40: Voldemort (Jay Kreps, Linkedin)
>> 11.20: Short break
>> 11.30: Cassandra (Avinash Lakshman, Facebook)
>> 12.10: Free lunch (sponsored by CBSi)
>> 13.10: Dynomite (Cliff Moon, Powerset)
>> 13.50: HBase (Ryan Rawson, Stumbleupon)
>> 14.30: Short break
>> 14.40: Hypertable (Doug Judd, Zvents)
>> 15.20: Panel discussion
>> 16.00: End of meetup, relocate to a pub called Kate O’Brien’s nearby
>>
>> Location
>> Magma room, CBS interactive
>> 235 Second Street
>> San Francisco, CA 94105
>>
>> Sponsor
>> A big thanks to CBSi for providing the venue and free lunch.
>>
>>
>> /Johan Oskarsson, developer @ last.fm
>>
>
>
>
> --
> Jonas Bonér
>
> twitter: @jboner
> blog:    http://jonasboner.com
> work:   http://crisp.se
> work:   http://scalablesolutions.se
> code:   http://github.com/jboner
>


Cassandra 0.3 RC is out

2009-05-13 Thread Jonathan Ellis
Short version: http://incubator.apache.org/cassandra/cassandra-0.3.0-rc.tgz
Long version: 
http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html

Release Candidate means "we fixed all the bugs we could find; help us
find more so the release is even more solid." :)

I've created a 0.3 branch for bugfixes; trunk will now be for 0.4
development.  I'll start to look at the patches I've been postponing
until the RC was out now; thanks for your patience, Jun and Sandeep.

-Jonathan


Re: Cassandra 0.3 RC is out

2009-05-13 Thread Jonathan Ellis
Oops, fat-fingered the url:
http://incubator.apache.org/cassandra/releases/cassandra-0.3-rc.tgz

:)

On Wed, May 13, 2009 at 10:28 PM, Jonathan Ellis  wrote:
> Short version: http://incubator.apache.org/cassandra/cassandra-0.3.0-rc.tgz
> Long version: 
> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html
>
> Release Candidate means "we fixed all the bugs we could find; help us
> find more so the release is even more solid." :)
>
> I've created a 0.3 branch for bugfixes; trunk will now be for 0.4
> development.  I'll start to look at the patches I've been postponing
> until the RC was out now; thanks for your patience, Jun and Sandeep.
>
> -Jonathan
>


Re: Cassandra 0.3 RC is out

2009-05-14 Thread Jonathan Ellis
I've been asked to change the download url to
http://people.apache.org/%7Ejbellis/cassandra/cassandra-0.3-rc.tgz to
avoid incorrectly implying that this is An Official Release which it
is not.

-Jonathan


Re: Cassandra 0.3 RC is out

2009-05-14 Thread Jonathan Ellis
Thanks!  And it is probably worth repeating that although I am the
only active committer at the moment, this represents the work of many
people, especially (alphabetically :) Eric Evans, Johan Oskarsson, Jun
Rao, and Sandeep Tata -- hopefully we will get more committers from
this group soon.  Lots of others also contributed patches, bug
reports, and testing.

-Jonathan

On May 14, 2009, at 8:34 AM, Jonas Bonér  wrote:

> Awesome job Jonathan.
> Just getting into the codebase so fast is admirable.
> Churning out code like this (and releases) is amazing. Keep it up.
>
> 2009/5/14 Jonathan Ellis :
>> Short version: http://incubator.apache.org/cassandra/cassandra-0.3.0-rc.tgz
>> Long version: 
>> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html
>>
>> Release Candidate means "we fixed all the bugs we could find; help us
>> find more so the release is even more solid." :)
>>
>> I've created a 0.3 branch for bugfixes; trunk will now be for 0.4
>> development.  I'll start to look at the patches I've been postponing
>> until the RC was out now; thanks for your patience, Jun and Sandeep.
>>
>> -Jonathan
>>
>
>
>
> --
> Jonas Bonér
>
> twitter: @jboner
> blog:http://jonasboner.com
> work:   http://crisp.se
> work:   http://scalablesolutions.se
> code:   http://github.com/jboner


Re: Node Recovery

2009-05-18 Thread Jonathan Ellis
That's the price you pay for (a) eventual consistency in general and
(b) doing read repair in the background specifically.  Cassandra also
has functionality (called "strong read") to do a quorum read in the
foreground and repair if necessary but that is not exposed in Thrift
yet -- but even with that there are scenarios where you could get back
"no data" for a write that has been acked.  The only way to avoid it
entirely is to require acking all writes from all replicas and
checking all replicas on all reads, which (in a large cluster) is
going to hurt from the availability standpoint.  Most apps are ok
trading off some consistency for availability.
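
(For reference, the standard way to quantify that tradeoff: with N
replicas, W required write acks, and R replicas consulted per read, a
read is guaranteed to overlap the latest acked write only when
R + W > N.  Chris's scenario is N=2 with R=1, so nothing short of
checking both replicas on the read would close the window.)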

-Jonathan

On Mon, May 18, 2009 at 12:24 PM, Chris Goffinet  wrote:
> Scenario: if i setup a 2 node cluster, with replicationfactor of 2. Inserted
> a new key (1) into a table. Its replicated to both nodes. I shutdown node
> (2), delete all data, then bring it back up. I noticed that if i make a
> request to that node the first time for that key, it will return back an
> empty result (was using get_slice), then that node will pull the data from
> other node. On next request to that node its there. How does one really know
> if the data isn't there (should I retry) vs it was never there to begin
> with?
>
> ---
> Chris Goffinet
> goffi...@digg.com
>
>
>
>
>
>


Re: multi-table

2009-05-18 Thread Jonathan Ellis
Different apps will have different performance characteristics (and
different key domains, which can also be important).  So there are
operational reasons to prefer cluster-per-app.

That said, multi-table support is high on my priority list.  The
changes required are straightforward so I'd love to help someone dive
in as opposed to just doing it myself. :)

-Jonathan

On May 18, 2009, at 7:56 PM, Chris Goffinet  wrote:

Has anyone here needed multi-table support yet in Cassandra? Anyone  
willing to share use cases where you felt maybe you didn't need  
multi-table support? Seems just a bit odd it isn't there yet :)


---
Chris Goffinet
goffi...@digg.com







schema example

2009-05-18 Thread Jonathan Ellis
Does anyone have a simple app schema they can share?

I can't share the one for our main app.  But we do need an example
here.  A real one would be nice if we can find one.

I checked App Engine.  They don't have a whole lot of examples either.
 They do have a really simple one:
http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html

The most important thing in Cassandra modeling is choosing a good key,
since that is what most of your lookups will be by.  Keys are also how
Cassandra scales -- Cassandra can handle effectively infinite keys
(given enough nodes obviously) but only thousands to millions of
columns per key/CF (depending on what API calls you use -- Jun is
adding one now that does not deserialize everything in the whole CF
into memory.  The rest will need to follow this model eventually too).

For this guestbook I think the choice is obvious: use the name as the
key, and have a single simple CF for the messages.  Each column will
be a message (you can even use the mandatory timestamp field as part
of your user-visible data.  win!).  You get the list (or page) of
users with get_key_range and then their messages with get_slice.
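
(Sketched as a storage-conf.xml declaration -- names illustrative:)

<ColumnFamily ColumnType="Standard" Name="Messages"/>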



Anyone got another one for pedagogical purposes?

-Jonathan


Re: schema example

2009-05-19 Thread Jonathan Ellis
Mail storage, man, I think pretty much anything I could come up with
would look pretty simplistic compared to what "real" systems do in
that domain. :)

But blogs, I think I can handle those.  Let's make ours multiuser
or there isn't enough scale to make it interesting. :)

The interesting thing here is we want to be able to query two things
efficiently:
 - the most recent posts belonging to a given blog, in reverse
chronological order
 - a single post and its comments, in chronological order

At first glance you might think we can again reasonably do this with a
single CF, this time a super CF:



The key is the blog name, the supercolumns are posts and the
subcolumns are comments.  This would be reasonable BUT supercolumns
are just containers, they have no data or timestamp associated with
them directly (only through their subcolumns).  So you cannot sort a
super CF by time.

So instead what I would do would be to use two CFs:
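
(Sketched declarations, names illustrative:)

<ColumnFamily ColumnType="Standard" Name="Posts"/>
<ColumnFamily ColumnType="Standard" Name="Comments"/>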




For the first, the keys used would be blog names, and the columns
would be the post titles and body.  So to get a list of most recent
posts you just do a slice query.  Even though Cassandra currently
handles large groups of columns sub-optimally, even with a blog
updated several times a day you'd be safe taking this approach (i.e.
we'll have that problem fixed before you start seeing it :).

For the second, the keys are blog name.  The
columns are the comment data.  You can serialize these a number of
ways; I would probably use title as the column name and have the value
be the author + body (e.g. as a json dict).  Again we use the slice
call to get the comments in order.  (We will have to manually reverse
what slice gives us since time sort is always reverse chronological
atm, but the overhead of doing this in memory will be negligible.)

Does this help?

-Jonathan

On Tue, May 19, 2009 at 11:49 AM, Evan Weaver  wrote:
> Even if it's not actually in real-life use, some examples for common
> domains would really help clarify things.
>
>  * blog
>  * email storage
>  * search index
>
> etc.
>
> Evan
>
> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis  wrote:
>> Does anyone have a simple app schema they can share?
>>
>> I can't share the one for our main app.  But we do need an example
>> here.  A real one would be nice if we can find one.
>>
>> I checked App Engine.  They don't have a whole lot of examples either.
>>  They do have a really simple one:
>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
>>
>> The most important thing in Cassandra modeling is choosing a good key,
>> since that is what most of your lookups will be by.  Keys are also how
>> Cassandra scales -- Cassandra can handle effectively infinite keys
>> (given enough nodes obviously) but only thousands to millions of
>> columns per key/CF (depending on what API calls you use -- Jun is
>> adding one now that does not deserialize everything in the whole CF
>> into memory.  The rest will need to follow this model eventually too).
>>
>> For this guestbook I think the choice is obvious: use the name as the
>> key, and have a single simple CF for the messages.  Each column will
>> be a message (you can even use the mandatory timestamp field as part
>> of your user-visible data.  win!).  You get the list (or page) of
>> users with get_key_range and then their messages with get_slice.
>>
>> 
>>
>> Anyone got another one for pedagogical purposes?
>>
>> -Jonathan
>>
>
>
>
> --
> Evan Weaver
>


Re: Ingesting from Hadoop to Cassandra

2009-05-21 Thread Jonathan Ellis
Have you benchmarked the batch insert apis?  If that is "fast enough"
then it's by far the simplest way to go.

Otherwise you'll have to use the binarymemtable stuff which is
undocumented and not exposed as a client api (you basically write a
custom "loader" version of cassandra to use it, I think).  FB used
this for their own bulk loading so it works at some level, but clearly
there is some assembly required.

-Jonathan

On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares  wrote:
> Hi all,
>
> I'm trying to find the most optimal way to ingest my content from Hadoop to
> Cassandra.  Assuming I have figured out the table representation for this
> content, what is the best way to go about pushing from my cluster?  What
> Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm
> sure this is a common pattern, I'm curious to see how it has been
> implemented.  Assume millions of rows and 1000s of columns.
>
> Thanks in advance,
> -Alex
>
>


Re: Ingesting from Hadoop to Cassandra

2009-05-21 Thread Jonathan Ellis
No, batch APIs are per CF, not per row.

Several people have asked Avinash for sample code using BinaryMemtable
but to my knowledge nothing ever came of that.

The high level description of the BMT is that you give it serialized
CFs as values instead of raw columns so it can just sort on key and
write directly to disk.  So then you would do something like this:

Table table = Table.open(mytablename);
ColumnFamilyStore store = table.getColumnFamilyStore(mycfname);
for (ColumnFamily cf : mydata)
    store.applyBinary(cf.key, toByteArray(cf));

There's no provision for doing this over the network that I know of,
you have to put the right keys on the right nodes manually.

-Jonathan

On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares  wrote:
> Jonathan,
>
> Thanks for your thoughts.
>
> I've done some simple benchmarks with the batch insert apis and was looking
> for something slightly more performant.  Is there a batch row insert that I
> missed?
>
> Any pointers (at all) to anything related to FB's bulk loading or the
> binarymemtable?  I've attempted to do this by writing a custom IVerbHandler
> for ingestion and interfacing with the MessagingService internally but it's
> not that clean.
>
> Thanks again,
> -Alex
>
> 
> From: Jonathan Ellis 
> To: cassandra-user@incubator.apache.org
> Sent: Thursday, May 21, 2009 7:44:59 AM
> Subject: Re: Ingesting from Hadoop to Cassandra
>
> Have you benchmarked the batch insert apis?  If that is "fast enough"
> then it's by far the simplest way to go.
>
> Otherwise you'll have to use the binarymemtable stuff which is
> undocumented and not exposed as a client api (you basically write a
> custom "loader" version of cassandra to use it, I think).  FB used
> this for their own bulk loading so it works at some level, but clearly
> there is some assembly required.
>
> -Jonathan
>
> On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares 
> wrote:
>> Hi all,
>>
>> I'm trying to find the most optimal way to ingest my content from Hadoop
>> to
>> Cassandra.  Assuming I have figured out the table representation for this
>> content, what is the best way to go about pushing from my cluster?
>> What
>> Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm
>> sure this is a common pattern, I'm curious to see how it has been
>> implemented.  Assume millions of rows and 1000s of columns.
>>
>> Thanks in advance,
>> -Alex
>>
>>
>
>


Re: Ingesting from Hadoop to Cassandra

2009-05-25 Thread Jonathan Ellis
waiting on <0x92a26e30> (a java.lang.ref.Reference$Lock)
> at java.lang.Object.wait(Object.java:485)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> - locked <0x92a26e30> (a java.lang.ref.Reference$Lock)
>
> "main" prio=10 tid=0x0805a800 nid=0x4c47 runnable [0xb7fea000..0xb7feb288]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> - locked <0x92ac9578> (a java.io.BufferedOutputStream)
> at
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:139)
> at
> org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:184)
> at org.apache.cassandra.service.column_t.write(column_t.java:321)
> at
> org.apache.cassandra.service.superColumn_t.write(superColumn_t.java:291)
> at
> org.apache.cassandra.service.batch_mutation_super_t.write(batch_mutation_super_t.java:365)
> at
> org.apache.cassandra.service.Cassandra$batch_insert_superColumn_args.write(Cassandra.java:9776)
> at
> org.apache.cassandra.service.Cassandra$Client.send_batch_insert_superColumn(Cassandra.java:546)
> at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.pushDocuments(CassandraImport.java:168)
> at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.sendOut(CassandraImport.java:146)
> at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:127)
> at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:1)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
> at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
>
> ]
>
> It looks like the client is waiting on a response from Cassandra but never
> gets it. Any ideas?  I had seen similar behavior in the Cassandra code prior
> to the 0.3 release candidate, b/c of a race condition in SelectorManager.
> It looks like this was taken care of in 0.3-rc, so I'm not sure what's going
> on here.
>
> Thanks,
> -Alex
>
> 
> From: Jonathan Ellis 
> To: cassandra-user@incubator.apache.org
> Sent: Thursday, May 21, 2009 9:42:29 AM
> Subject: Re: Ingesting from Hadoop to Cassandra
>
> No, batch APIs are per CF, not per row.
>
> Several people have asked Avinash for sample code using BinaryMemtable
> but to my knowledge nothing ever came of that.
>
> The high level description of the BMT is that you give it serialized
> CFs as values instead of raw columns so it can just sort on key and
> write directly to disk.  So then you would do something like this:
>
> Table table = Table.open(mytablename);
> ColumnFamilyStore store = table.getColumnFamilyStore(mycfname);
> for (ColumnFamily cf : mydata)
>     store.applyBinary(cf.key, toByteArray(cf));
>
> There's no provision for doing this over the network that I know of,
> you have to put the right keys on the right nodes manually.
>
> -Jonathan
>
> On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares 
> wrote:
>> Jonathan,
>>
>> Thanks for your thoughts.
>>
>> I've done some simple benchmarks with the batch insert apis and was
>> looking
>> for something slightly more performant.  Is there a batch row insert that
>> I
>> missed?
>>
>> Any pointers (at all) to anything related to FB's bulk loading or the
>> binarymemtable?  I've attempted to do this by writing a custom
>> IVerbHandler
>> for ingestion and interfacing with the MessagingService internally but
>> it's
>> not that clean.
>>
>> Thanks again,
>> -Alex
>>
>> 
>> From: Jonathan Ellis 
>> To: cassandra-user@incubator.apache.org
>> Sent: Thursday, May 21, 2009 7:44:59 AM
>> Subject: Re: Ingesting from Hadoop to Cassandra
>>
>> Have you benchmarked the batch insert apis?  If that is "fast enough"
>> then it's by far the simplest way to go.
>>
>> Otherwise you'll have to use the binarymemtable stuff which is
>> undocumented and not exposed as a client api (you basically write a
>> custom "loader" version of cassandra to use it, I think).  FB used
>> this for their own bulk loading so it works at some level, but clearly
>> there is some assembly required.
>>
>> -Jonathan
>>
>> On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares 
>> wrote:
>>> Hi all,
>>>
>>> I'm trying to find the most optimal way to ingest my content from Hadoop
>>> to
>>> Cassandra.  Assuming I have figured out the table representation for this
>>> content, what is the best way to go about pushing from my cluster?
>>> What
>>> Cassandra client batch APIs do you suggest I use to push to Cassandra?
>>> I'm
>>> sure this is a common pattern, I'm curious to see how it has been
>>> implemented.  Assume millions of rows and 1000s of columns.
>>>
>>> Thanks in advance,
>>> -Alex
>>>
>>>
>>
>>
>
>


Re: Ingesting from Hadoop to Cassandra

2009-05-26 Thread Jonathan Ellis
id=0x4c49 in
> Object.wait() [0x8fe4a000..0x8fe4afb0]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x92a26e30> (a java.lang.ref.Reference$Lock)
>     at java.lang.Object.wait(Object.java:485)
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>     - locked <0x92a26e30> (a java.lang.ref.Reference$Lock)
>
> "main" prio=10 tid=0x0805a800 nid=0x4c47 runnable [0xb7fea000..0xb7feb288]
>    java.lang.Thread.State: RUNNABLE
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     - locked <0x92ac9578> (a java.io.BufferedOutputStream)
>     at
> org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:139)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:184)
>     at org.apache.cassandra.service.column_t.write(column_t.java:321)
>     at
> org.apache.cassandra.service.superColumn_t.write(superColumn_t.java:291)
>     at
> org.apache.cassandra.service.batch_mutation_super_t.write(batch_mutation_super_t.java:365)
>     at
> org.apache.cassandra.service.Cassandra$batch_insert_superColumn_args.write(Cassandra.java:9776)
>     at
> org.apache.cassandra.service.Cassandra$Client.send_batch_insert_superColumn(Cassandra.java:546)
>     at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.pushDocuments(CassandraImport.java:168)
>     at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.sendOut(CassandraImport.java:146)
>     at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:127)
>     at
> com.yahoo.carmot.client.mapred.CassandraImport$PushReduce.reduce(CassandraImport.java:1)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>     at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
>
> ]
>
> It looks like the client is waiting on a response from Cassandra but never
> gets it. Any ideas?  I had seen similar behavior in the Cassandra code prior
> to the 0.3 release candidate, b/c of a race condition in SelectorManager.
> It looks like this was taken care of in 0.3-rc, so I'm not sure what's going
> on here.
>
> Thanks,
> -Alex
>
> 
> From: Jonathan Ellis 
> To: cassandra-user@incubator.apache.org
> Sent: Thursday, May 21, 2009 9:42:29 AM
> Subject: Re: Ingesting from Hadoop to Cassandra
>
> No, batch APIs are per CF, not per row.
>
> Several people have asked Avinash for sample code using BinaryMemtable
> but to my knowledge nothing ever came of that.
>
> The high level description of the BMT is that you give it serialized
> CFs as values instead of raw columns so it can just sort on key and
> write directly to disk.  So then you would do something like this:
>
> Table table = Table.open(mytablename);
> ColumnFamilyStore store = table.getColumnFamilyStore(mycfname);
> for (ColumnFamily cf : mydata)
>     store.applyBinary(cf.key, toByteArray(cf));
>
> There's no provision for doing this over the network that I know of,
> you have to put the right keys on the right nodes manually.
>
> -Jonathan
>
> On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares 
> wrote:
>> Jonathan,
>>
>> Thanks for your thoughts.
>>
>> I've done some simple benchmarks with the batch insert apis and was
>> looking
>> for something slightly more performant.  Is there a batch row insert that
>> I
>> missed?
>>
>> Any pointers (at all) to anything related to FB's bulk loading or the
>> binarymemtable?  I've attempted to do this by writing a custom
>> IVerbHandler
>> for ingestion and interfacing with the MessagingService internally but
>> it's
>> not that clean.
>>
>> Thanks again,
>> -Alex
>>
>> 
>> From: Jonathan Ellis 
>> To: cassandra-user@incubator.apache.org
>> Sent: Thursday, May 21, 2009 7:44:59 AM
>> Subject: Re: Ingesting from Hadoop to Cassandra
>>
>> Have you benchmarked the batch insert apis?  If that is "fast enough"
>> then it's by far the simplest way to go.
>>
>> Otherwise you'll have to use the binarymemtable stuff which is
>> undocumented and not exposed as a client api (you basically write a
>> custom "loader" version of cassandra to use it, I think).  FB used
>> this for their own bulk loading so it works at some level, but clearly
>> there is some assembly required.
>>
>> -Jonathan
>>
>> On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares 
>> wrote:
>>> Hi all,
>>>
>>> I'm trying to find the most optimal way to ingest my content from Hadoop
>>> to
>>> Cassandra.  Assuming I have figured out the table representation for this
>>> content, what is the best way to go about pushing from my cluster?
>>> What
>>> Cassandra client batch APIs do you suggest I use to push to Cassandra?
>>> I'm
>>> sure this is a common pattern, I'm curious to see how it has been
>>> implemented.  Assume millions of rows and 1000s of columns.
>>>
>>> Thanks in advance,
>>> -Alex
>>>
>>>
>>
>>
>
>


Re: Ingesting from Hadoop to Cassandra

2009-05-27 Thread Jonathan Ellis
On Wed, May 27, 2009 at 6:39 PM, Alexandre Linares  wrote:
> So it actually doesn't look blocked, but it's crawling.  Of course, in
> Hadoop, it always timed out (10 mins), before I could tell that it was
> crawling (I think)

So, back to the original hypothesis: you need to increase the memory
you are giving to the JVM, (in bin/cassandra.in.sh) or increase the
flush frequency (by lowering the memtable object count threshold).
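
(Concretely -- the values are illustrative and the config element name
is from memory of that era's storage-conf.xml, so double-check both:)

# bin/cassandra.in.sh: give the JVM more heap
JVM_OPTS="$JVM_OPTS -Xmx2G"

<!-- storage-conf.xml: flush memtables after fewer objects -->
<MemtableObjectCountInMillions>0.02</MemtableObjectCountInMillions>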

> Can you reproduce with a non-hadoop client program that you can share here?

BTW, I meant share the client code, not a client thread dump.  And
please use attachments for thread dumps or source files; it's really
impossible to read this thread on my phone with everything jammed into
the body. :)

-Jonathan


Re: Ingesting from Hadoop to Cassandra

2009-05-28 Thread Jonathan Ellis
I can't reproduce with this, there is too much unspecified.  (What is
a Document?  How do I get one?)

Attached is a short program that successfully does 100k supercolumn
inserts against a default configuration.  Can you create a program
like this for me to run?  (Java is fine; Python is just more concise.)

-Jonathan

On Thu, May 28, 2009 at 11:03 AM, Alexandre Linares  wrote:
> Jonathan, sorry for the lengthy emails! Hope this one's more readable.
>
> So I'm fairly convinced it's not a Cassandra-side configuration problem; at
> least not one that entails tweaking the object count threshold or the
> memtable size.
>
> Given the client code at http://pastie.org/492753 :
>
> from thrift.transport import TTransport
> from thrift.transport import TSocket
> from thrift.transport import THttpClient
> from thrift.protocol import TBinaryProtocol
>
> from cassandra import Cassandra
> from cassandra.ttypes import batch_mutation_t, batch_mutation_super_t, superColumn_t, column_t, NotFoundException, InvalidRequestException
>
> socket = TSocket.TSocket('localhost', 9160)
> transport = TTransport.TBufferedTransport(socket)
> protocol = TBinaryProtocol.TBinaryProtocol(transport)
> client = Cassandra.Client(protocol)
> transport.open()
>
> for i in xrange(10):
>     doc_id = str(i)
>     columns = [column_t('header', 'x'*1024, 0)]
>     cfmap = {'Super1': [superColumn_t(doc_id, columns)]}
>     client.batch_insert_superColumn(batch_mutation_t('Table1', doc_id, cfmap), True)
>     print i


Re: cassandra's performance?

2009-06-03 Thread Jonathan Ellis
We're basically in a roll-your-own benchmark state.  Johan can
probably give some pointers:
http://blog.oskarsson.nu/2009/05/vpork.html.  Also see the "how fast
is it" section here:
http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html

-Jonathan

On Wed, Jun 3, 2009 at 3:06 AM, lichun li  wrote:
> I can't find cassandra's performance data, such as throughput. Does
> anyone know where to find these data?
>
> --
> Sincerely yours,
>
> Lichun Li
> Mobile Life New Media Lab, BUPT
>
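
Pending a real harness like vpork, a toy roll-your-own measurement in the spirit of the advice above; it reuses the Python Thrift setup and insert loop from the "Ingesting from Hadoop" thread (client, column_t, superColumn_t, batch_mutation_t) and just wraps the loop in a timer. Treat the number as a rough floor, not a benchmark:

import time

N = 1000  # hypothetical sample size for a quick local measurement
start = time.time()
for i in xrange(N):
    doc_id = str(i)
    columns = [column_t('header', 'x' * 1024, 0)]
    cfmap = {'Super1': [superColumn_t(doc_id, columns)]}
    client.batch_insert_superColumn(batch_mutation_t('Table1', doc_id, cfmap), True)
elapsed = time.time() - start
print '%d blocking inserts in %.2fs (%.0f/s)' % (N, elapsed, N / elapsed)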


Re: cassandra's performance?

2009-06-03 Thread Jonathan Ellis
Cassandra is not designed to work memory-only.  It's designed
to use disk for durability and to accommodate using large sets of
data, letting the OS use memory as a huge cache for that.  For typical
data use patterns (where a relatively small amount is "hot") this will
be a much better use of hardware than memory-only.

On Wed, Jun 3, 2009 at 7:44 PM, lichun li  wrote:
> Thank you!
> The "how fast is it" section says:"In a nutshell, Cassandra is much
> faster than relational databases, and much slower than memory-only
> systems or systems that don't sync each update to disk."
> Can Cassandra work in a memory-only mode? Can it be done by just
> changing configuration?
>
> On Wed, Jun 3, 2009 at 10:38 PM, Jonathan Ellis  wrote:
>> We're basically in a roll-your-own benchmark state.  Johan can
>> probably give some pointers:
>> http://blog.oskarsson.nu/2009/05/vpork.html.  Also see the "how fast
>> is it" section here:
>> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html
>>
>> -Jonathan
>>
>> On Wed, Jun 3, 2009 at 3:06 AM, lichun li  wrote:
>>> I can't find cassandra's performance data, such as throughput. Does
>>> anyone know where to find these data?
>>>
>>> --
>>> Sincerely yours,
>>>
>>> Lichun Li
>>> Mobile Life New Media Lab, BUPT
>>>
>>
>
>
>
> --
> Sincerely yours,
>
> Lichun Li
> Mobile Life New Media Lab, BUPT
>


Re: cassandra's performance?

2009-06-03 Thread Jonathan Ellis
You'd still be hitting the transaction log, though.  (I assume the
logging you were talking about was the log4j kind, because you can't
turn off the xlog without hacking at the code right now.)

-Jonathan

On Wed, Jun 3, 2009 at 8:11 PM, Sandeep Tata  wrote:
> Apart from logging, given enough memory, you could get Cassandra to
> behave almost like an in-memory system.
>
> Turning off logging is relatively straightforward.
> If you turn off periodic flushing of memtables and have the thresholds
> high enough (a little more tricky), you're done -- chances are the
> read path and the write path will never hit disk.
>
>
> On Wed, Jun 3, 2009 at 5:48 PM, Jonathan Ellis  wrote:
>> Cassandra is not designed to work memory-only.  It's designed
>> to use disk for durability and to accommodate using large sets of
>> data, letting the OS use memory as a huge cache for that.  For typical
>> data use patterns (where a relatively small amount is "hot") this will
>> be a much better use of hardware than memory-only.
>>
>> On Wed, Jun 3, 2009 at 7:44 PM, lichun li  wrote:
>>> Thank you!
>>> The "how fast is it" section says:"In a nutshell, Cassandra is much
>>> faster than relational databases, and much slower than memory-only
>>> systems or systems that don't sync each update to disk."
>>> Can Cassandra work in a memory-only mode? Can it be done by just
>>> changing configuration?
>>>
>>> On Wed, Jun 3, 2009 at 10:38 PM, Jonathan Ellis  wrote:
>>>> We're basically in a roll-your-own benchmark state.  Johan can
>>>> probably give some pointers:
>>>> http://blog.oskarsson.nu/2009/05/vpork.html.  Also see the "how fast
>>>> is it" section here:
>>>> http://spyced.blogspot.com/2009/05/cassandra-03-release-candidate-and.html
>>>>
>>>> -Jonathan
>>>>
>>>> On Wed, Jun 3, 2009 at 3:06 AM, lichun li  wrote:
>>>>> I can't find cassandra's performance data, such as throughput. Does
>>>>> anyone know where to find these data?
>>>>>
>>>>> --
>>>>> Sincerely yours,
>>>>>
>>>>> Lichun Li
>>>>> Mobile Life New Media Lab, BUPT
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Sincerely yours,
>>>
>>> Lichun Li
>>> Mobile Life New Media Lab, BUPT
>>>
>>
>


Re: questions about operations

2009-06-04 Thread Jonathan Ellis
On Thu, Jun 4, 2009 at 12:33 AM, Thorsten von Eicken  
wrote:
> I'm looking at the cassandra data model and operations and I'm running into
> a number of questions I have not been able to answer:
>
> - what does get_columns_since do? I thought there's only one version of a
> column stored. I'm puzzled about the "since" aspect.

this is for use with time-sorted CFs or supercolumns -- it's like a
slice by time.

> - is the Thrift interface for get_superColumn correct? It seems to me that
> "3:string columnFamily" should really be "3:string
> columnFamily_superColumnName" (I know this doesn't have any functional
> impact, just makes it hard to understand what the operation does)
>
> - is the Thrift interface for get_slice_super correct? It seems to me that
> "3:string columnFamily_superColumnName" should really be "3:string
> columnFamily"

I think you're right.

> - what does get_key_range do? It looks like it returns a list of keys, but
> why does one have to specify a list of column family names?

The CF is the unit of data storage, so it will be more efficient if
you can narrow down which CFs you are interested in keys from.  But if
you pass an empty list it will scan all of them.

> - what does touch do?

It's intended to force the index information for the key in question
into an explicit LRU cache to save a seek on the next lookup, and also
get the row data into the OS fs cache.  But the first part is buggy
and the second part works poorly with large rows so it's going to be
removed in trunk RSN.

-Jonathan
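
To make the time-slice idea concrete, a hedged sketch against the StandardByTime1 column family from the stock config, reusing the Thrift client setup from the Hadoop-ingestion thread; it relies on the get_columns_since signature quoted later in this archive (tablename, key, columnParent, timeStamp) and assumes millisecond timestamps, as the Java examples in these threads use:

import time

# everything written under this key in the last hour
cutoff = long((time.time() - 3600) * 1000)
recent = client.get_columns_since('Table1', 'jsmith', 'StandardByTime1', cutoff)
for col in recent:
    print col.columnName, col.timestamp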


Re: questions about operations

2009-06-04 Thread Jonathan Ellis
On Thu, Jun 4, 2009 at 10:01 AM, Thorsten von Eicken  
wrote:
> Ah, got it, I forgot about the time-sorted CFs. So does this mean that if I
> call get_columns_since on a name-sorted CF I will get an invalid request
> exception? And also if I call get_slice_by_name_range or get_slice_by_names
> on a time-sorted CF? Or does the sorting only affect performance and not
> whether the operations are allowed or not?

My best guess from looking at the code (I haven't tested it) is that
it will try to fulfil the request on the "wrong" kind of CF, but I
don't think it actually handles that case correctly.

If you could verify that there is a bug here and file a JIRA ticket if
so, that would be helpful. :)

> Also, is there no get_slice_super_since and get_slice_super_by_name_range?

Right -- currently supercolumns are always name-sorted, and their
subcolumns are always time-sorted.

-Jonathan


Re: Database backstore

2009-06-11 Thread Jonathan Ellis
I suppose you could do that either directly from your client or with a
proxy, but if your rdbms can handle the write volume then just use
replication to handle the reads.  Typically people move to Cassandra
and other distributed dbs when they need to scale more writes than you
can do on an rdbms.

If possible, I think a better approach to "I don't trust this new
technology" is to keep a separate (distributed) log of your writes
somehow such that if you absolutely had to you could rebuild your
cassandra data from it.

Risk of corruption with Cassandra is much lower than most systems
since SSTables are immutable once written.

-Jonathan

On Thu, Jun 11, 2009 at 6:53 PM, testn wrote:
>
> Is it possible to persist the data into the database and use cassandra as a
> write-through cache? I wonder this because many organizations don't really
> quite believe in the reliability of disk storage (i.e. it can be corrupted). If
> Cassandra can load data from the database on the fly while persisting it into
> the database when writing, it would be perfect.
> --
> View this message in context: 
> http://n2.nabble.com/Database-backstore-tp3065200p3065200.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at 
> Nabble.com.
>
>


Re: Viability of running on EC2

2009-06-13 Thread Jonathan Ellis
IMO the biggest downside to running on EC2 is that IO is terrible.  I
haven't done benchmarks, but anecdotally disk performance in
particular seems like an order of magnitude slower than you'd get on
non-virtual disks.  So that is worth investigating before assuming
that the price/performance on EC2 is what you think it is.

Other than that, Cassandra is designed to emphasize availability so it
should work fine in the situations you describe.  Hinted handoff in
particular will get writes to the right nodes quickly when machines
come back online.  (However, Cassandra is not yet good at dealing with
machines becoming permanently dead.)

Of course if _all_ of some keys' replicas are temporarily partitioned
off from you, you won't be able to read that data until they are
visible again.

-Jonathan

On Sat, Jun 13, 2009 at 11:20 AM, Anthony
Molinaro wrote:
> Hi,
>
>  I was wondering what the viability of running cassandra on ec2 was.
> I believe that it currently runs on some pretty hefty hardware at
> facebook, so I'm wondering what the minimum hardware config is
> (in other words can I run it on a cluster of 2core 4GB machines)?
> Also, running on Amazon means no multicast, network partitions and
> machines just disappearing.  How does cassandra deal with these
> constraints/failures?
>
> Thanks for information,
>
> -Anthony
>
> --
> 
> Anthony Molinaro                           
>


Re: Viability of running on EC2

2009-06-13 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-208 is probably the
issue you are referring to.  It is fixed in trunk.

Our goal is to run most workloads fine with 1GB of heap out of the
box, which should be fine even on a small EC2 instance iirc.

See http://wiki.apache.org/cassandra/MemtableThresholds for tuning memory use.

-Jonathan

On Sat, Jun 13, 2009 at 3:10 PM, Anthony
Molinaro wrote:
> And any problems with small memory boxes?  I see some chatter on the
> cassandra development list about OOM errors.  Are they more prevalent
> on smaller footprint boxes?
>
> Thanks again,
>
> -Anthony
>
> On Sat, Jun 13, 2009 at 11:33:21AM -0500, Jonathan Ellis wrote:
>> IMO the biggest downside to running on EC2 is that IO is terrible.  I
>> haven't done benchmarks, but anecdotally disk performance in
>> particular seems like an order of magnitude slower than you'd get on
>> non-virtual disks.  So that is worth investigating before assuming
>> that the price/performance on EC2 is what you think it is.
>>
>> Other than that, Cassandra is designed to emphasize availability so it
>> should work fine in the situations you describe.  Hinted handoff in
>> particular will get writes to the right nodes quickly when machines
>> come back online.  (However, Cassandra is not yet good at dealing with
>> machines becoming permanently dead.)
>>
>> Of course if _all_ of some keys' replicas are temporarily partitioned
>> off from you, you won't be able to read that data until they are
>> visible again.
>>
>> -Jonathan
>>
>> On Sat, Jun 13, 2009 at 11:20 AM, Anthony
>> Molinaro wrote:
>> > Hi,
>> >
>> >  I was wondering what the viability of running cassandra on ec2 was.
>> > I believe that it currently runs on some pretty hefty hardware at
>> > facebook, so I'm wondering what the minimum hardware config is
>> > (in other words can I run it on a cluster of 2core 4GB machines)?
>> > Also, running on Amazon means no multicast, network partitions and
>> > machines just disappearing.  How does cassandra deal with these
>> > constraints/failures?
>> >
>> > Thanks for information,
>> >
>> > -Anthony
>> >
>> > --
>> > 
>> > Anthony Molinaro                           
>> >
>
> --
> 
> Anthony Molinaro                           
>


Re: Querying columns return strange characters

2009-06-15 Thread Jonathan Ellis
byte[].toString is not the inverse of String.getBytes; you need to use
new String(byte[]) for that.

fyi, the characters you see are

[: this is an array
B: of bytes
dcb03b: memory address

this will let you recognize such output in the future :)

-Jonathan

On Mon, Jun 15, 2009 at 11:26 AM, Ivan Chang wrote:
> I modified some test cases in the Cassandra distribution.  Specifically in
> the unit test package I modified ServerTest.java, basically just tried to
> insert some columns and retrieve them.  Here's part of the code:
>
>         RowMutation rm = new RowMutation("Table1", "partner0");
>         ColumnFamily cf = new ColumnFamily("Standard1", "Standard");
>         long now = Calendar.getInstance().getTimeInMillis();
>         System.out.println(now);
>         cf.addColumn("firstname", "John".getBytes(), now);
>         cf.addColumn("lastname", "Doe".getBytes(), now);
>         rm.add(cf);
>         try {
>             rm.apply();
>         } catch (Exception e) {
>         }
>
>         Table table = Table.open("Table1");
>
>         try {
>             Row result = table.getRow("partner0", "Standard1");
>             System.out.println(result.toString());
>             ColumnFamily cres = result.getColumnFamily("Standard1");
>             Map cols = cres.getColumns();
>             System.out.println(cols.size());
>             Set c = cols.keySet();
>             Iterator it = c.iterator();
>             while (it.hasNext()) {
>                 String cn = (String) it.next();
>                 System.out.println(cn);
>                 //ByteArrayOutputStream baos = new ByteArrayOutputStream();
>                 //DataOutputStream dos = new DataOutputStream(baos);
>                 //cres.getColumnSerializer().serialize(cres.getColumn(cn),
> dos);
>                 //dos.flush();
>                 //System.out.println(dos.size());
>                 //System.out.println(dos.toString());
>                 System.out.println(cres.getColumn(cn).value().toString());
>             }
>
> //System.out.println(cres.getColumn("firstname").value().toString());
>         } catch (Exception e) {
>             System.out.println(e.getMessage());
>         }
>
> In summary, it's a very simple code that inserts a row (key "partner0") with
> two columns: firstname (value "John"), lastname (value "Doe") to the
> Standard1 column family.  When I execute the test, I got the following
> output:
>
>    [testng] 1245082940509
>    [testng] Row(partner0 [ColumnFamily(Standard1
> [firstname:false:4@1245082940509, lastname:false:3@1245082940509]))]
>    [testng] 2
>    [testng] lastname
>    [testng] [B@dcb03b
>    [testng] firstname
>    [testng] [B@b60b93
>
> Everything looks fine, the columns were inserted.  However, the retrieved
> values were [B@dcb03b for lastname and [B@b60b93 for firstname, instead of
> what's inserted by the code ("Doe", "John").
>
> Anyone could give a clue as to why this happened?
>
> Thanks!
>
> Ivan
>


Re: Distributed filtering / aggregation

2009-06-17 Thread Jonathan Ellis
There's some preliminary support for running server-side filters (see
CalloutManager.java) but basically the first person who needs this
functionality gets to finish coding it up. :)

I'm happy to help you get started but it's not something we're going
to need soon.

-Jonathan

On Wed, Jun 17, 2009 at 10:46 AM, testn wrote:
>
> I don't see much documentation yet. But is there any chance that it can
> perform filtering (apart from Range Query) or aggregation remote?
> --
> View this message in context: 
> http://n2.nabble.com/Distributed-filtering---aggregation-tp3093626p3093626.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at 
> Nabble.com.
>
>


Re: Data persistency

2009-06-17 Thread Jonathan Ellis
You're using internal APIs.  Don't do that unless you know what you're doing. :)

The client API is in Cassandra.Client.

We have some sample code here: http://wiki.apache.org/cassandra/ClientExamples

(although none in Java yet, it should still be pretty clear.)

-Jonathan

On Wed, Jun 17, 2009 at 3:54 PM, Ivan Chang wrote:
> I tried to insert and retrieve data from a standalone Java program.  I am
> able to insert and retrieve the correct data within the same Java session,
> but after I terminate the session and rerun only the data retrieval part,
> the previously inserted data does not exist anymore and a null exception is
> thrown.  Here's the code:
>
>     // Get storage-config file location
>
> System.out.println("storage-config="+DatabaseDescriptor.getConfigFileName());
>
>     // Insert some data with key "partner1"
>     RowMutation rm = new RowMutation("Table1", "partner1");
>         ColumnFamily cf = new ColumnFamily("Standard1", "Standard");
>         long now = Calendar.getInstance().getTimeInMillis();
>         System.out.println(now);
>         cf.addColumn("firstname", "John1".getBytes(), now);
>         cf.addColumn("lastname", "Doe1".getBytes(), now);
>         rm.add(cf);
>         try {
>             rm.apply();
>         } catch (Exception e) {
>         }
>
>     // Retrieve data for key "partner1"
>         Table table = Table.open("Table1");
>
>         try {
>             Row result = table.getRow("partner1", "Standard1");
>             System.out.println(result.toString());
>             ColumnFamily cres = result.getColumnFamily("Standard1");
>             Map cols = cres.getColumns();
>             System.out.println(cols.size());
>             Set c = cols.keySet();
>             Iterator it = c.iterator();
>             while (it.hasNext()) {
>                 String cn = (String) it.next();
>                 System.out.println(cn);
>                 System.out.println(new String(cres.getColumn(cn).value()));
>             }
>         } catch (Exception e) {
>             System.out.println("Ex: " + e.getMessage());
>         }
>
> the print out from above is
>
> storage-config=~/Cassandra/trunk/conf/storage-conf.xml
> 1245270260114
> Row(partner1 [ColumnFamily(Standard1 [firstname:false:5@1245270260114,
> lastname:false:4@1245270260114]))]
> 2
> lastname
> Doe1
> firstname
> John1
>
> However, when I commented out the insert part of the above code and try
> retrieve data again by rerunning the main code, I got an exception:
>
> Row(partner1 [)]
> Ex: null
>
> So the data doesn't seem to persist across sessions.
>
> Could someone explain what's wrong with the code?
>
> Thanks,
> Ivan
>


Re: Data persistency

2009-06-18 Thread Jonathan Ellis
You don't.  Supercolumns are not arbitrarily nestable.

A columnfamily is either super or normal; a super columnfamily
contains supercolumns, which in turn contain Columns.  A normal
columnfamily contains Columns directly.  You can't mix-and-match
supercolumns and normal columns (at the same level of nesting) in a
single columnfamily.

-Jonathan

On Thu, Jun 18, 2009 at 12:12 PM, Ivan Chang wrote:
> Using Cassandra.Client works.  However, more questions arise, specifically
> regarding Super Columns.  While the following code persists the super column
> "sc1" with 3 simple columns, how do I create nested super columns?  A super
> column with multiple super columns and standard columns?  Thanks, Ivan
>
>             // Super Column
>     batch_mutation_super_t bt = new batch_mutation_super_t();
>     bt.key = "testkey";
>     bt.table = tablename_;
>     bt.cfmap = new HashMap<String, List<superColumn_t>>();
>     List<superColumn_t> superColumn_arr = new
> ArrayList<superColumn_t>();
>     List<column_t> column_arr2 = new ArrayList<column_t>();
>         column_arr2.add(new column_t("c1", "v1".getBytes(), now));
>         column_arr2.add(new column_t("c2", "v2".getBytes(), now));
>         column_arr2.add(new column_t("c3", "v3".getBytes(), now));
>     superColumn_arr.add(new superColumn_t("sc1", column_arr2));
>     bt.cfmap.put("Super1", superColumn_arr);
>     peerstorageClient.batch_insert_superColumn(bt, false);
>
> On Wed, Jun 17, 2009 at 5:01 PM, Jonathan Ellis  wrote:
>>
>> You're using internal APIs.  Don't do that unless you know what you're
>> doing. :)
>>
>> The client API is in Cassandra.Client.
>>
>> We have some sample code here:
>> http://wiki.apache.org/cassandra/ClientExamples
>>
>> (although none in Java yet, it should still be pretty clear.)
>>
>> -Jonathan
>>
>> On Wed, Jun 17, 2009 at 3:54 PM, Ivan Chang wrote:
>> > I tried to insert and retrieve data from a standalone Java program.  I
>> > am able to insert and retrieve the correct data within the same Java
>> > session, but after I terminate the session and rerun only the data
>> > retrieval part, the previously inserted data does not exist anymore and
>> > a null exception is thrown.  Here's the code:
>> >
>> >     // Get storage-config file location
>> >
>> >
>> > System.out.println("storage-config="+DatabaseDescriptor.getConfigFileName());
>> >
>> >     // Insert some data with key "partner1"
>> >     RowMutation rm = new RowMutation("Table1", "partner1");
>> >         ColumnFamily cf = new ColumnFamily("Standard1", "Standard");
>> >         long now = Calendar.getInstance().getTimeInMillis();
>> >         System.out.println(now);
>> >         cf.addColumn("firstname", "John1".getBytes(), now);
>> >         cf.addColumn("lastname", "Doe1".getBytes(), now);
>> >         rm.add(cf);
>> >         try {
>> >             rm.apply();
>> >         } catch (Exception e) {
>> >         }
>> >
>> >     // Retrieve data for key "partner1"
>> >         Table table = Table.open("Table1");
>> >
>> >         try {
>> >             Row result = table.getRow("partner1", "Standard1");
>> >             System.out.println(result.toString());
>> >             ColumnFamily cres = result.getColumnFamily("Standard1");
>> >             Map cols = cres.getColumns();
>> >             System.out.println(cols.size());
>> >             Set c = cols.keySet();
>> >             Iterator it = c.iterator();
>> >             while (it.hasNext()) {
>> >                 String cn = (String) it.next();
>> >                 System.out.println(cn);
>> >                 System.out.println(new
>> > String(cres.getColumn(cn).value()));
>> >             }
>> >         } catch (Exception e) {
>> >             System.out.println("Ex: " + e.getMessage());
>> >         }
>> >
>> > the print out from above is
>> >
>> > storage-config=~/Cassandra/trunk/conf/storage-conf.xml
>> > 1245270260114
>> > Row(partner1 [ColumnFamily(Standard1 [firstname:false:5@1245270260114,
>> > lastname:false:4@1245270260114]))]
>> > 2
>> > lastname
>> > Doe1
>> > firstname
>> > John1
>> >
>> > However, when I commented out the insert part of the above code and try
>> > retrieve data again by rerunning the main code, I got an exception:
>> >
>> > Row(partner1 [)]
>> > Ex: null
>> >
>> > So the data doesn't seem to persist across sessions.
>> >
>> > Could someone explain what's wrong with the code?
>> >
>> > Thanks,
>> > Ivan
>> >
>
>
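
A hedged Python rendering of the super-vs-normal rule above, reusing the Thrift types imported in the Hadoop-ingestion thread; the positional field order (table, key, cfmap) matches the batch_mutation_t call shown there, and the plain batch_insert method is an assumption modeled on batch_insert_superColumn:

now = 0  # a real client would pass a current timestamp in ms

# normal CF: cfmap values are flat lists of column_t
standard = batch_mutation_t('Table1', 'testkey',
                            {'Standard1': [column_t('c1', 'v1', now)]})
client.batch_insert(standard, True)  # assumed analogue of batch_insert_superColumn

# super CF: cfmap values are lists of superColumn_t; exactly one level of nesting
nested = batch_mutation_super_t('Table1', 'testkey',
    {'Super1': [superColumn_t('sc1', [column_t('c1', 'v1', now)])]})
client.batch_insert_superColumn(nested, True)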


Re: Database backstore

2009-06-22 Thread Jonathan Ellis
You have to give up a lot of optimizations when you say "we're going
to plug into any generic backend."  That is not something we are
interested in doing.

-Jonathan

On Mon, Jun 22, 2009 at 7:15 AM, testn wrote:
>
> It would be nice if we can plug in different backstore to it. Voldemort seems
> to be quite extensible that way and I think it's quite suitable for an
> application that has high read/write ratio.
>
>
> Jonathan Ellis wrote:
>>
>> I suppose you could do that either directly from your client or with a
>> proxy, but if your rdbms can handle the write volume then just use
>> replication to handle the reads.  Typically people move to Cassandra
>> and other distributed dbs when they need to scale more writes than you
>> can do on an rdbms.
>>
>> If possible, I think a better approach to "I don't trust this new
>> technology" is to keep a separate (distributed) log of your writes
>> somehow such that if you absolutely had to you could rebuild your
>> cassandra data from it.
>>
>> Risk of corruption with Cassandra is much lower than most systems
>> since SSTables are immutable once written.
>>
>> -Jonathan
>>
>> On Thu, Jun 11, 2009 at 6:53 PM, testn wrote:
>>>
>>> Is it possible to persist the data into the database and use cassandra
>>> as a
>>> write-through cache? I wonder this because many organizations don't really
>>> quite believe in the reliability of disk storage (i.e. it can be corrupted).
>>> If
>>> Cassandra can load data from the database on the fly while persisting it into
>>> the database when writing, it would be perfect.
>>> --
>>> View this message in context:
>>> http://n2.nabble.com/Database-backstore-tp3065200p3065200.html
>>> Sent from the cassandra-user@incubator.apache.org mailing list archive at
>>> Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://n2.nabble.com/Database-backstore-tp3065200p3135134.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at 
> Nabble.com.
>
>


Re: New table and column families

2009-06-23 Thread Jonathan Ellis
you'll need to

(a) make sure you have the latest trunk

(b) wipe your data, commitlog, and system directories, since adding
new tables or columnfamilies non-destructively is not yet supported
(see https://issues.apache.org/jira/browse/CASSANDRA-44)

-Jonathan

On Tue, Jun 23, 2009 at 8:55 AM, Ivan Chang wrote:
> I modified storage-config.xml to add a new table and couple column families
> (see excerpt below).  The new table added is identified by the name
> "NewTable" and associated column families "Standard3", "Super3", and
> "Super4".
>
>     
>     
>     
>     
>      FlushPeriodInMinutes="60"/>
>     
>     
>     
>      Name="Super1"/>
>      Name="Super2"/>
>     
>     
>     
>      Name="Super3"/>
>      Name="Super4"/>
>     
>     
>
> Here comes some code to insert some data, the goal is to feed Cassandra with
> data from an xml file.
> When I execute the code, I got an exception.  What I don't understand is why
> this code failed even though I have configured the super column families and new
> table etc.
>
> InvalidRequestException(why:Column Family Super3 is invalid.)
>     at
> org.apache.cassandra.service.Cassandra$get_column_result.read(Cassandra.java:3604)
>     at
> org.apache.cassandra.service.Cassandra$Client.recv_get_column(Cassandra.java:202)
>     at
> org.apache.cassandra.service.Cassandra$Client.get_column(Cassandra.java:178)
>     ...
>
>         // New Table Sample
>         String docID = "";
>         try {
>             batch_mutation_super_t bt = new batch_mutation_super_t();
>             bt.table = "NewTable";
>             bt.cfmap = new HashMap<String, List<superColumn_t>>();
>
>     // Read sample xml
>             XMLUtils xmlUtils = new XMLUtils(
>             System.getProperty("samples-xml-dir")
>             + System.getProperty("file.separator")
>             + "Sample.xml");
>
>             /* docID from xml file */
>             docID = xmlUtils.getNodeValue("/Document/docID");
>             bt.key = docID;
>
>     // Collect all nodes that match /Document/node1
>             NodeList nl = xmlUtils.getRequestedNodeList("/Document/node1");
>
>             StringWriter sw = new StringWriter();
>             Transformer t =
> TransformerFactory.newInstance().newTransformer();
>             t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
>             t.transform(new DOMSource(nl.item(0)), new StreamResult(sw));
>             sw.flush();
>             System.out.println(sw.toString());
>             sw.close();
>
>             /* nodes */
>     // see populate function below
>             List<column_t> nodes_arr = populate("node", "/Document/node1",
> xmlUtils, t);
>
>             List<superColumn_t> S3 = new ArrayList<superColumn_t>();
>
>             S3.add(new superColumn_t("sc1", nodes_arr));
>
>             bt.cfmap.put("Super3", S3);
>
>             List<superColumn_t> S4 = new ArrayList<superColumn_t>();
>
>             S4.add(new superColumn_t("sc1_replicate", nodes_arr));
>
>             bt.cfmap.put("Super4", S4);
>
>             peerstorageClient.batch_insert_superColumn(bt, false);
>
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>
>     // Returns columns of XML data matching xpath on given xml doc (via
> xmlUtils)
>     private static List<column_t> populate(String column_prefix, String
> xpath, XMLUtils xmlUtils, Transformer t) throws Exception {
>         StringWriter sw = new StringWriter();
>     List<column_t> c = new ArrayList<column_t>();
>         NodeList nl = xmlUtils.getRequestedNodeList(xpath);
>         long now = Calendar.getInstance().getTimeInMillis();
>         if (nl != null) {
>             for (int i = 0; i < nl.getLength(); i++) {
>                 sw = new StringWriter();
>                 t.transform(new DOMSource(nl.item(i)), new
> StreamResult(sw));
>                 sw.flush();
>                 System.out.println(sw.toString());
>                 c.add(new column_t(column_prefix+i,
> sw.toString().getBytes(), now));
>         sw.close();
>             }
>         }
>         return c;
>     }
>
> Thanks for checking this issue out.
>
> -Ivan


Re: Question about cassandra (replication)

2009-06-25 Thread Jonathan Ellis
Rather than post the same question verbatim, it would be more useful
if you explained what you still don't understand after Alexander and
Sandeep's explanations on the google group.

(http://groups.google.com/group/cassandra-user/browse_thread/thread/4330e415e959e9d9)

On Thu, Jun 25, 2009 at 9:11 AM, Harold Lim wrote:
>
> Hi All,
>
> I posted a similar message on the google groups page. Hopefully, I'll get 
> more feedback here.
>
>
> I just started reading about dynamo and Cassandra and I am thinking
> about possibly using cassandra for my system.
>
> I was reading the dynamo paper and they mentioned about a preference
> list for a particular key. Is this preference list configurable?
>
> How does Cassandra choose which nodes are in the preference list?
> Also, are the number of replica for each key/column configurable? For
> example, can I set the replication factor per key/value?
>
> I read that Cassandra has optimistic replication. What exactly does
> that mean? Underneath the hood, how does cassandra maintain/detect the
> number of replicas? Does it aggressively replicate an item when it
> detects that the number of replicas of a particular item goes below the
> specified replication factor?
>
> Is the replication strategy (when to replicate, aggresiveness, etc)
> configurable too?
>
>
>
>
>
>
> Thanks,
> Harold
>
>
>
>


Re: Question about cassandra (replication)

2009-06-25 Thread Jonathan Ellis
On Thu, Jun 25, 2009 at 10:10 AM, Harold Lim wrote:
>
> Hi,
>
> Is the replication factor configurable? For example, Can I configure the 
> replication factor per column-family (e.g., 5 for column-family a and 3 for 
> column-family b).

It is currently only configurable globally.  It may make sense to
configure on a table/namespace basis.  IMO it does not make sense on a
CF basis.

> Also, I am interested about the replication details. Sandeep wrote:
> "When there's a failure and the #of replicas for a given key goes down,
> Cassandra does not aggressively create a new copy for the data. The
> assumption is that the failed node will be replaced soon enough, and work
> can continue with the other 2 replicas."
>
> When and how does cassandra replicate when the replication count of a 
> particular data goes below the replication factor? How does it monitor the 
> replication count of a particular data?

Currently it re-replicates (repairs) lazily.  This is called "read
repair" and we follow essentially the model given in the Dynamo paper.

Non-lazy repair is being worked on at
https://issues.apache.org/jira/browse/CASSANDRA-193

-Jonathan


Re: schema example

2009-07-03 Thread Jonathan Ellis
get_columns_since

On Fri, Jul 3, 2009 at 7:21 PM, Evan Weaver wrote:
> This helps a lot.
>
> However, I can't find any API method that actually lets me do a
>> slice query on a time-sorted column family, as necessary for the second blog
> example. I get the following error on r789419:
>
> InvalidRequestException: get_slice_from requires CF indexed by name
>
> Evan
>
> On Tue, May 19, 2009 at 8:00 PM, Jonathan Ellis wrote:
>> Mail storage, man, I think pretty much anything I could come up with
>> would look pretty simplistic compared to what "real" systems do in
>> that domain. :)
>>
>> But blogs, I think I can handle those.  Let's make ours multiuser
>> or there isn't enough scale to make it interesting. :)
>>
>> The interesting thing here is we want to be able to query two things
>> efficiently:
>>  - the most recent posts belonging to a given blog, in reverse
>> chronological order
>>  - a single post and its comments, in chronological order
>>
>> At first glance you might think we can again reasonably do this with a
>> single CF, this time a super CF:
>>
>> 
>>
>> The key is the blog name, the supercolumns are posts and the
>> subcolumns are comments.  This would be reasonable BUT supercolumns
>> are just containers, they have no data or timestamp associated with
>> them directly (only through their subcolumns).  So you cannot sort a
>> super CF by time.
>>
>> So instead what I would do would be to use two CFs:
>>
>> 
>> 
>>
>> For the first, the keys used would be blog names, and the columns
>> would be the post titles and body.  So to get a list of most recent
>> posts you just do a slice query.  Even though Cassandra currently
>> handles large groups of columns sub-optimally, even with a blog
>> updated several times a day you'd be safe taking this approach (i.e.
>> we'll have that problem fixed before you start seeing it :).
>>
>> For the second, the keys are blog name.  The
>> columns are the comment data.  You can serialize these a number of
>> ways; I would probably use title as the column name and have the value
>> be the author + body (e.g. as a json dict).  Again we use the slice
>> call to get the comments in order.  (We will have to manually reverse
>> what slice gives us since time sort is always reverse chronological
>> atm, but the overhead of doing this in memory will be negligible.)
>>
>> Does this help?
>>
>> -Jonathan
>>
>> On Tue, May 19, 2009 at 11:49 AM, Evan Weaver  wrote:
>>> Even if it's not actually in real-life use, some examples for common
>>> domains would really help clarify things.
>>>
>>>  * blog
>>>  * email storage
>>>  * search index
>>>
>>> etc.
>>>
>>> Evan
>>>
>>> On Mon, May 18, 2009 at 8:19 PM, Jonathan Ellis  wrote:
>>>> Does anyone have a simple app schema they can share?
>>>>
>>>> I can't share the one for our main app.  But we do need an example
>>>> here.  A real one would be nice if we can find one.
>>>>
>>>> I checked App Engine.  They don't have a whole lot of examples either.
>>>>  They do have a really simple one:
>>>> http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
>>>>
>>>> The most important thing in Cassandra modeling is choosing a good key,
>>>> since that is what most of your lookups will be by.  Keys are also how
>>>> Cassandra scales -- Cassandra can handle effectively infinite keys
>>>> (given enough nodes obviously) but only thousands to millions of
>>>> columns per key/CF (depending on what API calls you use -- Jun is
>>>> adding one now that does not deseriailze everything in the whole CF
>>>> into memory.  The rest will need to follow this model eventually too).
>>>>
>>>> For this guestbook I think the choice is obvious: use the name as the
>>>> key, and have a single simple CF for the messages.  Each column will
>>>> be a message (you can even use the mandatory timestamp field as part
>>>> of your user-visible data.  win!).  You get the list (or page) of
>>>> users with get_key_range and then their messages with get_slice.
>>>>
>>>> 
>>>>
>>>> Anyone got another one for pedagogical purposes?
>>>>
>>>> -Jonathan
>>>>
>>>
>>>
>>>
>>> --
>>> Evan Weaver
>>>
>>
>
>
>
> --
> Evan Weaver
>
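
A hedged sketch of the guestbook reads described in the quoted thread above: page the keys with get_key_range, then pull each user's messages with get_slice. The get_key_range argument order (tablename, column families, start key, stop key, max results) is an assumption pieced together from the "questions about operations" thread, where it takes a list of column family names (empty list = scan all):

keys = client.get_key_range('Table1', ['Standard1'], '', '', 100)
for key in keys:
    messages = client.get_slice(tablename='Table1', key=key,
                                columnParent='Standard1', start='', finish='',
                                isAscending=True, offset=0, count=1000)
    for msg in messages:
        print key, msg.columnName, msg.value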


Re: schema example

2009-07-03 Thread Jonathan Ellis
On Fri, Jul 3, 2009 at 8:53 PM, Evan Weaver wrote:
> (From talking on IRC):
>
> I think this boils down to the offset/limit vs. token/limit debate.
>
> Token/limit is fine in all cases for me, but you still have to be able
> to query the head of the list (with a limit, but no token) to get
> started. Right now there is no facility for that on time-sorted column
> families:
>
>  list get_columns_since(1:string tablename, 2:string key,
> 3:string columnParent, 4:i64 timeStamp)

basically we need _since to add the kind of functionality we have in
Slice (or will, after 261 is committed).

it's probably better to get 240 (and 185 + 189) done sooner than later
though instead of wasting effort on an API we know is broken.

(the old get_slice could do basically anything since it deserialized
the entire CF into memory.  we're moving away from that to support
larger-than-memory CFs.)

-Jonathan
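
Until _since grows Slice-style arguments, token/limit paging over a name-sorted CF can be approximated with the existing get_slice by passing the last column name seen as the next call's start; whether start is inclusive is an assumption here, handled by dropping the boundary column from each subsequent page:

page = client.get_slice(tablename='Table1', key='jsmith',
                        columnParent='Standard1', start='', finish='',
                        isAscending=True, offset=0, count=10)
while page:
    for col in page:
        print col.columnName
    # resume from the last column seen; [1:] drops the duplicated boundary column
    page = client.get_slice(tablename='Table1', key='jsmith',
                            columnParent='Standard1', start=page[-1].columnName,
                            finish='', isAscending=True, offset=0, count=10)[1:]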


Re: [Announce] CassandraClient 0.1 for Ruby released

2009-07-04 Thread Jonathan Ellis
Nice!

On Sat, Jul 4, 2009 at 4:59 AM, Evan Weaver wrote:
> I am pleased to release:
>
> cassandra_client 0.1
>
> A Ruby client for the Cassandra distributed database.
>
> http://blog.evanweaver.com/files/doc/fauna/cassandra_client/
> http://github.com/fauna/cassandra_client/
>
> Evan
>
> --
> Evan Weaver
>


Re: cassandra Cli example from wiki error

2009-07-06 Thread Jonathan Ellis
This is a known problem in trunk.  It's fixed by the patch in issue
272, which should be applied tonight or tomorrow.

-Jonathan

On Mon, Jul 6, 2009 at 7:27 PM, Kevin
Castiglione wrote:
> hi
> i just got cassandra compiled.
> but the cli example from wiki is not working. the conf files are untouched.
> can you help me out here!
> thanks
>
> CLI output:
> ./cassandra-cli --host localhost --port 9160
> Connected to localhost/9160
> Welcome to cassandra CLI.
>
> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
> cassandra> set Table1.Standard1['jsmith']['first'] = 'John'
> Statement processed.
> cassandra> set Table1.Standard1['jsmith']['last'] = 'Smith'
> Statement processed.
> cassandra> set Table1.Standard1['jsmith']['age'] = '42'
> Statement processed.
> cassandra> get Table1.Standard1['jsmith']
> Error: CQL Execution Error
> cassandra>
>
>
>
>
>
> cassandra output
> sudo ./bin/cassandra -f
> Listening for transport dt_socket at address: 
> DEBUG - Loading settings from ./bin/../conf/storage-conf.xml
> DEBUG - adding Super1 as 0
> DEBUG - adding Standard2 as 1
> DEBUG - adding Standard1 as 2
> DEBUG - adding StandardByTime1 as 3
> DEBUG - adding LocationInfo as 4
> DEBUG - adding HintsColumnFamily as 5
> DEBUG - Starting to listen on 127.0.0.1:7001
> INFO - Cassandra starting up...
> DEBUG - Compiling CQL query ...
> DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'first')
> 'John')
> DEBUG - Executing CQL query ...
> DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
> DEBUG - Compiling CQL query ...
> DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'last')
> 'Smith')
> DEBUG - Executing CQL query ...
> DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
> DEBUG - Compiling CQL query ...
> DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'age') '42')
> DEBUG - Executing CQL query ...
> DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
> DEBUG - Compiling CQL query ...
> DEBUG - AST: (A_GET (A_COLUMN_ACCESS Table1 Standard1 'jsmith'))
> DEBUG - Executing CQL query ...
> DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1',
> key='jsmith', columnFamily='Standard1', isAscending='true', limit='-1',
> count='2147483647')
> ERROR - Exception was generated at : 07/06/2009 17:21:30 on thread
> pool-1-thread-1
> 1
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.cassandra.db.Table.getSliceFrom(Table.java:612)
>     at
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57)
>     at
> org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:600)
>     at
> org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:303)
>     at
> org.apache.cassandra.cql.common.ColumnRangeQueryRSD.getRows(ColumnRangeQueryRSD.java:101)
>     at
> org.apache.cassandra.cql.common.QueryPlan.execute(QueryPlan.java:41)
>     at
> org.apache.cassandra.cql.driver.CqlDriver.executeQuery(CqlDriver.java:45)
>     at
> org.apache.cassandra.service.CassandraServer.executeQuery(CassandraServer.java:491)
>     at
> org.apache.cassandra.service.Cassandra$Processor$executeQuery.process(Cassandra.java:1323)
>     at
> org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:839)
>     at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>
>
>
>
>
> svn version : Revision: 791656
>
> java -version
> java version "1.6.0_14"
> Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
> Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode, sharing)
>
>
>
>


Re: cassandra Cli example from wiki error

2009-07-06 Thread Jonathan Ellis
Sorry, 277 is the right issue.  Just one patch.

Once it's applied it will be in svn trunk.

On Mon, Jul 6, 2009 at 7:35 PM, Kevin
Castiglione wrote:
> thanks for this:
> http://issues.apache.org/jira/browse/CASSANDRA-272
>
> do i need to apply all 3 patches?
>
> or can you tell me which svn version i can use so that it is working?
> thanks again!
> On Mon, Jul 6, 2009 at 5:31 PM, Jonathan Ellis  wrote:
>>
>> This is a known problem in trunk.  It's fixed by the patch in issue
>> 272, which should be applied tonight or tomorrow.
>>
>> -Jonathan
>>
>> On Mon, Jul 6, 2009 at 7:27 PM, Kevin
>> Castiglione wrote:
>> > hi
>> > i just got cassandra compiled.
>> > but the cli example from wiki is not working. the conf files are
>> > untouched.
>> > can you help me out here!
>> > thanks
>> >
>> > CLI output:
>> > ./cassandra-cli --host localhost --port 9160
>> > Connected to localhost/9160
>> > Welcome to cassandra CLI.
>> >
>> > Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
>> > cassandra> set Table1.Standard1['jsmith']['first'] = 'John'
>> > Statement processed.
>> > cassandra> set Table1.Standard1['jsmith']['last'] = 'Smith'
>> > Statement processed.
>> > cassandra> set Table1.Standard1['jsmith']['age'] = '42'
>> > Statement processed.
>> > cassandra> get Table1.Standard1['jsmith']
>> > Error: CQL Execution Error
>> > cassandra>
>> >
>> >
>> >
>> >
>> >
>> > cassandra output
>> > sudo ./bin/cassandra -f
>> > Listening for transport dt_socket at address: 
>> > DEBUG - Loading settings from ./bin/../conf/storage-conf.xml
>> > DEBUG - adding Super1 as 0
>> > DEBUG - adding Standard2 as 1
>> > DEBUG - adding Standard1 as 2
>> > DEBUG - adding StandardByTime1 as 3
>> > DEBUG - adding LocationInfo as 4
>> > DEBUG - adding HintsColumnFamily as 5
>> > DEBUG - Starting to listen on 127.0.0.1:7001
>> > INFO - Cassandra starting up...
>> > DEBUG - Compiling CQL query ...
>> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'first')
>> > 'John')
>> > DEBUG - Executing CQL query ...
>> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
>> > DEBUG - Compiling CQL query ...
>> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'last')
>> > 'Smith')
>> > DEBUG - Executing CQL query ...
>> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
>> > DEBUG - Compiling CQL query ...
>> > DEBUG - AST: (A_SET (A_COLUMN_ACCESS Table1 Standard1 'jsmith' 'age')
>> > '42')
>> > DEBUG - Executing CQL query ...
>> > DEBUG - locally writing writing key jsmith to 127.0.0.1:7000
>> > DEBUG - Compiling CQL query ...
>> > DEBUG - AST: (A_GET (A_COLUMN_ACCESS Table1 Standard1 'jsmith'))
>> > DEBUG - Executing CQL query ...
>> > DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1',
>> > key='jsmith', columnFamily='Standard1', isAscending='true', limit='-1',
>> > count='2147483647')
>> > ERROR - Exception was generated at : 07/06/2009 17:21:30 on thread
>> > pool-1-thread-1
>> > 1
>> > java.lang.ArrayIndexOutOfBoundsException: 1
>> >     at org.apache.cassandra.db.Table.getSliceFrom(Table.java:612)
>> >     at
>> >
>> > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57)
>> >     at
>> >
>> > org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:600)
>> >     at
>> >
>> > org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:303)
>> >     at
>> >
>> > org.apache.cassandra.cql.common.ColumnRangeQueryRSD.getRows(ColumnRangeQueryRSD.java:101)
>> >     at
>> > org.apache.cassandra.cql.common.QueryPlan.execute(QueryPlan.java:41)
>> >     at
>> >
>> > org.apache.cassandra.cql.driver.CqlDriver.executeQuery(CqlDriver.java:45)
>> >     at
>> >
>> > org.apache.cassandra.service.CassandraServer.executeQuery(CassandraServer.java:491)
>> >     at
>> >
>> > org.apache.cassandra.service.Cassandra$Processor$executeQuery.process(Cassandra.java:1323)
>> >     at
>> >
>> > org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:839)
>> >     at
>> >
>> > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
>> >     at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >     at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >     at java.lang.Thread.run(Thread.java:619)
>> >
>> >
>> >
>> >
>> >
>> >
>> > svn version : Revision: 791656
>> >
>> > java -version
>> > java version "1.6.0_14"
>> > Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
>> > Java HotSpot(TM) Client VM (build 14.0-b16, mixed mode, sharing)
>> >
>> >
>> >
>> >
>
>


Re: problems with python client

2009-07-07 Thread Jonathan Ellis
you want

start=''
finish=''
offset=0

On Tue, Jul 7, 2009 at 8:01 AM, Kevin
Castiglione wrote:
> i have inserted a row into the table Table1 and Standard1 column family. And
> this works with the cassandra-cli
>
> cassandra> get Table1.Standard1['1']
> COLUMN_TIMESTAMP = 1246942866; COLUMN_VALUE = 24; COLUMN_KEY = age;
> COLUMN_TIMESTAMP = 1246943353; COLUMN_VALUE = Chris Goffinet; COLUMN_KEY =
> name;
> Statement processed.
>
>
> but if i try to get this data using the python client I get an empty list:
> client.get_slice(tablename='Table1', key='1', columnParent='Standard1',
>   start='0', finish='100', isAscending=True, offset=-1, count=1000)
> [ ]
>
> this is the output from cassandra
> DEBUG - weakreadlocal reading SliceFromReadCommand(table='Table1', key='1',
> columnFamily='Standard1', isAscending='true', limit='-1', count='1000')
> DEBUG - clearing
>
>
> also notice that the argument 'offset' in the python client is actually
> passed to cassandra as 'limit'.
>
>
> is there something im missing here?
> thanks
>
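
Spelled out against the call quoted above, the working version is (empty start/finish mean no bound, and offset=0 starts from the first column):

client.get_slice(tablename='Table1', key='1', columnParent='Standard1',
                 start='', finish='', isAscending=True, offset=0, count=1000)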


Re: problems with python client

2009-07-07 Thread Jonathan Ellis
On Tue, Jul 7, 2009 at 8:19 AM, Kevin
Castiglione wrote:
> thanks a lot for this! it works.
> can you pl. explain what start, finish, isAscending are?

start = column name to start with
finish = " " to stop with
ascending = order to return columns in

> also the value i pass to offset gets passed to cassandra as limit, is this
> expected?

not sure what you mean.


Re: problems with python client

2009-07-07 Thread Jonathan Ellis
On Tue, Jul 7, 2009 at 8:31 AM, Kevin
Castiglione wrote:
> you can see that i passed the value -1 to offset and in the cassandra server
> log, it is received as the argument limit.
> offset and limit mean different things right? is this a problem in python
> client? or am i missing something here?

ah, that just means I forgot to update toString on the java side. :)


Re: Up and Running with Cassandra

2009-07-07 Thread Jonathan Ellis
Before 0.4 is released. :)

The user-facing API is more of an immediate pain point (tickets 139,
185, 240), but the disk format change would be next in my mind.

-Jonathan

On Tue, Jul 7, 2009 at 1:06 PM, Kevin
Castiglione wrote:
> any ideas when this will happen?
> thanks
>
> On Tue, Jul 7, 2009 at 10:52 AM, Evan Weaver  wrote:
>>
>> It will; I don't think the change is committed yet.
>>
>> Evan
>>
>> On Tue, Jul 7, 2009 at 10:50 AM, Kevin
>> Castiglione wrote:
>> > thanks for this post!
>> >
>> > you have said that:
>> > the on-disk storage format is expected to change in version 0.4.0.
>> >
>> >
>> > im using svn latest revision 791696. will the on-disk storage format
>> > change
>> > affect this version?
>> >
>> > On Mon, Jul 6, 2009 at 11:18 PM, Evan Weaver  wrote:
>> >>
>> >> In case you missed it, a big introductory post:
>> >>
>> >>
>> >>
>> >> http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
>> >>
>> >> Evan
>> >>
>> >> --
>> >> Evan Weaver
>> >
>> >
>>
>>
>>
>> --
>> Evan Weaver
>
>


Re: problem running cassandra

2009-07-09 Thread Jonathan Ellis
what version are you trying to run?  on what platform?

On Thu, Jul 9, 2009 at 12:04 PM,  wrote:
> I did set it up as the readme file instructed but i encountered this error,
> Can you please suggest how i fix this
> thanks
>
> cassandra]$ bin/cassandra -f
> Listening for transport dt_socket at address: 
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/cassandra/service/CassandraDaemon
> Caused by: java.lang.ClassNotFoundException:
> org.apache.cassandra.service.CassandraDaemon
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> Could not find the main class:
> org.apache.cassandra.service.CassandraDaemon.  Program will exit.
>
>


Re: problem running cassandra

2009-07-09 Thread Jonathan Ellis
for 0.3 you can connect to the web interface on port 7002 (configurable).

In trunk we have removed the web interface in favor of JMX and
nodeprobe; see http://wiki.apache.org/cassandra/GettingStarted,
http://wiki.apache.org/cassandra/NodeProbe, and
http://wiki.apache.org/cassandra/MemtableThresholds

On Thu, Jul 9, 2009 at 1:00 PM,  wrote:
> Hey jonathan
> thanks a lot
> Fedora.
> I searched and found that the problem was I hadn't set up JAVA_HOME;
> once I set it up
> it worked immediately.
> But I'm trying to set up the cassandra web interface. Can you show me how to
> set up cassandra?
> Thanks a lot
>
> On Thu, Jul 9, 2009 at 10:27 AM, Jonathan Ellis  wrote:
>>
>> what version are you trying to run?  on what platform?
>>
>> On Thu, Jul 9, 2009 at 12:04 PM,  wrote:
>> > I did set it up as the readme file instructed but i encountered this
>> > error,
>> > Can you please suggest how i fix this
>> > thanks
>> >
>> > cassandra]$ bin/cassandra -f
>> > Listening for transport dt_socket at address: 
>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> > org/apache/cassandra/service/CassandraDaemon
>> > Caused by: java.lang.ClassNotFoundException:
>> > org.apache.cassandra.service.CassandraDaemon
>> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>> >     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>> > Could not find the main class:
>> > org.apache.cassandra.service.CassandraDaemon.  Program will exit.
>> >
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: problem running cassandra

2009-07-09 Thread Jonathan Ellis
because it was (a) buggy and (b) trying to do too many things at once,
all of which html was a poor fit for.

you can generate a bare-bones python client with thrift; see
http://wiki.apache.org/cassandra/ThriftInterface

the Digg guys are working on a more idiomatic python client.

On Thu, Jul 9, 2009 at 3:20 PM,  wrote:
> why was the web interface removed?
> Is there a simple python client for cassandra like python-couchdb
> thanks a lot
>
> On Thu, Jul 9, 2009 at 12:25 PM, Jonathan Ellis  wrote:
>>
>> for 0.3 you can connect to the web interface on port 7002 (configurable).
>>
>> In trunk we have removed the web interface in favor of JMX and
>> nodeprobe; see http://wiki.apache.org/cassandra/GettingStarted,
>> http://wiki.apache.org/cassandra/NodeProbe, and
>> http://wiki.apache.org/cassandra/MemtableThresholds
>>
>> On Thu, Jul 9, 2009 at 1:00 PM,  wrote:
>> > Hey jonathan
>> > thanks a lot
>> > Fedora.
>> > I searched and found that the problem was I hadn't set up JAVA_HOME;
>> > once I set it up
>> > it worked immediately.
>> > But I'm trying to set up the cassandra web interface. Can you show me how
>> > to
>> > set up cassandra?
>> > Thanks a lot
>> >
>> > On Thu, Jul 9, 2009 at 10:27 AM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> what version are you trying to run?  on what platform?
>> >>
>> >> On Thu, Jul 9, 2009 at 12:04 PM,  wrote:
>> >> > I did set it up as the readme file instructed but i encountered this
>> >> > error,
>> >> > Can you please suggest how i fix this
>> >> > thanks
>> >> >
>> >> > cassandra]$ bin/cassandra -f
>> >> > Listening for transport dt_socket at address: 
>> >> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> > org/apache/cassandra/service/CassandraDaemon
>> >> > Caused by: java.lang.ClassNotFoundException:
>> >> > org.apache.cassandra.service.CassandraDaemon
>> >> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>> >> >     at java.security.AccessController.doPrivileged(Native Method)
>> >> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> >> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>> >> >     at
>> >> > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> >> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>> >> >     at
>> >> > java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>> >> > Could not find the main class:
>> >> > org.apache.cassandra.service.CassandraDaemon.  Program will exit.
>> >> >
>> >> >
>> >
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>
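
For the "bare-bones python client" route mentioned above, generation is a single thrift invocation; the .thrift file path below is where it usually lives in a source checkout and is an assumption for yours:

thrift --gen py interface/cassandra.thrift

That drops the generated Cassandra.Client and ttypes modules under gen-py/.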


Re: How to answer queries of form "Give me the top 10 messages"

2009-07-10 Thread Jonathan Ellis
Have you read this?

http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

On Fri, Jul 10, 2009 at 4:43 PM,  wrote:
> Hey guys
> how do we answer queries of type - give me the top 10 messages
> or top 10 users and so on
> thanks
>
> Example: SuperColumns for Search Apps
>
> You can think of each supercolumn name as a term and the columns within as
> the docids with rank info and other attributes being a part of it. If you
> have keys as the userids then you can have a per-user index stored in this
> form. This is how the per user index for term search is laid out for Inbox
> search at Facebook. Furthermore since one has the option of storing data on
> disk sorted by "Time" it is very easy for the system to answer queries of
> the form "Give me the top 10 messages". For a pictorial explanation please
> refer to the Cassandra powerpoint slides presented at SIGMOD 2008.


Re: Can we connect to every node in cassandra ?

2009-07-12 Thread Jonathan Ellis
Every node assumes each other node listens on the same ports.  (This
might seem inflexible but it is actually a good policy to enforce.)
So just make sure those numbers are consistent across the cluster.

On Sun, Jul 12, 2009 at 5:31 PM,  wrote:
> Yes. There are more ports than just '9160' to consider. Gossip, Storage,
> UDP, etc. So as long as the other nodes have similar configs, just setting
> the IP's in the seed section is good enough.
> Thanks chris
> How do we specify 9160, gossip, storage, udp in the seeds xml section?
>
> On Sun, Jul 12, 2009 at 1:37 PM, Chris Goffinet  wrote:
>>
>> On Jul 12, 2009, at 1:34 PM, mobiledream...@gmail.com wrote:
>>
>> Say there are 4 nodes in cassandra, is it possible that we can send
>> insert/delete/update queries to any of the nodes?
>>
>>
>>
>> Yes. Using the default partitioner, it's designed to connect to any node.
>>
>>
>> Do all the data stores in other nodes need to be runnin on port 9160 as
>> there is not a way to specify port in the list of seeds
>>
>>
>> Yes. There are more ports than just '9160' to consider. Gossip, Storage,
>> UDP, etc. So as long as the other nodes have similar configs, just setting
>> the IP's in the seed section is good enough.
>>
>> Thanks a lot
>> --
>> Bidegg worlds best auction site
>> http://bidegg.com
>>
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: cassandra slows down after inserts

2009-07-13 Thread Jonathan Ellis
On Mon, Jul 13, 2009 at 12:37 AM, Sandeep Tata wrote:
> What hardware are you running on? How long does the slowdown last?
>  There are a few reasons for temporary slowdowns ... perhaps the JVM
> started GCing?

Every time someone has reported this symptom, that has been the problem.

The object count tunable is the most direct way to ameliorate this.

http://wiki.apache.org/cassandra/MemtableThresholds
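
For reference, the tunables that page covers live in conf/storage-conf.xml.  A
sketch, with element names as I recall them from this era (double-check against
your own config) -- lowering the object count makes memtables flush sooner,
which keeps the heap smaller and GC pauses shorter:

    <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
    <MemtableSizeInMB>64</MemtableSizeInMB>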


Re: cassandra slows down after inserts

2009-07-13 Thread Jonathan Ellis
See the wiki page I linked.

On Mon, Jul 13, 2009 at 8:06 AM, rkmr...@gmail.com wrote:
>
> how do i find out if JVM is GCing?
>
> On Sun, Jul 12, 2009 at 10:37 PM, Sandeep Tata 
> wrote:
>>
>> What hardware are you running on?
>
>  dual quadcore intel xeon 2.0 ghz, 32GB ram, and hardware raid config
> operating system is fedora core 9
>
>
>> How long does the slowdown last ?
>
> i stopped inserting data after slowdown starts, and it is still slow now
> after over 10 hours.
> however if i stop cassandra and start it, it becomes super fast immediately.
> till i insert another 100k or so rows when it becomes really slow again.
>
>
>>
>>  There are a few reasons for temporary slowdowns ... perhaps the JVM
>> started GCing?
>
> how do i find out if this is the cause?
>
>
>>
>> Cassandra spends time cleaning up the on-disk SSTables
>> in a process called compaction. This could cause the client to observe
>> a slowdown.
>>
>> Things you could try --
>> Reduce the Memtable size in the config files. (If GCing was the problem)
>> Increasing the number of SSTables written before compaction kicks in.
>
> can you tell me what numbers i should use?
>
> thanks!
>
>


Re: cassandra slows down after inserts

2009-07-13 Thread Jonathan Ellis
Cassandra is replaying the transaction log and preloading SSTable
indexes.  This is normal.

On Mon, Jul 13, 2009 at 8:10 AM, rkmr...@gmail.com wrote:
> when i stop cassandra and start it again, this is what is printed. it takes
> just a couple of seconds for this to run.
> and after that it becomes really fast.
>
>
> Listening for transport dt_socket at address: 
> DEBUG - Loading settings from ./../conf/storage-conf.xml
> DEBUG - adding Super1 as 0
> DEBUG - adding Standard2 as 1
> DEBUG - adding Standard1 as 2
> DEBUG - adding StandardByTime1 as 3
> DEBUG - adding LocationInfo as 4
> DEBUG - adding HintsColumnFamily as 5
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-9-Data.db: 400 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-52-Data.db: 300 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-92-Data.db: 300 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db: 751 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db: 100 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db: 50 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db: 100 ms.
> INFO - Compacting
> [/home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db]
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db: 0 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db: 50 ms.
> DEBUG - INDEX LOAD TIME for
> /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db: 0 ms.
> INFO - Replaying
> /home/mark/local/var/cassandra/commitlog/CommitLog-1247454203796.log
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db   : 73600
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db   : 84224
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db   : 94848
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db   : 105472
> DEBUG - Expected bloom filter size : 105472
> INFO - Compacted to
> /home/mark/local/var/cassandra/data/Table1-Super1-139-Data.db.  0/28831084
> bytes for 104856/104860 keys read/written.  Time: 8119ms.
> INFO - Flushing Memtable(Super1)@552364977
> DEBUG - Submitting Super1 for compaction
> INFO - Completed flushing Memtable(Super1)@552364977
> INFO - Flushing Memtable(Standard1)@1290243769
> DEBUG - Submitting Standard1 for compaction
> INFO - Completed flushing Memtable(Standard1)@1290243769
> INFO - Compacting
> [/home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db]
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db   : 256
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db   : 512
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db   : 768
> DEBUG - index size for bloom filter calc for file  :
> /home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db   : 1024
> DEBUG - Expected bloom filter size : 1024
> INFO - Compacted to
> /home/mark/local/var/cassandra/data/Table1-Standard1-3-Data.db.  0/210 bytes
> for 0/1 keys read/written.  Time: 301ms.
> DEBUG - Starting to listen on 127.0.0.1:7001
> INFO - Cassandra starting up...
>
>
>
> On Mon, Jul 13, 2009 at 6:06 AM, rkmr...@gmail.com 
> wrote:
>>
>> how do i find out if JVM is GCing?
>>
>> On Sun, Jul 12, 2009 at 10:37 PM, Sandeep Tata 
>> wrote:
>>>
>>> What hardware are you running on?
>>
>>  dual quadcore intel xeon 2.0 ghz, 32GB ram, and hardware raid config
>> operating system is fedora core 9
>>
>>
>>> How long does the slowdown last ?
>>
>> i stopped inserting data after slowdown starts, and it is still slow now
>> after over 10 hours.
>> however if i stop cassandra and start it, it becomes super fast
>> immediately. till i insert another 100k or so rows when it becomes really
>> slow again.
>>
>>
>>>
>>>  There are a few reasons for temporary slowdowns ... perhaps the JVM
>>> started GCing?
>>
>> how do i find out if this is the cause?
>>
>>
>>>
>>> Cassandra spends time cleaning up the on-disk SSTables
>>> in a process called compaction. This could c

Re: cassandra slows down after inserts

2009-07-13 Thread Jonathan Ellis
decrease

On Mon, Jul 13, 2009 at 8:53 AM, rkmr...@gmail.com wrote:
> On Mon, Jul 13, 2009 at 6:03 AM, Jonathan Ellis  wrote:
>>
>> On Mon, Jul 13, 2009 at 12:37 AM, Sandeep Tata
>> wrote:
>> > What hardware are you running on? How long does the slowdown last?
>> >  There are a few reasons for temporary slowdowns ... perhaps the JVM
>> > started GCing?
>>
>> Every time someone has reported this symptom, that has been the problem.
>>
>> The object count tunable is the most direct way to ameliorate this.
>>
>> http://wiki.apache.org/cassandra/MemtableThresholds
>
> this is my current setting:
>
>     <MemtableObjectCountInMillions>0.02</MemtableObjectCountInMillions>
>
> should i increase or decrease it?
>


Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-14 Thread Jonathan Ellis
On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote:
> Cassandra doesn't provide the guarantees about the latest changes being
> available from any given node, so you can't really use it in such an
> application.
>
> I don't know if the "blocking" variants of the write operations make any
> more guarantees, if they do then it might be suitable.

Yes, quorum write/read would work just fine here.

-Jonathan


Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-14 Thread Jonathan Ellis
There are several interesting values you can pass to block_for:

0: fire-and-forget.  minimizes latency when that is more important
than robustness
1: wait for at least one node to fully ack the write before returning
(the other replicas will be finished in the background)
N/2 + 1, where N is the number of replicas: this is a quorum write;
combined with quorum reads, it means you can tolerate up to N - (N/2 +
1) nodes failing before you can get inconsistent results.  (which is
usually better than no results at all.)
N: guarantees consistent reads without having to wait for a quorum, so
you trade write latency and availability (since the write will fail if
one of the target nodes is down) for 100% consistency and reduced read
latency

-Jonathan
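
To make that concrete: with N = 3 replicas, a quorum write passes block_for = 2.
A sketch using the insert signature from this era (table, key, column path,
value, timestamp, block_for); the table, key, and column names are illustrative:

    import time
    quorum = 3 / 2 + 1  # N/2 + 1 = 2 when N = 3 (integer division)
    client.insert('Table1', 'key1', 'Standard1:col1', 'value1', time.time(), quorum)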

On Tue, Jul 14, 2009 at 9:18 AM, Mark Robson wrote:
>
>
> 2009/7/14 Jonathan Ellis 
>>
>> On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote:
>> > Cassandra doesn't provide the guarantees about the latest changes being
>> > available from any given node, so you can't really use it in such an
>> > application.
>> >
>> > I don't know if the "blocking" variants of the write operations make any
>> > more guarantees, if they do then it might be suitable.
>>
>> Yes, quorum write/read would work just fine here.
>
> Are those the type of writes which you get by setting the "block" parameter
> to 1?
>
> Mark
>


Re: one server or more servers?

2009-07-14 Thread Jonathan Ellis
gossip distributes the cluster status.

the seeds are there to be an initial contact point.

On Tue, Jul 14, 2009 at 10:04 AM,  wrote:
> Hey mark
> thanks for the detailed reply explaining the example of Seeds
>
> How do we add servers other than Seeds as there is no such place in conf
> file
>
> thanks


Re: one server or more servers?

2009-07-14 Thread Jonathan Ellis
the new servers contact the seeds, not the other way around

On Tue, Jul 14, 2009 at 10:10 AM,  wrote:
> Mark and Jonathan
> I m lost here
> Don't we need to specify at least the server ip addresses in the conf file? How
> would cassandra know which ips the other servers are running on?
> I can see there is a way to specify seeds, but how would the seeds pick up the
> other servers if they do not know their ip addresses?
> Also, given the unlimited # of ips, it cannot just go through each of the ips
> and ping 7001
> Servers other than seeds are automatically picked up by the cluster when
> they start up; the nodes talk amongst themselves to figure out who's there.
> On Tue, Jul 14, 2009 at 8:06 AM, Mark Robson  wrote:
>>
>>
>> 2009/7/14 
>>>
>>> How do we add servers other than Seeds as there is no such place in conf
>>> file
>>
>> Servers other than seeds are automatically picked up by the cluster when
>> they start up; the nodes talk amongst themselves to figure out who's there.
>>
>> Only the seeds need to be explicitly configured.
>>
>> This is a Good Thing :)
>>
>> Mark
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: replica on in the beginning or added later

2009-07-14 Thread Jonathan Ellis
although the repair code Stu is working on
(https://issues.apache.org/jira/browse/CASSANDRA-193) could handle
increasing the replica count, IMO there's little sense in relying any
more on features that don't yet exist than necessary. :)

On Tue, Jul 14, 2009 at 10:17 AM,  wrote:
> as a followup question
> the items we are storing are extremely valuable and we are using cassandra
> as a sql replacement tool.. ie no more postgres and all data from cassandra,
> given cassandra scalability
> as we hit limits on postgres and found pgpool-II horizontal partitioning too
> clunky, and skype's plproxy requires too much rewiring of the client code.
> should we start with a replica factor of 1 and then increase the replica factor
> to 2, or is it prudent to start with a replica factor of 2?
> Can cassandra still replicate after running for a long time with a replica
> factor of 1, if we change the replica factor to say 2 after 2 months, when we
> add more nodes and figure there is enough space to replicate?
> thanks
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: replica on in the beginning or added later

2009-07-14 Thread Jonathan Ellis
Note that for N=2, quorum write is the same as block-for-all.  That is
why N=3 is more popular, because it allows for one node to be down but
still give you a quorum for any key.

-Jonathan
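
Working out the quorum arithmetic (quorum = N/2 + 1, integer division): for
N = 2, quorum = 2/2 + 1 = 2, i.e. every replica, so no node may be down; for
N = 3, quorum = 3/2 + 1 = 2, so one node can be down and both quorum reads
and quorum writes still succeed.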

On Tue, Jul 14, 2009 at 10:22 AM,  wrote:
> starting with replica count 2 is more prudent thanks
>
> On Tue, Jul 14, 2009 at 8:21 AM, Jonathan Ellis  wrote:
>>
>> although the repair code Stu is working on
>> (https://issues.apache.org/jira/browse/CASSANDRA-193) could handle
>> increasing the replica count, IMO there's little sense in relying any
>> more on features that don't yet exist than necessary. :)
>>
>> On Tue, Jul 14, 2009 at 10:17 AM,  wrote:
>> > as a followup question
>> > the items we are storing are extremely valuable and we are using
>> > cassandra
>> > as a sql replacement tool.. ie no more postgres and all data from
>> > cassandra,
>> > given cassandra scalability
>> > as we hit limits on postgres and found pgpool-II horizontal partitioning
>> > too clunky, and skype's plproxy requires too much rewiring of the client
>> > code.
>> > should we start with a replica factor of 1 and then increase the replica
>> > factor to 2, or is it prudent to start with a replica factor of 2?
>> > Can cassandra still replicate after running for a long time with a
>> > replica
>> > factor of 1, if we change the replica factor to say 2 after 2 months, when
>> > we
>> > add more nodes and figure there is enough space to replicate?
>> > thanks
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: problem running cassandra

2009-07-14 Thread Jonathan Ellis
the bind to port  was successful; the ones to the messagingservice
ports were not

On Tue, Jul 14, 2009 at 10:59 PM,  wrote:
> http://pastie.org/546395
>
> get this eror but
>
> cassandra]$  sudo netstat -apn | grep |wc -l
>
> is empty
>
> i wonder if this is a known issue
>
> thanks
>


Re: Best way to use a Cassandra Client in a multi-threaded environment?

2009-07-15 Thread Jonathan Ellis
IIRC thrift makes no effort to generate threadsafe code.

which makes sense in an rpc-oriented protocol really.

On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer wrote:
> Hello,
> Are there any recommendations on how to use Cassandra Clients in a
> multi-threaded front-end application (java)? Is the Client thread-safe or is
> it best to do a client per thread (or object pool of some sort)?
> Thanks,
> Joel


Re: Best way to use a Cassandra Client in a multi-threaded environment?

2009-07-15 Thread Jonathan Ellis
What I mean is, if you have

client.rpc1()


it doesn't really matter if you can do

client.rpc2()


from another thread or not, since it's dumb. :)

On Wed, Jul 15, 2009 at 7:41 PM, Ian Holsman wrote:
>
> On 16/07/2009, at 10:35 AM, Jonathan Ellis wrote:
>
>> IIRC thrift makes no effort to generate threadsafe code.
>>
>> which makes sense in an rpc-oriented protocol really.
>
> hmm.. not really. you can have a webserver calling a thrift backend quite
> easily, and then you would have 100+ threads all calling the same code.
>>
>> On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer wrote:
>>>
>>> Hello,
>>> Are there any recommendations on how to use Cassandra Clients in a
>>> multi-threaded front-end application (java)? Is the Client thread-safe or
>>> is
>>> it best to do a client per thread (or object pool of some sort)?
>>> Thanks,
>>> Joel
>
> --
> Ian Holsman
> i...@holsman.net
>
>
>
>


Re: Best way to use a Cassandra Client in a multi-threaded environment?

2009-07-15 Thread Jonathan Ellis
On Wed, Jul 15, 2009 at 8:13 PM, Ian Holsman wrote:
> ugh.
> if this is a byproduct of thrift

it is.

> we should have another way of getting to
> the backend.
> serialization is *not* a desired feature for most people ;-0

maybe not, but that's how every single database client works that I
can think of, so it shouldn't exactly be surprising.

you want multiple commands executing in parallel, you open multiple
connections.  not a Big Deal imo.

-Jonathan
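
A minimal connection-per-thread sketch in Python, assuming the generated
Cassandra.py bindings used elsewhere in this thread (module paths and names
are illustrative):

    import threading
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra

    _local = threading.local()

    def get_client(host='localhost', port=9160):
        # thrift clients are not threadsafe, so give each thread its own
        # socket, transport, and client instead of sharing one
        if not hasattr(_local, 'client'):
            transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
            transport.open()
            _local.client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
        return _local.client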


Re: one server or more servers?

2009-07-16 Thread Jonathan Ellis
the FAQ talks about using listenaddress: http://wiki.apache.org/cassandra/FAQ

On Thu, Jul 16, 2009 at 1:49 AM,  wrote:
> if i make listenaddress blank
> i get in one server
> binding to 127.0.0.1
> in 2nd server
> sometimes to the ip address of the server
> in 3rd server
> WARN - Exception was generated at : 07/16/2009 02:39:37 on thread GMFD:1
> Network is unreachable
> java.net.SocketException: Network is unreachable
>     at sun.nio.ch.DatagramChannelImpl.send0(Native Method)
>     at
> sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:319)
>     at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:299)
>     at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:268)
>     at
> org.apache.cassandra.net.UdpConnection.write(UdpConnection.java:88)
>     at
> org.apache.cassandra.net.MessagingService.sendUdpOneWay(MessagingService.java:469)
>     at
> org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(Gossiper.java:984)
>     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
>
> On Wed, Jul 15, 2009 at 11:31 PM,  wrote:
>>
>> someone in the group said a min of 2 seeds is necessary.
>>
>> i ll set the listenAddress to blank
>>
>> but i think it might be a problem of ports being blocked by fedora
>>
>> Can someone please list the ports used by cassandra to access the outside
>> seeds and find the ring network?
>>
>> And if there are any users using fedora - can you show me how to open
>> those ports so cassandra can gossip its way into a ring network
>>
>> right now i have 4 island cassandra nodes  :(
>>
>> On Wed, Jul 15, 2009 at 9:24 PM, Evan Weaver  wrote:
>>>
>>> Oh, yeah, definitely set ListenAddress to blank. 0.0.0.0 doesn't mean
>>> "all interfaces" for some reason I forget.
>>>
>>> Evan
>>>
>>> On Wed, Jul 15, 2009 at 9:23 PM, Evan Weaver wrote:
>>> > Try with only one seed. Not every host has to be in the seeds.
>>> >
>>> > Evan
>>> >
>>> > On Wed, Jul 15, 2009 at 8:52 PM,  wrote:
>>> >> in Seeds
>>> >> can we specify domain name instead of ip address
>>> >> right now seeds is specifying ip address
>>> >>
>>> >> On Wed, Jul 15, 2009 at 4:49 PM, Evan Weaver 
>>> >> wrote:
>>> >>>
>>> >>> I sometimes have to use 127.0.0.1, at least when ListenAddress is
>>> >>> blank (auto-discover). Dunno if that has changed.
>>> >>>
>>> >>> Looks like this if you're successful:
>>> >>>
>>> >>> $ bin/nodeprobe --host 10.224.17.13 ring
>>> >>> Token(124007023942663924846758258675932114665)  3 10.224.17.13  |<--|
>>> >>> Token(106858063638814585506848525974047690568)  3 10.224.17.19  |   ^
>>> >>> Token(141130545721235451315477340120224986045)  3 10.224.17.14  |-->|
>>> >>>
>>> >>> Evan
>>> >>>
>>> >>> On Wed, Jul 15, 2009 at 4:24 PM, Michael
>>> >>> Greene
>>> >>> wrote:
>>> >>> > The port you're looking for is typically 8080, but if you only
>>> >>> > specify
>>> >>> > the host and not the port it should work just fine.
>>> >>> >
>>> >>> > bin/nodeprobe -host localhost
>>> >>> >
>>> >>> > Michael
>>> >>> >
>>> >>> > On Wed, Jul 15, 2009 at 6:18 PM,  wrote:
>>> >>> >> bin]$ ./nodeprobe -host localhost -port 
>>> >>> >> Error connecting to remote JMX agent!
>>> >>> >> java.io.IOException: Failed to retrieve RMIServer stub:
>>> >>> >> javax.naming.CommunicationException [Root exception is
>>> >>> >> java.rmi.ConnectIOException: error during JRMP connection
>>> >>> >> establishment;
>>> >>> >> nested exception is:
>>> >>> >>         java.io.EOFException]
>>> >>> >>         at
>>> >>> >>
>>> >>> >> javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:342)
>>> >>> >>         at
>>> >>> >>
>>> >>> >>
>>> >>> >> javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
>>> >>> >>         at
>>> >>> >> org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:149)
>>> >>> >>         at
>>> >>> >> org.apache.cassandra.tools.NodeProbe.(NodeProbe.java:111)
>>> >>> >>         at
>>> >>> >> org.apache.cassandra.tools.NodeProbe.main(NodeProbe.java:470)
>>> >>> >> Caused by: javax.naming.CommunicationException [Root exception is
>>> >>> >> java.rmi.ConnectIOException: error during JRMP connection
>>> >>> >> establishment;
>>> >>> >> nested exception is:
>>> >>> >>         java.io.EOFException]
>>> >>> >>         at
>>> >>> >>
>>> >>> >>
>>> >>> >> com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
>>> >>> >>         at
>>> >>> >>
>>> >>> >>
>>> >>> >> com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:203)
>>> >>> >>         at
>>> >>> >> javax.naming.InitialContext.lookup(InitialContext.java:409)
>>> >>> >>         at
>>> >>> >>
>>> >>> >>
>>> >>> >> javax.management.remote.rmi.RMIConnect

Re: WARN - Unable to find a live Endpoint we might be out of live nodes , This is dangerous !!!!

2009-07-17 Thread Jonathan Ellis
Please don't repeat your question separately on -user, -dev, and irc.
If nobody answers it's either because we're busy or we don't know the
answer.

In this case it's probably a bit of both. :)

I've never heard of anyone running into this before so my guess is
it's something weird with your network configuration.  What happens if
you try to connect to 226.129.12.117:7001 from  226.229.123.185 (e.g.
with netcat), for instance?

If you want to get into the code, set your log to TRACE and it will
spit out a _ton_ of messages about gossip.
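
In this era the logging config is conf/log4j.properties; as a sketch, a line
like the following turns everything up (exact appender names may differ in
your checkout):

    log4j.rootLogger=TRACE,stdout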

On Fri, Jul 17, 2009 at 2:23 AM,  wrote:
> if i kill and start the cassandras again they are able to find each other
> but if they are left alone they lose track of each other! ie they are unable to
> find each other
>
> On Fri, Jul 17, 2009 at 12:20 AM,  wrote:
>>
>> Why do the other nodes go down, in each of the nodes if i run nodeprobe the
>> results show that the other nodes are down
>> 226.229.123.185:7001  up
>> 226.129.12.117:7001  down
>> 226.229.123.116:7001  down
>> 226.229.112.134:7001  down
>> Token(165434480505148814142836593307761304854)
>>
>> On Fri, Jul 17, 2009 at 12:18 AM,  wrote:
>>>
>>> What does this mean?
>>>
>>> DEBUG - clearing
>>> DEBUG - remove
>>> WARN - Unable to find a live Endpoint we might be out of live nodes ,
>>> This is dangerous !!!!
>>> WARN - Unable to find a live Endpoint we might be out of live nodes ,
>>> This is dangerous !!!!
>>> DEBUG - locally writing writing key tofu to 11.12.13.0:7000
>>> --
>>> Bidegg worlds best auction site
>>> http://bidegg.com
>>
>>
>>
>> --
>> Bidegg worlds best auction site
>> http://bidegg.com
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: Concurrent updates

2009-07-17 Thread Jonathan Ellis
This is the kind of inconsistency that vector clocks can handle but
the more simplistic timestamp-based resolution cannot.

Of test-and-set vs vector clocks, vector clocks fit cassandra much better.

-Jonathan

On Fri, Jul 17, 2009 at 9:59 AM, Jun Rao wrote:
> This is a case where a test-and-set feature would be useful. See the
> following JIRA. We just don't have it nailed down yet.
> https://issues.apache.org/jira/browse/CASSANDRA-48
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> jun...@almaden.ibm.com
>
> Ivan Chang 
>
>
> Ivan Chang 
>
> 07/17/2009 07:14 AM
>
> Please respond to
> cassandra-user@incubator.apache.org
>
> To
> cassandra-user@incubator.apache.org
> cc
>
> Subject
> Concurrent updates
> I have the following scenario that would like a best solution for.
>
> Here's the scenario:
>
> Table1.Standard1['cassandra']['frequency']
>
> it is used for keeping track of how many times the word "cassandra"
> appeared.
>
> Let's say we have a bunch of articles stored in Hadoop, a Map/Reduce greps
> all articles throughout the Hadoop cluster that matches the pattern
> ^cassandra$
> and updates Table1.Standard1['cassandra']['frequency'].  Hence
> Table1.Standard1['cassandra']['frequency'] will be updated concurrently.
>
> One of the issues I am facing is that
> Table1.Standard1['cassandra']['frequency']
> stores the count as a String (I am using Java), so in order to update the
> frequency
> properly, the thread that's running the Map/Reduce will have to retrieve
> Table1.Standard1['cassandra']['frequency'] in its native String format and
> hold
> that in temp (java String), convert into int, then add the new counts in,
> and finally
> "SET Table1.Standard1['cassandra']['frequency']. =  '" + temp.toString() +
> ''"
>
> During the entire process, how do we guarantee correctness under concurrency?  The Cql SET
> does
> not allow something like
>
> SET Table1.Standard1['cassandra']['frequency']. =
> Table1.Standard1['cassandra']['frequency']. + newCounts
>
> since there's only one String type.
>
> What would be the best solution in this situation?
>
> Thanks,
> Ivan
>
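
To make the race concrete, the read-modify-write each worker performs looks
roughly like this (a sketch; method names are from the 0.3-era thrift API as I
recall it, so check interface/cassandra.thrift):

    import time
    # NOT safe under concurrency: two workers can both read the same old
    # value and each write back old+1, losing one of the increments
    col = client.get_column('Table1', 'cassandra', 'Standard1:frequency')
    count = int(col.value) + 1  # parse the stored string, add this worker's count
    client.insert('Table1', 'cassandra', 'Standard1:frequency',
                  str(count), time.time(), 0)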


Re: WARN - Unable to find a live Endpoint we might be out of live nodes , This is dangerous !!!!

2009-07-17 Thread Jonathan Ellis
7000 is tcp

7001 is udp
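
So to probe each from another box (host is illustrative):

    nc -v somehost 7000       # storage port, tcp
    nc -u -v somehost 7001    # gossip control port, udp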

On Fri, Jul 17, 2009 at 12:34 PM,  wrote:
> Jonathan
>
> tmp]$ nc  -v 226.129.12.117 7001
> nc: connect to 226.129.12.117 port 7001 (tcp) failed: Connection refused
>  tmp]$ nc  -v 226.129.12.117 7001
> nc: connect to 226.129.12.117 port 7001 (tcp) failed: Connection refused
>
> I get a connect refused but is tcp the way to connect or is there a
> different way to use nc command ie using udp mode?
>
> if i do
> nc -u -v 226.129.12.117 7001
> it just hangs there
>
> /etc/hosts has the following in our servers
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1 localhost.localdomain localhost localhost
> ::1 localhost6.localdomain6 localhost6
>
> On Fri, Jul 17, 2009 at 6:09 AM, Jonathan Ellis  wrote:
>>
>> Please don't repeat your question separately on -user, -dev, and irc.
>> If nobody answers it's either because we're busy or we don't know the
>> answer.
>>
>> In this case it's probably a bit of both. :)
>>
>> I've never heard of anyone running into this before so my guess is
>> it's something weird with your network configuration.  What happens if
>> you try to connect to 226.129.12.117:7001 from  226.229.123.185 (e.g.
>> with netcat), for instance?
>>
>> If you want to get into the code, set your log to TRACE and it will
>> spit out a _ton_ of messages about gossip.
>>
>> On Fri, Jul 17, 2009 at 2:23 AM,  wrote:
>> > if i kill and start the cassandras again they are able to find each
>> > other
>> > but if they are left alone they lose track of each other! ie they are
>> > unable to
>> > find each other
>> >
>> > On Fri, Jul 17, 2009 at 12:20 AM,  wrote:
>> >>
>> >> Why do the other nodes go down, in each of the nodes if i run nodeprobe
>> >> the
>> >> results show that the other nodes are down
>> >> 226.229.123.185:7001  up
>> >> 226.129.12.117:7001  down
>> >> 226.229.123.116:7001  down
>> >> 226.229.112.134:7001  down
>> >> Token(165434480505148814142836593307761304854)
>> >>
>> >> On Fri, Jul 17, 2009 at 12:18 AM,  wrote:
>> >>>
>> >>> What does this mean?
>> >>>
>> >>> DEBUG - clearing
>> >>> DEBUG - remove
>> >>> WARN - Unable to find a live Endpoint we might be out of live nodes ,
>> >>> This is dangerous !!!!
>> >>> WARN - Unable to find a live Endpoint we might be out of live nodes ,
>> >>> This is dangerous !!!!
>> >>> DEBUG - locally writing writing key tofu to 11.12.13.0:7000
>> >>> --
>> >>> Bidegg worlds best auction site
>> >>> http://bidegg.com
>> >>
>> >>
>> >>
>> >> --
>> >> Bidegg worlds best auction site
>> >> http://bidegg.com
>> >
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-17 Thread Jonathan Ellis
HH is a mechanism to reduce inconsistency, but a node holding a HH row
while waiting for the "right" node to recover won't be part of the
group that is queried for it (since it could be anywhere).  So if you
set block_for to M and less than M of the actual replica destinations
are up, Cassandra will fail the write.

If you set block_for to zero, then writes will indeed never fail
(unless the node the client is talking to dies mid-action, of course).

-Jonathan

On Fri, Jul 17, 2009 at 3:01 PM, Vijay wrote:
> "since the write will fail if one of the target nodes is down"
> I thought hinted handoff would take care of this, right? The write will never
> fail; instead it will write to another node, right?
>
> correct me if i am wrong.
>
> Thanks and Regards,
> 
>
>
>
>
> On Tue, Jul 14, 2009 at 7:26 AM, Jonathan Ellis  wrote:
>>
>> N: guarantees consistent reads without having to wait for a quorum, so
>> you trade write latency and availability (since the write will fail if
>> one of the target nodes is down) for 100% consistency and reduced read
>> latency
>


Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-17 Thread Jonathan Ellis
On Fri, Jul 17, 2009 at 3:58 PM, Vijay wrote:
> Still confused,
>
> If i have a quorum write with block_for set to 3 and only 2 of the nodes are
> alive, i will still write to 3 nodes with HH, right?

yes, but only 2 will be available for reads, so the 3rd can't count
towards fulfilling block_for.

the semantics of block_for on read (R) and write (W) are that you have
strong consistency if R + W > N, where N is the number of replicas.  (see
http://www.allthingsdistributed.com/2007/12/eventually_consistent.html)

For this to hold in cassandra, we need to provide consistency where
(for instance) W = N and R = 1.  Remember that a HH write is not
available for reads.  This means that we need to fail the write if we
can't write the full N replicas to the right nodes.

(This is why quorum write + quorum read is often a better tradeoff in
practice since you can tolerate node failures w/o losing
availability.)

-Jonathan
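
Worked out for N = 3: quorum read plus quorum write gives R + W = 2 + 2 = 4 > 3,
so every read overlaps the latest successful write.  W = 3 with R = 1 gives
3 + 1 = 4 > 3 as well, but now any single node being down fails the write.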

> During a query, will it fail if i have block_for set to 3?
>
> Regards,
> 
>
>
>
>
> On Fri, Jul 17, 2009 at 1:36 PM, Jonathan Ellis  wrote:
>>
>> ck_for to zero, then writes will indeed never fail
>> (unless the node the client is ta
>


Re: python thrift cassandra: get_slice_super vs get_slice_super_by_names

2009-07-19 Thread Jonathan Ellis
I would guess because kw != 'tofu'

On Sun, Jul 19, 2009 at 12:24 AM,  wrote:
> Why doesnt res return ColumnFamily Related whereas res2 works just fine
> thanks?
>
> timestamp = time.time()
> res = client.get_slice_super('Table1', kw, 'Super1','','',True,0,1000)
> print res
> []
> res2 = client.get_slice_super_by_names('Table1', 'tofu', 'Super1',
> ['Related',])[0]
>
>
> print res2
> [superColumn_t(name='Related', columns=[column_t(columnName='tofu calories',
> value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
> column_t(columnName='tofu festival', value="(dp1\nS'count'\np2\nI1\ns.",
> timestamp=1247980687), column_t(columnName='tofu marinade',
> value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
> column_t(columnName='tofu recipe', value="(dp1\nS'count'\np2\nI1\ns.",
> timestamp=1247980687), column_t(columnName='tofu recipes',
> value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
> column_t(columnName='tofu recipes easy', value="(dp1\nS'count'\np2\nI1\ns.",
> timestamp=1247980687), column_t(columnName='tofu scramble',
> value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
> column_t(columnName='tofu stir fry', value="(dp1\nS'count'\np2\nI1\ns.",
> timestamp=1247980687), column_t(columnName='tofurkey',
> value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
> column_t(columnName='tofutti', value="(dp1\nS'count'\np2\nI1\ns.",
> timestamp=1247980687)])]
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: [mangled UTF-8 subject] thrift.Thrift.TApplicationException: Internal error processing insert

2009-07-19 Thread Jonathan Ellis
That should be partially solved in trunk now that 139 is committed,
and more solved when we commit 185 soon.

On Sun, Jul 19, 2009 at 3:43 AM,  wrote:
> Any utf-8 keyword causes cassandra to crash!
>


Re: how to delete an entire column family

2009-07-19 Thread Jonathan Ellis
iterate through the keys with get_key_range, and delete the row
associated with each key

On Sun, Jul 19, 2009 at 3:51 AM,  wrote:
> In Super-column family Super1 there is a column family Related
> How do i delete the entire related column family
> thanks
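
A sketch of that loop, with get_key_range and remove signatures as I recall
them from this era (verify against interface/cassandra.thrift); the table and
column family names follow the question:

    import time
    # page through up to 10000 keys, then drop the Related data row by row
    keys = client.get_key_range('Table1', '', '', 10000)
    for key in keys:
        client.remove('Table1', key, 'Super1:Related', time.time(), 0)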


Re: python thrift cassandra: get_slice_super vs get_slice_super_by_names

2009-07-19 Thread Jonathan Ellis
Strange.  If you can post a script showing how to reproduce the
problem from a fresh database then I can debug it.

On Sun, Jul 19, 2009 at 11:23 AM,  wrote:
> Jon, i should have mentioned kw is 'tofu'
> that is why it doesn't look right
>
> On Sun, Jul 19, 2009 at 6:08 AM, Jonathan Ellis  wrote:
>>
>> I would guess because kw != 'tofu'
>>
>> On Sun, Jul 19, 2009 at 12:24 AM,  wrote:
>> > Why doesnt res return ColumnFamily Related whereas res2 works just fine
>> > thanks?
>> >
>> > timestamp = time.time()
>> > res = client.get_slice_super('Table1', kw, 'Super1','','',True,0,1000)
>> > print res
>> > []
>> > res2 = client.get_slice_super_by_names('Table1', 'tofu', 'Super1',
>> > ['Related',])[0]
>> >
>> >
>> > print res2
>> > [superColumn_t(name='Related', columns=[column_t(columnName='tofu
>> > calories',
>> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
>> > column_t(columnName='tofu festival', value="(dp1\nS'count'\np2\nI1\ns.",
>> > timestamp=1247980687), column_t(columnName='tofu marinade',
>> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
>> > column_t(columnName='tofu recipe', value="(dp1\nS'count'\np2\nI1\ns.",
>> > timestamp=1247980687), column_t(columnName='tofu recipes',
>> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
>> > column_t(columnName='tofu recipes easy',
>> > value="(dp1\nS'count'\np2\nI1\ns.",
>> > timestamp=1247980687), column_t(columnName='tofu scramble',
>> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
>> > column_t(columnName='tofu stir fry', value="(dp1\nS'count'\np2\nI1\ns.",
>> > timestamp=1247980687), column_t(columnName='tofurkey',
>> > value="(dp1\nS'count'\np2\nI1\ns.", timestamp=1247980687),
>> > column_t(columnName='tofutti', value="(dp1\nS'count'\np2\nI1\ns.",
>> > timestamp=1247980687)])]
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')

2009-07-19 Thread Jonathan Ellis
Don't run trunk if you're not going to read "svn log."

The api changed with the commit of the 139 patches (and it will change
again with the 185 ones).

look at interface/cassandra.thrift to see what arguments are expected.

On Sun, Jul 19, 2009 at 3:31 PM,  wrote:
> Hey Gasol wu
> i regenerated the new thrift interface using
> thrift -gen py cassandra.thrift
>
>
>
> client.insert('Table1', 'tofu', 'Super1:Related:tofu stew',
> pickle.dumps(dict(count=1)), time.time(), 0)
> ---
> AttributeError    Traceback (most recent call last)
>
> /home/mark/work/cexperiments/ in ()
>
> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key,
> column_path, value, timestamp, block_for)
>     358  - block_for
>     359 """
> --> 360 self.send_insert(table, key, column_path, value, timestamp,
> block_for)
>     361 self.recv_insert()
>     362
>
> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table,
> key, column_path, value, timestamp, block_for)
>     370 args.timestamp = timestamp
>     371 args.block_for = block_for
> --> 372 args.write(self._oprot)
>     373 self._oprot.writeMessageEnd()
>     374 self._oprot.trans.flush()
>
> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>    1923 if self.column_path != None:
>    1924   oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
> -> 1925   self.column_path.write(oprot)
>    1926   oprot.writeFieldEnd()
>    1927 if self.value != None:
>
> AttributeError: 'str' object has no attribute 'write'
>
>
> On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu  wrote:
>>
>> hi,
>> the cassandra.thrift has changed.
>> u need to generate new python client and compile class again.
>>
>>
>> On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
>>>
>>> Hi guys
>>> the new trunk cassandra doesnt work for a simple insert, how do we get
>>> this working
>>> client.insert('Table1', 'tofu', 'Super1:Related:tofu
>>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>>
>>> ---
>>> AttributeError                            Traceback (most recent call
>>> last)
>>> /home/mark/work/cexperiments/ in ()
>>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key,
>>> column_path, value, timestamp, block_for)
>>>     358      - block_for
>>>     359     """
>>> --> 360     self.send_insert(table, key, column_path, value, timestamp,
>>> block_for)
>>>     361     self.recv_insert()
>>>     362
>>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table,
>>> key, column_path, value, timestamp, block_for)
>>>     370     args.timestamp = timestamp
>>>     371     args.block_for = block_for
>>> --> 372     args.write(self._oprot)
>>>     373     self._oprot.writeMessageEnd()
>>>     374     self._oprot.trans.flush()
>>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>>>    1923     if self.column_path != None:
>>>    1924       oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>>> -> 1925       self.column_path.write(oprot)
>>>    1926       oprot.writeFieldEnd()
>>>    1927     if self.value != None:
>>> AttributeError: 'str' object has no attribute 'write'
>>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu
>>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>>
>>> --
>>> Bidegg worlds best auction site
>>> http://bidegg.com
>>
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')

2009-07-19 Thread Jonathan Ellis
For the record, this is not actually a bug, and if you're not sure,
asking on the list before filing a report in JIRA is probably a good
thing.

On Sun, Jul 19, 2009 at 6:45 PM, Ian Holsman wrote:
> hi mobile.
> is it possible to put these as JIRA bugs ? instead of just mailing them on
> the list ?
>
> that way people can give them a bit more attention. and other people who
> have the same issue will be easily see what is going on.
>
> the URL is here :- https://issues.apache.org/jira/browse/CASSANDRA
> regards
> Ian
>
> On 20/07/2009, at 6:36 AM, mobiledream...@gmail.com wrote:
>
>> ok
>> so which is the version where cassandra python thrift works out of the box
>> thanks
>>
>> On 7/19/09, Jonathan Ellis  wrote: Don't run trunk if
>> you're not going to read "svn log."
>>
>> The api changed with the commit of the 139 patches (and it will change
>> again with the 185 ones).
>>
>> look at interface/cassandra.thrift to see what arguments are expected.
>>
>>
>> On Sun, Jul 19, 2009 at 3:31 PM,  wrote:
>> > Hey Gasol wu
>> > i regenerated the new thrift interface using
>> > thrift -gen py cassandra.thrift
>> >
>> >
>> >
>> > client.insert('Table1', 'tofu', 'Super1:Related:tofu stew',
>> > pickle.dumps(dict(count=1)), time.time(), 0)
>> >
>> > ---
>> > AttributeError                            Traceback (most recent call
>> > last)
>> >
>> > /home/mark/work/cexperiments/ in ()
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in insert(self, table,
>> > key,
>> > column_path, value, timestamp, block_for)
>> >     358      - block_for
>> >     359     """
>> > --> 360     self.send_insert(table, key, column_path, value, timestamp,
>> > block_for)
>> >     361     self.recv_insert()
>> >     362
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in send_insert(self,
>> > table,
>> > key, column_path, value, timestamp, block_for)
>> >     370     args.timestamp = timestamp
>> >     371     args.block_for = block_for
>> > --> 372     args.write(self._oprot)
>> >     373     self._oprot.writeMessageEnd()
>> >     374     self._oprot.trans.flush()
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>> >    1923     if self.column_path != None:
>> >    1924       oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>> > -> 1925       self.column_path.write(oprot)
>> >    1926       oprot.writeFieldEnd()
>> >    1927     if self.value != None:
>> >
>> > AttributeError: 'str' object has no attribute 'write'
>> >
>> >
>> > On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu  wrote:
>> >>
>> >> hi,
>> >> the cassandra.thrift has changed.
>> >> u need to generate new python client and compile class again.
>> >>
>> >>
>> >> On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
>> >>>
>> >>> Hi guys
>> >>> the new trunk cassandra doesnt work for a simple insert, how do we get
>> >>> this working
>> >>> client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>> >>>
>> >>>
>> >>> ---
>> >>> AttributeError                            Traceback (most recent call
>> >>> last)
>> >>> /home/mark/work/cexperiments/ in ()
>> >>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table,
>> >>> key,
>> >>> column_path, value, timestamp, block_for)
>> >>>     358      - block_for
>> >>>     359     """
>> >>> --> 360     self.send_insert(table, key, column_path, value,
>> >>> timestamp,
>> >>> block_for)
>> >>>     361     self.recv_insert()
>> >>>     362
>> >>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self,
>> >>> table,
>> >>> key, column_path, value, timestamp, block_for)
>> >>>     370     args.timestamp = timestamp
>> >>>     371     args.block_for = block_for
>> >>> --> 372     args.write(self._oprot)
>> >>>     373     self._oprot.writeMessageEnd()
>> >>>     374     self._oprot.trans.flush()
>> >>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>> >>>    1923     if self.column_path != None:
>> >>>    1924       oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>> >>> -> 1925       self.column_path.write(oprot)
>> >>>    1926       oprot.writeFieldEnd()
>> >>>    1927     if self.value != None:
>> >>> AttributeError: 'str' object has no attribute 'write'
>> >>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>> >>>
>> >>> --
>> >>> Bidegg worlds best auction site
>> >>> http://bidegg.com
>> >>
>> >
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>>
>>
>>
>> --
>> Bidegg worlds best auction site
>> http://bidegg.com
>
> --
> Ian Holsman
> i...@holsman.net
>
>
>
>


Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')

2009-07-19 Thread Jonathan Ellis
It works fine, it's just not the same as it was two weeks ago.

On Sun, Jul 19, 2009 at 3:36 PM,  wrote:
> ok
> so which is the version where cassandra python thrift works out of the box
> thanks
>
> On 7/19/09, Jonathan Ellis  wrote:
>>
>> Don't run trunk if you're not going to read "svn log."
>>
>> The api changed with the commit of the 139 patches (and it will change
>> again with the 185 ones).
>>
>> look at interface/cassandra.thrift to see what arguments are expected.
>>
>>
>> On Sun, Jul 19, 2009 at 3:31 PM,  wrote:
>> > Hey Gasol wu
>> > i regenerated the new thrift interface using
>> > thrift -gen py cassandra.thrift
>> >
>> >
>> >
>> > client.insert('Table1', 'tofu', 'Super1:Related:tofu stew',
>> > pickle.dumps(dict(count=1)), time.time(), 0)
>> >
>> > ---
>> > AttributeError    Traceback (most recent call
>> > last)
>> >
>> > /home/mark/work/cexperiments/ in ()
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in insert(self, table,
>> > key,
>> > column_path, value, timestamp, block_for)
>> > 358  - block_for
>> > 359 """
>> > --> 360 self.send_insert(table, key, column_path, value, timestamp,
>> > block_for)
>> > 361 self.recv_insert()
>> > 362
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in send_insert(self,
>> > table,
>> > key, column_path, value, timestamp, block_for)
>> > 370 args.timestamp = timestamp
>> > 371 args.block_for = block_for
>> > --> 372 args.write(self._oprot)
>> > 373 self._oprot.writeMessageEnd()
>> > 374 self._oprot.trans.flush()
>> >
>> > /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>> >1923 if self.column_path != None:
>> >1924   oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>> > -> 1925   self.column_path.write(oprot)
>> >1926   oprot.writeFieldEnd()
>> >1927 if self.value != None:
>> >
>> > AttributeError: 'str' object has no attribute 'write'
>> >
>> >
>> > On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu  wrote:
>> >>
>> >> hi,
>> >> the cassandra.thrift has changed.
>> >> u need to generate new python client and compile class again.
>> >>
>> >>
>> >> On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
>> >>>
>> >>> Hi guys
>> >>> the new trunk cassandra doesnt work for a simple insert, how do we get
>> >>> this working
>> >>> client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>> >>>
>> >>>
>> >>> ---
>> >>> AttributeError    Traceback (most recent call
>> >>> last)
>> >>> /home/mark/work/cexperiments/ in ()
>> >>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table,
>> >>> key,
>> >>> column_path, value, timestamp, block_for)
>> >>> 358  - block_for
>> >>> 359 """
>> >>> --> 360 self.send_insert(table, key, column_path, value,
>> >>> timestamp,
>> >>> block_for)
>> >>> 361 self.recv_insert()
>> >>> 362
>> >>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self,
>> >>> table,
>> >>> key, column_path, value, timestamp, block_for)
>> >>> 370 args.timestamp = timestamp
>> >>> 371 args.block_for = block_for
>> >>> --> 372 args.write(self._oprot)
>> >>> 373 self._oprot.writeMessageEnd()
>> >>> 374 self._oprot.trans.flush()
>> >>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>> >>>1923 if self.column_path != None:
>> >>>1924   oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>> >>> -> 1925   self.column_path.write(oprot)
>> >>>1926   oprot.writeFieldEnd()
>> >>>1927 if self.value != None:
>> >>> AttributeError: 'str' object has no attribute 'write'
>> >>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> >>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>> >>>
>> >>> --
>> >>> Bidegg worlds best auction site
>> >>> http://bidegg.com
>> >>
>> >
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com


Re: AttributeError: 'str' object has no attribute 'write'

2009-07-19 Thread Jonathan Ellis
Building the java interface is part of the build, but ant has no way
to guess which additional client interfaces you want to use, if any.
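
For example, after an svn up you would regenerate the Python bindings yourself
with:

    thrift -gen py interface/cassandra.thrift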

On Sun, Jul 19, 2009 at 6:46 PM, Ian Holsman wrote:
> hi Gasol.
> shouldn't regeneration of the interface be part of the build process?
>
> On 20/07/2009, at 3:29 AM, Gasol Wu wrote:
>
>> hi,
>> the cassandra.thrift has changed.
>> u need to generate new python client and compile class again.
>>
>>
>> On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
>> Hi guys
>> the new trunk cassandra doesnt work for a simple insert, how do we get
>> this working
>>
>> client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>
>> ---
>> AttributeError                            Traceback (most recent call
>> last)
>>
>> /home/mark/work/cexperiments/ in ()
>>
>> /home/mark/work/common/cassandra/Cassandra.py in insert(self, table, key,
>> column_path, value, timestamp, block_for)
>>    358      - block_for
>>    359     """
>> --> 360     self.send_insert(table, key, column_path, value, timestamp,
>> block_for)
>>    361     self.recv_insert()
>>    362
>>
>> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self, table,
>> key, column_path, value, timestamp, block_for)
>>    370     args.timestamp = timestamp
>>    371     args.block_for = block_for
>> --> 372     args.write(self._oprot)
>>    373     self._oprot.writeMessageEnd()
>>    374     self._oprot.trans.flush()
>>
>> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>>   1923     if self.column_path != None:
>>   1924       oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
>> -> 1925       self.column_path.write(oprot)
>>   1926       oprot.writeFieldEnd()
>>   1927     if self.value != None:
>>
>> AttributeError: 'str' object has no attribute 'write'
>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu
>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>
>>
>> --
>> Bidegg worlds best auction site
>> http://bidegg.com
>>
>
> --
> Ian Holsman
> i...@holsman.net
>
>
>
>


Re: a talk on building an email app on Cassandra

2009-07-20 Thread Jonathan Ellis
Nice!

On Mon, Jul 20, 2009 at 12:43 PM, Jun Rao wrote:
> Last Friday, I gave an IEEE talk on an email app that we built on top of
> Cassandra. Below is the link to the slides. I thought some of the people
> here might find this interesting.
>
> http://ewh.ieee.org/r6/scv/computer//nfic/2009/IBM-Jun-Rao.pdf
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> jun...@almaden.ibm.com
>


Fwd: thrift API changes

2009-07-20 Thread Jonathan Ellis
Oops, I sent this to the old google -user list by mistake the first
time.  Now that that's gone, I realized the error.


-- Forwarded message --
From: Jonathan Ellis 
Date: Mon, Jul 20, 2009 at 10:10 PM
Subject: Re: thrift API changes
To: cassandra-u...@googlegroups.com, cassandra-...@incubator.apache.org


Well, that was a long "week."  My fault -- as I commented on IRC, I
underestimated how long 185 would take as badly as I can remember
doing anywhere.

We're done with the big ones now.  (185, 240, 303, and 304).

Two more and then I think we can call it good for 0.4 from the
client's perspective: 232 and 300 (dealing with specifying the number
of replicas to wait for when reading/writing, respectively)

-Jonathan

On Wed, Jul 8, 2009 at 1:47 PM, Jonathan Ellis wrote:
> Hi all,
>
> Just a heads up that this is going to be The Week Of Breaking Things
> in the client api.  There are a bunch of long-standing problems that
> can't be fixed without making fundamental changes in the API so we are
> going to bite the bullet and get those done now.  We've already done
> CASSANDRA-261, -277, and -280; next up will be CASSANDRA-139, and
> eventually CASSANDRA-240 and friends.
>
> If you're on a version of trunk that Works For You, you  might want to
> resist the urge to svn up until the dust settles.
>
> -Jonathan


Re: trunk

2009-07-21 Thread Jonathan Ellis
the internals should be solid, but we are in the middle (towards the
end, actually) of changing the thrift api pretty drastically.  (the
colons had to go, and the sooner we bit the bullet, the better. :)

see this thread --
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/200907.mbox/%3ce06563880907202024l49ce9ack3a10ead5d0e97...@mail.gmail.com%3e

On Tue, Jul 21, 2009 at 8:11 AM, Jonas Bonér wrote:
> Hi guys.
>
> How stable is trunk?
> I have been on trunk for pretty long now with no issues so far.
> Thanks.
>
> --
> Jonas Bonér
>
> twitter: @jboner
> blog:    http://jonasboner.com
> work:   http://crisp.se
> work:   http://scalablesolutions.se
> code:   http://github.com/jboner
>


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
did you read the new section in the config xml explaining how to use a
UTF8 comparator?

also: thrift itself is just plain broken for unicode support in some
languages; see THRIFT-395

I think the short version is that when you have a java server, unicode
will work with java or C# clients but not with anything else

(so if you are using a python client for instance switching to jython
might be a workaround)

On Tue, Jul 21, 2009 at 4:00 PM,  wrote:
> Not fixed
> The following utf8 key names and column names still give an error.
> cass: 2009-07-21 13:55:35,597 error 98. [Korean key, mangled in the archive]
> [further mangled keys, including: picasso's, instruments de musique sur un
> guéridon] (1) [irancel] (1)
> cass: 2009-07-21 13:55:55,093 error 377. friday night lights
> s03e01[megaupload] (1) error 321. instruments de musique sur un
> guéridon[mangled in the archive] (1)
> cass: 2009-07-21 13:56:12,341 error 637. asuka izumi photos[u15 + Japanese
> key, mangled in the archive] (1)
> cass: 2009-07-21 13:56:39,380 error 1118. dragonball z games for pc[dragon
> ballz pc games download] (1)
> cass: 2009-07-21 13:56:48,976 error 1301. [Arabic key, mangled in the archive]
> cass: 2009-07-21 13:56:55,352 error 1430. [Chinese key, mangled in the archive]
> cass: 2009-07-21 13:56:59,287 error 1510. cinquième république[définition
> de république?] (1)
> cass: 2009-07-21 13:59:38,783 error 1842. navaratri kolu[doll festival in
> navrattri golu] (1)
> cass: 2009-07-21 13:59:39,069 error 1846. tn lottery winning
> numbers[www.tnlottery] (1)
> cass: 2009-07-21 13:59:39,274 error 1850. www.buildabearville.com cheats[all
> the buildabear.com cheats and codes] (1)
> cass: 2009-07-21 13:59:39,773 error 1860. shippuuden 78[naruto shippuuden 78
> subbed torrent] (1)
>
> On Tue, Jul 21, 2009 at 10:34 AM, Eric Evans  wrote:
>>
>> On Tue, 2009-07-21 at 09:18 -0700, mobiledream...@gmail.com wrote:
>> > Is there any timeline on when commit 185 will be done as the utf8
>> > error still exists
>>
>> 185 was committed yesterday.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-185
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
On Tue, Jul 21, 2009 at 4:06 PM, Jonathan Ellis wrote:
> (so if you are using a python client for instance switching to jython
> might be a workaround)

that is, using the java thrift client, not the python ones.


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
On Tue, Jul 21, 2009 at 4:18 PM,  wrote:
> Hey jonathan
> this is not in the wiki or any documentation.

this is trunk.  i wrote it a couple days ago.  feel free to step in
and update the wiki.

> does this work in python thrift

probably not, given the thrift utf8 bugs.  (but you could use
BytesType and at least you will get the right data back.)

> if it does - that would be perfect
> but this doesnt explain why keys cannot be utf8

because FB didn't write it and so far neither has anyone else.

-Jonathan


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis wrote:
>> does this work in python thrift
>
> probably not, given the thrift utf8 bugs.

to correct myself: now that we are using binary data in the thrift api
it can't screw us over.  so yes, UTF8Type should be fine.


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
you may also want to specify CompareSubcolumnsWith.
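
A sketch of a super column family definition with both comparators set
(attribute names as of trunk at this time; check the comments in
conf/storage-conf.xml for the authoritative list):

    <ColumnFamily ColumnType="Super" CompareWith="UTF8Type"
                  CompareSubcolumnsWith="UTF8Type" Name="Super1"/>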

On Tue, Jul 21, 2009 at 4:27 PM,  wrote:
> thanks jonathan
> trying this
> 
>
> On Tue, Jul 21, 2009 at 2:24 PM, Jonathan Ellis  wrote:
>>
>> On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis wrote:
>> >> does this work in python thrift
>> >
>> > probably not, given the thrift utf8 bugs.
>>
>> to correct myself: now that we are using binary data in the thrift api
>> it can't screw us over.  so yes, UTF8Type should be fine.
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>


Re: keys and column names cannot be utf-8

2009-07-21 Thread Jonathan Ellis
guarantee?  in a pre-alpha trunk?  no, that is too strong a word.

but that's what *supposed* to work, so I will fix it if it doesn't. :)

On Tue, Jul 21, 2009 at 4:32 PM,  wrote:
> if this would be the conf/storage-conf.xml
> <ColumnFamily Name="Standard1" CompareWith="UTF8Type" FlushPeriodInMinutes="60"/>
> 
> <ColumnFamily ColumnSort="Time" CompareWith="UTF8Type" Name="StandardByTime1"/>
> <ColumnFamily CompareSubcolumnsWith="UTF8Type" Name="Super1"/>
> Jonathan can you clarify if this will guarantee proper python thrift utf8
> behavior thanks
> On Tue, Jul 21, 2009 at 2:29 PM, Jonathan Ellis  wrote:
>>
>> you may also want to specify CompareSubcolumnsWith.
>>
>> On Tue, Jul 21, 2009 at 4:27 PM,  wrote:
>> > thanks jonathan
>> > trying this
>> > 
>> >
>> > On Tue, Jul 21, 2009 at 2:24 PM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> On Tue, Jul 21, 2009 at 4:21 PM, Jonathan Ellis
>> >> wrote:
>> >> >> does this work in python thrift
>> >> >
>> >> > probably not, given the thrift utf8 bugs.
>> >>
>> >> to correct myself: now that we are using binary data in the thrift api
>> >> it can't screw us over.  so yes, UTF8Type should be fine.
>> >
>> >
>> >
>> > --
>> > Bidegg worlds best auction site
>> > http://bidegg.com
>> >
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>

