Re: Storing large blobs

2010-03-17 Thread Avinash Lakshman
It is practically a seek and large streaming read. I do not believe this would be an issue. I have never run such a workload but a simple experiment should clear the air. Cheers Avinash On Wed, Mar 17, 2010 at 7:42 PM, Carlos Sanchez < carlos.sanc...@riskmetrics.com> wrote: > We could have blob

RE: Storing large blobs

2010-03-17 Thread Carlos Sanchez
We could have blobs as large as 50 MB compressed (XML compresses quite well). Typical documents we would deal with would be between 500 KB and 3 MB. Carlos From: Avinash Lakshman [avinash.laksh...@gmail.com] Sent: Wednesday, March 17, 2010 8:49 PM To: user@ca

Re: question about deleting from cassandra

2010-03-17 Thread Jonathan Ellis
That's a strange assumption. Users typically don't like their data being deleted without a very good reason. "We didn't have enough room" is not a very good reason. :) On Wed, Mar 17, 2010 at 9:03 PM, Bill Au wrote: > I would assume that Facebook and Twitter are not keep all the data that they

Re: question about deleting from cassandra

2010-03-17 Thread Bill Au
I would assume that Facebook and Twitter are not keeping all the data that they store in Cassandra forever. I wonder how they are deleting old data from Cassandra... Bill On Mon, Mar 15, 2010 at 1:01 PM, Weijun Li wrote: > OK I will try to separate them out. > > > On Sat, Mar 13, 2010 at 5:35 AM,

Re: Storing large blobs

2010-03-17 Thread Avinash Lakshman
My question would be how large is large? Perhaps you could compress the blobs and then store them. But it depends on the answer to the first question. Cheers Avinash On Wed, Mar 17, 2010 at 5:10 PM, Carlos Sanchez < carlos.sanc...@riskmetrics.com> wrote: > Has anyone had experience storing large
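A minimal sketch of the "compress the blobs and then store them" suggestion above, using Python's standard zlib module. The helper names pack_blob and unpack_blob and the sample XML payload are made up for illustration; how the resulting bytes are written to Cassandra is not shown and would depend on the client in use.

```python
import zlib


def pack_blob(document: bytes, level: int = 6) -> bytes:
    """Compress a document before handing it to the data store (hypothetical helper)."""
    return zlib.compress(document, level)


def unpack_blob(stored: bytes) -> bytes:
    """Reverse of pack_blob: restore the original document bytes."""
    return zlib.decompress(stored)


# Illustrative payload only: repetitive XML, which tends to compress well.
xml = b"<doc>" + b"<item>value</item>" * 10000 + b"</doc>"
stored = pack_blob(xml)

print(len(xml), "bytes raw,", len(stored), "bytes compressed")
```

The compressed bytes would be what is stored as the column value; the reader calls unpack_blob after fetching it.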

Re: Storing large blobs

2010-03-17 Thread Jonathan Ellis
It's not tailored for it, but it works "well enough" for some applications. Better than having to deal with two different data stores. On Wed, Mar 17, 2010 at 8:10 PM, Carlos Sanchez wrote: > Has anyone had experience storing large blobs in Cassandra? Is really > Cassandra tailored for large co

Storing large blobs

2010-03-17 Thread Carlos Sanchez
Has anyone had experience storing large blobs in Cassandra? Is Cassandra really tailored for large content? Carlos

get_string_property(token map) freezes for cluster

2010-03-17 Thread Weijun Li
get_string_property(token map) worked for one node on localhost, but it froze when I tried to call it against a cluster of 6 nodes. What's the correct way to return the list of all nodes in a cluster? Thanks, -Weijun

nodetool-compact duplicated data files again and again

2010-03-17 Thread Weijun Li
I'm testing the ExpiringColumn patch in 0.6-beta2. I inserted 26GB of data with a TTL, and after the columns expired I used get_slice to verify that no columns can be retrieved. When I run "nodetool compact" I think all data should be gone. But the problem is: 1) After the first nodetool compact, Cassandra du

Re: Dividing the client load between machines in Cassandra

2010-03-17 Thread Sonny Heer
Oops. Yep, thanks! On Wed, Mar 17, 2010 at 1:47 PM, Jonathan Ellis wrote: > You didn't call tr.open() ? > > On Wed, Mar 17, 2010 at 3:45 PM, Sonny Heer wrote: >> I'm getting: >> org.apache.thrift.transport.TTransportException: Cannot write to null >> outputStream >> at >> org.apache.thr

Re: upgrade path 0.5.1 to 0.6.X

2010-03-17 Thread Jonathan Ellis
It's on my wish list, as well as my "as soon as someone needs it badly enough, he's welcome to contribute the code" list. :) On Wed, Mar 17, 2010 at 3:53 PM, B. Todd Burruss wrote: > i see this in the upgrade notes: > >   - 0.6 network traffic is not compatible with earlier versions.  You >     w

Re: upgrade path 0.5.1 to 0.6.X

2010-03-17 Thread B. Todd Burruss
i see this in the upgrade notes: - 0.6 network traffic is not compatible with earlier versions. You will need to shut down all your nodes at once, upgrade, then restart. is there a plan to version the protocol used so new versions should be compatible with older ones until deprecated?

Re: Dividing the client load between machines in Cassandra

2010-03-17 Thread Jonathan Ellis
You didn't call tr.open() ? On Wed, Mar 17, 2010 at 3:45 PM, Sonny Heer wrote: > I'm getting: > org.apache.thrift.transport.TTransportException: Cannot write to null > outputStream >        at > org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:137) >        at > org.

Re: Dividing the client load between machines in Cassandra

2010-03-17 Thread Sonny Heer
I'm getting: org.apache.thrift.transport.TTransportException: Cannot write to null outputStream at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:137) at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:152) at org.apa

Re: upgrade path 0.5.1 to 0.6.X

2010-03-17 Thread Jonathan Ellis
upgrading is always covered in NEWS, e.g. https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/NEWS.txt On Wed, Mar 17, 2010 at 3:21 PM, Joseph Stein wrote: > since it looks like 0.6.X is making its way towards GA, what is the > data migration path for folks running 0.5.1? thanks

upgrade path 0.5.1 to 0.6.X

2010-03-17 Thread Joseph Stein
since it looks like 0.6.X is making its way towards GA, what is the data migration path for folks running 0.5.1? thanks. trying to decide whether to go to production with 0.6 beta3 or 0.5.1 and upgrade. /* Joe Stein http://www.linkedin.com/in/charmalloc */

Re: Atomic Operations

2010-03-17 Thread Ted Zlatanov
On Wed, 17 Mar 2010 16:44:50 -0300 Juan Manuel García del Moral wrote: JMGdM> That's exactly what I need, Do you have an idea how can I JMGdM> implement this through the C++ API? Sorry, I've only worked with the Java and Perl interfaces. I'll have to use Cassandra with C++ later this year (lo

Re: Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
Also, += 1 was just an example; it could be += N, += 15 or whatever, depending on certain conditions, but I think it would work as well. On March 17, 2010 at 16:44, Juan Manuel García del Moral < juanman...@taringa.net> wrote: > That's exactly what I need, Do you have an idea how can I impl

Re: Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
That's exactly what I need. Do you have an idea how I can implement this through the C++ API? Many thanks 2010/3/17 Ted Zlatanov > On Wed, 17 Mar 2010 16:29:27 -0300 Juan Manuel García del Moral < > juanman...@taringa.net> wrote: > > JMGdM> I have this: > JMGdM> SocialAds.Anonimos['145']['Tag'

Re: Atomic Operations

2010-03-17 Thread Ted Zlatanov
On Wed, 17 Mar 2010 16:29:27 -0300 Juan Manuel García del Moral wrote: JMGdM> I have this: JMGdM> SocialAds.Anonimos['145']['Tag']['12'] = 13 JMGdM> I would need to to JMGdM> SocialAds.Anonimos['145']['Tag']['12'] += 1 JMGdM> for example JMGdM> with that, would be enough for now JMGdM>

Re: Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
Sorry, I have this: SocialAds.Anonimos['145']['Tag']['12'] = 13 I would need to do SocialAds.Anonimos['145']['Tag']['12'] += 1 for example. That would be enough for now. I don't want to retrieve the value, do 13+1 in my code, and re-set() it to 14. On March 17, 2010 at 16:28, Juan

Re: Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
I have this: SocialAds.Anonimos['145']['Tag']['12'] = 2010/3/17 Ted Zlatanov > On Wed, 17 Mar 2010 16:05:48 -0300 Juan Manuel García del Moral < > juanman...@taringa.net> wrote: > > JMGdM> So I would have to retrieve (client.get()) the value, then > JMGdM> increment it and update it (client.set(

Re: Atomic Operations

2010-03-17 Thread Ted Zlatanov
On Wed, 17 Mar 2010 16:05:48 -0300 Juan Manuel García del Moral wrote: JMGdM> So I would have to retrieve (client.get()) the value, then JMGdM> increment it and update it (client.set()) again? with the JMGdM> inconsistency risk this two operations imply... Can you explain what you are trying t

Re: Atomic Operations

2010-03-17 Thread Jesse McConnell
ya, this access pattern doesn't really ring true for cassandra, at least imo jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Wed, Mar 17, 2010 at 14:14, Ned Wolpert wrote: > I could be wrong, but I would say that even if the thrift syntax gave you a > 'get/set' you want, you would still

Re: Atomic Operations

2010-03-17 Thread Ned Wolpert
I could be wrong, but I would say that even if the thrift syntax gave you the 'get/set' you want, you would still have the 'inconsistency' risk, primarily because of Eventual Consistency (http://www.allthingsdistributed.com/2008/12/eventually_consistent.html) as implemented in Cassandra. (No

Re: Atomic Operations

2010-03-17 Thread Sandeep Kalidindi
@juan - for now yes. But as far as i remember, some guys from Digg are trying to implement counters. Don't know how complete it is and when it can be available. But for now such feature is not there. Cheers, Deepu. 2010/3/18 Juan Manuel García del Moral > So I would have to retrieve (client.ge

Re: Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
So I would have to retrieve (client.get()) the value, then increment it and update it (client.set()) again? With the inconsistency risk these two operations imply... annoying. Thanks for your help 2010/3/17 Jesse McConnell > afaik, nope > > -- > jesse mcconnell > jesse.mcconn...@gmail.com >
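The inconsistency risk mentioned above can be illustrated without a cluster. The sketch below uses a plain dictionary as a stand-in for one column value, with hypothetical get/set_value helpers in place of client.get()/client.set(). It replays one unlucky interleaving of two clients that each read, add one, and write back. It says nothing about Cassandra's own replication behaviour; it only shows why a read-then-write increment is not atomic.

```python
# In-memory stand-in for a single column value (not a real Cassandra client).
store = {"Anonimos/145/Tag/12": 13}
KEY = "Anonimos/145/Tag/12"


def get(key):
    """Hypothetical stand-in for client.get()."""
    return store[key]


def set_value(key, value):
    """Hypothetical stand-in for client.set()."""
    store[key] = value


# Two clients both read before either one writes.
seen_by_a = get(KEY)   # client A reads 13
seen_by_b = get(KEY)   # client B reads 13

set_value(KEY, seen_by_a + 1)   # client A writes 14
set_value(KEY, seen_by_b + 1)   # client B also writes 14, overwriting A's update

final_value = get(KEY)
print(final_value)   # 14: two increments were issued, but only one took effect
```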

Re: Atomic Operations

2010-03-17 Thread Jesse McConnell
afaik, nope -- jesse mcconnell jesse.mcconn...@gmail.com 2010/3/17 Juan Manuel García del Moral : > Hello > > I would like to know if there is any method available to update > (increment/decrement) INTEGER values in Columns/SuperColumns, and how I can > use this method thru the C++ API > > Than

Atomic Operations

2010-03-17 Thread Juan Manuel García del Moral
Hello I would like to know if there is any method available to update (increment/decrement) INTEGER values in Columns/SuperColumns, and how I can use this method thru the C++ API Thanks Juan

Re: Dividing the client load between machines in Cassandra

2010-03-17 Thread Sonny Heer
Cool thanks Todd. I'd be interested at some point to see the updated .6 version as well. Thanks again! On Wed, Mar 17, 2010 at 9:24 AM, B. Todd Burruss wrote: > below is the commented out code i once used.  i think it is from the 0.5 > days, so it might not even work now.  not sure.  the bootst

Re: Key scan

2010-03-17 Thread Jonathan Ellis
you can iterate through keys w/ get_range_slice; in 0.6 this works w/ all partitioners On 3/17/10, Marcus Herou wrote: > Hi. > > I have started to evaluate some KeyValue Stores and some Document > Stores to find the best fit. > > I wonder how I as a developer can iterate over all keys and/or > en
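A rough sketch of the paging pattern usually used for this kind of key iteration: fetch a page starting at a key, then start the next page from the last key seen and skip that repeated key. Here fetch_page is a hypothetical in-memory stand-in, not the Thrift get_range_slice call itself, and it assumes the start key is inclusive and that keys come back in a stable order (with some partitioners that order is by token rather than by key).

```python
from bisect import bisect_left

# Pretend data set; in a real cluster these would be row keys.
ALL_KEYS = sorted(f"user{i:03d}" for i in range(25))


def fetch_page(start_key, count):
    """Hypothetical stand-in for a range query: up to `count` keys >= start_key."""
    i = bisect_left(ALL_KEYS, start_key)
    return ALL_KEYS[i:i + count]


def iterate_keys(page_size=10):
    """Yield every key once by chaining pages on the last key of each page."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start, first = "", True
    while True:
        page = fetch_page(start, page_size)
        if not first:
            page = page[1:]   # the start key was already yielded by the previous page
        if not page:
            return
        yield from page
        start, first = page[-1], False


keys = list(iterate_keys(page_size=10))
print(len(keys))
```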

Re: Model to store biggest score

2010-03-17 Thread Brandon Williams
On Wed, Mar 17, 2010 at 11:48 AM, Richard Grossman wrote: > But in the case of simple column family I've the same problem when I update > the score of 1 user then I need to remove his old score too. For example > here the user uid5 was at 130 now he is at 140 because I add the random > number cass

Re: Model to store biggest score

2010-03-17 Thread Richard Grossman
But with a simple column family I have the same problem: when I update the score of one user, I need to remove his old score too. For example, here the user uid5 was at 130 and is now at 140; because I add the random number, Cassandra will keep the whole score history. get Keyspace2.topScoreUse

Re: Dividing the client load between machines in Cassandra

2010-03-17 Thread B. Todd Burruss
below is the commented out code i once used. i think it is from the 0.5 days, so it might not even work now. not sure. the bootstrapHostArr is simply a list of host information used to bootstrap the process. connectToHost is a method used to generate a Cassandra.Client object. there is sam

Re: Model to store biggest score

2010-03-17 Thread Brandon Williams
On Wed, Mar 17, 2010 at 11:13 AM, Toby DiPasquale wrote: > > Couldn't you just use a supercolumn whose keys were the score and the > subcolumns were username:true? Basically using the subcolumns as a > list? > Sure, but that complicates getting the top N scores. You'd have to use the OrderedPart

Re: Model to store biggest score

2010-03-17 Thread Toby DiPasquale
On Wed, Mar 17, 2010 at 12:10 PM, Brandon Williams wrote: > On Wed, Mar 17, 2010 at 11:05 AM, Richard Grossman > wrote: >> >> Thanks, But what do you mean by ? >> >>> pack a random integer after the score (so the sort order is maintained) >>> in big endian format and only examine the first 8 byte

Re: Model to store biggest score

2010-03-17 Thread Brandon Williams
On Wed, Mar 17, 2010 at 11:05 AM, Richard Grossman wrote: > Thanks, But what do you mean by ? > > pack a random integer after the score (so the sort order is maintained) in >> big endian format and only examine the first 8 bytes of the column upon >> retrieval. >> >> -Brandon >> > > Do I need to t
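One possible reading of the "pack a random integer after the score" suggestion, sketched with Python's struct module. The assumptions here are mine, not from the thread: the score is a non-negative 64-bit integer, the random suffix is 32 bits, and column names are compared as raw bytes. The function names are hypothetical.

```python
import os
import struct


def make_column_name(score: int) -> bytes:
    """Big-endian 8-byte score followed by a 4-byte random suffix (12 bytes total)."""
    salt = int.from_bytes(os.urandom(4), "big")
    return struct.pack(">QI", score, salt)


def score_of(column_name: bytes) -> int:
    """Recover the score by examining only the first 8 bytes."""
    return struct.unpack(">Q", column_name[:8])[0]


# Two users share the score 30; the random suffix keeps their column names distinct
# (barring a 1-in-2**32 collision).
names = [make_column_name(s) for s in (10, 30, 20, 30)]

# Byte-wise ordering of big-endian unsigned integers matches numeric ordering,
# so sorting the raw names sorts by score.
top_two = [score_of(n) for n in sorted(names, reverse=True)[:2]]
print(top_two)   # [30, 30]
```

Because the suffix only breaks ties, it never changes the relative order of two different scores.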

Re: Model to store biggest score

2010-03-17 Thread Richard Grossman
Thanks, But what do you mean by ? pack a random integer after the score (so the sort order is maintained) in > big endian format and only examine the first 8 bytes of the column upon > retrieval. > > -Brandon > Do I need to take the score and add like -number like 100-1, 100-2, 100-3 etc... to pr

Key scan

2010-03-17 Thread Marcus Herou
Hi. I have started to evaluate some KeyValue Stores and some Document Stores to find the best fit. I wonder how I as a developer can iterate over all keys and/or entries? Is it possible? Let's say I have put a huge amount of data into Cassandra and find out that I probably should index bot

Incrementing a value through Cassandra C++

2010-03-17 Thread Juan Manuel García del Moral
Hello everybody, I need to have columns of type INTEGER (or LONG, it does not matter) within a supercolumn. I think I'm not defining it properly. This is what I have in my storage-conf.xml: org.apache.cassandra.locator.RackUnawareStrategy 1 org.apache.cassandra.locator.EndPointSnitc

Re: Model to store biggest score

2010-03-17 Thread Brandon Williams
On Wed, Mar 17, 2010 at 10:38 AM, Richard Grossman wrote: > Hi, > > I trying to find a model where I can keep the list of biggest score for > users. > it's seems simple but I'm stuck here . > For example user1 score = 10 > user2 score = 20 > user3 score = 30

Model to store biggest score

2010-03-17 Thread Richard Grossman
Hi, I'm trying to find a model where I can keep a list of the biggest scores for users. It seems simple but I'm stuck here. For example: user1 score = 10, user2 score = 20, user3 score = 30. Query: Top score (2) = user3, user2. If someone has made something simil

Re: Re: failover exception with hector

2010-03-17 Thread Ran Tavory
Let's continue this offline; user@ to bcc. It would be helpful if you: - Check with jconsole how hector sees the ring: connect to the java client running hector and check under me.prettyprint.hector whether the list of known hosts is consistent with your setup. - Send all recent log lines so I have better un

Re: Sparse vs dense index

2010-03-17 Thread alex kamil
Yep, I'll probably try both. I don't think there is anything out there which can beat an in-memory db in terms of bulk throughput (e.g. http://cs.nyu.edu/cs/faculty/shasha/papers/sigmodpap.pdf), but we will see how far we can get with open source tools and using a combination of persistent storage and cach

Re: Sparse vs dense index

2010-03-17 Thread Jonathan Ellis
I guess if you are going to read the full 5MB at once then that makes more sense. But if you are going to slice it or access parts by column name then the other does. On Tue, Mar 16, 2010 at 12:15 PM, alex kamil wrote: > which index structure would fit Cassandra more naturally and perform better

Re: exception when adding new node

2010-03-17 Thread Jonathan Ellis
This is harmless. 2010/3/17 casablinca126.com : >  hello , >        I try to add a new node to a 2-node cluster, an exception occured > while transferring data to the new node: > > WARN - Running on default stage - beware > WARN - Problem reading from socket connected to : > java.nio.channels.So

exception when adding new node

2010-03-17 Thread casablinca126.com
Hello, I tried to add a new node to a 2-node cluster, and an exception occurred while transferring data to the new node: WARN - Running on default stage - beware WARN - Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/192.168.13.39:35182 remote=/19

Re: Re: failover exception with hector

2010-03-17 Thread casablinca126.com
Ran, I ran the nodeprobe tool and got the correct ring information. Address Status Load Range Ring 85080781597816482766914734169501403890 192.168.13.40 Up 38.18 GB 2604993082310184907

Re: Cassandra and hadoop?

2010-03-17 Thread Johan Oskarsson
Hi Matteo, * Hadoop MapReduce can talk to Cassandra and process the data just like other input formats do from HDFS. But I would not recommend seeing Cassandra as a first class replacement for HDFS; they are two very different beasts. It will most likely always be a lot faster to let MapRed