Re: About the replication strategy of Cassandra

2010-03-13 Thread Jonathan Ellis
On Sat, Mar 13, 2010 at 3:47 PM, Kauzki Aranami wrote: > Does each replication strategy of "Rack Unaware" "Rack Aware(within a > datacenter)" "Datacenter Aware" in Cassandra depend by the algorithm > adopted respectively? I still don't understand, sorry. > For instance, what is the strategy reco
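Some background on the question. Under the "Rack Unaware" strategy, a key's first replica goes to the node whose token is the first one equal to or greater than the key's token, wrapping around the ring. The remaining replicas go to the next nodes clockwise. The rack- and datacenter-aware variants change how those later nodes are chosen. Below is a minimal Python sketch of the rack-unaware placement only. It assumes integer tokens and uses a hypothetical `replicas_for` helper. It is not Cassandra's code:

```python
from bisect import bisect_left

def replicas_for(key_token, ring, replication_factor):
    """Return the nodes holding key_token under a rack-unaware placement.

    ring is a list of (token, node) pairs. The node owning the first token
    >= key_token (wrapping past the largest token) is the primary replica;
    the following nodes clockwise hold the remaining copies.
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for(30, ring, 3))  # ['C', 'D', 'A']
print(replicas_for(90, ring, 2))  # ['A', 'B']
```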

Re: question about deleting from cassandra

2010-03-13 Thread Jonathan Ellis
a. All > based on 0.6 beta2. > > -Weijun > > -Original Message- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: Saturday, March 13, 2010 5:36 AM > To: cassandra-user@incubator.apache.org > Subject: Re: question about deleting from cassandra > > You

Re: About the replication strategy of Cassandra

2010-03-13 Thread Jonathan Ellis
On Sat, Mar 13, 2010 at 12:51 AM, Kauzki Aranami wrote: > 1. Please give notes the replication strategy of Cassandra is selected. Can you be more specific? > 2. About the Zab protocol adopted with Zookeeper. The weak point of > the Paxos protocol of Chubby is a delay. Is the Zab protocol more >

Re: question about deleting from cassandra

2010-03-13 Thread Jonathan Ellis
You should submit your minor change to jira for others who might want to try it. On Sat, Mar 13, 2010 at 3:18 AM, Weijun Li wrote: > Tried Sylvain's feature in 0.6 beta2 (need minor change) and it worked > perfectly. Without this feature, as far as you have high volume new and > expired columns y

Re: Cassandra Demo/Tutorial Applications

2010-03-12 Thread Jonathan Ellis
On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar wrote: > I was looking at this from CASSANDRA-873 as well as hands-on homework (!) > for my OSCON tutorial. Have couple of questions. Would appreciate insights: > > A)  Cassandra-873 suggests Lucandra as one demo application > B)  Are there other id

Re: Cassandra 0.5.1 get_key_range problem

2010-03-12 Thread Jonathan Ellis
get_key_range is deprecated. You should use get_range_slice. On Fri, Mar 12, 2010 at 3:59 PM, Jon Graham wrote: > Hello, > > When using the get_key_range method with ConsistencyLevel.ONE an entire > block of keys is not returned. > I loop over the get_key_range method, advancing the start key af
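Here is a sketch of the "advance the start key" loop, written for get_range_slice. It assumes the start key is inclusive, so every page after the first repeats the previous page's last key and has to drop it. `fetch_range` is a hypothetical in-memory stand-in for the Thrift call. The real parameters of that call differ between 0.5 and 0.6:

```python
def fetch_range(rows, start, count):
    """Hypothetical stand-in for get_range_slice: up to `count` keys >= start."""
    return [key for key in sorted(rows) if key >= start][:count]

def iterate_all_keys(rows, page_size=3):
    """Page through every key, re-using the last key seen as the next start."""
    start, skip_first = "", False
    while True:
        # After the first page, ask for one extra row to make up for the repeat.
        page = fetch_range(rows, start, page_size + (1 if skip_first else 0))
        if skip_first:
            page = page[1:]  # the start key is inclusive, so drop the repeat
        if not page:
            return
        yield from page
        start, skip_first = page[-1], True

rows = {f"key{i:02d}": i for i in range(10)}
print(list(iterate_all_keys(rows, page_size=4)))
```

Requesting one extra row after the first page keeps the loop from stalling when `page_size` is 1.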

Re: Grails Cassandra plugin

2010-03-12 Thread Jonathan Ellis
Great! You should also link it from http://wiki.apache.org/cassandra/ClientExamples (click "Login" at the top to create an account.) On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert wrote: > Folks- > >   I put together a quick n' dirty grails plugin for Cassandra, wrapped with > Hector. Its availabl

Re: How to force GC in Cassandra?

2010-03-12 Thread Jonathan Ellis
I think you mean compaction? You can use nodeprobe / nodetool for that. http://wiki.apache.org/cassandra/NodeProbe On Fri, Mar 12, 2010 at 12:40 PM, Weijun Li wrote: > Suppose I insert a lot of new items but also delete a lot of new items > daily, it will be ideal if I can force GC to happen du

Re: get_range_slice(s) question

2010-03-12 Thread Jonathan Ellis
That would be a bug, not intended behavior. Can you open a ticket? On Fri, Mar 12, 2010 at 11:48 AM, Omer van der Horst Jansen wrote: > I've noticed that both 0.5.1 and 0.6b2 return (ReplicationFactor) > identical copies of the data stored in my keyspace whenever I make a > call to get_range_sli

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Jonathan Ellis
Cool! On Thu, Mar 11, 2010 at 11:12 PM, Padraig O'Sullivan wrote: > We have developed a C++ client library based on the hector Java client > for Cassandra that we intend on using for Drizzle integration. This > library is still very much alpha and more features will be added while > we work on dr

Re: SuperColumn.getSubColumns() ordering

2010-03-11 Thread Jonathan Ellis
it's ordered by the column name as determined by the subcolumn comparator you declared in the definition, yes On Thu, Mar 11, 2010 at 12:24 PM, Matteo Caprari wrote: > Hi. > > If I iterate over SuperColumn.getSubColumn(), do I get > columns sorted by the column name? > > Thanks. > -- > :Matteo Ca
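The declared comparator matters because the same subcolumn names can sort differently under different comparators. This small Python illustration (not Cassandra code) sorts one set of names two ways. The first is byte-wise comparison, as with `BytesType`. The second is numeric comparison of 8-byte big-endian signed longs, as with `LongType`. Negative numbers are where the two orders diverge:

```python
import struct

# Subcolumn names as a client would encode them for a LongType comparator:
# 8-byte big-endian signed longs.
names = [struct.pack(">q", n) for n in (5, -1, 300, 0)]

def as_long(name):
    return struct.unpack(">q", name)[0]

bytes_order = [as_long(n) for n in sorted(names)]              # byte-wise, like BytesType
long_order = [as_long(n) for n in sorted(names, key=as_long)]  # numeric, like LongType

print(bytes_order)  # [0, 5, 300, -1]
print(long_order)   # [-1, 0, 5, 300]
```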

Re: Effective allocation of multiple disks

2010-03-11 Thread Jonathan Ellis
Except that for a major compaction the whole thing gets put in one directory. That's the problem w/ the JBOD approach. On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans wrote: > On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: >> On Wed, Mar 10, 2010 at 9:31 PM, Anthony Moli

Re: Hackathon?!?

2010-03-11 Thread Jonathan Ellis
wrote: > We could do it on April 22 (1 week later), that's my birthday :-) What better > way to celebrate haha. > > -Chris > > On Mar 10, 2010, at 9:58 AM, Jonathan Ellis wrote: > >> I'm in either way, but if we push it a week later then the twitter >>

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro wrote: > I would almost > recommend just keeping things simple and removing multiple data directories > from the config altogether and just documenting that you should plan on using > OS level mechanisms for growing diskspace and io. I think that

Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-10 Thread Jonathan Ellis
I think he means how the column names are rendered as bytes but the values are strings. On Wed, Mar 10, 2010 at 5:22 PM, Brandon Williams wrote: > On Wed, Mar 10, 2010 at 5:09 PM, Bill Au wrote: >> >> I am checking out 0.6.0-beta2 since I need the batch-mutate function.  I >> am just trying to r

Re: Testing row cache feature in trunk: write should put record in cache

2010-03-10 Thread Jonathan Ellis
stands I'll take you up on it. I took a crack at it in > https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to > get my feet wet with the code. > > -----Original Message- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: Tuesday, Februar

Re: NoSQL live tomorrow

2010-03-10 Thread Jonathan Ellis
http://nosqlboston.eventbrite.com/ don't know about recording / casting plans. On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines wrote: > Hey Jonathan, > What event is this and will it be livecasted/recorded? > Cheers, > Tim. > > On Thu, Mar 11, 2010 at 10:21 AM, Jonathan E

NoSQL live tomorrow

2010-03-10 Thread Jonathan Ellis
Ryan King and I will have 20 minutes to talk about Cassandra in the Lab part of the program. 20 minutes isn't enough to present a whole lot in a structured manner so we are planning to just do Q&A the whole time. So if you are going to be there, come with your questions. I will also bring a few

Re: Hackathon?!?

2010-03-10 Thread Jonathan Ellis
I'm in either way, but if we push it a week later then the twitter guys could (a) make it and (b) pimp it at their own conference. On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges wrote: > Ah, hell. Thought this was the first day. Can't make it. > -- > Jeff > > On Mar 9, 2010 9:32 PM, "Ryan King" w

Re: schema design question

2010-03-10 Thread Jonathan Ellis
> } > > flat_mutation_map = { >        'example_item': { >                'Item_Info': [ >                        Mutation(Column('title', 'an_article')), >                        Mutation(Column('link', 'www.example.com'))
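The quoted snippet passes a `Column` straight into `Mutation`. As I understand the 0.6 Thrift interface, a `Mutation` wraps a `ColumnOrSuperColumn` (or a `Deletion`). Every `Column` also carries a client-supplied timestamp. `batch_mutate` then takes a map of row key → column family → list of mutations. The sketch below uses hypothetical stand-in dataclasses that only mirror the shape of the Thrift-generated classes. With real bindings you would import those classes from the generated module instead:

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins mirroring the shape of the Thrift structs.
@dataclass
class Column:
    name: str
    value: str
    timestamp: int

@dataclass
class ColumnOrSuperColumn:
    column: Optional[Column] = None

@dataclass
class Mutation:
    column_or_supercolumn: Optional[ColumnOrSuperColumn] = None

def insert_mutation(name, value):
    """Wrap a name/value pair as an insert, timestamped in microseconds."""
    ts = int(time.time() * 1_000_000)
    return Mutation(column_or_supercolumn=ColumnOrSuperColumn(
        column=Column(name, value, ts)))

# row key -> column family -> list of mutations
mutation_map = {
    "example_item": {
        "Item_Info": [
            insert_mutation("title", "an_article"),
            insert_mutation("link", "www.example.com"),
        ],
    },
}
```

Microseconds since the epoch is a common client convention for timestamps. What matters is that every writer uses the same unit.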

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
Thanks for testing that, added a note to http://wiki.apache.org/cassandra/CassandraHardware on stripe size. On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss wrote: > with the file sizes we're talking about with cassandra and other database > products, the stripe size doesn't seem to matter.  i s

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread Jonathan Ellis
For the record, I note that "no row cache" is the default on user-defined CFs; we include it in the sample configuration file as an example only. On Wed, Mar 10, 2010 at 9:58 AM, Sylvain Lebresne wrote: >> So did you disable the row cache entirely? > > Yes (getting back reasonable performances).

Re: Login Failure Error

2010-03-10 Thread Jonathan Ellis
error ? > > On Wed, Mar 10, 2010 at 6:49 PM, Jonathan Ellis wrote: >> >> Please don't use trunk unless you're actively fixing bugs.  If you >> want the latest & greatest, get the 0.6 branch from svn. >> >> On Wed, Mar 10, 2010 at 6:46 AM, shirish w

Re: Login Failure Error

2010-03-10 Thread Jonathan Ellis
Please don't use trunk unless you're actively fixing bugs. If you want the latest & greatest, get the 0.6 branch from svn. On Wed, Mar 10, 2010 at 6:46 AM, shirish wrote: > hello, > > I have just download the source code from the trunk using svn, I have set up > the following configuration > > C

Re: atomicity across keys and secondary index support

2010-03-09 Thread Jonathan Ellis
Atomicity: no. 2ary indexes: CASSANDRA-749 is targeting the 0.8 release 2010/3/9 Patricio Echagüe : > Hey Jonathan, has there been any update on this feature? > > Thanks a lot > Patricio > > On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis wrote: >> >> that is still

Re: Hackathon?!?

2010-03-09 Thread Jonathan Ellis
I can make it. \o/ On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro wrote: > Alright guys, we have settled on a date for the Cassandra meetup on... > April 15th, better known as, Tax day! > We can host it here at Cloudkick, unless a cooler startup wants to host it. > http://maps.google.com/maps/ms?

Re: IllegalStateException: Queue full

2010-03-09 Thread Jonathan Ellis
v2 of patch attached to #864 (replaces old one) On Tue, Mar 9, 2010 at 6:08 PM, Todd Burruss wrote: > using tip of 0.6 branch with 864.txt patch.  i have 4 nodes, one node is > overcome with compaction right now.  i started with no load then added a tiny > bit of load and almost immediately got

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari wrote: > On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis wrote: >> That's true.  So you'd want to use a custom comparator where first 64 >> bits is the Long and the rest is the userid, for instance. >> >> (Long +
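The suggested column name layout puts a Long in the first 64 bits and the user id after it. Here is a Python sketch of that layout. `pack_name` and `composite_key` are hypothetical helpers. The first encodes the name, and the second models the ordering a custom comparator would apply: by the long first, then by the user id bytes. In Cassandra itself the comparator would be a Java class named in `CompareWith`; this only models its ordering. Decoding the long, rather than comparing raw bytes, keeps negative values in the right place:

```python
import struct

def pack_name(score, user_id):
    """Column name: 8-byte big-endian signed long, then the user id as UTF-8."""
    return struct.pack(">q", score) + user_id.encode("utf-8")

def composite_key(name):
    """Sort key a custom comparator would use: (long, remaining bytes)."""
    (score,) = struct.unpack(">q", name[:8])
    return (score, name[8:])

names = [pack_name(10, "bob"), pack_name(-3, "carol"), pack_name(10, "alice")]
ordered = sorted(names, key=composite_key)
print([composite_key(n) for n in ordered])
# [(-3, b'carol'), (10, b'alice'), (10, b'bob')]
```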

Re: no longer in storage-conf.xml in 0.6

2010-03-09 Thread Jonathan Ellis
It's no longer used. And it was always assumed that ControlPort and StoragePort are the same across all instances; you run multiple instances on a single machine by varying the IP address, not the ports. On Tue, Mar 9, 2010 at 1:21 PM, Bill Au wrote: > I am checking out the 0.6 release since I n

Re: another ConcurrentModificationException

2010-03-09 Thread Jonathan Ellis
Cool, you're doing a great job finding these. :) Can you create a ticket? On Tue, Mar 9, 2010 at 11:57 AM, B. Todd Burruss wrote: > using cassandra-0.6.0-beta2/ > > > 2010-03-09 09:17:26,827 ERROR [pool-1-thread-675] [Cassandra.java:1166] > Internal error processing get > java.util.ConcurrentMod

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 8:31 AM, Sylvain Lebresne wrote: > Well, unless I'm mistaken, that's the same in my example as I give in > both cases > to stress.py the option '-c 1' which tells it to retrieve only one > column each time > even in the case where I have 100 columns by row. Oh. Why would y

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne wrote: >  1) stress.py -t 10 -o read -n 5000 -c 1 -r >  2) stress.py -t 10 -o read -n 50 -c 1 -r > > In the case 1) I get around 200 reads/seconds and that's pretty stable. The > disk is spinning like crazy (~25% io_wait), very few cpu or me

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari wrote: > Thanks Jonathan. > > Correct if I'm wrong: you are suggesting that each time we receive a new > row (item, [users]) we do 2 operations: > > 1) insert (or merge) this row 'as it is' (item, [users]) > 2) for each user in [users]: insert  (user,
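The two-step write described above is ordinary denormalization. One write is keyed by item, and there is one more write per user, so that "items liked by this user" becomes a single-row read. Here is a minimal sketch with plain dicts standing in for two column families. `item_users` and `user_items` are hypothetical names. Cassandra gives no atomicity across keys, so a real client would retry the whole set of writes on failure. That is safe here because rewriting the same column is harmless:

```python
from collections import defaultdict

# Hypothetical column families modelled as row key -> {column name: value}.
item_users = defaultdict(dict)
user_items = defaultdict(dict)

def record_likes(item, users):
    """Write (item, [users]) as-is, then one (user, item) entry per user."""
    for user in users:
        item_users[item][user] = ""   # merge into the item's row
        user_items[user][item] = ""   # index row for "items this user liked"

record_likes("item1", ["ann", "ben"])
record_likes("item2", ["ann"])
print(sorted(user_items["ann"]))  # ['item1', 'item2']
```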

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
ause of > "quorum") and one of them must have been from the "third" replica that may > not have been updated yet by async replication? > > On Mon, 2010-03-08 at 15:36 -0800, Jonathan Ellis wrote: > > It means that you're doing a lot of reads that saw mult

Re: schema design question

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari wrote: > The 'key' queries are: These map straightforwardly to one CF per query. > - list all the items a user liked row key is user id, columns names are timeuuid of when the like-ing occurred, column value is either item id, or a supercolumn cont
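Using TimeUUID column names returns a user's likes in chronological order, but only with a time-aware comparator (TimeUUIDType). The reason is that a version-1 UUID's byte layout puts the low-order time bits first, so plain byte order is not time order. The Python illustration below uses the standard `uuid` module. The timestamps and item ids are hypothetical, and the code models the ordering only:

```python
import uuid

def time_uuid(ts):
    """Version-1 UUID with a chosen 60-bit timestamp (100 ns ticks); fixed node."""
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi plus version 1
        0x80, 0x00,                        # RFC 4122 variant, clock sequence
        0x123456789ABC))                   # node

# Hypothetical "likes" row for one user: column name -> item id.
likes = {
    time_uuid(2**32 + 5): "item-c",   # latest
    time_uuid(2**32): "item-b",
    time_uuid(7): "item-a",           # earliest
}

by_bytes = [likes[u] for u in sorted(likes, key=lambda u: u.bytes)]
by_time = [likes[u] for u in sorted(likes, key=lambda u: u.time)]
print(by_bytes)  # ['item-b', 'item-c', 'item-a']
print(by_time)   # ['item-a', 'item-b', 'item-c']
```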

Re: Cassandra latency question

2010-03-08 Thread Jonathan Ellis
something is screwed up if writes are 10x slower than reads On Mon, Mar 8, 2010 at 5:52 PM, David Dabbs wrote: > > Hello. I've been running the vPork load generator against two Cassandra > nodes running in VMs. > I'm running a trunk build with W=2 and R=1 and out-of-the-box JVM_OPTS which > shoul

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
It means that you're doing a lot of reads that saw multiple versions of the answer, which depending on your workload may be normal On Mon, Mar 8, 2010 at 5:31 PM, B. Todd Burruss wrote: > i am seeing a lot of these INFO level messages in cassandra server's logs: > > 2010-03-08 15:30:08,123  INFO
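Roughly, a read above ConsistencyLevel.ONE fetches the full data from one replica and only a digest (a hash) from the others. When the hashes disagree, the node logs the mismatch, and the versions are reconciled by timestamp. The Python model below is simplified and hypothetical. `digest` and `resolve` are illustrative names, and real digests cover whole rows rather than single values:

```python
import hashlib

def digest(value, timestamp):
    """Hash of one replica's version of a column (illustrative only)."""
    return hashlib.md5(f"{value}:{timestamp}".encode("utf-8")).hexdigest()

def resolve(replica_responses):
    """replica_responses: list of (value, timestamp) pairs, one per replica.

    Returns (winning value, whether the digests disagreed).
    """
    digests = {digest(v, ts) for v, ts in replica_responses}
    mismatch = len(digests) > 1
    # Reconcile by keeping the version with the highest timestamp.
    value, _ = max(replica_responses, key=lambda r: r[1])
    return value, mismatch

print(resolve([("v2", 20), ("v2", 20)]))  # ('v2', False)
print(resolve([("v2", 20), ("v1", 10)]))  # ('v2', True)
```

A mismatch is expected when a read lands while replicas are still catching up on a recent write, which is why it may be normal for some workloads.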

Re: Incr/Decr Counters in Cassandra

2010-03-08 Thread Jonathan Ellis
On Sat, Mar 6, 2010 at 4:59 PM, simon.reavely wrote: > Is there a place on the Cassandra wiki where the proposals/thinking on these > issues has been captured in one place? The wiki is a terrible place for proposals. Use the ML for those, and use JIRA when you start to actually generate code. h

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad wrote: > So why is it again that the value field in the Column cannot be null if it > is not the > value field in the map, but just a part of the value field? Because without a compelling reason to allow nulls, the best policy is not to do so. > All

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad wrote: > I was probably a little bit unclear here. I'm wondering about the two byte[] > in Column. > One for name and one for value. I was under the impression that the > skiplistmap > wraps the Columns, not that the name and the value are themselves i

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad wrote: > Why is it that null column values are not allowed? It's semantically unnecessary and potentially harmful at an implementation level. (Many java Map implementations can't distinguish between a null key and a key that is not present.) > What
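The ambiguity mentioned here has a direct Python analogue. `dict.get` returns `None` both for a missing key and for a key stored with `None`, so a caller needs an explicit membership test to tell the two apart. This illustrates the general problem only and says nothing about Cassandra's internals:

```python
row = {"email": None}  # hypothetical column stored with a null value

print(row.get("email"))  # None
print(row.get("phone"))  # None, although "phone" was never written
print("email" in row, "phone" in row)  # True False
```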

Re: gem install cassandra fails on windows

2010-03-07 Thread Jonathan Ellis
Those of us who don't use Windows or Ruby might be able to help better if you included the actual error message. :) On Sun, Mar 7, 2010 at 2:19 PM, Erez Efrati wrote: > Hi, > I tried doing "gem install cassandra" and it failed trying to run the nmake. > Is there a way to get the cassandra gem ins

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-07 Thread Jonathan Ellis
2010 at 5:47 PM, Jonathan Ellis wrote: >> I think http://wiki.apache.org/cassandra/CassandraHardware answers >> most of your questions. >> >> If possible, it's definitely useful to try out a small fraction of >> your anticipated workload against a test cluster, e

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-06 Thread Jonathan Ellis
I think http://wiki.apache.org/cassandra/CassandraHardware answers most of your questions. If possible, it's definitely useful to try out a small fraction of your anticipated workload against a test cluster, even a single node, before finalizing your production hardware purchase. On Sat, Mar 6, 2

Re: Incr/Decr Counters in Cassandra

2010-03-06 Thread Jonathan Ellis
First, SimpleDB is probably not built on Dynamo. And the devil is in the details. I haven't seen anyone propose a reasonable model for how Conditional Puts work (that is the tough one). On Sat, Mar 6, 2010 at 8:11 AM, simon.reavely wrote: > > Werner Vogels had a recent post around Amazon's supp

Re: Unreliable transport layer

2010-03-05 Thread Jonathan Ellis
In 0.6 gossip is over TCP. On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash wrote: > Hey guys! I have a simple question. I'm a casual observer, not a real > Cassandra user yet. So, excuse my ignorance. > > I see that the Gossip feature uses UDP. I was curious to know if you guys > faced issues

Re: ConcurrentModificationException

2010-03-05 Thread Jonathan Ellis
Fixed, thanks. On Fri, Mar 5, 2010 at 11:12 AM, B. Todd Burruss wrote: > https://issues.apache.org/jira/browse/CASSANDRA-853 > > On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote: > > This is the 0.6 beta yes? Looks like a regression, please open a ticket. > > On Thu

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 1:36 PM, shiv shivaji wrote: > Sorry, how to get compaction progress with 0.6. Is it in nodetool or > somewhere else? I tried a few options after nodetool and did not get this > info. it's under CompactionManager in jmx. I'm not sure if nodetool exposes this but it's easy

Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Jonathan Ellis
Generally, you want to have different types of data in different CFs so you can tune them separately (key / row caches). Mixing different row types in one CF also makes doing get_range_slice scans difficult. On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad wrote: > What are the benefits of using mu

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 2:13 AM, shiv shivaji wrote: > 1. Is there a way to estimate the time it would take to compact this work > load? I hope the load balancing will be much faster after the compaction. > Curious how fast I can get the transfer once compaction is done. 0.6 gives you compaction p

Re: Memtable size and garbage collection in JVM

2010-03-04 Thread Jonathan Ellis
A lot of churn is hard on Cassandra because of http://wiki.apache.org/cassandra/DistributedDeletes, but Cassandra is so fast that it may make up for that depending on your needs. It's not designed to eliminate i/o entirely, no. But if you set RowsCached=100% in 0.6 you'll get something pretty clo
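Churn is costly, per the DistributedDeletes page, because a delete is written as a tombstone. The tombstone must be kept until every replica has had a chance to see it, a window controlled by `GCGraceSeconds`. Until compaction can purge it, the deleted data still takes space and has to be read past. Below is a simplified Python model with hypothetical helper names and a made-up grace period. Timestamps are in seconds for simplicity. This is not Cassandra's implementation:

```python
TOMBSTONE = object()  # marker written in place of a deleted value

def merge(versions):
    """Pick the newest (value, timestamp) version; a tombstone can win."""
    return max(versions, key=lambda v: v[1])

def compact(row, now, gc_grace_seconds):
    """Drop columns whose newest version is a tombstone older than the grace period."""
    kept = {}
    for name, versions in row.items():
        value, ts = merge(versions)
        if value is TOMBSTONE and now - ts > gc_grace_seconds:
            continue  # old enough: every replica should have seen the delete
        kept[name] = [(value, ts)]
    return kept

row = {
    "a": [("x", 100), (TOMBSTONE, 200)],   # deleted at t=200
    "b": [("y", 150)],
    "c": [("z", 100), (TOMBSTONE, 950)],   # deleted recently
}
compacted = compact(row, now=1000, gc_grace_seconds=500)
print(sorted(compacted))  # ['b', 'c']
```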

Re: ConcurrentModificationException

2010-03-04 Thread Jonathan Ellis
This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss wrote: > i'm seeing a lot of these ... any idea? > > 2010-03-04 18:53:21,455 ERROR [MEMTABLE-POST-FLUSHER:1] > [DebuggableThreadPoolExecutor.java:94] Error in executor futuretas

Re: Questions while evaluating Cassandra

2010-03-04 Thread Jonathan Ellis
On Thu, Mar 4, 2010 at 2:51 AM, Eran Kutner wrote: > On Tue, Mar 2, 2010 at 15:44, Jonathan Ellis wrote: >> >> On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner wrote: >> > Is the procedure described in the description of ticket CASSANDRA-44 really >> > the way t

Re: Help with Replication Issue

2010-03-04 Thread Jonathan Ellis
On Thu, Mar 4, 2010 at 1:33 PM, joe smith wrote: > Hi, > > I installed a cluster of 2 nodes using 0.5 version of binary distribution. > Node 1 is on a Macbook 10.4 w/SoyLatte (java 1.6 port). Node 2 is on a > Linux desktop. The configuration is straight out of the distribution - > except the host

Re: What's the ideal size of a column?

2010-03-04 Thread Jonathan Ellis
> > http://www.facebook.com/note.php?note_id=76191543919&ref=mf > http://developer.yahoo.net/blog/archives/2009/07/mobstor.html > CB > 2010/3/2 Jonathan Ellis >> >> On Tue, Mar 2, 2010 at 11:57 PM, Cool BSD wrote: >> > Be short - what's the ideal column size in r

Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)

2010-03-04 Thread Jonathan Ellis
On Thu, Mar 4, 2010 at 11:27 AM, J T wrote: > Hi All, > Many thanks for the responses. You helped enormously. I was in fact talking > to the wrong port. It should have been 9160 rather than . > I just assumed that the port the cassandra server displayed when starting up > was the one I should

Re: map/reduce question

2010-03-04 Thread Jonathan Ellis
0.6 has Hadoop map/reduce support (see contrib/wordcount for an example) but this is more for analytics (small numbers of large queries) than live web app style load. Cassandra also has range queries (get_range_slice) -- performing a slice across sequential rows, where you don't necessarily know a

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-04 Thread Jonathan Ellis
a little slow, but I > will open a new thread on that if needed. > > Thanks, Shiv > > > ____ > From: Jonathan Ellis > To: cassandra-user@incubator.apache.org > Sent: Wed, March 3, 2010 9:21:28 AM > Subject: Re: Anti-compaction Diskspa

Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)

2010-03-04 Thread Jonathan Ellis
You probably need to switch the server to framed thrift mode. On Thu, Mar 4, 2010 at 2:02 AM, J T wrote: > Hi, > I've been trying to piece together some notion of how to use cassandra from > an erlang client. > So far I have managed to come up with the following, but it doesn't work. > Unfortunat
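Framed mode changes the wire format. With Thrift's framed transport, each message is preceded by its length as a 4-byte big-endian integer. Client and server must agree on this, or each side will misread the other's bytes. Below is a minimal Python sketch of the framing itself. `frame` and `unframe` are hypothetical helpers. A real client would use its Thrift library's framed transport instead, such as `TFramedTransport` in the Python bindings:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a serialized Thrift message with its 4-byte big-endian length."""
    return struct.pack(">i", len(payload)) + payload

def unframe(buffer: bytes):
    """Split one frame off the front of buffer; return (payload, rest)."""
    if len(buffer) < 4:
        raise ValueError("incomplete frame header")
    (size,) = struct.unpack(">i", buffer[:4])
    if len(buffer) < 4 + size:
        raise ValueError("incomplete frame body")
    return buffer[4:4 + size], buffer[4 + size:]

wire = frame(b"call-1") + frame(b"call-2")
first, rest = unframe(wire)
print(first, rest[:4])  # b'call-1' b'\x00\x00\x00\x06'
```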

Re: failed to identify others in a 3-node ring

2010-03-03 Thread Jonathan Ellis
You probably assigned all nodes the same token. Don't do that. :) On Wed, Mar 3, 2010 at 4:41 AM, Pahud wrote: > Hello list, > I just setup a 3-node ring in a virtualbox bridging environment. By running > the 'cassandra -f' the log indicates it discovers other nodes but if I > execute 'nodeprobe
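Each node needs its own initial token. With the RandomPartitioner, tokens fall between 0 and 2**127. The usual way to spread N nodes evenly is to give node i the token i * 2**127 / N. A small Python helper for that calculation is below (`balanced_tokens` is a hypothetical name). Its output would go into each node's `InitialToken` setting:

```python
def balanced_tokens(node_count):
    """Evenly spaced RandomPartitioner tokens, one per node."""
    return [i * 2**127 // node_count for i in range(node_count)]

for token in balanced_tokens(3):
    print(token)
```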

Re: finding Cassandra servers

2010-03-03 Thread Jonathan Ellis
We appear to be reaching consensus that this is solving a non-problem, so I have closed that ticket. 2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman wrote: > > IH> We could create a branch or git fork where you guys could develop it, > IH> and if it reaches a usable state

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-03 Thread Jonathan Ellis
ution you mentioned is likely worthy of consideration as the > load balancing is taking a while. > > I will track the jira issue of anticompaction and diskspace. Thanks for the > pointer. > > > Thanks, Shiv > > > > > > From: J

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
I would rather move to a more flexible model ("as many levels of nesting as you want") than a less-flexible one. 2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 07:23:48 -0600 Jonathan Ellis wrote: > > JE> 2010/3/3 Ted Zlatanov : >>> I don't understand the

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
http://issues.apache.org/jira/browse/CASSANDRA-598 2010/3/3 Ted Zlatanov : > I don't understand the advantages of ColumnFamilies over a > SuperColumnFamily with just one supercolumn.  Why have the former if the > latter is functionally equivalent? > > Thanks > Ted > >

Re: Connect during bootstrapping?

2010-03-03 Thread Jonathan Ellis
30.37 is now part of the cluster > INFO - Node /98.137.30.38 is now part of the cluster > INFO - InetAddress /98.137.30.37 is now UP > INFO - InetAddress /98.137.30.38 is now UP > INFO - Joining: getting bootstrap token > INFO - New token will be user148315419 to assume load from /98.137

Re: What's the ideal size of a column?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 11:57 PM, Cool BSD wrote: > Be short - what's the ideal column size in real world? > > Long description - I'm working on a prototype, the application is a data > store that holding blobs sizing from couple of KB to hundreds of MB, close > to 1GB in the worst case. You shoul

Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
ams are transferring: > > Mode: Bootstrapping > Not sending any streams. > Not receiving any streams. > > And it doesn’t look like the node is getting any data. Any ideas? > > Thanks for the help... > > Brian > > > On 3/2/10 12:22 PM, "Jonathan Ellis"

Re: Looking for work

2010-03-02 Thread Jonathan Ellis
(This is not to say that I think job posts are off-topic here, because they are not.) On Tue, Mar 2, 2010 at 10:43 PM, Jonathan Ellis wrote: > If there's one thing that's worse than a mailing list as a job board, > it's a wiki. :) > > On Tue, Mar 2, 2010 at 10:39 PM,

Re: Looking for work

2010-03-02 Thread Jonathan Ellis
If there's one thing that's worse than a mailing list as a job board, it's a wiki. :) On Tue, Mar 2, 2010 at 10:39 PM, Ryan Daum wrote: > Maybe the wiki needs a job board ? > On Tue, Mar 2, 2010 at 10:15 PM, Joe Stump wrote: >> >> Us too at SimpleGeo! We're Python, Cassandra, Erlang, and a smatt

Re: Index values: data or pointers?

2010-03-02 Thread Jonathan Ellis
Right. As long as you don't have a ton of subcolumns (which is usually the case for a denormalization like this) then you're fine. On Tue, Mar 2, 2010 at 4:48 PM, wrote: > On Mar 2, 2010, at 4:17 PM, ext Jonathan Ellis wrote: > >> On Tue, Mar 2, 2010 at 4:13 PM,   wrote:

Re: Index values: data or pointers?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 4:13 PM, wrote: > I'm exploring data layouts and it seems like the common practice is to store > an index in one CF (e.g. userid for row key and thingid for column name) and > then to fetch all the things by their thingids separately... so get index, > and then get each

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 2:51 PM, Jon Graham wrote: > Thanks! > > Switching to java 1.6.0_18 seems to have gotten past the 2GB file boundary. > I now have a new ring token for the first node in my cluster. > > Can I run a "loadbalance" on nodes 2-6 to achieve more data and token > balancing? You sho

Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper wrote: > Hi folks, > > I’m running 0.5 and I had 2 nodes up and running, then added a 3rd node in > bootstrap mode. I understand from other discussion list threads that the new > node doesn’t serve reads while it is bootstrapping, but does that me

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jonathan Ellis
Thanks, > Jon > On Mon, Mar 1, 2010 at 4:55 PM, Jonathan Ellis wrote: >> >> On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham wrote: >> > Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 >> > BufferSizeRemaining: 16 >> >> This one

Re: Questions while evaluating Cassandra

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner wrote: > Is the procedure described in the description of ticket CASSANDRA-44 really > the way to do schema changes in the latest release? I'm not sure what's your > thoughts about this but our experience is that every release of our software > requires

Re: Error with Cassandra Only Example in contrib/client_only

2010-03-01 Thread Jonathan Ellis
That means it doesn't know any of your other nodes. Probably you don't have it configured with a seed. On Mon, Mar 1, 2010 at 9:31 PM, JKnight JKnight wrote: > Dear all, > > I tried to run ClientOnlyExample.java on contrib/client_only. But the code > did not run. The  error is: > Exception in th

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham wrote: > Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 > BufferSizeRemaining: 16 This one is harmless > java.io.IOException: Value too large for defined data type >     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)

Re: Storage format

2010-03-01 Thread Jonathan Ellis
Then you definitely want one row, range queries are slower than we'd like right now. (Ticket to fix that: https://issues.apache.org/jira/browse/CASSANDRA-821) On Mon, Mar 1, 2010 at 5:00 PM, Erik Holstad wrote: > On Mon, Mar 1, 2010 at 2:51 PM, Jonathan Ellis wrote: >> >>

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 3:18 PM, Jon Graham wrote: > Thanks Jonathan. > > It seems like the load balance operation isn't moving. I haven't seen any > data file time changes in 2 hours and no location file time > changes in over an hour. > > I can see a tcp port # 7000 opened on the node where I ran

Re: Storage format

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:49 PM, Erik Holstad wrote: > Haha! > Thanks. Well I'm a little bit worried about this but since the indexes are > pretty > small I don't think it is going to be too bad. But was mostly thinking about > performance and having the index row as a bottleneck for writing, s

Re: Process for removing an old CF in 0.5.0

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:41 PM, Anthony Molinaro wrote: > Hi, > >  I was just wondering what the process might be for removing an old > column family in 0.5.0. > > Can I just update the config and restart the server? Yes, but make sure your commitlog is flushed first (and that it stays empty). >

Re: Storage format

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:06 PM, Erik Holstad wrote: > So that is kinda of what I want to do, but I want to go from > a row with multiple columns to multiple rows with one column Right, and I'm trying to tell you that this is a bad idea unless you are worried about exhausting your "row must fit in

Re: In-Memory Storage (no disk)

2010-03-01 Thread Jonathan Ellis
No, but you could use a ramdisk. On Mon, Mar 1, 2010 at 3:56 PM, Masood Mortazavi wrote: > Hi there - > > Is there a setting of storage config or some other *user-level* programmatic > means that would cause Cassandra not to write to disk? > \ > - m. > >

Re: Is Cassandra a document based DB?

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 3:49 PM, Erik Holstad wrote: > I think that there are people that would be of a different opinion here. > Cassandra has > as I've understood it table:key:name:val and in cases the val is a > serialized data structure. > In HBase you have table:row:family:key:val:version, whi

Re: Storage format

2010-03-01 Thread Jonathan Ellis
If you are turning a row containing supercolumns, each with lots of subcolumns, into "one row per supercolumn" with lots of regular columns each, then yes, that would be more efficient (unless you really do read all the subcolumns w/ each query anyway, in which case it doesn't matter). On Mon, Mar

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 1:44 PM, Jon Graham wrote: > Can I tell if the load balancing operation is still running ok or if it > has terminated? > > Is there a rough computation to determine how long the process should take? Not really, although you can guess from cpu/io usage. This is much improved

Re: Storage format

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 12:50 PM, Erik Holstad wrote: > I've been looking at the source, but not quite find the things I'm looking > for, so I have a few > questions. > Are columns for a row stored in a serialized data structure on disk or > stored individually and > put into a data structure when

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
e nodetool command for cassandra 0.5.0. > > Is this a separate package/tool? > > Thanks, > Jon > > > On Wed, Feb 24, 2010 at 8:17 PM, Jonathan Ellis wrote: >> >> nodeprobe loadbalance and/or nodeprobe move >> >> http://wiki.apache.org/cassandra/Operations

Re: Use cases for Cassandra

2010-03-01 Thread Jonathan Ellis
Someone summarized a thread we did here a few months ago on his blog: http://www.dbthink.com/?p=183 On Mon, Mar 1, 2010 at 7:16 AM, HHB wrote: > > Hey, > What are the typical use cases for Cassandra? > How to know if I should use Cassandra or documents-based data bases like > CouchDB? > I'm worki

Re: Is Cassandra a document based DB?

2010-03-01 Thread Jonathan Ellis
Dominic Williams wrote an article about switching from hbase to cassandra here: http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/ warning: it's pretty long :) On Mon, Mar 1, 2010 at 5:34 AM, HHB wrote: > > What are the advantages/disadvantages of Cassandra over HBase? > Tha

Re: binary data in key names?

2010-02-27 Thread Jonathan Ellis
Yes. On Sat, Feb 27, 2010 at 8:34 PM, Robert Edmonds wrote: > On 2010-02-28, Jonathan Ellis wrote: >> Keys are strings.  That means they have to be UTF8-encoded, although >> thrift bindings for many languages (including python) don't help you >> with this. > > a

Re: binary data in key names?

2010-02-27 Thread Jonathan Ellis
Keys are strings. That means they have to be UTF8-encoded, although thrift bindings for many languages (including python) don't help you with this. On Sat, Feb 27, 2010 at 7:43 PM, Robert Edmonds wrote: > hi, > > i'm using cassandra 0.5.0 and pycassa 0.1. > > i'd like to store binary data (speci
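Because keys must be valid UTF-8, arbitrary binary data (for example a raw hash digest) cannot be used as a key directly: many byte sequences are not valid UTF-8. The sketch below, in Python 3, shows one workaround, which is an assumption and not something the thread prescribes: hex-encode the bytes so the key is plain ASCII. The helper names are hypothetical.

```python
import binascii


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8, i.e. could be sent as a string key."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def binary_to_key(data: bytes) -> str:
    """Encode arbitrary bytes as lowercase hex (ASCII, so always valid UTF-8).

    Hex doubles the key length, but the hex strings sort in the same
    order as the raw bytes they encode.
    """
    return binascii.hexlify(data).decode("ascii")


def key_to_binary(key: str) -> bytes:
    """Invert binary_to_key."""
    return binascii.unhexlify(key)


if __name__ == "__main__":
    raw = b"\xff\x00\x10"
    print(is_valid_utf8(raw))   # 0xff never appears in valid UTF-8
    key = binary_to_key(raw)
    print(key, key_to_binary(key) == raw)
```

Base64 would give shorter keys, but standard base64 output does not sort in the same order as the input bytes, which may matter if keys are range-scanned.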

Re: MapReduce in Cassandra 0.6

2010-02-27 Thread Jonathan Ellis
There's an example in contrib/word_count, as mentioned in NEWS. Basic hadoop knowledge is assumed. :) Johan has been making fixes to the 0.6 branch that are not in beta2, so you will probably want to get that from svn. I've added to CHANGES in the 0.6 branch too, thanks for the heads up. -Jonat

Re: cassandra freezes

2010-02-26 Thread Jonathan Ellis
wrote: > What will be the implications of the fact that cassandra can't keep up > with the write? Will the memtables be queued in memory until they are > flushed? > > On Thu, Feb 25, 2010 at 4:56 PM, Jonathan Ellis wrote: >> Are you swapping? >> http://spyced.b

Re: cassandra freezes

2010-02-26 Thread Jonathan Ellis
On Fri, Feb 26, 2010 at 4:49 AM, Boris Shulman wrote: > I did some analysis using iostat and vmstat and those are the results: > When the node freezes (I'm not running on a vm, I'm running on 2 cpu 8 > cores machine with 12G RAM): > > sda               0.00  9791.20  0.00 93.60     0.00 94080.00  

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
On Thu, Feb 25, 2010 at 3:54 PM, Anthony Molinaro wrote: > What about the case where cpu and ram are underutilized, and your bottleneck > is disk io (which seems to often be the case in ec2), then adding more > spindles improves overall throughput of the system.  I've actually tested > this when a

Re: Would deleted columns slow down reads?

2010-02-25 Thread Jonathan Ellis
Yes, that's going to hurt forward scans with no start column. (Reverse scans, or scans that start with a known live column, will still be fast b/c of the per-row column indexes.) On Thu, Feb 25, 2010 at 8:56 PM, Edmond Lau wrote: > Given that Cassandra needs to maintain tombstones to handle > dis
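To see why tombstones hurt a forward scan that has no start column, consider a toy model of one row, assuming the deleted columns sort before the live ones (e.g. old time-ordered columns that were deleted). A scan from the beginning must step over every tombstone before it finds a live column, while a scan that starts at a known live column can jump past them. The bisect lookup is only a simplified stand-in for Cassandra's per-row column index; everything here is hypothetical, not Cassandra code.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical row model: (column_name, value or None) sorted by name;
# None marks a tombstone (a deleted column not yet purged).
Column = Tuple[str, Optional[str]]


def forward_slice(row: List[Column], count: int, start: str = "") -> Tuple[List[Column], int]:
    """Return up to `count` live columns at or after `start`, plus how
    many entries were examined (tombstones included)."""
    names = [name for name, _ in row]
    # Simplified stand-in for the per-row column index: jump to `start`.
    i = bisect.bisect_left(names, start)
    live: List[Column] = []
    examined = 0
    while i < len(row) and len(live) < count:
        examined += 1
        if row[i][1] is not None:  # skip tombstones
            live.append(row[i])
        i += 1
    return live, examined


if __name__ == "__main__":
    row = [(f"c{i:04d}", None) for i in range(1000)] + [(f"d{i}", "v") for i in range(5)]
    print(forward_slice(row, 3)[1])              # scans through all 1000 tombstones
    print(forward_slice(row, 3, start="d0")[1])  # starts at a known live column
```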

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
bute > the data files?  The operations page makes me think that maybe nodeprobe > repair might do it, will it? > > Thanks, > > -Anthony > > On Thu, Feb 25, 2010 at 01:43:22PM -0600, Jonathan Ellis wrote: >> Compaction is why http://wiki.apache.org/cassandra/CassandraHardwa

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
Compaction is why http://wiki.apache.org/cassandra/CassandraHardware recommends raid0-ing if you are concerned about free disk space limits. On Thu, Feb 25, 2010 at 1:36 PM, Gary Dusbabek wrote: > Cassandra always compacts to the directory with the most free space. > There is not a way to influen
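The rule quoted above ("compacts to the directory with the most free space") explains the RAID0 advice: in the worst case, with no overwrites or deletions to drop, a compaction's output is about as large as its inputs combined, and it must fit in a single directory. The sketch below illustrates that reasoning; the function names are hypothetical, and the free-space lookup is injectable so it can be exercised without real disks.

```python
import shutil
from typing import Callable, Iterable

FreeFn = Callable[[str], int]


def free_space(path: str) -> int:
    """Free bytes on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def pick_compaction_dir(dirs: Iterable[str], free: FreeFn = free_space) -> str:
    """Return the data directory with the most free space."""
    return max(dirs, key=free)


def compaction_fits(dirs: Iterable[str], input_bytes: int, free: FreeFn = free_space) -> bool:
    """Worst-case check: can the largest single directory hold the merged output?"""
    dirs = list(dirs)
    return free(pick_compaction_dir(dirs, free)) >= input_bytes


if __name__ == "__main__":
    gb = 1 << 30
    two_disks = {"/data1": 60 * gb, "/data2": 60 * gb}
    raid0 = {"/data": 120 * gb}
    # 100 GB of SSTables: 120 GB free in total, but no single disk has 100 GB.
    print(compaction_fits(two_disks, 100 * gb, two_disks.get))
    print(compaction_fits(raid0, 100 * gb, raid0.get))
```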

Re: 3 node installation

2010-02-25 Thread Jonathan Ellis
dra, as far as I can tell. > These are good suggestions. Thanks. > (I don't know whether it is worth describing this in a JIRA as a bug. I > would be willing to do it if you like me to do so.) > On Thu, Feb 25, 2010 at 6:19 AM, Jonathan Ellis wrote: >> >> Then it
