Re: avro + cassandra + ruby

2010-11-17 Thread Benjamin Black
Cassandra.new(keyspace, server, {:protocol => Thrift::BinaryProtocolAccelerated}) On Tue, Nov 16, 2010 at 5:13 PM, Ryan King r...@twitter.com wrote: On Tue, Nov 16, 2010 at 10:25 AM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Sep 28, 2010 at 6:35 PM, Ryan King r...@twitter.com wrote: One

Range of 'Bootstrap'?

2010-03-22 Thread Benjamin Black
As part of my continuous abuse of a small cluster for Chef cookbook development, I've run across a strange issue I'm hoping someone can explain. The following is output after upgrading from beta2 to beta3 and running nodetool rebalance on .140.224: Address Status Load Range

Re: Range of 'Bootstrap'?

2010-03-22 Thread Benjamin Black
Looking at db/SystemTable.java I see the use of Bootstrap as a token during bootstrap, but it seems to be for the system table, not other keyspaces. Is it used more generally than that or is this a bug? On Sun, Mar 21, 2010 at 11:29 PM, Benjamin Black b...@b3k.us wrote: As part of my continuous

Re: Range of 'Bootstrap'?

2010-03-22 Thread Benjamin Black
Could've misread it, it was late. Regardless, seems this should never happen. On Mon, Mar 22, 2010 at 6:19 AM, Gary Dusbabek gdusba...@gmail.com wrote: On Mon, Mar 22, 2010 at 01:58, Benjamin Black b...@b3k.us wrote: Looking at db/SystemTable.java I see the use of Bootstrap as a token during

Re: Example of data model

2010-03-22 Thread Benjamin Black
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ On Mon, Mar 22, 2010 at 6:51 AM, Hill, Ed A ed-h...@uiowa.edu wrote: I've been through the twissandra data model as well and it is pretty straightforward and well explained (thanks!) - but I notice that a number of the folks

Re: why Cassandra 0.5.1 write speed is very slow

2010-03-22 Thread Benjamin Black
Maybe I missed this: what replication factor and consistency level are you using? 2010/3/21 郭鹏 gpcus...@gmail.com: Thx, I will try it in the multi-thread mode. What's the best practice in the production env? On March 21, 2010 at 12:04 PM, Jonathan Ellis jbel...@gmail.com wrote: If you're benchmarking

Re: Auto Increment

2010-03-25 Thread Benjamin Black
Cassandra is not being used to generate the Twitter identifiers. Twitter, like most places using Cassandra, has more than one database system in production. UUIDs are not at risk of conflicts with billions of rows. b On Thu, Mar 25, 2010 at 5:57 AM, Jaepil Jeong zgdr...@gmail.com wrote: Hi

Re: Model Question

2010-03-25 Thread Benjamin Black
Erez, To make this work you have to make your model fit Cassandra, not the other way around. As a rule, you either do complex queries via client code to process the results of several, simpler queries or via a CF you create to act as an index. Yes, this means you have to write data to each

multinode cluster wiki page

2010-04-03 Thread Benjamin Black
Just added this to the wiki as it seemed a very frequent request on irc: http://wiki.apache.org/cassandra/MultinodeCluster Would very much appreciate feedback and edits to improve it. b

Re: multinode cluster wiki page

2010-04-03 Thread Benjamin Black
on making the Thrift interface listen on more than localhost. Kind regards, Benoit. 2010/4/3 Benjamin Black b...@b3k.us: Just added this to the wiki as it seemed a very frequent request on irc: http://wiki.apache.org/cassandra/MultinodeCluster Would very much appreciate feedback and edits

Re: multinode cluster wiki page

2010-04-03 Thread Benjamin Black
the other day. One question I'm still unclear on, when setting up multiple nodes, say 4-8 (or more) what's the suggested ratio of seed vs. non-seed nodes? thanks, Joe On Apr 3, 2010, at 1:14 AM, Benjamin Black wrote: Just added this to the wiki as it seemed a very frequent request

Re: multinode cluster wiki page

2010-04-03 Thread Benjamin Black
config to all nodes. On Sat, Apr 3, 2010 at 3:14 AM, Benjamin Black b...@b3k.us wrote: Just added this to the wiki as it seemed a very frequent request on irc: http://wiki.apache.org/cassandra/MultinodeCluster Would very much appreciate feedback and edits to improve it. b

Re: multinode cluster wiki page

2010-04-03 Thread Benjamin Black
Seems like a lot of complexity for a very small win (how often do you bootstrap new nodes? if you only need a handful of seeds, what's all that hard about listing them all on all nodes?). I prefer simple and predictable, and trying to do this with round robin DNS seems to be neither, to me. b

Re: Deployment on AWS

2010-04-03 Thread Benjamin Black
as 'placement_availability_zone', avoiding the need to speak the EC2 API or store credentials in the configs. b On Sat, Apr 3, 2010 at 2:45 PM, Joe Stump j...@joestump.net wrote: On Apr 3, 2010, at 1:53 PM, Benjamin Black wrote: What specific features are you looking for to operate on EC2

Re: Deployment on AWS

2010-04-03 Thread Benjamin Black
wrote: On Apr 3, 2010, at 2:54 PM, Benjamin Black wrote: I'm pretty familiar with EC2, hence the question.  I don't believe any patches are required to do these things.  Regardless, as I noted in that ticket, you definitely do NOT need AWS credentials to determine your availability zone

Re: Deployment on AWS and replication strategies

2010-04-04 Thread Benjamin Black
On Sat, Apr 3, 2010 at 8:23 PM, Mike Gallamore mike.e.gallam...@googlemail.com wrote: I didn't mean a real time determination, more of if the nodes aren't identical. For example if you have a cluster made up of a bunch of EC2 light instances and decide to add a large instance, it would be

Re: Memcached protocol?

2010-04-04 Thread Benjamin Black
On Sun, Apr 4, 2010 at 8:42 PM, Paul Prescod pres...@gmail.com wrote: On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black b...@b3k.us wrote: ... Are you suggesting this would give you counter semantics? Yes: My understanding of cassandra-580 is that it gives you increment and decrement which

Re: Overwhelming a cluster with writes?

2010-04-06 Thread Benjamin Black
You are blowing away the mostly saner JVM_OPTS running it that way. Edit cassandra.in.sh (or wherever config is on your system) to increase mx to 4G (not 6G, for now) and leave everything else untouched and do not specify JVM_OPTS on the command line. See if you get the same behavior. b On

Re: OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Benjamin Black
I'd suggest you use RandomPartitioner, an index, and multiget. You'll be able to do range queries and won't have the load imbalance and performance problems of OPP and native range queries. b On Wed, Apr 7, 2010 at 3:51 AM, Paul Prescod p...@prescod.net wrote: I have one append-oriented

Re: does compaction of Super Column Family have same limit as compaction of Column Family

2010-04-07 Thread Benjamin Black
SCF rows are loaded in their entirety into memory, so the limit applies in the same way. On Wed, Apr 7, 2010 at 5:16 PM, Jeremy Davis jerdavis.cassan...@gmail.com wrote: Quick question: There is an open issue with ColumnFamilies growing too large to fit in memory when compacting.. Does this

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
...@gmail.com wrote: Then from an IT standpoint, if I'm using an RF of 3, it stands to reason that running on RAID 1 makes sense, since RAID and RF achieve the same ends... it makes sense to stripe for speed and let Cassandra deal with redundancy, eh? On Wed, Apr 7, 2010 at 4:07 PM, Benjamin Black b

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
is for this software... and what's to be avoided because it's a given that it doesn't work. On Wed, Apr 7, 2010 at 6:40 PM, Benjamin Black b...@b3k.us wrote: That depends on your goals for fault tolerance and recovery time.  If you use RAID1 (or other redundant configuration) you can tolerate disk failure

Re: Write consistency

2010-04-08 Thread Benjamin Black
On Thu, Apr 8, 2010 at 12:55 AM, Paul Prescod p...@ayogo.com wrote: ¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/ Pay no attention to this disingenuous troll. b

Re: Write consistency

2010-04-08 Thread Benjamin Black
Yes. Or you would retry the write. Either way, the system achieves consistency eventually, hence the name. On Thu, Apr 8, 2010 at 9:36 AM, Mark Greene green...@gmail.com wrote: So unless you re-try the write, the previous stale write stays on the other two nodes? Would a read repair fix this

Re: Write consistency

2010-04-08 Thread Benjamin Black
His arguments consistently (hah!) boil down to this: if you misconfigure things for your intended application, you get undesirable behavior. For example, the correct approach to the situation cited is to use quorum reads and writes. W=3/R=1/N=3 might be appropriate for situations in which you
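The quorum advice above comes down to overlap arithmetic: if W + R > N, every read set shares at least one replica with every acknowledged write set. A minimal Ruby sketch of that check, purely illustrative (the helper names are made up for this example and are not part of any Cassandra client):

```ruby
# Replicas needed for a quorum at replication factor n (a strict majority).
def quorum(n)
  n / 2 + 1
end

# True when any W-replica write set and any R-replica read set must intersect,
# so a read is guaranteed to touch at least one copy of the latest acknowledged write.
def overlapping?(w, r, n)
  w + r > n
end

n = 3
puts overlapping?(quorum(n), quorum(n), n) # QUORUM writes, QUORUM reads: true
puts overlapping?(3, 1, n)                 # W=3, R=1: true
puts overlapping?(1, 1, n)                 # ONE/ONE: false, consistent only eventually
```

W=3/R=1 buys cheap reads at the cost of writes failing whenever any replica is down; QUORUM/QUORUM tolerates one down replica at N=3 on both paths.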

Re: Iterate through entire data set

2010-04-08 Thread Benjamin Black
Strange setup, but, ok. What is your ThriftAddress setting on the Windows machine? On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer sonnyh...@gmail.com wrote: I have two boxes.  One is a windows box running Cassandra .6, and the other is an ubuntu box from which I'm trying to run the word count

Re: Iterate through entire data set

2010-04-08 Thread Benjamin Black
Are you actually trying to make the Ubuntu system another node in the ring? While the first node is only listening on localhost? There's your problem. On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer sonnyh...@gmail.com wrote: I have two boxes.  One is a windows box running Cassandra .6, and the

Re: Iterate through entire data set

2010-04-08 Thread Benjamin Black
connecting. On Thu, Apr 8, 2010 at 11:58 AM, Sonny Heer sonnyh...@gmail.com wrote: Single node cluster (the windows box).  the Ubuntu box is only used to run the word count On Thu, Apr 8, 2010 at 11:54 AM, Benjamin Black b...@b3k.us wrote: Are you actually trying to make the Ubuntu system another node

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
You would have a Column Family, not a column for that; let's call it the Users CF. You'd use username as the row key and have a column called 'password'. For your example query, you'd retrieve row key 'usr2', column 'password'. The general pattern is that you create CFs to act as indices for
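To make that layout concrete, here is a sketch that models a CF as a plain Ruby hash (row key => columns). It is not a client API; the email column and the UsersByEmail index CF are hypothetical additions that illustrate the "CF as index" pattern:

```ruby
# Hypothetical in-memory stand-in for the Users CF: row key => { column => value }.
users = {
  "usr1" => { "password" => "secret1", "email" => "usr1@example.com" },
  "usr2" => { "password" => "secret2", "email" => "usr2@example.com" }
}

# The example query is a direct lookup: row key 'usr2', column 'password'.
password = users.fetch("usr2")["password"]

# To query on anything other than the row key, maintain a second CF whose row
# keys are the values you search on (here a hypothetical UsersByEmail CF).
# Each index row holds one column per matching username; the value is unused.
users_by_email = {}
users.each do |username, columns|
  (users_by_email[columns["email"]] ||= {})[username] = ""
end

# Query by email: read the index row, then fetch the referenced user rows.
usernames = users_by_email.fetch("usr1@example.com").keys
rows = usernames.map { |u| users.fetch(u) }

puts password
p usernames
```

The cost of the pattern is that every write to Users must also update the index CF from client code.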

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
123456 - column name, value - 123456. I'm thinking of doing it this way for my application; this way I can run different sorts of queries too. Any feedback on this is welcome. On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black b...@b3k.us wrote: You would have a Column Family

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
example is clear this time. Should you have any queries feel free to revert. On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote: Sorry, I don't understand your example. On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified vineetdan...@gmail.com wrote: Benjamin I quite agree

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
, I think every idea or experience should be shared with the community. I hope my example is clear this time. Should you have any queries feel free to revert. On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote: Sorry, I don't understand your example. On Sun, Apr 11, 2010 at 12

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
. This is just a thought that has come to my mind while trying to design my db for cassandra. On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black b...@b3k.us wrote: Row keys must be unique.  If your usernames are not unique and you want to be able to query on them, you either need to figure out

Re: How to perform queries on Cassandra?

2010-04-11 Thread Benjamin Black
On Sun, Apr 11, 2010 at 12:10 PM, vineet daniel vineetdan...@gmail.com wrote: I assume that using the key I can get all the columns like an array. Now I'd be using PHP to extract the key=value pairs from that array; I just want to avoid that, i.e. be able to directly print the column names. It doesn't

Re: Newbie ? with get_range_slices

2010-04-11 Thread Benjamin Black
http://wiki.apache.org/cassandra/FAQ#range_ghosts On Sun, Apr 11, 2010 at 5:12 PM, Kevin Wiggen kwig...@xythos.com wrote: I have spent the last few days playing with Cassandra and I have attempted to create a simple Java-Thrift-Cassandra Discussion Group Server (because the world needs

Re: Worst case #iops to read a row

2010-04-12 Thread Benjamin Black
On Mon, Apr 12, 2010 at 4:27 PM, Time Less timelessn...@gmail.com wrote: With this formula, we can already begin to formulate more useful answers to the question. If I have 10B rows in my CF, and I can fit 10k rows per SStable, and the SStables are spread across 5 nodes, and I have 1 bloom

Re: Worst case #iops to read a row

2010-04-13 Thread Benjamin Black
On Tue, Apr 13, 2010 at 10:48 AM, Time Less timelessn...@gmail.com wrote: If I have 10B rows in my CF, and I can fit 10k rows per SStable, and the SStables are spread across 5 nodes, and I have 1 bloom The error you are making is in thinking the Memtable thresholds are the SSTable limits.

Re: Worst case #iops to read a row

2010-04-13 Thread Benjamin Black
On Tue, Apr 13, 2010 at 11:31 AM, Paul Prescod pres...@gmail.com wrote: I am just checking math, not model. On Tue, Apr 13, 2010 at 10:48 AM, Time Less timelessn...@gmail.com wrote: numRowsOnNode = 10B / 20 = 500M. 50 million 10B / 20 is 500M. The rest of the analysis from our

Re: Worst case #iops to read a row

2010-04-13 Thread Benjamin Black
On Tue, Apr 13, 2010 at 11:55 AM, Paul Prescod pres...@gmail.com wrote: What do you mean by bad practice? The document above implies that it is nearly impossible. It implies that you will have between 1 and 4 SSTables. Does the administrator have a choice in this matter? Hey, I am arguing

Re: GC options

2010-04-14 Thread Benjamin Black
FYI, G1 has been in 1.6 since u14. 2010/4/13 Peter Schüller sc...@spotify.com: I'm working on getting our latency as consistent as possible, and the gc likes to kick off 60+ms periods of unavailability for a node, which for my application leads to a reasonable number of timed out requests.

Re: GC options

2010-04-14 Thread Benjamin Black
Got it, thanks 2010/4/13 Peter Schüller sc...@spotify.com: FYI, G1 has been in 1.6 since u14. Yes, but (last time I checked) in a considerably older form. The JDK 1.7 one is more mature. -- / Peter Schuller aka scode

Re: RackAware and replication strategy

2010-04-15 Thread Benjamin Black
Have a look at locator/DatacenterShardStrategy.java. On Thu, Apr 15, 2010 at 8:16 AM, Ran Tavory ran...@gmail.com wrote: I'm reading this on this page http://wiki.apache.org/cassandra/ArchitectureInternals : AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc.

Re: Can't start cassandra

2010-04-18 Thread Benjamin Black
If you are trying to run on machines with less than 1GB of memory, or OS resource limits that prevent allocation of 1GB of memory, that is what happens. You shouldn't be increasing -Xms, you should be decreasing -Xmx. Try -Xms16M -Xmx500M. b On Sun, Apr 18, 2010 at 2:30 PM, Soichi Hayashi

Re: cleaning house

2010-04-20 Thread Benjamin Black
Are you deleting data through the API or just doing a bunch of inserts and then running a compaction? The latter will not result in anything to clean up since data must be explicitly deleted. b On Tue, Apr 20, 2010 at 10:33 AM, B. Todd Burruss bburr...@real.com wrote: i'm trying to draw some

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
I can't answer for its sanity, but I would not do it that way. I'd have a CF for Emails, with 1 email per row, and another CF for UserEmails with per-user index rows referencing the Emails rows. b On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones mjo...@imagehawk.com wrote: To make sure I'm clear
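A sketch of that two-CF layout, again with in-memory hashes standing in for column families. The CF names follow the message; the helper methods, the UUID row keys, and the column names are assumptions made for illustration:

```ruby
require "securerandom"

# Hypothetical in-memory stand-ins for the two CFs:
#   EMAILS:      email id => { column => value }  (one email per row)
#   USER_EMAILS: user     => { email id => "" }   (one index row per user)
EMAILS = {}
USER_EMAILS = Hash.new { |hash, user| hash[user] = {} }

# A write touches both CFs: the email row itself and the user's index row.
def store_email(user, subject, body)
  id = SecureRandom.uuid
  EMAILS[id] = { "subject" => subject, "body" => body }
  USER_EMAILS[user][id] = ""
  id
end

# A read slices the user's index row for email ids, then fetches those rows
# in one batch (the job multiget does against a real cluster).
def emails_for(user)
  ids = USER_EMAILS.fetch(user, {}).keys
  ids.map { |id| EMAILS.fetch(id) }
end

store_email("alice", "hi", "hello")
store_email("alice", "re: hi", "hello again")
p emails_for("alice").map { |e| e["subject"] }
p emails_for("bob")
```

Reads are one index-row slice plus one batched fetch, which is the read pattern questioned in the follow-up message.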

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
On Tue, Apr 20, 2010 at 11:54 AM, Mark Jones mjo...@imagehawk.com wrote: When I look at this arrangement, I see one lookup by key for the user, followed by a large read for all the email indexes  (these are all columns in the same row, right?) Then one lookup by key for each email  

Re: Quorum consistency in a changing ring

2010-04-26 Thread Benjamin Black
Live nodes that have tokens indicating they should receive a copy of data count towards write quorum. This means if a node is down (not decommissioned) the copy sent to the node acting as the hinted handoff replica will not count towards achieving quorum. If a token is moved, it is moved. It is
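The counting rule described above can be sketched as follows. Node names are hypothetical, and this illustrates the rule only; it is not Cassandra's implementation:

```ruby
# Replicas needed for a quorum at replication factor rf.
def quorum(rf)
  rf / 2 + 1
end

# Only acks from the key's natural replicas count toward the consistency level.
# A copy accepted by a stand-in node for hinted handoff is not counted.
def quorum_met?(natural_replicas, acked_by, rf)
  (acked_by & natural_replicas).size >= quorum(rf)
end

natural = %w[A B C] # RF=3 replicas for some key; suppose B is down
puts quorum_met?(natural, %w[A C], 3) # true: two real replicas acked
puts quorum_met?(natural, %w[A D], 3) # false: D only holds a hint for B
```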

Re: Cassandra vs. Voldemort benchmark

2010-05-08 Thread Benjamin Black
My comment on this post: This is an interesting start to performance testing these systems, but raises many more questions than it answers. I am disappointed you chose not to investigate the enormous, unexplained spreads in performance for either system tested, nor to attempt to adjust tuning

Re: Tuning Cassandra

2010-05-10 Thread Benjamin Black
The performance you are describing is completely abnormal. The first step in troubleshooting it is profiling your client behavior because that is almost certainly where the problem is. Where is it spending its time? If that ultimately indicates it is really waiting on Cassandra, you can turn

Re: Is it possible to delete records based upon where condition

2010-05-12 Thread Benjamin Black
The functionality of a WHERE clause usually means maintaining an inverted index, usually another CF, on the information of interest (ses_tstamp in your example). You then retrieve index rows from that CF to find the data rows. b On Wed, May 12, 2010 at 5:34 AM, Moses Dinakaran

Re: how does cassandra compare with mongodb?

2010-05-13 Thread Benjamin Black
Mongo has a rich query API and a weak distribution/replication story. Cassandra has a narrow (read: weak) query API and a strong distribution/replication story. If you want really shallow learning curve, easy querying, etc, won't have that much data, and are handy with the typical master/slave

Re: Cassandra data model for financial data

2010-05-13 Thread Benjamin Black
On Thu, May 13, 2010 at 12:45 PM, Miguel Verde miguelitov...@gmail.com wrote: I also think that's not a good design, but only because the typical query would have to hit several column families instead of just one. This is completely normal in a columnar store. You query at least one index

Re: Cassandra Cluster Setup

2010-06-03 Thread Benjamin Black
http://wiki.apache.org/cassandra/MultinodeCluster On Thu, Jun 3, 2010 at 1:07 PM, Stephan Pfammatter stephan.pfammat...@logmein.com wrote: I’m having difficulties setting up a 3 way cassandra cluster. Any comments/help would be appreciated. My goal is that all data should be fully

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope pstanh...@wimba.com wrote: Here's the scenario: would like R = N where N is the number of nodes. Let's say 8. 1. Create first node, modify storage-conf.xml and change the <Seed/> to be the IP of the node. Change replication factor to 8 for CF

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope pstanh...@wimba.com wrote: I am contemplating a situation where there may be 2N servers ... but only N online at any one time. But, for operational purposes, N+n (where n is 1 or 2), N may be occasionally greater than R. Then Cassandra is

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope pstanh...@wimba.com wrote: I guess I'm thick ... What would be the right choice? Our data demands have already been proven to scale beyond what RDB can handle for our purposes. We are quite pleased with Cassandra read/write/scale out. Just

Re: Row Time range

2010-06-04 Thread Benjamin Black
That's entirely up to you. If you make row keys that are time ordered and include the time as a prefix in the key, you just use get_range() as usual, start now, end 7pm yesterday, count of 10. On Fri, Jun 4, 2010 at 2:23 PM, Nicholas Sun nick@raytheon.com wrote: Is there a mechanism to
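A sketch of time-prefixed keys, assuming OrderPreservingPartitioner so that rows come back in key order. The zero-padded key format and the in-memory range scan are assumptions standing in for a real get_range call:

```ruby
# Zero-pad the epoch seconds so lexical (byte) order matches time order.
def row_key(epoch_seconds, id)
  format("%012d:%s", epoch_seconds, id)
end

# Stand-in for a range query: keys with timestamps in [from_ts, to_ts], at most count.
def time_range(sorted_keys, from_ts, to_ts, count)
  lo = format("%012d", from_ts)
  hi = format("%012d", to_ts + 1) # exclusive bound: first possible key of the next second
  sorted_keys.select { |k| k >= lo && k < hi }.first(count)
end

keys = [
  row_key(1_275_690_000, "b"),
  row_key(1_275_600_000, "a"),
  row_key(99_999_999, "c") # fewer digits, still sorts first thanks to padding
].sort

p keys
p time_range(keys, 1_275_600_000, 1_275_690_000, 10)
```

Without the padding, a plain decimal timestamp would sort "99999999" after "1275600000", breaking the time order.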

Re: Is ReplicationFactor values number of replicas or number of copies of data?

2010-06-07 Thread Benjamin Black
There is no 'master' so all copies are replicas. RF=1 means 1 node has the data, RF=2 means 2 do, etc. On Mon, Jun 7, 2010 at 4:16 AM, Per Olesen p...@trifork.com wrote: Hi, I am unclear about what the ReplicationFactor value means. Does RF=1 mean that there is only one single node that has
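To picture what RF means on the ring, here is a simplified sketch of clockwise replica placement, roughly what the rack-unaware strategy does. The four nodes and their small integer tokens are made up; real tokens are far larger:

```ruby
# Hypothetical ring: [node, token] pairs sorted by token.
RING = { "A" => 0, "B" => 25, "C" => 50, "D" => 75 }.sort_by { |_, token| token }

# The first node whose token is >= the key's token owns the key; the next
# rf - 1 nodes clockwise hold the remaining copies. None of them is a master.
def replicas_for(key_token, rf)
  start = RING.index { |_, token| token >= key_token } || 0 # wrap past the last token
  (0...rf).map { |i| RING[(start + i) % RING.size].first }
end

p replicas_for(30, 1) # RF=1: one node has the data
p replicas_for(30, 2) # RF=2: two nodes do
p replicas_for(80, 3) # wraps around the ring
```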

Re: Beginner Assumptions

2010-06-13 Thread Benjamin Black
On Sun, Jun 13, 2010 at 12:53 AM, Torsten Curdt tcu...@vafer.org wrote: <rant> TBH while we are using super columns, they somehow feel wrong to me. I would be happier if we could move what we do with super columns into the row key space. But in our case that does not seem to be so easy. </rant>

Re: GC Storm

2010-06-13 Thread Benjamin Black
On Sat, Jun 12, 2010 at 7:46 PM, Anty anty@gmail.com wrote: Hi all, I have a 10-node cluster. After inserting many records into the cluster, I compact each node with nodetool compact. During the compaction process, something is wrong with one of the 10 nodes: when the size of the compacted

Re: Data format stability

2010-06-13 Thread Benjamin Black
What specifically is driving you to use trunk rather than the stable, 0.6 branch? On Sun, Jun 13, 2010 at 1:37 PM, Matthew Conway m...@backupify.com wrote: Not so much worried about temporary breakages, but more about design decisions that are made to enhance cassandra at the cost of a data

Re: Beginner Assumptions

2010-06-13 Thread Benjamin Black
On Sun, Jun 13, 2010 at 3:08 PM, Mark Robson mar...@gmail.com wrote: Range queries I think make them less useful, Not to my knowledge. but only work if you're using OrderPreservingPartitioner. The OPP comes with its own caveats - your nodes are likely to become badly unbalanced,

Re: Data modelling question

2010-06-14 Thread Benjamin Black
On Mon, Jun 14, 2010 at 6:09 AM, Per Olesen p...@trifork.com wrote: So, in my use case, when searching on e.g. company, I can then access the DashboardCompanyIndex with a slice on its SC and then grab all the uuids from the columns, and after this, make a lookup in the Dashboard CF for each

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-14 Thread Benjamin Black
My guess: you are outrunning your disk I/O. Each of those 5MB rows gets written to the commitlog, and the memtable is flushed when it hits the configured limit, which you've probably left at 128MB. Every 25 rows or so you are getting memtable flushed to disk. Until these things complete, they
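The "every 25 rows or so" figure is the flush threshold divided by the row size. A quick check, assuming the 5MB rows and the 128MB threshold mentioned in the message:

```ruby
MB = 1024 * 1024

row_size        = 5 * MB   # size of each row in the reported workload
flush_threshold = 128 * MB # memtable size threshold guessed in the reply

# Integer division: how many full rows fit before the memtable is flushed.
rows_per_flush = flush_threshold / row_size
puts rows_per_flush # 25
```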

Re: JVM Options for Production

2010-06-14 Thread Benjamin Black
...or does it vary greatly from installation to installation? Yes.

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread Benjamin Black
changes take effect. -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Monday, June 14, 2010 7:46 PM To: user@cassandra.apache.org Subject: Re: java.lang.OutofMemoryerror: Java heap space My guess: you are outrunning your disk I/O.  Each of those 5MB rows gets written

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
You are likely exhausting your heap space (probably still at the very small 1G default?), and maximizing the amount of resource consumption by using CL.ALL. Why are you using ALL? On Tue, Jun 15, 2010 at 11:58 AM, Julie julie.su...@nextcentury.com wrote: I am running a 10 node cassandra 0.6.1

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 1:40 PM, Julie julie.su...@nextcentury.com wrote: Thanks for your reply.  Yes, my heap space is 1G.  My vms have only 1.7G of memory so I hesitate to use more. Then write slower. There is no free lunch. b

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Known bug, fixed in latest 0.6 release. On Tue, Jun 15, 2010 at 3:29 PM, aaron aa...@thelastpickle.com wrote: hello, I have a 4 node cassandra cluster with 0.6.1 installed. We've been running a mixed read / write workload to test how it works in our environment, we run about 4M batch mutations

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 3:55 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: Benjamin Black b at b3k.us writes: Then write slower.  There is no free lunch. b Are you implying that clients need to throttle their collective load on the server to avoid causing the server

Re: stalled streaming

2010-06-15 Thread Benjamin Black
This is not the bug to which I was referring. I don't recall the number, perhaps someone else can assist on that front? I just know I specifically upgraded to 0.6 trunk a bit before 0.6.2 to pick up the fix (and it worked). b On Tue, Jun 15, 2010 at 6:07 PM, Rob Coli rc...@digg.com wrote:

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: I guess my point is that I have rarely run across database servers that die from either too many client connections, or too rapid client requests.  They generally stop accepting incoming connections

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: To clarify the history here -- initially we were writing with CL=0 and had great performance but ended up killing the server.  It was pointed out that we were really asking the server to accept and

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:58 PM, Jonathan Shook jsh...@gmail.com wrote: If there aren't enough resources on the server side to service the clients, the expectation should be that the servers have a graceful performance degradation, or in the worst case throw an error specific to resource

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Yes! On Tue, Jun 15, 2010 at 6:44 PM, Jonathan Ellis jbel...@gmail.com wrote: I think the one you're referring to is https://issues.apache.org/jira/browse/CASSANDRA-1076 On Tue, Jun 15, 2010 at 8:16 PM, Benjamin Black b...@b3k.us wrote: This is not the bug to which I was referring.  I don't

Re: Best documentation for Java and Cassandra?

2010-06-17 Thread Benjamin Black
Columnar data stores like Cassandra require you to construct indices to answer the queries of interest to you. http://www.slideshare.net/benjaminblack/cassandra-basics-indexing On Thu, Jun 17, 2010 at 12:04 AM, Anthony Ikeda anthony.ik...@cardlink.com.au wrote: I’m wondering if anyone can

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread Benjamin Black
Are these physical machines or virtuals? Did you post your cassandra.in.sh and storage-conf.xml someplace? On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater a...@zuno.com wrote: Total data size in the entire cluster is about twenty 12k images. With no other load on the system. I just ask for one

Re: Get all rows back from a ColumnFamily

2010-06-17 Thread Benjamin Black
you either need to use OPP and issue a get_range() request with empty strings for start and end keys (and you'll generally want to paginate using a count option and saving the last entry in a given page), or you need to index your rows. same as with any other sort of query you might want to
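A sketch of the paging loop described above. fetch_range is a hypothetical stand-in for a get_range call against an OPP cluster; it is assumed to return up to count keys starting at start_key with the start key included, which is why every page after the first drops its first key:

```ruby
# Hypothetical stand-in for get_range: up to count keys >= start_key, in key
# order, with the start key itself included.
def fetch_range(all_keys, start_key, count)
  all_keys.sort.select { |k| k >= start_key }.first(count)
end

# Walk every key one page at a time, restarting each page at the last key seen.
def each_key(all_keys, page_size)
  raise ArgumentError, "page_size must be at least 2" if page_size < 2

  start = "" # empty start key: begin at the first row
  first_page = true
  loop do
    raw = fetch_range(all_keys, start, page_size)
    page = first_page ? raw : raw.drop(1) # drop the key that ended the last page
    page.each { |k| yield k }
    break if raw.size < page_size         # short page: nothing left
    start = raw.last
    first_page = false
  end
end

seen = []
each_key(%w[d a c b e], 2) { |k| seen << k }
p seen
```

The page size must be at least 2: with an inclusive start and a page of 1, every request would return only the key already seen.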

Re: Some questions about using Cassandra

2010-06-17 Thread Benjamin Black
You download the patch and apply it. On Thu, Jun 17, 2010 at 4:10 PM, Anthony Ikeda anthony.ik...@cardlink.com.au wrote: Thanks Sylvia, I would like to actually do that actually. Any idea how I can get started? -Original Message- From: Sylvain Lebresne [mailto:sylv...@yakaz.com]

Re: Best documentation for Java and Cassandra?

2010-06-17 Thread Benjamin Black
On Thu, Jun 17, 2010 at 9:00 PM, Anthony Ikeda anthony.ik...@cardlink.com.au wrote: I'm certain I didn't mention Lucandra, but yes I get the idea about the concepts: Lucandra is Lucene on Cassandra. Unclear to me how else Lucene is relevant, but, ok! * Use indexing tool to reference with

Re: Failover and slow nodes

2010-06-18 Thread Benjamin Black
Would be interesting to have a snitch that manipulated responses for read nodes based on historical response times. On Fri, Jun 18, 2010 at 8:21 AM, James Golick jamesgol...@gmail.com wrote: Our cassandra client fails over if a node times out. Aside from actual failure, repair and major
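One way such a snitch could rank replicas, sketched hypothetically: keep an exponentially weighted moving average of observed response times per node and send reads to the lowest. The class, the alpha value, and the "unseen nodes first" rule are all assumptions, not an existing Cassandra component:

```ruby
# Hypothetical latency tracker: an exponentially weighted moving average per node.
class LatencyTracker
  def initialize(alpha = 0.3)
    @alpha = alpha # weight of the newest sample
    @avg = {}
  end

  # Fold a new response time (in milliseconds) into the node's average.
  def record(node, millis)
    prev = @avg[node]
    @avg[node] = prev.nil? ? millis.to_f : @alpha * millis + (1 - @alpha) * prev
  end

  # Order replicas fastest first; nodes with no samples yet sort first (0.0)
  # so they get probed.
  def order(replicas)
    replicas.sort_by { |node| @avg.fetch(node, 0.0) }
  end
end

tracker = LatencyTracker.new
tracker.record("A", 12)
tracker.record("B", 250) # e.g. a node slowed by repair or major compaction
tracker.record("C", 40)
p tracker.order(%w[A B C]) # reads would go to A first
```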

Re: Problem with Deletes

2010-06-20 Thread Benjamin Black
http://wiki.apache.org/cassandra/DistributedDeletes On Thu, Jun 17, 2010 at 9:10 AM, Amir amir7...@yahoo.com wrote: Hi All, I'm running a benchmark on Cassandra while using a benchmark client which I've written myself. I'm running the following scenario: One Cassandra node on the same

Re: Thrift Client on Ruby, does it need compiled bindings? (or anything else to make it faster?)

2010-06-20 Thread Benjamin Black
Only one: don't use it if you want performance. On Sun, Jun 20, 2010 at 11:11 AM, Christian van der Leeden christian.vanderlee...@googlemail.com wrote: Hi,        I'm just experimenting and benchmarking cassandra for my use case. I'm using the fauna/cassandra and fauna/thrift_client. Is there

Re: Atomic Compare and Swap

2010-06-20 Thread Benjamin Black
No. On Sun, Jun 20, 2010 at 2:42 PM, Rishi Bhardwaj khichri...@yahoo.com wrote: Hi I was wondering if Cassandra has any plans for supporting atomic compare and swap operation on a column value? Compare could be on timestamp for the column or the column value itself and the write of course is

Re: Is there a penalty to a SuperColumn?

2010-06-21 Thread Benjamin Black
If there is ambiguity, something else is wrong and you should probably stick with a regular CF. If you are indexing a regular CF with an SCF you are probably doing it right. If you are trying to model some hierarchical structure from your problem domain, I really recommend just using composite

Re: get_range_slices confused about token ranges after decommissioning a node

2010-06-21 Thread Benjamin Black
Did you forget to run repair? On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk jo...@openplaces.org wrote: I believe we did nodetool removetoken on nodes that were already down (due to hardware failure), but I will check to make sure. We're running Cassandra 0.6.2. On Mon, Jun 21, 2010 at

Re: 10 minute cassandra pause

2010-06-23 Thread Benjamin Black
Are you seeing any sort of log messages from Cassandra at all? On Wed, Jun 23, 2010 at 2:26 PM, Sean Bridges sean.brid...@gmail.com wrote: We were running a load test against a single 0.6.2 cassandra node.  24 hours into the test,  Cassandra appeared to be nearly frozen for 10 minutes.  Our

Re: Understanding SuperColumns

2010-06-28 Thread Benjamin Black
On Sun, Jun 27, 2010 at 7:36 PM, Anthony Ikeda anthony.ik...@cardlink.com.au wrote: Say my query is: Get all Work addresses in New York and the address owner. Steps to get the data would be: If this is the query you want to run, then you probably just want to put the owner in the index

Re: live nodes list in ring

2010-06-30 Thread Benjamin Black
Does this happen after you have changed the ring topology, especially adding nodes? 2010/6/30 Stephen Hamer stephen.ha...@xobni.com: When this happens to me I have to do a full cluster restart. Even doing a rolling restart across the cluster doesn't seem to fix them, all of the nodes need to

Re: Implementing Counter on Cassandra

2010-06-30 Thread Benjamin Black
ZK is way overkill for counters. memcache and redis are much better at the job. On Tue, Jun 29, 2010 at 12:32 PM, Jonathan Shook jsh...@gmail.com wrote: Until then, a pragmatic solution, however undesirable, would be to only have a single logical thread/task/actor that is allowed to

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread Benjamin Black
.QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick jamesgol...@gmail.com wrote: 4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J.
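The arithmetic behind that answer, sketched for the 4-node, RF=2 case with simple clockwise placement (node names hypothetical): a quorum of 2 is 2, the same as ALL, so any range that has the down node as a replica cannot be served at QUORUM or ALL.

```ruby
# Replicas needed for a quorum at replication factor rf.
def quorum(rf)
  rf / 2 + 1
end

# Replica set for each node's primary range: that node plus the next rf - 1 clockwise.
def replica_sets(nodes, rf)
  nodes.each_index.map { |i| (0...rf).map { |j| nodes[(i + j) % nodes.size] } }
end

nodes = %w[A B C D]
rf    = 2
down  = %w[B]

sets = replica_sets(nodes, rf)
# A range is unavailable when fewer live replicas remain than the level requires.
unavailable_at_quorum = sets.count { |set| (set - down).size < quorum(rf) }
unavailable_at_one    = sets.count { |set| (set - down).size < 1 }

puts quorum(rf)            # 2, identical to ALL when rf = 2
puts unavailable_at_quorum # 2 of the 4 ranges
puts unavailable_at_one    # 0
```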

Re: Digg 4 Preview on TWiT

2010-07-07 Thread Benjamin Black
Thanks, second funniest thing I've read this month! On Tue, Jul 6, 2010 at 4:13 PM, Matt Su matt...@morningstar.com wrote: Thanks for all your guys’ information. This thread raised a concern for us: we chose Cassandra because FB, Twitter, and Digg are using it, and we’re doubting whether

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
On Thu, Jul 8, 2010 at 9:02 AM, ChingShen chingshenc...@gmail.com wrote: Hmm.. as you mentioned that it will write a hint and report success at CL.ANY, does the hinted handoff only work at CL.ANY? Still no. Hints are written when nodes are down, regardless of CL, unless HH is disabled. CL

Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
(and I'm sure someone will correct me if I am wrong on that) On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black b...@b3k.us wrote: There is a memtable per CF, regardless of how many keyspaces you have.

Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
- what areas should I investigate to understand the concerns you raise? Thanks again -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Thursday, July 08, 2010 11:28 AM To: user@cassandra.apache.org Subject: Re: Use of multiple Keyspaces (and I'm sure someone

Re: TechCrunch article on Twitter and Cassandra

2010-07-10 Thread Benjamin Black
On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark co...@cloudeventprocessing.com wrote: Although I'm a fan of Cassandra, there's no way I'd use it today for my tier 1 deployments, because I don't have the resources of Facebook, and even though Cassandra is open source, that doesn't mean I can fix

Re: Question about hinted handoff

2010-07-10 Thread Benjamin Black
You constructed a pathological case and then got confused at the result. Consider instead a realistic case: RF=3, CL=QUORUM. Writes should go to all of A, B, and C. B is down when the write request arrives, so does not acknowledge it. A and C acknowledge the write. Since quorum is

Re: Question about CL.ZERO

2010-07-11 Thread Benjamin Black
And, to be clear, there is no good reason to use CL.ZERO and it can be a serious resource hog on the coordinator. On Sun, Jul 11, 2010 at 9:21 AM, ChingShen chingshenc...@gmail.com wrote: Hi all,   Does it mean that the coordinator node always return success to the client at CL.ZERO? But if

Re: Question about CL.ZERO

2010-07-12 Thread Benjamin Black
On 07/11/2010 11:09 AM, Benjamin Black wrote: And, to be clear, there is no good reason to use CL.ZERO and it can be a serious resource hog on the coordinator. On Sun, Jul 11, 2010 at 9:21 AM, ChingShen chingshenc...@gmail.com wrote: Hi all, Does it mean that the coordinator node always

Re: server needs thrift to run also?

2010-07-12 Thread Benjamin Black
You were just told it is packaged with what it needs. The API is not changed from 0.6.1 to 0.6.3. Why do you think you need to generate client code? On Mon, Jul 12, 2010 at 2:16 PM, S Ahmed sahmed1...@gmail.com wrote: Ok I guess I have to read up on exactly what is going on here. I figured I
