CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
If you store only the key mappings in a column family, for custom ordering of rows etc. for things like: friends = { user_id : { friendid1, friendid2, } } or topForumPosts = { forum_id1 : { post2343, post32343, post32223, ...} } Now on friends page or on the top_forum_posts

How to get previous / next data?

2010-06-15 Thread Bram van der Waaij
Hello, We want to use cassandra to store and retrieve time related data. Storing the time-value pairs is easy and works perfectly. The problem arrives at retrieving the data. We do not only want to retrieve data from within a time range, but also be able to get the previous and/or next data

Re: How to get previous / next data?

2010-06-15 Thread Sylvain Lebresne
You want to use 'reversed' in SliceRange (and a start with whatever you want and a count of 2). -- Sylvain On Tue, Jun 15, 2010 at 12:01 PM, Bram van der Waaij bramat...@gmail.com wrote: Hello, We want to use cassandra to store and retrieve time related data. Storing the time-value pairs is
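As a rough illustration of the suggestion above, the sketch below mimics a column slice over sorted timestamps in plain Python. It is not the Thrift API: `slice_columns` and its parameters are hypothetical names chosen for this example, and it only assumes that a reversed slice starts at the given column (inclusive) and walks backwards.

```python
from bisect import bisect_left, bisect_right

def slice_columns(names, start, count, reversed_=False):
    """Mimic a slice over sorted column names (hypothetical helper, not the Thrift API)."""
    if reversed_:
        # Walk backwards from the last name <= start.
        end = bisect_right(names, start)
        return names[max(0, end - count):end][::-1]
    # Walk forwards from the first name >= start.
    begin = bisect_left(names, start)
    return names[begin:begin + count]

# Example data: column names are timestamps, kept in sorted order.
timestamps = [100, 200, 300, 400, 500]

# The sample at t=300 and the one before it (reversed, count of 2).
previous_pair = slice_columns(timestamps, 300, 2, reversed_=True)
# The sample at t=300 and the one after it (forward, count of 2).
next_pair = slice_columns(timestamps, 300, 2)
print(previous_pair, next_pair)
```

With a count of 2, the first element is the column at (or nearest before) the start and the second is its predecessor; the forward slice gives the successor in the same way.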

Re: JVM Options for Production

2010-06-15 Thread Ted Zlatanov
On Mon, 14 Jun 2010 16:01:57 -0700 Anthony Molinaro antho...@alumni.caltech.edu wrote: Now I would assume that for 'production' you want to remove -ea and -XX:+HeapDumpOnOutOfMemoryError as well as adjust -Xms and -Xmx accordingly, but are there any others which should

Re: CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
well it won't be a range, it will be random key lookups. On Tue, Jun 15, 2010 at 8:44 AM, Gary Dusbabek gdusba...@gmail.com wrote: On Tue, Jun 15, 2010 at 04:29, S Ahmed sahmed1...@gmail.com wrote: If you store only the key mappings in a column family, for custom ordering of rows etc. for

Re: How to get previous / next data?

2010-06-15 Thread Bram van der Waaij
Perfect! Thanks :-) 2010/6/15 Sylvain Lebresne sylv...@yakaz.com You want to use 'reversed' in SliceRange (and a start with whatever you want and a count of 2). -- Sylvain On Tue, Jun 15, 2010 at 12:01 PM, Bram van der Waaij bramat...@gmail.com wrote: Hello, We want to use

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread Jonathan Ellis
if you are reading 500MB per thrift request from each of 3 threads, then yes, simple arithmetic indicates that 1GB heap is not enough. On Mon, Jun 14, 2010 at 6:13 PM, Caribbean410 caribbean...@gmail.com wrote: Hi, I wrote 200k records to db with each record 5MB. Get this error when I uses 3

Cassandra timeouts under low load

2010-06-15 Thread Drew Dahlke
Hi, I'm running Cassandra 0.6.2 on a dedicated 4 node cluster and I also have a dedicated 4 node Hadoop cluster. I'm trying to run a simple map reduce job against a single column family and it only takes 32 map tasks before I get floods of Thrift timeouts. That would make sense to me if the

RE: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread caribbean410
Sorry, the record size should be 5KB not 5MB. Coz 4KB is still OK. I will try Benjamin's suggestion. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, June 15, 2010 8:09 AM To: user@cassandra.apache.org Subject: Re: java.lang.OutofMemoryerror: Java heap

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread Benjamin Black
You should only have to restart once per node to pick up config changes. On Tue, Jun 15, 2010 at 9:41 AM, caribbean410 caribbean...@gmail.com wrote: Today I retry the 2GB heap now it's working. No that out of memory error. Looks like I have to restart Cassandra several times before the new

Re: Replication Factor and Data Centers

2010-06-15 Thread Jonathan Ellis
(moving to user@) On Mon, Jun 14, 2010 at 10:43 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: Is the clearer interpretation of this statement (in conf/datacenters.properties) given anywhere else? # The sum of all the datacenter replication factor values should equal # the replication

Re: help for designing a cassandra

2010-06-15 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/ArticlesAndPresentations might help. On Mon, Jun 14, 2010 at 1:13 PM, Johannes Weissensel whitesensl...@googlemail.com wrote: Hi everyone, i am new to nosql databases and especially column-oriented Databases like cassandra. I am a student on

java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
I am running a 10 node cassandra 0.6.1 cluster with a replication factor of 3. To populate the database to perform my read benchmarking, I have 8 applications using Thrift, each connecting to a different cassandra server and writing 100,000 rows of data (100 KB each row), using a

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
You are likely exhausting your heap space (probably still at the very small 1G default?), and maximizing the amount of resource consumption by using CL.ALL. Why are you using ALL? On Tue, Jun 15, 2010 at 11:58 AM, Julie julie.su...@nextcentury.com wrote: I am running a 10 node cassandra 0.6.1

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
Benjamin Black b at b3k.us writes: You are likely exhausting your heap space (probably still at the very small 1G default?), and maximizing the amount of resource consumption by using CL.ALL. Why are you using ALL? On Tue, Jun 15, 2010 at 11:58 AM, Julie julie.sugar at nextcentury.com

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Phil Stanhope
How are you doing your inserts? I draw a clear line between 1) bootstrapping a cluster with data and 2) simulating expected/projected read/write behavior. If you are bootstrapping then I would look into the batch_mutate APIs. They allow you to improve your performance on writes dramatically.
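A minimal sketch of the batching idea, assuming the client groups rows before sending them. Only the grouping step is shown; `batches` is a hypothetical helper, and the point where a real client would issue one batch_mutate call per group is marked in a comment rather than invented.

```python
from itertools import islice

def batches(rows, size):
    """Yield consecutive lists of at most `size` rows (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Example: 10 row keys grouped into batches of 4.
row_keys = [f"row{i}" for i in range(10)]
for group in batches(row_keys, 4):
    # A real client would send one batch_mutate request per group here.
    print(len(group), group[0], group[-1])
```

Smaller groups keep each request, and the memory the server needs to hold it, bounded; the right size is workload dependent.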

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 1:40 PM, Julie julie.su...@nextcentury.com wrote: Thanks for your reply.  Yes, my heap space is 1G.  My vms have only 1.7G of memory so I hesitate to use more. Then write slower. There is no free lunch. b

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Ellis
On Tue, Jun 15, 2010 at 1:58 PM, Julie julie.su...@nextcentury.com wrote: Coinciding with my write timeouts, all 10 of my cassandra servers are getting the following exception written to system.log: Value too large for defined data type looks like a bug found in older JREs. Upgrade to u19 or

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
Phil Stanhope pstanhope at wimba.com writes: How are you doing your inserts? I draw a clear line between 1) bootstrapping a cluster with data and 2) simulating expected/projected read/write behavior. If you are bootstrapping then I would look into the batch_mutate APIs. They allow you

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Ellis
On Tue, Jun 15, 2010 at 5:15 PM, Julie julie.su...@nextcentury.com wrote: I'm also baffled that after all compactions are done on every one of the 10 servers, about 5 out of 10 servers are still at 40% CPU usage, although they are doing 0 disk IO. I am not running anything else running on these

[OT] Real Time Open source solutions for aggregation and stream processing

2010-06-15 Thread Ian Holsman
Firstly, my apologies for the off-topic message, but I thought most people on this list would be knowledgeable and interested in this kind of thing. We are looking to find an open source, scalable solution to do RT aggregation and stream processing (similar to what the 'hop' project

stalled streaming

2010-06-15 Thread aaron
Hello, I have a 4 node Cassandra cluster with 0.6.1 installed. We've been running a mixed read / write workload to test how it works in our environment; we run about 4M batch mutations and 40M get_range_slice requests over 6 to 8 hours that load about 10 to 15 GB of data. Yesterday while there was

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Known bug, fixed in latest 0.6 release. On Tue, Jun 15, 2010 at 3:29 PM, aaron aa...@thelastpickle.com wrote: hello, I have a 4 node cassandra cluster with 0.6.1 installed. We've been running a mixed read / write workload to test how it works in our environment, we run about 4M batch mutations

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Charles Butterfield
Benjamin Black b at b3k.us writes: Then write slower. There is no free lunch. b Are you implying that clients need to throttle their collective load on the server to avoid causing the server to fail? That seems undesirable. Is this a side effect of a server bug, or is it part of the

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 3:55 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: Benjamin Black b at b3k.us writes: Then write slower.  There is no free lunch. b Are you implying that clients need to throttle their collective load on the server to avoid causing the server

Re: stalled streaming

2010-06-15 Thread aaron
Thanks, will move to 0.6.2. Aaron On Tue, 15 Jun 2010 15:55:46 -0700, Benjamin Black b...@b3k.us wrote: Known bug, fixed in latest 0.6 release. On Tue, Jun 15, 2010 at 3:29 PM, aaron aa...@thelastpickle.com wrote: hello, I have a 4 node cassandra cluster with 0.6.1 installed. We've been

RE: read operation is slow

2010-06-15 Thread Dop Sun
Thanks for your updates, good to know that your performance is better now. Actually, if the user asks for one record at a time, usually it will be done with multi-threading, since most likely the requests are coming from different users. If a single user wants 200k, and there is no difference to get 1

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Charles Butterfield
Benjamin Black b at b3k.us writes: I am only saying something obvious: if you don't have sufficient resources to handle the demand, you should reduce demand, increase resources, or expect errors. Doing lots of writes without much heap space is such a situation (whether or not it is

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Shook
Actually, you shouldn't expect errors in the general case, unless you are simply trying to use data that can't fit in available heap. There are some practical limitations, as always. If there aren't enough resources on the server side to service the clients, the expectation should be that the

Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
We are currently looking at a distributed database option and so far Cassandra ticks all the boxes. However, I still have some questions. Is there any need for archiving of Cassandra and what backup options are available? As it is a no-data-loss system I'm guessing archiving is not exactly

Re: Some questions about using Cassandra

2010-06-15 Thread Jonathan Shook
There is JSON import and export, of you want a form of external backup. No, you can't hook event subscribers into the storage engine. You can modify it to do this, however. It may not be trivial. An easier way to do this would be to have a boundary system (or dedicated thread, for example)

Re: Some questions about using Cassandra

2010-06-15 Thread Jonathan Shook
Doh! Replace of with if in the top line. On Tue, Jun 15, 2010 at 7:57 PM, Jonathan Shook jsh...@gmail.com wrote: There is JSON import and export, of you want a form of external backup. No, you can't hook event subscribers into the storage engine. You can modify it to do this, however. It may

RE: Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
Thanks Jonathan, I was only asking about the event listeners because an alternative we are considering is TIBCO Active Spaces which draws quite a lot of parallels to Cassandra. I guess it would be interesting to find out how other people use Cassandra, i.e., is it your one stop shop for data

Re: stalled streaming

2010-06-15 Thread Benjamin Black
This is not the bug to which I was referring. I don't recall the number, perhaps someone else can assist on that front? I just know I specifically upgraded to 0.6 trunk a bit before 0.6.2 to pick up the fix (and it worked). b On Tue, Jun 15, 2010 at 6:07 PM, Rob Coli rc...@digg.com wrote:

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: I guess my point is that I have rarely run across database servers that die from either too many client connections, or too rapid client requests.  They generally stop accepting incoming connections

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: To clarify the history here -- initially we were writing with CL=0 and had great performance but ended up killing the server.  It was pointed out that we were really asking the server to accept and

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:58 PM, Jonathan Shook jsh...@gmail.com wrote: If there aren't enough resources on the server side to service the clients, the expectation should be that the servers have a graceful performance degradation, or in the worst case throw an error specific to resource

Re: Some questions about using Cassandra

2010-06-15 Thread Rob Coli
On 6/15/10 6:35 PM, Benjamin Black wrote: jmhodges contributed a patch (I remain incompetent at Jira searches) for 'coprocessors' to do what you want. That'd be where I'd start looking. https://issues.apache.org/jira/browse/CASSANDRA-1016 =Rob

RE: Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
Thanks Benjamin. Looking at the 'plugins' now :) -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Wednesday, 16 June 2010 11:35 AM To: user@cassandra.apache.org Subject: Re: Some questions about using Cassandra On Tue, Jun 15, 2010 at 6:07 PM, Anthony Ikeda

Re: stalled streaming

2010-06-15 Thread Jonathan Ellis
I think the one you're referring to is https://issues.apache.org/jira/browse/CASSANDRA-1076 On Tue, Jun 15, 2010 at 8:16 PM, Benjamin Black b...@b3k.us wrote: This is not the bug to which I was referring.  I don't recall the number, perhaps someone else can assist on that front?  I just know I

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Yes! On Tue, Jun 15, 2010 at 6:44 PM, Jonathan Ellis jbel...@gmail.com wrote: I think the one you're referring to is https://issues.apache.org/jira/browse/CASSANDRA-1076 On Tue, Jun 15, 2010 at 8:16 PM, Benjamin Black b...@b3k.us wrote: This is not the bug to which I was referring.  I don't

Re: JVM Options for Production

2010-06-15 Thread Jonathan Ellis
The main change you'd commonly make is decreasing the max new gen size on large heaps (say to 2GB) from the default of 1/3 of the heap. IMO keeping heap dump on OOM around is a good idea in production; it doesn't cost much (you're already screwed at the point where it starts writing a dump, so

RE: read operation is slow

2010-06-15 Thread caribbean410
Thank you for the update. For the select issue, right now we just focus on read and write; later we may test the delete operation, which needs to query all keys. From: Dop Sun [mailto:su...@dopsun.com] Sent: Tuesday, June 15, 2010 4:14 PM To: user@cassandra.apache.org Subject: RE: read operation