Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Peter Schuller
> Does huge variation in no. of columns in rows, over the column family > has *any* impact on the performance ? > > Can I have like just 100 columns in some rows and like hundred > thousands of columns in another set of rows, without any downsides ? If I interpret your question the way I think you

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-07 Thread David Boxenhorn
Why not synchronize on the client side? Make sure that the process that allocates user ids runs on only a single machine, in a synchronized method, and uses QUORUM for its reads and writes to Cassandra? On Sun, Feb 6, 2011 at 11:02 PM, Aaron Morton wrote: > If you mix mysql and Cassandra you risk
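
A minimal sketch of the client-side approach suggested above, assuming a single JVM process owns user-id allocation. The `CounterStore` interface and `InMemoryStore` class are hypothetical stand-ins for reads and writes that would go to Cassandra at QUORUM; they are not part of any Cassandra client API.

```java
public class UserIdAllocator {

    /** Hypothetical stand-in for a counter value read from and written to Cassandra at QUORUM. */
    public interface CounterStore {
        long read();
        void write(long value);
    }

    /** In-memory implementation, used only so that this sketch runs on its own. */
    public static final class InMemoryStore implements CounterStore {
        private long value = 0L;
        @Override public long read() { return value; }
        @Override public void write(long newValue) { value = newValue; }
    }

    private final CounterStore store;

    public UserIdAllocator(CounterStore store) {
        this.store = store;
    }

    /**
     * Read-increment-write under a single lock. The lock only serializes
     * callers inside this one process, which is why the allocator must
     * run on a single machine.
     */
    public synchronized long nextUserId() {
        long next = store.read() + 1;
        store.write(next);
        return next;
    }

    public static void main(String[] args) {
        UserIdAllocator allocator = new UserIdAllocator(new InMemoryStore());
        System.out.println(allocator.nextUserId());
        System.out.println(allocator.nextUserId());
    }
}
```

The design trades availability for simplicity: if the allocating process is down, no new ids can be issued until it comes back.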

OOM during batch_mutate

2011-02-07 Thread Patrik Modesto
Hi all! I'm running into OOM problem during batch_mutate. I've a test cluster of two servers, 32GB and 16GB RAM, real HW. I've one keyspace and one CF with 1,4mil rows, each 10 columns. A row is around 5k in size. I run Hadoop MR task that reads one column and generates Mutation that updates anoth

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Aditya Narayan
Thanks for the detailed explanation Peter! Definitely cleared my doubts ! On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller wrote: >> Does huge variation in no. of columns in rows, over the column family >> has *any* impact on the performance ? >> >> Can I have like just 100 columns in some rows a

Java Cassandra Driver: Using CQL

2011-02-07 Thread Vivek Mishra
Hi, Recently I worked on implementation of java jdbc driver for cassandra using CQL. Given below is an example code base(with basic features) about how to use it: import java.sql.Connection; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.s

[0.7.1] Error in ThreadPoolExecutor

2011-02-07 Thread Patrik Modesto
Hi, on my two-node test setup I get repeatedly following error: The 10.0.18.129 server log: INFO 14:10:37,707 Node /10.0.18.99 has restarted, now UP again INFO 14:10:37,708 Checking remote schema before delivering hints INFO 14:10:37,708 Sleeping 45506ms to stagger hint delivery INFO 14:10:3

Re: OOM during batch_mutate

2011-02-07 Thread Patrik Modesto
I forgot to mention I use 0.7.0 stable version. HTH, Patrik

Re: OOM during batch_mutate

2011-02-07 Thread Patrik Modesto
Just tried current 0.7.1 from cassandra-0.7 branch and it does the same. OOM after three runs. -Xm* setting is computed by cassandra-env.sh like this: -Xms8022M -Xmx8022M -Xmn2005M What am I doing wrong? Thanks, Patrik On Mon, Feb 7, 2011 at 14:18, Patrik Modesto wrote: > I forgot to mention

[0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Patrik Modesto
INFO 15:30:49,647 Compacted to /www/foo/cassandra/data/foo/Url-tmp-f-767-Data.db. 4,199,999,762 to 4,162,579,242 (~99% of original) bytes for 379,179 keys. Time: 137,149ms. ERROR 15:30:49,699 Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.RuntimeException: java.lang.Illeg

Re: [0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Thibaut Britz
I think this is related to a faulty disk. On Mon, Feb 7, 2011 at 3:35 PM, Patrik Modesto wrote: > INFO 15:30:49,647 Compacted to > /www/foo/cassandra/data/foo/Url-tmp-f-767-Data.db.  4,199,999,762 to > 4,162,579,242 (~99% of original) bytes for 379,179 keys.  Time: > 137,149ms. > ERROR 15:30:49,

Re: OOM during batch_mutate

2011-02-07 Thread sridhar basam
Looks like you don't have a big enough working set from your GC logs, there doesn't seem to be a lot being reclaimed in the GC process. The process is reclaiming a few hundred MB and is running every few seconds. How big are your caches? The probable reason that it works the first couple times when

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Edward Capriolo
On Mon, Feb 7, 2011 at 5:40 AM, Aditya Narayan wrote: > Thanks for the detailed explanation Peter! Definitely cleared my doubts ! > > > > On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller > wrote: >>> Does huge variation in no. of columns in rows, over the column family >>> has *any* impact on the p

Re: OOM during batch_mutate

2011-02-07 Thread Patrik Modesto
On Mon, Feb 7, 2011 at 15:44, sridhar basam wrote: > Looks like you don't have a big enough working set from your GC logs, there > doesn't seem to be a lot being reclaimed in the GC process. The process is > reclaiming a few hundred MB and is running every few seconds. How big are > your caches? T

Re: [0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Patrik Modesto
On Mon, Feb 7, 2011 at 15:42, Thibaut Britz wrote: > I think this is related to a faulty disk. I'm not sure that's the problem. Cassandra 0.7.0 didn't report any problem. It started with Cassandra 0.7.1. Patrik

Re: [0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Jake Luciani
This sounds like a possible bug since the BRAF was re-written in 0.7.1. Could you open a ticket? On Mon, Feb 7, 2011 at 10:32 AM, Patrik Modesto wrote: > On Mon, Feb 7, 2011 at 15:42, Thibaut Britz > wrote: > > I think this is related to a faulty disk. > > I'm not sure that's the problem. Cassand

Re: CF Read and Write Latency Histograms

2011-02-07 Thread Chris Burroughs
On 02/04/2011 12:43 PM, Jonathan Ellis wrote: > Can you create a ticket? I noticed the same thing. CASSANDRA-2123 created.

Re: Java Cassandra Driver: Using CQL

2011-02-07 Thread Eric Evans
On Mon, 2011-02-07 at 12:08 +, Vivek Mishra wrote: > Recently I worked on implementation of java jdbc driver for cassandra > using CQL. Given below is an example code base(with basic features) > about how to use it: [ ... ] Nice! > I am not sure if there is any JIRA related to this. With suc

Re: Ruby thrift is trying to write Time as string

2011-02-07 Thread Ryan King
On Sat, Feb 5, 2011 at 10:12 PM, Joshua Partogi wrote: > Hi, > > I don't know whether my assumption is right or not. When I tried to insert a > Time value into a column I am getting this exception: > > vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:106:in > `write_string'

Re: postgis > cassandra?

2011-02-07 Thread Mike Malone
It's not really the storage of spatial data that's tricky. We use geojson as a wire-line format at the higher levels of our system (e.g., the HTTP API). But the hard part is organizing the data for efficient retrieval and keeping those indices consistent with the data being indexed. Efficient multi

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Daniel Doubleday
It depends a little on your write pattern: - Wide rows tend to get distributed over more sstables so more disk reads are necessary. This will become noticeable when you have high io load and reads actually hit the discs. - If you delete a lot slice query performance might suffer: extreme example

Best way to detect/fix bitrot today?

2011-02-07 Thread Anand Somani
Hi, Our application space is such that there is data that might not be read for a long time. The data is mostly immutable. How should I approach detecting/solving the bitrot problem? One approach is read data and let read repair do the detection, but given the size of data, that does not look very

RE: hadoop library conflict with cassandra (hector) 0.7

2011-02-07 Thread Curt Allred
Hello, I'm trying to access a Cassandra 0.7 cluster in a hadoop map-reduce job (hadoop 0.20.2) and seeing a thrift library conflict. Hadoop uses an older version of thrift than hector 0.7, and this older version is getting picked up by my job, causing the following exception: FATAL org.apache.

Re: Best way to detect/fix bitrot today?

2011-02-07 Thread Peter Schuller
> Our application space is such that there is data that might not be read for > a long time. The data is mostly immutable. How should I approach > detecting/solving the bitrot problem? One approach is read data and let read > repair do the detection, but given the size of data, that does not look v

Re: Argh: Data Corruption (LOST DATA) (0.7.0)

2011-02-07 Thread Ben Coverston
Dan, Do you have any more information on this issue? Have you been able to discover anything from exporting your SSTables to JSON? Thanks, Ben On 1/29/11 12:45 PM, Dan Hendry wrote: I am once again having severe problems with my Cassandra cluster. This time, I straight up cannot read sectio

Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Hello, one node in my 3-machine cluster cannot perform compaction. I tried multiple times, it ran out of heap space once and I increased it. Now I'm getting the dump below (after it does run for a few minutes). I hope somebody can shed a little light on what's going on, because I'm at a loss and th

Re: Best way to detect/fix bitrot today?

2011-02-07 Thread Anthony John
Some RAID storage might do it, potentially more efficiently!! Rhetorical question - Does Cassandra's architecture of reconciling reads over multiple copies of the same data provide an even more interesting answer? I submit - YES! All bitrot protection mechanisms involve some element of redundant

time to live rows

2011-02-07 Thread Kallin Nagelberg
Hey, I have read about the new TTL columns in Cassandra 0.7. In my case I'd like to expire an entire row automatically after a certain amount of time. Is this possible as well? Thanks, -Kal

Deleted columns still coming back; CASSANDRA-{1748,1837} alive in 0.6.x?

2011-02-07 Thread Scott McCarty
Hi, Does anyone know if anything similar to https://issues.apache.org/jira/browse/CASSANDRA-1748 or https://issues.apache.org/jira/browse/CASSANDRA-1837 exists in 0.6.x releases? Both of those bugs look like they were introduced, found, and fixed in 0.7, and CASSANDRA-1837 comments indicate that

Re: time to live rows

2011-02-07 Thread Bill Speirs
I don't think this is supported (but I could be completely wrong). However, I'd love to see this functionality as well. How would one go about requesting such a feature? Bill- On Mon, Feb 7, 2011 at 4:15 PM, Kallin Nagelberg wrote: > Hey, > > I have read about the new TTL columns in Cassandra 0

Re: Best way to detect/fix bitrot today?

2011-02-07 Thread Peter Schuller
> Some RAID storage might do it, potentially more efficiently!! People keep claiming that but I have yet to confirm that a hardware raid does actual checksumming as opposed to just healing bad blocks. But yes, they might :) > Food for thought, or wild imagination ? That was my intent. Checksummi

Re: Deleted columns still coming back; CASSANDRA-{1748,1837} alive in 0.6.x?

2011-02-07 Thread Aaron Morton
Was there a time when nodetool repair was not run frequently? There are some steps listed here to reset issues around tombstones coming back to life http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds Why do you run nodetool

Re: time to live rows

2011-02-07 Thread Aaron Morton
Deleting all the columns in a row via TTL has the same effect as deleting the row; the data will be physically removed during compaction. Aaron On 08 Feb, 2011, at 10:24 AM, Bill Speirs wrote: I don't think this is supported (but I could be completely wrong). However, I'd love to see this functionalit

Re: time to live rows

2011-02-07 Thread Kallin Nagelberg
I tried that but I still see the row coming back on a list in the CLI. My concern is that there will be a pointer to an empty row for all eternity. -Kal On Mon, Feb 7, 2011 at 4:38 PM, Aaron Morton wrote: > Deleting all the columns in a row via TTL has the same effect as deleting the > row, the

Re: Deleted columns still coming back; CASSANDRA-{1748,1837} alive in 0.6.x?

2011-02-07 Thread Scott McCarty
Yes, we failed to run nodetool repair for quite a while and I believe it might have been our situation that prompted the addition of that info to the wiki :-) We've tried/are trying two of the suggested steps there, but haven't done the process of removing/reinserting the pseudo-failed nodes (all

Re: time to live rows

2011-02-07 Thread Kallin Nagelberg
I also tried forcing a major compaction on the column family using JMX but the row remains. On Mon, Feb 7, 2011 at 4:43 PM, Kallin Nagelberg wrote: > I tried that but I still see the row coming back on a list > in the CLI. My concern is that there will be a pointer > to an empty row for all eter

unique key generation

2011-02-07 Thread Kallin Nagelberg
Hey, I am developing a session management system using Cassandra and need to generate unique sessionIDs (cassandra columnfamily keys). Does anyone know of an elegant/simple way to accomplish this? I am not sure about using time based uuids on the client as there is a chance that multiple clients coul

Re: [0.7.1] more exceptions: Illegal mode

2011-02-07 Thread Aaron Morton
I noticed this as well in a machine that was left running with the current 0.7 branch code. Created https://issues.apache.org/jira/browse/CASSANDRA-2131 aaron On 08 Feb, 2011, at 04:34 AM, Jake Luciani wrote: This sounds like a possible bug since the BRAF was re-written in 0.7.1. Could you open a tick

Re: unique key generation

2011-02-07 Thread Victor Kabdebon
Hello Kallin. If you use timeUUID the chance of generating the same uuid twice is the following: considering that both clients generate the uuid at the *same millisecond*, the chance of generating the same uuid is: 1/(1.84467441 × 10^19), which is equal to the probability for winning a national lotte
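
For reference, the denominator quoted in this message, 1.84467441 × 10^19, is approximately 2^64. The sketch below only checks that arithmetic with `java.math.BigInteger`; it does not model how time-based UUIDs are actually laid out, and the class name `CollisionOdds` is made up for the illustration.

```java
import java.math.BigInteger;

public class CollisionOdds {

    /** Returns 2^64 exactly, the number of distinct values of 64 random bits. */
    public static BigInteger twoToThe64() {
        return BigInteger.ONE.shiftLeft(64);
    }

    public static void main(String[] args) {
        BigInteger outcomes = twoToThe64();
        // Exact value: 18446744073709551616
        System.out.println("2^64 = " + outcomes);
        // Chance that two independent 64-bit random values coincide.
        System.out.println("1 / 2^64 = " + (1.0 / outcomes.doubleValue()));
    }
}
```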

Re: unique key generation

2011-02-07 Thread Kallin Nagelberg
Maybe I can just use java5's UUID.. Need to research how this is effective across multiple clients.. On Mon, Feb 7, 2011 at 4:57 PM, Kallin Nagelberg wrote: > Hey, > > I am developing a session management system using Cassandra and need > to generate unique sessionIDs (cassandra columnfamily keys
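
A minimal sketch of the `java.util.UUID` option mentioned above. Note that `UUID.randomUUID()` produces a random (version 4) UUID, not a time-based one, so it does not depend on clocks or MAC addresses and can be called independently on each client. The class and method names (`SessionIds`, `newSessionId`) are hypothetical.

```java
import java.util.UUID;

public class SessionIds {

    /** Returns a new random (version 4) UUID as a 36-character string. */
    public static String newSessionId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        // version() reports 4 for UUIDs created by randomUUID().
        System.out.println(id + " (version " + id.version() + ")");
        System.out.println(newSessionId());
    }
}
```

Random UUIDs give no time ordering, so this only fits if session keys do not need to sort by creation time.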

Re: order of index expressions

2011-02-07 Thread Jonathan Ellis
On Sun, Feb 6, 2011 at 11:03 AM, Shaun Cutts wrote: > What I think you should be doing is the following: open iterators on the > matching keys for each of the indexes; the inside loop would pick an iterator > at random, and pull a match from it. This would assure that the expected > number of e

Re: seed node failure crash the whole cluster

2011-02-07 Thread Jonathan Ellis
On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing wrote: > cassandra version: 0.7 > > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT > > cluster: 3 machines (A, B, C) > > details: > it works perfectly when all 3 machines are up and running > > but if the seed machine is down, the problems hap

Re: [0.7.1] Error in ThreadPoolExecutor

2011-02-07 Thread Jonathan Ellis
Can you open a ticket for this? And are you using order-preserving partitioner? On Mon, Feb 7, 2011 at 7:16 AM, Patrik Modesto wrote: > Hi, > > on my two-node test setup I get repeatedly following error: > > The 10.0.18.129 server log: > >  INFO 14:10:37,707 Node /10.0.18.99 has restarted, now U

Re: OOM during batch_mutate

2011-02-07 Thread Jonathan Ellis
Sounds like the keyspace was created on the 32GB machine, so it guessed memtable sizes that are too large when run on the 16GB one. Use "update column family" from the cli to cut the throughput and operations thresholds in half, or to 1/4 to be cautious. On Mon, Feb 7, 2011 at 9:00 AM, Patrik Mode

Re: Java bombs during compaction, please help

2011-02-07 Thread Jonathan Ellis
I've patched ColumnSortedMap on the 0.7 branch to not swallow the IOException it's getting. On Mon, Feb 7, 2011 at 3:02 PM, buddhasystem wrote: > > Hello, > one node in my 3-machine cluster cannot perform compaction. I tried multiple > times, it ran out of heap space once and I increased it. Now

Re: seed node failure crash the whole cluster

2011-02-07 Thread Dan Washusen
Hi, I've added some comments and questions inline. Cheers, Dan On 8 February 2011 10:00, Jonathan Ellis wrote: > On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing wrote: > > cassandra version: 0.7 > > > > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT > > > > cluster: 3 machines (A, B, C)

Re: Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Thanks Jonathan -- does it mean that the machine is experiencing IO problems?

Re: unique key generation

2011-02-07 Thread Patricio Echagüe
If you are a Hector user, TimeUUIDUtils can be used to create Time UUIDs. https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/utils/TimeUUIDUtils.java On Mon, Feb 7, 2011 at 2:11 PM, Kallin Nagelberg wrote: > Maybe I can just use java5's UUID.. Need to rese

Re: order of index expressions

2011-02-07 Thread Shaun Cutts
Jonathan, Thanks for your thoughts > On Sun, Feb 6, 2011 at 11:03 AM, Shaun Cutts wrote: >> What I think you should be doing is the following: open iterators on the >> matching keys for each of the indexes; the inside loop would pick an >> iterator at random, and pull a match from it. This

Re: Deleted columns still coming back; CASSANDRA-{1748,1837} alive in 0.6.x?

2011-02-07 Thread Peter Schuller
> Yes, we failed to run nodetool repair for quite a while and I believe it > might have been our situation that prompted the addition of that info to the > wiki :-) So is it correct then that nowadays, you hope to not be violating the repair frequency requirement but you're still seeing data pop u

unsubscribe

2011-02-07 Thread mike dooley
unsubscribe

Re: Do supercolumns have a purpose?

2011-02-07 Thread Shaun Cutts
I'm a newbie here, but, with apologies for my presumptuousness, I think you should deprecate SuperColumns. They are already distracting you, and as the years go by the cost of supporting them as you add more and more functionality is only likely to get worse. It would be better to concentrate o

Re: Ruby thrift is trying to write Time as string

2011-02-07 Thread Joshua Partogi
Thanks Ryan. That makes more sense now. So I should instead find a way to (de)serialize Ruby objects to strings and vice versa when inserting to Column. Kind regards, Joshua On Tue, Feb 8, 2011 at 4:43 AM, Ryan King wrote: > On Sat, Feb 5, 2011 at 10:12 PM, Joshua Partogi > wrote: > > Hi, > > > >

Re: seed node failure crash the whole cluster

2011-02-07 Thread TSANG Yiu Wing
i will continue the issue here: http://groups.google.com/group/scale7/browse_thread/thread/dd74f1d6265ae2e7 thanks On Tue, Feb 8, 2011 at 7:44 AM, Dan Washusen wrote: > Hi, > I've added some comments and questions inline. > > Cheers, > Dan > On 8 February 2011 10:00, Jonathan Ellis wrote: >>

Re: unique key generation

2011-02-07 Thread Kallin Nagelberg
Pretty sure it also uses mac address, so chances are very slim. I'll check out time uuid too, thanks. On 7 Feb 2011 17:11, "Victor Kabdebon" wrote: Hello Kallin. If you use timeUUID the chance of generating the same uuid twice is the following: considering that both clients generate the uuid at

Cassandra memory consumption

2011-02-07 Thread Victor Kabdebon
Dear all, Sorry to come back again to this point but I am really worried about Cassandra memory consumption. I have a single machine that runs one Cassandra server. There is almost no data on it but I see a crazy memory consumption and it doesn't care at all about the instructions... Note that I a

Best Approaches for Developer Integration

2011-02-07 Thread Paul Querna
Hi, Let's suppose you are using Cassandra happily in production, but you have an army of coders, with varying levels of knowledge about Cassandra. Currently we have hidden most of our developers from the Cassandra dependency by using a Fake interface that returns fake data from it, but this is turnin

Re: Best Approaches for Developer Integration

2011-02-07 Thread Paul Brown
On Feb 7, 2011, at 10:28 PM, Paul Querna wrote: > So, I guess this is coming down to: > 1) Has anyone built any easy to install packages of Cassandra? I didn't find it necessary. I implemented a simple embedding wrapper for Cassandra so that it could be started as part of a web application lif

Re: [0.7.1] Error in ThreadPoolExecutor

2011-02-07 Thread Patrik Modesto
Hi, here is the ticket: https://issues.apache.org/jira/browse/CASSANDRA-2134 I'm using the default partitioner, that should be the RandomPartitioner. HTH, Patrik On Tue, Feb 8, 2011 at 00:03, Jonathan Ellis wrote: > Can you open a ticket for this?  And are you using order-preserving > partit

Re: time to live rows

2011-02-07 Thread Stu Hood
The expired columns were converted into tombstones, which will live for the GC timeout. The "empty" row will be cleaned up when those tombstones are removed. Returning the empty row is unfortunate... we'd love to find a more appropriate solution that might not involve endless scanning. See http:/