RE: 1.1.5 Missing Insert! Strange Problem

2012-09-26 Thread Roshni Rajagopal
By any chance is a TTL (time to live) set on the columns... Date: Tue, 25 Sep 2012 19:56:19 -0700 Subject: 1.1.5 Missing Insert! Strange Problem From: gouda...@gmail.com To: user@cassandra.apache.org Hi All, I have a 4 node cluster setup in 2 zones with NetworkTopology strategy and strategy

Re: 1.1.5 Missing Insert! Strange Problem

2012-09-26 Thread Arya Goudarzi
No. We don't use TTLs. On Tue, Sep 25, 2012 at 11:47 PM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: By any chance is a TTL (time to live) set on the columns... -- Date: Tue, 25 Sep 2012 19:56:19 -0700 Subject: 1.1.5 Missing Insert! Strange Problem

Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Some additional information: I already read about Embedding http://wiki.apache.org/cassandra/Embedding however that doesn't seem a rock solid solution to me. The word volatile is not really comforting me ;-) Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E

Re: Prevent queries from OOM nodes

2012-09-26 Thread Віталій Тимчишин
Actually, an easy way to bring Cassandra down is: select count(*) from A limit 1000. CQL will read everything into a List and only later compute the count. 2012/9/26 aaron morton aa...@thelastpickle.com Can you provide some information on the queries and the size of the data they traversed ? The default

Why periodical repairs?

2012-09-26 Thread Thomas Stets
The Cassandra Operations page (http://wiki.apache.org/cassandra/Operations) says: Unless your application performs no deletes, it is vital that production clusters run nodetool repair periodically on all nodes in the cluster. The hard requirement for repair frequency is the value used for

Re: Integrated cassandra

2012-09-26 Thread Vivek Mishra
I guess you can always open/maintain a socket with the running Cassandra daemon and have control over specific column families/keyspace or the server itself. -Vivek On Wed, Sep 26, 2012 at 12:51 PM, Robin Verlangen ro...@us2.nl wrote: Some additional information: I already read about Embedding

Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Do you have any ideas how to do this, Vivek? Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC

Re: Integrated cassandra

2012-09-26 Thread Vivek Mishra
If I am getting it correctly, then what you need to do is open a connection with the Cassandra daemon thread and access it via the client API. Have a look at: https://github.com/impetus-opensource/Kundera/blob/trunk/kundera-cassandra/src/test/java/com/impetus/client/persistence/CassandraCli.java here,

Re: Nodetool repair and Leveled Compaction

2012-09-26 Thread Omid Aladini
I think this JIRA answers your question: https://issues.apache.org/jira/browse/CASSANDRA-2610 In order not to duplicate work (creation of Merkle trees), repair is done on all replicas for a range. Cheers, Omid On Tue, Sep 25, 2012 at 8:27 AM, Sergey Tryuber stryu...@gmail.com wrote: Hi

Re: is this a cassandra bug?

2012-09-26 Thread Sylvain Lebresne
You're mistaking 'key validation class' and 'comparator'. It is your key validation class that is DecimalType. Your comparator is UTF8Type, and yes, switching the comparator from UTF8Type to DecimalType is not allowed. -- Sylvain On Tue, Sep 25, 2012 at 10:13 PM, Hiller, Dean

Re: any ways to have compaction use less disk space?

2012-09-26 Thread Sylvain Lebresne
On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli rc...@palominodb.com wrote: 150,000 sstables seem highly unlikely to be performant. As a simple example of why, on the read path the bloom filter for every sstable must be consulted... Unfortunately that's a bad example since that's not true. Leveled

any ideas on what these mean

2012-09-26 Thread Hiller, Dean
We were consistently getting this exception over and over as we put data into the system. A reboot caused it to go away but we don't want to be rebooting in the future…. 1. When does this occur? 2. Is it affecting my data put? (I have seen other weird validation exceptions where my data

Re: is this a cassandra bug?

2012-09-26 Thread Hiller, Dean
bump On 9/25/12 2:40 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Hmmm, is rowkey validation asynchronous to the actual sending of the data to cassandra? I seem to be able to put an invalid type and GET that invalid data back just fine even though my key type was an int and the key comparator

Re: Why periodical repairs?

2012-09-26 Thread Tyler Hobbs
The DistributedDeletes link in that section explains the root reason for needing to do this. It's not that deletes are forgotten, it's that a write (deletes are basically tombstone writes) didn't get replicated to all replicas. For example, at RF=3, write consistency level QUORUM, if one of the
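The QUORUM arithmetic behind this explanation can be sketched in a few lines of Python. This is only an illustration, not Cassandra code: the function names and the `unreachable` parameter are made up for the example, and hinted handoff and read repair are deliberately ignored. It shows that at RF=3 a delete (a tombstone write) can succeed at QUORUM while one replica never receives it, which is the gap that periodic repair is meant to close.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge a QUORUM operation."""
    return replication_factor // 2 + 1


def replicas_allowed_to_miss(replication_factor: int) -> int:
    """Replicas that may miss a write that still succeeds at QUORUM."""
    return replication_factor - quorum(replication_factor)


def simulate_delete(replication_factor: int, unreachable: set) -> dict:
    """Apply a tombstone write to every reachable replica.

    Returns a mapping replica index -> True if it holds the tombstone.
    Raises RuntimeError if fewer than a quorum of replicas are reachable.
    """
    state = {r: r not in unreachable for r in range(replication_factor)}
    if sum(state.values()) < quorum(replication_factor):
        raise RuntimeError("QUORUM not reached")
    return state


# RF=3, replica 2 is down while the delete is written at QUORUM.
state = simulate_delete(3, unreachable={2})
stale = [replica for replica, has_tombstone in state.items() if not has_tombstone]
print(quorum(3), replicas_allowed_to_miss(3), stale)
```

In this model replica 2 still holds the old data and no tombstone. If nothing propagates the tombstone to it before the other replicas are allowed to discard theirs, the deleted data can reappear.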

Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody! I have a 3-node cluster with a replication factor of 3. Each node has an 800G disk and used to have 100G of data. What is strange is that every time I run repair, the data takes almost 3 times more space - 270G, then I run compaction and get 100G back. Unfortunately, yesterday I forgot to compact and

Re: any ways to have compaction use less disk space?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 6:05 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli rc...@palominodb.com wrote: 150,000 sstables seem highly unlikely to be performant. As a simple example of why, on the read path the bloom filter for every sstable must be

Re: Integrated cassandra

2012-09-26 Thread Aaron Turner
Cassandra is a distributed database meant to run across multiple systems. Is your existing Java application distributed as well? Does maintain control mean exclude end users from connecting to it and making changes or merely provisioning and keep it running well operationally for the application?

Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Thank you both for your reply. We're not 100% sure yet about what to use. The application itself is just as distributed as Cassandra is. It also embeds ElasticSearch. At this point I only see the ring as a real pain in the ass, as I have to automatically move nodes around to prevent unbalanced

Re: Why data tripled in size after repair?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh ailin...@gmail.com wrote: [ repair ballooned my data size ] 1. Why repair almost triples data size? You didn't mention what version of cassandra you're running. In some old versions of cassandra (prior to 1.0), repair often creates even more

Truncate causing subsequent timeout on KeyIterator?

2012-09-26 Thread Conan Cook
Hi, I'm running a bunch of integration tests using an embedded cassandra instance via the Cassandra Maven Plugin v1.0.0-1, using Hector v1.0-5. I've got an issue where one of the tests is using a StringKeyIterator to iterate over all the keys in a CF, but it gets TimedOutExceptions every time

Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller
What is strange every time I run repair data takes almost 3 times more - 270G, then I run compaction and get 100G back. https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the main issues with repair. In short - in your case the limited granularity of merkle trees is causing too much
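The effect of limited Merkle tree granularity can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not how Cassandra implements repair: the integer token space, the `leaves` parameter, and the function names are all hypothetical. Each replica hashes its rows into a fixed number of leaf buckets, and when a bucket's hash differs, every row in that bucket is streamed, even if only one row actually differs.

```python
import hashlib


def leaf_index(token: int, token_space: int, leaves: int) -> int:
    """Map a token to the leaf bucket that covers its range."""
    return token * leaves // token_space


def build_leaf_hashes(rows: dict, token_space: int, leaves: int) -> list:
    """rows maps token -> value. Returns one digest per leaf bucket."""
    buckets = [hashlib.md5() for _ in range(leaves)]
    for token in sorted(rows):
        index = leaf_index(token, token_space, leaves)
        buckets[index].update(f"{token}:{rows[token]}".encode())
    return [bucket.hexdigest() for bucket in buckets]


def rows_to_stream(local: dict, remote: dict, token_space: int, leaves: int) -> list:
    """Tokens of all local rows that fall in a bucket whose hashes differ."""
    local_hashes = build_leaf_hashes(local, token_space, leaves)
    remote_hashes = build_leaf_hashes(remote, token_space, leaves)
    mismatched = {i for i in range(leaves) if local_hashes[i] != remote_hashes[i]}
    return [t for t in sorted(local) if leaf_index(t, token_space, leaves) in mismatched]


# 1000 identical rows on both replicas, except one row that differs.
local = {token: "v" for token in range(1000)}
remote = dict(local)
remote[10] = "stale"

coarse = rows_to_stream(local, remote, token_space=1000, leaves=4)
fine = rows_to_stream(local, remote, token_space=1000, leaves=100)
print(len(coarse), len(fine))
```

With 4 leaves, one differing row causes 250 rows to be streamed; with 100 leaves, only 10. The model only shows why coarse leaves lead to over-streaming; it does not attempt to reproduce the specific sizes reported in this thread.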

pig and widerows

2012-09-26 Thread William Oberman
Hi, I'm trying to figure out what's going on with my cassandra/hadoop/pig system. I created a mini copy of my main cassandra data by randomly subsampling to get ~50,000 keys. I was then writing pig scripts but also the equivalent operation using simple single threaded code to double check pig.

Re:

2012-09-26 Thread aaron morton
That looks right to me. btw, most people use CLI or CQL scripts to manage the schema Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 25/09/2012, at 7:59 PM, Manu Zhang owenzhang1...@gmail.com wrote: Is there an example to update

Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli rc...@palominodb.com wrote: On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh ailin...@gmail.com wrote: [ repair ballooned my data size ] 1. Why repair almost triples data size? You didn't mention what version of cassandra you're running. In some old

Re: a node stays in joining

2012-09-26 Thread aaron morton
But the Load keeps on increasing. Sounds like the nodes are / were sending it data. nodetool netstats will show you what's going on. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 25/09/2012, at 10:22 PM, Satoshi Yamada

Re: Running repair negatively impacts read performance?

2012-09-26 Thread aaron morton
Sounds very odd. Is read performance degrading _after_ repair and compactions that normally result have completed ? What Compaction Strategy ? What OS and JVM ? What are the bloom filter false positive stats from cfstats ? Do you have some read latency numbers from cfstats ? Also,

Re:

2012-09-26 Thread aaron morton
Set the caching strategy for the CF to be ROWS_ONLY. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 26/09/2012, at 2:18 PM, Manu Zhang owenzhang1...@gmail.com wrote: The DEFAULT_CACHING_STRATEGY is Caching.KEYS_ONLY but even configuring

1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CFs. The tools are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CFs as you want, correct? Even though I am going to have tons of fun with

Data Modeling: Comments with Voting

2012-09-26 Thread Drew Kutcharian
Hi Guys, Wondering what would be the best way to model a flat (no sub comments, i.e. twitter) comments list with support for voting (where I can sort by create time or votes) in Cassandra? To demonstrate: Sorted by create time: - comment 1 (5 votes) - comment 2 (1 votes) - comment 3 (no

Re: downgrade from 1.1.4 to 1.0.X

2012-09-26 Thread Radim Kolar
We have a paid tool capable of downgrading Cassandra 1.2, 1.1, 1.0, and 0.8.

Re: Data Modeling: Comments with Voting

2012-09-26 Thread Kirk True
Depending on your needs, you could simply duplicate the comments in two separate CFs with the column names including time in one and the vote in the other. If you allow for updates to the comments, that would pose some issues you'd need to solve at the app level. On 9/26/12 4:28 PM, Drew
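The two-CF suggestion can be modeled in plain Python to make the trade-off visible. This is a rough in-memory sketch, not client code for Cassandra: the `CommentStore` class and its method names are hypothetical, and the two dicts stand in for two column families whose column names sort by (time, id) and by (votes, id).

```python
class CommentStore:
    """In-memory stand-in for two column families holding the same comments."""

    def __init__(self):
        self.by_time = {}   # column name (created_at, comment_id) -> text
        self.by_votes = {}  # column name (votes, comment_id) -> text
        self.meta = {}      # comment_id -> (created_at, votes)

    def add(self, comment_id: str, created_at: int, text: str) -> None:
        """Write the comment to both 'column families'."""
        self.by_time[(created_at, comment_id)] = text
        self.by_votes[(0, comment_id)] = text
        self.meta[comment_id] = (created_at, 0)

    def vote(self, comment_id: str) -> None:
        """A vote changes the column name, so the old column must be
        deleted and a new one inserted. This read-modify-write is the
        kind of issue that has to be handled at the application level."""
        created_at, votes = self.meta[comment_id]
        text = self.by_votes.pop((votes, comment_id))
        self.by_votes[(votes + 1, comment_id)] = text
        self.meta[comment_id] = (created_at, votes + 1)

    def sorted_by_time(self) -> list:
        return [comment_id for _, comment_id in sorted(self.by_time)]

    def sorted_by_votes(self) -> list:
        return [comment_id for _, comment_id in sorted(self.by_votes, reverse=True)]


store = CommentStore()
store.add("c1", created_at=1, text="first")
store.add("c2", created_at=2, text="second")
store.add("c3", created_at=3, text="third")
store.vote("c3")
store.vote("c3")
store.vote("c1")
print(store.sorted_by_time(), store.sorted_by_votes())
```

Reads in either order are a simple ordered scan, at the cost of writing every comment twice and rewriting a column on every vote.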

Re:

2012-09-26 Thread Manu Zhang
I mean I have modifications only on one column; do I have to add the rest of the columns as well? On Thu, Sep 27, 2012 at 5:18 AM, aaron morton aa...@thelastpickle.com wrote: That looks right to me. btw, most people use CLI or CQL scripts to manage the schema Cheers - Aaron

is node tool row count always way off?

2012-09-26 Thread Hiller, Dean
In nodetool cfstats, how far off is the row count estimate usually (what percentage, or what absolute number)? We have a CF with 4 rows that prints this out…. Column Family: bacnet11700AnalogInput8 SSTable count: 3 Space used (live): 13526

Re:

2012-09-26 Thread Manu Zhang
I still don't see it in jconsole. BTW, how long would you expect it to take to read a column family of 15 rows if it fits into row cache entirely? It takes me around 7s now. My experiment is done on a single node. On Thu, Sep 27, 2012 at 6:00 AM, aaron morton aa...@thelastpickle.com wrote: Set