Re: RF=1 w/ hadoop jobs

2011-09-02 Thread Patrik Modesto
Hi,

On Thu, Sep 1, 2011 at 12:36, Mck m...@apache.org wrote:
 It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8)

 I'm interested in this patch and see its usefulness but no one will act
 until you attach it to an issue. (I think a new issue is appropriate
 here).

I'm glad someone finds my patch useful. As Jonathan already explained
himself: "ignoring unavailable ranges is a misfeature, imo", I'm
thinking that opening a new ticket without support from more users is
useless ATM. Please test the patch and if you like it, then there is
time for a ticket.

Regards,
P.


Re: RF=1 w/ hadoop jobs

2011-09-02 Thread Mick Semb Wever
On Fri, 2011-09-02 at 08:20 +0200, Patrik Modesto wrote:
 As Jonathan
 already explained himself: ignoring unavailable ranges is a
 misfeature, imo 

Generally it's not what one would want, I think.
But I can see the case where data is to be treated as volatile and ignoring
unavailable ranges may be acceptable. 

For example, if you're searching for something or some pattern and one hit
is enough. If you get the hit it's a positive result regardless of whether
ranges were ignored; if you don't, and you *know* there was a range
ignored along the way, you can re-run the job later. The worst-case
scenario here is no worse than the job always failing on you. Although
some indication of ranges ignored is required.

Another example is when you're just trying to extract a small random
sample (like a pig SAMPLE) of data out of cassandra.

Patrik: is it possible to describe the use-case you have here?

~mck

-- 
“The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore, all
progress depends on the unreasonable man.” - George Bernard Shaw 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |





Re: Replicate On Write behavior

2011-09-02 Thread Sylvain Lebresne
On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne dha...@gmx.3crowd.com wrote:
 I'm curious... digging through the source, it looks like replicate on write 
 triggers a read of the entire row, and not just the columns/supercolumns that 
 are affected by the counter update.  Is this the case?  It would certainly 
 explain why my inserts/sec decay over time and why the average insert latency 
 increases over time.  The strange thing is that I'm not seeing disk read IO 
 increase over that same period, but that might be due to the OS buffer 
 cache...

It does not. It only reads the columns/supercolumns affected by the
counter update.
In the source, this happens in CounterMutation.java. If you look at
addReadCommandFromColumnFamily you'll see that it does a query by name
only for the column involved in the update (the update is basically
the content of the columnFamily parameter there).

And Cassandra does *not* always read a full row. It never has, and never will.
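To make the distinction concrete, here is a rough client-side sketch using the
Thrift API (the CF/column names are made up; this only illustrates the two read
shapes, not the internal code path):

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

// By-name predicate: only the named columns are read. The replicate-on-write
// read is of this shape, limited to the columns touched by the counter update.
SlicePredicate byName = new SlicePredicate();
byName.setColumn_names(Arrays.asList(ByteBuffer.wrap("hits".getBytes())));

// Unbounded slice: this is what reading "the entire row" would look like,
// and it is NOT what counter replication does.
SlicePredicate wholeRow = new SlicePredicate();
wholeRow.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
                                       ByteBuffer.wrap(new byte[0]),
                                       false, Integer.MAX_VALUE));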

 On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that normal? 
  I'm using RandomPartitioner...

 Address         DC          Rack        Status State   Load            Owns   
  Token
                                                                            
 136112946768375385385349842972707284580
 10.0.0.57    datacenter1 rack1       Up     Normal  2.26 GB         20.00%  0
 10.0.0.56    datacenter1 rack1       Up     Normal  2.47 GB         20.00%  
 34028236692093846346337460743176821145
 10.0.0.55    datacenter1 rack1       Up     Normal  2.52 GB         20.00%  
 68056473384187692692674921486353642290
 10.0.0.54    datacenter1 rack1       Up     Normal  950.97 MB       20.00%  
 102084710076281539039012382229530463435
 10.0.0.72    datacenter1 rack1       Up     Normal  383.25 MB       20.00%  
 136112946768375385385349842972707284580

 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and 
 last node both have a count of 0.  This is a clean cluster, and I've been 
 doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours.  
 The last time this test ran, it went all the way down to 500 inserts/sec 
 before I killed it.

Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.

--
Sylvain


Re: RF=1 w/ hadoop jobs

2011-09-02 Thread Patrik Modesto
On Fri, Sep 2, 2011 at 08:54, Mick Semb Wever m...@apache.org wrote:
 Patrik: is it possible to describe the use-case you have here?

Sure.

We use Cassandra as storage for web pages: we store the HTML, all
URLs that share the same HTML data, and some computed data. We run Hadoop
MR jobs to compute lexical and thematic data for each page and to
export the data to binary files for later use. A URL gets into
Cassandra on user request (a pageview), so if we delete a URL, it comes
back quickly if the page is active. Because of that, and because there
is lots of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and contain only
fresh data, so we don't care about losing a node. But Hadoop does
care; to be specific, the Cassandra ColumnFamilyInputFormat and
ColumnFamilyRecordReader are the problem parts. If I stop one Cassandra node,
all MR jobs that read/write Cassandra fail. In our case it doesn't
matter, we can skip that range of URLs. The MR jobs run in a tight
loop, so when the node is back with its data, we use it. It's not
only about some HW crash; it also makes maintenance quite difficult. To
stop a Cassandra node, you have to stop the tasktracker there too, which is
unfortunate as there are other MR jobs that don't need Cassandra and
could happily keep running.

Regards,
P.


Re: Removal of old data files

2011-09-02 Thread Sylvain Lebresne
On Fri, Sep 2, 2011 at 12:11 AM,  hiroyuki.watan...@barclayscapital.com wrote:
 Yes, I see files with name like
     Orders-g-6517-Compacted

 However, all of those files have a size of 0.

 From Monday to Thursday we have 5642 files for -Data.db,
 -Filter.db and -Statistics.db, and only 128 -Compacted files,
 and all of the -Compacted files have a size of 0.

 Is this normal, or are we doing something wrong?

You are not doing something wrong. The -Compacted files are just
markers, indicating that the corresponding -Data files (the ones with
the same number) have, in fact, been compacted and will eventually be
removed. So those files will always have a size of 0.

--
Sylvain



 yuki

 
 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Thursday, August 25, 2011 6:13 PM
 To: user@cassandra.apache.org
 Subject: Re: Removal of old data files

 If cassandra does not have enough disk space to create a new file it will
 provoke a JVM GC, which should result in compacted SSTables that are no
 longer needed being deleted. Otherwise they are deleted at some time in the
 future.
 Compacted SSTables have a file written out with a compacted extension.
 Do you see compacted sstables in the data directory?
 Cheers.
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 26/08/2011, at 2:29 AM, yuki watanabe wrote:

 We are using Cassandra 0.8.0 with an 8 node ring and only one CF.
 Every column has a TTL of 86400 (24 hours). We also set 'GC grace second' to
 43200
 (12 hours).  We have to store a massive amount of data for one day now and
 eventually for five days if we get more disk space.
 Even for one day, we run out of disk space on a busy day.

 We run the nodetool compact command at night or as necessary, then we run GC from
 jconsole. We observed that GC did remove files, but not necessarily the oldest
 ones.
 Data files from more than 36 hours ago, and quite often three days ago, are
 still there.

 Is this behavior expected, or do we need to adjust some other parameters?


 Yuki Watanabe

 ___





Re: Replicate On Write behavior

2011-09-02 Thread David Hawthorne
That's interesting.  I did an experiment wherein I added some entropy to the 
row name based on the time when the increment came in (e.g. row = row + "/" + 
(timestamp - (timestamp % 300))), and now not only is the load (in GB) on my 
cluster more balanced, the performance has not decayed and has stayed steady 
(inserts/sec) with a relatively low average ms/insert.  Each row is now 
significantly shorter as a result of this change.
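In code, the bucketing amounts to something like this (a rough sketch with
made-up names; the 300 just floors the timestamp to five-minute buckets):

// Hypothetical sketch: append a time bucket to the base row key so counter
// increments spread over many short rows instead of one ever-growing row.
long nowSeconds = System.currentTimeMillis() / 1000L;
long bucket = nowSeconds - (nowSeconds % 300);        // floor to a 5-minute boundary
String bucketedRowKey = baseRowKey + "/" + bucket;    // e.g. "pageviews/1314950400"
// ... increment the counter under bucketedRowKey instead of baseRowKey ...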



On Sep 2, 2011, at 12:30 AM, Sylvain Lebresne wrote:

 On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne dha...@gmx.3crowd.com wrote:
 I'm curious... digging through the source, it looks like replicate on write 
 triggers a read of the entire row, and not just the columns/supercolumns 
 that are affected by the counter update.  Is this the case?  It would 
 certainly explain why my inserts/sec decay over time and why the average 
 insert latency increases over time.  The strange thing is that I'm not 
 seeing disk read IO increase over that same period, but that might be due to 
 the OS buffer cache...
 
 It does not. It only reads the columns/supercolumns affected by the
 counter update.
 In the source, this happens in CounterMutation.java. If you look at
 addReadCommandFromColumnFamily you'll see that it does a query by name
 only for the column involved in the update (the update is basically
 the content of the columnFamily parameter there).
 
 And Cassandra does *not* always reads a full row. Never had, never will.
 
 On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that 
 normal?  I'm using RandomPartitioner...
 
 Address DC  RackStatus State   LoadOwns  
   Token

 136112946768375385385349842972707284580
 10.0.0.57datacenter1 rack1   Up Normal  2.26 GB 20.00%  0
 10.0.0.56datacenter1 rack1   Up Normal  2.47 GB 20.00%  
 34028236692093846346337460743176821145
 10.0.0.55datacenter1 rack1   Up Normal  2.52 GB 20.00%  
 68056473384187692692674921486353642290
 10.0.0.54datacenter1 rack1   Up Normal  950.97 MB   20.00%  
 102084710076281539039012382229530463435
 10.0.0.72datacenter1 rack1   Up Normal  383.25 MB   20.00%  
 136112946768375385385349842972707284580
 
 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node 
 and last node both have a count of 0.  This is a clean cluster, and I've 
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 
 hours.  The last time this test ran, it went all the way down to 500 
 inserts/sec before I killed it.
 
 Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
 
 --
 Sylvain



SSTableSimpleUnsortedWriter take long time when inserting big rows

2011-09-02 Thread Benoit Perroud
Hi All,

I started using SSTableSimpleUnsortedWriter to load data, and my data
has a few rows but a lot of column names in each row.

I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.

But the time taken to insert columns increases as the column
family grows. The problem appears because every time we call
newRow, all the columns of the previous CF are added to the new CF.
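For reference, my usage pattern is roughly the following (a trimmed,
illustrative sketch; keyspace/CF names and sizes are made up, and it assumes
the 0.8-era constructor):

import java.io.File;
import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.BytesType;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import org.apache.cassandra.utils.ByteBufferUtil;

// One wide row, written in many newRow() "chunks" of 10'000 columns each.
SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
        new File("/tmp/sstables"), "MyKeyspace", "MyCF",
        BytesType.instance, null, 64);                // 64 MB buffer before flushing

ByteBuffer rowKey = ByteBufferUtil.bytes("wide-row");
for (int i = 0; i < 1000000; i++)
{
    if (i % 10000 == 0)
        writer.newRow(rowKey);                        // same key, every 10'000 columns
    writer.addColumn(ByteBufferUtil.bytes(i), ByteBufferUtil.bytes("v"), System.currentTimeMillis());
}
writer.close();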

Attached is a small patch that checks which CF is the smallest, and adds
the smallest CF to the biggest one.

Should I open a bug for that?

Thanks in advance,

Benoit
Index: src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
===================================================================
--- src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java	(revision 1164377)
+++ src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java	(working copy)
@@ -73,9 +73,17 @@
 
         // Note that if the row was existing already, our size estimation will be slightly off
         // since we'll be counting the key multiple times.
-        if (previous != null)
-            columnFamily.addAll(previous);
-
+        if (previous != null) {
+            // Add the smallest CF to the other one
+            if (columnFamily.getSortedColumns().size() < previous.getSortedColumns().size()) {
+                previous.addAll(columnFamily);
+                // Re-add the previous CF to the map because it has been overwritten
+                keys.put(key, previous);
+            } else {
+                columnFamily.addAll(previous);
+            }
+        }
+
        if (currentSize > bufferSize)
            sync();
    }


Re: SSTableSimpleUnsortedWriter take long time when inserting big rows

2011-09-02 Thread Sylvain Lebresne
On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud ben...@noisette.ch wrote:
 Hi All,

 I started using SSTableSimpleUnsortedWriter to load data, and my data
 has a few rows but a lot of column name in each rows.

 I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.

 But the time taken to insert columns is increasing as the column
 family is increasing. The problem appears because everytime we call
 newRow, all the columns of the previous CF is added to the new CF.

If I understand correctly, each row has way more than 10 000 columns, but
you call newRow every 10 000 columns, right?

Note that you have the possibility to decrease the frequency of the calls to
newRow.

But anyway, I agree that the code shouldn't suck like that.

 Attached is a small patch that check which is the smallest CF, and add
 the smallest CF to the biggest one.

 Should I open I bug for that ?

Please do. I'm actually thinking of a slightly different fix: we should not have
to add all the previous columns to the new column family, we should just
directly reuse the previous column family when adding the new column.
But the JIRA ticket will be a better place to discuss this.

--
Sylvain


Re: SSTableSimpleUnsortedWriter take long time when inserting big rows

2011-09-02 Thread Benoit Perroud
Thanks for your answer.

2011/9/2 Sylvain Lebresne sylv...@datastax.com:
 On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud ben...@noisette.ch wrote:
 Hi All,

 I started using SSTableSimpleUnsortedWriter to load data, and my data
 has a few rows but a lot of column name in each rows.

 I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.

 But the time taken to insert columns is increasing as the column
 family is increasing. The problem appears because everytime we call
 newRow, all the columns of the previous CF is added to the new CF.

 If I understand correctly, each row has way more that 10 000 columns, but
 you call newRow every 10 000 columns, right ?

Yes. I call newRow every 10 000 columns to be sure to flush as soon as possible.

 Note that you have the possibility to decrease the frequency of the calls to
 newRow.

 But anyway, I agree that the code shouldn't suck like that.

 Attached is a small patch that check which is the smallest CF, and add
 the smallest CF to the biggest one.

 Should I open I bug for that ?

 Please do. I'm actually thinking of a slightly different fix: we should not 
 have
 to add all the previous columns to the new column family, we should just
 directly reuse the previous column family when adding the new column.
 But the JIRA ticket will be a better place to discuss this.

Opened : https://issues.apache.org/jira/browse/CASSANDRA-3122
Let's discuss there.

Thanks !

Benoit.

 --
 Sylvain



Re: cassandra-cli describe / dump command

2011-09-02 Thread J T
That's brilliant, thanks.

On Thu, Sep 1, 2011 at 7:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 yes, cli show schema in 0.8.4+

 On Thu, Sep 1, 2011 at 12:52 PM, J T jt4websi...@googlemail.com wrote:
  Hi,
 
  I'm probably being blind .. but I can't see any way to dump the schema
  definition (and the data in it for that matter)  of a cluster in order to
  capture the current schema in a script file for subsequent replaying in
 to a
  different environment.
 
  For example, say I have a DEV env and wanted to create a script
 containing
  the cli commands to create that schema in a UAT env.
 
  In my case, I have a cassandra schema I've been tweaking / upgrading over
  the last 2 years and I can't see any easy way to capture the schema
  definition.
 
  Is such a thing on the cards for cassandra-cli ?
 
  JT
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Cassandra, CQL, Thrift Deprecation?? and Erlang

2011-09-02 Thread Jonathan Ellis
The Thrift API is not going anywhere any time soon.

I'm not aware of anyone working on an erlang CQL client.

On Fri, Sep 2, 2011 at 7:39 AM, J T jt4websi...@googlemail.com wrote:
 Hi,

 I'm a fan of erlang, and have been using successive cassandra versions via
 the erlang thrift interface for a couple of years now.

  I see that cassandra seems to be moving to using CQL instead, and so I was
  wondering if that means the thrift api will be deprecated and, if so, whether
  there is any effort underway by anyone to create whatever would be necessary
  to use cassandra via cql from erlang?

 JT




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: Removal of old data files

2011-09-02 Thread hiroyuki.watanabe
 
I see. Thank you for the helpful information.

Yuki



-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: Friday, September 02, 2011 3:40 AM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

On Fri, Sep 2, 2011 at 12:11 AM,  hiroyuki.watan...@barclayscapital.com wrote:
 Yes, I see files with name like
     Orders-g-6517-Compacted

 However, all of those file have a size of 0.

 Starting from Monday to Thurseday we have 5642 files for -Data.db, 
 -Filter.db and Statistics.db and only 128 -Compacted files.
 and all of -Compacted file has size of 0.

 Is this normal, or we are doing something wrong?

You are not doing something wrong. The -Compacted files are just marker, to 
indicate that the -Data file corresponding (with the same number) are, in fact, 
compacted and will eventually be removed. So those files will always have a 
size of 0.

--
Sylvain



 yuki

 
 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Thursday, August 25, 2011 6:13 PM
 To: user@cassandra.apache.org
 Subject: Re: Removal of old data files

 If cassandra does not have enough disk space to create a new file it 
 will provoke a JVM GC which should result in compacted SStables that 
 are no longer needed been deleted. Otherwise they are deleted at some 
 time in the future.
 Compacted SSTables have a file written out with a compacted extension.
 Do you see compacted sstables in the data directory?
 Cheers.
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 26/08/2011, at 2:29 AM, yuki watanabe wrote:

 We are using Cassandra 0.8.0 with 8 node ring and only one CF.
 Every column has TTL of 86400 (24 hours). we also set 'GC grace 
 second' to 43200
 (12 hours).  We have to store massive amount of data for one day now 
 and eventually for five days if we get more disk space.
 Even for one day, we do run out disk space in a busy day.

 We run nodetool compact command at night or as necessary then we run 
 GC from jconsole. We observed that  GC did remove files but not 
 necessarily oldest ones.
 Data files from more than 36 hours ago and quite often three days ago 
 are still there.

 Does this behavior expected or we need adjust some other parameters?


 Yuki Watanabe

 ___





looking for information on composite columns

2011-09-02 Thread Yiming Sun
Hi,

I am looking for information/tutorials on the use of composite columns,
including how to use it, what kind of indexing it can offer, and its
advantage over super columns.  I googled but came up with very little
information.  There is a blog article from high performance cassandra on the
compositeType comparator, but the use case is a composite column name rather
than a column value.  Does anyone know of some good resources on this and is
willing to share with me?  Thanks.

-- Y.


Re: Cassandra, CQL, Thrift Deprecation?? and Erlang

2011-09-02 Thread J T
OK, that's good to know.

If push came to shove I could probably write such a client myself after
doing the necessary research but I'd prefer to save myself the hassle.

Thanks.

On Fri, Sep 2, 2011 at 1:59 PM, Jonathan Ellis jbel...@gmail.com wrote:

 The Thrift API is not going anywhere any time soon.

 I'm not aware of anyone working on an erlang CQL client.

 On Fri, Sep 2, 2011 at 7:39 AM, J T jt4websi...@googlemail.com wrote:
  Hi,
 
  I'm a fan of erlang, and have been using successive cassandra versions
 via
  the erlang thrift interface for a couple of years now.
 
  I see that cassandra seems to be moving to using CQL instead and so I was
  wondering if that means the thrift api will be deprecated and if so is
 there
  any effort underway to by anyone to create (whatever would be neccessary)
 to
  use cassandra via cql from erlang ?
 
  JT
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



removing all column metadata via CLI

2011-09-02 Thread Radim Kolar
I can't find a way to remove all column definitions without CF 
import/export.


[default@int4] update column family sipdb with column_metadata = [];
Syntax error at position 51: required (...)+ loop did not match anything 
at input ']'


[default@int4] update column family sipdb with column_metadata = [{}];
Command not found: `update column family sipdb with column_metadata = 
[{}];`. Type 'help;' or '?' for help.

[default@int4]



Re: 15 seconds to increment 17k keys?

2011-09-02 Thread Richard Low
On Thu, Sep 1, 2011 at 5:16 PM, Ian Danforth idanfo...@numenta.com wrote:

 Does this scale with multiples of the replication factor or directly
 with number of nodes? Or more succinctly, to double the writes per
 second into the cluster how many more nodes would I need?

The write throughput scales with number of nodes, so double to get
double the write capacity.

Increasing the replication factor in general doesn't improve
performance (and increasing without increasing number of nodes
decreases performance).  This is because operations are performed on
all available replicas (with the exception of reads with low
consistency levels and read_repair_chance < 1.0).

Note also that there is just one read per counter increment, not a
read per replica.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu


Re: removing all column metadata via CLI

2011-09-02 Thread Jonathan Ellis
Is this 0.8.4?

2011/9/2 Radim Kolar h...@sendmail.cz:
 I cant find way how to remove all columns definitions without CF
 import/export.

 [default@int4] update column family sipdb with column_metadata = [];
 Syntax error at position 51: required (...)+ loop did not match anything at
 input ']'

 [default@int4] update column family sipdb with column_metadata = [{}];
 Command not found: `update column family sipdb with column_metadata =
 [{}];`. Type 'help;' or '?' for help.
 [default@int4]





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: looking for information on composite columns

2011-09-02 Thread Edward Capriolo
On Fri, Sep 2, 2011 at 9:15 AM, Yiming Sun yiming@gmail.com wrote:

 Hi,

 I am looking for information/tutorials on the use of composite columns,
 including how to use it, what kind of indexing it can offer, and its
 advantage over super columns.  I googled but came up with very little
 information.  There is a blog article from high performance cassandra on the
 compositeType comparator, but the use case is a composite column name rather
 than a column value.  Does anyone know of some good resources on this and is
 willing to share with me?  Thanks.

 -- Y.


I am going to do some more composite recipes on my blog; I noticed from my
search referrers that it is a very hot topic.

www.anuff.com/2011/02/indexing-in-cassandra.html
www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
www.datastax.com/2011/06/ed-anuff-to-speak-at-cassandra-sf-2011

Composite columns do not do indexing by themselves, but the way they allow
multiple components to live in one column name while still sorting properly is
how they relate to indexing.
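For instance (a minimal Hector-based sketch; it assumes a column family whose
comparator is CompositeType(UTF8Type, LongType), an already-built Keyspace
object named keyspace, and made-up row/CF names):

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// A hand-rolled "index" row: column names are (category, timestamp) composites,
// so all entries for one category are contiguous and ordered by timestamp.
Composite name = new Composite();
name.addComponent("login-event", StringSerializer.get());
name.addComponent(System.currentTimeMillis(), LongSerializer.get());

Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
mutator.insert("events-index", "EventIndex",
        HFactory.createColumn(name, "some-row-key", new CompositeSerializer(), StringSerializer.get()));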

Edward


Re: looking for information on composite columns

2011-09-02 Thread Yiming Sun
Thanks Edward. What's the link to your blog?


On Fri, Sep 2, 2011 at 10:43 AM, Edward Capriolo edlinuxg...@gmail.comwrote:


 On Fri, Sep 2, 2011 at 9:15 AM, Yiming Sun yiming@gmail.com wrote:

 Hi,

 I am looking for information/tutorials on the use of composite columns,
 including how to use it, what kind of indexing it can offer, and its
 advantage over super columns.  I googled but came up with very little
 information.  There is a blog article from high performance cassandra on the
 compositeType comparator, but the use case is a composite column name rather
 than a column value.  Does anyone know of some good resources on this and is
 willing to share with me?  Thanks.

 -- Y.


 I am going to do some more composite recipes in my blog, I noticed from my
 search refers that it is a very hot topic.

 www.anuff.com/2011/02/indexing-in-cassandra.html
 www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
 www.datastax.com/2011/06/ed-anuff-to-speak-at-cassandra-sf-2011

 Composite columns do not do indexing in themselves but the way they allow
 multiple components to live in one column but still sort properly is how
 they relate to indexing.

 Edward



Re: removing all column metadata via CLI

2011-09-02 Thread Radim Kolar

 Is this 0.8.4?
yes



Cassandra prod environment

2011-09-02 Thread Sorin Julean
Hey,

 Currently I'm running Cassandra on Ubuntu 10.4 x86_64 in EC2.

 I'm wondering if anyone has observed better performance / stability on other
distros (CentOS / RHEL / ...) or OSes (e.g. Solaris Intel/SPARC)?
 Is anyone running prod on VMs, not cloud, but ESXi or Solaris zones ? Is
there love or hate :) ?  Any storage best-practices on VM environments ?
 I like xfs ! Any observations on xfs / ext4 / zfs, from Cassandra usage
perspective ?

Cheers,
Sorin


Re: removing all column metadata via CLI

2011-09-02 Thread Jonathan Ellis
Then you'll want to create an issue:
https://issues.apache.org/jira/browse/CASSANDRA

On Fri, Sep 2, 2011 at 10:08 AM, Radim Kolar h...@sendmail.cz wrote:
 Is this 0.8.4?
 yes





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra prod environment

2011-09-02 Thread Jeremy Hanna
We moved off of ubuntu because of kernel issues in the AMIs we found in 10.04 
and 10.10 in ec2.  So we're now on debian squeeze with ext4.  It's been great 
for us.

One thing that bit us is we'd been using property file snitch and the 
availability zones as racks and had an equal number of nodes in each 
availability zone.  However we hadn't realized that you need to rotate between 
racks (AZs) with each token - so for US-East, in token order, we needed to go 
something like AZ A, B, C, A, B, C for six nodes.  Otherwise you will get 
hotspots because of how replication happens.

For some best practices in ec2, check out 
http://www.slideshare.net/mattdennis/cassandra-on-ec2

On Sep 2, 2011, at 10:30 AM, Sorin Julean wrote:

 Hey,
 
  Currently I'm running Cassandra on Ubuntu 10.4 x86_64 in EC2.
 
  I'm wondering if anyone observed a better performance  / stability on other 
 distros ( CentOS / RHEL / ...) or OS (eg. Solaris intel/SPARC) ?
  Is anyone running prod on VMs, not cloud, but ESXi or Solaris zones ? Is 
 there love or hate :) ?  Any storage best-practices on VM environments ?
  I like xfs ! Any observations on xfs / ext4 / zfs, from Cassandra usage 
 perspective ?
 
 Cheers,
 Sorin



Re: Trying to understand QUORUM and Strategies

2011-09-02 Thread Evgeniy Ryabitskiy
So.
You have created keyspace with SimpleStrategy.
If you want to use *LOCAL_QUORUM, *you should create keyspace (or change
existing) with NetworkTopologyStrategy.

I have provided CLI examples on how to do it. If you are creating keyspace
from Hector, you have to do same via Java API.

Evgeny.


Re: Replicate On Write behavior

2011-09-02 Thread Ian Danforth
That ticket explains a lot, looking forward to a resolution on it.
(Sorry I don't have a patch to offer)

Ian

On Fri, Sep 2, 2011 at 12:30 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne dha...@gmx.3crowd.com wrote:
 I'm curious... digging through the source, it looks like replicate on write 
 triggers a read of the entire row, and not just the columns/supercolumns 
 that are affected by the counter update.  Is this the case?  It would 
 certainly explain why my inserts/sec decay over time and why the average 
 insert latency increases over time.  The strange thing is that I'm not 
 seeing disk read IO increase over that same period, but that might be due to 
 the OS buffer cache...

 It does not. It only reads the columns/supercolumns affected by the
 counter update.
 In the source, this happens in CounterMutation.java. If you look at
 addReadCommandFromColumnFamily you'll see that it does a query by name
 only for the column involved in the update (the update is basically
 the content of the columnFamily parameter there).

 And Cassandra does *not* always reads a full row. Never had, never will.

 On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that 
 normal?  I'm using RandomPartitioner...

 Address         DC          Rack        Status State   Load            Owns  
   Token
                                                                            
 136112946768375385385349842972707284580
 10.0.0.57    datacenter1 rack1       Up     Normal  2.26 GB         20.00%  0
 10.0.0.56    datacenter1 rack1       Up     Normal  2.47 GB         20.00%  
 34028236692093846346337460743176821145
 10.0.0.55    datacenter1 rack1       Up     Normal  2.52 GB         20.00%  
 68056473384187692692674921486353642290
 10.0.0.54    datacenter1 rack1       Up     Normal  950.97 MB       20.00%  
 102084710076281539039012382229530463435
 10.0.0.72    datacenter1 rack1       Up     Normal  383.25 MB       20.00%  
 136112946768375385385349842972707284580

 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node 
 and last node both have a count of 0.  This is a clean cluster, and I've 
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 
 hours.  The last time this test ran, it went all the way down to 500 
 inserts/sec before I killed it.

 Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.

 --
 Sylvain



Re: Trying to understand QUORUM and Strategies

2011-09-02 Thread Anthony Ikeda
Okay, great. I just wanted to confirm that LOCAL_QUORUM will not work with
SimpleStrategy. There was somewhat of a debate amongst my devs who said it
should work.

Anthon


On Fri, Sep 2, 2011 at 9:55 AM, Evgeniy Ryabitskiy 
evgeniy.ryabits...@wikimart.ru wrote:

 So.
 You have created keyspace with SimpleStrategy.
 If you want to use *LOCAL_QUORUM, *you should create keyspace (or change
 existing) with NetworkTopologyStrategy.

 I have provided CLI examples on how to do it. If you are creating keyspace
 from Hector, you have to do same via Java API.

 Evgeny.





JMX TotalReadLatencyMicros sanity check

2011-09-02 Thread David Hawthorne
I've graphed the rate of change of the TotalReadLatencyMicros counter over the 
last 12 hours, and divided by 1,000,000 to get it in seconds.  I'm grabbing it 
every 10 seconds, so I divided by another 10 to get per-second rates.

The result is that I have a CF doing 10 seconds of read *every second*.

Does that make sense?

If I divide it by the number of reads done, it matches up with the latency I'm 
seeing from cfstats:  1.5ms/read.
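(For what it's worth, the arithmetic itself is plausible if the counter simply
sums per-request latencies: 10 s of summed latency per wall-clock second at
~1.5 ms per read works out to roughly 6,700 reads/s, which only needs about ten
reads to be in flight at any given moment.)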

Streaming stuck on one node during Repair

2011-09-02 Thread Jake Maizel
Hello,

I have one node of a cluster that is stuck in a streaming out state
sending to the node that is being repaired.

If I look at the AE thread in jconsole I see this trace:

Name: AE-SERVICE-STAGE:1
State: WAITING on java.util.concurrent.FutureTask$Sync@7e3e0044
Total blocked: 0  Total waited: 23

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
java.util.concurrent.FutureTask.get(FutureTask.java:83)
org.apache.cassandra.service.AntiEntropyService$Differencer.performStreamingRepair(AntiEntropyService.java:515)
org.apache.cassandra.service.AntiEntropyService$Differencer.run(AntiEntropyService.java:475)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

The Stream stage shows this trace:

Name: STREAM-STAGE:1
State: WAITING on org.apache.cassandra.utils.SimpleCondition@1158f928
Total blocked: 9  Total waited: 16

Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
org.apache.cassandra.streaming.StreamOutManager.waitForStreamCompletion(StreamOutManager.java:164)
org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:138)
org.apache.cassandra.service.AntiEntropyService$Differencer$1.runMayThrow(AntiEntropyService.java:511)
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

Is there a way to unstick these threads?  Or am I stuck restarting the
node and then rerunning the entire repair?  All the other nodes seemed
to complete properly and one is still running.  I am thinking of waiting
until the current one finishes, then restarting the stuck node and,
once it's up, running repair again on the node that needs it.

Thoughts?

(0.6.6 on a 7 nodes cluster)



-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail  GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: Limiting ColumnSlice range in second composite value

2011-09-02 Thread Nate McCall
Instead of empty strings, try Character.[MAX|MIN]_VALUE.
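Applied to the code quoted below, that would look roughly like this (a sketch;
the variable names are taken from your snippet, and whether appending
Character.MAX_VALUE bounds the range the way you want depends on the comparator):

// Rough sketch of the suggestion: bound the UTF8 component explicitly
// instead of leaving it as an empty string.
String startState = String.valueOf(Character.MIN_VALUE);
String endState = String.valueOf(Character.MAX_VALUE);
if (desiredState != null) {
    startState = desiredState.getValue();
    endState = desiredState.getValue() + Character.MAX_VALUE;
}
Composite startComp = new Composite(start, startState);
Composite endComp = new Composite(end, endState);
query.setRange(startComp, endComp, true, count);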

On Thu, Sep 1, 2011 at 8:27 PM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:
 My Column name is of Composite(TimeUUIDType, UTF8Type) and I can query
 across the TimeUUIDs correctly, but now I want to also range across the UTF8
 component. Is this possible?

 UUID start = uuidForDate(new Date(1979, 1, 1));
 UUID end = uuidForDate(new Date(Long.MAX_VALUE));

 String startState = "";
 String endState = "";

 if (desiredState != null) {
     mLog.debug("Restricting state to [" + desiredState.getValue() + "]");
     startState = desiredState.getValue();
     endState = desiredState.getValue().concat("_");
 }

 Composite startComp = new Composite(start, startState);
 Composite endComp = new Composite(end, endState);

 query.setRange(startComp, endComp, true, count);

 So far I'm not seeing any effect from setting my endState String value.

 Anthony


Re: Trying to understand QUORUM and Strategies

2011-09-02 Thread Jonathan Ellis
Note that this is an implementation detail, not something that
inherently can't work with other strategies.  LOCAL_QUORUM and
EACH_QUORUM are logically equivalent to QUORUM when there is a single
datacenter.

We tried briefly to add support for non-NTS strategies in
https://issues.apache.org/jira/browse/CASSANDRA-2516, but reverted it
in https://issues.apache.org/jira/browse/CASSANDRA-2627.

On Fri, Sep 2, 2011 at 12:53 PM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:
 Okay, great I just wanted to confirm that LOCAL_QUORUM will not work with
 SimpleStrategy. There was somewhat of a debate amongst my devs that said it
 should work.
 Anthon

 On Fri, Sep 2, 2011 at 9:55 AM, Evgeniy Ryabitskiy
 evgeniy.ryabits...@wikimart.ru wrote:

 So.
 You have created keyspace with SimpleStrategy.
 If you want to use LOCAL_QUORUM, you should create keyspace (or change
 existing) with NetworkTopologyStrategy.

 I have provided CLI examples on how to do it. If you are creating keyspace
 from Hector, you have to do same via Java API.

 Evgeny.







-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Replicate On Write behavior

2011-09-02 Thread David Hawthorne
Does it always pick the node with the lowest IP address?  All of my hosts are 
in the same /24.  The fourth node in the 5 node cluster has the lowest value in 
the 4th octet (54).  I erased the cluster and rebuilt it from scratch as a 3 
node cluster using the first 3 nodes, and now the ReplicateOnWrites are all 
going to the third node, which is also the lowest valued IP address (55).

That would explain why only 1 node gets writes in a 3 node cluster (RF=3) and 
why 3 nodes get writes in a 5 node cluster, and why one of those 3 is taking 
66% of the writes.


 
 On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that 
 normal?  I'm using RandomPartitioner...
 
 Address DC  RackStatus State   LoadOwns  
   Token

 136112946768375385385349842972707284580
 10.0.0.57datacenter1 rack1   Up Normal  2.26 GB 20.00%  0
 10.0.0.56datacenter1 rack1   Up Normal  2.47 GB 20.00%  
 34028236692093846346337460743176821145
 10.0.0.55datacenter1 rack1   Up Normal  2.52 GB 20.00%  
 68056473384187692692674921486353642290
 10.0.0.54datacenter1 rack1   Up Normal  950.97 MB   20.00%  
 102084710076281539039012382229530463435
 10.0.0.72datacenter1 rack1   Up Normal  383.25 MB   20.00%  
 136112946768375385385349842972707284580
 
 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node 
 and last node both have a count of 0.  This is a clean cluster, and I've 
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 
 hours.  The last time this test ran, it went all the way down to 500 
 inserts/sec before I killed it.
 
 Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
 
 --
 Sylvain



Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Nate McCall
It looks like you only have 2 replicas configured in each data center?

If so, LOCAL_QUORUM cannot be achieved with a host down, the same as with
QUORUM on RF=2 in a single-DC cluster.

On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.



Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Oleg Tsvinev
Well, this is the part I don't understand then. I thought that if I
configure 2 replicas with 3 nodes and one of 3 nodes goes down, I'll
still have 2 nodes to store 3 replicas. Is my logic flawed somewhere?

On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.




Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Oleg Tsvinev
from http://www.datastax.com/docs/0.8/consistency/index:

A “quorum” of replicas is essentially a majority of replicas, or RF /
2 + 1 with any resulting fractions rounded down.

I have RF=2, so majority of replicas is 2/2+1=2 which I have after 3rd
node goes down?

On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.




Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Nate McCall
In your options, you have configured 2 replicas for each data center:
Options: [DC2:2, DC1:2]

If one of those replicas is down, then LOCAL_QUORUM will fail as there
is only one replica left 'locally.'
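Put as arithmetic: with DC1:2, the local quorum is RF / 2 + 1 = 2 / 2 + 1 = 2,
i.e. both local replicas of a key must answer. So when one of the three DC1
nodes is down, any key that has one of its two replicas on that node can only
reach one replica, and LOCAL_QUORUM requests for those keys fail, even though
two of the three nodes are still up.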


On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so majority of replicas is 2/2+1=2 which I have after 3rd
 node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.





Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Oleg Tsvinev
Do you mean I need to configure 3 replicas in each DC and keep using
LOCAL_QUORUM? In which case, if I'm following your logic, even if one of
the 3 goes down I'll still have 2 to ensure LOCAL_QUORUM succeeds?

On Fri, Sep 2, 2011 at 1:44 PM, Nate McCall n...@datastax.com wrote:
 In your options, you have configured 2 replicas for each data center:
 Options: [DC2:2, DC1:2]

 If one of those replicas is down, then LOCAL_QUORUM will fail as there
 is only one replica left 'locally.'


 On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so majority of replicas is 2/2+1=2 which I have after 3rd
 node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.






Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Nate McCall
Yes - you would need at least 3 replicas per data center to use
LOCAL_QUORUM and survive a node failure.

On Fri, Sep 2, 2011 at 3:51 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 Do you mean I need to configure 3 replicas in each DC and keep using
 LOCAL_QUORUM? In which case, if I'm following your logic, even one of
 the 3 goes down I'll still have 2 to ensure LOCAL_QUORUM succeeds?

 On Fri, Sep 2, 2011 at 1:44 PM, Nate McCall n...@datastax.com wrote:
 In your options, you have configured 2 replicas for each data center:
 Options: [DC2:2, DC1:2]

 If one of those replicas is down, then LOCAL_QUORUM will fail as there
 is only one replica left 'locally.'


 On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so majority of replicas is 2/2+1=2 which I have after 3rd
 node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com 
 wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.







Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Konstantin Naryshkin
I think that Oleg may have misunderstood how replicas are selected. If you have
3 nodes in your cluster and an RF of 2, Cassandra first selects which two nodes
out of the 3 will get the data, and then, and only then, does it write it out. The
selection is based on the row key, the tokens of the nodes, and your choice of
partitioner. This means that Cassandra does not need to store which node is
responsible for a given row; that information can be recalculated whenever it
is needed.

The error that you are getting is because, while you may have 2 nodes up, they are
not necessarily the nodes that Cassandra will use to store the data.
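To illustrate the idea (a toy sketch of the simple, non-NTS case; this is not
the real placement code, and it assumes rf is at most the number of nodes):

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Toy sketch: hash the key to a token, walk the ring clockwise from that token,
// and take the first rf node tokens found -- regardless of which nodes are up.
static List<BigInteger> replicaTokensFor(String rowKey, TreeSet<BigInteger> nodeTokens, int rf)
        throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5").digest(rowKey.getBytes("UTF-8"));
    BigInteger keyToken = new BigInteger(1, md5);             // RandomPartitioner-style token
    BigInteger start = (nodeTokens.ceiling(keyToken) != null)
            ? nodeTokens.ceiling(keyToken) : nodeTokens.first();
    List<BigInteger> replicas = new ArrayList<BigInteger>();
    Iterator<BigInteger> it = nodeTokens.tailSet(start).iterator();
    while (replicas.size() < rf) {
        if (!it.hasNext())
            it = nodeTokens.iterator();                       // wrap around the ring
        replicas.add(it.next());
    }
    return replicas;
}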

- Original Message -
From: Nate McCall n...@datastax.com
To: hector-us...@googlegroups.com
Cc: Cassandra Users user@cassandra.apache.org
Sent: Friday, September 2, 2011 4:44:01 PM
Subject: Re: HUnavailableException: : May not be enough replicas present to 
handle consistency level.

In your options, you have configured 2 replicas for each data center:
Options: [DC2:2, DC1:2]

If one of those replicas is down, then LOCAL_QUORUM will fail as there
is only one replica left 'locally.'


On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so majority of replicas is 2/2+1=2 which I have after 3rd
 node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as following - I have 3 nodes with keyspaces configured as 
 following:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and configured to neither see
 nor connect to nodes in DC2. This is for replication by Cassandra
 means, asynchronously between datacenters DC1 and DC2. Each of 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed if I have 3 nodes and one goes down, two remaining nodes
 are sufficient for my configuration.

 Please help me to understand what's going on.





Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Oleg Tsvinev
And now, when I have one node down with no chance of bringing it back
anytime soon, can I still change RF to 3 to restore the functionality
of my cluster? Should I run 'nodetool repair', or will a simple keyspace
update suffice?

On Fri, Sep 2, 2011 at 1:55 PM, Nate McCall n...@datastax.com wrote:
 Yes - you would need at least 3 replicas per data center to use
 LOCAL_QUORUM and survive a node failure.

 On Fri, Sep 2, 2011 at 3:51 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 Do you mean I need to configure 3 replicas in each DC and keep using
 LOCAL_QUORUM? In which case, if I'm following your logic, even if one of
 the 3 goes down I'll still have 2 to ensure LOCAL_QUORUM succeeds?

 On Fri, Sep 2, 2011 at 1:44 PM, Nate McCall n...@datastax.com wrote:
 In your options, you have configured 2 replicas for each data center:
 Options: [DC2:2, DC1:2]

 If one of those replicas is down, then LOCAL_QUORUM will fail as there
 is only one replica left 'locally.'


 On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so the majority of replicas is 2/2+1=2, which I still have after the
 3rd node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com 
 wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as follows - I have 3 nodes with keyspaces configured as
 follows:

 Replication Strategy: 
 org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and is configured to neither see
 nor connect to nodes in DC2. Replication between datacenters DC1 and DC2
 is done asynchronously by Cassandra itself. Each of the 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed that if I have 3 nodes and one goes down, the two remaining nodes
 would be sufficient for my configuration.

 Please help me to understand what's going on.








Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Oleg Tsvinev
Yes, I think I get it now: a quorum of replicas != a quorum of nodes,
and I don't think a quorum of nodes is ever defined. Thank you,
Konstantin.
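
Just to make the arithmetic from the docs concrete (RF / 2 + 1 with integer
division; this is only a toy illustration, not Cassandra code):

public class QuorumSketch {
    // Quorum per the docs: RF / 2 + 1, fractions rounded down (integer division).
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        // RF=2 -> quorum of 2: every replica must be up, so losing one node fails LOCAL_QUORUM.
        System.out.println(quorum(2));   // 2
        // RF=3 -> quorum of 2: one replica per DC may be down and LOCAL_QUORUM still succeeds.
        System.out.println(quorum(3));   // 2
    }
}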

Now, I believe I need to change my cluster to store data on the two
remaining nodes in DC1, keeping 3 nodes in DC2. I believe nodetool
removetoken is what I need to use. Anything else I can/should do?

On Fri, Sep 2, 2011 at 1:56 PM, Konstantin  Naryshkin
konstant...@a-bb.net wrote:
 I think that Oleg may have misunderstood how replicas are selected. If you
 have 3 nodes in your cluster and an RF of 2, Cassandra first selects which two
 nodes out of the 3 will get the data, and only then does it write it out.
 The selection is based on the row key, the tokens of the nodes, and your choice
 of partitioner. This means that Cassandra does not need to store which node is
 responsible for a given row; that information can be recalculated whenever it
 is needed.

 The error that you are getting is because, while you may have 2 nodes up, those
 are not the nodes that Cassandra will use to store the data.

 - Original Message -
 From: Nate McCall n...@datastax.com
 To: hector-us...@googlegroups.com
 Cc: Cassandra Users user@cassandra.apache.org
 Sent: Friday, September 2, 2011 4:44:01 PM
 Subject: Re: HUnavailableException: : May not be enough replicas present to 
 handle consistency level.

 In your options, you have configured 2 replicas for each data center:
 Options: [DC2:2, DC1:2]

 If one of those replicas is down, then LOCAL_QUORUM will fail as there
 is only one replica left 'locally.'


 On Fri, Sep 2, 2011 at 3:35 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 from http://www.datastax.com/docs/0.8/consistency/index:

 A “quorum” of replicas is essentially a majority of replicas, or RF /
 2 + 1 with any resulting fractions rounded down.

 I have RF=2, so the majority of replicas is 2/2+1=2, which I still have after the
 3rd node goes down?

 On Fri, Sep 2, 2011 at 1:22 PM, Nate McCall n...@datastax.com wrote:
 It looks like you only have 2 replicas configured in each data center?

 If so, LOCAL_QUORUM cannot be achieved with a host down same as with
 QUORUM on RF=2 in a single DC cluster.

 On Fri, Sep 2, 2011 at 1:40 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 I believe I don't quite understand semantics of this exception:

 me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
 be enough replicas present to handle consistency level.

 Does it mean there *might be* enough?
 Does it mean there *is not* enough?

 My case is as follows - I have 3 nodes with keyspaces configured as
 follows:

 Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Durable Writes: true
 Options: [DC2:2, DC1:2]

 Hector can only connect to nodes in DC1 and is configured to neither see
 nor connect to nodes in DC2. Replication between datacenters DC1 and DC2
 is done asynchronously by Cassandra itself. Each of the 6 total
 nodes can see any of the remaining 5.

 and inserts with LOCAL_QUORUM CL work fine when all 3 nodes are up.
 However, this morning one node went down and I started seeing the
 HUnavailableException: : May not be enough replicas present to handle
 consistency level.

 I believed that if I have 3 nodes and one goes down, the two remaining nodes
 would be sufficient for my configuration.

 Please help me to understand what's going on.






Import JSON sstable data

2011-09-02 Thread Zhong Li
Hi,

I am trying to upload sstable data to a Cassandra 0.8.4 cluster with the
json2sstable tool. Each time, I have to restart the node after importing the
new file and run a repair for the column family, otherwise the new data will
not show up.


Any thoughts?


Thanks,

Zhong Li

Re: Limiting ColumnSlice range in second composite value

2011-09-02 Thread Anthony Ikeda
This is what I'm trying to do:

Sample of the data:
RowKey: localhost
= (column=e3f3c900-d5b0-11e0-aa6b-005056c8:ACTIVE, value=<?xml
version="1.0" encoding="UTF-8" standalone="yes"?>,
timestamp=1315001665761000)
= (column=e4515250-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
version="1.0" encoding="UTF-8" standalone="yes"?>,
timestamp=1315001654271000)
= (column=e45549f0-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
version="1.0" encoding="UTF-8" standalone="yes"?>,
timestamp=1315001654327000)
= (column=e45cc400-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
version="1.0" encoding="UTF-8" standalone="yes"?>,
timestamp=1315001654355000)
= (column=e462de80-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
version="1.0" encoding="UTF-8" standalone="yes"?>,
timestamp=1315001654394000)


I'll be activating and deactivating the inactive profiles in chronological
order.


   - So I want to first retrieve the current ACTIVE record (easy because it's
   cached)
   - Put it to use and, when ready, recreate the column - same timeUUID but
   EXHAUSTED status (delete then add; see the sketch just after this list)
   - Next I have to fetch the first INACTIVE column after this, delete
   that and re-create the record with an ACTIVE composite (same timeUUID,
   again add then delete) and repeat the process.
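
For that middle step, a rough Hector sketch (the class, method and column
family names are made up here, the row key "localhost" is just taken from the
sample above, and the Hector calls are from memory, so double-check them
against your Hector version):

import java.util.UUID;

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class StatusSwapSketch {
    // Delete the old (timeUUID, ACTIVE) column and re-add it as (timeUUID, EXHAUSTED).
    static void markExhausted(Keyspace keyspace, UUID timeUUID, String xmlPayload) {
        CompositeSerializer cs = new CompositeSerializer();
        Composite oldName = new Composite(timeUUID, "ACTIVE");
        Composite newName = new Composite(timeUUID, "EXHAUSTED");

        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        m.addDeletion("localhost", "Profiles", oldName, cs);
        m.addInsertion("localhost", "Profiles",
                HFactory.createColumn(newName, xmlPayload, cs, StringSerializer.get()));
        m.execute();
    }
}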


The second part of my composite is an ENUM of String literals:
Status.ACTIVE, Status.INACTIVE, Status.EXHAUSTED

For the current row key, I want to get the column (startTimeUUID, ACTIVE), which
should only be one column (provided the code works).

All earlier columns are (timeUUID, EXHAUSTED), all later columns should be
(timeUUID, INACTIVE)

I'm thinking that, to find the column that is ACTIVE, I would set the range:

startComp = new Composite(timeUUID, "ACTIVE");
endComp = new Composite(timeUUID, "ACTIVE_");

query.setRange(startComp, endComp, false, 2); // Fetch 2 just in case

To get all INACTIVE columns I'd use:
startComp = new Composite(timeUUID, "INACTIVE");
endComp = new Composite(timeUUID, "INACTIVE_");

query.setRange(startComp, endComp, false, 10);

The thing is, I'm getting back all columns regardless of what I set for the
second half of the composite. Is what I'm trying to do possible?

Anthony


On Fri, Sep 2, 2011 at 12:29 PM, Nate McCall n...@datastax.com wrote:

 Instead of empty strings, try Character.[MAX|MIN]_VALUE.

 On Thu, Sep 1, 2011 at 8:27 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  My Column name is of Composite(TimeUUIDType, UTF8Type) and I can query
  across the TimeUUIDs correctly, but now I want to also range across the
 UTF8
  component. Is this possible?
 
  UUID start = uuidForDate(new Date(1979, 1, 1));
 
  UUID end = uuidForDate(new Date(Long.MAX_VALUE));
 
  String startState = "";
 
  String endState = "";
 
  if (desiredState != null) {
 
  mLog.debug("Restricting state to [" + desiredState.getValue() + "]");
 
  startState = desiredState.getValue();
 
  endState = desiredState.getValue().concat("_");
 
  }
 
 
 
  Composite startComp = new Composite(start, startState);
 
  Composite endComp = new Composite(end, endState);
 
  query.setRange(startComp, endComp, true, count);
 
  So far I'm not seeing any effect setting my endState String value.
 
  Anthony



commodity server spec

2011-09-02 Thread China Stoffen
Hi,
Is there any recommendation about commodity server hardware specs if a 100TB
database size is expected and it's a write-heavy application? Should I go with
high-powered CPUs (12 cores), 48TB of HDD and 640GB of RAM, and a total of 3
servers of this spec? Or are many smaller commodity servers recommended?

Thanks.
China

Re: Limiting ColumnSlice range in second composite value

2011-09-02 Thread Anthony Ikeda
Okay, I reversed the composite and seem to have come up with a solution.
Although the columns are now sorted by status, within each status they are
still sorted temporally, which helps. I tell you, this type of modeling really
breaks the rules :)
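
For anyone searching the archives later, a minimal sketch of what the reversed
layout allows - a comparator of CompositeType(UTF8Type, TimeUUIDType), status
first - using Hector's ComponentEquality to slice a single status bucket. The
class, method, column family and row key names are made up for the example,
and the API details are from memory, so double-check them against your Hector
version:

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class InactiveSliceSketch {
    // Build a query for only the INACTIVE columns by bounding the first component.
    static SliceQuery<String, Composite, String> inactiveColumns(Keyspace keyspace) {
        Composite start = new Composite();
        start.addComponent(0, "INACTIVE", Composite.ComponentEquality.EQUAL);

        Composite end = new Composite();
        end.addComponent(0, "INACTIVE", Composite.ComponentEquality.GREATER_THAN_EQUAL);

        SliceQuery<String, Composite, String> q = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
        q.setColumnFamily("Profiles");
        q.setKey("localhost");
        q.setRange(start, end, false, 10);   // TimeUUIDs sort ascending, so the oldest INACTIVE comes first
        return q;
    }
}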

Anthony


On Fri, Sep 2, 2011 at 3:54 PM, Anthony Ikeda
anthony.ikeda@gmail.comwrote:

 This is what I'm trying to do:

 Sample of the data:
 RowKey: localhost
 = (column=e3f3c900-d5b0-11e0-aa6b-005056c8:ACTIVE, value=<?xml
 version="1.0" encoding="UTF-8" standalone="yes"?>,
 timestamp=1315001665761000)
 = (column=e4515250-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
 version="1.0" encoding="UTF-8" standalone="yes"?>,
 timestamp=1315001654271000)
 = (column=e45549f0-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
 version="1.0" encoding="UTF-8" standalone="yes"?>,
 timestamp=1315001654327000)
 = (column=e45cc400-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
 version="1.0" encoding="UTF-8" standalone="yes"?>,
 timestamp=1315001654355000)
 = (column=e462de80-d5b0-11e0-aa6b-005056c8:INACTIVE, value=<?xml
 version="1.0" encoding="UTF-8" standalone="yes"?>,
 timestamp=1315001654394000)


 I'll be activating and deactivating the inactive profiles in a
 chronological order.


- So I want to first retrieve current ACTIVE record (easy cause it's
cached)
- Put it to use and when ready, recreate the column - same timeUUID but
EXHAUSTED status (delete then add)
- Next I have to fetch the first INACTIVE column after this, delete
that and re-create the record with an ACTIVE composite (same timeuuid,
again add then delete) and repeat the process.


 The second part of my composite is an ENUM of String literals:
 Status.ACTIVE, Status.INACTIVE, Status.EXHAUSTED

 I want to get the current row key of value (startTimeUUID, ACTIVE) which
 should only be one column (provided the code works)

 All earlier columns are (timeUUID, EXHAUSTED), all later columns should
 be (timeUUID, INACTIVE)

 I'm thinking to find the column that is ACTIVE I would set the range:

 startComp = new Composite(timeUUID, "ACTIVE");
 endComp = new Composite(timeUUID, "ACTIVE_");

 query.setRange(startComp, endComp, false, 2); //Fetch 2 just in case

 To get all INACTIVE columns I'd use
 startComp = new Composite(timeUUID, "INACTIVE");
 endComp = new Composite(timeUUID, "INACTIVE_");

 query.setRange(startComp, endComp, false, 10);

 Thing is I'm getting back all columns regardless of what I set for the
 second half of the composite. Is what I'm trying to do possible?

 Anthony


 On Fri, Sep 2, 2011 at 12:29 PM, Nate McCall n...@datastax.com wrote:

 Instead of empty strings, try Character.[MAX|MIN]_VALUE.

 On Thu, Sep 1, 2011 at 8:27 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  My Column name is of Composite(TimeUUIDType, UTF8Type) and I can query
  across the TimeUUIDs correctly, but now I want to also range across the
 UTF8
  component. Is this possible?
 
  UUID start = uuidForDate(new Date(1979, 1, 1));
 
  UUID end = uuidForDate(new Date(Long.MAX_VALUE));
 
  String startState = "";
 
  String endState = "";
 
  if (desiredState != null) {
 
  mLog.debug("Restricting state to [" + desiredState.getValue() +
 "]");
 
  startState = desiredState.getValue();
 
  endState = desiredState.getValue().concat("_");
 
  }
 
 
 
  Composite startComp = new Composite(start, startState);
 
  Composite endComp = new Composite(end, endState);
 
  query.setRange(startComp, endComp, true, count);
 
  So far I'm not seeing any effect setting my endState String value.
 
  Anthony





Re: commodity server spec

2011-09-02 Thread Radim Kolar

Many smaller servers are way better.