Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
On 19 September 2013 02:06, Jayadev Jayaraman jdisal...@gmail.com wrote:

We use vnodes with num_tokens = 256 (256 tokens per node). After loading
 some data with sstableloader, we find that the cluster is heavily
 imbalanced:


How did you select the tokens?  Is this a brand new cluster which started
on first boot with num_tokens = 256 and chose random tokens?  Or did you
start with num_tokens = 1 and then increase it?

Richard.


Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
Hi all,

I use Cassandra 1.0.11

If I do cfstats for a particular column family, I see a "Compacted row
maximum size" of 43388628.

However, when I do cfhistograms I do not see such a big row in the "Row
Size" column. The biggest row there is 126934.

Can someone explain this?

Thanks!

Rene


cqlsh startup error Can't locate transport factory function cqlshlib.tfactory.regular_transport_factory

2013-09-19 Thread Oisin Kim
Hi,

cqlsh stopped working for me recently. I'm unsure how or why it broke, and I
couldn't find anything in the mail archives (or Google) that gave me an
indication of how to fix the problem.

Here's the output I see when I have Cassandra running locally (default config
except using RandomPartitioner) and try to run cqlsh (running with --debug and
with the local IP makes no difference):

oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cqlsh
Can't locate transport factory function 
cqlshlib.tfactory.regular_transport_factory

I installed Cassandra 1.2.9 and Python 2.7.2 via brew and used pip to install
cql. I can connect via cassandra-cli to create and view keyspaces etc.
without any issues.

Any help greatly appreciated, thanks.

Regards,
Oisin

Versions:

oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cassandra -v
xss =  -ea -javaagent:/usr/local/Cellar/cassandra/1.2.9/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4096M -Xmx4096M 
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError
1.2.9



oisin@/usr/local/Cellar: python -V
Python 2.7.2

oisin@/usr/local/Cellar: pip -V
pip 1.4.1 from /usr/local/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg 
(python 2.7)




Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Michał Michalski
I believe the reason is that cfhistograms tells you about the sizes of
the rows returned by a given node in response to read requests, while
cfstats tracks the largest row stored on a given node.


M.

On 19.09.2013 11:31, Rene Kochen wrote:

Hi all,

I use Cassandra 1.0.11

If I do cfstats for a particular column family, I see a Compacted row
maximum size of 43388628

However, when I do a cfhistograms I do not see such a big row in the Row
Size column. The biggest row there is 126934.

Can someone explain this?

Thanks!

Rene





Re: cqlsh startup error Can't locate transport factory function cqlshlib.tfactory.regular_transport_factory

2013-09-19 Thread Oisin Kim
Fixed this issue. For anyone else who hits it: the version of Python
installed via brew was 2.7.5 and needed to be put on the PATH, as OS X
has its own version of Python (2.7.2 currently).
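
For anyone wanting the exact steps, something like this should work (a
minimal sketch; the paths assume a default Homebrew install):

# put Homebrew's Python ahead of the OS X one, e.g. in ~/.bash_profile
export PATH="/usr/local/bin:$PATH"
python -V    # should now report the brew-installed 2.7.5
./cqlsh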



On Thursday 19 September 2013 at 10:33, Oisin Kim wrote:

 Hi,
 
 cqlsh stopped working for me recently, I'm unsure how / why it broke and I 
 couldn't find anything from the mail archives (or google) that gave me an 
 indication of how to fix the problem.
 
 Here's the output I see when I have cassandra running locally (default config 
 except using Random Partitioner) and try run cqlsh (running with --debug and 
 with the local IP makes no difference) 
 
 oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cqlsh
 Can't locate transport factory function 
 cqlshlib.tfactory.regular_transport_factory
 
 I Installed cassandra 1.2.9 and Python 2.7.2 via brew and used pip to install 
 cql.  I can connect via the cassandra-cli to create and view keyspaces etc 
 without any issues.
 
 Any help greatly appreciated, thanks.
 
 Regards,
 Oisin
 
 Versions:
 
 oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cassandra -v
 xss =  -ea -javaagent:/usr/local/Cellar/cassandra/1.2.9/jamm-0.2.5.jar 
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4096M -Xmx4096M 
 -Xmn800M -XX:+HeapDumpOnOutOfMemoryError
 1.2.9
 
 
 
 oisin@/usr/local/Cellar: python -V
 Python 2.7.2
 
 oisin@/usr/local/Cellar: pip -V
 pip 1.4.1 from /usr/local/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg 
 (python 2.7)
 
 
 
 




Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Richard Low
On 19 September 2013 10:31, Rene Kochen rene.koc...@schange.com wrote:

I use Cassandra 1.0.11

 If I do cfstats for a particular column family, I see a Compacted row
 maximum size of 43388628

 However, when I do a cfhistograms I do not see such a big row in the Row
 Size column. The biggest row there is 126934.

 Can someone explain this?


The 'Row Size' column is showing the number of rows that have a size
indicated by the value in the 'Offset' column.  So if your output is like

Offset  Row Size
1131752  10
1358102  100

It means you have 100 rows with size between 1131752 and 1358102 bytes.  It
doesn't mean there are rows of size 100.
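
For reference, the histogram comes from something like the following (a
hedged example; keyspace and column family names are placeholders):

nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily
# 'Offset' is the bucket boundary; 'Row Size' is the count of rows whose
# size falls into that bucket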

Richard.


Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
And how does cfstats track the maximum size? What does "Compacted" mean in
"Compacted row maximum size"?

Thanks again!

Rene


2013/9/19 Michał Michalski mich...@opera.com

 I believe the reason is that cfhistograms tells you about the sizes of the
 rows returned by given node in a response to the read request, while
 cfstats tracks the largest row stored on given node.

 M.

 On 19.09.2013 11:31, Rene Kochen wrote:

  Hi all,

 I use Cassandra 1.0.11

 If I do cfstats for a particular column family, I see a Compacted row
 maximum size of 43388628

 However, when I do a cfhistograms I do not see such a big row in the Row
 Size column. The biggest row there is 126934.

 Can someone explain this?

 Thanks!

 Rene





Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
That is indeed how I read it. The biggest rows reported are the 3 rows in the
126934 offset bucket, while cfstats reports 43388628.

Thanks,

Rene


2013/9/19 Richard Low rich...@wentnet.com

 On 19 September 2013 10:31, Rene Kochen rene.koc...@schange.com wrote:

 I use Cassandra 1.0.11

 If I do cfstats for a particular column family, I see a Compacted row
 maximum size of 43388628

 However, when I do a cfhistograms I do not see such a big row in the Row
 Size column. The biggest row there is 126934.

 Can someone explain this?


 The 'Row Size' column is showing the number of rows that have a size
 indicated by the value in the 'Offset' column.  So if your output is like

 Offset  Row Size
 1131752  10
 1358102  100

 It means you have 100 rows with size between 1131752 and 1358102 bytes.
  It doesn't mean there are rows of size 100.

 Richard.



Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Richard, 
This is a brand new cluster which started with num_tokens = 256 on first boot
and chose random tokens. The attached ring status is after data is loaded into
the cluster for the first time using sstableloader, and it remains that way even
after Cassandra is restarted.

Thanks,
Suruchi

On Sep 19, 2013, at 3:46, Richard Low rich...@wentnet.com wrote:

 On 19 September 2013 02:06, Jayadev Jayaraman jdisal...@gmail.com wrote:
 
  We use vnodes with num_tokens = 256 (256 tokens per node). After loading
  some data with sstableloader, we find that the cluster is heavily
  imbalanced:
 
 How did you select the tokens?  Is this a brand new cluster which started on 
 first boot with num_tokens = 256 and chose random tokens?  Or did you start 
 with num_tokens = 1 and then increase it?
 
 Richard.


Reverse compaction on 1.1.11?

2013-09-19 Thread Michael Theroux
Hello,

Quick question.  Is there a tool that allows sstablesplit (reverse compaction)
against 1.1.11 sstables?  I seem to recall a separate utility somewhere, but
I'm having difficulty locating it.

Thanks,
-Mike

Re: Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)

2013-09-19 Thread Petter von Dolwitz (Hem)
For the record:

https://issues.apache.org/jira/browse/CASSANDRA-5975 (2.0.1) resolved this
issue for me.






2013/9/8 Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com

 Thank you for your reply.

 I will look into this. I cannot get my head around why the scenario I
 am describing does not work, though. Should I report an issue around this, or
 is this expected behaviour? A similar setup is described on this blog post
 by the development lead.

 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts




 2013/9/6 Robert Coli rc...@eventbrite.com

 On Fri, Sep 6, 2013 at 6:18 AM, Petter von Dolwitz (Hem) 
 petter.von.dolw...@gmail.com wrote:

 I am struggling with getting secondary indexes to work. I have created
 secondary indexes on some fields that are part of the compound primary key,
 but only one of the indexes seems to work (the one set on the field 'e' in
 the table definition below). Using any other secondary index in a where
 clause causes the message "Request did not complete within rpc_timeout.".
 It seems like if I put a value in the where clause that does not exist in a
 column with a secondary index, then cassandra quickly returns with the result
 (0 rows), but if I put in a value that does exist, I get a timeout. There is no
 exception in the logs in connection with this. I've tried to increase the
 timeout to a minute but it does not help.


 In general unless you absolutely need the atomicity of the update of a
 secondary index with the underlying storage row, you are better off making
 a manual secondary index column family.
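
 As an illustration, a manual index is just a second table that maps the
 indexed value back to the keys that contain it. A minimal sketch in CQL3
 (table and column names are hypothetical; the client writes to both tables
 together):

cqlsh <<'EOF'
CREATE TABLE users (user_id uuid PRIMARY KEY, email text);
CREATE TABLE users_by_email (email text, user_id uuid,
                             PRIMARY KEY (email, user_id));
EOF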

 =Rob





Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
I think what has happened is that Cassandra was started with num_tokens =
1, then shutdown and num_tokens set to 256.  When this happens, the first
time Cassandra chooses a single random token.  Then when restarted it
splits the token into 256 adjacent ranges.

You can see something like this has happened because the tokens for each
node are sequential.

The way to fix it is to, assuming you don't want the data, shutdown your
cluster, wipe the whole data and commitlog directories, then start
Cassandra again.
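
For example, on each node (a hedged sketch; the directories are the common
defaults - adjust to the data_file_directories and commitlog_directory set in
your cassandra.yaml):

# with Cassandra stopped on every node:
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* \
       /var/lib/cassandra/saved_caches/*
# confirm num_tokens: 256 is set (and initial_token is unset) before restarting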

Richard.


On 19 September 2013 13:16, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Hi Richard,
 This is a brand new cluster which started with num_tokens = 256 on first
 boot and chose random tokens. The attached ring status is after data is
 loaded into the cluster for the first time using sstableloader and remains
 that way even after Cassandra is restarted.

 Thanks,
 Suruchi

 On Sep 19, 2013, at 3:46, Richard Low rich...@wentnet.com wrote:

 On 19 September 2013 02:06, Jayadev Jayaraman jdisal...@gmail.com wrote:

  We use vnodes with num_tokens = 256 (256 tokens per node). After
 loading some data with sstableloader, we find that the cluster is heavily
 imbalanced:


 How did you select the tokens?  Is this a brand new cluster which started
 on first boot with num_tokens = 256 and chose random tokens?  Or did you
 start with num_tokens = 1 and then increase it?

 Richard.




Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Hiller, Dean
Can you describe what you mean by reverse compaction?  I mean once you put
a row together and blow away the sstables that contained it before, you can't
possibly know how to split it, since that information is gone.

Perhaps you want the simple sstable2json script in the bin directory so
you can inspect the file?
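
For example (a hedged invocation; the keyspace, column family and file name
are placeholders):

bin/sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF/MyKeyspace-MyCF-hd-1-Data.db > dump.json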

Dean

On 9/19/13 7:21 AM, Michael Theroux mthero...@yahoo.com wrote:

Hello,

Quick question.  Is there a tool that allows sstablesplit (reverse
compaction) against 1.1.11 sstables?  I seem to recall a separate utility
somewhere, but I'm having difficulty locating it,

Thanks,
-Mike



Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Nate McCall
See https://issues.apache.org/jira/browse/CASSANDRA-4766

The original gist posted by Rob therein might be helpful/work with earlier
versions (I have not tried).

Worst case, might be a good reason to upgrade to 1.2.x (if you are suffering
pressure from a large SSTable, the additional offheap structures will help
a bunch and you may not need to split).


On Thu, Sep 19, 2013 at 8:21 AM, Michael Theroux mthero...@yahoo.comwrote:

 Hello,

 Quick question.  Is there a tool that allows sstablesplit (reverse
 compaction) against 1.1.11 sstables?  I seem to recall a separate utility
 somewhere, but I'm having difficulty locating it,

 Thanks,
 -Mike


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 7:03 AM, Richard Low rich...@wentnet.com wrote:

 I think what has happened is that Cassandra was started with num_tokens =
 1, then shutdown and num_tokens set to 256.  When this happens, the first
 time Cassandra chooses a single random token.  Then when restarted it
 splits the token into 256 adjacent ranges.


Suruchi,

By which mechanism did you install Cassandra? I ask out of concern that
there may be an issue in some packaging leading to the above sequence
of events.

=Rob


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread Nate McCall
As opposed to stopping compaction altogether, have you experimented with
turning down compaction_throughput_mb_per_sec (16MB default) and/or
explicitly setting concurrent_compactors (defaults to the number of cores,
IIRC)?
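
The throttle can also be changed on a live node without a restart, e.g. (a
hedged example):

# lower the compaction throttle to 8 MB/s at runtime (0 disables the limit)
nodetool -h localhost setcompactionthroughput 8
# or stop the currently running compactions on the node
nodetool -h localhost stop COMPACTION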


On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar rashmi.aros...@gmail.comwrote:

 Hi,
 In general leveled compaction are I/O heavy so when there are bunch of
 writes do we need to stop leveled compactions at all?
 I found the nodetool stop COMPACTION, which states it stops compaction
 happening, does this work for any type of compaction? Also it states in
 documents 'eventually cassandra restarts the compaction', isn't there a way
 to control when to start the compaction again manually ?
 If this is not applicable for leveled compactions in 1.2, then what can be
 used for stopping/restating those?



 Thanks,
 Rashmi



Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Robert,
I downloaded apache-cassandra-1.2.9.tar.gz from
http://cassandra.apache.org/download/ (
http://apache.mirrors.tds.net/cassandra/1.2.9/apache-cassandra-1.2.9-bin.tar.gz)
and installed it on the individual nodes of the cassandra cluster.
Thanks,
Suruchi


On Thu, Sep 19, 2013 at 12:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 7:03 AM, Richard Low rich...@wentnet.com wrote:

 I think what has happened is that Cassandra was started with num_tokens =
 1, then shutdown and num_tokens set to 256.  When this happens, the first
 time Cassandra chooses a single random token.  Then when restarted it
 splits the token into 256 adjacent ranges.


 Suruchi,

 By which mechanism did you install Cassandra? I ask out of concern that
 there may be an issue in some packaging leading to the above sequence
 of events.

 =Rob



Re: Problem with counter columns

2013-09-19 Thread Robert Coli
On Wed, Sep 18, 2013 at 11:07 AM, Yulian Oifa oifa.yul...@gmail.com wrote:

 I am using counter columns in a cassandra cluster with 3 nodes.



Current cassandra version is 0.8.10.

 How can I debug and find the problem?


The problem is using Counters in Cassandra 0.8.

But seriously, I don't know whether the particular issue you describe is
fixed upstream. But if it isn't, no one will fix it in 0.8, so you should
probably...

1) upgrade to Cassandra 1.2.9 (note that you likely need to pass through
1.0/1.1)
2) attempt to reproduce
3) if you can, file a JIRA and update this thread with a link to it

=Rob


Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 3:08 AM, Rene Kochen rene.koc...@schange.comwrote:

 And how does cfstats track the maximum size? What does Compacted mean in
 Compacted row maximum size.


That maximum size is the largest row that the node has encountered in the
course of compaction since it started.

Hence "compacted", to try to indicate that it is not necessarily the row of
maximum size which currently exists. For example, if you had a huge row at
some time in the past and have now removed it (and have not restarted in
the interim), this value will be misleading.

=Rob


1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Hi,
In general, leveled compactions are I/O heavy, so when there is a big bunch of
writes, do we need to stop leveled compactions at all?
I found nodetool stop COMPACTION, which is stated to stop a compaction in
progress; does this work for any type of compaction? Also the documents state
that 'eventually cassandra restarts the compaction'; isn't there a way
to control when to start the compaction again manually?
If this is not applicable to leveled compactions in 1.2, then what can be
used for stopping/restarting those?



Thanks,
Rashmi


Re: questions related to the SSTable file

2013-09-19 Thread Robert Coli
On Tue, Sep 17, 2013 at 6:51 PM, java8964 java8964 java8...@hotmail.comwrote:

 I thought I was clearer, but your clarification confused me again.



 But there is no way we can be sure that these SSTable files will ONLY
 contain modified data. So the statement being quoted above is not exactly
 right. I agree that all the modified data in that period will be in the
 incremental sstable files, but a lot of other unmodified data will be in
 them too.


The incremental backup directory only includes SSTables recently flushed
from memtables. It does not include SSTables created as a result of
compaction.

Memtables, by definition, only contain modified or new data. Yes, there is
one new copy per replica and the ones processed after the first might
appear unmodified, which may be what you are talking about?
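
As a quick way to see this in practice (a hedged illustration; paths are the
defaults): with incremental_backups: true in cassandra.yaml, each flushed
sstable is hard-linked into the column family's backups directory, while
compaction output is not.

grep incremental_backups conf/cassandra.yaml
ls /var/lib/cassandra/data/MyKeyspace/MyCF/backups/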

=Rob


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Rob,
Do you suggest I should try with some other installation mechanism? Are
there any known problems with the tar installation of cassandra 1.2.9 that
I should be aware of? Please do let me know.
Thanks,
Suruchi


On Thu, Sep 19, 2013 at 1:04 PM, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Hi Robert,
 I downloaded apache-cassandra-1.2.9.tar.gz from
 http://cassandra.apache.org/download/ (
 http://apache.mirrors.tds.net/cassandra/1.2.9/apache-cassandra-1.2.9-bin.tar.gz)
  and installed it on the individual nodes of the cassandra cluster.
 Thanks,
 Suruchi


 On Thu, Sep 19, 2013 at 12:35 PM, Robert Coli rc...@eventbrite.comwrote:

 On Thu, Sep 19, 2013 at 7:03 AM, Richard Low rich...@wentnet.com wrote:

 I think what has happened is that Cassandra was started with num_tokens
 = 1, then shutdown and num_tokens set to 256.  When this happens, the first
 time Cassandra chooses a single random token.  Then when restarted it
 splits the token into 256 adjacent ranges.


 Suruchi,

  By which mechanism did you install Cassandra? I ask out of concern that
  there may be an issue in some packaging leading to the above sequence
  of events.

 =Rob





Re: What are the steps to go from SimpleSnitch to GossipingPropertyFileSnitch in a live cluster?

2013-09-19 Thread Juan Manuel Formoso
Just FYI, I did it with a rolling restart and everything worked great.
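
For reference, the key bit was keeping conf/cassandra-rackdc.properties
consistent with what SimpleSnitch had been reporting (a minimal sketch; the
rack name is the 1.2 default and may differ in your setup):

$ cat conf/cassandra-rackdc.properties
dc=datacenter1
rack=rack1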


On Wed, Sep 18, 2013 at 5:01 PM, Juan Manuel Formoso jform...@gmail.comwrote:

 Besides making sure the datacenter name is the same in the
 cassandra-rackdc.properties file and the one originally created (
 datacenter1), what else do I have to take into account?

 Can I do a rolling restart or should I kill the entire cluster and then
 startup one at a time?

 --
 Juan Manuel Formoso
 Senior Geek
 http://twitter.com/juanformoso
 http://seniorgeek.com.ar
 LLAP




-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism? Are
 there any known problems with the tar installation of cassandra 1.2.9 that
 I should be aware of?


I was asking in the context of this JIRA :

https://issues.apache.org/jira/browse/CASSANDRA-2356

Which does not seem to apply in your case!

=Rob


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread sankalp kohli
You cannot start leveled compaction manually. It runs based on the data in each level.


On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall n...@thelastpickle.com wrote:

 As opposed to stopping compaction altogether, have you experimented with
 turning down compaction_throughput_mb_per_sec (16mb default) and/or
 explicitly setting concurrent_compactors (defaults to the number of cores,
 iirc).


 On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
 rashmi.aros...@gmail.comwrote:

 Hi,
 In general leveled compaction are I/O heavy so when there are bunch of
 writes do we need to stop leveled compactions at all?
 I found the nodetool stop COMPACTION, which states it stops compaction
 happening, does this work for any type of compaction? Also it states in
 documents 'eventually cassandra restarts the compaction', isn't there a way
 to control when to start the compaction again manually ?
 If this is not applicable for leveled compactions in 1.2, then what can
 be used for stopping/restating those?



 Thanks,
 Rashmi





Re: Rebalancing vnodes cluster

2013-09-19 Thread Robert Coli
On Wed, Sep 18, 2013 at 4:26 PM, Nimi Wariboko Jr
nimiwaribo...@gmail.comwrote:

 When I started with cassandra I had originally set it up to use tokens. I
 then migrated to vnodes (using shuffle), but my cluster isn't balanced (
 http://imgur.com/73eNhJ3).


Are you saying that (other than the imbalance that is the subject of this
thread) you were able to use shuffle successfully on a cluster with
~150GB per node?

1) How long did it take?
2) Did you experience any difficulties while doing so?
3) Have you run cleanup yet?
4) What version of Cassandra?

=Rob


Re: AssertionError: sstableloader

2013-09-19 Thread Yuki Morishita
Sounds like a bug.
Would you mind filing a JIRA at https://issues.apache.org/jira/browse/CASSANDRA?

Thanks,

On Thu, Sep 19, 2013 at 2:12 PM, Vivek Mishra mishra.v...@gmail.com wrote:
 Hi,
 I am trying to use sstableloader to load some external data and getting
 given below error:
 Established connection to initial hosts
 Opening sstables and calculating sections to stream
 Streaming relevant part of
 /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to
 [/127.0.0.1]
 progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
 0MB/s)]Exception in thread STREAM-OUT-/127.0.0.1 java.lang.AssertionError:
 Reference counter -1 for
 /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
 at
 org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
 at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
 at
 org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
 at
 org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
 at
 org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
 at
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
 at
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
 at java.lang.Thread.run(Thread.java:722)


 Any pointers?

 -Vivek



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
The only thing you need to guarantee is that Cassandra doesn't start with
num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
data before starting it with higher num_tokens.


On 19 September 2013 19:07, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism? Are
 there any known problems with the tar installation of cassandra 1.2.9 that
 I should be aware of?


 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob



Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Thanks for your replies. I wiped out my data from the cluster and also
cleared the commitlog before restarting it with num_tokens=256. I then
uploaded data using sstableloader.

However, I am still not able to see a uniform distribution of data across
nodes of the clusters.

The output of the bin/nodetool -h localhost status command looks as
follows. Some nodes have data as low as 1.12 MB while some have as high as
912.57 MB.

Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

Is there something else that I should be doing differently?

Thanks for your help!

Suruchi



On Thu, Sep 19, 2013 at 3:20 PM, Richard Low rich...@wentnet.com wrote:

 The only thing you need to guarantee is that Cassandra doesn't start with
 num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
 data before starting it with higher num_tokens.


 On 19 September 2013 19:07, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism? Are
 there any known problems with the tar installation of cassandra 1.2.9 that
 I should be aware of?


 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob





AssertionError: sstableloader

2013-09-19 Thread Vivek Mishra
Hi,
I am trying to use sstableloader to load some external data and getting
given below error:
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of
/home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to [/
127.0.0.1]
progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
0MB/s)]Exception in thread STREAM-OUT-/127.0.0.1
java.lang.AssertionError: Reference counter -1 for
/home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
at
org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
at
org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
at
org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
at java.lang.Thread.run(Thread.java:722)


Any pointers?

-Vivek


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Thanks for responses.
Nate - I haven't tried changing compaction_throughput_mb_per_sec. In my
cassandra.yaml I had set it to 32 to begin with. Do you think 32 could be too
much if Cassandra only gets writes once in a while, but when it does, they
come in one big chunk together?


On Thu, Sep 19, 2013 at 12:33 PM, sankalp kohli kohlisank...@gmail.comwrote:

 You cannot start level compaction. It will run based on data in each
 level.


 On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall n...@thelastpickle.comwrote:

 As opposed to stopping compaction altogether, have you experimented with
 turning down compaction_throughput_mb_per_sec (16mb default) and/or
 explicitly setting concurrent_compactors (defaults to the number of cores,
 iirc).


 On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
 rashmi.aros...@gmail.comwrote:

 Hi,
 In general leveled compaction are I/O heavy so when there are bunch of
 writes do we need to stop leveled compactions at all?
 I found the nodetool stop COMPACTION, which states it stops compaction
 happening, does this work for any type of compaction? Also it states in
 documents 'eventually cassandra restarts the compaction', isn't there a way
 to control when to start the compaction again manually ?
 If this is not applicable for leveled compactions in 1.2, then what can
 be used for stopping/restating those?



 Thanks,
 Rashmi






Re: how can i get the column value? Need help!.. cassandra 1.28 and pig 0.11.1

2013-09-19 Thread Cyril Scetbon
Hi,

Did you try to build 1.2.10 and use it for your tests? I've got the same
issue and will give it a try as soon as it's released (expected at the end of
the week).

Regards
-- 
Cyril SCETBON

On Sep 2, 2013, at 3:09 PM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 hi all:
 
 More info :
 
 https://issues.apache.org/jira/browse/CASSANDRA-5941
 
 
 
 I tried this (and generated cassandra 1.2.9) but it did not work for me:
 
 git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
 cd cassandra
 git checkout cassandra-1.2
 patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
 ant
 
 
 
 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com
 
 
 
 2013/9/2 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com
 hi:
 
 I tested this with the new cassandra 1.2.9 version and the issue still persists.
 
 :-(
 
 
 
 
 
 
 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com
 
 
 
 2013/8/30 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com
 I try this:
 
 rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
 dump rows;
 ILLUSTRATE rows;
 describe rows;
 
 values2= FOREACH rows GENERATE  TOTUPLE (id) as (mycolumn:tuple(name,value));
 dump values2;
 describe values2;
 
 But I get this results:
 
 
 
 --------------------------------------------------------------
 | rows | id:chararray   | age:int   | title:chararray   |
 --------------------------------------------------------------
 |      | (id, 6)        | (age, 30) | (title, QA)       |
 --------------------------------------------------------------
 
 rows: {id: chararray,age: int,title: chararray}
 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1031: Incompatable field schema: left is 
 tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is 
 org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)
 
 
 
 
 
 or 
 
 
 
 
 
 values2= FOREACH rows GENERATE  TOTUPLE (id) ;
 dump values2;
 describe values2;
 
 
 
 and  the results are:
 
 
 ...
 (((id,6)))
 (((id,5)))
 values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
 
 
 
 Aggg!
 
 
 
 
 
 
 
 
 
 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com
 
 
 
 2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com
 hi:
 
 I cannot understand why the schema is defined like
 id:chararray,age:int,title:chararray and not as tuples or bags of
 tuples, given that we have key-value pair columns.
 
 
 I tried again to change the schema, but it does not work.
 
 any ideas ...
 
 Perhaps the issue is in the definition of the CQL3 tables?
 
 regards
 
 
 2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com
 hi all:
 
 
 Regards
 
 I still cannot resolve this issue.
 
 Does anybody have this issue, or has anyone tried to test this simple example?
 
 
 I am stumped; I cannot find a working solution.
 
 I appreciate any comment or help
 
 
 2013/8/22 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com
 hi all:
 
 
 
 
 I'm testing the new CqlStorage() with cassandra 1.2.8 and pig 0.11.1.
 
 
 I am using this sample data test:
 
  
 http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html
 
 And I load and dump data Righ with this script:
 
 rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
 
 dump rows;
 describe rows;
 
 resutls:
 
 ((id,6),(age,30),(title,QA))
 ((id,5),(age,30),(title,QA))
 rows: {id: chararray,age: int,title: chararray}
 
 
 But I cannot get the column values.
 
 I tried to define other schemas in LOAD like I used with CassandraStorage():
 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html
 
 
 example:
 
 rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
 CqlStorage() AS (columns: bag {T: tuple(name, value)});
 
 
 and I get this error:
 
 2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1031: Incompatable schema: left is 
 columns:bag{T:tuple(name:bytearray,value:bytearray)}, right is 
 id:chararray,age:int,title:chararray
 
 
 
 I tried to use FLATTEN, SUBSTRING and SPLIT UDFs, but I have not gotten good results:
 
 Example:
 
 When I flatten, I get a set of tuples like:
 (title,QA)
 (title,QA)
 2013-08-22 12:42:20,673 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 A: {title: chararray}
 
 
 but I cannot get just the value QA.
 
 SUBSTRING only works on the name title, not the value:
 
 
 
 example:
 
 B = FOREACH A GENERATE SUBSTRING(title,2,5);
 
 dump B;
 describe B;
 
 
 results:
 
 (tle)
 (tle)
 B: {chararray}
 
 
 
 I tried this like Eric Lee in the other mail and got the same results:
 
 
  Anyways, what I really want is the column value, not the

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Yes, the key distribution does vary across the nodes. For example, on the
node with the highest data, Number of Keys (estimate) is 6527744 for a
particular column family, whereas for the same column family on the node
with least data, Number of Keys (estimate) = 3840.

Is there a way to control this distribution by setting some parameter of
Cassandra?

I am using the Murmur3 partitioner with NetworkTopologyStrategy.

Thanks,
Suruchi



On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Can you check cfstats to see number of keys per node?


 On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data across
 nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
 UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
 UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
 UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
 UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
 UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
 UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
 UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
 UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
 UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
 UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
 UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
 UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
 UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
 UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
 UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
 UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
 UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
 UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

 Is there something else that I should be doing differently?

 Thanks for your help!

 Suruchi



 On Thu, Sep 19, 2013 at 3:20 PM, Richard Low rich...@wentnet.com wrote:

 The only thing you need to guarantee is that Cassandra doesn't start
 with num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all
 the data before starting it with higher num_tokens.


 On 19 September 2013 19:07, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism?
 Are there any known problems with the tar installation of cassandra 1.2.9
 that I should be aware of?


 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob







Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Can you check cfstats to see number of keys per node?
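
For example (a hedged one-liner; run it against each node and compare):

nodetool -h <node> cfstats | grep -E 'Column Family:|Number of Keys'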

On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data across
 nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
 UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
 UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
 UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
 UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
 UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
 UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
 UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
 UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
 UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
 UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
 UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
 UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
 UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
 UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
 UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
 UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
 UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
 UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

 Is there something else that I should be doing differently?

 Thanks for your help!

 Suruchi



 On Thu, Sep 19, 2013 at 3:20 PM, Richard Low rich...@wentnet.com wrote:

 The only thing you need to guarantee is that Cassandra doesn't start with
 num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
 data before starting it with higher num_tokens.


 On 19 September 2013 19:07, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism? Are
 there any known problems with the tar installation of cassandra 1.2.9 that
 I should be aware of?


 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob






Storing binary blobs data in Cassandra Column family?

2013-09-19 Thread Raihan Jamal
I need to store binary byte data in a Cassandra column family in all my
columns. Each column will have its own binary byte data. Below is the code
where I get the binary byte data. My rowKey is going to be a String, but all
my columns have to store binary blob data.

import java.io.ByteArrayOutputStream;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

// serialize the Avro record into a byte[] suitable for a column value
GenericDatumWriter<GenericRecord> writer =
    new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(os, null);
writer.write(record, e);
e.flush();
byte[] byteData = os.toByteArray();
os.close();
// write byteData to Cassandra for the columns


I am not sure what the right way is to create the Cassandra column family
for the above use case. Below is the column family I have created, but I am
not sure it is right for this use case:

create column family TESTING
with key_validation_class = 'UTF8Type'
and comparator = 'BytesType'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400
and column_metadata = [ {column_name : 'lmd', validation_class :
DateType}];




Raihan Jamal


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread Juan Manuel Formoso
concurrent_compactors is ignored when using leveled compactions


On Thu, Sep 19, 2013 at 1:19 PM, Nate McCall n...@thelastpickle.com wrote:

 As opposed to stopping compaction altogether, have you experimented with
 turning down compaction_throughput_mb_per_sec (16mb default) and/or
 explicitly setting concurrent_compactors (defaults to the number of cores,
 iirc).


 On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
 rashmi.aros...@gmail.comwrote:

 Hi,
 In general leveled compaction are I/O heavy so when there are bunch of
 writes do we need to stop leveled compactions at all?
 I found the nodetool stop COMPACTION, which states it stops compaction
 happening, does this work for any type of compaction? Also it states in
 documents 'eventually cassandra restarts the compaction', isn't there a way
 to control when to start the compaction again manually ?
 If this is not applicable for leveled compactions in 1.2, then what can
 be used for stopping/restating those?



 Thanks,
 Rashmi





-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
On 19 September 2013 20:36, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data across
 nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.


Now the 'Owns (effective)' column is showing the tokens are roughly
balanced.  So now the problem is the data isn't uniform - either you have
some rows much larger than others or some nodes are missing data that could
be replicated by running repair.
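
For example, on each node in turn (a hedged example; the keyspace name is a
placeholder):

nodetool -h <node> repair -pr MyKeyspace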

Richard.


Re: AssertionError: sstableloader

2013-09-19 Thread Vivek Mishra
More to add on this:

This is happening for column families created via CQL3 with collection type
columns and without WITH COMPACT STORAGE.


On Fri, Sep 20, 2013 at 12:51 AM, Yuki Morishita mor.y...@gmail.com wrote:

 Sounds like a bug.
 Would you mind filing JIRA at
 https://issues.apache.org/jira/browse/CASSANDRA?

 Thanks,

 On Thu, Sep 19, 2013 at 2:12 PM, Vivek Mishra mishra.v...@gmail.com
 wrote:
  Hi,
  I am trying to use sstableloader to load some external data and getting
  given below error:
  Established connection to initial hosts
  Opening sstables and calculating sections to stream
  Streaming relevant part of
  /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to
  [/127.0.0.1]
  progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
  0MB/s)]Exception in thread STREAM-OUT-/127.0.0.1
 java.lang.AssertionError:
  Reference counter -1 for
  /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
  at
 
 org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
  at
 org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
  at
 
 org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
  at
 
 org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
  at
 
 org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
  at
 
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
  at
 
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
  at java.lang.Thread.run(Thread.java:722)
 
 
  Any pointers?
 
  -Vivek



 --
 Yuki Morishita
  t:yukim (http://twitter.com/yukim)



Re: Decomissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Not forever, just while I decommission the nodes, I assume. What I don't
understand is the wording "no longer reference".


On Thu, Sep 19, 2013 at 6:17 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 1:52 PM, Juan Manuel Formoso 
 jform...@gmail.comwrote:


 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html

  When it says "Change all keyspaces so they no longer reference the data
  center being removed.", does that mean setting my replication_strategy so
  that datacenter1:0, datacenter2:N? (assuming I'm removing datacenter1)


 I would presume it means remove datacenter1 entirely, not set it to 0
 forever.

 =Rob





-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Decomissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 1:52 PM, Juan Manuel Formoso jform...@gmail.comwrote:


 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html

 When it says "Change all keyspaces so they no longer reference the data
 center being removed.", does that mean setting my replication_strategy so
 that datacenter1:0, datacenter2:N? (assuming I'm removing datacenter1)


I would presume it means remove datacenter1 entirely, not set it to 0
forever.
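
In CQL3 terms, something like this (a hedged sketch; the keyspace name and
replication factor are illustrative):

cqlsh <<'EOF'
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter2': 3};
EOF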

=Rob


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Can you run nodetool repair on all the nodes first and look at the keys?

On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar 
suruchi.deod...@generalsentiment.com wrote:

 Yes, the key distribution does vary across the nodes. For example, on the
 node with the highest data, Number of Keys (estimate) is 6527744 for a
 particular column family, whereas for the same column family on the node
 with least data, Number of Keys (estimate) = 3840.

 Is there a way to control this distribution by setting some parameter of
 cassandra.

 I am using the Murmur3 partitioner with NetworkTopologyStrategy.

 Thanks,
 Suruchi



 On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Can you check cfstats to see number of keys per node?


 On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data
 across nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
 UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
 UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
 UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
 UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
 UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
 UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
 UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
 UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
 UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
 UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
 UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
 UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
 UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
 UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
 UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
 UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
 UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
 UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

 Is there something else that I should be doing differently?

 Thanks for your help!

 Suruchi



 On Thu, Sep 19, 2013 at 3:20 PM, Richard Low rich...@wentnet.comwrote:

 The only thing you need to guarantee is that Cassandra doesn't start
 with num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all
 the data before starting it with higher num_tokens.


 On 19 September 2013 19:07, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Do you suggest I should try with some other installation mechanism?
 Are there any known problems with the tar installation of cassandra 1.2.9
 that I should be aware of?


 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob








Decomissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Quick question.
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html

When it says "Change all keyspaces so they no longer reference the data
center being removed.", does that mean setting my replication_strategy so
that datacenter1:0, datacenter2:N? (assuming I'm removing datacenter1)

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
I hit this issue again today, and it looks like changing the -Xss option does
not work :(
I am on 1.0.11 (I know it's old, we are upgrading to 1.2.9 right now) and
have about 800-900GB of data. I can see Cassandra is spending a lot of time
reading the data files before it quits with the "java.lang.OutOfMemoryError:
unable to create new native thread" error.

My hard and soft limits seem to be OK as well.
Datastax recommends [1]:

* soft nofile 32768
* hard nofile 32768

and I have:

hard nofile 65536
soft nofile 65536

My ulimit -u output is 515038 (which again should be sufficient).

complete output

ulimit -a
core file size          (blocks, -c)     0
data seg size           (kbytes, -d)     unlimited
scheduling priority             (-e)     0
file size               (blocks, -f)     unlimited
pending signals                 (-i)     515038
max locked memory       (kbytes, -l)     32
max memory size         (kbytes, -m)     unlimited
open files                      (-n)     1024
pipe size            (512 bytes, -p)     8
POSIX message queues     (bytes, -q)     819200
real-time priority              (-r)     0
stack size              (kbytes, -s)     10240
cpu time               (seconds, -t)     unlimited
max user processes              (-u)     515038
virtual memory          (kbytes, -v)     unlimited
file locks                      (-x)     unlimited
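
One more thing worth checking is the limits of the running Cassandra process
itself, rather than those of your login shell (a hedged example; the pgrep
pattern is illustrative):

cat /proc/$(pgrep -f CassandraDaemon)/limits | grep -iE 'processes|open files'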




Has anyone run into this?

[1] http://www.datastax.com/docs/1.1/troubleshooting/index

On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote:

 Thanks Viktor,


 - check (cassandra-env.sh) -Xss size, you may need to increase it for your
 JVM;

 This seems to have done the trick !

 Thanks !


 On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:

  For start:

 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
 it for your data load/bloom filter/index sizes.

Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer


 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* Tuesday, September 10, 2013 6:16 AM
 *To:* user@cassandra.apache.org
 *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
 create new native thread [heur]



 I have a 5 node cluster with a load of around 300GB each. A node went
 down and does not come up. I can see the following exception in the logs.

 ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
 139) Fatal exception in thread Thread[main,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)


 The *ulimit -u* output is
 *515042*

 Which is far more than what is recommended [1] (10240) and I am hesitant to
 set it to unlimited as recommended here [2].

 Any pointers as to what could be the issue and how to get the node up?




 [1]
 

Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
Was too fast on the send button, sorry.
The thing I wanted to add was this line:

pending signals (-i) 515038

That looks odd to me; could that be related?



On Thu, Sep 19, 2013 at 4:53 PM, srmore comom...@gmail.com wrote:


 I hit this issue again today and it looks like changing the -Xss option
 does not work :(
 I am on 1.0.11 (I know it's old, we are upgrading to 1.2.9 right now) and
 have about 800-900GB of data. I can see Cassandra is spending a lot of time
 reading the data files before it quits with the java.lang.OutOfMemoryError:
 unable to create new native thread error.

 My hard and soft limits seem to be OK as well.
 Datastax recommends [1]

 * soft nofile 32768
 * hard nofile 32768


 and I have
 hard  nofile  65536
 soft  nofile  65536

 My ulimit -u output is 515038 (which again should be sufficient)

 complete output

 ulimit -a
 core file size          (blocks, -c) 0
 data seg size           (kbytes, -d) unlimited
 scheduling priority             (-e) 0
 file size               (blocks, -f) unlimited
 pending signals                 (-i) 515038
 max locked memory       (kbytes, -l) 32
 max memory size         (kbytes, -m) unlimited
 open files                      (-n) 1024
 pipe size            (512 bytes, -p) 8
 POSIX message queues     (bytes, -q) 819200
 real-time priority              (-r) 0
 stack size              (kbytes, -s) 10240
 cpu time               (seconds, -t) unlimited
 max user processes              (-u) 515038
 virtual memory          (kbytes, -v) unlimited
 file locks              (-x) unlimited




 Has anyone run into this?

 [1] http://www.datastax.com/docs/1.1/troubleshooting/index

 On Wed, Sep 11, 2013 at 8:47 AM, srmore comom...@gmail.com wrote:

 Thanks Viktor,


 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 This seems to have done the trick !

 Thanks !


 On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:

  For start:

 - check (cassandra-env.sh) -Xss size, you may need to increase it for
 your JVM;

 - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
 it for your data load/bloom filter/index sizes.

Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer


 *From:* srmore [mailto:comom...@gmail.com]
 *Sent:* Tuesday, September 10, 2013 6:16 AM
 *To:* user@cassandra.apache.org
 *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
 create new native thread [heur]



 I have a 5 node cluster with a load of around 300GB each. A node went
 down and does not come up. I can see the following exception in the logs.

 ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
 139) Fatal exception in thread Thread[main,5,main]
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:640)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.<init>(JMXConfigurableThreadPoolExecutor.java:34)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:42)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
 at
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)

 The *ulimit -u* output is
 *515042*

 Which is far more than what is recommended [1] (10240).

Re: Decomissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 2:43 PM, Juan Manuel Formoso jform...@gmail.comwrote:

 Not forever, while I decommission the nodes I assume. What I don't
 understand is the wording "no longer reference".


Why does your replication strategy need to be aware of nodes which receive
zero replicas?

"No longer reference" almost certainly means just removing any reference to
that DC from the configuration of the replication strategy.
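
For example, assuming CQL3, a keyspace named my_keyspace, and RF 3 in the
surviving DC (all placeholders), the removed DC simply isn't listed:

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter2': 3};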

=Rob


Re: I don't understand shuffle progress

2013-09-19 Thread Jeremiah D Jordan
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configVnodesProduction_t.html

On Sep 18, 2013, at 9:41 AM, Chris Burroughs chris.burrou...@gmail.com wrote:

 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
 
 This is a basic outline.
 
 
 On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
 I really like this idea. I can create a new cluster and have it replicate
 the old one, after it finishes I can remove the original.
 
 Any good resource that explains how to add a new datacenter to a live
 single dc cluster that anybody can recommend?
 
 
 On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs
 chris.burrou...@gmail.comwrote:
 
 On 09/17/2013 09:41 PM, Paulo Motta wrote:
 
 So you're saying the only feasible way of enabling VNodes on an upgraded
 C*
 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
 from the old cluster? Or is it possible to succeed on shuffling, even if
 that means waiting some weeks for the shuffle to complete?
 
 
 In a multi DC cluster situation you *should* be able to bring up a new
 DC with vnodes, bootstrap it, and then decommission the old cluster.
 
 
 
 
 



Re: Decomissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 3:03 PM, Juan Manuel Formoso jform...@gmail.comwrote:

 Oh, so just datacenter2:N then.


Yes.


 Sorry, not a native English speaker, and also tired :)


NP! :D

=Rob


Re: Decomissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Oh, so just datacenter2:N then.
Sorry, not a native English speaker, and also tired :)


On Thu, Sep 19, 2013 at 6:57 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 19, 2013 at 2:43 PM, Juan Manuel Formoso 
 jform...@gmail.comwrote:

 Not forever, while I decommission the nodes I assume. What I don't
 understand is the wording "no longer reference".


 Why does your replication strategy need to be aware of nodes which receive
 zero replicas?

 "No longer reference" almost certainly means just removing any reference
 to that DC from the configuration of the replication strategy.

 =Rob





-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


NetworkTopologyStrategy Error

2013-09-19 Thread Ashley Martens
I tried to split my cluster and ran into this error, which I did not see in
the tests I performed.

ERROR [pool-1-thread-52165] 2013-09-19 21:48:08,262 Cassandra.java (line
3250) Internal error processing describe_ring
java.lang.IllegalStateException: datacenter (DC103) has no more endpoints,
(3) replicas still needed
at
org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:118)
at
org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:101)
at
org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:604)
at
org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:579)
at
org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:553)
at
org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:584)
at
org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.process(Cassandra.java:3246)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


-- 

Ashley  OpenPGP -- KeyID: 0x5B0D6ABB
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x23E861255B0D6ABB


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Jayadev Jayaraman
We ran nodetool repair on all nodes for all keyspaces / CFs, restarted
Cassandra, and this is what we get for nodetool status:

bin/nodetool -h localhost status
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns (effective)  Host ID
Rack
UN  10.238.133.174  885.36 MB  256 8.4%
 e41d8863-ce37-4d5c-a428-bfacea432a35  1a
UN  10.238.133.97   468.66 MB  256 7.7%
 1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
UN  10.151.86.146   1.08 GB    256     8.0%
 8952645d-4a27-4670-afb2-65061c205734  1a
UN  10.138.10.9 941.44 MB  256 8.6%
 25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
UN  10.87.87.240    99.69 MB   256     8.6%
 ea066827-83bc-458c-83e8-bd15b7fc783c  1b
UN  10.93.5.157 87.44 MB   256 7.6%
 4ab9111c-39b4-4d15-9401-359d9d853c16  1b
UN  10.238.137.250  561.42 MB  256 7.8%
 84301648-afff-4f06-aa0b-4be421e0d08f  1a
UN  10.92.231.170   893.75 MB  256 9.3%
 a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
UN  10.138.2.20 31.89 MB   256 7.9%
 a6d4672a-0915-4c64-ba47-9f190abbf951  1a
UN  10.93.31.44 312.52 MB  256 7.8%
 67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
UN  10.93.91.139    30.46 MB   256     8.1%
 682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
UN  10.236.138.169  260.15 MB  256 9.1%
 cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
UN  10.137.7.90 38.45 MB   256 7.4%
 17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
UN  10.93.77.166867.15 MB  256 8.8%
 9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
UN  10.120.249.140  863.98 MB  256 9.4%
 e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
UN  10.90.246.128   242.63 MB  256 8.4%
 054911ec-969d-43d9-aea1-db445706e4d2  1b
UN  10.123.95.248   171.51 MB  256 7.2%
 a17deca1-9644-4520-9e62-ac66fc6fef60  1b
UN  10.136.11.40    33.8 MB    256     8.5%
 66be1173-b822-40b5-b650-cb38ae3c7a51  1a
UN  10.87.90.42 38.01 MB   256 8.0%
 dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
UN  10.87.75.147    579.29 MB  256     8.3%
 ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
UN  10.151.49.88    151.06 MB  256     8.9%
 57043573-ab1b-4e3c-8044-58376f7ce08f  1a
UN  10.87.83.107    512.91 MB  256     8.3%
 0019439b-9f8a-4965-91b8-7108bbb55593  1b
UN  10.238.170.159  85.04 MB   256 9.4%
 32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
UN  10.137.20.183   167.41 MB  256 8.4%
 15951592-8ab2-473d-920a-da6e9d99507d  1a

It doesn't seem to have changed by much. The loads are still highly uneven.

As for the number of keys in each node's CFs: the largest node now
has 5589120 keys for the column family that had 6527744 keys before (load
is now 1.08 GB as compared to 1.05 GB before), while the smallest node now
has 71808 keys as compared to 3840 keys before (load is now 31.89 MB as
compared to 1.12 MB before).
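
A sketch of how such per-node estimates can be pulled together (the host
list and CF name here are placeholders):

for h in 10.151.86.146 10.138.2.20; do
  echo -n "$h  "
  nodetool -h "$h" cfstats | grep -A 20 'Column Family: MyCF' | grep 'Number of Keys'
done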


On Thu, Sep 19, 2013 at 5:18 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Can you run nodetool repair on all the nodes first and look at the keys?


 On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Yes, the key distribution does vary across the nodes. For example, on the
 node with the highest data, Number of Keys (estimate) is 6527744 for a
 particular column family, whereas for the same column family on the node
 with least data, Number of Keys (estimate) = 3840.

 Is there a way to control this distribution by setting some parameter of
 Cassandra?

 I am using the Murmur3 partitioner with NetworkTopologyStrategy.

 Thanks,
 Suruchi



 On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Can you check cfstats to see number of keys per node?


 On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data
 across nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host
 ID   Rack
 UN  10.238.133.174  856.66 MB  256 8.4%
 e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256 7.7%
 1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.05 GB    256     8.0%
 8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9 912.57 MB  256 8.6%
 25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    70.85 MB   256     8.6%
 ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157 60.56 MB   256 7.6%
 4ab9111c-39b4-4d15-9401-359d9d853c16  1b
 UN  10.92.231.170   866.73 MB  256 9.3%
 a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
 UN  

Re: Rebalancing vnodes cluster

2013-09-19 Thread Nimi Wariboko Jr
We had originally started with 3 nodes w/ 32GB RAM and 768GB SSDs. I pretty 
much Google'd my way into setting up Cassandra and set it up using tokens 
because I was following an older docco. We were using Cassandra 1.2.5; I 
learned about vnodes later on and regretted waking up that morning.

1.) I'm not sure if the shuffle was successful. We started shuffling on Jun 7th and 
killed it on the 17th. We let it run over 2 weekends (10 days) and the shuffle 
tool didn't report any meaningful progress. I explained this over IRC 
and was told `node shuffle` takes a really long time and you shouldn't use it. 
At the time our ring looked mostly balanced so we just killed it. We were 
migrating from a MongoDB cluster and didn't want to pay for 2 clusters.
2.) During the shuffle we had upped our RF to 2, did not do a repair, and lost 
1/3rd of our data. Fortunately we could just use the sstableloader tool to reload 
the data, as it wasn't really deleted.
3.) We ran cleanup a couple days later
4.) Cassandra 1.2.5

After all this, we converted another mongo node we had into Cassandra (same 
specs) for a cluster of size 4. Now after 4 months, one node (the subject of 
this thread) is growing faster than the others (which is leading to hot 
spotting as well). I guess this has to do with the unfinished shuffle? Are 
there any remedies for this? 

On Thursday, September 19, 2013 at 9:50 AM, Robert Coli wrote:

 On Wed, Sep 18, 2013 at 4:26 PM, Nimi Wariboko Jr nimiwaribo...@gmail.com 
 (mailto:nimiwaribo...@gmail.com) wrote:
  When I started with cassandra I had originally set it up to use tokens. I
  then migrated to vnodes (using shuffle), but my cluster isn't balanced 
  (http://imgur.com/73eNhJ3). 
 
 Are you saying that (other than the imbalance that is the subject of this 
 thread) you were able to use shuffle successfully on a cluster with ~150gb 
 per node?
 
 1) How long did it take?
 2) Did you experience any difficulties while doing so?
 3) Have you run cleanup yet?
 4) What version of Cassandra?
 
 =Rob
  
 
 
 
 
 




Re: NetworkTopologyStrategy Error

2013-09-19 Thread sankalp kohli
Do any of your keyspaces still reference this DC?
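
A quick way to check is to look at each keyspace's strategy_options and, if
DC103 shows up, drop it (cassandra-cli syntax; the keyspace name and the
remaining DC/RF below are placeholders):

show keyspaces;
update keyspace MyKS
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC104:3};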


On Thu, Sep 19, 2013 at 3:03 PM, Ashley Martens ashley.mart...@dena.comwrote:

 I tried to split my cluster and ran into this error, which I did not see
 in the tests I performed.

 ERROR [pool-1-thread-52165] 2013-09-19 21:48:08,262 Cassandra.java (line
 3250) Internal error processing describe_ring
 java.lang.IllegalStateException: datacenter (DC103) has no more endpoints,
 (3) replicas still needed
 at
 org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:118)
  at
 org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:101)
 at
 org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:604)
  at
 org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:579)
 at
 org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:553)
  at
 org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:584)
 at
 org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.process(Cassandra.java:3246)
  at
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)


 --

 Ashley  OpenPGP -- KeyID: 0x5B0D6ABB 
 http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x23E861255B0D6ABB




Re: Cassandra column family using Composite Columns

2013-09-19 Thread Raihan Jamal
Can anyone help me with this?

Any help will be appreciated. Thanks.





*Raihan Jamal*


On Tue, Sep 17, 2013 at 4:44 PM, Raihan Jamal jamalrai...@gmail.com wrote:

  I am designing the Column Family for our use case in Cassandra. I am
 planning to go with Dynamic Column Structure.

 Below is my requirement per our use case-

 user-id   column1
 123       (Column1-Value  Column1-SchemaName  LMD)

  For each user-id, we will be storing column1 and its value and that value
 will store these three things always-

 (Column1-Value   Column1-SchemaName LMD)

 In my above example, I have shown only one column, but a row might have more
 columns, and those columns will also follow the same concept.

 Now I am not sure how to store these three things at the column-value
 level. Should I use composite columns? If yes, I am not sure how to define
 a column family like this in Cassandra.

 Column1-value will be in binary, Column1-SchemaName will be String, LMD will 
 be DateType.

  This is what I have so far-

 create column family USER_DATA
 with key_validation_class = 'UTF8Type'
 and comparator = 'UTF8Type'
 and default_validation_class = 'UTF8Type'
 and gc_grace = 86400
 and column_metadata = [ {column_name : 'lmd', validation_class : DateType}];

  Can anyone help me in designing the column family for this? Thanks.
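
For reference, one possible CQL3 shape for the same requirement (a sketch
only; the table name and exact types are assumptions, untested):

CREATE TABLE user_data (
  user_id     text,
  column_name text,
  value       blob,       -- Column1-Value (binary)
  schema_name text,       -- Column1-SchemaName
  lmd         timestamp,  -- last-modified date
  PRIMARY KEY (user_id, column_name)
);

Each (user_id, column_name) pair acts as one dynamic column, and the three
parts live as regular columns on that row instead of inside a composite
value.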



Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Another thing I noticed is that you are using multiple racks, and that might
be a contributing factor. However, I am not sure.

Can you paste the output of nodetool cfstats and ring? Is it possible to
run the same test but keeping all the nodes in one rack?

I think you should open a JIRA if you are able to reproduce this.

On Thu, Sep 19, 2013 at 4:41 PM, Jayadev Jayaraman jdisal...@gmail.comwrote:

 We ran nodetool repair on all nodes for all Keyspaces / CFs, restarted
 cassandra and this is what we get for nodetool status :

 bin/nodetool -h localhost status
 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host ID
 Rack
 UN  10.238.133.174  885.36 MB  256 8.4%
  e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   468.66 MB  256 7.7%
  1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.08 GB    256     8.0%
  8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9 941.44 MB  256 8.6%
  25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    99.69 MB   256     8.6%
  ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157 87.44 MB   256 7.6%
  4ab9111c-39b4-4d15-9401-359d9d853c16  1b
 UN  10.238.137.250  561.42 MB  256 7.8%
  84301648-afff-4f06-aa0b-4be421e0d08f  1a
 UN  10.92.231.170   893.75 MB  256 9.3%
  a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
 UN  10.138.2.20 31.89 MB   256 7.9%
  a6d4672a-0915-4c64-ba47-9f190abbf951  1a
 UN  10.93.31.44 312.52 MB  256 7.8%
  67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
 UN  10.93.91.139    30.46 MB   256     8.1%
  682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
 UN  10.236.138.169  260.15 MB  256 9.1%
  cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
 UN  10.137.7.90 38.45 MB   256 7.4%
  17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
 UN  10.93.77.166867.15 MB  256 8.8%
  9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
 UN  10.120.249.140  863.98 MB  256 9.4%
  e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
 UN  10.90.246.128   242.63 MB  256 8.4%
  054911ec-969d-43d9-aea1-db445706e4d2  1b
 UN  10.123.95.248   171.51 MB  256 7.2%
  a17deca1-9644-4520-9e62-ac66fc6fef60  1b
 UN  10.136.11.40    33.8 MB    256     8.5%
  66be1173-b822-40b5-b650-cb38ae3c7a51  1a
 UN  10.87.90.42 38.01 MB   256 8.0%
  dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
 UN  10.87.75.147    579.29 MB  256     8.3%
  ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
 UN  10.151.49.88    151.06 MB  256     8.9%
  57043573-ab1b-4e3c-8044-58376f7ce08f  1a
 UN  10.87.83.107    512.91 MB  256     8.3%
  0019439b-9f8a-4965-91b8-7108bbb55593  1b
 UN  10.238.170.159  85.04 MB   256 9.4%
  32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
 UN  10.137.20.183   167.41 MB  256 8.4%
  15951592-8ab2-473d-920a-da6e9d99507d  1a

 It doesn't seem to have changed by much. The loads are still highly
 uneven.

 As for the number of keys in each node's CFs: the largest node now
 has 5589120 keys for the column family that had 6527744 keys before (load
 is now 1.08 GB as compared to 1.05 GB before), while the smallest node now
 has 71808 keys as compared to 3840 keys before (load is now 31.89 MB as
 compared to 1.12 MB before).


 On Thu, Sep 19, 2013 at 5:18 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Can you run nodetool repair on all the nodes first and look at the keys?


 On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Yes, the key distribution does vary across the nodes. For example, on
 the node with the highest data, Number of Keys (estimate) is 6527744 for a
 particular column family, whereas for the same column family on the node
 with least data, Number of Keys (estimate) = 3840.

 Is there a way to control this distribution by setting some parameter of
 Cassandra?

 I am using the Murmur3 partitioner with NetworkTopologyStrategy.

 Thanks,
 Suruchi



 On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia 
 mohitanch...@gmail.comwrote:

 Can you check cfstats to see number of keys per node?


 On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar 
 suruchi.deod...@generalsentiment.com wrote:

 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data
 across nodes of the clusters.

 The output of the bin/nodetool -h localhost status commands looks like
 follows. Some nodes have data as low as 1.12MB while some have as high as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host
 ID   Rack
 UN  10.238.133.174  856.66 MB  256 8.4%
 e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256 7.7%
 

Re: I don't understand shuffle progress

2013-09-19 Thread Juan Manuel Formoso
Thanks. I did this and I finished rebuilding the new cluster in about 8
hours... a much better option than shuffle (you have to have the hardware to
duplicate your environment, though).
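
For anyone finding this thread later, the rough shape of the procedure from
the linked docs (a sketch; the keyspace name, DC names, and RFs are
placeholders):

# on each new-DC node, before first start:
#   cassandra.yaml:  num_tokens: 256, auto_bootstrap: false
#   snitch config:   place the nodes in a new DC, e.g. DC2
# then add DC2 to replication (cqlsh):
#   ALTER KEYSPACE my_ks WITH replication =
#     {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
# and stream the data, on every node in DC2:
nodetool rebuild DC1
# once clients only talk to DC2: drop DC1 from replication, then run
# nodetool decommission on each old node.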


On Thu, Sep 19, 2013 at 7:21 PM, Jeremiah D Jordan 
jeremiah.jor...@gmail.com wrote:


 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configVnodesProduction_t.html

 On Sep 18, 2013, at 9:41 AM, Chris Burroughs chris.burrou...@gmail.com
 wrote:

 
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
 
  This is a basic outline.
 
 
  On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
  I really like this idea. I can create a new cluster and have it
 replicate
  the old one, after it finishes I can remove the original.
 
  Any good resource that explains how to add a new datacenter to a live
  single dc cluster that anybody can recommend?
 
 
  On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs
  chris.burrou...@gmail.comwrote:
 
  On 09/17/2013 09:41 PM, Paulo Motta wrote:
 
  So you're saying the only feasible way of enabling VNodes on an
 upgraded
  C*
  1.2 is by doing fork writes to a brand new cluster + bulk load of
 sstables
  from the old cluster? Or is it possible to succeed on shuffling, even
 if
  that means waiting some weeks for the shuffle to complete?
 
 
  In a multi DC cluster situation you *should* be able to bring up a
 new
  DC with vnodes, bootstrap it, and then decommission the old cluster.
 
 
 
 
 




-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


BigTable-like Versioned Cells, Importing PostgreSQL Data

2013-09-19 Thread Keith Bogs
I've been playing with Cassandra and have a few questions that I've been
stuck on for a while, and Googling around didn't seem to help much:

1. What's the quickest way to import a bunch of data from PostgreSQL? I
have ~20M rows with mostly text (some long text with newlines, and blob
files.) I tried exporting to CSV but had issues with newlines and escaped
characters. I also tried writing an ETL tool in Go, but it was taking a
long time to go through the records.
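
For what it's worth, one path that may sidestep the newline problem is
quoted CSV on both ends (a sketch; the table and column names are made up,
and cqlsh's COPY is simple but not fast for 20M rows):

-- in psql: quoted CSV keeps embedded newlines inside fields
\copy (SELECT id, title, body FROM pages) TO 'pages.csv' WITH (FORMAT csv)

-- in cqlsh (Cassandra 1.2): load the same file
COPY myks.pages (id, title, body) FROM 'pages.csv';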

2. How would I create a versioned schema with CQL? AFAIK Cassandra's cell
versions are only for conflict resolution.

I envision a wide row, with timestamps and keys representing fields of data
through time. For example, for a CF of web page contents (inspired by
Google's Bigtable paper):

Key           1379649588:body  1379649522:body  1379649123:title
a.com/1.html  <html>...                         A
a.com/2.html  <html>...                         B
b.com/1.html  <html>...        <html>...        C

But CQL doesn't seem to support this. (Yes, I've read
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows.)
Once upon a time it seems Thrift and SuperColumns might have worked?

I'd want to efficiently iterate through the history of a particular row
(in other words, read all the columns for a row) or efficiently iterate
through all the latest values for the CF (not reading the entire row, just
a column slice). In the previous example, I'd want to return the latest
'body' entries with timestamps for every page (row/key) in the database.

Some have talked of having two CFs, one for versioned data and one for
current values?
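
For the versioned CF, one common CQL3 shape is something like this (a
sketch; the table and column names are placeholders):

CREATE TABLE page_history (
  url   text,
  field text,       -- 'body', 'title', ...
  ts    timestamp,
  value blob,
  PRIMARY KEY (url, field, ts)
) WITH CLUSTERING ORDER BY (field ASC, ts DESC);

-- full history of one page, newest first within each field:
SELECT field, ts, value FROM page_history WHERE url = 'a.com/1.html';

-- latest body of one page:
SELECT ts, value FROM page_history
 WHERE url = 'a.com/1.html' AND field = 'body' LIMIT 1;

Getting the latest values across *all* pages still means touching every
partition, which is exactly where that second, current-values CF comes in.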

I've been struggling because most of the documentation revolves around
Java. I'm most comfortable with Ruby and (increasingly) Go.

I'd appreciate any insights, would really like to get Cassandra going for
real. It's been such a pleasure to setup vs. HBase and whatnot.

Keith