Unable to run hadoop_cql3_word_count examples

2013-12-03 Thread Parth Patil
Hi,
I am new to Cassandra and I am exploring the Hadoop integration (MapReduce)
provided by Cassandra.

I am trying to run the Hadoop examples provided in Cassandra's repo
under examples/hadoop_cql3_word_count. I am using the cassandra-2.0 branch.
I have a single-node Cassandra running locally. I was able to run the
./bin/word_count_setup step successfully, but when I run the
./bin/word_count step I get the following error:

java.lang.RuntimeException
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:661)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:297)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:163)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:consistency level LOCAL_ONE not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy))
    at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52627)
    at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:52604)
    at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:52519)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1785)
    at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1770)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:631)
    ... 6 more

Has anyone seen this before? Am I missing something?

-- 
Best,
Parth


How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Hi,

Is there a way to monitor the progress of a hinted handoff task?

I found the following two mbeans providing some info:

org.apache.cassandra.internal:type=HintedHandoff, which tells me that there
is 1 active task, and
org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
which quite often gives a timeout when executed.

Ideally, I would like to see how many hints have been sent (e.g. over the
last minute or so), and how many hints are still to be sent (although I
assume that's what countPendingHints normally does?)

I'm experiencing hinted handoff tasks that are started, but never finish,
so I would like to know what the task is doing.

My log shows this:

INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java
(line 297) Started hinted handoff for host:
6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
(nothing more for [HintedHandoff:1])

The node is up and running, the network connection is ok, no gossip
messages appear in the logs.

Any idea is welcome.
(Cassandra 1.2.3)




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: bin/cqlsh is missing cqlshlib

2013-12-03 Thread Jason Wee
Hi, if you download the RPM from
http://rpm.datastax.com/community/noarch/, for example
cassandra20-2.0.3-1.noarch.rpm, it should contain cqlshlib,
which is packaged into /usr/lib/python2.6/site-packages/cqlshlib.

hth

/Jason


On Tue, Dec 3, 2013 at 10:17 AM, Ritchie Iu r...@ixl.com wrote:

 No, there is no cqlshlib at /usr/share/pyshared/cqlshlib, although that
 might be because I'm using Fedora, which isn't Debian.

 I did a search and I've found that cqlshlib is in several locations:
 /opt_build/fc17/lib/cassandra/pylib/cqlshlib,
 /opt_build/fc17/lib/cassandra/pylib/cqlshlib,
 /usr/opt/apache-cassandra/1.1.4/top/cassandra/pylib/cqlshlib and
 /usr/opt/apache-cassandra/1.1.0/top/cassandra/pylib/cqlshlib

 So I'm guessing that means the start script doesn't know where to find the
 cqlshlib directory? Any idea which one of the above locations I should tell
 it to point to?

 Thanks,
 Ritchie



How to measure data transfer between data centers?

2013-12-03 Thread Tom van den Berge
Is there a way to know how much data is transferred between two nodes, or
more specifically, between two data centers?

I'm especially interested in how much data is being replicated from one
data center to another, to know how much of the available bandwidth is used.


Thanks,
Tom


Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Rahul Menon
Tom,

You should check the size of the hints column family to determine how many
hints are present. The hints are stored in a super column family whose keys
are destination tokens. You could look at it if you would like.

Hint deliveries and timeouts are logged; you should be seeing something like

Timed out replaying hints to {}; aborting ({} delivered)


OR

Finished hinted handoff of {} rows to endpoint {}



Thanks
Rahul
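
As a rough illustration of using those log messages, the sketch below scans a Cassandra log for hinted handoff tasks that were started but never followed by a "Finished" or "Timed out" line for the same endpoint. The regular expressions are built only from the log lines quoted in this thread and assume the endpoint is printed as /IP; the function name unfinished_handoffs is hypothetical, and the exact message wording may differ between Cassandra versions.

```python
import re

# Patterns derived from the log lines quoted in this thread (an assumption:
# other Cassandra versions may word these messages differently).
STARTED = re.compile(r"Started hinted handoff for host: (\S+) with IP: /(\S+)")
FINISHED = re.compile(r"Finished hinted handoff of (\d+) rows to endpoint /?(\S+)")
TIMED_OUT = re.compile(r"Timed out replaying hints to /?(\S+?);")


def unfinished_handoffs(lines):
    """Return {ip: start_line} for handoffs with no later finish or timeout."""
    open_tasks = {}
    for line in lines:
        started = STARTED.search(line)
        if started:
            # Remember the most recent start line for this endpoint.
            open_tasks[started.group(2)] = line.strip()
            continue
        finished = FINISHED.search(line)
        if finished:
            open_tasks.pop(finished.group(2), None)
            continue
        timed_out = TIMED_OUT.search(line)
        if timed_out:
            open_tasks.pop(timed_out.group(1), None)
    return open_tasks
```

Calling unfinished_handoffs(open("system.log")) (the file name is an assumption) would return the endpoints whose handoff is still outstanding. This only shows that a task is stuck, not how far it has progressed.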



Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Hi Rahul,

Thanks for your reply.

I have never seen a message like "Timed out replaying hints to...", which is
a good thing then, I suppose ;)

Normally, I do see the "Finished hinted handoff..." log message. However,
every now and then this message is not logged, not even after several
hours. This is the problem I'm trying to solve.

The log messages you describe are quite coarse-grained; they only tell you
that a task has started or finished, but not how this task is progressing.
And that's exactly what I would like to know if I see that a task has
started, but has not finished after a reasonable amount of time.

So I guess the only way to learn the progress is to look inside the
'hints' column family then. I'll give that a try.


Thanks,
Tom



Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Rahul Menon
Tom,

Do you know why these hints are piling up? What is the size of the hints cf?

Thanks
Rahul



Commitlog replay makes dropped and recreated keyspace and column family rows reappear

2013-12-03 Thread Desimpel, Ignace
Hi,

I have the impression that there is an issue with dropping a keyspace and then 
recreating the keyspace (and column families), combined with a restart of the 
database.

My test goes as follows:

Create keyspace K and column families C.
Insert rows X0 into column family C0
Query for X0 : found rows : OK
Drop keyspace K
Query for X0 : found no rows : OK

Create keyspace K and column families C.
Insert rows X1 into column family C1
Query for X0 : not found : OK
Query for X1 : found : OK

Stop the Cassandra database
Start the Cassandra database
Query for X1 : found : OK
Query for X0 : found : NOT OK !

Has anyone tested this scenario?

Using: Cassandra version 2.02, Thrift, Java 1.7.x, CentOS

Ignace Desimpel



Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Rahul,

This problem occurs every now and then, and currently everything is ok, so
there are no hints. But whenever it happens, the hints quickly pile
up. This results in heap problems on the node ("Heap is 0.813462 full..."
appears many times). This in turn results in the flushing of the 'hints'
column family, to relieve memory pressure. (According to the log message,
its size varies between 50 and 60MB.) But since the HintedHandoffManager is
reading from the hints CF, it will probably pull it back into a memtable
again -- that's at least my understanding of how it works.

So I guess that flushing the hints CF while the HintedHandoffManager is
working on it only makes things worse, and it could be the reason that the
process never ends.

What I typically see when this happens is that the hints keep piling up,
and eventually the node comes to a grinding halt (OOM). Then I have to
rebuild the node entirely (only removing the hints doesn't work).

The reason for hints to start accumulating in the first place might be a
spike in CF writes that must be replicated to a node in another data
center. The available bandwidth to that data center might not be able to
handle the data quickly enough, resulting in stored hints. The
HintedHandoff task that is started is targeting that remote node.


Thanks,
Tom



Re: Stack trace from a node during a repair

2013-12-03 Thread John Pyeatt
Then my issue must be the 0.01% because

1) I'm running the repair as root.
2) The directory exists and the permissions are appropriate. root:root 755
3) The three times it occurred during the repair it always complained about
backups directories. But there are dozens of other backups directories that
were created during the repair that caused no exceptions.


The biggest issue with this is that it shuts down gossip.




On Mon, Dec 2, 2013 at 5:56 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Dec 2, 2013 at 2:59 PM, John Pyeatt john.pye...@singlewire.com wrote:

 Caused by: java.io.IOException: Unable to create directory
 /data-1/cassandra/data/SinglewireSupport/Binaries/backups


 This is an exception directly from a core java method. The cause is
 99.9% likely to be permissions.

 =Rob




-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
--
608.661.1184
john.pye...@singlewire.com


Re: Stack trace from a node during a repair

2013-12-03 Thread Hannu Kröger
Hi,

Are you running nodetool or cassandra as root? I think it doesn't really
matter what user is running the nodetool. Those directories should be
writable by the user who is running the actual cassandra process.

Hannu
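
To make that check concrete, here is a minimal sketch that reports who owns a directory and whether the user running the script could create entries inside it. It assumes a Unix-like system and must be run as the same user as the Cassandra process to be meaningful (when run as root, os.access normally reports everything as writable). The helper names can_create_in and dir_summary are made up for this illustration.

```python
import os
import stat


def can_create_in(directory):
    """True if the current user may create entries inside `directory`."""
    # Creating a file or subdirectory needs write and search (execute)
    # permission on the parent directory.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


def dir_summary(directory):
    """Return owner ids, permission bits and creatability for a directory."""
    st = os.stat(directory)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "creatable": can_create_in(directory),
    }
```

For the error in this thread, one would call dir_summary on the parent of the backups directory that could not be created, for example /data-1/cassandra/data/SinglewireSupport/Binaries.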



Re: data dropped when using sstableloader?

2013-12-03 Thread Francisco Nogueira Calmon Sobral
Hi, Ross.

We had the same problem under the same version of Cassandra. We opted to copy 
ALL the sstables from the old cluster to each new node, then run nodetool 
refresh. The missing rows appeared after this procedure.

Best regards,
Francisco.



On Nov 27, 2013, at 7:49 PM, Ross Black ross.w.bl...@gmail.com wrote:

 Hi Tyler,
 
 Thanks (somehow I missed that ticket when I searched for sstableloader bugs).
 
 I will retry with 1.2.12 when we get a chance to upgrade.  In the meantime I 
 have switched to loading data via the normal client API (slower but reliable).
 
 Ross
 
 
 
 On 28 November 2013 03:45, Tyler Hobbs ty...@datastax.com wrote:
 
 On Wed, Nov 27, 2013 at 3:12 AM, Ross Black ross.w.bl...@gmail.com wrote:
 Using Cassandra 1.2.10, I am trying to load sstable data into a cluster of 6 
 machines.
 
 This may be affecting you: 
 https://issues.apache.org/jira/browse/CASSANDRA-6272
 
 Using 1.2.12 for the sstableloader process should work.
 
 
 -- 
 Tyler Hobbs
 DataStax
 



Re: Stack trace from a node during a repair

2013-12-03 Thread John Pyeatt
Both cassandra and nodetool are running as root.

also
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59450
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 10240
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited



Re: Stack trace from a node during a repair

2013-12-03 Thread Robert Coli
On Tue, Dec 3, 2013 at 6:19 AM, John Pyeatt john.pye...@singlewire.com wrote:

 Then my issue must be the 0.01% because

 1) I'm running the repair as root.


Huh? Repair doesn't care what user your shell is. It is a process built
into cassandra and has the permissions that cassandra does?


 2) The directory exists and the permissions are appropriate. root:root 755


Why are you running Cassandra as root?


 3) The three times it occurred during the repair it always complained
 about backups directories. But there are dozens of other backups directories
 that were created during the repair that caused no exceptions.


Cassandra doesn't have a lot of chances to mess up while creating
directories. This appears to be one of them.


 The biggest issue with this is that it shuts down gossip.


That sounds like rather a serious issue, and hints towards a potential
common cause : too many open files?

To rule out other potential causes of issue :

- what o/s?
- what JVM?
- how have you installed cassandra?
- what version of cassandra?

=Rob
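
One way to test the "too many open files" theory is to compare the number of descriptors the running Cassandra process holds against the limit that applies to that process, which can differ from what ulimit -a reports in an interactive shell. The sketch below is Linux-only and reads /proc; the function names parse_max_open_files and fd_usage are invented for this example, and the caller needs permission to read the target process's /proc entries.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the limit is reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", soft limit, hard limit, units.
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process (Linux only)."""
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as limits_file:
        soft_limit = parse_max_open_files(limits_file.read())
    return open_fds, soft_limit
```

Calling fd_usage with the Cassandra process id while a repair is running would show how close the process gets to its limit.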


Re: Stack trace from a node during a repair

2013-12-03 Thread John Pyeatt
This is running the Amazon Linux OS, which is essentially CentOS 6, I believe.

java version 1.6.0_45
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

I installed Cassandra 1.2.9 from
http://archive.apache.org/dist/cassandra/1.2.9/apache-cassandra-1.2.9-bin.tar.gz,
then ran tar -xzf into a directory, changed the .yaml file a bit (using vnodes,
set concurrent_writes to 16), and changed cassandra-env.sh to set
MAX_HEAP_SIZE to 3G and HEAP_NEWSIZE to 200M.



CQL workaround for modifying a primary key

2013-12-03 Thread Ike Walker
What is the best practice for modifying the primary key definition of a table 
in Cassandra 1.2.9?

Say I have this table:

CREATE TABLE temperature (
   weatherstation_id text,
   event_time timestamp,
   temperature text,
   PRIMARY KEY (weatherstation_id,event_time)
);

I want to add a new column named version and include that column in the primary 
key.

CQL will let me add the column, but you can't change the primary key for an 
existing table.

So I drop the table and recreate it:

DROP TABLE temperature;

CREATE TABLE temperature (
   weatherstation_id text,
   version int,
   event_time timestamp,
   temperature text,
   PRIMARY KEY (weatherstation_id,version,event_time)
);

But then I start getting errors like this:

java.io.FileNotFoundException: /var/lib/cassandra/data/test/temperature/test-temperature-ic-8316-Data.db (No such file or directory)

So I guess the drop table doesn't actually delete the data, and I end up with a 
problem like this:

https://issues.apache.org/jira/browse/CASSANDRA-4857

What's a good workaround for this, assuming I don't want to change the name of 
my table? Should I just truncate the table, then drop it and recreate it?

Thanks.

-Ike Walker

Re: Exactly one wide row per node for a given CF?

2013-12-03 Thread Vivek Mishra
So basically you want to create a cluster of multiple unique keys, but data
that belongs to one unique key should be colocated. Correct?

-Vivek


On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote:

 Subject says it all. I want to be able to randomly distribute a large set
 of records but keep them clustered in one wide row per node.

 As an example, let’s say I’ve got a collection of about 1 million records
 each with a unique id. If I just go ahead and set the primary key (and
 therefore the partition key) as the unique id, I’ll get very good random
 distribution across my server cluster. However, each record will be its own
 row. I’d like to have each record belong to one large wide row (per server
 node) so I can have them sorted or clustered on some other column.

 If I have, say, 5 nodes in my cluster, I could randomly assign a value of 1
 - 5 at the time of creation and have the partition key set to this value.
 But this becomes troublesome if I add or remove nodes. What effectively I
 want is to partition on the unique id of the record modulo N (id % N;
 where N is the number of nodes).

 I have to imagine there’s a mechanism in Cassandra to simply randomize the
 partitioning without even using a key (and then clustering on some column).

 Thanks for any help.
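
The "id % N" idea from the message above can be sketched on the client side as follows. This is only an illustration under stated assumptions: the bucket is computed by the application (here from an MD5 digest of the id, not by Cassandra's partitioner), the bucket value would be used as the partition key, and the function name bucket_for is hypothetical. Because Cassandra hashes the partition key again to place it on the ring, N buckets are not guaranteed to land on N distinct nodes.

```python
import hashlib


def bucket_for(record_id, num_buckets):
    """Map a record id to a stable bucket number in [0, num_buckets)."""
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_buckets


def fraction_moved(record_ids, old_buckets, new_buckets):
    """Fraction of ids whose bucket changes when the bucket count changes."""
    moved = sum(
        1 for r in record_ids
        if bucket_for(r, old_buckets) != bucket_for(r, new_buckets)
    )
    return moved / len(record_ids)
```

The second helper illustrates the problem raised in the post: with plain modulo bucketing, changing N from 5 to 6 reassigns most ids to a different bucket, so keeping the bucket count fixed and independent of the node count avoids rewriting data whenever nodes are added or removed.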