[ANNOUNCE] Polidoro - A Cassandra client in Scala

2013-08-30 Thread Lanny Ripple
Hi all,

We've open sourced Polidoro.  It's a Cassandra client in Scala on top of 
Astyanax and in the style of Cascal.

Find it at https://github.com/SpotRight/Polidoro

  -Lanny Ripple
  SpotRight, Inc - http://spotright.com

Re: Decommission an entire DC

2013-07-24 Thread Lanny Ripple
That one is documented --
http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html


On Wed, Jul 24, 2013 at 3:33 AM, Cyril Scetbon cyril.scet...@free.fr wrote:

 And if we want to add a new DC ? I suppose we should add all nodes and
 alter the replication factor of the keyspace after that, but if anyone can
 confirm it and maybe give me some tips ?
 FYI, we have 2 DCs with between 10 and 20 nodes in each and a 2 TB database
 (local replication factor included).

 thanks
 --
 Cyril SCETBON

 On Jul 24, 2013, at 12:04 AM, Omar Shibli o...@eyeviewdigital.com wrote:

 All you need to do is decrease the replication factor of DC1 to 0 and then
 decommission the nodes one by one.
 I've tried this before and it worked with no issues.
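
 As a sketch (CQL3 against Cassandra 1.2; the keyspace name here is made up,
 and removing DC1 from the options map is equivalent to setting 'DC1' : 0):

   ALTER KEYSPACE myks
     WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC2' : 2};

 and then, on each DC1 node in turn:

   nodetool decommission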

 Thanks,

On Tue, Jul 23, 2013 at 10:32 PM, Lanny Ripple la...@spotright.com wrote:

 Hi,

 We have a multi-DC setup using DC1:2, DC2:2.  We want to get rid of DC1.
 We're in the position where we don't need to save any of the data on DC1.
 We know we'll lose a (tiny; already checked) bit of data but our
 processing is such that we'll recover over time.

 How do we drop DC1 and just move forward with DC2?  Using nodetool
 decommission or removetoken looks like we'd eventually end up with a single
 DC1 node containing the entire DC's data, which would be slow and costly.

 We've speculated that setting DC1:0 or removing it from the schema would
 do the trick, but without finding any hits while searching on that idea I
 hesitate to just do it.  We can drop DC1's data but have to keep a working
 ring in DC2.






Decommission an entire DC

2013-07-23 Thread Lanny Ripple
Hi,

We have a multi-DC setup using DC1:2, DC2:2.  We want to get rid of DC1.
We're in the position where we don't need to save any of the data on DC1.
We know we'll lose a (tiny; already checked) bit of data but our
processing is such that we'll recover over time.

How do we drop DC1 and just move forward with DC2?  Using nodetool
decommission or removetoken looks like we'd eventually end up with a single
DC1 node containing the entire DC's data, which would be slow and costly.

We've speculated that setting DC1:0 or removing it from the schema would do
the trick, but without finding any hits while searching on that idea I
hesitate to just do it.  We can drop DC1's data but have to keep a working
ring in DC2.


Re: Thrift message length exceeded

2013-04-24 Thread Lanny Ripple
Good catch since that bug also would have shut us down.

The original problem is that prior to Cass 1.1.10 it looks like the
cassandra.yaml values

  * thrift_framed_transport_size_in_mb
  * thrift_max_message_length_in_mb

were ignored (in favor of effectively no limits).  We went from 1.1.5 to 1.2.3 
and these were suddenly turned on for us (and way too low for our data).
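
For reference, both are plain cassandra.yaml entries.  A sketch with
illustrative sizes (the 1.2 defaults are 15 and 16; keep the max message
length larger than the frame size, and both larger than your widest rows):

  # cassandra.yaml -- sizes are illustrative, not a recommendation
  thrift_framed_transport_size_in_mb: 60
  thrift_max_message_length_in_mb: 64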

I've also confirmed that your supplied patch2 works for us.

  -ljr

On Apr 22, 2013, at 6:57 AM, Oleksandr Petrov oleksandr.pet...@gmail.com 
wrote:

 I've submitted a patch that fixes the issue for 1.2.3: 
 https://issues.apache.org/jira/browse/CASSANDRA-5504
 
 Maybe the guys know a better way to fix it, but that helped me in the meantime.
 
 
 On Mon, Apr 22, 2013 at 1:44 AM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:
 If you're using Cassandra 1.2.3 and the new Hadoop interface, which makes a
 call to next(), you'll get an eternal loop reading the same things over and
 over from your Cassandra nodes (you can see it if you enable debug output).

 next() clears key(), which is required for wide row iteration.

 Setting the key back fixed the issue for me.
 
 
 On Sat, Apr 20, 2013 at 3:05 PM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:
 Tried to isolate the issue in a testing environment.

 What I currently have:

 Here's the setup for the test:
 CREATE KEYSPACE cascading_cassandra WITH replication = {'class' : 
 'SimpleStrategy', 'replication_factor' : 1};
 USE cascading_cassandra;
 CREATE TABLE libraries (emitted_at timestamp, additional_info varchar, 
 environment varchar, application varchar, type varchar, PRIMARY KEY 
 (application, environment, type, emitted_at)) WITH COMPACT STORAGE;
 
 Next, insert some test data:
 
 (just for example)
 [INSERT INTO libraries (application, environment, type, additional_info,
 emitted_at) VALUES (?, ?, ?, ?, ?); ["app" "env" "type" "0" #inst
 "2013-04-20T13:01:04.935-00:00"]]
 
 If the keys (e.g. "app", "env", "type") are all the same across the dataset,
 it works correctly.
 As soon as I start varying the keys, e.g. "app1", "app2", "app3" or others,
 I get the error with "Message Length Exceeded".
 
 Does anyone have any ideas?
 Thanks for the help!
 
 
 On Sat, Apr 20, 2013 at 1:56 PM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:
 I can confirm running into the same problem.

 Tried ConfigHelper.setThriftMaxMessageLengthInMb(), and tuning the server
 side, reducing/increasing the batch size.
 
 Here's stacktrace from Hadoop/Cassandra, maybe it could give a hint:
 
 Caused by: org.apache.thrift.protocol.TProtocolException: Message length 
 exceeded: 8
   at 
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
 
   at 
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
   at org.apache.cassandra.thrift.Column.read(Column.java:528)
   at 
 org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
   at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
   at 
 org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)
 
 
 On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple la...@spotright.com wrote:
 It's slow going finding the time to do so but I'm working on that.
 
 We do have another table that has one or sometimes two columns per row.  We
 can run jobs on it without issue.  I looked through
 org.apache.cassandra.hadoop code and don't see anything that's really
 changed since 1.1.5 (which was also using thrift-0.7), so it's something of
 a puzzler what's going on.
 
 
 On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com wrote:
 
  Can you reproduce this in a simple way ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
  That was our first thought.  Using maven's dependency tree info we 
  verified that we're using the expected (cass 1.2.3) jars
 
  $ mvn dependency:tree | grep thrift
  [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
  [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
  I've also dumped the final command run by the hadoop we use (CDH3u5) and 
  verified it's not sneaking thrift in on us.
 
 
  On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com 
  wrote:
  Can you confirm that you are using the same thrift version that ships with
  1.2.3?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton

Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
That was our first thought.  Using maven's dependency tree info we verified
that we're using the expected (cass 1.2.3) jars

$ mvn dependency:tree | grep thrift
[INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile

I've also dumped the final command run by the hadoop we use (CDH3u5) and
verified it's not sneaking thrift in on us.


On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com wrote:

 Can you confirm that you are using the same thrift version that ships with
 1.2.3?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:

 A bump to say I found this


 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded

 so others are seeing similar behavior.

 From what I can see of org.apache.cassandra.hadoop nothing has changed
 since 1.1.5, when we didn't see such things, but it sure looks like a bug
 has slipped in (or been uncovered) somewhere.  I'll try to narrow it down
 to a dataset and code that can reproduce it.

 On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:

 We are using Astyanax in production but I cut back to just Hadoop and
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.

 We do have some extremely large rows but we went from everything working
 with 1.1.5 to almost everything carping with 1.2.3.  Something has changed.
  Perhaps we were doing something wrong earlier that 1.2.3 exposed but
 surprises are never welcome in production.

 On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:

 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6
 to 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some
 checks added in Thrift or Hector?

 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com]
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded

 Hello,

 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
 upgradesstables and got the ring on its feet and we are now seeing a new
 issue.

 When we run MapReduce jobs against practically any table we find the
 following errors:

 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader:
 Loaded the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid
 exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using
 ResourceCalculatorPlugin :
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running
 child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length
 exceeded: 106
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
 at org.apache.hadoop.mapred.Child.main(Child.java:260)
 Caused by: org.apache.thrift.TException: Message length exceeded: 106
 at
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
 at org.apache.cassandra.thrift.Column.read(Column.java:528

Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
It's slow going finding the time to do so but I'm working on that.

We do have another table that has one or sometimes two columns per row.  We can
run jobs on it without issue.  I looked through org.apache.cassandra.hadoop
code and don't see anything that's really changed since 1.1.5 (which was also
using thrift-0.7), so it's something of a puzzler what's going on.


On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com wrote:

 Can you reproduce this in a simple way ? 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
 That was our first thought.  Using maven's dependency tree info we verified 
 that we're using the expected (cass 1.2.3) jars
 
 $ mvn dependency:tree | grep thrift
 [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
 [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
 I've also dumped the final command run by the hadoop we use (CDH3u5) and 
 verified it's not sneaking thrift in on us.
 
 
 On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com 
 wrote:
 Can you confirm that you are using the same thrift version that ships with 1.2.3?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:
 
 A bump to say I found this
 
  
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
 so others are seeing similar behavior.
 
 From what I can see of org.apache.cassandra.hadoop nothing has changed
 since 1.1.5, when we didn't see such things, but it sure looks like a bug
 has slipped in (or been uncovered) somewhere.  I'll try to narrow it down
 to a dataset and code that can reproduce it.
 
 On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:
 
 We are using Astyanax in production but I cut back to just Hadoop and 
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
 We do have some extremely large rows but we went from everything working 
 with 1.1.5 to almost everything carping with 1.2.3.  Something has 
 changed.  Perhaps we were doing something wrong earlier that 1.2.3 exposed 
 but surprises are never welcome in production.
 
 On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:
 
 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6 
 to 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some 
 checks added in Thrift or Hector?
 
 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com] 
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded
 
 Hello,
 
 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
 upgradesstables and got the ring on its feet and we are now seeing a new
 issue.
 
 When we run MapReduce jobs against practically any table we find the 
 following errors:
 
 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: 
 Loaded the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid 
 exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using 
 ResourceCalculatorPlugin : 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error 
 running child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length 
 exceeded: 106
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
   at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
   at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143

Re: CorruptedBlockException

2013-04-11 Thread Lanny Ripple
Saw this in earlier versions.  Our workaround was: disable; drain; snap;
shutdown; delete; link from snap; restart.

  -ljr

On Apr 11, 2013, at 9:45, moshe.kr...@barclays.com wrote:

 I have formulated the following theory regarding C* 1.2.2 which may be
 relevant: whenever there is a disk error during compaction of an SSTable
 (e.g., bad block, out of disk space), that SSTable's files stick around
 forever after and do not subsequently get deleted by normal compaction
 (minor or major), long after all its records have been deleted.  This
 causes disk usage to rise dramatically.  The only way to make the SSTable
 files disappear is to run "nodetool cleanup" (which takes hours to run).
  
 Just a theory so far….
  
 From: Alexis Rodríguez [mailto:arodrig...@inconcertcc.com] 
 Sent: Thursday, April 11, 2013 5:31 PM
 To: user@cassandra.apache.org
 Subject: Re: CorruptedBlockException
  
 Aaron,
  
 It seems that we are in the same situation as Nury, we are storing a lot of 
 files of ~5MB in a CF.
  
 This happens in a test cluster with one node using cassandra 1.1.5; we have
 the commitlog on a different partition than the data directory.  Normally
 our tests use nearly 13 GB of data, but when the exception on compaction
 appears our disk usage ramps up to:
  
 # df -h
 FilesystemSize  Used Avail Use% Mounted on
 /dev/sda1 440G  330G   89G  79% /
 tmpfs 7.9G 0  7.9G   0% /lib/init/rw
 udev  7.9G  160K  7.9G   1% /dev
 tmpfs 7.9G 0  7.9G   0% /dev/shm
 /dev/sdb1 459G  257G  179G  59% /cassandra
  
 # cd /cassandra/data/Repository/
  
 # ls Files/*tmp* | wc -l
 1671
  
 # du -ch Files | tail -1
 257G    total
  
 # du -ch Files/*tmp* | tail -1
 34G total
  
 We are using cassandra 1.1.5 with one node; our schema for that keyspace is:
  
 [default@unknown] use Repository;
 Authenticated to keyspace: Repository
 [default@Repository] show schema;
 create keyspace Repository
   with placement_strategy = 'NetworkTopologyStrategy'
   and strategy_options = {datacenter1 : 1}
   and durable_writes = true;
  
 use Repository;
  
 create column family Files
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'BytesType'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 864000
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy = 
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'KEYS_ONLY'
   and compaction_strategy_options = {'sstable_size_in_mb' : '120'}
   and compression_options = {'sstable_compression' : 
 'org.apache.cassandra.io.compress.SnappyCompressor'};
  
 In our logs:
  
 ERROR [CompactionExecutor:1831] 2013-04-11 09:12:41,725 
 AbstractCassandraDaemon.java (line 135) Exception in thread 
 Thread[CompactionExecutor:1831,1,main]
 java.io.IOError: org.apache.cassandra.io.compress.CorruptedBlockException: 
 (/cassandra/data/Repository/Files/Repository-Files-he-4533-Data.db): 
 corruption detected, chunk at 43325354 of length 65545.
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
 at 
 org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
 at 
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
 at 
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at 
 com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

Re: Thrift message length exceeded

2013-04-10 Thread Lanny Ripple
We are using Astyanax in production but I cut back to just Hadoop and Cassandra 
to confirm it's a Cassandra (or our use of Cassandra) problem.

We do have some extremely large rows but we went from everything working with 
1.1.5 to almost everything carping with 1.2.3.  Something has changed.  Perhaps 
we were doing something wrong earlier that 1.2.3 exposed but surprises are 
never welcome in production.

On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:

 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6 to 
 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some checks 
 added in Thrift or Hector?
 
 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com] 
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded
 
 Hello,
 
 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
 upgradesstables and got the ring on its feet and we are now seeing a new
 issue.
 
 When we run MapReduce jobs against practically any table we find the 
 following errors:
 
 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
 the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid 
 exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using 
 ResourceCalculatorPlugin : 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length 
 exceeded: 106
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
   at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
   at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)
 Caused by: org.apache.thrift.TException: Message length exceeded: 106
   at 
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
   at 
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
   at org.apache.cassandra.thrift.Column.read(Column.java:528)
   at 
 org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
   at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
   at 
 org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
   at 
 org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
   ... 16 more
 2013-04-09 09:58:50,481 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 
 The message length listed on each failed job differs (not always 106).  Jobs 
 that used to run fine now fail with code compiled against cass 1.2.3 (and 
 work fine if compiled against 1.1.5 and run against the 1.2.3 servers in 
 production).  I'm using the following setup to configure the job:
 
  def cassConfig(job: Job) {
val conf = job.getConfiguration

Thrift message length exceeded

2013-04-09 Thread Lanny Ripple
Hello,

We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
upgradesstables and got the ring on its feet and we are now seeing a new issue.

When we run MapReduce jobs against practically any table we find the following 
errors:

2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
the native-hadoop library
2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited 
with exit code 0
2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.thrift.TException: Message length 
exceeded: 106
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.apache.thrift.TException: Message length exceeded: 106
at 
org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
at 
org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at org.apache.cassandra.thrift.Column.read(Column.java:528)
at 
org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
at 
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
... 16 more
2013-04-09 09:58:50,481 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task

The message length listed on each failed job differs (not always 106).  Jobs 
that used to run fine now fail with code compiled against cass 1.2.3 (and work 
fine if compiled against 1.1.5 and run against the 1.2.3 servers in 
production).  I'm using the following setup to configure the job:

  def cassConfig(job: Job) {
    val conf = job.getConfiguration()

    ConfigHelper.setInputRpcPort(conf, "" + 9160)
    ConfigHelper.setInputInitialAddress(conf, Config.hostip)

    ConfigHelper.setInputPartitioner(conf,
      "org.apache.cassandra.dht.RandomPartitioner")
    ConfigHelper.setInputColumnFamily(conf, Config.keyspace, Config.cfname)

    val pred = {
      val range = new SliceRange()
        .setStart("".getBytes("UTF-8"))
        .setFinish("".getBytes("UTF-8"))
        .setReversed(false)
        .setCount(4096 * 1000)

      new SlicePredicate().setSlice_range(range)
    }

    ConfigHelper.setInputSlicePredicate(conf, pred)
  }

The job consists only of a mapper that increments counters for each row and 
associated columns so all I'm really doing is exercising 
ColumnFamilyRecordReader.
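
For reference, a sketch of that mapper's shape (the counter group and names
are made up; the key/value types follow 1.2's ColumnFamilyInputFormat):

  import java.nio.ByteBuffer
  import java.util.SortedMap

  import org.apache.cassandra.db.IColumn
  import org.apache.hadoop.io.NullWritable
  import org.apache.hadoop.mapreduce.Mapper

  class CountingMapper
      extends Mapper[ByteBuffer, SortedMap[ByteBuffer, IColumn],
                     NullWritable, NullWritable] {

    type Ctx = Mapper[ByteBuffer, SortedMap[ByteBuffer, IColumn],
                      NullWritable, NullWritable]#Context

    // Emit nothing; just tally what ColumnFamilyRecordReader hands us.
    override def map(key: ByteBuffer,
                     columns: SortedMap[ByteBuffer, IColumn],
                     ctx: Ctx) {
      ctx.getCounter("cassandra", "rows").increment(1L)
      ctx.getCounter("cassandra", "columns").increment(columns.size.toLong)
    }
  }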

Has anyone else seen this?  Is there a workaround/fix to get our jobs running?

Thanks

Re: lots of extra bytes on disk

2013-03-28 Thread Lanny Ripple
We occasionally (twice now on a 40-node cluster over the last 6-8 months) see
this.  My best guess is that Cassandra can somehow fail to mark an SSTable for
cleanup.  Forced GCs or reboots don't clear them out.  We disable thrift and
gossip; drain; snapshot; shut down; clear data/Keyspace/Table/*.db and restore
(hard-linking back into place to avoid data transfer) from the just-created
snapshot; restart.
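
In shell form the procedure looks roughly like this (keyspace/table names,
snapshot tag, and data path are illustrative):

  nodetool disablethrift
  nodetool disablegossip
  nodetool drain
  nodetool snapshot -t rescue MyKeyspace
  # stop the cassandra process, then relink from the snapshot:
  cd /var/lib/cassandra/data/MyKeyspace/MyTable
  rm *.db
  ln snapshots/rescue/* .
  # restart cassandra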


On Mar 28, 2013, at 10:12 AM, Ben Chobot be...@instructure.com wrote:

 Some of my cassandra nodes in my 1.1.5 cluster show a large discrepancy 
 between what cassandra says the SSTables should sum up to, and what df and du 
 claim exist. During repairs, this is almost always pretty bad, but 
 post-repair compactions tend to bring those numbers to within a few percent 
 of each other... usually. Sometimes they remain much further apart after 
 compactions have finished - for instance, I'm looking at one node now that 
 claims to have 205GB of SSTables, but actually has 450GB of files living in 
 that CF's data directory. No pending compactions, and the most recent 
 compaction for this CF finished just a few hours ago.
 
 nodetool cleanup has no effect.
 
 What could be causing these extra bytes, and how to get them to go away? I'm 
 ok with a few extra GB of unexplained data, but an extra 245GB (more than all 
 the data this node is supposed to have!) is a little extreme.



Re: TimeUUID Order Partitioner

2013-03-27 Thread Lanny Ripple
A type 4 UUID can be created from two Longs.  You could MD5 your strings giving 
you 128 hashed bits and then make UUIDs out of that.  Using Scala:
 
   import java.nio.ByteBuffer
   import java.security.MessageDigest
   import java.util.UUID

   val key = "Hello, World!"

   val md = MessageDigest.getInstance("MD5")
   val dig = md.digest(key.getBytes("UTF-8"))
   val bb = ByteBuffer.wrap(dig)

   val msb = bb.getLong
   val lsb = bb.getLong

   val uuid = new UUID(msb, lsb)


On Mar 26, 2013, at 3:22 PM, aaron morton aa...@thelastpickle.com wrote:

 Any idea?
 Not off the top of my head.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel cperez...@gmail.com wrote:
 
 Yes it does. Thank you Aaron.
 
 Now I realized that the system keyspace uses strings as keys, like "Ring" or
 "ClusterName", and I don't know how to convert these types of keys into
 UUIDs.  Any idea?
 
 
 Carlos Pérez Miguel
 
 
 2013/3/25 aaron morton aa...@thelastpickle.com
 The best thing to do is start with a look at ByteOrderedPartitioner and
 AbstractByteOrderedPartitioner.

 You'll want to create a new TimeUUIDToken extends Token<UUID> and a new
 UUIDPartitioner that extends AbstractPartitioner<TimeUUIDToken>.
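
 Independent of the partitioner plumbing, the ordering such a token has to
 impose is by the UUID's embedded time.  A minimal sketch of just that
 comparison (plain java.util.UUID, not Cassandra's partitioner API):

   import java.util.UUID

   // Order version-1 (time) UUIDs by their embedded 60-bit timestamp.
   // Raw byte order differs because the UUID layout puts time_low ahead
   // of time_high; UUID.timestamp() throws for non-version-1 UUIDs.
   object TimeUUIDOrdering extends Ordering[UUID] {
     def compare(a: UUID, b: UUID): Int = {
       val byTime = a.timestamp compare b.timestamp
       if (byTime != 0) byTime else a compareTo b  // deterministic tie-break
     }
   }

   // usage: uuids.sorted(TimeUUIDOrdering)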
 
 Usual disclaimer that ordered partitioners cause problems with load 
 balancing. 
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel cperez...@gmail.com wrote:
 
 Hi,
 
 I store in my system rows where the key is a UUID version 1, a TimeUUID.  I
 would like to maintain rows ordered by time.  I know that in this case it is
 recommended to use an external CF where column names are UUIDs ordered by
 time. But in my use case this is not possible, so I would like to use a 
 custom Partitioner in order to do this. If I use ByteOrderedPartitioner 
 rows are not correctly ordered because of the way a UUID stores the 
 timestamp. What is needed in order to implement my own Partitioner?
 
 Thank you.
 
 Carlos Pérez Miguel
 
 
 



Re: TimeUUID Order Partitioner

2013-03-27 Thread Lanny Ripple
Ah. TimeUUID.  Not as useful for you then but still something for the toolbox.

On Mar 27, 2013, at 8:42 AM, Lanny Ripple la...@spotright.com wrote:

 A type 4 UUID can be created from two Longs.  You could MD5 your strings 
 giving you 128 hashed bits and then make UUIDs out of that.  Using Scala:
 
   import java.nio.ByteBuffer
   import java.security.MessageDigest
   import java.util.UUID

   val key = "Hello, World!"

   val md = MessageDigest.getInstance("MD5")
   val dig = md.digest(key.getBytes("UTF-8"))
   val bb = ByteBuffer.wrap(dig)

   val msb = bb.getLong
   val lsb = bb.getLong

   val uuid = new UUID(msb, lsb)
 
 
 On Mar 26, 2013, at 3:22 PM, aaron morton aa...@thelastpickle.com wrote:
 
 Any idea?
 Not off the top of my head.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel cperez...@gmail.com wrote:
 
 Yes it does. Thank you Aaron.
 
 Now I realized that the system keyspace uses strings as keys, like "Ring" or
 "ClusterName", and I don't know how to convert these types of keys into
 UUIDs.  Any idea?
 
 
 Carlos Pérez Miguel
 
 
 2013/3/25 aaron morton aa...@thelastpickle.com
 The best thing to do is start with a look at ByteOrderedPartitioner and
 AbstractByteOrderedPartitioner.

 You'll want to create a new TimeUUIDToken extends Token<UUID> and a new
 UUIDPartitioner that extends AbstractPartitioner<TimeUUIDToken>.
 
 Usual disclaimer that ordered partitioners cause problems with load 
 balancing. 
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel cperez...@gmail.com wrote:
 
 Hi,
 
 I store in my system rows where the key is a UUID version 1, a TimeUUID.  I
 would like to maintain rows ordered by time.  I know that in this case it is
 recommended to use an external CF where column names are UUIDs ordered by
 time. But in my use case this is not possible, so I would like to use a 
 custom Partitioner in order to do this. If I use ByteOrderedPartitioner 
 rows are not correctly ordered because of the way a UUID stores the 
 timestamp. What is needed in order to implement my own Partitioner?
 
 Thank you.
 
 Carlos Pérez Miguel