Re: Object mapper for CQL

2014-06-08 Thread DuyHai Doan
You can have a look at Achilles; it uses the Java Driver underneath:
https://github.com/doanduyhai/Achilles
On 8 June 2014 at 04:24, Kevin Burton bur...@spinn3r.com wrote:

 Looks like the java-driver is working on an object mapper:

 More modules including a simple object mapper will come shortly.
 But of course I need one now …
 I'm curious what others are doing here.

 I don't want to pass around Row objects in my code if I can avoid it..
 Ideally I would just run a query and get back a POJO.

 Another issue is how these POJOs are generated.  Are they generated from
 the schema?  Is the schema generated from the POJOs?  From a side file?

 And granted, there are existing ORMs out there but I don't think any
 support CQL.

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 Skype: *burtonator*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are
 people.
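
The "run a query and get back a POJO" idea in the question above can be sketched independently of any particular driver. The following is a minimal illustration in Python rather than Java. The `Article` class, its columns, and the `map_row` helper are hypothetical names invented for this sketch; they are not part of the Java driver, Achilles, or any other library mentioned in this thread. It assumes a row is available as a plain mapping from column name to value.

```python
from dataclasses import dataclass, fields


@dataclass
class Article:
    # Hypothetical table columns, chosen only for illustration.
    id: str
    title: str
    published_ms: int


def map_row(row, cls):
    """Build a `cls` instance from a dict-like row, ignoring extra columns."""
    names = {f.name for f in fields(cls)}
    missing = names - set(row)
    if missing:
        raise KeyError(f"row is missing columns: {sorted(missing)}")
    return cls(**{name: row[name] for name in names})


# A row as it might come back from a query, with one column the class ignores.
row = {"id": "a1", "title": "Hello", "published_ms": 1402200000000, "extra": 1}
article = map_row(row, Article)
print(article)
```

In this sketch the class is the source of truth and the row is checked against it; generating the class from the schema, or the schema from the class, are the other two options raised in the question.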




Re: Data model for streaming a large table in real time.

2014-06-08 Thread Robert Stupp
You do not need RAID0 for data. Let C* handle striping across the data disks.

And maybe CL ANY/ONE might be sufficient for your writes.

 On 08.06.2014 at 06:15, Kevin Burton bur...@spinn3r.com wrote:
 
 we're using containers for other reasons, not just cassandra.  
 
 Tightly constraining resources means we don't have to worry about cassandra , 
 the JVM , or Linux doing something silly and using too many resources and 
 taking down the whole box.
 
 
 On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark co...@clark.ws wrote:
 You won't need containers - running one instance of Cassandra in that 
 configuration will hum along quite nicely and will make use of the cores and 
 memory.  
 
 I'd forget the raid anyway and just mount the disks separately (jbod)
 
 --
 Colin
 320-221-9531
 
 
 On Jun 7, 2014, at 10:02 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 Right now I'm just putting everything together as a proof of concept… so 
 just two cheap replicas for now.  And it's at 1/1th of the load.
 
 If we lose data it's ok :)
 
 I think our config will be 2-3x 400GB SSDs in RAID0 , 3 replicas, 16 cores, 
 probably 48-64GB of RAM each box.
 
 Just one datacenter for now… 
 
 We're probably going to be migrating to using linux containers at some 
 point.  This way we can have like 16GB , one 400GB SSD, 4 cores for each 
 image.  And we can ditch the RAID which is nice. :)
 
 
 On Sat, Jun 7, 2014 at 7:51 PM, Colin colpcl...@gmail.com wrote:
 To have any redundancy in the system, start with at least 3 nodes and a 
 replication factor of 3.
 
 Try to have at least 8 cores, 32 gig ram, and separate disks for log and 
 data.
 
 Will you be replicating data across data centers?
 
 --
 Colin
 320-221-9531
 
 
 On Jun 7, 2014, at 9:40 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 Oh.. To start with we're going to use 2-10 nodes.. 
 
 I think we're going to take the original strategy and just use 100 
 buckets .. 0-99… then the timestamp under that..  I think it should be 
 fine and won't require an ordered partitioner. :)
 
 Thanks!
 
 
 On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark co...@clark.ws wrote:
 With 100 nodes, that ingestion rate is actually quite low and I don't 
 think you'd need another column in the partition key.
 
 You seem to be set in your current direction.  Let us know how it works 
 out.
 
 --
 Colin
 320-221-9531
 
 
 On Jun 7, 2014, at 9:18 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 What's 'source' ? You mean like the URL?
 
 If source is too random it's going to yield too many buckets.  
 
 Ingestion rates are fairly high but not insane.  About 4M inserts per 
 hour.. from 5-10GB… 
 
 
 On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark co...@clark.ws wrote:
 Not if you add another column to the partition key; source for 
 example.  
 
 I would really try to stay away from the ordered partitioner if at all 
 possible.
 
 What ingestion rates are you expecting, in size and speed?
 
 --
 Colin
 320-221-9531
 
 
 On Jun 7, 2014, at 9:05 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 
 Thanks for the feedback on this btw.. .it's helpful.  My notes below.
 
 On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark co...@clark.ws wrote:
 No, you're not; the partition key will get distributed across the 
 cluster if you're using the random or Murmur3 partitioner.
 
 Yes… I'm aware.  But in practice this is how it will work…
 
 If we create bucket b0, that will get hashed to h0…
 
 So say I have 50 machines performing writes, they are all on the same 
 time thanks to ntpd, so they all compute b0 for the current bucket 
 based on the time.
 
 That gets hashed to h0…
 
 If h0 is hosted on node0 … then all writes go to node zero for that 1 
 second interval.
 
 So all my writes are bottlenecking on one node.  That node is 
 *changing* over time… but they're not being dispatched in parallel 
 over N nodes.  At most, writes will only ever reach one node at a time.
 
  
 You could also ensure that by adding another column, like source to 
 ensure distribution. (Add the seconds to the partition key, not the 
 clustering columns)
 
 I can almost guarantee that if you put too much thought into working 
 against what Cassandra offers out of the box, that it will bite you 
 later.
 
 Sure.. I'm trying to avoid the 'bite you later' issues. More so 
 because I'm sure there are Cassandra gotchas to worry about.  
 Everything has them.  Just trying to avoid the land mines :-P
  
 In fact, the use case that you're describing may best be served by a 
 queuing mechanism, and using Cassandra only for the underlying store.
 
 Yes… that's what I'm doing.  We're using apollo to fan out the queue, 
 but the writes go back into cassandra and need to be read out 
 sequentially.
  
 
 I used this exact same approach in a use case that involved writing 
 over a million events/second to a cluster with no problems.  
 Initially, I thought ordered partitioner was the way to go too.  And 
 I used separate processes to aggregate, conflate, and handle 
 distribution to clients.
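
The hot-spot concern in this thread (a partition key made only of the current second sends every write in that second to one node) and the proposed fix (a bucket column 0-99 in the partition key) can be illustrated with a small simulation. This is a sketch under stated assumptions: `NODES` is a hypothetical cluster size, and `node_for` uses MD5 as a stand-in for a hashing partitioner; it is not Cassandra's actual Murmur3 token assignment.

```python
import hashlib
from collections import Counter

NODES = 10      # hypothetical cluster size
BUCKETS = 100   # the "0-99" buckets mentioned in the thread


def node_for(partition_key: str) -> int:
    """Stand-in for a hashing partitioner: MD5, not Cassandra's real token logic."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def nodes_hit(second: int, writes: int, bucketed: bool) -> Counter:
    """Count which nodes receive `writes` inserts issued during one second."""
    hits = Counter()
    for i in range(writes):
        # With bucketing, the partition key is (bucket, second); without it, just the second.
        key = f"{i % BUCKETS}:{second}" if bucketed else str(second)
        hits[node_for(key)] += 1
    return hits


plain = nodes_hit(second=1402200000, writes=1000, bucketed=False)
spread = nodes_hit(second=1402200000, writes=1000, bucketed=True)
print(len(plain), "node(s) receive writes without a bucket column")
print(len(spread), "node(s) receive writes with a bucket column")
```

Without the bucket column every write in a given second maps to the same key, so exactly one node is hit. With it, the same writes map to up to 100 distinct keys and are spread by the hash.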
 

high pending compactions

2014-06-08 Thread S C
I am using Cassandra 1.1 (sorry, a bit old) and I am seeing a high pending 
compaction count: pending tasks is 67, while active compaction tasks are never 
more than 5. I have a 24-CPU machine. Shouldn't I be seeing more compactions? Is 
this a pattern of high writes with compactions backing up? How can I improve 
this? Here are my thoughts:

1. Increase memtable_total_space_in_mb
2. Increase compaction_throughput_mb_per_sec
3. Increase concurrent_compactions

Sorry if this was discussed already. Any pointers are much appreciated.

Thanks,
Kumar

Re: high pending compactions

2014-06-08 Thread Jake Luciani
23




-- 
http://twitter.com/tjake


Re: high pending compactions

2014-06-08 Thread William Katsak
Have you verified that these aren't stuck compactions? e.g. even under 
no load, they don't go away?


-Bill



RE: high pending compactions

2014-06-08 Thread S C
How to check if there are any stuck compactions?
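
One way to act on the "even under no load, they don't go away" test is to sample the pending count a few times while the node is idle and see whether it ever drops. The sketch below assumes the samples are text captured from `nodetool compactionstats` and that each contains a line of the form `pending tasks: N`, as quoted earlier in this thread. The helper names and the sample strings are hypothetical, and an unchanged count is only a hint, not proof, that compactions are stuck.

```python
import re


def pending_tasks(compactionstats_output: str) -> int:
    """Extract N from a 'pending tasks: N' line."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def looks_stuck(samples) -> bool:
    """True if the pending count never drops below the first sample's value."""
    counts = [pending_tasks(s) for s in samples]
    return len(counts) >= 2 and counts[0] > 0 and min(counts) >= counts[0]


# Hypothetical samples taken several minutes apart with no client load.
idle_samples = ["pending tasks: 67\n", "pending tasks: 67\n", "pending tasks: 67\n"]
print(looks_stuck(idle_samples))
```

If the count falls between samples, compaction is making progress and the backlog is more likely a throughput question than a stuck task.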



Re: Data model for streaming a large table in real time.

2014-06-08 Thread Jack Krupansky
Here’s the Jira for the proposal to remove BOP (and OPP), but you can see that 
there is no clear consensus and that the issue is still open:

CASSANDRA-6922 - Investigate if we can drop ByteOrderedPartitioner and 
OrderPreservingPartitioner in 3.0
https://issues.apache.org/jira/browse/CASSANDRA-6922

You can read the DataStax Cassandra doc for why “Using an ordered partitioner 
is not recommended”:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePartitionerBOP_c.html
“Difficult load balancing... Sequential writes can cause hot spots... Uneven 
load balancing for multiple tables”

-- Jack Krupansky

From: Kevin Burton 
Sent: Saturday, June 7, 2014 1:27 PM
To: user@cassandra.apache.org 
Subject: Re: Data model for streaming a large table in real time.

I just checked the source and in 2.1.0 it's not deprecated.   

So it *might* be *being* deprecated but I haven't seen anything stating that.



On Sat, Jun 7, 2014 at 8:03 AM, Colin colpcl...@gmail.com wrote:

  I believe ByteOrderedPartitioner is being deprecated, and for good reason.  I 
would look at what you could achieve by using wide rows and Murmur3Partitioner.



  -- 
  Colin
  320-221-9531


  On Jun 6, 2014, at 5:27 PM, Kevin Burton bur...@spinn3r.com wrote:


We have the requirement to have clients read from our tables while they're 
being written. 

Basically, any write that we make to cassandra needs to be sent out over 
the Internet to our customers.

We also need them to resume so if they go offline, they can just pick up 
where they left off.

They need to do this in parallel, so if we have 20 cassandra nodes, they 
can have 20 readers each efficiently (and without coordination) reading from 
our tables.

Here's how we're planning on doing it.

We're going to use the ByteOrderedPartitioner .

I'm writing with a primary key of the timestamp; however, in practice, this 
would yield hotspots.

(I'm also aware that time isn't a very good pk in a distributed system, as I 
can easily have a collision, so we're going to use a scheme similar to a uuid to 
make it unique per writer.)

One node would take all the load, followed by the next node, etc.

So my plan to stop this is to prefix a slice ID to the timestamp.  This way 
each piece of content has a unique ID, but the prefix will place it on a node.

The slice ID is just a byte… so this means there are 255 buckets in which I 
can place data.  

This means I can have clients each start with a slice, and a timestamp, and 
page through the data with tokens.

This way I can have a client reading with 255 threads from 255 regions in 
the cluster, in parallel, without any hot spots.

Thoughts on this strategy?  
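
The key layout described above (a one-byte slice ID, then the timestamp, then something that makes the key unique per writer) can be sketched as follows. The field widths, the `writer_id` field, and the `make_key` helper are assumptions made for illustration; the message does not specify them. The sketch also assumes that byte-ordered placement compares keys as unsigned bytes, which is why the integers are packed big-endian. Note that a single byte can hold 256 distinct values (0-255).

```python
import struct

KEY_FORMAT = ">BQI"  # 1-byte slice ID, 8-byte timestamp (ms), 4-byte writer id


def make_key(slice_id: int, timestamp_ms: int, writer_id: int) -> bytes:
    """Pack the slice byte first so that byte order groups keys by slice, then by time."""
    return struct.pack(KEY_FORMAT, slice_id, timestamp_ms, writer_id)


keys = [
    make_key(1, 2000, 7),
    make_key(0, 3000, 7),
    make_key(1, 1000, 9),
    make_key(0, 1000, 7),
]

# Sorting the raw bytes yields slice-major, time-minor order.
for key in sorted(keys):
    slice_id, timestamp_ms, writer_id = struct.unpack(KEY_FORMAT, key)
    print(slice_id, timestamp_ms, writer_id)
```

A reader that owns one slice can then resume from the last key it saw by scanning forward from that key within its slice's prefix.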


Re: Data model for streaming a large table in real time.

2014-06-08 Thread Kevin Burton
Hey Jack.  Thanks for posting this… very helpful.

So I guess the status is that it was proposed for deprecation but that
proposal didn't reach consensus.

Also,  this gave me an idea to look at the JIRA to see what's being
proposed for 3.0 :)

Kevin




Advice on how to handle corruption in system/hints

2014-06-08 Thread Francois Richard
Hi everyone,

We are running several Cassandra clusters (usually 5 nodes each, with a 
replication factor of 3), and at least once per day we see corruption 
related to a specific sstable in system/hints. (We are using Cassandra version 
1.2.16 on RHEL 6.5.)

Here is an example of such an exception:


ERROR [CompactionExecutor:1694] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:1694,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:167)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:83)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:69)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:123)
    ... 23 more
 INFO [HintedHandoff:35] 2014-06-08 21:37:33,267 HintedHandOffManager.java (line 296) Started hinted handoff for host: 502a48cd-171b-4e83-a9ad-67f32437353a with IP: /10.210.239.190
ERROR [HintedHandoff:33] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:33,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at 
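
A quick check that is sometimes useful with an implausible dataSize like the one in the trace above is to view the number as the 8 bytes it was read from. The sketch below assumes the length field was read as a big-endian 64-bit integer (the byte order used by Java's DataInput); the helper name `size_as_ascii` is hypothetical.

```python
def size_as_ascii(data_size: int) -> str:
    """Interpret a 64-bit length field as 8 big-endian bytes and render them as text."""
    raw = data_size.to_bytes(8, "big")
    # Replace non-printable bytes with '.' so the result is always displayable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(size_as_ascii(8224262783474088549))  # prints r"smtp:e
```

All eight bytes are printable ASCII, which would be consistent with the reader interpreting a stretch of payload data as a row length. That observation does not by itself identify how the file or the read position became inconsistent.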

Re: Object mapper for CQL

2014-06-08 Thread Johan Edstrom
Kevin, 

We are about to release 2.0 of https://github.com/savoirtech/hecate
It is an ASL-licensed library that started with Jeff Genender writing a POJO
library on Hector for a project we did for Ecuador (essentially all of Ecuador 
uses this).
I extended this with POJO graph features such as collections and composite-key 
indexing.

James Carman then took this a bit further in Cassidy with some new concepts.
A while back I decided to bite the bullet, set aside my hatred of CQL, and 
write the same thing for CQL. It started out with a reflection-heavy and 
somewhat clunky interface, so James decided to rewrite it and incorporate the 
lessons learned from Cassidy.

- Jeff, James and I all work together. This library is already in use and has 
been used with around 30 million accounts as well as under quite decent loads.

What you see in trunk now under hecate-cql3 is what will go out as 2.0. It is a 
new API: we support single POJOs and object graphs, column modifiers, an 
indexer, and everything else we could think of in a library that isn't an ORM 
but maps data to C*.

What will be out in, I think, 2.0.2 is an external indexer very much like Titan 
and possibly some more real graph (vertices) features. We are also looking at a 
SchemaIdentifier so that we can get back to working with dynamic columns at a 
decent conceptual speed :)

/je




Re: Object mapper for CQL

2014-06-08 Thread Colin
I would check out Spring Cassandra. Most of the Java drivers out there for 
Cassandra offer very little over the new 2. driver from DataStax.  Or just use 
the Java driver 2. as is.

There's even a lightweight fluent query-builder DSL if you don't like CQL.  Based 
upon your use case description so far, I don't think you need to get too funky 
with your data access layer.

Whatever you do, make sure the driver you use supports CQL 3 and the native 
protocol.  Thrift, like BOP, will most likely go away at some point in the 
future.
--
Colin
320-221-9531



Re: Object mapper for CQL

2014-06-08 Thread Colin
I wasn't responding as a Datastax employee.

I have used Hector, Achilles and a few others as well.  The .NET drivers used 
to have an edge, but that is evaporating as well.

I have also built my own mapping layers.

But all of that was when the drivers from DataStax weren't there yet.

Yes, I work for DataStax.  I also speak at meetups, and contribute to the 
community.  

DataStax doesn't charge for the drivers, by the way.

I have seen folks use third-party drivers and end up paying for it down the 
road. 

If you're going to consider using a community driver, then I would recommend 
something that wraps the DataStax drivers, like Netflix does.

All I am saying is that sometimes people make using Cassandra more complex 
than it needs to be and end up introducing a lot of new tech in their initial 
adoption; this increases the risk of the project.

Also, I wouldn't use anything built on Thrift.  DataStax has a growing driver 
team and a growing focus on testing and certification, and if you end up wanting 
support for your project and are using an unsupported driver, it can make your 
life more difficult.

In response to how quickly I responded: I often try to provide assistance out 
here. I don't get paid for it, and it's not part of my job. Having close to 5 
years of production experience with Cassandra means that I have made all the 
mistakes out there and probably invented a few of my own.

I have watched a lot of the questions Kevin has asked. His project is ambitious 
for a first dip into Cassandra, I want to see him succeed, and I have given him 
the same advice I give our customers.
--
Colin
320-221-9531


 On Jun 8, 2014, at 9:43 PM, Jeff Genender jgenen...@apache.org wrote:
 
 Comments in line...
 
 On Jun 8, 2014, at 8:05 PM, Colin colpcl...@gmail.com wrote:
 
 I would check out spring Cassandra-most of the java drivers out there for 
 Cassandra offer very little over the new 2. driver from Datastax.  Or just 
 use the java driver 2. as is.
 
 Interesting… answer came within 7 minutes… from a vendor (Datastax employee)… 
 and terribly opinionated without data to back up… I’m just sayin… ;-)
 
 Colin… did you even look at the driver referenced by Johan?  If so, that is 
 certainly the fastest code review and driver test I have ever seen. ;-)
 
 Perhaps a bit more kindness may be more appropriate?  Not a great way to 
 build contributions from the community...
 
 SNIP
 
 Whatever you do, make sure the driver you use supports CQL 3 and the native 
 protocol.  Thrift, like BOP, will most likely go away at some point in the 
 future.
 
 Read what Johan stated… “hecate-cql3” — CQL 3
 
 I think a nice look at what was produced may be a good thing for the 
 community and maybe even DataStax may think it's kinda cool?
 
 Jeff Genender
 Apache Member
 http://www.apache.org
 
 
 --
 Colin
 320-221-9531
 
 

Re: Object mapper for CQL

2014-06-08 Thread Johan Edstrom
So - you deduced that we were not using the driver,
were not Datastax-friendly and that we'd be paying for this down the road?

On Jun 8, 2014, at 9:05 PM, Colin colpcl...@gmail.com wrote:

 I wasn't responding as a Datastax employee.
 
 I have used Hector, Achilles and a few others as well.  The .NET drivers used
 to have an edge, but that is evaporating as well.
 
 I have also built my own mapping layers.
 
 But all of that was when the drivers from Datastax weren't there yet.
 
 Yes, I work for Datastax.  I also speak at meetups, and contribute to the
 community.
 
 Datastax doesn't charge for the drivers, by the way.
 
 I have seen folks use third-party drivers and end up paying for it down the
 road.
 
 If you're going to consider using a community driver, then I would recommend
 something that wraps the Datastax drivers, like Netflix does.
 
 All I am saying is that sometimes people make using Cassandra more complex
 than it needs to be and end up introducing a lot of new tech in their initial
 adoption; this increases the risk of the project.
 
 Also, I wouldn't use anything built on Thrift.  Datastax has a growing driver
 team, a growing focus on testing and certification, and if you end up wanting
 support for your project and are using an unsupported driver, it can make
 your life more difficult.
 
 In response to how quickly I responded: I often try to provide assistance out
 here. I don't get paid for it, and it's not part of my job. Having close to 5
 years of production experience with Cassandra means that I have made all the
 mistakes out there and probably invented a few of my own.
 
 I have watched a lot of the questions Kevin has asked. His project is
 ambitious for a first dip into Cassandra, I want to see him succeed, and have
 given him the same advice I give our customers.
 --
 Colin
 320-221-9531
 
 
 On Jun 8, 2014, at 9:43 PM, Jeff Genender jgenen...@apache.org wrote:
 
 Comments in line...
 
 On Jun 8, 2014, at 8:05 PM, Colin colpcl...@gmail.com wrote:
 
 I would check out Spring Cassandra; most of the Java drivers out there for
 Cassandra offer very little over the new 2. driver from Datastax.  Or just
 use the Java driver 2. as is.
 
 Interesting… answer came within 7 minutes… from a vendor (Datastax
 employee)… and terribly opinionated without data to back it up… I'm just
 sayin… ;-)
 
 Colin… did you even look at the driver referenced by Johan?  If so, that's
 certainly the fastest code review and driver test I have ever seen. ;-)
 
 Perhaps a bit more kindness may be more appropriate?  Not a great way to 
 build contributions from the community...
 
 SNIP
 
 Whatever you do, make sure the driver you use supports CQL 3 and the native 
 protocol.  Thrift, like BOP, will most likely go away at some point in the 
 future.
 
 Read what Johan stated… “hecate-cql3” — CQL 3
 
 I think a nice look at what was produced may be a good thing for the
 community and maybe even Datastax may think it's kinda cool?
 
 Jeff Genender
 Apache Member
 http://www.apache.org
 
 
 --
 Colin
 320-221-9531
 
 
 On Jun 8, 2014, at 8:58 PM, Johan Edstrom seij...@gmail.com wrote:
 
 Kevin, 
 
 We are about to release 2.0 of https://github.com/savoirtech/hecate
 It is an ASL-licensed library that started with Jeff Genender writing a POJO
 library in Hector for a project we did for Ecuador (essentially all of
 Ecuador uses this).
 I extended it with POJO graph features such as collections and composite-key
 indexing.
 
 James Carman then took this a bit further in Cassidy with some new concepts.
 A while back I decided to bite the bullet, set aside my hatred of CQL, and
 just write the same thing. It started out with a very reflection-heavy and
 somewhat clunky interface, and James decided to rewrite it and incorporate
 the learnings from Cassidy.
 
 Jeff, James and I all work together. This library is already in use and has
 been in use in deployments with 30 million accounts as well as under quite
 decent loads.
 
 What you see in trunk now under hecate-cql3 is what will go out as 2.0. It
 is a new API: we support single POJOs and object graphs, column modifiers,
 an indexer, and everything else we could think of in a library that isn't
 an ORM but maps data to C*.
 
 What will be out in, I think, 2.0.2 is an external indexer very much like
 Titan and possibly some more real graph (vertices) stuff. We are also
 looking at a SchemaIdentifier so that we can get back to working with
 dynamic columns at a decent conceptual speed :)
 
 /je
 
 On Jun 8, 2014, at 2:46 AM, DuyHai Doan doanduy...@gmail.com wrote:
 
 You can have a look at Achilles, it's using the Java Driver underneath : 
 https://github.com/doanduyhai/Achilles
 
 On 8 June 2014 at 04:24, Kevin Burton bur...@spinn3r.com wrote:
 Looks like the java-driver is working on an object mapper:
 
 More modules including a simple object mapper will come shortly.
 But of course I need one now … 
 I'm curious what others are doing here.  
 
 I don't want to 

Re: Object mapper for CQL

2014-06-08 Thread Johan Edstrom
In a second reply I'll provide some docs.

We looked at Astyanax (yeah, I didn't like the refactor).
We looked at Spring - are you fucking kidding me?
We have done quite a bit of work in the ORM arena.

* I passionately hate the idea of CQL. *

So I told myself: I need to make this work so that I never, ever
have to work with it directly. See, I liked Big Table; I loved the idea of
modeling without constrained and contrived relations. I was even more of a
fan of combining analytics and adjoining vertices.

That said, Hecate-CQL3 does address all of the above, as well as
a POJO / DAO cache, a table cache, and a "what was changed" store.

If you actually think you'll be writing enterprise code at speed using
a Rowset, sorry, you need a foam helmet.

/je
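
For readers who have not seen the pattern being argued about, here is a minimal sketch of reflection-based row-to-POJO mapping. It is not Hecate's, Achilles', or the DataStax driver's API: the class RowMapperSketch, the User POJO, and the use of a plain Map as a stand-in for a driver row are all assumptions made for illustration, and it assumes field names match column names exactly.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from a real Cassandra library.
final class RowMapperSketch {

    // Example POJO; field names are assumed to equal the column names.
    static final class User {
        String name;
        int age;
    }

    // Creates a new T and copies each column value into the same-named field.
    // A Map<String, Object> stands in for a driver row (assumption).
    static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || !row.containsKey(field.getName())) {
                    continue; // fields without a matching column keep their defaults
                }
                field.setAccessible(true);
                field.set(target, row.get(field.getName()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "kevin");
        row.put("age", 42);
        User user = map(row, User.class);
        System.out.println(user.name + " " + user.age);
    }
}
```

A real mapper would also have to deal with type conversion, collections, and composite keys, which is where the libraries discussed in this thread differ.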




Re: Object mapper for CQL

2014-06-08 Thread Colin
Sounds like you've done some great work.  But I still think it's a good idea
for people new to Cassandra to establish a baseline so that they have
something to compare other approaches against.

It sounds like we potentially have different views in this regard, but are
still interested in the same thing: helping people be successful using
Cassandra.

--
Colin
320-221-9531

