Re: Object mapper for CQL
You can have a look at Achilles; it uses the Java Driver underneath: https://github.com/doanduyhai/Achilles

On 8 June 2014 at 04:24, Kevin Burton bur...@spinn3r.com wrote: Looks like the java-driver is working on an object mapper: "More modules including a simple object mapper will come shortly." But of course I need one now … I'm curious what others are doing here. I don't want to pass around Row objects in my code if I can avoid it. Ideally I would just run a query and get back a POJO. Another issue is how these POJOs are generated. Are they generated from the schema? Is the schema generated from the POJOs? From a side file? And granted, there are existing ORMs out there, but I don't think any support CQL. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
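Since the official mapper module hadn't shipped yet, the pattern being asked for (run a query, get back a typed object) is easy to hand-roll. Below is a minimal, illustrative Python sketch, assuming rows arrive as plain dicts; real driver Row objects differ, and `Article` and `map_row` are invented names, not any driver's actual API:

```python
from dataclasses import dataclass, fields

@dataclass
class Article:
    id: int
    title: str
    published: bool

def map_row(cls, row):
    """Map a dict-shaped result row onto a dataclass, ignoring extra columns."""
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in row.items() if k in names})

# A plain dict stands in for a driver Row object here.
row = {"id": 1, "title": "hello", "published": True, "extra_col": "ignored"}
article = map_row(Article, row)
```

In this hand-rolled direction the classes are the source of truth and the schema is maintained separately; generating one from the other is exactly the gap the discussed mapper libraries try to fill.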
Re: Data model for streaming a large table in real time.
You do not need RAID0 for data. Let C* do striping over data disks. And maybe CL ANY/ONE might be sufficient for your writes.

On 08.06.2014 at 06:15, Kevin Burton bur...@spinn3r.com wrote: We're using containers for other reasons, not just Cassandra. Tightly constraining resources means we don't have to worry about Cassandra, the JVM, or Linux doing something silly, using too many resources, and taking down the whole box.

On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark co...@clark.ws wrote: You won't need containers - running one instance of Cassandra in that configuration will hum along quite nicely and will make use of the cores and memory. I'd forget the RAID anyway and just mount the disks separately (JBOD). -- Colin 320-221-9531

On Jun 7, 2014, at 10:02 PM, Kevin Burton bur...@spinn3r.com wrote: Right now I'm just putting everything together as a proof of concept… so just two cheap replicas for now. And it's at 1/1th of the load. If we lose data it's ok :) I think our config will be 2-3x 400GB SSDs in RAID0, 3 replicas, 16 cores, probably 48-64GB of RAM each box. Just one datacenter for now… We're probably going to be migrating to using Linux containers at some point. This way we can have like 16GB, one 400GB SSD, and 4 cores for each image. And we can ditch the RAID, which is nice. :)

On Sat, Jun 7, 2014 at 7:51 PM, Colin colpcl...@gmail.com wrote: To have any redundancy in the system, start with at least 3 nodes and a replication factor of 3. Try to have at least 8 cores, 32 gig of RAM, and separate disks for log and data. Will you be replicating data across data centers? -- Colin 320-221-9531

On Jun 7, 2014, at 9:40 PM, Kevin Burton bur...@spinn3r.com wrote: Oh… to start with we're going to use from 2-10 nodes. I think we're going to take the original strategy and just use 100 buckets… 0-99… then the timestamp under that. I think it should be fine and won't require an ordered partitioner. :) Thanks!
On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark co...@clark.ws wrote: With 100 nodes, that ingestion rate is actually quite low, and I don't think you'd need another column in the partition key. You seem to be set in your current direction. Let us know how it works out. -- Colin 320-221-9531

On Jun 7, 2014, at 9:18 PM, Kevin Burton bur...@spinn3r.com wrote: What's 'source'? You mean like the URL? If source is too random, it's going to yield too many buckets. Ingestion rates are fairly high but not insane. About 4M inserts per hour… from 5-10GB…

On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark co...@clark.ws wrote: Not if you add another column to the partition key; source, for example. I would really try to stay away from the ordered partitioner if at all possible. What ingestion rates are you expecting, in size and speed? -- Colin 320-221-9531

On Jun 7, 2014, at 9:05 PM, Kevin Burton bur...@spinn3r.com wrote: Thanks for the feedback on this, btw… it's helpful. My notes below.

On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark co...@clark.ws wrote: No, you're not - the partition key will get distributed across the cluster if you're using random or murmur.

Yes… I'm aware. But in practice this is how it will work… If we create bucket b0, that will get hashed to h0… So say I have 50 machines performing writes; they are all on the same time thanks to ntpd, so they all compute b0 for the current bucket based on the time. That gets hashed to h0… If h0 is hosted on node0… then all writes go to node zero for that 1-second interval. So all my writes are bottlenecking on one node. That node is *changing* over time… but they're not being dispatched in parallel over N nodes. At most, writes will only ever reach 1 node at a time.

You could also ensure that by adding another column, like source, to ensure distribution.
(Add the seconds to the partition key, not the clustering columns.) I can almost guarantee that if you put too much thought into working against what Cassandra offers out of the box, it will bite you later.

Sure… I'm trying to avoid the 'bite you later' issues. More so because I'm sure there are Cassandra gotchas to worry about. Everything has them. Just trying to avoid the land mines :-P

In fact, the use case that you're describing may best be served by a queuing mechanism, using Cassandra only for the underlying store.

Yes… that's what I'm doing. We're using Apollo to fan out the queue, but the writes go back into Cassandra and need to be read out sequentially.

I used this exact same approach in a use case that involved writing over a million events/second to a cluster with no problems. Initially, I thought the ordered partitioner was the way to go too. And I used separate processes to aggregate, conflate, and handle distribution to clients.
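The hot-spot argument in this thread can be demonstrated with a small simulation. This is a hedged sketch, not Cassandra's actual Murmur3 partitioner: md5 merely stands in for any hash partitioner, and `NUM_BUCKETS` mirrors the 100-bucket (0-99) scheme discussed above:

```python
import hashlib
import time

NUM_BUCKETS = 100  # the 0-99 time-bucket scheme discussed in the thread

def bucket_for(ts: float) -> int:
    # Every writer computes the same bucket for the same second,
    # so all writes in that second share one partition key.
    return int(ts) % NUM_BUCKETS

def partition_token(*key_parts) -> int:
    # Stand-in for the cluster's hash partitioner (md5 here, purely illustrative).
    raw = ":".join(str(p) for p in key_parts).encode()
    return int(hashlib.md5(raw).hexdigest(), 16)

now = time.time()
# One time-derived bucket -> a single token for the whole interval (a hot spot):
single = {partition_token(bucket_for(now)) for _ in range(1000)}
# Adding a per-writer component (e.g. a source column) spreads the same
# interval's writes over many distinct tokens, hence many nodes:
spread = {partition_token(bucket_for(now), writer) for writer in range(50)}
```

`single` collapses to one token while `spread` yields one per writer, which is the distinction Colin's "add another column to the partition key" advice turns on.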
high pending compactions
I am using Cassandra 1.1 (sorry, a bit old) and I am seeing a high pending compaction count: pending tasks: 67, while active compaction tasks are not more than 5. I have a 24-CPU machine. Shouldn't I be seeing more compactions? Is this a pattern of high writes and compactions backing up? How can I improve this? Here are my thoughts:
1. Increase memtable_total_space_in_mb
2. Increase compaction_throughput_mb_per_sec
3. Increase concurrent_compactions
Sorry if this was discussed already. Any pointers are much appreciated. Thanks, Kumar
Re: high pending compactions
23

On Sunday, June 8, 2014, S C as...@outlook.com wrote: [original message quoted above]

-- http://twitter.com/tjake
Re: high pending compactions
Have you verified that these aren't stuck compactions? E.g., even under no load, they don't go away? -Bill

On 06/08/2014 12:32 PM, Jake Luciani wrote: [quoted message above]
RE: high pending compactions
How to check if there are any stuck compactions?

Date: Sun, 8 Jun 2014 12:43:45 -0400 From: wkat...@cs.rutgers.edu To: user@cassandra.apache.org Subject: Re: high pending compactions

Have you verified that these aren't stuck compactions? E.g., even under no load, they don't go away? -Bill [earlier messages quoted above]
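One way to answer this is to snapshot `nodetool compactionstats` a few times and check whether the pending count and per-task progress actually move; if they never change even under no load, the compactions may be stuck. A hedged Python sketch of that check follows; the sample output is invented for illustration and the exact 1.1-era layout may differ:

```python
import re

def pending_tasks(compactionstats_output: str) -> int:
    """Pull the pending-task count out of `nodetool compactionstats` output."""
    m = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(m.group(1)) if m else 0

# Invented sample output; in practice, poll for real via:
#   subprocess.check_output(["nodetool", "compactionstats"], text=True)
sample = """pending tasks: 67
compaction type  keyspace  column family  bytes compacted  bytes total  progress
Compaction       ks1       events         123456789        987654321    12.5%
"""

# Take two snapshots some minutes apart; compare pending_tasks() and the
# per-task progress column. No movement under no load suggests stuck tasks.
count = pending_tasks(sample)
```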
Re: Data model for streaming a large table in real time.
Here’s the Jira for the proposal to remove BOP (and OPP), but you can see that there is no clear consensus and that the issue is still open: CASSANDRA-6922 - Investigate if we can drop ByteOrderedPartitioner and OrderPreservingPartitioner in 3.0 https://issues.apache.org/jira/browse/CASSANDRA-6922

You can read the DataStax Cassandra doc for why “Using an ordered partitioner is not recommended”: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePartitionerBOP_c.html “Difficult load balancing... Sequential writes can cause hot spots... Uneven load balancing for multiple tables” -- Jack Krupansky

From: Kevin Burton Sent: Saturday, June 7, 2014 1:27 PM To: user@cassandra.apache.org Subject: Re: Data model for streaming a large table in real time.

I just checked the source and in 2.1.0 it's not deprecated. So it *might* be *being* deprecated, but I haven't seen anything stating that.

On Sat, Jun 7, 2014 at 8:03 AM, Colin colpcl...@gmail.com wrote: I believe ByteOrderedPartitioner is being deprecated, and for good reason. I would look at what you could achieve by using wide rows and Murmur3Partitioner. -- Colin 320-221-9531

On Jun 6, 2014, at 5:27 PM, Kevin Burton bur...@spinn3r.com wrote: We have the requirement to have clients read from our tables while they're being written. Basically, any write that we make to Cassandra needs to be sent out over the Internet to our customers. We also need them to be able to resume, so if they go offline, they can just pick up where they left off. They need to do this in parallel, so if we have 20 Cassandra nodes, they can have 20 readers each efficiently (and without coordination) reading from our tables. Here's how we're planning on doing it. We're going to use the ByteOrderedPartitioner. I'm writing with a primary key of the timestamp; however, in practice, this would yield hotspots. (I'm also aware that time isn't a very good PK in a distributed system, as I can easily have a collision, so we're going to use a scheme similar to a UUID to make it unique per writer.) One node would take all the load, followed by the next node, etc. So my plan to stop this is to prefix a slice ID to the timestamp. This way each piece of content has a unique ID, but the prefix will place it on a node. The slice ID is just a byte… so this means there are 255 buckets in which I can place data. This means I can have clients each start with a slice and a timestamp and page through the data with tokens. This way I can have a client reading with 255 threads from 255 regions in the cluster, in parallel, without any hot spots. Thoughts on this strategy? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
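The slice-prefix scheme described here can be sketched as a byte-ordered key layout. This is an illustrative Python sketch, not the actual Spinn3r code; note that a single prefix byte actually yields 256 distinct slices (0-255), and the per-writer UUID suffix mentioned in the message is omitted for brevity:

```python
import struct

NUM_SLICES = 256  # one prefix byte gives 256 possible slices (0-255)

def make_key(slice_id: int, ts_millis: int) -> bytes:
    """Byte-ordered key: a 1-byte slice prefix, then a big-endian timestamp.

    Under a byte-ordered partitioner, keys sharing a slice prefix stay
    contiguous, so one reader per slice can scan its range in timestamp
    order and resume from wherever it left off.
    """
    return struct.pack(">BQ", slice_id % NUM_SLICES, ts_millis)

# Keys within a slice sort by time; different slices occupy disjoint ranges.
k1 = make_key(7, 1_000)
k2 = make_key(7, 2_000)
k3 = make_key(8, 500)
assert k1 < k2 < k3  # byte order matches (slice, time) order
```

The big-endian packing is what makes lexicographic byte order agree with numeric timestamp order, which is the property the whole resumable-reader design depends on.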
Re: Data model for streaming a large table in real time.
Hey Jack. Thanks for posting this… very helpful. So I guess the status is that it was proposed for deprecation but that proposal didn't reach consensus. Also, this gave me an idea to look at the JIRA to see what's being proposed for 3.0 :) Kevin

On Sun, Jun 8, 2014 at 1:26 PM, Jack Krupansky j...@basetechnology.com wrote: [quoted message above]
Advice on how to handle corruption in system/hints
Hi everyone, We are running some Cassandra clusters (usually 5 nodes with a replication factor of 3), and at least once per day we see some corruption related to a specific sstable in system/hints. (We are using Cassandra version 1.2.16 on RHEL 6.5.) Here is an example of such an exception:

ERROR [CompactionExecutor:1694] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:1694,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
... 23 more

INFO [HintedHandoff:35] 2014-06-08 21:37:33,267 HintedHandOffManager.java (line 296) Started hinted handoff for host: 502a48cd-171b-4e83-a9ad-67f32437353a with IP: /10.210.239.190
ERROR [HintedHandoff:33] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:33,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441)
at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
at java.util.concurrent.FutureTask.report(FutureTask.java:122) at
Re: Object mapper for CQL
Kevin, We are about to release 2.0 of https://github.com/savoirtech/hecate It is an ASL-licensed library that started with Jeff Genender writing a POJO library in Hector for a project we did for Ecuador (essentially all of Ecuador uses this). I extended this with POJO graph stuff like collections and composite-key indexing. James Carman then took this a bit further in Cassidy with some new concepts. A while back I decided to bite the bullet, and my hatred of CQL, and just write the same thing; it started out with a very reflection-heavy and somewhat clunky interface, and James decided to re-write this and incorporate the learnings from Cassidy. - Jeff, James, and I all work together. This library is already in use and has been in use under 30-million-account circumstances as well as quite decent loads. What you see in trunk now under hecate-cql3 is what'll go out as 2.0; it is a new API, and we support single POJO and object graph, column modifiers, an indexer, and everything else we could think of in a library that isn't an ORM but maps data to C*. What will be out in, I think, 2.0.2 is an external indexer very much like Titan and possibly some more real graph (vertices) stuff. We are also looking at a SchemaIdentifier so that we can get back to working with dynamic columns at a decent conceptual speed :) /je

On Jun 8, 2014, at 2:46 AM, DuyHai Doan doanduy...@gmail.com wrote: [quoted message above]
Re: Object mapper for CQL
I would check out Spring Cassandra - most of the Java drivers out there for Cassandra offer very little over the new 2.0 driver from DataStax. Or just use the Java driver 2.0 as is. There's even a light, fluent query-builder DSL if you don't like CQL. Based upon your use case description so far, I don't think you need to get too funky with your data access layer. Whatever you do, make sure the driver you use supports CQL 3 and the native protocol. Thrift, like BOP, will most likely go away at some point in the future. -- Colin 320-221-9531

On Jun 8, 2014, at 8:58 PM, Johan Edstrom seij...@gmail.com wrote: [quoted message above]
Re: Object mapper for CQL
I wasn't responding as a DataStax employee. I have used Hector, Achilles, and a few others as well. The .NET drivers used to have an edge, but that is evaporating as well. I have also built my own mapping layers, but all of that was when the drivers from DataStax weren't there yet. Yes, I work for DataStax. I also speak at meetups and contribute to the community. DataStax doesn't charge for the drivers, by the way. I have seen folks use third-party drivers and end up paying for it down the road. If you're going to consider using a community driver, then I would recommend something that wraps the DataStax drivers, like Netflix does. All I am saying is that sometimes people make using Cassandra more complex than it needs to be and end up introducing a lot of new tech in their initial adoption - this increases the risk of the project. Also, I wouldn't use anything built on Thrift. DataStax has a growing driver team and a growing focus on testing and certification, and if you end up wanting support for your project and are using an unsupported driver, it can make your life more difficult. In response to how quickly I responded: I often try to provide assistance out here - I don't get paid for it, and it's not part of my job. Having close to 5 years of production experience with Cassandra means that I have made all the mistakes out there and probably invented a few of my own. I have watched a lot of the questions Kevin has asked - his project is ambitious for a first dip into Cassandra, I want to see him succeed, and I have given him the same advice I give our customers. -- Colin 320-221-9531

On Jun 8, 2014, at 9:43 PM, Jeff Genender jgenen...@apache.org wrote: Comments inline...

On Jun 8, 2014, at 8:05 PM, Colin colpcl...@gmail.com wrote: I would check out Spring Cassandra - most of the Java drivers out there for Cassandra offer very little over the new 2.0 driver from DataStax. Or just use the Java driver 2.0 as is.

Interesting… the answer came within 7 minutes… from a vendor (DataStax employee)… and terribly opinionated, without data to back it up… I'm just sayin'… ;-) Colin… did you even look at the driver referenced by Johan? If so, that's certainly the fastest code review and driver test I have ever seen. ;-) Perhaps a bit more kindness may be more appropriate? Not a great way to build contributions from the community...

SNIP

Whatever you do, make sure the driver you use supports CQL 3 and the native protocol. Thrift, like BOP, will most likely go away at some point in the future.

Read what Johan stated… "hecate-cql3" - CQL 3. I think a nice look at what was produced may be a good thing for the community, and maybe even DataStax may think it's kinda cool? Jeff Genender Apache Member http://www.apache.org

On Jun 8, 2014, at 8:58 PM, Johan Edstrom seij...@gmail.com wrote: [quoted message above]
Re: Object mapper for CQL
So - you deduced that we were not using the driver, were not datstax friendly and we'd be paying for this down the road? On Jun 8, 2014, at 9:05 PM, Colin colpcl...@gmail.com wrote: I wasn't responding as a Datastax employee. I have used hector, Achilles and a few others as well. The .net drivers used to have an edge, but that is evaporating as well. I have also built my own mapping layers. But all if that was when the drivers from Datastax weren't there yet. Yes, I work for Datastax. I also speak at meetups, and contribute to the community. Datastax doesn't charge for the drivers by the way. I have seen folks use third party drivers and end up paying for it down the road. If you're going to consider using a community driver, then I would recommend something that wraps the Datastax drivers, like netflix does. All I am saying is that sometimes, people make using Casaandra more complex than it needs to be and end up introducing a lot of new tech in their initial adoption-this increases the risk of the project. Also, I wouldn't use anything built on thrift. Datastax has a growing driver team, a growing focus on testing and certification, and if you end up wanting support for your project and are using an unsupported driver, it can make your life more difficult. In response to how quickly responded, I often try to provide assistance out here-I don't get paid for it, and it's not part of my job. Having close to 5 years of production experience with Cassandra means that I have made all the mistakes out there and probably invented a few of my own. I have watched a lot if the questions Kevin has asked-his project is ambitious for a first dip into Cassandra, I want to see him succeed, and have given him the same advice I give our customers. -- Colin 320-221-9531 On Jun 8, 2014, at 9:43 PM, Jeff Genender jgenen...@apache.org wrote: Comments in line... 
On Jun 8, 2014, at 8:05 PM, Colin colpcl...@gmail.com wrote: I would check out Spring Cassandra; most of the Java drivers out there for Cassandra offer very little over the new 2.0 driver from Datastax. Or just use the Java driver 2.0 as is. Interesting… answer came within 7 minutes… from a vendor (Datastax employee)… and terribly opinionated without data to back it up… I'm just sayin'… ;-) Colin… did you even look at the driver referenced by Johan? If so, that's certainly the fastest code review and driver test I have ever seen. ;-) Perhaps a bit more kindness may be more appropriate? Not a great way to build contributions from the community... SNIP Whatever you do, make sure the driver you use supports CQL 3 and the native protocol. Thrift, like BOP, will most likely go away at some point in the future. Read what Johan stated… “hecate-cql3” — CQL 3. I think a nice look at what was produced may be a good thing for the community, and maybe even Datastax may think it's kinda cool? Jeff Genender Apache Member http://www.apache.org -- Colin 320-221-9531 On Jun 8, 2014, at 8:58 PM, Johan Edstrom seij...@gmail.com wrote: Kevin, We are about to release 2.0 of https://github.com/savoirtech/hecate It is an ASL-licensed library that started with Jeff Genender writing a POJO library in Hector for a project we did for Ecuador (essentially all of Ecuador uses this). I extended this with POJO graph stuff like collections and composite key indexing. James Carman then took this a bit further in Cassidy with some new concepts. A while back I decided to bite the bullet, and my hatred of CQL, and just write the same thing. It started out with a very reflection-heavy and somewhat clunky interface; James decided to re-write this and incorporate the learnings from Cassidy. Jeff, James, and I all work together. This library is already in use and has been in use under 30-million-account circumstances as well as quite decent loads.
What you see in trunk now under hecate-cql3 is what'll go out as 2.0. It is a new API; we support single POJO and object graph mapping, column modifiers, an indexer, and everything else we could think of in a library that isn't an ORM but maps data to C*.
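For Kevin's schema-versus-POJO question, one common answer in mapping libraries like the ones discussed here is to derive the CQL schema from the class itself. A hedged sketch of how a schema-from-POJO generator might look — `SchemaGenerator` and its tiny type table are hypothetical, not any library's actual API, and the first declared field is simply assumed to be the partition key:

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Hypothetical sketch: derive a CQL 3 CREATE TABLE statement from a POJO's
// fields. Type coverage is intentionally minimal, and the first declared
// field is assumed to be the partition key.
public class SchemaGenerator {
    static String cqlType(Class<?> t) {
        if (t == String.class) return "text";
        if (t == Long.class || t == long.class) return "bigint";
        if (t == Integer.class || t == int.class) return "int";
        throw new IllegalArgumentException("unmapped type: " + t);
    }

    public static String createTable(Class<?> pojo) {
        StringJoiner cols = new StringJoiner(", ");
        Field[] fields = pojo.getDeclaredFields();
        for (Field f : fields) {
            cols.add(f.getName() + " " + cqlType(f.getType()));
        }
        String pk = fields[0].getName();
        return "CREATE TABLE " + pojo.getSimpleName().toLowerCase()
                + " (" + cols + ", PRIMARY KEY (" + pk + "))";
    }

    public static class User {
        public String username;
        public Long followers;
    }

    public static void main(String[] args) {
        System.out.println(createTable(User.class));
    }
}
```

A production version would use annotations rather than field order to mark partition and clustering keys, since `getDeclaredFields` makes no ordering guarantee.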
Re: Object mapper for CQL
On a second reply I'll provide some docs. We looked at Astyanax (yeah, I didn't like the refactor). We looked at Spring - are you fucking kidding me? We have done quite a bit of work in the ORM arena. * I passionately hate the idea of CQL. * So - I told myself, I need to make this work so I never ever have to work with that. See, I liked Big Table; I loved the idea of modeling without constrained and contrived relations. I was even more of a fan of combining analytics and adjoining vertices. That said - Hecate-CQL3 does address all of the above, as well as a POJO/DAO cache, a table cache, and a "what was changed" store. If you actually think you'll be writing enterprise code at speed using a RowSet, sorry, you need a foam helmet. /je
Re: Object mapper for CQL
Sounds like you've done some great work. But I still think it's a good idea for people new to Cassandra to establish a baseline so that they have something to compare other approaches against. It sounds like we potentially have different views in this regard, but we are still interested in the same thing: helping people be successful using Cassandra. -- Colin 320-221-9531 On Jun 8, 2014, at 10:24 PM, Johan Edstrom seij...@gmail.com wrote: SNIP