Re: Experience with Kubernetes
You can do that with the Mesos scheduler https://github.com/elodina/datastax-enterprise-mesos and lay out clusters and racks for datacenters based on attributes http://mesos.apache.org/documentation/latest/attributes-resources/

~ Joe Stein

On Apr 14, 2016 12:05 PM, "Nate McCall" wrote:

> > Does anybody here have any experience, positive or negative, with deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any immediate need (or experience), but I am curious about the pros and cons.
>
> The last time I played around with kubernetes+cassandra, you could not specify node allocations across failure boundaries (AZs, Regions, etc).
>
> To me, that makes it not interesting outside of development or trivial setups.
>
> It does look like they are getting farther along on "ubernetes" which should fix this:
>
> https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md
>
> --
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
Re: Experience with Kubernetes
You can use Mesos https://github.com/elodina/datastax-enterprise-mesos

~ Joe Stein

On Apr 14, 2016 10:13 AM, "Jack Krupansky" wrote:

> Does anybody here have any experience, positive or negative, with deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any immediate need (or experience), but I am curious about the pros and cons.
>
> There is an example here:
> https://github.com/kubernetes/kubernetes/tree/master/examples/cassandra
>
> Is there a better approach to deploying a Cassandra/DSE cluster than Kubernetes?
>
> Thanks.
>
> -- Jack Krupansky
Re: Scala Driver?
Here is an example wrapper showing how to use the DataStax Java driver in Scala: https://github.com/stealthly/scala-cassandra

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
***/

On Mar 15, 2014, at 1:42 PM, NORD SC jan.algermis...@nordsc.com wrote:

> Hi all,
>
> I am building a system using the Play 2 Framework with Scala and wonder what driver I should use, or whether I should wrap the DataStax java-driver myself? Can you share any experience with the available drivers? I have looked at Phantom briefly, which seems to use java-driver internally - would that be the best choice?
>
> Jan
Re: Queuing System
If performance and availability for messaging is a requirement then use Apache Kafka http://kafka.apache.org/ You can pass the same thrift/avro objects through the Kafka commit log, or strings, or whatever you want.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
***/

On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

> Hi Michael,
>
> Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which will give better performance: writing directly into Rabbit with ack support, vs. a temporary queue in Cassandra first and then dequeuing and publishing to Rabbit. Complexities involved - handling scenarios like Rabbit connection failure, etc. vs. Cassandra write performance and replication with hinted handoff support - make me wonder which is the better path.
>
> Regards, Jagan

On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing michael.la...@nytimes.com wrote:

> We use RabbitMQ for queuing and Cassandra for persistence. RabbitMQ with clustering and/or federation should meet your high availability needs.
>
> Michael

On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:

> Jagan - Queue-like data structures are known to be one of the worst anti-patterns for Cassandra: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

> Hi,
>
> I need to decouple some of the work being processed from the user thread to provide a better user experience. For that I need a queuing system with the following needs:
>
> - High Availability
> - No Data Loss
> - Better Performance
>
> Following are some libraries that were considered, along with the limitations I see:
>
> - Redis - data loss
> - ZooKeeper - not advised for a queue system
> - TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform better
>
> With the replication requirement, I probably have to look at Apache ActiveMQ+LevelDB.
> After checking on the third option above, I kind of wonder if Cassandra with Leveled Compaction offers a similar system. Do you see any issues in such a usage, or are there other better solutions available? Will be great to get insights on this.
>
> Regards, Jagan
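The decoupling Jagan is after - the user-facing thread only enqueues, while background workers dequeue, run business logic, and persist - can be sketched language-agnostically. Below is a minimal Python illustration using an in-memory queue as a stand-in for the broker (RabbitMQ/Kafka) and a dict as a stand-in for the persistence layer (Cassandra); every name here is illustrative, not from any of the libraries discussed in the thread.

```python
import queue
import threading

# In-memory stand-in for the broker (RabbitMQ/Kafka in the thread above).
work_queue = queue.Queue()

# Stand-in for the persistence layer (Cassandra in the thread above).
datastore = {}
store_lock = threading.Lock()

SENTINEL = object()

def user_thread(task_ids):
    """The user-facing path only enqueues; it never blocks on processing."""
    for task_id in task_ids:
        work_queue.put(task_id)
    work_queue.put(SENTINEL)  # signal shutdown for this sketch

def worker():
    """Background consumer: dequeue, do the heavy work, persist the result."""
    while True:
        task_id = work_queue.get()
        if task_id is SENTINEL:
            break
        result = task_id * 2  # placeholder for real business logic
        with store_lock:
            datastore[task_id] = result

t = threading.Thread(target=worker)
t.start()
user_thread(range(5))
t.join()
print(sorted(datastore.items()))  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

The durability and availability trade-offs debated in the thread live entirely in what replaces `work_queue`: a real broker gives you acks and replication, while the in-memory queue here loses everything on a crash.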
Re: Queuing System
Without them you have no durability. With them you have guarantees... more than any other system with messaging features. It is a durable CP commit log. It works very well for data pipelines with AP systems like Cassandra, which is a different system solving different problems. When a Kafka leader fails you might block and wait for ~10ms while a new leader is elected, but writes can be guaranteed. The consumers then read and process data and write to Cassandra. And then have your app read from Cassandra for what was processed. These are very typical architectures at scale: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
***/

On Feb 22, 2014, at 11:49 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

> Hi Joe,
>
> If my understanding is right, Kafka does not satisfy the high availability/replication part well because of the need for a leader and in-sync replicas.
>
> Regards, Jagan

On Sat, 22 Feb 2014 22:02:27 +0530 Joe Stein crypt...@gmail.com wrote:

> If performance and availability for messaging is a requirement then use Apache Kafka http://kafka.apache.org/ You can pass the same thrift/avro objects through the Kafka commit log, or strings, or whatever you want.

On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

> Hi Michael,
>
> Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which will give better performance: writing directly into Rabbit with ack support, vs. a temporary queue in Cassandra first and then dequeuing and publishing to Rabbit. Complexities involved - handling scenarios like Rabbit connection failure, etc. vs. Cassandra write performance and replication with hinted handoff support - make me wonder which is the better path.
>
> Regards, Jagan

On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing michael.la...@nytimes.com wrote:

> We use RabbitMQ for queuing and Cassandra for persistence. RabbitMQ with clustering and/or federation should meet your high availability needs.
>
> Michael

On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:

> Jagan - Queue-like data structures are known to be one of the worst anti-patterns for Cassandra: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:

> Hi,
>
> I need to decouple some of the work being processed from the user thread to provide a better user experience. For that I need a queuing system with the following needs:
>
> - High Availability
> - No Data Loss
> - Better Performance
>
> Following are some libraries that were considered, along with the limitations I see:
>
> - Redis - data loss
> - ZooKeeper - not advised for a queue system
> - TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform better
>
> With the replication requirement, I probably have to look at Apache ActiveMQ+LevelDB. After checking on the third option above, I kind of wonder if Cassandra with Leveled Compaction offers a similar system. Do you see any issues in such a usage, or are there other better solutions available? Will be great to get insights on this.
>
> Regards, Jagan
Using tab in CQL COPY DELIMITER
Hi, trying to use a tab delimiter when copying out of C* (2.0.4) and getting an error:

cqlsh:test> CREATE TABLE airplanes (
        ...   name text PRIMARY KEY,
        ...   manufacturer ascii,
        ...   year int,
        ...   mach float
        ... );
cqlsh:bombast> INSERT INTO airplanes (name, manufacturer, year, mach) VALUES ('P38-Lightning', 'Lockheed', 1937, 7);
cqlsh:bombast> COPY airplanes (name, manufacturer, year, mach) TO 'temp.tsv' WITH DELIMITER = '\t';
delimiter must be an 1-character string

Any ideas how to use tabs as a delimiter? Thanks

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
***/
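A plausible explanation for the error above (the exact cqlsh parsing behavior in 2.0.4 is an assumption here, not confirmed in the thread) is that the shell receives `'\t'` as the two literal characters backslash and `t` rather than interpreting it as one tab character. The difference is easy to see in Python:

```python
# What the shell likely saw from  DELIMITER = '\t' : backslash followed by 't',
# i.e. a 2-character string, hence "delimiter must be an 1-character string".
literal = r"\t"   # backslash + 't'
real_tab = "\t"   # an actual single tab character

print(len(literal))   # 2
print(len(real_tab))  # 1

# Joining fields with a real tab produces a valid TSV line.
row = ["P38-Lightning", "Lockheed", "1937", "7"]
tsv_line = real_tab.join(row)
print(tsv_line.split("\t")[1])  # Lockheed
```

A common workaround in shells with this behavior is to embed a literal tab character inside the quotes instead of the `\t` escape, or export with a placeholder delimiter and convert afterwards.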
Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.
I updated my repo with Vagrant and bash scripts to install Cassandra 2.0.3 https://github.com/stealthly/scala-cassandra/

0) git clone https://github.com/stealthly/scala-cassandra
1) cd scala-cassandra
2) vagrant up

Cassandra will be running in the virtual machine on 172.16.7.2 and is accessible from your host machine (cqlsh, your app, whatever). To verify, step 3 would be ./sbt test just to make sure everything is running right. Every time you rebuild the VM (takes a minute or two) it is a whole new instance. If you fork a foreground process you have to worry about data that is not isolated, and other stuff.

On Fri, Dec 27, 2013 at 10:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

> I think I will invest the time launching Cassandra in a forked foreground process, maybe building the yaml dynamically.

On Friday, December 27, 2013, Nate McCall n...@thelastpickle.com wrote:

> I've also moved on to a container-based (Vagrant+Docker) setup for doing automated integration stuff. This is more difficult to configure for build systems like Jenkins, but it can be done, and once completed the benefits are substantial - as Joe notes, the most immediate is the removal of variance between different environments.
>
> However, for in-process testing with Maven or similar, the Usergrid project [0] probably has the most functionally advanced test architecture [1]. Do understand that it took us a very long time to get there and involves some fairly tight integration with JUnit and (to a lesser degree) Maven. The UG plumbing is purpose-built towards a specific data model so it's not something that can be just dropped in, but it can be pulled apart in a straightforward way (provided you understand JUnit - which is not really trivial) and generalized pretty easily. It's all ASF-licensed, so take what you need if you find it useful.
> [0] https://usergrid.incubator.apache.org/
> [1] https://github.com/usergrid/usergrid/blob/master/stack/test-utils/src/main/java/org/usergrid/cassandra/CassandraResource.java

On Wed, Dec 25, 2013 at 2:42 PM, Joe Stein crypt...@gmail.com wrote:

> I have been using Vagrant (e.g. https://github.com/stealthly/scala-cassandra/ ) which is 100% reproducible across devs and test systems (prod in some cases). Also have a Docker setup too: https://github.com/pegasussolutions/docker-cassandra . I have been doing this more and more with clients to better mimic production before production and to smooth the release process from development. I also use Packer (scripts released soon) to build images too (http://packer.io). Love Vagrant, Packer and Docker!!! Apache Mesos too :)

On Dec 25, 2013, at 3:28 PM, horschi hors...@gmail.com wrote:

> Hi Ed,
>
> my opinion on unit testing with C* is: use the real database, not any embedded crap :-) All you need are fast truncates, by which I mean:
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"
>
> and
>
> auto_snapshot: false
>
> This setup works really nicely for me (C* 1.1 and 1.2, have not tested 2.0 yet). Imho this setup is better for multiple reasons:
>
> - No extra classpath issues
> - Faster: running JUnits and C* in one JVM would require a really large heap (for me at least)
> - Faster: no Cassandra startup every time I run my tests
>
> The only downside is that developers must change the properties in their configs.
>
> cheers, Christian

On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

> I am not sure how many people have been around developing Cassandra for as long as I have, but the state of all the client libraries and the cassandra server is WORD_I_DONT_WANT_TO_SAY.
> Here is an example of something I am seeing:
>
> ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
> java.lang.AbstractMethodError: org.apache.thrift.ProcessFunction.isOneway()Z
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
>
> In short: if you are new to Cassandra and only using the newest client, I am sure everything is peachy for you. For people that have been using Cassandra for a while, it is harder to jump ship when something better comes along. You sometimes need to support both Hector and Astyanax; it happens. For a while I have been using Hector. Even not to use Hector
Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.
I have been using Vagrant (e.g. https://github.com/stealthly/scala-cassandra/ ) which is 100% reproducible across devs and test systems (prod in some cases). Also have a Docker setup too: https://github.com/pegasussolutions/docker-cassandra . I have been doing this more and more with clients to better mimic production before production and to smooth the release process from development. I also use Packer (scripts released soon) to build images too (http://packer.io). Love Vagrant, Packer and Docker!!! Apache Mesos too :)

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
***/

On Dec 25, 2013, at 3:28 PM, horschi hors...@gmail.com wrote:

> Hi Ed,
>
> my opinion on unit testing with C* is: use the real database, not any embedded crap :-) All you need are fast truncates, by which I mean:
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"
>
> and
>
> auto_snapshot: false
>
> This setup works really nicely for me (C* 1.1 and 1.2, have not tested 2.0 yet). Imho this setup is better for multiple reasons:
>
> - No extra classpath issues
> - Faster: running JUnits and C* in one JVM would require a really large heap (for me at least)
> - Faster: no Cassandra startup every time I run my tests
>
> The only downside is that developers must change the properties in their configs.
>
> cheers, Christian

On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

> I am not sure how many people have been around developing Cassandra for as long as I have, but the state of all the client libraries and the cassandra server is WORD_I_DONT_WANT_TO_SAY.
> Here is an example of something I am seeing:
>
> ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
> java.lang.AbstractMethodError: org.apache.thrift.ProcessFunction.isOneway()Z
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
>
> In short: if you are new to Cassandra and only using the newest client, I am sure everything is peachy for you. For people that have been using Cassandra for a while, it is harder to jump ship when something better comes along. You sometimes need to support both Hector and Astyanax; it happens. For a while I have been using Hector. Even not using Hector as an API, the one nice thing I got from Hector was a simple EmbeddedServer that would clean up after itself. Hector seems badly broken at the moment. I have no idea how the current versions track with anything out there in the cassandra world. For a while I played with https://github.com/Netflix/astyanax, which has its own versions and schemes and dependent libraries. (astyanax has some packaging error that forces me into maven3) Enter Cassandra 2.0, which forces you onto Java 7. Besides that, it has its own kit of things it seems to want. I am guessing, since Hector's embedded server does not work, that I should go to https://github.com/jsevellec/cassandra-unit - not sure... really... how anyone does this anymore. I am sure I could dive into the source code and figure this out, but I would just rather have a stable piece of code that brings up the embedded server that just works and continues working.
> I can not seem to get this working right either (since it includes Hector, I see from the pom). Between thrift, cassandra, and client X, it is almost impossible to build a sane classpath, and that is not even counting the fact that people have their own classpath issues (with guava mismatches etc).
>
> I think the only sane thing to do is start shipping cassandra-embedded like this: https://github.com/kstyrc/embedded-redis
>
> In other words, package embedded-cassandra as a binary. Don't force the client/application developer to bring cassandra onto the classpath and fight with mismatches in thrift/guava etc. That, or provide a completely shaded cassandra server for embedded testing. As it stands now, trying to support a setup that uses more than one client or works with multiple versions of cassandra is a major pita (aka library X compiled against 1.2.0, library Y compiled against 2.0.3).
>
> Does anyone have any thoughts on this, or tried something similar?
>
> Edward
Re: Cassandra book/tuturial
http://www.planetcassandra.org has a lot of great resources on it.

Eben Hewitt's book is great, as are the other C* books like the High Performance Cookbook http://www.amazon.com/Cassandra-Performance-Cookbook-Edward-Capriolo/dp/1849515123

I would recommend reading both of those books. You can also read http://www.datastax.com/dev/blog/thrift-to-cql3 to help your understanding. From there, go with CQL http://cassandra.apache.org/doc/cql3/CQL.html

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
***/

On Sun, Oct 27, 2013 at 11:58 PM, Mohan L l.mohan...@gmail.com wrote:

> And here is also a good intro: http://10kloc.wordpress.com/category/nosql-2/
>
> Thanks, Mohan L

On Mon, Oct 28, 2013 at 8:02 AM, Danie Viljoen dav...@gmail.com wrote:

> Not a book, but I think this is a good start: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html

On Mon, Oct 28, 2013 at 3:14 PM, Dave Brosius dbros...@mebigfatguy.com wrote:

> Unfortunately, as tech books tend to be, it's quite a bit out of date at this point.

On 10/27/2013 09:54 PM, Mohan L wrote:

> On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:
>
> > Hey Guys, What is the best book to learn Cassandra from scratch?
> > Thanks in advance, Erwin
>
> Hi, Buy: Cassandra: The Definitive Guide by Eben Hewitt: http://shop.oreilly.com/product/0636920010852.do
>
> Thanks, Mohan L
Re: Cassandra book/tuturial
Reading a previous version's documentation and related information from that time in the past (like books) has value! It helps to understand decisions that were made and changed, and some that are still the same, like Secondary Indexes, which were introduced in 0.7 when http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412 came out back in 2011.

If you are really just getting started, then I say go and start here: http://www.planetcassandra.org/

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
***/

On Mon, Oct 28, 2013 at 12:15 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

> With a lot of enthusiasm I started reading it. It's out-dated and error-prone. I could not even get Cassandra running from that book. Eventually I could not get started with Cassandra.

On Mon, Oct 28, 2013 at 9:41 AM, Joe Stein crypt...@gmail.com wrote:

> http://www.planetcassandra.org has a lot of great resources on it. Eben Hewitt's book is great, as are the other C* books like the High Performance Cookbook http://www.amazon.com/Cassandra-Performance-Cookbook-Edward-Capriolo/dp/1849515123 I would recommend reading both of those books. You can also read http://www.datastax.com/dev/blog/thrift-to-cql3 to help your understanding. From there, go with CQL http://cassandra.apache.org/doc/cql3/CQL.html

On Sun, Oct 27, 2013 at 11:58 PM, Mohan L l.mohan...@gmail.com wrote:

> And here is also a good intro: http://10kloc.wordpress.com/category/nosql-2/
>
> Thanks, Mohan L

On Mon, Oct 28, 2013 at 8:02 AM, Danie Viljoen dav...@gmail.com wrote:

> Not a book, but I think this is a good start: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html

On Mon, Oct 28, 2013 at 3:14 PM, Dave Brosius dbros...@mebigfatguy.com wrote:

> Unfortunately, as tech books tend to be, it's quite a bit out of date at this point.

On 10/27/2013 09:54 PM, Mohan L wrote:

> On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:
>
> > Hey Guys, What is the best book to learn Cassandra from scratch?
> > Thanks in advance, Erwin
>
> Hi, Buy: Cassandra: The Definitive Guide by Eben Hewitt: http://shop.oreilly.com/product/0636920010852.do
>
> Thanks, Mohan L

--
Deepak
Re: Cassandra Geospatial Search
What about using geohashes? http://geohash.org/dr5ru2mevjppe

Store the geohashes as column names:

geohash#dr5ru2mevjppe
geohash#dr5ru2mevjpp
geohash#dr5ru2mevjp
geohash#dr5ru2mevj
geohash#dr5ru2mev
geohash#dr5ru2me
geohash#dr5ru2m
geohash#dr5ru2
geohash#dr5ru
geohash#dr5

The rows are what you want to return. Do a MultigetSliceQuery like this: https://github.com/joestein/skeletor/blob/master/src/test/scala/skeletor/SkeletorSpec.scala#L171

In the column value you can hold some JSON objects or more serialization on relationships from there, maybe a persisted graph structure.

Here are my slides on how we do this and what for: http://files.meetup.com/1794037/jstein.meetup.cassandra2002.pptx

On Wed, Feb 13, 2013 at 8:42 PM, Drew Kutcharian d...@venarc.com wrote:

> Hi Guys,
>
> Has anyone on this mailing list tried to build a bounding box style (get the records inside a known bounding box) geospatial search? I've been researching this a bit and it seems like the only attempt at this was by the SimpleGeo guys, but there isn't much public info out there on how they did it besides a video.
>
> -- Drew

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
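The prefix trick above works because a geohash of a point shares a leading prefix with the geohashes of nearby points. A minimal sketch of the standard geohash encoding plus the progressively shorter prefixes you would store as column names (the encoder is a textbook implementation, and names like `geohash_encode` and `column_names` are illustrative, not from Skeletor or the thread):

```python
# Minimal geohash encoder (standard algorithm; base32 alphabet omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=12):
    """Interleave longitude/latitude bisection bits, 5 bits per base32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True  # a geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

def column_names(geohash, min_len=3):
    """The progressively shorter prefixes stored as column names above."""
    return ["geohash#" + geohash[:i] for i in range(len(geohash), min_len - 1, -1)]

print(geohash_encode(57.64911, 10.40744, 6))  # u4pruy (well-known test vector)
print(column_names("dr5ru", min_len=3))
```

Truncating a geohash always yields a containing cell, so a bounding-box query reduces to slicing on a shared prefix of the right length.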
Re: hector timeouts
Lots of folks use Apache Kafka, check out https://cwiki.apache.org/confluence/display/KAFKA/Powered+By just to name a few. You can read about the performance for yourself: http://incubator.apache.org/kafka/performance.html

At http://www.medialets.com we use Kafka upstream of Cassandra, acting like a queue so our workers can do their business logic prior to storing their results in Cassandra, Hadoop, MySQL. This decouples our backend analytics from our forward-facing system, keeping our forward-facing system (ad serving to mobile devices) as fast as possible and our backend results near realtime (seconds from data coming in).

Here are some papers and presentations: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

On Mon, Jul 2, 2012 at 10:09 PM, Deno Vichas d...@syncopated.net wrote:

> is anybody using kafka? what other options are there? currently i need to do around 50,000 (is that a lot?) a minute.

On 7/1/2012 11:39 AM, aaron morton wrote:

> Using Cassandra as a queue is generally thought of as a bad idea, owing to the high delete workload. Levelled compaction handles it better but it is still not the best approach. Depending on your needs, consider running http://incubator.apache.org/kafka/
>
> > could you share some details on this? we're using hector and we see random timeout warns in the logs and not sure how to address them.
>
> First determine if they are server side or client side timeouts. Then determine what the query was.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com

On 29/06/2012, at 7:02 AM, Deno Vichas wrote:

> On 6/28/2012 9:37 AM, David Leimbach wrote:
>
> > That coupled with Hector timeout issues became a real problem for us.
>
> could you share some details on this? we're using hector and we see random timeout warns in the logs and not sure how to address them.
>
> thanks, deno

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
Re: node.js library?
Thanks Eric!

On Wed, Dec 7, 2011 at 8:37 AM, Eric Evans eev...@acunu.com wrote:

> On Mon, Dec 5, 2011 at 8:26 AM, Joe Stein crypt...@gmail.com wrote:
>
> > Hey folks, so I have been noodling on using node.js as a new front end for the system I built for doing real time aggregate metrics within our distributed systems. Does anyone have experience or a background story on this lib? http://code.google.com/a/apache-extras.org/p/cassandra-node/ It seems to be the most up to date one, supporting CQL only (which should not be an issue), but I was not sure if it is maintained or what the background story is on it and such? Any other experiences/horror stories/over the rainbow type stories with node.js + C* would be nice to hear.
>
> This one is actively maintained, and (as far as I know) is being used in production at Rackspace.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
Re: CQL Install for 0.8.X?
Thanks Eric!

On Mon, Dec 5, 2011 at 10:38 PM, Eric Evans eev...@acunu.com wrote:

> On Mon, Dec 5, 2011 at 1:40 PM, Joe Stein crypt...@gmail.com wrote:
>
> > Hey, trying to grab cqlsh for a 0.8.6 cluster but all the online docs I am finding are pointing to http://www.apache.org/dist/cassandra/drivers/py or say it moved to the source, and I checked there in the 0.8 branch and nothing either...
>
> Right, the drivers were moved and the site wasn't updated (until just now). The Python driver is now hosted on Apache Extras, here: http://code.google.com/a/apache-extras.org/hosting/search?q=label:cql
>
> But, that driver probably won't work right with 0.8.6, and the shell has been moved out of the driver and into Cassandra trunk anyway. What you probably want is 1.0.3 from here: http://archive.apache.org/dist/cassandra/drivers/py
>
> Sorry, I know, it's a confusing mess.
>
> > also saw something about 2.0 not being compatible with 0.8.X so not sure where to go from here.
>
> The language incompatibilities are pretty minor, but there were some changes to the results format that will prevent a new driver (for 1.x) from working on an older Cassandra (0.8.x).
>
> > Let me know, want/need to jump into a bunch of CQL stuff and want to do it in cqlsh first if I can.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
node.js library?
Hey folks, so I have been noodling on using node.js as a new front end for the system I built for doing real time aggregate metrics within our distributed systems.

Does anyone have experience or a background story on this lib? http://code.google.com/a/apache-extras.org/p/cassandra-node/

It seems to be the most up to date one, supporting CQL only (which should not be an issue), but I was not sure if it is maintained or what the background story is on it and such? Any other experiences/horror stories/over the rainbow type stories with node.js + C* would be nice to hear.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
CQL Install for 0.8.X?
Hey, trying to grab cqlsh for a 0.8.6 cluster but all the online docs I am finding are pointing to http://www.apache.org/dist/cassandra/drivers/py or say it moved to the source, and I checked there in the 0.8 branch and nothing either...

Also saw something about 2.0 not being compatible with 0.8.X, so not sure where to go from here.

Let me know, want/need to jump into a bunch of CQL stuff and want to do it in cqlsh first if I can. Thanks!!!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
Counter Experience (Performance)?
Hey folks, I am interested in what others have seen in regards to their experience with the amount of depth and width (CF, Rows, Columns) that they can/do write per batch and simultaneously, and what the inflection point is where performance degrades.

I have been expanding my use of counters and am finding some interesting nuances - some in my code and implementation related, but others I can't yet quantify.

My batches are 1x5x5 (1 row for each of 5 column families and 5 columns for each of those 1 rows within each of the 5 column families). I have 3 nodes, each with 100 connections, and another thread pool of 100 threads rolling through 6,000,000 rows of data, sending data out to Cassandra (the 1x5x5 matrix is constructed from each line).

I am finding this to be my sweet spot right now but still not really performing fantastically (or at least what I had hoped), and I am wondering what else (if anything) I can be doing to tweak settings, or what, to be able to push in more columns or rows. I find changing my pool settings too much from this causes errors in the client lib, but I will send email to that list separately, though I think I have that figured out on my own for now.

Thanks in advance!!! I hope to get more work going on this in the next day or so in a more methodic way to find the right count so I can build a sparse matrix that will perform best for system and business.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
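The 1x5x5 batch shape described above can be sketched as follows - one row key per column family, five counter columns per row, fanned out over a thread pool. This is purely a shape/concurrency illustration with an in-memory dict standing in for Cassandra counters; the CF and column names are made up, and none of this is Hector or Cassandra API.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
import threading

# Stand-in counter store; a real setup would batch-mutate Cassandra counters.
counters = defaultdict(int)
lock = threading.Lock()

COLUMN_FAMILIES = ["cf%d" % i for i in range(5)]   # 5 CFs (names are made up)
COLUMNS = ["c%d" % i for i in range(5)]            # 5 columns per row

def build_batch(line_no):
    """The 1x5x5 shape: one row key per CF, five counter columns per row."""
    row_key = "row-%d" % line_no
    return [(cf, row_key, col) for cf in COLUMN_FAMILIES for col in COLUMNS]

def write_batch(batch):
    """Apply all 25 increments of one batch; lock stands in for a batch mutate."""
    with lock:
        for cf, row, col in batch:
            counters[(cf, row, col)] += 1

# 100 workers mirrors the thread pool in the message; 1000 lines keeps it quick.
with ThreadPoolExecutor(max_workers=100) as pool:
    for _ in pool.map(write_batch, (build_batch(i) for i in range(1000))):
        pass

print(len(counters))  # 1000 lines * 5 CFs * 5 columns = 25000 distinct counters
```

Varying the batch dimensions and worker count in a harness like this is one way to hunt for the inflection point the message asks about before involving real hardware.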
Re: Counter Experience (Performance)?
Thanks Jake, the bottleneck is the disk, I believe; each write is taking 50ms, probably EBS (I am doing the testing in EC2). I will move my testing over to our production network and run it on some nodes with real hardware, since that is where it will end up. I am seeing things slow down linearly, with nothing dropping off precipitously. Glad to have the benchmarks I now have to compare things against. Thanks!

On Thu, Oct 27, 2011 at 11:30 AM, Jake Luciani jak...@gmail.com wrote: What's your bottleneck? http://spyced.blogspot.com/2010/01/linux-performance-basics.html

On Thu, Oct 27, 2011 at 9:37 AM, Joe Stein crypt...@gmail.com wrote: [original message quoted above, snipped]

-- http://twitter.com/tjake

-- /* Joe Stein http://www.linkedin.com/in/charmalloc Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop */
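The 50ms-per-write figure combined with a 100-connection pool implies a hard throughput ceiling. A hedged back-of-envelope (the latency and pool numbers are taken from this thread, not measured independently):

```java
public class ThroughputEstimate {
    static final double WRITE_LATENCY_MS = 50.0; // observed per-write latency on EBS (from the thread)
    static final int CONNECTIONS = 100;          // client connection pool size described above

    // Each connection completes 1000/50 = 20 synchronous writes per
    // second, so the pool's ceiling is connections * (1000 / latency).
    static double batchesPerSecond() {
        return CONNECTIONS * (1000.0 / WRITE_LATENCY_MS);
    }

    // One batch per source row, running flat out at the ceiling rate.
    static double fullRunSeconds(long totalBatches) {
        return totalBatches / batchesPerSecond();
    }

    public static void main(String[] args) {
        System.out.printf("ceiling: %.0f batches/s%n", batchesPerSecond());
        System.out.printf("full run: ~%.0f minutes%n",
                fullRunSeconds(6_000_000L) / 60.0);
    }
}
```

At 50ms per write no amount of client-side tuning gets past ~2000 batches/s from a 100-connection pool, which supports the diagnosis that the disk (EBS), not the pool settings, is the limit.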
Skeletor = Scala wrapper of Hector for Cassandra
Hey folks, I pushed my Scala wrapper of Hector for Cassandra: https://github.com/joestein/skeletor

It not only gets Cassandra hooked into your Scala projects quickly and simply, but does so in a functional way. It is not a new library interface for Cassandra, because Hector is a great library as-is. Instead, Skeletor wraps Hector, so you always have the best of breed under the hood for using Cassandra while also leveraging all of the benefits that Scala offers over Java (ok, that was so many buzz words in one sentence that I just vomited in my mouth a little bit, but it is all true).

Right now the examples are in the test specs for reading and writing (both Counter and UTF8 type column families). Basically it is a DSL:

    // for writing
    val TestColumnFamily = "FixtureTestSkeletor" \ "TestColumnFamily" // define your Keyspace \ ColumnFamily
    var cv = (TestColumnFamily -> "rowKey" has "columnName" of "columnValue") // create a column value for a row in this column family
    var rows: Rows = Rows(cv) // add the row to the rows object
    rows add (TestColumnFamily -> "rowKey" has "anotherColumnName" of "anotherColumnValue") // and add another row
    Cassandra << rows // takes care of all the batch mutate for ya

    // and for reading
    def processRow(r: String, c: String, v: String) = {
      println("r=" + r + " c=" + c + " with " + v) // whatever you want to do
    }
    def sets(mgsq: MultigetSliceQuery[String, String, String]) {
      mgsq.setKeys("columnName")        // we want to pull out the row key we just put into Cassandra
      mgsq.setColumnNames("columnValue") // and just this column
    }
    TestColumnFamily (sets, processRow) // get data out of Cassandra and process it functionally

I will put more up on the wiki, post more examples of where/how I have been using it, and evolve it as I go.
Again, for now, the test specs are the place to start https://github.com/joestein/skeletor/blob/master/src/test/scala/skeletor/SkeletorSpec.scala Thanx =) Joestein /* Joe Stein http://www.linkedin.com/in/charmalloc Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop */
Re: Cassandra Certification
Certification is good when a community gets to the point that the proverbial management cannot easily discern between posers and those who know what they are talking about. I hope one day Cassandra and its community grow to that point, but as of now there is enough transparency, in my opinion. I would no more get a Cassandra certification than I would get one from Cloudera for Hadoop (no offense), nor even a CISSP (which I could also do).

I would rather see a certification in scalable distributed computing solutions, along the lines of what the CSA (Cloud Security Alliance) has done with security. Cassandra is the answer in a lot of situations, but not always the answer. It is probably one of the best tools in your toolbox. As the saying goes, to a man with a hammer every problem is a nail. DON'T BE THAT GUY.

My .02121513E9 cents

/* Joe Stein Chief Architect @medialets http://www.medialets.com http://www.linkedin.com/in/charmalloc Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop */

On Mon, Aug 15, 2011 at 1:23 AM, samal sa...@wakya.in wrote: Does it really make sense? If yes, I think the Apache Cassandra Project (ASF) should offer Open Certification. Other entities can offer courses and training materials.