Re: Experience with Kubernetes

2016-04-14 Thread Joe Stein
You can do that with the Mesos scheduler
https://github.com/elodina/datastax-enterprise-mesos and lay out clusters
and racks for datacenters based on attributes
http://mesos.apache.org/documentation/latest/attributes-resources/

~ Joestein
On Apr 14, 2016 12:05 PM, "Nate McCall"  wrote:

>
> Does anybody here have any experience, positive or negative, with
>> deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any
>> immediate need (or experience), but I am curious about the pros and cons.
>>
>>
>
> The last time I played around with kubernetes+cassandra, you could not
> specify node allocations across failure boundaries (AZs, Regions, etc).
>
> To me, that makes it not interesting outside of development or trivial
> setups.
>
> It does look like they are getting farther along on "ubernetes" which
> should fix this:
>
> https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md
>
>
>
> --
> -
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Experience with Kubernetes

2016-04-14 Thread Joe Stein
You can use Mesos https://github.com/elodina/datastax-enterprise-mesos

~ Joestein
On Apr 14, 2016 10:13 AM, "Jack Krupansky"  wrote:

> Does anybody here have any experience, positive or negative, with
> deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any
> immediate need (or experience), but I am curious about the pros and cons.
>
> There is an example here:
> https://github.com/kubernetes/kubernetes/tree/master/examples/cassandra
>
> Is there a better approach to deploying a Cassandra/DSE cluster than
> Kubernetes?
>
> Thanks.
>
> -- Jack Krupansky
>


Re: Scala Driver?

2014-03-15 Thread Joe Stein
Here is an example wrapper showing how to use the DataStax java driver in Scala:
https://github.com/stealthly/scala-cassandra
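
For a rough feel of what calling the java driver from Scala looks like before
digging into that repo, here is a minimal sketch; the contact point, keyspace,
and table are placeholders, not anything from scala-cassandra itself:

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    object DriverSketch extends App {
      // connect through one contact node; the driver discovers the rest of the ring
      val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
      val session = cluster.connect("demo")   // "demo" keyspace assumed to exist

      // plain CQL3 through the driver; the table and columns are illustrative only
      val rs = session.execute("SELECT name, year FROM airplanes")
      rs.all().asScala.foreach(row =>
        println(row.getString("name") + " " + row.getInt("year")))

      cluster.close()   // close() on driver 2.x (shutdown() on 1.x)
    }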


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


 On Mar 15, 2014, at 1:42 PM, NORD SC jan.algermis...@nordsc.com wrote:
 
 Hi all,
 
 I am building a system using the Play 2 Framework with Scala and wonder what 
 driver I should use or whether I should wrap the Datastax java-driver myself?
 
 Can you share any experience with the available drivers? I have looked at 
 Phantom briefly, which seems to use the java-driver internally - would that be the 
 best choice?
 
 Jan
 
 


Re: Queuing System

2014-02-22 Thread Joe Stein
If performance and availability for messaging is a requirement then use Apache 
Kafka http://kafka.apache.org/

You can pass the same thrift/avro objects through the Kafka commit log or 
strings or whatever you want.
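
As a concrete (and hedged) illustration, here is roughly what pushing
already-serialized payloads through the 0.8-era Scala producer API looks like;
the broker address and topic name are placeholders:

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object KafkaPassThrough {
      val props = new Properties()
      props.put("metadata.broker.list", "broker1:9092")                  // placeholder broker
      props.put("serializer.class", "kafka.serializer.DefaultEncoder")   // raw bytes for the payload
      props.put("key.serializer.class", "kafka.serializer.StringEncoder")

      val producer = new Producer[String, Array[Byte]](new ProducerConfig(props))

      // the payload is whatever you already serialize (Thrift/Avro bytes, a String, etc.)
      def send(key: String, payload: Array[Byte]): Unit =
        producer.send(new KeyedMessage[String, Array[Byte]]("events", key, payload))
    }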

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi Michael,
 
 Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which 
 will give better performance: writing directly into Rabbit with ack support, 
 vs. a temporary queue in Cassandra first and then dequeuing and publishing to 
 Rabbit.
 
 The complexities involved - handling scenarios like Rabbit connection failures, 
 etc., vs. Cassandra's write performance and replication with hinted handoff 
 support - make me wonder which is the better path.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing 
 michael.la...@nytimes.com wrote  
 
 We use RabbitMQ for queuing and Cassandra for persistence.
 
 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.
 
 Michael
 
 
 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Jagan 
 
  Queue-like data structures are known to be one of the worst anti-patterns 
  for Cassandra:  
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
 
 
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:
 Hi,
 
 I need to decouple some of the work being processed from the user thread to 
 provide a better user experience. For that I need a queuing system with the 
 following needs:
 - High Availability
 - No Data Loss
 - Better Performance
 Following are some libraries that were considered, along with the limitations I 
 see:
 - Redis - data loss
 - ZooKeeper - not advised for queue systems
 - TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform best; with 
   the replication requirement, I would probably have to look at Apache 
   ActiveMQ+LevelDB.
 After checking the third option above, I wonder whether Cassandra with 
 Leveled Compaction offers a similar system. Do you see any issues in such a 
 usage, or are there better solutions available?
 
 Will be great to get insights on this.
 
 Regards,
 Jagan
 
 
 


Re: Queuing System

2014-02-22 Thread Joe Stein
Without them you have no durability.  

With them you have guarantees... more than any other system with messaging 
features.  It is a durable CP commit log.  It works very well for data pipelines 
with AP systems like Cassandra, which is a different system solving different 
problems.  When a Kafka leader fails your write might block and wait for ~10ms 
while a new leader is elected, but writes can be guaranteed.

The consumers then read and process data and write to Cassandra. Then have 
your app read from Cassandra for what was processed.

These are very typical architectures at scale 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
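
For illustration, a hedged sketch of that consumer-to-Cassandra leg, using the
0.8 high-level consumer plus the DataStax java driver; the ZooKeeper address,
topic, keyspace, and table are all invented for the example:

    import java.util.Properties
    import kafka.consumer.{Consumer, ConsumerConfig}
    import com.datastax.driver.core.Cluster

    object PipelineWorker extends App {
      val props = new Properties()
      props.put("zookeeper.connect", "zk1:2181")      // placeholder ZooKeeper ensemble
      props.put("group.id", "cassandra-writers")

      // one stream on a hypothetical "events" topic
      val connector = Consumer.create(new ConsumerConfig(props))
      val stream = connector.createMessageStreams(Map("events" -> 1))("events").head

      val session = Cluster.builder().addContactPoint("127.0.0.1").build().connect("demo")
      val insert  = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

      // process each message once and persist it; the app then reads it back from Cassandra
      for (msg <- stream) {
        val id = new String(msg.key(), "UTF-8")
        session.execute(insert.bind(id, java.nio.ByteBuffer.wrap(msg.message())))
      }
    }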

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Feb 22, 2014, at 11:49 AM, Jagan Ranganathan ja...@zohocorp.com wrote:

 Hi Joe,
 
 If my understanding is right, Kafka does not satisfy the high 
 availability/replication part well, because of the need for a leader and in-sync 
 replicas. 
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 22:02:27 +0530 Joe Stein crypt...@gmail.com wrote 
  
 
 If performance and availability for messaging is a requirement then use 
 Apache Kafka http://kafka.apache.org/
 
 You can pass the same thrift/avro objects through the Kafka commit log or 
 strings or whatever you want.
 
 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop
 /
 
 
 On Feb 22, 2014, at 11:13 AM, Jagan Ranganathan ja...@zohocorp.com wrote:
 
 Hi Michael,
 
 Yes, I am planning to use RabbitMQ for my messaging system. But I wonder which 
 will give better performance: writing directly into Rabbit with ack support, 
 vs. a temporary queue in Cassandra first and then dequeuing and publishing to 
 Rabbit.
 
 The complexities involved - handling scenarios like Rabbit connection failures, 
 etc., vs. Cassandra's write performance and replication with hinted handoff 
 support - make me wonder which is the better path.
 
 Regards,
 Jagan
 
  On Sat, 22 Feb 2014 21:01:14 +0530 Michael Laing 
 michael.la...@nytimes.com wrote  
 
 We use RabbitMQ for queuing and Cassandra for persistence.
 
 RabbitMQ with clustering and/or federation should meet your high availability 
 needs.
 
 Michael
 
 
 On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Jagan 
 
  Queue-like data structures are known to be one of the worst anti-patterns 
 for Cassandra:  
 http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
 
 
 
 On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan ja...@zohocorp.com wrote:
 Hi,
 
 I need to decouple some of the work being processed from the user thread to 
 provide a better user experience. For that I need a 
 queuing system with the following needs:
 - High Availability
 - No Data Loss
 - Better Performance
 Following are some libraries that were considered, along with the limitations I 
 see:
 - Redis - data loss
 - ZooKeeper - not advised for queue systems
 - TokyoCabinet/SQLite/LevelDB - of these, LevelDB seems to perform best; with 
   the replication requirement, I would probably have to look at Apache 
   ActiveMQ+LevelDB.
 After checking the third option above, I wonder whether Cassandra with 
 Leveled Compaction offers a similar system. Do you see any issues in such a 
 usage, or are there better solutions available?
 
 Will be great to get insights on this.
 
 Regards,
 Jagan
 
 
 
 


Using tab in CQL COPY DELIMITER

2014-01-04 Thread Joe Stein
Hi, trying to use a tab delimiter when copying out of c* (2.0.4) and
getting an error

cqlsh:test> CREATE TABLE airplanes (
   ...   name text PRIMARY KEY,
   ...   manufacturer ascii,
   ...   year int,
   ...   mach float
   ... );
cqlsh:bombast> INSERT INTO airplanes (name, manufacturer, year, mach)
VALUES ('P38-Lightning', 'Lockheed', 1937, 7);
cqlsh:bombast> COPY airplanes (name, manufacturer, year, mach) TO
'temp.tsv' WITH DELIMITER = '\t';
delimiter must be an 1-character string

any ideas how to use tabs as a delimiter?  Thanks
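
For anyone hitting the same wall: the error appears because '\t' is passed
through as two characters. Pending a cqlsh-side answer, one hedged workaround is
to do the export yourself through a driver. A minimal Scala sketch using the
DataStax java driver; the contact point, keyspace, and output path are
placeholders:

    import java.io.PrintWriter
    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    object TsvExport extends App {
      val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
      val session = cluster.connect("test")
      val out = new PrintWriter("temp.tsv")

      // write each row as a real tab-separated line, avoiding the cqlsh delimiter check
      for (row <- session.execute("SELECT name, manufacturer, year, mach FROM airplanes").asScala) {
        out.println(Seq(row.getString("name"), row.getString("manufacturer"),
          row.getInt("year").toString, row.getFloat("mach").toString).mkString("\t"))
      }
      out.close()
      cluster.close()
    }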

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.

2013-12-29 Thread Joe Stein
I updated my repo with Vagrant and bash scripts to install Cassandra 2.0.3
https://github.com/stealthly/scala-cassandra/

0) git clone https://github.com/stealthly/scala-cassandra
1) cd scala-cassandra
2) vagrant up

Cassandra will be running in the virtual machine on 172.16.7.2 and is
accessible from your host machine (cqlsh, your app, whatever).

To verify, step 3 would be ./sbt test, just to make sure everything is
running right.
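
If you want a quick smoke test of your own against the VM, something like the
following works; it assumes the DataStax java driver and ScalaTest are on the
test classpath, and the address is the one from the Vagrantfile above:

    import com.datastax.driver.core.Cluster
    import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}

    class VagrantCassandraSpec extends FlatSpec with Matchers with BeforeAndAfterAll {
      // the Vagrant VM address; nothing needs to run on localhost
      private val cluster = Cluster.builder().addContactPoint("172.16.7.2").build()
      private val session = cluster.connect()

      "the Vagrant Cassandra node" should "answer a trivial query" in {
        val row = session.execute("SELECT release_version FROM system.local").one()
        row.getString("release_version") should startWith ("2.0")
      }

      override def afterAll(): Unit = cluster.close()
    }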

Every time you rebuild the VM (it takes a minute or two) it is a whole new
instance.  If you fork a foreground process you have to worry about data that
is not isolated, and other issues.

On Fri, Dec 27, 2013 at 10:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I think I will invest the time launching Cassandra in a forked foreground
 process, maybe building the yaml dynamically.

 On Friday, December 27, 2013, Nate McCall n...@thelastpickle.com wrote:
   I've also moved on to a container-based (Vagrant+Docker) setup for
  doing automated integration stuff. This is more difficult to configure for
  build systems like Jenkins, but it can be done, and once completed the
  benefits are substantial - as Joe notes, the most immediate is the removal
  of variance between different environments.
   However, for in-process testing with Maven or similar, the Usergrid
  project [0] probably has the most functionally advanced test architecture
  [1]. Do understand that it took us a very long time to get there and it
  involves some fairly tight integration with JUnit and (to a lesser degree)
  Maven.
   The UG plumbing is purpose-built towards a specific data model, so it's
  not something that can be just dropped in, but it can be pulled apart in a
  straightforward way (provided you understand JUnit - which is not really
  trivial) and generalized pretty easily. It's all ASF-licensed, so take what
  you need if you find it useful.
  [0] https://usergrid.incubator.apache.org/
  [1]
 https://github.com/usergrid/usergrid/blob/master/stack/test-utils/src/main/java/org/usergrid/cassandra/CassandraResource.java
 
  On Wed, Dec 25, 2013 at 2:42 PM, Joe Stein crypt...@gmail.com wrote:
 
  I have been using vagrant (e.g.
 https://github.com/stealthly/scala-cassandra/ ) which is 100%
 reproducible across devs and test systems (prod in some cases).  Also have
 a Docker setup too https://github.com/pegasussolutions/docker-cassandra .
  I have been doing this more and more with clients to better mimic
 production before production and smoothing the release process from
 development.  I also use packer (scripts released soon) to build images too
 (http://packer.io)
  Love vagrant, packer and docker!!!  Apache Mesos too :)
 
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop
  /
 
  On Dec 25, 2013, at 3:28 PM, horschi hors...@gmail.com wrote:
 
  Hi Ed,
 
  my opinion on unit testing with C* is: Use the real database, not any
 embedded crap :-)
 
  All you need are fast truncates, by which I mean:
   JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"
  and
  auto_snapshot: false
 
  This setup works really nice for me (C* 1.1 and 1.2, have not tested 2.0
 yet).
 
  Imho this setup is better for multiple reasons:
  - No extra classpath issues
  - Faster: Running JUnits and C* in one JVM would require a really large
 heap (for me at least).
   - Faster: No Cassandra startup every time I run my tests.
 
  The only downside is that developers must change the properties in their
 configs.
 
  cheers,
  Christian
 
 
 
  On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
   I am not sure how many people have been around developing
  Cassandra for as long as I have, but the state of all the client libraries
  and the cassandra server is WORD_I_DONT_WANT_TO_SAY.
  Here is an example of something I am seeing:
  ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
  java.lang.AbstractMethodError:
 org.apache.thrift.ProcessFunction.isOneway()Z
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
  DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
  In short: If you are new to cassandra and only using the newest client I
 am sure everything is peachy for you.
   For people that have been using Cassandra for a while it is harder to
  jump ship when something better comes along. You sometimes need to
  support both hector and astyanax; it happens.
  For a while I have been using hector. Even not to use hector

Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.

2013-12-25 Thread Joe Stein
I have been using vagrant (e.g. https://github.com/stealthly/scala-cassandra/ ) 
which is 100% reproducible across devs and test systems (prod in some cases).  
I also have a Docker setup: 
https://github.com/pegasussolutions/docker-cassandra .  I have been doing this 
more and more with clients to better mimic production before going to production 
and to smooth the release process from development.  I also use packer (scripts 
released soon) to build images too (http://packer.io)

Love vagrant, packer and docker!!!  Apache Mesos too :)


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/


On Dec 25, 2013, at 3:28 PM, horschi hors...@gmail.com wrote:

 Hi Ed,
 
 my opinion on unit testing with C* is: Use the real database, not any 
 embedded crap :-)
 
 All you need are fast truncates, by which I mean: 
 JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true" 
 and
 auto_snapshot: false
 
 This setup works really nice for me (C* 1.1 and 1.2, have not tested 2.0 yet).
 
 Imho this setup is better for multiple reasons:
 - No extra classpath issues
 - Faster: Running JUnits and C* in one JVM would require a really large heap 
 (for me at least).
 - Faster: No Cassandra startup every time I run my tests.
 
 The only downside is that developers must change the properties in their 
 configs.
 
 cheers,
 Christian
 
 
 
 On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 I am not sure how many people have been around developing Cassandra for 
 as long as I have, but the state of all the client libraries and the 
 cassandra server is WORD_I_DONT_WANT_TO_SAY.
 
 Here is an example of something I am seeing:
 ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
 java.lang.AbstractMethodError: org.apache.thrift.ProcessFunction.isOneway()Z
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
 
 In short: If you are new to cassandra and only using the newest client I am 
 sure everything is peachy for you.
 
 For people that have been using Cassandra for a while it is harder to jump 
 ship when something better comes along. You sometimes need to support both 
 hector and astyanax; it happens. 
 
 For a while I have been using hector. Even when not using hector as an API, 
 the one nice thing I got from hector was a simple EmbeddedServer that would 
 clean up after itself. Hector seems badly broken at the moment. I have no 
 idea how the current versions track with anything out there in the cassandra 
 world. 
 
 For a while I played with https://github.com/Netflix/astyanax, which has its 
 own versions and schemes and dependent libraries. (astyanax has a packaging 
 error that forces me onto maven3)
 
 Enter cassandra 2.0, which forces you onto Java 7. Besides that, it has its 
 own kit of things it seems to want. 
 
 I am guessing, since hector's embedded server does not work, that I should go to 
 https://github.com/jsevellec/cassandra-unit - not sure...really...how anyone 
 does this anymore. I am sure I could dive into the source code and figure 
 this out, but I would just rather have a stable piece of code that brings up 
 the embedded server, just works, and continues working.
 
 I cannot seem to get this working right either (since it includes hector, I 
 see, from the pom).
 
 Between thrift, cassandra, and client X, it is almost impossible to build a sane 
 classpath, and that is not even counting the fact that people have their own 
 classpath issues (with guava mismatches etc).
 
 I think the only sane thing to do is start shipping cassandra-embedded like 
 this:
 
 https://github.com/kstyrc/embedded-redis
 
 In other words, package embedded-cassandra as a binary. Don't force the 
 client/application developer to bring cassandra onto the classpath and fight 
 with mismatches in thrift/guava etc. That, or provide a completely shaded 
 cassandra server for embedded testing. As it stands now, trying to support a 
 setup that uses more than one client or works with multiple versions of 
 cassandra is a major pita (e.g. library X compiled against 1.2.0, library Y 
 compiled against 2.0.3).
 
 Does anyone have any thoughts on this, or tried something similar?  
 
 Edward
 
 


Re: Cassandra book/tuturial

2013-10-27 Thread Joe Stein
http://www.planetcassandra.org has a lot of great resources on it.

Eben Hewitt's book is great, as are the other C* books like the High
Performance Cookbook
http://www.amazon.com/Cassandra-Performance-Cookbook-Edward-Capriolo/dp/1849515123

I would recommend reading both of those books.  You can also read
http://www.datastax.com/dev/blog/thrift-to-cql3 to help your understanding.

From there go with CQL http://cassandra.apache.org/doc/cql3/CQL.html

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


On Sun, Oct 27, 2013 at 11:58 PM, Mohan L l.mohan...@gmail.com wrote:

 And here is also a good intro: http://10kloc.wordpress.com/category/nosql-2/

 Thanks
 Mohan L


 On Mon, Oct 28, 2013 at 8:02 AM, Danie Viljoen dav...@gmail.com wrote:

 Not a book, but I think this is a good start:
 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html


 On Mon, Oct 28, 2013 at 3:14 PM, Dave Brosius 
 dbros...@mebigfatguy.com wrote:

  Unfortunately, as tech books tend to be, it's quite a bit out of date,
 at this point.




 On 10/27/2013 09:54 PM, Mohan L wrote:




 On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:

   Hey Guys,

  What is the best book to learn Cassandra from scratch?

  Thanks in advance,
  Erwin


 Hi,

 Buy :

 Cassandra: The Definitive Guide By Eben Hewitt :
 http://shop.oreilly.com/product/0636920010852.do

  Thanks
  Mohan L








Re: Cassandra book/tuturial

2013-10-27 Thread Joe Stein
Reading a previous version's documentation and related information from that
time in the past (like books) has value!  It helps to understand decisions
that were made and changed, and some that are still the same, like
secondary indexes, which were introduced in 0.7 when
http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412
came out back in 2011.

If you are really just getting started then I say go and start here
http://www.planetcassandra.org/

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


On Mon, Oct 28, 2013 at 12:15 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

 With a lot of enthusiasm I started reading it. It's outdated and error-prone. I
 could not even get Cassandra running from that book. Eventually I could not
 get started with Cassandra.


 On Mon, Oct 28, 2013 at 9:41 AM, Joe Stein crypt...@gmail.com wrote:

 http://www.planetcassandra.org has a lot of great resources on it.

 Eben Hewitt's book is great, as are the other C* books like the High
 Performance Cookbook
 http://www.amazon.com/Cassandra-Performance-Cookbook-Edward-Capriolo/dp/1849515123

 I would recommend reading both of those books.  You can also read
  http://www.datastax.com/dev/blog/thrift-to-cql3 to help your understanding.

 From there go with CQL http://cassandra.apache.org/doc/cql3/CQL.html

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /


 On Sun, Oct 27, 2013 at 11:58 PM, Mohan L l.mohan...@gmail.com wrote:

  And here is also a good intro: http://10kloc.wordpress.com/category/nosql-2/

 Thanks
 Mohan L


 On Mon, Oct 28, 2013 at 8:02 AM, Danie Viljoen dav...@gmail.com wrote:

 Not a book, but I think this is a good start:
 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html


 On Mon, Oct 28, 2013 at 3:14 PM, Dave Brosius dbros...@mebigfatguy.com
  wrote:

  Unfortunately, as tech books tend to be, it's quite a bit out of
 date, at this point.




 On 10/27/2013 09:54 PM, Mohan L wrote:




  On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:

   Hey Guys,

  What is the best book to learn Cassandra from scratch?

  Thanks in advance,
  Erwin


 Hi,

 Buy :

 Cassandra: The Definitive Guide By Eben Hewitt :
 http://shop.oreilly.com/product/0636920010852.do

  Thanks
  Mohan L









 --
 Deepak




Re: Cassandra Geospatial Search

2013-02-13 Thread Joe Stein
what about using geohashes? http://geohash.org/dr5ru2mevjppe

store the geohashes as column names

geohash#dr5ru2mevjppe
geohash#dr5ru2mevjpp
geohash#dr5ru2mevjp
geohash#dr5ru2mevj
geohash#dr5ru2mev
geohash#dr5ru2me
geohash#dr5ru2m
geohash#dr5ru2
geohash#dr5ru
geohash#dr5

the rows are what you want to return

do a MultigetSliceQuery like this
https://github.com/joestein/skeletor/blob/master/src/test/scala/skeletor/SkeletorSpec.scala#L171

in the column value you can hold some JSON objects or more serialization of
relationships from there, maybe a persisted graph structure

here are my slides on how we do this and what for
http://files.meetup.com/1794037/jstein.meetup.cassandra2002.pptx
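
a small sketch of the prefix idea in Scala, in case it helps; the "geohash#"
naming follows the columns above, and how you key the rows is up to your query
pattern:

    object GeohashColumns {
      // every prefix of a full-precision geohash, longest first, e.g.
      // "dr5ru2mevjppe" -> geohash#dr5ru2mevjppe, geohash#dr5ru2mevjpp, ..., geohash#dr5
      def prefixColumns(geohash: String, minLen: Int = 3): Seq[String] =
        (minLen to geohash.length).reverse.map(len => "geohash#" + geohash.take(len))
    }

    // a shorter prefix covers a bigger cell, so a bounding-box lookup picks the
    // prefix length whose cell covers the box and multigets on that column name:
    // GeohashColumns.prefixColumns("dr5ru2mevjppe").foreach(println)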

On Wed, Feb 13, 2013 at 8:42 PM, Drew Kutcharian d...@venarc.com wrote:

 Hi Guys,

 Has anyone on this mailing list tried to build a bounding box style (get
 the records inside a known bounding box) geospatial search? I've been
 researching this a bit and it seems like the only attempt at this was by the
 SimpleGeo guys, but there isn't much public info out there on how they did
 it besides a video.

 -- Drew




-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Re: hector timeouts

2012-07-02 Thread Joe Stein
lots of folks use Apache Kafka, check out
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By just to name a
few

you can read about the performance for yourself
http://incubator.apache.org/kafka/performance.html

@ http://www.medialets.com we use Kafka upstream of Cassandra acting like a
queue so our workers can do their business logic prior to storing their
results in Cassandra, Hadoop & MySQL

this decouples our backend analytics from our forward-facing system,
keeping the forward-facing system (ad serving to mobile devices) as fast as
possible and our backend results near real time (seconds from data coming in)

here are some papers and presentations
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

On Mon, Jul 2, 2012 at 10:09 PM, Deno Vichas d...@syncopated.net wrote:

  is anybody using kafka?  what other options are there?  currently I need
 to do around 50,000 (is that a lot?) a minute.


 On 7/1/2012 11:39 AM, aaron morton wrote:

 Using Cassandra as a queue is generally thought of as a bad idea, owing to
 the high delete workload. Levelled compaction handles it better but it is
 still not the best approach.

  Depending on your needs consider running
 http://incubator.apache.org/kafka/

   could you share some details on this?  we're using hector and we see
 random timeout warns in the logs and not sure how to address them.

 First determine if they are server side or client side timeouts. Then
 determine what the query was.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 29/06/2012, at 7:02 AM, Deno Vichas wrote:

  On 6/28/2012 9:37 AM, David Leimbach wrote:


  That coupled with Hector timeout issues became a real problem for us.


 could you share some details on this?  we're using hector and we see
 random timeout warns in the logs and not sure how to address them.


 thanks,
 deno







-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Re: node.js library?

2011-12-07 Thread Joe Stein
Thanks Eric!

On Wed, Dec 7, 2011 at 8:37 AM, Eric Evans eev...@acunu.com wrote:

 On Mon, Dec 5, 2011 at 8:26 AM, Joe Stein crypt...@gmail.com wrote:
  Hey folks, so I have been noodling on using node.js as a new front end
 for
  the system I built for doing real time aggregate metrics within our
  distributed systems.
 
  Does anyone have experience or background story on this
  lib? http://code.google.com/a/apache-extras.org/p/cassandra-node/ it
 seems
  to be the most up to date one supporting CQL only (which should not be an
  issue) but was not sure if it is maintained or what the background story
 is
  on it and such?
 
  Any other experiences/horror stories/over the rainbow type stories with
   node.js & C* would be nice to hear.

 This one is actively maintained, and (as far as I know) is being used
 in production at Rackspace.

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu




-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Re: CQL Install for 0.8.X?

2011-12-06 Thread Joe Stein
Thanks Eric!

On Mon, Dec 5, 2011 at 10:38 PM, Eric Evans eev...@acunu.com wrote:

 On Mon, Dec 5, 2011 at 1:40 PM, Joe Stein crypt...@gmail.com wrote:
  Hey, trying to grab cqlsh for a 0.8.6 cluster but all the online docs I
 am
  finding are pointing to http://www.apache.org/dist/cassandra/drivers/py or
  say it moved to the source and checked there in the 0.8 branch and
 nothing
  either...

 Right, the drivers were moved and the site wasn't updated (until just
 now).  The Python driver is now hosted on Apache Extras, here:

 http://code.google.com/a/apache-extras.org/hosting/search?q=label:cql

 But, that driver probably won't work right with 0.8.6, and the shell
 has been moved out of the driver and into Cassandra trunk anyway.
 What you probably want is 1.0.3 from here:

 http://archive.apache.org/dist/cassandra/drivers/py

 Sorry, I know, it's a confusing mess.

  also saw something about 2.0 not being compatible with 0.8.X
  so not sure where to go from here.

 The language incompatibilities are pretty minor, but there were some
 changes to the results format that will prevent a new driver (for 1.x)
 from working on an older Cassandra (0.8.x).

  Let me know, want/need to jump into a bunch of CQL stuff and want to do
 it
  on the cqlsh first if I can.

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu




-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


node.js library?

2011-12-05 Thread Joe Stein
Hey folks, so I have been noodling on using node.js as a new front end for
the system I built for doing real time aggregate metrics within our
distributed systems.

Does anyone have experience or a background story on this lib?
http://code.google.com/a/apache-extras.org/p/cassandra-node/ it seems to be
the most up-to-date one, supporting CQL only (which should not be an issue),
but I was not sure if it is maintained or what the background story on it is,
and such?

Any other experiences/horror stories/over-the-rainbow type stories with
node.js & C* would be nice to hear.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


CQL Install for 0.8.X?

2011-12-05 Thread Joe Stein
Hey, trying to grab cqlsh for a 0.8.6 cluster but all the online docs I am
finding are pointing to http://www.apache.org/dist/cassandra/drivers/py or
say it moved to the source; I checked there in the 0.8 branch and found nothing
either... also saw something about 2.0 not being compatible with
0.8.X so not sure where to go from here.

Let me know, want/need to jump into a bunch of CQL stuff and want to do it
on the cqlsh first if I can.

Thanks!!!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Counter Experience (Performance)?

2011-10-27 Thread Joe Stein
Hey folks, I am interested in what others have seen with regard to their
experience in the amount of depth and width (CFs, rows & columns) that they
can/do write per batch and simultaneously, and what the inflection point is
where performance degrades.   I have been expanding my use of counters and
am finding some interesting nuances, some related to my code and implementation
but others I can't yet quantify.

My batches are 1x5x5 (1 row for each of 5 column families and 5 columns for
each of those rows within each of the 5 column families).  I have 3 nodes
each with 100 connections and another thread pool of 100 threads rolling
through 6,000,000 rows of data sending data out to Cassandra (the 1x5x5
matrix is constructed from each line).  I am finding this to be my sweet
spot right now but still not really performing fantastically (or at least
what I had hoped), and I am wondering what else (if anything) I can be doing
to tweak settings to be able to push in more columns or rows.   I find that
changing my pool settings much from this causes errors in the client
lib, but I will send email to that list separately, though I think I have that
figured out on my own for now.
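
For reference, roughly what a 1x5x5 counter batch looks like if the client is
Hector; this is only a sketch, and the cluster name, keyspace, column family
names, and counter names are all placeholders:

    import me.prettyprint.cassandra.serializers.StringSerializer
    import me.prettyprint.hector.api.factory.HFactory

    object CounterBatch {
      val cluster  = HFactory.getOrCreateCluster("Metrics", "127.0.0.1:9160")
      val keyspace = HFactory.createKeyspace("metrics", cluster)
      val cfs  = Seq("cf1", "cf2", "cf3", "cf4", "cf5")   // 5 counter column families
      val cols = Seq("c1", "c2", "c3", "c4", "c5")        // 5 counter columns per row

      // one row per CF, five counter columns per row, sent as a single batch mutate
      def write(rowKey: String): Unit = {
        val m = HFactory.createMutator(keyspace, StringSerializer.get())
        for (cf <- cfs; col <- cols) m.incrementCounter(rowKey, cf, col, 1L)
        m.execute()
      }
    }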

Thanks in advance!!!  I hope to get more work going on this in the next day
or so in a more methodical way to find the right count so I can build a sparse
matrix that will perform best for the system and the business.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Re: Counter Experience (Performance)?

2011-10-27 Thread Joe Stein
Thanks Jake, the bottleneck is the disk I believe; each write is taking 50ms,
probably EBS (doing testing in EC2).

I will move my testing over to our production network and run it on some
nodes on some real hardware since that where it will end up.

I am seeing things slow down linearly and nothing dropping
off precipitously.  Glad to have the benchmarks I have, which are good to compare
things against.  Thanks!

On Thu, Oct 27, 2011 at 11:30 AM, Jake Luciani jak...@gmail.com wrote:

 What's your bottleneck?
 http://spyced.blogspot.com/2010/01/linux-performance-basics.html


 On Thu, Oct 27, 2011 at 9:37 AM, Joe Stein crypt...@gmail.com wrote:

 Hey folks, I am interested in what others have seen in regards to their
 experience in the amount of depth and width (CF, Rows & Columns) that they
 can/do write per batch and simultaneously and what is the inflection point
 where performance degrades.   I have been expanding my use of counters and
 am finding some interesting nuances some in my code and implementation
 related but others I can't yet quantify.

 My batches are 1x5x5 (1 row for each of 5 column families and 5 columns
 for each of those 1 rows within each of the 5 column families).  I have 3
 nodes each with 100 connections and another thread pool of 100 threads
 rolling through 6,000,000 rows off data sending data out to Cassandra (the
 1x5x5 matrice is constructed from each line).  I am finding this to be my
 sweet spot right now but still not really performing fantastically (or at
 least what I had hoped) and I am wondering what else (if anything) I can be
 doing to tweak settings or what to be able to push in more columns or rows.
   I find changing my pool settings very much froms this causes error on
 client lib but I will send email to that list separately though I think I
 have that figured out on my own for now.

 Thanks in advance!!!  I hope to get more work going on this in the next
 day or so in a more methodic way to find the right count so I can build a
 sparse matrice that will perform best for system and business.

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 */




 --
 http://twitter.com/tjake




-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Skeletor = Scala wrapper of Hector for Cassandra

2011-10-03 Thread Joe Stein
Hey folks, I pushed my Scala wrapper of Hector for Cassandra
https://github.com/joestein/skeletor

It not only gets Cassandra hooked into your Scala projects quickly and simply
but does so in a functional way.

It is not a new library interface for Cassandra because Hector is a great
library as is.  Instead, Skeletor implements Hector so you always have the
best of breed under the hood for using Cassandra while also leveraging all
of the benefits that Scala offers over Java (ok that was so many buzz words
in one sentence that I just vomited in my mouth a little bit, but it is all
true).

Right now the examples are in the test specs for reading & writing (both
Counter & UTF8 type column families).

Basically it is a DSL:

//for writing
val TestColumnFamily = "FixtureTestSkeletor" \ "TestColumnFamily" //define
your Keyspace \ ColumnFamily
var cv = (TestColumnFamily -> "rowKey" has "columnName" of "columnValue")
//create a column value for a row for this column family
var rows:Rows = Rows(cv) //add the row to the rows object
rows add (TestColumnFamily -> "rowKey" has "anotherColumnName" of
"anotherColumnValue") //and add another row
Cassandra << rows //takes care of all the batch mutate for ya

//and for reading
def processRow(r:String, c:String, v:String) = {
println("r=" + r + " c=" + c + " with " + v) //whatever you want to do
}

def sets(mgsq: MultigetSliceQuery[String, String, String]) {
mgsq.setKeys("columnName") //we want to pull out the row key we just put
into Cassandra
 mgsq.setColumnNames("columnValue") //and just this column
}

TestColumnFamily >> (sets, processRow) //get data out of Cassandra and
process it functionally

I will put more up on the wiki and also post more examples of where/how I
have been using it and will evolve it as I go.

Again, for now, the test specs are the place to start
https://github.com/joestein/skeletor/blob/master/src/test/scala/skeletor/SkeletorSpec.scala

Thanx =) Joestein

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/


Re: Cassandra Certification

2011-08-14 Thread Joe Stein
Certification is good when a community gets to the point that the proverbial
management cannot easily discern between posers and those that know what
they are talking about.  I hope one day Cassandra and its community grow
to that point, but as of now there is enough transparency, in my opinion.

I would no more get a Cassandra certification than I would get one from
Cloudera for Hadoop (no offense) nor even a CISSP (which I could do also).

I would rather see a certification in scalable distributed computing
solutions, akin to what the CSA (Cloud Security Alliance) has done with
security.  Cassandra is the answer in a lot of situations, but not always
the answer.  It is probably one of the best tools in your toolbox.

As the saying goes: to a man with a hammer every problem is a nail. DON'T BE
THAT GUY.

My .02121513E9 cents

/*
Joe Stein
Chief Architect @medialets http://www.medialets.com
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
*/
On Mon, Aug 15, 2011 at 1:23 AM, samal sa...@wakya.in wrote:

 Does it really make sense?
 If yes, I think the Apache Cassandra Project (ASF) should offer open
 certification. Other entities can offer courses and training materials.