Re: How to speed up SELECT * query in Cassandra

2015-02-16 Thread mck
Could you please share how much data you store on the cluster and what the HW configuration of the nodes is? These nodes are dedicated HW, 24 CPUs and 50 GB RAM. Each node has a few TBs of data (you don't want to go over this) in RAID 50 (we're migrating over to JBOD). Each C* node is running

Re: How to speed up SELECT * query in Cassandra

2015-02-14 Thread mck
Jirka, But I am really interested in how it can work well with Spark/Hadoop, where you basically need to read all the data as well (as far as I understand it). I can't give you any benchmarking between technologies (nor am I particularly interested in getting involved in such a discussion)

Re: How to speed up SELECT * query in Cassandra

2015-02-13 Thread Jens Rantil
I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of Cassandra?), and my question is always, why aren't you updating both places at once

Re: How to speed up SELECT * query in Cassandra

2015-02-12 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Thanks Jirka! From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra Hi, here are some snippets of code in Scala which should get you started. Jirka H. loop { lastRow => val query = lastRow

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of Cassandra?), and my question is always, why aren't

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
if a map / reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
, but no one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
Hi, here are some snippets of code in Scala which should get you started. Jirka H.

    loop { lastRow =>
      val query = lastRow match {
        case Some(row) => nextPageQuery(row, upperLimit)
        case None => initialQuery(lowerLimit)
      }
      session.execute(query).all
    }

    private def nextPageQuery(row: Row, upperLimit: String):
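The bodies of initialQuery and nextPageQuery are cut off in the snippet above, so what follows is only a guess at the general idea, not Jirka's code: page through one token range by restarting each query just after the token of the last row seen. The table ks.tbl, the column id, the page size, the Long-based signatures and the fetch callback are all assumptions made for illustration. The sketch also assumes one row per partition key; with wide partitions a LIMIT could cut a partition in half and the next page would skip the rest of it.

```scala
object TokenPaging {
  // Hypothetical names: table `ks.tbl`, partition key column `id`.
  val PageSize = 1000

  /** First page of a token range (lower bound exclusive, upper bound inclusive). */
  def initialQuery(lower: Long, upper: Long): String =
    s"SELECT id, token(id) FROM ks.tbl WHERE token(id) > $lower AND token(id) <= $upper LIMIT $PageSize"

  /** Next page: same range, restarted just after the last token already read. */
  def nextPageQuery(lastToken: Long, upper: Long): String =
    initialQuery(lastToken, upper)

  /**
   * Drains one token range page by page. `fetch` stands in for the driver call:
   * it runs a query and returns the tokens of the rows it got, in token order.
   */
  def scan(lower: Long, upper: Long)(fetch: String => Seq[Long]): Seq[Long] = {
    @annotation.tailrec
    def go(query: String, acc: Vector[Long]): Vector[Long] = {
      val page = fetch(query)
      if (page.isEmpty) acc // an empty page means the range is exhausted
      else go(nextPageQuery(page.last, upper), acc ++ page)
    }
    go(initialQuery(lower, upper), Vector.empty)
  }
}
```

In real use, fetch would wrap something like session.execute(query) and pull the token(id) column out of each returned row.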

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Colin
I use Spark with Cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of Cassandra?), and my question is always, why aren't you updating both places at once? For example, we use Hadoop and Cassandra in conjunction with each other,

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jens Rantil
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: If you use Cassandra enterprise, you can use Hive, AFAIK. Even better, you can use Spark/Shark with DSE. Cheers, Jens

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Jiri Horky
The fastest way I am aware of is to run the queries in parallel against multiple Cassandra nodes and make sure that you only ask each node for keys it is responsible for. Otherwise, the node needs to forward your query, which is much slower and creates unnecessary objects (and thus GC pressure). You can
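A rough sketch of the first half of that advice, splitting the full scan into token-range queries that can be run in parallel. It assumes the cluster uses Murmur3Partitioner (64-bit signed tokens); the table ks.tbl and the column id are made-up names. Sending each range to the node that actually owns it needs the ring metadata from the driver, which this sketch leaves out.

```scala
object TokenRanges {
  // Assumed token bounds for Murmur3Partitioner: -2^63 .. 2^63 - 1.
  val MinToken: BigInt = BigInt(Long.MinValue)
  val MaxToken: BigInt = BigInt(Long.MaxValue)

  /** A range (start, end]: start is exclusive, end is inclusive. */
  final case class TokenRange(start: BigInt, end: BigInt)

  /** Splits the whole ring into `n` contiguous, non-overlapping ranges. */
  def splitRing(n: Int): Seq[TokenRange] = {
    require(n > 0, "n must be positive")
    val span = MaxToken - MinToken
    // n + 1 boundaries; BigInt arithmetic avoids overflowing Long.
    val bounds = (0 to n).map(i => MinToken + span * i / n)
    bounds.zip(bounds.tail).map { case (s, e) => TokenRange(s, e) }
  }

  /** CQL for one range; `ks.tbl` and `id` are hypothetical names. */
  def rangeQuery(r: TokenRange): String =
    s"SELECT * FROM ks.tbl WHERE token(id) > ${r.start} AND token(id) <= ${r.end}"
}
```

For example, TokenRanges.splitRing(16).map(TokenRanges.rangeQuery) gives 16 independent queries, each of which can be submitted from its own thread or future.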

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread DuyHai Doan
or Hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map / reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I

Re: How to speed up SELECT * query in Cassandra

2015-02-11 Thread Ja Sam
Your answer looks very promising. How do you calculate start and stop? On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky ho...@avast.com wrote: The fastest way I am aware of is to do the queries in parallel to multiple Cassandra nodes and make sure that you only ask them for keys they are