Could you please share how much data you store on the cluster and what
the HW configuration of the nodes is?
These nodes are dedicated HW, 24 CPUs and 50 GB RAM.
Each node has a few TBs of data (you don't want to go over this) in
RAID 50 (we're migrating over to JBOD).
Each c* node is running
Jirka,
But I am really interested in how it can work well with Spark/Hadoop, where
you basically need to read all the data as well (as far as I understand
it).
I can't give you any benchmarks comparing technologies (nor am I
particularly interested in getting involved in such a discussion)
Subject: Re: How to speed up SELECT * query in Cassandra
I use Spark with Cassandra, and you don't need DSE.
I see a lot of people ask this same question below (how do I get a lot
of data out of Cassandra?), and my question is always, why aren't you
updating both places at once
Thanks Jirka!
From: user@cassandra.apache.org
Subject: Re: How to speed up SELECT * query in Cassandra
Hi,
here are some snippets of code in scala which should get you started.
Jirka H.
loop {
  lastRow =
  val query = lastRow match {
    case Some(row) => nextPageQuery(row, upperLimit)
    case None => initialQuery(lowerLimit)
  }
  session.execute(query).all
}
private def nextPageQuery(row: Row, upperLimit: String):
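The bodies of the two query helpers are cut off above, so here is a minimal guess at what they might build, not the original code. It assumes the Murmur3Partitioner, a text partition key, and partitions that hold a single row each (with LIMIT, a wide partition could be cut in the middle and its remainder skipped). The table name, key name, page size and the extra upper-bound parameter on initialQuery are all made up for illustration.

```scala
// Sketch only: builds the CQL strings for paging through one token range.
object TokenPaging {
  // Hypothetical table, partition-key column and page size.
  val Table = "ks.events"
  val Key = "id"
  val PageSize = 1000

  // First page: start just after the lower token bound of this worker's range.
  def initialQuery(lowerLimit: Long, upperLimit: Long): String =
    s"SELECT * FROM $Table WHERE token($Key) > $lowerLimit " +
      s"AND token($Key) <= $upperLimit LIMIT $PageSize"

  // Later pages: continue strictly after the token of the last key already read.
  // The key is inlined for brevity; a bound parameter would be safer in real code.
  def nextPageQuery(lastKey: String, upperLimit: Long): String =
    s"SELECT * FROM $Table WHERE token($Key) > token('$lastKey') " +
      s"AND token($Key) <= $upperLimit LIMIT $PageSize"

  // Same shape as the match in the loop: no last row yet means first page.
  def query(lastKey: Option[String], lower: Long, upper: Long): String =
    lastKey match {
      case Some(k) => nextPageQuery(k, upper)
      case None    => initialQuery(lower, upper)
    }
}
```

The loop would stop once a page comes back with fewer than PageSize rows, since that means the upper bound of the range has been reached.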
I use Spark with Cassandra, and you don't need DSE.
I see a lot of people ask this same question below (how do I get a lot of data
out of Cassandra?), and my question is always, why aren't you updating both
places at once?
For example, we use Hadoop and Cassandra in conjunction with each other,
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON)
mvallemil...@bloomberg.net wrote:
If you use Cassandra enterprise, you can use hive, AFAIK.
Even better, you can use Spark/Shark with DSE.
Cheers,
Jens
--
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
The fastest way I am aware of is to run the queries in parallel against
multiple Cassandra nodes and make sure that you only ask each node for keys
it is responsible for. Otherwise, the node has to forward your query to the
node that owns the data, which is much slower and creates unnecessary objects
(and thus GC pressure).
You can
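To make the parallel approach a bit more concrete, here is a rough sketch of one way to cut the token ring into contiguous sub-ranges, one per worker. It assumes the Murmur3Partitioner (tokens are signed 64-bit values); TokenRanges, split and rangeQuery are hypothetical names, and the table and key arguments are placeholders. It only covers the splitting. Sending each range to a node that actually owns it is a separate step that depends on your driver.

```scala
// Sketch only: split the full Murmur3 token ring into n contiguous sub-ranges.
object TokenRanges {
  // Assumption: Murmur3Partitioner, so tokens span the signed 64-bit range.
  val MinToken: Long = Long.MinValue
  val MaxToken: Long = Long.MaxValue

  // A range is read as (start, end], except the first one, which also includes start.
  final case class TokenRange(start: Long, end: Long)

  def split(n: Int): Vector[TokenRange] = {
    require(n > 0, "n must be positive")
    // BigInt avoids overflow: the ring is wider than Long.MaxValue.
    val span = BigInt(MaxToken) - BigInt(MinToken)
    val bounds = (0 to n).map(i => (BigInt(MinToken) + span * i / n).toLong)
    bounds.sliding(2).map { case Seq(s, e) => TokenRange(s, e) }.toVector
  }

  // CQL for one sub-range; `table` and `key` are placeholders supplied by the caller.
  def rangeQuery(table: String, key: String, r: TokenRange): String = {
    // Use >= on the very first range so the lowest token is not left out.
    val lowerOp = if (r.start == MinToken) ">=" else ">"
    s"SELECT * FROM $table WHERE token($key) $lowerOp ${r.start} AND token($key) <= ${r.end}"
  }
}
```

Something like TokenRanges.split(16).map(r => TokenRanges.rangeQuery("ks.events", "id", r)) would give 16 queries that together cover every key exactly once, and they can then be run concurrently.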
or Hadoop does, the shuffling, could be done out of the box
with Cassandra, but no one takes advantage of that. What if a map / reduce
job used a temporary CF in Cassandra to store intermediate results?
Your answer looks very promising.
How do you calculate start and stop?
On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky ho...@avast.com wrote:
The fastest way I am aware of is to do the queries in parallel to
multiple cassandra nodes and make sure that you only ask them for keys
they are