Re: Data synchronization between 2 running clusters on different availability zone

2014-12-01 Thread Jeremy Jongsma
Here's a snitch we use for this situation - it uses a property file if it
exists, but falls back to EC2 autodiscovery if it is missing.

https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java

On Mon, Dec 1, 2014 at 12:33 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Nov 27, 2014 at 1:24 AM, Spico Florin spicoflo...@gmail.com
 wrote:

   I have another question. What about the following scenario: two
 Cassandra instances installed on different cloud providers (EC2, Flexiant)?
 How do you synchronize them? Can you use some internal tools or do I have
 to implement my own mechanism?


 That's what I meant by "if maybe hybrid in the future, use GPFS":


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html

 "hybrid" in this case means AWS-and-not-AWS.

 =Rob




EC2 SSD cluster costs

2014-08-19 Thread Jeremy Jongsma
The latest consensus around the web for running Cassandra on EC2 seems to
be "use the new SSD instances." I've not seen any mention of the elephant in
the room - using the new SSD instances significantly raises the cluster
cost per TB. With Cassandra's strength being linear scalability to many
terabytes of data, it strikes me as odd that everyone is recommending such
a large storage cost hike almost without reservation.

Monthly cost comparison for a 100TB cluster (non-reserved instances):

m1.xlarge (2x420 non-SSD): $30,000 (120 nodes)
m3.xlarge (2x40 SSD): $250,000 (1250 nodes! Clearly not an option)
i2.xlarge (1x800 SSD): $76,000 (125 nodes)

Best case, the cost goes up 150%. How are others approaching these new
instances? Have you migrated and eaten the costs, or are you staying on
previous generation until prices come down?
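For reference, the comparison above can be reproduced from per-node storage
and per-node monthly cost. The monthly prices here are back-derived from the
quoted 2014 cluster totals (non-reserved), not current figures:

```python
import math

# 2014 figures from the message above: usable storage per node (GB) and an
# assumed per-node monthly price back-derived from the quoted cluster totals.
instances = {
    "m1.xlarge": {"storage_gb": 2 * 420, "monthly_cost": 250},
    "m3.xlarge": {"storage_gb": 2 * 40,  "monthly_cost": 200},
    "i2.xlarge": {"storage_gb": 1 * 800, "monthly_cost": 608},
}

target_gb = 100_000  # 100 TB cluster

for name, spec in instances.items():
    nodes = math.ceil(target_gb / spec["storage_gb"])
    total = nodes * spec["monthly_cost"]
    print(f"{name}: {nodes} nodes, ${total:,}/month")
# m1.xlarge: 120 nodes, $30,000/month
# m3.xlarge: 1250 nodes, $250,000/month
# i2.xlarge: 125 nodes, $76,000/month
```

The jump from $30,000 to $76,000 is the "cost goes up 150%" figure (153%, to
be precise).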


Best practices for frequently updated columns

2014-08-13 Thread Jeremy Jongsma
We are building a historical timeseries database for stocks and futures,
with trade prices aggregated into daily bars (open, high, low, close values
for the day). The latest bar for each instrument needs to be updated as new
trades arrive on the realtime data feeds. Depending on the trading volume
for an instrument, some columns will be updated multiple times per second.

I've read comments about frequent column updates causing compaction issues
with Cassandra. What is the recommended Cassandra configuration / best
practices for usage scenarios like this?
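Not from the thread, but for context: since Cassandra writes are
last-write-wins, the usual pattern is to keep the running bar in the
application and blindly overwrite the whole row on each trade, avoiding
read-modify-write. A minimal sketch of the aggregation side (the helper name
is illustrative):

```python
def update_bar(bar, price):
    """Fold one trade price into a daily OHLC bar (bar=None means no trades yet)."""
    if bar is None:
        return {"open": price, "high": price, "low": price, "close": price}
    return {
        "open": bar["open"],              # first trade of the day, never changes
        "high": max(bar["high"], price),
        "low": min(bar["low"], price),
        "close": price,                   # always the latest trade
    }

bar = None
for price in [100.0, 105.5, 94.75, 102.25]:   # trades arriving from the feed
    bar = update_bar(bar, price)
    # here you would upsert the whole bar row; Cassandra's last-write-wins
    # semantics make repeated overwrites of the same row safe
```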


Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jeremy Jongsma
If your nodes are not actually evenly distributed across physical racks for
redundancy, don't use multiple racks.


On Tue, Aug 5, 2014 at 10:57 AM, DE VITO Dominique 
dominique.dev...@thalesgroup.com wrote:

 First, thanks for your answer.

  This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.

 IMHO, it's not a good enough condition.
 Let's use an example with RF=2

 N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2

 Here, you have RF= # of racks
 And due to NetworkTopologyStrategy, N4 will store *all* the cluster data,
 leading to a completely imbalanced cluster.
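 Dominique's example can be checked with a toy model of rack-aware placement
 (a simplification of NetworkTopologyStrategy, assuming one replica per
 distinct rack while racks remain):

```python
# Ring order and rack assignment from the example above (RF=2, two racks).
ring = [("N1", "rack_1"), ("N2", "rack_1"), ("N3", "rack_1"), ("N4", "rack_2")]
RF = 2

def replicas(start):
    """Simplified NTS placement: walk the ring clockwise from the token's
    primary node, taking the first node seen from each distinct rack."""
    chosen, racks_used = [], set()
    for i in range(len(ring)):
        node, rack = ring[(start + i) % len(ring)]
        if rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)
        if len(chosen) == RF:
            break
    return chosen

for start, (node, _) in enumerate(ring):
    print(node, "->", replicas(start))
# Every replica set contains N4 (the only rack_2 node), so N4 stores
# all of the cluster's data.
```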

 IMHO, it happens when using nodes *or* vnodes.

 As well-balanced clusters with NetworkTopologyStrategy rely on carefully
 chosen token distribution/path along the ring *and* as tokens are
 randomly-generated with vnodes, my guess is that with vnodes and
 NetworkTopologyStrategy, it's better to define a single (logical) rack //
 due to carefully chosen tokens vs randomly-generated token clash.

 I don't see other options left.
 Do you see other ones ?

 Regards,
 Dominique




 -Message d'origine-
 De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la
 part de Jonathan Haddad
 Envoyé : mardi 5 août 2014 17:43
 À : user@cassandra.apache.org
 Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ?

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.  For each token, replicas are chosen based on
 the strategy.  Essentially, you could have a wild imbalance in token
 ownership, but it wouldn't matter because the replicas would be distributed
 across the rest of the machines.


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

 On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
  Hi,
 
 
 
  My understanding is that NetworkTopologyStrategy does NOT play well
  with vnodes, due to:
 
  · Vnode = tokens are (usually) randomly generated (AFAIK)
 
   · NetworkTopologyStrategy = requires carefully chosen tokens for
   all nodes in order not to get a VERY unbalanced ring like in
  https://issues.apache.org/jira/browse/CASSANDRA-3810
 
 
 
  When playing with vnodes, is the recommendation to define one rack for
  the entire cluster ?
 
 
 
  Thanks.
 
 
 
  Regards,
 
  Dominique
 
 
 
 



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



Re: Authentication exception

2014-07-30 Thread Jeremy Jongsma
Yes, and all nodes have had at least two more scheduled repairs since then.
On Jul 30, 2014 1:47 AM, Or Sher or.sh...@gmail.com wrote:

 Did you run a repair after changing the replication factor for system_auth?


 On Tue, Jul 29, 2014 at 5:48 PM, Jeremy Jongsma jer...@barchart.com
 wrote:

 This is still happening to me; is there anything else I can check? All
 nodes have NTP installed, all are in sync, all have open communication to
 each other. But usually first thing in the morning, I get this auth
 exception. A little while later, it starts working. I'm very puzzled.


 On Tue, Jul 22, 2014 at 8:53 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 Verified all clocks are in sync.


 On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon ra...@apigee.com wrote:

 Could you perhaps check your NTP?


 On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I routinely get this exception from cqlsh on one of my clusters:

 cql.cassandra.ttypes.AuthenticationException:
 AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
 Operation timed out - received only 2 responses.')

 The system_auth keyspace is set to replicate X times given X nodes in
 each datacenter, and at the time of the exception all nodes are reporting
 as online and healthy. After a short period (e.g. 30 minutes), it will let
 me in again.

 What could be the cause of this?







 --
 Or Sher



Re: Measuring WAN replication latency

2014-07-30 Thread Jeremy Jongsma
The brute force way would be:

1) Make client connections to a node in each datacenter from your
monitoring tool.
2) Periodically write a row to one datacenter (at whatever consistency
level your application typically uses.)
3) Immediately query the other datacenter nodes for the same row key with
LOCAL_QUORUM consistency. If not found, execute query again immediately in
a loop.
4) Once the row is available, record time since initial write for that
datacenter.

DataStax folks: this actually seems like a useful metric for something
OpsCenter to track, since it is already doing active statistics collection.
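A runnable sketch of steps 1-4 above (with an in-memory stand-in for the two
datacenters; a real implementation would use a driver session per DC, and the
replication delay here is simulated):

```python
import time

class FakeDC:
    """In-memory stand-in for a datacenter: a replicated row becomes
    visible only after a fixed delay (simulating WAN replication lag)."""
    def __init__(self, replication_delay):
        self.delay = replication_delay
        self.data = {}  # key -> timestamp at which the row becomes visible

    def write(self, key):
        self.data[key] = time.monotonic()           # step 2: write locally

    def replicate_from(self, other, key):
        # Row appears here `delay` seconds after the source write.
        self.data[key] = other.data[key] + self.delay

    def has_row(self, key):
        t = self.data.get(key)
        return t is not None and time.monotonic() >= t

def measure_latency(source, target, key="probe-1", poll_interval=0.005):
    source.write(key)
    target.replicate_from(source, key)              # async replication begins
    start = time.monotonic()
    while not target.has_row(key):                  # step 3: poll the other DC
        time.sleep(poll_interval)
    return time.monotonic() - start                 # step 4: record elapsed time

dc_east, dc_west = FakeDC(0.0), FakeDC(0.05)        # 50 ms simulated WAN lag
latency = measure_latency(dc_east, dc_west)
```

The measured value overshoots the true lag by up to one poll interval, which
is the precision limit of this brute-force approach.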


On Wed, Jul 30, 2014 at 8:59 AM, Rahul Neelakantan ra...@rahul.be wrote:

 Rob,
 Any ideas you can provide on how to do this will be appreciated, we would
 like to build a latency monitoring tool/dashboard that shows how long it
 takes for data to get sent across various DCs.

 Rahul Neelakantan

 On Jul 29, 2014, at 8:53 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jul 29, 2014 at 3:15 PM, Rahul Neelakantan ra...@rahul.be wrote:

 Does anyone know of a way to measure/monitor WAN replication latency for
 Cassandra?


 No. [1]

 =Rob

 [1] There are ways to do something like this task, but you probably don't
 actually want to do them. Trying to do them suggests that you are relying
 on WAN replication timing for your application, which is something you
 almost certainly do not want to do. Why do you believe you have this
 requirement?




Re: Measuring WAN replication latency

2014-07-30 Thread Jeremy Jongsma
Yes, the results should definitely not be relied on as a future performance
indicator for key app functionality. But knowing roughly what your current
replication latency is (and whether it's outside of the normal average) can
inform client failover policies, debug data consistency issues, warn of
datacenter link congestion, etc.


On Wed, Jul 30, 2014 at 12:02 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 30, 2014 at 6:59 AM, Rahul Neelakantan ra...@rahul.be wrote:

 Any ideas you can provide on how to do this will be appreciated, we would
 like to build a latency monitoring tool/dashboard that shows how long it
 takes for data to get sent across various DCs.


 The brute force method described downthread by Jeremy Jongsma gives you
 something like the monitoring you're looking for, but I continue to believe
 it's probably a bad idea to try to design a system in this way.

 =Rob




Re: Authentication exception

2014-07-29 Thread Jeremy Jongsma
This is still happening to me; is there anything else I can check? All
nodes have NTP installed, all are in sync, all have open communication to
each other. But usually first thing in the morning, I get this auth
exception. A little while later, it starts working. I'm very puzzled.


On Tue, Jul 22, 2014 at 8:53 AM, Jeremy Jongsma jer...@barchart.com wrote:

 Verified all clocks are in sync.


 On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon ra...@apigee.com wrote:

 Could you perhaps check your NTP?


 On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I routinely get this exception from cqlsh on one of my clusters:

 cql.cassandra.ttypes.AuthenticationException:
 AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
 Operation timed out - received only 2 responses.')

 The system_auth keyspace is set to replicate X times given X nodes in
 each datacenter, and at the time of the exception all nodes are reporting
 as online and healthy. After a short period (e.g. 30 minutes), it will let
 me in again.

 What could be the cause of this?






Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Jeremy Jongsma
We also run a nightly "nodetool snapshot" on all nodes, and use duplicity
to sync the snapshot to S3, keeping 7 days' worth of backups.

Since duplicity tracks incremental changes this gives you the benefit of
point-in-time snapshots without duplicating sstables that are common across
multiple backups. It also makes it easy to revert all nodes' state to X
days ago in case of accidental or malicious data corruption.


On Thu, Jul 24, 2014 at 12:17 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 23, 2014 at 4:12 PM, Hao Cheng br...@critica.io wrote:

 3. Using a backup system, either manually via rsync or through something
 like Priam, to directly push backups of the data on ephemeral storage to S3.


 https://github.com/JeremyGrosser/tablesnap

 =Rob




Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jeremy Jongsma
My experience is similar to Nicholas'. Basic usage was easy to get a handle
on, but the advanced tuning/tweaking info is scattered EVERYWHERE around
the web, mostly on personal blogs. It feels like it took way too long to
become confident enough in my understanding of Cassandra that I trust our
deployment configuration in production.

Without this mailing list I would still be on the fence.


On Wed, Jul 23, 2014 at 8:20 AM, Peter Lin wool...@gmail.com wrote:

 @benedict - you're right that I haven't requested permission to edit.
 You're also right that I've given up on getting edit permission to the
 cassandra wiki. I've struggled with how to manage
 open source projects, so I totally get it. Managing projects is a thankless
 job most of the time. Pleasing everyone is totally impossible. Apache isn't
 alone in this. I've submitted stuff to google's open source projects in the
 past and had it go into a black hole. We all struggle with managing open
 source projects.

 I am committed to contributing to the Cassandra community, but just not through
 the wiki. There's lots of different ways to contribute. The jira tickets
 I've submitted have gotten good responses generally. It does take several
 days depending on how busy the committers are, but that's normal for all
 projects.



 On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Requesting a change is very different to requesting permission to edit
 (which, I note, still hasn't been made); we do our best to promote
 community engagement, so granting a privilege request has a different
 mental category to a random edit request, which is much more likely to be
 forgotten by any particular committer in the process of attending to their
 more pressing work.

 The relationship between committers and the community is debated at
 length in all projects, often by vocal individuals such as yourselves who
 are unhappy in some way with how the project is being run. However it is
 very hard to please everyone - most of the time we can't even please all
 the committers, and that is a much smaller and more homogenous group.





 On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin wool...@gmail.com wrote:


 I sent a request to add a link to my .Net driver for cassandra to the wiki
 over 5 weeks back and got no response at all.

 I sent another request way back in 2013 and got zero response. Again, I
 totally understand people are busy and I'm just as guilty as everyone else
 of letting requests slip by. It's the reality of contributing to open
 source as a hobby. If I wasn't serious about contributing to cassandra
 community, I wouldn't have spent 2.5 months porting Hector to C# manually.

 Perhaps the real cause is that some committers can't empathise with
 others in the community?


 On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 All requests I've seen in the past year to edit the wiki (admittedly
 only 2-3) have been answered promptly with editing privileges. Personally I
 don't have a major preference either way for policy - there are positives
 and negatives to each approach - but, like I said, raise it on the dev list
 and see if anybody else does.

 However I must admit I cannot empathise with your characterisation of
 requesting permission as 'begging', or a 'slap in the face', or that it is
 even particularly onerous. It is a slight psychological barrier, but in my
 personal experience when a psychological barrier as low as this prevents me
 from taking action, it's usually because I don't have as much desire to
 contribute as I thought I did.




 On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin wool...@gmail.com wrote:


 I've submitted requests to edit the wiki in the past and nothing ever
 got done.

 Having been an apache committer and contributor over the years, I can
 totally understand that people are busy. I also understand that most
 developer find writing docs tedious.

 I'd rather not harass the committers about wiki edits, since I didn't
 like it when it happened to me in the past. That's why many apache projects
 keep their wikis open. Honestly, as much as I find writing docs
 challenging and tedious, it's critical and important. For my other open
 source projects, I force myself to write docs.

 my point is, the wiki should be open and the barrier should be
 removed. Having to beg/ask to edit the wiki feels like a "slap in the face"
 to me, but maybe I'm alone in this. Then again, I've heard the same
 sentiment from other people about cassandra's wiki. The thing is, they just
 chalk it up to "cassandra committers don't give a crap about docs." I do my
 best to defend the committers and point out some are volunteers, but it
 does give the public a negative impression. I know the committers care
 about docs, but they don't always have time to do it.

 I know that given a choice between coding or writing docs, 90% of the
 time I'll choose coding. What I've 

Re: Authentication exception

2014-07-22 Thread Jeremy Jongsma
Verified all clocks are in sync.


On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon ra...@apigee.com wrote:

 Could you perhaps check your NTP?


 On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I routinely get this exception from cqlsh on one of my clusters:

 cql.cassandra.ttypes.AuthenticationException:
 AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
 Operation timed out - received only 2 responses.')

 The system_auth keyspace is set to replicate X times given X nodes in
 each datacenter, and at the time of the exception all nodes are reporting
 as online and healthy. After a short period (i.e. 30 minutes), it will let
 me in again.

 What could be the cause of this?





Authentication exception

2014-07-21 Thread Jeremy Jongsma
I routinely get this exception from cqlsh on one of my clusters:

cql.cassandra.ttypes.AuthenticationException:
AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
Operation timed out - received only 2 responses.')

The system_auth keyspace is set to replicate X times given X nodes in each
datacenter, and at the time of the exception all nodes are reporting as
online and healthy. After a short period (e.g. 30 minutes), it will let me
in again.

What could be the cause of this?


Re: New application - separate column family or separate cluster?

2014-07-09 Thread Jeremy Jongsma
Thanks Tupshin, I am thinking #2 is the way to go in my case, and always
have the option of migrating column families to a new cluster if needed.

Parag, At the traffic volumes I'm talking about, #2 (and especially #3)
will have a lot more total VM nodes, because the other apps are used
lightly enough that there is no reason to add capacity specifically for
them to an already large cluster. But app-specific clusters would need at
least 3 nodes each (for redundancy) when the actual traffic load would
require less than one, hence the increased node costs.


On Wed, Jul 9, 2014 at 7:07 AM, Parag Patel ppa...@clearpoolgroup.com
wrote:

  In your scenario #1, is the total number of nodes staying the same?
 Meaning, if you launch multiple clusters for #2, you’d have N total nodes –
 are we assuming #1 has N or less than N?



 If #1 and #2 both have N, wouldn’t the performance be the same since
 Cassandra’s performance increases linearly?



 *From:* Tupshin Harper [mailto:tups...@tupshin.com]
 *Sent:* Tuesday, July 08, 2014 11:13 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: New application - separate column family or separate
 cluster?



 I've seen a lot of deployments, and I think you captured the scenarios and
 reasoning quite well. You can apply other nuances and details to #2 (e.g.
 segment based on SLA or topology), but I agree with all of your reasoning.

 -Tupshin
 -Global Field Strategy
 -Datastax

 On Jul 8, 2014 10:54 AM, Jeremy Jongsma jer...@barchart.com wrote:

  Do you prefer purpose-specific Cassandra clusters that support a single
 application's data set, or a single Cassandra cluster that contains column
 families for many applications? I realize there is no ideal answer for
 every situation, but what have your experiences been in this area for
 cluster planning?



 My reason for asking is that we have one application with high data volume
 (multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in
 the first place. Now we have the tools and cluster management
 infrastructure built up to the point where it is not a major investment to
 store smaller sets of data for other applications in C* also, and I am
 debating whether to:



 1) Store everything in one large cluster (no isolation, low cost)

 2) Use one cluster for the high-volume data, and one for everything else
 (good isolation, medium cost)

 3) Give every major service its own cluster, even if they have small
 amounts of data (best isolation, highest cost)



 I suspect #2 is the way to go as far as balancing hosting costs and
 application performance isolation. Any pros or cons am I missing?



 -j




New application - separate column family or separate cluster?

2014-07-08 Thread Jeremy Jongsma
Do you prefer purpose-specific Cassandra clusters that support a single
application's data set, or a single Cassandra cluster that contains column
families for many applications? I realize there is no ideal answer for
every situation, but what have your experiences been in this area for
cluster planning?

My reason for asking is that we have one application with high data volume
(multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in
the first place. Now we have the tools and cluster management
infrastructure built up to the point where it is not a major investment to
store smaller sets of data for other applications in C* also, and I am
debating whether to:

1) Store everything in one large cluster (no isolation, low cost)
2) Use one cluster for the high-volume data, and one for everything else
(good isolation, medium cost)
3) Give every major service its own cluster, even if they have small
amounts of data (best isolation, highest cost)

I suspect #2 is the way to go as far as balancing hosting costs and
application performance isolation. Any pros or cons am I missing?

-j


Re: Storing values of mixed types in a list

2014-06-24 Thread Jeremy Jongsma
Use a ByteBuffer value type with your own serialization (we use protobuf
for complex value structures)
On Jun 24, 2014 5:30 AM, Tuukka Mustonen tuukka.musto...@gmail.com
wrote:

 Hello,

 I need to store a list of mixed types in Cassandra. The list may contain
 numbers, strings and booleans. So I would need something like list<?>.

 Is this possible in Cassandra and if not, what workaround would you
 suggest for storing a list of mixed type items? I sketched a few (using a
 list per type, using list of user types in Cassandra 2.1, etc.), but I get
 a bad feeling about each.

 Couldn't find an exact answer to this through searches...
 Regards,
 Tuukka

 P.S. I first asked this at SO before realizing the traffic there is very
 low:
 http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
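The ByteBuffer-with-own-serialization approach Jeremy suggests could be
sketched like this (a hand-rolled tag-length-value format for illustration
only; protobuf, as mentioned above, is the more robust choice):

```python
import struct

def encode(values):
    """Pack a mixed list of bool/int/float/str into one blob."""
    out = bytearray()
    for v in values:
        if isinstance(v, bool):            # must test bool before int
            out += b"b" + struct.pack(">?", v)
        elif isinstance(v, int):
            out += b"i" + struct.pack(">q", v)
        elif isinstance(v, float):
            out += b"f" + struct.pack(">d", v)
        elif isinstance(v, str):
            raw = v.encode("utf-8")
            out += b"s" + struct.pack(">I", len(raw)) + raw
        else:
            raise TypeError(f"unsupported type: {type(v)}")
    return bytes(out)

def decode(blob):
    values, i = [], 0
    while i < len(blob):
        tag, i = blob[i:i + 1], i + 1
        if tag == b"b":
            values.append(struct.unpack_from(">?", blob, i)[0]); i += 1
        elif tag == b"i":
            values.append(struct.unpack_from(">q", blob, i)[0]); i += 8
        elif tag == b"f":
            values.append(struct.unpack_from(">d", blob, i)[0]); i += 8
        elif tag == b"s":
            n = struct.unpack_from(">I", blob, i)[0]; i += 4
            values.append(blob[i:i + n].decode("utf-8")); i += n
    return values

mixed = [42, "hello", True, 3.14]
blob = encode(mixed)        # store this in a single blob column
assert decode(blob) == mixed
```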




Re: How to perform Range Queries in Cassandra

2014-06-24 Thread Jeremy Jongsma
You'd be better off using external indexing (ElasticSearch or Solr),
Cassandra isn't really designed for this sort of querying.
On Jun 24, 2014 3:09 AM, Mike Carter jaloos...@gmail.com wrote:

 Hello!


 I'm a beginner in C* and I'm quite struggling with it.

 I'd like to measure the performance of some Cassandra range queries. The
 idea is to execute multidimensional range queries on Cassandra. E.g. there
 is a given table of 1 million rows with 10 columns and I'd like to execute
 queries like "select count(*) from testable where d=1 and v1 > 10 and v2 >
 20 and v3 > 45 and v4 > 70 ... allow filtering". This kind of query is very
 slow in C*, and as soon as the tables get bigger I get a read-timeout,
 probably caused by long scan operations.

 In further tests I'd like to extend the dimensions to more than 200
 and the rows to 100 million, but currently I can't even handle this small
 table. Should I reorganize the data, or is it impossible to perform such
 highly multi-dimensional queries on Cassandra?





 The setup:

 Cassandra is installed on a single node with 2 TB disk space and 180 GB RAM.

 Connected to Test Cluster at localhost:9160.

 [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]



 Keyspace:

 CREATE KEYSPACE test WITH replication = {

   'class': 'SimpleStrategy',

   'replication_factor': '1'

 };





 Table:

 CREATE TABLE testc21 (

   key int,

   d int,

   v1 int,

   v10 int,

   v2 int,

   v3 int,

   v4 int,

   v5 int,

   v6 int,

   v7 int,

   v8 int,

   v9 int,

   PRIMARY KEY (key)

 ) WITH

   bloom_filter_fp_chance=0.01 AND

   caching='ROWS_ONLY' AND

   comment='' AND

   dclocal_read_repair_chance=0.00 AND

   gc_grace_seconds=864000 AND

   index_interval=128 AND

   read_repair_chance=0.10 AND

   replicate_on_write='true' AND

   populate_io_cache_on_flush='false' AND

   default_time_to_live=0 AND

   speculative_retry='99.0PERCENTILE' AND

   memtable_flush_period_in_ms=0 AND

   compaction={'class': 'SizeTieredCompactionStrategy'} AND

   compression={'sstable_compression': 'LZ4Compressor'};



 CREATE INDEX testc21_d_idx ON testc21 (d);



  select * from testc21 limit 10;

 key| d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9

 +---++-+++-+++++-

  302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54

  531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27

  693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19

    4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14

  927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100

  313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7

  368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52

  671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93

   62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63

  488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45

 Mike



Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
I've found that if you have any amount of latency between your client and
nodes, and you are executing a large batch of queries, you'll usually want
to send them together to one node unless execution time is of no concern.
The tradeoff is resource usage on the connected node vs. time to complete
all the queries, because you'll need fewer client-to-node network round
trips.

With large numbers of queries you will still want to make sure you split
them into manageable batches before sending them, to control memory usage
on the executing node. I've been limiting queries to batches of 100 keys in
scenarios like this.
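The batching described above is a simple chunking step before dispatch
(100 keys per batch, per the message; the helper name is mine):

```python
def chunk(keys, size=100):
    """Split a list of keys into batches of at most `size` for dispatch,
    bounding per-request memory usage on the coordinator node."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

batches = chunk(list(range(250)))
# 3 batches: two of 100 keys and one of 50, each sent as a single request
```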


On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael michael.la...@nytimes.com
wrote:

 However my extensive benchmarking this week of the python driver from
 master shows a performance *decrease* when using 'token_aware'.

 This is on a 12-node, 2-datacenter, RF=3 cluster in AWS.

 Also why do the work the coordinator will do for you: send all the
 queries, wait for everything to come back in whatever order, and sort the
 result.

 I would rather keep my app code simple.

 But the real point is that you should benchmark in your own environment.

 ml


 On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Yes, I am using the CQL datastax drivers.
 It was good advice, thanks a lot Jonathan.
 []s


 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:

 The only case in which it might be better to use an IN clause is if
 the entire query can be satisfied from that machine.  Otherwise, go
 async.

 The native driver reuses connections and intelligently manages the
 pool for you.  It can also multiplex queries over a single connection.

 I am assuming you're using one of the datastax drivers for CQL, btw.

 Jon

 On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
 marc...@s1mbi0se.com.br wrote:
  This is interesting, I didn't know that!
   It might make sense then to use "select =" + async + token aware, I will
   try to change my code.
 
  But would it be a recomended solution for these cases? Any other
 options?
 
   I still wonder if this is the right use case for Cassandra, to look for
   random keys in a huge cluster. After all, the amount of connections to
   Cassandra will still be huge, right? Wouldn't it be a problem?
   Or when you use async does the driver reuse the connection?
 
  []s
 
 
  2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:
 
  If you use async and your driver is token aware, it will go to the
  proper node, rather than requiring the coordinator to do so.
 
  Realistically you're going to have a connection open to every server
  anyways.  It's the difference between you querying for the data
  directly and using a coordinator as a proxy.  It's faster to just ask
  the node with the data.
 
  On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
  marc...@s1mbi0se.com.br wrote:
    But wouldn't using async queries be even worse than using SELECT IN?
    The justification in the docs is that I could query many nodes, but I
    would still do it.
  
   Today, I use both async queries AND SELECT IN:
  
    SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + \
        " WHERE name=%s and value in(%s)"
  
   for name, values in identifiers.items():
  query = self.SELECT_ENTITY_LOOKUP % ('%s',
   ','.join(['%s']*len(values)))
  args = [name] + values
  query_msg = query % tuple(args)
  futures.append((query_msg, self.session.execute_async(query,
 args)))
  
   for query_msg, future in futures:
  try:
 rows = future.result(timeout=10)
 for row in rows:
   entity_ids.add(row.entity_id)
  except:
  logging.error("Query '%s' returned ERROR" % (query_msg))
 raise
  
    Using async just with "select =" would mean instead of 1 async query
    (example: in (0, 1, 2)), I would do several, one for each value of the
    values array above.
    In my head, this would mean more connections to Cassandra and the same
    amount of work, right? What would be the advantage?
  
   []s
  
  
  
  
   2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:
  
   Your other option is to fire off async queries.  It's pretty
   straightforward w/ the java or python drivers.
  
   On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
   marc...@s1mbi0se.com.br wrote:
I was taking a look at Cassandra anti-patterns list:
   
   
   
   
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
   
Among then is
   
SELECT ... IN or index lookups¶
   
SELECT ... IN and index lookups (formerly secondary indexes)
 should
be
avoided except for specific scenarios. See When not to use IN in
SELECT
and
When not to use an index in Indexing in
   
CQL for Cassandra 2.0
   
And Looking at the SELECT doc, I saw:
   
When not to use IN¶
   
The recommendations about when not to use 

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
That depends on the connection pooling implementation in your driver.
Astyanax will keep N connections open to each node (configurable) and route
each query in a separate message over an existing connection, waiting until
one becomes available if all are in use.


On Fri, Jun 20, 2014 at 12:32 PM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 A question, not sure if you guys know the answer:
 Suppose I async-query 1000 rows using token aware and suppose I have 10
 nodes. Suppose also that each node would receive 100 row queries.
 How does async work in this case? Would it send each row query to each
 node in a different connection? Different message?
 I guess if there was a way to use batch with async, once you commit the
 batch for the 1000 queries, it would create 1 connection to each host and
 query 100 rows in a single message to each host.
 This would decrease resource usage, am I wrong?

 []s


 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma jer...@barchart.com:

 I've found that if you have any amount of latency between your client and
 nodes, and you are executing a large batch of queries, you'll usually want
 to send them together to one node unless execution time is of no concern.
 The tradeoff is resource usage on the connected node vs. time to complete
 all the queries, because you'll need fewer client-to-node network round
 trips.

 With large numbers of queries you will still want to make sure you split
 them into manageable batches before sending them, to control memory usage
 on the executing node. I've been limiting queries to batches of 100 keys in
 scenarios like this.
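For what it's worth, the chunking itself is trivial; a minimal sketch (Python, with a stand-in `fetch` callable in place of a real driver call — the splitting logic is the point):

```python
def chunk(keys, size=100):
    """Split a large key list into slices of at most `size` keys."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def multiget(fetch, keys, batch_size=100):
    """Run `fetch` (your driver's multi-key read) once per bounded chunk,
    instead of one huge key slice that a single coordinator must absorb."""
    results = {}
    for batch in chunk(keys, batch_size):
        results.update(fetch(batch))
    return results

# Stand-in fetch function; a real one would issue the driver query.
rows = multiget(lambda ks: {k: "row-%d" % k for k in ks}, list(range(250)))
```

The batch size is a tuning knob: smaller batches mean more round trips but less memory pressure on any one node.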


 On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 However my extensive benchmarking this week of the python driver from
 master shows a performance *decrease* when using 'token_aware'.

 This is on 12-node, 2-datacenter, RF-3 cluster in AWS.

 Also why do the work the coordinator will do for you: send all the
 queries, wait for everything to come back in whatever order, and sort the
 result.

 I would rather keep my app code simple.

 But the real point is that you should benchmark in your own environment.

 ml


 On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Yes, I am using the CQL datastax drivers.
 It was good advice, thanks a lot Jonathan.
 []s


 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:

 The only case in which it might be better to use an IN clause is if
 the entire query can be satisfied from that machine.  Otherwise, go
 async.

 The native driver reuses connections and intelligently manages the
 pool for you.  It can also multiplex queries over a single connection.

 I am assuming you're using one of the datastax drivers for CQL, btw.

 Jon

 On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
 marc...@s1mbi0se.com.br wrote:
  This is interesting, I didn't know that!
  It might make sense then to use select = + async + token aware, I
 will try
  to change my code.
 
  But would it be a recommended solution for these cases? Any other
 options?
 
  I still wonder if this is the right use case for Cassandra, to look
 for
  random keys in a huge cluster. After all, the amount of connections
 to
  Cassandra will still be huge, right... Wouldn't it be a problem?
  Or when you use async the driver reuses the connection?
 
  []s
 
 
  2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:
 
  If you use async and your driver is token aware, it will go to the
  proper node, rather than requiring the coordinator to do so.
 
  Realistically you're going to have a connection open to every server
  anyways.  It's the difference between you querying for the data
  directly and using a coordinator as a proxy.  It's faster to just
 ask
  the node with the data.
 
  On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
  marc...@s1mbi0se.com.br wrote:
   But wouldn't using async queries be even worse than using SELECT IN?
   The justification in the docs is I could query many nodes, but I
 would
   still
   do it.
  
   Today, I use both async queries AND SELECT IN:
  
    SELECT_ENTITY_LOOKUP = ("SELECT entity_id FROM " + ENTITY_LOOKUP +
        " WHERE name=%s and value in(%s)")

    for name, values in identifiers.items():
        query = self.SELECT_ENTITY_LOOKUP % ('%s',
            ','.join(['%s'] * len(values)))
        args = [name] + values
        query_msg = query % tuple(args)
        futures.append((query_msg, self.session.execute_async(query, args)))

    for query_msg, future in futures:
        try:
            rows = future.result(timeout=10)
            for row in rows:
                entity_ids.add(row.entity_id)
        except:
            logging.error("Query '%s' returned ERROR" % (query_msg))
            raise
  
   Using async just with select = would mean instead of 1 async query
   (example:
   in (0, 1, 2)), I would do several, one for each value of values
 array
   above.
   In my head, this would mean more

Custom snitch classpath?

2014-06-20 Thread Jeremy Jongsma
Where do I add my custom snitch JAR to the Cassandra classpath so I can use
it?


Re: Custom snitch classpath?

2014-06-20 Thread Jeremy Jongsma
Sharing in case anyone else wants to use this:

https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java

Basically it is a proxy that attempts to use GossipingPropertyFileSnitch,
and if that fails to initialize due to missing rack or datacenter
values, it falls back to Ec2MultiRegionSnitch. We are using it for hybrid
cloud deployments between AWS and our private datacenter.
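The fallback logic is just a proxy that tries one strategy at construction time and swaps in another on failure. A toy sketch of that shape (Python, purely illustrative — the actual snitch is Java and implements Cassandra's snitch interface; the class names below are stand-ins):

```python
class PropertyFileSnitch:
    """Stand-in for GossipingPropertyFileSnitch: fails fast when the
    rack/dc properties are missing (simulated here unconditionally)."""
    def __init__(self):
        raise RuntimeError("cassandra-rackdc.properties not found")

class Ec2Snitch:
    """Stand-in for Ec2MultiRegionSnitch: derives the dc from EC2 metadata."""
    def datacenter(self, endpoint):
        return "us-east-1"

class FallbackSnitch:
    """Proxy: use the primary strategy if it initializes, else the fallback."""
    def __init__(self, primary, fallback):
        try:
            self.delegate = primary()
        except Exception:
            self.delegate = fallback()

    def datacenter(self, endpoint):
        return self.delegate.datacenter(endpoint)

snitch = FallbackSnitch(PropertyFileSnitch, Ec2Snitch)
```

The appeal of the pattern is that one deployment artifact works in both environments: nodes with a properties file use it, and bare EC2 nodes fall back to autodiscovery.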


On Fri, Jun 20, 2014 at 1:04 PM, Tyler Hobbs ty...@datastax.com wrote:

 The lib directory (where all the other jars are).  bin/cassandra.in.sh
 does this:

 for jar in $CASSANDRA_HOME/lib/*.jar; do
 CLASSPATH=$CLASSPATH:$jar
 done



 On Fri, Jun 20, 2014 at 12:58 PM, Jeremy Jongsma jer...@barchart.com
 wrote:

 Where do I add my custom snitch JAR to the Cassandra classpath so I can
 use it?




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
There is nothing preventing that in Cassandra, it's just a matter of how
intelligent the driver API is. Submit a feature request to Astyanax or
Datastax driver projects.


On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 The bad design part (just my opinion, no intention to offend) is not allowing
 the possibility of sending batches directly to the data nodes, without
 using a coordinator.
 I would choose that option.
 []s


 2014-06-20 16:05 GMT-03:00 DuyHai Doan doanduy...@gmail.com:

 Well it's kind of a trade-off.

  Either you send data directly to the primary replica nodes to take
 advantage of data locality using a token-aware strategy, and the price to pay
 is a high number of open connections on the client side.

 Or you just batch data to a random node playing the coordinator role to
 dispatch requests to the right nodes. The price to pay is then a load spike
 on one node (the coordinator) and intra-cluster bandwidth usage.

  The choice is yours, it has nothing to do with good or bad design.
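To make the trade-off concrete: a token-aware client effectively groups keys by their replica and sends one bounded request per node. A toy sketch (the modulo hash ring below is a stand-in for the Murmur3 token ring real drivers consult):

```python
from collections import defaultdict

NODES = ["node%d" % i for i in range(10)]

def replica_for(key, nodes):
    """Toy routing: real drivers hash the partition key against the
    cluster's token ring; a plain modulo stands in here."""
    return nodes[hash(key) % len(nodes)]

def group_by_replica(keys, nodes):
    """Bucket keys by the node that owns them, one batch per node."""
    groups = defaultdict(list)
    for key in keys:
        groups[replica_for(key, nodes)].append(key)
    return groups

# 1000 keys across 10 nodes -> one bounded request per node
# instead of 1000 coordinator-routed reads.
groups = group_by_replica(range(1000), NODES)
```

With coordinator-based dispatch all 1000 keys instead go to one node, which then fans out intra-cluster — fewer client connections, but the spike load described above.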


 On Fri, Jun 20, 2014 at 8:55 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 I am using python + CQL Driver.
 I wonder how they do...
 These things seem like small details, but they are fundamental to getting
 good performance out of Cassandra...
 I wish there was a simpler way to query in batches. Opening a large
 number of connections and sending one message at a time seems bad to me, as
 sometimes you want to work with small rows.
 It's no surprise Cassandra performs better when we use average row
 sizes. But honestly I disagree with this part of Cassandra/Driver's design.
 []s


 2014-06-20 14:37 GMT-03:00 Jeremy Jongsma jer...@barchart.com:

 That depends on the connection pooling implementation in your driver.
 Astyanax will keep N connections open to each node (configurable) and route
 each query in a separate message over an existing connection, waiting until
 one becomes available if all are in use.


 On Fri, Jun 20, 2014 at 12:32 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 A question, not sure if you guys know the answer:
 Suppose I async query 1000 rows using token aware and suppose I have 10
 nodes. Suppose also each node would receive 100 row queries each.
 How does async work in this case? Would it send each row query to each
 node in a different connection? Different message?
 I guess if there was a way to use batch with async, once you commit
 the batch for the 1000 queries, it would create 1 connection to each host
 and query 100 rows in a single message to each host.
 This would decrease resource usage, am I wrong?

 []s


 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma jer...@barchart.com:

 I've found that if you have any amount of latency between your client
 and nodes, and you are executing a large batch of queries, you'll usually
 want to send them together to one node unless execution time is of no
 concern. The tradeoff is resource usage on the connected node vs. time to
 complete all the queries, because you'll need fewer client-to-node network
 round trips.

 With large numbers of queries you will still want to make sure you
 split them into manageable batches before sending them, to control memory
 usage on the executing node. I've been limiting queries to batches of 100
 keys in scenarios like this.


 On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 However my extensive benchmarking this week of the python driver
 from master shows a performance *decrease* when using 'token_aware'.

 This is on 12-node, 2-datacenter, RF-3 cluster in AWS.

 Also why do the work the coordinator will do for you: send all the
 queries, wait for everything to come back in whatever order, and sort 
 the
 result.

 I would rather keep my app code simple.

 But the real point is that you should benchmark in your own
 environment.

 ml


 On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Yes, I am using the CQL datastax drivers.
 It was good advice, thanks a lot Jonathan.
 []s


 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com:

 The only case in which it might be better to use an IN clause is if
 the entire query can be satisfied from that machine.  Otherwise, go
 async.

 The native driver reuses connections and intelligently manages the
 pool for you.  It can also multiplex queries over a single
 connection.

 I am assuming you're using one of the datastax drivers for CQL,
 btw.

 Jon

 On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
 marc...@s1mbi0se.com.br wrote:
  This is interesting, I didn't know that!
  It might make sense then to use select = + async + token aware,
 I will try
  to change my code.
 
  But would it be a recommended solution for these cases? Any
 other options?
 
  I still wonder if this is the right use case for Cassandra, to
 look for
  random keys in a huge cluster. After all, the amount of
 connections to
  Cassandra

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Jeremy Jongsma
One option is to add new nodes, and do a node repair/cleanup on everything.
That will at least reduce your per-node data size.
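The arithmetic behind this: with even balance, each node holds roughly total data × RF / node count, so adding nodes shrinks per-node load proportionally. A sketch with illustrative numbers only (not Brian's actual cluster size):

```python
def per_node_gb(total_gb, replication_factor, node_count):
    """Approximate per-node disk load for an evenly balanced cluster."""
    return total_gb * replication_factor / node_count

# Illustrative: ~1600 GB of raw data at RF=2 means 400 GB per node
# on 8 nodes, but only ~267 GB per node after growing to 12.
before = per_node_gb(1600, 2, 8)
after = per_node_gb(1600, 2, 12)
```

The cleanup step matters because new nodes take over token ranges but the old nodes keep the stale copies on disk until `nodetool cleanup` removes them.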


On Wed, Jun 18, 2014 at 11:01 AM, Brian Tarbox tar...@cabotresearch.com
wrote:

 I'm running on AWS m2.2xlarge instances using the ~800 gig
 ephemeral/attached disk for my data directory.  My data size per node is
 nearing 400 gig.

 Sometimes during maintenance operations (repairs mostly I think) I run out
 of disk space as my understanding is that some of these operations require
 double the space of one's data.

 Since I can't change the size of attached storage for my instance type my
 question is can I somehow get these maintenance operations to use other
 volumes?

 Failing that, what are my options?  Thanks.

 Brian Tarbox



Re: Large number of row keys in query kills cluster

2014-06-12 Thread Jeremy Jongsma
Good to know, thanks Peter. I am worried about client-to-node latency if I
have to do 20,000 individual queries, but that makes it clearer that at
least batching in smaller sizes is a good idea.


On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford psanf...@retailnext.net
wrote:

 On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 The big problem seems to have been requesting a large number of row keys
 combined with a large number of named columns in a query. 20K rows with 20K
 columns destroyed my cluster. Splitting it into slices of 100 sequential
 queries fixed the performance issue.

 When updating 20K rows at a time, I saw a different issue -
 BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
 that issue.

 Is there any documentation on this? Obviously these limits will vary by
 cluster capacity, but for new users it would be great to know that you can
 run into problems with large queries, and how they present themselves when
 you hit them. The errors I saw are pretty opaque, and took me a couple days
 to track down.


 The first thing that comes to mind is the Multiget section on the Datastax
 anti-patterns page:
 http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets



 -psanford





Re: Backup Cassandra to

2014-06-12 Thread Jeremy Jongsma
That will not necessarily scale, and I wouldn't recommend it - your backup
node will need as much disk space as an entire replica of the cluster
data. For a cluster with a couple of nodes that may be OK, for dozens of
nodes, probably not. You also lose the ability to restore individual nodes
- the only way to replace a dead node is with a full repair.


On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam aja...@gmail.com wrote:

 There is another way. You create a Cassandra node in its own datacentre;
 any changes going to the main cluster will then be replicated to this node.
 You can back up from this node. In the event of a disaster, the data in
 both clusters is wiped and the backup is then replayed to the individual
 node. The data will then be replicated to the main cluster.

 This will also work for the case when the main cluster increases or
 decreases in size.

 Thanks

 Jabbar Azam


 On 12 June 2014 18:27, Andrew redmu...@gmail.com wrote:

 There isn’t a lot of “actual documentation” on the act of backing up, but
 I did research into it for my own company, and
 unfortunately you’re not going to have a setup similar to Oracle’s.  There
 are reasons for this, however.

 If you have more than one replica of the data, that means each node in
 the cluster will likely be holding its own unique set of data.  So you
 would need to back up the ENTIRE set of nodes in order to get an accurate
 snapshot.  Likewise, you would need to restore it to a cluster of the
 same size in order to restore it (and then run refresh to tell Cassandra to
 reload the tables from disk).

 Copying the snapshots is easy—it’s just a bunch of files in your data
 directory.  It’s even smaller if you use incremental snapshots.  I’ll
 admit, I’m no expert on tape drives, but I’d imagine it’s as easy as
 copy/pasting the snapshots to the drive (or whatever the equivalent tape
 drive operation is).

 What you (and I, admittedly) would really like to see is a way to back up
 all the logical *data*, and then simply replay it.  This is possible on
 Oracle because it’s typically restricted to a single instance (plus maybe one or
 two standbys) that don’t “share” any data.  What you could do, in theory,
 is literally select all the data in the entire cluster and simply dump it
 to a file—but this could take hours, days, or even weeks to complete,
 depending on the size of your data, and then simply re-load it.  This is
 probably not a great solution, but hey—maybe it will work for you.
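A minimal sketch of that logical dump-and-replay idea (the JSON-lines format and the stand-in row source are illustrative choices of mine, not anything Cassandra provides — a real dump would page over every partition via the driver, and a real replay would INSERT each row):

```python
import json
import os
import tempfile

def dump_rows(rows, path):
    """Write each row as one JSON line so the dump can be replayed row by row."""
    with open(path, "w") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")

def replay_rows(path):
    """Read the dump back; a real replay would re-insert each row."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# Stand-in rows in place of a full-cluster scan.
rows = [{"key": i, "symbol": "SYM%d" % i} for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), "dump.jsonl")
dump_rows(rows, path)
```

As the paragraph above notes, at terabyte scale this approach can take days or weeks, which is why snapshot-based backup remains the practical option.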

 Netflix (thankfully) has posted a lot of their operational observations
 and what not, including their utility Priam.  In their documentation, they
 include some overviews of what they use:
 https://github.com/Netflix/Priam/wiki/Backups

 Hope this helps!

 Andrew

 On June 12, 2014 at 6:18:57 AM, Jack Krupansky (j...@basetechnology.com)
 wrote:

   The doc for backing up – and restoring – Cassandra is here:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html

 That doesn’t tell you how to move the “snapshot” to or from tape, but a
 snapshot is the starting point for backing up Cassandra.

 -- Jack Krupansky

  *From:* Camacho, Maria (NSN - FI/Espoo) maria.cama...@nsn.com
 *Sent:* Thursday, June 12, 2014 4:57 AM
 *To:* user@cassandra.apache.org
 *Subject:* Backup Cassandra to


  Hi there,



  I'm trying to find information/instructions about backing up and
 restoring a Cassandra DB to and from a tape unit.



  I was hoping someone in this forum could help me with this since I
 could not find anything useful in Google :(



  Thanks in advance,

  Maria







Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
I'm using Astyanax with a query like this:

clusterContext
  .getClient()
  .getKeyspace("instruments")
  .prepareQuery(INSTRUMENTS_CF)
  .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
  .getKeySlice(new String[] {
    "ROW1",
    "ROW2",
    // 20,000 keys here...
    "ROW2"
  })
  .execute();

At the time this query executes the first time (resulting in unresponsive
cluster), there are zero rows in the column family. Schema is below, pretty
basic:

CREATE KEYSPACE instruments WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'aws-us-east-1': '2'
};

CREATE TABLE instruments (
  key bigint PRIMARY KEY,
  definition blob,
  id bigint,
  name text,
  symbol text,
  updated bigint
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};




On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael michael.la...@nytimes.com
wrote:

 Perhaps if you described both the schema and the query in more detail, we
 could help... e.g. did the query have an IN clause with the 20,000 keys? Or is
 the key compound? More detail will help.


 On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I didn't explain clearly - I'm not requesting 20,000 unknown keys
 (resulting in a full scan), I'm requesting 20,000 specific rows by key.
 On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Jeremy

 Basically what you are doing is to ask Cassandra to do a distributed
 full scan on all the partitions across the cluster, it's normal that the
 nodes are somehow stressed.

 How did you make the query? Are you using Thrift or CQL3 API?

 Please note that there is another way to get all partition keys : SELECT
 DISTINCT partition_key FROM..., more details here :
 www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
 I ran an application today that attempted to fetch 20,000+ unique row
 keys in one query against a set of completely empty column families. On a
 4-node cluster (EC2 m1.large instances) with the recommended memory
 settings (2 GB heap), every single node immediately ran out of memory and
 became unresponsive, to the point where I had to kill -9 the cassandra
 processes.

 Now clearly this query is not the best idea in the world, but the
 effects of it are a bit disturbing. What could be going on here? Are there
 any other query pitfalls I should be aware of that have the potential to
 explode the entire cluster?

 -j





Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
The big problem seems to have been requesting a large number of row keys
combined with a large number of named columns in a query. 20K rows with 20K
columns destroyed my cluster. Splitting it into slices of 100 sequential
queries fixed the performance issue.

When updating 20K rows at a time, I saw a different issue -
BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
that issue.

Is there any documentation on this? Obviously these limits will vary by
cluster capacity, but for new users it would be great to know that you can
run into problems with large queries, and how they present themselves when
you hit them. The errors I saw are pretty opaque, and took me a couple days
to track down.

In any case this seems like a bug to me - it shouldn't be possible to
completely lock up a cluster with a valid query that isn't doing a table
scan, should it?


On Wed, Jun 11, 2014 at 9:33 AM, Jeremy Jongsma jer...@barchart.com wrote:

 I'm using Astyanax with a query like this:

 clusterContext
   .getClient()
   .getKeyspace("instruments")
   .prepareQuery(INSTRUMENTS_CF)
   .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
   .getKeySlice(new String[] {
     "ROW1",
     "ROW2",
     // 20,000 keys here...
     "ROW2"
   })
   .execute();

 At the time this query executes the first time (resulting in unresponsive
 cluster), there are zero rows in the column family. Schema is below, pretty
 basic:

 CREATE KEYSPACE instruments WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'aws-us-east-1': '2'
 };

 CREATE TABLE instruments (
   key bigint PRIMARY KEY,
   definition blob,
   id bigint,
   name text,
   symbol text,
   updated bigint
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};




 On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael michael.la...@nytimes.com
  wrote:

 Perhaps if you described both the schema and the query in more detail, we
 could help... e.g. did the query have an IN clause with the 20,000 keys? Or is
 the key compound? More detail will help.


 On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com
 wrote:

 I didn't explain clearly - I'm not requesting 20,000 unknown keys
 (resulting in a full scan), I'm requesting 20,000 specific rows by key.
 On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Jeremy

 Basically what you are doing is to ask Cassandra to do a distributed
 full scan on all the partitions across the cluster, it's normal that the
 nodes are somehow stressed.

 How did you make the query? Are you using Thrift or CQL3 API?

 Please note that there is another way to get all partition keys :
 SELECT DISTINCT partition_key FROM..., more details here :
 www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
 I ran an application today that attempted to fetch 20,000+ unique row
 keys in one query against a set of completely empty column families. On a
 4-node cluster (EC2 m1.large instances) with the recommended memory
 settings (2 GB heap), every single node immediately ran out of memory and
 became unresponsive, to the point where I had to kill -9 the cassandra
 processes.

 Now clearly this query is not the best idea in the world, but the
 effects of it are a bit disturbing. What could be going on here? Are there
 any other query pitfalls I should be aware of that have the potential to
 explode the entire cluster?

 -j






Frequent secondary index sstable corruption

2014-06-10 Thread Jeremy Jongsma
I'm in the process of migrating data over to cassandra for several of our
apps, and a few of the schemas use secondary indexes. Four times in the
last couple months I've run into a corrupted sstable belonging to a
secondary index, but have never seen this on any other sstables. When it
happens, any query against the secondary index just hangs until the node is
fixed. It's making me a bit nervous about using secondary indexes in
production.

This has usually happened after a bulk data import, so I am wondering if
the firehose method of dumping initial data into cassandra (write
consistency = any) is causing some sort of write concurrency issue when it
comes to secondary indexes. Has anyone else experienced this?

The cluster is running 1.2.16 on 4x EC2 m1.large instances.


Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
unresponsive, to the point where I had to kill -9 the cassandra processes.

Now clearly this query is not the best idea in the world, but the effects
of it are a bit disturbing. What could be going on here? Are there any
other query pitfalls I should be aware of that have the potential to
explode the entire cluster?

-j


Re: Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting
in a full scan), I'm requesting 20,000 specific rows by key.
On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Jeremy

 Basically what you are doing is to ask Cassandra to do a distributed full
 scan on all the partitions across the cluster, it's normal that the nodes
 are somehow stressed.

 How did you make the query? Are you using Thrift or CQL3 API?

 Please note that there is another way to get all partition keys : SELECT
 DISTINCT partition_key FROM..., more details here :
 www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
 I ran an application today that attempted to fetch 20,000+ unique row keys
 in one query against a set of completely empty column families. On a 4-node
 cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
 heap), every single node immediately ran out of memory and became
 unresponsive, to the point where I had to kill -9 the cassandra processes.

 Now clearly this query is not the best idea in the world, but the effects
 of it are a bit disturbing. What could be going on here? Are there any
 other query pitfalls I should be aware of that have the potential to
 explode the entire cluster?

 -j



Re: Question about replacing a dead node

2014-06-03 Thread Jeremy Jongsma
A dead node is still allocated key ranges, and Cassandra will wait for it
to come back online rather than redistributing its data. It needs to be
decommissioned or replaced by a new node for it to be truly dead as far as
the cluster is concerned.


On Tue, Jun 3, 2014 at 11:12 AM, Prem Yadav ipremya...@gmail.com wrote:

 Hi,

  in the last week, we saw at least two emails about dead node
  replacement. Though I saw the documentation about how to do this, I am not
  sure I understand why this is required.

  Assuming the replication factor is 2, if a node dies, why does it matter? If
  a new node is added, shouldn't it just take over the chunk of data the dead
  node served as the primary node, from the other existing nodes?
 Why do we need to worry about replacing the dead node?

 Thanks



Re: Cassandra snapshot

2014-06-02 Thread Jeremy Jongsma
I wouldn't recommend doing this before regular backups for the simple
reason that for large data sets it will take a long time to run, and
will require that your node backup schedule be properly staggered (you
should never be running repair on all nodes at the same time.) Backups
should be treated as eventually consistent just like Cassandra itself.

That said, if you are doing a one-time backup of a node and for whatever
reason you want it as up-to-date as possible without unnecessary data, you
should also run nodetool compact.


On Mon, Jun 2, 2014 at 2:18 PM, ng pipeli...@gmail.com wrote:


  I need to make sure that all the data is in sstables before taking the
  snapshot.

 I am thinking of
 nodetool cleanup
 nodetool repair
 nodetool flush
 nodetool snapshot

 Am I missing anything else?

 Thanks in advance for the responses/suggestions.

 ng



Re: Managing truststores with inter-node encryption

2014-05-30 Thread Jeremy Jongsma
It appears that only adding the CA certificate to the truststore is
sufficient for this.


On Thu, May 22, 2014 at 10:05 AM, Jeremy Jongsma jer...@barchart.com
wrote:

 The docs say that each node needs every other node's certificate in its
 local truststore:


 http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html

 This seems like a bit of a headache for adding nodes to a cluster. How do
 others deal with this?

 1) If I am self-signing the client certificates (with puppetmaster), is it
 enough that the truststore just contain the CA certificate used to sign
 them? This is the typical PKI mechanism for verifying trust, so I am hoping
 it works here.

 2) If not, can I use the same certificate for every node? If so, what is
 the downside? I'm mainly concerned with encryption over public internet
 links, not node identity verification.





Managing truststores with inter-node encryption

2014-05-22 Thread Jeremy Jongsma
The docs say that each node needs every other node's certificate in its
local truststore:

http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html

This seems like a bit of a headache for adding nodes to a cluster. How do
others deal with this?

1) If I am self-signing the client certificates (with puppetmaster), is it
enough that the truststore just contain the CA certificate used to sign
them? This is the typical PKI mechanism for verifying trust, so I am hoping
it works here.

2) If not, can I use the same certificate for every node? If so, what is
the downside? I'm mainly concerned with encryption over public internet
links, not node identity verification.