Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Robert Wille
Okay, this is going to be a pretty long post, but I think it's an interesting data model, and hopefully someone will find it worth going through. First, I think it will be easier to understand the modeling choices I made if you see the end product. Go to http://www.fold3.com/browse.php#249|hzUkL

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Jack Krupansky
Hmmm... If you serialize the tree properly in a partition, you could always read an entire sub-tree as a single slice (consecutive CQL rows.) Is there much more to it? -- Jack Krupansky On Fri, Mar 27, 2015 at 7:35 PM, Ben Bromhead wrote: > +1 would love to see how you do it > > On 27 March 201
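One way to picture the "single slice" idea is the minimal Python sketch below. It is an illustration only, not Robert's data model and not a Cassandra API. It assumes each node is keyed by its full path from the root (a tuple, standing in for a clustering key), so that sorted order places every sub-tree in one consecutive run. The names build_rows and read_subtree are hypothetical.

```python
from bisect import bisect_left


def build_rows(paths):
    """Return node paths sorted the way clustering keys would sort inside one partition."""
    return sorted(tuple(p) for p in paths)


def read_subtree(rows, root):
    """Return the consecutive run of rows holding `root` and all of its descendants."""
    root = tuple(root)
    # Every path that starts with `root` sorts at or after `root` itself.
    start = bisect_left(rows, root)
    end = start
    # Walk forward while the rows still share the root prefix.
    while end < len(rows) and rows[end][:len(root)] == root:
        end += 1
    return rows[start:end]


rows = build_rows([
    ("a",), ("a", "b"), ("a", "b", "c"), ("a", "d"), ("e",), ("e", "f"),
])
subtree = read_subtree(rows, ("a", "b"))
```

Under this assumed layout, subtree contains only ("a", "b") and ("a", "b", "c"), read as one contiguous range. Whether such a layout is practical for very wide fanouts or deep trees is a separate question that the sketch does not address.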

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Ben Bromhead
+1 would love to see how you do it On 27 March 2015 at 07:18, Jonathan Haddad wrote: > I'd be interested to see that data model. I think the entire list would > benefit! > > On Thu, Mar 26, 2015 at 8:16 PM Robert Wille wrote: > >> I have a cluster which stores tree structures. I keep several hu

Re: Issue with removing a node and adding it back

2015-03-27 Thread Shiwen Cheng
Thanks Robert! Yes I tried what you said: clean the data and re-bootstrap. But still it failed, once at the point of 600GB transferred and once at 1.1TB :( But I could see following exceptions from time to time: = java.io.IOException: net.jpountz.lz4.LZ4Exception: Error decodin

Re: High latencies for simple queries

2015-03-27 Thread Ben Bromhead
One other thing to keep in mind / check is that doing these tests locally the cassandra driver will connect using the network stack, whereas postgres supports local connections over a unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than tcp as you

Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
Actually I am in the middle of setting up the same sort of thing for PostgreSQL using psycopg2 and pyev. I'll be using Cassandra and PostgreSQL in an IoT experiment as the backend for swarms of MQTT brokers at something in the 10-100M client range. ml On Fri, Mar 27, 2015 at 4:59 PM, Laing, Mich

Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
I use callback chaining with the python driver and can confirm that it is very fast. You can "chain the chains" together to perform sequential processing. I do this when retrieving "metadata" and then the referenced "payload" for example, when the metadata has been inverted and the payload is larg

Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Since you're executing queries sequentially, you may want to look into using callback chaining to avoid the cross-thread signaling that results in the 1ms latencies. Basically, just use session.execute_async() and attach a callback to the returned future that will execute your next query. The cal
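The callback chaining described above might look roughly like the following Python sketch. It assumes a session object from the DataStax Python driver, whose execute_async() returns a future that accepts add_callbacks(callback, errback). The helper name run_chain, its timeout parameter, and the way results are collected are hypothetical choices made for illustration, not part of the driver.

```python
import threading


def run_chain(session, queries, timeout=10.0):
    """Run queries sequentially, starting each one from the previous query's callback."""
    results = []
    errors = []
    done = threading.Event()
    pending = iter(queries)

    def issue_next():
        query = next(pending, None)
        if query is None:
            done.set()  # nothing left to run, wake up the caller
            return
        future = session.execute_async(query)
        future.add_callbacks(on_success, on_error)

    def on_success(rows):
        # Invoked by the driver when a query completes; start the next query
        # from here instead of signaling the caller thread in between.
        results.append(rows)
        issue_next()

    def on_error(exc):
        errors.append(exc)
        done.set()  # stop the chain at the first failure

    issue_next()
    if not done.wait(timeout):
        raise TimeoutError("query chain did not finish in time")
    if errors:
        raise errors[0]
    return results
```

With a connected session, a call such as run_chain(session, ["SELECT ...", "SELECT ..."]) blocks the caller only once, at the end of the chain, rather than once per query.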

Re: High latencies for simple queries

2015-03-27 Thread Artur Siekielski
I think that in your example Postgres spends most of its time waiting for fsync() to complete. On Linux, with a battery-backed RAID controller, it's safe to mount an ext4 filesystem with the "barrier=0" option, which improves fsync() performance a lot. I have partitions mounted with this option and I did a

Re: High latencies for simple queries

2015-03-27 Thread Artur Siekielski
Yes, I'm concerned about the latency. Throughput can be high even when using Python: http://datastax.github.io/python-driver/performance.html. But in my scenarios I need to run queries sequentially, so latencies matter. And Cassandra requires issuing more queries than SQL databases so these lat

Re: High latencies for simple queries

2015-03-27 Thread Ben Bromhead
Latency can be so variable even when testing things locally. I quickly fired up postgres and did the following with psql: ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i)); CREATE TABLE ben=# \timing Timing is on. ben=# INSERT INTO foo VALUES(2, 'yay'); INSERT 0 1 Time: 1.162 ms ben=# INSERT I

Re: cassandra source code

2015-03-27 Thread Divya Divs
Hi, I have run the Cassandra source in Eclipse Juno by following this document: http://brianoneill.blogspot.in/2015/03/getting-started-with-cassandra.html, but I'm getting exceptions. Please help me solve this. INFO 17:43:40 Node localhost/127.0.0.1 state jump to normal INFO 17:43:41 Netty u

Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Just to check, are you concerned about minimizing that latency or maximizing throughput? I'll assume that latency is what you're actually concerned about. A fair amount of that latency is probably happening in the python driver. Although it can easily execute ~8k operations per second (using cpython),

Re: Delayed events processing / queue (anti-)pattern

2015-03-27 Thread Thunder Stumpges
Yeah that's the one :) sorry, was on my phone and didn't want to look up the exact name. Cheers, Thunder On Mar 27, 2015 6:17 AM, "Brice Dutheil" wrote: > Would it help here to not actually issue a delete statement but instead > use date based compaction and a dynamically calculated ttl that is

Re: upgrade from 1.0.12 to 1.1.12

2015-03-27 Thread Jonathan Haddad
Running upgrade is a noop if the tables don't need to be upgraded. I consider the cost of this to be less than the cost of missing an upgrade. On Thu, Mar 26, 2015 at 4:23 PM Robert Coli wrote: > On Wed, Mar 25, 2015 at 7:16 PM, Jonathan Haddad > wrote: > >> There's no downside to running upgrad

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Jonathan Haddad
I'd be interested to see that data model. I think the entire list would benefit! On Thu, Mar 26, 2015 at 8:16 PM Robert Wille wrote: > I have a cluster which stores tree structures. I keep several hundred > unrelated trees. The largest has about 180 million nodes, and the smallest > has 1 node. T

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread List
On 3/26/15 10:15 PM, Robert Wille wrote: I have a cluster which stores tree structures. I keep several hundred unrelated trees. The largest has about 180 million nodes, and the smallest has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in practice is probably less than 10.

Re: Delayed events processing / queue (anti-)pattern

2015-03-27 Thread Brice Dutheil
Would it help here to not actually issue a delete statement but instead use date based compaction and a dynamically calculated ttl that is some safe distance in the future from your key? I’m not sure about this part, *date based compaction*; do you mean DateTieredCompactionStrategy? Anyway w

Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens So go with a default 256, and leave initial token empty: num_tokens: 256 # initial_token: Cassandra will always give each node the same number of t

('Unable to complete the operation against any hosts', {})

2015-03-27 Thread Rahul Bhardwaj
Hi All, We are using Cassandra version 2.1.2 with cqlsh 5.0.1 (a cluster of three nodes with RF 2). I need to load around 40 million records into a Cassandra table. I have created batches of 1 million (batch of 1 records also gives the same error) in CSV format. When I use copy command t

Re: Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
2015-03-27 11:58 GMT+01:00 Sibbald, Charles : > Cassandra’s Vnodes config Thank you. Yes, we are using vnodes! The num_tokens parameter controls the number of vnodes assigned to a specific node. Maybe I am seeing problems where there are none. Let me rephrase my question: How does Cassandra know

High latencies for simple queries

2015-03-27 Thread Artur Siekielski
I'm running Cassandra locally and I see that the execution time for the simplest queries is 1-2 milliseconds. By a simple query I mean either INSERT or SELECT from a small table with short keys. While this number is not high, it's about 10-20 times slower than Postgresql (even if INSERTs are w

Re: upgrade from 1.0.12 to 1.1.12

2015-03-27 Thread Jason Wee
Rob, the cluster is now upgraded to Cassandra 1.0.12 (default sstable version hd, in Descriptor.java), and I ensured that all sstables in the current cluster are version hd before upgrading to Cassandra 1.1. I have also checked that in Cassandra 1.1.12 the sstable version is hf. So I guess nodetool upgradesstables i

Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
I would recommend you utilise Cassandra’s vnodes config and let it manage this itself. This means it will create and manage them all on its own, which allows quick and easy scaling and bootstrapping. From: Björn Hachmann mailto:bjoern.hachm...@metrigo.de>> Reply-To: "user@cassandra.apache

Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
Hi, we currently plan to add a second data center to our Cassandra-Cluster. I have read about this procedure in the documentation (eg. https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html), but at least one question remains: Do I have to provide a

Re: Java Driver 2.1 reading counter values from row

2015-03-27 Thread Amila Paranawithana
Hi All, This is possible with cassandra-driver-core-2.1.5, with 'row.getLong("sum")'. Thanks On Fri, Mar 27, 2015 at 2:51 PM, Amila Paranawithana wrote: > in Apache Cassandra Java Driver 2.1 how to read counter type values from a > row when iterating over result set. > > eg: If I have a counte

Re: sstable loader

2015-03-27 Thread Amila Paranawithana
Hi, This post[1] may be useful. But note that this was done with an older Cassandra version, so there may be a newer way to do this. [1]. http://amilaparanawithana.blogspot.com/2012/06/bulk-loading-external-data-to-cassandra.html Thanks, On Fri, Mar 27, 2015 at 11:40 AM, Rahul Bhardwaj < rahul.bhard.

Java Driver 2.1 reading counter values from row

2015-03-27 Thread Amila Paranawithana
In Apache Cassandra Java Driver 2.1, how do I read counter type values from a row when iterating over a result set? E.g., if I have a counter table called 'countertable' with a key and a counter column 'sum', how can I read the value of the counter column using the Java driver? If I say, row.getInt("sum") this g

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Fabian Siddiqi
Hi Robert, We're trying to do something similar to the OP and finding it a bit difficult. Would it be possible to provide more details about how you're doing it? Thanks. On Fri, Mar 27, 2015 at 3:15 AM, Robert Wille wrote: > I have a cluster which stores tree structures. I keep several hundred