sstable loader

2015-03-27 Thread Rahul Bhardwaj
Hi All,

 Can we use sstableloader for loading an external flat file or CSV file?
If yes, kindly share the steps or a manual.

I need to load 40 million records into a table of around 70 columns.



Regards:
Rahul Bhardwaj



Re: Java Driver 2.1 reading counter values from row

2015-03-27 Thread Amila Paranawithana
Hi All,

This is possible with cassandra-driver-core-2.1.5, with
row.getLong("sum").

Thanks

On Fri, Mar 27, 2015 at 2:51 PM, Amila Paranawithana amila1...@gmail.com
wrote:

 In the Apache Cassandra Java Driver 2.1, how do I read counter-type
 values from a row when iterating over a result set?

 e.g.: If I have a counter table called 'countertable' with a key and a
 counter column 'sum', how can I read the value of the counter column
 using the Java driver?
 If I say row.getInt("sum"), this gives the following error:

 com.datastax.driver.core.exceptions.InvalidTypeException: Value sum is of
 type counter

 Code:

 ResultSet results = session.execute("SELECT * FROM simplex.countertable");

 for (Row row : results) {
     System.out.println(row.getString("key") + ", " + row.getInt("sum"));
 }

 Thanks,
 Amila








Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
I would recommend you utilise Cassandra's vnodes config and let it manage this 
itself.

This means it will create these and manage them all on its own, and allows 
quick and easy scaling and bootstrapping.

From: Björn Hachmann bjoern.hachm...@metrigo.de
Reply-To: user@cassandra.apache.org
Date: Friday, 27 March 2015 10:40
To: user user@cassandra.apache.org
Subject: Replication to second data center with different number of nodes

Hi,

we currently plan to add a second data center to our Cassandra cluster. I have 
read about this procedure in the documentation (e.g. 
https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html),
 but at least one question remains:

Do I have to provide appropriate values for num_tokens dependent on the number 
of nodes per data center, or is this handled somehow by the 
NetworkTopologyStrategy?

Example: We currently have 12 nodes each covering 256 tokens. Our second 
datacenter will have three nodes only. Do I have to set num_tokens to 1024 
(12*256/3) for the nodes in that DC?

Thank you very much for your valuable input!

Kind regards
Björn Hachmann


Re: upgrade from 1.0.12 to 1.1.12

2015-03-27 Thread Jason Wee
Rob, the cluster is now upgraded to Cassandra 1.0.12 (default hd version,
per Descriptor.java), and I ensured all sstables in the current cluster are
the hd version before upgrading to Cassandra 1.1. I have also checked that
in Cassandra 1.1.12 the sstable version is hf. So I guess nodetool
upgradesstables is needed?

Why not scrub? When you run nodetool upgradesstables, is it actually
scrubbing the data? Can you explain?

Jason

On Fri, Mar 27, 2015 at 7:21 AM, Robert Coli rc...@eventbrite.com wrote:
 On Wed, Mar 25, 2015 at 7:16 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 There's no downside to running upgradesstables. I recommend always doing
 it on upgrade just to be safe.


 For the record and just my opinion : I recommend against paying this fixed
 cost when you don't need to.

 It is basically trivial to ascertain whether there is a new version of the
 SSTable format in your new version, without even relying on the canonical
 NEWS.txt. Type "nodetool flush" and look at the filename of the table that
 was just flushed. If the version component is different from all the other
 SSTables, you definitely need to run upgradesstables. If it isn't, you
 definitely don't.

 If you're going to run something which unnecessarily rewrites all SSTables,
 why not scrub? That'll check the files for corruption while also upgrading
 them as they are written out 1:1...

 =Rob



Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
Hi,

we currently plan to add a second data center to our Cassandra cluster. I
have read about this procedure in the documentation (e.g.
https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html),
but at least one question remains:

Do I have to provide appropriate values for num_tokens dependent on the
number of nodes per data center, or is this handled somehow by the
NetworkTopologyStrategy?

Example: We currently have 12 nodes each covering 256 tokens. Our second
datacenter will have three nodes only. Do I have to set num_tokens to 1024
(12*256/3) for the nodes in that DC?

Thank you very much for your valuable input!

Kind regards
Björn Hachmann


High latencies for simple queries

2015-03-27 Thread Artur Siekielski
I'm running Cassandra locally and I see that the execution time for the 
simplest queries is 1-2 milliseconds. By a simple query I mean either 
INSERT or SELECT from a small table with short keys.


While this number is not high, it's about 10-20 times slower than 
Postgresql (even if INSERTs are wrapped in transactions). I know that 
the nature of Cassandra compared to Postgresql is different, but for 
some scenarios this difference can matter.


The question is: is it normal for Cassandra to have a minimum latency of 
1 millisecond?


I'm using Cassandra 2.1.2, python-driver.




Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Fabian Siddiqi
Hi Robert,

We're trying to do something similar to the OP and finding it a bit
difficult. Would it be possible to provide more details about how you're
doing it?

Thanks.

On Fri, Mar 27, 2015 at 3:15 AM, Robert Wille rwi...@fold3.com wrote:

 I have a cluster which stores tree structures. I keep several hundred
 unrelated trees. The largest has about 180 million nodes, and the smallest
 has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in
 practice is probably less than 10. I am able to page through children and
 siblings. It works really well.

 Doesn’t sound like it’s exactly like what you’re looking for, but if you
 want any pointers on how I went about implementing mine, I’d be happy to
 share.

 On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:

  Not sure if this is the right place to ask, but we are trying to model a
 user-generated tree hierarchy in which they create child objects of a root
 node, and can create an arbitrary number of children (and children of
 children, and on and on).  So far we have looked at storing each tree
 structure as a single document in JSON format and reading/writing it out in
 its entirety, doing materialized paths where we store the root id with
 every child and the tree structure above the child as a map, and some form
 of an adjacency list (which does not appear to be very viable as looking up
 the entire tree would be ridiculous).
 
  The hope is to end up with a data model that allows us to display the
 entire tree quickly, as well as see the entire path to a leaf when
 selecting that leaf.  If anyone has some suggestions/experience on how to
 model such a tree hierarchy we would greatly appreciate your input.
 




-- 
Fabian Siddiqi
Software Engineer
T: (+44) 776 335 1398


Java Driver 2.1 reading counter values from row

2015-03-27 Thread Amila Paranawithana
In the Apache Cassandra Java Driver 2.1, how do I read counter-type values
from a row when iterating over a result set?

e.g.: If I have a counter table called 'countertable' with a key and a
counter column 'sum', how can I read the value of the counter column using
the Java driver?
If I say row.getInt("sum"), this gives the following error:

com.datastax.driver.core.exceptions.InvalidTypeException: Value sum is of
type counter

Code:

ResultSet results = session.execute("SELECT * FROM simplex.countertable");

for (Row row : results) {
    System.out.println(row.getString("key") + ", " + row.getInt("sum"));
}

Thanks,
Amila




Re: sstable loader

2015-03-27 Thread Amila Paranawithana
Hi,

This post [1] may be useful, but note that it was done with an older
version of Cassandra, so there may be a newer way to do this.

[1].
http://amilaparanawithana.blogspot.com/2012/06/bulk-loading-external-data-to-cassandra.html
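For reference, a CSV can also be loaded straight through the DataStax
python-driver; below is a minimal sketch, assuming a tab-delimited file and
placeholder keyspace/table/column names (text columns, purely for
illustration):

import csv

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')        # hypothetical keyspace

# A prepared statement is parsed once server-side and reused per row.
insert = session.prepare(
    "INSERT INTO mytable (id, name, city) VALUES (?, ?, ?)")

futures = []
with open('data.csv') as f:
    for row in csv.reader(f, delimiter='\t'):
        futures.append(session.execute_async(insert, row))
        if len(futures) >= 256:                # crude back-pressure window
            for future in futures:
                future.result()                # raises on failure
            futures = []

for future in futures:
    future.result()
cluster.shutdown()

This keeps a bounded number of requests in flight; a true sstableloader-based
approach writes SSTables offline and bypasses the normal write path, which is
usually faster for a one-off bulk load.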

Thanks,


On Fri, Mar 27, 2015 at 11:40 AM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Hi All,

  Can we use sstableloader for loading an external flat file or CSV file?
 If yes, kindly share the steps or a manual.

 I need to load 40 million records into a table of around 70 columns.



 Regards:
 Rahul Bhardwaj











Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

So go with a default 256, and leave initial token empty:


num_tokens: 256

# initial_token:

Cassandra will always give each node the same number of tokens. The only time 
you might want to vary this is if your instances are of different 
sizing/capability, which is also a bad scenario.

From: Björn Hachmann bjoern.hachm...@metrigo.de
Reply-To: user@cassandra.apache.org
Date: Friday, 27 March 2015 12:11
To: user user@cassandra.apache.org
Subject: Re: Replication to second data center with different number of nodes


2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:
Cassandra’s Vnodes config

Thank you. Yes, we are using vnodes! The num_tokens parameter controls the 
number of vnodes assigned to a specific node.

It might be that I am seeing problems where there are none.

Let me rephrase my question: How does Cassandra know it has to replicate 1/3 of 
all keys to each single node in the second DC? I can see two ways:
 1. It has to be configured explicitly.
 2. It is derived from the number of nodes available in the data center at the 
time `nodetool rebuild` is started.

Kind regards
Björn


('Unable to complete the operation against any hosts', {})

2015-03-27 Thread Rahul Bhardwaj
Hi All,



We are using Cassandra version 2.1.2 with cqlsh 5.0.1 (a cluster of three
nodes with RF 2).

I need to load around 40 million records into a Cassandra table. I have
created batches of 1 million records (a batch of 1 records also gives the
same error) in CSV format. When I use the COPY command to import, I get the
following error:

cqlsh:mesh_glusr> copy
glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area)
from 'gl_a' with delimiter = '\t' and QUOTE = '';

Processed 36000 rows; Write: 1769.07 rows/s
Record has the wrong number of fields (9 instead of 10).
Aborting import at record #36769. Previously-inserted values still present.
36669 rows imported in 20.571 seconds.
cqlsh:mesh_glusr> copy
glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area)
from 'gl_a' with delimiter = '\t' and QUOTE = '';
Processed 185000 rows; Write: 1800.91 rows/s
Record has the wrong number of fields (9 instead of 10).
Aborting import at record #185607. Previously-inserted values still present.
185507 rows imported in 1 minute and 43.428 seconds.

[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use mesh_glusr ;
cqlsh:mesh_glusr> copy
glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area)
from 'gl_a1' with delimiter = '\t' and QUOTE = '';
Processed 373000 rows; Write: 1741.23 rows/s
('Unable to complete the operation against any hosts', {})
Aborting import at record #373269. Previously-inserted values still present.


When we remove the already-inserted records from the file and start the
command again for the remaining data, it inserts a few more records and
then gives the same error, without any specifics.

Please help if anyone has some idea about this error.



Regards:
Rahul Bhardwaj



Re: Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


Thank you. Yes, we are using vnodes! The num_tokens parameter controls the
number of vnodes assigned to a specific node.

It might be that I am seeing problems where there are none.

Let me rephrase my question: How does Cassandra know it has to replicate
1/3 of all keys to each single node in the second DC? I can see two ways:
 1. It has to be configured explicitly.
 2. It is derived from the number of nodes available in the data center at
the time `nodetool rebuild` is started.

Kind regards
Björn


Re: Delayed events processing / queue (anti-)pattern

2015-03-27 Thread Brice Dutheil
Would it help here to not actually issue a delete statement but instead use
date based compaction and a dynamically calculated ttl that is some safe
distance in the future from your key?

I’m not sure about this part, *date based compaction*: do you mean
DateTieredCompactionStrategy?

Anyway, we achieved something like that without this strategy, with a TTL +
date-in-partition-key based approach. The thing to watch, however, is the
size of the partition (one should avoid overly long partitions, i.e. wide
rows in Thrift terms), so care must be taken that the date increment is
correctly adjusted.
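A rough sketch of what that can look like (the schema and names are
illustrative, not our actual model): each event lands in a date-bucketed
partition and carries a TTL somewhat past its scheduled time, so expired rows
vanish without explicit deletes.

import uuid
from datetime import datetime, timedelta

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('events_ks')   # hypothetical keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS delayed_events (
        bucket text,        -- e.g. one partition per hour bounds row count
        fire_at timestamp,
        event_id uuid,
        payload text,
        PRIMARY KEY (bucket, fire_at, event_id)
    )""")

fire_at = datetime.utcnow() + timedelta(minutes=30)
bucket = fire_at.strftime('%Y-%m-%d-%H')                # hourly bucket
# TTL = time until the event fires, plus a safety margin for late consumers
ttl = int((fire_at - datetime.utcnow()).total_seconds()) + 3600

session.execute(
    "INSERT INTO delayed_events (bucket, fire_at, event_id, payload) "
    "VALUES (%s, %s, %s, %s) USING TTL " + str(ttl),
    (bucket, fire_at, uuid.uuid4(), 'do-something'))

Consumers scan the current bucket ordered by fire_at; the bucket width is the
knob that keeps partitions short.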

-- Brice

On Thu, Mar 26, 2015 at 5:23 PM, Robin Verlangen ro...@us2.nl wrote:

 Interesting thought, that should work indeed, I'll evaluate both options
 and provide an update here once I have results.

 Best regards,

 Robin Verlangen
 *Chief Data Architect*

 W http://www.robinverlangen.nl
 E ro...@us2.nl


 On Thu, Mar 26, 2015 at 7:09 AM, Thunder Stumpges 
 thunder.stump...@gmail.com wrote:

 Would it help here to not actually issue a delete statement but instead
 use date based compaction and a dynamically calculated ttl that is some
 safe distance in the future from your key?

 Just a thought.
 -Thunder
  On Mar 25, 2015 11:07 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Mar 25, 2015 at 12:45 AM, Robin Verlangen ro...@us2.nl wrote:

 @Robert: can you elaborate a bit more on the not ideal parts? In my
 case I will be throwing away the rows (thus the points in time that are
 now in the past), which will create tombstones which are compacted away.


 "Not ideal" is what I mean... Cassandra has immutable data files, use
 cases which do DELETE pay an obvious penalty. Some percentage of tombstones
 will exist continuously, and you have to store them and seek past them.

 =Rob






Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread List

On 3/26/15 10:15 PM, Robert Wille wrote:

I have a cluster which stores tree structures. I keep several hundred unrelated 
trees. The largest has about 180 million nodes, and the smallest has 1 node. 
The largest fanout is almost 400K. Depth is arbitrary, but in practice is 
probably less than 10. I am able to page through children and siblings. It 
works really well.

Doesn’t sound like it’s exactly like what you’re looking for, but if you want 
any pointers on how I went about implementing mine, I’d be happy to share.

On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:


Not sure if this is the right place to ask, but we are trying to model a 
user-generated tree hierarchy in which they create child objects of a root 
node, and can create an arbitrary number of children (and children of children, 
and on and on).  So far we have looked at storing each tree structure as a 
single document in JSON format and reading/writing it out in its entirety, 
doing materialized paths where we store the root id with every child and the 
tree structure above the child as a map, and some form of an adjacency list 
(which does not appear to be very viable as looking up the entire tree would be 
ridiculous).

The hope is to end up with a data model that allows us to display the entire 
tree quickly, as well as see the entire path to a leaf when selecting that 
leaf.  If anyone has some suggestions/experience on how to model such a tree 
hierarchy we would greatly appreciate your input.





Robert,

This certainly sounds like a step in the right direction, so yes, please 
do share!  Thank you.




Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Jonathan Haddad
I'd be interested to see that data model. I think the entire list would
benefit!
On Thu, Mar 26, 2015 at 8:16 PM Robert Wille rwi...@fold3.com wrote:

 I have a cluster which stores tree structures. I keep several hundred
 unrelated trees. The largest has about 180 million nodes, and the smallest
 has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in
 practice is probably less than 10. I am able to page through children and
 siblings. It works really well.

 Doesn’t sound like it’s exactly like what you’re looking for, but if you
 want any pointers on how I went about implementing mine, I’d be happy to
 share.

 On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:

  Not sure if this is the right place to ask, but we are trying to model a
 user-generated tree hierarchy in which they create child objects of a root
 node, and can create an arbitrary number of children (and children of
 children, and on and on).  So far we have looked at storing each tree
 structure as a single document in JSON format and reading/writing it out in
 its entirety, doing materialized paths where we store the root id with
 every child and the tree structure above the child as a map, and some form
 of an adjacency list (which does not appear to be very viable as looking up
 the entire tree would be ridiculous).
 
  The hope is to end up with a data model that allows us to display the
 entire tree quickly, as well as see the entire path to a leaf when
 selecting that leaf.  If anyone has some suggestions/experience on how to
 model such a tree hierarchy we would greatly appreciate your input.
 




Re: upgrade from 1.0.12 to 1.1.12

2015-03-27 Thread Jonathan Haddad
Running upgradesstables is a no-op if the tables don't need to be upgraded. I
consider the cost of this to be less than the cost of missing an upgrade.
On Thu, Mar 26, 2015 at 4:23 PM Robert Coli rc...@eventbrite.com wrote:

 On Wed, Mar 25, 2015 at 7:16 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 There's no downside to running upgradesstables. I recommend always doing
 it on upgrade just to be safe.


 For the record and just my opinion : I recommend against paying this fixed
 cost when you don't need to.

 It is basically trivial to ascertain whether there is a new version of the
 SSTable format in your new version, without even relying on the canonical
 NEWS.txt. Type "nodetool flush" and look at the filename of the table that
 was just flushed. If the version component is different from all the other
 SSTables, you definitely need to run upgradesstables. If it isn't, you
 definitely don't.

 If you're going to run something which unnecessarily rewrites all
 SSTables, why not scrub? That'll check the files for corruption while also
 upgrading them as they are written out 1:1...

 =Rob




Re: Delayed events processing / queue (anti-)pattern

2015-03-27 Thread Thunder Stumpges
Yeah that's the one :) sorry, was on my phone and didn't want to look up
the exact name.

Cheers,
Thunder
 On Mar 27, 2015 6:17 AM, Brice Dutheil brice.duth...@gmail.com wrote:

 Would it help here to not actually issue a delete statement but instead
 use date based compaction and a dynamically calculated ttl that is some
 safe distance in the future from your key?

 I’m not sure about this part, *date based compaction*: do you mean
 DateTieredCompactionStrategy?

 Anyway, we achieved something like that without this strategy, with a TTL +
 date-in-partition-key based approach. The thing to watch, however, is the
 size of the partition (one should avoid overly long partitions, i.e. wide
 rows in Thrift terms), so care must be taken that the date increment is
 correctly adjusted.

 -- Brice

 On Thu, Mar 26, 2015 at 5:23 PM, Robin Verlangen ro...@us2.nl wrote:

 Interesting thought, that should work indeed, I'll evaluate both options
 and provide an update here once I have results.

 Best regards,

 Robin Verlangen
 *Chief Data Architect*

 W http://www.robinverlangen.nl
 E ro...@us2.nl


 On Thu, Mar 26, 2015 at 7:09 AM, Thunder Stumpges 
 thunder.stump...@gmail.com wrote:

 Would it help here to not actually issue a delete statement but instead
 use date based compaction and a dynamically calculated ttl that is some
 safe distance in the future from your key?

 Just a thought.
 -Thunder
  On Mar 25, 2015 11:07 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Mar 25, 2015 at 12:45 AM, Robin Verlangen ro...@us2.nl wrote:

 @Robert: can you elaborate a bit more on the not ideal parts? In my
 case I will be throwing away the rows (thus the points in time that are
 now in the past), which will create tombstones which are compacted away.


 "Not ideal" is what I mean... Cassandra has immutable data files, use
 cases which do DELETE pay an obvious penalty. Some percentage of tombstones
 will exist continuously, and you have to store them and seek past them.

 =Rob







Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Just to check, are you concerned about minimizing that latency or
maximizing throughput?

I'll assume that latency is what you're actually concerned about.  A fair amount
of that latency is probably happening in the python driver.  Although it
can easily execute ~8k operations per second (using cpython), in some
scenarios it can be difficult to guarantee sub-ms latency for an individual
query due to how some of the internals work.  In particular, it uses
python's Conditions for cross-thread signalling (from the event loop thread
to the application thread).  Unfortunately, python's Condition
implementation includes a loop with a minimum sleep of 1ms if the Condition
isn't already set when you start the wait() call.  This is why, with a
single application thread, you will typically see a minimum of 1ms latency.

Another source of similar latencies for the python driver is the Asyncore
event loop, which is used when libev isn't available.  I would make sure
that you can use the LibevConnection class with the driver to avoid this.
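For reference, a minimal sketch of selecting the libev event loop at cluster
setup (assuming libev and the driver's libev extension are installed):

from cassandra.cluster import Cluster
from cassandra.io.libevreactor import LibevConnection

cluster = Cluster(['127.0.0.1'])
cluster.connection_class = LibevConnection   # replaces the asyncore default
session = cluster.connect()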

On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote:

 I'm running Cassandra locally and I see that the execution time for the
 simplest queries is 1-2 milliseconds. By a simple query I mean either
 INSERT or SELECT from a small table with short keys.

 While this number is not high, it's about 10-20 times slower than
 Postgresql (even if INSERTs are wrapped in transactions). I know that the
 nature of Cassandra compared to Postgresql is different, but for some
 scenarios this difference can matter.

 The question is: is it normal for Cassandra to have a minimum latency of 1
 millisecond?

 I'm using Cassandra 2.1.2, python-driver.





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-27 Thread Artur Siekielski
Yes, I'm concerned about the latency. Throughput can be high even when 
using Python: http://datastax.github.io/python-driver/performance.html. 
But in my scenarios I need to run queries sequentially, so latencies 
matter. And Cassandra requires issuing more queries than SQL databases, 
so these latencies can add up to a significant amount.


I was running the Asyncore event loop, because it looks like libev isn't 
supported for PyPy, which I'm using. I switched to CPython and 
LibevConnection for a moment and I don't think I noticed a major 
speedup; the minimum latency is still 1ms.


Overall, it looks to me like the issue is not that important, because 
using multi-master, multi-DC databases always involves higher and 
somewhat unpredictable latencies, so relying on sub-millisecond 
latencies on production clusters is not very realistic.



On 03/27/2015 04:28 PM, Tyler Hobbs wrote:

Just to check, are you concerned about minimizing that latency or
maximizing throughput?

I'll assume that latency is what you're actually concerned about.  A fair
amount of that latency is probably happening in the python driver.
Although it can easily execute ~8k operations per second (using
cpython), in some scenarios it can be difficult to guarantee sub-ms
latency for an individual query due to how some of the internals work.
In particular, it uses python's Conditions for cross-thread signalling
(from the event loop thread to the application thread).  Unfortunately,
python's Condition implementation includes a loop with a minimum sleep
of 1ms if the Condition isn't already set when you start the wait()
call.  This is why, with a single application thread, you will typically
see a minimum of 1ms latency.

Another source of similar latencies for the python driver is the
Asyncore event loop, which is used when libev isn't available.  I would
make sure that you can use the LibevConnection class with the driver to
avoid this.

On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote:

I'm running Cassandra locally and I see that the execution time for
the simplest queries is 1-2 milliseconds. By a simple query I mean
either INSERT or SELECT from a small table with short keys.

While this number is not high, it's about 10-20 times slower than
Postgresql (even if INSERTs are wrapped in transactions). I know
that the nature of Cassandra compared to Postgresql is different,
but for some scenarios this difference can matter.

The question is: is it normal for Cassandra to have a minimum
latency of 1 millisecond?

I'm using Cassandra 2.1.2, python-driver.




Re: High latencies for simple queries

2015-03-27 Thread Artur Siekielski
I think that in your example Postgres spends most of its time waiting for 
fsync() to complete. On Linux, with a battery-backed RAID controller, 
it's safe to mount an ext4 filesystem with the "barrier=0" option, which 
improves fsync() performance a lot. I have partitions mounted with this 
option and I did a test from Python, using the psycopg2 driver, and I got 
the following latencies, in milliseconds:

- INSERT without COMMIT: 0.04
- INSERT with COMMIT: 0.12
- SELECT: 0.05

I'm also repeating benchmark runs multiple times (I'm using Python's 
timeit module).


On 03/27/2015 07:58 PM, Ben Bromhead wrote:

Latency can be so variable even when testing things locally. I quickly
fired up postgres and did the following with psql:

ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
CREATE TABLE
ben=# \timing
Timing is on.
ben=# INSERT INTO foo VALUES(2, 'yay');
INSERT 0 1
Time: 1.162 ms
ben=# INSERT INTO foo VALUES(3, 'yay');
INSERT 0 1
Time: 1.108 ms

I then fired up a local copy of Cassandra (2.0.12)

cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' :
'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
cqlsh:foo> TRACING ON;
Now tracing requests.
cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');





Re: High latencies for simple queries

2015-03-27 Thread Tyler Hobbs
Since you're executing queries sequentially, you may want to look into
using callback chaining to avoid the cross-thread signaling that results in
the 1ms latencies.  Basically, just use session.execute_async() and attach
a callback to the returned future that will execute your next query.  The
callback is executed on the event loop thread.  The main downsides to this
are that you need to be careful to avoid blocking the event loop thread
(including executing session.execute() or prepare()) and you need to ensure
that all exceptions raised in the callback are handled by your application
code.
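A minimal sketch of that pattern (the queries are placeholders; error
handling is deliberately coarse):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# Each query is issued from the previous one's callback, which runs on the
# event loop thread, so no cross-thread Condition wait is involved.
queries = ["SELECT release_version FROM system.local",
           "SELECT cluster_name FROM system.local"]

def on_error(exc):
    # Exceptions raised inside callbacks never propagate; handle them here.
    print("query failed:", exc)

def run_next(_rows=None):
    if not queries:
        return                                  # chain finished
    future = session.execute_async(queries.pop(0))
    future.add_callbacks(callback=run_next, errback=on_error)

run_next()
# A real program would block here (e.g. on a threading.Event set by the
# final callback) instead of exiting immediately.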

On Fri, Mar 27, 2015 at 3:11 PM, Artur Siekielski a...@vhex.net wrote:

 I think that in your example Postgres spends most time on waiting for
 fsync() to complete. On Linux, for a battery-backed raid controller, it's
 safe to mount ext4 filesystem with barrier=0 option which improves
 fsync() performance a lot. I have partitions mounted with this option and I
 did a test from Python, using psycopg2 driver, and I got the following
 latencies, in milliseconds:
 - INSERT without COMMIT: 0.04
 - INSERT with COMMIT: 0.12
 - SELECT: 0.05
 I'm also repeating benchmark runs multiple times (I'm using Python's
 timeit module).


 On 03/27/2015 07:58 PM, Ben Bromhead wrote:

 Latency can be so variable even when testing things locally. I quickly
 fired up postgres and did the following with psql:

 ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
 CREATE TABLE
 ben=# \timing
 Timing is on.
 ben=# INSERT INTO foo VALUES(2, 'yay');
 INSERT 0 1
 Time: 1.162 ms
 ben=# INSERT INTO foo VALUES(3, 'yay');
 INSERT 0 1
 Time: 1.108 ms

 I then fired up a local copy of Cassandra (2.0.12)

 cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' :
 'SimpleStrategy', 'replication_factor' : 1 };
 cqlsh> USE foo;
 cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
 cqlsh:foo> TRACING ON;
 Now tracing requests.
 cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: High latencies for simple queries

2015-03-27 Thread Ben Bromhead
Latency can be so variable even when testing things locally. I quickly
fired up postgres and did the following with psql:

ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
CREATE TABLE
ben=# \timing
Timing is on.
ben=# INSERT INTO foo VALUES(2, 'yay');
INSERT 0 1
Time: 1.162 ms
ben=# INSERT INTO foo VALUES(3, 'yay');
INSERT 0 1
Time: 1.108 ms

I then fired up a local copy of Cassandra (2.0.12)

cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' : 'SimpleStrategy',
'replication_factor' : 1 };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
cqlsh:foo> TRACING ON;
Now tracing requests.
cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');

Tracing session: 7a7dced0-d4b2-11e4-b950-85c3c9bd91a0

 activity                                          | timestamp    | source    | source_elapsed
---------------------------------------------------+--------------+-----------+----------------
                                execute_cql3_query | 11:52:55,229 | 127.0.0.1 |              0
 Parsing INSERT INTO foo (i, j) VALUES (1, 'yay'); | 11:52:55,229 | 127.0.0.1 |             43
                               Preparing statement | 11:52:55,229 | 127.0.0.1 |            141
                 Determining replicas for mutation | 11:52:55,229 | 127.0.0.1 |            291
                    Acquiring switchLock read lock | 11:52:55,229 | 127.0.0.1 |            403
                            Appending to commitlog | 11:52:55,229 | 127.0.0.1 |            413
                            Adding to foo memtable | 11:52:55,229 | 127.0.0.1 |            432
                                  Request complete | 11:52:55,229 | 127.0.0.1 |            541

All this on a MacBook Pro with 16 GB of memory and an SSD.

So ymmv?

On 27 March 2015 at 08:28, Tyler Hobbs ty...@datastax.com wrote:

 Just to check, are you concerned about minimizing that latency or
 maximizing throughput?

 I'll assume that latency is what you're actually concerned about.  A fair amount
 of that latency is probably happening in the python driver.  Although it
 can easily execute ~8k operations per second (using cpython), in some
 scenarios it can be difficult to guarantee sub-ms latency for an individual
 query due to how some of the internals work.  In particular, it uses
 python's Conditions for cross-thread signalling (from the event loop thread
 to the application thread).  Unfortunately, python's Condition
 implementation includes a loop with a minimum sleep of 1ms if the Condition
 isn't already set when you start the wait() call.  This is why, with a
 single application thread, you will typically see a minimum of 1ms latency.

 Another source of similar latencies for the python driver is the Asyncore
 event loop, which is used when libev isn't available.  I would make sure
 that you can use the LibevConnection class with the driver to avoid this.

 On Fri, Mar 27, 2015 at 6:24 AM, Artur Siekielski a...@vhex.net wrote:

 I'm running Cassandra locally and I see that the execution time for the
 simplest queries is 1-2 milliseconds. By a simple query I mean either
 INSERT or SELECT from a small table with short keys.

 While this number is not high, it's about 10-20 times slower than
 Postgresql (even if INSERTs are wrapped in transactions). I know that the
 nature of Cassandra compared to Postgresql is different, but for some
 scenarios this difference can matter.

 The question is: is it normal for Cassandra to have a minimum latency of
 1 millisecond?

 I'm using Cassandra 2.1.2, python-driver.





 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: cassandra source code

2015-03-27 Thread Divya Divs
Hi,
I have run the source code of Cassandra in Eclipse Juno by following this
document:
http://brianoneill.blogspot.in/2015/03/getting-started-with-cassandra.html.
But I'm getting the exceptions below. Please help me solve this.

INFO  17:43:40 Node localhost/127.0.0.1 state jump to normal
INFO  17:43:41 Netty using Java NIO event loop
INFO  17:43:41 Using Netty Version:
[netty-buffer=netty-buffer-4.0.23.Final.208198c,
netty-codec=netty-codec-4.0.23.Final.208198c,
netty-codec-http=netty-codec-http-4.0.23.Final.208198c,
netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c,
netty-common=netty-common-4.0.23.Final.208198c,
netty-handler=netty-handler-4.0.23.Final.208198c,
netty-transport=netty-transport-4.0.23.Final.208198c,
netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c,
netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c,
netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
INFO  17:43:41 Starting listening for CQL clients on
localhost/127.0.0.1:9042...
Exception (java.lang.IllegalStateException) encountered during startup:
Failed to bind port 9042 on 127.0.0.1.
java.lang.IllegalStateException: Failed to bind port 9042 on 127.0.0.1.
ERROR 17:43:41 Exception encountered during startup
java.lang.IllegalStateException: Failed to bind port 9042 on 127.0.0.1.
at org.apache.cassandra.transport.Server.run(Server.java:179) ~[main/:na]
at org.apache.cassandra.transport.Server.start(Server.java:119) ~[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:428)
[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:505)
[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:599)
[main/:na]
ERROR 17:43:41 Exception encountered during startup
java.lang.IllegalStateException: Failed to bind port 9042 on 127.0.0.1.
at org.apache.cassandra.transport.Server.run(Server.java:179) ~[main/:na]
at org.apache.cassandra.transport.Server.start(Server.java:119) ~[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:428)
[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:505)
[main/:na]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:599)
[main/:na]
at org.apache.cassandra.transport.Server.run(Server.java:179)
at org.apache.cassandra.transport.Server.start(Server.java:119)
at
org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:428)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:505)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:599)
INFO  17:43:41 Announcing shutdown
INFO  17:43:43 Waiting for messaging service to quiesce
INFO  17:43:43 MessagingService has terminated the accept() thread


 --
 From: Divya Divs divya.divi2...@gmail.com
 Sent: Tuesday, March 24, 2015 10:59 AM
 To: user@cassandra.apache.org; Jason Wee; Eric Stevens
 Subject: cassandra source code

 Hi,
 I'm Divya, and I'm trying to run the source code of Cassandra in Eclipse.
 I'm taking the source code from GitHub. I'm using Windows 64-bit and I'm
 following the instructions from this website:
 http://runningcassandraineclipse.blogspot.in/. In the GitHub
 cassandra-trunk, the conf/log4j-server.properties file and the
 org.apache.cassandra.thrift.CassandraDaemon main class are not there.
 Please give me a document for running the source code of Cassandra, and
 kindly help me to proceed. Please reply as soon as possible.
Thanking you









Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
I use callback chaining with the python driver and can confirm that it is
very fast.

You can chain the chains together to perform sequential processing. I do
this when retrieving metadata and then the referenced payload, for
example, when the metadata has been inverted and the payload is larger than
we want to invert. And you can be running multiple chains of chains
asynchronously; cascade state by employing the userdata of the future.

We also multiprocess, for more parallelism, and we distribute work to
multiple multiprocessing instances using a message broker for yet more
parallel activity, as well as reliability.

ml

On Fri, Mar 27, 2015 at 4:28 PM, Tyler Hobbs ty...@datastax.com wrote:

 Since you're executing queries sequentially, you may want to look into
 using callback chaining to avoid the cross-thread signaling that results in
 the 1ms latencies.  Basically, just use session.execute_async() and attach
 a callback to the returned future that will execute your next query.  The
 callback is executed on the event loop thread.  The main downsides to this
 are that you need to be careful to avoid blocking the event loop thread
 (including executing session.execute() or prepare()) and you need to ensure
 that all exceptions raised in the callback are handled by your application
 code.

 On Fri, Mar 27, 2015 at 3:11 PM, Artur Siekielski a...@vhex.net wrote:

 I think that in your example Postgres spends most time on waiting for
 fsync() to complete. On Linux, for a battery-backed raid controller, it's
 safe to mount ext4 filesystem with barrier=0 option which improves
 fsync() performance a lot. I have partitions mounted with this option and I
 did a test from Python, using psycopg2 driver, and I got the following
 latencies, in milliseconds:
 - INSERT without COMMIT: 0.04
 - INSERT with COMMIT: 0.12
 - SELECT: 0.05
 I'm also repeating benchmark runs multiple times (I'm using Python's
 timeit module).


 On 03/27/2015 07:58 PM, Ben Bromhead wrote:

 Latency can be so variable even when testing things locally. I quickly
 fired up postgres and did the following with psql:

 ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
 CREATE TABLE
 ben=# \timing
 Timing is on.
 ben=# INSERT INTO foo VALUES(2, 'yay');
 INSERT 0 1
 Time: 1.162 ms
 ben=# INSERT INTO foo VALUES(3, 'yay');
 INSERT 0 1
 Time: 1.108 ms

 I then fired up a local copy of Cassandra (2.0.12)

 cqlsh CREATE KEYSPACE foo WITH replication = { 'class' :
 'SimpleStrategy', 'replication_factor' : 1 };
 cqlsh USE foo;
 cqlsh:foo CREATE TABLE foo(i int PRIMARY KEY, j text);
 cqlsh:foo TRACING ON;
 Now tracing requests.
 cqlsh:foo INSERT INTO foo (i, j) VALUES (1, 'yay');





 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Ben Bromhead
+1 would love to see how you do it

On 27 March 2015 at 07:18, Jonathan Haddad j...@jonhaddad.com wrote:

 I'd be interested to see that data model. I think the entire list would
 benefit!

 On Thu, Mar 26, 2015 at 8:16 PM Robert Wille rwi...@fold3.com wrote:

 I have a cluster which stores tree structures. I keep several hundred
 unrelated trees. The largest has about 180 million nodes, and the smallest
 has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in
 practice is probably less than 10. I am able to page through children and
 siblings. It works really well.

 Doesn’t sound like it’s exactly like what you’re looking for, but if you
 want any pointers on how I went about implementing mine, I’d be happy to
 share.

 On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:

  Not sure if this is the right place to ask, but we are trying to model
 a user-generated tree hierarchy in which they create child objects of a
 root node, and can create an arbitrary number of children (and children of
 children, and on and on).  So far we have looked at storing each tree
 structure as a single document in JSON format and reading/writing it out in
 its entirety, doing materialized paths where we store the root id with
 every child and the tree structure above the child as a map, and some form
 of an adjacency list (which does not appear to be very viable as looking up
 the entire tree would be ridiculous).
 
  The hope is to end up with a data model that allows us to display the
 entire tree quickly, as well as see the entire path to a leaf when
 selecting that leaf.  If anyone has some suggestions/experience on how to
 model such a tree hierarchy we would greatly appreciate your input.
 




-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Jack Krupansky
Hmmm... If you serialize the tree properly in a partition, you could always
read an entire sub-tree as a single slice (consecutive CQL rows). Is there
much more to it?
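Roughly what that could look like for trees that fit comfortably in one
partition (a sketch; table and column names are invented):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('tree_ks')     # hypothetical keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS tree_nodes (
        tree_id int,
        path text,      -- e.g. '001.004.002' encodes root -> child -> leaf
        name text,
        PRIMARY KEY (tree_id, path)
    )""")

# All descendants of node '001.004' sort contiguously by path, so a range
# query over the path prefix returns the whole sub-tree as one slice.
prefix = '001.004.'
rows = session.execute(
    "SELECT path, name FROM tree_nodes WHERE tree_id = %s "
    "AND path >= %s AND path < %s",
    (1, prefix, prefix + '\uffff'))

The caveat is partition size: a 180-million-node tree in one partition would
be far too large, which is where schemes like the one described in the next
message come in.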

-- Jack Krupansky

On Fri, Mar 27, 2015 at 7:35 PM, Ben Bromhead b...@instaclustr.com wrote:

 +1 would love to see how you do it

 On 27 March 2015 at 07:18, Jonathan Haddad j...@jonhaddad.com wrote:

 I'd be interested to see that data model. I think the entire list would
 benefit!

 On Thu, Mar 26, 2015 at 8:16 PM Robert Wille rwi...@fold3.com wrote:

 I have a cluster which stores tree structures. I keep several hundred
 unrelated trees. The largest has about 180 million nodes, and the smallest
 has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in
 practice is probably less than 10. I am able to page through children and
 siblings. It works really well.

 Doesn’t sound like it’s exactly like what you’re looking for, but if you
 want any pointers on how I went about implementing mine, I’d be happy to
 share.

 On Mar 26, 2015, at 3:05 PM, List l...@airstreamcomm.net wrote:

  Not sure if this is the right place to ask, but we are trying to model
 a user-generated tree hierarchy in which they create child objects of a
 root node, and can create an arbitrary number of children (and children of
 children, and on and on).  So far we have looked at storing each tree
 structure as a single document in JSON format and reading/writing it out in
 its entirety, doing materialized paths where we store the root id with
 every child and the tree structure above the child as a map, and some form
 of an adjacency list (which does not appear to be very viable as looking up
 the entire tree would be ridiculous).
 
  The hope is to end up with a data model that allows us to display the
 entire tree quickly, as well as see the entire path to a leaf when
 selecting that leaf.  If anyone has some suggestions/experience on how to
 model such a tree hierarchy we would greatly appreciate your input.
 




 --

 Ben Bromhead

 Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692



Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Robert Wille
Okay, this is going to be a pretty long post, but I think it's an interesting 
data model, and hopefully someone will find it worth going through.

First, I think it will be easier to understand the modeling choices I made if 
you see the end product. Go to 
http://www.fold3.com/browse.php#249|hzUkLqDmI.
What you see looks like one big tree, but actually is a combination of trees 
spliced together. There is one tree in a relational database that forms what I 
call the top-level browse. The top-level browse is used to navigate through 
categories until you arrive at a publication. When you drill down into a 
publication, you are then viewing data stored in Cassandra. The link provided 
above points to the root of a publication (in this case, maps from the Civil 
War), so to the left is top-level browse coming from MySQL, and to the right is 
the Cassandra browse. Each publication has an independent tree in Cassandra, 
with all trees stored in the same set of tables (I do not dynamically create 
tables for each publication — I personally think that’s a bad practice). We 
currently have 458 publications, and collectively they have about half a 
billion nodes and consume about 400 GB (RF=3).

My trees are immutable. When there are changes to a publication (e.g. adding 
new documents), it is very difficult to know what changes need to be made to 
the tree to edit it in-place. Also, it would be impossible to maintain 
navigational consistency while a tree is in process of being restructured. So, 
when a publication changes, I create a completely new tree. Once the new tree 
is built, I change a reference to point to the new tree. I have a process that 
nightly pages through the tables and deletes records that belong to obsolete 
trees. This process takes about five hours. If it were much longer than that, I 
would probably run it weekly. My database has quite a bit of churn, which is 
fairly unusual for a Cassandra-based application. Most nights build two or 
three trees, generally resulting in a few tens of millions of new records and a 
slightly fewer number of deletions. Size-tiered compaction is a bad choice for 
churn, so I use leveled compaction. Most publications are at most a few million 
nodes, and generally build in less than 20 minutes.

Since any modeling exercise requires knowing the queries, I should describe 
that before getting into the model. Here are the features I need to support. 
For browsing the tree, I need to be able to get the children of a node 
(paginated), the siblings of a node (also paginated), and the ancestors of a 
node. The leaves of each tree are images and form a filmstrip. You can use the 
filmstrip to navigate through all the images in a publication in the tree’s 
natural order. If you go to my browse page and keep drilling down, you’ll 
eventually get to an image. The filmstrip appears at the bottom of the image 
viewer.

Before I discuss the schema, I should discuss a couple of other non-obvious 
things that are relevant to the data model. One very common operation is to 
retrieve a node and all of its ancestors in order to display a path. 
Denormalization would suggest that I store the data for each node, along with 
that of all of its ancestors. That would mean that in my biggest tree, I would 
store the root node 180 million times. I didn’t consider that kind of bloat to 
be acceptable, so I do not denormalize ancestors. I also wanted to retrieve a 
node and its ancestors in constant time, rather than O(n) as would be typical 
for tree traversal. In order to accomplish this, I use a pretty unique idea for 
a node's primary key. I create a hash from information in the node, and then 
append it to the hash of its parent. So, the primary key is really a path. When 
I need to retrieve a node and its ancestors, I tokenize the path and issue 
queries in parallel to get all the nodes in the ancestry at the same time. In 
keeping with this pattern of not denormalizing, my auxiliary tables do not have 
node data in them, but instead provide a means of getting hash paths, which I 
then tokenize and make parallel requests with. Most requests that use an 
auxiliary table can generally just make a query to the auxiliary table to get 
the hash path, and then retrieve the node and its ancestors from the node 
table. Three or fewer trips to Cassandra are sufficient for all my APIs.
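As an illustrative sketch of the hash-path idea (not the actual code; table
and column names are invented), the parallel ancestor fetch could look like
this with the python-driver:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('browse_ks')   # hypothetical keyspace
get_node = session.prepare(
    "SELECT hpath, data FROM node WHERE hpath = ?")     # hypothetical table

def node_with_ancestors(hpath):
    # 'a1b2/c3d4/e5f6' -> ['a1b2', 'a1b2/c3d4', 'a1b2/c3d4/e5f6']
    parts = hpath.split('/')
    keys = ['/'.join(parts[:i + 1]) for i in range(len(parts))]
    # Fire all lookups at once: constant round-trip time regardless of depth.
    futures = [session.execute_async(get_node, (key,)) for key in keys]
    return [future.result() for future in futures]      # one row set per level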

Without further ado, here’s my schema (with commentary):

CREATE TABLE tree (
tree INT,
pub INT,
rhpath VARCHAR,
atime TIMESTAMP,
ccount INT,
ncount INT,
PRIMARY KEY (tree)
) WITH gc_grace_seconds = 864000;

This table maintains the references to the root nodes for each tree. pub is the 
primary key for the publication table in my relational database. There is 
usually just one record for each publication. When a tree is being built (and 
until the old one is retired), a publication may have more than one tree. This 
table is small (458 records), and I cache