Re: Query first 1 columns for each partitioning keys in CQL?
Hello, thank you for your reply. But I understand LIMIT to be a keyword that limits the number of rows in the WHOLE result set retrieved by the SELECT statement. The result with SELECT .. LIMIT is below. Unfortunately, this is not what I wanted. I wanted the latest posts of each author. (Now I doubt whether CQL3 can represent this at all.)

cqlsh:blog_test> create table posts(
             ...   author ascii,
             ...   created_at timeuuid,
             ...   entry text,
             ...   primary key(author,created_at)
             ... ) WITH CLUSTERING ORDER BY (created_at DESC);
cqlsh:blog_test>
cqlsh:blog_test> insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john');
cqlsh:blog_test> insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john');
cqlsh:blog_test> insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike');
cqlsh:blog_test> insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike');
cqlsh:blog_test> select * from posts limit 2;

 author | created_at                           | entry
--------+--------------------------------------+------------------------------
   mike | 1c4d9000-83e9-11e2-8080-808080808080 | This is a new entry by mike
   mike | 4e52d000-6d1f-11e2-8080-808080808080 | This is an old entry by mike

On 2014/05/16, at 23:54, Jonathan Lacefield jlacefi...@datastax.com wrote:

Hello, have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487

On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:

Hi, I'm modeling some queries in CQL3. I'd like to query the first 1 column for each partition key in CQL3.
For example:

create table posts(
  author ascii,
  created_at timeuuid,
  entry text,
  primary key(author,created_at)
);
insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john');
insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike');

And I want results like below:

mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john

I think this is what the SELECT FIRST statement did in CQL2. The only way I came across in CQL3 is to retrieve whole records and drop the unneeded ones manually, but that's obviously not efficient. Could you please tell me a more straightforward way in CQL3?
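The client-side fallback mentioned above (fetch rows, then drop all but the first per partition key) can be sketched in Python. This is an illustration only: the in-memory tuples below stand in for a real driver result set, and it assumes rows arrive clustered with each author's newest row first, as the CLUSTERING ORDER BY (created_at DESC) table would return them.

```python
def first_per_partition(rows):
    """Keep only the first row seen for each partition key (author).

    rows: iterable of (author, created_at, entry) tuples, ordered so that
    each author's newest row appears before that author's older rows.
    """
    seen = set()
    latest = []
    for author, created_at, entry in rows:
        if author not in seen:       # first (= newest) row for this author
            seen.add(author)
            latest.append((author, created_at, entry))
    return latest

# Stand-in for the four rows inserted in the example above.
rows = [
    ("mike", "2013-03-03", "This is a new entry by mike"),
    ("mike", "2013-02-02", "This is an old entry by mike"),
    ("john", "2013-03-03", "This is a new entry by john"),
    ("john", "2013-02-02", "This is an old entry by john"),
]
print(first_per_partition(rows))
```

As the thread notes, this reads every row off the cluster just to discard most of them, which is why a separate "latest posts" table is the more idiomatic answer.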
Re: Index with same Name but different keyspace
Can you share your schema and the commands you are running?

On Thu, May 15, 2014 at 7:54 PM, mahesh rajamani rajamani.mah...@gmail.com wrote:

Hi, I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the same tables for different testing. While creating an index on the tables, I realized I am not able to use the same index name even though the tables are in different keyspaces. Is maintaining a unique index name across keyspaces a must/feature?

--
Regards,
Mahesh Rajamani
Re: Data modeling for Pinterest-like application
A related question is whether it is a good idea to denormalize the read-heavy part of the data while normalizing other, less frequently accessed data?

Heavy reads - denormalize. Less frequently accessed data - it depends on how infrequent it is and whether it's complicated to denormalize in your code.

We will also have a like board for each user containing pins that they like, which can be somewhat private and only viewed by the owner. Since a pin can potentially be liked by thousands of users, if we also denormalize the like board, every time that pin is liked by another user we would have to update the like count in thousands of like boards.

If I understand your use case, a pin consists of a description and a like count, doesn't it? It makes sense then to use the counter type for the like count, but in that case you cannot denormalize the counter, because you cannot mix a counter column family with a normal column family (containing the pin description and properties). *If you are sure* the like board is accessed rarely or not very frequently by the users, then normalization could be the answer. You can further mitigate the effect of the N+1 select on the like board by paging pins (not showing all of them at once, but by pages of 10, for example).

On Sat, May 17, 2014 at 2:37 AM, ziju feng pkdog...@gmail.com wrote:

Thanks for your answer, I really like the "frequency of update vs read" way of thinking. A related question is whether it is a good idea to denormalize the read-heavy part of the data while normalizing other, less frequently accessed data? Our app will have a limited number of system-managed boards that are viewed by every user, so it makes sense to denormalize and propagate updates of pins to these boards. We will also have a like board for each user containing pins that they like, which can be somewhat private and only viewed by the owner.
Since a pin can potentially be liked by thousands of users, if we also denormalize the like board, every time that pin is liked by another user we would have to update the like count in thousands of like boards. Does normalization work better in this case, or can Cassandra handle this kind of write load?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594517.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Query first 1 columns for each partitioning keys in CQL?
Clearly, with your current data model, having the X latest posts for each author is not possible. However, what about this?

CREATE TABLE latest_posts_per_user (
  author ascii,
  latest_post map<uuid,text>,
  PRIMARY KEY (author)
);

The latest_post map will keep a collection of the X latest posts for each user. Now the challenge is to update this latest_post map every time a user creates a new post. This can be done in a single CQL3 statement:

UPDATE latest_posts_per_user
SET latest_post = latest_post + {new_uuid: 'new entry', oldest_uuid: null}
WHERE author = xxx;

You'll need to know the uuid of the oldest post to remove it from the map.

On Sat, May 17, 2014 at 8:53 AM, 後藤 泰陽 matope@gmail.com wrote:
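The bookkeeping behind the latest_posts_per_user map can be simulated client-side. Here is a minimal Python sketch of the idea (a dict stands in for the CQL map, and plain integers stand in for timeuuids, assuming keys sort by creation time); it is an illustration of the add-one/evict-oldest pairing, not the driver code itself:

```python
# Keep at most MAX_POSTS entries per author: every insert of a new post
# may also evict the oldest key -- the same add/remove pair the single
# CQL3 UPDATE above performs in one statement.
MAX_POSTS = 3

def add_post(latest_posts, new_uuid, entry):
    latest_posts[new_uuid] = entry
    if len(latest_posts) > MAX_POSTS:
        # min() finds the oldest key because our stand-in keys sort by
        # creation time, as timeuuids do.
        oldest = min(latest_posts)
        del latest_posts[oldest]
    return latest_posts

posts = {}
for i in range(5):
    add_post(posts, i, "entry %d" % i)
print(sorted(posts))
```

The practical wrinkle is the same one DuyHai points out: to evict, you have to know the oldest uuid, which means either tracking it in the application or reading the map before writing.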
Re: What % of cassandra developers are employed by Datastax?
if you look at the new committers since 2012 they are mostly datastax

On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:

so 30%… according to that data.

On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote:

On 05/14/2014 03:39 PM, Kevin Burton wrote: I'm curious what % of cassandra developers are employed by Datastax?

http://wiki.apache.org/cassandra/Committers

--
Kind regards,
Michael

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Query first 1 columns for each partitioning keys in CQL?
Hmm. Something like a user-managed index looks like the only way to do what I want. Thank you, I'll try that.

2014-05-17 18:07 GMT+09:00 DuyHai Doan doanduy...@gmail.com:
Re: What % of cassandra developers are employed by Datastax?
The question assumes that it's likely that datastax employees become committers. Actually, it's more likely that committers become datastax employees. So this underlying tone that datastax only really 'wants' datastax employees to be cassandra committers is really misleading. Why wouldn't a company want to hire people who have shown a desire and aptitude to work on products that they care about? It's just rational. And damn genius, actually. I'm sure they'd be happy to have an influx of non-datastax committers. Patches welcome.

dave

On 05/17/2014 08:28 AM, Peter Lin wrote:
Re: Best partition type for Cassandra with JBOD
Thanks for the thoughts!

On May 16, 2014 4:23 PM, Ariel Weisberg ar...@weisberg.ws wrote:

Hi,

Recommending nobarrier (mount option barrier=0) when you don't know whether a non-volatile cache is in play is probably not the way to go. A non-volatile cache will typically ignore write barriers anyway if a given block device is configured to cache writes. I am also skeptical you will see a boost in performance. Applications that want to defer and batch writes won't emit write barriers frequently, and when they do, it's because the data has to be there. Filesystems depend on write barriers, although it is surprisingly hard to get a reordering that is really bad because of the way journals are managed. Cassandra uses log-structured storage and supports asynchronous periodic group commit, so it doesn't need to emit write barriers frequently.

Setting read-ahead to zero on an SSD is necessary to get the maximum number of random reads, but it will also disable prefetching for sequential reads. You need a lot less prefetching with an SSD due to the much faster response time, but it's still many microseconds. Someone with more Cassandra-specific knowledge can probably give better advice as to when a non-zero read-ahead makes sense with Cassandra. This may be workload-specific as well.

Regards,
Ariel

On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote:

That and nobarrier... and probably noop for the scheduler if using SSD, and setting readahead to zero...

On Fri, May 16, 2014 at 10:29 AM, James Campbell ja...@breachintelligence.com wrote:

Hi all-

What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64-bit? The DataStax production server guidelines recommend XFS for data partitions, saying, "Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit." However, the same document also notes that "maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node," which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group?

I also noted that the guidelines don't mention setting the noatime and nodiratime flags in the fstab for data volumes, but I wonder if that's a common practice.

James

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
... or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Tombstones
Thanks! How can I find the leveled JSON manifest?
Re: What % of cassandra developers are employed by Datastax?
I would note that the original question was about "developers", not "committers" per se. I sort of assumed that the question implied the latter, but that's not necessarily true. One can "develop" and optionally "contribute" code without being a committer. There are probably plenty of users of Cassandra out there who do their own enhancement of Cassandra and don't necessarily want, or have the energy, to contribute back their enhancements, or intend to and haven't gotten around to it yet. And there are also "contributors" who have "developed" and "contributed" patches (ANYBODY can do that, not just "committers") but are not officially anointed as "committers". So, who knows how many contributors or "developers" are out there beyond the known committers.

The important thing is that Cassandra is open source and licensed so that any enterprise can use it and readily and freely debug and enhance it, without any mandatory requirement to be completely dependent on some particular vendor. There's actually a wiki detailing some of the other vendors, beyond DataStax, who provide consulting (which may include actual Cassandra enhancement in some cases) and support for Cassandra: http://wiki.apache.org/cassandra/ThirdPartySupport

(For disclosure, I am a part-time contractor for DataStax, but now on the sales side, although my background is as a developer.)

-- Jack Krupansky

From: Dave Brosius
Sent: Saturday, May 17, 2014 10:48 AM
To: user@cassandra.apache.org
Subject: Re: What % of cassandra developers are employed by Datastax?

The question assumes that it's likely that datastax employees become committers. Actually, it's more likely that committers become datastax employees. So this underlying tone that datastax only really 'wants' datastax employees to be cassandra committers, is really misleading. Why wouldn't a company want to hire people who have shown a desire and aptitude to work on products that they care about? It's just rational.
And damn genius, actually. I'm sure they'd be happy to have an influx of non-datastax committers. patches welcome. dave
RE: Tombstones
Hi Dimetrio,

From the wiki: "Since 0.6.8, minor compactions also GC tombstones."

Regards,
Andi

Dimetrio wrote: "Does cassandra delete tombstones during simple LCS compaction, or should I use nodetool repair? Thanks."
Re: Data modeling for Pinterest-like application
I was thinking of using the counter type in a separate pin-counter table and, when I need to update the like count, using read-after-write to get the current value and timestamp and then denormalizing into the pin's detail table and board tables. Is that a viable solution in this case?

Thanks
initial token crashes cassandra
Hey all, I've set my initial_token in Cassandra 2.0.7 using a Python script I found at the datastax wiki. I've set the value like this:

initial_token: 85070591730234615865843651857942052864

And cassandra crashes when I try to start it:

[root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
 INFO 18:14:38,511 Logging initialized
 INFO 18:14:38,560 Loading settings from file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
 INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
 INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
 INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 18:14:39,153 disk_failure_policy is stop
 INFO 18:14:39,153 commit_failure_policy is stop
 INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
 INFO 18:14:39,362 Not using multi-threaded compaction
ERROR 18:14:39,365 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input string: 85070591730234615865843651857942052864
    at org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
    at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
For input string: 85070591730234615865843651857942052864
Fatal configuration error; unable to start. See log for stacktrace.

I really need to get replication going between 2 nodes. Can someone clue me into why this may be crashing? Thanks!

Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Java fluent interface for CQL
In our MySQL stack we've been using a fluent interface for Java I developed about five years ago but never open sourced. It's similar to:

MSelect sele = MSelect.newInstance();
sele.addTable( Foo.NAME )
    .addWhereIsEqual( Foo.COL_A, bar )
    .setLimit( 10 )
    ;

… of course embedding CQL strings into my application is evil, so I'll probably build something similar for CQL…. What do you guys usually do for this?

Kevin

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
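For readers unfamiliar with the pattern being discussed: a fluent query builder is just a class whose methods each record one clause and return self (or this) so calls chain, with a final method rendering the statement string. A minimal Python sketch for illustration (the class and method names here are made up for this example and are not the DataStax driver's QueryBuilder API):

```python
class Select:
    """Toy fluent builder that renders a simple CQL/SQL-style SELECT."""

    def __init__(self, table):
        self._table = table
        self._wheres = []
        self._limit = None

    def where_eq(self, column, value):
        # Record one equality predicate; return self so calls chain.
        self._wheres.append("%s = %r" % (column, value))
        return self

    def limit(self, n):
        self._limit = n
        return self

    def cql(self):
        # Render the accumulated clauses into a statement string.
        q = "SELECT * FROM " + self._table
        if self._wheres:
            q += " WHERE " + " AND ".join(self._wheres)
        if self._limit is not None:
            q += " LIMIT %d" % self._limit
        return q

q = Select("posts").where_eq("author", "john").limit(10).cql()
print(q)  # SELECT * FROM posts WHERE author = 'john' LIMIT 10
```

A production builder would bind values as parameters rather than interpolating them into the string, which is also how drivers avoid injection problems.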
Re: initial token crashes cassandra
You may have used the old random partitioner token generator. Use the murmur partitioner token generator instead.

--
Colin
320-221-9531
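The number itself shows the problem: 85070591730234615865843651857942052864 is 2**126, i.e. the second of two evenly spaced RandomPartitioner tokens, whose range is [0, 2**127), while Murmur3Partitioner tokens must fit in a signed 64-bit long, hence the ConfigurationException. A small Python sketch of the standard formula for evenly spaced Murmur3 initial tokens:

```python
def murmur3_tokens(node_count):
    """Evenly spaced initial tokens for Murmur3Partitioner.

    Murmur3 tokens live in [-2**63, 2**63 - 1]; divide that range into
    node_count equal steps starting at the minimum token.
    """
    step = 2**64 // node_count
    return [i * step - 2**63 for i in range(node_count)]

print(murmur3_tokens(2))  # [-9223372036854775808, 0]

# The failing value from the log is a RandomPartitioner token and is far
# outside the signed 64-bit range Murmur3 validates against.
bad = 85070591730234615865843651857942052864
print(bad > 2**63 - 1)  # True
```

(This reproduces the well-known spacing formula; with vnodes, which 2.0 supports, you would normally leave initial_token unset and use num_tokens instead.)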
Re: Java fluent interface for CQL
AH… looks like there's one in the Datastax java driver. Looks like it doesn't support everything but probably supports the features I need ;) So I'll just use that!

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: initial token crashes cassandra
Hi, and thanks for your response. The puzzling thing is that, yes, I am using the Murmur3 partitioner, yet I am still getting the error I just told you guys about:

[root@beta:/etc/alternatives/cassandrahome] #grep -i partition conf/cassandra.yaml | grep -v '#'
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Thanks
Tim

On Sat, May 17, 2014 at 3:23 PM, Colin colpcl...@gmail.com wrote:

You may have used the old random partitioner token generator. Use the murmur partitioner token generator instead.

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: Java fluent interface for CQL
Pull requests encouraged. :)

-Tupshin

On May 17, 2014 7:43 PM, Kevin Burton bur...@spinn3r.com wrote:

AH… looks like there's one in the Datastax java driver. Looks like it doesn't support everything but probably supports the features I need ;) So I'll just use that!
Re: initial token crashes cassandra
You probably generated the wrong token type. Look for a Murmur token generator on the DataStax site. -- Colin 320-221-9531
Re: initial token crashes cassandra
What Colin is saying is that the tool you used to create the token is not creating tokens usable for the Murmur3Partitioner. That tool is probably generating tokens for the (original) RandomPartitioner, which has a different range.
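To make the two ranges concrete: Murmur3Partitioner tokens are signed 64-bit longs, while RandomPartitioner tokens run from 0 up to 2^127. A quick sketch (the class and method names are mine; the range constants follow the two partitioners' documented ranges) shows why the token from the failing cassandra.yaml cannot pass Murmur3 validation:

```java
import java.math.BigInteger;

// Murmur3Partitioner tokens are signed 64-bit longs in [-2^63, 2^63 - 1],
// while RandomPartitioner tokens live in [0, 2^127). The failing value,
// 85070591730234615865843651857942052864, is 2^126 -- the midpoint of the
// RandomPartitioner range -- and cannot be parsed as a long, hence the
// "For input string: ..." error at startup.
public class TokenRangeCheck {
    static final BigInteger MURMUR3_MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MURMUR3_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static boolean fitsMurmur3(String token) {
        BigInteger t = new BigInteger(token);
        return t.compareTo(MURMUR3_MIN) >= 0 && t.compareTo(MURMUR3_MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(fitsMurmur3("85070591730234615865843651857942052864")); // false
        System.out.println(fitsMurmur3("-9223372036854775808"));                   // true
    }
}
```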
Re: initial token crashes cassandra
You probably generated the wrong token type. Look for a murmur token generator on the Datastax site.

What Colin is saying is that the tool you used to create the token is not creating tokens usable for the Murmur3Partitioner. That tool is probably generating tokens for the (original) RandomPartitioner, which has a different range.

Thanks, guys, for your input. And I apologize for reading Colin's initial response too quickly; it would have let me know that I was using the wrong token generator for my partitioner type. That of course was the case. So what I've done is use this token generator from the DataStax website:

python -c 'print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'

That generated a token I could use to start Cassandra on my second node. At this stage I have both nodes running, and I believe they're gossiping, if I understand what I see here correctly:

INFO 02:44:13,823 No gossip backlog; proceeding

However, I've set up web pages for each of the two web servers that are running Cassandra, and while the seed node with all the data is rendering correctly, the node that's downstream from the seed node is not receiving any of its data, despite the message I've just shown you. If I go to the seed node and do a describe keyspaces, I see the keyspace that drives the website listed. It's called 'joke_fire1':

cqlsh> describe keyspaces;
system  joke_fire1  system_traces

And if I go to the node that's downstream from the seed node and run the same command:

cqlsh> describe keyspaces;
system  system_traces

I don't see the important keyspace that runs the site. I have the seed node's IP listed in 'seeds' in the cassandra.yaml on the downstream node, so I'm not really sure why it's not receiving the seed's data, or whether there's some command I need to run to flush the system or something like that.

And if I do a nodetool ring command on the first (seed) host, I don't see the IP of the downstream node listed:

[root@beta-new:~] #nodetool ring | head -10
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
==========
Address     Rack   Status  State   Load       Owns     Token
10.10.1.94  rack1  Up      Normal  150.64 KB  100.00%  -9173731940639284976
10.10.1.94  rack1  Up      Normal  150.64 KB  100.00%  -9070607847117718988
10.10.1.94  rack1  Up      Normal  150.64 KB  100.00%  -9060190512633067546
10.10.1.94  rack1  Up      Normal  150.64 KB  100.00%  -8935690644016753923

And if I look on the downstream node and run nodetool ring, I see only the IP of the downstream node, and not the seed, listed:

[root@beta:/var/lib/cassandra] #nodetool ring | head -15
Datacenter: datacenter1
==========
Address     Rack   Status  State   Load      Owns    Token
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -9223372036854775808
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -9151314442816847873
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -9079256848778919937
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -9007199254740992001
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -8935141660703064065
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -886308405136129
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -8791026472627208193
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -8718968878589280257
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -8646911284551352321
10.10.1.98  rack1  Up      Normal  91.06 KB  99.99%  -8574853690513424385

Yet in my seeds entry in cassandra.yaml I have the correct IP of my seed node listed:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          - seeds: 10.10.1.94

So I'm just wondering what I'm missing in trying to get these two nodes to communicate via gossip at this point. Thanks!
Tim
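For reference, the DataStax one-liner above simply divides the 2^64-wide Murmur3 range evenly across the nodes. The same arithmetic in Java (class and method names are mine), using BigInteger so the intermediate math cannot overflow before the result is reduced to a long:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// token(i) = (2^64 / n) * i - 2^63, i.e. evenly spaced tokens starting at
// the bottom of the Murmur3 range.
public class Murmur3Tokens {
    static List<Long> tokens(int numberOfNodes) {
        BigInteger two = BigInteger.valueOf(2);
        BigInteger span = two.pow(64).divide(BigInteger.valueOf(numberOfNodes));
        BigInteger min = two.pow(63).negate();
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < numberOfNodes; i++) {
            // longValueExact throws if a token somehow left the long range.
            out.add(span.multiply(BigInteger.valueOf(i)).add(min).longValueExact());
        }
        return out;
    }

    public static void main(String[] args) {
        // For two nodes this yields -9223372036854775808 and 0, splitting
        // the Murmur3 range in half -- matching the python one-liner.
        System.out.println(tokens(2));
    }
}
```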
vcdiff/bmdiff, cassandra, and the ordered partitioner…
So I see that Cassandra doesn't support bmdiff/vcdiff. Is this primarily because most people aren't using the ordered partitioner? bmdiff gets good compression by storing similar content next to each other on disk, so lots of HTML content would compress well. But if everything is being stored at random locations, you wouldn't get that bump in storage / compression reduction.
--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: vcdiff/bmdiff, cassandra, and the ordered partitioner…
Cassandra offers compression out of the box; look into the options available upon table creation. The use of the ordered partitioner is an anti-pattern 999 times out of 1000: it creates hot spots, and the use of wide rows can often accomplish the same result through the use of clustering columns. -- Colin 320-221-9531
Re: initial token crashes cassandra
Looks like you may have put the token next to the num_tokens property in the yaml file for one node. I would double-check the yamls to make sure the tokens are set up correctly and that the IP addresses are associated with the right entries as well. Compare them to a fresh download if possible to see what you've changed. -- Colin 320-221-9531
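To illustrate the mix-up Colin describes, a sketch of the two relevant cassandra.yaml properties (the token value is taken from this thread); a node should use either a manually assigned token or vnodes, not a generated token pasted into num_tokens:

```yaml
# Manual assignment: one Murmur3 token (a signed 64-bit integer) per node.
initial_token: -9223372036854775808

# Alternative, vnodes: num_tokens expects a small count such as 256, never a
# generated token value.
# num_tokens: 256
```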
Re: initial token crashes cassandra
Hey Colin,

Looks like you may have put the token next to num-tokens property in the yaml file for one node. I would double check the yaml's to make sure the tokens are setup correctly and that the ip addresses are associated with the right entries as well. Compare them to a fresh download if possible to see what you've changed.

Thanks! I did that, and now things are working perfectly:

[root@beta-new:~] #nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.10.1.94  164.39 KB  256     49.4%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
UN  10.10.1.98  99.08 KB   256     50.6%  f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1

Thanks again for your help!
Tim
Re: vcdiff/bmdiff, cassandra, and the ordered partitioner…
Compression… sure, but bmdiff? Not that I can find. BMDiff is an algorithm that in some situations can yield 10x compression because of the way it's able to find long common runs. That's a pathological case, but if you were to copy the US constitution into itself ten times, bmdiff could ideally get about a 10x compression rate. Not all compression algorithms are identical.

On Sat, May 17, 2014 at 8:59 PM, Colin colpcl...@gmail.com wrote: Cassandra offers compression out of the box. Look into the options available upon table creation.
--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
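Kevin's long-run argument can be demonstrated even with stock DEFLATE from the JDK (a sketch, not bmdiff itself; the class name is mine). A text repeated ten times compresses to barely more than the text once, because the repeats become back-references. The caveat is that DEFLATE only finds repeats within its 32 KB window, and finding common runs across much larger distances is exactly where bmdiff has the advantage.

```java
import java.util.zip.Deflater;

// Compare the DEFLATE-compressed size of a text versus the same text
// repeated ten times: the repeats cost almost nothing.
public class RepeatCompression {
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);  // count output bytes only
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        String doc = "We the People of the United States, in Order to form a more perfect Union... ";
        int once = compressedSize(doc.getBytes());
        int tenTimes = compressedSize(doc.repeat(10).getBytes());
        // tenTimes is far smaller than 10 * once.
        System.out.println(once + " vs " + tenTimes);
    }
}
```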