Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread 後藤 泰陽
Hello,

Thank you for addressing my question.

But as I understand it, LIMIT limits the number of rows in the WHOLE
result set retrieved by the SELECT statement.
The result of SELECT .. LIMIT is below. Unfortunately, this is not what I
wanted.
I want the latest posts of each author. (Now I doubt whether CQL3 can represent it.)

 cqlsh:blog_test> create table posts(
  ... author ascii,
  ... created_at timeuuid,
  ... entry text,
  ... primary key(author,created_at)
  ... ) WITH CLUSTERING ORDER BY (created_at DESC);
 cqlsh:blog_test>
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');
 cqlsh:blog_test> select * from posts limit 2;

  author | created_at                           | entry
 --------+--------------------------------------+------------------------------
    mike | 1c4d9000-83e9-11e2-8080-808080808080 |  This is a new entry by mike
    mike | 4e52d000-6d1f-11e2-8080-808080808080 | This is an old entry by mike



On 2014/05/16, at 23:54, Jonathan Lacefield jlacefi...@datastax.com wrote:

 Hello,
 
  Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3?
 
  These may help you achieve your goals.
 
   
 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html
   
 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html
 
 Jonathan Lacefield
 Solutions Architect, DataStax
 (404) 822 3487
 
 
 
 
 
 
 On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:
 Hi, I'm modeling some queries in CQL3.
 
 I'd like to query the first 1 column for each partitioning key in CQL3.
 
 For example:
 
 create table posts(
   author ascii,
   created_at timeuuid,
   entry text,
   primary key(author,created_at)
 );
 insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
 insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
 insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
 insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');
 
 And I want results like below.
 
 mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
 john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john
 
 I think that this is what SELECT FIRST  statements did in CQL2.
 
 The only way I came across in CQL3 is to retrieve whole records and drop
 the rest manually, but that's obviously not efficient.
 
 Could you please tell me a more straightforward way in CQL3?
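
 The "retrieve whole records and drop manually" workaround discussed in this
 thread can be sketched client-side. This is only an illustration (the rows
 below are stand-ins for a real result set); it relies on rows of one
 partition arriving contiguously, newest first per the CLUSTERING ORDER BY
 clause:

```python
# Client-side sketch of the "fetch everything, drop the rest" workaround:
# keep only the first row of each partition.  Rows of one partition arrive
# contiguously, newest first (per CLUSTERING ORDER BY), so a simple groupby
# on the partition key works.  These rows are illustrative stand-ins.
from itertools import groupby
from operator import itemgetter

rows = [
    ("mike", "1c4d9000-83e9-11e2-8080-808080808080", "This is a new entry by mike"),
    ("mike", "4e52d000-6d1f-11e2-8080-808080808080", "This is an old entry by mike"),
    ("john", "1c4d9000-83e9-11e2-8080-808080808080", "This is a new entry by john"),
    ("john", "4e52d000-6d1f-11e2-8080-808080808080", "This is an old entry by john"),
]

# Take the first (newest) row from each author's group.
latest_per_author = [next(group) for _, group in groupby(rows, key=itemgetter(0))]
for author, created_at, entry in latest_per_author:
    print(author, entry)
```

 This is exactly the inefficiency the thread is trying to avoid: every row
 crosses the wire even though only one per author is kept.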
 



Re: Index with same Name but different keyspace

2014-05-17 Thread Mark Reddy
Can you share your schema and the commands you are running?


On Thu, May 15, 2014 at 7:54 PM, mahesh rajamani
rajamani.mah...@gmail.com wrote:

 Hi,

 I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the
 same tables for different testing. While creating indexes on the tables, I
 realized I am not able to use the same index name even though the tables are in
 different keyspaces. Is maintaining a unique index name across keyspaces a
 must/a feature?

 --
 Regards,
 Mahesh Rajamani



Re: Data modeling for Pinterest-like application

2014-05-17 Thread DuyHai Doan
A related question is whether it is a good idea to denormalize the
read-heavy part of the data while normalizing other, less frequently accessed
data?

 Heavy read - denormalize.
 Less frequently accessed data - it depends on how infrequent it is and
whether it's complicated to denormalize in your code.

We will also have a like board for each user containing pins that they
like, which can be somewhat private and only viewed by the owner.

Since a pin can potentially be liked by thousands of users, if we also
denormalize the like board, every time that pin is liked by another user we
would have to update the like count in thousands of like boards.

If I understand your use case, a pin consists of a description and a like
count, doesn't it? It then makes sense to use the counter type for the like
count, but in this case you cannot denormalize the counter, because you
cannot mix a counter column family with a normal column family (containing the
pin description and properties).

*If you are sure* the like board is accessed rarely or not very
frequently by the users, then normalization could be the answer. You can
further mitigate the effect of the N+1 select on the like board by paging pins
(not showing all of them at once, but in pages of 10, for example).








On Sat, May 17, 2014 at 2:37 AM, ziju feng pkdog...@gmail.com wrote:

 Thanks for your answer, I really like the "frequency of update vs. read" way
 of thinking.

 A related question is whether it is a good idea to denormalize the
 read-heavy part of the data while normalizing other, less frequently accessed
 data?

 Our app will have a limited number of system managed boards that are viewed
 by every user so it makes sense to denormalize and propagate updates of
 pins
 to these boards.

 We will also have a like board for each user containing pins that they
 like,
 which can be somewhat private and only viewed by the owner.

 Since a pin can potentially be liked by thousands of users, if we also
 denormalize the like board, every time that pin is liked by another user we
 would have to update the like count in thousands of like boards.

 Does normalization work better in this case, or can Cassandra handle this
 kind of write load?



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594517.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread DuyHai Doan
Clearly with your current data model, having the X latest posts for each author
is not possible.

 However, what about this?

CREATE TABLE latest_posts_per_user (
   author ascii,
   latest_post map<uuid, text>,
   PRIMARY KEY (author)
)

 The latest_post map will keep a collection of the X latest posts for each user.
Now the challenge is to update this latest_post map every time a user
creates a new post. This can be done in a single CQL3 statement: UPDATE
latest_posts_per_user SET latest_post = latest_post + {new_uuid: 'new
entry', oldest_uuid: null} WHERE author = xxx;

 You'll need to know the uuid of the oldest post to remove it from the map
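
The bookkeeping this implies (add the new post, evict the oldest so the map
stays bounded) can be sketched client-side. This is a hypothetical helper,
not any Cassandra API; plain integers stand in for the timeuuid keys, since
both order by creation time:

```python
# Hypothetical client-side bookkeeping for the latest_post map: insert the
# new post and evict the oldest entries so the map never exceeds max_size.
# Integer timestamps stand in for timeuuid keys; the eviction relies only
# on the keys ordering by creation time.
def update_latest(latest_post, created_at, entry, max_size=10):
    latest_post[created_at] = entry
    while len(latest_post) > max_size:
        del latest_post[min(latest_post)]   # drop the oldest post
    return latest_post

board = {}
for ts, entry in [(1, "old"), (2, "newer"), (3, "newest")]:
    update_latest(board, ts, entry, max_size=2)
```

On the CQL side, the eviction step corresponds to setting the oldest uuid to
null in the UPDATE statement above.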



On Sat, May 17, 2014 at 8:53 AM, 後藤 泰陽 matope@gmail.com wrote:

 Hello,

 Thank you for addressing my question.

 But as I understand it, LIMIT limits the number of rows in the WHOLE
 result set retrieved by the SELECT statement.
 The result of SELECT .. LIMIT is below. Unfortunately, this is not what I
 wanted.
 I want the latest posts of each author. (Now I doubt whether CQL3 can
 represent it.)

 cqlsh:blog_test> create table posts(
  ... author ascii,
  ... created_at timeuuid,
  ... entry text,
  ... primary key(author,created_at)
  ... ) WITH CLUSTERING ORDER BY (created_at DESC);
 cqlsh:blog_test>
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');
 cqlsh:blog_test> select * from posts limit 2;

  author | created_at                           | entry
 --------+--------------------------------------+------------------------------
    mike | 1c4d9000-83e9-11e2-8080-808080808080 |  This is a new entry by mike
    mike | 4e52d000-6d1f-11e2-8080-808080808080 | This is an old entry by mike




 On 2014/05/16, at 23:54, Jonathan Lacefield jlacefi...@datastax.com wrote:

 Hello,

  Have you looked at using the CLUSTERING ORDER BY and LIMIT features of
 CQL3?

  These may help you achieve your goals.


 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html

 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

 Jonathan Lacefield
 Solutions Architect, DataStax
 (404) 822 3487
 http://www.linkedin.com/in/jlacefield

 http://www.datastax.com/cassandrasummit14



 On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:

 Hi, I'm modeling some queries in CQL3.

 I'd like to query the first 1 column for each partitioning key in CQL3.

 For example:

 create table posts(
 author ascii,
 created_at timeuuid,
 entry text,
 primary key(author,created_at)
 );
 insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
 insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
 insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
 insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');


 And I want results like below.

 mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
 john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john


 I think that this is what SELECT FIRST  statements did in CQL2.

 The only way I came across in CQL3 is to retrieve whole records and drop
 the rest manually, but that's obviously not efficient.

 Could you please tell me a more straightforward way in CQL3?






Re: What % of cassandra developers are employed by Datastax?

2014-05-17 Thread Peter Lin
If you look at the new committers since 2012, they are mostly DataStax.


On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:

 so 30%… according to that data.


 On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 05/14/2014 03:39 PM, Kevin Burton wrote:

 I'm curious what % of cassandra developers are employed by Datastax?


 http://wiki.apache.org/cassandra/Committers

 --
 Kind regards,
 Michael




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 Skype: *burtonator*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are
 people.




Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-17 Thread Matope Ono
Hmm. Something like a user-managed index looks like the only way to do what I
want.
Thank you, I'll try that.


2014-05-17 18:07 GMT+09:00 DuyHai Doan doanduy...@gmail.com:

 Clearly with your current data model, having the X latest posts for each author
 is not possible.

  However, what about this?

 CREATE TABLE latest_posts_per_user (
    author ascii,
    latest_post map<uuid, text>,
    PRIMARY KEY (author)
 )

  The latest_post map will keep a collection of the X latest posts for each user.
 Now the challenge is to update this latest_post map every time a user
 creates a new post. This can be done in a single CQL3 statement: UPDATE
 latest_posts_per_user SET latest_post = latest_post + {new_uuid: 'new
 entry', oldest_uuid: null} WHERE author = xxx;

  You'll need to know the uuid of the oldest post to remove it from the map



 On Sat, May 17, 2014 at 8:53 AM, 後藤 泰陽 matope@gmail.com wrote:

 Hello,

 Thank you for addressing my question.

 But as I understand it, LIMIT limits the number of rows in the WHOLE
 result set retrieved by the SELECT statement.
 The result of SELECT .. LIMIT is below. Unfortunately, this is not what
 I wanted.
 I want the latest posts of each author. (Now I doubt whether CQL3 can
 represent it.)

 cqlsh:blog_test> create table posts(
  ... author ascii,
  ... created_at timeuuid,
  ... entry text,
  ... primary key(author,created_at)
  ... ) WITH CLUSTERING ORDER BY (created_at DESC);
 cqlsh:blog_test>
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
 cqlsh:blog_test> insert into posts(author,created_at,entry) values
 ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');
 cqlsh:blog_test> select * from posts limit 2;

  author | created_at                           | entry
 --------+--------------------------------------+------------------------------
    mike | 1c4d9000-83e9-11e2-8080-808080808080 |  This is a new entry by mike
    mike | 4e52d000-6d1f-11e2-8080-808080808080 | This is an old entry by mike




 On 2014/05/16, at 23:54, Jonathan Lacefield jlacefi...@datastax.com wrote:

 Hello,

  Have you looked at using the CLUSTERING ORDER BY and LIMIT features of
 CQL3?

  These may help you achieve your goals.


 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html

 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

 Jonathan Lacefield
 Solutions Architect, DataStax
 (404) 822 3487
 http://www.linkedin.com/in/jlacefield

 http://www.datastax.com/cassandrasummit14



 On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:

 Hi, I'm modeling some queries in CQL3.

 I'd like to query the first 1 column for each partitioning key in CQL3.

 For example:

 create table posts(
 author ascii,
 created_at timeuuid,
 entry text,
 primary key(author,created_at)
 );
  insert into posts(author,created_at,entry) values
  ('john',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by john');
  insert into posts(author,created_at,entry) values
  ('john',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by john');
  insert into posts(author,created_at,entry) values
  ('mike',minTimeuuid('2013-02-02 10:00+0000'),'This is an old entry by mike');
  insert into posts(author,created_at,entry) values
  ('mike',minTimeuuid('2013-03-03 10:00+0000'),'This is a new entry by mike');


 And I want results like below.

 mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
 john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john


 I think that this is what SELECT FIRST  statements did in CQL2.

 The only way I came across in CQL3 is to retrieve whole records and drop
 the rest manually, but that's obviously not efficient.

 Could you please tell me a more straightforward way in CQL3?







Re: What % of cassandra developers are employed by Datastax?

2014-05-17 Thread Dave Brosius
The question assumes that it's likely that DataStax employees become
committers.


Actually, it's more likely that committers become DataStax employees.

So this underlying tone, that DataStax only really 'wants' DataStax
employees to be Cassandra committers, is really misleading.


Why wouldn't a company want to hire people who have shown a desire and 
aptitude to work on products that they care about? It's just rational. 
And damn genius, actually.


I'm sure they'd be happy to have an influx of non-DataStax committers.
Patches welcome.


dave


On 05/17/2014 08:28 AM, Peter Lin wrote:


if you look at the new committers since 2012 they are mostly datastax


On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:


so 30%… according to that data.


On Thu, May 15, 2014 at 4:59 PM, Michael Shuler
mich...@pbandjelly.org wrote:

On 05/14/2014 03:39 PM, Kevin Burton wrote:

I'm curious what % of cassandra developers are employed by
Datastax?


http://wiki.apache.org/cassandra/Committers

-- 
Kind regards,

Michael










Re: Best partition type for Cassandra with JBOD

2014-05-17 Thread James Campbell
Thanks for the thoughts!

On May 16, 2014 4:23 PM, Ariel Weisberg ar...@weisberg.ws wrote:
Hi,

Recommending nobarrier (mount option barrier=0) when you don't know whether a
non-volatile cache is in play is probably not the way to go. A non-volatile cache
will typically ignore write barriers if a given block device is configured to
cache writes anyway.

I am also skeptical you will see a boost in performance. Applications that want 
to defer and batch writes won't emit write barriers frequently and when they do 
it's because the data has to be there. Filesystems depend on write barriers 
although it is surprisingly hard to get a reordering that is really bad because 
of the way journals are managed.

Cassandra uses log structured storage and supports asynchronous periodic group 
commit so it doesn't need to emit write barriers frequently.

Setting read ahead to zero on an SSD is necessary to get the maximum number of 
random reads, but will also disable prefetching for sequential reads. You need 
a lot less prefetching with an SSD due to the much faster response time, but 
it's still many microseconds.

Someone with more Cassandra-specific knowledge can probably give better advice
as to when a non-zero read ahead makes sense with Cassandra. This may be
workload-specific as well.

Regards,
Ariel

On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote:
That and nobarrier... and probably noop for the scheduler if using SSD and 
setting readahead to zero...


On Fri, May 16, 2014 at 10:29 AM, James Campbell
ja...@breachintelligence.com wrote:

Hi all-



What partition type is best/most commonly used for a multi-disk JBOD setup 
running Cassandra on CentOS 64bit?



The DataStax production server guidelines recommend XFS for data partitions,
saying, "Because Cassandra can use almost half your disk space for a single
file, use XFS when using large disks, particularly if using a 32-bit kernel.
XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited
on 64-bit."



However, the same document also notes that "maximum recommended capacity for
Cassandra 1.2 and later is 3 to 5TB per node," which makes me think 16TB file
sizes would be irrelevant (especially when not using RAID to create a single
large volume).  What has been the experience of this group?



I also noted that the guidelines don't mention setting noatime and nodiratime 
flags in the fstab for data volumes, but I wonder if that's a common practice.
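
For what it's worth, a hypothetical /etc/fstab entry combining XFS with those
flags might look like the line below; the device and mount point are made up,
and on recent kernels noatime already implies nodiratime:

```
# Hypothetical /etc/fstab line for one XFS data volume in a JBOD setup
/dev/sdb1  /var/lib/cassandra/data1  xfs  noatime,nodiratime  0 0
```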


James






Re: Tombstones

2014-05-17 Thread Dimetrio
Thanks!
How can I find the leveled JSON manifest?




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467p7594535.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What % of cassandra developers are employed by Datastax?

2014-05-17 Thread Jack Krupansky
I would note that the original question was about “developers”, not 
“committers” per se. I sort of assumed that the question implied the latter, 
but that’s not necessarily true. One can “develop” and optionally “contribute” 
code without being a committer, per se. There are probably plenty of users of 
Cassandra out there who do their own enhancement of Cassandra and don’t 
necessarily want or have the energy to contribute back their enhancements, or 
intend to and haven’t gotten around to it yet. And there are also 
“contributors” who have “developed” and “contributed” patches (ANYBODY can do 
that, not just “committers”) but are not officially anointed as “committers”.

So, who knows how many contributors or “developers” are out there beyond the 
known committers. The important thing is that Cassandra is open source and 
licensed so that any enterprise can use it and readily and freely debug and 
enhance it without any sort of mandatory requirement that they be completely 
dependent on some particular vendor.

There’s actually a wiki detailing some of the other vendors, beyond DataStax, 
who provide consulting (which may include actual Cassandra enhancement in some 
cases) and support for Cassandra:
http://wiki.apache.org/cassandra/ThirdPartySupport

(For disclosure, I am a part-time contractor for DataStax, but now on the sales
side, although my background is as a developer.)

-- Jack Krupansky

From: Dave Brosius 
Sent: Saturday, May 17, 2014 10:48 AM
To: user@cassandra.apache.org 
Subject: Re: What % of cassandra developers are employed by Datastax?

The question assumes that it's likely that datastax employees become committers.

Actually, it's more likely that committers become datastax employees.

So this underlying tone, that DataStax only really 'wants' DataStax employees to
be Cassandra committers, is really misleading.

Why wouldn't a company want to hire people who have shown a desire and aptitude 
to work on products that they care about? It's just rational. And damn genius, 
actually.

I'm sure they'd be happy to have an influx of non-DataStax committers. Patches
welcome.

dave



On 05/17/2014 08:28 AM, Peter Lin wrote:


  if you look at the new committers since 2012 they are mostly datastax




  On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:

so 30%… according to that data.  



On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org 
wrote:

  On 05/14/2014 03:39 PM, Kevin Burton wrote:

I'm curious what % of cassandra developers are employed by Datastax?



  http://wiki.apache.org/cassandra/Committers

  -- 
  Kind regards,
  Michael









RE: Tombstones

2014-05-17 Thread Andreas Finke
Hi Dimetrio,

From the wiki:

Since 0.6.8, minor compactions also GC tombstones

Regards
Andi


 Dimetrio wrote 

Does Cassandra delete tombstones during simple LCS compaction, or should I use
nodetool repair?

Thanks.



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data modeling for Pinterest-like application

2014-05-17 Thread ziju feng
I was thinking of using the counter type in a separate pin counter table and,
when I need to update the like count, using read-after-write to get the
current value and timestamp and then denormalizing into the pin's detail table
and board tables.

Is it a viable solution in this case?

Thanks



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594539.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


initial token crashes cassandra

2014-05-17 Thread Tim Dunphy
Hey all,

 I've set my initial_token in cassandra 2.0.7 using a python script I found
at the datastax wiki.

I've set the value like this:

initial_token: 85070591730234615865843651857942052864

And cassandra crashes when I try to start it:

[root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
 INFO 18:14:38,511 Logging initialized
 INFO 18:14:38,560 Loading settings from
file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
 INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
 INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
 INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap,
indexAccessMode is mmap
 INFO 18:14:39,153 disk_failure_policy is stop
 INFO 18:14:39,153 commit_failure_policy is stop
 INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
 INFO 18:14:39,362 Not using multi-threaded compaction
ERROR 18:14:39,365 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input string:
85070591730234615865843651857942052864
at
org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
at
org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
at
org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
For input string: 85070591730234615865843651857942052864
Fatal configuration error; unable to start. See log for stacktrace.

I really need to get replication going between 2 nodes. Can someone clue me
into why this may be crashing?

Thanks!
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Java fluent interface for CQL

2014-05-17 Thread Kevin Burton
In our MySQL stack we've been using a fluent interface for Java I developed
about five years ago but never open sourced.

It's similar to:

MSelect sele = MSelect.newInstance();

sele.addTable( Foo.NAME )
.addWhereIsEqual( Foo.COL_A, bar )
.setLimit( 10 )
;

… of course embedding CQL strings into my application is evil so I'll
probably build something similar for CQL….

what do you guys usually do for this?
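
For illustration, the shape of such a builder is easy to sketch. Everything
below (class name, method names, the generated CQL string) is made up for the
sketch, not any driver's actual API:

```python
# Hypothetical sketch of a tiny fluent CQL SELECT builder in the style
# Kevin describes.  All names here are invented for illustration.
class Select:
    def __init__(self, table):
        self.table = table
        self.wheres = []
        self.limit_n = None

    def add_where_eq(self, col, val):
        self.wheres.append("%s = %r" % (col, val))
        return self  # returning self is what makes the interface fluent

    def set_limit(self, n):
        self.limit_n = n
        return self

    def cql(self):
        # Render the accumulated clauses into a CQL string.
        q = "SELECT * FROM " + self.table
        if self.wheres:
            q += " WHERE " + " AND ".join(self.wheres)
        if self.limit_n is not None:
            q += " LIMIT %d" % self.limit_n
        return q

query = Select("posts").add_where_eq("author", "john").set_limit(10).cql()
```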

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


Re: initial token crashes cassandra

2014-05-17 Thread Colin
You may have used the old random partitioner token generator.  Use the murmur 
partitioner token generator instead.

--
Colin
320-221-9531


 On May 17, 2014, at 1:15 PM, Tim Dunphy bluethu...@gmail.com wrote:
 
 Hey all,
 
  I've set my initial_token in cassandra 2.0.7 using a python script I found 
 at the datastax wiki. 
 
 I've set the value like this:
 
 initial_token: 85070591730234615865843651857942052864
 
 And cassandra crashes when I try to start it:
 
 [root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
  INFO 18:14:38,511 Logging initialized
  INFO 18:14:38,560 Loading settings from 
 file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
  INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
  INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
  INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap, 
 indexAccessMode is mmap
  INFO 18:14:39,153 disk_failure_policy is stop
  INFO 18:14:39,153 commit_failure_policy is stop
  INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
  INFO 18:14:39,362 Not using multi-threaded compaction
 ERROR 18:14:39,365 Fatal configuration error
 org.apache.cassandra.exceptions.ConfigurationException: For input string: 
 85070591730234615865843651857942052864
 at 
 org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
 at 
 org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
 at 
 org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
 at 
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
 at 
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
 at 
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
 For input string: 85070591730234615865843651857942052864
 Fatal configuration error; unable to start. See log for stacktrace.
 
 I really need to get replication going between 2 nodes. Can someone clue me 
 into why this may be crashing?
 
 Thanks!
 Tim
 
 -- 
 GPG me!!
 
 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
 


Re: Java fluent interface for CQL

2014-05-17 Thread Kevin Burton
AH… looks like there's one in the Datastax java driver.  Looks like it
doesn't support everything but probably supports the features I need ;)

So I'll just use that!


On Sat, May 17, 2014 at 12:39 PM, Kevin Burton bur...@spinn3r.com wrote:

 In our MySQL stack we've been using a fluent interface for Java I
 developed about five years ago but never open sourced.

 It's similar to:

 MSelect sele = MSelect.newInstance();

 sele.addTable( Foo.NAME )
 .addWhereIsEqual( Foo.COL_A, bar )
 .setLimit( 10 )
 ;

 … of course embedding CQL strings into my application is evil so I'll
 probably build something similar for CQL….

 what do you guys usually do for this?

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 Skype: *burtonator*
 blog: http://burtonator.wordpress.com
  … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are
 people.




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


Re: initial token crashes cassandra

2014-05-17 Thread Tim Dunphy
Hi and thanks for your response.

The puzzling thing is that yes, I am using the Murmur3 partitioner, yet I am
still getting the error I just told you guys about:

[root@beta:/etc/alternatives/cassandrahome] #grep -i partition
conf/cassandra.yaml | grep -v '#'
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Thanks
Tim


On Sat, May 17, 2014 at 3:23 PM, Colin colpcl...@gmail.com wrote:

 You may have used the old random partitioner token generator.  Use the
 murmur partitioner token generator instead.

 --
 Colin
 320-221-9531


 On May 17, 2014, at 1:15 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey all,

  I've set my initial_token in cassandra 2.0.7 using a python script I
 found at the datastax wiki.

 I've set the value like this:

 initial_token: 85070591730234615865843651857942052864

 And cassandra crashes when I try to start it:

 [root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
  INFO 18:14:38,511 Logging initialized
  INFO 18:14:38,560 Loading settings from
 file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
  INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
  INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
  INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap,
 indexAccessMode is mmap
  INFO 18:14:39,153 disk_failure_policy is stop
  INFO 18:14:39,153 commit_failure_policy is stop
  INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
  INFO 18:14:39,362 Not using multi-threaded compaction
 ERROR 18:14:39,365 Fatal configuration error
 org.apache.cassandra.exceptions.ConfigurationException: For input string:
 85070591730234615865843651857942052864
 at
 org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
 at
 org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
 at
 org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
 For input string: 85070591730234615865843651857942052864
 Fatal configuration error; unable to start. See log for stacktrace.

 I really need to get replication going between 2 nodes. Can someone clue
 me into why this may be crashing?

 Thanks!
 Tim

 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Java fluent interface for CQL

2014-05-17 Thread Tupshin Harper
Pull requests encouraged. :)

-Tupshin
On May 17, 2014 7:43 PM, Kevin Burton bur...@spinn3r.com wrote:

 AH… looks like there's one in the Datastax java driver.  Looks like it
 doesn't support everything but probably supports the features I need ;)

 So I'll just use that!


 On Sat, May 17, 2014 at 12:39 PM, Kevin Burton bur...@spinn3r.com wrote:

 In our MySQL stack we've been using a fluent interface for Java I
 developed about five years ago but never open sourced.

 It's similar to:

 MSelect sele = MSelect.newInstance();

 sele.addTable( Foo.NAME )
 .addWhereIsEqual( Foo.COL_A, bar )
 .setLimit( 10 )
 ;

 … of course embedding CQL strings into my application is evil so I'll
 probably build something similar for CQL….

 what do you guys usually do for this?

 Kevin





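A minimal sketch of what such a fluent CQL builder could look like, mirroring the MSelect interface above. This is a toy illustration in Python, not the DataStax driver's QueryBuilder; all names here are invented for the example.

```python
class Select:
    """Toy fluent builder that renders a parameterized CQL SELECT.

    A sketch mirroring the MSelect interface above -- not the DataStax
    driver's QueryBuilder, just an illustration of the fluent style.
    """

    def __init__(self):
        self._table = None
        self._wheres = []          # columns compared with '='
        self._limit = None

    def add_table(self, name):
        self._table = name
        return self                # returning self is what makes it fluent

    def add_where_is_equal(self, column):
        self._wheres.append(column)
        return self

    def set_limit(self, n):
        self._limit = n
        return self

    def cql(self):
        stmt = "SELECT * FROM " + self._table
        if self._wheres:
            stmt += " WHERE " + " AND ".join(c + " = ?" for c in self._wheres)
        if self._limit is not None:
            stmt += " LIMIT " + str(self._limit)
        return stmt


query = (Select()
         .add_table("posts")
         .add_where_is_equal("author")
         .set_limit(10))
print(query.cql())  # SELECT * FROM posts WHERE author = ? LIMIT 10
```

Using `?` bind markers keeps the statement parameterized, which avoids the string-embedding problem Kevin mentions.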




Re: initial token crashes cassandra

2014-05-17 Thread Colin Clark
You probably generated the wrong token type.  Look for a murmur token
generator on the Datastax site.

--
Colin
320-221-9531




Re: initial token crashes cassandra

2014-05-17 Thread Dave Brosius
What Colin is saying is that the tool you used to create the token is not 
generating tokens usable for the Murmur3Partitioner. That tool is 
probably generating tokens for the (original) RandomPartitioner, which 
has a different range.
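The range mismatch described above can be checked directly. Murmur3Partitioner tokens are signed 64-bit integers, while the old RandomPartitioner uses [0, 2**127); the rejected token happens to equal 2**126, which is the midpoint a RandomPartitioner token generator emits for the second of two nodes:

```python
# The token from the error message in this thread.
token = 85070591730234615865843651857942052864

# Murmur3Partitioner accepts only signed 64-bit longs.
murmur3_range = range(-2**63, 2**63)

print(token in murmur3_range)   # False -> ConfigurationException at startup
print(0 <= token < 2**127)      # True  -> a valid RandomPartitioner token
print(token == 2**126)          # True  -> the 2-node RandomPartitioner midpoint
```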









Re: initial token crashes cassandra

2014-05-17 Thread Tim Dunphy

 You probably generated the wrong token type.  Look for a murmur token
 generator on the Datastax site.

What Colin is saying is that the tool you used to create the token, is not
 creating tokens usable for the Murmur3Partitioner. That tool is probably
 generating tokens for the (original) RandomPartitioner, which has a
 different range.


Thanks guys for your input, and I apologize for reading Colin's initial
response too quickly; it pointed out that I was probably using the wrong
token generator for my partitioner type. That was indeed the case. So what
I've done is use this token generator from the DataStax website:

python -c 'print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'


That algorithm generated a token I could use to start Cassandra on my
second node.
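For reference, here is a Python 3 version of that one-liner (the original uses Python 2's print statement and integer `/`), with a hypothetical `number_of_tokens = 2` filled in for a two-node cluster:

```python
number_of_tokens = 2  # hypothetical: one token per node in a 2-node cluster

# Evenly spaced tokens across the signed 64-bit Murmur3 range.
tokens = [str(((2**64 // number_of_tokens) * i) - 2**63)
          for i in range(number_of_tokens)]

print(tokens)  # ['-9223372036854775808', '0']
```

Both values fall inside the Murmur3 range, so Cassandra accepts them as initial_token settings.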


However, at this stage I have both nodes running, and I believe they're
gossiping, if I understand what I see here correctly:


 INFO 02:44:13,823 No gossip backlog; proceeding


However I've setup web pages for each of the two web servers that are
running Cassandra. And it looks like the seed node with all the data
is rendering correctly. But the node that's downstream from the seed
node is not receiving any of its data despite the message that I've
just shown you.


And if I go to the seed node and do a describe keyspaces I see the
keyspace that drives the website listed. It's called 'joke_fire1'


cqlsh> describe keyspaces;

system  joke_fire1  system_traces

And if I go to the node that's downstream from the seed node and run
the same command:


cqlsh> describe keyspaces;

system  system_traces


I don't see the important keyspace that runs the site.


I have the seed node's IP listed in 'seeds' in the cassandra.yaml on
the downstream node. So I'm not really sure why it's not receiving the
seed's data, or whether there's some command I need to run to flush the
system or something like that.


And if I do a nodetool ring command on the first (seed) host I don't
see the IP of the downstream node listed:








[root@beta-new:~] #nodetool ring | head -10

Note: Ownership information does not include topology; for complete
information, specify a keyspace


Datacenter: datacenter1

==

Address RackStatus State   LoadOwns
Token


10.10.1.94  rack1   Up Normal  150.64 KB   100.00%
-9173731940639284976

10.10.1.94  rack1   Up Normal  150.64 KB   100.00%
-9070607847117718988

10.10.1.94  rack1   Up Normal  150.64 KB   100.00%
-9060190512633067546

10.10.1.94  rack1   Up Normal  150.64 KB   100.00%
-8935690644016753923


And if I look on the downstream node and run nodetool ring I see only
the IP of the downstream node and not the seed listed:









[root@beta:/var/lib/cassandra] #nodetool ring | head -15


Datacenter: datacenter1

==

Address  RackStatus State   LoadOwns
 Token


10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-9223372036854775808

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-9151314442816847873

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-9079256848778919937

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-9007199254740992001

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-8935141660703064065

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-886308405136129

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-8791026472627208193

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-8718968878589280257

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-8646911284551352321

10.10.1.98  rack1   Up Normal  91.06 KB99.99%
-8574853690513424385


Yet in my seeds entry in cassandra.yaml I have the correct IP of my
seed node listed:


seed_provider:

- class_name: org.apache.cassandra.locator.SimpleSeedProvider

  # seeds is actually a comma-delimited list of addresses.

  - seeds: 10.10.1.94


So I'm just wondering what I'm missing in trying to get these two
nodes to communicate via gossip at this point.


Thanks!

Tim









vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Thread Kevin Burton
So  I see that Cassandra doesn't support bmdiff/vcdiff.

Is this primarily because most people aren't using the ordered partitioner?

bmdiff gets good compression by storing similar content next to each other
on disk, so lots of HTML content would compress well.

but if everything is being stored at random locations, you wouldn't get
that bump in storage / compression reduction.



Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Thread Colin
Cassandra offers compression out of the box.  Look into the options available 
upon table creation.

The use of orderedpartitioner is an anti-pattern 999/1000 times.  It creates 
hot spots - the use of wide rows can often accomplish the same result through 
the use of clustering columns.

--
Colin
320-221-9531




Re: initial token crashes cassandra

2014-05-17 Thread Colin Clark
Looks like you may have put the token next to num-tokens property in the
yaml file for one node.  I would double check the yaml's to make sure the
tokens are setup correctly and that the ip addresses are associated with
the right entries as well.

Compare them to a fresh download if possible to see what you've changed.
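The mix-up Colin suspects would look roughly like this in cassandra.yaml (illustrative values; only one of num_tokens or initial_token should carry the assignment):

```yaml
# Broken: a generated token pasted onto num_tokens, which expects a small
# vnode count, not a token value.
# num_tokens: -9223372036854775808

# Either use vnodes:
num_tokens: 256

# ...or assign one Murmur3 token manually and leave num_tokens commented out:
# initial_token: -9223372036854775808
```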

--
Colin
320-221-9531



Re: initial token crashes cassandra

2014-05-17 Thread Tim Dunphy
Hey Colin,

Looks like you may have put the token next to num-tokens property in the
 yaml file for one node.  I would double check the yaml's to make sure the
 tokens are setup correctly and that the ip addresses are associated with
 the right entries as well.
 Compare them to a fresh download if possible to see what you've changed.


 Thanks! I did that and now things are working perfectly:

[root@beta-new:~] #nodetool status

Datacenter: datacenter1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns   Host ID
Rack

UN  10.10.1.94  164.39 KB  256 49.4%
fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1

UN  10.10.1.98 99.08 KB   256 50.6%
f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1


Thanks again for your help!


Tim



Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-17 Thread Kevin Burton
Compression, sure, but bmdiff? Not that I can find. BMDiff is an algorithm
that can in some situations yield 10x compression because of the way it
finds long common runs. That's a pathological case, but if you were to copy
the US Constitution into itself 10x, bmdiff could ideally get a 10x
compression rate.

Not all compression algorithms are identical.





