How to store a list of values?

2012-03-26 Thread Ben McCann
I have a profile column family and want to store a list of skills in each
profile.  In BigTable I could store a Protocol Buffer
(http://code.google.com/apis/protocolbuffers/docs/overview.html) with
a repeated field, but I'm not sure how this is typically accomplished
in Cassandra.  One option would be to store a serialized
Thrift (http://thrift.apache.org/) or protobuf, but I'd prefer not to do
this, as I believe Cassandra doesn't
have knowledge of these formats, and so the data in the datastore would
not be human readable in CQL queries from the command line.  The other
solution I thought of would be to use a super column and put a random UUID
as the key for each skill:

skills: {
  '4b27c2b3ac48e8df': 'java',
  '84bf94ea7bc92018': 'c++',
  '9103b9a93ce9d18': 'cobol'
}

Is this a good way of handling lists in Cassandra?  I imagine there's some
idiom I'm not aware of.  I'm using the
Astyanax (https://github.com/Netflix/astyanax/wiki) client library,
which only supports composite columns instead of super
columns, and so the solution I proposed above would seem quite awkward in
that case.  Though I'm still having some trouble understanding composite
columns, as they don't seem to be completely documented yet.  Would this
solution work with composite columns?

Thanks,
Ben


Re: Error in FAQ?

2012-03-26 Thread R. Verlangen
If you want to modify a column family, just open the command line interface
(cassandra-cli) and connect to a node (probably: connect localhost/9160;).

To create your first keyspace, type: create keyspace
MyKeyspace;

To work in an existing keyspace, type: use MyKeyspace;

If you need more information, just type: help;

Good luck!

2012/3/26 Ben McCann b...@benmccann.com

 Hmmm, I don't see anything regarding column families in cassandra.yaml.
  It seems like the answer for that question in the FAQ is very outdated.


 On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville
 serge.fonvi...@gmail.com wrote:

 Hi,

 2012/3/26 Ben McCann b...@benmccann.com:
  There's a line that says "Make necessary changes to your storage-conf.xml."
  I can't find this file.  Does it still exist?  If so, where should I look?
  I installed the packaged version of Cassandra available in the DataStax
  community edition.

 From  http://wiki.apache.org/cassandra/StorageConfiguration
 Prior to the 0.7 release, Cassandra storage configuration is described
 by the conf/storage-conf.xml file. As of 0.7, it is described by the
 conf/cassandra.yaml file.

 (Found after googling "cassandra storage-conf.xml".)

 Kind regards/met vriendelijke groet,

 Serge Fonville

 http://www.sergefonville.nl









Re: unbalanced ring

2012-03-26 Thread Radim Kolar



How can I fix this?

Add more data. 1.5M is not enough to get reliable reports.



problem in create column family

2012-03-26 Thread puneet loya
It is giving errors like: Unable to find abstract-type class
'org.apache.cassandra.db.marshal.utf8'

and java.lang.RuntimeException:
org.apache.cassandra.db.marshal.MarshalException: cannot parse
'catalogueId' as hex bytes

where catalogueId is a column that has utf8 as its data type. They may
just be syntax errors.

Please let me know if you can help me out with this.


Re: How to store a list of values?

2012-03-26 Thread samal
I would take a simple approach: create another CF, UserSkill, with the same
row key as the profile CF.
In the UserSkill CF, add each skill as a column name with a null (empty)
value. Columns can be added or removed.

UserProfile={
  'ben'={
    blah:blah
    blah:blah
    blah:blah
  }
}

UserSkill={
  'ben'={
    'java':''
    'cassandra':''
    .
    .
    .
    'linux':''
    'skill':'infinity'
  }
}
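To make samal's layout concrete, here is a minimal sketch in plain Python (not a Cassandra client) of one row per user with each skill stored as a column name and an empty value. The class WideRowCF and its methods are hypothetical, and the sorted read-back assumes a UTF8Type comparator.

```python
from typing import Dict, List


class WideRowCF:
    """Hypothetical in-memory stand-in for a column family:
    row key -> {column name -> column value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, bytes]] = {}

    def insert(self, row_key: str, column: str, value: bytes = b"") -> None:
        # Re-inserting an existing column name overwrites it, so the
        # skills behave like a set, not a list with duplicates.
        self.rows.setdefault(row_key, {})[column] = value

    def remove(self, row_key: str, column: str) -> None:
        self.rows.get(row_key, {}).pop(column, None)

    def get_row(self, row_key: str) -> List[str]:
        # Assumes a UTF8Type comparator, so names come back sorted.
        return sorted(self.rows.get(row_key, {}))


user_skill = WideRowCF()
for skill in ["java", "cassandra", "linux", "java"]:
    user_skill.insert("ben", skill)
user_skill.remove("ben", "linux")
print(user_skill.get_row("ben"))  # ['cassandra', 'java']
```

Because the skill lives in the column name, adding a skill is a single insert and removing one is a single delete; no read-modify-write of a whole list is needed.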






Re: problem in create column family

2012-03-26 Thread R. Verlangen
You should use the full type names, e.g.

create column family MyColumnFamily with comparator=UTF8Type;
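A rough illustration of the MarshalException above (a hypothetical helper, not Cassandra's actual parsing code): with the default BytesType comparator the CLI expects column names as hex, while UTF8Type takes them as text. The short name 'utf8' fails because it is not a full type name such as UTF8Type.

```python
def parse_column_name(name: str, comparator: str) -> bytes:
    """Hypothetical mimic of how a CLI column name is turned into bytes."""
    if comparator == "BytesType":
        # BytesType names are typed as hex; 'catalogueId' contains
        # non-hex letters such as 't', so this raises ValueError.
        return bytes.fromhex(name)
    if comparator == "UTF8Type":
        return name.encode("utf-8")
    # Short names such as 'utf8' are not full type names.
    raise ValueError(f"Unable to find abstract-type class for {comparator!r}")


for comparator in ("BytesType", "UTF8Type", "utf8"):
    try:
        print(comparator, parse_column_name("catalogueId", comparator))
    except ValueError as exc:
        print(comparator, "->", exc)
```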





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: How to store a list of values?

2012-03-26 Thread Ben McCann
Thanks for the reply, Samal.  I did not realize that you could store a
column with a null value.  Do you know if this solution would work with
composite columns?  It seems super columns are being phased out in favor of
composites, but I do not understand composites very well yet.  I'm trying
to figure out if there's any way to accomplish what you've suggested using
Astyanax (https://github.com/Netflix/astyanax).

Thanks for the help,
Ben








Re: How to store a list of values?

2012-03-26 Thread samal
Plus, it is fully compatible with CQL:
SELECT * FROM UserSkill WHERE KEY='ben';






Re: How to store a list of values?

2012-03-26 Thread samal
On Mon, Mar 26, 2012 at 9:20 PM, Ben McCann b...@benmccann.com wrote:

 Thanks for the reply Samal.



  I did not realize that you could store a column with null value.

Values can be null or any value, e.g.:
[default@node] set hus['test']['wowq']='\{de\'.de\;\}\+\^anything';
Value inserted.
Elapsed time: 4 msec(s).
[default@node]
[default@node]
[default@node] get hus['test'];
= (column=wow, value={de.de;}, timestamp=133222503000)
= (column=wowq, value={de'.de;}+^anything, timestamp=133267425000)
Returned 2 results.
Elapsed time: 65 msec(s).
[default@node]


  Do you know if this solution would work with composite columns?  It seems
 super columns are being phased out in favor of composites, but I do not
 understand composites very well yet.

Personally, I phased out super columns a year ago. I haven't dug much into
composite columns (CC), but I know that both the key and the column name can be composite.

'ben'+'task1'={
   utf8+ascii:''
}
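On Ben's composite question: a composite column name is just several components packed into one byte string. The sketch below assumes the CompositeType layout of, per component, a 2-byte big-endian length, the component bytes, and one end-of-component byte (0 here). Clients such as Astyanax should do this packing for you, so encode_composite and decode_composite are purely illustrative names.

```python
import struct
from typing import Iterable, List


def encode_composite(components: Iterable[bytes]) -> bytes:
    """Pack each component as <2-byte big-endian length><bytes><end-of-component byte>."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))
        out += comp
        out += b"\x00"  # end-of-component byte (0 for an exact name)
    return bytes(out)


def decode_composite(data: bytes) -> List[bytes]:
    parts: List[bytes] = []
    i = 0
    while i < len(data):
        (length,) = struct.unpack_from(">H", data, i)
        i += 2
        parts.append(data[i:i + length])
        i += length + 1  # skip the end-of-component byte
    return parts


# Skills embedded in the user's own row as ('skills', <skill>) composite names.
names = [encode_composite([b"skills", s.encode()]) for s in ("java", "c++", "cobol")]
print(sorted(decode_composite(n)[1] for n in names))  # [b'c++', b'cobol', b'java']
```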


  I'm trying to figure out if there's any way to accomplish what you've
 suggested using Astyanax https://github.com/Netflix/astyanax.

This is the simplest approach and should work with every available client,
since it is an independent CF; the cost is that two calls are required.









Re: Counters and replication factor

2012-03-26 Thread aaron morton
Can you describe the situations where counter updates are lost or go backwards?

Do you ever get TimedOutExceptions when performing counter updates?

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/03/2012, at 6:34 PM, Radim Kolar wrote:

 
 I still have wrong results (I simulated an event 5 times and it was counted
 3 times by some counters, and 4 or 5 times by others).
 I also have wrong results with counters in 1.0.8: many times updates to a
 counter column are just lost, and sometimes counters go backwards even
 if our app uses only increments. Don't rely on counters for anything
 important; they are still beta quality.
 
 We are now using ZooKeeper for important counters and Cassandra for junk
 like statistics.
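Aaron's TimedOutException question matters because a counter increment is not idempotent: after a timeout the client cannot tell whether the increment was applied. The deterministic simulation below (plain Python, hypothetical names, no Cassandra involved) scripts the fate of each attempt to show how the same five events can be counted as 4 without retries and 6 with blind retries.

```python
from typing import Iterator, List


class SimulatedCounter:
    """Hypothetical counter column whose per-attempt outcome is scripted."""

    def __init__(self, fates: List[str]) -> None:
        self.value = 0
        self._fates: Iterator[str] = iter(fates)

    def add(self, delta: int) -> None:
        fate = next(self._fates, "ok")  # unscripted attempts succeed
        if fate in ("ok", "applied_then_timeout"):
            self.value += delta
        if fate != "ok":
            # The client sees the same timeout whether or not the write landed.
            raise TimeoutError("TimedOutException")


def count_events(events: int, fates: List[str], retry: bool) -> int:
    counter = SimulatedCounter(fates)
    for _ in range(events):
        try:
            counter.add(1)
        except TimeoutError:
            if retry:
                try:
                    counter.add(1)  # blind retry, one attempt only
                except TimeoutError:
                    pass
    return counter.value


fates = ["ok", "applied_then_timeout", "ok", "lost_then_timeout", "ok"]
print(count_events(5, fates, retry=False))  # 4: the lost increment is missing
print(count_events(5, fates, retry=True))   # 6: the applied one is counted twice
```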



Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-26 Thread aaron morton
 without them overlapping/disturbing each other (assuming that keys lie in 
 above domains) ?
Not sure what you mean by overlapping. 

42 as an int and 42 as a long are the same key.

Cheers


On 25/03/2012, at 9:47 PM, Ertio Lew wrote:

 I have been writing rows to a CF, all with integer (4 byte) keys. So my CF 
 contains rows with keys in the entire range from Integer.MIN_VALUE to 
 Integer.MAX_VALUE. 
 
 Now I want to store Long type keys as well in this CF without disturbing 
 the integer keys. The range of Long type keys would exclude the 
 integer range, i.e. (-2^63 to -2^31) and (2^31 to 2^63). 
 
 Would it be safe to mix the integer and long keys in a single CF without them 
 overlapping/disturbing each other (assuming that keys lie in the above domains)?
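Aaron's later answer in this thread is that collisions happen only if you reuse a row key. Assuming, as Aaron says, that the same number maps to the same key whether written as an int or a long, a simple guard is to check that every new Long key lies outside the int32 range before writing it. The helper names are hypothetical; note that the top of the long range is 2^63 - 1, not 2^63.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def is_long_only_key(key: int) -> bool:
    """True if `key` fits in a signed 64-bit long but NOT in a signed 32-bit int."""
    fits_long = INT64_MIN <= key <= INT64_MAX
    fits_int = INT32_MIN <= key <= INT32_MAX
    return fits_long and not fits_int


def check_new_long_key(key: int) -> int:
    # Hypothetical guard to call before writing a Long-keyed row.
    if not is_long_only_key(key):
        raise ValueError(f"{key} overlaps the int32 key space or exceeds a long")
    return key


print(is_long_only_key(2**31), is_long_only_key(42))  # True False
```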



Re: How to store a list of values?

2012-03-26 Thread Sasha Dolgy
Save the skills in a single column in JSON format.  Job done.
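A minimal sketch of Sasha's suggestion using Python's standard json module. The function names are hypothetical; the stored string is what you would write as the value of a single column (for example, a hypothetical 'skills' column in the user row), and it stays readable from the CLI.

```python
import json
from typing import List


def skills_to_column(skills: List[str]) -> str:
    # De-duplicate and sort so the stored JSON text is stable.
    return json.dumps(sorted(set(skills)))


def column_to_skills(value: str) -> List[str]:
    return json.loads(value)


stored = skills_to_column(["java", "c++", "cobol", "java"])
print(stored)                    # ["c++", "cobol", "java"]
print(column_to_skills(stored))  # ['c++', 'cobol', 'java']
```

The trade-off versus one column per skill: any change rewrites the whole list, so two concurrent updates to the same user's skills can overwrite each other (last write wins).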
On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:

 True.  But I don't need the skills to be searchable, so I'd rather embed
 them in the user than add another top-level CF.  I was thinking of doing
 something along the lines of adding a skills super column to the User table:

 skills: {
   'java': null,
   'c++': null,
   'cobol': null
 }

 However, I'm still not sure yet how to accomplish this with Astyanax.
  I've only figured out how to make composite columns with predefined column
 names with it and not dynamic column names like this.



 On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:

 In this case you only need the columns for values. You don't need the
 column values to hold multiple columns (the super-column principle), so a
 normal CF would work.







Re: smart client proxy for cassandra

2012-03-26 Thread aaron morton
I've heard of people using HAProxy (http://haproxy.1wt.eu/) with PHP as a 
connection pool.

Note that failure in Cassandra can only be detected as part of a request. 
So HAProxy cannot tell whether a node is actually functional, only that it 
allows a socket to be opened. 

There is some work being done on creating a proxy server with Hector: 
https://github.com/rantav/hector/tree/lcp-first-cut. I'm not sure of its progress.

Cheers


On 25/03/2012, at 10:46 PM, Piavlo wrote:

 Hi,
 
 Is there any smart client proxy implementation for Cassandra?
 I'd like to proxy short-lived phpcassa connections through a smart proxy that 
 will manage a pool of connections and be aware of the current cluster state, 
 bad/slow nodes, etc.
 The Java libraries https://github.com/s7/scale7-pelops and 
 https://github.com/Netflix/astyanax look like good choices.
 But since I'm not a Java programmer, I'd first like to check whether someone has 
 already done this, or whether someone could give guidelines on how to extend one of 
 the above Java clients to also proxy Thrift connections.
 
 Thanks
 Alex
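Since, as Aaron notes, a node's health is only really known from the outcome of a request, a client-side pool can mark a host down when a request to it fails rather than when a socket connect fails. A minimal sketch under that assumption; RequestAwarePool, execute and retry_after are hypothetical names, and `request` stands in for whatever Thrift call the client makes.

```python
import time
from typing import Callable, Dict, List, Optional


class RequestAwarePool:
    """Hypothetical round-robin pool that marks a host down when a real
    request to it fails, rather than when a TCP connect fails."""

    def __init__(self, hosts: List[str], retry_after: float = 30.0) -> None:
        self.hosts = hosts
        self.retry_after = retry_after  # seconds a failed host is skipped
        self.down_until: Dict[str, float] = {}
        self._next = 0

    def _live_hosts(self, now: float) -> List[str]:
        return [h for h in self.hosts if self.down_until.get(h, 0.0) <= now]

    def execute(self, request: Callable[[str], object]) -> object:
        now = time.monotonic()
        candidates = self._live_hosts(now)
        if not candidates:
            raise RuntimeError("no live hosts")
        last_error: Optional[Exception] = None
        for _ in range(len(candidates)):
            host = candidates[self._next % len(candidates)]
            self._next += 1
            try:
                return request(host)
            except Exception as exc:
                # The node accepted the connection but failed the request:
                # only now do we learn it is unhealthy.
                self.down_until[host] = now + self.retry_after
                last_error = exc
        raise RuntimeError("all hosts failed") from last_error
```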



Re: Regarding nodetool tpstats

2012-03-26 Thread aaron morton
Cheers :)


On 26/03/2012, at 1:55 PM, Watanabe Maki wrote:

 - InternalResponseStage
 
 Handles response to non client initiated messages, including bootstrap, 
 schema check, etc.
 
 maki
 
 
 On 2012/03/26, at 2:18, aaron morton aa...@thelastpickle.com wrote:
 
 Work is broken up into a series of stages. 
 
 - ReadStage - performing a local read. 
 - RequestResponseStage - handling responses from other nodes. 
 - MutationStage - performing a local write.
 - ReplicateOnWriteStage - for counter writes, replicates after a local write
 - GossipStage - handles gossip rounds (every second)
 - AntiEntropyStage - repairs consistency (nodetool repair)
 - MigrationStage - schema changes
 - MemtablePostFlusher - flushes commit log (and other things) after flushing 
 memtable.  
 - StreamStage - streams data between nodes during repair
 - FlushWriter - flushes memtable to disk
 - MiscStage - does misc stuff
 - InternalResponseStage - not sure.
 - HintedHandoff - sends missed mutations to other nodes. 
 
 Cheers
 
 
 On 23/03/2012, at 9:49 PM, Rishabh Agrawal wrote:
 
 Hello
 I am new to Cassandra, and when I run tpstats on my node (Cassandra 1.0.7) I 
 get the following output:
  
 Pool Name               Active  Pending  Completed  Blocked  All time blocked
 ReadStage                    0        0         12        0                 0
 RequestResponseStage         0        0         20        0                 0
 MutationStage                0        0         14        0                 0
 ReadRepairStage              0        0         14        0                 0
 ReplicateOnWriteStage        0        0          0        0                 0
 GossipStage                  0        0     273665        0                 0
 AntiEntropyStage             0        0          0        0                 0
 MigrationStage               0        0        119        0                 0
 MemtablePostFlusher          0        0         13        0                 0
 StreamStage                  0        0          0        0                 0
 FlushWriter                  0        0         13        0                 0
 MiscStage                    0        0          0        0                 0
 InternalResponseStage        0        0        318        0                 0
 HintedHandoff                0        0          2        0                 0
  
 Can anyone help with what each pool name stands for?
  
 Thanks and Regards
 Rishabh Agrawal
 
 
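If you want to watch these pools over time, a small parser for the tabular part of `nodetool tpstats` can help. This sketch assumes the 1.0.7 layout shown above (a pool name followed by five integer columns) and simply skips any line that does not match; parse_tpstats is a hypothetical helper.

```python
from typing import Dict

COLUMNS = ("Active", "Pending", "Completed", "Blocked", "All time blocked")


def parse_tpstats(output: str) -> Dict[str, Dict[str, int]]:
    """Map each pool name to its five counters; other lines are ignored."""
    pools: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        # A pool line is a name followed by exactly five integers.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = dict(zip(COLUMNS, (int(f) for f in fields[1:])))
    return pools


sample = """\
Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage                    0        0         12        0                 0
GossipStage                  0        0     273665        0                 0
"""
stats = parse_tpstats(sample)
print(stats["GossipStage"]["Completed"])  # 273665
backlog = {name: s["Pending"] for name, s in stats.items() if s["Pending"] > 0}
print(backlog)  # {}
```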
 



Re: Sample Data

2012-03-26 Thread Tyler Hobbs
The 'stress' tool that you can find in a source checkout of cassandra
sounds like what you're looking for.  It's designed to write data to (or
read data from) a cluster as fast as possible, and has plenty of options
for tweaking the type of data it inserts.

You can read more about it here:
http://www.datastax.com/docs/1.0/references/stress_java

On Mon, Mar 26, 2012 at 6:59 AM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:

 Thanks for the prompt response. I am looking for a solution that can
 generate scripts of statements which I can run, or tweak and run, in a Linux
 environment.

 -Original Message-
 From: Benoit Perroud [mailto:ben...@noisette.ch]
 Sent: Monday, March 26, 2012 4:14 PM
 To: user@cassandra.apache.org
 Subject: Re: Sample Data

 Cassandra unit (https://github.com/jsevellec/cassandra-unit) could help
 you.


 On 26 March 2012 12:34, Rishabh Agrawal rishabh.agra...@impetus.co.in
 wrote:
  Hello,
 
 
 
  I wish to test certain things in Cassandra so can someone help me with
  sample database or sample database data generator which can help me
  flood Cassandra nodes with large amount of data.
 
 
 
 
 
 
 
  Thanks and Regards
 
  Rishabh Agrawal
 
 
  
 




 





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Sample Data

2012-03-26 Thread Tom Melendez
  I wish to test certain things in Cassandra so can someone help me with
  sample database or sample database data generator which can help me
  flood Cassandra nodes with large amount of data.

I would recommend YCSB:
https://github.com/brianfrankcooper/YCSB/wiki/

Thanks,

Tom
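Rishabh asked for a way to generate scripts of statements he can run or tweak. A minimal sketch that writes cassandra-cli `set` statements in the same form as samal's CLI example earlier in this digest. The column family name, key format and column names are hypothetical, and it assumes the column family uses UTF8Type for keys, column names and values, so the CLI does not try to parse them as hex.

```python
import random
import string


def generate_cli_script(column_family: str, rows: int, cols_per_row: int,
                        seed: int = 42) -> str:
    """Return cassandra-cli `set` statements filled with random lowercase values."""
    rng = random.Random(seed)  # fixed seed so the script is reproducible
    lines = []
    for r in range(rows):
        key = f"user{r:06d}"
        for c in range(cols_per_row):
            value = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
            lines.append(f"set {column_family}['{key}']['col{c}'] = '{value}';")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_cli_script("Users", rows=3, cols_per_row=2), end="")
```

Save the output to a file, tweak it as needed and feed it to cassandra-cli; raise rows and cols_per_row to flood a node. For sustained load, the stress tool and YCSB mentioned above are better suited.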


Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-26 Thread Ertio Lew
I need to use the range beyond the int32 type range, so I am using
Long to write those keys. I am afraid this might lead to collisions with
the previously stored integer keys in the same CF, even if I leave out the
int32 type range.






Re: Estimation of memtable size are wrong

2012-03-26 Thread aaron morton
 Yes, I noticed that. It's not very often, about once per week.
The assumption would be that the workload stabilises over time.

 INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) 
 CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted 
 was 16.354632747474547).  calculation took 611ms for 8287 columns
Duh, forgot about the 25% fudge factor. 64 * 1.25 = 80. 

It's working as intended. The serialised bytes is the total throughput, which 
includes overwrites. 
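Checking Aaron's arithmetic against the figures Radim quoted (a worked example only, not Cassandra's sizing code): the capped live ratio of 64 times the 25% fudge factor gives 80, which is exactly the ratio of the two sizes.

```python
LIVE_RATIO_CAP = 64.0  # liveRatio logged for ipbans (the cap)
FUDGE_FACTOR = 1.25    # the 25% fudge factor Aaron mentions

serialized_bytes = 1_317_041   # ~1.26 MB, as quoted by Radim
estimated_bytes = 105_363_280  # ~100.48 MB, as quoted by Radim

print(LIVE_RATIO_CAP * FUDGE_FACTOR)       # 80.0
print(estimated_bytes / serialized_bytes)  # 80.0
print(serialized_bytes * LIVE_RATIO_CAP * FUDGE_FACTOR == estimated_bytes)  # True
print(round(estimated_bytes / 2**20, 2), round(serialized_bytes / 2**20, 2))  # 100.48 1.26
```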


On 26/03/2012, at 9:11 PM, Radim Kolar wrote:

 Dne 26.3.2012 0:36, aaron morton napsal(a):
 1. Is it not possible to run them more often? There should be some limit,
 e.g. run the live/serialized calculation at least once per hour. It takes
 just a few seconds.
 The live ratio is updated every time the operation count (since startup) for 
 the CF doubles.
 Yes, I noticed that. It's not very often, about once per week.
 
 The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80.
 The live ratio is capped at 64.
 Can you see any log messages about the live ratio for this CF ?
 
 Last report from problematic CF:
 INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) 
 CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted 
 was 16.354632747474547).  calculation took 611ms for 8287 columns



Re: One or Two clusters?

2012-03-26 Thread aaron morton
Use one cluster. Use lots-o-machines.

The read and write paths do not directly interfere with each other like they 
do in an RDBMS. Compaction created by writes can suck up disk IO, but it is 
throttled, so in practice it is not such a big problem. Excessive GC created by 
reads or compaction may slow down the server, but you will want to avoid that 
anyway.

The one caveat is that it depends on how you are transforming the data. If you 
are using Hadoop, consider creating a single cluster with multiple DCs (like 
DataStax do): one for OLTP and one for OLAP. Do the Hadoop work in the OLAP DC 
and have the online app read and write to the OLTP one. 

Cheers


On 27/03/2012, at 3:22 AM, Oleg Proudnikov wrote:

 Hi,
 
 Could someone please help me understand the benefits of having a single large 
 cluster vs. having two smaller clusters separated by the pattern of use? One 
 MOSTLY WRITE cluster could incrementally accumulate large amounts of data 
 throughout the day. The daily increment would be processed, summarized and 
 stored into the second READ cluster at night. Users would only need to 
 interact with the READ portion of the overall system, mostly during the day. 
 Writes would be spread throughout the day and would be a function of user 
 activity, with some bulk load activity from time to time. The WRITE portion of 
 the database would be an order of magnitude larger than the READ portion. 
 The READ portion would have an order of magnitude higher traffic except during 
 periodic bulk loads.
 
 On one hand, if I were to have a single cluster, I would have more resources 
 for the users and potentially better scalability. A single cluster may need 
 fewer servers overall, provided write activity does not affect reads... On 
 the other hand, write activity and the associated memory consumption and GC, as 
 well as maintenance routines, may affect the READ system. The system will be 
 hosted on EC2.
 
 I would appreciate any thoughts.
 
 Regards,
 Oleg



Re: Performance overhead when using start and end columns

2012-03-26 Thread aaron morton
See the tests in the article. 

The code I used for profiling is also available.
 
Cheers


On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:

 Thanks, but if I do have to specify start and end columns, roughly how much 
 overhead would that translate to, given that reading metadata should be 
 constant overall?
 
 On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 Some information on query plans
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
 
 TL;DR: Select columns with no start, in the natural comparator order. 
  
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:
 
 I have rows with around 2K-50K columns, but when I do a query I only need to
 fetch a few columns between a start and an end column. I was wondering what
 performance overhead a slice query with start and end columns causes.
 
 Looking at the code, it looks like when you give a start and end column it goes
 into the IndexSliceReader logic, but it's hard to tell how much overhead one
 would see on average. Or is it even worth worrying about?
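To make the start/finish question concrete, here is a toy in-memory model of a slice over one row's column names, as a minimal Python sketch. It assumes the names are already held sorted in comparator order and mimics the Thrift SliceRange convention that an empty start or finish means "unbounded". `slice_columns` is a hypothetical helper, and the binary search only stands in for seeking via the row's column index; it does not model the on-disk costs that Aaron's article measures.

```python
from bisect import bisect_left, bisect_right


def slice_columns(names, start="", finish="", count=100, reversed_=False):
    """Toy slice over column names kept sorted in comparator order.

    An empty start/finish means "unbounded". For a reversed slice, start is
    the high end of the range and finish the low end.
    """
    if reversed_:
        hi = len(names) if start == "" else bisect_right(names, start)
        lo = 0 if finish == "" else bisect_left(names, finish)
        # Walk backwards from hi, returning at most `count` names.
        return names[max(lo, hi - count):hi][::-1]
    # With no start the read begins at position 0; with a start we first
    # binary-search for it (the stand-in for the column index lookup).
    lo = 0 if start == "" else bisect_left(names, start)
    hi = len(names) if finish == "" else bisect_right(names, finish)
    return names[lo:min(hi, lo + count)]


row = [f"col{i:05d}" for i in range(50_000)]       # one wide row, 50K columns
print(slice_columns(row, count=3))                  # ['col00000', 'col00001', 'col00002']
print(slice_columns(row, "col01000", "col01002"))   # ['col01000', 'col01001', 'col01002']
print(slice_columns(row, count=2, reversed_=True))  # ['col49999', 'col49998']
```

The only extra work a bounded slice does in this model is locating the start; whether that matters in practice is what the profiling in the article answers.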
 
 



Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?

2012-03-26 Thread aaron morton
Only if you reuse a row key.

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/03/2012, at 6:38 AM, Ertio Lew wrote:

 I need to use values beyond the integer32 range, so I am using Long
 to write those keys. I am afraid this might lead to collisions with the
 previously stored integer keys in the same CF, even if I leave out the int32
 range.
 
 On Mon, Mar 26, 2012 at 10:51 PM, aaron morton aa...@thelastpickle.com 
 wrote:
 without them overlapping/disturbing each other (assuming that keys lie in
 the above domains)?
 Not sure what you mean by overlapping.
 
 42 as an int and 42 as a long are the same key.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/03/2012, at 9:47 PM, Ertio Lew wrote:
 
 I have been writing rows to a CF, all with integer (4-byte) keys, so my CF
 contains rows with keys in the entire range from Integer.MIN_VALUE to
 Integer.MAX_VALUE.
 
 Now I want to store Long type keys in this CF as well, without disturbing
 the integer keys. The range of the Long type keys would exclude the
 integer range, i.e. (-2^63 to -2^31) and (2^31 to 2^63).
 
 Would it be safe to mix the integer and long keys in a single CF without them
 overlapping/disturbing each other (assuming that the keys lie in the above domains)?
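Cassandra identifies a row by its key bytes, so whether an int key and a long key can collide depends on how the client serializes them. A minimal Python sketch, assuming fixed-width big-endian encodings for Java int and long (the layout `ByteBuffer.putInt`/`putLong` produce) and, for comparison, the variable-length two's-complement layout of Java's `BigInteger.toByteArray()`, which to our understanding is what a varint key type such as IntegerType stores. The helper names are hypothetical.

```python
import struct


def int32_key(n: int) -> bytes:
    """4-byte big-endian two's complement, the layout of a Java int."""
    return struct.pack(">i", n)


def int64_key(n: int) -> bytes:
    """8-byte big-endian two's complement, the layout of a Java long."""
    return struct.pack(">q", n)


def varint_key(n: int) -> bytes:
    """Minimal-length two's complement, the layout of BigInteger.toByteArray()."""
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return n.to_bytes(bits // 8 + 1, "big", signed=True)


# Fixed width: 42 written as an int and as a long are different byte strings.
print(int32_key(42).hex(), int64_key(42).hex())   # 0000002a 000000000000002a
# Variable width: the bytes depend only on the value, not on the Java type used.
print(varint_key(42).hex())                        # 2a
print(varint_key(2**31).hex())                     # 0080000000
```

With fixed-width encodings, 4-byte int keys and 8-byte long keys can never be byte-for-byte equal. With a variable-length encoding, 42 is the same key whichever Java type wrote it, but distinct values always encode to distinct bytes, so values outside the int32 range still cannot collide with int32 keys.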
 
 



Re: Performance overhead when using start and end columns

2012-03-26 Thread Data Craftsman
Hi Aaron,

Thanks for the benchmark. The matrix is valuable.

Thanks,

Charlie (@mujiang) 一个 木匠 ("a carpenter")
===
Data Architect Developer
http://mujiang.blogspot.com


On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote:
 See the tests in the article.

 The code I used for profiling is also available.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:

 Thanks, but if I do have to specify start and end columns, roughly how much
 overhead would that translate to, since reading the metadata should be
 constant overall?

 On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com
 wrote:

 Some information on query plans
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Tl;Dr; Select columns with no start, in the natural Comparator order.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:

 I have rows with around 2K-50K columns, but when I do a query I only need
 to fetch a few columns between a start and an end column. I was wondering
 what performance overhead a slice query with start and end columns causes.

 Looking at the code, it looks like when you give a start and end column it
 goes into the IndexSliceReader logic, but it's hard to tell how much overhead
 one would see on average. Or is it even worth worrying about?






Re: How to store a list of values?

2012-03-26 Thread samal
 Save the skills in a single column in json format.  Job done.

That works if the set of skills is fixed; any add or delete has to be
handled in the app: read the column first, re-format the JSON, then update
the column (2 Thrift calls).

 skill~Java: null,
 skill~Cassandra: null
This is also a good option, but any schema change will break it.
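The two single-row options compared above can be sketched against a toy row, modelled as a Python dict of column name to value. This is a minimal sketch, not Astyanax or Thrift code: the helper names are hypothetical, a real client would send the writes and deletes to Cassandra, and the prefix slice assumes a comparator such as UTF8Type that orders these ASCII names the way Python orders strings.

```python
import json
from bisect import bisect_left


def add_skill_json(row, skill):
    """Option 1: every skill in one JSON column, so each change is read-modify-write."""
    skills = json.loads(row.get("skills", "[]"))   # 1st call: read the column
    if skill not in skills:
        skills.append(skill)
    row["skills"] = json.dumps(skills)             # 2nd call: write the whole list back


def add_skill_column(row, skill, prefix="skill~"):
    """Option 2: one column per skill; the name carries the skill, the value is empty."""
    row[prefix + skill] = ""                       # a single write, no read first


def remove_skill_column(row, skill, prefix="skill~"):
    """Removing a skill is a single column delete."""
    row.pop(prefix + skill, None)


def skills_from_columns(row, prefix="skill~"):
    """Prefix slice: names from `prefix` up to (not including) the next prefix."""
    names = sorted(row)                            # the comparator keeps names sorted
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)   # "skill~" -> "skill\x7f"
    lo, hi = bisect_left(names, prefix), bisect_left(names, end)
    return [name[len(prefix):] for name in names[lo:hi]]


profile = {"name": "Ben"}                          # hypothetical profile row
for skill in ("java", "c++", "cassandra"):
    add_skill_column(profile, skill)
print(skills_from_columns(profile))                # ['c++', 'cassandra', 'java']
```

The column-per-skill version never has to read before it writes, and other profile columns such as `name` fall outside the prefix slice, which is why the `skill~` prefix has to stay reserved for the list.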


On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:

 True.  But I don't need the skills to be searchable, so I'd rather embed
 them in the user than add another top-level CF.  I was thinking of doing
 something along the lines of adding a skills super column to the User table:

 skills: {
   'java': null,
   'c++': null,
   'cobol': null
 }

 However, I'm still not sure how to accomplish this with Astyanax.
 I've only figured out how to make composite columns with predefined column
 names with it, not dynamic column names like these.



 On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:

 In this case you only need the columns for values. You don't need the
 column-values to hold multiple columns (the super-column principle). So a
 normal CF would work.


 2012/3/26 Ben McCann b...@benmccann.com

 Thanks for the reply, Samal. I did not realize that you could store a
 column with a null value. Do you know if this solution would work with
 composite columns?  It seems super columns are being phased out in favor of
 composites, but I do not understand composites very well yet.  I'm trying
 to figure out if there's any way to accomplish what you've suggested using
 Astyanax https://github.com/Netflix/astyanax.

 Thanks for the help,
 Ben


 On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote:

 plus it is fully compatible with CQL.
 SELECT * FROM UserSkill WHERE KEY='ben';


 On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take a simple approach: create another CF, UserSkill, with
 the same row key as the profile_cf key.
 In the UserSkill CF, add each skill as a column name with a null value.
 Columns can then be added or removed.

 UserProfile={
   'ben'={
     blah:blah
     blah:blah
     blah:blah
   }
 }

 UserSkill={
   'ben'={
     'java':''
     'cassandra':''
     .
     .
     .
     'linux':''
     'skill':'infinity'
   }
 }


 On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills in
 each profile. In BigTable I could store a Protocol Buffer
 (http://code.google.com/apis/protocolbuffers/docs/overview.html) with
 a repeated field, but I'm not sure how this is typically accomplished
 in Cassandra. One option would be to store a serialized Thrift
 (http://thrift.apache.org/) or protobuf object, but I'd prefer not to do
 this, as I believe Cassandra doesn't have knowledge of these formats, so
 the data in the datastore would not be human-readable in CQL queries from
 the command line. The other solution I thought of would be to use a super
 column and put a random UUID as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra? I imagine
 there's some idiom I'm not aware of. I'm using the Astyanax
 (https://github.com/Netflix/astyanax/wiki) client library, which
 only supports composite columns instead of super columns, so the
 solution I proposed above would seem quite awkward in that case.
 I'm still having some trouble understanding composite columns, though,
 as they don't seem to be completely documented yet. Would this
 solution work with composite columns?

 Thanks,
 Ben







 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





Server side scripting support in Cassandra - go Python !

2012-03-26 Thread Data Craftsman
Howdy,

Some Polyglot Persistence (NoSQL) products have started to support server-side
scripting, similar to RDBMS stored procedures,
e.g. Redis Lua scripting.

When Cassandra gets a server-side scripting feature, I hope it will be Python.

FYI,

http://antirez.com/post/250

http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store

server side scripting support is an extremely powerful tool. Having
processing close to data (i.e. data locality) is a well known
advantage, ..., it can open the doors to completely new features.

Thanks,

Charlie (@mujiang) 一个 木匠 ("a carpenter")
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Performance overhead when using start and end columns

2012-03-26 Thread Mohit Anchlia
Thanks!

On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote:

 See the tests in the article.

 The code I used for profiling is also available.

 Cheers

-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:

 Thanks, but if I do have to specify start and end columns, roughly how much
 overhead would that translate to, since reading the metadata should be
 constant overall?

 On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com wrote:

 Some information on query plans
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Tl;Dr; Select columns with no start, in the natural Comparator order.

 Cheers


-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:

  I have rows with around 2K-50K columns, but when I do a query I only
 need to fetch a few columns between a start and an end column. I was
 wondering what performance overhead a slice query with start and end
 columns causes.

 Looking at the code, it looks like when you give a start and end column it
 goes into the IndexSliceReader logic, but it's hard to tell how much overhead
 one would see on average. Or is it even worth worrying about?







Re: CQL Reversed and Comparator reversed=true

2012-03-26 Thread Praveen Baratam
Thank you Aaron!

On Mon, Mar 26, 2012 at 10:44 PM, aaron morton aa...@thelastpickle.com wrote:

 create column family Comments
 with comparator = 'CompositeType(UTF8Type(reversed=True), UTF8Type)'
 and key_validation_class = 'UTF8Type'
 and default_validation_class = 'UTF8Type';

 Looks ok.

 SELECT FIRST 100 REVERSED 'z'..'0' from Comments where key = 'xyz';

 try
 SELECT FIRST 100 REVERSED * from Comments where key = 'xyz';

 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/03/2012, at 9:41 AM, Praveen Baratam wrote:

 Hello,

 I am a bit confused about how to store and retrieve columns in Reversed
 order.

 Currently I store comments for every blog post in a wide row per post.

 I want to store and retrieve comments for each blog post in
 reversed/descending order for efficiency, as we display comments in
 descending order by time. Each comment gets a time-based sortable id, which
 is stored as part of the first component of the composite type.

 Below is the create statement for the column family that stores comments
 for posts.

 create column family Comments
 with comparator = 'CompositeType(UTF8Type(reversed=True), UTF8Type)'
 and key_validation_class = 'UTF8Type'
 and default_validation_class = 'UTF8Type';

 and the CQL I use to retrieve is as follows

 SELECT FIRST 100 REVERSED 'z'..'0' from Comments where key = 'xyz';

 Am I doing the right thing?

 Are the comments stored in descending time order in the CF, and with this CQL
 query am I retrieving the columns in their natural sort order without any
 additional sorting overhead?

 Thank you.
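What `CompositeType(UTF8Type(reversed=True), UTF8Type)` does to the stored order can be sketched with a Python comparison function: the first component compares descending, the second ascending. This is a minimal sketch that models only the column-name ordering, not the CQL layer; the "YYYYMMDD-HHMM" ids below are made-up stand-ins for Praveen's time-sortable ids.

```python
from functools import cmp_to_key


def _cmp(a, b):
    """Three-way comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


def composite_cmp(x, y):
    """Order (comment_id, suffix) names like
    CompositeType(UTF8Type(reversed=True), UTF8Type)."""
    first = _cmp(x[0], y[0])
    if first != 0:
        return -first          # reversed=True flips only the first component
    return _cmp(x[1], y[1])    # the second component stays ascending


# Hypothetical time-sortable comment ids plus a second component.
names = [
    ("20120324-0930", "a"),
    ("20120326-1800", "b"),
    ("20120325-1200", "c"),
    ("20120326-1800", "a"),
]
stored = sorted(names, key=cmp_to_key(composite_cmp))
# The first N names in comparator order are the newest comments; no re-sort needed.
print(stored[:2])   # [('20120326-1800', 'a'), ('20120326-1800', 'b')]
```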





multi region EC2

2012-03-26 Thread Deno Vichas

all,

We're just about ready to push our app live and just have some Cassandra
tuning left. I've been running a 4 node (rep factor 3, simple) cluster
in EC2 using the DataStax AMIs (thanks, DataStax). After
reading through a bunch of docs I have a few questions.


- What is the minimum and recommended number of nodes to use in a
multi-region cluster? We only have a single app server right now.
- Can I migrate the replication strategy one node at a time, or do I need
to shut down the whole cluster to do this?
- What kind of performance hit am I going to take having my app server
cross regions to get to a node? Coming from the SQL world, this is
usually not a good thing.


If I were to stick to a single region, are there any best practices for
backing up a whole cluster? From the docs it looks like I need to
snapshot each node one by one and then copy the snapshot off to
somewhere offsite.



thanks,
deno


copy data for dev

2012-03-26 Thread Deno Vichas

all,

Is there an easy way to take a snapshot of a 4 node cluster and restore it
on my single node dev cluster?




thanks,
deno


what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Yiming Sun
Hi,

We opened port 7199 on a Cassandra node, but were unable to get nodetool
to talk to it remotely unless we turned off the firewall entirely. So what
other ports should be opened for this? Online posts all indicate that JMX
uses a random dynamic port, which makes it difficult to create a firewall
exception without writing a custom Java agent. So we were just wondering if
Cassandra's nodetool uses a specific port or port range. Thanks.

-- Y.


Re: what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Nick Bailey
You are correct about the second random dynamic port. There is a
ticket open to fix that as well as some other jmx issues:

https://issues.apache.org/jira/browse/CASSANDRA-2967

Regarding nodetool, it doesn't do anything special. Nodetool is often
used to connect to 'localhost' which generally does not have any
firewall rules at all so it usually works. It is still connecting to a
random second port though.

On Mon, Mar 26, 2012 at 2:42 PM, Yiming Sun yiming@gmail.com wrote:
 Hi,

 We opened port 7199 on a Cassandra node, but were unable to get nodetool
 to talk to it remotely unless we turned off the firewall entirely. So what
 other ports should be opened for this? Online posts all indicate that JMX
 uses a random dynamic port, which makes it difficult to create a firewall
 exception without writing a custom Java agent. So we were just wondering if
 Cassandra's nodetool uses a specific port or port range. Thanks.

 -- Y.


Re: what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Yiming Sun
Thanks Nick -- I didn't know about this ticket.  Good to know.

Yes, nodetool doesn't do anything special - but I still wish I could use
nodetool to examine other nodes, instead of having to ssh to other nodes
first and then nodetool each one (i am lazy :-).

-- Y.

On Mon, Mar 26, 2012 at 3:50 PM, Nick Bailey n...@datastax.com wrote:

 You are correct about the second random dynamic port. There is a
 ticket open to fix that as well as some other jmx issues:

 https://issues.apache.org/jira/browse/CASSANDRA-2967

 Regarding nodetool, it doesn't do anything special. Nodetool is often
 used to connect to 'localhost' which generally does not have any
 firewall rules at all so it usually works. It is still connecting to a
 random second port though.

 On Mon, Mar 26, 2012 at 2:42 PM, Yiming Sun yiming@gmail.com wrote:
  Hi,
 
  We opened port 7199 on a Cassandra node, but were unable to get nodetool
  to talk to it remotely unless we turned off the firewall entirely. So what
  other ports should be opened for this? Online posts all indicate that JMX
  uses a random dynamic port, which makes it difficult to create a firewall
  exception without writing a custom Java agent. So we were just wondering if
  Cassandra's nodetool uses a specific port or port range. Thanks.
 
  -- Y.



Re: How to store a list of values?

2012-03-26 Thread R. Verlangen
 but any schema change will break it 

What do you mean? You don't have to specify the columns in Cassandra, so it
should work perfectly, except that the skill~ prefix is reserved for your list.

2012/3/26 samal samalgo...@gmail.com


 Save the skills in a single column in json format.  Job done.

 That works if the set of skills is fixed; any add or delete has to be
 handled in the app: read the column first, re-format the JSON, then update
 the column (2 Thrift calls).

  skill~Java: null,
  skill~Cassandra: null
 This is also a good option, but any schema change will break it.


 On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:

 True.  But I don't need the skills to be searchable, so I'd rather embed
 them in the user than add another top-level CF.  I was thinking of doing
 something along the lines of adding a skills super column to the User table:

 skills: {
   'java': null,
   'c++': null,
   'cobol': null
 }

 However, I'm still not sure how to accomplish this with Astyanax.
 I've only figured out how to make composite columns with predefined column
 names with it, not dynamic column names like these.



 On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:

 In this case you only need the columns for values. You don't need the
 column-values to hold multiple columns (the super-column principle). So a
 normal CF would work.


 2012/3/26 Ben McCann b...@benmccann.com

 Thanks for the reply, Samal. I did not realize that you could store a
 column with a null value. Do you know if this solution would work with
 composite columns? It seems super columns are being phased out in favor of
 composites, but I do not understand composites very well yet. I'm trying
 to figure out if there's any way to accomplish what you've suggested using
 Astyanax https://github.com/Netflix/astyanax.

 Thanks for the help,
 Ben


 On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote:

 plus it is fully compatible with CQL.
 SELECT * FROM UserSkill WHERE KEY='ben';


 On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take a simple approach: create another CF, UserSkill, with
 the same row key as the profile_cf key.
 In the UserSkill CF, add each skill as a column name with a null value.
 Columns can then be added or removed.

 UserProfile={
   'ben'={
     blah:blah
     blah:blah
     blah:blah
   }
 }

 UserSkill={
   'ben'={
     'java':''
     'cassandra':''
     .
     .
     .
     'linux':''
     'skill':'infinity'
   }
 }


 On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills
 in each profile. In BigTable I could store a Protocol Buffer
 (http://code.google.com/apis/protocolbuffers/docs/overview.html) with
 a repeated field, but I'm not sure how this is typically accomplished
 in Cassandra. One option would be to store a serialized Thrift
 (http://thrift.apache.org/) or protobuf object, but I'd prefer not to do
 this, as I believe Cassandra doesn't have knowledge of these formats, so
 the data in the datastore would not be human-readable in CQL queries from
 the command line. The other solution I thought of would be to use a super
 column and put a random UUID as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra? I imagine
 there's some idiom I'm not aware of. I'm using the Astyanax
 (https://github.com/Netflix/astyanax/wiki) client library,
 which only supports composite columns instead of super columns, so the
 solution I proposed above would seem quite awkward in that case.
 I'm still having some trouble understanding composite columns, though,
 as they don't seem to be completely documented yet. Would this
 solution work with composite columns?

 Thanks,
 Ben







 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Performance overhead when using start and end columns

2012-03-26 Thread R. Verlangen
@Aaron: Very interesting article! Mentioned it on my Dutch blog.

2012/3/26 Mohit Anchlia mohitanch...@gmail.com

 Thanks!


 On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote:

 See the tests in the article.

 The code I used for profiling is also available.

 Cheers

-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:

 Thanks, but if I do have to specify start and end columns, roughly how much
 overhead would that translate to, since reading the metadata should be
 constant overall?

 On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com wrote:

 Some information on query plans
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Tl;Dr; Select columns with no start, in the natural Comparator order.

 Cheers


-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:

  I have rows with around 2K-50K columns, but when I do a query I only
 need to fetch a few columns between a start and an end column. I was
 wondering what performance overhead a slice query with start and end
 columns causes.

 Looking at the code, it looks like when you give a start and end column it
 goes into the IndexSliceReader logic, but it's hard to tell how much overhead
 one would see on average. Or is it even worth worrying about?








-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Edward Capriolo
I have documented some of the things you can do to make the random
port nature of JMX happy.

http://www.jointhegrid.com/highperfcassandra/?p=140

Other options are setting up mx4j or using jmxterm, or setting up a
SOCKS proxy and telling jconsole to use your proxy.

There is also the X Windows over VNC over SSH route.

Edward


On Mon, Mar 26, 2012 at 3:54 PM, Yiming Sun yiming@gmail.com wrote:
 Thanks Nick -- I didn't know about this ticket.  Good to know.

 Yes, nodetool doesn't do anything special - but I still wish I could use
 nodetool to examine other nodes, instead of having to ssh to other nodes
 first and then nodetool each one (i am lazy :-).

 -- Y.

 On Mon, Mar 26, 2012 at 3:50 PM, Nick Bailey n...@datastax.com wrote:

 You are correct about the second random dynamic port. There is a
 ticket open to fix that as well as some other jmx issues:

 https://issues.apache.org/jira/browse/CASSANDRA-2967

 Regarding nodetool, it doesn't do anything special. Nodetool is often
 used to connect to 'localhost' which generally does not have any
 firewall rules at all so it usually works. It is still connecting to a
 random second port though.

 On Mon, Mar 26, 2012 at 2:42 PM, Yiming Sun yiming@gmail.com wrote:
  Hi,
 
  We opened port 7199 on a Cassandra node, but were unable to get nodetool
  to talk to it remotely unless we turned off the firewall entirely. So what
  other ports should be opened for this? Online posts all indicate that JMX
  uses a random dynamic port, which makes it difficult to create a firewall
  exception without writing a custom Java agent. So we were just wondering if
  Cassandra's nodetool uses a specific port or port range. Thanks.
 
  -- Y.




Re: multi region EC2

2012-03-26 Thread aaron morton
 (rep factor 3, simple)
If this means you are using SimpleStrategy, I would recommend using
NetworkTopologyStrategy.

  - What is the minimum and recommended number of nodes to use in a multi-region
 cluster? We only have a single app server right now.
It depends on how exciting you want your life to be. 
You probably want at least 3 nodes in each Cassandra DC / EC2 region. These
could be spread across 3 AZs in an EC2 region.

Some background on availability http://thelastpickle.com/2011/06/13/Down-For-Me/
 
 - Can I migrate the replication strategy one node at a time, or do I need to
 shut down the whole cluster to do this?
Just use the NTS from the start. 

 - What kind of performance hit am I going to take having my app server cross
 regions to get to a node? Coming from the SQL world, this is usually not a
 good thing.
You will need to do some tests. Requests using LOCAL_QUORUM will only block on
nodes in the local DC.
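The replica counts behind the LOCAL_QUORUM point can be checked with a little arithmetic: a quorum is a strict majority, floor(RF / 2) + 1, where QUORUM counts the replicas in every data center and LOCAL_QUORUM only those in the coordinator's DC. A minimal Python sketch; the DC names and the 3-replicas-per-region setting are assumptions for illustration, not a recommendation from the thread.

```python
def quorum(replicas: int) -> int:
    """A strict majority of the given replicas: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def acks_needed(rf_per_dc: dict, level: str, local_dc: str) -> int:
    """Replica acknowledgements a request at `level` waits for."""
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])          # local DC replicas only
    if level == "QUORUM":
        return quorum(sum(rf_per_dc.values()))      # all replicas, every DC
    raise ValueError(f"unsupported level: {level}")


# Hypothetical NetworkTopologyStrategy options: 3 replicas in each of two regions.
rf = {"us-east": 3, "eu-west": 3}
print(acks_needed(rf, "LOCAL_QUORUM", "us-east"))   # 2
print(acks_needed(rf, "QUORUM", "us-east"))         # 4
```

With 3 replicas per region, QUORUM waits for 4 acknowledgements, so at least one must come from the remote region and the request pays the cross-region round trip; the 2 that LOCAL_QUORUM needs can all come from the local region.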

 If I were to stick to a single region, are there any best practices for backing up
 a whole cluster? From the docs it looks like I need to snapshot each node
 one by one and then copy the snapshot off to somewhere offsite.
yes. 

Good luck. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/03/2012, at 8:12 AM, Deno Vichas wrote:

 all,
 
 We're just about ready to push our app live and just have some Cassandra tuning
 left. I've been running a 4 node (rep factor 3, simple) cluster in EC2
 using the DataStax AMIs (thanks, DataStax). After reading through a bunch
 of docs I have a few questions.
 
 - What is the minimum and recommended number of nodes to use in a multi-region
 cluster? We only have a single app server right now.
 - Can I migrate the replication strategy one node at a time, or do I need to
 shut down the whole cluster to do this?
 - What kind of performance hit am I going to take having my app server cross
 regions to get to a node? Coming from the SQL world, this is usually not a
 good thing.
 
 If I were to stick to a single region, are there any best practices for backing up
 a whole cluster? From the docs it looks like I need to snapshot each node
 one by one and then copy the snapshot off to somewhere offsite.
 
 
 thanks,
 deno



Schema advice/help

2012-03-26 Thread Ertio Lew
I need to store each user's activities on 5 item types. I always want to
read the last 10 activities on each item type by a user (i.e., 50 activities
in total per read).

I want to store these activities in a single row per user so that they can
be retrieved with a single-row query, since I want to read the last 10
activities on every item type. I am thinking of creating composite
column names of the form itemtype:activityId (activityId is just a timestamp
value), but then I don't see how to read the last 10 activities across
all item types.

Any ideas for a better schema?
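One way to read this layout, sketched in Python under stated assumptions: the user's row holds composite names (itemtype, timestamp) kept sorted the way a CompositeType comparator would keep them, and "last 10 per item type" becomes one reversed slice per item type (five in all), each bounded to that item type's names. The function and item type names are hypothetical, and this models the slices only; it is not client code.

```python
import math
from bisect import bisect_left


def last_n_per_type(columns, item_types, n=10):
    """Newest `n` activities per item type from one user's row.

    `columns` is the row's sorted list of (item_type, timestamp) names.
    Each item type is one reversed slice bounded to that type's names.
    """
    result = {}
    for item_type in item_types:
        lo = bisect_left(columns, (item_type,))             # first name of this type
        hi = bisect_left(columns, (item_type, math.inf))    # just past its last name
        result[item_type] = columns[max(lo, hi - n):hi][::-1]  # newest first
    return result


# Hypothetical row: 15 activities on each of two item types.
row = sorted((t, ts) for t in ("photo", "video") for ts in range(1, 16))
latest = last_n_per_type(row, ["photo", "video"], n=10)
print(latest["photo"][:3])   # [('photo', 15), ('photo', 14), ('photo', 13)]
```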


cassandra 1.08 on java7 and win7

2012-03-26 Thread Frank Hsueh
I think I have the Cassandra server started

In another window:

 cassandra-cli.bat -h localhost -p 9160
Starting Cassandra Client
Connected to: Test Cluster on localhost/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] create keyspace DEMO;
log4j:WARN No appenders could be found for logger
(org.apache.cassandra.config.DatabaseDescriptor).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
Cannot locate cassandra.yaml
Fatal configuration error; unable to start server.  See log for stacktrace.

C:\Workspace\cassandra\apache-cassandra-1.0.8\bin


Has anybody seen this before?


-- 
Frank Hsueh | frank.hs...@gmail.com


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread R. Verlangen
Ben Coverston wrote earlier today: "Use a version of the Java 6 runtime;
Cassandra hasn't been tested at all with the Java 7 runtime."

So I think that might be a good way to start.

2012/3/26 Frank Hsueh frank.hs...@gmail.com

 I think I have the Cassandra server started

 In another window:
 
  cassandra-cli.bat -h localhost -p 9160
 Starting Cassandra Client
 Connected to: Test Cluster on localhost/9160
 Welcome to Cassandra CLI version 1.0.8

 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.

 [default@unknown] create keyspace DEMO;
 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.config.DatabaseDescriptor).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.
 Cannot locate cassandra.yaml
 Fatal configuration error; unable to start server.  See log for stacktrace.

 C:\Workspace\cassandra\apache-cassandra-1.0.8\bin
 

 Has anybody seen this before?


 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread Sasha Dolgy
Interesting. That behaviour _does_ happen in 1.0.8, but doesn't in 1.0.6
on Windows 7 with Java 7. It looks to be a problem with the CLI and not the
actual Cassandra service.

Just tried it now.

-sd

On Mon, Mar 26, 2012 at 11:29 PM, R. Verlangen ro...@us2.nl wrote:

 Ben Coverston wrote earlier today: "Use a version of the Java 6 runtime;
 Cassandra hasn't been tested at all with the Java 7 runtime."

 So I think that might be a good way to start.




Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread Frank Hsueh
I'm using the latest Java 1.6 from Oracle.



On Mon, Mar 26, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote:

 Ben Coverston wrote earlier today: "Use a version of the Java 6 runtime;
 Cassandra hasn't been tested at all with the Java 7 runtime."

 So I think that might be a good way to start.

 2012/3/26 Frank Hsueh frank.hs...@gmail.com

 I think I have the Cassandra server started

 In another window:
 
  cassandra-cli.bat -h localhost -p 9160
 Starting Cassandra Client
 Connected to: Test Cluster on localhost/9160
 Welcome to Cassandra CLI version 1.0.8

 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.

 [default@unknown] create keyspace DEMO;
 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.config.DatabaseDescriptor).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.
 Cannot locate cassandra.yaml
 Fatal configuration error; unable to start server.  See log for
 stacktrace.

 C:\Workspace\cassandra\apache-cassandra-1.0.8\bin
 

 Has anybody seen this before?


 --
 Frank Hsueh | frank.hs...@gmail.com




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread Frank Hsueh
Err... the same thing happens with Java 1.6.



On Mon, Mar 26, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 I'm using the latest Java 1.6 from Oracle.



 On Mon, Mar 26, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote:

 Ben Coverston wrote earlier today: "Use a version of the Java 6
 runtime; Cassandra hasn't been tested at all with the Java 7 runtime."

 So I think that might be a good way to start.

 2012/3/26 Frank Hsueh frank.hs...@gmail.com

 I think I have the Cassandra server started

 In another window:
 
  cassandra-cli.bat -h localhost -p 9160
 Starting Cassandra Client
 Connected to: Test Cluster on localhost/9160
 Welcome to Cassandra CLI version 1.0.8

 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.

 [default@unknown] create keyspace DEMO;
 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.config.DatabaseDescriptor).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfigfor 
 more info.
 Cannot locate cassandra.yaml
 Fatal configuration error; unable to start server.  See log for
 stacktrace.

 C:\Workspace\cassandra\apache-cassandra-1.0.8\bin
 

 anybody seen this before ?


 --
 Frank Hsueh | frank.hs...@gmail.com




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl




 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread Sasha Dolgy
Best to open an issue: https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Mar 26, 2012 at 11:35 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 err ...  same thing happens with Java 1.6



 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
Sasha Dolgy
sasha.do...@gmail.com


The cassandra gem on rubygems.org needs to be updated

2012-03-26 Thread Ilya Maykov
If you're not a maintainer of the cassandra gem on rubygems.org, you can
stop reading.

And if you are ... I'm just bringing this to your attention:
https://github.com/twitter/cassandra/issues/142

Thanks!

-- Ilya


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread Frank Hsueh
create keyspace via cassandra cli fails
https://issues.apache.org/jira/browse/CASSANDRA-4085



On Mon, Mar 26, 2012 at 2:44 PM, Sasha Dolgy sdo...@gmail.com wrote:

 best to open an issue:   https://issues.apache.org/jira/browse/CASSANDRA


 --
 Sasha Dolgy
 sasha.do...@gmail.com




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: multi region EC2

2012-03-26 Thread Deno Vichas

On 3/26/2012 2:15 PM, aaron morton wrote:

  - Can I migrate the replication strategy one node at a time, or do I need to
  shut down the whole cluster to do this?

 Just use the NTS (NetworkTopologyStrategy) from the start.

But what if I already have a bunch of data (8 GB per node) that I need and
don't have a way to re-create?



thanks,
deno


Internal error processing get_slice (NullPointerException)

2012-03-26 Thread John Laban
Has anyone seen this particular NPE before from Cassandra?

This is on 1.0.8.  It seems to happen transiently on multiple nodes in my
cluster, every so often, and goes away.


ERROR [Thrift:45] 2012-03-26 19:59:12,024 Cassandra.java (line 3041) Internal error processing get_slice
java.lang.NullPointerException
	at org.apache.cassandra.db.SliceFromReadCommand.maybeGenerateRetryCommand(SliceFromReadCommand.java:76)
	at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:724)
	at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:564)
	at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
	at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:283)
	at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:365)
	at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:326)
	at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:3033)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)



The line in question is (I think) the one below, so it looks like the
column family reference for a row can sometimes be null?

int liveColumnsInRow = row != null ? row.cf.getLiveColumnCount() : 0;


Thanks,
John
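If the reading above is right and row.cf can be null for a non-null row, the expression on that line only guards half the case. A minimal sketch with hypothetical stand-in classes (not Cassandra's actual Row and ColumnFamily) shows the difference between the original expression and one that also checks cf. Whether a null cf should count as zero live columns in the real code is an assumption; the stack trace alone does not confirm it.

```java
// Hypothetical stand-ins for Cassandra's internal Row and ColumnFamily classes;
// only the shape of the null check matters here.
public class LiveColumnGuard {
    static class ColumnFamily {
        private final int liveColumns;
        ColumnFamily(int liveColumns) { this.liveColumns = liveColumns; }
        int getLiveColumnCount() { return liveColumns; }
    }

    static class Row {
        final ColumnFamily cf; // assumed to be null when the row carries no data
        Row(ColumnFamily cf) { this.cf = cf; }
    }

    // Same shape as the line quoted above: guards row == null only,
    // so a non-null row with a null cf throws NullPointerException.
    static int liveColumnsOriginal(Row row) {
        return row != null ? row.cf.getLiveColumnCount() : 0;
    }

    // Guarded variant: a missing row and a missing column family both yield 0.
    static int liveColumnsGuarded(Row row) {
        return (row != null && row.cf != null) ? row.cf.getLiveColumnCount() : 0;
    }

    public static void main(String[] args) {
        Row emptyRow = new Row(null);
        System.out.println(liveColumnsGuarded(emptyRow));
        try {
            liveColumnsOriginal(emptyRow);
        } catch (NullPointerException e) {
            System.out.println("original expression throws NPE");
        }
    }
}
```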


Re: Fwd: information on cassandra

2012-03-26 Thread Maki Watanabe
auto_bootstrap has been removed from cassandra.yaml and is always enabled
as of 1.0.
FYI.

maki

2012/3/26 R. Verlangen ro...@us2.nl:
 Yes, you can add nodes to a running cluster. It's very simple: configure
 the cluster name and seed node(s) in cassandra.yaml, set auto_bootstrap to
 true and start the node.


 2012/3/26 puneet loya puneetl...@gmail.com

 5n.. Consider I'm starting on a single node. Can I add nodes later? Please
 reply :)


 On Sun, Mar 25, 2012 at 7:41 PM, Ertio Lew ertio...@gmail.com wrote:

 I guess a 2-node cluster with RF=2 might also be a starting point, isn't
 it? Are there any issues with this?

 On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote:

 Cassandra has a distributed architecture, so a single node does not fit it.
 It can be used, but you lose its benefits. That's OK if you are just
 playing around; use VMs to learn how a cluster communicates and handles
 requests.

 To get full fault tolerance, redundancy and consistency, a minimum of 3
 nodes is required.

 Important reading:
 http://wiki.apache.org/cassandra/
 http://www.datastax.com/docs/1.0/index
 http://thelastpickle.com/
 http://www.acunu.com/blogs/all/



 On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta garvita.me...@tcs.com
 wrote:

 It's not advisable to use Cassandra on a single node: its basic premise is
 that if a node fails, the data still remains in the system. At least 3
 nodes should be there when setting up a Cassandra cluster.


 Garvita Mehta
 CEG - Open Source Technology Group
 Tata Consultancy Services
 Ph:- +91 22 67324756
 Mailto: garvita.me...@tcs.com
 Website: http://www.tcs.com
 
 

 -puneet loya wrote: -

 To: user@cassandra.apache.org
 From: puneet loya puneetl...@gmail.com
 Date: 03/24/2012 06:36PM
 Subject: Fwd: information on cassandra




 hi,

 I'm Puneet, an engineering student. I would like to know: is Cassandra
 useful considering we just have a single node (rather, a single system)
 holding all the information?
 I'm looking for decent response times from the database. Can you please
 respond?

 Thank you ,

 Regards,

 Puneet Loya








 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl
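The "at least 3 nodes" advice quoted above, and the question about a 2-node cluster with RF=2, come down to quorum arithmetic. A minimal sketch, assuming reads and writes use the QUORUM consistency level (lower levels such as ONE behave differently): QUORUM needs a strict majority of the replicas, RF/2 + 1 with integer division.

```java
public class QuorumMath {
    // Replicas that must respond for a QUORUM read or write:
    // a strict majority of the replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas of a given row that can be down while QUORUM still succeeds.
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 3; rf++) {
            System.out.printf("RF=%d: quorum=%d, replicas that may be down=%d%n",
                    rf, quorum(rf), toleratedFailures(rf));
        }
    }
}
```

This prints quorum=1 with 0 tolerated failures for RF=1, quorum=2 with 0 for RF=2, and quorum=2 with 1 for RF=3. So with RF=2 on two nodes, losing either node blocks QUORUM operations, while RF=3 on three nodes keeps working with one node down.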



Re: unbalanced ring

2012-03-26 Thread Maki Watanabe
What version are you using?
Anyway, try nodetool repair and compact.

maki

2012/3/26 Tamar Fraenkel ta...@tok-media.com

 Hi!
 I created an Amazon ring using the DataStax image and started filling the DB.
 The cluster seems unbalanced.

 nodetool ring returns:
 Address         DC       Rack  Status  State   Load       Owns    Token
                                                                   113427455640312821154458202477256070485
 10.34.158.33    us-east  1c    Up      Normal  514.29 KB  33.33%  0
 10.38.175.131   us-east  1c    Up      Normal  1.5 MB     33.33%  56713727820156410577229101238628035242
 10.116.83.10    us-east  1c    Up      Normal  1.5 MB     33.33%  113427455640312821154458202477256070485

 [default@tok] describe;
 Keyspace: tok:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
     Options: [replication_factor:2]

 [default@tok] describe cluster;
 Cluster Information:
    Snitch: org.apache.cassandra.locator.Ec2Snitch
    Partitioner: org.apache.cassandra.dht.RandomPartitioner
    Schema versions:
       4687d620-7664-11e1--1bcb936807ff: [10.38.175.131, 10.34.158.33, 10.116.83.10]


 Any idea what the cause is?
 I am running similar code on a local ring and it is balanced.

 How can I fix this?

 Thanks,

 Tamar Fraenkel
 Senior Software Engineer, TOK Media

 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956
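One way to read the nodetool ring output in this thread: the formula commonly used to compute evenly spaced RandomPartitioner tokens, i * 2^127 / N for node i of N, reproduces the three tokens shown exactly, and each node reports 33.33% ownership. That suggests token assignment is already balanced, and the Load difference comes from the data itself, which at 514 KB to 1.5 MB is still very small. This is a sketch of the token arithmetic under that assumption, not a diagnosis.

```java
import java.math.BigInteger;

public class BalancedTokens {
    // RandomPartitioner tokens live in the range 0 .. 2^127.
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    // Evenly spaced initial token for node i of n: i * 2^127 / n (integer division).
    static BigInteger token(int i, int n) {
        return RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
    }

    public static void main(String[] args) {
        int nodes = 3;
        for (int i = 0; i < nodes; i++) {
            System.out.println("node " + i + ": " + token(i, nodes));
        }
    }
}
```

For three nodes this prints 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485, matching the ring above.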

Re: Error in FAQ?

2012-03-26 Thread Ben McCann
I updated the FAQ to the best of my ability:
http://wiki.apache.org/cassandra/FAQ#modify_cf_config


On Mon, Mar 26, 2012 at 12:25 AM, R. Verlangen ro...@us2.nl wrote:

 If you want to modify a column family, just open the command line
 interface (cassandra-cli), connect to a node (probably: connect
 localhost/9160;).

 To create your first keyspace, type: create keyspace
 MyKeyspace;

 To work with an existing keyspace, type: use MyKeyspace;

 If you need more information you can just type help;

 Good luck!

 2012/3/26 Ben McCann b...@benmccann.com

 Hmmm, I don't see anything regarding column families in cassandra.yaml.
  It seems like the answer for that question in the FAQ is very outdated.


 On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville serge.fonvi...@gmail.com
  wrote:

 Hi,

 2012/3/26 Ben McCann b...@benmccann.com:
  There's a line that says "Make necessary changes to your
  storage-conf.xml." I can't find this file. Does it still exist? If so,
  where should I look? I installed the packaged version of Cassandra
  available in the DataStax Community Edition.

 From http://wiki.apache.org/cassandra/StorageConfiguration:
 "Prior to the 0.7 release, Cassandra storage configuration is described
 by the conf/storage-conf.xml file. As of 0.7, it is described by the
 conf/cassandra.yaml file."

 (Found after googling "cassandra storage-conf.xml".)

 Kind regards/met vriendelijke groet,

 Serge Fonville

 http://www.sergefonville.nl






Re: How to store a list of values?

2012-03-26 Thread samal
On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

  but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra, so it
 should work perfectly, except that the skill~ prefix is reserved for your list.


If skill~ is later changed to skill::, it needs to be handled at the app
level. Otherwise you have to update every row: read it first, modify it,
insert the new version and delete the old version.
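To make the trade-off concrete, here is a minimal sketch assuming the scheme discussed in this thread stores each skill as its own column named skill~<value> in the profile row. A TreeMap stands in for one row's sorted columns (assumption: a UTF8/ASCII comparator, whose order matches Java string order for ASCII names). No Cassandra client API is used; the slice and the prefix rename only mirror what the application would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// A TreeMap stands in for one Cassandra row: column name -> column value.
public class PrefixedListColumns {
    // Smallest string greater than every string starting with prefix
    // (bump its last character), so [prefix, end) covers exactly that prefix.
    static String prefixEnd(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // "Slice" of all list entries stored under the prefix, e.g. "skill~".
    static List<String> readList(NavigableMap<String, String> row, String prefix) {
        List<String> values = new ArrayList<>();
        for (String column : row.subMap(prefix, true, prefixEnd(prefix), false).keySet()) {
            values.add(column.substring(prefix.length()));
        }
        return values;
    }

    // Renaming the prefix is an application-level migration:
    // read the old columns, insert them under the new name, delete the old ones.
    static void renamePrefix(NavigableMap<String, String> row, String oldPrefix, String newPrefix) {
        Map<String, String> old = new TreeMap<>(row.subMap(oldPrefix, true, prefixEnd(oldPrefix), false));
        for (Map.Entry<String, String> e : old.entrySet()) {
            row.put(newPrefix + e.getKey().substring(oldPrefix.length()), e.getValue()); // insert new version
            row.remove(e.getKey()); // delete old version
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> profile = new TreeMap<>();
        profile.put("name", "Ben");
        profile.put("skill~java", "");
        profile.put("skill~c++", "");
        profile.put("skill~cobol", "");
        System.out.println(readList(profile, "skill~"));
        renamePrefix(profile, "skill~", "skill::");
        System.out.println(readList(profile, "skill::"));
    }
}
```

Both calls print [c++, cobol, java]. Reading is a single contiguous slice, but in a real cluster renamePrefix would have to run against every profile row, which is the cost described above.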