[BETA RELEASE] Apache Cassandra 1.2.0-beta2 released

2012-11-09 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the second beta for
the future Apache Cassandra 1.2.0.

Let me first stress that this is beta software and as such is *not* ready for
production use.

This release is still a beta, so it is likely not bug free. However, a lot has
been fixed since beta1 and, if everything goes right, we are hopeful that a first
release candidate may follow shortly. Please do help test this beta to make
that happen. If you encounter any problem during your testing, please
report[3,4] it. And be sure to take a look at the change log[1] and the release
notes[2] to see where Cassandra 1.2 differs from the previous series.

Apache Cassandra 1.2.0-beta2[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 12x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/wnDAV (CHANGES.txt)
[2]: http://goo.gl/CBsqs (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta2


How to search User (Entity) columns without sec. index?

2012-11-09 Thread Alan Ristić
Here is the thing. I'm modelling a User entity and ran into a problem with
searching through user columns.

CREATE TABLE users (
  user_uuid uuid PRIMARY KEY,
  date_created timestamp,
  password varchar,
  username varchar,
  name varchar,
  first_name varchar,
  last_name varchar,
  email varchar,
  ...
) ;

CREATE INDEX users__username_idx ON users (username);

Now I know it's bad practice to put a secondary index on 'username' because of
uniqueness and all, but what's the alternative? I'd like 'username' to be
searchable.

Tnx,
*Alan Ristić*

*m*: 040 423 688


Re: How to search User (Entity) columns without sec. index?

2012-11-09 Thread Alain RODRIGUEZ
I think there are just a few solutions.

- Secondary index on username
- A CF used as an index (store the username as the row key and all the uuids of
users with that username as columns)
- Get all the data and filter afterwards (really poor performance depending on
the size of the data set)

I can't see another way to perform your query.
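
A rough sketch of the second option using Hector follows; the CF name
users_by_username, the keyspace wiring and the serializers are illustrative
assumptions rather than anything defined in this thread:

// Sketch only: keep a "users_by_username" CF next to "users", keyed by username,
// with one column per matching user uuid. Names and setup are assumptions.
import java.util.UUID;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.SliceQuery;

public class UsernameIndex {
    private static final StringSerializer SS = StringSerializer.get();
    private static final UUIDSerializer US = UUIDSerializer.get();

    // On signup: write the user row as usual, plus one column into the index CF.
    static void indexUser(Keyspace ksp, String username, UUID userUuid) {
        Mutator<String> m = HFactory.createMutator(ksp, SS);
        m.addInsertion(username, "users_by_username",
                HFactory.createColumn(userUuid, "", US, SS));
        m.execute();
    }

    // Lookup: one slice on the index row returns the uuid(s) stored for that username
    // (exactly one if the application enforces uniqueness when writing).
    static ColumnSlice<UUID, String> findByUsername(Keyspace ksp, String username) {
        SliceQuery<String, UUID, String> q = HFactory.createSliceQuery(ksp, SS, US, SS);
        q.setColumnFamily("users_by_username");
        q.setKey(username);
        q.setRange(null, null, false, 10);
        return q.execute().get();
    }
}

Note that uniqueness still has to be enforced by the application (for example by
reading the index row before inserting), since Cassandra itself will not reject a
duplicate username.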


2012/11/9 Alan Ristić alan.ris...@gmail.com

 Here is the thing. I'm modelling User entity and got to problem with
 searching trough user columns.

 CREATE TABLE users (
   user_uuid uuid PRIMARY KEY,
   date_created timestamp,
   password varchar,
   username varchar,
   name varchar,
   first_name varchar,
   last_name varchar,
   email varchar,
   ...
 ) ;

 CREATE INDEX users__username_idx ON users (username);

 Now I know it's a bad practice to model sec. index on 'username' becouse
 of uniqness and all but what's the alternative? I'd want 'username' to be
 searchable?

 Tnx,
 *Alan Ristić*

 *m*: 040 423 688




Remove crashed node

2012-11-09 Thread Robin Verlangen
Hi there,

We have had a crashed node that has since been removed from the rack. However,
when I try a schema upgrade / truncate operation it complains about the
unreachable node. I tried removetoken, but that didn't resolve it.

Any ideas on how to fix this?

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl

http://goo.gl/Lt7BC

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.


Re: Remove crashed node

2012-11-09 Thread Alain RODRIGUEZ
In cassandra-cli, if you run describe cluster; I guess you will see an
UNREACHABLE node.

If you do, there is a way to remove this unreachable node.

Go to the JMX management console (ip_of_one_up_node:8081 by default)

Then go to the org.apache.cassandra.net:type=Gossiper link and use the
unsafeAssassinateEndpoint input. Fill it with the IP of the down node and
invoke the function.

nodetool gossipinfo should now tell you that this node has left the ring,
and you will be able to truncate or do whatever else you need to do.

Use this carefully: this function is made up of the words unsafe and
assassinate because it forces the node out of the ring without any check
or replication.
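
For reference, a rough sketch of invoking the same MBean operation
programmatically over plain JMX instead of the MX4J web console, assuming the
default Cassandra JMX port 7199 and no JMX authentication:

// Hedged sketch: invoke Gossiper.unsafeAssassinateEndpoint through JMX.
// Host, port 7199 and the lack of credentials are assumptions to adjust.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class Assassinate {
    public static void main(String[] args) throws Exception {
        String liveNode = args[0];  // ip of one node that is still up
        String deadNode = args[1];  // ip of the unreachable node to force out

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + liveNode + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName gossiper = new ObjectName("org.apache.cassandra.net:type=Gossiper");
            // Same operation as the MX4J form: forces the endpoint out of the ring.
            mbs.invoke(gossiper, "unsafeAssassinateEndpoint",
                    new Object[] { deadNode }, new String[] { "java.lang.String" });
        } finally {
            jmxc.close();
        }
    }
}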

Alain


2012/11/9 Robin Verlangen ro...@us2.nl

 Hi there,

 We have had a crashed node that is currently removed from the rack.
 However when I try a schema upgrade / truncate operation it complains of
 the unreachable node. I tried the removetoken, but that didn't resolve.

 Any ideas on how to fix this?

 Best regards,

 Robin Verlangen
 *Software engineer*
 *
 *
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 http://goo.gl/Lt7BC

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.




Re: Remove crashed node

2012-11-09 Thread Robin Verlangen
Hi Alain,

How can I access that? A web browser does not seem to work. Do I need any
software to log in? If so, what is the proper software for Windows?

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl

http://goo.gl/Lt7BC

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



On Fri, Nov 9, 2012 at 2:50 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 On cassandra-cli if you describe cluster; I guess you will see an
 UNREACHABLE node.

 If you do so, there is a way to remove this unreachable node.

 Go to the JMX management console (ip_of_one_up_node:8081 by default)

 Then go to the org.apache.cassandra.net:type=Gossiper link and use the 
 unsafeAssassinateEndpoint
 input. Fill it with the ip of the down node and invoke the function.

 nodetool gossipinfo should now tell you that this node has left the ring
 and let you truncate or whatever you need to do.

 Use this carefully this function is composed of the unsafe and
 assassinate words because it forces the node to go out of the ring
 without any check or replication.

 Alain


 2012/11/9 Robin Verlangen ro...@us2.nl

 Hi there,

 We have had a crashed node that is currently removed from the rack.
 However when I try a schema upgrade / truncate operation it complains of
 the unreachable node. I tried the removetoken, but that didn't resolve.

 Any ideas on how to fix this?

 Best regards,

 Robin Verlangen
 *Software engineer*
 *
 *
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 http://goo.gl/Lt7BC

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.





Re: leveled compaction and tombstoned data

2012-11-09 Thread Mina Naguib


On 2012-11-08, at 1:12 PM, B. Todd Burruss bto...@gmail.com wrote:

 we are having the problem where we have huge SSTABLEs with tombstoned data in 
 them that is not being compacted soon enough (because size tiered compaction 
 requires, by default, 4 like sized SSTABLEs).  this is using more disk space 
 than we anticipated.
 
 we are very write heavy compared to reads, and we delete the data after N 
 number of days (depends on the column family, but N is around 7 days)
 
 my question is would leveled compaction help to get rid of the tombstoned 
 data faster than size tiered, and therefore reduce the disk space usage

From my experience, levelled compaction makes space reclamation after deletes 
even less predictable than sized-tier.

The reason is that deletes, like all mutations, are just recorded into 
sstables.  They enter level0, and get slowly, over time, promoted upwards to 
levelN.

Depending on your *total* mutation volume vs your data set size, this may be 
quite a slow process.  This is made even worse when the data you're deleting 
(say, an entire row worth several hundred kilobytes) is to be deleted by a small 
row-level tombstone.  If the row is sitting in level 4, the tombstone won't 
impact it until enough data has pushed it over all existing data in level3, 
level2, level1, level0.

Finally, to guard against the tombstone missing any data, the tombstone itself 
is not a candidate for removal (I believe even after gc_grace has passed) unless 
it has reached the highest populated level in levelled compaction.  This means if 
you have 4 levels and issue a ton of deletes (even deletes that will never 
impact existing data), these tombstones are dead weight that cannot be purged 
until they hit level4.

For a write-heavy workload, I recommend you stick with sized-tier.  You have 
several options at your disposal (compaction min/max thresholds, gc_grace) to 
move things along.  If that doesn't help, I've heard of some fairly reputable 
people doing some fairly blasphemous things (major compactions every night).




Re: Remove crashed node

2012-11-09 Thread Alain RODRIGUEZ
You have to install mx4j-tools.jar.

http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J

It's a Java tool, so it is usable on both Windows and Linux.

Here is the link to download mx4j-tools.jar:
http://www.java2s.com/Code/JarDownload/mx4j/mx4j-tools-3.0.2.jar.zip

Unzip it and add it to the path of your Cassandra libraries, then restart the
node where it is installed and you should be OK.

Alain


2012/11/9 Robin Verlangen ro...@us2.nl

 Hi Alain,

 How can I access that? Web browser does not seem to work. Do I need any
 software to login? If so, what is proper software for Windows?

 Best regards,

 Robin Verlangen
 *Software engineer*
 *
 *
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 http://goo.gl/Lt7BC

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.



 On Fri, Nov 9, 2012 at 2:50 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 On cassandra-cli if you describe cluster; I guess you will see an
 UNREACHABLE node.

 If you do so, there is a way to remove this unreachable node.

 Go to the JMX management console (ip_of_one_up_node:8081 by default)

 Then go to the org.apache.cassandra.net:type=Gossiper link and use the 
 unsafeAssassinateEndpoint
 input. Fill it with the ip of the down node and invoke the function.

 nodetool gossipinfo should now tell you that this node has left the
 ring and let you truncate or whatever you need to do.

 Use this carefully this function is composed of the unsafe and
 assassinate words because it forces the node to go out of the ring
 without any check or replication.

 Alain


 2012/11/9 Robin Verlangen ro...@us2.nl

 Hi there,

 We have had a crashed node that is currently removed from the rack.
 However when I try a schema upgrade / truncate operation it complains of
 the unreachable node. I tried the removetoken, but that didn't resolve.

 Any ideas on how to fix this?

 Best regards,

 Robin Verlangen
 *Software engineer*
 *
 *
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 http://goo.gl/Lt7BC

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.






Re: Indexing Data in Cassandra with Elastic Search

2012-11-09 Thread Alain RODRIGUEZ
Thanks for sharing this. We are also using Cassandra + Storm + Queue
messaging (Kestrel for now) and are always glad to learn.

Alain


2012/11/9 Brian O'Neill b...@alumni.brown.edu

 For those looking to index data in Cassandra with Elastic Search, here
 is what we decided to do:

 http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html

 -brian

 --
 Brian ONeill
 Lead Architect, Health Market Science (http://healthmarketscience.com)
 mobile:215.588.6024
 blog: http://brianoneill.blogspot.com/
 twitter: @boneill42



Re: leveled compaction and tombstoned data

2012-11-09 Thread Ben Coverston
The rules for tombstone eviction are as follows (regardless of your
compaction strategy):

1. gc_grace must be expired, and
2. No other row fragments can exist for the row that aren't also
participating in the compaction.

For LCS, there is no 'rule' that tombstones can only be evicted at the
highest level. They can be evicted at whichever level the row
converges on. Depending on your use case this may mean it always happens at
level4; it might also mean that it most often happens at L1 or L2.






On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib mina.nag...@adgear.com wrote:



 On 2012-11-08, at 1:12 PM, B. Todd Burruss bto...@gmail.com wrote:

  we are having the problem where we have huge SSTABLEs with tombstoned
 data in them that is not being compacted soon enough (because size tiered
 compaction requires, by default, 4 like sized SSTABLEs).  this is using
 more disk space than we anticipated.
 
  we are very write heavy compared to reads, and we delete the data after
 N number of days (depends on the column family, but N is around 7 days)
 
  my question is would leveled compaction help to get rid of the
 tombstoned data faster than size tiered, and therefore reduce the disk
 space usage

 From my experience, levelled compaction makes space reclamation after
 deletes even less predictable than sized-tier.

 The reason is that deletes, like all mutations, are just recorded into
 sstables.  They enter level0, and get slowly, over time, promoted upwards
 to levelN.

 Depending on your *total* mutation volume VS your data set size, this may
 be quite a slow process.  This is made even worse if the size of the data
 you're deleting (say, an entire row worth several hundred kilobytes) is
 to-be-deleted by a small row-level tombstone.  If the row is sitting in
 level 4, the tombstone won't impact it until enough data has pushed over
 all existing data in level3, level2, level1, level0

 Finally, to guard against the tombstone missing any data, the tombstone
 itself is not candidate for removal (I believe even after gc_grace has
 passed) unless it's reached the highest populated level in levelled
 compaction.  This means if you have 4 levels and issue a ton of deletes
 (even deletes that will never impact existing data), these tombstones are
 deadweight that cannot be purged until they hit level4.

 For a write-heavy workload, I recommend you stick with sized-tier.  You
 have several options at your disposal (compaction min/max thresholds,
 gc_grace) to move things along.  If that doesn't help, I've heard of some
 fairly reputable people doing some fairly blasphemous things (major
 compactions every night).





-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


HugeTLB (Hugepage) Support on a Cassandra Cluster

2012-11-09 Thread Morantus, James (PCLN-NW)
Hi,

Does anyone know if DataStax/Cassandra recommends using HugeTLB on a cluster?

Thank you

James Morantus
Sr. Database Administrator
203-299-8733
Priceline.com



Re: leveled compaction and tombstoned data

2012-11-09 Thread Rob Coli
On Thu, Nov 8, 2012 at 10:12 AM, B. Todd Burruss bto...@gmail.com wrote:
 my question is would leveled compaction help to get rid of the tombstoned
 data faster than size tiered, and therefore reduce the disk space usage?

You could also...

1) run a major compaction
2) code up sstablesplit
3) profit!

This method incurs a management penalty if not automated, but is
otherwise the most efficient way to deal with tombstones and obsolete
data.. :D

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Strange delay in query

2012-11-09 Thread André Cruz
That must be it. I dumped the sstables to json and there are lots of records, 
including ones that are returned to my application, that have the deletedAt 
attribute. I think this is because the regular repair job was not running for 
some time, surely more than the grace period, and lots of tombstones stayed 
behind even though we are running repair regularly now.

Thanks!
André

On Nov 8, 2012, at 10:51 PM, Josep Blanquer blanq...@rightscale.com wrote:

 Can it be that you have tons and tons of tombstoned columns in the middle of 
 these two? I've seen plenty of performance issues with wide rows littered 
 with column tombstones (you could check with dumping the sstables...)
 
 Just a thought...
 
 Josep M.
 
 On Thu, Nov 8, 2012 at 12:23 PM, André Cruz andre.c...@co.sapo.pt wrote:
 These are the two columns in question:
 
 => (super_column=13957152-234b-11e2-92bc-e0db550199f4,
  (column=attributes, value=, timestamp=1351681613263657)
  (column=blocks, 
 value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q,
  timestamp=1351681613263657)
  (column=hash, 
 value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg,
  timestamp=1351681613263657)
  (column=icon, value=image_jpg, timestamp=1351681613263657)
  (column=is_deleted, value=true, timestamp=1351681613263657)
  (column=is_dir, value=false, timestamp=1351681613263657)
  (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
  (column=mtime, value=1351646803, timestamp=1351681613263657)
  (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, 
 timestamp=1351681613263657)
  (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, 
 timestamp=1351681613263657)
  (column=size, value=1379001, timestamp=1351681613263657)
  (column=thumb_exists, value=true, timestamp=1351681613263657))
 => (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
  (column=attributes, value={posix: 420}, timestamp=1351790781154800)
  (column=blocks, 
 value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ,
  timestamp=1351790781154800)
  (column=hash, 
 value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw,
  timestamp=1351790781154800)
  (column=icon, value=text_txt, timestamp=1351790781154800)
  (column=is_dir, value=false, timestamp=1351790781154800)
  (column=mime_type, value=text/plain, timestamp=1351790781154800)
  (column=mtime, value=1351378576, timestamp=1351790781154800)
  (column=name, value=/Documents/VIMDocument.txt, 
 timestamp=1351790781154800)
  (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, 
 timestamp=1351790781154800)
  (column=size, value=13, timestamp=1351790781154800)
  (column=thumb_exists, value=false, timestamp=1351790781154800))
 
 
 I don't think their size is an issue here.
 
 André
 
 On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh ailin...@gmail.com wrote:
 
 What is the size of columns? Probably those two are huge.
 
 
 On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote:
 On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:
 
  This error also happens on my application that uses pycassa, so I don't 
  think this is the same bug.
 
 I have narrowed it down to a slice between two consecutive columns. Observe 
 this behaviour using pycassa:
 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool 51715344
 [UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]
 
 A two column slice took more than 2s to return. If I request the next 2 
 column slice:
 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool 51715344
 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')]
 
 This takes 20msec... Is there a rational explanation for this different 
 behaviour? Is there some threshold that I'm running into? Is there any way 
 to obtain more debugging information about this problem?
 
 Thanks,
 André
 
 
 



Frame size exceptions occurring with ColumnFamilyInputFormat for very large rows

2012-11-09 Thread Marko Rodriguez
Hello,

I am trying to run a Hadoop job that pulls data out of Cassandra via 
ColumnFamilyInputFormat. I am getting a frame size exception. To remedy that, 
I have set both the thrift_framed_transport_size_in_mb and 
thrift_max_message_length_in_mb to an infinite amount at 10mb on all 
nodes. Moreover, I have restarted the cluster and the cassandra.yaml files have 
been reloaded.

However, I am still getting:

12/11/09 21:39:52 INFO mapred.JobClient:  map 62% reduce 0%
12/11/09 21:40:09 INFO mapred.JobClient: Task Id : attempt_201211082011_0015_m_000479_2, Status : FAILED
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (30046945) larger than max length (16384000)!
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:324)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:189)

Question: Why is 16384000 bytes (I assume) !=  10mb?

Next, I made this parameter true as a last hail mary attempt:
cassandra.input.widerows=true
...still with no luck.

Does someone know what I might be missing?
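
One hedged observation: 16 * 1024 * 1000 = 16384000, which looks like a 16 MB
limit applied on the client side of the Hadoop job itself rather than anything
read from cassandra.yaml, so raising the server-side settings alone would leave
the error unchanged. If that is the cause, the limit has to be raised in the job
configuration, roughly along these lines, assuming the ConfigHelper bundled with
this Cassandra version exposes such setters (worth verifying against the jar on
the job classpath):

// Hedged sketch: raise the client-side Thrift limits used by ColumnFamilyInputFormat.
// The setter names below are assumptions that mirror the yaml option names;
// verify they exist in the ConfigHelper of the Cassandra version on the classpath.
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
    public static void configure(Job job) {
        Configuration conf = job.getConfiguration();
        // Changing cassandra.yaml affects the server; the Hadoop tasks open their
        // own Thrift connections and apply their own frame/message limits.
        ConfigHelper.setThriftFramedTransportSizeInMb(conf, 64);
        ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);
    }
}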

Thank you very much for your time,
Marko.

http://markorodriguez.com


Re: backup/restore from sstable files ?

2012-11-09 Thread Rob Coli
On Thu, Nov 8, 2012 at 5:15 PM, Yang tedd...@gmail.com wrote:
 some of my colleagues seem to use this method to backup/restore a cluster,
 successfully:

 on each of the node, save entire /cassandra/data/ dir to S3,
 then on a new set of nodes, with exactly the same number of nodes,  copy
 back each of the data/ dir.

 then boot up cluster.

Yep, that works as long as the two clusters have the same tokens and
replication strategies.

 but I wonder how it worked: doesn't the system keyspace store information
 specific to the current cluster, such as my sibling nodes in the cluster, my
 IP ?? all these would change once you copy the frozen data files onto a
 new set of nodes.

Yes, for this reason you should not restore the system keyspace files
(except, optionally, Schema.). Definitely you should not restore
LocationInfo. LocationInfo contains ip-to-token mappings. Also you
should make your target cluster have a unique cluster name, and the
old cluster name is also stored in LocationInfo...

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: unsubscribe

2012-11-09 Thread Rob Coli
On Thu, Nov 8, 2012 at 4:57 PM, Jeremy McKay
jeremy.mc...@ntrepidcorp.com wrote:


http://wiki.apache.org/cassandra/FAQ#unsubscribe

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: read request distribution

2012-11-09 Thread Wei Zhu
I think the rows whose row keys fall into the token range of the high-latency 
node are likely to have more columns than those on the other nodes. I have three 
nodes with RF = 3, so all the nodes have all the data. And CL = QUORUM, meaning 
each request is sent to all three nodes and the response is sent back to the 
client when two of them respond. What exactly does Read Count from nodetool 
cfstats mean, then? Should it be the same across all the nodes? I checked with 
Hector; it uses the Round Robin LB strategy. I also tested writes, and the writes 
are distributed evenly across the cluster. Below is the output from nodetool. 
Does anyone have a clue what might have happened?

Node 1:
Read Count: 318679
Read Latency: 72.47641436367003 ms.
Write Count: 158680
Write Latency: 0.07918750315099571 ms.
Node 2:
Read Count: 251079
Read Latency: 86.91948475579399 ms.
Write Count: 158450
Write Latency: 0.1744694540864626 ms.
Node 3:
Read Count: 149876
Read Latency: 168.14125553123915 ms.
Write Count: 157896
Write Latency: 0.06468631250949992 ms.

 nodetool ring
Address         DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                    113427455640312821154458202477256070485
10.1.3.152      datacenter1 rack1  Up      Normal  35.85 GB   100.00%              0
10.1.3.153      datacenter1 rack1  Up      Normal  35.86 GB   100.00%              56713727820156410577229101238628035242
10.1.3.155      datacenter1 rack1  Up      Normal  35.85 GB   100.00%              113427455640312821154458202477256070485


Keyspace: benchmark:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]

I am really confused by the Read Count number from nodetool cfstats

Really appreciate any hints.
-Wei



 From: Wei Zhu wz1...@yahoo.com
To: Cassandr usergroup user@cassandra.apache.org 
Sent: Thursday, November 8, 2012 9:37 PM
Subject: read request distribution
 

Hi All,
I am doing a benchmark on Cassandra. I have a three node cluster with RF=3. I 
generated 6M rows with sequence numbers from 1 to 6M, so the rows should be 
evenly distributed among the three nodes, disregarding the replicas.

I am doing a benchmark with read-only requests; I generate read requests for 
randomly generated keys from 1 to 6M. Oddly, nodetool cfstats reports that one 
node has only half the requests of another and the third node sits in the 
middle, so the ratio is roughly 2:3:4. The node with the most read requests 
actually has the smallest latency and the one with the fewest read requests 
reports the largest latency. The difference is pretty big; the fastest is 
almost double the slowest.

All three nodes have exactly the same hardware and the data size on each node 
is the same since the RF is three and all of them have the complete data. 
I am using Hector as the client and the random read requests are in the 
millions. I can't think of a reasonable explanation. Can someone please shed 
some light?

Thanks.
-Wei

Retrieve Multiple CFs from Range Slice

2012-11-09 Thread Chris Larsen
Hi! Is there a way to retrieve the columns for all column families on a
given row while fetching range slices? My keyspace has two column families
and when I'm scanning over the rows, I'd like to be able to fetch the
columns in both CFs while iterating over the keys so as to avoid having to
run two scan operations. When I set the CF to an empty string, a la
ColumnParent.setColumn_family(), it throws an error: "non-empty
columnfamily is required". (Using the Thrift API directly from Java on Cass
1.1.6.) My HBase scans can return both CFs per row, so it works nicely.
Thanks!



Re: Retrieve Multiple CFs from Range Slice

2012-11-09 Thread Edward Capriolo
HBase is different in this regard. A table is comprised of multiple
column families, and they can be scanned at once. However, last time I
checked, scanning a table with two column families is still two
seeks across three different column families.

A similar thing can be accomplished in Cassandra by issuing two range
scans (possibly executing them asynchronously in two threads).

I am sure someone will correct me if I am mistaken.
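
A minimal sketch of that approach with the raw Thrift API (which the original
post already uses); the connection setup, CF names and slice predicate below are
illustrative assumptions:

// Hedged sketch: scan the same key range over two column families in parallel.
// Each thread gets its own Cassandra.Client, since Thrift clients are not thread safe.
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class DualRangeScan {
    static List<KeySlice> scan(Cassandra.Client client, String cf, KeyRange range) throws Exception {
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));
        return client.get_range_slices(new ColumnParent(cf), predicate, range, ConsistencyLevel.ONE);
    }

    // Run both scans over the same KeyRange so the returned rows line up by key.
    static void scanBoth(final Cassandra.Client clientA, final Cassandra.Client clientB,
                         final KeyRange range) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<KeySlice>> first = pool.submit(new Callable<List<KeySlice>>() {
            public List<KeySlice> call() throws Exception { return scan(clientA, "cf_one", range); }
        });
        Future<List<KeySlice>> second = pool.submit(new Callable<List<KeySlice>>() {
            public List<KeySlice> call() throws Exception { return scan(clientB, "cf_two", range); }
        });
        List<KeySlice> rowsA = first.get();   // rows and columns from cf_one
        List<KeySlice> rowsB = second.get();  // rows and columns from cf_two
        pool.shutdown();
        // Merge rowsA and rowsB by row key in application code.
    }
}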


On Fri, Nov 9, 2012 at 11:46 PM, Chris Larsen clar...@euphoriaaudio.com wrote:
 Hi! Is there a way to retrieve the columns for all column families on a
 given row while fetching range slices? My keyspace has two column families
 and when I’m scanning over the rows, I’d like to be able to fetch the
 columns in both CFs while iterating over the keys so as to avoid having to
 run two scan operations. When I set the CF to an empty string, ala
 ColumnParent.setColumn_family(), it throws an error “non-empty
 columnfamily is required”. (Using the Thrift API directly from JAVA on Cass
 1.1.6) My HBase scans can return both CFs per row so it works nicely.
 Thanks!