[RELEASE] Apache Cassandra 1.2.2 released

2013-02-25 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.2.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/fyjra (CHANGES.txt)
[2]: http://goo.gl/yTibi (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: disabling bloomfilter not working? or did I do this wrong?

2013-02-25 Thread Hiller, Dean
Hmmm, ok, that makes sense.  I suspect the same is true with leveled
compaction as well?

Thanks,
Dean

On 2/25/13 6:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

Mostly, but not 100%. You have a bloom filter for each sstable, so
going to disk means checking for the row in each sstable; if you end up
skipping some, you are better off. Sometimes you have the data, but not
in sstable N. The bloom filter helps avoid checking sstable N only to find
nothing.
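
A quick way to see how much those per-sstable filters are helping on a given
node is nodetool cfstats, which reports the bloom filter space used, false
positive count and false positive ratio per column family (the grep filter
below is just one way to narrow the output; mycf is a placeholder name):

 nodetool cfstats | grep -A 25 'Column Family: mycf'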



Re: Size Tiered - Leveled Compaction

2013-02-25 Thread Alain RODRIGUEZ
I am confused.  I thought running compact turns off the minor compactions
and users are actually supposed to run upgradesstables  (maybe I am on
old documentation?)

Well, that's not true. What happens is that compaction uses sstables of
approximately the same size. So if you run a major compaction on a 10GB CF, you
have almost no chance of getting that (big) sstable compacted again. You
will have to wait for other sstables to reach this size or run another
major compaction.

But anyway, this doesn't apply here because we are speaking of LCS
(leveled compaction strategy), which works differently from the traditional
STCS (size-tiered compaction strategy).

Not sure about it, but you may run upgradesstables or compact to rebuild
your sstables after switching from STCS to LCS; I mean, both methods trigger
an initialization of LCS on the old sstables.
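
For reference (I have not double-checked every version's exact options), the
per-CF invocations look like the following, with placeholder keyspace and
column family names:

 nodetool upgradesstables MyKeyspace MyCF
 nodetool compact MyKeyspace MyCF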

Alain


2013/2/25 Hiller, Dean dean.hil...@nrel.gov

 I am confused.  I thought running compact turns off the minor compactions
 and users are actually supposed to run upgradesstables  (maybe I am on
 old documentation?)

 Can someone verify that?

 Thanks,
 Dean

 From: Michael Theroux mthero...@yahoo.com
 Reply-To: user@cassandra.apache.org
 Date: Sunday, February 24, 2013 7:45 PM
 To: user@cassandra.apache.org
 Subject: Re: Size Tiered - Leveled Compaction

 Aaron,

 Thanks for the response.  I think I speak for many Cassandra users when I
 say we greatly appreciate your help with our questions and issues.  For the
 specific bug I mentioned, I found this comment :

 http://data.story.lu/2012/10/15/cassandra-1-1-6-has-been-released

 Automatic fixing of overlapping leveled sstables (CASSANDRA-4644)

 Although I had difficulty putting 2 and 2 together from the comments in
 4644 (it mentioned being fixed in 1.1.6, but also being not reproducible).

 We converted two column families yesterday (two we believe would be
 particularly well suited for Leveled Compaction).  We have two more to
 convert, but those will wait until next weekend.  So far no issues, and,
 we've seen some positive results.

 To help answer some of my own questions I posed in this thread, and others
 have expressed interest in knowing, the steps we followed were:

 1) Perform the proper alter table command:

 ALTER TABLE X WITH compaction_strategy_class='LeveledCompactionStrategy'
 AND  compaction_strategy_options:sstable_size_in_mb=10;

 2) Ran compact on all nodes

 nodetool compact keyspace X
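
 (For anyone doing this on a newer version with CQL3, the same change should be
 expressible with the compaction map — syntax as I recall it from the 1.2-era
 docs, not re-tested here:

 ALTER TABLE X WITH compaction =
   {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 10};

 followed by the same nodetool compact pass on each node.)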

 We converted one column family at a time, and temporarily disabled some
 maintenance activities we perform, to decrease load while we converted
 column families, as the compaction was resource heavy and I wanted to
 interfere with our operational activities as little as possible.  In our
 case, the compaction after altering the schema took about an hour and a
 half.

 Thus far, it appears everything worked without a hitch.  I chose 10 MB for
 the sstable size, based on Wei's feedback (whose data size is on par with
 ours), and other tidbits I found through searching.  Based on issues
 people have reported in the relatively distant past, I made sure that we've
 been handling the compaction load properly, and I've run test repairs on
 the specific tables we converted.  We also tested restarting a node after
 the conversion.

 Again, I believe the tables we converted were particularly well suited for
 Leveled Compaction.  These particular column families were situations where
 reads outstripped writes by an order of magnitude or two.

 So far, our results have been very positive.  We've seen a greater than
 50% reduction in read I/O, and a large improvement in performance for some
 activities.  We've also seen an improvement in memory utilization.  I
 imagine others' mileage may vary.

 If everything is stable over the next week, we will convert the last two
 tables we are considering for Leveled Compaction.

 Thanks again!
 -Mike

 On Feb 24, 2013, at 8:56 PM, aaron morton wrote:

 If you did not use LCS until after the upgrade to 1.1.9 I think you are ok.

 If in doubt the steps here look like they helped
 https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13456137&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456137

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 23/02/2013, at 6:56 AM, Mike mthero...@yahoo.com wrote:

 Hello,

 Still doing research before we potentially move one of our column families
 from Size Tiered to Leveled compaction this weekend.  I was doing some
 research around some of the bugs that were filed against leveled compaction
 in Cassandra and I found this:

 

[RESULT] [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.1-1

2013-02-25 Thread Stephen Connolly
Result

+1: Stephen Connolly, Mikhail Mazursky
0: Fred Cooke
-1:

-Stephen


On 14 February 2013 09:28, Stephen Connolly stephen.alan.conno...@gmail.com
 wrote:

 Hi,

 I'd like to release version 1.2.1-1 of Mojo's Cassandra Maven Plugin
 to sync up with the 1.2.1 release of Apache Cassandra.

 We solved 1 issue:

 http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=19089

 Staging Repository:
 https://nexus.codehaus.org/content/repositories/orgcodehausmojo-015/

 Site:
 http://mojo.codehaus.org/cassandra-maven-plugin/index.html

 SCM Tag:
 https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.1-1@17931

  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
 says it looks fine too.
  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
 follow somebody else if only I could decide who
  [ ] -1 No! wait up there I have issues (in general like, ya know,
 and being a trouble-maker is only one of them)

 The vote is open for 72h and will succeed by lazy consensus.

 Guide to testing staged releases:
 http://maven.apache.org/guides/development/guide-testing-releases.html

 Cheers

 -Stephen

 P.S.
  In the interest of ensuring (more is) better testing, and as is now
 tradition for Mojo's Cassandra Maven Plugin, this vote is
 also open to any subscribers of the dev and user@cassandra.apache.org
 mailing lists that want to test or use this plugin.



Re: Q on schema migrations

2013-02-25 Thread Igor

On 02/22/2013 07:47 PM, aaron morton wrote:

 dropped this secondary index after a while.

I assume you use UPDATE COLUMN FAMILY in the CLI.

yes

 How can I avoid this secondary index building on node join?

Check the schema using show schema in the cli.

I see no indexes for the CF in show schema.

Check that all nodes in the cluster have the same schema, using 
describe cluster in the cli.
If they are in disagreement see this 
http://wiki.apache.org/cassandra/FAQ#schema_disagreement

Yes, all nodes agree on a single schema version.


Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 5:17 AM, Igor i...@4friends.od.ua wrote:



Hello

Cassandra 1.0.7

Some time ago we used a secondary index on one of our CFs. For 
performance reasons we dropped this secondary index after a while. But 
now, each time I add and bootstrap a new node, I see Cassandra build 
this secondary index again on that node (which takes a huge amount of 
time), and once the index is built it is not used anymore, so I can 
safely delete the files from disk.
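
(For context, dropping an index in the 1.0-era CLI generally means redefining 
the column's metadata without an index_type — a sketch with placeholder 
keyspace, CF and column names:

 [default@MyKeyspace] update column family MyCF
     with column_metadata = [{column_name: status, validation_class: UTF8Type}];

i.e. the column is kept but its index definition is removed from the schema.)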


How can I avoid this secondary index building on node join?

Thanks for your answers!






Re: Size Tiered - Leveled Compaction

2013-02-25 Thread Alain RODRIGUEZ
After running a major compaction, automatic minor compactions are no
longer triggered,

... Because of the size difference between the big sstable generated and
the new sstable flushed/compacted. Compactions are not stopped, they are
just no longer triggered for a while.

frequently requiring you to manually run major compactions on a routine
basis

... In order to keep a good read latency. If you don't run compactions
periodically and you have some row updates, you will have an increasing
number of rows spread across various sstables. But my guess is that if you
have no deletes, no updates and no TTLs, only write-once rows, you may keep
this big sstable uncompacted for as long as you want without any read
performance degradation.

I think the documentation just doesn't go deep enough in the explanation, or
maybe this information already exists somewhere else in the documentation.

Wait for confirmation from an expert, I am just a humble user.

Alain


2013/2/25 Hiller, Dean dean.hil...@nrel.gov

 So what you are saying is that this documentation is not quite accurate,
 then…. (I am now more confused, between your statement and the documentation.)

 http://www.datastax.com/docs/1.1/operations/tuning

 Which says "After running a major compaction, automatic minor compactions
 are no longer triggered, frequently requiring you to manually run major
 compactions on a routine basis".

 Which implied that you have to keep running major compactions and that minor
 compactions are not kicking in anymore :( :( and we (my project) want minor
 compactions to continue.

 Thanks,
 Dean


 From: Alain RODRIGUEZ arodr...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, February 25, 2013 7:15 AM
 To: user@cassandra.apache.org
 Subject: Re: Size Tiered - Leveled Compaction

 I am confused.  I thought running compact turns off the minor compactions
 and users are actually supposed to run upgradesstables  (maybe I am on
 old documentation?)

 Well, that's not true. What happens is that compaction uses sstables of
 approximately the same size. So if you run a major compaction on a 10GB CF, you
 have almost no chance of getting that (big) sstable compacted again. You
 will have to wait for other sstables to reach this size or run another
 major compaction.

 But anyway, this doesn't apply here because we are speaking of LCS
 (leveled compaction strategy), which works differently from the traditional
 STCS (size-tiered compaction strategy).

 Not sure about it, but you may run upgradesstables or compact to rebuild
 your sstables after switching from STCS to LCS; I mean, both methods trigger
 an initialization of LCS on the old sstables.

 Alain


 2013/2/25 Hiller, Dean dean.hil...@nrel.gov
 I am confused.  I thought running compact turns off the minor compactions
 and users are actually supposed to run upgradesstables  (maybe I am on
 old documentation?)

 Can someone verify that?

 Thanks,
 Dean

 From: Michael Theroux mthero...@yahoo.com
 Reply-To: user@cassandra.apache.org
 Date: Sunday, February 24, 2013 7:45 PM
 To: user@cassandra.apache.org
 Subject: Re: Size Tiered - Leveled Compaction

 Aaron,

 Thanks for the response.  I think I speak for many Cassandra users when I
 say we greatly appreciate your help with our questions and issues.  For the
 specific bug I mentioned, I found this comment :

 http://data.story.lu/2012/10/15/cassandra-1-1-6-has-been-released

 Automatic fixing of overlapping leveled sstables (CASSANDRA-4644)

 Although I had difficulty putting 2 and 2 together from the comments in
 4644 (it mentioned being fixed in 1.1.6, but also being not reproducible).

 We converted two column families yesterday (two we believe would be
 particularly well suited for Leveled Compaction).  We have two more to
 convert, but those will wait until next weekend.  So far no issues, and,
 we've seen some positive results.

 To help answer some of my own questions I posed in this thread, and others
 have expressed interest in knowing, the steps we followed were:

 1) Perform the proper alter table command:

 ALTER TABLE X WITH compaction_strategy_class='LeveledCompactionStrategy'
 AND  compaction_strategy_options:sstable_size_in_mb=10;

 2) Ran compact on all nodes

 nodetool compact keyspace X

 

Re: Size Tiered - Leveled Compaction

2013-02-25 Thread Hiller, Dean
Sweet, thanks for the info.
Dean

From: Alain RODRIGUEZ arodr...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 25, 2013 7:41 AM
To: user@cassandra.apache.org
Subject: Re: Size Tiered - Leveled Compaction

After running a major compaction, automatic minor compactions are no longer 
triggered,

... Because of the size difference between the big sstable generated and the 
new sstable flushed/compacted. Compactions are not stopped, they are just no 
longer triggered for a while.

frequently requiring you to manually run major compactions on a routine basis

... In order to keep a good read latency. If you don't run compactions 
periodically and you have some row updates, you will have an increasing number 
of rows spread across various sstables. But my guess is that if you have no 
deletes, no updates and no TTLs, only write-once rows, you may keep this big 
sstable uncompacted for as long as you want without any read performance 
degradation.

I think the documentation just doesn't go deep enough in the explanation, or 
maybe this information already exists somewhere else in the documentation.

Wait for confirmation from an expert, I am just a humble user.

Alain


2013/2/25 Hiller, Dean dean.hil...@nrel.gov
So what you are saying is that this documentation is not quite accurate, then…. 
(I am now more confused, between your statement and the documentation.)

http://www.datastax.com/docs/1.1/operations/tuning

Which says "After running a major compaction, automatic minor compactions are 
no longer triggered, frequently requiring you to manually run major compactions 
on a routine basis".

Which implied that you have to keep running major compactions and that minor 
compactions are not kicking in anymore :( :( and we (my project) want minor 
compactions to continue.

Thanks,
Dean


From: Alain RODRIGUEZ arodr...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 25, 2013 7:15 AM
To: user@cassandra.apache.org
Subject: Re: Size Tiered - Leveled Compaction

I am confused.  I thought running compact turns off the minor compactions and 
users are actually supposed to run upgradesstables  (maybe I am on old 
documentation?)

Well, that's not true. What happens is that compaction uses sstables of 
approximately the same size. So if you run a major compaction on a 10GB CF, you 
have almost no chance of getting that (big) sstable compacted again. You will 
have to wait for other sstables to reach this size or run another major 
compaction.

But anyway, this doesn't apply here because we are speaking of LCS (leveled 
compaction strategy), which works differently from the traditional STCS 
(size-tiered compaction strategy).

Not sure about it, but you may run upgradesstables or compact to rebuild your 
sstables after switching from STCS to LCS; I mean, both methods trigger an 
initialization of LCS on the old sstables.

Alain


2013/2/25 Hiller, Dean dean.hil...@nrel.gov
I am confused.  I thought running compact turns off the minor compactions and 
users are actually supposed to run upgradesstables  (maybe I am on old 
documentation?)

Can someone verify that?

Thanks,
Dean

From: Michael Theroux mthero...@yahoo.com
Reply-To: user@cassandra.apache.org
Date: Sunday, February 24, 2013 7:45 PM
To: user@cassandra.apache.org

Pig_cassandra : Map task only running on one node

2013-02-25 Thread Шамим
Dear users,
  We have got very strange behaviour of the Hadoop cluster after upgrading 
Cassandra from 1.1.5 to Cassandra 1.2.1. We have a 5-node Cassandra cluster, 
where three of the nodes are Hadoop slaves. Now when we submit a job through a 
Pig script, only one map task runs on one of the Hadoop slaves, regardless of 
the volume of data (already tried with more than a million rows).
The Pig configuration is as follows:
export PIG_HOME=/oracle/pig-0.10.0
export PIG_CONF_DIR=${HADOOP_HOME}/conf
export PIG_INITIAL_ADDRESS=192.168.157.103
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

Also we have these following properties in hadoop:
 <property>
   <name>mapred.tasktracker.map.tasks.maximum</name>
   <value>10</value>
 </property>
 <property>
   <name>mapred.map.tasks</name>
   <value>4</value>
 </property>
Any hint or suggestion will be much appreciated.
Thanks in advance
Shamim


how to read only from local DC without LOCAL_QUORUM?

2013-02-25 Thread Igor

Hello!

We have a 1.0.7 multi-DC Cassandra setup with strict time limits for reads 
(15ms). We use RF=1 per DC and read with CL=ONE. Data in the datacenters 
is in sync, but we have the following problem:
when the application looks for a key which is not yet in the database, the 
coordinator waits for digests from remote datacenters, which breaks our time limits.


I know that we can use RF:3 per DC and read with LOCAL_QUORUM to 
restrict reads to the local DC only, but RF:3 is not acceptable for us.


Can we somehow force Cassandra not to look up keys in the remote DC?

Thanks for your answers!


Issues with describe_splits_ex

2013-02-25 Thread Hermán J. Camarena
Hi,
I'm trying to use describe_splits_ex to get splits for local records only.  
When I call it, I always get a list with only one CfSplit.  The start_token and 
end_token are always the same as the ones I passed as input, and row_count is 
always 128.  I'm using 1.1.9.  What am I doing wrong?
Thanks,
Hermán




Re: how to read only from local DC without LOCAL_QUORUM?

2013-02-25 Thread Derek Williams
You should be able to use LOCAL_QUORUM with RF=1. Did you try it and get
some error?
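
With NetworkTopologyStrategy and RF=1 per DC, LOCAL_QUORUM only needs the one
local replica, so the coordinator should not block on the remote DC. On 1.0.x
the consistency level is set per request in the client API; on a newer cqlsh it
would look roughly like this (table and key names are placeholders):

 CONSISTENCY LOCAL_QUORUM;
 SELECT * FROM mytable WHERE key = 'somekey';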


On Mon, Feb 25, 2013 at 10:01 AM, Igor i...@4friends.od.ua wrote:

 Hello!

 We have a 1.0.7 multi-DC Cassandra setup with strict time limits for reads
 (15ms). We use RF=1 per DC and read with CL=ONE. Data in the datacenters is
 in sync, but we have the following problem:
 when the application looks for a key which is not yet in the database, the
 coordinator waits for digests from remote datacenters, which breaks our time limits.

 I know that we can use RF:3 per DC and read with LOCAL_QUORUM to restrict
 reads to the local DC only, but RF:3 is not acceptable for us.

 Can we somehow force Cassandra not to look up keys in the remote DC?

 Thanks for your answers!




-- 
Derek Williams


Re: disabling bloomfilter not working? memory numbers don't add up?

2013-02-25 Thread Hiller, Dean
Hmmm, my upgrade completed, and then I added the node back in and ran my repair.  
What is weird is that my nreldata column family still shows 156 MB of memory 
in use (down from 2 gig though!!) and a false positive ratio of .99576 
when I have the filter completely disabled (i.e. set to 1.0).  I see the 
*Filter.db files on disk (and their size approximately matches the in-memory 
size).  I tried restarting the node as well.
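
(For the on-disk check, something like the following is enough — the data 
directory layout varies with the Cassandra version and data_file_directories 
setting, so treat the path as a placeholder:

 ls -lh /var/lib/cassandra/data/<keyspace>/<columnfamily>/*Filter.db
)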

1. Can I stop the node, delete the *Filter.db files and restart the node (is 
this safe)???
2. Why do I have 5 gig being eaten up by Cassandra?  nodetool info shows memory 
5.2 GB, key cache 11 MB and row cache 0 bytes.   All bloom filters are also 
small (about 1 MB).

The exception to #2 is that nreldata is still using 156 MB for some reason, but 
that is still nowhere close to the 5.2 GB that nodetool shows in use.

Thanks,
Dean




Bloom Filter Space Used: 2318392048
Just to be sane do a quick check of the -Filter.db files on disk for this CF.
If they are very small try a restart on the node.

Number of Keys (estimate): 1249133696
Hey a billion rows on a node, what an age we live in :)

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 4:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

So in the cli, I ran

update column family nreldata with bloom_filter_fp_chance=1.0;

Then I ran

nodetool upgradesstables databus5 nreldata;

But my bloom filter size is still around 2 gig (and I want to free up this 
heap), according to the nodetool cfstats command…

Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Postives: 974393
Bloom Filter False Ratio: 0.8
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143








cluster with cross data center and local

2013-02-25 Thread Keith Wright
Hi all,

   I have a cluster with 2 data centers and an RF 2 keyspace using network 
topology on 1.1.10.  I would like to configure it such that some of the data is 
not replicated across data centers but is replicated between the nodes of the 
local data center.  I assume my only options are to create another cluster or 
to create another keyspace using the LocalStrategy strategy?  What's the 
difference between LocalStrategy and SimpleStrategy?
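
For illustration, the second-keyspace approach usually means a 
NetworkTopologyStrategy keyspace whose strategy_options only name the local 
data center — a cassandra-cli sketch with placeholder keyspace and DC names 
(worth double-checking against the 1.1 docs):

 create keyspace local_only
   with placement_strategy = 'NetworkTopologyStrategy'
   and strategy_options = {DC1 : 2};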

Thanks!


Understanding system.log

2013-02-25 Thread Víctor Hugo Oliveira Molinar
Hello everyone!
I'd like to know if there is any guide to or description of the Cassandra
server log (system.log).
I mean, how should I interpret each log event, and what information can I
take from it?


1.2.2 as primary storage?

2013-02-25 Thread Chris Dean
I've been away from Cassandra for a while and wondered what the
consensus is on using 1.2.2 as a primary data store?  

Our app has a typical OLTP workload but we have high availability
requirements.  The data set is just under 1TB and I don't see us growing
to more than a small Cassandra cluster.

I have run 0.7.0 on a 3 node cluster in production and that was fine,
but it was a different sort of application.

Thanks!

(FWIW, our second choice would be to run PG and shard at the app level.)

Cheers,
Chris Dean


Re: 1.2.2 as primary storage?

2013-02-25 Thread Michael Kjellman
How big will each mutation be roughly? 1MB, 5MB, 16MB?

On 2/25/13 3:32 PM, Chris Dean ctd...@sokitomi.com wrote:

I've been away from Cassandra for a while and wondered what the
consensus is on using 1.2.2 as a primary data store?

Our app has a typical OLTP workload but we have high availability
requirements.  The data set is just under 1TB and I don't see us growing
to more than a small Cassandra cluster.

I have run 0.7.0 on a 3 node cluster in production and that was fine,
but it was a different sort of application.

Thanks!

(FWIW, our second choice would be to run PG and shard at the app level.)

Cheers,
Chris Dean




Re: 1.2.2 as primary storage?

2013-02-25 Thread Chris Dean
Michael Kjellman mkjell...@barracuda.com writes:
 How big will each mutation be roughly? 1MB, 5MB, 16MB?

On the small end.  Say 1MB.

Cheers,
Chris Dean


Re: 1.2.2 as primary storage?

2013-02-25 Thread Michael Kjellman
I do this, and have done it with C*, since 0.86

Pitfalls:
1) Large mutations are a pain, which is why it's not really a recommended
use case for C*, I limit mine to 5MB
2) Repairs and replication can get ugly, because your hints will grow very
quickly if you have a network issue in your DC

Netflix actually has support for chunking binary blobs in Astyanax.
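
For what it's worth, the data-model side of chunking is simple enough to roll
by hand as well — a generic CQL3 sketch (names invented here), where large
values are split into fixed-size pieces client-side so each mutation stays at
the chunk size rather than the full object size:

 CREATE TABLE blob_chunks (
   object_id text,
   chunk_id  int,
   data      blob,
   PRIMARY KEY (object_id, chunk_id)
 );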

I'd say you'll be fine if you plan to have 1MB mutations and only 1-2TB of
total load across your cluster.

On 2/25/13 3:37 PM, Chris Dean ctd...@sokitomi.com wrote:

Michael Kjellman mkjell...@barracuda.com writes:
 How big will each mutation be roughly? 1MB, 5MB, 16MB?

On the small end.  Say 1MB.

Cheers,
Chris Dean




Request trace question

2013-02-25 Thread Ilya Kirnos
Here's a sample request trace (Cassandra 1.2.1), where there's a gap of
almost 60ms between one of the two local quorum nodes receiving a message
and the row cache getting hit.  There's then a further almost 60ms delay
between the response enqueue and the actual send. Please see 54.234.178.159 in
the trace below.  My question is what (besides GC pauses) could be causing
this?  There was no load on the nodes during this request.

Thanks.

tracing on;
CONSISTENCY LOCAL_QUORUM;
select * from Account where key = 'AXNB7rW9q7l4dqOT4gNkcwfU767fcFtW';

Now tracing requests.
Consistency level set to LOCAL_QUORUM.

 key  | name
--+--
 AXNB7rW9q7l4dqOT4gNkcwfU767fcFtW | juicy


Tracing session: 47aaf840-7fa3-11e2-98e2-bbb3d297e375

 activity                                  | timestamp    | source         | source_elapsed
-------------------------------------------+--------------+----------------+----------------
                        execute_cql3_query | 23:30:03,354 |   107.20.35.23 |              0
                         Parsing statement | 23:30:03,354 |   107.20.35.23 |             40
                        Peparing statement | 23:30:03,354 |   107.20.35.23 |            190
          Sending message to /10.87.26.112 | 23:30:03,354 |   107.20.35.23 |            435
           Sending message to /10.35.85.85 | 23:30:03,354 |   107.20.35.23 |            571
      Message received from /23.22.38.255  | 23:30:03,356 |   107.20.35.23 |           2754
   Processing response from /23.22.38.255  | 23:30:03,356 |   107.20.35.23 |           2862
      Message received from /107.20.35.23  | 23:30:03,356 |   23.22.38.255 |             44
                             Row cache hit | 23:30:03,356 |   23.22.38.255 |            203
       Enqueuing response to /107.20.35.23 | 23:30:03,356 |   23.22.38.255 |            281
          Sending message to /10.169.19.28 | 23:30:03,356 |   23.22.38.255 |            384
      Message received from /107.20.35.23  | 23:30:03,356 | 54.234.178.159 |             20
                             Row cache hit | 23:30:03,415 | 54.234.178.159 |          59441
       Enqueuing response to /107.20.35.23 | 23:30:03,415 | 54.234.178.159 |          59554
          Sending message to /10.169.19.28 | 23:30:03,475 | 54.234.178.159 |         119282
    Message received from /54.234.178.159  | 23:30:03,476 |   107.20.35.23 |         122085
  Processing response from /54.234.178.159 | 23:30:03,476 |   107.20.35.23 |             16
                          Request complete | 23:30:03,476 |   107.20.35.23 |         122399

EC2 IP mapping:
10.35.85.85 = 54.234.178.159
10.87.26.112 = 23.22.38.255

-- 
-ilya


Read Perf

2013-02-25 Thread Kanwar Sangha
Hi - I am doing a performance run using a modified YCSB client. I was able to 
populate 8TB on a node and then ran some read workloads. I am seeing an average 
TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question -

Will the read TPS degrade if the data size increases to, say, 20 TB, 50 TB, 100 
TB? If I understand correctly, the read throughput should remain constant 
irrespective of the data size, since we eventually have sorted SSTables and a 
binary search would be done on the index to find the row?
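
One way to sanity-check that on a running node is the per-CF histograms, which 
show how many sstables each read actually touches as the data set grows 
(keyspace/column family names below are placeholders):

 nodetool cfhistograms MyKeyspace MyCF

If the SSTables-per-read numbers stay low as data is added, read latency should 
stay roughly flat; if reads start touching many sstables, throughput will drop.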


Thanks,
Kanwar


Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?

2013-02-25 Thread Arya Goudarzi
No I did not look at nodetool gossipinfo but from the ring on both
pre-upgrade and post upgrade nodes to 1.2.1, what I observed was the
described behavior.

On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman
mkjell...@barracuda.com wrote:

 This was a bug with 1.2.0 but resolved in 1.2.1. Did you take a capture of
 nodetool gossipinfo and nodetool ring by chance?

 On Feb 23, 2013, at 12:26 AM, Arya Goudarzi gouda...@gmail.com wrote:

  Hi C* users,
 
  I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
 noticed from nodetool ring was that the newly upgraded nodes only saw each
 other as Normal and saw the rest of the cluster, which was on 1.1.6, as Down.
 Vice versa was true for the nodes running 1.1.6: they saw each other as
 Normal but saw the 1.2.1 nodes as down. I don't see a note in the upgrade docs
 that this would be an issue. Has anyone else observed this problem?
 
  In the debug logs I could see messages saying attempting to connect to
 node IP and then saying it is down.
 
  Cheers,
  -Arya




Re: Data Model - Additional Column Families or one CF?

2013-02-25 Thread Javier Sotelo
Aaron,

Would 50 CFs be pushing it? According to
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management,
"This has been tested to work across hundreds or even thousands of
ColumnFamilies."

What is the bottleneck, IO?

Thanks,
Javier


On Sun, Feb 24, 2013 at 5:51 PM, Adam Venturella aventure...@gmail.com wrote:

 Thanks Aaron, this was a big help!
 —
 Sent from Mailbox https://bit.ly/SZvoJe for iPhone


 On Thu, Feb 21, 2013 at 9:27 AM, aaron morton aa...@thelastpickle.com wrote:

 If you have a limited / known number (say < 30) of types, I would create
 a CF for each of them.

 If the number of types is unknown or very large I would have one CF with
 the row key you described.

 Generally I avoid data models that require new CF's as the data grows.
 Additionally having different CF's allows you to use different cache
 settings, compactions settings and even storage mediums.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 21/02/2013, at 7:43 AM, Adam Venturella aventure...@gmail.com wrote:

 My data needs only require me to store JSON, and I can handle this in 1
 column family by prefixing row keys with a type, for example:

 comments:{message_id}

 Where comments: represents the prefix and {message_id} represents some
 row key to a message object in the same column family.

 In this case comments:{message_id} would be a wide row using comment
 creation time and descending clustering order to sort the messages as they
 are added.

 My question is, would I be better off splitting comments into their own
 Column Family, or is storing them in with the Messages Column Family
 sufficient? They are all messages, after all.
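
 If it helps to see it concretely, the separate-CF variant would look roughly
 like the following in CQL3 (column names here are made up; the JSON would just
 live in a text or blob column):

 CREATE TABLE comments (
   message_id text,
   created_at timeuuid,
   body       text,
   PRIMARY KEY (message_id, created_at)
 ) WITH CLUSTERING ORDER BY (created_at DESC);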

 Or do Column Families really just provide a nice organizational front for
 data? I'm just storing JSON.