Re: CQL 2, CQL 3 and Thrift confusion
Yup, that was exactly the cause. Somehow I could not figure out why it was downcasing my keyspace name all the time. It may be good to put it somewhere in the reference material with a more detailed explanation. On Sun, Sep 23, 2012 at 9:30 PM, Sylvain Lebresne sylv...@datastax.com wrote: In CQL3, names are case insensitive by default, while they were case sensitive in CQL2. You can however force whatever case you want in CQL3 using double quotes. So in other words, in CQL3, USE "TestKeyspace"; should work as expected. -- Sylvain On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote: Hi, I'm currently using Cassandra 1.1.5. When I create a keyspace from CQL 2 with the command (`cqlsh -2`): CREATE KEYSPACE TestKeyspace WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1 and then try to access it from CQL 3 (`cqlsh -3`): USE TestKeyspace; I get an error: Bad Request: Keyspace 'testkeyspace' does not exist The same thing applies to the Thrift interface. Somehow, I can only access keyspaces created from CQL 2 via the Thrift interface. Basically, I get the same exact error: InvalidRequestException(why:There is no ring for the keyspace: CascadingCassandraCql3) Am I missing some switch? Or maybe it is intended to work that way?... Thanks! -- alex p
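For reference, a minimal sketch of what the double quotes change in CQL3 (the keyspace name is the one from this thread; the CQL2-created keyspace keeps its original mixed case):

  USE TestKeyspace;     -- unquoted: CQL3 folds the name to 'testkeyspace', hence the Bad Request
  USE "TestKeyspace";   -- quoted: case is preserved, so the CQL2-created keyspace is found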
Re: compression
Thanks all, that helps. Will start with one - two CFs and let you know the effect *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sun, Sep 23, 2012 at 8:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: As well as your unlimited column names may all have the same prefix, right? Like accounts.rowkey56, accounts.rowkey78, etc. etc. so the accounts gets a ton of compression then. Later, Dean From: Tyler Hobbs ty...@datastax.com Reply-To: user@cassandra.apache.org Date: Sunday, September 23, 2012 11:46 AM To: user@cassandra.apache.org Subject: Re: compression column metadata, you're still likely to get a reasonable amount of compression. This is especially true if there is some amount of repetition in the column names, values, or TTLs in wide rows. Compression will almost always be beneficial unless you're already somehow CPU bound or are using large column values that are high in entropy, such as pre-compressed or encrypted data.
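To make the setup concrete: enabling Snappy compression with a 64 KB chunk length on an existing column family looked roughly like this in the 1.1-era cassandra-cli (the CF name is a placeholder and option names may vary slightly by version, so treat this as a sketch rather than a definitive reference):

  update column family Documents
    with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

As far as I understand, existing sstables are only rewritten in compressed form when they are next compacted or when nodetool scrub / nodetool upgradesstables is run, so the reported ratio lags behind the setting change.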
Re: any ways to have compaction use less disk space?
Why so? What are the pluses and minuses? As for me, I am looking at the number of files in the directory. 700GB/512MB*5 (files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 700,000 files, that is too much for a single directory, too much memory used for SST data, too huge a compaction queue (that leads to strange pauses, I suppose because of the compactor thinking about what to compact next),... 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This will not only allow you to fill more space, but will also shrink your data much faster in case of updates. Size compaction can give you 3x-4x more space used than there is live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after the 4th week will they all be compacted into one 300GB SSTable. Leveled compaction has tamed space for us. Note that you should set sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me - especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn
DunDDD NoSQL and Big Data
Hi All, I'm organising the NoSQL and Big Data track at Developer Day Dundee: http://dun.dddscotland.co.uk/ This is a free mini conference at Dundee University, Dundee, Scotland. For the past 2 years we've had a track on NoSQL and had some great speakers. However I don't believe we've had anyone from the Cassandra community join us and give a talk. If you're interested, drop me a line and let me know what you're proposing. I should point out, as this is a free conference, we can't pay speakers and unless we get a big sponsor, it's doubtful we can manage much in the way of expenses! Andy Cobley Program director, MSc Business Intelligence and Data Science School of Computing University of Dundee http://www.computing.dundee.ac.uk/ The University of Dundee is a Scottish Registered Charity, No. SC015096.
Cassandra failures while moving token
Hi, The problem is that while we move a token in a 12-node cluster, we observe Cassandra misses (no data returned by Cassandra for the requested row key). Our understanding is that when we move a token, the node will first sync up the data for its newly assigned range, and only after that will it receive requests for the new range. So we are not sure why the cluster gives a miss as soon as we move a token. Is there any way or utility to tell which node a particular row key is fetched from, so we can ensure that the token move completed fine, that the data is lying on the correct new node, and that it is being looked up by the cluster on the correct node? Or, please tell us the best way to change the tokens in the cluster. Thanks Regards Shashilpi Krishan
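On the "which node serves a given row key" question, one utility worth trying (assuming your version's nodetool has it; keyspace, column family and key below are placeholders):

  nodetool -h <host> getendpoints <keyspace> <column_family> <row_key>

It prints the replica endpoints that own that key under the current ring, and comparing its output with nodetool ring before and after the move helps confirm that the data ended up where the cluster will look for it.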
workarounds for https://issues.apache.org/jira/browse/CASSANDRA-3741
Are there any tested patches around for fixing this issue in 1.0 branch? I have to do keyspace wide flush every 30 seconds to survive delete-only workload. This is very inefficient. https://issues.apache.org/jira/browse/CASSANDRA-3741
Nodetool repair and Leveled Compaction
Hi Guys, We've noticed strange behavior on our 3-node staging Cassandra cluster with RF=2 and LeveledCompactionStrategy. When we run nodetool repair keyspace cfname -pr on a node, the other nodes start the validation process, and when that process finishes one of the other 2 nodes reports several hundred pending compaction tasks, and the total disk space used by the column family (in JMX) is doubled. The repair process itself runs fine in the background, but the issue I'm concerned about is the large number of seemingly unnecessary compaction tasks and the doubled disk space on one of the good nodes. Is such behavior by design, or is it a bug?
Re: Varchar indexed column and IN(...)
On Sun, Sep 23, 2012 at 11:30 PM, aaron morton aa...@thelastpickle.com wrote: If this is intended behavior, could somebody please point me to where this is documented? It is intended. It is not in fact. We should either refuse the query as yet unsupported or we should do the right thing, but returning nothing silently is wrong. I've created https://issues.apache.org/jira/browse/CASSANDRA-4709 to fix that. -- Sylvain
Re: downgrade from 1.1.4 to 1.0.X
On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote: No. They use different minor file versions which are not backwards compatible. Thanks Aaron. Is upgradesstables capable of downgrading the files to 1.0.8? Looking for a way to make this work. Regards, Arend-Jan On 18/09/2012, at 11:18 PM, Arend-Jan Wijtzes ajwyt...@wise-guys.nl wrote: Hi, We are running Cassandra 1.1.4 and like to experiment with Datastax Enterprise which uses 1.0.8. Can we safely downgrade a production cluster or is it incompatible? Any special steps involved? -- Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl
[BETA RELEASE] Apache Cassandra 1.2.0-beta1 released
The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.2.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will become Cassandra 1.2 and to get wider testing before the final release. As such, it is likely not bug free but all help in testing this beta would be greatly appreciated and will help make 1.2 a solid release. So please report any problem you may encounter[3,4] with this release. Have a look at the change log[1] and the release notes[2] to see where Cassandra 1.2 differs from the previous series. Apache Cassandra 1.2.0-beta1[5] is available as usual from the cassandra website (http://cassandra.apache.org/download/) and a debian package is available using the 12x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing and have fun with it. [1]: http://goo.gl/qhh8h (CHANGES.txt) [2]: http://goo.gl/Pu9kh (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta1
Re: Correct model
2012/9/23 Hiller, Dean dean.hil...@nrel.gov You need to split data among partitions or your query won't scale as more and more data is added to the table. Having the partition means you are querying a lot fewer rows. This will happen in case I can query just one partition. But if I need to query things in multiple partitions, wouldn't it be slower? He means determine the ONE partition key and query that partition. I.e. if you want just the latest user requests, figure out the partition key based on which month you are in and query it. If you want the latest independent of user, query the correct single partition for the GlobalRequests CF. But in this case, I didn't understand Aaron's model then. My first query is to get all requests for a user. If I did partitions by time, I will need to query all partitions to get the results, right? In his answer it was said I would query ONE partition... If I want all the requests for the user, couldn't I just select all UserRequest records which start with userId? He designed it so the user requests table was completely scalable, so he has partitions there. If you don't have partitions, you could run into a row that is too long. You don't need to design it this way if you know none of your users are going to go into the millions as far as number of requests. In his design then, you need to pick the correct partition and query into that partition. You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something not performant in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows... I really didn't understand why to use partitions. Partitions are a way, if you know your rows will go into the trillions, of breaking them up so each partition has 100k rows or so, or even 1 million, but maxes out in the millions most likely. Without partitions, you hit a limit in the millions. With partitions, you can keep scaling past that as you can have as many partitions as you want. If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node? I thought Cassandra would automatically distribute my data among nodes as I insert rows into a CF. Of course if I use partitions I understand I could query just one partition (node) to get the data, if I have the partition field, but to the best of my knowledge, this is not what happens in my case, right? In the first query I would have to query all the partitions... Or are you saying partitions have nothing to do with nodes?? If 99.999% of my users will have less than 100k requests, would it make sense to partition by user? A multi-get is a query that finds IN PARALLEL all the rows with the matching keys you send to cassandra. If you do 1000 gets (instead of a multi-get) with 1ms latency, you will find it takes 1 second + processing time. If you do ONE multi-get, you only have 1 request and therefore 1ms latency. The other solution is you could send 1000 async gets, but I have a feeling that would be slower with all the marshalling/unmarshalling of the envelope…..it really depends on the envelope size; if we were using http, you would get killed doing 1000 requests instead of 1 with 1000 keys in it. That's cool! :D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right? Out of curiosity, if each get will occur on a different node, would I need to connect to each of the nodes?
Or would I query 1 node and it would communicate to others? Later, Dean From: Marcelo Elias Del Valle mvall...@gmail.commailto: mvall...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Sunday, September 23, 2012 10:23 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Correct model 2012/9/20 aaron morton aa...@thelastpickle.commailto: aa...@thelastpickle.com I would consider: # User CF * row_key: user_id * columns: user properties, key=value # UserRequests CF * row_key: user_id : partition_start where partition_start is the start of a time partition that makes sense in your domain. e.g. partition monthly. Generally want to avoid rows the grow forever, as a rule of thumb avoid rows more than a few 10's of MB. * columns: two possible approaches: 1) If the requests are immutable and you generally want all of the data store the request in a single column using JSON or similar, with the column name a timestamp. 2) Otherwise use a composite column name of timestamp : request_property to store the request in many columns. * In either case consider using Reversed comparators so the most recent columns are first see
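To make the UserRequests idea above concrete, here is a rough CQL3 rendering of it using 1.2-style composite partition keys (table and column names are made up for illustration and are not from Aaron's mail):

  CREATE TABLE user_requests (
      user_id      text,
      month_bucket text,        -- the time partition, e.g. '2012-09'
      request_time timeuuid,
      request_json text,        -- option 1: immutable request stored as a single blob
      PRIMARY KEY ((user_id, month_bucket), request_time)
  ) WITH CLUSTERING ORDER BY (request_time DESC);

The (user_id, month_bucket) pair plays the role of the user_id : partition_start row key, and the reversed clustering order matches the reversed-comparator suggestion so the most recent requests come back first.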
Re: Nodetool repair and Leveled Compaction
Repair process by itself is going well in the background, but the issue I'm concerned about is a lot of unnecessary compaction tasks. The number in the compaction tasks counter is overestimated. For example, I have 1100 tasks left, and if I stop inserting data, all tasks finish within 30 minutes. I suppose this counter is incremented for every sstable which needs compaction, but it's not decremented properly, because you can compact about 20 sstables at once and that reduces the counter only by 1.
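For anyone watching this in practice, the live view of that counter is nodetool compactionstats (host is a placeholder):

  nodetool -h <host> compactionstats

It lists the compactions currently running plus the pending tasks estimate, which is the over-counted number discussed above.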
Re: Correct model
I am confused. In this email you say you want get all requests for a user and in a previous one you said Select all the users which has new requests, since date D so let me answer both… For latter, you make ONE query into the latest partition(ONE partition) of the GlobalRequestsCF which gives you the most recent requests ALONG with the user ids of those requests. If you queried all partitions, you would most likely blow out your JVM memory. For the former, you make ONE query to the UserRequestsCF with userid = your user id to get all the requests for that user You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something non performatic in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows… Sorry, I was skipping some context. A lot of the backing indexing sometimes is done as a long row so in playOrm, too many rows in a partition means == too many columns in the indexing row for that partition. I believe the same is true in cassandra for their indexing. If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node? Cassandra spreads all your data out on all nodes with or without partitions. A single partition does have it's data co-located though. I 99,999% of my users will have less than 100k requests, would it make sense to partition by user? If you are at 100k(and the requests are rather small), you could embed all the requests in the user or go with Aaron's below suggestion of a UserRequestsCF. If your requests are rather large, you probably don't want to embed them in the User. Either way, it's one query or one row key lookup. That's cool! :D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right? Multiget ignores partitions…you feed it a LIST of keys and it gets them. It just so happens that partitionId had to be part of your row key. Out of curiosity, if each get will occur on a different node, I would need to connect to each of the nodes? Or would I query 1 node and it would communicate to others? I have used Hector and now use Astyanax, I don't worry much about that layer, but I feed astyanax 3 nodes and I believe it discovers some of the other ones. I believe the latter is true but am not 100% sure as I have not looked at that code. As an analogy on the above, if you happen to have used PlayOrm, you would ONLY need one Requests table and you partition by user AND time(two views into the same data partitioned two different ways) and you can do exactly the same thing as Aaron's example. PlayOrm doesn't embed the partition ids in the key leaving it free to partition twice like in your case….and in a refactor, you have to map/reduce A LOT more rows because of rows having the FK of partitionidsubrowkey whereas if you don't have partition id in the key, you only map/reduce the partitioned table in a redesign/refactor. That said, we will be adding support for CQL partitioning in addition to PlayOrm partitioning even though it can be a little less flexible sometimes. Also, CQL locates all the data on one node for a partition. We have found it can be faster sometimes with the parallelized disks that the partitions are NOT all on one node so PlayOrm partitions are virtual only and do not relate to where the rows are stored. 
An example on our 6 nodes was a join query on a partition with 1,000,000 rows took 60ms (of course I can't compare to CQL here since it doesn't do joins). It really depends how much data is going to come back in the query though too? There are tradeoff's between disk parallel nodes and having your data all on one node of course. Later, Dean From: Marcelo Elias Del Valle mvall...@gmail.commailto:mvall...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Monday, September 24, 2012 7:45 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Correct model 2012/9/23 Hiller, Dean dean.hil...@nrel.govmailto:dean.hil...@nrel.gov You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying a lot less rows. This will happen in case I can query just one partition. But if I need to query things in multiple partitions, wouldn't it be slower? He means determine the ONE partition key and query that partition. Ie. If you want just latest user requests, figure out the partition key based on which month you are in and query it. If you want the latest independent of user, query the correct single partition for GlobalRequests CF. But in this case, I didn't understand Aaron's model
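As a concrete picture of the multi-get, in CQL3 terms it is a key-IN query against the partitioned table sketched earlier in the thread (names are still the made-up ones; Thrift clients expose the same operation as multiget_slice):

  SELECT * FROM user_requests
   WHERE user_id = 'user42'
     AND month_bucket IN ('2012-08', '2012-09');

One round trip from the client, and the coordinator node fans the lookups out to whichever replicas own each partition.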
Re: Correct model
2012/9/24 Hiller, Dean dean.hil...@nrel.gov I am confused. In this email you say you want get all requests for a user and in a previous one you said Select all the users which has new requests, since date D so let me answer both… I have both needs. These are the two queries I need to perform on the model. For latter, you make ONE query into the latest partition(ONE partition) of the GlobalRequestsCF which gives you the most recent requests ALONG with the user ids of those requests. If you queried all partitions, you would most likely blow out your JVM memory. For the former, you make ONE query to the UserRequestsCF with userid = your user id to get all the requests for that user Now I think I got the main idea! This answered a lot! Sorry, I was skipping some context. A lot of the backing indexing sometimes is done as a long row so in playOrm, too many rows in a partition means == too many columns in the indexing row for that partition. I believe the same is true in cassandra for their indexing. Oh, ok, you were talking about the wide row pattern, right? But playORM is compatible with Aaron's model, isn't it? Can I map exactly this using playORM? The hardest thing for me to use playORM now is I don't know Cassandra well yet, and I know playORM even less. Can I ask playOrm questions in this list? I will try to create a POC here! Only now I am starting to understand what it does ;-) The examples directory is empty for now, I would like to see how to set up the connection with it. Cassandra spreads all your data out on all nodes with or without partitions. A single partition does have it's data co-located though. Now I see. The main advantage of using partitions is keeping the indexes small enough. It has nothing to do with the nodes. Thanks! If you are at 100k(and the requests are rather small), you could embed all the requests in the user or go with Aaron's below suggestion of a UserRequestsCF. If your requests are rather large, you probably don't want to embed them in the User. Either way, it's one query or one row key lookup. I see it now. Multiget ignores partitions…you feed it a LIST of keys and it gets them. It just so happens that partitionId had to be part of your row key. Do you mean I need to load all the keys in memory to do a multiget? I have used Hector and now use Astyanax, I don't worry much about that layer, but I feed astyanax 3 nodes and I believe it discovers some of the other ones. I believe the latter is true but am not 100% sure as I have not looked at that code. Why did you move? Hector is being considered for being the official client for Cassandra, isn't it? I looked at the Astyanax api and it seemed much more high level though As an analogy on the above, if you happen to have used PlayOrm, you would ONLY need one Requests table and you partition by user AND time(two views into the same data partitioned two different ways) and you can do exactly the same thing as Aaron's example. PlayOrm doesn't embed the partition ids in the key leaving it free to partition twice like in your case….and in a refactor, you have to map/reduce A LOT more rows because of rows having the FK of partitionidsubrowkey whereas if you don't have partition id in the key, you only map/reduce the partitioned table in a redesign/refactor. That said, we will be adding support for CQL partitioning in addition to PlayOrm partitioning even though it can be a little less flexible sometimes. I am not sure I understood this part. 
If I need to refactor, having the partition id in the key would be a bad thing? What would be the alternative? In my case, as I use userId : partitionId as row key, this might be a problem, right? Also, CQL locates all the data on one node for a partition. We have found it can be faster sometimes with the parallelized disks that the partitions are NOT all on one node so PlayOrm partitions are virtual only and do not relate to where the rows are stored. An example on our 6 nodes was a join query on a partition with 1,000,000 rows took 60ms (of course I can't compare to CQL here since it doesn't do joins). It really depends how much data is going to come back in the query though too? There are tradeoff's between disk parallel nodes and having your data all on one node of course. I guess I am still not ready for this level of info. :D In the playORM readme, we have the following: @NoSqlQuery(name=findWithJoinQuery, query=PARTITIONS t(:partId) SELECT t FROM TABLE as t + INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares :shares), What would happen behind the scenes when I execute this query? You can only use joins with partition keys, right? In this case, is partId the row id of TABLE CF? Thanks a lot for the answers -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
RE: Cassandra Counters
Hi folks, I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my queries pointwise. a) What are the performance tradeoffs on reads and writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters? b) What is the current state of counter limitations in the latest version of Apache Cassandra? c) With there being a possibility of counter values getting out of sync, would counters not be recommended where strong consistency is desired? The normal benefits of Cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. Regards, roshni From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi, I'm trying to understand if counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items, to which I can add or delete a set of items at a time, and I want a count of the items, without considering changing the database or additional components like zookeeper. I have 2 options: the first is a counter col family, and the second is a standard one.

1. List_Counter_CF (row key ListId, a single counter column):
   ListId -> TotalItems = 50

2. List_Std_CF (row key ListId, one column per change):
   ListId -> TimeUUID1 = 3, TimeUUID2 = 70, TimeUUID3 = -20, TimeUUID4 = 3, TimeUUID5 = -6

And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, I'd need to read the row, slice through all columns and add them. In both cases the writes should be fast; in fact the standard col family should be faster as there's no read before write. And for a CL ONE write the latency should be the same. For reads, the first option is very good, just read one column for a key. For the second, the read involves reading the row and adding each column value via application code. I don't think there's a way to do math via CQL yet. There should be no hot spotting, if the key is sharded well. I could even maintain the count derived from the List_Std_CF in a separate column family which is a standard col family with the final number, but I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be...? Is there any number around after how many columns the performance starts deteriorating, or how much worse in performance it would be? The advantage I see is that I can use the same consistency rules as for the rest of the column families. If quorum for reads and writes, then you get strongly consistent values. In case of counters I see that in case of timeout exceptions, because the first replica is down or not responding, there's a chance of the values getting messed up, and re-trying can mess it up further. It's not idempotent like a standard col family design can be. If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?). I believe the rest of the limitations still hold good - has anything changed in recent versions?
In my opinion, they are not as major as the consistency question:
- removing a counter then modifying the value - behaviour is undetermined
- special process for counter col family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes
In short, I'd recommend counters for analytics, or when dealing with data where the exact numbers are not important, or when it's OK to take some time to fix a mismatch and the performance requirements are most important. However, where the numbers should match, it's better to use a std column family and a manual implementation. Please share your thoughts on this. Regards, roshni
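To pin down the two options in code, here is a hedged CQL3 sketch of both write paths (1.2-era syntax; table names roughly follow the ones above):

  -- option 1: counter column family, one increment or decrement per change
  CREATE TABLE list_counter_cf (list_id text PRIMARY KEY, total_items counter);
  UPDATE list_counter_cf SET total_items = total_items + 3 WHERE list_id = 'list1';

  -- option 2: standard column family, one new column per change, summed at read time
  CREATE TABLE list_std_cf (list_id text, change_time timeuuid, delta int,
                            PRIMARY KEY (list_id, change_time));
  INSERT INTO list_std_cf (list_id, change_time, delta) VALUES ('list1', now(), 3);

The counter increment is not idempotent, so a timed-out increment cannot safely be retried, which is exactly the consistency concern raised above; the standard-CF write can be retried safely as long as the client reuses the same TimeUUID, but it moves the summing work to the reader.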
Re: Correct model
Oh, ok, you were talking about the wide row pattern, right? yes But playORM is compatible with Aaron's model, isn't it? Not yet, PlayOrm supports partitioning one table multiple ways as it indexes the columns(in your case, the userid FK column and the time column) Can I map exactly this using playORM? Not yet, but the plan is to map these typical Cassandra scenarios as well. Can I ask playOrm questions in this list? The best place to ask PlayOrm questions is on stack overflow and tag with PlayOrm though I monitor this list and stack overflow for questions(there are already a few questions on stack overflow). The examples directory is empty for now, I would like to see how to set up the connection with it. Running build or build.bat is always kept working and all 62 tests pass(or we don't merge to master) so to see how to make a connection or run an example 1. Run build.bat or build which generates parsing code 2. Import into eclipse (it already has .classpath and .project for you already there) 3. In FactorySingleton.java you can modify IN_MEMORY to CASSANDRA or not and run any of the tests in-memory or against localhost(We run the test suite also against a 6 node cluster as well and all passes) 4. FactorySingleton probably has the code you are looking for plus you need a class called nosql.Persistence or it won't scan your jar file.(class file not xml file like JPA) Do you mean I need to load all the keys in memory to do a multi get? No, you batch. I am not sure about CQL, but PlayOrm returns a Cursor not the results so you can loop through every key and behind the scenes it is doing batch requests so you can load up 100 keys and make one multi get request for those 100 keys and then can load up the next 100 keys, etc. etc. etc. I need to look more into the apis and protocol of CQL to see if it allows this style of batching. PlayOrm does support this style of batching today. Aaron would know if CQL does. Why did you move? Hector is being considered for being the official client for Cassandra, isn't it? At the time, I wanted the file streaming feature. Also, Hector seemed a bit cumbersome as well compared to astyanax or at least if you were building a platform and had no use for typing the columns. Just personal preference really here. I am not sure I understood this part. If I need to refactor, having the partition id in the key would be a bad thing? What would be the alternative? In my case, as I use userId : partitionId as row key, this might be a problem, right? PlayOrm indexes the columns you choose(ie. The ones you want to use in the where clause) and partitions by columns you choose not based on the key so in PlayOrm, the key is typically a TimeUUID or something cluster unique…..any tables referencing that TimeUUID never have to change. With Cassandra partitioning, if you repartition that table a different way or go for some kind of major change(usually done with map/reduce), all your foreign keys may have to change….it really depends on the situation though. Maybe you get the design right and never have to change. @NoSqlQuery(name=findWithJoinQuery, query=PARTITIONS t(:partId) SELECT t FROM TABLE as t + INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares :shares), What would happen behind the scenes when I execute this query? In this case, t or TABLE is a partitioned table since a partition is defined. 
And t.activityTypeInfo refers to the ActivityTypeInfo table which is not partitioned(AND ActivityTypeInfo won't scale to billions of rows because there is no partitioning but maybe you don't need it!!!). Behind the scenes when you call getResult, it returns a cursor that has NOT done anything yet. When you start looping through the cursor, behind the scenes it is batching requests asking for next 500 matches(configurable) so you never run out of memory….it is EXACTLY like a database cursor. You can even use the cursor to show a user the first set of results and when user clicks next pick up right where the cursor left off (if you saved it to the HttpSession). You can only use joins with partition keys, right? Nope, joins work on anything. You only need to specify the partitionId when you have a partitioned table in the list of join tables. (That is what the PARTITIONS clause is for, to identify partitionId = what?)…it was put BEFORE the SQL instead of within it…CQL took the opposite approach but PlayOrm can also join different partitions together as well ;) ). In this case, is partId the row id of TABLE CF? Nope, partId is one of the columns. There is a test case on this class in PlayOrm …(notice the annotation NoSqlPartitionByThisField on the column/field in the entity)… https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/db/PartitionedSingleTrade.java PlayOrm allows partitioned tables AND non-partioned tables(non-partitioned tables won't scale but maybe
RE: Cassandra Counters
IMO you would use Cassandra counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the non-feasibility of centralized counting by figuring the speed at which you need to sustain writes and reads and reconciling that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play, which is distributed counting. In distributed counting, the CAP theorem comes into play: in Cassandra, availability and partition tolerance trump consistency. So yes, you sacrifice strong consistency for availability and partition tolerance; you get eventual consistency.
Re: Correct model
Dean, There is one last thing I would like to ask about PlayOrm on this list; the next questions will go to Stack Overflow. Just because of the context, I prefer asking this here: when you say PlayOrm indexes a table (which would be a CF behind the scenes), what do you mean? Will PlayOrm automatically create a CF to index my CF? Will it auto-manage it, like Cassandra's secondary indexes? In Cassandra, the application is responsible for maintaining the index, right? I might be wrong, but unless I am using secondary indexes I need to update index values manually, right? I got confused when you said PlayOrm indexes the columns you choose. How do I choose, and what exactly does it mean? Best regards, Marcelo Valle.
Re: Correct model
PlayOrm will automatically create a CF to index my CF? It creates 3 CFs for all indices, IntegerIndice, DecimalIndice, and StringIndice, such that the ad-hoc tool that is in development can display the indices: it knows the prefix of the composite column name is an Integer, Decimal or String, and it knows the postfix type as well, so it can translate back from bytes to the types and properly display them in a GUI (i.e. on top of SELECT, the ad-hoc tool is adding a way to view the indice rows so you can check whether they got corrupted or not). Will it auto-manage it, like Cassandra's secondary indexes? YES. Further detail… You annotate fields with @NoSqlIndexed and PlayOrm adds/removes from the index as you add/modify/remove the entity…..a modify does a remove of the old value from the index and an insert of the new value into the index. An example: PlayOrm stores all long, int, short, byte values in a type that uses the least amount of space, so IF you have a long OR BigInteger between -128 and 128 it only ends up storing 1 byte in cassandra (SAVING tons of space!!!). Then if you are indexing a type that is one of those, PlayOrm creates an IntegerIndice table. Right now, another guy is working on playorm-server, which is a web GUI to allow ad-hoc access to all your data as well, so you can run ad-hoc queries to see data, and instead of showing hex, it shows the real values by translating the bytes to String - for the schema portions that it is aware of, that is. Later, Dean
Re: Is it possible to create a schema before a Cassandra node starts up ?
On Fri, Sep 14, 2012 at 7:05 AM, Xu, Zaili z...@pershing.com wrote: I am pretty new to Cassandra. I have a script that needs to set up a schema first before starting up the cassandra node. Is this possible ? Can I create the schema directly on cassandra storage and then when the node starts up it will pick up the schema ? Aaron gave you the scientific answer, which is that you can't load schema without starting a node. However if you : 1) start a node for the first time 2) load schema 3) call nodetool drain so all system keyspace CFs are guaranteed to be flushed to sstables 4) then, from your script, start that node (or a node with identical configuration) using the flushed system sstables (directly on the storage) You can set up a schema before starting up the cassandra node or having a cassandra node or cluster running all the time. This might be useful in for example testing contexts... =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
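Spelled out as commands, Rob's sequence is roughly the following (schema.cql and the data directory path are placeholders; exact paths depend on your cassandra.yaml):

  # 1. start a throwaway node with the target configuration, then load the schema
  cqlsh -f schema.cql
  # 2. force the system keyspace to be flushed to sstables
  nodetool drain
  # 3. copy the flushed system keyspace sstables (e.g. under /var/lib/cassandra/data/system/)
  #    into place before starting the real node; it comes up with the schema already defined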
Re: Correct model
Dean, this sounds like magic :D I don't know the details about the performance of the index implementations you chose, but it could pay off to use it in my case, as I don't need the best read performance in the world, but I do need to ensure scalability and keep a simple model to maintain. I liked the PlayOrm concept regarding this. I have more questions, but I will ask them on Stack Overflow from now on.
Prevent queries from OOM nodes
Is there anything I can do on the configuration side to prevent nodes from going OOM due to queries that will read large amounts of data and exceed the heap available? For the past few days we had some nodes consistently freezing/crashing with OOM. We got a heap dump into MAT and figured out the nodes were dying due to some queries for a few extremely large data sets. Tracked it back to an app that just didn't prevent users from doing these large queries, but it seems like Cassandra could be smart enough to guard against this type of thing? Basically some kind of setting like: if the data to satisfy the query > available heap, then throw an error to the caller and abort the query. I would much rather return errors to clients than crash a node, as the error is easier to track down and resolve that way. Thanks.
Cassandra compression not working?
Hello, We are running into an unusual situation that I'm wondering if anyone has any insight on. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. We enabled compression on the column family, utilizing the SnappyCompressor and a 64k chunk length. It was recently discovered that Cassandra was reporting a compression ratio of 0. I took a snapshot of the data and started a cassandra node in isolation to investigate. Running nodetool scrub or nodetool upgradesstables had little impact on the amount of data that was being stored. I then disabled compression and ran nodetool upgradesstables on the column family. Again, no impact on the data size stored. I then re-enabled compression and ran nodetool upgradesstables on the column family. This resulted in a 60% reduction in the data size stored, and Cassandra reporting a compression ratio of about .38. Any idea what is going on here? Obviously I can go through this process in production to enable compression, however, any idea what is currently happening and why new data does not appear to be compressed? Any insights are appreciated, Thanks, -Mike
Re: Cassandra compression not working?
I forgot to mention we are running Cassandra 1.1.2. Thanks, -Mike
performance for different kinds of row keys
Suppose two cases:
1. I have a Cassandra column family with non-composite row keys = incremental id
2. I have a Cassandra column family with composite row keys = incremental id 1 : group id
Which one will be faster to insert? And which one will be faster to read by incremental id? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
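For reference, and only as a rough sketch (the column family names are made up, and Hector is assumed because other snippets in this digest use it), the two key shapes look roughly like this. With RandomPartitioner both keys are hashed whole, so inserts should cost about the same; but with the composite key you can only read a row if you know both components, since you cannot look it up by the incremental id alone.

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class KeyShapes {
        // Case 1: plain row key = incremental id
        static void insertPlain(Keyspace ks, long id, String value) {
            Mutator<Long> m = HFactory.createMutator(ks, LongSerializer.get());
            m.addInsertion(id, "PlainKeyCF",
                    HFactory.createColumn("data", value,
                            StringSerializer.get(), StringSerializer.get()));
            m.execute();
        }

        // Case 2: composite row key = (incremental id, group id)
        static void insertComposite(Keyspace ks, long id, long groupId, String value) {
            Composite rowKey = new Composite();
            rowKey.addComponent(id, LongSerializer.get());
            rowKey.addComponent(groupId, LongSerializer.get());

            Mutator<Composite> m = HFactory.createMutator(ks, new CompositeSerializer());
            m.addInsertion(rowKey, "CompositeKeyCF",
                    HFactory.createColumn("data", value,
                            StringSerializer.get(), StringSerializer.get()));
            m.execute();
        }
    }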
Re: Code example for CompositeType.Builder and SSTableSimpleUnsortedWriter
Hey... From my understanding, there are several ways to use composites with SSTableSimpleUnsortedWriter, but which is the best? And as usual, code examples are welcome ;) Thanks in advance! On Thu, Sep 20, 2012 at 11:23 PM, Edward Kibardin infa...@gmail.com wrote: Hi Everyone, I'm writing a conversion tool from CSV files to SSTables using SSTableSimpleUnsortedWriter and am unable to find a good example of using CompositeType.Builder with SSTableSimpleUnsortedWriter. It would also be great if someone had sample code for inserting/updating only a single value in a composite (if that is possible at all). A quick Google search didn't help me, so I would be very grateful for a correct sample ;) Thanks in advance, Ed
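Not an authoritative answer, but here is a minimal sketch of one way to combine the two, assuming the Cassandra 1.1.x bulk-loading API used in the DataStax bulk loading blog post; the keyspace/CF names, the (long, text) composite layout and the values are invented for illustration, and the output directory must already exist:

    import java.io.File;
    import java.nio.ByteBuffer;
    import java.util.Arrays;

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.CompositeType;
    import org.apache.cassandra.db.marshal.LongType;
    import org.apache.cassandra.db.marshal.UTF8Type;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    public class CsvToSSTable {
        public static void main(String[] args) throws Exception {
            // Column names are (long, text) composites, e.g. (timestamp, field name).
            CompositeType comparator = CompositeType.getInstance(
                    Arrays.<AbstractType<?>>asList(LongType.instance, UTF8Type.instance));

            SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                    new File("/tmp/MyKeyspace/MyCF"),   // must exist before writing
                    new RandomPartitioner(),
                    "MyKeyspace", "MyCF",
                    comparator, null,
                    64);                                // buffer size in MB

            long timestamp = System.currentTimeMillis() * 1000;

            writer.newRow(UTF8Type.instance.fromString("row-1"));

            // Build one composite column name per CSV value.
            CompositeType.Builder name = new CompositeType.Builder(comparator);
            name.add(LongType.instance.decompose(1348500000000L));  // first component
            name.add(UTF8Type.instance.fromString("price"));        // second component
            ByteBuffer columnName = name.build();

            writer.addColumn(columnName, UTF8Type.instance.fromString("42.0"), timestamp);
            writer.close();
        }
    }

The resulting sstables can then be streamed into the cluster with sstableloader, as in the blog post. As far as I understand, there is no way to update a single component of an existing composite column name in place; you write a whole new column whose composite name differs in that component.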
Re: any ways to have compaction use less disk space?
On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote: Why so? What are pluses and minuses? As for me, I am looking for number of files in directory. 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 70 files, that is too much for single directory, too much memory used for SST data, too huge compaction queue (that leads to strange pauses, I suppose because of compactor thinking what to compact next),... Not sure why a lot of files is a problem... modern filesystems deal with that pretty well. Really large sstables mean that compactions now are taking a lot more disk IO and time to complete. Remember, Leveled Compaction is more disk IO intensive, so using large sstables makes that even worse. This is a big reason why the default is 5MB. Also, each level is 10x the size of the previous level. Also, for level compaction, you need 10x the sstable size worth of free space to do compactions. So now you need 5GB of free disk, vs 50MB of free disk. Also, if you're doing deletes in those CF's, that old, deleted data is going to stick around a LOT longer with 512MB files, because it can't get deleted until you have 10x512MB files to compact to level 2. Heaven forbid it doesn't get deleted then, because each level is 10x bigger, so you end up waiting a LOT longer to actually delete that data from disk. Now, if you're using SSD's then larger sstables are probably doable, but even then I'd guesstimate 50MB is far more reasonable than 512MB. -Aaron 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink you data much faster in case of updates. Size compaction can give you 3x-4x more space used than there are live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after 4th week they all will be compacted into one 300GB SSTable. Leveled compaction've tamed space for us. Note that you should set sstable_size_in_mb to reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me- especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
Haha, OK. It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target: schema, request rate, hardware. Generally tuning nudges a couple of variables in one direction or the other and you see some decent returns. But each nudge takes a restart and a warm-up period, and with how Cassandra distributes requests you likely have to flip several nodes or all of them before you can see the change! By the time you do that it's probably a different day or week. Essentially finding out if one setting is better than the other is like a 3 day test in production. Before C* I used to deal with this in Tomcat. Once in a while we would get a dev that read some article about tuning, something about a new JVM, or collector. With bright-eyed enthusiasm they would want to try tuning our current cluster. They would spend a couple of days, measure something, and say it was good: lower memory usage. Meanwhile someone else would come to me and report higher 95th percentile response time. More short pauses, fewer long pauses, great taste, less filling. Most people just want to roflscale their Heroku cloud. Tuning stuff is sysadmin work and the cloud has taught us that sysadmins are a needless waste of money. Just kidding! But I do believe the default Cassandra settings are reasonable, and typically I find that most who look at tuning GC usually need more hardware and actually need to be tuning something somewhere else. G1 is the perfect example of a time suck. It claims low pause latency for big heaps, and delivers something regarded by the Cassandra community (and HBase as well) as working worse than CMS. If you spent 3 hours switching tuning knobs and analysing, that is 3 hours of your life you will never get back. Better to let Sun and other people worry about tuning (at least from where I sit). On Saturday, September 15, 2012, Peter Schuller peter.schul...@infidyne.com wrote: Generally tuning the garbage collector is a waste of time. Sorry, that's BS. It can be absolutely critical, when done right, and only useless when done wrong. There's a spectrum in between. Just follow someone else's recommendation and use that. No, don't. Most recommendations out there are completely useless in the general case because someone did some very specific benchmark under very specific circumstances and then recommends some particular combination of options. In order to understand whether a particular recommendation applies to you, you need to know enough about your use-case that I suspect you're better off just reading up on the available options and figuring things out. Of course, randomly trying various different settings to see which seems to work well may be realistic - but you lose predictability (in the face of changing patterns of traffic for example) if you don't know why it's behaving like it is. If you care about GC-related behavior you want to understand how the application behaves, how the garbage collector behaves, what your requirements are, and select settings based on those requirements and how the application and GC behavior combine to produce emergent behavior. The best GC options may vary *wildly* depending on the nature of your cluster and your goals. There are also non-GC settings (in the specific case of Cassandra) that affect the interaction with the garbage collector, like whether you're using row/key caching, or things like phi conviction threshold and/or timeouts. It's very hard for anyone to give generalized recommendations.
If it weren't, Cassandra would ship with The One True set of settings that are always the best and there would be no discussion. It's very unfortunate that the state of GC in the freely available JVMs is at this point, given that there exist known and working algorithms (and at least one practical implementation) that avoid it, mostly. But it's the situation we're in. The only way around it that I know of, if you're on Hotspot, is to have the application behave in such a way that it avoids the causes of unpredictable behavior w.r.t. GC by being careful about its memory allocation and *retention* profile. For the specific case of avoiding *ever* seeing a full gc, it gets even more complex. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Secondary index loss on node restart
Can you contribute your experience to this ticket https://issues.apache.org/jira/browse/CASSANDRA-4670 ? Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/09/2012, at 6:22 AM, Michael Theroux mthero...@yahoo.com wrote: Hello, We have been noticing an issue where, about 50% of the time when a node fails or is restarted, secondary indexes appear to be partially lost or corrupted. A drop and re-add of the index appears to correct the issue. There are no errors in the Cassandra logs that I see. Part of the index seems to be simply missing. Sometimes this corruption/loss doesn't happen immediately, but some time after the node is restarted. In addition, the index never appears to have an issue when the node comes down; it is only after the node comes back up and recovers that we experience an issue. We developed some code that goes through all the rows, by key, in the table on which the index is defined. It then attempts to look up the information via the secondary index, in an attempt to detect when the issue occurs. Another odd observation is that the number of members present in the index when we have the issue varies up and down (the index and the tables don't change that often). We are running a 6 node Cassandra cluster with a replication factor of 3; the consistency level for all queries is LOCAL_QUORUM. We are running Cassandra 1.1.2. Anyone have any insights? -Mike
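For what it's worth, the kind of checker described above could look roughly like this with Hector (purely an illustrative sketch; the column family and column names are placeholders, and Hector is assumed only because other snippets in this digest use it):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.IndexedSlicesQuery;

    public class IndexChecker {
        // For a row already fetched by key, verify that the secondary index on
        // "indexed_col" still returns it. Returns false for a missing index entry.
        static boolean indexStillFindsRow(Keyspace ks, String cf,
                                          String rowKey, String indexedValue) {
            IndexedSlicesQuery<String, String, String> q =
                    HFactory.createIndexedSlicesQuery(ks, StringSerializer.get(),
                            StringSerializer.get(), StringSerializer.get());
            q.setColumnFamily(cf);
            q.setStartKey("");
            q.setColumnNames("indexed_col");
            q.addEqualsExpression("indexed_col", indexedValue);

            OrderedRows<String, String, String> rows = q.execute().get();
            for (Row<String, String, String> row : rows) {
                if (row.getKey().equals(rowKey)) {
                    return true;   // the index still knows about this row
                }
            }
            return false;          // candidate for the partial index loss described above
        }
    }

In a real checker you would also page through the index query with setRowCount/setStartKey, since only a limited number of rows comes back per call.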
Re: any ways to have compaction use less disk space?
If you are using ext3 there is a hard limit of 32K on the number of files in a directory. EXT4 has a much higher limit (can't remember exactly, IIRC). So it's true that having many files is not a problem for the file system, though your VFS cache could be less efficient since you would have a higher inode-to-data ratio. Edward On Mon, Sep 24, 2012 at 7:03 PM, Aaron Turner synfina...@gmail.com wrote: On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote: Why so? What are pluses and minuses? As for me, I am looking for number of files in directory. 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 70 files, that is too much for single directory, too much memory used for SST data, too huge compaction queue (that leads to strange pauses, I suppose because of compactor thinking what to compact next),... Not sure why a lot of files is a problem... modern filesystems deal with that pretty well. Really large sstables mean that compactions now are taking a lot more disk IO and time to complete. Remember, Leveled Compaction is more disk IO intensive, so using large sstables makes that even worse. This is a big reason why the default is 5MB. Also, each level is 10x the size of the previous level. Also, for level compaction, you need 10x the sstable size worth of free space to do compactions. So now you need 5GB of free disk, vs 50MB of free disk. Also, if you're doing deletes in those CF's, that old, deleted data is going to stick around a LOT longer with 512MB files, because it can't get deleted until you have 10x512MB files to compact to level 2. Heaven forbid it doesn't get deleted then, because each level is 10x bigger, so you end up waiting a LOT longer to actually delete that data from disk. Now, if you're using SSD's then larger sstables are probably doable, but even then I'd guesstimate 50MB is far more reasonable than 512MB. -Aaron 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink you data much faster in case of updates. Size compaction can give you 3x-4x more space used than there are live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after 4th week they all will be compacted into one 300GB SSTable. Leveled compaction've tamed space for us. Note that you should set sstable_size_in_mb to reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me- especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: [problem with OOM in nodes]
What exactly is the problem with big rows? During compaction the row will be passed through a slower two pass processing, this add's to IO pressure. Counting big rows requires that the entire row be read. Repairing big rows requires that the entire row be repaired. I generally avoid rows above a few 10's of MB as they result in more memory churn and create admin problems as above. What exactly is the problem with big rows? And, how can we should place our data in this case (see the schema in the previous replies)? Splitting one report to multiple rows is uncomfortably :-( Looking at your row sizes below, the question is How do I store an object which may be up to 3.5GB in size. AFAIK there are no hard limits that would prevent you putting that in one row. And avoiding super columns may save some space. You could have a Simple CF, where the each report is one row, each report row is one column and the report row is serialised (with JSON or protobufs etc) and stored in the column value. But i would recommend creating a model where row size is constrained in space. E.g. Report CF: * one report per row. * one column per report row * column value is empty. Report Rows CF: * one row per 100 report rows, e.g. report_id : first_row_number * column name is report row number. * column value is report data (Or use composite column names, e.g. row_number : report_column You can still do ranges, buy you have to do some client side work to work it out. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/09/2012, at 5:14 PM, Denis Gabaydulin gaba...@gmail.com wrote: On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com wrote: /var/log/cassandra$ cat system.log | grep Compacting large | grep -E [0-9]+ bytes -o | cut -d -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo MB }' | sort -nr | head -n 50 Is it bad signal? Sorry, I do not know what this is outputting. This is outputting size of big rows which cassandra had compacted before. As I can see in cfstats, compacted row maximum size: 386857368 ! Yes. Having rows in the 100's of MB is will cause problems. Doubly so if they are large super columns. What exactly is the problem with big rows? And, how can we should place our data in this case (see the schema in the previous replies)? Splitting one report to multiple rows is uncomfortably :-( Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/09/2012, at 5:07 AM, Denis Gabaydulin gaba...@gmail.com wrote: And some stuff from log: /var/log/cassandra$ cat system.log | grep Compacting large | grep -E [0-9]+ bytes -o | cut -d -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo MB }' | sort -nr | head -n 50 3821.55MB 3337.85MB 1221.64MB 1128.67MB 930.666MB 916.4MB 861.114MB 843.325MB 711.813MB 706.992MB 674.282MB 673.861MB 658.305MB 557.756MB 531.577MB 493.112MB 492.513MB 492.291MB 484.484MB 479.908MB 465.742MB 464.015MB 459.95MB 454.472MB 441.248MB 428.763MB 424.028MB 416.663MB 416.191MB 409.341MB 406.895MB 397.314MB 388.27MB 376.714MB 371.298MB 368.819MB 366.92MB 361.371MB 360.509MB 356.168MB 355.012MB 354.897MB 354.759MB 347.986MB 344.109MB 335.546MB 329.529MB 326.857MB 326.252MB 326.237MB Is it bad signal? On Fri, Sep 21, 2012 at 8:22 PM, Denis Gabaydulin gaba...@gmail.com wrote: Found one more intersting fact. As I can see in cfstats, compacted row maximum size: 386857368 ! 
On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin gaba...@gmail.com wrote: Reports - is a SuperColumnFamily Each report has unique identifier (report_id). This is a key of SuperColumnFamily. And a report saved in separate row. A report is consisted of report rows (may vary between 1 and 50, but most are small). Each report row is saved in separate super column. Hector based code: superCfMutator.addInsertion( report_id, Reports, HFactory.createSuperColumn( report_row_id, mapper.convertObject(object), columnDefinition.getTopSerializer(), columnDefinition.getSubSerializer(), inferringSerializer ) ); We have two frequent operation: 1. count report rows by report_id (calculate number of super columns in the row). 2. get report rows by report_id and range predicate (get super columns from the row with range predicate). I can't see here a big super columns :-( On Fri, Sep 21, 2012 at 3:10 AM, Tyler Hobbs ty...@datastax.com wrote: I'm not 100% that I understand your data model and read patterns correctly, but it sounds like you have large supercolumns and are requesting some of the subcolumns from individual super columns. If that's the case, the issue is that Cassandra must deserialize the entire supercolumn in memory whenever you read *any* of the subcolumns. This is one of the reasons why
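To make Aaron's bucketed Report Rows layout above concrete, here is a rough Hector-style sketch (the CF name, the bucket size of 100 and the serializers are illustrative assumptions, not a prescription):

    import me.prettyprint.cassandra.serializers.IntegerSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ReportRowWriter {
        private static final int BUCKET_SIZE = 100;   // report rows per Cassandra row

        // Row key = report_id : first_row_number_of_bucket, so no single
        // Cassandra row grows beyond BUCKET_SIZE columns of report data.
        static void addReportRow(Keyspace ks, String reportId,
                                 int reportRowNumber, String serializedRow) {
            int bucketStart = (reportRowNumber / BUCKET_SIZE) * BUCKET_SIZE;
            String bucketKey = reportId + ":" + bucketStart;

            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.addInsertion(bucketKey, "ReportRows",
                    HFactory.createColumn(reportRowNumber, serializedRow,
                            IntegerSerializer.get(), StringSerializer.get()));
            m.execute();
        }
    }

Counting report rows then means summing the column counts of the (report_id : bucket) rows for a report, or keeping a separate per-report count, instead of counting columns in one ever-growing row.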
Re: Cassandra compression not working?
You are going to need a fully optimized flux-capacitor for that. On Tue, Sep 25, 2012 at 5:00 AM, Michael Theroux mthero...@yahoo.comwrote: Hello, We are running into an unusual situation that I'm wondering if anyone has any insight on. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. We enabled compression on the column family, utilizing the SnappyCompressor and a 64k chunk length. It was recently discovered that Cassandra was reporting a compression ratio of 0. I took a snapshot of the data and started a cassandra node in isolation to investigate. Running nodetool scrub, or nodetool upgradesstables had little impact on the amount of data that was being stored. I then disabled compression and ran nodetool upgradesstables on the column family. Again, not impact on the data size stored. I then reenabled compression and ran nodetool upgradesstables on the column family. This resulting in a 60% reduction in the data size stored, and Cassandra reporting a compression ration of about .38. Any idea what is going on here? Obviously I can go through this process in production to enable compression, however, any idea what is currently happening and why new data does not appear to be compressed? Any insights are appreciated, Thanks, -Mike
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target: schema, request rate, hardware. Generally tuning nudges a couple of variables in one direction or the other and you see some decent returns. But each nudge takes a restart and a warm-up period, and with how Cassandra distributes requests you likely have to flip several nodes or all of them before you can see the change! By the time you do that it's probably a different day or week. Essentially finding out if one setting is better than the other is like a 3 day test in production. Before C* I used to deal with this in Tomcat. Once in a while we would get a dev that read some article about tuning, something about a new JVM, or collector. With bright-eyed enthusiasm they would want to try tuning our current cluster. They would spend a couple of days, measure something, and say it was good: lower memory usage. Meanwhile someone else would come to me and report higher 95th percentile response time. More short pauses, fewer long pauses, great taste, less filling. That's why blind blackbox testing isn't the way to go. Understanding what the application does, what the GC does, and the goals you have in mind is more fruitful. For example, are you trying to improve p99? Maybe you want to improve p999 at the cost of worse p99? What about failure modes (non-happy cases)? Perhaps you don't care about few-hundred-ms pauses but want to avoid full gc:s? There are lots of different goals one might have, and workloads. Testing is key, but only in combination with some directed choice of what to tweak. Especially since it's hard to test for the non-happy cases (e.g., the node takes a burst of traffic and starts promoting everything into old-gen prior to processing a request, resulting in a death spiral). G1 is the perfect example of a time suck. It claims low pause latency for big heaps, and delivers something regarded by the Cassandra community (and HBase as well) as working worse than CMS. If you spent 3 hours switching tuning knobs and analysing, that is 3 hours of your life you will never get back. This is similar to saying that someone told you to switch to CMS (or use some particular flag, etc), you tried it, and it didn't have the result you expected. G1 and CMS have different trade-offs. Neither one will consistently result in better latencies across the board. It's all about the details. Better to let Sun and other people worry about tuning (at least from where I sit). They're not tuning. They are providing very general-purpose default behavior, including things that make *no* sense at all with Cassandra. For example, the default behavior with CMS is to try to make the marking phase run as late as possible so that it finishes just prior to heap exhaustion, in order to optimize for throughput; except that's not a good idea in many cases because it exacerbates fragmentation problems in old-gen by pushing usage very high repeatedly, and it increases the chance of full gc because marking started too late (even if you don't hit promotion failures due to fragmentation). Sudden changes in workloads (e.g., compaction kicks in) also make it harder for CMS's mark triggering heuristics to work well. As such, the default options for Cassandra use certain settings that diverge from the default behavior of the JVM, because Cassandra-in-general is much more specific a use-case than the completely general target audience of the JVM.
Similarly, a particular cluster (with certain workloads/goals/etc) is a yet more specific use-case than Cassandra-in-general and may be better served by settings that differ from those of default Cassandra. But I certainly agree with this (which I think roughly matches what you're saying): Don't randomly pick options someone claims are good in a blog post and expect them to just make things better. If it were that easy, it would be the default behavior for obvious reasons. The reason it's not is likely that it depends on the situation. Further, even if you do play the lottery and win - if you don't know *why*, how are you able to extrapolate the behavior of the system with slightly changed workloads? It's very hard to blackbox-test GC settings, which is probably why GC tuning can be perceived as a useless game of whack-a-mole. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re:
Hi Manu, Glad that you have the issue resolved. If i understand the issue correctly Your cassandra installation had RandomParitioner but the bulk loader configuration (cassandra.yaml) had Murmur3Partitioner? By fixing the cassandra.yaml for the bulk loader the issue got resolved? If not then we might have a bug and your feedback might help the community. Regards, /VJ On Wed, Sep 19, 2012 at 10:41 PM, Manu Zhang owenzhang1...@gmail.comwrote: the problem seems to have gone away with changing Murmur3Partitioner back to RandomPartitioner On Thu, Sep 20, 2012 at 11:14 AM, Manu Zhang owenzhang1...@gmail.comwrote: Yeah, BulkLoader. You did help me to elaborate my question. Thanks! On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman mkjell...@barracuda.com wrote: I assumed you were talking about BulkLoader. I haven't played with trunk yet so I'm afraid I won't be much help here... On Sep 19, 2012, at 7:56 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here http://www.datastax.com/dev/blog/bulk-loading#comment-127019; buffer size is 64 MB as in the example; I'm dealing with about 1GB data. job config, you mean? On Thu, Sep 20, 2012 at 10:32 AM, Michael Kjellman mkjell...@barracuda.commailto:mkjell...@barracuda.com wrote: A few questions: what version of 1.1 are you running. What version of Hadoop? What is your job config? What is the buffer size you've chosen? How much data are you dealing with? On Sep 19, 2012, at 7:23 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: I've been bulk loading data into Cassandra and seen the following exception: ERROR 10:10:31,032 Exception in thread Thread[CompactionExecutor:5,1,main] java.lang.RuntimeException: Last written key DecoratedKey(-442063125946754, 313130303136373a31) = current key DecoratedKey(-465541023623745, 313036393331333a33) writing into /home/manuzhang/cassandra/data/tpch/lineitem/tpch-lineitem-tmp-ia-56-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:131) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:152) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:169) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:69) at org.apache.cassandra.db.compaction.CompactionManager$1.run(CompactionManager.java:152) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The running Cassandra and that I load data into are the same one. What's the cause? 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook
RE: Cassandra Counters
Thanks Milind, Has anyone implemented counting in a standard col family in Cassandra, where you can have increments and decrements to the count? Any comparisons in performance to using counter column families? Regards, Roshni Date: Mon, 24 Sep 2012 11:02:51 -0700 Subject: RE: Cassandra Counters From: milindpar...@gmail.com To: user@cassandra.apache.org IMO you would use Cassandra Counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the non-feasibility of centralized counting by figuring out the speed at which you need to sustain writes and reads and reconciling that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play, which is distributed counting. In distributed counting, the CAP theorem comes to life. In Cassandra, Availability and Network Partitioning trump Consistency. So yes, you sacrifice strong consistency for availability and partition tolerance; for eventual consistency. On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Hi folks, I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my queries pointwise. a) What are the performance tradeoffs on reads/writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters? b) What's the current state of counter limitations in the latest version of Apache Cassandra? c) With there being a possibility of counter values getting out of sync, would counters not be recommended where strong consistency is desired? The normal benefits of Cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. Regards, Roshni From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi, I'm trying to understand if counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items to which I can add or delete a set of items at a time, and I want a count of the items. Without considering changing the database or additional components like ZooKeeper, I have 2 options: the first is a counter col family, and the second is a standard one.

1. List_Counter_CF:
   ListId -> { TotalItems: 50 }

2. List_Std_CF:
   ListId -> { TimeUUID1: 3, TimeUUID2: 70, TimeUUID3: -20, TimeUUID4: 3, TimeUUID5: -6 }

And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, I'd need to read the row, slice through all columns and add them. In both cases the writes should be fast; in fact the standard col family should be faster as there's no read before write. And for a CL ONE write the latency should be the same. For reads, the first option is very good: just read one column for a key. For the second, the read involves reading the row and adding each column value via application code. I don't think there's a way to do math via CQL yet. There should be no hot spotting if the key is sharded well.
I could even maintain the count derived from the List_Std_CF in a separate column family, a standard col family with the final number, but I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be...? Is there any figure for how many columns it takes before performance starts deteriorating, or how much worse the performance would be? The advantage I see is that I can use the same consistency rules as for the rest of the column families. If quorum for reads and writes, then you get strongly consistent values. In the case of counters I see that, in case of timeout exceptions because the first replica is down or not responding, there's a chance of the values getting messed up, and re-trying can mess it up further. It's not idempotent like a standard col family design can be. If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?). I believe the rest of the limitations still hold good - has anything changed in recent versions? In my opinion, they are not as major as the consistency question.
- removing a counter then modifying the value - behaviour is undetermined
- special process for counter col family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes
In short, I can recommend counters can be used for
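For concreteness, the two options above look roughly like this with Hector; this is only an illustrative sketch (the CF names follow the example above, pagination is omitted, and it is not a recommendation either way):

    import java.util.UUID;

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.serializers.UUIDSerializer;
    import me.prettyprint.cassandra.utils.TimeUUIDUtils;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class ListCounts {
        // Option 1: counter column family -- one increment (or decrement) per change.
        static void adjustCount(Keyspace ks, String listId, long delta) {
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.insertCounter(listId, "List_Counter_CF",
                    HFactory.createCounterColumn("TotalItems", delta, StringSerializer.get()));
        }

        // Option 2: standard column family -- append a delta column, sum on read.
        static void appendDelta(Keyspace ks, String listId, long delta) {
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.addInsertion(listId, "List_Std_CF",
                    HFactory.createColumn(TimeUUIDUtils.getUniqueTimeUUIDinMillis(), delta,
                            UUIDSerializer.get(), LongSerializer.get()));
            m.execute();
        }

        static long totalItems(Keyspace ks, String listId) {
            SliceQuery<String, UUID, Long> q = HFactory.createSliceQuery(
                    ks, StringSerializer.get(), UUIDSerializer.get(), LongSerializer.get());
            q.setColumnFamily("List_Std_CF");
            q.setKey(listId);
            q.setRange(null, null, false, Integer.MAX_VALUE);  // paginate in real code

            long total = 0;
            ColumnSlice<UUID, Long> slice = q.execute().get();
            for (HColumn<UUID, Long> c : slice.getColumns()) {
                total += c.getValue();    // sum the increments/decrements client-side
            }
            return total;
        }
    }

The retry caveat in the message above is the key difference: appendDelta can be made idempotent by reusing the same TimeUUID when retrying a timed-out write, while retrying adjustCount after a timeout may double-count.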
Re:
I had Murmur3Partitioner for both of them, otherwise bulk loader would have complained since I put them under the same project. I saw some negative token issues of Murmur3Partitioner on JIRA recently so I moved back to RandomPartitioner. Thanks for your concern On Tue, Sep 25, 2012 at 12:49 PM, Vijay vijay2...@gmail.com wrote: Hi Manu, Glad that you have the issue resolved. If i understand the issue correctly Your cassandra installation had RandomParitioner but the bulk loader configuration (cassandra.yaml) had Murmur3Partitioner? By fixing the cassandra.yaml for the bulk loader the issue got resolved? If not then we might have a bug and your feedback might help the community. Regards, /VJ On Wed, Sep 19, 2012 at 10:41 PM, Manu Zhang owenzhang1...@gmail.comwrote: the problem seems to have gone away with changing Murmur3Partitioner back to RandomPartitioner On Thu, Sep 20, 2012 at 11:14 AM, Manu Zhang owenzhang1...@gmail.comwrote: Yeah, BulkLoader. You did help me to elaborate my question. Thanks! On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman mkjell...@barracuda.com wrote: I assumed you were talking about BulkLoader. I haven't played with trunk yet so I'm afraid I won't be much help here... On Sep 19, 2012, at 7:56 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here http://www.datastax.com/dev/blog/bulk-loading#comment-127019; buffer size is 64 MB as in the example; I'm dealing with about 1GB data. job config, you mean? On Thu, Sep 20, 2012 at 10:32 AM, Michael Kjellman mkjell...@barracuda.commailto:mkjell...@barracuda.com wrote: A few questions: what version of 1.1 are you running. What version of Hadoop? What is your job config? What is the buffer size you've chosen? How much data are you dealing with? On Sep 19, 2012, at 7:23 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: I've been bulk loading data into Cassandra and seen the following exception: ERROR 10:10:31,032 Exception in thread Thread[CompactionExecutor:5,1,main] java.lang.RuntimeException: Last written key DecoratedKey(-442063125946754, 313130303136373a31) = current key DecoratedKey(-465541023623745, 313036393331333a33) writing into /home/manuzhang/cassandra/data/tpch/lineitem/tpch-lineitem-tmp-ia-56-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:131) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:152) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:169) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:69) at org.apache.cassandra.db.compaction.CompactionManager$1.run(CompactionManager.java:152) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The running Cassandra and that I load data into are the same one. What's the cause? 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. 
Re: Cassandra Counters
Maybe I'm missing the point, but counting in a standard column family would be a little overkill. I assume that distributed counting here was more of a map/reduce approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a lot. We're doing some more complex counting (e.q. based on sets of rules) like that. Of course, that would perform _way_ slower than counting beforehand. On the other side, you will always have a consistent result for a consistent dataset. On the other hand, if you use things like AMQP or Storm (sorry to put up my sentence together like that, as tools are mostly either orthogonal or complementary, but I hope you get my point), you could build a topology that makes fault-tolerant writes independently of your original write. Of course, it would still have a consistency tradeoff, mostly because of race conditions and different network latencies etc. So I would say that building a data model in a distributed system often depends more on your problem than on the common patterns, because everything has a tradeoff. Want to have an immediate result? Modify your counter while writing the row. Can sacrifice speed, but have more counting opportunities? Go with offline distributed counting. Want to have kind of both, dispatch a message and react upon it, having the processing logic and writes decoupled from main application, allowing you to care less about speed. However, I may have missed the point somewhere (early morning, you know), so I may be wrong in any given statement. Cheers On Tue, Sep 25, 2012 at 6:53 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Thanks Milind, Has anyone implemented counting in a standard col family in cassandra, when you can have increments and decrements to the count. Any comparisons in performance to using counter column families? Regards, Roshni -- Date: Mon, 24 Sep 2012 11:02:51 -0700 Subject: RE: Cassandra Counters From: milindpar...@gmail.com To: user@cassandra.apache.org IMO You would use Cassandra Counters (or other variation of distributed counting) in case of having determined that a centralized version of counting is not going to work. You'd determine the non_feasibility of centralized counting by figuring the speed at which you need to sustain writes and reads and reconcile that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play; which is distributed counting. In distributed counting , the CAP theorem comes into life. in Cassandra, Availability and Network Partitioning trumps over Consistency. So yes, you sacrifice strong consistency for availability and partion tolerance; for eventual consistency. On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Hi folks, I looked at my mail below, and Im rambling a bit, so Ill try to re-state my queries pointwise. a) what are the performance tradeoffs on reads writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters. b) whats the current state of counters limitations in the latest version of apache cassandra? c) with there being a possibilty of counter values getting out of sync, would counters not be recommended where strong consistency is desired. The normal benefits of cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. 
Regards, roshni -- From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi , I'm trying to understand if counters are a good fit for my use case. Ive watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items- to which I can add or delete a set of items at a time, and I want a count of the items, without considering changing the database or additional components like zookeeper, I have 2 options_ the first is a counter col family, and the second is a standard one 1. List_Counter_CFTotalItemsListId 502.List_Std_CF TimeUUID1 TimeUUID2 TimeUUID3 TimeUUID4 TimeUUID5 ListId 3 70 -20 3 -6 And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, Id need to read the row, slice through all columns and add them. In both cases the writes should be fast, in fact standard col family should be faster as there's no read, before write. And for CL ONE write the latency should be same. For reads, the first option is very good, just read one column for a key For the second, the read involves reading the row, and adding each column value via application code. I