Re: High disk read throughput on only one node.

2012-12-21 Thread Alain RODRIGUEZ
It looks like nobody has experienced this kind of trouble before or even
has a clue about it.

Under heavy load this creates high latency (because of iowait) in my app
in prod and we can't handle it much longer. If there is nothing new in the
next few days I think I'll drop this node and replace it, hoping this will
fix my issue...

I'm waiting a bit longer because I'm hoping we will find out what the issue
is and this will help the C* community.




2012/12/20 Alain RODRIGUEZ arodr...@gmail.com

 routing more traffic to it?

 So shouldn't I see more network in on that node in the AWS console ?

 It seems that each node is receiving and sending an equal amount of data.

 What value should I use for dynamic-snitch-badness-threshold to give it a
 try ?
 On 20 Dec 2012 00:37, Bryan Talbot btal...@aeriagames.com wrote:

 Oh, you're on ec2.  Maybe the dynamic snitch is detecting that one node is
 performing better than the others so is routing more traffic to it?


 http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold

 -Bryan




 On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 @Aaron
 Is there a sustained difference or did it settle back ? 

 Sustained, clearly. During the day all nodes read at about 6MB/s while
 this one reads at 30-40 MB/s. At night, while the others read 2MB/s, the broken
 node reads at 8-10MB/s.

 Could this have been compaction or repair or upgrade tables working ? 

 That was my first thought, but definitely not: this occurs continuously.

 Do the read / write counts available in nodetool cfstats show anything
 different ? 

 The cfstats show different counts (a lot fewer reads/writes for the
 bad node), but they didn't join the ring at the same time. I've attached the
 cfstats just in case they could help somehow.

 Node  38: http://pastebin.com/ViS1MR8d (bad one)
 Node  32: http://pastebin.com/MrSTHH9F
 Node 154: http://pastebin.com/7p0Usvwd

 @Bryan

  clients always connect to that server

 I didn't include it in the screenshot from the AWS console, but AWS reports
 (almost) equal network in across the nodes (same for output and cpu). The cpu
 load is a lot higher on the broken node, as shown by OpsCenter, but
 that's due to the high iowait...)




 --
 Bryan Talbot
 Architect / Platform team lead, Aeria Games and Entertainment
 Silicon Valley | Berlin | Tokyo | Sao Paulo





Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson


Hi list-users,

We have an application that has a relatively unusual access pattern in 
cassandra 1.1.6


Essentially we read an entire multi-hundred-megabyte column family 
sequentially (little chance of a cassandra cache hit), perform some 
operations on the data, and write the data back to another column family 
in the same keyspace.


We do about 250 writes/sec and 100 reads/sec during this process. Write 
request latency is about 900 microsecs, read request latency is about 
4000 microsecs.


* First Question: Do these numbers make sense?

read-request latency seems a little high to me, cassandra hasn't had a 
chance to cache this data, but it's likely in the Linux disk cache, 
given the sizing of the node/data/jvm.


thanks

James M


Re: Correct way to design a cassandra database

2012-12-21 Thread Hiller, Dean
If you have a way to partition tables, relational can be ok.  Think of a 
business that has trillions of clients as customers, where clients have a whole 
slew of things they are related to.  Partitioning by client can be a good way 
to go.  Here are some patterns we have seen in nosql and perhaps they can help 
your situation….

https://github.com/deanhiller/playorm/wiki/Patterns-Page

Later,
Dean

From: David Mohl d...@dave.cx
Reply-To: user@cassandra.apache.org
Date: Friday, December 21, 2012 4:49 AM
To: user@cassandra.apache.org
Subject: Correct way to design a cassandra database

Hello!

I've recently started learning cassandra but still have trouble understanding 
the best way to design a cassandra database.
I've posted my question on stackoverflow already, but because it would very 
likely result in a discussion, it got closed. Original question here: 
http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


Assuming you have 3 types of objects: User, Photo and Album. Obviously a photo 
belongs to a user and can be part of an album. For querying, assume we just want 
to order with the latest first. Paging by 10 elements should be possible.

Would you go with every document having all the information needed for a correct 
output? Something like this:

-- User
   | -- Name
   | -- ...
   | -- Photos
| -- Photoname
| -- Uploaded at

Or go a more relational way (while having a secondary index on the belongs_to 
columns):

-- User (userid is the row key)
   | -- Name
   | -- ...

-- Photoid
   | -- belongs_to (userid)
   | -- belongs_to_album (albumid)
   | -- ...

-- Albumid
   | -- belongs_to (userid)
   | -- ...

Another way that came in my mind would be kind of a mix:

-- User
   | -- Name
   | -- ...
   | -- Photoids (e.g. 1,2,3,4,5)
   | -- Albumids (e.g. 1,2,3,4,5)

-- Photoid (photoid is the row key)
   | -- Name
   | -- Uploaded at
   | -- ...

-- Albumid (albumid is the row key)
   | -- Name
   | -- Photoids (e.g. 1,2,3,4,5)
   | -- ...

When using a random partitioner, the last example would be (IMO) the way to go. 
I can query the user object (out of a session id or something) and would get 
all the row keys I need for fetching photo / album data. However, this would 
result in very large columns. Another downside would be inconsistency and 
identification problems. A photo (or an album) could not be identified by the 
row itself.
Example: If I fetch a photo with ID 3456, I don't know which albums it is 
part of, nor which user owns it. Adding this kind of information would result in a 
fairly large number of places I have to alter on creation / update.

The second example has all the information needed. However, if I want to fetch 
all photos that are part of album x, I have to query by a secondary index that 
COULD contain millions of entries over the whole cluster. And I guess I can 
forget the random partitioner on this example.

Am I thinking too relationally?
It'd be great to hear some other opinions on this topic.

---
David



CQL3 Compound Primary Keys - Do I have the right idea?

2012-12-21 Thread Adam Venturella
Trying to better grasp compound primary keys and what they are conceptually
doing under the hood. When you create a table with a compound primary key
in cql3 (http://www.datastax.com/dev/blog/schema-in-cassandra-1-1) the
first part of the key is the partition key. I get that and the subsequent
parts help with the row name as I understand it.

So when you add a new row to that columnfamily/table, you are still adding
a row. In other words, the RandomPartitioner places it somewhere in the
cluster as a row on its own, as opposed to just adding a new column to an
existing row, which would live on the same node as that row.

The effect of the compound key means that those rows are effectively
treated as if they were part of the same column, making it a wide column.

Is that the right idea or do I have the row / rp thing wrong?
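
For concreteness, a minimal CQL3 sketch of the kind of table I mean (hypothetical
names, just to make the question easier to discuss):

CREATE TABLE readings (
    sensor_id text,
    reading_time timestamp,
    value text,
    PRIMARY KEY (sensor_id, reading_time)
);

-- When this INSERT reuses an existing sensor_id, is it a brand new row placed
-- somewhere in the cluster by the RandomPartitioner, or just new columns
-- appended to the existing wide row for 'sensor-1'?
INSERT INTO readings (sensor_id, reading_time, value)
    VALUES ('sensor-1', '2012-12-21 10:00:00', '42');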


Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
I have a few questions for you, James,

1. how many nodes are in your Cassandra ring?
2. what is the replication factor?
3. when you say sequentially, what do you mean?  what Partitioner do you
use?
4. how many columns per row?  how much data per row?  per column?
5. what client library do you use to access Cassandra?  (Hector?).  Is your
client code single threaded?


On Fri, Dec 21, 2012 at 7:27 AM, James Masson james.mas...@opigram.com wrote:


 Hi list-users,

 We have an application that has a relatively unusual access pattern in
 cassandra 1.1.6

 Essentially we read an entire multi hundred megabyte column family
 sequentially (little chance of a cassandra cache hit), perform some
 operations on the data, and write the data back to another column family in
 the same keyspace.

 We do about 250 writes/sec and 100 reads/sec during this process. Write
 request latency is about 900 microsecs, read request latency is about 4000
 microsecs.

 * First Question: Do these numbers make sense?

 read-request latency seems a little high to me, cassandra hasn't had a
 chance to cache this data, but it's likely in the Linux disk cache, given
 the sizing of the node/data/jvm.

 thanks

 James M



Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
I am pretty new to cassandra as well. But here goes nothing:

Assumptions:
- You are using a CQL3 client

- Remember I am a n00bsauce at this as well, so another member of the list
may, and probably does, have a better, more enlightened answer than I do.
Everyone was new to this at one time though, and you gotta start somewhere,
so here goes:

- This is a long-ish message as it represents a train of thought.

Based on reading here:
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

Store the user accounts:

CREATE TABLE Users (
user_name text,
password text,
PRIMARY KEY (user_name)
);


Store the Users Photos

CREATE TABLE Photos (
user_name text,
created_time timestamp,
image_url text,
meta_data1 text,
meta_dataN text,
PRIMARY KEY (user_name, created_time)
) WITH CLUSTERING ORDER BY (created_time DESC);

This uses a compound primary key to make a wide row, and allows you to run: SELECT *
FROM Photos WHERE user_name = 'the_user'; and just get the user's photos
ordered by most recent. The meta columns are just for you to store whatever
you like, or you can have 1 meta column, for example 'data', and just store
some JSON that represents more info about the photo. Something along those
lines.
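
For the paging-by-10 requirement in the original question, a sketch of how I
think those queries would look (the timestamp is just a placeholder):

SELECT * FROM Photos WHERE user_name = 'the_user' LIMIT 10;

-- next page: pass in the created_time of the last photo from the previous page
SELECT * FROM Photos
    WHERE user_name = 'the_user' AND created_time < '2012-12-21 10:00:00'
    LIMIT 10;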

For albums we will see a bit of data duplication, which I think is par for
the course in something like this. The idea with cassandra is storage is
cheap so duplicating info a bit here and there is acceptable. This is also
the part where I feel like someone with more experience may have a better
answer than I.

CREATE TABLE PhotosAlbums (
user_name text,
album_name text,
image_url text,
PRIMARY KEY (user_name, album_name)
);


So we are duplicating user_name and image_url. The limitation here is you
would not be getting any user defined ordering of the album, but you would
be able to run just one query and get all of the photos in an album:

SELECT * FROM PhotoAlbums WHERE user_name = the_user AND album_name =
the_album_name

If you needed to get a list of all of the photo albums for a user you just
need to:

SELECT * FROM PhotoAlbums WHERE user_name = the_user

This would give you results; the issue with this PhotosAlbums
ColumnFamily/Table is that deleting individual photos from it is, I think,
problematic with the definition I have provided. I'm pretty sure I didn't
leave a way to delete an individual photo. It may be that you need:

PRIMARY KEY (user_name, album_name, image_url)

Hmmm, I don't like that, lemme try again.

I think, and again someone else here knows more than me about this and I could
easily be wrong here, you could add a level of sorting per album like this:


CREATE TABLE PhotosAlbums (
user_name text,
album_name text,
seq int,
image_url text,
poster_image_url text,
PRIMARY KEY (user_name, album_name, seq)
) WITH CLUSTERING ORDER BY (seq ASC);

Note the addition of 'seq'

Actually, that may be better as your sequence number basically acts like a
unique key for just the album itself, instead of relying on the image_url
like I previously mentioned.  That way you could delete like this I think:

DELETE FROM PhotoAlbums WHERE user_name = the_user AND album_name =
the_album_name AND seq = 4

The problem then becomes, after you delete you would need to re-sequence
all the images in the album. Admittedly, I don't know how to best handle
that without running an update on each Album entry to re-sequence it. The same
would apply if you were to reorder images: you would need to re-sequence a
set of them as well, or all of them if you made a new image #1.


Another option is to store the actual album in JSON and have your
application manipulate that and save it back to the album:


CREATE TABLE PhotosAlbums (
user_name text,
album_name text,
poster_image_url text,
data text,
PRIMARY KEY (user_name, album_name)
);

Here we do away with the 'seq' column and add a 'data' and a 'poster_image_url'
column (so we can give the album a representation to the user).  Data would
just be JSON that looks something like this:

[{'image_url':, ...other data you might want ...},
{'image_url':, ...other data you might want ...},
{'image_url':, ...other data you might want ...},
{'image_url':, ...other data you might want ...},
...]


Now your application would be responsible for sorting the images in the
album and updating the whole JSON blob for the album.

You would now be able to get all of the user's photo albums with:

SELECT * FROM PhotoAlbums WHERE user_name = the_user

You can use the poster_image_url column to render a nice representation image
of the album.  You can get an individual album with:

SELECT * FROM PhotoAlbums WHERE user_name = the_user AND album_name =
the_album_name

You just need to deserialize the 'data' column to get all of your photo
data. If the user makes an update your application needs to update that
JSON.

My hunch is this JSON based Album might be the way to go.
Again, take this with a grain of salt, I am new to this as well.








Re: TTL on SecondaryIndex Columns. A bug?

2012-12-21 Thread cscetbon.ext
Nice job Aaron,

AFAIU you now set gc_before to the current time for secondary indexes. And 
as it was set to Integer.MAX_VALUE before your patch, the removeDeletedStandard 
function was testing if (column.getLocalDeletionTime() < MAX_VALUE), which is 
always true, and so was removing all rows from the secondary index. Am I right?

--
Cyril SCETBON

On Dec 20, 2012, at 9:28 PM, aaron morton aa...@thelastpickle.com wrote:

Yes, but they will get compacted away again unless the patch is there.

it's a small patch so you should be able to apply it easily enough if you need 
a fix ASAP.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com/

On 20/12/2012, at 5:27 PM, B. Todd Burruss bto...@gmail.com wrote:

i believe we have hit this as well.  if you use nodetool to
rebuild_index, does it work?

On Wed, Dec 19, 2012 at 8:10 PM, aaron morton aa...@thelastpickle.com wrote:
Well that was fun https://issues.apache.org/jira/browse/CASSANDRA-5079

Just testing my idea of a fix now.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com/

On 20/12/2012, at 10:33 AM, aaron morton aa...@thelastpickle.com wrote:

Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

Done and I now get your repro case…

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 1.44 msec(s).


[default@ks123] get cf1 where 'indexedColumn'='66';
---
RowKey: 66
=> (column=1, value=val, timestamp=135595439049, ttl=7884000)
=> (column=10, value=val, timestamp=135595439269, ttl=7884000)
...
=> (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)

Looking into it now.

Thanks

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/12/2012, at 9:56 PM, Roland Gude roland.g...@ez.no wrote:

I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
Unfortunately apart from me no one was yet able to reproduce.

Check if data is available before/after compaction
If you have leveled compaction it is hard to test because you cannot trigger
compaction manually.

-Original Message-
From: Alexei Bakanov [mailto:russ...@gmail.com]
Sent: Wednesday, 19 December 2012 09:35
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?

I'm running on a single node on my laptop.
It looks like the point when rows disappear from the index depends on JVM
memory settings. With more memory it needs more data to feed in before
things start disappearing.
Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';

0 Row Returned.

Thanks


On 19 December 2012 05:15, aaron morton aa...@thelastpickle.com wrote:

Thanks for the nice steps to reproduce.

I ran this on my MBP using C* 1.1.7 and got the expected results, both
gets returned a row.

Were you running against a single node or a cluster? If a cluster, did
you change the CL? cassandra-cli defaults to ONE.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 9:44 PM, Alexei Bakanov russ...@gmail.com wrote:

Hi,

We are having an issue with TTL on Secondary index columns. We get 0
rows in return when running queries on indexed columns that have TTL.
Everything works fine with small amounts of data, but when we get over
a ceratin threshold it looks like older rows dissapear from the index.
In the example below we create 70 rows with 45k columns each + one
indexed column with just the rowkey as value, so we have one row per
indexed value. When the script is finished the index contains rows
66-69. Rows 0-65 are gone from the index.
Using 'indexedColumn' without TTL fixes the problem.


- SCHEMA START -
create keyspace ks123
with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {datacenter1 : 1}
and durable_writes = true;

use ks123;

create column family cf1
with column_type = 'Standard'
and comparator = 'AsciiType'
and default_validation_class = 'AsciiType'
and key_validation_class = 'AsciiType'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and column_metadata = [
 {column_name : 'indexedColumn',
 validation_class : AsciiType,
 index_name : 'INDEX1',
 index_type : 0}]
and compression_options = {'sstable_compression' :

RE: what happens while node is bootstrapping?

2012-12-21 Thread DE VITO Dominique

  From: Tyler Hobbs [mailto:ty...@datastax.com]
  Sent: Tuesday, 16 October 2012 17:04
  To: user@cassandra.apache.org
  Subject: Re: what happens while node is bootstrapping?
 
  On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh ailin...@gmail.com wrote:
  Does it mean that during bootstrapping process only replicas serve
  read requests for new node range? In other words, replication factor
  is RF-1?

 No.  The bootstrapping node will receive writes for its new range while bootstrapping 
 as a consistency optimization (more or less), but does not contribute to the 
 replication factor or consistency level; all of the original replicas for 
 that range still receive writes, serve reads, and are the nodes that count 
 for consistency level.  Basically, the bootstrapping node has no effect on 
 the existing replicas in terms of RF or CL until the bootstrap completes.

 --
 Tyler Hobbs
 DataStax

For the purposes of the consistency optimization, I would have written that not only 
the bootstrapping (new) node should receive writes, but also its own replicas (!).

In the case of SimpleStrategy, it's obvious that the new node's replicas are included 
in the original replicas. So, it's valid to say The bootstrapping node will 
receive writes for its new range while bootstrapping as a consistency optimization 
without mentioning the new node's replicas.

In the case of NetworkTopologyStrategy, after having played with some examples to 
support the following claim, I suspect the new node's replicas are also included 
in the original replicas. So, I am inclined to say it's valid too to say The 
bootstrapping node will receive writes for its new range while bootstrapping as 
a consistency optimization without mentioning the new node's replicas.

1) In the case of NetworkTopologyStrategy, for a bootstrap, is it correct to say 
that the new node's replicas are *always* included in the original replicas?
(I think the Cassandra devs have already proved it.)

2) In the case of bootstrapping multiple nodes at the same time, the replicas of a 
new node are *not* always included in the original replicas.
Is that a problem for Cassandra (and then, do we need to bootstrap nodes one by one?), 
or is Cassandra able to detect that multiple nodes are bootstrapping and to deal 
with it by fetching data from the right nodes?

Thanks.

Regards,
Dominique




Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
Ok.. So here is my latest thinking... Including that index:

CREATE TABLE Users (
user_name text,
password text,
PRIMARY KEY (user_name)
);

^ Same as before

CREATE TABLE Photos(
user_name text,
photo_id uuid,
created_time timestamp,
data text,
PRIMARY KEY (user_name, photo_id, created_time)
) WITH CLUSTERING ORDER BY (created_time DESC);

^ Note the addition of a photo id and using that in the PK def with the
created_time
Data is a JSON like this:
{
thumbnail: url,
standard_resolution:url
}
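
A sketch of pulling a single photo's JSON back out of that table (the uuid is
just a placeholder value):

-- fetch one photo's JSON blob by its id
SELECT data FROM Photos
    WHERE user_name = 'the_user'
    AND photo_id = 62c36092-82a1-3a00-93d1-46196ee77204;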


CREATE TABLE PhotosAlbums (
user_name text,
album_name text,
poster_image_url text,
data text,
PRIMARY KEY (user_name, album_name)
);

^ Same as before, data represents a JSON array of the photos:
[{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url},
{photo_id:..., thumbnail:url, standard_resolution:url}]

CREATE TABLE PhotosAlbumsIndex (
user_name text,
photo_id uuid,
album_name text,
created_time timestamp,
PRIMARY KEY (user_name, photo_id, album_name)
);

The created_time column here is because you need to have at least 1 column
that is not part of the PK. Or that's what it looks like in my quick test.

^ Each photo added to an album needs to be added to this index row


As before, your application will need to keep the order of the array
intact as your users modify the order of things. Now, however, if they delete a
photo you need to fetch the PhotoAlbums the photo existed in and update
them accordingly:

SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid

This should return to you all of the albums that the photo was a part of.
Now you need to:

SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN
(name1, name2, name3 )

name1,2,3 are the album names you selected from the PhotosAlbumsIndex query

So now you have all of the photo albums, you would then iterate over those
in your application, deserializing the JSON data, locating the photo ID that
was removed and taking it out of the array, then re-serializing to JSON and
updating the record.

When that is complete you need to remove the Photo from the
PhotosAlbumsIndex. Now here is where I get stuck a little, because this
will fail:

DELETE FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid;

It seems to want the album name as well since it's part of the PK.
Admittedly, I don't know how to get around that and just delete everything
where the first 2 components of the PK are true.

You would already possess the list of album names though, so it could be a
BATCH that you need to perform for the deletes, specifying 1 delete per
album_name.
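
Something like this, assuming the index said the photo was in albums 'name1'
and 'name2' (a sketch, placeholder values):

BEGIN BATCH
    DELETE FROM PhotosAlbumsIndex
        WHERE user_name = 'the_user'
        AND photo_id = 62c36092-82a1-3a00-93d1-46196ee77204
        AND album_name = 'name1';
    DELETE FROM PhotosAlbumsIndex
        WHERE user_name = 'the_user'
        AND photo_id = 62c36092-82a1-3a00-93d1-46196ee77204
        AND album_name = 'name2';
APPLY BATCH;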


Anyway, that's my current thinking. I would love to know if it's possible
to get around the DELETE issue another way.



On Fri, Dec 21, 2012 at 7:15 AM, Adam Venturella aventure...@gmail.com wrote:

 Hmmm it just occurred to me that in my examples, there is no convenient
 way to delete a photo and also remove that photo from the albums it is a
 part of.

 As it stands, you would need to iterate over all of the user's albums to
 locate the photo and remove it; that's no good.

 Probably need another table that holds just the photo / album identifiers,
 an index. So when the user deletes a photo, you ask the index which albums
 that photo belongs to and just fetch those to update the album with that
 photo removed.

 :: mobile emails ::

 On Dec 21, 2012, at 3:50, David Mohl d...@dave.cx wrote:

  Hello!

 I've recently started learning cassandra but still have troubles
 understanding the best way to design a cassandra database.
 I've posted my question already on stackoverflow but because this would
 very likely result in a discussion, it got closed. Orginal question here:
 http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


 Assuming you have 3 types of objects: User, Photo and Album. Obviously a
 photo belongs to a user and can be part of a album. For querying, assume we
 just want to order by last goes first. Paging by 10 elements should be
 possible.

 Would you go like every document has all the informations needed for a
 correct output. Something like this:

 -- User
| -- Name
| -- ...
| -- Photos
 | -- Photoname
 | -- Uploaded at

 Or go a more relational way (while having a secondary index on the
 belongs_to columns:

 -- User (userid is the row key)
| -- Name
| -- ...

 -- Photoid
| -- belongs_to (userid)
| -- belongs_to_album (albumid)
| -- ...

 -- Albumid
| -- belongs_to (userid)
| -- ...

 Another way that came in my mind would be kind of a mix:

 -- User
| -- Name
| -- ...
| -- Photoids (e.g. 1,2,3,4,5)
| -- Albumids (e.g. 1,2,3,4,5)

 -- Photoid (photoid is the row key)
 

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, using RandomPartitioner, the order of the rows is random, so when
you request these rows in Sequential order (sort by the date?), Cassandra
is not reading them sequentially.

The size of the data, 200Mb, 300Mb , and 40Mb, are these the size for each
column? Or are these the total size of the entire column family?  It wasn't
too clear to me.  But if these are the total size of the column families,
you will be able to fit them mostly in memory, so you should enable row
cache.

I happen to have done some performance tests of my own on cassandra, mostly
on reads, and was also only able to get less than a 6MB/sec read rate out
of a cluster of 6 nodes with RF 2 using a single-threaded client.  But it made a
huge difference when I changed the client to an asynchronous multi-threaded
structure.




On Fri, Dec 21, 2012 at 10:36 AM, James Masson james.mas...@opigram.com wrote:


 Hi,

 thanks for the reply


 On 21/12/12 14:36, Yiming Sun wrote:

 I have a few questions for you, James,

 1. how many nodes are in your Cassandra ring?


 2 or 3 - depending on environment - it doesn't seem to make a difference
 to throughput very much. What is a 30 minute task on a 2 node environment
 is a 30 minute task on a 3 node environment.


  2. what is the replication factor?


 1

  3. when you say sequentially, what do you mean?  what Partitioner do you
 use?


 The data is organised by date - the keys are read sequentially in order,
 only once.

 Random partitioner - the data is equally spread across the nodes to avoid
 hotspots.


  4. how many columns per row?  how much data per row?  per column?


 varies - described in the schema.

 create keyspace mykeyspace
   with placement_strategy = 'SimpleStrategy'
   and strategy_options = {replication_factor : 1}
   and durable_writes = true;


 create column family entities
   with column_type = 'Standard'
   and comparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'AsciiType'
   and read_repair_chance = 0.0
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = false
   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
   and caching = 'NONE'
   and column_metadata = [
 {column_name : '64656c65746564',
 validation_class : BytesType,
 index_name : 'deleted_idx',
 index_type : 0},
 {column_name : '6576656e744964',
 validation_class : TimeUUIDType,
 index_name : 'eventId_idx',
 index_type : 0},
 {column_name : '7061796c6f6164',
 validation_class : UTF8Type}];

 2 columns per row here - about 200Mb of data in total


 create column family events
   with column_type = 'Standard'
   and comparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'TimeUUIDType'
   and read_repair_chance = 0.0
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = false
   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
   and caching = 'NONE';

 1 column per row - about 300Mb of data

 create column family intervals
   with column_type = 'Standard'
   and comparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'AsciiType'
   and read_repair_chance = 0.0
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = false
   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
   and caching = 'NONE';

 variable columns per row - about 40Mb of data.



  5. what client library do you use to access Cassandra?  (Hector?).  Is
 your client code single threaded?


 Hector - yes, the processing side of the client is single threaded, but is
 largely waiting for cassandra responses and has plenty of CPU headroom.


 I guess what I'm most interested in is why the discrepancy between
 read/write latency - although I understand the data volume is much larger
 in reads, even though the request rate is lower.

 Network usage on a cassandra box barely gets above 20Mbit, including
 inter-cluster comms. Averages 5Mbit client<->cassandra.

 There is near zero disk I/O, and what little there is is served sub 1ms.
 Storage is backed by a very fast SAN, but like I said earlier, the dataset
 just about fits in the Linux disk cache. 2Gb VM, 512Mb cassandra heap - GCs
 are nice and quick, no JVM memory problems, used heap oscillates between
 280-350Mb.

 Basically, I'm just puzzled as cassandra doesn't behave as I would expect.
 Huge CPU use in cassandra for very little throughput. I'm struggling to
 find anything that's wrong with the environment, there's no bottleneck that
 I can see.

 thanks

 James M


what happens while node is decommissioning ?

2012-12-21 Thread DE VITO Dominique

  From: Tyler Hobbs [mailto:ty...@datastax.com]
  Sent: Tuesday, 16 October 2012 17:04
  To: user@cassandra.apache.org
  Subject: Re: what happens while node is bootstrapping?
 
  On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh ailin...@gmail.com wrote:
  Does it mean that during bootstrapping process only replicas serve
  read requests for new node range? In other words, replication factor
  is RF-1?

 No.  The bootstrapping node will receive writes for its new range while bootstrapping 
 as a consistency optimization (more or less), but does not contribute to the 
 replication factor or consistency level; all of the original replicas for 
 that range still receive writes, serve reads, and are the nodes that count 
 for consistency level.  Basically, the bootstrapping node has no effect on 
 the existing replicas in terms of RF or CL until the bootstrap completes.

 --
 Tyler Hobbs
 DataStax

Is it symmetric for the decommission?

Well, is it correct that:
- during a decommission, all of the original replicas for that range still 
receive writes, serve reads, and are the nodes that count for consistency level?
- and so, basically, the decommissioning node has no effect on the existing 
replicas in terms of RF or CL until the end of the decommission?
- as a consistency optimization, will all the new replicas also receive the 
writes?

Thanks.

Regards,
Dominique




Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson



On 21/12/12 16:27, Yiming Sun wrote:

James, using RandomPartitioner, the order of the rows is random, so when
you request these rows in Sequential order (sort by the date?),
Cassandra is not reading them sequentially.


Yes, I understand the next row to be retrieved in sequence is likely 
to be on a different node, and the ordering is random. I'm using the 
word sequential to try to explain that the data being requested is in an 
order, and not repeated, until the next cycle. The data is not 
guaranteed to be of a size that is cache-able as a whole.




The size of the data, 200Mb, 300Mb , and 40Mb, are these the size for
each column? Or are these the total size of the entire column family?
  It wasn't too clear to me.  But if these are the total size of the
column families, you will be able to fit them mostly in memory, so you
should enable row cache.


Size of the column family, on a single node. Row caching is off at the 
moment.


Are you saying that I should increase the JVM heap to fit some data in 
the row cache, at the expense of linux disk caching?


Bear in mind that the data is only going to be re-requested in sequence 
again - I'm not sure what the value is in the cassandra native caching 
if rows are not re-requested before being evicted.


My current key-cache hit-rates are near zero on this workload, hence I'm 
interested in cassandra's zero-cache performance. Unless I can guarantee 
to fit the entire data-set in memory, it's difficult to justify using 
memory on a cassandra cache if LRU and workload means it's not actually 
a benefit.




I happen to have done some performance tests of my own on cassandra,
mostly on the read, and was also only able to get less than 6MB/sec read
rate out of a cluster of 6 nodes RF2 using a single threaded client.
  But it makes a huge difference when I changed the client to an
asynchronous multi-threaded structure.



Yes, I've been talking to the developers about having a separate thread 
or two that keeps cassandra busy, keeping Disruptor 
(http://lmax-exchange.github.com/disruptor/) fed to do the processing work.


But this all doesn't change the fact that under this zero-cache 
workload, cassandra seems to be very CPU expensive for throughput.


thanks

James M





On Fri, Dec 21, 2012 at 10:36 AM, James Masson james.mas...@opigram.com wrote:


Hi,

thanks for the reply


On 21/12/12 14:36, Yiming Sun wrote:

I have a few questions for you, James,

1. how many nodes are in your Cassandra ring?


2 or 3 - depending on environment - it doesn't seem to make a
difference to throughput very much. What is a 30 minute task on a 2
node environment is a 30 minute task on a 3 node environment.


2. what is the replication factor?


1

3. when you say sequentially, what do you mean?  what
Partitioner do you
use?


The data is organised by date - the keys are read sequentially in
order, only once.

Random partitioner - the data is equally spread across the nodes to
avoid hotspots.


4. how many columns per row?  how much data per row?  per column?


varies - described in the schema.

create keyspace mykeyspace
   with placement_strategy = 'SimpleStrategy'
   and strategy_options = {replication_factor : 1}
   and durable_writes = true;


create column family entities
   with column_type = 'Standard'
   and comparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'AsciiType'
   and read_repair_chance = 0.0
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = false
   and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
   and caching = 'NONE'
   and column_metadata = [
 {column_name : '64656c65746564',
 validation_class : BytesType,
 index_name : 'deleted_idx',
 index_type : 0},
 {column_name : '6576656e744964',
 validation_class : TimeUUIDType,
 index_name : 'eventId_idx',
 index_type : 0},
 {column_name : '7061796c6f6164',
 validation_class : UTF8Type}];

2 columns per row here - about 200Mb of data in total


create column family events
   with column_type = 'Standard'
   and comparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'TimeUUIDType'
   and read_repair_chance = 0.0
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = false
   and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
   and caching = 'NONE';

1 column 

Re: Moving data from one datacenter to another

2012-12-21 Thread Vegard Berget
Thanks for the answers. It went quite well. Note what Aaron writes about sstable 
names, as I did the job before his mail and got one name wrong :-) - and 
that caused some trouble (a lot of missing file errors) - I think that was to 
blame for some counter CFs being messed up.  As it was not important we didn't 
try from scratch again.

Vegard Berget


aaron morton aa...@thelastpickle.com:

Sounds about right, i've done similar things before. 

Some notes…

* I would make sure repair has completed on the source cluster before making 
changes. I just like to know data is distributed. I would also do it once all 
the moves are done.

* Rather than flush, take a snapshot and copy from that. Then you will have a 
stable set of files and it's easier to go back and see what you copied. 
(Snapshot does a flush) 
 
* Take a second snapshot after you stop writing to the original cluster and 
work out the delta between them. New files in the second snapshot are the ones 
to copy. 

 Both nodes are 1.1.6, but it might be that we upgrade the target to 1.1.7,
 as I can't see that this will cause any problems?
I would always do one thing at a time. Upgrade before or after the move, not 
in the middle of it. 

 1)  It's the same number of nodes on both clusters, but do the tokens need
 to be the same as well?  (Wouldn't a repair correct that later?)
I *think* you are moving from nodes in one cluster to nodes in a different 
cluster (i.e. not adding a data centre to an existing cluster). In which 
case it does not matter too much but I would keep them the same. 

 2)  Could data files have any name?  Could we, to avoid a filename clash,
 just substitute the numbers with for example XXX in the data-files?
The names have to match the expected patterns. 

It may be easier to rename the files in your first copy, not the second delta 
copy. Bump the file numbers enough that all the files in the delta copy do not 
need to be renamed. 

 3)  Is this really a sane way to do things?
If you are moving data from one set of nodes in a cassandra cluster to another 
set of nodes in another cluster this is reasonable. You could add the new 
nodes as a new DC and do the whole thing without down time but you mentioned 
that was not possible. 

It looks like you are going to have some down time, or can accept some down 
time, so here's a tweak. You should be able to get the delta copy part done 
pretty quickly. If that's the case you can:

1) do the main copy
2) stop the old system.
3) do the delta copy
4) start the new system

That way you will not have stale reads in the new system.
 
Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/12/2012, at 5:08 PM, B. Todd Burruss bto...@gmail.com wrote:

 to get it correct, meaning consistent, it seems you will need to do
 a repair no matter what since the source cluster is taking writes
 during this time and writing to commit log.  so to avoid filename
 issues just do the first copy and then repair.  i am not sure if they
 can have any filename.
 
 to the question about whether the tokens must be the same, the answer
 is they can't be.
 (http://www.datastax.com/docs/datastax_enterprise2.0/multi_dc_install).
 i believe that as long as your replication factor is > 1, then using
 repair would fix most any token assignment
 
 On Wed, Dec 19, 2012 at 4:27 AM, Vegard  Berget p...@fantasista.no wrote:
 Hi,
 
 I know this has been a topic here before, but I need some input on how to
 move data from one datacenter to another (and google just gives me some old
 mails) - and at the same time moving production writing the same way.
 To add the target cluster into the source cluster and just replicate data
 before moving source nodes is not an option, but my plan is as follows:
 1)  Flush data on source cluster and move all data/-files to the destination
 cluster.  While this is going on, we are still writing to the source
 cluster.
 2)  When data is copied, start cassandra on the new cluster - and then move
 writing/reading to the new cluster.
 3)  Now, do a new flush on the source cluster.  As I understand, the sstable
 files are immutable, so the _newly added_ data/ files could be moved to the
 target cluster.
 4)  After new data is also copied into the the target data/, do a nodetool
 -refresh to load the new sstables into the system (i know we need to take
 care of filenames).
 
 It's worth noting that none of the data is critical, but it would be nice to
 get it correct.  I know that there will be a short period between 2 and 4
 that reads potentially could read old data (written while copying, reading
 after we have moved read/write).  This is ok in this case.  Our second
 alternative is to:
 
 1) Drain old cluster
 2) Copy to new cluster
 3) Start new cluster
 
 This will cause the cluster to be unavailable for writes in the copy-period,
 and I wish to avoid that (even if that, too, is survivable).
 
 Both 

Re: Last Modified Time Series in cassandra

2012-12-21 Thread Andrey Ilinykh
You can select a column slice (specify a time range which for sure has the latest
data), but ask cassandra to return only one column. That is the latest one. To
get the best performance use reversed sorting order.
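
In CQL3 terms it would look roughly like this (a sketch, hypothetical names):

CREATE TABLE last_modified (
    key text,
    modified_at timeuuid,
    pkid text,
    PRIMARY KEY (key, modified_at)
) WITH CLUSTERING ORDER BY (modified_at DESC);

-- the newest entry comes back first, so only one column needs to be read
SELECT modified_at, pkid FROM last_modified WHERE key = 'SomeKey' LIMIT 1;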

Andrey


On Fri, Dec 21, 2012 at 6:40 AM, Ravikumar Govindarajan 
ravikumar.govindara...@gmail.com wrote:

 How do we model timeseries data in cassandra for last-modified time?

 -- ExampleCF
| -- SomeKey = Key
 | -- TimeUUID = Column-Name
 | -- PKID = Column-Value

 -- ExampleReverseIndexCF
| -- SomeKey = Key
 | -- PKID = Column-Name
 | -- TimeUUID = Column-Value

 To correctly reflect last-modified-time, I need to read the existing
 timeuuid, delete it and add the incoming timeuuid.

 Are there alternatives to the above approach, because it looks a bit
 heavy-weight?

 --
 Ravi



Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, you could experiment with Row cache, with off-heap JNA cache, and
see if it helps.  My own experience with row cache was not good, and the OS
cache seemed to be most useful, but in my case, our data space was big,
over 10TB.  Your sequential access pattern certainly doesn't play well with
LRU, but given the small data space you have, you may be able to fit the
data from one column family entirely into the row cache.


On Fri, Dec 21, 2012 at 12:03 PM, James Masson james.mas...@opigram.com wrote:



 On 21/12/12 16:27, Yiming Sun wrote:

 James, using RandomPartitioner, the order of the rows is random, so when
 you request these rows in Sequential order (sort by the date?),
 Cassandra is not reading them sequentially.


 Yes, I understand the next row to be retrieved in sequence is likely to
 be on a different node, and the ordering is random. I'm using the word
 sequential to try to explain that the data being requested is in an order,
 and not repeated, until the next cycle. The data is not guaranteed to be of
 a size that is cache-able as a whole.



 The size of the data, 200Mb, 300Mb , and 40Mb, are these the size for
 each column? Or are these the total size of the entire column family?
   It wasn't too clear to me.  But if these are the total size of the
 column families, you will be able to fit them mostly in memory, so you
 should enable row cache.


 Size of the column family, on a single node. Row caching is off at the
 moment.

 Are you saying that I should increase the JVM heap to fit some data in the
 row cache, at the expense of linux disk caching?

 Bear in mind that the data is only going to be re-requested in sequence
 again - I'm not sure what the value is in the cassandra native caching if
 rows are not re-requested before being evicted.

 My current key-cache hit-rates are near zero on this workload, hence I'm
 interested in cassandra's zero-cache performance. Unless I can guarantee to
 fit the entire data-set in memory, it's difficult to justify using memory
 on a cassandra cache if LRU and workload means it's not actually a benefit.



 I happen to have done some performance tests of my own on cassandra,
 mostly on the read, and was also only able to get less than 6MB/sec read
 rate out of a cluster of 6 nodes RF2 using a single threaded client.
   But it makes a huge difference when I changed the client to an
 asynchronous multi-threaded structure.


 Yes, I've been talking to the developers about having a separate thread or
 two that keeps cassandra busy, keeping Disruptor (
 http://lmax-exchange.github.com/disruptor/)
 fed to do the processing work.

 But this all doesn't change the fact that under this zero-cache workload,
 cassandra seems to be very CPU expensive for throughput.

 thanks

 James M




 On Fri, Dec 21, 2012 at 10:36 AM, James Masson james.mas...@opigram.com wrote:


 Hi,

 thanks for the reply


 On 21/12/12 14:36, Yiming Sun wrote:

 I have a few questions for you, James,

 1. how many nodes are in your Cassandra ring?


 2 or 3 - depending on environment - it doesn't seem to make a
 difference to throughput very much. What is a 30 minute task on a 2
 node environment is a 30 minute task on a 3 node environment.


 2. what is the replication factor?


 1

 3. when you say sequentially, what do you mean?  what
 Partitioner do you
 use?


 The data is organised by date - the keys are read sequentially in
 order, only once.

 Random partitioner - the data is equally spread across the nodes to
 avoid hotspots.


 4. how many columns per row?  how much data per row?  per column?


 varies - described in the schema.

 create keyspace mykeyspace
with placement_strategy = 'SimpleStrategy'
and strategy_options = {replication_factor : 1}
and durable_writes = true;


 create column family entities
with column_type = 'Standard'
and comparator = 'BytesType'
and default_validation_class = 'BytesType'
and key_validation_class = 'AsciiType'
and read_repair_chance = 0.0
and dclocal_read_repair_chance = 0.0
and gc_grace = 0
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = false
and compaction_strategy =
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'

and caching = 'NONE'
and column_metadata = [
  {column_name : '64656c65746564',
  validation_class : BytesType,
  index_name : 'deleted_idx',
  index_type : 0},
  {column_name : '6576656e744964',
  validation_class : TimeUUIDType,
  index_name : 'eventId_idx',
  index_type : 0},
  {column_name : '7061796c6f6164',
  validation_class : 

[RELEASE CANDIDATE] Apache Cassandra 1.2.0-rc2 released

2012-12-21 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the second release candidate (and
likely the last Cassandra release ever since it's the end of the world) for the
future Apache Cassandra 1.2.0.

Let me first stress that this is not the final release yet and as such is
*not*
ready for production use.

This release is getting very close to a final version but may still contain
bugs. All available testing of this release will help make 1.2.0 final a
better release and would thus be greatly appreciated. If you were to encounter
any problem during your testing, please report[3,4] them. Be sure to take a look at
the change log[1] and the release notes[2] to see where Cassandra 1.2 differs
from the previous series.

Apache Cassandra 1.2.0-rc2[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 12x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/plZyW (CHANGES.txt)
[2]: http://goo.gl/frTqL (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-rc2


Re: State of Cassandra and Java 7

2012-12-21 Thread Bryan Talbot
Brian, did any of your issues with java 7 result in corrupting data in
cassandra?

We just ran into an issue after upgrading a test cluster from Cassandra
1.1.5 and Oracle JDK 1.6.0_29-b11 to Cassandra 1.1.7 and 7u10.

What we saw is values in columns with validation
Class=org.apache.cassandra.db.marshal.LongType that were proper integers
becoming corrupted so that they became stored as strings.  I don't have
a reproducible test case yet but will work on making one over the holiday
if I can.

For example, a column with a long type that was originally written and
stored properly (say with value 1200) was somehow changed during cassandra
operations (compaction seems the only possibility) to be the value '1200'
with quotes.

The data was written using the phpcassa library and that application and
library haven't been changed.  This has only happened on our test cluster
which was upgraded and hasn't happened on our live cluster which was not
upgraded.  Many of our column families were affected and all affected
columns are Long (or bigint for cql3).

Errors when reading using the CQL3 command-line client look like this:

Failed to decode value '1356441225' (for column 'expires') as bigint:
unpack requires a string argument of length 8

and when reading with cassandra-cli the error is

[default@cf] get
token['fbc1e9f7cc2c0c2fa186138ed28e5f691613409c0bcff648c651ab1f79f9600b'];
=> (column=client_id, value=8ec4c29de726ad4db3f89a44cb07909c04f90932d,
timestamp=1355836425784329, ttl=648000)
A long is exactly 8 bytes: 10




-Bryan





On Mon, Dec 17, 2012 at 7:33 AM, Brian Tarbox tar...@cabotresearch.com wrote:

 I was using jre-7u9-linux-x64  which was the latest at the time.

 I'll confess that I did not file any bugs...at the time the advice from
 both the Cassandra and Zookeeper lists was to stay away from Java 7 (and my
 boss had had enough of my reporting that *the problem was Java 7* for
 me to spend a lot more time getting the details).

 Brian


 On Sun, Dec 16, 2012 at 4:54 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Sat, Dec 15, 2012 at 7:12 PM, Michael Kjellman 
 mkjell...@barracuda.com wrote:

 What issues have you run into? Actually curious because we push
 1.1.5-7 really hard and have no issues whatsoever.


 A related question is which version of java 7 did you try? The
 first releases of java 7 were apparently famous for having many issues but
 it seems the more recent updates are much more stable.

 --
 Sylvain


 On Dec 15, 2012, at 7:51 AM, Brian Tarbox tar...@cabotresearch.com
 wrote:

 We've reverted all machines back to Java 6 after running into numerous
 Java 7 issues...some running Cassandra, some running Zookeeper, others just
 general problems.  I don't recall any other major language release being
 such a mess.


 On Fri, Dec 14, 2012 at 5:07 PM, Bill de hÓra b...@dehora.net wrote:

 At least that would be one way of defining officially supported.

 Not quite, because, Datastax is not Apache Cassandra.

 the only issue related to Java 7 that I know of is CASSANDRA-4958, but
 that's osx specific (I wouldn't advise using osx in production anyway) and
 it's not directly related to Cassandra anyway so you can easily use the
 beta version of snappy-java as a workaround if you want to. So that non
 blocking issue aside, and as far as we know, Cassandra supports Java 7. Is
 it rock-solid in production? Well, only repeated use in production can
 tell, and that's not really in the hand of the project.

 Exactly right. If enough people use Cassandra on Java7 and enough
 people file bugs about Java 7 and enough people work on bugs for Java 7
 then Cassandra will eventually work well enough on Java7.

 Bill

 On 14 Dec 2012, at 19:43, Drew Kutcharian d...@venarc.com wrote:

  In addition, the DataStax official documentation states: Versions
 earlier than 1.6.0_19 should not be used. Java 7 is not recommended.
 
  http://www.datastax.com/docs/1.1/install/install_rpm
 
 
 
  On Dec 14, 2012, at 9:42 AM, Aaron Turner synfina...@gmail.com
 wrote:
 
  Does Datastax (or any other company) support Cassandra under Java 7?
  Or will they tell you to downgrade when you have some problem,
 because
  they don't support C* running on 7?
 
  At least that would be one way of defining officially supported.
 
  On Fri, Dec 14, 2012 at 2:22 AM, Sylvain Lebresne 
 sylv...@datastax.com wrote:
  What kind of official statement do you want? As far as I can be
 considered
  an official voice of the project, my statement is: various people
 run in
  production with Java 7 and it seems to work.
 
  Or to answer the initial question, the only issue related to Java 7
 that I
  know of is CASSANDRA-4958, but that's osx specific (I wouldn't
 advise using
  osx in production anyway) and it's not directly related to
 Cassandra anyway
  so you can easily use the beta version of snappy-java as a
 workaround if you
  want to. So that non blocking issue aside, and as far as we know,
 Cassandra
  supports Java 7. Is it rock-solid 

Re: Correct way to design a cassandra database

2012-12-21 Thread Adam Venturella
One more link that might be helpful. It's a similar system to photos, but
instead of Photos/Albums it's Songs/Playlists:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts.

It's not exactly 1:1 but it covers related concepts in making it work.



On Fri, Dec 21, 2012 at 8:02 AM, Adam Venturella aventure...@gmail.com wrote:

 Ok.. So here is my latest thinking... Including that index:

 CREATE TABLE Users (
 user_name text,
 password text,
 PRIMARY KEY (user_name)
 );

 ^ Same as before

 CREATE TABLE Photos(
 user_name text,
 photo_id uuid,
 created_time timestamp,
 data text,
 PRIMARY KEY (user_name, photo_id, created_time)
 ) WITH CLUSTERING ORDER BY (created_time DESC);

 ^ Note the addition of a photo id and using that in the PK def with the
 created_time
 Data is a JSON like this:
 {
 thumbnail: url,
 standard_resolution:url
 }


 CREATE TABLE PhotosAlbums (
 user_name text,
 album_name text,
 poster_image_url text,
 data text
 PRIMARY KEY (user_name, album_name)
 );

 ^ Same as before, data represents a JSON array of the photos:
 [{photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url}]

 CREATE TABLE PhotosAlbumsIndex (
 user_name text,
 photo_id uuid,
 album_name text,
 created_time timestamp
 PRIMARY KEY (user_name, photo_id, album_name)
 );

 The create_time column here is because you need to have at least 1 column
 that is not part of the PK. Or that's what it looks like in my quick test.

 ^ Each photo added to an album needs to be added to this index row


 As before, your application will need to keep the order of the array in
 tact as your users modify the order of things. Now however if they delete a
 photo you need to fetch the PhotoAlbums the photo existed in and update
 them accordingly:

 SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
 photo_id=uuid

 This should return to you all of the albums that the photo was a part of.
 Now you need to:

 SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN
 (name1, name2, name3 )

 name1,2,3 are the album names you selected from the PhotosAlbumsIndex query

 So now you have all of the photo albums. You would then iterate over those
 in your application, deserializing the JSON data, locating the photo ID that
 was removed and taking it out of the array, then reserializing to JSON and
 updating the record.

 When that is complete you need to remove the photo from the
 PhotosAlbumsIndex. Now here is where I get stuck a little, because this
 will fail:

 DELETE FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
 photo_id=uuid;

 It seems to want the album name as well since it's part of the PK.
 Admittedly, I don't know how to get around that and just delete everything
 where only the first two components of the PK match.

 You would already possess the list of album names though, so you could
 perform the deletes as a BATCH, specifying one DELETE per album_name, as in
 the sketch below.
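
 A minimal sketch of that batch (the album names and the photo_id value here
 are made up, standing in for whatever the PhotosAlbumsIndex query returned):

 BEGIN BATCH
   DELETE FROM PhotosAlbumsIndex
     WHERE user_name = 'the_user'
       AND photo_id = 62c36092-82a1-3a00-93d1-46196ee77204
       AND album_name = 'holiday';
   DELETE FROM PhotosAlbumsIndex
     WHERE user_name = 'the_user'
       AND photo_id = 62c36092-82a1-3a00-93d1-46196ee77204
       AND album_name = 'family';
 APPLY BATCH;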


 Anyway, that's my current thinking. I would love to know if it's possible
 to get around the DELETE issue another way.



 On Fri, Dec 21, 2012 at 7:15 AM, Adam Venturella aventure...@gmail.comwrote:

 Hmmm it just occurred to me that in my examples, there is no convenient
 way to delete a photo and also remove that photo from the albums it is a
 part of.

 As it stands, you would need to iterate over all of the user's albums to
 locate the photo and remove it; that's no good.

 Probably need another table that holds just the photo / album
 identifiers, an index. So when the user deletes a photo, you ask the index
 which albums that photo belongs to and just fetch those to update the
 album with that photo removed.

 :: mobile emails ::

 On Dec 21, 2012, at 3:50, David Mohl d...@dave.cx wrote:

  Hello!

 I've recently started learning cassandra but still have trouble
 understanding the best way to design a cassandra database.
 I've posted my question already on stackoverflow but because this would
 very likely result in a discussion, it got closed. Original question here:
 http://stackoverflow.com/questions/13975868/correct-way-to-design-a-cassandra-database


 Assume you have 3 types of objects: User, Photo and Album. Obviously a
 photo belongs to a user and can be part of an album. For querying, assume we
 just want to order newest first. Paging by 10 elements should be
 possible.

 Would you go with a design where every document has all the information
 needed for a correct output? Something like this:

 -- User
| -- Name
| -- ...
| -- Photos
 | -- Photoname
 | -- Uploaded at

 Or go a more relational way (while having a secondary index on the
 belongs_to columns):

 -- User (userid is the row key)
| -- Name
| -- 

thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Qiaobing Xie

Hi,

I am developing a thrift client that inserts and removes columns from a 
column-family (using batch_mutate calls). Everything seems to be working 
fine - my thrift client can add/retrieve/delete/add back columns as 
expected... until I manually deleted a column with cassandra-cli. (I was 
trying to test an error scenario in which my client would discover a 
missing column and recreate it in the column-family). After I deleted a 
column from within cassandra-cli manually, my thrift client detected that 
the column of that name was missing when it tried to get it. So it tried to 
recreate the column with that name along with a bunch of other 
columns with a batch_mutate call. The call returned normally and the 
other columns were added/updated, but the one that I manually deleted 
from cassandra-cli was not added/created in the column family.


I tried to restart my client and cassandra-cli but it didn't help. It 
just seemed that my thrift client could no longer add a column with that 
name! Finally I destroyed and recreated the whole column-family and the 
problem went away.


Any idea what I did wrong?

-Qiaobing




Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Edward Capriolo
The cli uses microsecond precision; your client might be using something
else, and inserts with lower timestamps are dropped.
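
To illustrate the rule in CQL (hypothetical photos table keyed by user_name,
made-up timestamp values): whichever write carries the higher timestamp wins,
so an insert stamped in milliseconds loses to an earlier delete stamped in
microseconds.

DELETE FROM photos USING TIMESTAMP 1356134400000000   -- microseconds, like the cli
  WHERE user_name = 'the_user';

INSERT INTO photos (user_name, photo_id)
  VALUES ('the_user', 62c36092-82a1-3a00-93d1-46196ee77204)
  USING TIMESTAMP 1356134400000;                       -- milliseconds: lower, so silently ignored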

On Friday, December 21, 2012, Qiaobing Xie qiaobing@gmail.com wrote:
 Hi,

 I am developing a thrift client that inserts and removes columns from a
column-family (using batch_mutate calls). Everything seems to be working
fine - my thrift client can add/retrieve/delete/add back columns as
expected... until I manually deleted a column with cassandra-cli. (I was
trying to test an error scenario in which my client would discover a
missing column and recreated it in the column-family). After I deleted a
column from within cassandra-cli manually, my thrift client detected the
column of that name missing when it tried to get it. So it tried to
recreated a new column with that name along with a bunch of other columns
with a batch_mutate call. The call returned normally and the other columns
were added/updated, but the one that I manually deleted from cassandra-cli
was not added/created in the column family.

 I tried to restart my client and cassandra-cli but it didn't help. It
just seemed that my thrift client could no longer add a column with that
name! Finally I destroyed and recreated the whole column-family and the
problem went away.

 Any idea what I did wrong?

 -Qiaobing





Very large HintsColumnFamily

2012-12-21 Thread Keith Wright
Hi all,

I am seeing a VERY large HintsColumnFamily (40+ GB) on one of my nodes (I 
have 2 DCs with 3 nodes each and RF 2).  Nodetool ring as a result reports load 
as being way higher for the one node (the delta being the size of the 
HintsColumnFamily).  This behavior seems to occur if I do a large amount of 
data loading using that node as the coordinator node.  I found a post related 
to this 
(http://mail-archives.apache.org/mod_mbox/cassandra-user/201203.mbox/%3c376cec01195c894cb9f8a3c274029a96b471d...@fish-ex2k10-01.azaleos.net%3E)
 but wanted to see if there were better ways to handle it than the reset 
suggested as it seems somewhat risky.  Nodetool netstats never seems to show 
any streaming data.  With past nodes it seemed like the node eventually fixed 
itself.  Note that I have the OOTB gc_grace_seconds so perhaps I just need to 
wait 10 days before that runs again and the data gets deleted?  Is there a way 
to change gc_grace_seconds outside Cassandra.yaml and thus save myself a node 
restart?

Note that I am seeing severely degraded performance on this node when it 
attempts to compact the HintsColumnFamily to the point where I had to set 
setcompactionthroughput to 999 to ensure it doesn't run again (after which the 
node started serving requests much faster).

I appreciate the help!

Thanks


Re: Correct way to design a cassandra database

2012-12-21 Thread Edward Capriolo
You could store the order as the first part of a composite string, say first
picture as A and second as B. To insert one between, call it AA. If you
shuffle a lot the strings could get really long.

Might be better to store the order in a separate column (see the sketch below).

Neither solution mentioned deals with concurrent access well.
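
A minimal CQL sketch of the separate-column idea (table and column names are
just an assumption): keep the sort key in its own clustering column so the
album order falls out of the clustering order; moving a photo means deleting
its old row and inserting one with the new sort_key.

CREATE TABLE AlbumPhotos (
    user_name text,
    album_name text,
    sort_key text,      -- e.g. 'A', 'B'; use 'AA' to slot a photo between them
    photo_id uuid,
    PRIMARY KEY (user_name, album_name, sort_key)
);

SELECT photo_id, sort_key FROM AlbumPhotos
 WHERE user_name = 'the_user' AND album_name = 'holiday';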

On Friday, December 21, 2012, Adam Venturella aventure...@gmail.com wrote:
 One more link that might be helpful. It's a similar system to photo's but
instead of Photos/Albums it's Songs/Playlists:
 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts.

 It's not exactly 1:1 but it covers related concepts in making it work.


 On Fri, Dec 21, 2012 at 8:02 AM, Adam Venturella aventure...@gmail.com
wrote:

 Ok.. So here is my latest thinking... Including that index:
 CREATE TABLE Users (
 user_name text,
 password text,
 PRIMARY KEY (user_name)
 );
 ^ Same as before
 CREATE TABLE Photos(
 user_name text,
 photo_id uuid,
 created_time timestamp,
 data text,
 PRIMARY KEY (user_name, photo_id, created_time)
 ) WITH CLUSTERING ORDER BY (created_time DESC);
 ^ Note the addition of a photo id and using that in the PK def with the
created_time
 Data is a JSON like this:
 {
 thumbnail: url,
 standard_resolution:url
 }

 CREATE TABLE PhotosAlbums (
 user_name text,
 album_name text,
 poster_image_url text,
 data text
 PRIMARY KEY (user_name, album_name)
 );
 ^ Same as before, data represents a JSON array of the photos:
 [{photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url},
 {photo_id:..., thumbnail:url, standard_resolution:url}]

 CREATE TABLE PhotosAlbumsIndex (
 user_name text,
 photo_id uuid,
 album_name text,
 created_time timestamp
 PRIMARY KEY (user_name, photo_id, album_name)
 );
 The create_time column here is because you need to have at least 1 column
that is not part of the PK. Or that's what it looks like in my quick test.
 ^ Each photo added to an album needs to be added to this index row

 As before, your application will need to keep the order of the array in
tact as your users modify the order of things. Now however if they delete a
photo you need to fetch the PhotoAlbums the photo existed in and update
them accordingly:
 SELECT * FROM PhotosAlbumsIndex WHERE user_name='the_user' AND
photo_id=uuid
 This should return to you all of the albums that the photo was a part of.
Now you need to:
 SELECT * FROM PhotosAlbums where user_name = the_user and album_name IN


Re: Very large HintsColumnFamily

2012-12-21 Thread Rob Coli
Before we start.. what version of cassandra?

On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright kwri...@nanigans.com wrote:
 This behavior seems to occur if I do a large
 amount of data loading using that node as the coordinator node.

In general you want to use all nodes to coordinate, not a single one.

 Nodetool netstats never seems to show
 any streaming data.  With past nodes it seemed like the node eventually
 fixed itself.

That node is storing hints for other nodes it believes are or were at
some point in DOWN state. The first step to preventing this problem
from recurring is to understand why it believes (or believed) other nodes are
down. My conjecture is that you are overloading the coordinating node
and/or other nodes with the large amount of writes.

 Note that I am seeing severely degraded performance on this node when it
 attempts to compact the HintsColumnFamily to the point where I had to set
 setcompactionthroughput to 999 to ensure it doesn't run again (after which
 the node started serving requests much faster).

Depending on version, your 40gb of hints could be in one 40gb wide
row. Look at nodetool cfstats for HintsColumnFamily to determine if
this is the case.

Do you see Timed out replaying hint messages, or are the hints being
successfully delivered?

You have two broad options :

1) purge your hints and then either reload the data (if reloading it
will be idempotent) or repair -pr on every node in the cluster.
2) reduce load enough that hints will be successfully delivered,
reduce gc_grace_seconds on the hints cf to 0 and then do a major
compaction.

If I were you, I would probably do 1). The easiest way is to stop the
node and remove all sstables in the HintsColumnFamily.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Very large HintsColumnFamily

2012-12-21 Thread Keith
1.1.7

Rob Coli rc...@palominodb.com wrote:

Before we start.. what version of cassandra?

On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright kwri...@nanigans.com wrote:
 This behavior seems to occur if I do a large
 amount of data loading using that node as the coordinator node.

In general you want to use all nodes to coordinate, not a single one.

 Nodetool netstats never seems to show
 any streaming data.  With past nodes it seemed like the node eventually
 fixed itself.

That node is storing hints for other nodes it believes are or were at
some point in DOWN state. The first step to preventing this problem
from recurring is to understand why it believes/d other nodes are
down. My conjecture is that you are overloading the coordinating node
and/or other nodes with the large amount of write.

 Note that I am seeing severely degraded performance on this node when it
 attempts to compact the HintsColumnFamily to the point where I had to set
 setcompactionthroughput to 999 to ensure it doesn't run again (after which
 the node started serving requests much faster).

Depending on version, your 40gb of hints could be in one 40gb wide
row. Look at nodetool cfstats for HintsColumnFamily to determine if
this is the case.

Do you see Timed out replaying hint messages, or are the hints being
successfully delivered?

You have two broad options :

1) purge your hints and then either reload the data (if reloading it
will be idempotent) or repair -pr on every node in the cluster.
2) reduce load enough that hints will be successfully delivered,
reduce gc_grace_seconds on the hints cf to 0 and then do a major
compaction.

If I were you, I would probably do 1). The easiest way is to stop the
node and remove all sstables in the HintsColumnFamily.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Qiaobing Xie
That makes sense - I think my client uses milliseconds. Thanks for 
pointing that out.


-Q

On 12/21/12 6:25 PM, Edward Capriolo wrote:
The cli using microsecond precision your client might be using 
something else and the insert with lower timestamps are dropped.


On Friday, December 21, 2012, Qiaobing Xie qiaobing@gmail.com 
mailto:qiaobing@gmail.com wrote:

 Hi,

 I am developing a thrift client that inserts and removes columns 
from a column-family (using batch_mutate calls). Everything seems to 
be working fine - my thrift client can add/retrieve/delete/add back 
columns as expected... until I manually deleted a column with 
cassandra-cli. (I was trying to test an error scenario in which my 
client would discover a missing column and recreated it in the 
column-family). After I deleted a column from within cassandra-cli 
manually, my thrift client detected the column of that name missing 
when it tried to get it. So it tried to recreated a new column with 
that name along with a bunch of other columns with a batch_mutate 
call. The call returned normally and the other columns were 
added/updated, but the one that I manually deleted from cassandra-cli 
was not added/created in the column family.


 I tried to restart my client and cassandra-cli but it didn't help. 
It just seemed that my thrift client could no longer add a column with 
that name! Finally I destroyed and recreated the whole column-family 
and the problem went away.


 Any idea what I did wrong?

 -Qiaobing