Re: Upserting the same values multiple times

2014-01-21 Thread Robert Wille
No tombstones, just many copies of the same data until compaction occurs.

From:  Sanjeeth Kumar 
Reply-To:  
Date:  Tuesday, January 21, 2014 at 8:37 PM
To:  
Subject:  Upserting the same values multiple times

Hi,
   I have a table A, one of the fields of which is a text column called
body.
 This text's length could vary somewhere between 120 characters to say 400
characters. The contents of this column can be the same for millions of
rows.

To prevent the repetition of the same data, I thought I would add another
table B, which stores the (digest, body) mapping.

Table A {
  some fields;
  digest text,
  ...
}
  

TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
)

Whenever I insert into table A, I calculate the digest of body, and blindly
call an insert into table B also. I'm not doing any read on B. This could
result in the same (digest, body) row being inserted millions of times in a
short span of time.

Couple of questions.

1) Would this cause an issue due to the number of tombstones created in a
short span of time? I'm assuming that for every insert, a tombstone would be
created for the previous record.
2) Or should I just replicate the same data in table A itself multiple times
(with compression, space isn't that big an issue)?


- Sanjeeth




RE: Upserting the same values multiple times

2014-01-21 Thread Viktor Jevdokimov
It's not about tombstones. Tombstones are essentially markers for deleted
columns (via DELETE or TTL), kept in sstables after compaction for the
gc_grace period.

Updates do not create tombstones for previous records; the latest version by
timestamp is saved from the memtable, or when sstables are merged during
compaction.

While data is in the memtable, the latest timestamp wins and only the latest
version is flushed to disk. After that, everything depends on how fast you
flush memtables and how compaction works thereafter. Do not expect any
tombstones from updates, except when deleting columns.
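In CQL terms (a hypothetical sketch using the table B names from the question, not from the thread itself):

```cql
-- Repeated upserts of the same row: each INSERT is just a new write with a
-- newer timestamp. Duplicates are merged in the memtable or at compaction,
-- and no tombstone is created.
INSERT INTO b (digest, body) VALUES ('d41d8cd9...', 'same body text');
INSERT INTO b (digest, body) VALUES ('d41d8cd9...', 'same body text');

-- A tombstone only appears for an explicit delete (or an expired TTL):
DELETE FROM b WHERE digest = 'd41d8cd9...';
```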


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
Follow us on Twitter: @adforminsider
Experience Adform DNA






Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
You’re right. I didn’t catch that. No need to have id in the PRIMARY KEY.

On Jan 21, 2014, at 5:11 PM, Jon Ribbens  
wrote:

> On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote:
>>   Thanks, I was actually thinking of doing that. Something along the lines
>>   of
>>   CREATE TABLE user (
>> idtimeuuid PRIMARY KEY,
>> emailtext,
>> nametext,
>> ...
>>   );
>>   CREATE TABLE user_email_index (
>> email  text,
>> id  timeuuid,
>> PRIMARY KEY (email, id)
>>   );
>>   And during registration, I would just use LWT on the user_email_index
>>   table first and insert the record and then insert the actual user record
>>   into user table w/o LWT. Does that sound right to you?
> 
> Yes, although unless I'm confused you don't need "id" in the
> primary key on "user_email_index", just "PRIMARY KEY (email)".



Upserting the same values multiple times

2014-01-21 Thread Sanjeeth Kumar
Hi,
   I have a table A, one of the fields of which is a text column called
body.
 This text's length could vary somewhere between 120 characters to say 400
characters. The contents of this column can be the same for millions of
rows.

To prevent the repetition of the same data, I thought I would add another
table B, which stores the (digest, body) mapping.

Table A {
  some fields;
  digest text,
  ...
}


TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
)

Whenever I insert into table A, I calculate the digest of body, and blindly
call an insert into table B also. I'm not doing any read on B. This could
result in the same (digest, body) row being inserted millions of times in a
short span of time.

Couple of questions.

1) Would this cause an issue due to the number of tombstones created in a
short span of time? I'm assuming that for every insert, a tombstone would be
created for the previous record.
2) Or should I just replicate the same data in table A itself multiple
times (with compression, space isn't that big an issue)?


- Sanjeeth
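The sketch above, rewritten as concrete CQL (hypothetical names; table A's other fields are assumed):

```cql
-- Table A keeps only the digest; the (possibly shared) body lives in B.
CREATE TABLE a (
  id     timeuuid PRIMARY KEY,
  -- some fields ...
  digest text
);

CREATE TABLE b (
  digest text PRIMARY KEY,
  body   text
);
```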


Re: Best design for a usecase ??

2014-01-21 Thread Naresh Yadav
just to add: on this table there will be lakhs of select queries to get the
tag-combination id for a partial set of tags...

On Tue, Jan 21, 2014 at 2:33 PM, Naresh Yadav  wrote:

> Hi,
>
> I need to design a table which will give a UUID to set of tags.
> Each tag itself has unique UUID
>
> *TagCombination* table
> TC1  ->  India, Pen
> TC2  ->  Shampoo, U.K
> TC3  ->  Team1, Product1, Location1
> TC4  ->  Office1, India, Pen
>
> I can have billions of such unique combinations and there can be millions
> of unique tags, but each combination will have 2 to 10 tags max.
>
> As data comes daily there would be new combination registered if not
> exists.
>
> *Query on this table :*
> 1. Give me the list of tags for tag-combination id = TC1
> 2. Given a set of tags, which tag-combination ids contain it?
> For example, if India, Pen comes in, it appears in TC1 and TC4.
> There can be an exact or partial match on tags to get the TC ids.
>
> Please suggest design for this so that this table can handle bigdata.
>
> Thanks
> Naresh
>
>
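One possible sketch for the two queries above (table and column names are assumptions, not from the thread): a forward table for query 1 and a reverse index for query 2, with one row per (tag, combination) pair, so partial matches become per-tag lookups intersected client-side.

```cql
-- Forward lookup: tags for a combination id (query 1)
CREATE TABLE tags_by_combination (
  tc_id uuid,
  tag   text,
  PRIMARY KEY (tc_id, tag)
);

-- Reverse index: combinations containing a given tag (query 2)
CREATE TABLE combinations_by_tag (
  tag   text,
  tc_id uuid,
  PRIMARY KEY (tag, tc_id)
);

-- "Which combinations contain India AND Pen?" -> query each tag,
-- then intersect the tc_id sets in the client:
SELECT tc_id FROM combinations_by_tag WHERE tag = 'India';
SELECT tc_id FROM combinations_by_tag WHERE tag = 'Pen';
```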


Re: Moving from relational to Cassandra, how to handle intra-table relationships?

2014-01-21 Thread Les Hartzman
True. Fortunately though in this application, the data is
write-once/read-many. So that is one bullet I would dodge!

Les


On Tue, Jan 21, 2014 at 5:34 PM, Patricia Gorla
wrote:

> Hey,
>
> One thing to keep in mind if you want to go the serialized JSON route, is
> that you will need to read out the data each time you want to do an update.
>
> Cheers,
> Patricia
>
>
> On Tuesday, January 21, 2014, Les Hartzman  wrote:
>
>> Hi,
>>
>> I'm looking to move from a relational DB to Cassandra. I just found that
>> there are intra-table relationships in one table where the ids of the
>> related rows are saved in a 'parent' row.
>>
>> How can these kinds of relationships be handled in Cassandra? I'm
>> thinking that if the individual rows need to live on their own, perhaps I
>> should store the data as serialized JSON in its own column of the parent.
>>
>> All thoughts appreciated!
>>
>> Thanks.
>>
>> Les
>>
>>
>
> --
> Patricia Gorla
> @patriciagorla
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 
>
>


Re: Moving from relational to Cassandra, how to handle intra-table relationships?

2014-01-21 Thread Patricia Gorla
Hey,

One thing to keep in mind if you want to go the serialized JSON route, is
that you will need to read out the data each time you want to do an update.

Cheers,
Patricia

On Tuesday, January 21, 2014, Les Hartzman  wrote:

> Hi,
>
> I'm looking to move from a relational DB to Cassandra. I just found that
> there are intra-table relationships in one table where the ids of the
> related rows are saved in a 'parent' row.
>
> How can these kinds of relationships be handled in Cassandra? I'm thinking
> that if the individual rows need to live on their own, perhaps I should
> store the data as serialized JSON in its own column of the parent.
>
> All thoughts appreciated!
>
> Thanks.
>
> Les
>
>

-- 
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com 


Moving from relational to Cassandra, how to handle intra-table relationships?

2014-01-21 Thread Les Hartzman
Hi,

I'm looking to move from a relational DB to Cassandra. I just found that
there are intra-table relationships in one table where the ids of the
related rows are saved in a 'parent' row.

How can these kinds of relationships be handled in Cassandra? I'm thinking
that if the individual rows need to live on their own, perhaps I should
store the data as serialized JSON in its own column of the parent.

All thoughts appreciated!

Thanks.

Les
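As an alternative to serialized JSON (a hedged sketch with assumed names), CQL collections can hold the related-row ids directly on the parent row while each child remains an independent row:

```cql
-- Parent row keeps the ids of its related rows in a set; each child
-- remains an independent row in the same table.
CREATE TABLE item (
  id        timeuuid PRIMARY KEY,
  payload   text,
  child_ids set<timeuuid>
);

-- Adding a relationship is then a single set addition, e.g.:
-- UPDATE item SET child_ids = child_ids + {<child id>} WHERE id = <parent id>;
```

Unlike the JSON approach, appending to the set does not require reading the column back first.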


Re: Data modeling users table with CQL

2014-01-21 Thread Jon Ribbens
On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote:
>Thanks, I was actually thinking of doing that. Something along the lines
>of
>CREATE TABLE user (
>  idtimeuuid PRIMARY KEY,
>  emailtext,
>  nametext,
>  ...
>);
>CREATE TABLE user_email_index (
>  email  text,
>  id  timeuuid,
>  PRIMARY KEY (email, id)
>);
>And during registration, I would just use LWT on the user_email_index
>table first and insert the record and then insert the actual user record
>into user table w/o LWT. Does that sound right to you?

Yes, although unless I'm confused you don't need "id" in the
primary key on "user_email_index", just "PRIMARY KEY (email)".
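Putting that correction together with the registration flow described above (a sketch; `[applied]` is the column Cassandra returns for conditional writes):

```cql
CREATE TABLE user_email_index (
  email text,
  id    timeuuid,
  PRIMARY KEY (email)
);

-- Registration: claim the email first with a lightweight transaction...
INSERT INTO user_email_index (email, id)
VALUES ('drew@example.com', now())
IF NOT EXISTS;

-- ...and only if the result shows [applied] = true, insert the user
-- record itself without LWT:
-- INSERT INTO user (id, email, name)
-- VALUES (<same id as above>, 'drew@example.com', 'Drew');
```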


Re: upgrade from cassandra 1.2.3 -> 1.2.13 + start using SSL

2014-01-21 Thread Cyril Scetbon
Yes it really seems to be similar. I'll update the Jira with my information. I 
can easily reproduce it. I saw it lasting for one hour last time and not coming 
back after that.
-- 
Cyril SCETBON

On 21 Jan 2014, at 21:57, Robert Coli  wrote:

> On Mon, Jan 20, 2014 at 3:22 AM, Cyril Scetbon  wrote:
> The only thing I'm worrying about is that I met a situation where I had a lot 
> of flushes on some nodes. You can find one of my system logs at 
> http://pastebin.com/YZKUQLXz. I'm not sure as I didn't let it run for more 
> than 4 minutes, but it seems that there was an infinite loop flushing system 
> column families. A whole restart made this error go away but I'm not sure if 
> I can have this one come back.
> 
> CASSANDRA-4880 - Endless loop flushing+compacting system/schema_keyspaces and 
> system/schema_columnfamilies
> 
> https://issues.apache.org/jira/browse/CASSANDRA-4880?focusedCommentId=13837624&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13837624
> "
> I have encountered similar bug after I upgrade one of my 1.2.2 node to 1.2.12.
> "
> 
> =Rob
> 



Possible optimization: avoid creating tombstones for TTLed columns if updates to TTLs are disallowed

2014-01-21 Thread Donald Smith
I'm aware of https://issues.apache.org/jira/browse/CASSANDRA-4917, which 
optimizes tombstone creation for TTLed columns: "We only need to ensure that 
ExpiringColumn and tombstone together live as long as gc_grace. If the 
ExpiringColumn's TTL>=gc_grace_seconds then we can create an already gcable 
tombstone and drop that instantly."   I presume the point is that GCable 
tombstones can still do work (preventing spurious writing from nodes that were 
down) but only until the data is flushed to disk.  If the effective TTL exceeds 
gc_grace_seconds then the tombstone will be deleted anyway.

It occurred to me that if you never update the TTL of a column, then there 
should be no need for tombstones at all:  any replicas will have the same TTL.  
So there'd be no risk of missed deletes.  You wouldn't even need GCable 
tombstones.  The purpose of a tombstone is to cover the case where a different 
node was down and it didn't notice the delete and it still had the column and 
tried to replicate it back; but that won't happen if it too had the TTL.

So, if - and it's a big if - a table disallowed updates to TTL, then you could 
really optimize deletion of TTLed columns: you could do away with tombstones 
entirely.   If a table allows updates to TTL then it's possible a different 
node will have the row without the TTL and the tombstone would be needed.

Or am I missing something?

Disallowing updates would seem to enable optimizations in general.   Many data 
are write-once.
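For reference, the write-once-with-TTL pattern under discussion might look like this (hypothetical table):

```cql
-- Every replica receives the same TTL with the write itself, so all
-- copies expire at (roughly) the same time without a delete ever being
-- issued -- the scenario where a tombstone would be unnecessary.
INSERT INTO events (id, body)
VALUES (now(), 'payload')
USING TTL 86400;  -- expire after one day
```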

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


Re: bad interaction between CompositeTypes and Secondary index

2014-01-21 Thread Brian Tarbox
The table was created this way; we also avoid altering existing tables.


On Tue, Jan 21, 2014 at 4:19 PM, Jacob Rhoden  wrote:

> Was the original table created, or created then altered? It makes a
> difference as I have seen this type of thing occur on tables I first
> created then updated. Not sure if that issue was fixed in 2.0.4, I'm
> avoiding altering tables completely for now.
>
> __
> Sent from iPhone
>
> On 22 Jan 2014, at 7:50 am, Brian Tarbox  wrote:
>
> We're trying to use CompositeTypes and Secondary indexes and are getting
> an assertion failure in ExtendedFilter.java line 258 (running C* 2.0.3)
> when we call getIndexedColumns.  The assertion is for not finding any
> columns.
>
> The strange bit is that if we re-create the column family in question and
> do not set ComparatorType then things work fine.  This seems odd since
> as I understand it the ComparatorType is for controlling the ordering of
> columns within a row and the Secondary Index is to find a subset of rows
> that contain a particular column value... in other words they seem like
> they shouldn't have an interaction.
>
> It's also puzzling to us that ExtendedFilter asserts in this case... if it
> finds no columns I would have expected an empty return but not a failure
> (that our client code saw as a Timeout exception).
>
> Any clues would be appreciated.
>
> Thanks,
>
> Brian Tarbox
>
>


Re: bad interaction between CompositeTypes and Secondary index

2014-01-21 Thread Jacob Rhoden
Was the original table created, or created then altered? It makes a difference 
as I have seen this type of thing occur on tables I first created then updated. 
Not sure if that issue was fixed in 2.0.4, I'm avoiding altering tables 
completely for now.

__
Sent from iPhone

> On 22 Jan 2014, at 7:50 am, Brian Tarbox  wrote:
> 
> We're trying to use CompositeTypes and Secondary indexes and are getting an 
> assertion failure in ExtendedFilter.java line 258 (running C* 2.0.3) when we 
> call getIndexedColumns.  The assertion is for not finding any columns.
> 
> The strange bit is that if we re-create the column family in question and do 
> not set ComparatorType then things work fine.  This seems odd since as I 
> understand it the ComparatorType is for controlling the ordering of columns 
> within a row and the Secondary Index is to find a subset of rows that contain 
> a particular column value... in other words they seem like they shouldn't 
> have an interaction.
> 
> It's also puzzling to us that ExtendedFilter asserts in this case... if it finds 
> no columns I would have expected an empty return but not a failure (that our 
> client code saw as a Timeout exception).
> 
> Any clues would be appreciated.
> 
> Thanks,
> 
> Brian Tarbox


Re: data export with different replication factor.

2014-01-21 Thread Robert Coli
On Sat, Jan 18, 2014 at 11:29 AM, chandra Varahala <
hadoopandcassan...@gmail.com> wrote:

> I have a 6-node production cluster with a replication factor of 3 and
> 4  keyspaces, and 1 Test cluster with 2 nodes , is there a way I can export
> data from production cluster  and copy  into test cluster with replication
> factor 1 ?
>

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

tl;dr - copy all sstables from all source nodes to all target nodes, being
sure to avoid name collisions, and then run cleanup compaction.

=Rob


Re: upgrade from cassandra 1.2.3 -> 1.2.13 + start using SSL

2014-01-21 Thread Robert Coli
On Mon, Jan 20, 2014 at 3:22 AM, Cyril Scetbon wrote:

> The only thing I'm worrying about is that I met a situation where I had a
> lot of flushes on some nodes. You can find one of my system logs at
> http://pastebin.com/YZKUQLXz. I'm not sure as I didn't let it run for
> more than 4 minutes, but it seems that there was an infinite loop flushing
> system column families. A whole restart made this error go away but I'm
> not sure if I can have this one come back.
>

CASSANDRA-4880 - Endless loop flushing+compacting system/schema_keyspaces
and system/schema_columnfamilies

https://issues.apache.org/jira/browse/CASSANDRA-4880?focusedCommentId=13837624&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13837624
"
I have encountered similar bug after I upgrade one of my 1.2.2 node to
1.2.12.
"

=Rob


bad interaction between CompositeTypes and Secondary index

2014-01-21 Thread Brian Tarbox
We're trying to use CompositeTypes and Secondary indexes and are getting an
assertion failure in ExtendedFilter.java line 258 (running C* 2.0.3) when
we call getIndexedColumns.  The assertion is for not finding any columns.

The strange bit is that if we re-create the column family in question and
do not set ComparatorType then things work fine.  This seems odd since as
I understand it the ComparatorType is for controlling the ordering of
columns within a row and the Secondary Index is to find a subset of rows
that contain a particular column value... in other words they seem like
they shouldn't have an interaction.

It's also puzzling to us that ExtendedFilter asserts in this case... if it
finds no columns I would have expected an empty return but not a failure
(that our client code saw as a Timeout exception).

Any clues would be appreciated.

Thanks,

Brian Tarbox


RE: How to add a new DC to cluster in Cassandra 2.x

2014-01-21 Thread Lu, Boying
Thanks a lot.  That’s what I want.

From: Tupshin Harper [mailto:tups...@tupshin.com]
Sent: 21 January 2014 23:16
To: user@cassandra.apache.org
Subject: Re: How to add a new DC to cluster in Cassandra 2.x


This should be the doc you are looking for.

http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

-Tupshin
On Jan 21, 2014 2:14 AM, "Lu, Boying" 
mailto:boying...@emc.com>> wrote:
Hi, All,

I’m new to Cassandra. I want to know how to add a DC to existing Cassandra 
cluster (all running Cassandra 2.x).
I found a related document at 
http://www.datastax.com/docs/1.1/cluster_management

Is it still valid for Cassandra 2.x?

Thanks

Boying


RE: Question about node tool repair

2014-01-21 Thread Logendran, Dharsan (Dharsan)
Thanks Rob,

Dharsan

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: January-21-14 2:26 PM
To: user@cassandra.apache.org
Subject: Re: Question about node tool repair

On Mon, Jan 20, 2014 at 2:47 PM, Logendran, Dharsan (Dharsan) 
mailto:dharsan.logend...@alcatel-lucent.com>>
 wrote:
We have a two-node cluster with a replication factor of 2. The db has more
than 2500 column families (tables). The nodetool -pr repair on an empty
database (only one table has a little data) takes about 30 hours to complete.
We are using Cassandra version 2.0.4. Is there any way for us to speed this up?

Cassandra 2.0.2 made aspects of repair serial and therefore logically much 
slower as a function of replication factor. Yours is not the first report I 
have heard of >= 2.0.2 era repair being unreasonably slow.

https://issues.apache.org/jira/browse/CASSANDRA-5950

You can use -par (not at all confusingly named with -pr!) to get the old 
parallel behavior.

Cassandra 2.1 has this ticket to improve repair with vnodes.

https://issues.apache.org/jira/browse/CASSANDRA-5220

But really you should strongly consider how much you need to run repair, and at 
very least probably increase gc_grace_seconds from the unreasonably low default 
of 10 days to 32 days, and then run your repair on the first of each month.

https://issues.apache.org/jira/browse/CASSANDRA-5850

IMO it is just a complete and total error if repair of an actually empty 
database is anything but a NO-OP. I would file a JIRA ticket, were I you.

=Rob



Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
It's a broad topic, but I mean all of the best practices alluded to by
writeups like this.

http://www.technicalinfo.net/papers/WebBasedSessionManagement.html

-Tupshin
On Jan 21, 2014 11:37 AM, "Drew Kutcharian"  wrote:

> Cool. BTW, what do you mean by have additional session tracking ids?
> What’d that be for?
>
> - Drew
>
> On Jan 21, 2014, at 10:48 AM, Tupshin Harper  wrote:
>
> It does sound right.
>
> You might want to have additional session tracking id's,  separate from
> the user id, but that is an additional implementation detail, and could be
> external to Cassandra.  But the approach you describe accurately describes
> what I would do as a first pass, at least.
>
> -Tupshin
> On Jan 21, 2014 10:41 AM, "Drew Kutcharian"  wrote:
>
>> Thanks, I was actually thinking of doing that. Something along the lines
>> of
>>
>> CREATE TABLE user (
>>   idtimeuuid PRIMARY KEY,
>>   emailtext,
>>   nametext,
>>   ...
>> );
>>
>> CREATE TABLE user_email_index (
>>   email  text,
>>   id  timeuuid,
>>   PRIMARY KEY (email, id)
>> );
>>
>> And during registration, I would just use LWT on the user_email_index
>> table first and insert the record and then insert the actual user record
>> into user table w/o LWT. Does that sound right to you?
>>
>> - Drew
>>
>>
>>
>> On Jan 21, 2014, at 10:01 AM, Tupshin Harper  wrote:
>>
>> One CQL row per user, keyed off of the UUID.
>>
>> Another table keyed off of email, with another column containing the UUID
>> for lookups in the first table.  Only registration will require a
>> lightweight transaction, and only for the purpose of avoiding duplicate
>> email registration race conditions.
>>
>> -Tupshin
>> On Jan 21, 2014 9:17 AM, "Drew Kutcharian"  wrote:
>>
>>> A shameful bump ;)
>>>
>>> > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
>>> >
>>> > Hey Guys,
>>> >
>>> > I’m new to CQL (but have been using C* for a while now). What would be
>>> the best way to model a users table using CQL/Cassandra 2.0 Lightweight
>>> Transactions where we would like to have:
>>> > - A unique TimeUUID as the primary key of the user
>>> > - A unique email address used for logging in
>>> >
>>> > In the past I would use Zookeeper and/or Astyanax’s "Uniqueness
>>> Constraint” but I want to see how can this be handled natively.
>>> >
>>> > Cheers,
>>> >
>>> > Drew
>>> >
>>>
>>
>>
>


Re: Question about node tool repair

2014-01-21 Thread Robert Coli
On Mon, Jan 20, 2014 at 2:47 PM, Logendran, Dharsan (Dharsan) <
dharsan.logend...@alcatel-lucent.com> wrote:

>  We have a two-node cluster with a replication factor of 2.   The db
> has more than 2500 column families (tables).   The nodetool -pr repair on an
> empty database (only one table has a little data) takes about 30 hours to
> complete.   We are using Cassandra version 2.0.4.   Is there any way for us
> to speed this up?
>

Cassandra 2.0.2 made aspects of repair serial and therefore logically much
slower as a function of replication factor. Yours is not the first report I
have heard of >= 2.0.2 era repair being unreasonably slow.

https://issues.apache.org/jira/browse/CASSANDRA-5950

You can use -par (not at all confusingly named with -pr!) to get the old
parallel behavior.

Cassandra 2.1 has this ticket to improve repair with vnodes.

https://issues.apache.org/jira/browse/CASSANDRA-5220

But really you should strongly consider how much you need to run repair,
and at very least probably increase gc_grace_seconds from the unreasonably
low default of 10 days to 32 days, and then run your repair on the first of
each month.

https://issues.apache.org/jira/browse/CASSANDRA-5850

IMO it is just a complete and total error if repair of an actually empty
database is anything but a NO-OP. I would file a JIRA ticket, were I you.

=Rob


Re: Upgrading 1.0.9 to 2.0

2014-01-21 Thread Robert Coli
On Mon, Jan 20, 2014 at 1:47 AM, Or Sher  wrote:

> Can I use sstableloader to load SSTables from a RandomPartitioner cluster
> to a Murmuer3Partitioner cluster?
>

My expectation would be yes, if you try it and it works, let us know!

=Rob


Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
Cool. BTW, what do you mean by have additional session tracking ids? What’d 
that be for?

- Drew

On Jan 21, 2014, at 10:48 AM, Tupshin Harper  wrote:

> It does sound right. 
> 
> You might want to have additional session tracking id's,  separate from the 
> user id, but that is an additional implementation detail, and could be 
> external to Cassandra.  But the approach you describe accurately describes 
> what I would do as a first pass, at least.
> 
> -Tupshin
> 
> On Jan 21, 2014 10:41 AM, "Drew Kutcharian"  wrote:
> Thanks, I was actually thinking of doing that. Something along the lines of 
> 
> CREATE TABLE user (
>   idtimeuuid PRIMARY KEY,
>   emailtext,
>   nametext,
>   ...
> );
> 
> CREATE TABLE user_email_index (
>   email  text,
>   id  timeuuid,
>   PRIMARY KEY (email, id)
> );
> 
> And during registration, I would just use LWT on the user_email_index table 
> first and insert the record and then insert the actual user record into user 
> table w/o LWT. Does that sound right to you?
> 
> - Drew
> 
> 
> 
> On Jan 21, 2014, at 10:01 AM, Tupshin Harper  wrote:
> 
>> One CQL row per user, keyed off of the UUID. 
>> 
>> Another table keyed off of email, with another column containing the UUID 
>> for lookups in the first table.  Only registration will require a 
>> lightweight transaction, and only for the purpose of avoiding duplicate 
>> email registration race conditions.
>> 
>> -Tupshin
>> 
>> On Jan 21, 2014 9:17 AM, "Drew Kutcharian"  wrote:
>> A shameful bump ;)
>> 
>> > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
>> >
>> > Hey Guys,
>> >
>> > I’m new to CQL (but have been using C* for a while now). What would be the 
>> > best way to model a users table using CQL/Cassandra 2.0 Lightweight 
>> > Transactions where we would like to have:
>> > - A unique TimeUUID as the primary key of the user
>> > - A unique email address used for logging in
>> >
>> > In the past I would use Zookeeper and/or Astyanax’s "Uniqueness 
>> > Constraint” but I want to see how can this be handled natively.
>> >
>> > Cheers,
>> >
>> > Drew
>> >
> 



Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
It does sound right.

You might want to have additional session tracking id's,  separate from the
user id, but that is an additional implementation detail, and could be
external to Cassandra.  But the approach you describe accurately describes
what I would do as a first pass, at least.

-Tupshin
On Jan 21, 2014 10:41 AM, "Drew Kutcharian"  wrote:

> Thanks, I was actually thinking of doing that. Something along the lines
> of
>
> CREATE TABLE user (
>   idtimeuuid PRIMARY KEY,
>   emailtext,
>   nametext,
>   ...
> );
>
> CREATE TABLE user_email_index (
>   email  text,
>   id  timeuuid,
>   PRIMARY KEY (email, id)
> );
>
> And during registration, I would just use LWT on the user_email_index
> table first and insert the record and then insert the actual user record
> into user table w/o LWT. Does that sound right to you?
>
> - Drew
>
>
>
> On Jan 21, 2014, at 10:01 AM, Tupshin Harper  wrote:
>
> One CQL row per user, keyed off of the UUID.
>
> Another table keyed off of email, with another column containing the UUID
> for lookups in the first table.  Only registration will require a
> lightweight transaction, and only for the purpose of avoiding duplicate
> email registration race conditions.
>
> -Tupshin
> On Jan 21, 2014 9:17 AM, "Drew Kutcharian"  wrote:
>
>> A shameful bump ;)
>>
>> > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
>> >
>> > Hey Guys,
>> >
>> > I’m new to CQL (but have been using C* for a while now). What would be
>> the best way to model a users table using CQL/Cassandra 2.0 Lightweight
>> Transactions where we would like to have:
>> > - A unique TimeUUID as the primary key of the user
>> > - A unique email address used for logging in
>> >
>> > In the past I would use Zookeeper and/or Astyanax’s "Uniqueness
>> Constraint” but I want to see how can this be handled natively.
>> >
>> > Cheers,
>> >
>> > Drew
>> >
>>
>
>


Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
Thanks, I was actually thinking of doing that. Something along the lines of 

CREATE TABLE user (
  idtimeuuid PRIMARY KEY,
  emailtext,
  nametext,
  ...
);

CREATE TABLE user_email_index (
  email  text,
  id  timeuuid,
  PRIMARY KEY (email, id)
);

And during registration, I would just use LWT on the user_email_index table 
first and insert the record and then insert the actual user record into user 
table w/o LWT. Does that sound right to you?

- Drew



On Jan 21, 2014, at 10:01 AM, Tupshin Harper  wrote:

> One CQL row per user, keyed off of the UUID. 
> 
> Another table keyed off of email, with another column containing the UUID for 
> lookups in the first table.  Only registration will require a lightweight 
> transaction, and only for the purpose of avoiding duplicate email 
> registration race conditions.
> 
> -Tupshin
> 
> On Jan 21, 2014 9:17 AM, "Drew Kutcharian"  wrote:
> A shameful bump ;)
> 
> > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
> >
> > Hey Guys,
> >
> > I’m new to CQL (but have been using C* for a while now). What would be the 
> > best way to model a users table using CQL/Cassandra 2.0 Lightweight 
> > Transactions where we would like to have:
> > - A unique TimeUUID as the primary key of the user
> > - A unique email address used for logging in
> >
> > In the past I would use Zookeeper and/or Astyanax’s "Uniqueness Constraint” 
> > but I want to see how can this be handled natively.
> >
> > Cheers,
> >
> > Drew
> >



Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
One CQL row per user, keyed off of the UUID.

Another table keyed off of email, with another column containing the UUID
for lookups in the first table.  Only registration will require a
lightweight transaction, and only for the purpose of avoiding duplicate
email registration race conditions.

-Tupshin
On Jan 21, 2014 9:17 AM, "Drew Kutcharian"  wrote:

> A shameful bump ;)
>
> > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
> >
> > Hey Guys,
> >
> > I’m new to CQL (but have been using C* for a while now). What would be
> the best way to model a users table using CQL/Cassandra 2.0 Lightweight
> Transactions where we would like to have:
> > - A unique TimeUUID as the primary key of the user
> > - A unique email address used for logging in
> >
> > In the past I would use Zookeeper and/or Astyanax’s "Uniqueness
> Constraint” but I want to see how can this be handled natively.
> >
> > Cheers,
> >
> > Drew
> >
>
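
A minimal CQL sketch of the two-table scheme Tupshin describes. Table and column names are my own, and the email address and uuid literals are placeholders, not values from the thread:

```sql
-- One CQL row per user, keyed off the TimeUUID.
CREATE TABLE users (
    id    timeuuid PRIMARY KEY,
    email text,
    name  text
);

-- Second table keyed off email, with a column carrying the UUID
-- for lookups in the first table.
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id timeuuid
);

-- Registration: generate the timeuuid client-side, then claim the email
-- with a lightweight transaction. If the Paxos insert comes back with
-- [applied] = False, the address is already registered and the flow aborts.
INSERT INTO users_by_email (email, user_id)
VALUES ('user@example.com', 50554d6e-29bb-11e5-b345-feff819cdc9f)
IF NOT EXISTS;

-- Only after the LWT succeeds, write the plain (non-LWT) users row
-- with the same uuid.
INSERT INTO users (id, email, name)
VALUES (50554d6e-29bb-11e5-b345-feff819cdc9f, 'user@example.com', 'Drew');
```

Reads by UUID hit users directly; login resolves users_by_email first, then fetches the users row.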


Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
A shameful bump ;)

> On Jan 20, 2014, at 2:14 PM, Drew Kutcharian  wrote:
> 
> Hey Guys,
> 
> I’m new to CQL (but have been using C* for a while now). What would be the 
> best way to model a users table using CQL/Cassandra 2.0 Lightweight 
> Transactions where we would like to have:
> - A unique TimeUUID as the primary key of the user
> - A unique email address used for logging in
> 
> In the past I would use Zookeeper and/or Astyanax’s "Uniqueness Constraint” 
> but I want to see how can this be handled natively.
> 
> Cheers,
> 
> Drew
> 


Re: How to add a new DC to cluster in Cassandra 2.x

2014-01-21 Thread Tupshin Harper
This should be the doc you are looking for.

http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

-Tupshin
 On Jan 21, 2014 2:14 AM, "Lu, Boying"  wrote:

> Hi, All,
>
>
>
> I’m new to Cassandra. I want to know how to add a DC to existing Cassandra
> cluster (all running Cassandra 2.x).
>
> I found a related document at
> http://www.datastax.com/docs/1.1/cluster_management
>
>
>
> Is it still valid for Cassandra 2.x?
>
>
>
> Thanks
>
>
>
> Boying
>


Cassandra Complete Initialisation

2014-01-21 Thread Nigel LEACH
I have a crash-and-burn cluster, used for all sorts of integration testing 
(DataStax 2.0.2, five nodes, 8GB heap, two seeds, vnodes, RF 2). I wanted to 
completely initialise/refresh my environment, so I did something like this (I 
can't be sure something else did not slip in too):

*Removed all user keyspaces 
*Standard shutdown of Cassandra on all nodes
*Deleted all files in saved_caches_directory, data_file_directories, 
commitlog_directory
*Rebooted all five nodes

Now, I can usually start Cassandra on the first seed, but not anywhere else. 
There are various errors, but the first, and most prevalent, seems to be 

Loading persisted ring state
ERROR 09:55:36,660 Exception encountered during startup
java.lang.ExceptionInInitializerError
...
Caused by: java.lang.NullPointerException
...
java.lang.ExceptionInInitializerError
...
Exception encountered during startup: null
ERROR 09:55:36,675 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.ExceptionInInitializerError

Any idea how to get my zapped cluster back up again, and in future how best to 
fully initialise a cluster?

I don't want any data, or settings, from the original cluster, although the 
more settings I can keep, the better, just from the perspective of the work 
involved to create a clean environment.

Many Thanks
Nigel






How to add a new DC to cluster in Cassandra 2.x

2014-01-21 Thread Lu, Boying
Hi, All,

I'm new to Cassandra. I want to know how to add a DC to existing Cassandra 
cluster (all running Cassandra 2.x).
I found a related document at 
http://www.datastax.com/docs/1.1/cluster_management

Is it still valid for Cassandra 2.x?

Thanks

Boying


Best design for a usecase ??

2014-01-21 Thread Naresh Yadav
Hi,

I need to design a table which will assign a UUID to a set of tags.
Each tag itself has a unique UUID.

*TagCombination* table
TC1  ->  India, Pen
TC2  ->  Shampoo, U.K
TC3  ->  Team1, Product1, Location1
TC4  ->  Office1, India, Pen

I can have *billions* of such unique combinations and there can be *millions* of
unique tags, but each combination will have 2 to 10 tags max.

As data arrives daily, a new combination would be registered if it does not already exist.

*Query on this table:*
1. Give me the list of tags for TagCombination id TC1.
2. Given a set of tags, find which TagCombination ids contain them.
   For example, India, Pen occurs in both TC1 and TC4.
   Matching on the tags can be exact or partial when resolving the TC ids.

Please suggest design for this so that this table can handle bigdata.

Thanks
Naresh
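
One possible CQL sketch for this (my own assumption, not a settled design): keep each combination's tag set in one table, add a reverse index from tag to combination ids for query 2, and serve exact matches from a table keyed on a client-computed digest of the sorted tag ids, so a new combination registers only once:

```sql
-- A combination id with its member tags.
CREATE TABLE tag_combination (
    tc_id uuid PRIMARY KEY,
    tags  set<uuid>
);

-- Reverse index: which combinations contain a given tag.
-- Query 2 with several tags = fetch each tag's partition and
-- intersect the tc_id lists client-side (partial match).
CREATE TABLE combinations_by_tag (
    tag_id uuid,
    tc_id  uuid,
    PRIMARY KEY (tag_id, tc_id)
);

-- Exact-match lookup and dedup: key on a digest of the sorted tag ids
-- (computed client-side, e.g. SHA-1 of the concatenated sorted uuids),
-- and use a lightweight transaction so each combination is registered once.
CREATE TABLE combination_by_digest (
    tags_digest text PRIMARY KEY,
    tc_id       uuid
);

INSERT INTO combination_by_digest (tags_digest, tc_id)
VALUES ('sha1-of-sorted-tag-ids', 123e4567-e89b-12d3-a456-426655440000)
IF NOT EXISTS;
```

At billions of combinations, the hot cost is the client-side intersection for partial matches; very common tags produce wide partitions in combinations_by_tag, which is worth watching.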


Long GC due to promotion failures

2014-01-21 Thread John Watson
Pretty reliably, at some point, nodes will have super long GCs.
Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592

Lovely log messages:

  9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
promotion failure size = 4194306)  (4: promotion failure size =
4194306)  (promotion failed)
  Total time for which application threads were stopped: 23.2659990 seconds

Full gc.log until just before restarting the node (see another 32s GC
near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9

Here's a graph of GC time, where we can see an increase 30 minutes
prior (an indicator that the issue will happen soon):
http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png

Graph of various Heap usage:
http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png

Running compactions in the same time frame:
http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png

CPU, IO, ops and latencies:
https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png

cfhistograms/cfstats: https://gist.github.com/dctrwatson/9a08b38d0258ae434b15

Cassandra 1.2.13
Oracle JDK 1.6u45

JVM opts:

MAX_HEAP_SIZE="8G"
HEAP_NEW_SIZE="1536M"

Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
Tried default "-XX:SurvivorRatio=8" and "-XX:SurvivorRatio=4"
Tried default "-XX:MaxTenuringThreshold=1" and "-XX:MaxTenuringThreshold=2"

All still eventually ran into long GC.

Hardware for all 3 nodes:

(2) E5520 @ 2.27Ghz (8 cores w/ HT) ["16" cores]
(6) 4GB RAM [24G RAM]
(1) 500GB 7.2k for commitlog
(2) 400G SSD for data (configured as separate data directories)