How long are expired values actually returned?

2014-05-11 Thread Sebastian Schmidt
Hi,

I'm using the TTL feature in my application. In my tests, when using a
TTL of 5, the inserted rows are still returned after 7 seconds, and even
after 70 seconds. Is this normal, or am I doing something wrong?
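
For reference, here is roughly what my test does (keyspace and table names
are just illustrative):

cqlsh> CREATE TABLE test_ks.ttl_test (id int PRIMARY KEY, val text);
cqlsh> INSERT INTO test_ks.ttl_test (id, val) VALUES (1, 'hello') USING TTL 5;
(wait well past 5 seconds, then:)
cqlsh> SELECT * FROM test_ks.ttl_test WHERE id = 1;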

Kind Regards,
Sebastian





Re: Cyclop - CQL web based editor has been released!

2014-05-11 Thread DuyHai Doan
Really nice initiative. Thank you Maciej


On Sun, May 11, 2014 at 7:41 AM, Maciej Miklas mac.mik...@gmail.com wrote:

 Hi everybody,

 I am aware that this mailing list is meant for Cassandra users, but I've
 developed something that is strictly related to Cassandra, so I thought that
 it might be interesting for some of you.
 I already sent one email several months ago, but since then a lot of
 things have changed!

 Cyclop is a web-based CQL editor - you can deploy it in a web container and
 use its web interface to execute CQL queries or to import/export data.
 There is also a live deployment, so you can try it out immediately. Of
 course the whole thing is open source.

 Here is the project link containing all the details:
 https://github.com/maciejmiklas/cyclop

 Regards,
 Maciej



Re: clearing tombstones?

2014-05-11 Thread William Oberman
Not an expert, just a user of Cassandra. For me, before there was a CF with
a set of SSTable files (I forget the official naming scheme, so I'll make up
my own):
A0
A1
...
AN

During:
A0
A1
...
AN
B0

Where B0 is the union of all the Ai. Due to tombstones, mutations, etc., the
disk usage during this phase is at most 2x the original, and probably close
to 2x (unless you are nearly all tombstones, like me).

After:
B0

since Cassandra can then clean up the Ai. I'm not sure exactly when this
cleanup happens.

I'm not sure what state you are in above. It sounds like somewhere between
during and after.
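
If you want to check which state you're in, you can just look at the data
directory on a node (the path below is illustrative; adjust the keyspace/CF
to your layout):

ls /var/lib/cassandra/data/my_keyspace/my_cf/*-Data.db
# during a major compaction you'll see the old files plus the new one being
# written; once the old ones are cleaned up, only the merged file remains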

Will

On Thursday, May 8, 2014, Ruchir Jha ruchir@gmail.com wrote:

 I tried to do this; however, the doubling in disk space is not temporary
 as you state in your note. What am I missing?


 On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
 ober...@civicscience.com wrote:

 So, if I was impatient and just wanted to make this happen now, I could:

 1.) Change GCGraceSeconds of the CF to 0
 2.) run nodetool compact (*)
 3.) Change GCGraceSeconds of the CF back to 10 days

 Since I have ~900M tombstones, even if I miss a few due to impatience, I
 don't care *that* much, as I could re-run my cleanup tool against the now
 much smaller CF.
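
 Concretely, I'm thinking of something like this (keyspace/CF names are
 made up for illustration):

 cqlsh> ALTER TABLE my_keyspace.sessions WITH gc_grace_seconds = 0;
 $ nodetool compact my_keyspace sessions    # major compaction, run per node
 cqlsh> ALTER TABLE my_keyspace.sessions WITH gc_grace_seconds = 864000;  -- back to 10 days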

 (*) A long, long time ago I seem to recall reading advice about never
 running nodetool compact, but I can't remember why.  Is there any bad
 long-term consequence?  Short term there are several:
 -a heavy operation
 -temporary 2x disk space
 -one big SSTable afterwards
 But moving forward, everything is OK, right?  CommitLog/MemTable -> SSTables,
 minor compactions that merge SSTables, etc...  The only flaw I can think of
 is that it will take forever until the SSTable minor compactions build up
 enough to consider including the big SSTable in a compaction, making it
 likely I'll have to self-manage compactions.



 On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy mark.re...@boxever.com wrote:

 Correct, a tombstone will only be removed after the gc_grace period has
 elapsed. The default value is set to 10 days, which allows a great deal of
 time for consistency to be achieved prior to deletion. If you are
 operationally confident that you can achieve consistency via anti-entropy
 repairs within a shorter period, you can always reduce that 10 day interval.
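
 For example (keyspace/table names are illustrative):

 cqlsh> ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 86400;  -- 1 day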


 Mark


 On Fri, Apr 11, 2014 at 3:16 PM, William Oberman ober...@civicscience.com
  wrote:

 I'm seeing a lot of articles about a dependency between removing
 tombstones and GCGraceSeconds, which might be my problem (I just checked,
 and this CF has GCGraceSeconds of 10 days).


 On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli tbarbu...@gmail.com wrote:

 compaction should take care of it; for me it never worked, so I run
 nodetool compact on every node; that does it.


 2014-04-11 16:05 GMT+02:00 William Oberman ober...@civicscience.com:

 I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
 repair, or time (as in, just wait)?

 I had a CF that was more or less storing session information.  After some
 time, we decided that one piece of this information was pointless to track
 (it was 90%+ of the columns, and in 99% of those cases it was ALL the
 columns for a row).  I wrote a process to remove all of those columns
 (which, again, in the vast majority of cases had the effect of removing
 the whole row).

 This CF had ~1 billion rows, so I expect to be left with ~100M rows.
 After I did this mass delete, everything was the same size on disk (which
 I expected, knowing how tombstoning works).  It wasn't 100% clear to me
 what to poke to cause compactions to clear the tombstones.  First I tried
 nodetool cleanup on a candidate node, but afterwards the disk usage was
 the same.  Then I tried nodetool repair on that same node, but again, disk
 usage is still the same.  The CF has no snapshots.

 So, am I misunderstanding something?  Is there another operation to try?
 Do I have to just wait?  I've only done cleanup/re




-- 
Will Oberman
Civic Science, Inc.
6101 Penn Avenue, Fifth Floor
Pittsburgh, PA 15206
(M) 412-480-7835
(E) ober...@civicscience.com


Re: Avoiding email duplicates when registering users

2014-05-11 Thread Charlie Mason
If you are worried about the overhead of malicious bulk registration, you
could add some rate limiting to restrict sign-ups to X per hour from the
same IP. You could also use a CAPTCHA system to make bulk requests hard to
automate.

The other thing that works well is automating the cleanup. We do an
initial insert into the Users and UsersByEmail tables with a TTL of 12
hours. Only when the user completes the signup do we redo the insert
without a TTL, making the data permanent.
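
Roughly like this (table and column names here are made up; ours differ):

INSERT INTO users_by_email (email, user_id)
VALUES ('user@example.com', 123e4567-e89b-12d3-a456-426655440000)
USING TTL 43200;  -- 12 hours; the row silently expires if signup is abandoned

-- on signup completion, redo the insert without the TTL to make it permanent
INSERT INTO users_by_email (email, user_id)
VALUES ('user@example.com', 123e4567-e89b-12d3-a456-426655440000);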

That works as a security measure and cleans up failed account creations at
the next compaction. We also keep some counters so we can track the ratio
of failed to successful registrations.

Hope that helps,

Charlie M


On Wed, May 7, 2014 at 12:19 AM, Tyler Hobbs ty...@datastax.com wrote:


 On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin natx...@gmail.com wrote:


 When a user registers, the server generates a UUID and performs an INSERT
 ... IF NOT EXISTS into the email_to_UUID table. Immediately after, it
 performs a SELECT on the same table and checks whether the UUID read back
 is the same as the one we just generated. If it is, we are allowed to
 INSERT the data into the user table, knowing that no one else will be
 doing it.


 INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't
 need to SELECT afterwards.  If the row already exists, the query results
 will show that the insert was not applied, and the existing row will be
 returned.
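
 For example (schema is illustrative), a conditional insert that loses the
 race returns something like:

 cqlsh> INSERT INTO email_to_uuid (email, user_id)
    ... VALUES ('user@example.com', 123e4567-e89b-12d3-a456-426655440000)
    ... IF NOT EXISTS;

  [applied] | email            | user_id
 -----------+------------------+--------------------------------------
      False | user@example.com | <the UUID of the already-existing row>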


 --
 Tyler Hobbs
 DataStax http://datastax.com/



Question about READS in a multi DC environment.

2014-05-11 Thread Mark Farnan
I'm trying to understand READ load in Cassandra across a multi-datacenter
cluster (specifically, why reads seem to be hitting more than one DC), and
hope someone can help.

From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be
hitting all 3 datacenters, rather than just the one I'm connected to. I see
'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace,
which seems wrong.
I have tried every consistency level, same result. It is also the same from
my C# code via the DataStax driver (where I first noticed the issue).

Can someone please shed some light on what is occurring? Specifically, I
don't want a query on one DC going anywhere near the other 2 as a rule,
since in production these DCs will be across slower links.


Query:  (NOTE: while this uses a kairosdb table, I'm just playing with
queries against it, as it has 100k columns in this key for testing)

cqlsh:kairosdb consistency local_one
Consistency level set to LOCAL_ONE.

cqlsh:kairosdb select * from data_points where key = 
0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a 
limit 1000;

... some returned data rows listed here, which I've removed ...



Query Response Trace:

 activity                                                                   | timestamp    | source         | source_elapsed
----------------------------------------------------------------------------+--------------+----------------+----------------
 execute_cql3_query                                                         | 07:18:12,692 | 192.168.25.111 |              0
 Message received from /192.168.25.111                                      | 07:18:00,706 | 192.168.25.131 |             50
 Executing single-partition query on data_points                            | 07:18:00,707 | 192.168.25.131 |            760
 Acquiring sstable references                                               | 07:18:00,707 | 192.168.25.131 |            814
 Merging memtable tombstones                                                | 07:18:00,707 | 192.168.25.131 |            924
 Bloom filter allows skipping sstable 191                                   | 07:18:00,707 | 192.168.25.131 |           1050
 Bloom filter allows skipping sstable 190                                   | 07:18:00,707 | 192.168.25.131 |           1166
 Key cache hit for sstable 189                                              | 07:18:00,707 | 192.168.25.131 |           1275
 Seeking to partition beginning in data file                                | 07:18:00,707 | 192.168.25.131 |           1293
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones  | 07:18:00,708 | 192.168.25.131 |           2173
 Merging data from memtables and 1 sstables                                 | 07:18:00,708 | 192.168.25.131 |           2195
 Read 1001 live and 0 tombstoned cells                                      | 07:18:00,709 | 192.168.25.131 |           3259

[RELEASE] Achilles 3.0.3 released

2014-05-11 Thread DuyHai Doan
Hello all

 We are happy to announce the release of Achilles 3.0.3. Among the biggest
changes:

 - full support for distributed CAS (lightweight transaction) with
callbacks (http://goo.gl/cyyY4L)
 - upgrade to Cassandra 2.0.7 and Java Driver 2.0.1

 Link to the changelog: http://goo.gl/tKqpFT

  Regards

 Duy Hai DOAN


Re: Question about READS in a multi DC environment.

2014-05-11 Thread graham sanderson
You have a read_repair_chance of 1.0, which is probably why your query is
hitting all data centers.
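
You can check the current setting and dial it down with something like this
(table name taken from your trace; the 0.1 dclocal value below is just an
example, not a recommendation):

cqlsh> DESCRIBE TABLE kairosdb.data_points;  -- look at read_repair_chance in the output
cqlsh> ALTER TABLE kairosdb.data_points
   ... WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1;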

On May 11, 2014, at 3:44 PM, Mark Farnan devm...@petrolink.com wrote:

 I'm trying to understand READ load in Cassandra across a multi-datacenter
 cluster (specifically, why reads seem to be hitting more than one DC), and
 hope someone can help.

 From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be
 hitting all 3 datacenters, rather than just the one I'm connected to. I see
 'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the
 trace, which seems wrong.
 I have tried every consistency level, same result. It is also the same from
 my C# code via the DataStax driver (where I first noticed the issue).

 Can someone please shed some light on what is occurring? Specifically, I
 don't want a query on one DC going anywhere near the other 2 as a rule,
 since in production these DCs will be across slower links.


 Query:  (NOTE: while this uses a kairosdb table, I'm just playing with
 queries against it, as it has 100k columns in this key for testing)
 
 cqlsh:kairosdb consistency local_one
 Consistency level set to LOCAL_ONE.
 
 cqlsh:kairosdb select * from data_points where key = 
 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a
  limit 1000;
 
 ... some returned data rows listed here, which I've removed ...
 
 Query Response Trace:

  activity                                                                   | timestamp    | source         | source_elapsed
 ----------------------------------------------------------------------------+--------------+----------------+----------------
  execute_cql3_query                                                         | 07:18:12,692 | 192.168.25.111 |              0
  Message received from /192.168.25.111                                      | 07:18:00,706 | 192.168.25.131 |             50
  Executing single-partition query on data_points                            | 07:18:00,707 | 192.168.25.131 |            760
  Acquiring sstable references                                               | 07:18:00,707 | 192.168.25.131 |            814
  Merging memtable tombstones                                                | 07:18:00,707 | 192.168.25.131 |            924
  Bloom filter allows skipping sstable 191                                   | 07:18:00,707 | 192.168.25.131 |           1050
  Bloom filter allows skipping sstable 190                                   | 07:18:00,707 | 192.168.25.131 |           1166
  Key cache hit for sstable 189                                              | 07:18:00,707 | 192.168.25.131 |           1275
  Seeking to partition beginning in data file                                | 07:18:00,707 | 192.168.25.131 |           1293
  Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones  | 07:18:00,708 | 192.168.25.131 |           2173
  Merging data from memtables and 1 sstables                                 | 07:18:00,708 | 192.168.25.131 |           2195
  Read 1001 live and 0 tombstoned cells                                      | 07:18:00,709 | 192.168.25.131 |           3259
  Enqueuing response to /192.168.25.111                                      | 07:18:00,710 | 192.168.25.131 |           4006
  Sending message to /192.168.25.111                                         | 07:18:00,710 | 192.168.25.131 |           4210
  Parsing select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000; | 07:18:12,692 | 192.168.25.111 | 52
  Preparing statement                                                        | 07:18:12,692 | 192.168.25.111 |            257

Re: Cyclop - CQL web based editor has been released!

2014-05-11 Thread graham sanderson
Looks cool - giving it a try now. (Note, FYI: when building,
TestDataConverter.java line 46 assumes a specific time zone.)

On May 11, 2014, at 12:41 AM, Maciej Miklas mac.mik...@gmail.com wrote:

 Hi everybody,

 I am aware that this mailing list is meant for Cassandra users, but I've
 developed something that is strictly related to Cassandra, so I thought that
 it might be interesting for some of you.
 I already sent one email several months ago, but since then a lot of
 things have changed!

 Cyclop is a web-based CQL editor - you can deploy it in a web container and
 use its web interface to execute CQL queries or to import/export data.
 There is also a live deployment, so you can try it out immediately. Of course
 the whole thing is open source.

 Here is the project link containing all the details:
 https://github.com/maciejmiklas/cyclop

 Regards,
 Maciej





Fwd: Fw: Webinar - NoSQL Landscape and a Solution to Polyglot Persistence

2014-05-11 Thread Vivek Mishra
Check out this webinar for:

1) NoSQL landscape
2) Building a Kundera-powered app
3) Polyglot persistence!

-Vivek

  On Thursday, May 8, 2014 5:52 PM, Vivek Mishra vivek.mis...@impetus.co.in
wrote:


 *From:* Pankaj Bagzai
*Sent:* Wednesday, April 30, 2014 8:50 PM
*To:* df-all; Account Management; Asheesh Mangla; Gerard Das; Larry
Pearson; Anand Raman; Anand Venugopal; Presales-Support; Mike Harden; Ray
Cade
*Subject:* Webinar - NoSQL Landscape and a Solution to Polyglot Persistence

Please share with your network.

Best Regards,
Pankaj Bagzai





  Webinar

  NoSQL Landscape and a Solution to Polyglot Persistence

May 9, 2014 (9:30 am PT / 12:30 pm ET)
Duration: 45 mins

Hi Pankaj,

Is your organization planning to migrate to / acquire a NoSQL technology
but struggling to do so?

Does your team need to invest in evaluating many NoSQL options?

Polyglot use of NoSQL with / without RDBMS can further complicate the NoSQL
adoption process.

Before making a commitment, it is important to consider the business
opportunity and the technology needs that various NoSQL databases can
support. Technology selection is often governed by ease of use and the
APIs offered.

Join Impetus experts as they share a solution to these challenges, based
on their experience creating a widely adopted polyglot client/object-mapper
for NoSQL datastores and working through the NoSQL technology landscape
for several customers.

During this webinar you will learn about:
   • When and why you should consider NoSQL
   • Considerations when migrating to NoSQL, or a combination with RDBMS
   • NoSQL options and APIs available
   • A fast and low-cost solution to Polyglot Persistence



  *Register Here*
http://www.impetus.com/webinar?eventid=78

   Speakers:

*Vivek Mishra*
Lead Engineer, Big Data R&D
(Impetus Technologies)

*Chhavi Gangwal*
Lead Engineer, Big Data R&D
(Impetus Technologies)

*Larry Pearson*
VP of Marketing
(Impetus Technologies)
 Related webcasts:

   • Leveraging NoSQL to Implement Real-time Data Architectures:
     http://www.impetus.com/webinar?eventid=72
   • Resolving the Big Data ROI Dilemma:
     http://www.impetus.com/webinar?eventid=71
   • Real-time Predictive Analytics for Manufacturing:
     http://www.impetus.com/webinar?eventid=70









Impetus Technologies, Inc. - 720 University Avenue, Suite 130, Los Gatos,
CA 95032, USA

