How long are expired values actually returned?
Hi, I'm using the TTL feature for my application. In my tests, when using a TTL of 5, the inserted rows are still returned after 7 seconds, and even after 70 seconds. Is this normal, or am I doing something wrong? Kind regards, Sebastian
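For reference, a minimal way to reproduce and inspect TTL behaviour in cqlsh (the demo.ttl_test table below is hypothetical, not from the original post):

```cql
-- Hypothetical table for reproducing the report: insert a row with a 5-second TTL.
CREATE TABLE IF NOT EXISTS demo.ttl_test (id int PRIMARY KEY, val text);
INSERT INTO demo.ttl_test (id, val) VALUES (1, 'expires soon') USING TTL 5;

-- TTL(val) shows the seconds remaining; once it elapses, the row should no
-- longer be returned by subsequent SELECTs.
SELECT id, val, TTL(val) FROM demo.ttl_test WHERE id = 1;
```

In a correctly configured cluster, the expired cell is filtered out at read time once the TTL elapses, so a row still visible after 70 seconds would suggest something else (such as clock settings or write timestamps) is worth checking.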
Re: Cyclop - CQL web based editor has been released!
Really nice initiative. Thank you Maciej On Sun, May 11, 2014 at 7:41 AM, Maciej Miklas mac.mik...@gmail.com wrote: Hi everybody, I am aware that this mailing list is meant for Cassandra users, but I've developed something that is strictly related to Cassandra, so I thought that it might be interesting for some of you. I already sent one email several months ago, but since then a lot of things have changed! Cyclop is a web based CQL editor - you can deploy it in a web container and use its web interface to execute CQL queries or to import/export data. There is also a live deployment, so you can try it out immediately. Of course the whole thing is open source. Here is the project link containing all details: https://github.com/maciejmiklas/cyclop Regards, Maciej
Re: clearing tombstones?
Not an expert, just a user of cassandra. For me, before compaction a CF was a set of files (I forget the official naming system, so I'll make up my own): A0, A1, ..., AN. During compaction: A0, A1, ..., AN, B0, where B0 is the union of the Ai. Due to tombstones, mutations, etc., B0 is at most 2x the original size, but probably close to 2x (unless you are all tombstones, like me). After compaction: just B0, since cassandra can clean up the Ai. Not sure exactly when that cleanup happens. Not sure what state you are in above - sounds like between during and after. Will On Thursday, May 8, 2014, Ruchir Jha ruchir@gmail.com wrote: I tried to do this, however the doubling in disk space is not temporary as you state in your note. What am I missing? On Fri, Apr 11, 2014 at 10:44 AM, William Oberman ober...@civicscience.com wrote: So, if I was impatient and just wanted to make this happen now, I could: 1.) Change GCGraceSeconds of the CF to 0 2.) run nodetool compact (*) 3.) Change GCGraceSeconds of the CF back to 10 days Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much, as I could re-run my clean-up tool against the now much smaller CF. (*) A long long time ago I seem to recall reading advice about never running nodetool compact, but I can't remember why. Is there any bad long term consequence? Short term there are several: -a heavy operation -temporary 2x disk space -one big SSTable afterwards But moving forward, everything is ok, right? CommitLog/MemTable->SSTables, minor compactions that merge SSTables, etc... The only flaw I can think of is that it will take forever until the minor compactions build up enough SSTables to consider including the big SSTable in a compaction, making it likely I'll have to self-manage compactions. On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy mark.re...@boxever.com wrote: Correct, a tombstone will only be removed after gc_grace period has elapsed.
The default value is set to 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. If you are operationally confident that you can achieve consistency via anti-entropy repairs within a shorter period, you can always reduce that 10 day interval. Mark On Fri, Apr 11, 2014 at 3:16 PM, William Oberman ober...@civicscience.com wrote: I'm seeing a lot of articles about a dependency between removing tombstones and GCGraceSeconds, which might be my problem (I just checked, and this CF has GCGraceSeconds of 10 days). On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli tbarbu...@gmail.com wrote: compaction should take care of it; for me it never worked, so I run nodetool compact on every node; that does it. 2014-04-11 16:05 GMT+02:00 William Oberman ober...@civicscience.com: I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool repair, or time (as in just wait)? I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (it was 90%+ of the columns, and in 99% of those cases was ALL columns for a row). I wrote a process to remove all of those columns (which, again, in a vast majority of cases had the effect of removing the whole row). This CF had ~1 billion rows, so I expect to be left with ~100M rows. After I did this mass delete, everything was the same size on disk (which I expected, knowing how tombstoning works). It wasn't 100% clear to me what to poke to cause compactions to clear the tombstones. First I tried nodetool cleanup on a candidate node, but afterwards the disk usage was the same. Then I tried nodetool repair on that same node, but again disk usage is still the same. The CF has no snapshots. So, am I misunderstanding something? Is there another operation to try? Do I have to just wait? I've only done cleanup/re -- Will Oberman Civic Science, Inc.
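The impatient clean-up outlined above (drop gc_grace, major-compact, restore) can be sketched as follows; the myks.sessions keyspace/table name is a placeholder, not from the thread:

```cql
-- 1) Temporarily allow tombstones to be purged immediately.
ALTER TABLE myks.sessions WITH gc_grace_seconds = 0;

-- 2) From a shell on each node, force a major compaction of that table:
--      nodetool compact myks sessions

-- 3) Restore the default 10-day grace period (864000 seconds).
ALTER TABLE myks.sessions WITH gc_grace_seconds = 864000;
```

As the thread notes, gc_grace_seconds = 0 is only safe when all replicas are up and repairs are current; otherwise a node that missed the delete can resurrect the data after the tombstones are purged.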
6101 Penn Avenue, Fifth Floor Pittsburgh, PA 15206 (M) 412-480-7835 (E) ober...@civicscience.com
Re: Avoiding email duplicates when registering users
If you are worried about the overhead of malicious bulk registration, you could add some rate limiting to restrict sign-ups to X per hour from the same IP. You could also use a CAPTCHA system to make automated bulk requests harder. The other thing that works well is automating the clean-up: we do an initial insert into the Users and UsersByEmail tables with a TTL of 12 hours, and only when they complete the signup do we redo the insert without a TTL, making the data permanent. That works as a security measure and cleans up failed account creations at the next compaction. We also have some counters so we can keep track of failed/successful registration ratios. Hope that helps, Charlie M On Wed, May 7, 2014 at 12:19 AM, Tyler Hobbs ty...@datastax.com wrote: On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin natx...@gmail.com wrote: When a user registers, the server generates a UUID and performs an INSERT ... IF NOT EXISTS into the email_to_UUID table. Immediately after, it performs a SELECT from the same table and checks whether the UUID read back is the same as the one we just generated. If it is, we are allowed to INSERT the data into the user table, knowing that no other process will be doing it. INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't need to SELECT afterwards. If the row does exist, the query results will show that the insert was not applied, and the existing row will be returned. -- Tyler Hobbs DataStax http://datastax.com/
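Tyler's point and Charlie's TTL trick combine into a single statement; the table and column names below are illustrative, not from the original schema:

```cql
-- Reserve the email atomically; the 12-hour TTL (43200 s) lets abandoned
-- sign-ups expire and be cleaned up at the next compaction.
INSERT INTO users_by_email (email, user_id)
VALUES ('new.user@example.com', uuid())
IF NOT EXISTS
USING TTL 43200;
```

The result set includes an [applied] column: true means the reservation succeeded, false means the row already existed and its current values are returned, so no follow-up SELECT is needed. Once the user confirms, re-inserting the same row without a TTL makes it permanent, as Charlie describes.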
Question about READS in a multi DC environment.
I'm trying to understand READ load in Cassandra across a multi-datacenter cluster (specifically, why it seems to be hitting more than one DC) and hope someone can help. From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be hitting all 3 datacenters, rather than just the one I'm connected to. I see 'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace, which seems wrong. I have tried every consistency level, same result. The same happens from my C# code via the DataStax driver (where I first noticed the issue). Can someone please shed some light on what is occurring? Specifically, I don't want a query on one DC going anywhere near the other 2 as a rule, as in production these DCs will be across slower links. Query: (NOTE: whilst this uses a kairosdb table, I'm just playing with queries against it as it has 100k columns in this key for testing.)

cqlsh:kairosdb> consistency local_one
Consistency level set to LOCAL_ONE.
cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000;
... Some return data rows listed here which I've removed

Query Response Trace:

 activity                                                                  | timestamp    | source         | source_elapsed
---------------------------------------------------------------------------+--------------+----------------+----------------
 execute_cql3_query                                                        | 07:18:12,692 | 192.168.25.111 |              0
 Message received from /192.168.25.111                                     | 07:18:00,706 | 192.168.25.131 |             50
 Executing single-partition query on data_points                           | 07:18:00,707 | 192.168.25.131 |            760
 Acquiring sstable references                                              | 07:18:00,707 | 192.168.25.131 |            814
 Merging memtable tombstones                                               | 07:18:00,707 | 192.168.25.131 |            924
 Bloom filter allows skipping sstable 191                                  | 07:18:00,707 | 192.168.25.131 |           1050
 Bloom filter allows skipping sstable 190                                  | 07:18:00,707 | 192.168.25.131 |           1166
 Key cache hit for sstable 189                                             | 07:18:00,707 | 192.168.25.131 |           1275
 Seeking to partition beginning in data file                               | 07:18:00,707 | 192.168.25.131 |           1293
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 |           2173
 Merging data from memtables and 1 sstables                                | 07:18:00,708 | 192.168.25.131 |           2195
 Read 1001 live and 0 tombstoned cells |
[RELEASE] Achilles 3.0.3 released
Hello all, We are happy to announce the release of Achilles 3.0.3. Among the biggest changes: - full support for distributed CAS (lightweight transactions) with callbacks (http://goo.gl/cyyY4L) - upgrade to Cassandra 2.0.7 and Java Driver 2.0.1 Link to the changelog: http://goo.gl/tKqpFT Regards Duy Hai DOAN
Re: Question about READS in a multi DC environment.
You have a read_repair_chance of 1.0, which is probably why your query is hitting all data centers. On May 11, 2014, at 3:44 PM, Mark Farnan devm...@petrolink.com wrote: I'm trying to understand READ load in Cassandra across a multi-datacenter cluster (specifically, why it seems to be hitting more than one DC) and hope someone can help. From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be hitting all 3 datacenters, rather than just the one I'm connected to. I see 'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace, which seems wrong. I have tried every consistency level, same result. The same happens from my C# code via the DataStax driver (where I first noticed the issue). Can someone please shed some light on what is occurring? Specifically, I don't want a query on one DC going anywhere near the other 2 as a rule, as in production these DCs will be across slower links. Query: (NOTE: whilst this uses a kairosdb table, I'm just playing with queries against it as it has 100k columns in this key for testing.)

cqlsh:kairosdb> consistency local_one
Consistency level set to LOCAL_ONE.
cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000;
... Some return data rows listed here which I've removed

Query Response Trace:

 activity                                                                  | timestamp    | source         | source_elapsed
---------------------------------------------------------------------------+--------------+----------------+----------------
 execute_cql3_query                                                        | 07:18:12,692 | 192.168.25.111 |              0
 Message received from /192.168.25.111                                     | 07:18:00,706 | 192.168.25.131 |             50
 Executing single-partition query on data_points                           | 07:18:00,707 | 192.168.25.131 |            760
 Acquiring sstable references                                              | 07:18:00,707 | 192.168.25.131 |            814
 Merging memtable tombstones                                               | 07:18:00,707 | 192.168.25.131 |            924
 Bloom filter allows skipping sstable 191                                  | 07:18:00,707 | 192.168.25.131 |           1050
 Bloom filter allows skipping sstable 190                                  | 07:18:00,707 | 192.168.25.131 |           1166
 Key cache hit for sstable 189                                             | 07:18:00,707 | 192.168.25.131 |           1275
 Seeking to partition beginning in data file                               | 07:18:00,707 | 192.168.25.131 |           1293
 Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 |           2173
 Merging data from memtables and 1 sstables                                | 07:18:00,708 | 192.168.25.131 |           2195
 Read 1001 live and 0 tombstoned cells                                     | 07:18:00,709 | 192.168.25.131 |           3259
 Enqueuing response to /192.168.25.111                                     | 07:18:00,710 | 192.168.25.131 |           4006
 Sending message to /192.168.25.111                                        | 07:18:00,710 | 192.168.25.131 |           4210
 Parsing select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000; | 07:18:12,692 | 192.168.25.111 | 52
 Preparing statement                                                       | 07:18:12,692 | 192.168.25.111 |            257
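Following up on that diagnosis: a read_repair_chance of 1.0 makes every read trigger a global (cross-DC) read repair regardless of consistency level. A possible fix, assuming the Cassandra 2.0-era table options implied by the trace:

```cql
-- Disable global read repair so LOCAL_ONE reads stay in the local DC;
-- optionally keep a small probability of repair within the local DC only.
ALTER TABLE kairosdb.data_points
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1;
```

With global read repair disabled, background consistency across DCs then relies on hinted handoff and regular anti-entropy repairs.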
Re: Cyclop - CQL web based editor has been released!
Looks cool - giving it a try now. (FYI: when building, TestDataConverter.java line 46 assumes a specific time zone.) On May 11, 2014, at 12:41 AM, Maciej Miklas mac.mik...@gmail.com wrote: Hi everybody, I am aware that this mailing list is meant for Cassandra users, but I've developed something that is strictly related to Cassandra, so I thought that it might be interesting for some of you. I already sent one email several months ago, but since then a lot of things have changed! Cyclop is a web based CQL editor - you can deploy it in a web container and use its web interface to execute CQL queries or to import/export data. There is also a live deployment, so you can try it out immediately. Of course the whole thing is open source. Here is the project link containing all details: https://github.com/maciejmiklas/cyclop Regards, Maciej
Fwd: Fw: Webinar - NoSQL Landscape and a Solution to Polyglot Persistence
Check out this webinar for 1) the NoSQL landscape, 2) building a Kundera powered app, and 3) polyglot persistence! -Vivek On Thursday, May 8, 2014 5:52 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: *From:* Pankaj Bagzai *Sent:* Wednesday, April 30, 2014 8:50 PM *To:* df-all; Account Management; Asheesh Mangla; Gerard Das; Larry Pearson; Anand Raman; Anand Venugopal; Presales-Support; Mike Harden; Ray Cade *Subject:* Webinar - NoSQL Landscape and a Solution to Polyglot Persistence Please share with your network. Best Regards, Pankaj Bagzai Webinar: NoSQL Landscape and a Solution to Polyglot Persistence May 9, 2014 (9:30 am PT / 12:30 pm ET) Duration: 45 mins http://www.impetus.com/webinar?eventid=78 Hi Pankaj, Is your organization planning to migrate to / acquire a NoSQL technology but struggling to do so? Does your team need to invest in evaluating many NoSQL options? Polyglot use of NoSQL with / without an RDBMS can further complicate the NoSQL adoption process. Before making a commitment, it is important to consider the business opportunity and the technology need that various NoSQL databases can support. Technology selection is often governed by ease of use and the APIs offered. Join Impetus experts as they share a solution to these challenges, based on experience creating a widely adopted polyglot client/object-mapper for NoSQL datastores and working through the NoSQL technology landscape for several customers.
During this webinar you will learn about: • When and why you should consider NoSQL • Considerations when migrating to NoSQL or a combination with an RDBMS • NoSQL options and APIs available • A fast and low cost solution to polyglot persistence Register here: http://www.impetus.com/webinar?eventid=78 Speakers: *Vivek Mishra* Lead Engineer, Big Data R&D (Impetus Technologies) *Chhavi Gangwal* Lead Engineer, Big Data R&D (Impetus Technologies) *Larry Pearson* VP of Marketing (Impetus Technologies) Related webcasts: • Leveraging NoSQL to Implement Real-time Data Architectures http://www.impetus.com/webinar?eventid=72 • Resolving the Big Data ROI Dilemma http://www.impetus.com/webinar?eventid=71 • Real-time Predictive Analytics for Manufacturing http://www.impetus.com/webinar?eventid=70 Impetus Technologies, Inc. - 720 University Avenue, Suite 130, Los Gatos, CA 95032, USA -- NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.