[Cassandra] Running node tool cleanup

2013-06-20 Thread Emalayan Vairavanathan
Hi All, 1) What will happen if I run nodetool cleanup immediately after bringing a new node up (i.e. before the key migration process is completed)? Will it cause some race conditions? Or will it result in some part of the space never being reclaimed? 2) After adding a new machine,
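
The cleanup question turns on token ownership: after a new node joins, nodetool cleanup removes the data a node still holds for ranges it no longer owns. The sketch below is a simplified model that assumes one integer token per node and a replication factor of 1 (neither is stated in the thread); owner and keys_to_clean are hypothetical helpers, not Cassandra code.

```python
from bisect import bisect_left


def owner(ring_tokens, key_token):
    """Return the token of the node owning key_token in a simple ring (RF=1).

    A node with token T owns the range (previous_token, T]; key tokens
    larger than the biggest node token wrap around to the smallest one.
    """
    tokens = sorted(ring_tokens)
    i = bisect_left(tokens, key_token)  # first node token >= key_token
    return tokens[i % len(tokens)]


def keys_to_clean(node_token, ring_tokens, key_tokens):
    """Keys stored on node_token that the current ring assigns elsewhere."""
    return [k for k in key_tokens if owner(ring_tokens, k) != node_token]


# Node 100 owned (0, 100] in the ring [0, 100]; a new node then joins at 50.
print(keys_to_clean(100, [0, 50, 100], [10, 50, 90]))
```

In this model, keys 10 and 50 now belong to the new node at token 50, so they are what a cleanup on node 100 would drop.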

Re: Reduce Cassandra GC

2013-06-20 Thread Joel Samuelsson
12.3 GB data per node (only one node). 16GB RAM. In a virtual environment with the CPU specified as 8 cores, average CPU use is close to 0% (basically no load, around 12 requests/sec, mostly from OpsCenter). Average memory use is 4GB. Around 1GB heap used by Cassandra (out of 4GB). 2013/6/19

Get fragments of big files (videos)

2013-06-20 Thread Simon Majou
Hello, if I store a video in a column, how can I get a fragment of it without having to download it entirely? Is there a way to give an offset on a column? Do I have to fragment it over a lot of small fixed-size columns? Is there any disadvantage to doing so? For example fragment a 10GB file

Re: Get fragments of big files (videos)

2013-06-20 Thread Sachin Sinha
Fragment them in rows, that will help. On 20 June 2013 09:43, Simon Majou si...@majou.org wrote: Hello, If I store a video into a column, how can I get a fragment of it without having to download it entirely ? Is there a way to give an offset on a column ? Do I have to fragment it over a
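
Fragmenting a large file comes down to mapping a requested byte range onto fixed-size chunks, so that only the chunks covering the range need to be fetched. This is a minimal sketch under stated assumptions: each chunk i is assumed to live under a hypothetical key such as (video_id, i), and the in-memory dict stands in for the column family. The 100-byte chunk size in the example is only for illustration; at a chunk size of, say, 1 MiB (also an assumption), a 10 GiB file becomes 10,240 chunks.

```python
def chunk_span(offset, length, chunk_size):
    """Return (first_chunk, last_chunk, offset_in_first) for a byte range.

    Chunk i holds bytes [i * chunk_size, (i + 1) * chunk_size).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return first, last, offset - first * chunk_size


def split_into_chunks(data, chunk_size):
    """Split bytes into fixed-size chunks keyed by chunk index."""
    return {i // chunk_size: data[i:i + chunk_size]
            for i in range(0, len(data), chunk_size)}


def read_range(chunks, offset, length, chunk_size):
    """Fetch only the chunks covering the range, then trim to the exact bytes."""
    first, last, skip = chunk_span(offset, length, chunk_size)
    joined = b"".join(chunks[i] for i in range(first, last + 1))
    return joined[skip:skip + length]


data = bytes(range(256)) * 4                # 1024 bytes of sample "video"
chunks = split_into_chunks(data, 100)       # chunks 0..10
fragment = read_range(chunks, 250, 300, 100)  # touches chunks 2..5 only
```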

Re: Get fragments of big files (videos)

2013-06-20 Thread Serge Fonville
Also, after a quick Google search: http://wiki.apache.org/cassandra/CassandraLimitations states that values cannot exceed 2GB; it also answers your offset question. HTH Kind regards, Serge Fonville http://www.sergefonville.nl Convince Microsoft! They need to add TRUNCATE PARTITION in

Re: Get fragments of big files (videos)

2013-06-20 Thread Simon Majou
Thanks Serge Simon On Thu, Jun 20, 2013 at 10:48 AM, Serge Fonville serge.fonvi...@gmail.com wrote: Also, after a quick Google. http://wiki.apache.org/cassandra/CassandraLimitations states values cannot exceed 2GB, it also answers you offset question HTH Kind regards/met vriendelijke

Re: SQL Injection C* (via CQL Thrift)

2013-06-20 Thread aaron morton
As for the thrift side (i.e. using Hector or Astyanax), anyone have a crafty way to inject something? The only thing I've ever heard of coming close was a thrift bug that allowed a malformed request to crash the server. But that was a while ago
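
On the CQL side, injection only becomes possible when client code pastes raw values into statement text. As a hedged illustration: in CQL a single quote inside a string constant is escaped by doubling it, so the sketch below contrasts naive concatenation with that escaping. The users table and both helper functions are hypothetical, and where the client supports them, bound parameters on a prepared statement remain the safer choice over hand-escaping.

```python
def cql_string_literal(value):
    """Quote a str as a CQL string constant by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def naive_select(user_id):
    # Vulnerable: the value is pasted straight into the statement text.
    return "SELECT * FROM users WHERE id = '" + user_id + "';"


def escaped_select(user_id):
    # The whole value stays inside one string constant.
    return "SELECT * FROM users WHERE id = " + cql_string_literal(user_id) + ";"


payload = "x'; DROP TABLE users; --"
print(naive_select(payload))    # the quote closes early and a second statement follows
print(escaped_select(payload))  # the payload remains a single literal
```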

Re: [Cassandra] Expanding a Cassandra cluster

2013-06-20 Thread aaron morton
1) Is there any implication in running nodetool repair immediately after bringing a new node up (before the key migration process is completed)? Will it cause some race conditions? Or will it result in some part of the space never being reclaimed? Repair will only be concerned with data

Re: Compaction not running

2013-06-20 Thread aaron morton
nodetool compactionstats gives pending tasks: 13120 If there are no errors in the log, I would say this is a bug. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 19/06/2013, at 11:41 AM, Franc Carter

Re: Compaction not running

2013-06-20 Thread Franc Carter
On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.com wrote: nodetool compactionstats gives pending tasks: 13120 If there are no errors in the log, I would say this is a bug. This happened after the node ran out of file descriptors, so an edge case wouldn't surprise me.
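
One way to tell a stuck compaction queue from a slow one is to sample the pending count a few times and check whether it ever drops. The sketch below assumes the text of nodetool compactionstats has already been captured (for example via subprocess); only the "pending tasks:" line format is taken from this thread, and both helper names are hypothetical.

```python
import re

PENDING_RE = re.compile(r"pending tasks:\s*(\d+)")


def pending_compactions(output):
    """Extract the pending task count from compactionstats output text."""
    match = PENDING_RE.search(output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def never_decreases(counts):
    """True if successive samples never go down, a hint that compaction is stuck."""
    return all(later >= earlier for earlier, later in zip(counts, counts[1:]))


samples = [pending_compactions(text) for text in (
    "pending tasks: 13120\n",
    "pending tasks: 13120\n",
    "pending tasks: 13125\n",
)]
```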

Re: Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-20 Thread aaron morton
I once had something like this; looking at your logs, I do not think it's the same thing, but here is a post on it http://thelastpickle.com/2011/12/15/Anatomy-of-a-Cassandra-Partition/ It's a little different in 1.2 but the GossipDigestAckVerbHandler (and ACK2) should be calling

Re: Dropped mutation messages

2013-06-20 Thread aaron morton
What should be the path to investigate this? Dropped messages are a symptom of other problems. Look for the GCInspector logging lots of ParNew collections, the IO system being overloaded, or large (1000s) read or write batches from the client.
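
Checking for "lots of ParNew" can be done by pulling the pause times out of the system log. This sketch assumes GCInspector lines of the form "GC for ParNew: 245 ms for 1 collections, ..." (an assumed format for this era of Cassandra; check it against your own log), and gc_pauses is a hypothetical helper.

```python
import re

# Assumed log shape: "... GCInspector.java (line N) GC for ParNew: 245 ms for 1 collections, ..."
GC_RE = re.compile(r"GCInspector.*GC for (\w+): (\d+) ms for (\d+) collections")


def gc_pauses(lines, collector="ParNew"):
    """Yield reported pause times (ms) for one collector from log lines."""
    for line in lines:
        match = GC_RE.search(line)
        if match and match.group(1) == collector:
            yield int(match.group(2))


sample_log = [
    " INFO [ScheduledTasks:1] 2013-06-20 10:00:01,000 GCInspector.java (line 119) "
    "GC for ParNew: 245 ms for 1 collections, 1000 used; max is 4000",
    " INFO [ScheduledTasks:1] 2013-06-20 10:00:05,000 GCInspector.java (line 119) "
    "GC for ConcurrentMarkSweep: 900 ms for 1 collections, 1000 used; max is 4000",
    " INFO [FlushWriter:1] 2013-06-20 10:00:06,000 Memtable.java (line 1) unrelated",
    " INFO [ScheduledTasks:1] 2013-06-20 10:00:09,000 GCInspector.java (line 119) "
    "GC for ParNew: 310 ms for 1 collections, 1000 used; max is 4000",
]
pauses = list(gc_pauses(sample_log))
```

Many such lines per minute, or long individual pauses, would point at GC pressure as the cause of the dropped messages.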

RE: Data not fully replicated with 2 nodes and replication factor 2

2013-06-20 Thread James Lee
Rob, Wei, thank you both for your responses. From what Rob says below, my test is a valid one. I've run some additional tests and observed the following: -- I mentioned before that some of the initial writes failed at first and then succeeded when the test tool retried them. I've checked that

Re: opscentrer is spying

2013-06-20 Thread Radim Kolar
OpsCenter collects anonymous usage data and reports it back to DataStax. For example, number of nodes, keyspaces, column families, etc. Stat reporting isn't required to run OpsCenter however. To turn this feature off, see the docs here (stat_reporter): You never informed users that installing

Re: opscentrer is spying

2013-06-20 Thread Peter Lin
I use Cassandra, but I don't use OpsCenter. Seems like it would be in everyone's best interest to clearly define what data OpsCenter collects today, what OpsCenter won't collect, and a promise to users that none of the data will be used without first getting a customer's approval. I can understand the

Confirm with cqlsh of Cassandra-1.2.5, the behavior of the export/import

2013-06-20 Thread hiroshi.kise.rk
Dear everyone, I'm Hiroshi Kise. I would like to confirm the behavior of data export/import with cqlsh in Cassandra 1.2.5. When using cqlsh's COPY on data that includes “{” and “[” (i.e. CollectionType), I think data integrity is compromised in the export/import process. Is that right?
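
One way to narrow the question down is to check whether the CSV layer alone preserves collection literals. The sketch below uses Python's standard csv module, which quotes any field containing a comma, so values like {'a', 'b'} survive a write and read cycle. This only illustrates correct CSV quoting under that assumption; it does not show how cqlsh 1.2.5's COPY actually formats collections, which is exactly what the question asks.

```python
import csv
import io


def roundtrip(rows):
    """Write rows as CSV and read them back, using standard minimal quoting."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


rows = [
    ["k1", "{'a', 'b'}", "[1, 2, 3]"],  # set and list literals contain commas
    ["k2", "{'x': 'y, z'}", "[]"],      # map literal with a comma inside a value
]
restored = roundtrip(rows)
```

If a COPY TO / COPY FROM cycle does not give back the same values, the problem would lie in how the collection text is produced or parsed, not in the CSV quoting itself.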

Re: opscentrer is spying

2013-06-20 Thread Nick Bailey
Thanks everyone. We always appreciate constructive criticism. Regarding what OpsCenter collects, we completely agree it should be documented more clearly. You can expect to see an update to the documentation later today. I will update this thread once that goes live. Regarding notifying the user

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-20 Thread Wei Zhu
I don't think you can fully trust hinted handoff; it's more of a best-effort delivery with no guarantee. Even if the hints were guaranteed to be delivered, there would be a delay, which is supposed to be part of the eventual consistency paradigm. If you want to enforce real consistency,
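
The usual way to enforce consistency without relying on hints is to pick read and write consistency levels whose replica counts overlap: a read is guaranteed to see the latest acknowledged write when R + W > RF, and QUORUM is floor(RF / 2) + 1 replicas. The helpers below are a minimal sketch of that arithmetic, not a Cassandra API.

```python
def quorum(rf):
    """Number of replicas that make up a QUORUM for a replication factor."""
    return rf // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, rf):
    """Read and write replica sets must overlap: R + W > RF."""
    return read_replicas + write_replicas > rf


RF = 2  # the two-node, replication-factor-2 setup discussed in this thread
```

Note that with RF = 2, quorum(2) is 2, so QUORUM needs both replicas, just like ALL, and a single node being down makes QUORUM reads and writes fail.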

Re: SQL Injection C* (via CQL Thrift)

2013-06-20 Thread Robert Coli
On Thu, Jun 20, 2013 at 2:15 AM, aaron morton aa...@thelastpickle.com wrote: As for the thrift side (i.e. using Hector or Astyanax), anyone have a crafty way to inject something? The only thing I've ever heard of coming close was a thrift bug that allowed a malformed request to crash the

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-20 Thread Robert Coli
On Wed, Jun 19, 2013 at 4:35 PM, Wei Zhu wz1...@yahoo.com wrote: I was not aware of that. So we can avoid repair if there is no hardware failure...I found a blog: http://www.datastax.com/dev/blog/modern-hinted-handoff Well... yes but no. First, I do not believe that hardware failure is

Re: opscentrer is spying

2013-06-20 Thread Alain RODRIGUEZ
Good, fast and appreciated reaction from Datastax. Also thanks to Radim for the warning. Alain, Opscenter-free user. 2013/6/20 Nick Bailey n...@datastax.com Thanks everyone. We always appreciate constructive criticism. Regarding what OpsCenter collects, we completely agree it should be

[Cassandra] Replacing a cassandra node

2013-06-20 Thread Emalayan Vairavanathan
Hi All, I have a question. In the case where we replace a Cassandra node (call it node A) with another one that has the exact same IP (i.e. during a node failure), what exactly should we do? Currently I understand that we should at least run nodetool repair. I noticed that the cassandra system

Re: [Cassandra] Replacing a cassandra node

2013-06-20 Thread Robert Coli
On Thu, Jun 20, 2013 at 10:40 AM, Emalayan Vairavanathan svemala...@yahoo.com wrote: In the case where replace a cassandra node (call it node A) with another one that has the exact same IP (ie. during a node failure), what exactly should we do? Currently I understand that we should at least

Re: SQL Injection C* (via CQL Thrift)

2013-06-20 Thread Edward Capriolo
My first interaction with Cassandra: ../nodeprobe -p 9160 ... Hmm, I can't seem to reach it :) Oh, it's no longer running... You've come a long way, baby. On Thu, Jun 20, 2013 at 12:59 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jun 20, 2013 at 2:15 AM, aaron morton aa...@thelastpickle.com

block size

2013-06-20 Thread Kanwar Sangha
Hi, what is the block size for Cassandra? Is it taken from the OS defaults?

Re: block size

2013-06-20 Thread Shahab Yunus
Have you seen this? http://www.datastax.com/dev/blog/cassandra-file-system-design Regards, Shahab On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha kan...@mavenir.com wrote: Hi – What is the block size for Cassandra ? is it taken from the OS defaults ?

RE: block size

2013-06-20 Thread Kanwar Sangha
Yes. Isn't that specific to Hadoop with CFS? I want to know: if I have 500KB of data in a column, how many IOPS are needed to read it (assuming we have the key cache enabled)? From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: 20 June 2013 14:32 To: user@cassandra.apache.org

Re: block size

2013-06-20 Thread Shahab Yunus
Ok. The closest that I can find is this (Aaron Morton's great blog): http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ I would also like to know the answer, as I haven't come across 'block size' as a core concept either (or as a concept to be considered while developing)

Re: opscentrer is spying

2013-06-20 Thread Nick Bailey
As promised, updated docs detailing the data collected by OpsCenter are now live. http://www.datastax.com/docs/opscenter/configure/configure_opscenter_adv#stat-reporter-interval -Nick On Thu, Jun 20, 2013 at 12:34 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Good, fast and appreciated

Re: opscentrer is spying

2013-06-20 Thread Dave Finnegan
Nick, Just a nit, but is it 'bdp_version' or 'dbp_versions'? - 'bdp_version': A list of the different DataStax Enterprise versions in the cluster. Also, is this report available from OpsCenter? Seems like it would be nice to display a message to the user about what we send, and then

Re: opscentrer is spying

2013-06-20 Thread Nick Bailey
It is 'bdp_version' even though it is a list. That is a bit confusing. Thanks, I've added that additional feedback to our tracker. -Nick On Thu, Jun 20, 2013 at 3:47 PM, Dave Finnegan d...@datastax.com wrote: Nick, Just a nit, but is it 'bdp_version' or 'dbp_versions'? -

net flix priam - use outside of amazon aws

2013-06-20 Thread Marcelo Elias Del Valle
Hello, I currently use Netflix Priam to create backups of sstables to Amazon S3. I like how it works, as it can create continuous backups as well as snapshots, but I personally don't like to be tied to a vendor, in this case Amazon. Does anybody know if there is some similar tool to do

Re: net flix priam - use outside of amazon aws

2013-06-20 Thread Robert Coli
On Thu, Jun 20, 2013 at 2:48 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: I Currently use Netflix Priam to create backups of sstables to Amazon S3. I like how it works, as it can create continuous backups as well as snapshots, but I personally don`t like to be tied to a vendor, in

Re: Reduce Cassandra GC

2013-06-20 Thread Mohit Anchlia
Can you paste the output of cfstats and cfhistograms? Also, try to get a histogram at two different points: 1) when it looks good and 2) when it gets slow. http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html Look for jmap -histo On Thu, Jun 20, 2013 at 12:27 AM, Joel Samuelsson
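
Comparing the two jmap -histo snapshots is easier with a small diff. This sketch assumes the usual histogram row shape "num: #instances #bytes class name" and ranks classes by growth in shallow bytes (the histogram counts each object's own size, not what it retains). parse_histo and top_growth are hypothetical helpers.

```python
import re

# Assumed row shape: "   1:        123456       7890123  [B"
ROW_RE = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Map class name -> (instances, bytes) from `jmap -histo` output."""
    result = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:  # header and "Total" lines do not match
            result[match.group(3)] = (int(match.group(1)), int(match.group(2)))
    return result


def top_growth(before, after, n=10):
    """Classes whose shallow byte totals grew the most between two histograms."""
    growth = []
    for name, (_, after_bytes) in after.items():
        before_bytes = before.get(name, (0, 0))[1]
        growth.append((after_bytes - before_bytes, name))
    growth.sort(reverse=True)
    return [(name, delta) for delta, name in growth[:n] if delta > 0]


good = parse_histo(" num     #instances         #bytes  class name\n"
                   "   1:           100           8000  [B\n"
                   "   2:            50           2000  java.lang.String\n")
slow = parse_histo("   1:           300          24000  [B\n"
                   "   2:            50           2000  java.lang.String\n"
                   "   3:            10            640  org.example.Foo\n")
```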

Re: opscentrer is spying

2013-06-20 Thread Edward Capriolo
Really? If you care about security, you deny all outbound traffic anyway, and phone-home software does not work. That which is not specifically allowed is denied. That is how we realized Quartz phones home. Understandably, most people don't like phone-home features, but I do see that knowing

Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Jason Tang
Hi, we are considering using Cassandra in a virtualized environment. I wonder whether Cassandra uses unicast, broadcast, or multicast for node discovery and communication. From the code, I find the broadcast address is used for the heartbeat in Gossiper.java, but I don't know how it actually works when

Re: Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Andrey Ilinykh
Cassandra works very well in EC2 environment. EC2 doesn't support broadcast/multicast. So, you should be fine. Thank you, Andrey On Thu, Jun 20, 2013 at 7:22 PM, Jason Tang ares.t...@gmail.com wrote: Hi We are considering using Cassandra in virtualization environment. I wonder is
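
Gossip works over ordinary point-to-point (unicast) messages: each round, a node contacts a few individually chosen peers rather than broadcasting. The function below is a simplified model loosely based on the Gossiper's behavior (a random live peer, occasionally an unreachable one, and a seed if none was contacted). It is not Cassandra's exact selection logic, and the function and addresses are hypothetical.

```python
import random


def gossip_targets(live, unreachable, seeds, rng):
    """Pick unicast gossip targets for one round (simplified model).

    live, unreachable and seeds are sets of peer addresses; rng is a
    random.Random so that runs can be reproduced.
    """
    targets = []
    if live:
        targets.append(rng.choice(sorted(live)))
    # Occasionally probe a down peer; more likely when many peers are down.
    if unreachable and rng.random() < len(unreachable) / (len(live) + 1):
        targets.append(rng.choice(sorted(unreachable)))
    # Make sure some seed hears from us if none was picked above.
    if seeds and not (set(targets) & seeds):
        targets.append(rng.choice(sorted(seeds)))
    return targets


rng = random.Random(42)
live_peers = {"10.0.0.2", "10.0.0.3"}
seed_peers = {"10.0.0.1"}
round_targets = gossip_targets(live_peers, set(), seed_peers, rng)
```

Because every message in this model goes to one specific address, nothing depends on broadcast or multicast support in the network.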