Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Rolo
You can check if the snapshot exists in the snapshot folder. Repairs stream SSTables over, which can temporarily increase disk space. But I think Carlos Alonso might be correct. Running compactions might be the issue. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Alonso
I'd also have a look at possible running compactions. If you have big column families with STCS then large compactions may be happening. Check it with nodetool compactionstats. Carlos Alonso | Software Engineer | @calonso On 13 January 2016 at 05:22, Kevin O'Connor

Re: Node stuck when joining a Cassandra 2.2.0 cluster

2016-01-13 Thread Carlos Alonso
Hi Robert. I'm thinking of upgrading hardware in place. Can you please elaborate a bit more on how to use the auto_bootstrap=false + hibernate repair technique? Cheers! Carlos Alonso | Software Engineer | @calonso On 6 January 2016 at 11:10, Herbert Fischer

Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Hi, Is it possible in DSE Cassandra Solr to search for JSON content within a column? We store a complex JSON in a column of type "text", very simplified version below. { "userId": "user100", "addressList": [{ "addressId": "100", "address": "100 ABC Street" }], "userName": "user11" } In this,

Spark Cassandra Java Connector: records missing despite consistency=ALL

2016-01-13 Thread Dennis Birkholz
Hi all, we use Cassandra to log event data and process it every 15 minutes with Spark. We are using the Cassandra Java Connector for Spark. Randomly, our Spark runs produce too few output records because no data is returned from Cassandra for a window of several minutes of input data. When

Re: Spark Cassandra Java Connector: records missing despite consistency=ALL

2016-01-13 Thread Alex Popescu
Dennis, You'll have better chances to get an answer on the spark-cassandra-connector mailing list https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user or on IRC #spark-cassandra-connector On Wed, Jan 13, 2016 at 4:17 AM, Dennis Birkholz wrote:

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Hi, Revisiting the thread I can see that nodetool status had both good and bad nodes at the same time. How do you replace nodes? When you say bad node, I understand that the node is no longer usable even though Cassandra is UP? Is that correct? If a node is in bad shape and not working, adding new

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi Anuj, Below is the output of nodetool status. The nodes were replaced following the instructions in the DataStax documentation for replacing running nodes, since the nodes were running fine; the problem was that the servers had been incorrectly initialised and thus had less disk space. The status below

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Node 2 has slightly higher data but that should be ok. Not sure how read ops are so high when no IO-intensive activity such as repair and compaction is running on node 3. Maybe you can try investigating logs to see what's happening. Others on the mailing list could also share their views on the

Re: Help debugging a very slow query

2016-01-13 Thread Jeff Jirsa
Very large partitions create a lot of garbage during reads: https://issues.apache.org/jira/browse/CASSANDRA-9754 - you will see significant GC pauses trying to read from large enough partitions. I suspect GC, though it’s odd that you’re unable to see it. From: Bryan Cheng Reply-To:

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, though comparing the startup flags on the two machines shows the GC config is identical: $ jstat -gcutil S0 S1 E O P

Re: Help debugging a very slow query

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 12:40 PM, Bryan Cheng wrote: > 1) What's up with the megapartition? What's the best way to debug this? > Our data model is largely write once, we don't do any updates. We do > DELETE, but the partitions that are giving us issues haven't been

Help debugging a very slow query

2016-01-13 Thread Bryan Cheng
Hi list, Would appreciate some insight into some irregular performance we're seeing. We have a column family that has become problematic recently. We've noticed a few queries take enormous amounts of time, and seem to clog up read resources on the machine (read pending tasks pile up, then

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Rahul Ramesh
Thanks for your suggestion. Compaction was happening on one of the large tables. The disk space did not decrease much after the compaction. So I ran an external compaction. The disk space decreased by around 10%. However, it is still consuming close to 750 GB for a load of 250 GB. I even restarted

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Hi Jack, I didn't exactly understand your suggestion. Could you please share an example? Thanks, Joseph On Thu, Jan 14, 2016 at 10:21 AM, Jack Krupansky wrote: > For a nested object you can just concatenate the sequence of names with > dots or some other separator and

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Jan Kesten
Hi Rahul, just an idea, did you have a look at the data directories on disk (/var/lib/cassandra/data)? It could be that there are some from old keyspaces that have been deleted and snapshotted before. Try something like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming
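The per-keyspace check suggested above can be sketched in Python as a rough complement to "du -sh". This is only a sketch under assumptions: the function name `usage_by_keyspace` is hypothetical, and it assumes the usual on-disk layout in which snapshots live in a `snapshots` directory under each table directory. It reports apparent file sizes and counts hard-linked files every time they appear, so, unlike du, it can overstate real disk usage while a snapshot still shares files with live SSTables.

```python
import os


def usage_by_keyspace(data_dir):
    """Return {keyspace: (total_bytes, snapshot_bytes)} for a data directory.

    Sizes are apparent file sizes; hard links are counted each time they
    appear, so the totals may exceed what du reports.
    """
    report = {}
    for entry in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, entry)
        if not os.path.isdir(ks_path):
            continue
        total = 0
        snapshot = 0
        for root, _dirs, files in os.walk(ks_path):
            # A file belongs to a snapshot if any path component below the
            # keyspace directory is named "snapshots".
            parts = os.path.relpath(root, ks_path).split(os.sep)
            in_snapshot = "snapshots" in parts
            for name in files:
                file_path = os.path.join(root, name)
                if os.path.islink(file_path):
                    continue  # skip symlinks to avoid broken-link errors
                size = os.path.getsize(file_path)
                total += size
                if in_snapshot:
                    snapshot += size
        report[entry] = (total, snapshot)
    return report
```

Calling `usage_by_keyspace("/var/lib/cassandra/data")` on a node would then show, per keyspace, how many bytes sit under snapshot directories compared with the total.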

Re: max connection per user

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 1:41 PM, oleg yusim wrote: > Quick question, here: does Cassandra have a configuration switch to limit > number of connections per user (protection of DoS attack, security)? > Quick answer : no. =Rob

max connection per user

2016-01-13 Thread oleg yusim
Greetings, Quick question, here: does Cassandra have a configuration switch to limit number of connections per user (protection of DoS attack, security)? Thanks, Oleg

Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-13 Thread Richard L. Burton III
I was hoping that the cqlsh project would expose a class that you can feed a source file via Java. The parsers in these other projects don't properly parse CQL. e.g., when you encounter a semicolon within a string, ignore it and continue on looking for the end of the string. I ended up having
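The semicolon-inside-a-string rule described above can be illustrated with a small splitter. This is a minimal sketch, not the cqlsh parser and not what the poster ended up writing: the function name `split_cql_statements` and the table `t` are hypothetical, and the sketch deliberately ignores comments and `$$`-quoted strings, which a complete CQL script loader would also need to handle.

```python
def split_cql_statements(script):
    """Split a CQL script on semicolons that are outside quoted text."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote is None:
            if ch in ("'", '"'):
                quote = ch
            elif ch == ";":
                # Statement terminator outside any quotes: flush the buffer.
                stmt = "".join(current).strip()
                if stmt:
                    statements.append(stmt)
                current = []
                continue
        elif ch == quote:
            # Closing quote. A doubled quote ('') closes and immediately
            # reopens the string, so no semicolon can slip in between.
            quote = None
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


script = (
    "INSERT INTO t (id, note) VALUES (1, 'a;b'); "
    "INSERT INTO t (id, note) VALUES (2, 'it''s; fine');"
)
for statement in split_cql_statements(script):
    print(statement)
```

Both statements come back intact because the semicolons inside the quoted values are never treated as terminators.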

Re: Cassandra Performance on a Single Machine

2016-01-13 Thread Anurag Khandelwal
Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassandra as an end-to-end distributed system, but to understand a breakdown of the performance. For instance, if we understand the performance characteristics that we can expect from a single machine Cassandra

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Jack Krupansky
For a nested object you can just concatenate the sequence of names with dots or some other separator and use that for each leaf value of the nested tree. -- Jack Krupansky On Wed, Jan 13, 2016 at 11:40 PM, Joseph Tech wrote: > Thanks, Field Transformers is exactly what I
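The dotted-name suggestion above can be sketched as follows, using the simplified JSON from the original question in this thread. It is an illustration only: the function name `flatten` is hypothetical, and including the list index in the path (`addressList.0.address`) is an assumption, since the message does not say how arrays should be named.

```python
import json


def flatten(value, prefix="", sep="."):
    """Return {dotted_path: leaf_value} for a nested JSON-like structure."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else key
            flat.update(flatten(child, path, sep))
    elif isinstance(value, list):
        # Assumption: list elements are addressed by their position.
        for index, child in enumerate(value):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, path, sep))
    else:
        flat[prefix] = value  # leaf value: record it under the full path
    return flat


doc = json.loads(
    '{ "userId": "user100", "addressList": [{ "addressId": "100", '
    '"address": "100 ABC Street" }], "userName": "user11" }'
)
fields = flatten(doc)
for name, leaf in fields.items():
    print(name, "=", leaf)
```

Each resulting name, such as `addressList.0.address`, could then serve as a separate field name for a leaf value instead of searching the whole JSON text with a wildcard.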

Re: max connection per user

2016-01-13 Thread oleg yusim
OK Rob, I see what you're saying. Well, let's dive into the long questions and answers in this case a bit: 1) Is there any other approach Cassandra currently utilizes to mitigate DoS attacks? 2) How about max connection per DB? I know, Cassandra has this parameter on JDBC driver configuration, but

Re: max connection per user

2016-01-13 Thread oleg yusim
Brian - absolutely. To give you a brief description of what I'm doing: I'm working for VMware as a security architect, and they tasked me with creating a STIG (working with DISA) for Cassandra DB. To create a STIG I would walk through the Database SRG security controls and assess them against

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Thanks, Field Transformers is exactly what I was looking for. Mine is a somewhat nested object, so will need to see how complex the transformer would get, and if it would become a maintenance hassle later on; will try this out and share feedback. -Joseph On Wed, Jan 13, 2016 at 8:31 PM, Russell

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Ok. I saw dropped mutations on your cluster and full GC is a common cause for that. Can you just search for the word GCInspector in system.log and share the frequency of minor and full GC? Moreover, are you printing promotion failures in GC logs? Why is full GC getting triggered? Promotion failures

Re: max connection per user

2016-01-13 Thread Bryan Cheng
Are you actively exposing your database to users outside of your organization, or are you just asking about security best practices? If you mean the former, this isn't really a common use case and there isn't a huge amount out of the box that Cassandra will do to help. If you're just asking

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread DuyHai Doan
Try SELECT * FROM your_table WHERE solr_query='json:"*100 ABC Street*"'; Warning: since you're storing in JSON format, searching data inside a JSON is equivalent to a wildcard search *xxx* and it is quite expensive, even for full text search engines like Solr. On Wed, Jan 13, 2016 at 2:50 PM,

Re: Sorting & pagination in apache cassandra 2.1

2016-01-13 Thread Narendra Sharma
In the example you gave the primary key user_name is the row key. Since the default partitioner is random you are getting rows in random order. Since each row has no clustering column there is no further grouping of data. Or in simple terms each row has one record and is being returned ordered by

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Russell Bradberry
You can use the full text wildcard search as mentioned. However, if you need something more specific like certain fields in the JSON indexed, you can use DSE SOLR field transformers. http://www.datastax.com/dev/blog/dse-field-transformers From: DuyHai Doan Reply-To:

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi all, We’ve spent a few days running things but are in the same position. To add some more flavour: - We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues - Nodes 2 & 3 are much newer than node 1. These two