Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Jan Kesten
Hi Rahul, just an idea: did you have a look at the data directories on disk (/var/lib/cassandra/data)? It could be that some of them belong to old keyspaces that were snapshotted and then deleted. Try something like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming your
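The per-keyspace check can also be sketched in Python, which makes it easier to separate live data from snapshot data. This is a minimal sketch, assuming the default layout data/<keyspace>/<table>/ with snapshots kept in a snapshots/ subdirectory of each table; keyspace_usage is a hypothetical helper. Snapshot files are hard links, so each inode is counted once, and a snapshot file only adds space once its live SSTable has been compacted away. It reports apparent file sizes rather than allocated blocks, so totals may differ slightly from du.

```python
import os


def keyspace_usage(data_dir):
    """Return {keyspace: (live_bytes, snapshot_only_bytes)}.

    Assumes data_dir/<keyspace>/<table>/... with snapshots stored under a
    "snapshots" directory inside each table directory.
    """
    usage = {}
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        live_files, snapshot_files = [], []
        for root, _dirs, files in os.walk(ks_path):
            parts = os.path.relpath(root, ks_path).split(os.sep)
            target = snapshot_files if "snapshots" in parts else live_files
            target.extend(os.path.join(root, name) for name in files)
        seen = set()  # (device, inode): hard-linked snapshot files are counted once
        totals = []
        # Live files first, so an inode shared with a snapshot counts as live.
        for paths in (live_files, snapshot_files):
            total = 0
            for path in paths:
                st = os.lstat(path)
                inode = (st.st_dev, st.st_ino)
                if inode not in seen:
                    seen.add(inode)
                    total += st.st_size  # apparent size, not allocated blocks like du
            totals.append(total)
        usage[keyspace] = (totals[0], totals[1])
    return usage
```

Calling keyspace_usage("/var/lib/cassandra/data") would show, per keyspace, whether the extra space is held by snapshots (which "nodetool clearsnapshot" can remove) or by live SSTables.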

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Rahul Ramesh
Thanks for your suggestion. Compaction was happening on one of the large tables, but disk usage did not decrease much after it finished. So I ran an external compaction, and disk usage decreased by around 10%. However, it is still consuming close to 750 GB for a load of 250 GB. I even restarted cas

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Hi Jack, I didn't quite understand your suggestion. Could you please share an example? Thanks, Joseph On Thu, Jan 14, 2016 at 10:21 AM, Jack Krupansky wrote: > For a nested object you can just concatenate the sequence of names with > dots or some other separator and use that for each leaf val

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Jack Krupansky
For a nested object, you can just concatenate the sequence of names with dots or some other separator and use that as the field name for each leaf value of the nested tree. -- Jack Krupansky On Wed, Jan 13, 2016 at 11:40 PM, Joseph Tech wrote: > Thanks, Field Transformers is exactly what i was looking for. Mine i
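A minimal Python sketch of the flattening described above, applied to the simplified document from Joseph's original question. flatten is a hypothetical helper, not part of DSE; a real DSE field transformer would be written in Java against DSE's own API. One assumption here is how arrays are handled: list elements do not add a path segment, so their leaves are collected under the same dotted name, which maps onto a multi-valued field.

```python
import json


def flatten(doc, sep="."):
    """Flatten nested JSON into {"a.b.c": [leaf values]}.

    Dict keys along the path are joined with `sep`; list elements do not add
    a path segment, so every element's leaves land under the same name.
    """
    fields = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}{sep}{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(node)

    walk(doc, "")
    return fields


# The simplified document from the original question.
EXAMPLE = json.loads(
    '{"userId": "user100", "addressList": [{"addressId": "100", '
    '"address": "100 ABC Street"}], "userName": "user11"}'
)
```

For EXAMPLE this yields the fields userId, addressList.addressId, addressList.address and userName, each holding a list of values, so "100 ABC Street" can be matched on addressList.address instead of through a wildcard over the whole JSON text.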

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Thanks, Field Transformers are exactly what I was looking for. Mine is a somewhat nested object, so I will need to see how complex the transformer would get and whether it would become a maintenance hassle later on; I will try this out and share feedback. -Joseph On Wed, Jan 13, 2016 at 8:31 PM, Russell B

Re: Cassandra Performance on a Single Machine

2016-01-13 Thread Anurag Khandelwal
Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassandra as an end-to-end distributed system, but to understand a breakdown of its performance. For instance, if we understand the performance characteristics that we can expect from a single-machine Cassandra ins

Re: max connection per user

2016-01-13 Thread oleg yusim
Brian - absolutely. To give you a brief description of what I'm doing: I'm working for VMware as a security architect, and they tasked me with creating a STIG (working with DISA) for Cassandra DB. To create a STIG, I would walk through the Database SRG security controls and assess them against Cas

Re: max connection per user

2016-01-13 Thread Bryan Cheng
Are you actively exposing your database to users outside of your organization, or are you just asking about security best practices? If you mean the former, this isn't really a common use case and there isn't a huge amount out of the box that Cassandra will do to help. If you're just asking about

Re: max connection per user

2016-01-13 Thread oleg yusim
OK Rob, I see what you're saying. Well, let's dive into the long questions and answers in this case a bit: 1) Is there any other approach Cassandra currently uses to mitigate DoS attacks? 2) How about max connections per DB? I know Cassandra has this parameter in the JDBC driver configuration, but wh

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
OK. I saw dropped mutations on your cluster, and full GC is a common cause of that. Can you search for the word GCInspector in system.log and share the frequency of minor and full GCs? Moreover, are you printing promotion failures in the GC logs? Why is full GC getting triggered? Promotion failures
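To get that frequency, GCInspector lines can be tallied per hour and per collector. A sketch, assuming the CMS or G1 collector name appears in the GCInspector message; the exact log format differs between Cassandra versions, so both regular expressions are assumptions, and gc_frequency is a hypothetical helper. ConcurrentMarkSweep entries are old-generation collections, which are not necessarily stop-the-world full GCs, and GCInspector may not log every short collection, so these counts are a lower bound.

```python
import re
from collections import Counter

# Assumed message shapes (they vary by Cassandra version), for example:
#   ... 2016-01-13 10:15:02,123 GCInspector.java:258 - ParNew GC in 312ms. ...
#   ... 2016-01-13 10:15:02,123 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1234 ms ...
COLLECTOR = re.compile(
    r"GCInspector.*?(ParNew|ConcurrentMarkSweep|G1 Young Generation|G1 Old Generation)"
)
HOUR = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")

# Young-generation ("minor") collectors; everything else matched is old generation.
YOUNG = {"ParNew", "G1 Young Generation"}


def gc_frequency(lines):
    """Count GCInspector entries per (hour, 'young' or 'old', collector)."""
    counts = Counter()
    for line in lines:
        match = COLLECTOR.search(line)
        if match is None:
            continue  # not a GCInspector line for a known collector
        collector = match.group(1)
        stamp = HOUR.search(line)
        hour = stamp.group(1) if stamp else "unknown"
        kind = "young" if collector in YOUNG else "old"
        counts[(hour, kind, collector)] += 1
    return counts
```

Passing an open system.log (for example /var/log/cassandra/system.log on a package install) gives counts per hour that can be compared across nodes.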

Re: max connection per user

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 1:41 PM, oleg yusim wrote: > Quick question, here: does Cassandra have a configuration switch to limit > number of connections per user (protection of DoS attack, security)? Quick answer: no. =Rob

Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-13 Thread Richard L. Burton III
I was hoping that the cqlsh project would expose a class that you can feed a source file to via Java. The parsers in these other projects don't properly parse CQL; e.g., when you encounter a semicolon within a string, you should ignore it and keep looking for the end of the string. I ended up having
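The string-aware splitting described above can be sketched in Python. This is not the cqlsh parser: split_cql is a hypothetical helper that splits on semicolons while skipping single-quoted strings (with '' escapes), double-quoted identifiers, $$-quoted strings, and --, // and /* */ comments. Comments are dropped from the returned statements, and a block comment is replaced by a space so the tokens around it stay separated.

```python
def split_cql(script):
    """Split a CQL script into statements on ';', ignoring semicolons inside
    quoted strings, quoted identifiers, $$-strings and comments."""
    statements, buf = [], []
    i, n = 0, len(script)
    while i < n:
        ch, two = script[i], script[i:i + 2]
        if ch in ("'", '"'):
            # Quoted string or identifier; a doubled quote ('' or "") is an escape.
            j = i + 1
            while j < n:
                if script[j] == ch:
                    if script[j + 1:j + 2] == ch:
                        j += 2
                        continue
                    break
                j += 1
            buf.append(script[i:j + 1])
            i = j + 1
        elif two == "$$":
            end = script.find("$$", i + 2)
            end = n if end == -1 else end + 2
            buf.append(script[i:end])
            i = end
        elif two in ("--", "//"):
            end = script.find("\n", i)
            i = n if end == -1 else end  # drop the comment, keep the newline
        elif two == "/*":
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
            buf.append(" ")  # keep tokens on either side of the comment apart
        elif ch == ";":
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned statement could then be executed one at a time through whatever driver the tests use.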

max connection per user

2016-01-13 Thread oleg yusim
Greetings, quick question here: does Cassandra have a configuration switch to limit the number of connections per user (protection against DoS attacks, security)? Thanks, Oleg

Re: Help debugging a very slow query

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 12:40 PM, Bryan Cheng wrote: > 1) What's up with the megapartition? What's the best way to debug this? > Our data model is largely write once, we don't do any updates. We do > DELETE, but the partitions that are giving us issues haven't been removed. > We had some suspicio

Re: Help debugging a very slow query

2016-01-13 Thread Jeff Jirsa
Very large partitions create a lot of garbage during reads: https://issues.apache.org/jira/browse/CASSANDRA-9754 - you will see significant GC pauses trying to read from large enough partitions. I suspect GC, though it’s odd that you’re unable to see it. From: Bryan Cheng Reply-To: "user@

Help debugging a very slow query

2016-01-13 Thread Bryan Cheng
Hi list, Would appreciate some insight into some irregular performance we're seeing. We have a column family that has become problematic recently. We've noticed a few queries take enormous amounts of time, and seem to clog up read resources on the machine (read pending tasks pile up, then immedia

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, though comparing the startup flags on the two machines shows the GC config is identical: $ jstat -gcutil S0 S1 E O P Y
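To compare the two nodes' jstat numbers on equal terms, the output can be reduced to average pause times. A sketch, assuming the header line is present; columns are looked up by name because Java 7 prints a P column while Java 8 prints M and CCS instead. parse_gcutil and gc_summary are hypothetical helpers. The YGC/FGC counts and YGCT/FGCT times are cumulative since JVM start, so nodes with different uptimes are better compared on averages or on the difference between two samples.

```python
def _num(value):
    """jstat prints '-' for unavailable columns; map those to None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gcutil(text):
    """Parse `jstat -gcutil` output into a list of {column: value} dicts.

    Columns come from the first header line; repeated headers are skipped.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    return [
        {col: _num(v) for col, v in zip(header, row)}
        for row in rows[1:]
        if row != header
    ]


def gc_summary(sample):
    """Collection counts and average pause (ms) from one cumulative sample."""
    ygc, fgc = sample["YGC"], sample["FGC"]
    return {
        "young_gcs": int(ygc),
        "avg_young_ms": 1000 * sample["YGCT"] / ygc if ygc else 0.0,
        "full_gcs": int(fgc),
        "avg_full_ms": 1000 * sample["FGCT"] / fgc if fgc else 0.0,
        "old_gen_pct": sample["O"],
    }
```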

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Node 2 has slightly more data, but that should be OK. I'm not sure why read ops are so high when no I/O-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating the logs to see what's happening. Others on the mailing list could also share their views on the si

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi Anuj, Below is the output of nodetool status. The nodes were replaced following the instructions in the DataStax documentation for replacing running nodes, since the nodes were running fine; it was just that the servers had been incorrectly initialised and thus had less disk space. The status below

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Hi, Revisiting the thread, I can see that nodetool status showed both good and bad nodes at the same time. How do you replace nodes? When you say bad node, I understand that the node is no longer usable even though Cassandra is UP. Is that correct? If a node is in bad shape and not working, adding new nod

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi all, We’ve spent a few days running things but are in the same position. To add some more flavour: - We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues - Nodes 2 & 3 are much newer than node 1. These two no
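With a 3-node ring and replication factor 3, every node stores a full replica, so the loads reported by nodetool status should be roughly equal apart from differences in compaction state. A minimal sketch of that arithmetic, assuming evenly balanced tokens; replica_fraction and load_outliers are hypothetical helpers, and the 20% tolerance is an arbitrary choice.

```python
from statistics import median


def replica_fraction(nodes, rf):
    """Share of the cluster's unique data each node holds, assuming balanced tokens."""
    return min(rf, nodes) / nodes


def load_outliers(loads, tolerance=0.2):
    """Return nodes whose load differs from the median by more than `tolerance`."""
    mid = median(loads.values())
    return {node: load for node, load in loads.items() if abs(load - mid) > tolerance * mid}
```

With nodes=3 and rf=3 the fraction is 1.0, so a node that stands out in load_outliers is carrying data (for example uncompacted SSTables) beyond its replica rather than a different share of the ring.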

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Russell Bradberry
You can use the full-text wildcard search as mentioned. However, if you need something more specific, like indexing certain fields in the JSON, you can use DSE Solr field transformers. http://www.datastax.com/dev/blog/dse-field-transformers From: DuyHai Doan Reply-To: Date: Wednesday, Janua

Re: Sorting & pagination in apache cassandra 2.1

2016-01-13 Thread Narendra Sharma
In the example you gave, the primary key user_name is the row key. Since the default partitioner is random, you are getting rows in random order. Since each row has no clustering column, there is no further grouping of data. Or, in simple terms, each row has one record and is being returned ordered by colu
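The "random order" here is token order: the partitioner hashes each partition key, and an unrestricted scan returns partitions sorted by that hash rather than by the key itself. A sketch using the RandomPartitioner token (the absolute value of the MD5 digest of the key bytes, read as a signed big-endian integer), assuming UTF-8 bytes for a text key. Cassandra 2.1 actually defaults to Murmur3Partitioner, which also hashes keys but with a different function, so the concrete order it produces would differ. random_partitioner_token and scan_order are hypothetical helpers.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """RandomPartitioner-style token: |MD5(key bytes)| as a signed big-endian integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def scan_order(keys):
    """Order in which an unrestricted SELECT would return these partitions."""
    return sorted(keys, key=random_partitioner_token)
```

Because the order depends only on the hash, sorted results have to come from a clustering column within a partition, not from the partition key.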

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread DuyHai Doan
Try SELECT * FROM your_table WHERE solr_query='json:"*100 ABC Street*"'; Warning: since you're storing data in JSON format, searching inside the JSON is equivalent to a wildcard search *xxx*, and it is quite expensive, even for full-text search engines like Solr. On Wed, Jan 13, 2016 at 2:50 PM, Jose

Re: Spark Cassandra Java Connector: records missing despite consistency=ALL

2016-01-13 Thread Alex Popescu
Dennis, You'll have a better chance of getting an answer on the spark-cassandra-connector mailing list https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user or on IRC #spark-cassandra-connector On Wed, Jan 13, 2016 at 4:17 AM, Dennis Birkholz wrote: > Hi together, > > we

Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Joseph Tech
Hi, Is it possible in DSE Cassandra Solr to search for JSON content within a column? We store a complex JSON document in a column of type "text"; a very simplified version is below. { "userId": "user100", "addressList": [{ "addressId": "100", "address": "100 ABC Street" }], "userName": "user11" } In this, can

Spark Cassandra Java Connector: records missing despite consistency=ALL

2016-01-13 Thread Dennis Birkholz
Hi all, we use Cassandra to log event data and process it every 15 minutes with Spark, using the Cassandra Java Connector for Spark. Our Spark runs randomly produce too few output records because no data is returned from Cassandra for a window of several minutes of input data. When quer

Re: Node stuck when joining a Cassandra 2.2.0 cluster

2016-01-13 Thread Carlos Alonso
Hi Robert. I'm thinking of upgrading hardware in place. Can you please elaborate a bit more on how to use the auto_bootstrap=false + hibernate repair technique? Cheers! Carlos Alonso | Software Engineer | @calonso On 6 January 2016 at 11:10, Herbert Fischer wrote:

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Rolo
You can check whether a snapshot exists in the snapshots folder. Repairs stream SSTables over, which can temporarily increase disk usage. But I think Carlos Alonso might be correct: running compactions might be the issue. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Alonso
I'd also have a look at any compactions that may be running. If you have big column families with STCS, then large compactions may be happening. Check with nodetool compactionstats. Carlos Alonso | Software Engineer | @calonso On 13 January 2016 at 05:22, Kevin O'Connor
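Because compaction writes its output before the input SSTables are deleted, a large STCS compaction (or a major compaction run with nodetool compact) can temporarily need free space roughly equal to the total size of the SSTables being compacted. A minimal sketch of that headroom check, assuming the worst case in which nothing is purged, so the output is as large as the inputs; compaction_headroom and compaction_fits are hypothetical helpers, and all sizes are in bytes.

```python
GIB = 1024 ** 3


def compaction_headroom(input_sizes, disk_total, disk_used):
    """Free bytes left at the peak of a compaction, in the worst case.

    Worst case assumed: nothing is purged, so the new SSTable is as large as
    the inputs, and the inputs are only deleted once the output is complete.
    A negative result means the compaction may run out of disk.
    """
    return (disk_total - disk_used) - sum(input_sizes)


def compaction_fits(input_sizes, disk_total, disk_used):
    """True if the worst-case compaction output fits in the current free space."""
    return compaction_headroom(input_sizes, disk_total, disk_used) >= 0
```

For a major compaction, input_sizes would be every SSTable of the table, which is why disk usage can climb well above the reported load while it runs.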