Re: CAS operation does not return value on failure

2016-05-04 Thread Jack Krupansky
Probably better to ask this on the Java driver user list. -- Jack Krupansky On Wed, May 4, 2016 at 11:46 AM, horschi <hors...@gmail.com> wrote: > Hi, > > I am doing some testing on CAS operations and I am frequently having the > issue that my resultset says wasApplied()==

Re: Security assessment of Cassandra

2016-04-26 Thread Jack Krupansky
Just following up... Oleg, have you gotten a satisfactory level of feedback from the community on the security assessment issues? And if there is any sort of final assessment that can be publicly accessed, that would be great. -- Jack Krupansky On Thu, Feb 11, 2016 at 3:29 PM, oleg yusim

Re: Proper use of COUNT

2016-04-19 Thread Jack Krupansky
ould make that not a problem. Or is there a timeout in cqlsh simply because the operation is slow - as opposed to the server reporting an internal timeout? Thanks. -- Jack Krupansky On Tue, Apr 19, 2016 at 12:45 PM, Tyler Hobbs <ty...@datastax.com> wrote: > > On Tue, Apr 19, 2016 a

Re: Proper use of COUNT

2016-04-19 Thread Jack Krupansky
e as a batch-style OLAP operation rather than a real-time OLTP operation... I think. Thanks. -- Jack Krupansky On Tue, Apr 19, 2016 at 12:04 PM, Tyler Hobbs <ty...@datastax.com> wrote: > > On Tue, Apr 19, 2016 at 9:51 AM, Jack Krupansky <jack.krupan...@gmail.com> > wrote: >

Re: Proper use of COUNT

2016-04-19 Thread Jack Krupansky
proportional to the number of rows on all nodes? I mean, you can't dedupe using only partition keys of the coordinator node, right? What I'm wondering is if the usability of COUNT (et al) is memory limited as well as time. Thanks. -- Jack Krupansky On Tue, Apr 19, 2016 at 5:36 AM, Sylvain Lebresne <s

Proper use of COUNT

2016-04-18 Thread Jack Krupansky
companion question is whether COUNT(column_name) has the same limitations and recommendations. It does have to actually fetch the column values as opposed to simply determining the existence of the row, but how consequential that additional processing is, I couldn't say. -- Jack Krupansky

Re: Most stable version?

2016-04-14 Thread Jack Krupansky
to keep your chosen release in production for longer than the older 3.0 releases will be in production. Ultimately, this is a personality test: Are you adventuresome or conservative? To be clear, with the new tick-tock release scheme, 3.5 is designed to be a stable release. -- Jack Krupansky On Thu

Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Jack Krupansky
times of stress. -- Jack Krupansky On Thu, Apr 14, 2016 at 10:14 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote: > Would adding nodes be the right way to start if I want to get the data per >> node down > > > Yes, if everything else is fine, the last and always avai

Re: performance question

2016-04-12 Thread Jack Krupansky
Facets can be used, and grouping of results as well, in DSE Search (Solr), but there are a lot of different approaches that can be used, depending on the specific user experience you require. -- Jack Krupansky On Tue, Apr 12, 2016 at 9:32 PM, Gross, Daniel <daniel.gr...@intel.com> wrote:

Re: performance question

2016-04-12 Thread Jack Krupansky
full Solr searches, including faceting. The new SASI secondary index feature in Cassandra 3.4 can be used for some more sophisticated searches as well, but it's not quite up to what Stratio and DSE Search can do. -- Jack Krupansky On Tue, Apr 12, 2016 at 8:07 PM, Gross, Daniel <daniel

Re: DSE Search : NPE when executing Solr CQL queries using solr_query

2016-04-12 Thread Jack Krupansky
the FOT is setting an output column value to NULL. Also, see if there is a "Caused By" entry elsewhere in the Java stack trace. -- Jack Krupansky On Tue, Apr 12, 2016 at 6:07 AM, Joseph Tech <jaalex.t...@gmail.com> wrote: > hi, > > I am facing an issue where Solr queries

Re: Large primary keys

2016-04-11 Thread Jack Krupansky
to the document given the document text. -- Jack Krupansky On Mon, Apr 11, 2016 at 7:12 PM, James Carman <ja...@carmanconsulting.com> wrote: > S3 maybe? > > On Mon, Apr 11, 2016 at 7:05 PM Robert Wille <rwi...@fold3.com> wrote: > >> I do realize its kind of a weird use case

Re: 1, 2, 3...

2016-04-11 Thread Jack Krupansky
ng treated as a single row. -- Jack Krupansky On Mon, Apr 11, 2016 at 11:46 AM, Emīls Šolmanis <emils.solma...@gmail.com> wrote: > Wouldn't the "number of keys" part of *nodetool cfstats* run on every > node, summed and divided by replication factor give you a decent > app

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jack Krupansky
d to bite the bullet and re-model your data to exploit the features of CQL rather than fight CQL trying to mimic Thrift per se. In any case, take another shot at framing the problem and then maybe people here can help you out. -- Jack Krupansky On Mon, Apr 11, 2016 at 10:39 AM, Anuj Wadehra <anujw_

Re: 1, 2, 3...

2016-04-11 Thread Jack Krupansky
for q=*:* and that will very quickly return the total row count. I presume that Stratio will handle this fine as well. -- Jack Krupansky On Mon, Apr 11, 2016 at 11:10 AM, <sean_r_dur...@homedepot.com> wrote: > Cassandra is not good for table scan type queries (which count(*) > typicall

1, 2, 3...

2016-04-08 Thread Jack Krupansky
technique? For example, is it more efficient to query the row count one node at a time? And for bonus points: How do you count (CQL) rows for each node? Again, excluding replication. -- Jack Krupansky

Re: Cassandra Single Node Setup Questions

2016-04-07 Thread Jack Krupansky
Not that we aren't enthusiastic about you moving to Cassandra, but it needs to be for the right reasons, and for Cassandra the right reasons are scaling and HA. In case it's not obvious, I would make a really lousy used-car or real-estate/time-share salesman! -- Jack Krupansky On Thu, Apr 7

Re: Cassandra Single Node Setup Questions

2016-04-06 Thread Jack Krupansky
(properly.) -- Jack Krupansky On Wed, Apr 6, 2016 at 10:30 AM, Paco Trujillo <f.truji...@genetwister.nl> wrote: > The fact that there is one single DC does not mean that you do not need > multiples nodes. Without multiples nodes you do not have redundancy (the > nodes fail

Re: Cassandra table limitation

2016-04-05 Thread Jack Krupansky
or a collection of applications which share the same data. If there are multiple applications that don't share the same data, then they absolutely should be on separate clusters. -- Jack Krupansky On Tue, Apr 5, 2016 at 5:40 PM, Kai Wang <dep...@gmail.com> wrote: > Once a while the question ab

Re: How many nodes do we require

2016-03-31 Thread Jack Krupansky
Maybe that's a great definition of a modern distributed cluster: each person (node) has a different notion of priority. I'll wait for the next user email in which they complain that their data is "too stable" (missing updates.) -- Jack Krupansky On Thu, Mar 31, 2016 at 12:04 PM, Jac

Acceptable repair time

2016-03-28 Thread Jack Krupansky
acceptable full repair times for nodes and what the resulting node data size is. What impact vnodes has on these numbers is a bonus question. Thanks! -- Jack Krupansky

Solr and vnodes anyone?

2016-03-28 Thread Jack Krupansky
is whether 64 or even 32 would deliver acceptable query performance? Anybody here have any practical experience on this issue, either testing or even better, in production? Absent any further input, my advice would be to limit DSE Search/Solr to a token count of 64 per node. -- Jack Krupansky

Re: Does saveToCassandra work with Cassandra Lucene plugin ?

2016-03-28 Thread Jack Krupansky
The exception message has an empty column name. Odd. Not sure if that is a bug in the exception code or whether you actually have an empty column name somewhere. Did you use the absolutely exact same commands to create the keyspace, table, and custom index as in the Stratio readme? -- Jack

Re: *** What is the best way to model this JSON *** ??

2016-03-28 Thread Jack Krupansky
range of potential queries? Which are the most common and need to be the fastest? -- Jack Krupansky On Mon, Mar 28, 2016 at 12:10 PM, Lokesh Ceeba - Vendor < lokesh.ce...@walmart.com> wrote: > Hello Team, > >How to design/develop the best data model for this JSON ?

Re: Consistency Level (QUORUM vs LOCAL_QUORUM)

2016-03-27 Thread Jack Krupansky
com/en/cassandra/3.x/cassandra/dml/dmlConfigConsistency.html . In short, Cassandra does indeed guarantee the degree of immediate consistency that you specify (and presumably want.) -- Jack Krupansky On Sun, Mar 27, 2016 at 6:36 PM, Harikrishnan A <hari...@yahoo.com> wrote: > Hello, >

Re: Understanding Cassandra tuning

2016-03-25 Thread Jack Krupansky
.75 MB/sec, which is fairly close to your numbers, so just a little write amplification or spiking or fuzzy math on AWS end might trigger some AWS throttling. -- Jack Krupansky On Fri, Mar 25, 2016 at 11:42 AM, Giampaolo Trapasso < giampaolo.trapa...@radicalbit.io> wrote: > Hi

Re: How many nodes do we require

2016-03-25 Thread Jack Krupansky
It depends on how much data you have. A single node can store a lot of data, but the more data you have the longer a repair or node replacement will take. How long can you tolerate for a full repair or node replacement? Generally, RF=3 is both sufficient and recommended. -- Jack Krupansky

Re: What is the best way to model my time series?

2016-03-25 Thread Jack Krupansky
ntipattern for Cassandra. But... you can probably get it to work with enough care and sufficient provisioning of the cluster. The big problem is that rapid, large-scale removal from the queue generates tons of tombstones that need to be removed. The DateTieredCompactionStrategy may help as well. -- Jack

Re: Rack aware question.

2016-03-23 Thread Jack Krupansky
be retrieved even when a rack-level failure occurs. In short, if CL=ALL is acceptable, then you might as well dump the rack-aware approach, which was how you got into this situation in the first place. -- Jack Krupansky On Wed, Mar 23, 2016 at 7:31 PM, Anubhav Kale <anubhav.k...@microsoft.com> wrote

Re: com.datastax.driver.core.Connection "This should not happen and is likely a bug, please report."

2016-03-23 Thread Jack Krupansky
iling list is always a good idea, but as we continually see, people have difficulty even discerning the distinction between the Cassandra user list and the driver user lists. In truth, to some (a lot) of us, this distinction between "user" and "driver" is quite baffling. Sor

Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
executedby is the ID assigned to an employee. I'm presuming that JSON is to be used for objectbefore/after. This suggests no ability to query by individual object fields. I didn't sense any other columns that would be JSON. -- Jack Krupansky On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge

Re: Single node Solr FTs not working

2016-03-19 Thread Jack Krupansky
highlight what the difference is that causes the problem. Doc: http://docs.datastax.com/en/latest-dse/datastax_enterprise/srch/srchTrnsFrm.html -- Jack Krupansky On Fri, Mar 18, 2016 at 4:30 AM, Joseph Tech <jaalex.t...@gmail.com> wrote: > Hi, > > I had setup a single-node DSE

Re: Questions about Datastax support

2016-03-19 Thread Jack Krupansky
. -- Jack Krupansky On Thu, Mar 17, 2016 at 10:39 AM, Rakesh Kumar <rakeshkumar46...@gmail.com> wrote: > > 1. They have a published support policy: > > http://www.datastax.com/support-policy/supported-software > > Why is the version number so different from the cassa

Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
that an MV PK can only include one non-PK data column - CASSANDRA-9928 <https://issues.apache.org/jira/browse/CASSANDRA-9928>.) -- Jack Krupansky On Wed, Mar 16, 2016 at 4:40 PM, I PVP <i...@hotmail.com> wrote: > Jack/Tom > Thanks for answering. > > Here is the table defin

Re: Questions about Datastax support

2016-03-19 Thread Jack Krupansky
1. They have a published support policy: http://www.datastax.com/support-policy/supported-software -- Jack Krupansky On Thu, Mar 17, 2016 at 10:09 AM, Rakesh Kumar <rakeshkumar46...@gmail.com> wrote: > Few questions: > > 1 - Has there been an announcement as to when Dat

Re: Question about SELECT command

2016-03-19 Thread Jack Krupansky
which can directly be mapped to a node (or multiple nodes with replication.) Ad hoc, complex, and expensive queries are anti-patterns in Cassandra (very discouraged if not outright not supported.) -- Jack Krupansky On Thu, Mar 17, 2016 at 12:25 PM, Thouraya TH <thouray...@gmail.com> wrote:

Re:

2016-03-15 Thread Jack Krupansky
Be sure to post your final (working) insert for others to learn from! -- Jack Krupansky On Tue, Mar 15, 2016 at 11:56 AM, Rami Badran <ramibadran...@gmail.com> wrote: > thanks got it > > On Tue, Mar 15, 2016 at 5:54 PM, Jack Krupansky <jack.krupan...@gmail.com> > w

Re:

2016-03-15 Thread Jack Krupansky
There's a UDT example in the doc, showing that you don't put quotes around the UDT key names: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useInsertUDT.html -- Jack Krupansky On Tue, Mar 15, 2016 at 11:52 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > In any case, pl

Re:

2016-03-15 Thread Jack Krupansky
No quotes around the UDT key names. (Or use double quotes.) -- Jack Krupansky On Tue, Mar 15, 2016 at 10:56 AM, Rami Badran <ramibadran...@gmail.com> wrote: > here is the CQL > > insert into users (uid,loginIds) values ('111',{ 'emails' : {' > f...@baggins.com',

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-12 Thread Jack Krupansky
needs to be centered on point queries and narrow contiguous slices. Even with Spark and analytics that may indeed need to do a full scan of a large amount of data, the model needs to be that the big scan is done in small chunks. -- Jack Krupansky On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
Thanks, that level of query detail gives us a better picture to focus on. I'll think through this some more over the weekend. Also, these queries focus on raw, bulk retrieval of sensor data readings, but do you have reading-based queries, such as a range of an actual sensor reading? -- Jack Krupansky

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
ber of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence. -- Jack Krupansky On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote: > Jack, > &

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
of this other stuff upfront. -- Jack Krupansky On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote: > Jack, > > Thanks for the response. I don't think I provided enough information and > used the wrong terminology as your response is more the canned ad

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Jack Krupansky
What is your schema and data like - in particular, how wide are your partitions (number of rows and typical row size)? Maybe you just need (a lot) more heap for rows during the repair process. -- Jack Krupansky On Fri, Mar 11, 2016 at 11:19 AM, Adam Plumb <apl...@fiksu.com>

Re: What is wrong in this token function

2016-03-10 Thread Jack Krupansky
html (for 2.2 and 3.x) -- Jack Krupansky On Thu, Mar 10, 2016 at 5:14 PM, Rakesh Kumar <dcrunch...@aim.com> wrote: > I am using default Murmur3. So are you saying in case of Murmur3 the > following two queries > > select count*) > where customer_id = '289' > and ev

Re: Exception about too long clustering key

2016-03-10 Thread Jack Krupansky
f the repo. Interesting. I mean, I wanted to search through the code as of the tag for 2.2.4. You would have to actually check out the code from that tag and then search in an IDE. -- Jack Krupansky On Thu, Mar 10, 2016 at 3:53 PM, Emīls Šolmanis <emils.solma...@gmail.com> wrote: > > Jack >

Re: What is wrong in this token function

2016-03-10 Thread Jack Krupansky
ou can use RDBMS-like WHERE conditions to select a slice of the partition. -- Jack Krupansky On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcrunch...@aim.com> wrote: > > typo: the primary key was (customer_id + event_time ) > > > -Original Message- > From

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jack Krupansky
g the commit log on a separate SSD device. That should probably be mentioned. -- Jack Krupansky On Thu, Mar 10, 2016 at 12:52 PM, Matt Kennedy <matt.kenn...@datastax.com> wrote: > It isn't really the data written by the host that you're concerned with, > it's the data written by your application

Re: Exception about too long clustering key

2016-03-10 Thread Jack Krupansky
Did you ever find the source of the message? I couldn't find it on GitHub, either in the driver or in Cassandra proper. -- Jack Krupansky On Thu, Mar 10, 2016 at 12:39 PM, Emīls Šolmanis <emils.solma...@gmail.com> wrote: > In case someone stumbles upon this same thing later. >

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jack Krupansky
nal tables. As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go. -- Jack Krupansky On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote: > Hi, > > We have

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-09 Thread Jack Krupansky
erformance for storage capacity. But... that would be an enhancement, not something that is "supported" out of the box today. What use cases would this satisfy? I mean, who is it that can get away with sacrificing performance these days? -- Jack Krupansky On Mon, Mar 7, 2016 at 3:29

Re: ntpd clock sync

2016-03-09 Thread Jack Krupansky
How far out of sync are the nodes? A few minutes or less? Many hours? Worst case, you could simply take the entire cluster down until that future time has passed and then bring it back up. -- Jack Krupansky On Wed, Mar 9, 2016 at 11:27 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com>

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-06 Thread Jack Krupansky
keyword and prefix/suffix search. But it doesn't support multi-column ad hoc queries, which is what people tend to use Lucene and Solr for. So, again, it all depends on your queries and your data cardinality. -- Jack Krupansky On Sun, Mar 6, 2016 at 1:29 AM, Bhuvan Rawal <bhu1ra...@gmail.com>

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-05 Thread Jack Krupansky
You haven't been clear about how you intend to add Solr. You can also use Stratio or Stargate for basic Lucene search if you don't need full Solr support and want to stick to open source rather than go with DSE Search for Solr. -- Jack Krupansky On Sun, Mar 6, 2016 at 12:25 AM, Bhuvan Rawal

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-04 Thread Jack Krupansky
, they are absolute requirements. -- Jack Krupansky On Fri, Mar 4, 2016 at 9:04 PM, Hiroyuki Yamada <mogwa...@gmail.com> wrote: > Hi, > > I'm working on some POCs for Cassandra with single 2GB RAM node > environment and > some issues came up with me, so let me ask here. > > I

Re: Updating secondary index options

2016-03-04 Thread Jack Krupansky
Is this a secondary indexer of your own design so that you know that changing the options will be safe for existing index entries? It might be worth a Jira. Otherwise, you may just have to manually go in and hack the information under the hood. -- Jack Krupansky On Fri, Mar 4, 2016 at 12:14 PM

Re: Removing Node causes bunch of HostUnavailableException

2016-03-03 Thread Jack Krupansky
? When you say that the failures don't last for more than a few minutes, you mean from the moment you perform the nodetool removenode? And is operation completely normal after those few minutes? -- Jack Krupansky On Thu, Mar 3, 2016 at 4:40 PM, Peddi, Praveen <pe...@amazon.com> wrote: >

Re: Removing Node causes bunch of HostUnavailableException

2016-03-03 Thread Jack Krupansky
of nodes that were removed? How many seed nodes does each node typically have? -- Jack Krupansky On Thu, Mar 3, 2016 at 4:16 PM, Peddi, Praveen <pe...@amazon.com> wrote: > Thanks Alain for quick and detailed response. My answers inline. One thing > I want to clarify is, the nodes got

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
It is the total table count, across all key spaces. Memory is memory. -- Jack Krupansky On Tue, Mar 1, 2016 at 6:26 PM, Brian Sam-Bodden <bsbod...@integrallis.com> wrote: > Eric, > Is the keyspace as a multitenancy solution as bad as the many tables > pattern? Is the

Re: List of List

2016-03-01 Thread Jack Krupansky
Thrift? Hah! Sorry, I can't help you if you are going that route. I recommend CQL - only. -- Jack Krupansky On Tue, Mar 1, 2016 at 4:47 PM, Sandeep Kalra <sandeep.ka...@gmail.com> wrote: > The way I was planning is to give a restful interface to lookup details of > a question, a

Re: Commit log size vs memtable total size

2016-03-01 Thread Jack Krupansky
It would be nice to get this info into the doc or at least a blog post. -- Jack Krupansky On Tue, Mar 1, 2016 at 4:37 PM, Tyler Hobbs <ty...@datastax.com> wrote: > > On Tue, Mar 1, 2016 at 6:13 AM, Vlad <qa23d-...@yahoo.com> wrote: > >> So commit log can't keep m

Re: List of List

2016-03-01 Thread Jack Krupansky
Okay, so a very large number of questions, each with a very modest number of answers (generally under 5), each with a modest number of comments (generally under 5). Now we're back to the issue of how you wish to query and access the data. -- Jack Krupansky On Tue, Mar 1, 2016 at 12:39 PM

Re: Cassandra Ussages

2016-03-01 Thread Jack Krupansky
I would spin it as Cassandra being the right choice where your primary need is OLTP, with a secondary need for analytics. IOW, where you would otherwise need to use two separate databases for the same data. -- Jack Krupansky On Tue, Mar 1, 2016 at 12:40 PM, Jonathan Haddad &l

Re: List of List

2016-03-01 Thread Jack Krupansky
Clustering columns are your friends. But the first question is how you need to query the data. Queries drive data models in Cassandra. What is the cardinality of this data - how many answers per question and how many comments per answer? -- Jack Krupansky On Tue, Mar 1, 2016 at 12:23 PM

Re: Cassandra Ussages

2016-03-01 Thread Jack Krupansky
with string key values effectively gives you extensible columns. -- Jack Krupansky On Tue, Mar 1, 2016 at 11:22 AM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Jonathan thanks for the link, > I believe that maybe is good as Data Store part, because is fast for I/o > and hand

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
se case that can't easily be handled by a single table, that could get the discussion started. -- Jack Krupansky On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez < fernando.jime...@wealth-port.com> wrote: > Hi Jack > > Being purposefully developed to only handle up to “a few hundred” tables

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
is strongly not recommended. As the Jira notes, "having more than dozens or hundreds of tables defined is almost certainly a Bad Idea." "Bad Idea" means not good. As in don't go there. And if you do, don't expect such a mis-adventure to be supported by the community. -- Jack Krup

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
r specific access patterns, and your specific load. And it also depends on your own personal tolerance for degradation of latency and throughput - some people might find a given set of performance metrics acceptable while others might not. -- Jack Krupansky On Tue, Mar 1, 2016 at 3:54 AM, Fernan

Re: Practical limit on number of column families

2016-02-29 Thread Jack Krupansky
lly have two choices: an additional cluster column to distinguish categories of table, or separate clusters for each few hundred of tables. -- Jack Krupansky On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez < fernando.jime...@wealth-port.com> wrote: > Hi all > > I have a use ca

Re: Cassandra Data Audit

2016-02-25 Thread Jack Krupansky
There is an open Jira on this exact topic - Change Data Capture (CDC): https://issues.apache.org/jira/browse/CASSANDRA-8844 Unfortunately, open means not yet done. -- Jack Krupansky On Thu, Feb 25, 2016 at 2:13 AM, Charulata Sharma (charshar) < chars...@cisco.com> wrote: &g

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-24 Thread Jack Krupansky
immediately, fairly soon, or only after about as long as they take from a clean fresh start? -- Jack Krupansky On Wed, Feb 24, 2016 at 7:04 PM, Mike Heffner <m...@librato.com> wrote: > Nate, > > So we have run several install tests, bisecting the 2.1.x release line, &

Re: JBOD device space allocation?

2016-02-24 Thread Jack Krupansky
he user would hit this? I mean, why would the code care either way with respect to JBOD strategy for the case where no local data is stored? -- Jack Krupansky On Wed, Feb 24, 2016 at 2:15 AM, Marcus Eriksson <krum...@gmail.com> wrote: > It is mentioned here btw: http://www.datastax.com/dev

JBOD device space allocation?

2016-02-23 Thread Jack Krupansky
for device space utilization. Thanks! -- Jack Krupansky

Re: Nodes go down periodically

2016-02-23 Thread Jack Krupansky
- but most technical problems on a node would be clearly logged on that node. If you see a lapse of connectivity no more than once or twice a day, consider yourselves lucky. Is it only one node at a time that goes down, and at widely dispersed times? How many nodes? -- Jack Krupansky On Tue, Feb 23

Re: Gossip Protocol

2016-02-21 Thread Jack Krupansky
. What type of info did you wish to pass around? -- Jack Krupansky On Sun, Feb 21, 2016 at 8:56 AM, Thouraya TH <thouray...@gmail.com> wrote: > Hi all; > > Please, where can i find what are the details saved by gossip protocol ? > > Is it possible to add other informa

Re: Forming a cluster of embedded Cassandra instances

2016-02-15 Thread Jack Krupansky
But again, you could also simply spawn a process running Cassandra as-is in its intended form, which would eliminate the potential for conflict between the app heap and Cassandra's JVM heap. -- Jack Krupansky On Mon, Feb 15, 2016 at 12:56 AM, Jan Kesten <j.kes...@enercast.de> wrote:

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
What does your query actually look like today? Is your non-EQ on timestamp selecting a single row, a few rows, or many rows (dozens, hundreds, thousands)? -- Jack Krupansky On Sun, Feb 14, 2016 at 7:40 PM, Gianluca Borello <gianl...@sysdig.com> wrote: > Thanks again. > > One clar

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
You can definitely read all of the columns in a single SELECT. And the n-INSERTS can be batched and will insert fewer cells in the storage engine than the previous approach. -- Jack Krupansky On Sun, Feb 14, 2016 at 7:31 PM, Gianluca Borello <gianl...@sysdig.com> wrote: > Thank you for y

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
of them. -- Jack Krupansky On Sun, Feb 14, 2016 at 5:22 PM, Gianluca Borello <gianl...@sysdig.com> wrote: > Hi > > I've just painfully discovered a "little" detail in Cassandra: Cassandra > touches all columns on a CQL select (related issues > https://issues.apache

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread Jack Krupansky
What motivated the use of an embedded instance for development - as opposed to simply spawning a process for Cassandra? -- Jack Krupansky On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <john.sa...@gmail.com> wrote: > The project I work on day to day uses an embedded instance of

Re: Forming a cluster of embedded Cassandra instances

2016-02-13 Thread Jack Krupansky
ndra nodes. That said, if any of the senior Cassandra developers wish to personally support your efforts towards embedded clusters, they are certainly free to do so. We'll see if any of them step forward. -- Jack Krupansky On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <binil.thomas.pub...@gmail.com &

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Jack Krupansky
the problem recurs for that node? -- Jack Krupansky On Fri, Feb 12, 2016 at 4:06 AM, Skvazh Roman <r...@skvazh.com> wrote: > Hello! > We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached > 1.5 TB 4000 PIOPS EBS drive. > Sometimes one or two nodes user cpu sp

Re: Cassandra Collections performance issue

2016-02-11 Thread Jack Krupansky
? And are you indexing map columns, keys or values? -- Jack Krupansky On Thu, Feb 11, 2016 at 10:44 AM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > I have experienced excessive performance issues while using collections as > well. Mostly my issue was due to the exce

Re: Security labels

2016-02-11 Thread Jack Krupansky
document or is it strictly internal for your employer? I know there is a database of these assessments, but I don't know who controls what becomes public and when. -- Jack Krupansky On Thu, Feb 11, 2016 at 3:23 PM, oleg yusim <olegyu...@gmail.com> wrote: > Hi Dani, > > As promised, I

Re: Session timeout

2016-02-11 Thread Jack Krupansky
efforts if their infrastructure does not implicitly effect mitigation for various security exposures. -- Jack Krupansky On Thu, Feb 11, 2016 at 3:21 PM, oleg yusim <olegyu...@gmail.com> wrote: > Robert, Jack, Bryan, > > As you suggested, I put together d

Re: Rows with same key

2016-02-11 Thread Jack Krupansky
(Note to self... check docs to see if they give this troubleshooting tip. I didn't see it at first glance.) -- Jack Krupansky On Thu, Feb 11, 2016 at 2:45 PM, Kai Wang <dep...@gmail.com> wrote: > Are you supplying timestamps from the client side? Are clocks in sync > cros

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
What do your partition and clustering keys look like? Check nodetool tablestats to see the number of partition keys on the nodes. Also check nodetool tablehistograms to see if you have a lot of too-wide rows due to the balance of data between the partition key and clustering columns. -- Jack

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
Sorry, I didn't realize you were still living in the stone age with DSE - and Cassandra 2.1. Change "table" to "cf" (column family.) -- Jack Krupansky On Wed, Feb 10, 2016 at 3:23 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I don't see tablestats sub-command: >

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
That's for one node. You can look at the writes for each node. I'm actually not sure if the partition key count includes memtables in addition to sstables. A nodetool flush will assure that any memtable data gets flushed to sstables. -- Jack Krupansky On Wed, Feb 10, 2016 at 3:30 PM, Ted Yu

Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jack Krupansky
Technically this is actually "a bug fix release[1] on the 3.x series" - "3.x" rather than "3.3", I think. -- Jack Krupansky On Tue, Feb 9, 2016 at 1:50 PM, Jake Luciani <j...@apache.org> wrote: > The Cassandra team is pleased to announce the re

Re: Latest stable release

2016-02-08 Thread Jack Krupansky
be out a month or so later. -- Jack Krupansky On Mon, Feb 8, 2016 at 4:54 PM, Will Hayworth <whaywo...@atlassian.com> wrote: > We're having good luck running 3.2.1 in production, but ours is a small > cluster and we

Re: Writing a large blob returns WriteTimeoutException

2016-02-08 Thread Jack Krupansky
You appear to be writing the entire blob on each chunk rather than the slice of the blob. -- Jack Krupansky On Mon, Feb 8, 2016 at 1:45 PM, Giampaolo Trapasso < giampaolo.trapa...@radicalbit.io> wrote: > Hi to all, > > I'm trying to put a large binary file (> 500MB) on a
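
A minimal sketch of the slicing point in the reply above, written in Python rather than the poster's own driver code (which is not shown in this listing). The function name iter_chunks and the chunk size are hypothetical; the only idea illustrated is that each write should carry the slice for its own offset, not the whole buffer.

```python
def iter_chunks(blob: bytes, chunk_size: int):
    """Yield (index, slice) pairs that cover blob exactly once."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(blob), chunk_size)):
        # Slice by offset; passing `blob` itself here would resend
        # the full payload with every chunk.
        yield index, blob[offset:offset + chunk_size]


blob = bytes(range(10))
chunks = list(iter_chunks(blob, 4))
```

Each (index, slice) pair would then be bound to one INSERT, with the index acting as the chunk number; how the real table is keyed is not visible in the snippet, so that part is left out.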

Re: Writing a large blob returns WriteTimeoutException

2016-02-08 Thread Jack Krupansky
the same data for each chunk. -- Jack Krupansky On Mon, Feb 8, 2016 at 5:34 PM, Giampaolo Trapasso < giampaolo.trapa...@radicalbit.io> wrote: > I write at every step MyConfig.blobsize number of bytes, that I configured > to be from 10 to 100. This allows me to "simulate" th

Re: Writing a large blob returns WriteTimeoutException

2016-02-08 Thread Jack Krupansky
of Cassandra internal row management to know whether your chunks should be a little less than some power of 2 so that a single row is not just over a power of 2 in size. You may need more heap as well. Maybe you are hitting a high rate of GC that may cause the timeout. -- Jack Krupansky On Mon, Feb 8, 2016

Re: missing rows while importing data using sstable loader

2016-02-05 Thread Jack Krupansky
I sent a message to DataStax Docs to add this nodetool flush suggestion to the doc for sstableloader. -- Jack Krupansky On Fri, Feb 5, 2016 at 3:35 AM, Romain Hardouin <romainh...@yahoo.fr> wrote: > > What is the best practise to create sstables? > > When you run a "no

Re: Duplicated key with an IN statement

2016-02-04 Thread Jack Krupansky
Sylvain, there's a bug in CHANGES.TXT for this issue. It says: "Duplicate rows returned when in clause has repeated values (CASSANDRA-6707)", but the issue number is really 6706. -- Jack Krupansky On Thu, Feb 4, 2016 at 9:54 AM, Sylvain Lebresne <sylv...@datastax.com> wrote: >

Re: Missing rows while scanning table using java driver

2016-02-03 Thread Jack Krupansky
CL=ALL has no benefit if RF=1. Your code snippet doesn't indicate how you initialize and update the token in the query. The ">" operator would assure that you skip the first token. -- Jack Krupansky On Wed, Feb 3, 2016 at 1:36 AM, Priyanka Gugale <pri...@apache.org> wrote: &
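
The remark about the ">" operator can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the tokens are a plain sorted Python list, every row has a distinct token, and the function scan and its parameters are hypothetical, not part of any driver. It shows that resuming with a strict "greater than" from an initial token that belongs to a real row drops that row, while making the first page inclusive does not.

```python
import bisect


def scan(tokens, start, page_size, inclusive_first=False):
    """Page through sorted tokens, resuming after the last token seen."""
    seen = []
    last = start
    first_page = True
    while True:
        if first_page and inclusive_first:
            lo = bisect.bisect_left(tokens, last)    # token >= start
        else:
            lo = bisect.bisect_right(tokens, last)   # token > last
        page = tokens[lo:lo + page_size]
        if not page:
            return seen
        seen.extend(page)
        last = page[-1]
        first_page = False


tokens = [-50, -7, 3, 42, 99]
skipped = scan(tokens, start=-50, page_size=2)
complete = scan(tokens, start=-50, page_size=2, inclusive_first=True)
```

Here skipped misses the row at token -50, whereas complete returns all five. A real scan also has to cope with several rows sharing one token, which this sketch deliberately ignores.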

Re: EC2 storage options for C*

2016-02-03 Thread Jack Krupansky
agnetic volumes are not recommended for Cassandra data storage volumes for the following reasons:..." as well as: "Note: Use only ephemeral instance-store or the recommended EBS volume types for Cassandra data storage." See: http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningE

Re: Java Driver Question

2016-02-02 Thread Jack Krupansky
none were accessible. -- Jack Krupansky On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III <mrbur...@gmail.com> wrote: > In the case of adding more nodes to the cluster, would my application have > to be restarted to detect the new nodes (as opposed to a node acting like a &

Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Jack Krupansky
And CASSANDRA-10699 seems to be the sub-issue of CASSANDRA-9424 to do that: https://issues.apache.org/jira/browse/CASSANDRA-10699 -- Jack Krupansky On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez < sebastian.este...@datastax.com> wrote: > Hi Ken, > > Earlier in this thread
