Re: Question on 'Average tombstones per slice' when running cfstats

2015-07-06 Thread Jonathan Haddad
Wouldn't it suggest a delete heavy workload, rather than update? On Mon, Jul 6, 2015 at 5:21 PM Robert Coli wrote: > On Mon, Jul 6, 2015 at 4:19 PM, Venkatesh Kandaswamy < > ve...@walmartlabs.com> wrote: > >> I cannot find documentation on the last two parameters given by >> cfstats below. I

Re: auto_bootstrap=false broken?

2015-08-04 Thread Jonathan Haddad
You're trying to solve a problem that doesn't exist. Cassandra only starts serving reads when it's ready. On Tue, Aug 4, 2015 at 10:51 AM horschi wrote: > Hi Robert, > > sorry for the confusion. Perhaps write_survey is not my solution > (unfortunetaly I cant get it to work, so I dont really kno

Re: Retrieve all the columnfamily / tables of thrift and CQL from the keyspace in cassandra

2015-08-05 Thread Jonathan Haddad
+1. The project hasn't had a single relevant commit in almost a full year and is now officially unsupported. Migrate your data asap to CQL. On Wed, Aug 5, 2015 at 9:05 AM Alain RODRIGUEZ wrote: > Hi > > "I use hector" --> This is a very bad idea imho, even more while using C* > 2.1. > > Hecto

Re: Configuring Cassandra to limit number of columns to read

2015-08-14 Thread Jonathan Haddad
250k columns? As in, you have a CREATE TABLE statement that would have over 250K separate, typed fields? On Fri, Aug 14, 2015 at 11:07 AM Ahmed Ferdous wrote: > Hi Guys, > > > > We have designed a table to have rows with large number of columns (more > than 250k). One of my colleagues, mistak

Re: Configuring Cassandra to limit number of columns to read

2015-08-17 Thread Jonathan Haddad
15 11:20 AM > > > *To:* user@cassandra.apache.org > *Subject:* Re: Configuring Cassandra to limit number of columns to read > > > > The idea that you have 250k columns is somewhat of an anti-pattern. In > this case you would typically have a few columns and many rows, th

Re: Configuring Cassandra to limit number of columns to read

2015-08-17 Thread Jonathan Haddad
Also, are you using Thrift? The terminology you're using, specifically column keys, suggests you are. On Mon, Aug 17, 2015 at 5:21 PM Jonathan Haddad wrote: > What version of Cassandra are you running? > > On Fri, Aug 14, 2015 at 11:35 AM Ahmed Ferdous > wrote: > >>

Re: Practical limitations of too many columns/cells ?

2015-08-24 Thread Jonathan Haddad
Can you post your findings to JIRA as well? Would be good to see some real numbers from production. The refactor of the storage engine (8099) may completely change this, but it's good to have it on the radar. On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton wrote: > Agreed. We’re going to run a

Re: memtable and sstables

2015-09-05 Thread Jonathan Haddad
Technically there could be data in an sstable with a later time stamp than what exists in the memtable. Consider the use case of issuing a delete in the future to avoid race conditions. On Sat, Sep 5, 2015 at 10:42 AM Ray Sutton wrote: > This documentation from Datastax may be helpful to understa

Re: Concurrency in Cassandra

2015-09-15 Thread Jonathan Haddad
You may want to take an hour and watch a video on Cassandra fundamentals. It'll answer a lot of the questions you're likely to ask next, including this one. https://academy.datastax.com/courses/ds101-introduction-cassandra On Tue, Sep 15, 2015 at 2:04 PM Thouraya TH wrote: > Hi all; > > > Pleas

Re: No schema agreement from live replicas

2015-09-16 Thread Jonathan Haddad
With Rf=2, cl=quorum is effectively the same as ALL. Expect downtime anytime you restart a node. On Wed, Sep 16, 2015 at 3:39 PM Sebastian Estevez < sebastian.este...@datastax.com> wrote: > check nodetool describecluster to see the schema versions across your > nodes. >
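The arithmetic behind this reply can be sketched in a few lines. This is only an illustration of the standard quorum formula, floor(RF / 2) + 1; the function names are made up for the example and are not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerated_failures(rf)} replica(s) down")
```

With RF=2 the quorum is 2, so no replica can be down, which is why QUORUM behaves like ALL there. With RF=3 one replica can be restarted without failing QUORUM requests.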

Re: Cassandra Summit 2015 Roll Call!

2015-09-22 Thread Jonathan Haddad
Yo. It's me. Haddad, aka rustyrazorblade. 6'1", hair probably in a bun and a beard. Helping with training today, giving a talk on pyspark & on the python driver tomorrow. I'll be at the MVP dinner. Wearing a DataStax training t shirt today, not sure about the rest of the time though. Here I

Re: High read latency

2015-09-27 Thread Jonathan Haddad
1. Is it consistently taking that long? 2. Have you traced the requests? 3. Are you watching your GC history? 4. What's the load on the machine? Does dstat show high CPU or disk utilization? I did a webinar about a year ago on how to dig into these issues, you may find it useful: https://www.yout

Re: DC's versions compatibility

2015-09-28 Thread Jonathan Haddad
No, they won't. Always run the same version across your cluster. On Mon, Sep 28, 2015 at 5:29 AM Carlos Alonso wrote: > Hi guys. > > I have a very old cassandra cluster 1.2.19 and I'm looking to add a new > datacenter to it for analytics purposes in a newer version, let's say > 2.1.8. Will thos

Re: Running Cassandra on Java 8 u60..

2015-09-28 Thread Jonathan Haddad
There are plenty of people running huge clusters on G1. On Mon, Sep 28, 2015 at 12:30 AM Nathan Bijnens wrote: > We are running OpenJDK7 with G1GC and encountered no issues so far. We > took the tuning parameters from the Cassandra 3.0 branch. > > Kind regards, > Nathan > > On Mon, Sep 28, 201

Re: Consistency Issues

2015-10-01 Thread Jonathan Haddad
You say that you don't think GC is your issue... but did you actually check? The reasons you suggest aren't very convincing. Can you provide your GC settings, and take a look at jstat -gccause? http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option On Thu, Oct 1

Re: Cassandra Configuration VS Static IPs.

2015-10-04 Thread Jonathan Haddad
Public IP? No, not required unless you're running multiple DCs. Where are you running a DC that IPs aren't cheap? If you're in AWS they're basically free (or at least the cheapest section of your bill by far) On Sun, Oct 4, 2015 at 5:59 PM Renato Perini wrote: > Is cassandra really supposed

Re: Cassandra Configuration VS Static IPs.

2015-10-04 Thread Jonathan Haddad
; instance and the machines are up 24h/7. I have to shut down the machines > during the night for various reasons, so unfortunately they're not totally > free for my use case. > > > > Il 05/10/2015 00:04, Jonathan Haddad ha scritto: > > Public IP? No, not required unless yo

Re: Cassandra Configuration VS Static IPs.

2015-10-04 Thread Jonathan Haddad
on all nodes using their public IPs without being required to > know them (the client would discover them dynamically while connecting). > > > > Il 05/10/2015 00:55, Jonathan Haddad ha scritto: > > So you're not running the client in the same DC as your Cassandra > clus

Re: Cassandra Configuration VS Static IPs.

2015-10-04 Thread Jonathan Haddad
not changing anything. The instances, as I said multiple > times, don't have an elastic ip, so the public IP is dynamic. This means it > changes automatically at every reboot. > > > Il 05/10/2015 02:22, Jonathan Haddad ha scritto: > > If your client is in the same DC, then

Re: Cassandra Configuration VS Static IPs.

2015-10-06 Thread Jonathan Haddad
x >> devcenter. No "same dc" concepts are involved for using it. >> As for AWS, I'm not changing anything. The instances, as I said multiple >> times, don't have an elastic ip, so the public IP is dynamic. This means it >> changes automatically at every

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread Jonathan Haddad
Unless you're close to running out of disk space, what's the harm in it taking a while? How big is your DC? At 45 min per node, you can do 32 nodes a day. Diverting traffic away from a DC just to run cleanup feels like overkill to me. On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <
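The "32 nodes a day" figure is plain arithmetic, sketched here under the assumption that cleanup runs on one node at a time, back to back. The 60-node cluster size in the usage example is a made-up number, not one from the thread.

```python
import math

MINUTES_PER_DAY = 24 * 60


def nodes_per_day(minutes_per_node: float) -> int:
    """How many sequential cleanups fit in 24 hours."""
    return int(MINUTES_PER_DAY // minutes_per_node)


def days_for_cluster(node_count: int, minutes_per_node: float) -> int:
    """Whole days needed to clean every node, one node at a time."""
    return math.ceil(node_count / nodes_per_day(minutes_per_node))


if __name__ == "__main__":
    print(nodes_per_day(45))          # 1440 / 45 = 32
    print(days_for_cluster(60, 45))   # hypothetical 60-node DC
```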

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread Jonathan Haddad
center were to serve traffic. Is running cleanup > in parallel advisable?? > > On Thu, Oct 8, 2015, 17:53 Jonathan Haddad wrote: > >> Unless you're close to running out of disk space, what's the harm in it >> taking a while? How big is your DC? At 45 min pe

Re: Realtime data and (C)AP

2015-10-08 Thread Jonathan Haddad
Your options are 1. Read & write at quorum 2. Recognize that, in general, if you've got a real need for Cassandra, your data is out of date almost immediately after you've read it no matter what guarantee your DB gives you, so you might as well just forget about ever getting the "right" answer bec

Re: Realtime data and (C)AP

2015-10-08 Thread Jonathan Haddad
Renato Perini wrote: > I'm asking because the DataStax DS-201 course states that C* is an ideal > fit for messaging applications. > What I'm not understanding? :-) > Messaging applications generally must be totally consistent, expecially > real-time ones. > > > Il

Re: Spark and intermediate results

2015-10-09 Thread Jonathan Haddad
You can run spark against your Cassandra data directly without using a shared filesystem. https://github.com/datastax/spark-cassandra-connector On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) < mvallemil...@bloomberg.net> wrote: > Hello, > > I saw this nice link from an event:

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Jonathan Haddad
I'd be curious to see GC logs. jstat -gccause On Fri, Oct 9, 2015 at 2:16 PM Tyler Hobbs wrote: > Hmm, it seems off to me that the merge step is taking 1 to 2 seconds, > especially when there are only ~500 cells from one sstable and the > memtables. Can you open a ticket ( > https://issues.ap

Re: Can consistency-levels be different for "read" and "write" in Datastax Java-Driver?

2015-10-26 Thread Jonathan Haddad
What's your query? Do you have IF NOT EXISTS in there? On Mon, Oct 26, 2015 at 11:17 AM Ajay Garg wrote: > Right now, I have setup "LOCAL QUORUM" as the consistency level in the > driver, but it seems that "SERIAL" is being used during writes, and I > consistently get this error of type :: > >

Re: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread Jonathan Haddad
My point is about the difficulty in having perfect clocks in a distributed system. If nanosecond precision isn't happening at Google scale, it's unlikely to be happening anywhere. The fact that dapper was written in the context of tracing is irrelevant. On Thu, Oct 29, 2015 at 7:27 PM Brice Dutheil

Re: Do I have to use the cql in the datastax java driver?

2015-11-08 Thread Jonathan Haddad
You shouldn't use thrift, it's effectively dead. On Fri, Nov 6, 2015 at 10:30 PM Dikang Gu wrote: > Hi there, > > In the datastax java driver, do I have to use the cql to talk to cassandra > cluster? > > Can I still use thrift interface to talk to cassandra? Any reason that we > should not use th

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jonathan Haddad
You could have dropped mutations without downtime. Check nodetool tpstats. On Fri, Nov 13, 2015 at 2:48 PM Peddi, Praveen wrote: > Hi Rob, > We do not currently run repairs because we know our deployment time for > each cassandra node is very short. I do understand we have to run repairs > but wo

Re: Usage volume of older versions of Cassandra

2015-12-15 Thread Jonathan Haddad
Yes... I agree with Rob here. I don't see much benchmarking required for versions of Cassandra that aren't actively supported by the committers. On Tue, Dec 15, 2015 at 10:52 AM Robert Coli wrote: > On Tue, Dec 15, 2015 at 6:28 AM, Andy Kruth wrote: > >> We are trying to decide how to proceed

Re: Better setup to start using in production on one server

2015-12-15 Thread Jonathan Haddad
If I had to choose between running 3x docker instances and 1x instance on a single server, I'd choose the single one. Instead of dealing with RF changing nonsense I'd just set up a 2nd data center w/ 3 nodes and move to that when you're ready. No downtime, easy. With that said - Starting off wit

Re: Query Consistency Issues...

2015-12-15 Thread Jonathan Haddad
High volume updates to a single key in a distributed system that relies on a timestamp for conflict resolution is not a particularly great idea. If you ever do this from multiple clients you'll find unexpected results at least some of the time. On Tue, Dec 15, 2015 at 12:41 PM Paulo Motta wrote:
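A rough sketch of why this goes wrong, assuming a simplified last-write-wins model: each write carries a client-supplied timestamp and the higher timestamp wins. The `Write` class, the tie-break rule, and the clock-skew numbers are illustrative assumptions, not Cassandra internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # timestamp assigned by the writing client, in microseconds


def resolve(a: Write, b: Write) -> Write:
    """Last-write-wins: keep the write with the higher timestamp.

    Tie-break on the larger value is a simplification chosen for this sketch.
    """
    if a.timestamp_us != b.timestamp_us:
        return a if a.timestamp_us > b.timestamp_us else b
    return a if a.value >= b.value else b


if __name__ == "__main__":
    # Client A's clock runs 5 ms fast. It writes first in real time,
    # but stamps the write with a later timestamp than client B.
    from_a = Write("written first by A", timestamp_us=5_000)
    from_b = Write("written second by B", timestamp_us=2_000)
    print(resolve(from_a, from_b).value)
```

The write that happened later in real time loses because the other client's clock was ahead, which is the kind of unexpected result the reply warns about.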

Re: Cassandra 3.1 - Aggregation query failure

2015-12-21 Thread Jonathan Haddad
Even if you get this to work for now, I really recommend using a different tool, like Spark. Personally I wouldn't use UDAs outside of a single partition. On Mon, Dec 21, 2015 at 1:50 AM Dinesh Shanbhag < dinesh.shanb...@isanasystems.com> wrote: > > Thanks for the pointers! I edited jvm.options

Re: Is CQLSSTableWriter tied to C* version?

2015-12-22 Thread Jonathan Haddad
The streaming format is directly tied to the sstable format. So, in general, if the format changes between versions, you can't stream. I don't think the format changed between these 2 versions, but I'm typing this on my phone and can't verify. On Tue, Dec 22, 2015 at 6:36 PM Kai Wang wrote: > Hi

Re: Write/read heavy usecase in one cluster

2015-12-23 Thread Jonathan Haddad
While I would normally suggest splitting different systems to different hardware, you can easily get away with using 3 rather small machines for this workload. Just be sure to not use SimpleStrategy so you can split the keyspaces out to different clusters later if you need to. On Wed, Dec 23, 201

Re: Slow write speeds

2015-12-31 Thread Jonathan Haddad
The limitation is on the driver side. Try looking at execute_concurrent_with_args in the cassandra.concurrent module to get parallel writes with prepared statements. https://datastax.github.io/python-driver/api/cassandra/concurrent.html On Wed, Dec 30, 2015 at 11:34 PM Alexandre Beaulne < alexandr
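A minimal sketch of what that suggestion might look like with the DataStax Python driver. The keyspace, table, and column names (`demo.readings`, `sensor_id`, `ts`, `reading`) and the helper names are hypothetical; only `session.prepare` and `execute_concurrent_with_args` come from the driver. The driver import sits inside the function so the helper above it can be used without a cluster.

```python
def as_parameters(readings):
    """Turn dict-shaped readings into the tuples a prepared INSERT expects."""
    return [(r["sensor_id"], r["ts"], r["reading"]) for r in readings]


def insert_readings(session, readings, concurrency=50):
    """Write all readings in parallel; return the exceptions of failed writes."""
    # Requires the cassandra-driver package and a connected Session.
    from cassandra.concurrent import execute_concurrent_with_args

    # Hypothetical table; adjust to your own schema.
    prepared = session.prepare(
        "INSERT INTO demo.readings (sensor_id, ts, reading) VALUES (?, ?, ?)"
    )
    results = execute_concurrent_with_args(
        session,
        prepared,
        as_parameters(readings),
        concurrency=concurrency,
        raise_on_first_error=False,
    )
    # Each result is a (success, result_or_exception) pair.
    return [outcome for success, outcome in results if not success]
```

Setting `raise_on_first_error=False` lets the whole batch run and leaves retry decisions to the caller; the right `concurrency` value depends on the cluster and is worth measuring rather than guessing.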

Re: Requesting some details for my use case

2016-01-05 Thread Jonathan Haddad
Sorry to nitpick, but Cassandra is not a columnar database. If you're looking for columnar because you have an analytics need, Cassandra is not what you want. If you've just made the same mistake that 99% of people make, well, now you know. Cassandra historically has been referred to as a "Colum

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jonathan Haddad
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases. On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> What sort of dat
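One way the "num_buckets" idea could be sketched, assuming the bucket number becomes part of the partition key and is derived from a stable hash of some item identifier. The function names, the use of CRC32, and the account/item identifiers are all assumptions made for illustration.

```python
import zlib


def bucket_for(item_id: str, num_buckets: int) -> int:
    """Stable bucket number in [0, num_buckets) for an item."""
    return zlib.crc32(item_id.encode("utf-8")) % num_buckets


def partition_key(account_id: str, item_id: str, num_buckets: int):
    """Composite partition key (account, bucket) to use when writing."""
    return (account_id, bucket_for(item_id, num_buckets))


def partitions_to_read(account_id: str, num_buckets: int):
    """Every partition that has to be queried to see the whole account."""
    return [(account_id, bucket) for bucket in range(num_buckets)]


if __name__ == "__main__":
    print(partition_key("acme", "order-42", num_buckets=4))
    print(partitions_to_read("acme", num_buckets=4))
```

Note that raising `num_buckets` changes where existing items hash to, so a real design would need a plan for data written under the old bucket count.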

Re: Data rebalancing algorithm

2016-01-07 Thread Jonathan Haddad
num_tokens is the number of tokens per node, not per cluster. On Thu, Jan 7, 2016 at 10:09 PM Alec Collier wrote: > Have a look at this: > > http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 > > > > The vnodes mechanism is there to provide better scalability as new nodes > are adde

Re: Revisit Cassandra EOL Policy

2016-01-08 Thread Jonathan Haddad
Why wouldn't you keep a bug free version of something in production? If I found a version of *anything* that was bug free I don't think I'd ever upgrade again. On Fri, Jan 8, 2016 at 9:18 AM Anuj Wadehra wrote: > Thanks Robert !!! > > *"I don't run X.Y.Z versions where Z is under 6, so in gener

Re: Modeling contact list, plain table or List

2016-01-11 Thread Jonathan Haddad
In general I advise people avoid lists and use Maps or Sets instead. Using this data model, for instance, it's easy to remove a specific Address from a user: CREATE TYPE address ( street text, city text, zip_code int, ); CREATE TABLE user ( user_id int primary key, addresses map> )

Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Jonathan Haddad
The clustering keys determine the sorting of rows within a partition. The partitions within a file are sorted by their token (usually computed by applying the murmur 3 hash to the partition key). If you are using a version of Cassandra < 3.0, you'll need to maintain your own materialized view tab
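The two-level ordering described here can be sketched as a sort on (token, clustering key). The hash below is a stand-in built from MD5 purely for illustration; it is not Murmur3 and will not produce the tokens Cassandra computes.

```python
import hashlib


def toy_token(partition_key: str) -> int:
    """Stand-in for the partitioner: NOT Murmur3, just a stable 64-bit hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def storage_order(rows):
    """Order (partition_key, clustering_key, value) rows the way the reply
    describes: partitions by token, rows within a partition by clustering key."""
    return sorted(rows, key=lambda row: (toy_token(row[0]), row[1]))


if __name__ == "__main__":
    rows = [
        ("bob", 2, "b2"),
        ("alice", 3, "a3"),
        ("bob", 1, "b1"),
        ("alice", 1, "a1"),
    ]
    for row in storage_order(rows):
        print(row)
```

Rows of one partition stay together and come out sorted by clustering key, while the order of the partitions themselves follows the hash and looks arbitrary, which is why sorting across partitions is not available.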

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine. You understand your cluster's theoretical maximum performance, which would be Nodes * number of queries. Yes, adding in replication and CL is important, but 1 machine lets you isolate certain performance metrics. On Thu, J

Re: Strategy / order for upgradesstables during rolling upgrade.

2016-01-21 Thread Jonathan Haddad
Definitely B. On Thu, Jan 21, 2016 at 11:42 AM Robert Coli wrote: > On Thu, Jan 21, 2016 at 11:37 AM, Kevin Burton wrote: > >> I think there are two strategies to upgradesstables after a release. >> >> We're doing a 2.0 to 2.1 upgrade (been procrastinating here). >> >> I think we can go with B

Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
My opinion: http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/ TL;DR: the only reason to run 1 node in prod is if you're super broke but know you'll need to scale up almost immediately after going to prod (maybe after getting some funding). If you're planning on doin

Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
If you're going to go with a bunch of smaller, single node servers, use Postgres. It's going to be more flexible with a smaller memory footprint. You could even use sqlite. Would you run a single node zookeeper cluster? Single node map reduce? Single node HDFS? I hope not. Cassandra's strengt

Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
Have you considered running smaller clusters with 1 customer per keyspace? If you're going to run 1 node (and you want to benchmark it properly) then you probably want to switch commitlog_sync to 'batch' and redo your performance tests. Without it, you're risking data loss and you aren't comparin

Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Jonathan Haddad
Instead of using ZK, why not solve your concurrency problem by removing it? By that, I mean simply have 1 process that creates all your tables instead of creating a race condition intentionally? On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton wrote: > Not sure if this is a bug or not or kind of a

Re: Embedded cassandra

2016-01-26 Thread Jonathan Haddad
For the sake of argument... why do you think you should embed Cassandra? I'll be honest with you, making Cassandra restart every time you want to upgrade your daemon sounds like a horrible idea. Run your 10 DB instances on their own and save yourself the operational headache. On Tue, Jan 26, 2016

Re: Embedded cassandra

2016-01-26 Thread Jonathan Haddad
an be launched inside the > same process in order to make the system simpler to manage. Actually we use > h2 for single instance deployments but it is not good for production. > > -- Enrico > > Il giorno Mar 26 Gen 2016 21:59 Jonathan Haddad ha > scritto: > >> For the sa

Re: Rename Keyspace offline

2016-01-27 Thread Jonathan Haddad
Why rename the keyspace? If it was me I'd just give it a name that includes the date or some identifier and include that logic in my app. That's way easier. On Wed, Jan 27, 2016 at 6:49 AM Jean Tremblay < jean.tremb...@zen-innovations.com> wrote: > Hi, > > I have a huge set of data, which takes ab

Re: Read operations freeze for a few second while adding a new node

2016-01-28 Thread Jonathan Haddad
If you've got a read heavy workload you should check out http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html On Thu, Jan 28, 2016 at 8:11 AM Lorand Kasler wrote: > Hi, > > We are struggling with a problem that when adding nodes around 5% read > operations freeze (a

Re: Session timeout

2016-01-29 Thread Jonathan Haddad
I think the reason why most of your queries aren't being answered is because you're asking questions that most people don't have the answer to. On the automatic disconnect, anyone using Cassandra in prod doesn't really need to think about it because we're always running queries, perhaps millions a

Re: Problem while migrating a single node cluster from 2.1 to 3.2

2016-01-30 Thread Jonathan Haddad
Did you also copy the system keyspaces or did you create the schema manually? On Sat, Jan 30, 2016 at 9:39 AM Jeff Jirsa wrote: > Upgrade from 2.1.9+ directly to 3.0 is supported: > > https://github.com/apache/cassandra/blob/cassandra-3.0/NEWS.txt#L83-L85 > > - Upgrade to 3.0 is supported from C

Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jonathan Haddad
Adding to Jake's point - it's a noop if you run upgrade sstables and it doesn't need to be upgraded. So just do it and save yourself a headache. On Tue, Feb 9, 2016 at 7:10 PM Jake Luciani wrote: > Well typically you should run upgradesstables when you upgrade major > versions as well > > > htt

Re: Schema Versioning

2016-02-10 Thread Jonathan Haddad
I wrote most of the cqlengine keyspace & table management pieces of the Python driver to solve this exact problem. Instead of working with a series of statements for creating tables & managing columns, we simply created classes in Python and sync'ed them to the DB. It automatically figured out wh

Re: Forming a cluster of embedded Cassandra instances

2016-02-13 Thread Jonathan Haddad
+1 to what jack said. Don't mess with embedded till you understand the basics of the db. You're not making your system any less complex, I'd say you're most likely going to shoot yourself in the foot. On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky wrote: > HA requires an odd number of replicas -

Re: Gossip Protocol

2016-02-21 Thread Jonathan Haddad
You don't need to use Gossip to store that, you can just put it in a table. On Sun, Feb 21, 2016 at 9:38 AM Thouraya TH wrote: > Thank you so much for answers :) > > > *What type of info did you wish to pass around?* > [image: Images intégrées 1] > > In fact, i have on each node a directory ‘My

Re: Problem running select with partial partition keys in version 3.3

2016-02-26 Thread Jonathan Haddad
You wouldn't be able to do that query with that schema in any version of Cassandra. Here's the output from 2.1: cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> use test; cqlsh:test> create table if not exists persistent_map ( ...

Re: Cassandra Ussages

2016-02-28 Thread Jonathan Haddad
Cassandra is primarily used as an OLTP database, not analytics. You should watch this 30 min video discussing Cassandra core concepts (coming from a relational background): https://academy.datastax.com/courses/ds101-introduction-cassandra On Sun, Feb 28, 2016 at 5:40 AM Andrés Ivaldi wrote: > Hel

Re: List of List

2016-03-01 Thread Jonathan Haddad
You probably want to watch some intro videos on Datastax Academy. https://academy.datastax.com/ I suggest the intro video to get some basics down: https://academy.datastax.com/courses/ds101-introduction-cassandra and then core concepts, a pretty thorough intro: https://academy.datastax.com/courses/ds2

Re: Cassandra Ussages

2016-03-01 Thread Jonathan Haddad
what it worries me is that looks very complex create the structure for >> each Fact table and then extend >> >> regards. >> >> On Sun, Feb 28, 2016 at 12:28 PM, Jonathan Haddad >> wrote: >> >>> Cassandra is primarily used as an OLTP database, not ana

Re: List of List

2016-03-01 Thread Jonathan Haddad
I'd do something like this: CREATE TABLE questions ( question_id timeuuid primary key, question text ); CREATE TABLE answers ( question_id timeuuid, answer_id timeuuid, answer text, primary key(question_id, answer_id) ); CREATE TABLE comments ( answer_id timeuuid,

Re: List of List

2016-03-01 Thread Jonathan Haddad
Thrift is deprecated, and will be removed in Cassandra 4.0. Don't do any new development with it. What video says to use thrift? On Tue, Mar 1, 2016 at 2:29 PM Sandeep Kalra wrote: > I am in very early stage , so, I can change. Infact, the videos you > pointed also says to do so... > > > Best R

Re: Querying on index

2016-03-01 Thread Jonathan Haddad
That feels like a serious bug. Definitely file a JIRA with as many details as possible. https://issues.apache.org/jira/browse/CASSANDRA/ On Tue, Mar 1, 2016 at 4:38 PM Rakesh Kumar wrote: > Looks like Bloom filter size was the issue. Once I disabled it, the query > returns rows correctly, bu

Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Jonathan Haddad
Can you post a gist of the output of jstat -gccause (60 seconds worth)? I think it's cool you're willing to experiment with alternative JVM settings but I've never seen anyone use max tenuring threshold of 50 either and I can't imagine it's helpful. Keep in mind if your objects are actually reach

Re: Cassandra runing on top of NAS (RAIN storage) !?? anyone ?

2016-03-04 Thread Jonathan Haddad
Don't do it On Fri, Mar 4, 2016 at 8:39 AM DE VITO Dominique < dominique.dev...@thalesgroup.com> wrote: > Hi, > > Is there any info about running C* on top of a NAS storage, well, a RAIN > storage (to be precise) in fact ? > > I expect C* to run on top of a RAIN like on top of a high-end SAN: that

Re: Lot of GC on two nodes out of 7

2016-03-04 Thread Jonathan Haddad
se catered. >>> >>> · memtable_total_space_in_mb : Default (1/4 of heap size), can >>> lowered because larger long lived objects will create pressure on HEAP, so >>> its better to reduce some amount of size. >>> >>> · Concurren

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Jonathan Haddad
If you're doing 100 searches a second each machine will be serving at most 100 requests per second, not 2000. On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal wrote: > Well thats certainly true, there are these points worth discussing here : > > 1. Scatter Gather queries - Especially if the cluster

Re: [C*2.1]memtable_allocation_type: offheap_objects

2016-03-09 Thread Jonathan Haddad
Check out Al's Tuning Guide. He discusses offheap objects. https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html On Wed, Mar 9, 2016 at 1:54 AM wrote: > Hi, > > > > offheap_objects was removed in releases 3.2.x then reintroduced in > release 3.4: I vould like to know if someone ha

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Have you considered making the date (or week, or whatever, some time component) part of your partition key? Something like: create table sensordata ( sensor_id int, day date, ts timestamp, reading int, primary key((sensor_id, day), ts)); Then if you know you need data by a particular date range, j
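With a (sensor_id, day) partition key, a date-range read turns into one query per day. A small sketch of how a client might enumerate those partitions, assuming day-sized buckets and an inclusive range; the function name and sample dates are illustrative.

```python
from datetime import date, timedelta


def partitions_for_range(sensor_id: int, start: date, end: date):
    """List every (sensor_id, day) partition covering start..end inclusive."""
    day_count = (end - start).days + 1
    return [(sensor_id, start + timedelta(days=offset))
            for offset in range(day_count)]


if __name__ == "__main__":
    for key in partitions_for_range(7, date(2016, 3, 9), date(2016, 3, 11)):
        # One "WHERE sensor_id = ? AND day = ?" query would be issued per key.
        print(key)
```

An empty list comes back when `end` is before `start`, so callers do not need a separate guard for that case.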

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
ith the maximum or minimum > timeShard values on every write to the above table would mean pounding a > single row with updates and running SELECT DISTINCT pulls all partition > keys. > > Hopefully this is clearer. > > Again, any suggestions would be appreciated. >

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Oops sorry, you wrote below that the shard is what I was suggesting. I didn't fully understand the problem you had. I'll think about it a little bit and come up w/ something. On Thu, Mar 10, 2016 at 9:47 AM Jonathan Haddad wrote: > My advice was to use the date that the reading

this week in cassandra - 3.0 storage engine

2016-03-11 Thread Jonathan Haddad
All, The last month or so we've been doing weekly posting & commentary to Planet Cassandra in a "This Week in Cassandra" theme, similar to some other weekly tech blogs & podcasts. This week we had Aaron Morton & Tyler Hobbs, talking about 3.4, some upcoming Thread Per Core improvements, and the 3.

Re: Rack aware question.

2016-03-23 Thread Jonathan Haddad
Agreed with Jack. I don't think there's ever a reason to use CL=ALL in an application in production. I would only use it if I was debugging certain types of consistency problems. On Wed, Mar 23, 2016 at 4:56 PM Jack Krupansky wrote: > CL=ALL also means that you won't have HA (High Availability

Re: Client drivers

2016-03-24 Thread Jonathan Haddad
Every language has a different means of working with dependencies. Some are compiled in (java, c), some are pulled in via libraries (python). You'll have to be more specific. On Thu, Mar 24, 2016 at 8:14 AM Rakesh Kumar wrote: > Is it possible to install multiple versions of language drivers on

Re: How many nodes do we require

2016-03-25 Thread Jonathan Haddad
Why would using CL=ONE make your cluster fragile? This isn't obvious to me. It's the most practical setting for high availability, which very much says "not fragile". On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet < jacques-henri.berthe...@genesys.com> wrote: > I found this calculator ve

Re: apache cassandra for trading system

2016-03-25 Thread Jonathan Haddad
You can use keyspaces with multiple data centers to get what you want. That said, if you're going to use only 1 node, I don't think Cassandra is the right fit for you. http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/ On Fri, Mar 25, 2016 at 11:09 AM Vero Kato wrot

Re: apache cassandra for trading system

2016-03-26 Thread Jonathan Haddad
ta modeling, and >> also impact how much data you can realistically place on each node. >> >> What are your HA (High Availability) requirements? >> >> >> -- Jack Krupansky >> >> On Fri, Mar 25, 2016 at 2:40 PM, Jonathan Haddad >> wrote: >> &g

Re: How many nodes do we require

2016-03-31 Thread Jonathan Haddad
if > it was not yet replicated. > > > > *--* > > *Jacques-Henri Berthemet* > > > > *From:* Jonathan Haddad [mailto:j...@jonhaddad.com] > *Sent:* vendredi 25 mars 2016 19:37 > > > *To:* user@cassandra.apache.org > *Subject:* Re: How many nodes do we req

Re: Adding Options to Create Statements...

2016-04-01 Thread Jonathan Haddad
Because it's a community driver not provided by the Apache project. There have historically been community provided drivers in the past. See Hector, Astyanax, pycassa, etc. On Fri, Apr 1, 2016 at 10:43 AM James Carman wrote: > A, my bad. One might wonder why the heck the Java driver is "o

Re: Is it possible to achieve "sticky" request routing?

2016-04-05 Thread Jonathan Haddad
Why is this a requirement? Honestly I don't know why you would do this. On Sat, Apr 2, 2016 at 8:06 PM Mukil Kesavan wrote: > Hello, > > We currently have 3 Cassandra servers running in a single datacenter with > a replication factor of 3 for our keyspace. We also use the SimpleSnitch > wiith D

Re: Is it possible to achieve "sticky" request routing?

2016-04-05 Thread Jonathan Haddad
re running a 3 node cluster with RF=3. If your cluster > is going to grow, you can't guarantee that any one server would have all > records. I'd be pretty hesitant to put an invisible constraint like that on > a cluster unless you're pretty sure it'll only ever be 3 nodes. >

Re: Is it possible to achieve "sticky" request routing?

2016-04-05 Thread Jonathan Haddad
hat own the data for a particular token and route > requests to one of them. As I understand it, the OP wants to send requests > for a particular token to the same node every time (assuming it's > available). How does that fail in a large cluster? > > Jim > > On Tue, Apr 5, 2

Re: Cassandra table limitation

2016-04-06 Thread Jonathan Haddad
There's also the issue of lots of memtables flushing to disk during commit log rotations. Can be problematic. On Wed, Apr 6, 2016 at 2:08 PM Michael Penick wrote: > Are the tenants using the same schema? If so, you might consider using the > tenant's ID as part of the primary key for the tables

Re: Efficiently filtering results directly in CS

2016-04-07 Thread Jonathan Haddad
What is CS? On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton wrote: > I have a paging model whereby we stream data from CS by fetching 'pages' > thereby reading (sequentially) entire datasets. > > We're using the bucket approach where we write data for 5 minutes, then we > can just fetch the bucket

Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes so I haven't had a need to really automate it. Even with MySQL I never bothered. Jon On Thu, Oct 4, 2012 at 6:27 PM, John Sanda

Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
s already something out there. > If not though, I will be sure to post back to the list with whatever I wind > up doing. > > > On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad wrote: > >> Not that I know of. I've always been really strict about dumping my >> schemas (t

Re: Random slow read times in Cassandra

2017-03-17 Thread Jonathan Haddad
Probably JVM pauses. Check your logs for long GC times. On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds wrote: > I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has > consistently random high read times. In general most reads are under 10 > milliseconds but with in the 30 request the

Re: Purge data from repair_history table?

2017-03-20 Thread Jonathan Haddad
default_time_to_live is a convenience parameter that automatically applies a TTL to incoming data. Every field that's inserted can have a separate TTL. The TL;DR of all this is that changing default_time_to_live doesn't change any existing data retroactively. You'd have to truncate the table if
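
To illustrate the point about default_time_to_live, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra code: the ToyTable and Cell names and the integer clock are assumptions made up for the example. It shows only that the TTL is fixed on each cell at write time, so changing the table default later affects new writes and leaves existing data alone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: str
    written_at: int        # write time, in seconds (toy clock)
    ttl: Optional[int]     # None means the cell never expires


class ToyTable:
    """Hypothetical stand-in for a table; not a Cassandra API."""

    def __init__(self, default_time_to_live: int = 0):
        # As in Cassandra, 0 means "no default TTL".
        self.default_time_to_live = default_time_to_live
        self.cells: Dict[str, Cell] = {}

    def insert(self, key: str, value: str, now: int, ttl: Optional[int] = None) -> None:
        # The TTL is resolved once, at write time, and stored with the cell.
        if ttl is None:
            ttl = self.default_time_to_live
        self.cells[key] = Cell(value, now, ttl if ttl > 0 else None)

    def read(self, key: str, now: int) -> Optional[str]:
        cell = self.cells.get(key)
        if cell is None:
            return None
        if cell.ttl is not None and now >= cell.written_at + cell.ttl:
            return None    # expired
        return cell.value


table = ToyTable()                           # no default TTL yet
table.insert("a", "old", now=0)              # written without any TTL
table.default_time_to_live = 60              # "ALTER TABLE" happens afterwards
table.insert("b", "new", now=0)              # picks up the new default
table.insert("c", "custom", now=0, ttl=10)   # per-write TTL overrides the default
```

In this model, reading "a" at any later time still returns "old", because it was written before the default changed; "b" disappears after 60 seconds and "c" after 10.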

Re: question on maximum disk seeks

2017-03-21 Thread Jonathan Haddad
The partition index is never updated, as sstables are immutable. On Tue, Mar 21, 2017 at 9:40 AM preetika tyagi wrote: > Thank you Jan & Jeff for the responses. That was really useful. > > Jan - I have one follow-up question. When the data is spread over more > than one SSTable in case of update
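
A minimal sketch of what immutability implies for reads, as a Python toy model. ToySSTable and read are hypothetical names invented for this example, and real Cassandra reconciles individual cells rather than whole values. Each SSTable carries its own index, built once when the table is written. An update lands in a newer SSTable with its own index entry for the same key, and the read path checks every SSTable and keeps the newest write.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]  # (write timestamp, value)


class ToySSTable:
    """Hypothetical immutable table: data and index are built once, never changed."""

    def __init__(self, rows: Dict[str, Row]):
        items = sorted(rows.items())
        self._data: Tuple[Row, ...] = tuple(row for _, row in items)
        # Per-SSTable partition index: key -> offset into this table's data only.
        self._index: Dict[str, int] = {key: i for i, (key, _) in enumerate(items)}

    def get(self, key: str) -> Optional[Row]:
        offset = self._index.get(key)
        return None if offset is None else self._data[offset]


def read(sstables: List[ToySSTable], key: str) -> Optional[str]:
    # Consult every SSTable that might hold the key; the newest timestamp wins.
    hits = [row for row in (t.get(key) for t in sstables) if row is not None]
    if not hits:
        return None
    return max(hits, key=lambda row: row[0])[1]


older = ToySSTable({"k1": (100, "v1"), "k2": (100, "x")})
newer = ToySSTable({"k1": (200, "v2")})   # an update flushed later, to a new table
```

Here "k1" has one index entry in each SSTable, and neither index is touched after it is written. The two versions only collapse into one when compaction writes a new SSTable.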

Re: question on maximum disk seeks

2017-03-21 Thread Jonathan Haddad
like in such case? > For the same key, we have two different records in different SSTables. How > does partition index store such information? Can it have repeated partition > keys with different disk offsets pointing to different SSTables? > > On Tue, Mar 21, 2017 at 10:09 AM, Jona

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jonathan Haddad
Hey Jerry - very happy to hear the post answered your questions. Alex wrote another great post on TWCS you might find useful, since you're using it: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html On Fri, Apr 7, 2017 at 8:20 AM Jerry Lam wrote: > Hi Jon, > > This Cassandra community

Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Jonathan Haddad
What version of Cassandra? How much data? How often are you reloading it? Is compaction throttled? What disks are you using? Any other load on the machine? On Fri, Apr 7, 2017 at 11:19 AM Giri P wrote: > Hi, > > we are continuously loading a table which has properties > compaction stra

Re: Difference between yum and git

2017-05-10 Thread Jonathan Haddad
Where are you getting Cassandra 2.2 built from yum? On Wed, May 10, 2017 at 9:54 PM Yuji Ito wrote: > Hi Joaquin, > > > Were both tests run from the same machine at close to the same time? > Yes. I ran both tests within 30 min. > I retried them today. The result was the same as yesterday. > > Th

Re: Reg:- CQL SOLR Query Not gives result

2017-05-11 Thread Jonathan Haddad
This is a question for DataStax support, not the Apache mailing list. Folks here are more than happy to help with open source, Apache Cassandra questions, if you've got one. On Thu, May 11, 2017 at 9:06 PM @Nandan@ wrote: > Hi , > > In my table, I am having few records and implemented SOLR for pa

Re: Unsuccessful back-up and restore with differing counts

2017-05-13 Thread Jonathan Haddad
Did you create the nodes with the same tokens? On Sat, May 13, 2017 at 8:44 AM srinivasarao daruna wrote: > Hi, > > We have a cassandra cluster built on Apache Cassandra 3.9 with 6 nodes and > RF = 3. As part of re-building the cluster, we are testing the backup and > restore strategy. > > We to

Re: Reg:- Data Modelling Concepts

2017-05-16 Thread Jonathan Haddad
I don't understand why you need to store the old value a second time. If you know that the value went from A -> B -> C, just store the new value, not the old. You can see that it changed from A->B->C without storing it twice. On Tue, May 16, 2017 at 6:36 PM @Nandan@ wrote: > The requirement is
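
A small sketch of why the old value does not need to be stored again, in Python. The transitions function and the (timestamp, value) history list are assumptions for the example, not part of any schema in this thread. If each change is stored as a timestamp plus the new value, every old-to-new pair can be derived by reading the history in time order.

```python
from typing import List, Tuple


def transitions(history: List[Tuple[int, str]]) -> List[Tuple[str, str]]:
    """Derive (old, new) pairs from a history that stores only new values."""
    ordered = [value for _, value in sorted(history, key=lambda entry: entry[0])]
    # Pair each value with its successor: A->B, B->C, ...
    return list(zip(ordered, ordered[1:]))


# One entry per change, each holding only the value written at that time.
history = [(3, "C"), (1, "A"), (2, "B")]
changes = transitions(history)   # [("A", "B"), ("B", "C")]
```

The previous value of any change is simply the entry before it in time order, so storing it a second time adds no information.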

Re: Reg:- Data Modelling Concepts

2017-05-16 Thread Jonathan Haddad
s) ); for example, if you insert the record: insert into book (name, ts, author) values ('jon talks data modeling', now(), 'jon haddad'); and then you find out that my first name is actually jonathan: insert into book (name, ts, author) values ('jon talks data modeling', no
