Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Edward Capriolo
The repo: https://github.com/edwardcapriolo/farsandra The code: Farsandra fs = new Farsandra(); fs.withVersion("2.0.4"); fs.withCleanInstanceOnStart(true); fs.withInstanceName("1"); fs.withCreateConfigurationFiles(true); fs.withHost("localhost");

Re: Datamodel for a highscore list

2014-01-22 Thread Edward Capriolo
It is a tricky type of problem because some ways of doing it involve iterative scans. This presentation discusses a solution for top-k: http://www.slideshare.net/planetcassandra/jonathan-halliday On Wed, Jan 22, 2014 at 12:48 PM, Colin colpcl...@gmail.com wrote: Read users score, increment,
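To make the "iterative scan" cost concrete, here is a minimal in-memory sketch of a top-k computed by scanning every score once. It is not taken from the linked presentation; the class and method names (`TopKScan`, `topK`) are hypothetical, and a plain `Map` stands in for whatever table holds the scores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: top-k by a full scan over all scores, keeping a bounded min-heap. */
final class TopKScan {

    /** Returns the k highest-scoring entries, best first. Every entry is visited once. */
    static List<Map.Entry<String, Long>> topK(Map<String, Long> scores, int k) {
        // Min-heap ordered by score, so the weakest of the current candidates is on top.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> entry : scores.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // drop the lowest score seen so far
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> scores = new HashMap<>();
        scores.put("alice", 50L);
        scores.put("bob", 90L);
        scores.put("carol", 70L);
        System.out.println(topK(scores, 2));
    }
}
```

The scan touches every user on every call, which is why this shape of query is awkward to serve directly from a partitioned store.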

Re: Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Edward Capriolo
of utilities that cut down on the boilerplate [1]), but I can understand that others will feel differently and more testing can only improve Cassandra. Thanks! [1] https://github.com/riptano/cassandra-dtest On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote: The repo

Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Edward Capriolo
If you have only TTL columns, and you never update the column, I would not think you need a repair. Repair cures lost deletes. If all your writes have a TTL, a lost write should not matter since the column was never written to the node and thus could never be resurrected on said node. Unless I am

Re: no more zookeeper?

2014-01-28 Thread Edward Capriolo
Some people had done some custom Cassandra ZooKeeper integration back in the day. Triggers, there is some reference in the original facebook thrown over the wall to zk. No official release has ever used zk directly, though people have suggested it. On Tue, Jan 28, 2014 at 12:08 PM, Andrey Ilinykh

Re: question about secondary index or not

2014-01-28 Thread Edward Capriolo
Generally indexes on binary fields (true/false, male/female) are not terribly effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary

Re: Introducing farsandra: A different way to integration test with c*

2014-01-29 Thread Edward Capriolo
builds on ccm to provide a number of utilities that cut down on the boilerplate [1]), but I can understand that others will feel differently and more testing can only improve Cassandra. Thanks! [1] https://github.com/riptano/cassandra-dtest On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo

Re: Nodetool cleanup on vnode cluster removes more data then wanted

2014-01-29 Thread Edward Capriolo
Is this only a ByteOrderPartitioner problem? On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs ty...@datastax.com wrote: Ignace, Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100%

Re: cql IN clause question

2014-01-29 Thread Edward Capriolo
Each IN is the equivalent of a thrift get_slice(). You are saving some overhead on round trips, but if you have a schema design that calls for large IN clauses you may not be designing your schema correctly. On Wed, Jan 29, 2014 at 11:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: select * from

Re: question about secondary index or not

2014-01-30 Thread Edward Capriolo
AM, Edward Capriolo edlinuxg...@gmail.com wrote: Generally indexes on binary fields (true/false, male/female) are not terribly effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id

what tool will create noncql columnfamilies in cassandra 3a

2014-02-04 Thread Edward Capriolo
Cassandra 2.0.4 cli is informing me that it will no longer exist in the next major. How will users adjust the meta data of non cql column families and other cfs that do not fit into the cql model? -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.

Re: Ultra wide row anti pattern

2014-02-04 Thread Edward Capriolo
I have actually been building something similar in my spare time. You can hang around and wait for it or build your own. Here are the basics. Not perfect but it will work. Create column family queue with gc_grace_period=[1 day] set queue [timeuuid()] [z+timeuuid()] = [ work to do] The producer

Re: Ultra wide row anti pattern

2014-02-04 Thread Edward Capriolo
needs for processing todo? processedlist (user, state) On Tue, Feb 4, 2014 at 7:50 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I have actually been building something similar in my spare time. You can hang around and wait for it or build your own. Here are the basics. Not perfect

Re: Ultra wide row anti pattern

2014-02-04 Thread Edward Capriolo
corner cases for 2 consumers to read from the same queue. Reading and writing with QUORUM does not prevent race conditions. I believe the new CAS feature of C* 2.0 might be useful here but with the expense of reduced throughput (because of the Paxos round) On Tue, Feb 4, 2014 at 4:50 PM, Edward

Re: what tool will create noncql columnfamilies in cassandra 3a

2014-02-05 Thread Edward Capriolo
. On Tue, Feb 4, 2014 at 9:53 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Cassandra 2.0.4 cli is informing me that it will no longer exist in the next major. How will users adjust the meta data of non cql column families and other cfs that do not fit into the cql model? -- Sorry

Re: CQL flow control

2014-02-05 Thread Edward Capriolo
I agree you can not really ask your database to capacity plan for you. Cassandra does have backpressure of sorts if requests fail with TimedOutException or UnavailableException. You might be having a capacity problem. The way I would handle this is 1) prototype at scale (dark launches, similar

Re: One of my nodes is in the wrong datacenter - help!

2014-02-10 Thread Edward Capriolo
Maybe that node was just trying to tell you that it really wanted to work in a different data center :) On Mon, Feb 10, 2014 at 10:08 AM, Sholes, Joshua joshua_sho...@cable.comcast.com wrote: In case anyone was following this issue, it ended up being something that looked an awful lot like

Re: Expired column showing up

2014-02-14 Thread Edward Capriolo
You should upgrade. Cassandra 2.0.2 is not the latest version. If you still have the problem report a bug. On Fri, Feb 14, 2014 at 12:50 PM, Yogi Nerella ynerella...@gmail.com wrote: I am just learning, I don't know answer to your question, but What is the use case for TTL as 1 second? On

Re: Turn off compression (1.2.11)

2014-02-18 Thread Edward Capriolo
Personally I think having compression on by default is the wrong choice. Depending on your access patterns and row sizes, the overhead of compression can create more garbage collection and become your bottleneck before you potentially bottleneck your disk (ssd disk) On Tue, Feb 18, 2014 at 2:23

Re: Bootstrap stuck: vnode enabled 1.2.12

2014-02-18 Thread Edward Capriolo
There is a bug where a node without schema can not bootstrap. Do you have schema? On Tue, Feb 18, 2014 at 1:29 PM, Arindam Barua aba...@247-inc.com wrote: The node is still out of the ring. Any suggestions on how to get it in will be very helpful. *From:* Arindam Barua

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
For what it is worth your schema is simple and uses compact storage. Thus you really don't need anything in cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something and just hector or astyanax. You are really dealing with many issues you should not have to

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
, 2014, Edward Capriolo edlinuxg...@gmail.com wrote: For what it is worth your schema is simple and uses compact storage. Thus you really don't need anything in cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something and just hector or astyanax. You are really

Re: High CPU load on one node in the cluster

2014-02-20 Thread Edward Capriolo
Upgrade from 2.0.3. There are several bugs, On Wednesday, February 19, 2014, Yogi Nerella ynerella...@gmail.com wrote: You should start your Cassandra daemon with -verbose:gc (please check syntax) and then run it in foreground, as Cassandra closes the standard out) Please see other emails in

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
I would try a fetch size other than 1. Cassandra's slices are start inclusive so maybe that is a bug. On Tuesday, February 18, 2014, Katsutoshi nagapad.0...@gmail.com wrote: Hi. I am using Cassandra 2.0.5 version. If null is explicitly set to a column, paging_state will not work. My test

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
Cassandra has no null. So in this context setting a column to null or updating null is a delete. I think. I remember debating the semantics of null once. On Tuesday, February 18, 2014, Katsutoshi nagapad.0...@gmail.com wrote: Hi. I am using Cassandra 2.0.5 version. If null is explicitly set to

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
...@datastax.com wrote: On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: For what it is worth your schema is simple and uses compact storage. Thus you really don't need anything in cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
, what do I know about Cassandra. -- Sylvain On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: For what it is worth your schema is simple and uses compact storage. Thus you

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
CASSANDRA-6561 is interesting. Though having statically defined columns are not exactly a solution to do everything in thrift. http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/ Before collections or CQL existed I did some of these concepts myself. Say you have a

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
wrote: On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: For what it is worth your schema is simple and uses compact storage. Thus you really don't need anything in cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
, then NewSql is a better fit, plus gives users the full power of SQL with subqueries, like, and joins. NewSql can't handle these kinds of use cases due to static nature of relational tables, row size limit and column limit. On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo edlinuxg...@gmail.com wrote

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
On Thursday, February 20, 2014, Robert Coli rc...@eventbrite.com wrote: On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne sylv...@datastax.com wrote: Of course, if everyone was using that reasoning, no-one would ever test new features and report problems/suggest improvement. So thanks to anyone

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Thursday, February 20, 2014, Robert Coli rc...@eventbrite.com wrote: On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne sylv...@datastax.com wrote: Of course, if everyone was using that reasoning, no-one would ever test new features

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Edward Capriolo
The main issue is that cassandra has two of everything. Two access apis, two meta data systems, and two groups of users. Those groups of users using the original systems thrift, cfmetadata, and following the advice of three years ago have been labeled obsolete (did you ever see that twilight

Re: Queuing System

2014-02-23 Thread Edward Capriolo
One schema I was thinking of was this: max.row.age: The producer will not write to this row after max row age, even if not full (e.g. 10 minutes) max.number.of.messages: Most messages that will ever be written to a row max.expire.time: TTL every column is written with set
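The first two settings amount to a rollover rule for the producer, which could be sketched as below. This is an illustration under assumptions, not code from the thread: `QueueRowPolicy`, `mustRollOver` and `ttlSeconds` are hypothetical names, and "after max row age" is read here as strictly older than the limit.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: decides when a producer should stop writing to a queue row. */
final class QueueRowPolicy {
    private final Duration maxRowAge;        // max.row.age
    private final int maxNumberOfMessages;   // max.number.of.messages
    private final Duration maxExpireTime;    // max.expire.time (TTL on every column)

    QueueRowPolicy(Duration maxRowAge, int maxNumberOfMessages, Duration maxExpireTime) {
        this.maxRowAge = maxRowAge;
        this.maxNumberOfMessages = maxNumberOfMessages;
        this.maxExpireTime = maxExpireTime;
    }

    /** True when the row is too old or already full, so the producer must start a new row. */
    boolean mustRollOver(Instant rowCreated, int messagesWritten, Instant now) {
        boolean tooOld = Duration.between(rowCreated, now).compareTo(maxRowAge) > 0;
        boolean full = messagesWritten >= maxNumberOfMessages;
        return tooOld || full;
    }

    /** TTL, in seconds, to attach to every column written to the queue. */
    long ttlSeconds() {
        return maxExpireTime.getSeconds();
    }

    public static void main(String[] args) {
        QueueRowPolicy policy =
                new QueueRowPolicy(Duration.ofMinutes(10), 1000, Duration.ofHours(1));
        Instant created = Instant.now().minus(Duration.ofMinutes(11));
        System.out.println(policy.mustRollOver(created, 5, Instant.now()));
    }
}
```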

Re: [OT]: Can I have a non-delivering subscription?

2014-02-24 Thread Edward Capriolo
You can set up the mail to deliver one per day as well. On Saturday, February 22, 2014, Robert Wille rwi...@fold3.com wrote: Yeah, it's called a rule. Set one up to delete everything from user@cassandra.apache.org. On 2/22/14, 10:32 AM, Paul LeoNerd Evans leon...@leonerd.org.uk wrote: A

Re: Naive question about orphan rows

2014-02-26 Thread Edward Capriolo
It is probably ok to have redundant songs in playlists; cassandra is about denormalization. Dealing with this issue is going to be hard since the only way to deal with this would be scanning through the first cf and producing counts, then using that information to delete in the second table. However

Re: Naive question about orphan rows

2014-02-26 Thread Edward Capriolo
it look like it will create a lot overhead (scanning and producing counts). John *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com] *Sent:* Wednesday, February 26, 2014 5:56 AM *To:* user@cassandra.apache.org *Subject:* Re: Naive question about orphan rows It is probably ok to have

Re: Naive question about orphan rows

2014-02-26 Thread Edward Capriolo
. If the number of people with that song is now 0 the song can be removed. On Wed, Feb 26, 2014 at 11:17 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Right, the problem with building a list of counts in a batch is what happens if a song is added as you are building the counts. On Wed, Feb 26, 2014 at 10
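The "remove when the count reaches 0" idea can be sketched in memory as follows. The names (`SongRefCounts`, `addReference`, `removeReference`) are hypothetical, and a `ConcurrentHashMap` only stands in for a per-song count kept in the database; it does not address the batch race described above.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-song reference counts, maintained on every playlist change. */
final class SongRefCounts {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    /** Called when a song is added to someone's playlist. */
    void addReference(String songId) {
        counts.merge(songId, 1, Integer::sum);
    }

    /**
     * Called when a song is removed from a playlist.
     * Returns true when no playlist references the song any more (including the
     * case where the song was never counted), meaning the song row can be removed.
     */
    boolean removeReference(String songId) {
        // Returning null from the remapping function deletes the map entry.
        Integer remaining = counts.computeIfPresent(songId, (id, n) -> n > 1 ? n - 1 : null);
        return remaining == null;
    }

    int references(String songId) {
        return counts.getOrDefault(songId, 0);
    }

    public static void main(String[] args) {
        SongRefCounts refs = new SongRefCounts();
        refs.addReference("song-1");
        System.out.println(refs.removeReference("song-1"));
    }
}
```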

Re: OOM while performing major compaction

2014-02-27 Thread Edward Capriolo
One big downside about major compaction is that (depending on your cassandra version) the bloom filters size is pre-calculated. Thus cassandra needs enough heap for your existing 33 k+ sstables and the new large compacted one. In the past this happened to us when the compaction thread got hung up,

Re: Question regarding java DowngradingConsistencyRetryPolicy

2014-03-05 Thread Edward Capriolo
INSERT INTO x (a,b,c) values (1,2,3) Doesn't this sometimes turn into a batch mutation if b and c are separate columns? On Wed, Mar 5, 2014 at 5:03 AM, Sylvain Lebresne sylv...@datastax.com wrote: Let me first note that the DataStax Java driver has a dedicated mailing list:

Re: Datastax C++ driver on Windows x64

2014-03-05 Thread Edward Capriolo
The vast majority of Java code should be portable. Reiterating should be. It sounds like what we need is CCM via ssh. On Wed, Mar 5, 2014 at 8:07 PM, Green, John M (HP Education) john.gr...@hp.com wrote: Just to clarify, do recommend not running Cassandra on Windows or not using the client

Re: about trigger execution ??? // RE: sending notifications through data replication on remote clusters

2014-03-10 Thread Edward Capriolo
Just so you know you should probably apply the [jira] [Commented] (*CASSANDRA*-6790) *Triggers* are broken in trunk

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The biggest expense of them is that you need to be authenticated to a keyspace to perform an operation. Thus connection pools are bound to keyspaces. Switching a keyspace is an RPC operation. In the thrift client, if you have 100 keyspaces you need 100 connection pools; that starts to be a pain

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
to multiple keyspaces. If you do want to do one of those things, then go for it, make multiple keyspaces. -Jeremiah On Mar 11, 2014, at 10:17 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I am not sure. As stated the only benefit of multiple keyspaces is if you need: 1) different

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
a keyspace and so long as you include the keyspace in all queries instead of just table name, it works fine. In that case, I assume there's only one connection pool for all keyspaces. From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
method for this small use case. Otherwise I would take that up. On Tue, Mar 11, 2014 at 12:07 PM, Peter Lin wool...@gmail.com wrote: if I have time this summer, I may work on that, since I like having thrift. On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter, My advice. Do not bother. I have become very active recently in attempting to add features to thrift. I had 4 open tickets I was actively working on. (I even found two bugs in the Cassandra in the process). People were aware of this and still called this vote. Several commit people have

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
the time to do it. On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Peter, My advice. Do not bother. I have become very active recently in attempting to add features to thrift. I had 4 open tickets I was actively working on. (I even found two bugs

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
and continue to enhance the Thrift interface, you would also want to add a bunch of relational features to CQL as part of that same fork? On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo edlinuxg...@gmail.com wrote: one of the things I'd like to see happen is for Cassandra to support queries

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Cassandra to preserve and continue to enhance the Thrift interface, you would also want to add a bunch of relational features to CQL as part of that same fork? On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo edlinuxg...@gmail.com wrote: one of the things I'd like to see happen is for Cassandra

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
times in database land! On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Peter, Solr is deeply integrated into DSE. Seemingly this can not efficiently be done client side (CQL/Thrift whatever) but the Solandra approach was to embed Solr in Cassandra. I think

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
for the kinds of transactions they handle. I'm bias, these kinds of features are useful and good addition to cassandra. These are interesting times in database land! On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Peter, Solr is deeply integrated into DSE. Seemingly

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
This brainstorming idea has already been -1 ed in jira. ROFL. On Wed, Mar 12, 2014 at 12:26 PM, Tupshin Harper tups...@tupshin.com wrote: OK, so I'm greatly encouraged by the level of interest in this. I went ahead and created https://issues.apache.org/jira/browse/CASSANDRA-6846, and will be

Re:

2014-03-12 Thread Edward Capriolo
That is too much RAM for cassandra; make that 6g to 10g. The uneven perf could be because your requests do not shard evenly. On Wednesday, March 12, 2014, Batranut Bogdan batra...@yahoo.com wrote: Hello all, The environment: I have a 6 node Cassandra cluster. On each node I have: - 32 G RAM

Re: select query returns wrong value if use DESC option

2014-03-13 Thread Edward Capriolo
Consider filing a jira. Cql is the standard interface to cassandra everything is heavily tested. On Thursday, March 13, 2014, Katsutoshi Nagaoka nagapad.0...@gmail.com wrote: Hi. I am using Cassandra 2.0.6 version. There is a case that select query returns wrong value if use DESC option. My

Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Edward Capriolo
On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler wrote: > The Cassandra team is pleased to announce the release of Apache > Cassandra version 3.10. > > Apache Cassandra is a fully distributed database. It is the right choice > when you need scalability and high availability

Re: Trouble implementing CAS operation with LWT query

2017-02-22 Thread Edward Capriolo
On Wed, Feb 22, 2017 at 8:42 AM, 안정아 wrote: > Hi, all > > > > I'm trying to implement a typical CAS operation with LWT query(conditional > update). > > But I'm having trouble keeping integrity of the result when > WriteTimeoutException occurs. > > according to

Re: How does cassandra achieve Linearizability?

2017-02-22 Thread Edward Capriolo
atching is of limited usefulness because we only use Paxos for CAS I > think? So in a batch by definition all but one will fail the CAS. This is > something where a distinguished coordinator could help by failing the rest > of the contending requests more inexpensively than it currently does. > >

Re: cassandra user request log

2017-02-20 Thread Edward Capriolo
Not directly. Consider proxying requests through an application server and logging at that level. On Friday, February 10, 2017, Benjamin Roth wrote: > If you want to audit write operations only, you could maybe use CDC, this > is a quite new feature in 3.x (I think it was

Re: Count(*) is not working

2017-02-20 Thread Edward Capriolo
Seems worth it to file a bug since some here are under the impression it almost always works and others are under the impression it almost never works. On Friday, February 17, 2017, kurt greaves wrote: > really... well that's good to know. it still almost never works

Re: High disk io read load

2017-02-19 Thread Edward Capriolo
On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth wrote: > We are talking about a read IO increase of over 2000% with 512 tokens > compared to 256 tokens. 100% increase would be linear which would be > perfect. 200% would even okay, taking the RAM/Load ratio for caching

Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler api. On Monday, February 20, 2017, Ben Slater > wrote: > We’ve actually had several customers where we’ve done the opposite - split > large clusters apart to separate

Re: How does cassandra achieve Linearizability?

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 4:33 PM, Ariel Weisberg wrote: > Hi, > > Classic Paxos doesn't have a leader. There are variants on the original > Lamport approach that will elect a leader (or some other variation like > Mencius) to improve throughput, latency, and performance under

Re: Pluggable throttling of read and write queries

2017-02-22 Thread Edward Capriolo
it's better to split a large cluster into smallers >> except if you also manage client layer that query cass and you can put some >> backpressure or rate limit in it. >> > We have an internal storage API layer that some of the clients use, but > there are many customers who

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
key for every table you > > ever create is low. You will regret BOP until the end of time. > > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxg...@gmail.com > > <mailto:edlinuxg...@gmail.com>> wrote: > > > > Probably best to avoid bop even if you are

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 10:54 AM, Jonathan Haddad <j...@jonhaddad.com> wrote: > The odds of only using a sha1 as your partition key for every table you > ever create is low. You will regret BOP until the end of time. > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edl

Re: Composite partition key token

2017-02-09 Thread Edward Capriolo
On Thu, Feb 9, 2017 at 9:26 AM, Michael Burman wrote: > Hi, > > How about taking it from the BoundStatement directly? > > ByteBuffer routingKey = b.getRoutingKey(ProtocolVersion.NEWEST_SUPPORTED, > codecRegistry); > Token token = metadata.newToken(routingKey); > > In this

Re: High disk io read load

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 12:38 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote: > It doesn't really look like that: > https://cl.ly/2c3Z1u2k0u2I > > Thats the ReadLatency.count metric aggregated by host which represents the > actual read operations, correct? > > 2017-0

Re: High disk io read load

2017-02-15 Thread Edward Capriolo
I think it has more than double the load. It is double the data. More read repair chances. More load can swing it's way during node failures etc. On Wednesday, February 15, 2017, Benjamin Roth wrote: > Hi there, > > Following situation in cluster with 10 nodes: > Node

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
Probably best to avoid bop even if you are already hashing keys yourself. What do you do when checksums collide? It is possible right? On Saturday, February 11, 2017, Micha wrote: > Hi, > > my table has a sha-1 sum as partition key. Would in this case the > ByteOrdered

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-13 Thread Edward Capriolo
wrote: >> >> Saw this one today... >> >> https://news.ycombinator.com/item?id=13624062 >> >> On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans <john.eric.ev...@gmail.com> wrote: >>

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Edward Capriolo
On Tue, Jan 17, 2017 at 11:47 AM, Mike Torra wrote: > Thanks for the feedback everyone! Redis `zincryby` and `zrangebyscore` is > indeed what we use today. > > Caching the resulting 'sorted sets' in redis is exactly what I plan to do. > There will be tens of thousands of

Re: Is periodic manual repair necessary?

2017-02-27 Thread Edward Capriolo
There are 4 anti-entropy systems in cassandra: hinted handoff, read repair, commit logs, and the repair command. All are basically best effort. Commit logs get corrupt and only flush periodically. Bits rot on disk and while crossing networks. Read repair is async and only happens randomly. Hinted

Re: Partition size

2016-09-12 Thread Edward Capriolo
In US english it is also debatable over which words are profane. https://simple.wikipedia.org/wiki/Profanity Different words can be profanity to different people, and what words are thought of as profanity in English can change over time. Suggestion: https://www.youtube.com/watch?v=L0MK7qz13bU

Re: Reproducing exception in cassandra for testing failover scenarios

2016-09-24 Thread Edward Capriolo
You can also look at ccmbridge and farsandra. Here is an example of bringing up an 8 node 3 datacenter cluster in a single unit test using farsandra. https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/ThreeDcTest.java On Sat, Sep 24, 2016 at 3:53 PM, Jonathan Haddad

Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread Edward Capriolo
Truncate does a few things (based on version) truncate takes snapshots truncate causes a flush in very old versions truncate causes a schema migration. In newer versions like cassandra 3.4 you have this knob. # How long the coordinator should wait for truncates to complete # (This can be

Re: Running Cassandra in Integration Tests

2016-10-06 Thread Edward Capriolo
Checkout https://github.com/edwardcapriolo/farsandra. It falls under the realm of almost 100% pure java (besides the fact it uses some shell to launch Cassandra). On Thu, Oct 6, 2016 at 7:08 PM, Ali Akhtar wrote: > Is it possible to create an isolated cassandra instance

Re: Upgrading from Cassandra 2.1.12 to 3.0.9

2016-09-23 Thread Edward Capriolo
To be clear about the mixed versions. You do not want to do it. Especially if the versions are very far apart. Typically you can not run repair in mixed versions. You can not do schema changes with mixed versions. Data files from new versions are not readable from old versions. Basically you

Re: Lightweight tx is good enough to handle counter?

2016-09-23 Thread Edward Capriolo
This might help you: https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/CompareAndSwapTest.java It counts using lwt's with multiple threads. On Fri, Sep 23, 2016 at 2:31 PM, Jaydeep Chovatia < chovatia.jayd...@gmail.com> wrote: > Since SERIAL consistency is not supported for
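The linked test is not reproduced here. As a rough illustration of the read, compare, retry loop that counting with conditional updates relies on, here is an in-memory sketch in which an `AtomicLong` stands in for the stored value and `updateIf` stands in for a conditional update that applies only if the value is still what the caller read. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for counting with compare-and-set from many threads. */
final class CasCounterSketch {
    private final AtomicLong stored = new AtomicLong(0);

    /** Stand-in for a conditional update: applied only if the current value equals expected. */
    boolean updateIf(long expected, long newValue) {
        return stored.compareAndSet(expected, newValue);
    }

    long read() {
        return stored.get();
    }

    /** Read the value, try to write value + 1, and retry if another writer got there first. */
    long increment() {
        while (true) {
            long current = read();
            if (updateIf(current, current + 1)) {
                return current + 1;
            }
        }
    }

    /** Runs the given number of threads, each incrementing the same counter, and returns the total. */
    static long runThreads(int threads, int incrementsPerThread) {
        CasCounterSketch counter = new CasCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.read();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 1000));
    }
}
```

Against a real cluster each failed attempt costs a full conditional-update round trip, so heavy contention on one counter is expensive in a way this sketch does not show.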

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
needs to be renamed. > > Any relational db could (and I'm sure one does!) allow for sparse fields > as well. MySQL can be backed by rocksdb now, does that make it not a row > store? > > You're arguing that everything is wrong but you're not proposing an > alternative, which is

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
>> IBM at the top. >> >> Saying the docs suck isn't an indictment of anyone, it's just the reality >> of writing good documentation. >> >> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <j...@jonhaddad.com> >> wrote: >> >>> N

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it would make sense to make a utility for the unit tests that could enable tracing and then assert that a number of steps in the trace happened. Something like: setup() runQuery("SELECT * FROM X")
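A minimal sketch of what such an assertion helper might look like, assuming the trace has already been collected as a list of event descriptions. `TraceAssert` and `assertStepsInOrder` are hypothetical names, not an existing Cassandra test utility, and the event strings in the demo are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical test helper: asserts that expected steps appear, in order, in a query trace. */
final class TraceAssert {

    /**
     * Walks the trace once and checks that each expected fragment is contained in some
     * event, in the given order. Unrelated events in between are allowed.
     */
    static void assertStepsInOrder(List<String> traceEvents, String... expectedSteps) {
        int next = 0;
        for (String event : traceEvents) {
            if (next < expectedSteps.length && event.contains(expectedSteps[next])) {
                next++;
            }
        }
        if (next < expectedSteps.length) {
            throw new AssertionError("Trace step not found (in order): " + expectedSteps[next]);
        }
    }

    public static void main(String[] args) {
        // Illustrative event text only; real trace messages depend on the Cassandra version.
        List<String> trace = Arrays.asList(
                "Parsing SELECT * FROM X", "Row cache miss", "Read 1 live cells");
        assertStepsInOrder(trace, "Parsing", "Row cache miss");
        System.out.println("trace contained the expected steps");
    }
}
```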

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might be only as deep as the specific tests that test it. On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa wrote: > Seems like it’s probably worth opening a jira issue to track it (either to > confirm it’s a bug, or

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Edward Capriolo
I undertook a similar effort a while ago. https://issues.apache.org/jira/browse/CASSANDRA-7014 Other than the fact that it was closed with no comments, I can tell you that other efforts I had to embed things in Cassandra did not go swimmingly. Although at the time ideas were rejected like groovy

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then: Physically: A data store which physically structured-log-merge of SSTables (see) https://cloud.google.com/bigtable/. Now: One of the changes made in Apache Cassandra 3.0 is a relatively important refactor of the storage engine. I say

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in mongo and present it as a table with rows and columns. It does not make mongo a rowstore. On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote: > The problem with calling it a row store: > > https://en.wikipedi

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store: https://en.wikipedia.org/wiki/Row_(database) In the context of a relational database , a *row*—also called a record or tuple

Re: Cassandra data model right definition

2016-10-01 Thread Edward Capriolo
case where the user is attempting to write and deleting 1 row and 1 column 6 billion times a day. Then you end up explaining to them http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached and how the cassandra storage model is not "like a relational

Re: Way to write to dc1 but keep data only in dc2

2016-09-29 Thread Edward Capriolo
You can do something like this, though your use of terminology like "queue" does not really apply. You can set up your keyspace with replication in only one data center. CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc2' : 3 }; This will make the

Re: Keyspace/CF creation Timeouts

2016-10-25 Thread Edward Capriolo
I do not believe the ConsistencyLevel matters for schema changes. In recent versions request_timeout_in_ms has been replaced by N variables which allow different timeout values for different types of operations. You seem to have both a lot of keyspaces and column families. It seems likely that

Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

2016-10-26 Thread Edward Capriolo
I would suggest you look at some existing work http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html and attempt to re-create those scenarios and methodologies for failing nodes and seeing the performance impact. This would yield faster and more easily verifiable results

Re: how to get the size of the particular partition key belonging to an sstable ??

2016-10-28 Thread Edward Capriolo
There are actually multiple tickets for different size functions. Examples include computing size of collections, number of rows, and physical sizes server side. I also have a patch to make the warn and info settable at runtime. https://issues.apache.org/jira/browse/CASSANDRA-12661?filter=-1 It

Re: Tools to manage repairs

2016-10-28 Thread Edward Capriolo
On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann <m...@vrischmann.me> wrote: > Doesn't paging help with this ? Also if we select a range via the cluster > key we're never really selecting the full partition. Or is that wrong ? > > > On Fri, Oct 28, 2016, at 05:00 PM,

Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Edward Capriolo
I have not read the entire thread, so sorry if this is already mentioned. You should review your logs; a potential problem could be a corrupted sstable. In a situation like this you will notice that the system is repeatedly trying to compact a given sstable. The compaction fails and based on the

Re: Question on write failures logs show Uncaught exception on thread Thread[MutationStage-1,5,main]

2016-10-24 Thread Edward Capriolo
The driver will enforce a max batch size of 65k. This is an issue in versions of Cassandra like 2.1.X. There are control variables for the logged and unlogged batch size. You may also have to tweak your commitlog size. I demonstrate this here:

Re: Schema Changes

2016-11-15 Thread Edward Capriolo
You can start here: https://issues.apache.org/jira/browse/CASSANDRA-10699 And here: http://stackoverflow.com/questions/20293897/cassandra-resolution-of-concurrent-schema-changes In a nutshell, schema changes work best when issued serially, when all nodes are up and reachable. When these 3
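
The "serially, all nodes up and reachable" advice can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not anything from the linked ticket: apply_serially, execute, and get_schema_versions are hypothetical names, and the caller is assumed to supply a function that returns each node's schema version (for instance read from the schema_version column of system.local and system.peers), with None for a node that could not be reached.

```python
import time


def schema_in_agreement(schema_versions):
    """True when every node is reachable and reports the same schema version.

    schema_versions maps a node address to its schema version, or to None
    when the node could not be reached.
    """
    versions = set(schema_versions.values())
    return len(versions) == 1 and None not in versions


def apply_serially(changes, execute, get_schema_versions,
                   timeout_s=30.0, poll_s=0.5):
    """Run schema changes one at a time, waiting for agreement around each.

    execute and get_schema_versions are caller-supplied callables
    (hypothetical hooks for this sketch, not a driver API).
    """
    def wait_for_agreement():
        deadline = time.monotonic() + timeout_s
        while not schema_in_agreement(get_schema_versions()):
            if time.monotonic() >= deadline:
                raise TimeoutError("nodes did not agree on a schema version")
            time.sleep(poll_s)

    for change in changes:
        wait_for_agreement()  # all nodes up and agreeing before the change
        execute(change)       # exactly one change in flight at a time
    wait_for_agreement()      # confirm the last change has propagated
```

The point of the design is that no second change is issued until every node reports the same version again, so two changes never race each other.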

Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Edward Capriolo
Here is a solution that I have leveraged. Ignore the count of the value and use a multi-part column name as its value. For example: create column family stuff ( rowkey string, column string, value string, counter_to_ignore long, primary key( rowkey, column, value)); On Tue, Nov 1, 2016 at

Re: Cassandra failure during read query at consistency QUORUM (2 responses were required but only 0 replica responded, 2 failed)

2016-10-28 Thread Edward Capriolo
This looks like another case of an assert bubbling through a try/catch that doesn't catch asserts. On Fri, Oct 28, 2016 at 6:30 AM, Denis Mikhaylov wrote: > Hi! > > We're running Cassandra 3.9 > > On the application side I see failed reads with this exception >

Re: Handle Leap Seconds with Cassandra

2016-10-27 Thread Edward Capriolo
Following https://issues.apache.org/jira/browse/CASSANDRA-9131. It is very interesting to track how the timestamp has moved from the user, to the server, then back to the user via the driver. Next we will be accounting for the Earth's slowing rotation as the ice caps melt :)

Re: How does the "batch" commit log sync works

2016-10-27 Thread Edward Capriolo
I mentioned during my cassandra.yaml presentation at the summit that I never saw anyone use these settings. Things that are off by default are typically not covered well by tests. It sounds like it is not working. Quick suggestion: go back in time maybe to a version like 1.2.X or 0.7 and see if
