iostat-like tool to parse 'nodetool cfstats'

2016-12-20 Thread Kevin Burton
nodetool cfstats has some valuable data, but what I would like is a 1-minute delta, similar to iostat. It's easy to parse this, but has anyone done it? I want to see IO throughput and load on C* for each table.
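A rough sketch of the delta idea, in Python: take two cfstats snapshots some seconds apart and turn the counter difference into a per-second rate per table. The "Table:" section header and the "Local read count" / "Local write count" metric names are assumptions about the cfstats text layout (they vary by Cassandra version), and parse_cfstats and rate_per_second are hypothetical helper names, not part of nodetool.

```python
import re

# Assumed counter names; adjust to whatever your cfstats output prints.
COUNTERS = ("Local read count", "Local write count")

def parse_cfstats(text):
    """Collect numeric 'name: value' lines under each 'Table:' header."""
    stats, table = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Keyspace:"):
            table = None  # keyspace-level totals are ignored in this sketch
        elif line.startswith("Table:"):
            table = line.split(":", 1)[1].strip()
            stats[table] = {}
        elif table is not None:
            m = re.match(r"^([^:]+):\s*(-?\d+(?:\.\d+)?)\s*$", line)
            if m:
                stats[table][m.group(1)] = float(m.group(2))
    return stats

def rate_per_second(before, after, seconds, counters=COUNTERS):
    """Per-table (after - before) / seconds for the chosen counters."""
    rates = {}
    for table, metrics in after.items():
        prev = before.get(table, {})
        rates[table] = {
            name: (metrics[name] - prev[name]) / seconds
            for name in counters
            if name in metrics and name in prev
        }
    return rates
```

In practice the two snapshots would come from running nodetool cfstats twice, 60 seconds apart, and passing both outputs through parse_cfstats before calling rate_per_second.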

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-04 Thread Kevin Burton
BTW, we think we tracked this down to using large partitions to implement inverted indexes. C* just doesn't do a reasonable job at all with large partitions, so we're going to migrate this use case to Elasticsearch. On Wed, Aug 3, 2016 at 1:54 PM, Ben Slater

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
;> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231 >> >> Regards, >> >> Ryan Svihla >> >> On Aug 3, 2016, at 2:58 PM, Kevin Burton <bur...@spinn3r.com> wrote: >> >> It seems these are basically impossible to track down.

Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
It seems these are basically impossible to track down. https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y- has some information, but their workaround is to increase the transaction log. There's no way to find out WHAT client or

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
We usually use 100 per every 5 minutes.. but you're right. We might actually move this use case over to using Elasticsearch in the next couple of weeks. On Wed, Aug 3, 2016 at 11:09 AM, Jonathan Haddad wrote: > Kevin, > > "Our scheme uses large buckets of content where we

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
path seems risky at best at the moment. In any event, your best >> solution would be to find a way to make your partitions smaller (like >> 1/10th of the size). >> >> Cheers >> Ben >> <https://issues.apache.org/jira/browse/CASSANDRA-11206>

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
solution would be to find a way to make your partitions smaller (like > 1/10th of the size). > > Cheers > Ben > <https://issues.apache.org/jira/browse/CASSANDRA-11206> > > On Wed, 3 Aug 2016 at 12:35 Kevin Burton <bur...@spinn3r.com> wrote: > >> I have a

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
/content_legacy_2016_08_02:1470154500099 (106107128 bytes) On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote: > We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated > to each C* node. We're aware of the recommended 8GB limit to keep GCs low > but our

Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated to each C* node. We're aware of the recommended 8GB limit to keep GCs low but our memory has been creeping up (probably) related to this bug. Here's what we're seeing... if we do a low level of writes we think everything

Re: Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
On Wed, Jul 20, 2016 at 11:53 AM, Jeff Jirsa wrote: > Can you tolerate the value being “close, but not perfectly accurate”? If > not, don’t use a counter. > > > yeah.. agreed.. this is a problem which is something I was considering. I guess it depends on whether

Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
We ended up implementing a task/queue system which uses a global pointer. Basically the pointer just increments ... so we have thousands of tasks that just increment this one pointer. The problem is that we're seeing contention on it and not being able to write this record properly. We're just

Re: Efficiently filtering results directly in CS

2016-04-08 Thread Kevin Burton
CS? >> >> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton <bur...@spinn3r.com> wrote: >> >>> I have a paging model whereby we stream data from CS by fetching 'pages' >>> thereby reading (sequentially) entire datasets. >>> >>> We're using th

Efficiently filtering results directly in CS

2016-04-07 Thread Kevin Burton
I have a paging model whereby we stream data from CS by fetching 'pages' thereby reading (sequentially) entire datasets. We're using the bucket approach where we write data for 5 minutes, then we can just fetch the bucket for that range. Our app now has TONS of data and we have a piece of
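For readers unfamiliar with the bucket approach described above, a minimal Python sketch of the arithmetic: each write lands in the bucket whose start is the timestamp rounded down to a 5-minute boundary, and a reader enumerates the bucket starts covering a time range. The function names are hypothetical and the timestamps are assumed to be epoch seconds; the thread does not say how the buckets are actually keyed.

```python
BUCKET_SECONDS = 5 * 60  # 5-minute buckets, as in the message above

def bucket_start(epoch_seconds):
    """Round a timestamp down to the start of its 5-minute bucket."""
    return epoch_seconds - (epoch_seconds % BUCKET_SECONDS)

def buckets_between(start, end):
    """Bucket start times needed to read everything in [start, end)."""
    return list(range(bucket_start(start), end, BUCKET_SECONDS))
```

A paging reader would then fetch one bucket per entry returned by buckets_between, in order.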

Faster version of 'nodetool status'

2016-02-12 Thread Kevin Burton
Is there a faster way to get the output of 'nodetool status' ? I want us to more aggressively monitor for 'nodetool status' and boxes being DN... I was thinking something like jolokia and REST but I'm not sure if there are variables exported by jolokia for nodetool status. Thoughts?

Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-23 Thread Kevin Burton
> was a specific Jira assigned, and the antipattern doc doesn't appear to > reference this scenario. Maybe a committer can shed some more light. > > -- Jack Krupansky > > On Fri, Jan 22, 2016 at 10:29 PM, Kevin Burton <bur...@spinn3r.com> wrote: > >> I sort of ag

Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
, Jonathan Haddad <j...@jonhaddad.com> wrote: > Instead of using ZK, why not solve your concurrency problem by removing > it? By that, I mean simply have 1 process that creates all your tables > instead of creating a race condition intentionally? > > On Fri, Jan 22, 2016 at 6:16

automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
Not sure if this is a bug or just kind of a *fuzzy* area. In 2.0 this worked fine. We have a bunch of automated scripts that go through and create tables... one per day. At midnight UTC our entire CQL went offline... took down our whole app. ;-/ The resolution was a full CQL shut down and

Strategy / order for upgradesstables during rolling upgrade.

2016-01-21 Thread Kevin Burton
I think there are two strategies to upgradesstables after a release. We're doing a 2.0 to 2.1 upgrade (been procrastinating here). I think we can go with B below... Would you agree? Strategy A: - foreach server - upgrade to 2.1 - nodetool upgradesstables Strategy B: -

Re: Using cassandra a BLOB store / web cache.

2016-01-20 Thread Kevin Burton
There's also the 'support' issue.. C* is hard enough as it is... maybe you can bring in another system like ES or HDFS but the more you bring in the more your complexity REALLY goes through the roof. Better to keep things simple. I really like the chunking idea for C*... seems like an easy way

Re: Using cassandra a BLOB store / web cache.

2016-01-19 Thread Kevin Burton
com> wrote: > On Mon, Jan 18, 2016 at 6:52 PM, Kevin Burton <bur...@spinn3r.com> wrote: > >> Internally we have the need for a blob store for web content. It's >> MOSTLY key, value based but we'd like to have lookups by coarse grained >> tags. >> > > I kn

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
this would resolve this problem. If anyone else thinks this is an issue I'll create a JIRA. On Mon, Oct 19, 2015 at 3:38 PM, Robert Coli <rc...@eventbrite.com> wrote: > On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton <bur...@spinn3r.com> wrote: > >> I think the point I was trying t

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Kevin Burton
records. > > > > From: <burtonator2...@gmail.com> on behalf of Kevin Burton > Reply-To: "user@cassandra.apache.org" > Date: Sunday, October 18, 2015 at 3:44 PM > To: "user@cassandra.apache.org" > Subject: Re: Would we have data corruption if we bootstrapp

compact/repair shouldn't compete for normal compaction resources.

2015-10-18 Thread Kevin Burton
I'm doing a big nodetool repair right now and I'm pretty sure the added overhead is impacting our performance. Shouldn't you be able to throttle repair so that normal compactions can use most of the resources?

Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-17 Thread Kevin Burton
We just migrated from a 30 node cluster to a 45 node cluster (so 15 new nodes). By default we have auto_bootstrap = false so we just push our config to the cluster, the cassandra daemons restart, and they're not cluster members and are the only nodes in the cluster. Anyway. While I was about

Re: reiserfs - DirectoryNotEmptyException

2015-10-17 Thread Kevin Burton
My advice is to not even consider anything else or make any other changes to your architecture until you get onto a modern and maintained filesystem. VERY VERY VERY few people are deploying anything on ReiserFS so you're going to be the first group encountering any problems. On Thu, Oct 15, 2015

Post mortem of a large Cassandra datacenter migration.

2015-10-09 Thread Kevin Burton
We just finished up a pretty large migration of about 30 Cassandra boxes to a new datacenter. We'll be migrating to about 60 boxes here in the next month so scalability (and being able to do so cleanly) is important. We also completed an Elasticsearch migration at the same time. The ES

Re: Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

2015-10-07 Thread Kevin Burton

Why can't nodetool status include a hostname?

2015-10-07 Thread Kevin Burton
I find it really frustrating that nodetool status doesn't include a hostname. Makes it harder to track down problems. I realize it PRIMARILY uses the IP but perhaps cassandra.yaml can include an optional 'hostname' parameter that can be set by the user. OR have the box itself include the hostname

Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

2015-10-07 Thread Kevin Burton
Let's say I have 10 nodes and I add 5 more. If I fail to run nodetool cleanup, is excessive data transferred when I add the 6th node? I.e., do the existing nodes send more data to the 6th node? The documentation is unclear. It sounds like the biggest problem is that the existing data causes things to

Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
We're in the middle of migrating datacenters. We're migrating from 13 nodes to 30 nodes in the new datacenter. The plan was to bootstrap the 30 nodes first and wait until they have joined; then we're going to decommission the old ones. How many nodes can we bootstrap at once? How many can we

Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
com> wrote: > On Tue, Oct 6, 2015 at 12:32 PM, Kevin Burton <bur...@spinn3r.com> wrote: > >> How many nodes can we bootstrap at once? How many can we decommission? >> > > short answer : 1 node can join or part at simultaneously > > longer answer : https://is

Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
P tuning, > > On Tue, Oct 6, 2015 at 1:29 PM, Kevin Burton <bur...@spinn3r.com> wrote: > >> I'm not sure which is faster/easier. Just joining one box at a time and >> then decommissioning or using replace_address. >> >> this stuff is alw

Re: Running Cassandra on Java 8 u60..

2015-09-27 Thread Kevin Burton
mailing list)… I think JDK9 will be the one. > > On Sep 25, 2015, at 7:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote: > > I think those were referring to Java7 and G1GC (early versions were buggy). > > Cheers, > Stefano > > > On Fri, Sep 25, 2015 at

Using inline JSON is 2-3x faster than using many columns (>20)

2015-09-26 Thread Kevin Burton
I wanted to share this with the community in the hopes that it might help someone with their schema design. I didn't get any red flags early on to limit the number of columns we use. If anything the community pushes for dynamic schema because Cassandra has super nice online ALTER TABLE. However,

Running Cassandra on Java 8 u60..

2015-09-25 Thread Kevin Burton
Any issues with running Cassandra 2.0.16 on Java 8? I remember there is long term advice on not changing the GC but not the underlying version of Java. Thoughts?

Re: Best strategy for hiring from OSS communities.

2015-09-13 Thread Kevin Burton
upport * http://sematext.com/ > > > On Thu, Aug 13, 2015 at 6:02 PM, Kevin Burton <bur...@spinn3r.com> wrote: > >> Mildly off topic but we are looking to hire someone with Cassandra >> experience.. >> >> I don’t necessarily want to spam the list though.

cassandra-stress on 3.0 with column widths benchmark.

2015-09-13 Thread Kevin Burton
I’m trying to benchmark two scenarios… 10 columns with 150 bytes each vs 150 columns with 10 bytes each. The total row “size” would be 1500 bytes (ignoring overhead). Our app uses 150 columns so I’m trying to see if packing it into a JSON structure using one column would improve performance.

Re: Cassandra 2.2 for time series

2015-09-02 Thread Kevin Burton
Check out KairosDB for a time series db on Cassandra. On Aug 31, 2015 7:12 AM, "Peter Lin" wrote: > > I didn't realize they had added max and min as stock functions. > > to get the sample time. you'll probably need to write a custom function. > google for it and you'll find

Re: Practical limitations of too many columns/cells ?

2015-08-25 Thread Kevin Burton
change this, but it's good to have it on the radar. On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton bur...@spinn3r.com wrote: Agreed. We’re going to run a benchmark. Just realized we grew to 144 columns. Fun. Kind of disappointing that Cassandra is so slow in this regard. Kind of defeats

Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Is there any advantage to using say 40 columns per row vs using 2 columns (one for the pk and the other for data) and then shoving the data into a BLOB as a JSON object? To date, we’ve been just adding new columns. I profiled Cassandra and about 50% of the CPU time is spent on CPU doing

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
: burtonator2...@gmail.com on behalf of Kevin Burton Reply-To: user@cassandra.apache.org Date: Sunday, August 23, 2015 at 1:02 PM To: user@cassandra.apache.org Subject: Practical limitations of too many columns/cells ? Is there any advantage to using say 40 columns per row vs using 2 columns (one

Store JSON as text or UTF-8 encoded blobs?

2015-08-23 Thread Kevin Burton
Hey. I’m considering migrating my DB from using multiple columns to just 2 columns, with the second one being a JSON object. Is there going to be any real difference between TEXT or UTF-8 encoded BLOB? I guess it would probably be easier to get tools like spark to parse the object as JSON if

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
). My gist shows a ton of different examples, but they’re not scientific, and at this point they’re old versions (and performance varies version to version). - Jeff From: burtonator2...@gmail.com on behalf of Kevin Burton Reply-To: user@cassandra.apache.org Date: Sunday, August 23, 2015 at 2

Best strategy for hiring from OSS communities.

2015-08-13 Thread Kevin Burton
Mildly off topic but we are looking to hire someone with Cassandra experience.. I don’t necessarily want to spam the list though. We’d like someone from the community who contributes to Open Source, etc. Are there forums for Apache / Cassandra, etc for jobs? I couldn’t find one.

Re: TTLs on tables with *only* primary keys?

2015-08-05 Thread Kevin Burton
, 2015 at 9:22 PM, Kevin Burton bur...@spinn3r.com wrote: I have a table which just has primary keys. basically: create table foo ( sequence bigint, signature text, primary key( sequence, signature ) ) I need these to eventually get GCd however it doesn’t seem to work. If I

TTLs on tables with *only* primary keys?

2015-08-04 Thread Kevin Burton
I have a table which just has primary keys. basically: create table foo ( sequence bigint, signature text, primary key( sequence, signature ) ) I need these to eventually get GCd however it doesn’t seem to work. If I then run: select ttl(sequence) from foo; I get: Cannot use

Configuring the java client to retry on write failure.

2015-07-12 Thread Kevin Burton
I can’t seem to find a decent resource to really explain this… Our app seems to fail some write requests, a VERY low percentage. I’d like to retry the write requests that fail due to number of replicas not being correct.

Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
We get lots of write timeouts when we decommission a node. About 80% of them are write timeout and just about 20% of them are read timeout. We’ve tried to adjust streamthroughput (and compaction throughput) for that matter and that doesn’t resolve the issue. We’ve increased

Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
, 2015 at 2:22 PM, Kevin Burton bur...@spinn3r.com wrote: We get lots of write timeouts when we decommission a node. About 80% of them are write timeout and just about 20% of them are read timeout. We’ve tried to adjust streamthroughput (and compaction throughput) for that matter and that doesn’t

Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
WOW.. nice. you rock!! On Wed, Jul 1, 2015 at 3:18 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote: Looks like all of this is happening because we’re using CAS operations and the driver is going to SERIAL consistency level

How the heck do we repair when migrating to 3 replicas on 2.0.x ?

2015-06-11 Thread Kevin Burton
We’re running Cassandra 2.0.9 and just migrated from 2 to 3 replicas. We changed our consistency level to 2 during this period while we’re running a repair, but we can’t figure out what command to run to repair our data. We *think* we have to run “nodetool repair -pr” on each node.. is that right?

Tracking ETA and % complete in nodetool netstats during a decommission ?

2015-05-08 Thread Kevin Burton
I’m trying to track the throughput of nodetool decommission so I can figure out how long until this box is out of service. Basically, I want a % complete and an ETA on when the job will be done. Is this possible? Without opscenter?

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-05 Thread Kevin Burton
Hi, I switched from HBase to Cassandra and try to find problem solution for timeseries analysis on top Cassandra. Depending on what you’re looking for, you might want to check out KairosDB. 0.95 beta2 just shipped yesterday as well so you have good timing. https://github.com/kairosdb/kairosdb

Re: Fastest way to map/parallel read all values in a table?

2015-02-09 Thread Kevin Burton
I had considered using spark for this but: 1. we tried to deploy spark only to find out that it was missing a number of key things we need. 2. our app needs to shut down to release threads and resources. Spark doesn’t have support for this so all the workers would have stale thread leaking

Re: High GC activity on node with 4TB on data

2015-02-08 Thread Kevin Burton
Do you have a lot of individual tables? Or lots of small compactions? I think the general consensus is that (at least for Cassandra), 8GB heaps are ideal. If you have lots of small tables it’s a known anti-pattern (I believe) because the Cassandra internals could do a better job on handling the

Fastest way to map/parallel read all values in a table?

2015-02-08 Thread Kevin Burton
What’s the fastest way to map/parallel read all values in a table? Kind of like a mini map-only job. I’m doing this to compute stats across our entire corpus. What I did to begin with was use token() and then split it into the number of splits I needed. So I just took the total key range space
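The token() splitting described above can be sketched in Python as follows. This assumes the cluster uses Murmur3Partitioner, whose tokens span -2^63 to 2^63-1; the table name "foo", key column "id", and both function names are hypothetical, and the thread does not show the actual code that was used.

```python
# Token range of Murmur3Partitioner (assumed; other partitioners differ).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

def token_splits(n):
    """Divide the full token range into n contiguous inclusive ranges."""
    if n < 1:
        raise ValueError("need at least one split")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n
    splits, start = [], MIN_TOKEN
    for i in range(n):
        # The last split absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        splits.append((start, end))
        start = end + 1
    return splits

def split_query(table, key, start, end):
    """CQL text for one split; each worker runs one of these."""
    return (f"SELECT * FROM {table} "
            f"WHERE token({key}) >= {start} AND token({key}) <= {end}")
```

Each (start, end) pair can be handed to a separate worker, for example split_query("foo", "id", start, end), so the splits can be read in parallel without overlapping.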

Disabling the write ahead log with 2 data centers?

2015-01-23 Thread Kevin Burton
The WAL (and WALs in general) impose a performance overhead. If one were to just take a machine out of the cluster, permanently, when a machine crashes, you could quickly get all the shards back up to N replicas after a node crashes. So realistically, running with a WAL is somewhat redundant.

number of replicas per data center?

2015-01-18 Thread Kevin Burton
How do people normally set up multiple data center replication in terms of number of *local* replicas? So say you have two data centers, do you have 2 local replicas, for a total of 4 replicas? Or do you have 2 in one datacenter, and 1 in another? If you only have one in a local datacenter then

Re: number of replicas per data center?

2015-01-18 Thread Kevin Burton
Ah.. six replicas. At least it's super inexpensive that way (sarcasm!) On Sun, Jan 18, 2015 at 8:14 PM, Jonathan Haddad j...@jonhaddad.com wrote: Sorry, I left out RF. Yes, I prefer 3 replicas in each datacenter, and that's pretty common. On Sun Jan 18 2015 at 8:02:12 PM Kevin Burton bur

Re: number of replicas per data center?

2015-01-18 Thread Kevin Burton
. On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton bur...@spinn3r.com wrote: How do people normally setup multiple data center replication in terms of number of *local* replicas? So say you have two data centers, do you have 2 local replicas, for a total of 4 replicas? Or do you have 2 in one

Re: “Not enough replica available” when consistency is ONE?

2015-01-18 Thread Kevin Burton
are quorum-based ... This kicks in whenever you do CAS operations (eg, IF NOT EXISTS). Otherwise a cluster which became network partitioned would end up being able to have two separate CAS statements which both succeeded, but which disagreed with each other. On Sun, Jan 18, 2015 at 8:02 AM, Kevin
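The quorum point in the reply above comes down to simple arithmetic, sketched here in Python. The replication factor of 2 in the usage note is an assumption chosen because it matches the "2 required but only 1 alive" message in the original post; the thread snippet does not state the actual RF, and the function names are illustrative only.

```python
def quorum(replication_factor):
    """Replicas a quorum-based (SERIAL) operation needs: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def can_serve_serial(replication_factor, alive_replicas):
    """True if enough replicas are alive for the CAS (Paxos) phase."""
    return alive_replicas >= quorum(replication_factor)
```

With RF=2, quorum(2) is 2, so a CAS write fails as soon as one replica is down even if the regular consistency level is ONE. With RF=3, quorum(3) is still 2, so one replica can be down.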

“Not enough replica available” when consistency is ONE?

2015-01-18 Thread Kevin Burton
I’m really confused here. I’m calling: acquireInsert.setConsistencyLevel( ConsistencyLevel.ONE ); but I’m still getting the exception: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency SERIAL (2 required but only 1 alive)

is primary key( foo, bar) the same as primary key ( foo ) with a ‘set’ of bars?

2015-01-01 Thread Kevin Burton
I think the two tables are the same. Correct? create table foo ( source text, target text, primary key( source, target ) ) vs create table foo ( source text, target set<text>, primary key( source ) ) … meaning that the first one, under the covers is represented the

Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set’ of bars?

2015-01-01 Thread Kevin Burton
of data 2) collections and maps are loaded entirely by Cassandra for each query, whereas with clustering columns you can select a slice of columns On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote: I think the two tables are the same. Correct? create table foo

Re: limit vs sample for indexing a small amount of data quickly?

2014-12-31 Thread Kevin Burton
, Dec 31, 2014 at 7:09 PM, Ganelin, Ilya ilya.gane...@capitalone.com wrote: You want to use take() or takeOrdered. Sent with Good (www.good.com) -Original Message- *From: *Kevin Burton [bur...@spinn3r.com] *Sent: *Wednesday, December 31, 2014 10:02 PM Eastern Standard Time *To: *u

bootstrapping manually when auto_bootstrap=false ?

2014-12-17 Thread Kevin Burton
I’m trying to figure out the best way to bootstrap our nodes. I *think* I want our nodes to be manually bootstrapped. This way an admin has to explicitly bring up the node in the cluster and I don’t have to worry about a script accidentally provisioning new nodes. The problem is HOW do you do

Re: nodetool breaks on firewall ?

2014-12-13 Thread Kevin Burton
? On Fri, Dec 12, 2014 at 2:34 PM, Kevin Burton bur...@spinn3r.com wrote: Oh. and if I specify --host it still doesn’t work. Very weird. On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com wrote: OK..I’m stracing it and it’s definitely trying to connect to 173… here’s the log line

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
-h 10.1.1.100 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com wrote: I have a firewall I need to bring up to keep our boxes off the Internet (obviously). The problem is that once I do nodetool doesn’t work anymore. There’s a bunch of advice on this on the Internet: http

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
Oh. and if I specify --host it still doesn’t work. Very weird. On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com wrote: OK..I’m stracing it and it’s definitely trying to connect to 173… here’s the log line below. (anonymized). the question is why.. is cassandra configured

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
desire. Something like: nodetool status -h 10.1.1.100 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com wrote: I have a firewall I need to bring up to keep our boxes off the Internet (obviously). The problem is that once I do nodetool doesn’t work anymore. There’s a bunch

nodetool breaks on firewall ?

2014-12-11 Thread Kevin Burton
I have a firewall I need to bring up to keep our boxes off the Internet (obviously). The problem is that once I do nodetool doesn’t work anymore. There’s a bunch of advice on this on the Internet:

does safe cassandra shutdown require disable binary?

2014-11-30 Thread Kevin Burton
I’m trying to figure out a safe way to do a rolling restart. http://devblog.michalski.im/2012/11/25/safe-cassandra-shutdown-and-restart/ It has the following command which makes sense: root@cssa01:~# nodetool -h cssa01.michalski.im disablegossip root@cssa01:~# nodetool -h cssa01.michalski.im

RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
The new SSDs that we have (as well as Fusion IO) in theory can saturate the gigabit ethernet port. The 4k random read and write IOs they’re doing now can easily add up quick and they’re faster than gigabit and even two gigabit. However, not all of that 4k is actually used. I suspect that on

Re: RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
I imagine I’d generally be happy if we were CPU bound :-) … as long as the number of transactions per second is generally reasonable. On Tue, Nov 25, 2014 at 7:35 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Nov 25, 2014 at 5:31 PM, Kevin Burton bur...@spinn3r.com wrote: Curious what

What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Kevin Burton
I’m trying to track down some exceptions in our production cluster. I bumped up our write load and now I’m getting a non-trivial number of these exceptions. Somewhere on the order of 100 per hour. All machines have a somewhat high CPU load because they’re doing other tasks. I’m worried that

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Kevin Burton
There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug. INSERT and UPDATE are not totally orthogonal in CQL and you should use INSERT for actual insertion and UPDATE for updates (granted, the database will not reject our query if you break this rule but it's nonetheless the way it's

IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
There’s still a lot of weirdness in CQL. For example, you can do an INSERT with an UPDATE... which I’m generally fine with. Kind of makes sense. However, with INSERT you can do IF NOT EXISTS. … but you can’t do the same thing on UPDATE. So I foolishly wrote all my code assuming that

Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
you can still do IF on UPDATE though… but it’s not possible to do IF mycolumn IS NULL -- If mycolumn = null should work Alas.. it doesn’t :-/

Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
| val +- 1 | new val (1 rows) On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton bur...@spinn3r.com wrote: you can still do IF on UPDATE though… but it’s not possible to do IF mycolumn IS NULL -- If mycolumn = null should work Alas.. it doesn’t :-/ -- Founder/CEO Spinn3r.com

Re: Reading the write time of each value in a set?

2014-11-16 Thread Kevin Burton
15 2014 at 12:51:55 AM DuyHai Doan doanduy...@gmail.com wrote: Why don't you use map to store write time as value and data as key? Le 15 nov. 2014 00:24, Kevin Burton bur...@spinn3r.com a écrit : I’m trying to build a histogram in CQL for various records. I’d like to keep a max of ten items

conditional batches across two tables?

2014-11-16 Thread Kevin Burton
I’m trying to have some code acquire a lock by first performing a table mutation, and then if it wins, performing a second table insert. I don’t think this is possible with batches though. I don’t think I can say “update this table, and if you are able to set the value, and the value doesn’t

writetime of individual set members, and what happens when you add a set member a second time.

2014-11-15 Thread Kevin Burton
So I think there are some operations in CQL WRT sets/maps that aren’t supported yet or at least not very well documented. For example, you can set the TTL on individual set members, but how do you read the writetime() ? normally on a column I can just SELECT writetime(foo) from my_table; but …

Two writers appending to a set to see which one wins?

2014-11-15 Thread Kevin Burton
I have two tasks trying to each insert into a table. The only problem is that I only want one to win, and then never perform that operation again. So my idea was to use the set append support in Cassandra to attempt to append to the set and if we win, then I can perform my operation. The

Reading the write time of each value in a set?

2014-11-14 Thread Kevin Burton
I’m trying to build a histogram in CQL for various records. I’d like to keep a max of ten items or items with a TTL, but if there are too many items, I’d like to trim it so the max number of records is about 20. So if I exceed 20, I need to remove the oldest records. I’m using a set append so
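A minimal client-side sketch of the trimming step, in Python. It assumes the application can see each item's write time, for example by storing the write time as the value of a map keyed by item as suggested in the reply; trim_oldest is a hypothetical helper and nothing here is a CQL feature.

```python
def trim_oldest(write_times, max_items=20):
    """Given {item: write_time}, return the oldest items to delete
    so that at most max_items remain."""
    excess = len(write_times) - max_items
    if excess <= 0:
        return []
    oldest_first = sorted(write_times, key=write_times.get)
    return oldest_first[:excess]
```

The returned items would then be removed from the collection with a normal update issued by the application.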

OR mapping for set appends…

2014-11-13 Thread Kevin Burton
I’m trying to figure out the best way to handle things like set appends (and other CQL extensions) in traditional OR mapping. Our OR mapper does basic setFoo() .. then save() to write the record back to the database. So if foo is a Set<T> then I can set all members. But I want to do some appends

C* on Fusion IO

2014-11-06 Thread Kevin Burton
We’re looking at switching data centers and they’re offering pretty aggressive pricing on boxes with fusion IO cards. 2x 1.2TB Fusion IO 128GB RAM 20 cores. now.. this isn’t the typical cassandra box. Most people are running multiple nodes to scale out vs scale vertically. But these boxes are

Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
need to repair. Sent from my iPhone On Nov 6, 2014, at 3:40 PM, Kevin Burton bur...@spinn3r.com wrote: We’re looking at switching data centers and they’re offering pretty aggressive pricing on boxes with fusion IO cards. 2x 1.2TB Fusion IO 128GB RAM 20 cores. now.. this isn’t the typical

Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
and never leave first gear? As far as saturating the network goes, I guess that all depends on your workload, and how often you need to repair. Sent from my iPhone On Nov 6, 2014, at 3:40 PM, Kevin Burton bur...@spinn3r.com wrote: We’re looking at switching data centers and they’re offering pretty

Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
On Thu, Nov 6, 2014 at 2:10 PM, Christopher Brodt ch...@uberbrodt.net wrote: Yep. The trouble with FIOs is that they almost completely remove your disk throughput problems, so then you're constrained by CPU. Concurrent compactors and concurrent writes are two params that come to mind but there

Multiple SSD disks per sever? Ideal config?

2014-11-06 Thread Kevin Burton
I’m curious what people are doing with multiple SSDs per server. I think there are two main paths:
- RAID 0 them… the problem here is that RAID 0 is not a panacea and the drives may or may not see better IO throughput.
- use N Cassandra instances per box (or containers) and have one C* node

Re: Multiple SSD disks per sever? Ideal config?

2014-11-06 Thread Kevin Burton
(if have network for it) and compaction throughput if you end up with IO to spare. I generally would not recommend putting multiple C* instances on a single box. --- Chris Lohfink On Thu, Nov 6, 2014 at 5:13 PM, Kevin Burton bur...@spinn3r.com wrote: I’m curious what people are doing

How do you run integration tests for your cassandra code?

2014-10-13 Thread Kevin Burton
Curious to see if any of you have an elegant solution here. Right now I’m using cassandra-unit (https://github.com/jsevellec/cassandra-unit) for my integration tests. The biggest problem is that it doesn’t support shutdown, so I can’t stop or clean up after Cassandra between tests. I have

describe tables… and vertical formatting?

2014-10-12 Thread Kevin Burton
It seems annoying that I can’t get “describe tables” to format its output vertically. Maybe there’s some option I’m missing? Kevin -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts

Re: describe tables… and vertical formatting?

2014-10-12 Thread Kevin Burton
Huh. That sort of works. The problem now is that there are multiple entries per table... On Sun, Oct 12, 2014 at 10:39 AM, graham sanderson gra...@vast.com wrote: select keyspace_name, columnfamily_name from system.schema_columns; ? On Oct 12, 2014, at 10:29 AM, Kevin Burton bur

is lack of full text search hurting cassandra and datastax?

2014-10-02 Thread Kevin Burton
So right now I have plenty of quality and robust full text search systems I can use. SolrCloud, Elasticsearch. They all also have very robust UIs on top of them… Kibana, Banana, etc. And my alternative for Cassandra is… paying for a proprietary database. Which might be fine for some parties…

Unable to query with token range.. unable to make long from ‘...'

2014-09-28 Thread Kevin Burton
I’m trying to query an entire table in parallel by splitting it up into token ranges. However, it’s not working because I get this: cqlsh:blogindex> select token(hashcode), hashcode from source where token(hashcode) >= 0 and token(hashcode) <= 17014118346046923173168730371588410572 limit 10; Bad

Re: Unable to query with token range.. unable to make long from ‘...'

2014-09-28 Thread Kevin Burton
? On Sep 28, 2014, at 1:39 PM, Kevin Burton bur...@spinn3r.com wrote: I’m trying to query an entire table in parallel by splitting it up in token ranges. However, it’s not working because I get this: cqlsh:blogindex> select token(hashcode), hashcode from source where token(hashcode) >= 0 and token

Re: Unable to query with token range.. unable to make long from ‘...'

2014-09-28 Thread Kevin Burton
On Sep 28, 2014, at 5:55 PM, Kevin Burton bur...@spinn3r.com wrote: Hm.. is it 64 bits or 128 bits? I’m using Murmur3Partitioner … I can’t find any documentation on it (as usual.. ha) This says: http://www.datastax.com/docs/1.1/initialize/token_generation The tokens assigned to your nodes
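
On the 64-bit versus 128-bit question: Murmur3Partitioner tokens are signed 64-bit values (-2^63 to 2^63-1), while the older RandomPartitioner uses 0 to 2^127-1. The literal in the failing query looks like a RandomPartitioner-style value, which would be consistent with the "unable to make long" error in the subject. A minimal Python sketch of that check follows; the helper name `fits_murmur3` is hypothetical and not part of any driver.

```python
# Token bounds for the two partitioners.
MURMUR3_MIN = -(2**63)
MURMUR3_MAX = 2**63 - 1
RANDOM_MAX = 2**127 - 1

def fits_murmur3(token: int) -> bool:
    """True if the token fits in the signed 64-bit Murmur3Partitioner range."""
    return MURMUR3_MIN <= token <= MURMUR3_MAX

# The upper bound used in the query quoted in this thread.
token_from_query = 17014118346046923173168730371588410572
```

Here `fits_murmur3(token_from_query)` is false even though the value is below `RANDOM_MAX`, so the bound would need to be rescaled to the 64-bit range before it can be used with Murmur3Partitioner.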

paging through an entire table in chunks?

2014-09-27 Thread Kevin Burton
I need a way to do a full table scan across all of our data. Can’t I just use token() for this? This way I could split up our entire keyspace into, say, 1024 chunks, and then have one ActiveMQ task work with range 0, then range 1, etc… That way I can easily just map() my whole table. and since
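
The chunking idea above can be sketched as follows, assuming Murmur3Partitioner (signed 64-bit tokens). The function names are hypothetical, and the table `source` and key `hashcode` in the usage note are borrowed from the token-range thread listed above purely for illustration.

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

def split_token_ring(chunks):
    """Split the full Murmur3 token range into contiguous inclusive ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    for i in range(chunks):
        start = MIN_TOKEN + (total * i) // chunks
        end = MIN_TOKEN + (total * (i + 1)) // chunks - 1
        ranges.append((start, end))
    return ranges

def range_query(table, key, start, end):
    """Build the CQL text for one chunk (inclusive bounds on both sides)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({key}) >= {start} AND token({key}) <= {end}"
    )
```

Each task would take one element of `split_token_ring(1024)` and run the statement from `range_query("source", "hashcode", start, end)`. Because the ranges are inclusive and adjacent, every token falls into exactly one chunk.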
