Re: [EXTERNAL] Re: Connection Pooling in v4.x Java Driver

2019-12-11 Thread Caravaggio, Kevin
Hi Alexandre,


Thank you for the explanation. I understand that reasoning very well now.

Jon, appreciate the link, and will follow up there for this sort of thing then.


Thanks,


Kevin
From: Alexandre Dutra 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, December 11, 2019 at 3:33 AM
To: "user@cassandra.apache.org" 
Subject: [EXTERNAL] Re: Connection Pooling in v4.x Java Driver


*EXTERNAL SENDER*

Hi,

In driver 4.x, pools do not resize dynamically anymore because the ratio 
between concrete benefits brought by this feature and the maintenance burden it 
caused was largely unfavorable: most bugs related to connection pooling in 
driver 3.x were caused by the dynamic pool resizing. Having a fixed pool size 
made driver 4.x pool implementation much more straightforward and robust.

Thanks,

Alex Dutra


On Tue, Dec 10, 2019 at 7:13 PM Jon Haddad 
mailto:j...@jonhaddad.com>> wrote:
I'm not sure how closely the driver maintainers are following this list. You 
might want to ask on the Java Driver mailing list: 
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user<https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttps-3A__groups.google.com_a_lists.datastax.com_forum_-23-21forum_java-2Ddriver-2Duser%26d%3DDwMFaQ%26c%3Dadz96Xi0w1RHqtPMowiL2g%26r%3D-4a_H5oaBvlvMCJS3nMc0tpTTKGn9aXRaK9-aUPruik%26m%3DJx5gtMMrBVRkCsov03DMaauB_eEaFXAH27tt5P92ocs%26s%3D33hNmSDIOrCXZpQkgaLrhmG1_PiiZtq6A9cU21_QvYg%26e%3D=02%7C01%7Ckevin.caravaggio%40lowes.com%7C12db624b9feb427e36d708d77e2ddee1%7Cbcfa3e87841e48c7983b584159dd1a69%7C0%7C0%7C637116607802385179=%2FEf1yJvaSBqFymYKqxYW9TN7wxjWqsQB5YWikmemMEk%3D=0>




On Tue, Dec 10, 2019 at 5:10 PM Caravaggio, Kevin 
mailto:kevin.caravag...@lowes.com>> wrote:
Hello,


When integrating with DataStax OSS Cassandra Java driver v4.x, I noticed 
“Unlike previous versions of the driver, pools do not resize 
dynamically”<https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.datastax.com%2Fen%2Fdeveloper%2Fjava-driver%2F4.2%2Fmanual%2Fcore%2Fpooling%2F%23configuration=02%7C01%7Ckevin.caravaggio%40lowes.com%7C12db624b9feb427e36d708d77e2ddee1%7Cbcfa3e87841e48c7983b584159dd1a69%7C0%7C1%7C637116607802395172=LcEwc6uu4ZqajzYG0fUTuRNLzzldH48vfT0ae7ARkco%3D=0>
 in reference to the connection pool configuration. Is anyone aware of the 
reasoning for this departure from dynamic pool sizing, which I believe was 
available in v3.x?


Thanks,


Kevin


NOTICE: All information in and attached to the e-mails below may be 
proprietary, confidential, privileged and otherwise protected from improper or 
erroneous disclosure. If you are not the sender's intended recipient, you are 
not authorized to intercept, read, print, retain, copy, forward, or disseminate 
this message. If you have erroneously received this communication, please 
notify the sender immediately by phone (704-758-1000) or by e-mail and destroy 
all copies of this message electronic, paper, or otherwise. By transmitting 
documents via this email: Users, Customers, Suppliers and Vendors collectively 
acknowledge and agree the transmittal of information via email is voluntary, is 
offered as a convenience, and is not a secured method of communication; Not to 
transmit any payment information E.G. credit card, debit card, checking 
account, wire transfer information, passwords, or sensitive and personal 
information E.G. Driver's license, DOB, social security, or any other 
information the user wishes to remain confidential; To transmit only 
non-confidential information such as plans, pictures and drawings and to assume 
all risk and liability for and indemnify Lowe's from any claims, losses or 
damages that may arise from the transmittal of documents or including 
non-confidential information in the body of an email transmittal. Thank you.


--

Alexandre Dutra  |  Technical Manager, Drivers

alexandre.du...@datastax.com<mailto:alexandre.du...@datastax.com>  |  
datastax.com<https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.datastax.com%2F=02%7C01%7Ckevin.caravaggio%40lowes.com%7C12db624b9feb427e36d708d77e2ddee1%7Cbcfa3e87841e48c7983b584159dd1a69%7C0%7C1%7C637116607802395172=eCb2ZtUikUUxrzbrkaLxXxtKg5g8tEksCWhhF6PKfcA%3D=0>

[Image removed by 
sender.]<https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Falexandredutra%2F=02%7C01%7Ckevin.caravaggio%40lowes.com%7C12db624b9feb427e36d708d77e2ddee1%7Cbcfa3e87841e48c7983b584159dd1a69%7C0%7C1%7C637116607802405166=VarX7YEU%2Br6eH3mtC56K6ZhaeQWn7Yb6Sh5yEPtCPkQ%3D=0>[Image
 removed by 
sender.]<https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.facebook.com%2Fdatastax=02%7C01%7Ckevin.caravaggio%40lowes.com%7C12db624b9feb427e36d708d77e2ddee1%7Cbcfa3e87841e48c7983b584159dd1a69%7C0%7C0%7C637116607802405166=y9r%2B2fgD6B16sBx5Cz%2

Connection Pooling in v4.x Java Driver

2019-12-10 Thread Caravaggio, Kevin
Hello,


When integrating with DataStax OSS Cassandra Java driver v4.x, I noticed 
“Unlike previous versions of the driver, pools do not resize 
dynamically”<https://docs.datastax.com/en/developer/java-driver/4.2/manual/core/pooling/#configuration>
 in reference to the connection pool configuration. Is anyone aware of the 
reasoning for this departure from dynamic pool sizing, which I believe was 
available in v3.x?


Thanks,


Kevin


NOTICE: All information in and attached to the e-mails below may be 
proprietary, confidential, privileged and otherwise protected from improper or 
erroneous disclosure. If you are not the sender's intended recipient, you are 
not authorized to intercept, read, print, retain, copy, forward, or disseminate 
this message. If you have erroneously received this communication, please 
notify the sender immediately by phone (704-758-1000) or by e-mail and destroy 
all copies of this message electronic, paper, or otherwise. By transmitting 
documents via this email: Users, Customers, Suppliers and Vendors collectively 
acknowledge and agree the transmittal of information via email is voluntary, is 
offered as a convenience, and is not a secured method of communication; Not to 
transmit any payment information E.G. credit card, debit card, checking 
account, wire transfer information, passwords, or sensitive and personal 
information E.G. Driver's license, DOB, social security, or any other 
information the user wishes to remain confidential; To transmit only 
non-confidential information such as plans, pictures and drawings and to assume 
all risk and liability for and indemnify Lowe's from any claims, losses or 
damages that may arise from the transmittal of documents or including 
non-confidential information in the body of an email transmittal. Thank you.


Re: Adding a new node with the double of disk space

2017-08-17 Thread Kevin O'Connor
Are you saying if a node had double the hardware capacity in every way it
would be a bad idea to up num_tokens? I thought that was the whole idea of
that setting though?

On Thu, Aug 17, 2017 at 9:52 AM, Carlos Rolo  wrote:

> No.
>
> If you would double all the hardware on that node vs the others would
> still be a bad idea.
> Keep the cluster uniform vnodes wise.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> *linkedin.com/in/carlosjuzarterolo
> *
> Mobile: +351 918 918 100 <+351%20918%20918%20100>
> www.pythian.com
>
> On Thu, Aug 17, 2017 at 5:47 PM, Cogumelos Maravilha <
> cogumelosmaravi...@sapo.pt> wrote:
>
>> Hi all,
>>
>> I need to add a new node to my cluster but this time the new node will
>> have the double of disk space comparing to the other nodes.
>>
>> I'm using the default vnodes (num_tokens: 256). To fully use the disk
>> space in the new node I just have to configure num_tokens: 512?
>>
>> Thanks in advance.
>>
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
> --
>
>
>
>


unsubscribe

2017-07-17 Thread kevin
unsubscribe





Re: Truncate data from a single node

2017-07-12 Thread Kevin O'Connor
Thanks for the suggestions! Could altering the RF from 2 to 1 cause any
issues, or will it basically just be changing the coordinator's write paths
and also guiding future repairs/cleans?

On Wed, Jul 12, 2017 at 22:29 Jeff Jirsa <jji...@apache.org> wrote:

>
>
> On 2017-07-11 20:09 (-0700), "Kevin O'Connor" <ke...@reddit.com.INVALID>
> wrote:
> > This might be an interesting question - but is there a way to truncate
> data
> > from just a single node or two as a test instead of truncating from the
> > entire cluster? We have time series data we don't really care if we're
> > missing gaps in, but it's taking up a huge amount of space and we're
> > looking to clear some. I'm worried if we run a truncate on this huge CF
> > it'll end up locking up the cluster, but I don't care so much if it just
> > kills a single node.
> >
>
> IF YOU CAN TOLERATE DATA INCONSISTENCIES, You can stop a node, delete some
> sstables, and start it again. The risk in deleting arbitrary sstables is
> that you may remove a tombstone and bring data back to life, or remove the
> only replica with a write if you write at CL:ONE, but if you're OK with
> data going missing, you won't hurt much as long as you stop cassandra
> before you go killing sstables.
>
> TWCS does make this easier, because you can use sstablemetadata to
> identify timestamps/tombstone %s, and then nuke sstables that are
> old/mostly-expired first.
>
>
> > Is doing something like deleting SSTables from disk possible? If I alter
> > this keyspace from an RF of 2 down to 1 and then delete them, they won't
> be
> > able to be repaired if I'm thinking this through right.
> >
>
> If you drop RF from 2 to 1, you can just run cleanup and delete half the
> data (though it'll rewrite sstables to do it, which will be a short term
> increase).
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Truncate data from a single node

2017-07-11 Thread Kevin O'Connor
This might be an interesting question - but is there a way to truncate data
from just a single node or two as a test instead of truncating from the
entire cluster? We have time series data we don't really care if we're
missing gaps in, but it's taking up a huge amount of space and we're
looking to clear some. I'm worried if we run a truncate on this huge CF
it'll end up locking up the cluster, but I don't care so much if it just
kills a single node.

Is doing something like deleting SSTables from disk possible? If I alter
this keyspace from an RF of 2 down to 1 and then delete them, they won't be
able to be repaired if I'm thinking this through right.

Thanks!


Re: How to avoid flush if the data can fit into memtable

2017-05-31 Thread Kevin O'Connor
Great post Akhil! Thanks for explaining that.

On Mon, May 29, 2017 at 5:43 PM, Akhil Mehra  wrote:

> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might
> be due to data compression. By default, all tables enable SSTable
> compression.
>
> Let go through your scenario. Let's say you have allocated 4GB to your
> Cassandra node. Your *memtable_heap_space_in_mb* and
>
> *memtable_offheap_space_in_mb  *will roughly come to around 1GB. Since
> you have memtable_cleanup_threshold to .50 table cleanup will be
> triggered when total allocated memtable space exceeds 1/2GB. Note the
> cleanup threshold is .50 of 1GB and not a combination of heap and off heap
> space. This memtable allocation size is the total amount available for all
> tables on your node. This includes all system related keyspaces. The
> cleanup process will write the largest memtable to disk.
>
> For your case, I am assuming that you are on a *single node with only one
> table with insert activity*. I do not think the commit log will trigger a
> flush in this circumstance as by default the commit log has 8192 MB of
> space unless the commit log is placed on a very small disk.
>
> I am assuming your table on disk is smaller than 500MB because of
> compression. You can disable compression on your table and see if this
> helps get the desired size.
>
> I have written up a blog post explaining memtable flushing (
> http://abiasforaction.net/apache-cassandra-memtable-flush/)
>
> Let me know if you have any other question.
>
> I hope this helps.
>
> Regards,
> Akhil Mehra
>
>
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi 
> wrote:
>
>> I agree that for such a small data, Cassandra is obviously not needed.
>> However, this is purely an experimental setup by using which I'm trying to
>> understand how and exactly when memtable flush is triggered. As I mentioned
>> in my post, I read the documentation and tweaked the parameters accordingly
>> so that I never hit memtable flush but it is still doing that. As far the
>> the setup is concerned, I'm just using 1 node and running Cassandra using
>> "cassandra -R" option and then running some queries to insert some dummy
>> data.
>>
>> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
>> and add "durable_writes=false" in the keyspace_definition.
>>
>> @Daemeon - The previous post lead to this post but since I was unaware of
>> memtable flush and I assumed memtable flush wasn't happening, the previous
>> post was about something else (throughput/latency etc.). This post is
>> explicitly about exactly when memtable is being dumped to the disk. Didn't
>> want to confuse two different goals that's why posted a new one.
>>
>> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
>>
>>> It doesn't have to fit in memory. If your key distribution has strong
>>> temporal locality, then a larger memtable that can coalesce overwrites
>>> greatly reduces the disk I/O load for the memtable flush and subsequent
>>> compactions. Of course, I have no idea if the is what the OP had in mind.
>>>
>>>
>>> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>>>
>>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
>>> after waking up.
>>>
>>> What I'm asking is why does the OP want to keep his data in the memtable
>>> exclusively?  If the goal is to "make reads fast", then just turn on row
>>> caching.
>>>
>>> If there's so little data that it fits in memory (300MB), and there
>>> aren't going to be any writes past the initial small dataset, why use
>>> Cassandra?  It sounds like the wrong tool for this job.  Sounds like
>>> something that could easily be stored in S3 and loaded in memory when the
>>> app is fired up.
>>>
>>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>>>
 Not sure whether you're asking me or the original poster, but the more
 times data gets overwritten in a memtable, the less it has to be compacted
 later on (and even without overwrites, larger memtables result in less
 compaction).

 On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

 Why do you think keeping your data in the memtable is a what you need
 to do?
 On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the memtable occasionally, or the commit log
>> grows without bounds.
>>
>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>
>> Hi,
>>
>> I'm running Cassandra with a very small dataset so that the data can
>> exist on memtable only. Below are my configurations:
>>
>> In jvm.options:
>>
>> 

Fail to add a new node to a exist cluster

2017-05-03 Thread kevin
I have a Cassandra(v3.7) cluster with 31 nodes, each node’s hard configuration 
is 4cpu, 64GB memory, 8TB hard disk, and each node has stored about 4TB of data 
now. When I joined a new node, I found that the process has not been completed 
for more than a week, while the CPU load of new node and some other nodes 
continued to be high, and finally had to give up join. Is it a new node to join 
the process itself is very slow, or our way of use (too much data per node) 
improper and cause this problem? Is there any good way to speed up the process 
of adding new nodes?
Thanks,
Kevin








iostat -like tool to parse 'nodetool cfstats'

2016-12-20 Thread Kevin Burton
nodetool cfstats has some valuable data but what I would like is a 1 minute
delta.

Similar to iostat...

It's easy to parse this but has anyone done it?

I want to see IO throughput and load on C* for each table.

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: STCS Compaction with wide rows & TTL'd data

2016-09-02 Thread Kevin O'Connor
On Fri, Sep 2, 2016 at 9:33 AM, Mark Rose <markr...@markrose.ca> wrote:

> Hi Kevin,
>
> The tombstones will live in an sstable until it gets compacted. Do you
> have a lot of pending compactions? If so, increasing the number of
> parallel compactors may help.


Nope, we are pretty well managed on compactions. Only ever 1 or 2 running
at a time per node.


> You may also be able to tun the STCS
> parameters. Here's a good explanation of how it works:
> https://shrikantbang.wordpress.com/2014/04/22/size-
> tiered-compaction-strategy-in-apache-cassandra/


Yeah interesting - I'd like to try that. Is there a way to verify what the
settings are before changing them? DESCRIBE TABLE doesn't seem to show the
compaction subproperties.


> Anyway, LCS would probably be a better fit for your use case. LCS
> would help with eliminating tombstones, but it may also result in
> dramatically higher CPU usage for compaction. If LCS compaction can
> keep up, in addition to getting ride of tombstones faster, LCS should
> reduce the number of sstables that must be read to return the row and
> have a positive impact on read latency. STCS is a bad fit for rows
> that are updated frequently (which includes rows with TTL'ed data).
>

Thanks - that may end up being where we go with this.

Also, you may have an error in your application design. OAuth Access
> Tokens are designed to have a very short lifetime of seconds or
> minutes. On access token expiry, a Refresh Token should be used to get
> a new access token. A long-lived access token is a dangerous thing as
> there is no way to disable it (refresh tokens should be disabled to
> prevent the creation of new access tokens).
>

Yeah, noted. We only allow longer lived access tokens in some very specific
scenarios, so they are much less likely to be in that CF than the standard
3600s ones, but they're there.


>
> -Mark
>
> On Thu, Sep 1, 2016 at 3:53 AM, Kevin O'Connor <ke...@reddit.com> wrote:
> > We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken
> and
> > one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the row
> > key, and the columns are some data about the OAuth token. There's a TTL
> set
> > on it, usually 3600, but can be higher (up to 1 month).
> > OAuth2AccessTokensByUser has the user as the row key, and then all of the
> > user's token identifiers as column values. Each of the column values has
> a
> > TTL that is set to the same as the access token it corresponds to.
> >
> > The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
> > OAuth2AccessTokensByUser CF takes around ~110 GB. If I use
> sstablemetadata,
> > I can see the droppable tombstones ratio is around 90% for the larger
> > sstables.
> >
> > My question is - why aren't these tombstones getting compacted away? I'm
> > guessing that it's because we use STCS and the large sstables that have
> > built up over time are never considered for compaction. Would LCS be a
> > better fit for the issue of trying to keep the tombstones in check?
> >
> > I've also tried forceUserDefinedCompaction via JMX on some of the largest
> > sstables and it just creates a new sstable of the exact same size, which
> was
> > pretty surprising. Why would this explicit request to compact an sstable
> not
> > remove tombstones?
> >
> > Thanks!
> >
> > Kevin
>


STCS Compaction with wide rows & TTL'd data

2016-09-01 Thread Kevin O'Connor
We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken and
one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the row
key, and the columns are some data about the OAuth token. There's a TTL set
on it, usually 3600, but can be higher (up to 1 month).
OAuth2AccessTokensByUser has the user as the row key, and then all of the
user's token identifiers as column values. Each of the column values has a
TTL that is set to the same as the access token it corresponds to.

The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
OAuth2AccessTokensByUser CF takes around ~110 GB. If I use sstablemetadata,
I can see the droppable tombstones ratio is around 90% for the larger
sstables.

My question is - why aren't these tombstones getting compacted away? I'm
guessing that it's because we use STCS and the large sstables that have
built up over time are never considered for compaction. Would LCS be a
better fit for the issue of trying to keep the tombstones in check?

I've also tried forceUserDefinedCompaction via JMX on some of the largest
sstables and it just creates a new sstable of the exact same size, which
was pretty surprising. Why would this explicit request to compact an
sstable not remove tombstones?

Thanks!

Kevin


Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-04 Thread Kevin Burton
BTW. we think we tracked this down to using large partitions to implement
inverted indexes.  C* just doesn't do a reasonable job at all with large
partitions so we're going to migrate this use case to using Elasticsearch

On Wed, Aug 3, 2016 at 1:54 PM, Ben Slater 
wrote:

> Yep,  that was what I was referring to.
>
>
> On Thu, 4 Aug 2016 2:24 am Reynald Bourtembourg <
> reynald.bourtembo...@esrf.fr> wrote:
>
>> Hi,
>>
>> Maybe Ben was referring to this issue which has been mentioned recently
>> on this mailing list:
>> https://issues.apache.org/jira/browse/CASSANDRA-11887
>>
>> Cheers,
>> Reynald
>>
>>
>> On 03/08/2016 18:09, Romain Hardouin wrote:
>>
>> > Curious why the 2.2 to 3.x upgrade path is risky at best.
>> I guess that upgrade from 2.2 is less tested by DataStax QA because DSE4
>> used C* 2.1, not 2.2.
>> I would say the safest upgrade is 2.1 to 3.0.x
>>
>> Best,
>>
>> Romain
>>
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
Yes.. Logging it is far far far far better.

I think a lot of devs don't have experience working in actual production
environments.  YES the client should probably handle it, but WHICH client.
This is why you log things.  Log the statement that was aborted (at least
the first 100 bytes),

On Wed, Aug 3, 2016 at 2:30 PM, Ryan Svihla <r...@foundev.pro> wrote:

> Where I see this a lot is:
>
> 1. DBA notices it in logs
> 2. Everyone says code works fine no errors
> 3. Weeks of combing all apps find out 3 teams are doing fire and forget
> futures...
> 4. Convince each team they really need to handle futures
> 5. Couple months before you figure out who was the culprit by the time he
> deploys hit production.
>
> Would save everyone a ton of brain cells if we just logged it.
>
> Regards,
>
> Ryan Svihla
>
> On Aug 3, 2016, at 4:21 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> I haven't verified, so i'm not 100% certain, but I believe you'd get back
> an exception to the client.  Yes, this belongs in the DB, but I don't think
> you're totally blind to what went wrong.
>
> My guess is this exception in the Python driver (but other drivers should
> have a similar exception):
> https://github.com/datastax/python-driver/blob/master/cassandra/protocol.py#L288
>
> On Wed, Aug 3, 2016 at 1:59 PM Ryan Svihla <r...@foundev.pro> wrote:
>
>> Made a Jira about it already
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231
>>
>> Regards,
>>
>> Ryan Svihla
>>
>> On Aug 3, 2016, at 2:58 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>> It seems these are basically impossible to track down.
>>
>>
>> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
>>
>> has some information but their work around is to increase the transaction
>> log.  There's no way to find out WHAT client or what CQL is causing the
>> large mutation.
>>
>> Any thoughts on how to mitigate this?
>>
>> Kevin
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
It seems these are basically impossible to track down.

https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-

has some information but their work around is to increase the transaction
log.  There's no way to find out WHAT client or what CQL is causing the
large mutation.

Any thoughts on how to mitigate this?

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
We usually use 100 per every 5 minutes.. but you're right.  We might
actually move this use case over to using Elasticsearch in the next couple
of weeks.

On Wed, Aug 3, 2016 at 11:09 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Kevin,
>
> "Our scheme uses large buckets of content where we write to a
> bucket/partition for 5 minutes, then move to a new one."
>
> Are you writing to a single partition and only that partition for 5
> minutes?  If so, you should really rethink your data model.  This method
> does not scale as you add nodes, it can only scale vertically.
>
> On Wed, Aug 3, 2016 at 9:24 AM Reynald Bourtembourg <
> reynald.bourtembo...@esrf.fr> wrote:
>
>> Hi,
>>
>> Maybe Ben was referring to this issue which has been mentioned recently
>> on this mailing list:
>> https://issues.apache.org/jira/browse/CASSANDRA-11887
>>
>> Cheers,
>> Reynald
>>
>>
>> On 03/08/2016 18:09, Romain Hardouin wrote:
>>
>> > Curious why the 2.2 to 3.x upgrade path is risky at best.
>> I guess that upgrade from 2.2 is less tested by DataStax QA because DSE4
>> used C* 2.1, not 2.2.
>> I would say the safest upgrade is 2.1 to 3.0.x
>>
>> Best,
>>
>> Romain
>>
>>
>>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
DuyHai.  Yes.  We're generally happy with our disk throughput.  We're on
all SSD and have about 60 boxes.  The amount of data written isn't THAT
much.  Maybe 5GB max... but its over 60 boxes.



On Wed, Aug 3, 2016 at 3:49 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> On a side node, do you monitor your disk I/O to see whether the disk
> bandwidth can catch up with the huge spikes in write ? Use dstat during the
> insert storm to see if you have big values for CPU wait
>
> On Wed, Aug 3, 2016 at 12:41 PM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>> Yes, looks like you have a (at least one) 100MB partition which is big
>> enough to cause issues. When you do lots of writes to the large partition
>> it is likely to end up getting compacted (as per the log) and compactions
>> often use a lot of memory / cause a lot of GC when they hit large
>> partitions. This, in addition to the write load is probably pushing you
>> over the edge.
>>
>> There are some improvements in 3.6 that might help (
>> https://issues.apache.org/jira/browse/CASSANDRA-11206) but the 2.2 to
>> 3.x upgrade path seems risky at best at the moment. In any event, your best
>> solution would be to find a way to make your partitions smaller (like
>> 1/10th of the size).
>>
>> Cheers
>> Ben
>> <https://issues.apache.org/jira/browse/CASSANDRA-11206>
>>
>> On Wed, 3 Aug 2016 at 12:35 Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> I have a theory as to what I think is happening here.
>>>
>>> There is a correlation between the massive content all at once, and our
>>> outags.
>>>
>>> Our scheme uses large buckets of content where we write to a
>>> bucket/partition for 5 minutes, then move to a new one.  This way we can
>>> page through buckets.
>>>
>>> I think what's happening is that CS is reading the entire partition into
>>> memory, then slicing through it... which would explain why its running out
>>> of memory.
>>>
>>> system.log:WARN  [CompactionExecutor:294] 2016-08-03 02:01:55,659
>>> BigTableWriter.java:184 - Writing large partition
>>> blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)
>>>
>>> On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM
>>>> allocated to each C* node.  We're aware of the recommended 8GB limit to
>>>> keep GCs low but our memory has been creeping up (probably) related to this
>>>> bug.
>>>>
>>>> Here's what we're seeing... if we do a low level of writes we think
>>>> everything generally looks good.
>>>>
>>>> What happens is that we then need to catch up and then do a TON of
>>>> writes all in a small time window.  Then CS nodes start dropping like
>>>> flies.  Some of them just GC frequently and are able to recover. When they
>>>> GC like this we see GC pause in the 30 second range which then cause them
>>>> to not gossip for a while and they drop out of the cluster.
>>>>
>>>> This happens as a flurry around the cluster so we're not always able to
>>>> catch which ones are doing it as they recover. However, if we have 3 down,
>>>> we mostly have a locked up cluster.  Writes don't complete and our app
>>>> essentially locks up.
>>>>
>>>> SOME of the boxes never recover. I'm in this state now.  We have t3-5
>>>> nodes that are in GC storms which they won't recover from.
>>>>
>>>> I reconfigured the GC settings to enable jstat.
>>>>
>>>> I was able to catch it while it was happening:
>>>>
>>>> ^Croot@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>>>>   S0 S1 E  O  M CCSYGC YGCTFGCFGCT
>>>> GCT
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471
>>>> 1139.142 282

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
Curious why the 2.2 to 3.x upgrade path is risky at best. Do you mean that
this is just for OUR use case since we're having some issues or that the
upgrade path is risky in general?

On Wed, Aug 3, 2016 at 3:41 AM, Ben Slater <ben.sla...@instaclustr.com>
wrote:

> Yes, looks like you have a (at least one) 100MB partition which is big
> enough to cause issues. When you do lots of writes to the large partition
> it is likely to end up getting compacted (as per the log) and compactions
> often use a lot of memory / cause a lot of GC when they hit large
> partitions. This, in addition to the write load is probably pushing you
> over the edge.
>
> There are some improvements in 3.6 that might help (
> https://issues.apache.org/jira/browse/CASSANDRA-11206) but the 2.2 to 3.x
> upgrade path seems risky at best at the moment. In any event, your best
> solution would be to find a way to make your partitions smaller (like
> 1/10th of the size).
>
> Cheers
> Ben
> <https://issues.apache.org/jira/browse/CASSANDRA-11206>
>
> On Wed, 3 Aug 2016 at 12:35 Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I have a theory as to what I think is happening here.
>>
>> There is a correlation between the massive content all at once, and our
>> outags.
>>
>> Our scheme uses large buckets of content where we write to a
>> bucket/partition for 5 minutes, then move to a new one.  This way we can
>> page through buckets.
>>
>> I think what's happening is that CS is reading the entire partition into
>> memory, then slicing through it... which would explain why its running out
>> of memory.
>>
>> system.log:WARN  [CompactionExecutor:294] 2016-08-03 02:01:55,659
>> BigTableWriter.java:184 - Writing large partition
>> blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)
>>
>> On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM
>>> allocated to each C* node.  We're aware of the recommended 8GB limit to
>>> keep GCs low but our memory has been creeping up (probably) related to this
>>> bug.
>>>
>>> Here's what we're seeing... if we do a low level of writes we think
>>> everything generally looks good.
>>>
>>> What happens is that we then need to catch up and then do a TON of
>>> writes all in a small time window.  Then CS nodes start dropping like
>>> flies.  Some of them just GC frequently and are able to recover. When they
>>> GC like this we see GC pause in the 30 second range which then cause them
>>> to not gossip for a while and they drop out of the cluster.
>>>
>>> This happens as a flurry around the cluster so we're not always able to
>>> catch which ones are doing it as they recover. However, if we have 3 down,
>>> we mostly have a locked up cluster.  Writes don't complete and our app
>>> essentially locks up.
>>>
>>> SOME of the boxes never recover. I'm in this state now.  We have t3-5
>>> nodes that are in GC storms which they won't recover from.
>>>
>>> I reconfigured the GC settings to enable jstat.
>>>
>>> I was able to catch it while it was happening:
>>>
>>> ^Croot@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>>>   S0 S1 E  O  M CCSYGC YGCTFGCFGCT
>>>   GCT
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
>>> 2825.332
>>>
>>> ... as you can see the box is legitimately out of memory.  S0, S1, E and
>>> O are all completely full.
>>>
>>> I'm not sure were to go from here.  I think 20GB for our work load is
>>> more than reasonable.
>>>
>>> 90% of the time they're well below 10GB of RAM used.  While I was
>>> watching this box I was seeing 30% RAM used until it decided to climb to
>>> 100%
>>>
>>> Any advice on what do do next... I don't see anything obvious in the
>>> logs to signal a problem.
>>>
>>> 

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
I have a theory as to what I think is happening here.

There is a correlation between the massive content all at once, and our
outags.

Our scheme uses large buckets of content where we write to a
bucket/partition for 5 minutes, then move to a new one.  This way we can
page through buckets.

I think what's happening is that CS is reading the entire partition into
memory, then slicing through it... which would explain why its running out
of memory.

system.log:WARN  [CompactionExecutor:294] 2016-08-03 02:01:55,659
BigTableWriter.java:184 - Writing large partition
blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)

On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated
> to each C* node.  We're aware of the recommended 8GB limit to keep GCs low
> but our memory has been creeping up (probably) related to this bug.
>
> Here's what we're seeing... if we do a low level of writes we think
> everything generally looks good.
>
> What happens is that we then need to catch up and then do a TON of writes
> all in a small time window.  Then CS nodes start dropping like flies.  Some
> of them just GC frequently and are able to recover. When they GC like this
> we see GC pause in the 30 second range which then cause them to not gossip
> for a while and they drop out of the cluster.
>
> This happens as a flurry around the cluster so we're not always able to
> catch which ones are doing it as they recover. However, if we have 3 down,
> we mostly have a locked up cluster.  Writes don't complete and our app
> essentially locks up.
>
> SOME of the boxes never recover. I'm in this state now.  We have t3-5
> nodes that are in GC storms which they won't recover from.
>
> I reconfigured the GC settings to enable jstat.
>
> I was able to catch it while it was happening:
>
> ^Croot@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>   S0 S1 E  O  M CCSYGC YGCTFGCFGCT
> GCT
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
> 2825.332
>
> ... as you can see the box is legitimately out of memory.  S0, S1, E and O
> are all completely full.
>
> I'm not sure were to go from here.  I think 20GB for our work load is more
> than reasonable.
>
> 90% of the time they're well below 10GB of RAM used.  While I was watching
> this box I was seeing 30% RAM used until it decided to climb to 100%
>
> Any advice on what do do next... I don't see anything obvious in the logs
> to signal a problem.
>
> I attached all the command line arguments we use.  Note that I think that
> the cassandra-env.sh script puts them in there twice.
>
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms2M
> -Xmx2M
> -Xmn4096M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=103
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> -XX:CompileCommandFile=/hotspot_compiler
> -XX:CMSWaitDuration=1
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSWaitDuration=1
> -XX:+UseCondCardMark
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:PrintFLSStatistics=1
> -Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true
> -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
> -XX:+UnlockCommercialFeatures
> -XX:+FlightRecorder
> -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms2M
> -Xmx2M
> -Xmn4096M
> -XX:+HeapDumpOnO

Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated
to each C* node.  We're aware of the recommended 8GB limit to keep GCs low
but our memory has been creeping up (probably) related to this bug.

Here's what we're seeing... if we do a low level of writes we think
everything generally looks good.

What happens is that we then need to catch up and then do a TON of writes
all in a small time window.  Then CS nodes start dropping like flies.  Some
of them just GC frequently and are able to recover. When they GC like this
we see GC pause in the 30 second range which then cause them to not gossip
for a while and they drop out of the cluster.

This happens as a flurry around the cluster so we're not always able to
catch which ones are doing it as they recover. However, if we have 3 down,
we mostly have a locked up cluster.  Writes don't complete and our app
essentially locks up.

SOME of the boxes never recover. I'm in this state now.  We have t3-5 nodes
that are in GC storms which they won't recover from.

I reconfigured the GC settings to enable jstat.

I was able to catch it while it was happening:

^Croot@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
  S0 S1 E  O  M CCSYGC YGCTFGCFGCT
GCT
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332
  0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142
2825.332

... as you can see the box is legitimately out of memory.  S0, S1, E and O
are all completely full.

I'm not sure were to go from here.  I think 20GB for our work load is more
than reasonable.

90% of the time they're well below 10GB of RAM used.  While I was watching
this box I was seeing 30% RAM used until it decided to climb to 100%

Any advice on what do do next... I don't see anything obvious in the logs
to signal a problem.

I attached all the command line arguments we use.  Note that I think that
the cassandra-env.sh script puts them in there twice.

-ea
-javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
-XX:+CMSClassUnloadingEnabled
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms2M
-Xmx2M
-Xmn4096M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:StringTableSize=103
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
-XX:CompileCommandFile=/hotspot_compiler
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=1
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.rmi.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Djava.library.path=/usr/share/cassandra/lib/sigar-bin
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
-ea
-javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
-XX:+CMSClassUnloadingEnabled
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms2M
-Xmx2M
-Xmn4096M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:StringTableSize=103
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
-XX:CompileCommandFile=/etc/cassandra/hotspot_compiler
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=1
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.rmi.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Djava.library.path=/usr/share/cassandra/lib/sigar-bin
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
-Dlogback.configurationFile=logback.xml
-Dcassandra.logdir=/var/log/cassandra
-Dcassandra.storagedir=
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid


-- 

We’re hiring if you know of any awesome 

Re: Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
On Wed, Jul 20, 2016 at 11:53 AM, Jeff Jirsa 
wrote:

> Can you tolerate the value being “close, but not perfectly accurate”? If
> not, don’t use a counter.
>
>
>

yeah.. agreed.. this is a problem which is something I was considering.  I
guess it depends on whether they are 10x faster..

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Are counters faster than CAS or vice versa?

2016-07-20 Thread Kevin Burton
We ended up implementing a task/queue system which uses a global pointer.

Basically the pointer just increments ... so we have thousands of tasks
that just increment this one pointer.

The problem is that we're seeing contention on it and not being able to
write this record properly.

We're just doing a CAS operation now to read the existing value, then
increment it.

I think it might have been better to implement this as a counter.  Would
that be inherently faster or would a CAS be about the same?

I can't really test it without deploying it so I figured I would just ask
here first.

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Open source equivalents of OpsCenter

2016-07-13 Thread Kevin O'Connor
Now that OpsCenter doesn't work with open source installs, are there any
runs at an open source equivalent? I'd be more interested in looking at
metrics of a running cluster and doing other tasks like managing
repairs/rolling restarts more so than historical data.


Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

2016-04-12 Thread Kevin O'Connor
Are you in VPC or EC2 Classic? Are you using enhanced networking?

On Tue, Apr 12, 2016 at 9:52 AM, Alessandro Pieri  wrote:

> Hi Jack,
>
> As mentioned before I've used m3.xlarge instance types together with two
> ephemeral disks in raid 0 and, according to Amazon, they have "high"
> network performance.
>
> I ran many tests starting with a brand-new cluster every time and I got
> consistent results.
>
> I believe there's something that I cannot explain yet with the client used
> by cassandra-stress to connect to the nodes, I'd like to understand why
> there is such a big difference:
>
> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
> percentile: 38.14ms
> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>
> Hope you can help to figure it out.
>
> Cheers,
> Alessandro
>
>
>
>
> On Tue, Apr 12, 2016 at 5:43 PM, Jack Krupansky 
> wrote:
>
>> Which instance type are you using? Some may be throttled for EBS access,
>> so you could bump into a rate limit, and who knows what AWS will do at that
>> point.
>>
>> -- Jack Krupansky
>>
>> On Tue, Apr 12, 2016 at 6:02 AM, Alessandro Pieri <
>> alessan...@getstream.io> wrote:
>>
>>> Thanks Chris for your reply.
>>>
>>> I ran the tests 3 times for 20 minutes/each and I monitored the network
>>> latency in the meanwhile, it was very low (even the 99th percentile).
>>>
>>> I didn't notice any cpu spike caused by the GC but, as you pointed out,
>>> I will look into the GC log, just to be sure.
>>>
>>> In order to avoid the problem you mentioned with EBS and to keep the
>>> deviation under control I used two ephemeral disks in raid 0.
>>>
>>> I think the odd results come from the way cassandra-stress deals with
>>> multiple nodes. As soon as possible I will go through the Java code to get
>>> some more detail.
>>>
>>> If you have something else in your mind please let me know, your
>>> comments were really appreciated.
>>>
>>> Cheers,
>>> Alessandro
>>>
>>>
>>> On Mon, Apr 11, 2016 at 4:15 PM, Chris Lohfink 
>>> wrote:
>>>
 Where do you get the ~1ms latency between AZs? Comparing a short term
 average to a 99th percentile isn't very fair.

 "Over the last month, the median is 2.09 ms, 90th percentile is
 20ms, 99th percentile is 47ms." - per
 https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region

 Are you using EBS? That would further impact latency on reads and GCs
 will always cause hiccups in the 99th+.

 Chris


 On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri 
 wrote:

> Hi everyone,
>
> Last week I ran some tests to estimate the latency overhead introduces
> in a Cassandra cluster by a multi availability zones setup on AWS EC2.
>
> I started a Cassandra cluster of 6 nodes deployed on 3 different AZs
> (2 nodes/AZ).
>
> Then, I used cassandra-stress to create an INSERT (write) test of 20M
> entries with a replication factor = 3, right after, I ran cassandra-stress
> again to READ 10M entries.
>
> Well, I got the following unexpected result:
>
> Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.06ms/7.41ms/55.81ms
> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.16ms/38.14ms/47.75ms
>
> Basically, switching to the multi-AZ setup the latency increased of
> ~30ms. That's too much considering the the average network latency between
> AZs on AWS is ~1ms.
>
> Since I couldn't find anything to explain those results, I decided to
> run the cassandra-stress specifying only a single node entry (i.e. 
> "--nodes
> node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and
> surprisingly the latency went back to 5.9 ms.
>
> Trying to recap:
>
> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" ->
> 95th percentile: 38.14ms
> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>
> For the sake of completeness I've ran a further test using a
> consistency level = LOCAL_QUORUM and the test did not show any large
> variance with using a single node or multiple ones.
>
> Do you guys know what could be the reason?
>
> The test were executed on a m3.xlarge (network optimized) using the
> DataStax AMI 2.6.3 running Cassandra v2.0.15.
>
> Thank you in advance for your help.
>
> Cheers,
> Alessandro
>


>>>
>>>
>>> --
>>> *Alessandro Pieri*
>>> *Software Architect @ Stream.io Inc*
>>> e-Mail: alessan...@getstream.io - twitter: sirio7g
>>> 
>>>
>>>
>>
>


Re: Efficiently filtering results directly in CS

2016-04-08 Thread Kevin Burton
Ha..  Yes... C*...  I guess I need something like coprocessors in bigtable.


On Fri, Apr 8, 2016 at 1:49 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:

> c* I suppose
>
> 2016-04-07 19:30 GMT+02:00 Jonathan Haddad <j...@jonhaddad.com>:
>
>> What is CS?
>>
>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> I have a paging model whereby we stream data from CS by fetching 'pages'
>>> thereby reading (sequentially) entire datasets.
>>>
>>> We're using the bucket approach where we write data for 5 minutes, then
>>> we can just fetch the bucket for that range.
>>>
>>> Our app now has TONS of data and we have a piece of middleware that
>>> filters it based on the client requests.
>>>
>>> So if they only want english they just get english and filter away about
>>> 60% of our data.
>>>
>>> but it doesn't support condition pushdown.  So ALL this data has to be
>>> sent from our CS boxes to our middleware and filtered there (wasting a lot
>>> of network IO).
>>>
>>> Is there away (including refactoring the code) that I could push this
>>> this into CS?  Maybe some way I could discovery the CS topology and put
>>> daemons on each of our CS boxes and fetch from CS directly (doing the
>>> filtering there).
>>>
>>> Thoughts?
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> <https://plus.google.com/102718274791889610666/posts>
>>>
>>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Efficiently filtering results directly in CS

2016-04-07 Thread Kevin Burton
I have a paging model whereby we stream data from CS by fetching 'pages'
thereby reading (sequentially) entire datasets.

We're using the bucket approach where we write data for 5 minutes, then we
can just fetch the bucket for that range.

Our app now has TONS of data and we have a piece of middleware that filters
it based on the client requests.

So if they only want english they just get english and filter away about
60% of our data.

but it doesn't support condition pushdown.  So ALL this data has to be sent
from our CS boxes to our middleware and filtered there (wasting a lot of
network IO).

Is there away (including refactoring the code) that I could push this this
into CS?  Maybe some way I could discovery the CS topology and put daemons
on each of our CS boxes and fetch from CS directly (doing the filtering
there).

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



[ANNOUNCE] YCSB 0.7.0 Release

2016-02-26 Thread Kevin Risden
On behalf of the development community, I am pleased to announce the
release of YCSB 0.7.0.

Highlights:

* GemFire binding replaced with Apache Geode (incubating) binding
* Apache Solr binding was added
* OrientDB binding improvements
* HBase Kerberos support and use single connection
* Accumulo improvements
* JDBC improvements
* Couchbase scan implementation
* MongoDB improvements
* Elasticsearch version increase to 2.1.1

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.7.0

This release covers changes from the last 1 month.


Faster version of 'nodetool status'

2016-02-12 Thread Kevin Burton
Is there a faster way to get the output of 'nodetool status' ?

I want us to more aggressively monitor for 'nodetool status' and boxes
being DN...

I was thinking something like jolokia and REST but I'm not sure if there
are variables exported by jolokia for nodetool status.

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-23 Thread Kevin Burton
Once the CREATE TABLE returns in cqlsh (or programatically) is it safe to
assume it's on all nodes at that point?

If not I'll have to put in even more logic to handle this case..

On Fri, Jan 22, 2016 at 9:22 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> I recall that there was some discussion last year about this issue of how
> risky it is to do an automated CREATE TABLE IF NOT EXISTS due to the
> unpredictable amount of time it takes for the table creation to fully
> propagate around the full cluster. I think it was recognized as a real
> problem, but without an immediate solution, so the recommended practice for
> now is to only manually perform the operation (sure, it can be scripted,
> but only under manual control) to assure that the operation completes and
> that only one attempt is made to create the table. I don't recall if there
> was a specific Jira assigned, and the antipattern doc doesn't appear to
> reference this scenario. Maybe a committer can shed some more light.
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 10:29 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I sort of agree.. but we are also considering migrating to hourly
>> tables.. and what if the single script doesn't run.
>>
>> I like having N nodes make changes like this because in my experience
>> that central / single box will usually fail at the wrong time :-/
>>
>>
>>
>> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> Instead of using ZK, why not solve your concurrency problem by removing
>>> it?  By that, I mean simply have 1 process that creates all your tables
>>> instead of creating a race condition intentionally?
>>>
>>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>>>
>>>> In 2.0 this worked fine.
>>>>
>>>> We have a bunch of automated scripts that go through and create
>>>> tables... one per day.
>>>>
>>>> at midnight UTC our entire CQL went offline.. .took down our whole app.
>>>>  ;-/
>>>>
>>>> The resolution was a full CQL shut down and then a drop table to remove
>>>> the bad tables...
>>>>
>>>> pretty sure the issue was with schema disagreement.
>>>>
>>>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT
>>>> EXISTS only checks locally?
>>>>
>>>> My work around is going to be to use zookeeper to create a mutex lock
>>>> during this operation.
>>>>
>>>> Any other things I should avoid?
>>>>
>>>>
>>>> --
>>>>
>>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>>> Engineers!
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>
>>>>
>>
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
I sort of agree.. but we are also considering migrating to hourly tables..
and what if the single script doesn't run.

I like having N nodes make changes like this because in my experience that
central / single box will usually fail at the wrong time :-/



On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Instead of using ZK, why not solve your concurrency problem by removing
> it?  By that, I mean simply have 1 process that creates all your tables
> instead of creating a race condition intentionally?
>
> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>
>> In 2.0 this worked fine.
>>
>> We have a bunch of automated scripts that go through and create tables...
>> one per day.
>>
>> at midnight UTC our entire CQL went offline.. .took down our whole app.
>>  ;-/
>>
>> The resolution was a full CQL shut down and then a drop table to remove
>> the bad tables...
>>
>> pretty sure the issue was with schema disagreement.
>>
>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT EXISTS
>> only checks locally?
>>
>> My work around is going to be to use zookeeper to create a mutex lock
>> during this operation.
>>
>> Any other things I should avoid?
>>
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
Not sure if this is a bug or not or kind of a *fuzzy* area.

In 2.0 this worked fine.

We have a bunch of automated scripts that go through and create tables...
one per day.

at midnight UTC our entire CQL went offline.. .took down our whole app.  ;-/

The resolution was a full CQL shut down and then a drop table to remove the
bad tables...

pretty sure the issue was with schema disagreement.

All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT EXISTS
only checks locally?

My work around is going to be to use zookeeper to create a mutex lock
during this operation.

Any other things I should avoid?


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Strategy / order for upgradesstables during rolling upgrade.

2016-01-21 Thread Kevin Burton
I think there are two strategies to upgradesstables after a release.

We're doing a 2.0 to 2.1 upgrade (been procrastinating here).

I think we can go with B below... Would you agree?

Strategy A:

- foreach server
- upgrade to 2.1
- nodetool upgradesstables

Strategy B:

- foreach server
- upgrade to 2.1
- foreach server
- nodetool upgradesstables


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Using cassandra a BLOB store / web cache.

2016-01-20 Thread Kevin Burton
There's also the 'support' issue.. C* is hard enough as it is... maybe you
can bring in another system like ES or HDFS but the more you bring in the
more your complexity REALLY goes through the roof.

Better to keep things simple.

I really like the chunking idea for C*... seems like an easy way to store
tons of data.

On Tue, Jan 19, 2016 at 4:13 PM, Robert Coli  wrote:

> On Tue, Jan 19, 2016 at 2:07 PM, Richard L. Burton III  > wrote:
>
>> I would ask why do this over say HDFS, S3, etc. seems like this problem
>> has been solved with other solutions that are specifically designed for
>> blob storage?
>>
>
> HDFS's default block size is 64mb. If you are storing objects smaller than
> this, that might be bad! It also doesn't have http transport, which other
> things do.
>
> Etc..
>
> =Rob
>
>



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Using cassandra a BLOB store / web cache.

2016-01-19 Thread Kevin Burton
Lots of interesting feedback.. I like the ideal of chunking the IO into
pages.. it would require more thinking but I could even do cassandra async
IO and async HTTP to serve the data and then use HTTP chunks for each
range.

On Tue, Jan 19, 2016 at 10:47 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Jan 18, 2016 at 6:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Internally we have the need for a blob store for web content.  It's
>> MOSTLY key, ,value based but we'd like to have lookups by coarse grained
>> tags.
>>
>
> I know you know how to operate and scale MySQL, so I suggest MogileFS for
> the actual blob storage :
>
> https://github.com/mogilefs
>
> Then do some simple indexing in some search store. Done.
>
> =Rob
>
>



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Cassandra is consuming a lot of disk space

2016-01-12 Thread Kevin O'Connor
Have you tried restarting? It's possible there's open file handles to
sstables that have been compacted away. You can verify by doing lsof and
grepping for DEL or deleted.

If it's not that, you can run nodetool cleanup on each node to scan all of
the sstables on disk and remove anything that it's not responsible for.
Generally this would only work if you added nodes recently.

On Tuesday, January 12, 2016, Rahul Ramesh  wrote:

> We have a 2 node Cassandra cluster with a replication factor of 2.
>
> The load factor on the nodes is around 350Gb
>
> Datacenter: Cassandra
> ==
> Address  RackStatus State   LoadOwns
>  Token
>
>   -5072018636360415943
> 172.31.7.91  rack1   Up Normal  328.5 GB100.00%
>   -7068746880841807701
> 172.31.7.92  rack1   Up Normal  351.7 GB100.00%
>   -5072018636360415943
>
> However,if I use df -h,
>
> /dev/xvdf   252G  223G   17G  94% /HDD1
> /dev/xvdg   493G  456G   12G  98% /HDD2
> /dev/xvdh   197G  167G   21G  90% /HDD3
>
>
> HDD1,2,3 contains only cassandra data. It amounts to close to 1Tb in one
> of the machine and in another machine it is close to 650Gb.
>
> I started repair 2 days ago, after running repair, the amount of disk
> space consumption has actually increased.
> I also checked if this is because of snapshots. nodetool listsnapshot
> intermittently lists a snapshot but it goes away after sometime.
>
> Can somebody please help me understand,
> 1. why so much disk space is consumed?
> 2. Why did it increase after repair?
> 3. Is there any way to recover from this state.
>
>
> Thanks,
> Rahul
>
>


Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
Yes.. .it's not currently possible :)

I think it should be.

Say the IO on your C* is at 60% utilization.

If you do a repair, this would require 120% utilization obviously not
possible, so now your app is down / offline until the repair finishes.

If you could throttle repair separately this would resolve this problem.

IF anyone else thinks this is an issue I'll create a JIRA.

On Mon, Oct 19, 2015 at 3:38 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I think the point I was trying to make is that on highly loaded boxes,
>>  repair should take lower priority than normal compactions.
>>
>
> You can manually do this by changing the thread priority of compaction
> threads which you somhow identify as doing repair related compaction...
>
> ... but incoming streamed SStables are compacted just as if they were
> flushed, so I'm pretty sure what you're asking for is not currently
> possible?
>
> =Rob
>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
I think the point I was trying to make is that on highly loaded boxes,
 repair should take lower priority than normal compactions.

Having a throttle on *both* doesn't solve the problem.

So I need a

setcompactionthroughput

and a

setrepairthroughput

and total througput would be the sum of both.

On Mon, Oct 19, 2015 at 8:30 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> The validation compaction part of repair is susceptible to the compaction
> throttling knob `nodetool getcompactionthroughput`
> / `nodetool setcompactionthroughput` and you can use that to tune down the
> resources that are being used by repair.
>
> Check out this post by driftx on advanced repair techniques
> <http://www.datastax.com/dev/blog/advanced-repair-techniques>.
>
> Given your other question, I agree with Raj that it might be a good idea
> to decommission the new nodes rather than repairing depending on how much
> data has made it to them and how tight you were on resources before adding
> nodes.
>
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
> <http://goog_410786983>
>
>
> <http://www.datastax.com/gartner-magic-quadrant-odbms>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Sun, Oct 18, 2015 at 8:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I'm doing a big nodetool repair right now and I'm pretty sure the added
>> overhead is impacting our performance.
>>
>> Shouldn't you be able to throttle repair so that normal compactions can
>> use most of the resources?
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Kevin Burton
ouch.. OK.. I think I really shot myself in the foot here then.  This might
be bad.

I'm not sure if I would have missing data.  I mean basically the data is on
the other nodes.. but the cluster has been running with 10 nodes
accidentally bootstrapped with auto_bootstrap=false.

So they have new data and seem to be missing values.

this is somewhat misleading... Initially if you start it up and run
nodetool status , it only returns one node.

So I assumed auto_bootstrap=false meant that it just doesn't join the
cluster.

I'm running a nodetool repair now to hopefully fix this.



On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> auto_bootstrap=false tells it to join the cluster without running
> bootstrap – the node assumes it has all of the necessary data, and won’t
> stream any missing data.
>
> This generally violates consistency guarantees, but if done on a single
> node, is typically correctable with `nodetool repair`.
>
> If you do it on many  nodes at once, it’s possible that the new nodes
> could represent all 3 replicas of the data, but don’t physically have any
> of that data, leading to missing records.
>
>
>
> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 3:44 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
> once?
>
> An shit.. I think we're seeing corruption.. missing records :-/
>
> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
>> nodes)
>>
>> By default we have auto_boostrap = false
>>
>> so we just push our config to the cluster, the cassandra daemons restart,
>> and they're not cluster members and are the only nodes in the cluster.
>>
>> Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about
>> 7 members of the cluster and 8 not yet joined.
>>
>> We are only doing 1 at a time because apparently bootstrapping more than
>> 1 is unsafe.
>>
>> I did a rolling restart whereby I went through and restarted all the
>> cassandra boxes.
>>
>> Somehow the new nodes auto boostrapped themselves EVEN though
>> auto_bootstrap=false.
>>
>> We don't have any errors.  Everything seems functional.  I'm just worried
>> about data loss.
>>
>> Thoughts?
>>
>> Kevin
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


compact/repair shouldn't compete for normal compaction resources.

2015-10-18 Thread Kevin Burton
I'm doing a big nodetool repair right now and I'm pretty sure the added
overhead is impacting our performance.

Shouldn't you be able to throttle repair so that normal compactions can use
most of the resources?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-17 Thread Kevin Burton
We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
nodes)

By default we have auto_boostrap = false

so we just push our config to the cluster, the cassandra daemons restart,
and they're not cluster members and are the only nodes in the cluster.

Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about 7
members of the cluster and 8 not yet joined.

We are only doing 1 at a time because apparently bootstrapping more than 1
is unsafe.

I did a rolling restart whereby I went through and restarted all the
cassandra boxes.

Somehow the new nodes auto boostrapped themselves EVEN though
auto_bootstrap=false.

We don't have any errors.  Everything seems functional.  I'm just worried
about data loss.

Thoughts?

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: reiserfs - DirectoryNotEmptyException

2015-10-17 Thread Kevin Burton
My advice is to not even consider anything else or make any other changes
to your architecture until you get onto a modern and maintained filesystem.

VERY VERY VERY few people are deploying anything on ReiserFS so you're
going to be the first group encountering any problems.

On Thu, Oct 15, 2015 at 12:28 PM, Modha, Digant <
digant.mo...@tdsecurities.com> wrote:

> It is deployed on an existing cluster but will be migrated soon to a
> different file system & Linux distribution.
>
> -Original Message-
> From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael
> Shuler
> Sent: Wednesday, October 14, 2015 6:02 PM
> To: user@cassandra.apache.org
> Subject: Re: reiserfs - DirectoryNotEmptyException
>
> On 10/13/2015 01:58 PM, Modha, Digant wrote:
> > I am running Cassandra 2.1.10 and noticed intermittent
> > DirectoryNotEmptyExceptions during repair.  My cassandra data drive is
> > reiserfs.
>
> Why? I'm genuinely interested in this filesystem selection, since it is
> unmaintained, has been dropped from some mainstream linux distributions,
> and some may call it "dead". ;)
>
> > I noticed that on reiserfs wiki site
> > https://en.m.wikipedia.org/wiki/ReiserFS#Criticism, it states that
> > unlink operation is not synchronous. Is that the reason for the
> > exception below:
> >
> > ERROR [ValidationExecutor:137] 2015-10-13 00:46:30,759
> > CassandraDaemon.java:227 - Exception in thread
> > Thread[ValidationExecutor:137,1,main]
> >
> > org.apache.cassandra.io.FSWriteError:
> > java.nio.file.DirectoryNotEmptyException:
> >
> > at
> > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.jav
> > a:135)
> >
> >~[apache-cassandra-2.1.10.jar:2.1.10]
> <...>
>
> This seems like a reasonable explanation. Using a modern filesystem like
> ext4 or xfs would certainly be helpful in getting you within the realm of
> a "common" hardware setup.
>
> https://wiki.apache.org/cassandra/CassandraHardware
>
> https://www.safaribooksonline.com/library/view/cassandra-high-performance/9781849515122/ch04s06.html
>
> I think Al Tobey had a slide deck on filesystem tuning for C*, but I
> didn't find it quickly.
>
> --
> Kind regards,
> Michael
>
>
> TD Securities disclaims any liability or losses either direct or
> consequential caused by the use of this information. This communication is
> for informational purposes only and is not intended as an offer or
> solicitation for the purchase or sale of any financial instrument or as an
> official confirmation of any transaction. TD Securities is neither making
> any investment recommendation nor providing any professional or advisory
> services relating to the activities described herein. All market prices,
> data and other information are not warranted as to completeness or accuracy
> and are subject to change without notice Any products described herein are
> (i) not insured by the FDIC, (ii) not a deposit or other obligation of, or
> guaranteed by, an insured depository institution and (iii) subject to
> investment risks, including possible loss of the principal amount invested.
> The information shall not be further distributed or duplicated in whole or
> in part by any means without the prior written consent of TD Securities. TD
> Securities is a trademark of The Toronto-Dominion Bank and represents TD
> Securities (USA) LLC and certain investment banking activities of The
> Toronto-Dominion Bank and its subsidiaries.
>



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Post portem of a large Cassandra datacenter migration.

2015-10-09 Thread Kevin Burton
We just finished up a pretty large migration of about 30 Cassandra boxes to
a new datacenter.

We'll be migrating to about 60 boxes here in the next month so scalability
(and being able to do so cleanly) is important.

We also completed an Elasticsearch migration at the same time.  The ES
migration worked fine. A few small problems with it doing silly things with
relocating nodes too often but all in all it was somewhat painless.

At one point we were doing 200 shard reallocations in parallel and pushing
about 2-4Gbit...

The Cassandra migration, however, was a LOT harder.

One quick thing I wanted to point out - we're hiring.  So if you're a
killer Java Devops guy drop me an email

Anyway.  Back to the story.

Obviously we did a bunch of research before hand to make sure we had plenty
of bandwidth.  This was a migration from Washington DC to Germany.

Using iperf, we could consistently push about 2Gb back and forth between DC
and Germany.  This includes TCP as we switched to using large window sizes.

The big problem that we had, was that we could only bootstrap one node at a
time.  The ends up taking a LOT more time because you have to keep checking
on a node so that you can start the next one.

I imagine one could write a coordinator script but we had so many problems
with CS that it wouldn't have worked if we tried.

We had 2-3 main problems.

1.  Sometimes streams would just stop and lock up.  No explanation why.
They would just lock up and not resume.  We'd wait 10-15 minutes with no
response.. This would require us abort and retry.  Had we updated to
Cassandra 2.2 before hand I think the new resume support would work.

2.  Some of our keyspaces created by Thrift caused exceptions regarding
"too few resources" when trying to bootstrap. Dropping these keyspaces
fixed the problem.  They were just test keyspaces so it didn't matter.

3.  Because of #1, it's probably better to make sure you have 2x or more
disk space on the remote end before you do the migration.  This way you can
boot the same number of nodes you had before and just decommission the old
ones quickly. (er use nodetool removenode - see below)

4.  We're not sure why, but our OLDER machines kept locking up during this
process.  This kept requiring us to do a rolling restart on all the older
nodes.  We suspect this is GC and we were seeing single cores to 100%.  I
didn't have time to attach a profiler as were all burned out at this point
and just wanted to get it over with.  This problem meant that #1 was
exacerbated because our old boxes would either refuse to send streams or
refuse to accept them.  It seemed to get better when we upgraded the older
boxes to use Java 8.

5.  Don't use nodetool decommission if you have a large number of nodes.
Instead, use nodetool removenode.  It's MUCH faster and does M-N
replication between nodes directly.  The downside is that you go down to
N-1 replicas during this process. However, it was easily 20-30x faster.
This probably saved me about 5 hours of sleep!

In hindsight, I'm not sure what we would have done differently.  Maybe
bought more boxes.  Maybe upgraded to Cassandra 2.2 and probably java 8 as
well.

Setting up datacenter migration might have worked out better too.

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

2015-10-07 Thread Kevin Burton
vnodes ... of course!

On Wed, Oct 7, 2015 at 9:09 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> vnodes or single tokens?
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
>
>
> <http://cassandrasummit-datastax.com/?utm_campaign=summit15_medium=summiticon_source=emailsignature>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Thu, Oct 8, 2015 at 12:06 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool
>> cleanup, is excessive data transferred when I add the 6th node?  IE do the
>> existing nodes send more data to the 6th node?
>>
>> the documentation is unclear.  It sounds like the biggest problem is that
>> the existing data causes things to become unbalanced due to "load" computed
>> wrong".
>>
>> but I also think that the excessive data will be removed in the next
>> major compaction and that nodetool cleanup just triggers a major compaction.
>>
>> Is my hypothesis correct?
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Why can't nodetool status include a hostname?

2015-10-07 Thread Kevin Burton
I find it really frustrating that nodetool status doesn't include a hostname

Makes it harder to track down problems.

I realize it PRIMARILY uses the IP but perhaps cassandra.yml can include an
optional 'hostname' parameter that can be set by the user.  OR have the box
itself include the hostname in gossip when it starts up.

I realize that hostname wouldn't be authoritative and that the IP must
still be shown but we could add another column for the hostname.

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Does failing to run "nodetool cleanup" end up causing more data to be transferred during bootstrapping?

2015-10-07 Thread Kevin Burton
Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool cleanup,
is excessive data transferred when I add the 6th node?  IE do the existing
nodes send more data to the 6th node?

the documentation is unclear.  It sounds like the biggest problem is that
the existing data causes things to become unbalanced due to "load" computed
wrong".

but I also think that the excessive data will be removed in the next major
compaction and that nodetool cleanup just triggers a major compaction.

Is my hypothesis correct?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
We're in the middle of migrating datacenters.

We're migrating from 13 nodes to 30 nodes in the new datacenter.

The plan was to bootstrap the 30 nodes first, wait until they have joined.
 then we're going to decommission the old ones.

How many nodes can we bootstrap at once?  How many can we decommission?

I remember reading docs for this but hell if I can find it now :-P

I know what the answer is theoretically.  I just want to make sure we do
everything properly.

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
I'm not sure which is faster/easier.  Just joining one box at a time and
then decommissioning or using replace_address.

this stuff is always something you do rarely and then more complex than it
needs to be.

This complicates long term migration too.  Having to have gigabit is
somewhat of a problem in that you might now actually have it where you're
going.

We're migrating from Washington, DC to Germany so we have to change TCP
send/receive buffers to get decent bandwidth.

But I think we can do this at 1Gb per so per box.


On Tue, Oct 6, 2015 at 12:48 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 6, 2015 at 12:32 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> How many nodes can we bootstrap at once?  How many can we decommission?
>>
>
> short answer : 1 node can join or part at simultaneously
>
> longer answer : https://issues.apache.org/jira/browse/CASSANDRA-2434 /
> https://issues.apache.org/jira/browse/CASSANDRA-7069 /
> -Dconsistent.rangemovement
>
> Have you considered using replace_address to replace your existing 13
> nodes, at which point you just have to join 17 more?
>
> =Rob
>
>



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Kevin Burton
OH. interesting.  Yeah. That's another strategy.  We've already done a
bunch of TCP tuning... we get about 1Gbit with large TCP windows.  So I
think we have that part done.

It's sad that CS can't resume...

Plan be we will just rsync the data.. Does it pretty much work just by
putting the data in a directory or do you have to do anything special?

On Tue, Oct 6, 2015 at 1:34 PM, Bryan Cheng <br...@blockcypher.com> wrote:

> Honestly, we've had more luck bootstrapping in our old DC (defining
> topology properties as the new DC) and using rsync to migrate the data
> files to new machines in the new datacenter. We had 10gig within the
> datacenter but significantly less than this cross-DC, which lead to a lot
> of broken streaming pipes and wasted effort. This might make sense
> depending on your link quality and the resources/time you have available to
> do TCP tuning,
>
> On Tue, Oct 6, 2015 at 1:29 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I'm not sure which is faster/easier.  Just joining one box at a time and
>> then decommissioning or using replace_address.
>>
>> this stuff is always something you do rarely and then more complex than
>> it needs to be.
>>
>> This complicates long term migration too.  Having to have gigabit is
>> somewhat of a problem in that you might now actually have it where you're
>> going.
>>
>> We're migrating from Washington, DC to Germany so we have to change TCP
>> send/receive buffers to get decent bandwidth.
>>
>> But I think we can do this at 1Gb per so per box.
>>
>>
>> On Tue, Oct 6, 2015 at 12:48 PM, Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Tue, Oct 6, 2015 at 12:32 PM, Kevin Burton <bur...@spinn3r.com>
>>> wrote:
>>>
>>>> How many nodes can we bootstrap at once?  How many can we decommission?
>>>>
>>>
>>> short answer : 1 node can join or part at simultaneously
>>>
>>> longer answer : https://issues.apache.org/jira/browse/CASSANDRA-2434 /
>>> https://issues.apache.org/jira/browse/CASSANDRA-7069 /
>>> -Dconsistent.rangemovement
>>>
>>> Have you considered using replace_address to replace your existing 13
>>> nodes, at which point you just have to join 17 more?
>>>
>>> =Rob
>>>
>>>
>>
>>
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Re: Running Cassandra on Java 8 u60..

2015-09-27 Thread Kevin Burton
Possibly for existing apps… we’re running G1 for everything except
Elasticsearch and Cassandra and are pretty happy with it.

On Sun, Sep 27, 2015 at 10:28 AM, Graham Sanderson <gra...@vast.com> wrote:

> IMHO G1 is still buggy on JDK8 (based solely on being subscribed to the
> gc-dev mailing list)… I think JDK9 will be the one.
>
> On Sep 25, 2015, at 7:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> I think those were referring to Java7 and G1GC (early versions were buggy).
>
> Cheers,
> Stefano
>
>
> On Fri, Sep 25, 2015 at 5:08 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Any issues with running Cassandra 2.0.16 on Java 8? I remember there is
>> long term advice on not changing the GC but not the underlying version of
>> Java.
>>
>> Thoughts?
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com <http://spinn3r.com/>
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>>
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


Using inline JSON is 2-3x faster than using many columns (>20)

2015-09-26 Thread Kevin Burton
I wanted to share this with the community in the hopes that it might help
someone with their schema design.

I didn't get any red flags early on to limit the number of columns we use.
If anything the community pushes for dynamic schema because Cassandra has
super nice online ALTER TABLE.

However, in practice we've found that Cassandra started to use a LOT more
CPU than anything else in our stack.

Including Elasticsearch.  ES uses about 8% of our total CPU whereas
Cassandra uses about 70% of it.. It's not an apples to oranges comparison
mind you but Cassandra definitely warrants some attention in this scenario.

I put Cassandra into a profiler (Java Mission Control) to see if anything
weird was happening and didn't see any red flags.

There were some issues with CAS so I rewrote that to implement a query
before CAS operation where we first check if the row is already there, then
use a CAS if its missing. That was a BIG performance bump.  Probably
reduced our C* usages by 40%

However, I started to speculate that it might be getting overwhelmed with
the raw numbers of rows.

I fired up cassandra_stress to verify and basically split it at 10 columns
with 150 bytes and then 150 columns with 10 bytes.

In this synthetic benchmark C* was actually 5-6x faster for the run with 10
columns.

So this tentatively confirmed my hypothesis.

So I decided to get a bit more aggressive and tried to test it with a less
synthetic benchmark.

I wrote my own benchmark which uses our own schema in two forms.

INLINE_ONLY: 150 columns...
DATA_ONLY: 4 columns (two primary key, 1 data_format and one data_blob)
column

It creates T threads, writes W rows, then reads R rows..

I set T=50, W=50,000, R=50,000

It does a write pass, then a read pass.  I didn't implement a mixed
workload though.. I think that my results wouldn't matter as much.

The results were similarly impressive but not as much as the synthetic
benchmark above.  It was 2x faster (6 minutes vs 3 minutes).

In the inline only benchmark, C* spends 70% of the time in high CPU.  In
data_only it's about 50/50.

I think we're going to move to this model and re-write all our C* stables
to support this inline JSON.

The second benchmark was under 2.0.16... (our production version).  The
cassandra_stress was under 3.0 beta as I wanted to see if a later version
of cassandra fixed the problem. It doesn't.

This was done on a 128GB box with two Samsung SSDs in RAID0.  I didn't test
it with any replicas.

This brings up some interesting issues:

- still interesting that C* spends as much time as it does under high CPU
load.  I'd like to profile it again.

- Looks like there's room for improvement in the JSON encoder/decoder.  I'm
not sure how much we would see though because it's already using the latest
jackson which I've tuned significantly.  I might be able to get some
performance out of it by avoiding GC and garbage collection.

- Later C* might improve our CPU regardless so this might be something we
do anyway (upgrade our cassandra).



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Running Cassandra on Java 8 u60..

2015-09-25 Thread Kevin Burton
Any issues with running Cassandra 2.0.16 on Java 8? I remember there is
long term advice on not changing the GC but not the underlying version of
Java.

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Best strategy for hiring from OSS communities.

2015-09-13 Thread Kevin Burton
I think j...@apache.org is dead…

I saw this:

http://mail-archives.apache.org/mod_mbox/community-dev/201304.mbox/%3CCAKQbXgAgO_3SzLMR0L4p_qkSALQzE=ehpnbmjndccu6dtm-...@mail.gmail.com%3E

And can’t find any documentation on a j...@apache.org

I think it would be valuable to create one.  Maybe I should post to general@
…

On Fri, Sep 11, 2015 at 5:34 PM, Otis Gospodnetić <
otis.gospodne...@gmail.com> wrote:

> Hey Kevin - I think there is j...@apache.org
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Thu, Aug 13, 2015 at 6:02 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Mildly off topic but we are looking to hire someone with Cassandra
>> experience..
>>
>> I don’t necessarily want to spam the list though.  We’d like someone from
>> the community who contributes to Open Source, etc.
>>
>> Are there forums for Apache / Cassandra, etc for jobs? I couldn’t fine
>> one.
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>


cassandra-stress on 3.0 with column widths benchmark.

2015-09-13 Thread Kevin Burton
I’m trying to benchmark two scenarios…

10 columns with 150 bytes each

vs

150 columns with 10 bytes each.

The total row “size” would be 1500 bytes (ignoring overhead).

Our app uses 150 columns so I’m trying to see if packing it into a JSON
structure using one column would improve performance.

I seem to have confirmed my hypothesis.

I’m running two tests:

./tools/bin/cassandra-stress write -insert -col n=FIXED\(10\)
> size=FIXED\(150\) | tee cassandra-stress-10-150.log
>


> time ./tools/bin/cassandra-stress write -insert -col n=FIXED\(150\)
> size=FIXED\(10\) | tee cassandra-stress-150-10.log


this shows that the "op rate” is much much lower when running with 150
columns:

root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate"
> cassandra-stress-10-150.log
> op rate   : 7632 [WRITE:7632]
> op rate   : 11851 [WRITE:11851]
> op rate   : 31967 [WRITE:31967]
> op rate   : 41798 [WRITE:41798]
> op rate   : 51251 [WRITE:51251]
> op rate   : 58057 [WRITE:58057]
> op rate   : 62977 [WRITE:62977]
> op rate   : 65398 [WRITE:65398]
> op rate   : 67673 [WRITE:67673]
> op rate   : 69198 [WRITE:69198]
> op rate   : 70402 [WRITE:70402]
> op rate   : 71019 [WRITE:71019]
> op rate   : 71574 [WRITE:71574]
> root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate"
> cassandra-stress-150-10.log
> op rate   : 2570 [WRITE:2570]
> op rate   : 5144 [WRITE:5144]
> op rate   : 10906 [WRITE:10906]
> op rate   : 11832 [WRITE:11832]
> op rate   : 12471 [WRITE:12471]
> op rate   : 12915 [WRITE:12915]
> op rate   : 13620 [WRITE:13620]
> op rate   : 13456 [WRITE:13456]
> op rate   : 13916 [WRITE:13916]
> op rate   : 14029 [WRITE:14029]
> op rate   : 13915 [WRITE:13915]


… what’s WEIRD here is that

Both tests take about 10 minutes.  Yet it’s saying that the op rate for the
second is slower.  Why would that be? That doesn’t make much sense…

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Cassandra 2.2 for time series

2015-09-02 Thread Kevin Burton
Check out kairosd for a time series db on Cassandra.
On Aug 31, 2015 7:12 AM, "Peter Lin"  wrote:

>
> I didn't realize they had added max and min as stock functions.
>
> to get the sample time. you'll probably need to write a custom function.
> google for it and you'll find people that have done it.
>
> On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen  > wrote:
>
>> Cassandra 2.2 has min and max built-in. My problem is getting the
>> corresponding sample time as well.
>>
>>
>>
>> *Pål Andreassen*
>>
>> *54°23'58"S 3°18'53"E*
>>
>> *Konsulent*
>>
>> Mobil +47 982 85 504
>>
>> pal.andreas...@bouvet.no
>>
>>
>>
>>
>> *Bouvet Norge AS Avdeling Grenland*
>>
>> Uniongata 18, Klosterøya
>>
>> N-3732 Skien
>>
>> Tlf +47 23 40 60 00
>>
>> *bouvet.no*
>> 
>>
>>
>>
>> *From:* Peter Lin [mailto:wool...@gmail.com]
>> *Sent:* mandag 31. august 2015 16.09
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra 2.2 for time series
>>
>>
>>
>>
>>
>> Unlike SQL, CQL doesn't have built-in functions like max/min
>>
>> In the past, people would create summary tables to keep rolling stats for
>> reports/analytics. In cql3, there's user defined functions, so you can
>> write a function to do max/min
>>
>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
>>
>>
>>
>> On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen 
>> wrote:
>>
>> Hi
>>
>>
>>
>> I’m currently evaluating Cassandra as a potiantial database for storing
>> time series data from lots of devices (IoT type of scenario).
>>
>> Currently we have a few thousand devices with X channels (measurements)
>> that they report at different intervals (from 5 minutes and up).
>>
>>
>>
>> I’ve created as simple test table to store the data:
>>
>>
>>
>> CREATE TABLE DataRaw(
>>
>>   channelId int,
>>
>>   sampleTime timestamp,
>>
>>   value double,
>>
>>   PRIMARY KEY (channelId, sampleTime)
>>
>> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>>
>>
>>
>> This schema seems to work ok, but I have queries that I need to support
>> that I cannot easily figure out how to perform (except getting all the data
>> out and iterate it myself).
>>
>>
>>
>> Query 1: For max and min queries, I not only want the maximum/minimum
>> value, but also the corresponding timestamp.
>>
>>
>>
>> sampleTime  value
>>
>> 2015-08-28 00:0010
>>
>> 2015-08-28 01:0015
>>
>> 2015-08-28 02:0013
>>
>>
>> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
>> sampleTime, max(value) FROM DataRAW return the max value, but the first
>> sampleTime.
>>
>> Also I wonder if Cassandra has built-in support for
>> interpolation/extrapolation. Some sort of group by hour/day/week/month and
>> even year function.
>>
>>
>>
>> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
>> to get 24 values each of which is the hourly average. Or give my daily
>> averages for last year for a given channel. Should return 365 daily
>> averages.
>>
>>
>>
>> Best regards
>>
>>
>>
>> *Pål Andreassen*
>>
>> *54°23'58"S 3°18'53"E*
>>
>> *Konsulent*
>>
>> Mobil +47 982 85 504
>>
>> pal.andreas...@bouvet.no
>>
>>
>>
>>
>> *Bouvet Norge AS Avdeling Grenland*
>>
>> Uniongata 18, Klosterøya
>>
>> N-3732 Skien
>>
>> Tlf +47 23 40 60 00
>>
>> *bouvet.no*
>> 
>>
>>
>>
>>
>>
>
>


Re: Practical limitations of too many columns/cells ?

2015-08-25 Thread Kevin Burton
No problem.  IS there a JIRA ticket already for this?

On Mon, Aug 24, 2015 at 6:06 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Can you post your findings to JIRA as well?  Would be good to see some
 real numbers from production.

 The refactor of the storage engine (8099) may completely change this, but
 it's good to have it on the radar.


 On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton bur...@spinn3r.com wrote:

 Agreed.  We’re going to run a benchmark.  Just realized we grew to 144
 columns.  Fun.  Kind of disappointing that Cassandra is so slow in this
 regard.  Kind of defeats the whole point of flexible schema if actually
 using that feature is slow as hell.

 On Sun, Aug 23, 2015 at 4:54 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
 wrote:

 The key is to benchmark it with your real data. Modern cassandra-stress
 let’s you get very close to your actual read/write behavior, and the real
 differentiator will depend on your use case (how often do you write the
 whole row vs updating just one column/field). My gist shows a ton of
 different examples, but they’re not scientific, and at this point they’re
 old versions (and performance varies version to version).

 - Jeff

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 2:58 PM
 To: user@cassandra.apache.org
 Subject: Re: Practical limitations of too many columns/cells ?

 Ah.. yes.  Great benchmarks. If I’m interpreting them correctly it was
 ~15x slower for 22 columns vs 2 columns?

 Guess we have to refactor again :-P

 Not the end of the world of course.

 On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
 wrote:

 A few months back, a user in #cassandra on freenode mentioned that when
 they transitioned from thrift to cql, their overall performance decreased
 significantly. They had 66 columns per table, so I ran some benchmarks with
 various versions of Cassandra and thrift/cql combinations.

 It shouldn’t really surprise you that more columns = more work = slower
 operations. It’s not necessarily the size of the writes, but the amount of
 work that needs to be done with the extra cells (2 large columns totaling
 2k performs better than 66 small columns totaling 0.66k even though it’s
 three times as much raw data being written to disk)

 https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c

 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
 bytes per): cassandra-stress --operation INSERT --num-keys 100
 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodes
 Averages from the middle 80% of values: interval_op_rate : 10720

 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
 bytes per): cassandra-stress --operation INSERT --num-keys 100
 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes
 Averages from the middle 80% of values: interval_op_rate : 28667

 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
 cassandra-stress --operation INSERT --num-keys 100 --columns 2
 --column-size=1024 --replication-factor 2 --nodesfile=nodes Averages
 from the middle 80% of values: interval_op_rate : 23489

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 1:02 PM
 To: user@cassandra.apache.org
 Subject: Practical limitations of too many columns/cells ?

 Is there any advantage to using say 40 columns per row vs using 2
 columns (one for the pk and the other for data) and then shoving the data
 into a BLOB as a JSON object?

 To date, we’ve been just adding new columns.  I profiled Cassandra and
 about 50% of the CPU time is spent on CPU doing compactions.  Seeing that
 CS is being CPU bottlenecked maybe this is a way I can optimize it.

 Any thoughts?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Is there any advantage to using say 40 columns per row vs using 2 columns
(one for the pk and the other for data) and then shoving the data into a
BLOB as a JSON object?

To date, we’ve been just adding new columns.  I profiled Cassandra and
about 50% of the CPU time is spent on CPU doing compactions.  Seeing that
CS is being CPU bottlenecked maybe this is a way I can optimize it.

Any thoughts?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Ah.. yes.  Great benchmarks. If I’m interpreting them correctly it was ~15x
slower for 22 columns vs 2 columns?

Guess we have to refactor again :-P

Not the end of the world of course.

On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
wrote:

 A few months back, a user in #cassandra on freenode mentioned that when
 they transitioned from thrift to cql, their overall performance decreased
 significantly. They had 66 columns per table, so I ran some benchmarks with
 various versions of Cassandra and thrift/cql combinations.

 It shouldn’t really surprise you that more columns = more work = slower
 operations. It’s not necessarily the size of the writes, but the amount of
 work that needs to be done with the extra cells (2 large columns totaling
 2k performs better than 66 small columns totaling 0.66k even though it’s
 three times as much raw data being written to disk)

 https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c

 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
 bytes per): cassandra-stress --operation INSERT --num-keys 100
 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodesAverages
 from the middle 80% of values:interval_op_rate : 10720

 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
 bytes per):cassandra-stress --operation INSERT --num-keys 100
 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes 
 Averages
 from the middle 80% of values:interval_op_rate : 28667

 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes 
 per):cassandra-stress
 --operation INSERT --num-keys 100 --columns 2 --column-size=1024
 --replication-factor 2 --nodesfile=nodes Averages from the middle 80% of
 values:interval_op_rate : 23489

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 1:02 PM
 To: user@cassandra.apache.org
 Subject: Practical limitations of too many columns/cells ?

 Is there any advantage to using say 40 columns per row vs using 2 columns
 (one for the pk and the other for data) and then shoving the data into a
 BLOB as a JSON object?

 To date, we’ve been just adding new columns.  I profiled Cassandra and
 about 50% of the CPU time is spent on CPU doing compactions.  Seeing that
 CS is being CPU bottlenecked maybe this is a way I can optimize it.

 Any thoughts?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Store JSON as text or UTF-8 encoded blobs?

2015-08-23 Thread Kevin Burton
Hey.

I’m considering migrating my DB from using multiple columns to just 2
columns, with the second one being a JSON object.  Is there going to be any
real difference between TEXT or UTF-8 encoded BLOB?

I guess it would probably be easier to get tools like spark to parse the
object as JSON if it’s represented as a BLOB.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Agreed.  We’re going to run a benchmark.  Just realized we grew to 144
columns.  Fun.  Kind of disappointing that Cassandra is so slow in this
regard.  Kind of defeats the whole point of flexible schema if actually
using that feature is slow as hell.

On Sun, Aug 23, 2015 at 4:54 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
wrote:

 The key is to benchmark it with your real data. Modern cassandra-stress
 let’s you get very close to your actual read/write behavior, and the real
 differentiator will depend on your use case (how often do you write the
 whole row vs updating just one column/field). My gist shows a ton of
 different examples, but they’re not scientific, and at this point they’re
 old versions (and performance varies version to version).

 - Jeff

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 2:58 PM
 To: user@cassandra.apache.org
 Subject: Re: Practical limitations of too many columns/cells ?

 Ah.. yes.  Great benchmarks. If I’m interpreting them correctly it was
 ~15x slower for 22 columns vs 2 columns?

 Guess we have to refactor again :-P

 Not the end of the world of course.

 On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
 wrote:

 A few months back, a user in #cassandra on freenode mentioned that when
 they transitioned from thrift to cql, their overall performance decreased
 significantly. They had 66 columns per table, so I ran some benchmarks with
 various versions of Cassandra and thrift/cql combinations.

 It shouldn’t really surprise you that more columns = more work = slower
 operations. It’s not necessarily the size of the writes, but the amount of
 work that needs to be done with the extra cells (2 large columns totaling
 2k performs better than 66 small columns totaling 0.66k even though it’s
 three times as much raw data being written to disk)

 https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c

 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
 bytes per): cassandra-stress --operation INSERT --num-keys 100
 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodes
 Averages from the middle 80% of values: interval_op_rate : 10720

 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
 bytes per): cassandra-stress --operation INSERT --num-keys 100
 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes
 Averages from the middle 80% of values: interval_op_rate : 28667

 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
 cassandra-stress --operation INSERT --num-keys 100 --columns 2
 --column-size=1024 --replication-factor 2 --nodesfile=nodes Averages
 from the middle 80% of values: interval_op_rate : 23489

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 1:02 PM
 To: user@cassandra.apache.org
 Subject: Practical limitations of too many columns/cells ?

 Is there any advantage to using say 40 columns per row vs using 2 columns
 (one for the pk and the other for data) and then shoving the data into a
 BLOB as a JSON object?

 To date, we’ve been just adding new columns.  I profiled Cassandra and
 about 50% of the CPU time is spent on CPU doing compactions.  Seeing that
 CS is being CPU bottlenecked maybe this is a way I can optimize it.

 Any thoughts?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Best strategy for hiring from OSS communities.

2015-08-13 Thread Kevin Burton
Mildly off topic but we are looking to hire someone with Cassandra
experience..

I don’t necessarily want to spam the list though.  We’d like someone from
the community who contributes to Open Source, etc.

Are there forums for Apache / Cassandra, etc for jobs? I couldn’t fine one.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: TTLs on tables with *only* primary keys?

2015-08-05 Thread Kevin Burton
Thanks. This is what I was looking for…

I ended up working around this by using a boolean field as a column.
Wastes a bit of space but its not the end of the world.

On Wed, Aug 5, 2015 at 7:33 AM, Tyler Hobbs ty...@datastax.com wrote:

 You can set the TTL on a row when you create it using an INSERT
 statement.  For example:

 INSERT INTO mytable (partitionkey, clusteringkey) VALUES (0, 0) USING TTL
 100;

 However, Cassandra doesn't support the ttl() function on primary key
 columns yet.  The ticket to support this is
 https://issues.apache.org/jira/browse/CASSANDRA-9312.

 On Tue, Aug 4, 2015 at 9:22 PM, Kevin Burton bur...@spinn3r.com wrote:

 I have a table which just has primary keys.

 basically:

 create table foo (

 sequence bigint,
 signature text,
 primary key( sequence, signature )
 )

 I need these to eventually get GCd however it doesn’t seem to work.

 If I then run:

 select ttl(sequence) from foo;

 I get:

 Cannot use selection function ttl on PRIMARY KEY part sequence

 …

 I get the same thing if I do it on the second column .. (signature).

 And the value doesn’t seem to be TTLd.

 What’s the best way to proceed here?


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


TTLs on tables with *only* primary keys?

2015-08-04 Thread Kevin Burton
I have a table which just has primary keys.

basically:

create table foo (

sequence bigint,
signature text,
primary key( sequence, signature )
)

I need these to eventually get GCd however it doesn’t seem to work.

If I then run:

select ttl(sequence) from foo;

I get:

Cannot use selection function ttl on PRIMARY KEY part sequence

…

I get the same thing if I do it on the second column .. (signature).

And the value doesn’t seem to be TTLd.

What’s the best way to proceed here?


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Configuring the java client to retry on write failure.

2015-07-12 Thread Kevin Burton
I can’t seem to find a decent resource to really explain this…

Our app seems to fail some write requests, a VERY low percentage.  I’d like
to retry the write requests that fail due to number of replicas not being
correct.

http://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html

This is the best resource I can find.

I think the best strategy is to look at DefaultRetryPolicy and then create
a custom one that keeps retrying on write failures up to say 1 minute.
Latency isn’t critical for us as this is a batch processing system.

The biggest issue is how to test it?  I could unit test that my methods
return on the correct inputs but not really in real world situations.

What’s the best way to unit test this?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Lots of write timeouts and missing data during decomission/bootstrap

2015-07-01 Thread Kevin Burton
We get lots of write timeouts when we decommission a node.  About 80% of
them are write timeout and just about 20% of them are read timeout.

We’ve tried to adjust streamthroughput (and compaction throughput) for that
matter and that doesn’t resolve the issue.

We’ve increased write_request_timeout_in_ms … and read timeout as well.

Is there anything else I should be looking at?

I can’t seem to find the documentation that explains what the heck is
happening.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Lots of write timeouts and missing data during decomission/bootstrap

2015-07-01 Thread Kevin Burton
Looks like all of this is happening because we’re using CAS operations and
the driver is going to SERIAL consistency level.

SERIAL and LOCAL_SERIAL write failure scenarios¶

 http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html?scroll=concept_ds_umf_5xx_zj__failure-scenariosIf
 one of three nodes is down, the Paxos commit fails under the following
 conditions:

- CQL query-configured consistency level of ALL


- Driver-configured serial consistency level of SERIAL


- Replication factor of 3


I don’t understand why this would fail.. it seems completely broken in this
situation.

We were having write timeout at replication factor of 2 .. and a lot of
people from the list said of course , because 2 nodes with 1 node down
means there’s no quorum and paxos needs a quorum.  .. and not sure why I
missed that :-P

So we went with 3 replicas, and a quorum,

but this is new and I didn’t see this documented.  We set the driver to
QUORUM but then I guess the driver sees that this is a CAS operation and
forces it back to SERIAL?  Doesn’t this mean that all decommissions result
in failures of CAS?

This is Cassandra 2.0.9 btw.


On Wed, Jul 1, 2015 at 2:22 PM, Kevin Burton bur...@spinn3r.com wrote:

 We get lots of write timeouts when we decommission a node.  About 80% of
 them are write timeout and just about 20% of them are read timeout.

 We’ve tried to adjust streamthroughput (and compaction throughput) for
 that matter and that doesn’t resolve the issue.

 We’ve increased write_request_timeout_in_ms … and read timeout as well.

 Is there anything else I should be looking at?

 I can’t seem to find the documentation that explains what the heck is
 happening.

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Lots of write timeouts and missing data during decomission/bootstrap

2015-07-01 Thread Kevin Burton
WOW.. nice. you rock!!

On Wed, Jul 1, 2015 at 3:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:

 Looks like all of this is happening because we’re using CAS operations
 and the driver is going to SERIAL consistency level.
 ...
 This is Cassandra 2.0.9 btw.


  https://issues.apache.org/jira/browse/CASSANDRA-8640

 =Rob
 (credit to iamaleksey on IRC for remembering the JIRA #)




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


How the heck do we repair when migrating to 3 replicas on 2.0.x ?

2015-06-11 Thread Kevin Burton
We’re running Cassandra 2.0.9 and just migrated from 2-3 replicas.

We changes our consistency level to 2 during this period while we’re
running a repair.

but we can’t figure out what command to run to repair our data

We *think* we have to run “nodetool repair -pr” on each node.. is that
right?  or do we have to run nodetool -h hostname repair ?

We tried to RFTM… we really did :)


`” --

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Tracking ETA and % complete in nodetool netstats during a decommission ?

2015-05-08 Thread Kevin Burton
I’m trying to track the throughput of nodetool decommission so I can figure
out how long until this box is out of service.

Basically, I want a % complete, and a ETA on when the job will be done.

IS this possible? Without opscenter?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Timeseries analysis using Cassandra and partition by date period

2015-04-05 Thread Kevin Burton
 Hi, I switched from HBase to Cassandra and try to find problem solution
for timeseries analysis on top Cassandra.

Depending on what you’re looking for, you might want to check out KairosDB.

0.95 beta2 just shipped yesterday as well so you have good timing.

https://github.com/kairosdb/kairosdb

On Sat, Apr 4, 2015 at 11:29 AM, Serega Sheypak serega.shey...@gmail.com
wrote:

 Okay, so bucketing by day/week/month is a capacity planning stuff and
 actual questions I want to ask.
 As as a conclusion:
 I have a table events

 CREATE TABLE user_plans (
   id timeuuid,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text

 PRIMARY KEY (user_id, ends)
 );
 which fits tactic queries:
 select smth from user_plans where user_id='xxx' and end_ts  now()

 Then I create second table user_plans_daily (or weekly, monthy)

 with DDL:
 CREATE TABLE user_plans_daily/weekly/monthly (
   ymd int,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text
 )
 PRIMARY KEY ((ymd, user_id), event_ts )
 WITH CLUSTERING ORDER BY (event_ts DESC);

 And this table is good for answering strategic questions:
 select * from
 user_plans_daily/weekly/monthly
 where ymd in ()
 And I should avoid long condition inside IN clause, that is why you
 suggest me to create bigger bucket, correct?


 2015-04-04 20:00 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 It sounds like your time bucket should be a month, but it depends on the
 amount of data per user per day and your main query range. Within the
 partition you can then query for a range of days.

 Yes, all of the rows within a partition are stored on one physical node
 as well as the replica nodes.

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 1:38 PM, Serega Sheypak serega.shey...@gmail.com
 wrote:

 non-equal relation on a partition key is not supported
 Ok, can I generate select query:
 select some_attributes
 from events where ymd = 20150101 or ymd = 20150102 or 20150103 ... or
 20150331

  The partition key determines which node can satisfy the query
 So you mean that all rows with the same *(ymd, user_id)* would be on
 one physical node?


 2015-04-04 16:38 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 Unfortunately, a non-equal relation on a partition key is not
 supported. You would need to bucket by some larger unit, like a month, and
 then use the date/time as a clustering column for the row key. Then you
 could query within the partition. The partition key determines which node
 can satisfy the query. Designing your partition key judiciously is the key
 (haha!) to performant Cassandra applications.

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 9:33 AM, Serega Sheypak 
 serega.shey...@gmail.com wrote:

 Hi, we plan to have 10^8 users and each user could generate 10 events
 per day.
 So we have:
 10^8 records per day
 10^8*30 records per month.
 Our timewindow analysis could be from 1 to 6 months.

 Right now PK is PRIMARY KEY (user_id, ends) where endts is exact ts
 of event.

 So you suggest this approach:
 *PRIMARY KEY ((ymd, user_id), event_ts ) *
 *WITH CLUSTERING ORDER BY (**event_ts*
 * DESC);*

 where ymd=20150102 (the Second of January)?

 *What happens to writes:*
 SSTable with past days (ymd  current_day) stay untouched and don't
 take part in Compaction process since there are o changes to them?

 What happens to read:
 I issue query:
 select some_attributes
 from events where ymd = 20150101 and ymd  20150301
 Does Cassandra skip SSTables which don't have ymd in specified range
 and give me a kind of partition elimination, like in traditional DBs?


 2015-04-04 14:41 GMT+02:00 Jack Krupansky jack.krupan...@gmail.com:

 It depends on the actual number of events per user, but simply
 bucketing the partition key can give you the same effect - clustering 
 rows
 by time range. A composite partition key could be comprised of the user
 name and the date.

 It also depends on the data rate - is it many events per day or just
 a few events per week, or over what time period. You need to be careful -
 you don't want your Cassandra partitions to be too big (millions of rows)
 or too small (just a few or even one row per partition.)

 -- Jack Krupansky

 On Sat, Apr 4, 2015 at 7:03 AM, Serega Sheypak 
 serega.shey...@gmail.com wrote:

 Hi, I switched from HBase to Cassandra and try to find problem
 solution for timeseries analysis on top Cassandra.
 I have a entity named Event.
 Event has attributes:
 user_id - a guy who triggered event
 event_ts - when even happened
 event_type - type of event
 some_other_attr - some other attrs we don't care about right now.

 The DDL for entity event looks this way:

 CREATE TABLE user_plans (

   id timeuuid,
   user_id timeuuid,
   event_ts timestamp,
   event_type int,
   some_other_attr text

 PRIMARY KEY (user_id, ends)
 );

 Table is infinite, It would grow continuously during application
 lifetime.
 I want to ask question:
 Cassandra, give me all event where 

Re: Fastest way to map/parallel read all values in a table?

2015-02-09 Thread Kevin Burton
I had considered using spark for this but:

1.  we tried to deploy spark only to find out that it was missing a number
of key things we need.

2.  our app needs to shut down to release threads and resources.  Spark
doesn’t have support for this so all the workers would have stale thread
leaking afterwards.  Though I guess if I can get workers to fork then I
should be ok.

3.  Spark SQL actually returned invalid data to our queries… so that was
kind of a red flag and a non-starter

On Mon, Feb 9, 2015 at 2:24 AM, Marcelo Valle (BLOOMBERG/ LONDON) 
mvallemil...@bloomberg.net wrote:

 Just for the record, I was doing the exact same thing in an internal
 application in the start up I used to work. We have had the need of writing
 custom code process in parallel all rows of a column family. Normally we
 would use Spark for the job, but in our case the logic was a little more
 complicated, so we wrote custom code.

 What we did was to run N process in M machines (N cores in each), each one
 processing tasks. The tasks were created by splitting the range -2^ 63 to
 2^ 63 -1 in N*M*10 tasks. Even if data was not completely distributed along
 the tasks, no machines were idle, as when some task was completed another
 one was taken from the task pool.

 It was fast enough for us, but I am interested in knowing if there is a
 better way of doing it.

 For your specific case, here is a tool we had opened as open source and
 can be useful for simpler tests:
 https://github.com/s1mbi0se/cql_record_processor

 Also, I guess you probably know that, but I would consider using Spark for
 doing this.

 Best regards,
 Marcelo.

 From: user@cassandra.apache.org
 Subject: Re:Fastest way to map/parallel read all values in a table?

 What’s the fastest way to map/parallel read all values in a table?

 Kind of like a mini map only job.

 I’m doing this to compute stats across our entire corpus.

 What I did to begin with was use token() and then spit it into the number
 of splits I needed.

 So I just took the total key range space which is -2^63 to 2^63 - 1 and
 broke it into N parts.

 Then the queries come back as:

 select * from mytable where token(primaryKey) = x and token(primaryKey) 
 y

 From reading on this list I thought this was the correct way to handle
 this problem.

 However, I’m seeing horrible performance doing this.  After about 1% it
 just flat out locks up.

 Could it be that I need to randomize the token order so that it’s not
 contiguous?  Maybe it’s all mapping on the first box to begin with.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: High GC activity on node with 4TB on data

2015-02-08 Thread Kevin Burton
Do you have a lot of individual tables?  Or lots of small compactions?

I think the general consensus is that (at least for Cassandra), 8GB heaps
are ideal.

If you have lots of small tables it’s a known anti-pattern (I believe)
because the Cassandra internals could do a better job on handling the in
memory metadata representation.

I think this has been improved in 2.0 and 2.1 though so the fact that
you’re on 1.2.18 could exasperate the issue.  You might want to consider an
upgrade (though that has its own issues as well).

On Sun, Feb 8, 2015 at 12:44 PM, Jiri Horky ho...@avast.com wrote:

 Hi all,

 we are seeing quite high GC pressure (in old space by CMS GC Algorithm)
 on a node with 4TB of data. It runs C* 1.2.18 with 12G of heap memory
 (2G for new space). The node runs fine for couple of days when the GC
 activity starts to raise and reaches about 15% of the C* activity which
 causes dropped messages and other problems.

 Taking a look at heap dump, there is about 8G used by SSTableReader
 classes in org.apache.cassandra.io.compress.CompressedRandomAccessReader.

 Is this something expected and we have just reached the limit of how
 many data a single Cassandra instance can handle or it is possible to
 tune it better?

 Regards
 Jiri Horky




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Fastest way to map/parallel read all values in a table?

2015-02-08 Thread Kevin Burton
What’s the fastest way to map/parallel read all values in a table?

Kind of like a mini map only job.

I’m doing this to compute stats across our entire corpus.

What I did to begin with was use token() and then spit it into the number
of splits I needed.

So I just took the total key range space which is -2^63 to 2^63 - 1 and
broke it into N parts.

Then the queries come back as:

select * from mytable where token(primaryKey) = x and token(primaryKey)  y

From reading on this list I thought this was the correct way to handle this
problem.

However, I’m seeing horrible performance doing this.  After about 1% it
just flat out locks up.

Could it be that I need to randomize the token order so that it’s not
contiguous?  Maybe it’s all mapping on the first box to begin with.



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Disabling the write ahead log with 2 data centers?

2015-01-23 Thread Kevin Burton
The WAL (and walls in general) impose a performance overhead.

If one were to just take a machine out of the cluster, permanently, when a
machine crashes, you could quickly get all the shards back up to N replicas
after a node crashes.

So realistically, running with a WAL is somewhat redundant.

ESPECIALLY when you have 2 data centers at 3 replicas in each datacenter
(for a total of 6 replicas).

I think this would only be about a 15% performance overhead.

Additionally, on flash, if you lay out the SSTables properly, you arguably
don’t need a WAL because your SSTable itself can be a wall and you could
run without memtables.   This has been proposed in a number of situations.
Especially on something like FusionIO …

Thoughts?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


number of replicas per data center?

2015-01-18 Thread Kevin Burton
How do people normally setup multiple data center replication in terms of
number of *local* replicas?

So say you have two data centers, do you have 2 local replicas, for a total
of 4 replicas?  Or do you have 2 in one datacenter, and 1 in another?

If you only have one in a local datacenter then when it fails you have to
transfer all that data over the WAN.



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: number of replicas per data center?

2015-01-18 Thread Kevin Burton
Ah.. six replicas.  At least its super inexpensive that way (sarcasm!)



On Sun, Jan 18, 2015 at 8:14 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Sorry, I left out RF.  Yes, I prefer 3 replicas in each datacenter, and
 that's pretty common.


 On Sun Jan 18 2015 at 8:02:12 PM Kevin Burton bur...@spinn3r.com wrote:

  3 what? :-P replicas per datacenter or 3 data centers?

 So if you have 2 data centers you would have 6 total replicas with 3
 local replicas per datacenter?

 On Sun, Jan 18, 2015 at 7:53 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 Personally I wouldn't go  3 unless you have a good reason.


 On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton bur...@spinn3r.com
 wrote:

 How do people normally setup multiple data center replication in terms
 of number of *local* replicas?

 So say you have two data centers, do you have 2 local replicas, for a
 total of 4 replicas?  Or do you have 2 in one datacenter, and 1 in another?

 If you only have one in a local datacenter then when it fails you have
 to transfer all that data over the WAN.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: number of replicas per data center?

2015-01-18 Thread Kevin Burton
 3 what? :-P replicas per datacenter or 3 data centers?

So if you have 2 data centers you would have 6 total replicas with 3 local
replicas per datacenter?

On Sun, Jan 18, 2015 at 7:53 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Personally I wouldn't go  3 unless you have a good reason.


 On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton bur...@spinn3r.com wrote:

 How do people normally setup multiple data center replication in terms of
 number of *local* replicas?

 So say you have two data centers, do you have 2 local replicas, for a
 total of 4 replicas?  Or do you have 2 in one datacenter, and 1 in another?

 If you only have one in a local datacenter then when it fails you have to
 transfer all that data over the WAN.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Not enough replica available” when consistency is ONE?

2015-01-18 Thread Kevin Burton
OK.. so if I’m running with 2 replicas, then BOTH of them need to be online
for this to work.  Correct?  Because with two replicas I need 2 to form a
quorum.

This is somewhat confusing them.  Because if you have two replicas, and
you’re depending on these types of transactions, then this is a VERY
dangerous state.  Because if ANY of your Cassandra nodes goes offline, then
your entire application crashes.  So the more nodes you have, the HIGHER
the probability that your application will crash.

Which is just what happened to me.  And in retrospect, this makes total
sense, but of course I just missed this in the application design.

So ConsistencyLevel.ONE and if not exists are essentially mutually
incompatible and shouldn’t the driver throw an exception if the user
requests this configuration?

Its dangerous enough that it probably shouldn’t be supported.



On Sun, Jan 18, 2015 at 7:43 AM, Eric Stevens migh...@gmail.com wrote:

 Check out
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_tunable_consistency_c.html

  Cassandra 2.0 uses the Paxos consensus protocol, which resembles
 2-phase commit, to support linearizable consistency. All operations are
 quorum-based ...

 This kicks in whenever you do CAS operations (eg, IF NOT EXISTS).
 Otherwise a cluster which became network partitioned would end up being
 able to have two separate CAS statements which both succeeded, but which
 disagreed with each other.

 On Sun, Jan 18, 2015 at 8:02 AM, Kevin Burton bur...@spinn3r.com wrote:

 I’m really confused here.

 I”m calling:

 acquireInsert.setConsistencyLevel( ConsistencyLevel.ONE );

 but I”m still getting the exception:

 com.datastax.driver.core.exceptions.UnavailableException: Not enough
 replica available for query at consistency SERIAL (2 required but only 1
 alive)

 Does it matter that I’m using:

 ifNotExists();

 and that maybe cassandra needs two because it’s using a coordinator ?

 If so then an exception should probably be thrown when I try to set a
 wrong consistency level.

 which would be weird because I *do* have at least two replicas online. I
 have 4 nodes in my cluster right now...

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Not enough replica available” when consistency is ONE?

2015-01-18 Thread Kevin Burton
I’m really confused here.

I”m calling:

acquireInsert.setConsistencyLevel( ConsistencyLevel.ONE );

but I”m still getting the exception:

com.datastax.driver.core.exceptions.UnavailableException: Not enough
replica available for query at consistency SERIAL (2 required but only 1
alive)

Does it matter that I’m using:

ifNotExists();

and that maybe cassandra needs two because it’s using a coordinator ?

If so then an exception should probably be thrown when I try to set a wrong
consistency level.

which would be weird because I *do* have at least two replicas online. I
have 4 nodes in my cluster right now...

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?

2015-01-01 Thread Kevin Burton
I think the two tables are the same.  Correct?

create table foo (

source text,
target text,
primary key( source, target )
)


vs

create table foo (

source text,
target settext,
primary key( source )
)

… meaning that the first one, under the covers is represented the same as
the second.  As a slice.

Am I correct?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?

2015-01-01 Thread Kevin Burton
AH!!! I had forgotten about both of those issues.  Good points..

On Thu, Jan 1, 2015 at 11:04 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Storage-engine wise, they are almost equivalent, thought there are some
 minor differences:

 1) with Set structure, you cannot store more that 64kb worth of data
 2) collections and maps are loaded entirely by Cassandra for each query,
 whereas with clustering columns you can select a slice of columns



 On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote:

 I think the two tables are the same.  Correct?

 create table foo (

 source text,
 target text,
 primary key( source, target )
 )


 vs

 create table foo (

 source text,
 target settext,
 primary key( source )
 )

 … meaning that the first one, under the covers is represented the same as
 the second.  As a slice.

 Am I correct?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: limit vs sample for indexing a small amount of data quickly?

2014-12-31 Thread Kevin Burton
I thought so but doesn’t that read that into the driver?  I need to keep
piping it into other RDDs.

I have a huge table as the input and I need to do multiple transformations
on the data so I just want to read the first N rows from that as an RDD and
then keep doing my transformations.

On Wed, Dec 31, 2014 at 7:09 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:

  You want to use take() or takeOrdered.



 Sent with Good (www.good.com)



 -Original Message-
 *From: *Kevin Burton [bur...@spinn3r.com]
 *Sent: *Wednesday, December 31, 2014 10:02 PM Eastern Standard Time
 *To: *u...@spark.apache.org
 *Subject: *limit vs sample for indexing a small amount of data quickly?

 Is there a limit function which just returns the first N records?

 Sample is nice but I’m trying to do this so it’s super fast and just to
 test the functionality of an algorithm.

 With sample I’d have to compute the % that would yield 1000 results first…

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
  http://spinn3r.com

 --

 The information contained in this e-mail is confidential and/or
 proprietary to Capital One and/or its affiliates. The information
 transmitted herewith is intended only for use by the individual or entity
 to which it is addressed.  If the reader of this message is not the
 intended recipient, you are hereby notified that any review,
 retransmission, dissemination, distribution, copying or other use of, or
 taking of any action in reliance upon this information is strictly
 prohibited. If you have received this communication in error, please
 contact the sender and delete the material from your computer.




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


bootstrapping manually when auto_bootstrap=false ?

2014-12-17 Thread Kevin Burton
I’m trying to figure out the best way to bootstrap our nodes.

I *think* I want our nodes to be manually bootstrapped.  This way an admin
has to explicitly bring up the node in the cluster and I don’t have to
worry about a script accidentally provisioning new nodes.

The problem is HOW do you do it?

I couldn’t find any reference anywhere in the documentation.

I *think* I run nodetool repair? but it’s unclear..

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: nodetool breaks on firewall ?

2014-12-13 Thread Kevin Burton
I ended up working around this by allowing the host to connect to its own
fronted port.

Figured it’s a reasonable solution.

On Fri, Dec 12, 2014 at 12:38 PM, Ryan Svihla rsvi...@datastax.com wrote:

 well did you restart cassandra after changing the JVM_OPTS to match your
 desired address?

 On Fri, Dec 12, 2014 at 2:34 PM, Kevin Burton bur...@spinn3r.com wrote:

 Oh.  and if I specify —host it still doesn’t work. Very weird.

 On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 OK..I’m stracing it and it’s definitely trying to connect to 173… here’s
 the log line below.  (anonymized).

 the question is why.. is cassandra configured to return something on the
 public address via JMX? I guess I could dump all of JMX metrics and figure
 it out.

 [pid 32331] connect(41, {sa_family=AF_INET6, sin6_port=htons(7199),
 inet_pton(AF_INET6, :::173.x.x.x, sin6_addr), sin6_flowinfo=0,
 sin6_scope_id=0}, 28 unfinished ...

 On Fri, Dec 12, 2014 at 12:20 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 is appears to be localhost, I imagine the issue is more you changed the
 rpc_address to not be localhost anymore


 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java

 lines 87 and 88
 private static final String DEFAULT_HOST = 127.0.0.1;
 private static final int DEFAULT_PORT = 7199;

 On Fri, Dec 12, 2014 at 2:09 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 AH! … ok. I didn’t see that nodetool took a host.  Hm.. How does it
 determine the host to read from by default?

 The problem is that somehow it wants to read from the public interface
 (which is fire walled)

 On Fri, Dec 12, 2014 at 5:19 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 yes the node needs to restart to have cassandra-env.sh take effect,
 and the links you're providing are about making cassandra's JMX bind to 
 the
 interface you want, so nodetool isn't really the issue, nodetool can just
 take an ip argument to connect to the interface you desire.Something 
 like:

 nodetool status -h 10.1.1.100



 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 I have a firewall I need to bring up to keep our boxes off the
 Internet (obviously).

 The problem is that once I do nodetool doesn’t work anymore.

 There’s a bunch of advice on this on the Internet:


 http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-trying-to-connect-to-rem

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootConnectionsFail_r.html

 .. almost all the advice talks about editing cassandra-env.sh

 The problem here is that nodetool doesn’t use the JVM_OPTS param so
 anything added there isn’t used by nodetool.  (at least in 2.0.9)

 I want to force cassandra to always use our 10x network.

 Any advice here?  Do I have to do a forced cassandra restart for my
 cassandra-env.sh to take effect?



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image:
 linkedin.png] http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database
 technology, delivering Apache Cassandra to the world’s most innovative
 enterprises. Datastax is built to be agile, always-on, and predictably
 scalable to any size. With more than 500 customers in 45 countries, 
 DataStax
 is the database technology and transactional backbone of choice for the
 worlds most innovative companies such as Netflix, Adobe, Intuit, and 
 eBay.




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image:
 linkedin.png] http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
AH! … ok. I didn’t see that nodetool took a host.  Hm.. How does it
determine the host to read from by default?

The problem is that somehow it wants to read from the public interface
(which is fire walled)

On Fri, Dec 12, 2014 at 5:19 AM, Ryan Svihla rsvi...@datastax.com wrote:

 yes the node needs to restart to have cassandra-env.sh take effect, and
 the links you're providing are about making cassandra's JMX bind to the
 interface you want, so nodetool isn't really the issue, nodetool can just
 take an ip argument to connect to the interface you desire.Something like:

 nodetool status -h 10.1.1.100



 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com wrote:

 I have a firewall I need to bring up to keep our boxes off the Internet
 (obviously).

 The problem is that once I do nodetool doesn’t work anymore.

 There’s a bunch of advice on this on the Internet:


 http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-trying-to-connect-to-rem

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootConnectionsFail_r.html

 .. almost all the advice talks about editing cassandra-env.sh

 The problem here is that nodetool doesn’t use the JVM_OPTS param so
 anything added there isn’t used by nodetool.  (at least in 2.0.9)

 I want to force cassandra to always use our 10x network.

 Any advice here?  Do I have to do a forced cassandra restart for my
 cassandra-env.sh to take effect?



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
Oh.  and if I specify —host it still doesn’t work. Very weird.

On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com wrote:

 OK..I’m stracing it and it’s definitely trying to connect to 173… here’s
 the log line below.  (anonymized).

 the question is why.. is cassandra configured to return something on the
 public address via JMX? I guess I could dump all of JMX metrics and figure
 it out.

 [pid 32331] connect(41, {sa_family=AF_INET6, sin6_port=htons(7199),
 inet_pton(AF_INET6, :::173.x.x.x, sin6_addr), sin6_flowinfo=0,
 sin6_scope_id=0}, 28 unfinished ...

 On Fri, Dec 12, 2014 at 12:20 PM, Ryan Svihla rsvi...@datastax.com
 wrote:

 is appears to be localhost, I imagine the issue is more you changed the
 rpc_address to not be localhost anymore


 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java

 lines 87 and 88
 private static final String DEFAULT_HOST = 127.0.0.1;
 private static final int DEFAULT_PORT = 7199;

 On Fri, Dec 12, 2014 at 2:09 PM, Kevin Burton bur...@spinn3r.com wrote:

 AH! … ok. I didn’t see that nodetool took a host.  Hm.. How does it
 determine the host to read from by default?

 The problem is that somehow it wants to read from the public interface
 (which is fire walled)

 On Fri, Dec 12, 2014 at 5:19 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 yes the node needs to restart to have cassandra-env.sh take effect, and
 the links you're providing are about making cassandra's JMX bind to the
 interface you want, so nodetool isn't really the issue, nodetool can just
 take an ip argument to connect to the interface you desire.Something like:

 nodetool status -h 10.1.1.100



 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 I have a firewall I need to bring up to keep our boxes off the
 Internet (obviously).

 The problem is that once I do nodetool doesn’t work anymore.

 There’s a bunch of advice on this on the Internet:


 http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-trying-to-connect-to-rem

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootConnectionsFail_r.html

 .. almost all the advice talks about editing cassandra-env.sh

 The problem here is that nodetool doesn’t use the JVM_OPTS param so
 anything added there isn’t used by nodetool.  (at least in 2.0.9)

 I want to force cassandra to always use our 10x network.

 Any advice here?  Do I have to do a forced cassandra restart for my
 cassandra-env.sh to take effect?



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image:
 linkedin.png] http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
OK..I’m stracing it and it’s definitely trying to connect to 173… here’s
the log line below.  (anonymized).

the question is why.. is cassandra configured to return something on the
public address via JMX? I guess I could dump all of JMX metrics and figure
it out.

[pid 32331] connect(41, {sa_family=AF_INET6, sin6_port=htons(7199),
inet_pton(AF_INET6, :::173.x.x.x, sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, 28 unfinished ...

On Fri, Dec 12, 2014 at 12:20 PM, Ryan Svihla rsvi...@datastax.com wrote:

 is appears to be localhost, I imagine the issue is more you changed the
 rpc_address to not be localhost anymore


 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java

 lines 87 and 88
 private static final String DEFAULT_HOST = 127.0.0.1;
 private static final int DEFAULT_PORT = 7199;

 On Fri, Dec 12, 2014 at 2:09 PM, Kevin Burton bur...@spinn3r.com wrote:

 AH! … ok. I didn’t see that nodetool took a host.  Hm.. How does it
 determine the host to read from by default?

 The problem is that somehow it wants to read from the public interface
 (which is fire walled)

 On Fri, Dec 12, 2014 at 5:19 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 yes the node needs to restart to have cassandra-env.sh take effect, and
 the links you're providing are about making cassandra's JMX bind to the
 interface you want, so nodetool isn't really the issue, nodetool can just
 take an ip argument to connect to the interface you desire.Something like:

 nodetool status -h 10.1.1.100



 On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com
 wrote:

 I have a firewall I need to bring up to keep our boxes off the Internet
 (obviously).

 The problem is that once I do nodetool doesn’t work anymore.

 There’s a bunch of advice on this on the Internet:


 http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-trying-to-connect-to-rem

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootConnectionsFail_r.html

 .. almost all the advice talks about editing cassandra-env.sh

 The problem here is that nodetool doesn’t use the JVM_OPTS param so
 anything added there isn’t used by nodetool.  (at least in 2.0.9)

 I want to force cassandra to always use our 10x network.

 Any advice here?  Do I have to do a forced cassandra restart for my
 cassandra-env.sh to take effect?



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/

 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


nodetool breaks on firewall ?

2014-12-11 Thread Kevin Burton
I have a firewall I need to bring up to keep our boxes off the Internet
(obviously).

The problem is that once I do nodetool doesn’t work anymore.

There’s a bunch of advice on this on the Internet:

http://stackoverflow.com/questions/17430872/cassandra-1-2-nodetool-getting-failed-to-connect-when-trying-to-connect-to-rem
http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootConnectionsFail_r.html

.. almost all the advice talks about editing cassandra-env.sh

The problem here is that nodetool doesn’t use the JVM_OPTS param so
anything added there isn’t used by nodetool.  (at least in 2.0.9)

I want to force cassandra to always use our 10x network.

Any advice here?  Do I have to do a forced cassandra restart for my
cassandra-env.sh to take effect?



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


does safe cassandra shutdown require disable binary?

2014-11-30 Thread Kevin Burton
I’m trying to figure out a safe way to do a rolling restart.

http://devblog.michalski.im/2012/11/25/safe-cassandra-shutdown-and-restart/

It has the following command which make sense:

root@cssa01:~# nodetool -h cssa01.michalski.im
disablegossiproot@cssa01:~# nodetool -h cssa01.michalski.im
disablethriftroot@cssa01:~# nodetool -h cssa01.michalski.im drain


… but I don’t think this takes into consideration CQL.


So you would first disablethrift, then disablebinary


anything else needed in modern Cassandra ?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Unsubscribe

2014-11-25 Thread Kevin Daly


Only BlueCat IPAM delivers true network intelligence, a smarter way to 
manage your network and devices.
Read more at www.bluecatnetworks.com/networkintelligence 
http://www.bluecatnetworks.com/networkintelligence.


This e-mail and any attachments are for the sole use of the intended 
recipient and may contain confidential information.





RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
The new SSDs that we have (as well as Fusion IO) in theory can saturate the
gigabit ethernet port.

The 4k random read and write IOs they’re doing now can easily add up quick
and they’re faster than gigabit and even two gigabit.

However, not all of that 4k is actually used.  I suspect that on average
half is wasted.

But the question is how much.  Of course YMMV.

I’m thinking of getting our servers with a moderate amount of RAM.  Say
24GB.  Then allocate 8GB to Cassandra, another 8GB to random daemons we
run, then another 8GB to page cache.

Curious what other people have seen here in practice.  Are they getting
comparable performance to RAM in practice? Latencies would be higher of
course but we’re fine with that.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
I imagine I’d generally be happy if we were CPU bound :-) … as long as the
number of transactions per second is generally reasonable.

On Tue, Nov 25, 2014 at 7:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Nov 25, 2014 at 5:31 PM, Kevin Burton bur...@spinn3r.com wrote:

 Curious what other people have seen here in practice.  Are they getting
 comparable performance to RAM in practice? Latencies would be higher of
 course but we’re fine with that.


 My understanding is that when one runs Cassandra with SSDs, one replaces
 the typical i/o bound with a CPU bound. Cassandra also has various internal
 assumptions that do not make best use of the spare i/o available;
 SSD+Cassandra has only been deployed at scale for a few years, so this
 makes sense.

 =Rob





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Kevin Burton
I’m trying to track down some exceptions in our production cluster.  I
bumped up our write load and now I’m getting a non-trivial number of these
exceptions.  Somewhere on the order of 100 per hour.

All machines have a somewhat high CPU load because they’re doing other
tasks.  I’m worried that perhaps my background tasks are just overloading
cassandra and one way to mitigate this is to nice them to least favorable
priority (this is my first tasks).

But I can’t seem to really track down any documentation on HOW to tune
cassandra to prevent these. I mean I get the core theory behind all of this
just need to track down the docs so I can actually RTFM :)



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Kevin Burton
 There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug.
INSERT and UPDATE are not totally orthogonal
in CQL and you should use INSERT for actual insertion and UPDATE for
updates (granted, the database will not reject
our query if you break this rule but it's nonetheless the way it's intended
to be used).

OK.. (and not trying to be difficult here).  We can’t have it both ways.
One of these use cases is a bug…

You’re essentially saying “don’t do that, but yeah, you can do it.. “

Either UPDATE should support IF NOT EXISTS or UPDATE should not perform
INSERTs.

At least that’s the way I see it.

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
There’s still a lot of weirdness in CQL.

For example, you can do an INSERT with an UPDATE .. .which I’m generally
fine with.  Kind of make sense.

However, with INSERT you can do IF NOT EXISTS.

… but you can’t do the same thing on UPDATE.

So I foolishly wrote all my code assuming that INSERT/UPDATE were
orthogonal, but now they’re not.

you can still do IF on UPDATE though… but it’s not possible to do IF
mycolumn IS NULL

.. so is there a way to mimic IF NOT EXISTS on UPDATE or is this just a bug?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL -- If mycolumn = null should work


Alas.. it doesn’t :-/

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
Oh yes.  That will work because a value is already there. I’m talking if
the value does not exist. Otherwise I’d have to insert a null first.

On Mon, Nov 17, 2014 at 3:30 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Just tested with C* 2.1.1

 cqlsh:test CREATE TABLE simple(id int PRIMARY KEY, val text);
 cqlsh:test INSERT INTO simple (id) VALUES (1);
 cqlsh:test SELECT * FROM simple ;

  id | val
 +--
   1 | null

 (1 rows)

 cqlsh:test UPDATE simple SET val = 'new val' WHERE id=1 *IF val = null*;

  [applied]
 ---
   True

 cqlsh:test SELECT * FROM simple ;

  id | val
 +-
   1 | new val

 (1 rows)

 On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton bur...@spinn3r.com wrote:


 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL -- If mycolumn = null should work


 Alas.. it doesn’t :-/

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Reading the write time of each value in a set?

2014-11-16 Thread Kevin Burton
Thanks. I’ll probably file a bug for this, assuming one doesn’t already
exist.

On Sun, Nov 16, 2014 at 6:20 AM, Eric Stevens migh...@gmail.com wrote:

 I'm not aware of a way to query TTL or writetime on collections from CQL
 yet.  You can access this information from Thrift though.

 On Sat Nov 15 2014 at 12:51:55 AM DuyHai Doan doanduy...@gmail.com
 wrote:

 Why don't you use map to store write time as value and data as key?
 Le 15 nov. 2014 00:24, Kevin Burton bur...@spinn3r.com a écrit :

 I’m trying to build a histograph in CQL for various records. I’d like to
 keep a max of ten items or items with a TTL.  but if there are too many
 items, I’d like to trim it so the max number of records is about 20.

 So if I exceed 20, I need to removed the oldest records.

 I’m using a set append so each member of the set has a different write
 time and ttl.

 But I can’t figure out how to compute the writetime()  of each set
 member since the CQL write time only takes a column reference.

 Any advice?  Seems like I’m an edge case.

 Plan B is to upgrade everything to 2.1 and I can use custom datatypes
 and just store the write times myself, but that takes a while.

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


conditional batches across two tables?

2014-11-16 Thread Kevin Burton
I’m trying to have some code acquire a lock by first at performing a table
mutation, and then if it wins, performing a second table insert.

I don’t think this is possible with batches though.

I don’t think I can say “update this table, and if you are able to set the
value, and the value doesn’t already exist, then insert into this other
table”

I can do this in Java code, but it’s not transactional.  Which is fine of
course, I can work around some code to make it safe.

My plan B is to perform a conditional update by doing an IF NOT EXISTS and
if I win then I can do the insert but only for a limited time while I hold
the lock.

But I don’t know if even this is possible without a write then read because
the datastax driver doesn’t allow me to read the values that were written.
I’d have to do another SELECT to read them back out.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


writetime of individual set members, and what happens when you add a set member a second time.

2014-11-15 Thread Kevin Burton
So I think there are some operations in CQL WRT sets/maps that aren’t
supported yet or at least not very well documented.

For example, you can set the TTL on individual set members, but how do you
read the writetime() ?

normally on a column I can just

SELECT writetime(foo) from my_table;

but … I can’t do that for an individual set member.

and what happens to an individual set member’s writetime (and eventual gc,
expiration) if I write it again with the same member?  Does the write time
get changed because it’s a new add or does the write time stay the same
because its’ already there?

Kevin


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Two writers appending to a set to see which one wins?

2014-11-15 Thread Kevin Burton
I have two tasks trying to each insert into a table.  The only problem is
that I only want one to win, and then never perform that operation again.

So my idea was to use the set append support in Cassandra to attempt to
append to the set and if we win, then I can perform my operation.  The
problem is, how.. I don’t think there’s a way to find out that your INSERT
successfully added or failed a set append.

Is there something I’m missing?

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Reading the write time of each value in a set?

2014-11-14 Thread Kevin Burton
I’m trying to build a histograph in CQL for various records. I’d like to
keep a max of ten items or items with a TTL.  but if there are too many
items, I’d like to trim it so the max number of records is about 20.

So if I exceed 20, I need to removed the oldest records.

I’m using a set append so each member of the set has a different write time
and ttl.

But I can’t figure out how to compute the writetime()  of each set member
since the CQL write time only takes a column reference.

Any advice?  Seems like I’m an edge case.

Plan B is to upgrade everything to 2.1 and I can use custom datatypes and
just store the write times myself, but that takes a while.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


  1   2   3   >