nodetool cfstats has some valuable data but what I would like is a 1 minute
delta.
Similar to iostat...
It's easy to parse this but has anyone done it?
I want to see IO throughput and load on C* for each table.
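Haven't seen one published, but it's a short script: snapshot the cumulative counters and diff them once per interval. A minimal sketch (the regex and field names are assumptions; cfstats output varies by C* version, and per-table grouping would additionally key on the "Table:" lines):

```python
# Sketch: iostat-style deltas from `nodetool cfstats`. The cumulative
# counters are diffed once per interval. Field names / format are
# assumptions and vary across Cassandra versions.
import re
import subprocess
import time

COUNTER_RE = re.compile(r'^\s*(Read Count|Write Count):\s+(\d+)', re.M)

def parse_counters(text):
    """Pull cumulative counters out of one cfstats dump."""
    return {name: int(val) for name, val in COUNTER_RE.findall(text)}

def delta(prev, curr):
    """Per-interval change between two cumulative snapshots."""
    return {k: v - prev.get(k, 0) for k, v in curr.items()}

def watch(interval=60):
    snap = lambda: parse_counters(
        subprocess.check_output(['nodetool', 'cfstats'], text=True))
    prev = snap()
    while True:
        time.sleep(interval)
        curr = snap()
        print(delta(prev, curr))
        prev = curr
```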
--
We’re hiring if you know of any awesome Java Devops or Linux Operations
BTW. we think we tracked this down to using large partitions to implement
inverted indexes. C* just doesn't do a reasonable job at all with large
partitions so we're going to migrate this use case to using Elasticsearch
On Wed, Aug 3, 2016 at 1:54 PM, Ben Slater
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231
>>
>> Regards,
>>
>> Ryan Svihla
>>
>> On Aug 3, 2016, at 2:58 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>> It seems these are basically impossible to track down.
It seems these are basically impossible to track down.
https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
has some information, but their workaround is to increase the commit
log segment size. There's no way to find out WHAT client or
We usually use 100 per 5 minutes... but you're right. We might
actually move this use case over to using Elasticsearch in the next couple
of weeks.
On Wed, Aug 3, 2016 at 11:09 AM, Jonathan Haddad wrote:
> Kevin,
>
> "Our scheme uses large buckets of content where we
path seems risky at best at the moment. In any event, your best
>> solution would be to find a way to make your partitions smaller (like
>> 1/10th of the size).
>>
>> Cheers
>> Ben
>> <https://issues.apache.org/jira/browse/CASSANDRA-11206>
solution would be to find a way to make your partitions smaller (like
> 1/10th of the size).
>
> Cheers
> Ben
> <https://issues.apache.org/jira/browse/CASSANDRA-11206>
>
> On Wed, 3 Aug 2016 at 12:35 Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I have a
/content_legacy_2016_08_02:1470154500099 (106107128 bytes)
On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated
> to each C* node. We're aware of the recommended 8GB limit to keep GCs low
> but our
We have a 60 node CS cluster running 2.2.7 and about 20GB of RAM allocated
to each C* node. We're aware of the recommended 8GB limit to keep GCs low
but our memory has been creeping up (probably) related to this bug.
Here's what we're seeing... if we do a low level of writes we think
everything
On Wed, Jul 20, 2016 at 11:53 AM, Jeff Jirsa
wrote:
> Can you tolerate the value being “close, but not perfectly accurate”? If
> not, don’t use a counter.
>
>
>
yeah.. agreed.. this is a problem which is something I was considering. I
guess it depends on whether
We ended up implementing a task/queue system which uses a global pointer.
Basically the pointer just increments ... so we have thousands of tasks
that just increment this one pointer.
The problem is that we're seeing contention on it and not being able to
write this record properly.
We're just
CS?
>>
>> On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> I have a paging model whereby we stream data from CS by fetching 'pages'
>>> thereby reading (sequentially) entire datasets.
>>>
>>> We're using th
I have a paging model whereby we stream data from CS by fetching 'pages'
thereby reading (sequentially) entire datasets.
We're using the bucket approach where we write data for 5 minutes, then we
can just fetch the bucket for that range.
Our app now has TONS of data and we have a piece of
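The bucket key computation itself is just timestamp truncation; a sketch of the scheme as described (5-minute width; names are hypothetical, not our actual code):

```python
# Sketch of the bucket scheme: truncate the write time down to its
# 5-minute bucket and use that as (part of) the partition key.
BUCKET_MS = 5 * 60 * 1000

def bucket_for(timestamp_ms):
    """Start of the 5-minute bucket containing timestamp_ms."""
    return timestamp_ms - (timestamp_ms % BUCKET_MS)
```

Readers then fetch everything in a time range by enumerating the bucket keys between the range's endpoints.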
Is there a faster way to get the output of 'nodetool status' ?
I want us to more aggressively monitor for 'nodetool status' and boxes
being DN...
I was thinking something like jolokia and REST but I'm not sure if there
are variables exported by jolokia for nodetool status.
Thoughts?
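One sketch, assuming a Jolokia JVM agent is attached to Cassandra (the port 8778 and agent setup are assumptions, not stock C*): the StorageService MBean exposes LiveNodes/UnreachableNodes, which is essentially what nodetool status prints.

```python
# Sketch: read what `nodetool status` reads, but over Jolokia's HTTP/JMX
# bridge instead of forking nodetool. Assumes a Jolokia agent on port
# 8778 (hypothetical setup).
import json
import urllib.request

JOLOKIA = ('http://{host}:8778/jolokia/read/'
           'org.apache.cassandra.db:type=StorageService/{attr}')

def read_attr(host, attr):
    """Read one StorageService attribute, e.g. 'LiveNodes' or 'UnreachableNodes'."""
    with urllib.request.urlopen(JOLOKIA.format(host=host, attr=attr)) as r:
        return json.load(r)['value']

def down_nodes(live, unreachable):
    """Nodes gossip marks unreachable are the DN lines in nodetool status."""
    return sorted(set(unreachable) - set(live))
```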
> was a specific Jira assigned, and the antipattern doc doesn't appear to
> reference this scenario. Maybe a committer can shed some more light.
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 10:29 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I sort of ag
, Jonathan Haddad <j...@jonhaddad.com> wrote:
> Instead of using ZK, why not solve your concurrency problem by removing
> it? By that, I mean simply have 1 process that creates all your tables
> instead of creating a race condition intentionally?
>
> On Fri, Jan 22, 2016 at 6:16
Not sure if this is a bug or not or kind of a *fuzzy* area.
In 2.0 this worked fine.
We have a bunch of automated scripts that go through and create tables...
one per day.
At midnight UTC our entire CQL went offline... took down our whole app. ;-/
The resolution was a full CQL shut down and
I think there are two strategies to upgradesstables after a release.
We're doing a 2.0 to 2.1 upgrade (been procrastinating here).
I think we can go with B below... Would you agree?
Strategy A:
- foreach server
- upgrade to 2.1
- nodetool upgradesstables
Strategy B:
-
There's also the 'support' issue.. C* is hard enough as it is... maybe you
can bring in another system like ES or HDFS but the more you bring in the
more your complexity REALLY goes through the roof.
Better to keep things simple.
I really like the chunking idea for C*... seems like an easy way
com> wrote:
> On Mon, Jan 18, 2016 at 6:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Internally we have the need for a blob store for web content. It's
>> MOSTLY key/value based but we'd like to have lookups by coarse-grained
>> tags.
>>
>
> I kn
this would resolve this problem.
IF anyone else thinks this is an issue I'll create a JIRA.
On Mon, Oct 19, 2015 at 3:38 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I think the point I was trying t
logy,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the w
records.
>
>
>
> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 3:44 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapp
I'm doing a big nodetool repair right now and I'm pretty sure the added
overhead is impacting our performance.
Shouldn't you be able to throttle repair so that normal compactions can use
most of the resources?
We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
nodes)
By default we have auto_bootstrap = false,
so we just push our config to the cluster, the cassandra daemons restart,
and they're not cluster members: they come up seeing only themselves in the cluster.
Anyway. While I was about
My advice is to not even consider anything else or make any other changes
to your architecture until you get onto a modern and maintained filesystem.
VERY VERY VERY few people are deploying anything on ReiserFS so you're
going to be the first group encountering any problems.
On Thu, Oct 15, 2015
We just finished up a pretty large migration of about 30 Cassandra boxes to
a new datacenter.
We'll be migrating to about 60 boxes here in the next month so scalability
(and being able to do so cleanly) is important.
We also completed an Elasticsearch migration at the same time. The ES
I find it really frustrating that nodetool status doesn't include a hostname
Makes it harder to track down problems.
I realize it PRIMARILY uses the IP but perhaps cassandra.yaml can include an
optional 'hostname' parameter that can be set by the user. OR have the box
itself include the hostname
Let's say I have 10 nodes, I add 5 more, if I fail to run nodetool cleanup,
is excessive data transferred when I add the 6th node? IE do the existing
nodes send more data to the 6th node?
the documentation is unclear. It sounds like the biggest problem is that
the existing data causes things to
We're in the middle of migrating datacenters.
We're migrating from 13 nodes to 30 nodes in the new datacenter.
The plan was to bootstrap the 30 nodes first, wait until they have joined.
then we're going to decommission the old ones.
How many nodes can we bootstrap at once? How many can we
com> wrote:
> On Tue, Oct 6, 2015 at 12:32 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> How many nodes can we bootstrap at once? How many can we decommission?
>>
>
> short answer : only 1 node can join or part at a time
>
> longer answer : https://is
P tuning,
>
> On Tue, Oct 6, 2015 at 1:29 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I'm not sure which is faster/easier. Just joining one box at a time and
>> then decommissioning or using replace_address.
>>
>> this stuff is alw
mailing list)… I think JDK9 will be the one.
>
> On Sep 25, 2015, at 7:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> I think those were referring to Java7 and G1GC (early versions were buggy).
>
> Cheers,
> Stefano
>
>
> On Fri, Sep 25, 2015 at
I wanted to share this with the community in the hopes that it might help
someone with their schema design.
I didn't get any red flags early on to limit the number of columns we use.
If anything the community pushes for dynamic schema because Cassandra has
super nice online ALTER TABLE.
However,
Any issues with running Cassandra 2.0.16 on Java 8? I remember there is
long term advice on not changing the GC but not the underlying version of
Java.
Thoughts?
upport * http://sematext.com/
>
>
> On Thu, Aug 13, 2015 at 6:02 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Mildly off topic but we are looking to hire someone with Cassandra
>> experience..
>>
>> I don’t necessarily want to spam the list though.
I’m trying to benchmark two scenarios…
10 columns with 150 bytes each
vs
150 columns with 10 bytes each.
The total row “size” would be 1500 bytes (ignoring overhead).
Our app uses 150 columns so I’m trying to see if packing it into a JSON
structure using one column would improve performance.
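The packing side of the benchmark is trivial; the open question is whether the (de)serialization cost beats the per-cell overhead. A sketch (hypothetical column names):

```python
# Sketch of the "one JSON column" variant being benchmarked: collapse
# the 150 cells into a single compact TEXT value.
import json

def pack(row):
    """Serialize a dict of column -> value into one compact JSON string."""
    return json.dumps(row, separators=(',', ':'))

def unpack(blob):
    return json.loads(blob)
```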
Check out KairosDB for a time series db on Cassandra.
On Aug 31, 2015 7:12 AM, "Peter Lin" wrote:
>
> I didn't realize they had added max and min as stock functions.
>
> to get the sample time. you'll probably need to write a custom function.
> google for it and you'll find
change this, but
it's good to have it on the radar.
On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton bur...@spinn3r.com wrote:
Agreed. We’re going to run a benchmark. Just realized we grew to 144
columns. Fun. Kind of disappointing that Cassandra is so slow in this
regard. Kind of defeats
Is there any advantage to using say 40 columns per row vs using 2 columns
(one for the pk and the other for data) and then shoving the data into a
BLOB as a JSON object?
To date, we’ve been just adding new columns. I profiled Cassandra and
about 50% of the CPU time is spent on CPU doing
: burtonator2...@gmail.com on behalf of Kevin Burton
Reply-To: user@cassandra.apache.org
Date: Sunday, August 23, 2015 at 1:02 PM
To: user@cassandra.apache.org
Subject: Practical limitations of too many columns/cells ?
Is there any advantage to using say 40 columns per row vs using 2 columns
(one
Hey.
I’m considering migrating my DB from using multiple columns to just 2
columns, with the second one being a JSON object. Is there going to be any
real difference between TEXT or UTF-8 encoded BLOB?
I guess it would probably be easier to get tools like spark to parse the
object as JSON if
). My gist shows a ton of
different examples, but they’re not scientific, and at this point they’re
old versions (and performance varies version to version).
- Jeff
From: burtonator2...@gmail.com on behalf of Kevin Burton
Reply-To: user@cassandra.apache.org
Date: Sunday, August 23, 2015 at 2
Mildly off topic but we are looking to hire someone with Cassandra
experience..
I don’t necessarily want to spam the list though. We’d like someone from
the community who contributes to Open Source, etc.
Are there forums for Apache / Cassandra, etc for jobs? I couldn’t fine one.
--
, 2015 at 9:22 PM, Kevin Burton bur...@spinn3r.com wrote:
I have a table which just has primary keys.
basically:
create table foo (
    sequence bigint,
    signature text,
    primary key( sequence, signature )
)
I need these to eventually get GCd however it doesn’t seem to work.
If I
I have a table which just has primary keys.
basically:
create table foo (
    sequence bigint,
    signature text,
    primary key( sequence, signature )
)
I need these to eventually get GCd however it doesn’t seem to work.
If I then run:
select ttl(sequence) from foo;
I get:
Cannot use
I can’t seem to find a decent resource to really explain this…
Our app seems to fail some write requests, a VERY low percentage. I’d like
to retry the write requests that fail due to number of replicas not being
correct.
We get lots of write timeouts when we decommission a node. About 80% of
them are write timeout and just about 20% of them are read timeout.
We’ve tried to adjust streamthroughput (and compaction throughput) for that
matter and that doesn’t resolve the issue.
We’ve increased
, 2015 at 2:22 PM, Kevin Burton bur...@spinn3r.com wrote:
We get lots of write timeouts when we decommission a node. About 80% of
them are write timeout and just about 20% of them are read timeout.
We’ve tried to adjust streamthroughput (and compaction throughput) for
that matter and that doesn’t
WOW.. nice. you rock!!
On Wed, Jul 1, 2015 at 3:18 PM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:
Looks like all of this is happening because we’re using CAS operations
and the driver is going to SERIAL consistency level
We’re running Cassandra 2.0.9 and just migrated from 2-3 replicas.
We changes our consistency level to 2 during this period while we’re
running a repair.
but we can’t figure out what command to run to repair our data
We *think* we have to run “nodetool repair -pr” on each node.. is that
right?
I’m trying to track the throughput of nodetool decommission so I can figure
out how long until this box is out of service.
Basically, I want a % complete, and a ETA on when the job will be done.
IS this possible? Without opscenter?
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Hi, I switched from HBase to Cassandra and am trying to find a solution
for timeseries analysis on top of Cassandra.
Depending on what you’re looking for, you might want to check out KairosDB.
0.95 beta2 just shipped yesterday as well so you have good timing.
https://github.com/kairosdb/kairosdb
I had considered using spark for this but:
1. we tried to deploy spark only to find out that it was missing a number
of key things we need.
2. our app needs to shut down to release threads and resources. Spark
doesn’t have support for this so all the workers would have stale thread
leaking
Do you have a lot of individual tables? Or lots of small compactions?
I think the general consensus is that (at least for Cassandra), 8GB heaps
are ideal.
If you have lots of small tables it’s a known anti-pattern (I believe)
because the Cassandra internals could do a better job on handling the
What’s the fastest way to map/parallel read all values in a table?
Kind of like a mini map only job.
I’m doing this to compute stats across our entire corpus.
What I did to begin with was use token() and then spit it into the number
of splits I needed.
So I just took the total key range space
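i.e. something like the following, assuming Murmur3Partitioner (whose tokens are signed 64-bit longs):

```python
# Sketch: precompute (lo, hi) token ranges so each worker can run
#   SELECT ... WHERE token(pk) >= lo AND token(pk) <= hi
# Assumes Murmur3Partitioner's signed 64-bit token range.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_splits(n):
    """Split the full ring into n contiguous, non-overlapping ranges."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    splits, lo = [], MIN_TOKEN
    for i in range(n):
        hi = MAX_TOKEN if i == n - 1 else lo + span - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits
```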
The WAL (and walls in general) impose a performance overhead.
If one were to just take a machine out of the cluster, permanently, when a
machine crashes, you could quickly get all the shards back up to N replicas
after a node crashes.
So realistically, running with a WAL is somewhat redundant.
How do people normally setup multiple data center replication in terms of
number of *local* replicas?
So say you have two data centers, do you have 2 local replicas, for a total
of 4 replicas? Or do you have 2 in one datacenter, and 1 in another?
If you only have one in a local datacenter then
Ah.. six replicas. At least its super inexpensive that way (sarcasm!)
On Sun, Jan 18, 2015 at 8:14 PM, Jonathan Haddad j...@jonhaddad.com wrote:
Sorry, I left out RF. Yes, I prefer 3 replicas in each datacenter, and
that's pretty common.
On Sun Jan 18 2015 at 8:02:12 PM Kevin Burton bur
.
On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton bur...@spinn3r.com wrote:
How do people normally setup multiple data center replication in terms of
number of *local* replicas?
So say you have two data centers, do you have 2 local replicas, for a
total of 4 replicas? Or do you have 2 in one
are
quorum-based ...
This kicks in whenever you do CAS operations (eg, IF NOT EXISTS).
Otherwise a cluster which became network partitioned would end up being
able to have two separate CAS statements which both succeeded, but which
disagreed with each other.
On Sun, Jan 18, 2015 at 8:02 AM, Kevin
I’m really confused here.
I’m calling:
acquireInsert.setConsistencyLevel( ConsistencyLevel.ONE );
but I’m still getting the exception:
com.datastax.driver.core.exceptions.UnavailableException: Not enough
replica available for query at consistency SERIAL (2 required but only 1
alive)
I think the two tables are the same. Correct?
create table foo (
    source text,
    target text,
    primary key( source, target )
)
vs
create table foo (
    source text,
    target set<text>,
    primary key( source )
)
… meaning that the first one, under the covers is represented the
of data
2) collections and maps are loaded entirely by Cassandra for each query,
whereas with clustering columns you can select a slice of columns
On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote:
I think the two tables are the same. Correct?
create table foo
, Dec 31, 2014 at 7:09 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
You want to use take() or takeOrdered.
Sent with Good (www.good.com)
-Original Message-
*From: *Kevin Burton [bur...@spinn3r.com]
*Sent: *Wednesday, December 31, 2014 10:02 PM Eastern Standard Time
*To: *u
I’m trying to figure out the best way to bootstrap our nodes.
I *think* I want our nodes to be manually bootstrapped. This way an admin
has to explicitly bring up the node in the cluster and I don’t have to
worry about a script accidentally provisioning new nodes.
The problem is HOW do you do
?
On Fri, Dec 12, 2014 at 2:34 PM, Kevin Burton bur...@spinn3r.com wrote:
Oh. and if I specify --host it still doesn’t work. Very weird.
On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com
wrote:
OK..I’m stracing it and it’s definitely trying to connect to 173… here’s
the log line
-h 10.1.1.100
On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com wrote:
I have a firewall I need to bring up to keep our boxes off the Internet
(obviously).
The problem is that once I do nodetool doesn’t work anymore.
There’s a bunch of advice on this on the Internet:
http
Oh. and if I specify --host it still doesn’t work. Very weird.
On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton bur...@spinn3r.com wrote:
OK..I’m stracing it and it’s definitely trying to connect to 173… here’s
the log line below. (anonymized).
the question is why.. is cassandra configured
desire.Something like:
nodetool status -h 10.1.1.100
On Thu, Dec 11, 2014 at 6:38 PM, Kevin Burton bur...@spinn3r.com
wrote:
I have a firewall I need to bring up to keep our boxes off the Internet
(obviously).
The problem is that once I do nodetool doesn’t work anymore.
There’s a bunch
I have a firewall I need to bring up to keep our boxes off the Internet
(obviously).
The problem is that once I do nodetool doesn’t work anymore.
There’s a bunch of advice on this on the Internet:
I’m trying to figure out a safe way to do a rolling restart.
http://devblog.michalski.im/2012/11/25/safe-cassandra-shutdown-and-restart/
It has the following command which make sense:
root@cssa01:~# nodetool -h cssa01.michalski.im disablegossip
root@cssa01:~# nodetool -h cssa01.michalski.im
The new SSDs that we have (as well as Fusion IO) in theory can saturate the
gigabit ethernet port.
The 4k random read and write IOs they’re doing now can easily add up quick
and they’re faster than gigabit and even two gigabit.
However, not all of that 4k is actually used. I suspect that on
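A quick back-of-envelope supports the saturation claim: roughly 30k 4 KiB IOPS already fills a gigabit link's payload bandwidth, and these drives do far more than that (sketch; ignores TCP/ethernet framing overhead):

```python
# Back-of-envelope: how many 4 KiB IOs fill a gigabit link's payload
# bandwidth (framing overhead ignored, so the real number is lower).
GBIT_BYTES_PER_SEC = 1_000_000_000 / 8   # ~125 MB/s of payload
IO_SIZE = 4 * 1024

def iops_to_saturate(link_bytes_per_sec=GBIT_BYTES_PER_SEC, io_size=IO_SIZE):
    """IO operations per second at which the link is full."""
    return link_bytes_per_sec / io_size
```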
I imagine I’d generally be happy if we were CPU bound :-) … as long as the
number of transactions per second is generally reasonable.
On Tue, Nov 25, 2014 at 7:35 PM, Robert Coli rc...@eventbrite.com wrote:
On Tue, Nov 25, 2014 at 5:31 PM, Kevin Burton bur...@spinn3r.com wrote:
Curious what
I’m trying to track down some exceptions in our production cluster. I
bumped up our write load and now I’m getting a non-trivial number of these
exceptions. Somewhere on the order of 100 per hour.
All machines have a somewhat high CPU load because they’re doing other
tasks. I’m worried that
There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug.
INSERT and UPDATE are not totally orthogonal
in CQL and you should use INSERT for actual insertion and UPDATE for
updates (granted, the database will not reject
your query if you break this rule but it's nonetheless the way it's
There’s still a lot of weirdness in CQL.
For example, you can do an INSERT with an UPDATE .. .which I’m generally
fine with. Kind of make sense.
However, with INSERT you can do IF NOT EXISTS.
… but you can’t do the same thing on UPDATE.
So I foolishly wrote all my code assuming that
you can still do IF on UPDATE though… but it’s not possible to do IF
mycolumn IS NULL -- If mycolumn = null should work
Alas.. it doesn’t :-/
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
| val
+-
1 | new val
(1 rows)
On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton bur...@spinn3r.com wrote:
you can still do IF on UPDATE though… but it’s not possible to do IF
mycolumn IS NULL -- If mycolumn = null should work
Alas.. it doesn’t :-/
15 2014 at 12:51:55 AM DuyHai Doan doanduy...@gmail.com
wrote:
Why don't you use map to store write time as value and data as key?
Le 15 nov. 2014 00:24, Kevin Burton bur...@spinn3r.com a écrit :
I’m trying to build a histograph in CQL for various records. I’d like to
keep a max of ten items
I’m trying to have some code acquire a lock by first at performing a table
mutation, and then if it wins, performing a second table insert.
I don’t think this is possible with batches though.
I don’t think I can say “update this table, and if you are able to set the
value, and the value doesn’t
So I think there are some operations in CQL WRT sets/maps that aren’t
supported yet or at least not very well documented.
For example, you can set the TTL on individual set members, but how do you
read the writetime() ?
normally on a column I can just
SELECT writetime(foo) from my_table;
but …
I have two tasks trying to each insert into a table. The only problem is
that I only want one to win, and then never perform that operation again.
So my idea was to use the set append support in Cassandra to attempt to
append to the set and if we win, then I can perform my operation. The
I’m trying to build a histograph in CQL for various records. I’d like to
keep a max of ten items or items with a TTL. but if there are too many
items, I’d like to trim it so the max number of records is about 20.
So if I exceed 20, I need to remove the oldest records.
I’m using a set append so
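Something like this on the client side is what I have in mind for the trim step (a sketch; names are hypothetical, not our actual schema):

```python
# Sketch of the trim step: given the set's members and their write
# times, pick the oldest entries to DELETE so at most MAX_ITEMS remain.
MAX_ITEMS = 20

def to_trim(items):
    """items: dict of member -> write_time. Returns members to delete."""
    if len(items) <= MAX_ITEMS:
        return []
    oldest_first = sorted(items, key=items.get)
    return oldest_first[:len(items) - MAX_ITEMS]
```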
I’m trying to figure out the best way to handle things like set appends
(and other CQL extensions) in traditional OR mapping.
Our OR mapper does basic setFoo() .. then save() to write the record back
to the database.
So if foo is a Sett then I can set all members.
But I want to do some appends
We’re looking at switching data centers and they’re offering pretty
aggressive pricing on boxes with fusion IO cards.
2x 1.2TB Fusion IO
128GB RAM
20 cores.
now.. this isn’t the typical cassandra box. Most people are running
multiple nodes to scale out vs scale vertically. But these boxes are
need to repair.
Sent from my iPhone
On Nov 6, 2014, at 3:40 PM, Kevin Burton bur...@spinn3r.com wrote:
We’re looking at switching data centers and they’re offering pretty
aggressive pricing on boxes with fusion IO cards.
2x 1.2TB Fusion IO
128GB RAM
20 cores.
now.. this isn’t the typical
and never leave first gear?
As far as saturating the network goes, I guess that all depends on your
workload, and how often you need to repair.
Sent from my iPhone
On Nov 6, 2014, at 3:40 PM, Kevin Burton bur...@spinn3r.com wrote:
We’re looking at switching data centers and they’re offering pretty
On Thu, Nov 6, 2014 at 2:10 PM, Christopher Brodt ch...@uberbrodt.net
wrote:
Yep. The trouble with FIOs is that they almost completely remove your
disk throughput problems, so then you're constrained by CPU. Concurrent
compactors and concurrent writes are two params that come to mind but there
I’m curious what people are doing with multiple SSDs per server.
I think there are two main paths:
- RAID 0 them… the problem here is that RAID0 is not a panacea and the
drives may or may not see better IO throughput.
- use N cassandra instances per box (or containers) and have one C* node
(if have network for it) and compaction throughput if you end up
with IO to spare. I generally would not recommend putting multiple C*
instances on a single box.
---
Chris Lohfink
On Thu, Nov 6, 2014 at 5:13 PM, Kevin Burton bur...@spinn3r.com wrote:
I’m curious what people are doing
Curious to see if any of you have an elegant solution here.
Right now I’m using cassandra-unit:
https://github.com/jsevellec/cassandra-unit
for my integration tests.
The biggest problem is that it doesn’t support shutdown. so I can’t stop
or cleanup after cassandra between tests.
I have
It seems annoying that I can’t get “describe tables” to output vertically.
Maybe there’s some option I’m missing?
Kevin
huh. That sort of works. The problem now is that there are multiple
entries per table...
On Sun, Oct 12, 2014 at 10:39 AM, graham sanderson gra...@vast.com wrote:
select keyspace_name, columnfamily_name from system.schema_columns;
?
On Oct 12, 2014, at 10:29 AM, Kevin Burton bur
So right now I have plenty of quality and robust full text search systems I
can use.
Solr cloud, elastic search. They all also have very robust UIs on top of
them… kibana, banana, etc.
and my alternative for cassandra is… paying for a proprietary database.
Which might be fine for some parties…
I’m trying to query an entire table in parallel by splitting it up in token
ranges.
However, it’s not working because I get this:
cqlsh:blogindex> select token(hashcode), hashcode from source where
token(hashcode) >= 0 and token(hashcode) <=
17014118346046923173168730371588410572 limit 10;
Bad
?
On Sep 28, 2014, at 1:39 PM, Kevin Burton bur...@spinn3r.com wrote:
I’m trying to query an entire table in parallel by splitting it up in
token ranges.
However, it’s not working because I get this:
cqlsh:blogindex> select token(hashcode), hashcode from source where
token(hashcode) >= 0 and token
On Sep 28, 2014, at 5:55 PM, Kevin Burton bur...@spinn3r.com wrote:
Hm.. is it 64 bits or 128 bits?
I’m using Murmur3Partitioner
…
I can’t find any documentation on it (as usual.. ha)
This says:
http://www.datastax.com/docs/1.1/initialize/token_generation
The tokens assigned to your nodes
I need a way to do a full table scan across all of our data.
Can’t I just use token() for this?
This way I could split up our entire keyspace into say 1024 chunks, and
then have one activemq task work with range 0, then range 1, etc… that way
I can easily just map() my whole table.
and since