You do not need RAID0 for data. Let C* do striping over data disks.
And CL ANY/ONE might be sufficient for your writes.
On 08.06.2014 at 06:15, Kevin Burton bur...@spinn3r.com wrote:
we're using containers for other reasons, not just cassandra.
Tightly constraining resources
Here’s the Jira for the proposal to remove BOP (and OPP), but you can see that
there is no clear consensus and that the issue is still open:
CASSANDRA-6922 - Investigate if we can drop ByteOrderedPartitioner and
OrderPreservingPartitioner in 3.0
Hey Jack. Thanks for posting this… very helpful.
So I guess the status is that it was proposed for deprecation but that
proposal didn't reach consensus.
Also, this gave me an idea to look at the JIRA to see what's being
proposed for 3.0 :)
Kevin
On Sun, Jun 8, 2014 at 1:26 PM, Jack
I believe ByteOrderedPartitioner is being deprecated, and for good reason. I
would look at what you could achieve by using wide rows and Murmur3Partitioner.
--
Colin
320-221-9531
On Jun 6, 2014, at 5:27 PM, Kevin Burton bur...@spinn3r.com wrote:
We have the requirement to have clients
One node would take all the load, followed by the next node. With
this design, you are not exploiting all the power of the cluster. If only
one node takes all the load at a time, what is the point of having 10 or 20
nodes?
You'd be better off using limited wide rows with bucketing to achieve this.
I just checked the source and in 2.1.0 it's not deprecated.
So it *might* be *being* deprecated but I haven't seen anything stating
that.
On Sat, Jun 7, 2014 at 8:03 AM, Colin colpcl...@gmail.com wrote:
I believe ByteOrderedPartitioner is being deprecated and for good reason.
I would look
It's an anti-pattern and there are better ways to do this.
I have implemented the paging algorithm you've described using wide rows
and bucketing. This approach is a more efficient utilization of
Cassandra's built in wholesome goodness.
Also, I wouldn't let any number of clients (huge) connect
On Sat, Jun 7, 2014 at 10:41 AM, Colin Clark co...@clark.ws wrote:
It's an anti-pattern and there are better ways to do this.
Entirely possible :)
It would be nice to have a document with a bunch of common cassandra design
patterns.
I've been trying to track down a pattern for this and a lot
Another way around this is to have a separate table storing the number of
buckets.
This way if you have too few buckets, you can just increase them in the
future.
Of course, the older data will still have too few buckets :-(
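A minimal sketch of the "separate table for the bucket count" idea, using an in-memory stand-in rather than a real Cassandra table. The `BucketConfig` class and its `effective_from` field are assumptions made for illustration; they show one way older data could keep its original bucket count while newer data uses a larger one.

```python
import bisect


class BucketConfig:
    """Hypothetical stand-in for a small table of (effective_from, bucket_count) rows."""

    def __init__(self):
        self._starts = []  # sorted timestamps at which a bucket count takes effect
        self._counts = []  # bucket count for the matching entry in _starts

    def set_count(self, effective_from, count):
        """Record that `count` buckets are used for data written from `effective_from` on."""
        i = bisect.bisect_left(self._starts, effective_from)
        if i < len(self._starts) and self._starts[i] == effective_from:
            self._counts[i] = count
        else:
            self._starts.insert(i, effective_from)
            self._counts.insert(i, count)

    def count_at(self, timestamp):
        """Return the bucket count that applied to data written at `timestamp`."""
        i = bisect.bisect_right(self._starts, timestamp) - 1
        if i < 0:
            raise LookupError("no bucket count configured for this timestamp")
        return self._counts[i]
```

A reader would look up `count_at(ts)` for the time range it is scanning, so data written before an increase is still read with the smaller count.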
On Sat, Jun 7, 2014 at 11:09 AM, Kevin Burton bur...@spinn3r.com
Maybe it makes sense to describe what you're trying to accomplish in more
detail.
A common bucketing approach is along the lines of year, month, day, hour,
minute, etc and then use a timeuuid as a cluster column.
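As a rough illustration of the time-based bucketing described above, here is a small sketch that derives a bucket string from a timestamp. The function name `time_bucket` and the bucket string formats are assumptions for this example; the clustering column would be a time-based UUID (Python's `uuid.uuid1()` generates a version 1 UUID of that kind).

```python
from datetime import datetime, timezone

# Hypothetical bucket formats; a coarser or finer granularity changes how
# many rows land in one partition.
_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
    "minute": "%Y-%m-%dT%H:%M",
}


def time_bucket(ts, granularity="minute"):
    """Return the bucket (partition key component) for a timezone-aware datetime."""
    return ts.astimezone(timezone.utc).strftime(_FORMATS[granularity])
```

All writes within the same minute (or hour, or day) share a bucket, and the timeuuid orders rows inside it.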
Depending upon the semantics of the transport protocol you plan on utilizing,
On Sat, Jun 7, 2014 at 1:34 PM, Colin colpcl...@gmail.com wrote:
Maybe it makes sense to describe what you're trying to accomplish in more
detail.
Essentially, I'm appending writes of recent data by our crawler and
sending that data to our customers.
They need to sync to up to date
Then add seconds to the bucket. Also, the data will get cached; it's not going
to hit disk on every read.
Look at the key cache settings on the table. Also, in 2.1 you have even more
control over caching.
--
Colin
320-221-9531
On Jun 7, 2014, at 4:30 PM, Kevin Burton bur...@spinn3r.com
Well, you could add milliseconds, but at best you're still bottlenecking most of
your writes on one box.. maybe 2-3 if there are ones that are lagging.
Anyway.. I think using 100 buckets is probably fine..
Kevin
On Sat, Jun 7, 2014 at 2:45 PM, Colin colpcl...@gmail.com wrote:
Then add seconds to
No, you're not. The partition key will get distributed across the cluster if
you're using the random or murmur partitioner. You could also ensure that by
adding another column, like source, to the partition key. (Add the seconds to
the partition key, not the clustering columns.)
I can almost guarantee that if
Thanks for the feedback on this btw... it's helpful. My notes below.
On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark co...@clark.ws wrote:
No, you're not. The partition key will get distributed across the cluster
if you're using random or murmur.
Yes… I'm aware. But in practice this is how it
Not if you add another column to the partition key; source for example.
I would really try to stay away from the ordered partitioner if at all
possible.
What ingestion rates are you expecting, in size and speed?
--
Colin
320-221-9531
On Jun 7, 2014, at 9:05 PM, Kevin Burton bur...@spinn3r.com
What's 'source' ? You mean like the URL?
If source is too random it's going to yield too many buckets.
Ingestion rates are fairly high but not insane. About 4M inserts per
hour.. from 5-10GB…
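For scale, the quoted rate works out as follows. This is a back-of-the-envelope sketch that assumes the 5-10GB figure is per hour and that 1 GB is 10^9 bytes; neither is stated explicitly in the message.

```python
INSERTS_PER_HOUR = 4_000_000
SECONDS_PER_HOUR = 3600
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes

# Average write rate across the whole cluster.
inserts_per_second = INSERTS_PER_HOUR / SECONDS_PER_HOUR


def avg_bytes_per_insert(gb_per_hour):
    """Average payload size per insert, assuming the volume is per hour."""
    return gb_per_hour * BYTES_PER_GB / INSERTS_PER_HOUR


low = avg_bytes_per_insert(5)    # 1250.0 bytes
high = avg_bytes_per_insert(10)  # 2500.0 bytes
```

That is roughly 1,100 inserts per second in total, at about 1.25 to 2.5 KB each under these assumptions.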
On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark co...@clark.ws wrote:
Not if you add another column to the
With 100 nodes, that ingestion rate is actually quite low and I don't think
you'd need another column in the partition key.
You seem to be set in your current direction. Let us know how it works out.
--
Colin
320-221-9531
On Jun 7, 2014, at 9:18 PM, Kevin Burton bur...@spinn3r.com wrote:
Oh.. To start with we're going to use from 2-10 nodes..
I think we're going to take the original strategy and just use 100
buckets .. 0-99… then the timestamp under that.. I think it should be fine
and won't require an ordered partitioner. :)
Thanks!
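A minimal in-memory sketch of the "100 buckets, then the timestamp under that" layout. Everything here is illustrative: the `store` dict stands in for a table partitioned by bucket, hashing the key with `zlib.crc32` is just one possible way to pick a bucket, and `read_since` shows how a client could resume from the last timestamp it saw by merging all buckets in time order.

```python
import heapq
import zlib

NUM_BUCKETS = 100

# Hypothetical stand-in for the table: bucket number -> rows sorted by timestamp.
store = {}


def bucket_for(key):
    """Map a key (for example a URL) to one of the buckets 0-99."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def write(key, timestamp, payload):
    """Append a row to its bucket, keeping rows ordered by timestamp."""
    rows = store.setdefault(bucket_for(key), [])
    rows.append((timestamp, payload))
    rows.sort()


def read_since(last_seen):
    """Return all rows newer than `last_seen`, merged across buckets in time order."""
    streams = [
        [row for row in store.get(b, []) if row[0] > last_seen]
        for b in range(NUM_BUCKETS)
    ]
    return list(heapq.merge(*streams))
```

The cost of this design is that every read has to touch all 100 buckets, in exchange for writes being spread across the cluster without an ordered partitioner.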
On Sat, Jun 7, 2014 at 7:38 PM, Colin
To have any redundancy in the system, start with at least 3 nodes and a
replication factor of 3.
Try to have at least 8 cores, 32 GB of RAM, and separate disks for the commit
log and data.
Will you be replicating data across data centers?
--
Colin
320-221-9531
On Jun 7, 2014, at 9:40 PM, Kevin Burton
Right now I'm just putting everything together as a proof of concept… so
just two cheap replicas for now. And it's at 1/1th of the load.
If we lose data it's ok :)
I think our config will be 2-3x 400GB SSDs in RAID0, 3 replicas, 16 cores,
probably 48-64GB of RAM each box.
Just one
Write Consistency Level + Read Consistency Level > Replication Factor
ensures your reads will be consistent, and having 3 nodes lets you
achieve redundancy in the event of node failure.
So writing with CL of local quorum and reading with CL of local quorum
(2+2>3) with replication factor of 3 ensures
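The rule of thumb here (write CL plus read CL must exceed the replication factor) can be checked with a couple of lines. The function names below are made up for this sketch; the arguments are the number of replicas each consistency level waits for.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_overlap_writes(write_replicas, read_replicas, replication_factor):
    """True when every read is guaranteed to touch at least one replica
    that acknowledged the latest write (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, quorum is 2, so quorum writes plus quorum reads satisfy the rule, while writing and reading at ONE (1 + 1) does not.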
You won't need containers - running one instance of Cassandra in that
configuration will hum along quite nicely and will make use of the cores
and memory.
I'd forget the RAID anyway and just mount the disks separately (JBOD).
--
Colin
320-221-9531
On Jun 7, 2014, at 10:02 PM, Kevin Burton
we're using containers for other reasons, not just cassandra.
Tightly constraining resources means we don't have to worry about Cassandra,
the JVM, or Linux doing something silly and using too many resources and
taking down the whole box.
On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark
We have the requirement to have clients read from our tables while they're
being written.
Basically, any write that we make to cassandra needs to be sent out over
the Internet to our customers.
We also need them to resume so if they go offline, they can just pick up
where they left off.
They