Re: Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Mike Malone
2012/2/17 Raúl Raja Martínez raulr...@gmail.com

  Hello everyone,

 I'm working on an application that uses Cassandra and has a geolocation
 component.
 Besides the slides and video at
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that
 SimpleGeo published regarding their strategy, I was wondering if anyone has
 implemented geohash storage and search in Cassandra.
 The basic usage is to allow a user to find things close to a geo location
 based on distance radius.

 I thought about a couple of approaches.

 1. Have the geohashes be the keys, using the ordered partitioner to get a
 range of rows between two keys, and store the items as columns. The rows
 would end up being wide rows, since each column would point to another row
 in a different column family representing the nearby item.


That's what we did early on at SimpleGeo.


 2. Simply store the geohash prefixes as columns and use secondary indexes
 to do queries such as >= and <=.


This seems like a reasonable approach now that secondary indexes are
available. It might even address some of the hotspot problems we had with
the order preserving partitioner since the indices are distributed across
all hosts. Of course there are tradeoffs there too. Seems like a viable
option for sure.
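
Either way you'll need to compute the geohash itself. For reference, the
standard encoding is only a few lines; something like this (written from
memory, so treat it as a sketch -- the class name and precision handling are
just mine):

    public final class Geohash {
        private static final char[] BASE32 =
                "0123456789bcdefghjkmnpqrstuvwxyz".toCharArray();

        // Encode a WGS84 lat/lon into a geohash of `precision` characters by
        // bisecting the lon/lat ranges and interleaving the bits, 5 bits per char.
        public static String encode(double lat, double lon, int precision) {
            double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
            StringBuilder hash = new StringBuilder(precision);
            boolean lonBit = true;  // bits alternate, starting with longitude
            int bits = 0, ch = 0;
            while (hash.length() < precision) {
                if (lonBit) {
                    double mid = (lonMin + lonMax) / 2;
                    if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                    else            { ch = (ch << 1);     lonMax = mid; }
                } else {
                    double mid = (latMin + latMax) / 2;
                    if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                    else            { ch = (ch << 1);     latMax = mid; }
                }
                lonBit = !lonBit;
                if (++bits == 5) {          // emit one base-32 character per 5 bits
                    hash.append(BASE32[ch]);
                    bits = 0;
                    ch = 0;
                }
            }
            return hash.toString();
        }
    }

Geohash.encode(37.7749, -122.4194, 6) lands in the 9q8yy... neighborhood;
truncating to fewer characters gives you the coarser cells you'd index for
wider searches.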


 The problem I'm facing in both cases is ordering by distance and searching
 neighbors.


This will always be a problem with dimensionality reduction techniques like
geohashes. A brief bit of pedantry: it is mathematically impossible to do
dimensionality reduction without losing information. You can't embed a 2
dimensional space in a 1 dimensional space and preserve the 2D
topology. This manifests itself in all sorts of ways, but when it comes to
doing kNN queries it's particularly obvious. Things that are near in 2D
space can be far apart in 1D space and vice versa. Doing a 1D embedding
like this will always result in suboptimal performance for at least some
queries. You'll have to over-fetch and post-process to get the correct
results.

That said, a 1D embedding is certainly easier to code since
multidimensional indexes are not available in Cassandra. And there are
plenty of data sets that don't hit any degenerate cases. Moreover, if
you're mostly doing bounding-radius queries the geohash approach isn't
nearly as bad (the only trouble comes when you want to limit the results,
in which case you often want things ordered by distance from centroid and
the query is no longer a bounding radius query - rather, it's a kNN with a
radius constraint). In any case, geohash is a reasonable starting point, at
least.
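
To make the over-fetch / post-process step concrete, the client-side trimming
for a kNN-with-radius query ends up looking something like this (hypothetical
Candidate type, plain haversine distance; just a sketch of the idea, not
anything we shipped):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical record pulled back from Cassandra: an item id plus its coordinates.
    class Candidate {
        final String id;
        final double lat, lon;
        Candidate(String id, double lat, double lon) { this.id = id; this.lat = lat; this.lon = lon; }
    }

    final class NearbyFilter {
        static final double EARTH_RADIUS_M = 6371000.0;

        // Haversine great-circle distance in meters.
        static double distance(double lat1, double lon1, double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                       * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        }

        // Take the over-fetched candidates from the cell and its neighbors, drop
        // anything outside the radius, sort by distance from the centroid, keep the top k.
        static List<Candidate> nearest(List<Candidate> candidates, final double lat,
                                       final double lon, double radiusMeters, int k) {
            List<Candidate> within = new ArrayList<Candidate>();
            for (Candidate c : candidates) {
                if (distance(lat, lon, c.lat, c.lon) <= radiusMeters) within.add(c);
            }
            Collections.sort(within, new Comparator<Candidate>() {
                public int compare(Candidate a, Candidate b) {
                    return Double.compare(distance(lat, lon, a.lat, a.lon),
                                          distance(lat, lon, b.lat, b.lon));
                }
            });
            return within.subList(0, Math.min(k, within.size()));
        }
    }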

The neighbors problem is clearly explained here:
 https://github.com/davetroy/geohash-js

 Once the neighbors are calculated an item can be fetched with SQL similar
 to this.

 SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc',
 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8')

 Since Cassandra does not currently support OR or an IN statement with
 elements that are not keys, I'm not sure what the best way to implement
 geohashes may be.


Can't you use the thrift interface and use multiget_slice? If I recall
correctly, we implemented a special version of multiget_slice that stopped
when we got a certain number of columns across all rows. I don't have that
code handy but we did that work early in our Cassandra careers and,
starting from the thrift interface and following control flow for the
multiget_slice command, it wasn't terribly difficult to add.
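
Something like this is what I mean (against the current Thrift interface as
best I remember it -- the "nearby" column family and the helper method are
made up, so double-check your generated client for exact signatures):

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public class NeighborFetch {
        private static final Charset UTF8 = Charset.forName("UTF-8");

        // One multiget_slice over the center geohash cell plus its eight neighbors.
        // Assumes the client is already connected and set_keyspace() has been called.
        static Map<ByteBuffer, List<ColumnOrSuperColumn>> fetchCells(
                Cassandra.Client client, List<String> geohashCells) throws Exception {
            List<ByteBuffer> keys = new ArrayList<ByteBuffer>();
            for (String cell : geohashCells) {
                keys.add(ByteBuffer.wrap(cell.getBytes(UTF8)));
            }

            ColumnParent parent = new ColumnParent("nearby");  // hypothetical CF, keyed by cell

            // Full-row slice, capped at 1000 columns per cell; each column points at an item.
            SliceRange range = new SliceRange(
                    ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1000);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);

            return client.multiget_slice(keys, parent, predicate, ConsistencyLevel.ONE);
        }
    }

The special version we wrote just short-circuited once the total column count
across all of those rows hit a limit, instead of capping each row separately.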

Mike


Re: Write everywhere, read anywhere

2011-08-04 Thread Mike Malone
2011/8/3 Patricio Echagüe patric...@gmail.com



 On Wed, Aug 3, 2011 at 4:00 PM, Philippe watche...@gmail.com wrote:

 Hello,
 I have a 3-node, RF=3, cluster configured to write at CL.ALL and read at
 CL.ONE. When I take one of the nodes down, writes fail which is what I
 expect.
 When I run a repair, I see data being streamed from those column
 families... that I didn't expect. How can the nodes diverge? Does this mean
 that reading at CL.ONE may return inconsistent data?


 We abort the mutation beforehand when there are not enough replicas alive. If
 a mutation went through and a replica goes down in the middle of it, you
 can end up writing to some nodes while the request times out.
 In that case CL.ONE may return inconsistent data.


Doesn't CL.QUORUM suffer from the same problem? There's no isolation or
rollback with CL.QUORUM either. So if I do a quorum write with RF=3 and it
fails after hitting a single node, a subsequent quorum read could return the
old data (if it hits the two nodes that didn't receive the write) or the new
data that failed mid-write (if it hits the node that did receive the write).

Basically, the scenarios where CL.ALL + CL.ONE results in a read of
inconsistent data could also cause a CL.QUORUM write followed by a CL.QUORUM
read to return inconsistent data. Right? The problem (if there is one) is
that even in the quorum case columns with the most recent timestamp win
during repair resolution, not columns that have quorum consensus.

Mike


Re: Write everywhere, read anywhere

2011-08-04 Thread Mike Malone
On Thu, Aug 4, 2011 at 10:25 AM, Jeremiah Jordan 
jeremiah.jor...@morningstar.com wrote:

  If you have RF=3 quorum won’t fail with one node down.  So R/W quorum
 will be consistent in the case of one node down.  If two nodes go down at
 the same time, then you can get inconsistent data from quorum write/read if
 the write fails with TimeOut, the nodes come back up, and then read asks the
 two nodes that were down what the value is.  And another read asks the node
 that was up, and a node that was down.  Those two reads will get different
 answers.


So the short answer is: yea, same thing can happen with quorum...

It's true that the failure scenarios are slightly different, but it's not
entirely true that two nodes need to fail to trigger inconsistencies with
quorum. A single node could be partitioned and produce the same result.

If a network event occurs on a single host then any writes that came in
before the event, that are processed before phi evict kicks in and marks the
rest of the cluster unavailable, will be written locally. From the rest of
the cluster's perspective only one node failed, but from that node's
perspective the entire rest of the cluster failed. Obviously, similar things
could happen with DC_QUORUM if a datacenter went offline.

Mike


Re: b-tree

2011-07-22 Thread Mike Malone
On Fri, Jul 22, 2011 at 12:05 AM, Eldad Yamin elda...@gmail.com wrote:

 In order to split the nodes:
 SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; if
 the number exceeds 1,000 they split the node.
 In order to avoid more than one process editing/splitting the node at once, a
 transaction is needed.

You don't need a transaction, you just need consensus and/or idempotence. In
this case both can be achieved fairly easily.

Mike


 On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote:
  But how will you be able to maintain it while it evolves and new data is
 added without transactions?
 
  What is the situation you think you need transactions for ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 22 Jul 2011, at 00:06, Eldad Yamin wrote:
 
  Aaron,
  Nested set is exactly what I had in mind.
  But how will you be able to maintain it while it evolves and new data is
 added without transactions?
 
  Thanks!
 
  On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com
 wrote:
  Just throwing out a (half baked) idea, perhaps the Nested Set Model of
 trees would work http://en.wikipedia.org/wiki/Nested_set_model
 
  * Every row would represent a set, with a left and right encoded into the
 key.
  * Members are inserted as columns into *every* set / row they are a
 member of. So we are de-normalising and trading space for time.
  * May need to maintain a custom secondary index of the materialised
 sets, e.g. slice a row to get the first column >= the left value you are
 interested in; that is the key for the set.
 
  I've not thought it through much further than that; a lot would depend
 on your data. The top sets may get very big.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
 
  Im not sure if I have an answer for you, anyway, but I'm curious
 
  A b-tree and a binary tree are not the same thing. A binary tree is a
 basic fundamental data structure, A b-tree is an approach to storing and
 indexing data on disc for a database.
 
  Which do you mean?
 
  On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com
 wrote:
  Hello,
  Is there any good way of storing a binary-tree in Cassandra?
   I wonder if someone has already implemented something like that, and how they
  accomplished it without transaction support (while the tree keeps
  evolving)?
 
   I'm asking that because I want to save geospatial data, and SimpleGeo
  did it using a b-tree:
 
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
 
  Thanks!
 
 
 
  --
  It's always darkest just before you are eaten by a grue.
 
 
 



Re: Commitlog Disk Full

2011-05-19 Thread Mike Malone
Just noticed this thread and figured I'd chime in since we've had similar
issues with the commit log growing too large on our clusters. Tuning down
the flush timeout wasn't really an acceptable solution for us since we
didn't want to be constantly flushing and generating extra SSTables for no
reason. So we wrote a small tool that we start in a static block in
CassandraServer that periodically checks the commit log size and flushes all
memtables if they're above some threshold.

I've attached that code. Any feedback / improvements are more than welcome.
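
The gist of it is just a directory-size check on a timer; a stripped-down
sketch of the idea looks roughly like the following (the flushAll hook is a
placeholder for however you trigger the flush -- ours called into the server
internals, but the same thing works via the StorageService flush operation
that nodetool uses):

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Periodically checks commit log size and triggers a flush when it grows too large.
    public class PeriodicCommitLogWatcher implements Runnable {
        private final File commitLogDir;
        private final long maxBytes;
        private final Runnable flushAll; // placeholder hook: flush every column family

        public PeriodicCommitLogWatcher(File commitLogDir, long maxBytes, Runnable flushAll) {
            this.commitLogDir = commitLogDir;
            this.maxBytes = maxBytes;
            this.flushAll = flushAll;
        }

        public void run() {
            long total = 0;
            File[] segments = commitLogDir.listFiles();
            if (segments != null) {
                for (File segment : segments) total += segment.length();
            }
            if (total > maxBytes) {
                // Flushing memtables lets the obsolete commit log segments be recycled.
                flushAll.run();
            }
        }

        public static void start(File commitLogDir, long maxBytes, Runnable flushAll) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleWithFixedDelay(
                    new PeriodicCommitLogWatcher(commitLogDir, maxBytes, flushAll),
                    60, 60, TimeUnit.SECONDS);
        }
    }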

Mike

On Thu, May 12, 2011 at 11:30 AM, Sanjeev Kulkarni sanj...@locomatix.com wrote:

 Hey guys,
 I have an EC2 Debian cluster consisting of several nodes running 0.7.5 on
 ephemeral disks.
 These are fresh installs and not upgrades.
 The commitlog is set to the smaller of the disks which is around 10G in
 size and the datadir is set to the bigger disk.
 The config file is basically the same as the one supplied by the default
 installation.
 Our applications write to the cluster. After about a day of writing we
 started noticing the commitlog disk filling up. Soon we went over the disk
 limit and writes started failing. At this point we stopped the cluster.
 Over the course of the day we inserted around 25G of data. Our columns
 values are pretty small.
 I understand that cassandra periodically cleans up the commitlog
 directories by generating sstables in the datadir. Is there any way to speed up
 this movement from commitlog to datadir?
 Thanks!




PeriodicMemtableFlusher.java
Description: Binary data


Re: Do supercolumns have a purpose?

2011-02-09 Thread Mike Malone
On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn da...@lookin2.com wrote:

 Shaun, I agree with you, but marking them as deprecated is not good enough
 for me. I can't easily stop using supercolumns. I need an upgrade path.


David,

Cassandra is open source and community developed. The right thing to do is
what's best for the community, which sometimes conflicts with what's best
for individual users. Such strife should be minimized, but it will never be
eliminated. Luckily, because this is an open source, liberally licensed
project, if you feel strongly about something you should feel free to add
whatever features you want yourself. I'm sure other people in your situation
will thank you for it.

At a minimum I think it would behoove you to re-read some of the comments
here re: why super columns aren't really needed and take another look at
your data model and code. I would actually be quite surprised to find a use
of super columns that could not be trivially converted to normal columns. In
fact, it should be possible to do at the framework/client library layer -
you probably wouldn't even need to change any application code.

Mike

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts sh...@cuttshome.net wrote:


 I'm a newbie here, but, with apologies for my presumptuousness, I think
 you should deprecate SuperColumns. They are already distracting you, and as
 the years go by the cost of supporting them as you add more and more
 functionality is only likely to get worse. It would be better to concentrate
 on making the core column families better (and I'm sure we can all think
 of lots of things we'd like).

 Just dropping SuperColumns would be bad for your reputation -- and for
 users like David who are currently using them. But if you mark them clearly
 as deprecated and explain why and what to do instead (perhaps putting a bit
 of effort into migration tools... or even a virtual layer supporting
 arbitrary hierarchical data), then you can drop them in a few years (when
 you get to 1.0, say), without people feeling betrayed.

 -- Shaun

 On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

  My main point was to say that I think it is better to create tickets
  for what you want, rather than for something else completely different that
 would, as a by-product, give you what you want.

 Then let me say what I want: I want supercolumn families to have any
 feature that regular column families have.

 My data model is full of supercolumns. I used them, even though I knew it
 didn't *have to*, because they were there, which implied to me that I was
 supposed to use them for some good reason. Now I suspect that they will
 gradually become less and less functional, as features are added to regular
 column families and not supported for supercolumn families.


 On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
  sylv...@datastax.com wrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
  sylv...@datastax.com wrote:

  On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com wrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely subjective, and on such matters code
 speaks louder than words, but I don't think I agree with you on the issue 
 of
 which alternative is less work, or even which is a better solution.


  You are right, I probably put too much emphasis in that sentence. My main
  point was to say that I think it is better to create tickets for what you
  want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598, which itself would require
 quite some work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?

 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].

 It seems to me that super columns are a historical artifact from
 Cassandra's early life as Facebook's inbox storage system. They needed
 posting lists of messages, sharded

Re: postgis cassandra?

2011-02-07 Thread Mike Malone
It's not really the storage of spatial data that's tricky. We use geojson as
a wire-line format at the higher levels of our system (e.g., the HTTP
API). But the hard part is organizing the data for efficient retrieval and
keeping those indices consistent with the data being indexed. Efficient
multi-dimensional indexing is tricky, but that's what you'll need if you
want to support generic spatial querying (overlaps, contains, interacts,
nearest neighbor, etc).

On Sun, Feb 6, 2011 at 1:14 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Here is a recent presentation from simplegeo.com that may provide some
 inspiration

 http://strangeloop2010.com/system/talks/presentations/000/014/495/Malone-DimensionalDataDHT.pdf

 Can you provide some more details on the data you want to store and queries
 you want to run ?

 Aaron

 On 6/02/2011, at 7:04 AM, Sean Ochoa sean.m.oc...@gmail.com wrote:

 That's a good question, Bill.

  The data that I'm trying to store begins as a simple point.  But, moving
  forward, it will become more like complex geometries.  I assume that I can
  simply create a JSON-like object and insert it.  Which, for now, works.
   I'm just wondering if there's a typical / publicly accepted standard of
  storing somewhat complex spatial data in Cassandra.

 Additionally, I would like to figure out how one goes about slicing on
 large spatial data sets given situations where, for instance, I would like
 to get all the points in a column-family where the point is within a shape.
  I guess it boils down to using a spatial comparator of some sort, but I
 haven't seen one, yet.

  - Sean

 On Sat, Feb 5, 2011 at 9:51 AM, William R Speirs  bill.spe...@gmail.com
 bill.spe...@gmail.com wrote:

  I know nothing about postgis and little about spatial data, but if you're
  simply talking about data that relates to some latitude and longitude pair,
  you could have your row key simply be the concatenation of the two:
  lat:long.

 Can you provide more details about the type of data you're looking to
 store?

 Thanks...

 Bill-


 On 02/05/2011 12:22 PM, Sean Ochoa wrote:

 Can someone tell me how to represent spatial data (coming from postgis)
 in
 Cassandra?

  - Sean




 --
 Sean | M (206) 962-7954 | GV (760) 624-8718




Re: Do supercolumns have a purpose?

2011-02-03 Thread Mike Malone
On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com wrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to supercolumn
 families and voting on it. This will be 1 or 2 order of magnitude less work
 than getting rid of super column internally, and probably a much better
 solution anyway.


I realize that this is largely subjective, and on such matters code speaks
louder than words, but I don't think I agree with you on the issue of which
alternative is less work, or even which is a better solution.

If the goal is to have a hierarchical model, limiting the depth to two seems
arbitrary. Why not go all the way and allow an arbitrarily deep hierarchy?

If a more sophisticated hierarchical model is deemed unnecessary, or
impractical, allowing a depth of two seems inconsistent and
unnecessary. It's pretty trivial to overlay a hierarchical model on top of
the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
implemented a custom comparator that does the job [1]. Google's Megastore
has a similar architecture and goes even further [2].
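
The core of that overlay is just a comparator that treats a delimited column
name as a path and compares it component by component. Ed's CompositeType does
this properly, with typed, length-prefixed components; the string-based toy
version below is only meant to illustrate the idea:

    import java.util.Comparator;

    // Toy illustration of a hierarchical column-name comparator: names like
    // "user123:2010-05-01:comment-25" are compared component by component,
    // so a row slices cleanly at any level of the hierarchy.
    public class ComponentComparator implements Comparator<String> {
        private static final String SEPARATOR = ":";

        public int compare(String a, String b) {
            String[] as = a.split(SEPARATOR);
            String[] bs = b.split(SEPARATOR);
            int n = Math.min(as.length, bs.length);
            for (int i = 0; i < n; i++) {
                int cmp = as[i].compareTo(bs[i]);  // each component could use its own type/ordering
                if (cmp != 0) return cmp;
            }
            return as.length - bs.length;          // shorter (prefix) names sort first
        }
    }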

It seems to me that super columns are a historical artifact from Cassandra's
early life as Facebook's inbox storage system. They needed posting lists of
messages, sharded by user. So that's what they built. In my dealings with
the Cassandra code, super columns end up making a mess all over the place
when algorithms need to be special cased and branch based on the
column/supercolumn distinction.

I won't even mention what it does to the thrift interface.

Mike

[1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
[2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Re: GeoIndexing in Cassandra, Open Sourced?

2011-01-21 Thread Mike Malone
A more recent preso I gave about the SimpleGeo architecture is up at
http://strangeloop2010.com/system/talks/presentations/000/014/495/Malone-DimensionalDataDHT.pdf

Mike

On Fri, Jan 21, 2011 at 10:02 AM, Joseph Stein crypt...@gmail.com wrote:

 I hear that a bunch of folks have GeoIndexing built on top of Cassandra and
 running in production.

 Any of them open sourced (Twitter? SimpleGeo? Bueller?) planning on it?

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop
 */



Re: cassandra row cache

2011-01-14 Thread Mike Malone
Digest reads could be being dropped..?

On Thu, Jan 13, 2011 at 4:11 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  Is it possible that your are reading at READ.ONE and that READ.ONE
  only warms cache on 1 of your three nodes= 20. 2nd read warms another
  60%, and by the third read all the replicas are warm? 99% ?
 
  This would be true if digest reads were not warming caches.

 Digest reads do go through the cache path.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-13 Thread Mike Malone
Hey folks,

We've discovered an issue on Ubuntu/Lenny with libc6 2.11.1-0ubuntu7.5 (it
may also affect versions between 2.11.1-0ubuntu7.1 and 2.11.1-0ubuntu7.4).
The bug affects systems when a large number of threads (or processes) are
created rapidly. Once triggered, the system will become completely
unresponsive for ten to fifteen minutes. We've seen this issue on our
production Cassandra clusters under high load. Cassandra seems particularly
susceptible to this issue because of the large thread pools that it creates.
In particular, we suspect the unbounded thread pool for connection
management may be pushing some systems over the edge.

We're still trying to narrow down what changed in libc that is causing this
issue. We also haven't tested things outside of xen, or on non-x86
architectures. But if you're seeing these symptoms, you may want to try
upgrading libc6.

I'll send out an update if we find anything else interesting. If anyone has
any thoughts as to what the cause is, we're all ears!

Hope this saves someone some heart-ache,

Mike


Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Mike Malone
Hey Takayuki,

I don't think you're going to find anyone willing to promise that Cassandra
will fit your petabyte scale data analysis problem. That's a lot of data,
and there's not a ton of operational experience at that scale within the
community. And the people who do work on that sort of problem tend to be
busy ;). If your problem is that big, you're probably going to need to do
some experimentation and see if the system will scale for you. I'm sure
someone here can answer any specific questions that may come up if you do
that sort of work.

As you mentioned, the first concern I'd have with a cluster that big is
whether gossip will scale. I'd suggest taking a look at the gossip code.
Cassandra nodes are omniscient in the sense that they all try to maintain
full ring state for the entire cluster. At a certain cluster size that no
longer works.

My best guess is that a cluster of 1000 machines would be fine. Maybe even
an order of magnitude bigger than that. I could be completely wrong, but
given the low overhead that I've observed that estimate seems reasonable. If
you do find that gossip won't work in your situation it would be interesting
to hear why. You may even consider modifying / updating gossip to work for
you. The code isn't as scary as it may seem. At that scale it's likely
you'll encounter bugs and corner cases that other people haven't, so it's
probably worth familiarizing yourself with the code anyways if you decide to
use Cassandra.

Mike

On Tue, Oct 26, 2010 at 1:09 AM, Takayuki Tsunakawa 
tsunakawa.ta...@jp.fujitsu.com wrote:

 Hello, Edward,

 Thank you for giving me insight about large disk nodes.

 From: Edward Capriolo edlinuxg...@gmail.com
  Index sampling on start up. If you have very small rows your indexes
  become large. These have to be sampled on start up and sampling our
  indexes for 300Gb of data can take 5 minutes. This is going to be
  optimized soon.

 5 minutes for 300 GB of data ... it's not cheap, is it? Simply, 3 TB of
 data will lead to 50 minutes just for computing input splits. This is
 too expensive when I want only part of the 3 TB data.


  (Just wanted to note some of this as I am in the middle of a process
  of joining a node now :)

 Good luck. I'd appreciate it if you could share some performance numbers for
 joining nodes (amount of data, time to distribute data, load impact on
 applications, etc) if you can. The cluster our customer is thinking of
 is likely to become very large, so I'm interested in the elasticity.
 Yahoo!'s YCSB report makes me worry about adding nodes.

 Regards,
 Takayuki Tsunakawa


 From: Edward Capriolo edlinuxg...@gmail.com
 [Q3]
 There are some challenges with very large disk nodes.
 Caveats:
 I will use words like long, slow, and large relatively. If you
 have great equipment IE. 10G Ethernet between nodes it will not take
 long to transfer data. If you have an insane disk pack it may not
 take long to compact 200GB of data. I am basing these statements on
 server class hardware. ~32 GB ram ~2x processor, ~6 disk SAS RAID.

 Index sampling on start up. If you have very small rows your indexes
 become large. These have to be sampled on start up and sampling our
 indexes for 300Gb of data can take 5 minutes. This is going to be
 optimized soon.

 Joining nodes: When you go with larger systems joining a new node
 involves a lot of transfer, and can take a long time.  Node join
 process is going to be optimized in 0.7 and 0.8 (quite drastic changes
 in 0.7)

 Major compaction and very large normal compaction can take a long
 time. For example while doing a 200 GB compaction that takes 30
 minutes, other sstables build up, more sstables mean slower reads.

 Achieving a high RAM/DISK ratio may be easier with smaller nodes vs
 one big node with 128 GB RAM $$$.

 As Jonathan pointed out nothing technically is stopping larger disk
 nodes.

 (Just wanted to note some of this as I am in the middle of a process
 of joining a node now :)





Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-08-04 Thread Mike Malone
This may be your problem:
https://issues.apache.org/jira/browse/CASSANDRA-1358

The message deserializer executor is being created with a core pool size of
1. Since it uses a queue with unbounded capacity new requests are always
queued and the thread pool never grows. So the message deserializer becomes
a single-threaded bottleneck through which all traffic must pass. So your 16
cores are reduced to one core for handling all inter-node communication (and
any intra-node communication that's being passed through the messaging
service).
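
The executor behavior is easy to demonstrate in isolation, if anyone's curious
-- with an unbounded work queue, a ThreadPoolExecutor only spawns threads
beyond the core size when the queue rejects a task, which never happens:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class UnboundedQueueDemo {
        public static void main(String[] args) throws InterruptedException {
            // Core size 1, max size 16, unbounded work queue -- mirrors the deserializer setup.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    1, 16, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

            for (int i = 0; i < 1000; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try { Thread.sleep(10); } catch (InterruptedException e) { }
                    }
                });
            }

            Thread.sleep(100);
            // Prints 1: new threads are only created when the queue refuses a task,
            // which an unbounded LinkedBlockingQueue never does.
            System.out.println("pool size: " + pool.getPoolSize());
            pool.shutdown();
        }
    }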

Mike

On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall datha...@gmail.com wrote:

 The output of htop shows threads as procs with a breakdown of how much cpu
 /etc per thread (in ncurses color!). All of these Java procs are just Java
 threads of only 1 instance of Cassandra per Server.


 On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black b...@b3k.us wrote:

 Sorry, I just noticed: are you running 14 instances of Cassandra on a
 single physical machine or are all those java processes something
 else?

 On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall datha...@gmail.com
 wrote:
  I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
 cores,
  RAID Disks).
 
  ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
  Pool NameActive   Pending  Completed
  STREAM-STAGE  0 0  0
  RESPONSE-STAGE0 0 516280
  ROW-READ-STAGE8  40961164326
  LB-OPERATIONS 0 0  0
  MESSAGE-DESERIALIZER-POOL 16820081818682
  GMFD  0 0   6467
  LB-TARGET 0 0  0
  CONSISTENCY-MANAGER   0 0 661477
  ROW-MUTATION-STAGE0 0 998780
  MESSAGE-STREAMING-POOL0 0  0
  LOAD-BALANCER-STAGE   0 0  0
  FLUSH-SORTER-POOL 0 0  0
  MEMTABLE-POST-FLUSHER 0 0  4
  FLUSH-WRITER-POOL 0 0  4
  AE-SERVICE-STAGE  0 0  0
  HINTED-HANDOFF-POOL   0 0  3
 
  EQX r...@cass04:~# vmstat -n 1
 
  procs ---memory-- ---swap-- -io --system--
  -cpu--
   r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy
 id
  wa st
   6 10   7096 121816  16244 1037549200 1 300  5
 1
  94  0  0
   2 10   7096 116484  16248 1038114400  5636 4 21210 9820  2
 1
  79 18  0
   1  9   7096 108920  16248 1038759200  6216 0 21439 9878  2
 1
  81 16  0
   0  9   7096 129108  16248 1036485200  6024 0 23280 8753  2
 1
  80 17  0
   2  9   7096 122460  16248 1037090800  6072 0 20835 9461  2
 1
  83 14  0
   2  8   7096 115740  16260 1037575200  5168   292 21049 9511  3
 1
  77 20  0
   1 10   7096 108424  16260 1038230000  6244 0 21483 8981  2
 1
  75 22  0
   3  8   7096 125028  16260 1036410400  5584 0 21238 9436  2
 1
  81 16  0
   3  9   7096 117928  16260 1037006400  5988 0 21505 10225
 2  1
  77 19  0
   1  8   7096 109544  16260 1037664000  634028 20840 8602  2
 1
  80 18  0
   0  9   7096 127028  16240 1035765200  5984 0 20853 9158  2
 1
  79 18  0
   9  0   7096 121472  16240 1036349200  5716 0 20520 8489  1
 1
  82 16  0
   3  9   7096 112668  16240 1036987200  6404 0 21314 9459  2
 1
  84 13  0
   1  9   7096 127300  16236 1035344000  5684 0 38914 10068
 2  1
  76 21  0
 
 
  But the 16 cores are hardly utilized. Which indicates to me there is
 some
  bad thread thrashing, but why?
 
 
 
CPU  1:  8.3%    CPU  2:  0.0%    CPU  3:  0.0%    CPU  4: 17.9%
CPU  5:  5.7%    CPU  6:  1.3%    CPU  7:  2.6%    CPU  8:  0.6%
CPU  9:  0.6%    CPU 10:  1.9%    CPU 11:  1.9%    CPU 12:  1.9%
CPU 13:  1.3%    CPU 14:  0.6%    CPU 15: ...
Tasks: 1070 total, 1 running
Load average: 8.34 9.05 8.82
Uptime: 192 days(!), 15:29:52

Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-08-04 Thread Mike Malone
So after 4096 messages get pushed on the row-read-stage queue (or any other
multiThreadedStage) the deserializer basically becomes a single-threaded
blocking queue that prevents any other inter-node RPC from occurring...?
Sounds like it's a problem either way. If the row read stage is what's
backed up, why not have the messages stack up on that stage?

Mike

On Wed, Aug 4, 2010 at 11:46 AM, Jonathan Ellis jbel...@gmail.com wrote:

 No, MDP is backing up because Row-Read-Stage [the stage after MDP on
 reads] is full at 4096, meaning you're not able to process reads as
 quickly as the requests are coming in.

 On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone m...@simplegeo.com wrote:
  This may be your
  problem: https://issues.apache.org/jira/browse/CASSANDRA-1358
  The message deserializer executor is being created with a core pool size
 of
  1. Since it uses a queue with unbounded capacity new requests are always
  queued and the thread pool never grows. So the message deserializer
 becomes
  a single-threaded bottleneck through which all traffic must pass. So your
 16
  cores are reduced to one core for handling all inter-node communication
 (and
  any intra-node communication that's being passed through the messaging
  service).
  Mike
 
  On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall datha...@gmail.com
  wrote:
 
  The output of htop shows threads as procs with a breakdown of how much
 cpu
  /etc per thread (in ncurses color!). All of these Java procs are just
 Java
  threads of only 1 instance of Cassandra per Server.
 
  On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black b...@b3k.us wrote:
 
  Sorry, I just noticed: are you running 14 instances of Cassandra on a
  single physical machine or are all those java processes something
  else?
 
  On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall 
 datha...@gmail.com
  wrote:
   I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
   cores,
   RAID Disks).
  
   ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
   Pool NameActive   Pending  Completed
   STREAM-STAGE  0 0  0
   RESPONSE-STAGE0 0 516280
   ROW-READ-STAGE8  40961164326
   LB-OPERATIONS 0 0  0
   MESSAGE-DESERIALIZER-POOL 16820081818682
   GMFD  0 0   6467
   LB-TARGET 0 0  0
   CONSISTENCY-MANAGER   0 0 661477
   ROW-MUTATION-STAGE0 0 998780
   MESSAGE-STREAMING-POOL0 0  0
   LOAD-BALANCER-STAGE   0 0  0
   FLUSH-SORTER-POOL 0 0  0
   MEMTABLE-POST-FLUSHER 0 0  4
   FLUSH-WRITER-POOL 0 0  4
   AE-SERVICE-STAGE  0 0  0
   HINTED-HANDOFF-POOL   0 0  3
  
   EQX r...@cass04:~# vmstat -n 1
  
   procs ---memory-- ---swap-- -io --system--
   -cpu--
r  b   swpd   free   buff  cache   si   sobibo   in   cs us
 sy
   id
   wa st
6 10   7096 121816  16244 1037549200 1 300
 5
   1
   94  0  0
2 10   7096 116484  16248 1038114400  5636 4 21210 9820
   2  1
   79 18  0
1  9   7096 108920  16248 1038759200  6216 0 21439 9878
   2  1
   81 16  0
0  9   7096 129108  16248 1036485200  6024 0 23280 8753
   2  1
   80 17  0
2  9   7096 122460  16248 1037090800  6072 0 20835 9461
   2  1
   83 14  0
2  8   7096 115740  16260 1037575200  5168   292 21049 9511
   3  1
   77 20  0
1 10   7096 108424  16260 1038230000  6244 0 21483 8981
   2  1
   75 22  0
3  8   7096 125028  16260 1036410400  5584 0 21238 9436
   2  1
   81 16  0
3  9   7096 117928  16260 1037006400  5988 0 21505 10225
   2  1
   77 19  0
1  8   7096 109544  16260 1037664000  634028 20840 8602
   2  1
   80 18  0
0  9   7096 127028  16240 1035765200  5984 0 20853 9158
   2  1
   79 18  0
9  0   7096 121472  16240 1036349200  5716 0 20520 8489
   1  1
   82 16  0
3  9   7096 112668  16240 1036987200  6404 0 21314 9459
   2  1
   84 13  0
1  9   7096 127300  16236 1035344000  5684 0 38914 10068
   2  1
   76 21  0
  
  
   But the 16 cores are hardly utilized. Which indicates to me there is
   some
   bad thread thrashing, but why?
  
  
  
 CPU 1: 8.3%    CPU 2: 0.0%    CPU 3: ...
 Tasks: 1070 total, 1 running
 Load average: 8.34 9.05 8.82

Re: get_range_slices

2010-07-08 Thread Mike Malone
I think the answer to your question is no, you shouldn't.

I'm feeling far too lazy to do even light research on the topic, but I
remember there being a bug where replicas weren't consolidated and you'd get
a result set that included data from each replica that was consulted for a
query. That could be what you're seeing. Are you running the most recent
release? Try dropping to CL.ONE and see if you only get one copy. If that
fixes it, I'd suggest searching JIRA.

Mike

On Thu, Jul 8, 2010 at 6:40 PM, Jonathan Shook jsh...@gmail.com wrote:

 Should I ever expect multiples of the same key (with non-empty column
 sets) from the same get_range_slices call?
 I've verified that the column data is identical byte-for-byte, as
 well, including column timestamps?



Re: Coke Products at Digg?

2010-07-07 Thread Mike Malone
On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans eev...@rackspace.com wrote:


 I heard a rumor that Digg was moving away from Coca-Cola products in all
 of its vending machines and break rooms. Can anyone from Digg comment on
 this?

 My near-term beverage consumption strategy is based largely on my
 understanding of Digg's, so if there has been a change, I may need to
 reevaluate.


Not sure about Digg, but I heard Twitter is switching over to Fanta. It's
been adopted by Coke so it must be fairly stable. There's not as much
flexibility in the product lineup, but what they do offer is extremely
delicious. Just my $0.02.

Mike


Re: Coke Products at Digg?

2010-07-07 Thread Mike Malone
On Wed, Jul 7, 2010 at 8:55 AM, Miguel Verde miguelitov...@gmail.com wrote:

 Dr. Pepper has recently been picked up by Coca Cola as well.  I wonder if
 the UnCola solutions like 7Up and Fanta are just a fad?


I'm on the fence. I mean, there's really nothing wrong with a nice cold Coke
to satiate your thirst. But we've all been drinking cola-flavored beverages
for so long I think they've become a hammer, so to speak. Can't hurt to
shake things up a bit.

Let's be real here: if you're thirsty, you should be drinking water. Coffee
or teas are more effective at delivering caffeine. And who wants to sit down
to a big steak dinner with a glass of Cola? A nice red wine is a much better
tool for the job. Horses for courses, that's my take.

Seems to me the carbonated beverage manufacturers are just starting to
realize that they can flavor their drinks with something other than the
cola-blend that Angelo Mariani invented in 1863!

Mike


 On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone m...@simplegeo.com wrote:

 On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans eev...@rackspace.com wrote:


 I heard a rumor that Digg was moving away from Coca-Cola products in all
 of its vending machines and break rooms. Can anyone from Digg comment on
 this?

 My near-term beverage consumption strategy is based largely on my
 understanding of Digg's, so if there has been a change, I may need to
 reevaluate.


 Not sure about Digg, but I heard Twitter is switching over to Fanta. It's
 been adopted by Coke so it must be fairly stable. There's not as much
 flexibility in the product lineup, but what they do offer is extremely
 delicious. Just my $0.02.

 Mike





Re: Cassandra and Thrift on the Server Side

2010-06-29 Thread Mike Malone

 Still, to Clint's point, everyone knows how to make an HTTP request. If you
 want a cassandra client running on, let's say, an iPhone for some reason, a
 REST API is going to be a lot more straight forward to implement.


There's no reason an HTTP service would have to live inside the Cassandra
project though, right... we're just talking about a proxy that translates
from one protocol (HTTP) to another (thrift / avro). Shouldn't be too hard
to implement. It could even be open sourced, and referenced from the
Cassandra website, maybe even endorsed by the Cassandra project. High level
though I think it's important to resist the temptation to build things in
that could just as easily live separately and develop orthogonally.

I feel the same way about access control... I think it's more natural and
flexible for that to be handled in an application rather than in the
database... If your particular requirements end up pushing access control
back to the data store tier then it should be fairly easy to wrap the
Cassandra service at either the Java level (by subclassing) or the OS level
(by having Cassandra listen only on localhost and have an authenticating /
authorizing proxy listen for remote requests and forward them). But it looks like
that decision has already been made.

Mike


Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Mike Malone
  Yes, I know. And I might end up doing this in the end. I do though have
 pretty hard upper limits of how many rows I will end up with for each key,
 but anyways it might be a good idea none the less. Thanks for the advice on
 that one.

 You set count to Integer.MAX. Did you try with say 3? IIRC that
 makes a difference (while it shouldn't) even when you have still less
 than 3.


Er, really? Just off hand, I feel like I've looked through most of the code
that would be relevant and I can't think of any reason that would be the
case. If it is, that definitely seems like a bug, particularly since the
general strategy for fetching all the things in this row is to set count
to Integer.MAX_VALUE!

Mike


Re: Is SuperColumn necessary?

2010-05-11 Thread Mike Malone
On Tue, May 11, 2010 at 7:46 AM, David Boxenhorn da...@lookin2.com wrote:

 I would like an API with a variable number of arguments. Using Java
 varargs, something like

 value = keyspace.get(articles, cars, John Smith, 2010-05-01,
 comment-25);

 or

 valueArray = keyspace.get(articles, predicate1, predicate2, predicate3,
 predicate4);


Hrm. I haven't dug that deeply into the joys of predicate logic,
propositional DAGs, etc., but couldn't this also be represented as a nested
tree of predicates / other primitives? So it would be something like:

   SubColumns = Transformation that takes a predicate, applies it to a
Column, then gets its SubColumns
   keyspace.get(articles, SubColumns(predicate1, SubColumns(predicate2,
SubColumns(predicate3, predicate4))));

It's more functional-programming-ish, I suppose, but I think that model
might apply more cleanly here. FP does tend to result in nice clean
algorithms for manipulating large data sets.

Mike




 The storage layout would be determined by the configuration, as below:

  <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True"
  ... >




 On Tue, May 11, 2010 at 5:26 PM, Jonathan Shook jsh...@gmail.com wrote:

 This is one of the sticking points with the key concatenation
 argument. You can't simply access subpartitions of data along an
 aggregate name using a concatenated key unless you can efficiently
 address a range of the keys according to a property of a subset. I'm
 hoping this will bear out with more of this discussion.

 Another facet of this issue is performance with respect to storage
 layout. Presently columns within a row are inherently organized for
 efficient range operations. The key space is not generally optimal in
 this way. I'm hoping to see some discussion of this, as well.

 On Tue, May 11, 2010 at 6:17 AM, vd vineetdan...@gmail.com wrote:
  Hi
 
   Can we do a range search on an ID:ID format? Would this be treated as a
   single ID by the API, or can it bifurcate on ':'? If not, then how can
   we avoid usage of supercolumns where we need to associate 'n' number
   of rows with a single ID?
  Like
   CatID1- articleID1
   CatID1- articleID2
   CatID1- articleID3
   CatID1- articleID4
  How can we map such scenarios with simple column families.
 
  Rgds.
 
  On Tue, May 11, 2010 at 2:11 PM, Torsten Curdt tcu...@vafer.org
 wrote:
  Exactly.
 
  On Tue, May 11, 2010 at 10:20, David Boxenhorn da...@lookin2.com
 wrote:
  Don't think of it as getting rid of supercolumns. Think of it as adding
  superdupercolumns, supertriplecolumns, etc. Or, in sparse array
 terminology:
  array[dim1][dim2][dim3].[dimN] = value
 
  Or, as said above:
 
Column Name=ThingThatsNowKey Indexed=True
 ClusterPartitioned=True
  Type=UTF8
  Column Name=ThingThatsNowColumnFamily DiskPartitioned=True
  Type=UTF8
Column Name=ThingThatsNowSuperColumnName Type=Long
  Column Name=ThingThatsNowColumnName Indexed=True
 Type=ASCII
Column Name=ThingThatCantCurrentlyBeRepresented/
  /Column
/Column
  /Column
/Column
 
 





Re: How to write WHERE .. LIKE query ?

2010-05-11 Thread Mike Malone
On Tue, May 11, 2010 at 8:54 AM, Schubert Zhang zson...@gmail.com wrote:

 In the future, maybe Cassandra can provide some Filter or Coprocessor
 interfaces, just like Bigtable does.
 But for now, Cassandra is too young; there are many things to do for a clear
 core.


There's been talk of adding coprocessors. It will probably happen one day.
Unfortunately, that day is probably a ways off.

Mike




 On Tue, May 11, 2010 at 11:35 PM, Mike Malone m...@simplegeo.com wrote:

 On Mon, May 10, 2010 at 11:36 PM, vd vineetdan...@gmail.com wrote:

 Hi Mike

 AFAIK cassandra queries only on keys and not on column names, please
 verify.


 Incorrect. You can slice a row or rows (identified by a key) on a column
 name range (e.g., a through m) or ask for specific columns in a row or
 rows (e.g., please give me the first_name, last_name and
 hashed_password fields from my Users column family where the key equals
 mmalone).

 See the get_range_slices() method in the thrift service.

 Mike





 On Tue, May 11, 2010 at 11:06 AM, Mike Malone m...@simplegeo.com
 wrote:
 
 
  On Mon, May 10, 2010 at 9:00 PM, Shuge Lee shuge@gmail.com
 wrote:
 
  Hi all:
  How to write WHERE ... LIKE query ?
  For examples(described in Python):
  Schema:
  # columnfamily name
   resources = {
       # key
       'foo': {
           # columns and value
           'url': 'foo.com',
           'publisher': 'foo',
       },
       'oof': {
           'url': 'oof.com',
           'publisher': 'off',
       },
       #  ... ,
   }
  # this is very easy,
  SELECT * FROM KEY = 'foo'
  but following are really hard:
  SELECT * FROM resources WHERE key LIKE 'o%' # get all records which
 key
  name contains character 'o'?
 
  get_range_slices(keyspace, ColumnParent(column_family),
   SlicePredicate(slice_range=SliceRange('', '')), KeyRange('o', 'o~'),
  ConsistencyLevel.ONE);
 
 
  SELECT * FROM resources WHERE url == 'oof.com'
 
  This is a projection. Cassandra doesn't support this sort of query out
 of
  the box. You'll have to structure your data so that data you want to
 query
  by is in the key or column name. Or you'll have to manually build
 secondary
  indexes.
 
  Mike
 






Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
Maybe... but honestly, it doesn't affect the architecture or interface at
all. I'm more interested in thinking about how the system should work than
what things are called. Naming things are important, but that can happen
later.

Does anyone have any thoughts or comments on the architecture I suggested
earlier?

Mike

On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang zson...@gmail.com wrote:

 Yes, the column here is not appropriate.
 Maybe we need not to create new terms, in Google's Bigtable, the term
 qualifier is a good one.


 On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn da...@lookin2.com wrote:

 That would be a good time to get rid of the confusing column term, which
 incorrectly suggests a two-dimensional tabular structure.

 Suggestions:

 1. A hypercube (or hypocube, if only two dimensions): replace key and
 column with 1st dimension, 2nd dimension, etc.

 2. A file system: replace key and column with directory and
 subdirectory

 3. A tuple tree: Column family replaced by top-level tuple, whose value
 is the set of keys, whose value is the set of supercolumns of the key, whose
 value is the set of columns for the supercolumn, etc.

 4. Etc.

 On Thu, May 6, 2010 at 2:28 AM, Mike Malone m...@simplegeo.com wrote:

 Nice, Ed, we're doing something very similar but less generic.

 Now replace all of the various methods for querying with a simple query
 interface that takes a Predicate, allow the user to specify (in
 storage-conf) which levels of the nested Columns should be indexed, and
 completely remove Comparators and have people subclass Column / implement
 IColumn and we'd really be on to something ;).

  Mock storage-conf.xml:
    <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
      <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
        <Column Name="ThingThatsNowSuperColumnName" Type="Long">
          <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
            <Column Name="ThingThatCantCurrentlyBeRepresented"/>
          </Column>
        </Column>
      </Column>
    </Column>

  Thrift:
    struct NamePredicate {
      1: required list<binary> column_names,
    }
    struct SlicePredicate {
      1: required binary start,
      2: required binary end,
    }
    struct CountPredicate {
      1: required struct predicate,
      2: required i32 count=100,
    }
    struct AndPredicate {
      1: required Predicate left,
      2: required Predicate right,
    }
    struct SubColumnsPredicate {
      1: required Predicate columns,
      2: required Predicate subcolumns,
    }
    ... OrPredicate, OtherUsefulPredicates ...
    query(predicate, count, consistency_level) # Count here would be the total
      count of leaf values returned, whereas CountPredicate specifies a column
      count for a particular sub-slice.

 Not fully baked... but I think this could really simplify stuff and make
 it more flexible. Downside is it may give people enough rope to hang
 themselves, but at least the predicate stuff is easily distributable.

 I'm thinking I'll play around with implementing some of this stuff myself
 if I have any free time in the near future.

 Mike


  On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Very interesting, thanks!

 On Wed, May 5, 2010 at 1:31 PM, Ed Anuff e...@anuff.com wrote:
  Follow-up from last weeks discussion, I've been playing around with a
 simple
  column comparator for composite column names that I put up on github.
 I'd
  be interested to hear what people think of this approach.
 
  http://github.com/edanuff/CassandraCompositeType
 
  Ed
 
  On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff e...@anuff.com wrote:
 
  It might make sense to create a CompositeType subclass of
 AbstractType for
  the purpose of constructing and comparing these types of composite
 column
  names so that if you could more easily do that sort of thing rather
 than
  having to concatenate into one big string.
 
  On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone m...@simplegeo.com
 wrote:
 
  The only thing SuperColumns appear to buy you (as someone pointed
 out to
  me at the Cassandra meetup - I think it was Eric Florenzano) is that
 you can
  use different comparator types for the Super/SubColumns, I guess..?
 But you
  should be able to do the same thing by creating your own Column
 comparator.
  I guess my point is that SuperColumns are mostly a convenience
 mechanism, as
  far as I can tell.
  Mike
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com







Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook jsh...@gmail.com wrote:

 I have to disagree about the naming of things. The name of something
 isn't just a literal identifier. It affects the way people think about
 it. For new users, the whole naming thing has been a persistent
 barrier.


I'm saying we shouldn't be worried too much about coming up with names and
analogies until we've decided what it is we're naming.


 As for your suggestions, I'm all for simplifying or generalizing the
 how it works part down to a more generalized set of operations. I'm
 not sure it's a good idea to require users to think in terms building
 up a fluffy query structure just to thread it through a needle of an
 API, even for the simplest of queries. At some point, the level of
 generic boilerplate takes away from the semantic hand rails that
 developers like. So I guess I'm suggesting that how it works and
 how we use it are not always exactly the same. At least they should
 both hinge on a common conceptual model, which is where the naming
 becomes an important anchoring point.


If things are done properly, client libraries could expose simplified query
interfaces without much effort. Most ORMs these days work by building a
propositional directed acyclic graph that's serialized to SQL. This would
work the same way, but it wouldn't be converted into a 4GL.

Mike



 Jonathan

 On Mon, May 10, 2010 at 11:37 AM, Mike Malone m...@simplegeo.com wrote:
  Maybe... but honestly, it doesn't affect the architecture or interface at
  all. I'm more interested in thinking about how the system should work
 than
  what things are called. Naming things are important, but that can happen
  later.
  Does anyone have any thoughts or comments on the architecture I suggested
  earlier?
 
  Mike
 
  On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang zson...@gmail.com
 wrote:
 
  Yes, the column here is not appropriate.
  Maybe we need not to create new terms, in Google's Bigtable, the term
  qualifier is a good one.
 
  On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn da...@lookin2.com
 wrote:
 
  That would be a good time to get rid of the confusing column term,
  which incorrectly suggests a two-dimensional tabular structure.
 
  Suggestions:
 
  1. A hypercube (or hypocube, if only two dimensions): replace key and
  column with 1st dimension, 2nd dimension, etc.
 
  2. A file system: replace key and column with directory and
  subdirectory
 
  3. A tuple tree: Column family replaced by top-level tuple, whose
 value
  is the set of keys, whose value is the set of supercolumns of the key,
 whose
  value is the set of columns for the supercolumn, etc.
 
  4. Etc.
 
  On Thu, May 6, 2010 at 2:28 AM, Mike Malone m...@simplegeo.com
 wrote:
 
  Nice, Ed, we're doing something very similar but less generic.
  Now replace all of the various methods for querying with a simple
 query
  interface that takes a Predicate, allow the user to specify (in
  storage-conf) which levels of the nested Columns should be indexed,
 and
  completely remove Comparators and have people subclass Column /
 implement
  IColumn and we'd really be on to something ;).
  Mock storage-conf.xml:
Column Name=ThingThatsNowKey Indexed=True
  ClusterPartitioned=True Type=UTF8
  Column Name=ThingThatsNowColumnFamily DiskPartitioned=True
  Type=UTF8
Column Name=ThingThatsNowSuperColumnName Type=Long
  Column Name=ThingThatsNowColumnName Indexed=True
  Type=ASCII
Column Name=ThingThatCantCurrentlyBeRepresented/
  /Column
/Column
  /Column
/Column
  Thrift:
struct NamePredicate {
  1: required listbinary column_names,
}
struct SlicePredicate {
  1: required binary start,
  2: required binary end,
}
struct CountPredicate {
  1: required struct predicate,
  2: required i32 count=100,
}
struct AndPredicate {
  1: required Predicate left,
  2: required Predicate right,
}
struct SubColumnsPredicate {
  1: required Predicate columns,
  2: required Predicate subcolumns,
}
... OrPredicate, OtherUsefulPredicates ...
query(predicate, count, consistency_level) # Count here would be
 total
  count of leaf values returned, whereas CountPredicate specifies a
 column
  count for a particular sub-slice.
  Not fully baked... but I think this could really simplify stuff and
 make
  it more flexible. Downside is it may give people enough rope to hang
  themselves, but at least the predicate stuff is easily distributable.
  I'm thinking I'll play around with implementing some of this stuff
  myself if I have any free time in the near future.
  Mike
 
  On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  Very interesting, thanks!
 
  On Wed, May 5, 2010 at 1:31 PM, Ed Anuff e...@anuff.com wrote:
   Follow-up from last weeks discussion, I've been playing around with
 a
   simple
   column comparator for composite column names that I put up

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 4:31 PM, AJ Chen ajc...@web2express.org wrote:

 Supercolumn is good for modeling profile-type data. A simple example is a
 blog:
 blog { blog {author, title, ...}
  comments {time: commenter}  // sorted by TimeUUID
 }
 When retrieving a blog, you get all the comments sorted by time already.
 Without supercolumns, you would need to concatenate multiple comment times
 together, as you suggested.

 Requiring the user to concatenate data fields together is not only an extra
 burden on the user but also a less clean design. There will be cases where the
 list property of a profile is a long list (say a million items). In
 such cases, the user wants to be able to directly insert/delete an item in that
 list because it's more efficient. Retrieving the whole list, updating it,
 concatenating again, and then putting it back to the datastore is awkward and
 less efficient.


There's nothing you said here that can't be implemented efficiently using
columns. You can slice rows and get a subset of Columns. In fact, this
example is particularly easy to implement. If you have a Blog with Entries
and Comments you'd do:

  <ColumnFamily Name="Blog" CompareWith="UTF8Type" />

  Insert blog post:
batch_mutate(key=blog post id, [{name=~post:author, value=author},
{name=~post:title, value=title, ...))
  Insert comment:
batch_mutate(key=blog post id, [{name=TimeUUID + :author, ... }]

Then you can get the Post only (slice for [~, ]), the comments only
(slice for [, ~]), or the post _and_ comments (slice for [, ]).
Inserting a comment does _not_ require a get/concatenate/insert.
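
In client code the name construction is just string concatenation; here's a
sketch of what a thin wrapper might look like (the "~post:" prefix and the
time-prefix convention are just this example's, and I'm using a sortable
string timestamp rather than a TimeUUID since a plain UTF8 comparator won't
order TimeUUID strings by time -- more on that further down the thread):

    // Builds column names for the single-CF blog layout sketched above.
    public class BlogColumns {
        // Post fields are prefixed with '~' so they sort after every comment column
        // under a UTF8/ASCII comparator.
        static String postField(String field) {
            return "~post:" + field;                 // e.g. "~post:author", "~post:title"
        }

        // Comment columns are prefixed with something that sorts chronologically as a
        // plain string, so slicing the row returns comments in time order.
        static String commentField(String sortableTimePrefix, String field) {
            return sortableTimePrefix + ":" + field; // e.g. "2010-05-10T21:30:00Z:author"
        }
    }

With that layout, a slice from "~" to the end of the row gives you just the
post fields, and a slice from the start of the row up to (but not including)
"~" gives you the comments in time order.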

Yes, concatenating the names on the client side is hacky, clunky, and
inconvenient. That's why we _should_ build an interface that doesn't require
the client to concatenate names. But SuperColumns aren't the right way to do
it. They add no value. They could be implemented in client libraries, for
example, and nobody would know the difference.

To really understand the problem with SuperColumns, though, you need to look
at the Cassandra source. Removing SuperColumns would make the code-base much
cleaner and tighter, and would probably reduce SLOC by 20%. I think a
replacement that assumed nested Columns (or Entries, or Thingies) would be
much cleaner. That's what Stu is working on.

Mike

On Mon, May 10, 2010 at 2:20 PM, Mike Malone m...@simplegeo.com wrote:

 On Mon, May 10, 2010 at 1:38 PM, AJ Chen ajc...@web2express.org wrote:

 Could someone confirm this discussion is not about abandoning supercolumn
 family? I have found modeling data with supercolumn family is actually an
 advantage of Cassandra compared to a relational database. I hope you are not
 going to drop this important concept.  How it's implemented internally is a
 different matter.


 SuperColumns are useful as a convenience mechanism. That's pretty much it.
 There's _nothing_ (as far as I can tell) that you can do with SuperColumns
 that you can't do by manually concatenating key names with a separator on
 the client side and implementing a custom comparator on the server (as ugly
 as that is).

 This discussion is about getting rid of SuperColumns and adding a more
 generic mechanism that will actually be useful and interesting and will
 continue to be convenient for the types of use cases for which people use
 SuperColumns.

 If there's a particular use case that you feel you can only implement with
 SuperColumns, please share! I honestly can't think of any.

 Mike


 On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook jsh...@gmail.comwrote:

 Agreed

 On Mon, May 10, 2010 at 12:01 PM, Mike Malone m...@simplegeo.com
 wrote:
  On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook jsh...@gmail.com
 wrote:
 
  I have to disagree about the naming of things. The name of something
  isn't just a literal identifier. It affects the way people think about
  it. For new users, the whole naming thing has been a persistent
  barrier.

  I'm saying we shouldn't be worried too much about coming up with names
  and analogies until we've decided what it is we're naming.
 
 
  As for your suggestions, I'm all for simplifying or generalizing the
  "how it works" part down to a more generalized set of operations. I'm
  not sure it's a good idea to require users to think in terms of building
  up a fluffy query structure just to thread it through a needle of an
  API, even for the simplest of queries. At some point, the level of
  generic boilerplate takes away from the semantic hand rails that
  developers like. So I guess I'm suggesting that "how it works" and
  "how we use it" are not always exactly the same. At least they should
  both hinge on a common conceptual model, which is where the naming
  becomes an important anchoring point.
 
  If things are done properly, client libraries could expose simplified
  query interfaces without much effort. Most ORMs these days work by
  building a propositional directed acyclic graph that's serialized to
  SQL. This would work the same way

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone

 Mike just suggested concatenating the comment id with each of the comment
 field names so that the above data can be stored in a normal column family.
 It looks fine except that I'm not sure whether the time sorting on comments
 still works or not.


In the case of time you can just use lexicographically sortable strings that
represent your timestamp (e.g., RFC 3339). You're right, I don't think
TimeUUID does that. For more complicated things (e.g., TimeUUIDs or packed
numerics that you don't want to zero pad) you'd have to implement a custom
comparator. So the convenience mechanisms that would have to be
implemented (and, in fact, Stu and Ed have pretty much already implemented)
would take care of concatenating the column names and doing the chained
comparisons for you.
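
A minimal sketch (plain Python, no Cassandra client involved; the separator
and name format are assumptions) of why a fixed-width RFC 3339-style timestamp
prefix keeps concatenated comment columns in time order under a byte-wise
comparator:

  from datetime import datetime, timedelta, timezone

  def comment_column(ts, field):
      # Fixed-width UTC timestamp prefix, then the field name. The "|"
      # separator is arbitrary; it just has to sort consistently.
      return ts.strftime("%Y-%m-%dT%H:%M:%SZ") + "|" + field

  base = datetime(2010, 5, 10, 17, 25, tzinfo=timezone.utc)
  names = [comment_column(base + timedelta(minutes=m), "commenter")
           for m in (5, -60, 0)]

  # Lexicographic (byte) order -- what a UTF8/ASCII comparator applies --
  # matches chronological order because the prefix is zero-padded and
  # fixed width.
  print(sorted(names))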

Mike




 On Mon, May 10, 2010 at 5:36 PM, William Ashley wash...@gmail.com wrote:

 I'm having a difficult time understanding your syntax. Could you provide
 an example with actual data?

 On May 10, 2010, at 5:25 PM, AJ Chen wrote:

 your suggestion works for fixed supercolumn name. the blog example now
 becomes:
 { blog-id {name, title, ...}
   blog-id-comments {time:commenter}
 }

 what about supercolumn names that are not fixed? for example, I want to
 store comment's details with the blog like this:
 { blog-id { blog { name, title, ...}
   comments {comment-id:commenter}
   comment-id {commenter, time, text, ...}
 }

 a comment-id is generated on-the-fly when the comment is made.  How do you
 flatten the comment-id supercolumn to a normal column?  Just a brain
 exercise, not meant to pick on you.

 thanks,
 -aj



 On Mon, May 10, 2010 at 4:39 PM, William Ashley wash...@gmail.comwrote:

 If you're storing your super column under a fixed name, you could just
 concatenate that name with the row key and use normal columns. Then you get
 your paging and sorting the way you want it.


 On May 10, 2010, at 4:31 PM, AJ Chen wrote:

 supercolumn is good for modeling profile type of data. simple example is
 blog:
 blog { blog {author,  title, ...}
  comments   {time: commenter}  //sort by TimeUUID
 }
 when retrieving a blog, you get all the comments sorted by time already.
 without supercolumn, you would need to concatenate multiple comment times
 together as you suggested.

 requiring the user to concatenate data fields together is not only an extra
 burden on the user but also a less clean design.  There will be cases where
 the list property of a profile data is a long list (say a million items). In
 such cases, the user wants to be able to directly insert/delete an item in
 that list because it's more efficient.  Retrieving the whole list, updating
 it, concatenating again, and then putting it back to the datastore is awkward
 and less efficient.

 -aj


 On Mon, May 10, 2010 at 2:20 PM, Mike Malone m...@simplegeo.com wrote:

 On Mon, May 10, 2010 at 1:38 PM, AJ Chen ajc...@web2express.orgwrote:

 Could someone confirm this discussion is not about abandoning
 supercolumn family? I have found modeling data with supercolumn family is
 actually an advantage of Cassandra compared to a relational database. I hope
 you are not going to drop this important concept.  How it's implemented
 internally is a different matter.


 SuperColumns are useful as a convenience mechanism. That's pretty much
 it. There's _nothing_ (as far as I can tell) that you can do with
 SuperColumns that you can't do by manually concatenating key names with a
 separator on the client side and implementing a custom comparator on the
 server (as ugly as that is).

 This discussion is about getting rid of SuperColumns and adding a more
 generic mechanism that will actually be useful and interesting and will
 continue to be convenient for the types of use cases for which people use
 SuperColumns.

 If there's a particular use case that you feel you can only implement
 with SuperColumns, please share! I honestly can't think of any.

 Mike


 On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook jsh...@gmail.comwrote:

 Agreed

 On Mon, May 10, 2010 at 12:01 PM, Mike Malone m...@simplegeo.com
 wrote:
  On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook jsh...@gmail.com
 wrote:
 
  I have to disagree about the naming of things. The name of something
  isn't just a literal identifier. It affects the way people think about
  it. For new users, the whole naming thing has been a persistent
  barrier.

  I'm saying we shouldn't be worried too much about coming up with names
  and analogies until we've decided what it is we're naming.
 
 
  As for your suggestions, I'm all for simplifying or generalizing the
  "how it works" part down to a more generalized set of operations. I'm
  not sure it's a good idea to require users to think in terms of building
  up a fluffy query structure just to thread it through a needle of an
  API, even for the simplest of queries. At some point, the level of
  generic boilerplate takes away from the semantic hand rails that
  developers like. So I guess I'm

Re: Is SuperColumn necessary?

2010-05-07 Thread Mike Malone
On Thu, May 6, 2010 at 5:38 PM, Vijay vijay2...@gmail.com wrote:

 I would rather be interested in a tree-type structure where supercolumns have
 supercolumns in them. You don't need to compare all the columns to find a
 set of columns, and it would also reduce the bytes transferred for the
 separator, or at least the string concatenation (or something like that) for
 read and write column name generation. It is more logically stored and
 structured this way, and we can also make caching work better by selectively
 caching the tree (user defined, if you will)

 But nothing wrong in supporting both :)


I'm 99% sure we're talking about the same thing and we don't need to support
both. How names/values are separated is pretty irrelevant. It has to happen
somewhere. I agree that it'd be nice if it happened on the server, but doing
it in the client makes it easier to explore ideas.

On Thu, May 6, 2010 at 5:27 PM, philip andrew philip14...@gmail.com wrote:

 Please create a new term or word if the existing terms are misleading; if
 it's not a file system then it's not good to call it a file system.


While it's seriously bikesheddy, I guess you're right.

Let's call them thingies for now, then. So you can have a top-level
thingy and it can have an arbitrarily nested tree of sub-thingies. Each
thingy has a thingy type [1]. You can also tell Cassandra if you want a
particular level of thingy to be indexed. At one (or maybe more) levels
you can tell Cassandra you want your thingies to be split onto separate
nodes in your cluster. At one (or maybe more) levels you could also tell
Cassandra that you want your thingies split into separate files [2].

The upshot is, the Cassandra data model would go from being "it's a nested
dictionary, just kidding no it's not!" to being "it's a nested dictionary,
for serious." Again, these are all just ideas... but I think this simplified
data model would allow you to express pretty much any query in a graph of
simple primitives like Predicates, Filters, Aggregations, Transformations,
etc. The indexes would allow you to cheat when evaluating certain types of
queries - if you get a SlicePredicate on an indexed thingy you don't have
to enumerate the entire set of sub-thingies for example.

So, you'd query your thingies by building out a predicate,
transformations, filters, etc., serializing the graph of primitives, and
sending it over the wire to Cassandra. Cassandra would rebuild the graph and
run it over your dataset.

So instead of:

  Cassandra.get_range_slices(
    keyspace='AwesomeApp',
    column_parent=ColumnParent(column_family='user'),
    slice_predicate=SlicePredicate(column_names=['username', 'dob']),
    range=KeyRange(start_key='a', end_key='m'),
    consistency_level=ONE
  )

You'd do something like:

  Cassandra.query(
    SubThingyTransformer(
      NamePredicate(names=['AwesomeApp']),
      SubThingyTransformer(
        NamePredicate(names=['user']),
        SubThingyTransformer(
          SlicePredicate(start='a', end='m'),
          NamePredicate(names=['username', 'dob'])
        )
      )
    ),
    consistency_level=ONE
  )

Which seems complicated, but it's basically just [(user['username'],
user['dob']) for user in Cassandra['AwesomeApp']['user'].slice('a', 'm')]
and could probably be expressed that way in a client library.
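
A sketch of that idea (the predicate/transformer classes below are the
hypothetical ones from the example above, not an existing API) showing how a
client library could hide the graph behind a small helper:

  # Stand-ins for the hypothetical query primitives discussed above.
  class NamePredicate:
      def __init__(self, names):
          self.names = names

  class SlicePredicate:
      def __init__(self, start, end):
          self.start, self.end = start, end

  class SubThingyTransformer:
      def __init__(self, predicate, child):
          self.predicate, self.child = predicate, child

  def user_fields(keyspace, cf, start, end, fields):
      # Builds the same graph as the verbose example:
      # keyspace -> column family -> key slice -> named columns.
      return SubThingyTransformer(
          NamePredicate([keyspace]),
          SubThingyTransformer(
              NamePredicate([cf]),
              SubThingyTransformer(SlicePredicate(start, end),
                                   NamePredicate(fields))))

  query = user_fields('AwesomeApp', 'user', 'a', 'm', ['username', 'dob'])
  # query would then be serialized and sent to Cassandra.query(...).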

I think batch_mutate is awesome the way it is and should be the only way to
insert/update data. I'd rename it mutate. So our interface becomes:

  Cassandra.query(query, consistency_level)
  Cassandra.mutate(mutation, consistency_level)

Ta-da.

Anyways, I was trying to avoid writing all of this out in prose and try
mocking some of it up in code instead. I guess this this works too. Either
way, I do think something like this would simplify the codebase, simplify
the data model, simplify the interface, make the entire system more
flexible, and be generally awesome.

Mike

[1] These can be subclasses of Thingy in Java... or maybe they'd implement
IThingy. But either way they'd handle serialization and probably implement
compareTo to define natural ordering. So you'd have classes like
ASCIIThingy, UTF8Thingy, and LongThingy (ahem) - these would replace
comparators.

[2] I think there's another simplification here. Splitting into separate
files is really very similar to splitting onto separate nodes. There might
be a way around some of the row size limitations with this sort of concept.
And we may be able to get better utilization of multiple disks by giving
each disk (or data directory) a subset of the node's token range. Caveat:
thought not fully baked.


Re: pagination through slices with deleted keys

2010-05-07 Thread Mike Malone
On Fri, May 7, 2010 at 5:29 AM, Joost Ouwerkerk jo...@openplaces.orgwrote:

 +1.  There is some disagreement on whether or not the API should
 return empty columns or skip rows when no data is found.  In all of
 our use cases, we would prefer skipped rows.  And based on how
 frequently new cassandra users appear to be confused about the current
 behaviour, this might be a more common use case than the need for
 empty cols.  Perhaps this could be added as an option on
 SlicePredicate ?  (e.g. skipEmpty=true).


That's exactly how we implemented it:

struct SlicePredicate {
  1: optional list<binary> column_names,
  2: optional SliceRange   slice_range,
  3: optional bool         ignore_empty_rows=0,
}

Mike


Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
Our solution at SimpleGeo has been to hack Cassandra to (optionally, at
least) be sensible and drop Rows that don't have any Columns. The claim from
the FAQ that Cassandra would have to check if there are any other columns
in the row is inaccurate. The common case for us at least is that we're
only interested in Rows that have Columns matching our predicate. So if
there aren't any, we just don't return that row. No need to check if the
entire row is deleted.
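
For comparison, a hedged sketch of the client-side version of the same
filtering (over-fetch a range and skip ghost rows), with get_range standing in
as a hypothetical wrapper around whatever range call your client exposes;
it's assumed to return (key, columns) pairs in key order and to treat start
as exclusive:

  def page_of_live_rows(get_range, page_size, batch_size=100):
      # Over-fetch batches of rows and keep only the ones that still have
      # columns, i.e. skip tombstone-only "ghost" rows.
      results, start = [], ""
      while len(results) < page_size:
          batch = get_range(start, batch_size)
          if not batch:
              break
          for key, columns in batch:
              if columns:
                  results.append((key, columns))
                  if len(results) == page_size:
                      break
          if len(batch) < batch_size:
              break             # exhausted the range
          start = batch[-1][0]  # resume after the last key we saw
      return results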

Mike

On Thu, May 6, 2010 at 9:17 AM, Ian Kallen spidaman.l...@gmail.com wrote:

 I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
 which do a good job describing how difficult deletion is in an eventually
 consistent system. But practical application strategies for dealing with it
 aren't there (that I saw). I'm wondering how folks implement pagination in
 their applications; if you want to render N results in an application, is
 the only solution to over-fetch and filter out the tombstones? Or is there
 something simpler that I overlooked? I'd like to be able to count (even if
 the counts are approximate) and fetch rows with the deleted ones filtered
 out (without waiting for the GCGraceSeconds interval + compaction) but from
 what I see so far, the burden is on the app to deal with the tombstones.
 -Ian



Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
On Thu, May 6, 2010 at 3:27 PM, Ian Kallen spidaman.l...@gmail.com wrote:

 Cool, is this a patch you've applied on the server side? Are you running
 0.6.x? I'm wondering if this kind of thing can make it into future versions
 of Cassandra.


Yea, server side. It's basically doing the same thing clients typically want
to do (again, at least for our use cases) but doing it closer to the data.
Our patch is kind of janky though. I can probably get some version of it
pushed back upstream - or at least on github or something - if there's any
interest.

Mike


Re: Is SuperColumn necessary?

2010-05-05 Thread Mike Malone
Nice, Ed, we're doing something very similar but less generic.

Now replace all of the various methods for querying with a simple query
interface that takes a Predicate, allow the user to specify (in
storage-conf) which levels of the nested Columns should be indexed, and
completely remove Comparators and have people subclass Column / implement
IColumn and we'd really be on to something ;).

Mock storage-conf.xml:
  Column Name=ThingThatsNowKey Indexed=True ClusterPartitioned=True
Type=UTF8
Column Name=ThingThatsNowColumnFamily DiskPartitioned=True
Type=UTF8
  Column Name=ThingThatsNowSuperColumnName Type=Long
Column Name=ThingThatsNowColumnName Indexed=True Type=ASCII
  Column Name=ThingThatCantCurrentlyBeRepresented/
/Column
  /Column
/Column
  /Column

Thrift:
  struct NamePredicate {
    1: required list<binary> column_names,
  }
  struct SlicePredicate {
    1: required binary start,
    2: required binary end,
  }
  struct CountPredicate {
    1: required struct predicate,
    2: required i32 count=100,
  }
  struct AndPredicate {
    1: required Predicate left,
    2: required Predicate right,
  }
  struct SubColumnsPredicate {
    1: required Predicate columns,
    2: required Predicate subcolumns,
  }
  ... OrPredicate, OtherUsefulPredicates ...
  query(predicate, count, consistency_level) # Count here would be total
count of leaf values returned, whereas CountPredicate specifies a column
count for a particular sub-slice.

Not fully baked... but I think this could really simplify stuff and make it
more flexible. Downside is it may give people enough rope to hang
themselves, but at least the predicate stuff is easily distributable.
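
To make the "easily distributable" point concrete, a toy sketch (plain
Python; these closures only mirror the hypothetical Thrift structs above,
they are not an existing API) of every replica evaluating the same predicate
graph locally while a coordinator merges the results:

  def name_predicate(names):
      return lambda cols: {n: v for n, v in cols.items() if n in names}

  def slice_predicate(start, end):
      return lambda cols: {n: v for n, v in cols.items() if start <= n <= end}

  def and_predicate(left, right):
      return lambda cols: right(left(cols))   # both conditions must hold

  def run_query(predicate, replicas):
      merged = {}
      for local_columns in replicas:           # same graph runs on each replica
          merged.update(predicate(local_columns))
      return merged

  pred = and_predicate(slice_predicate("a", "m"),
                       name_predicate({"dob", "email"}))
  replica_1 = {"dob": "1980-01-01", "zip": "94110"}
  replica_2 = {"email": "mike@example.com", "dob": "1980-01-01"}
  print(run_query(pred, [replica_1, replica_2]))   # {'dob': ..., 'email': ...}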

I'm thinking I'll play around with implementing some of this stuff myself if
I have any free time in the near future.

Mike

On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Very interesting, thanks!

 On Wed, May 5, 2010 at 1:31 PM, Ed Anuff e...@anuff.com wrote:
  Follow-up from last week's discussion, I've been playing around with a
  simple column comparator for composite column names that I put up on
  github. I'd be interested to hear what people think of this approach.
 
  http://github.com/edanuff/CassandraCompositeType
 
  Ed
 
  On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff e...@anuff.com wrote:
 
  It might make sense to create a CompositeType subclass of AbstractType
  for the purpose of constructing and comparing these types of composite
  column names so that you could more easily do that sort of thing rather
  than having to concatenate into one big string.
 
  On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone m...@simplegeo.com
 wrote:
 
  The only thing SuperColumns appear to buy you (as someone pointed out to
  me at the Cassandra meetup - I think it was Eric Florenzano) is that you
  can use different comparator types for the Super/SubColumns, I guess..?
  But you should be able to do the same thing by creating your own Column
  comparator. I guess my point is that SuperColumns are mostly a
  convenience mechanism, as far as I can tell.
  Mike
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Is SuperColumn necessary?

2010-04-28 Thread Mike Malone
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn da...@lookin2.com wrote:

 If I understand correctly, the distinction between supercolumns and
 subcolumns is critical to good database design if you want to use random
 partitioning: you can do range queries on subcolumns but not on
 supercolumns.

 Is this correct?


You can do efficient range queries of normal (not super) columns in a
ColumnFamily. I think SuperColumns are not indexed, so it's less efficient
to do a slice of subcolumns from a column, if there are lots of subcolumns.

I agree that SuperColumns are technically unnecessary. There aren't any use
cases I can come up with that a SuperColumn satisfies that normal Columns
can't. You can simulate SuperColumn behavior by concatenating key parts with
a separator and using the concatenated key as your column name, then doing a
slice. So if you had a SuperColumn that stored usernames, and sub-columns
that stored document IDs, you could instead have a normal CF that stores
username:document-id.
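
A small sketch (plain Python; the row is just a dict) of that concatenation:
the sub-columns of a would-be SuperColumn become a contiguous run of names,
so a name-range slice on the "username:" prefix stands in for fetching that
SuperColumn's sub-columns:

  row = {
      "alice:doc-17": "",
      "alice:doc-42": "",
      "bob:doc-03": "",
  }

  def subcolumns(row, supercolumn, sep=":"):
      # All names with the "<supercolumn><sep>" prefix form one contiguous,
      # sorted run, assuming the separator never appears in the key itself.
      start, end = supercolumn + sep, supercolumn + chr(ord(sep) + 1)
      return {n: v for n, v in sorted(row.items()) if start <= n < end}

  print(subcolumns(row, "alice"))   # only alice's document IDs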

The only thing SuperColumns appear to buy you (as someone pointed out to me
at the Cassandra meetup - I think it was Eric Florenzano) is that you can
use different comparator types for the Super/SubColumns, I guess..? But you
should be able to do the same thing by creating your own Column comparator.
I guess my point is that SuperColumns are mostly a convenience mechanism, as
far as I can tell.

Mike


Re: At what point does the cluster get faster than the individual nodes?

2010-04-22 Thread Mike Malone
On Wed, Apr 21, 2010 at 9:50 AM, Mark Greene green...@gmail.com wrote:

 Right, it's a similar concept to DB sharding, where you spread the write load
 around to different DB servers: it won't necessarily increase the throughput
 of any one DB server, but it will collectively.


Except with Cassandra, read-repair causes every read to go to every replica
for a piece of data.

Mike


Re: timestamp not found

2010-04-15 Thread Mike Malone
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero
timestamps? Could be a bug in Cassandra doing an implicit boolean coercion
in a conditional where it shouldn't.
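
A hedged illustration (plain Python, not the actual Cassandra/Thrift code) of
the kind of coercion bug being speculated about, where a legitimate timestamp
of 0 looks like a missing field:

  def has_timestamp_buggy(column):
      return bool(column.get("timestamp"))        # 0 is falsy, so it looks unset

  def has_timestamp_correct(column):
      return column.get("timestamp") is not None  # only an absent field is unset

  col = {"name": b"text", "value": b"...", "timestamp": 0}
  print(has_timestamp_buggy(col), has_timestamp_correct(col))   # False True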

Mike

On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker l...@socialagency.com wrote:

 We are currently migrating about 70G of data from mysql to cassandra.  I am
 occasionally getting the following error:

 Required field 'timestamp' was not found in serialized data! Struct:
 Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F
 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62
 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B
 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00,
 timestamp:0)

 The loop which is building out the mutation map for the batch_mutate call
 is adding a timestamp to each column.  I have verified that the time stamp
 is there for several calls and I feel like if the logic was bad, i would see
 the error more frequently.  Does anyone have suggestions as to what may be
 causing this?

 Lee Parker
 l...@spredfast.com




Re: Reading thousands of columns

2010-04-14 Thread Mike Malone
On Wed, Apr 14, 2010 at 7:45 AM, Jonathan Ellis jbel...@gmail.com wrote:

 35-50ms for how many rows of 1000 columns each?

 get_range_slices does not use the row cache, for the same reason that
 oracle doesn't cache tuples from sequential scans -- blowing away
 1000s of rows worth of recently used rows queried by key, for a swath
 of rows from the scan, is the wrong call more often than it is the
 right one.


Couldn't you cache a list of keys that were returned for the key range, then
cache individual rows separately or not at all?

By "blowing away" rows queried by key I'm guessing you mean pushing them
out of the LRU cache, not explicitly blowing them away? Either way I'm not
entirely convinced. In my experience I've had pretty good success caching
items that were pulled out via more complicated join / range type queries.
If your system is doing lots of range queries, and not a lot of lookups by
key, you'd obviously see a performance win from caching the range queries.
Maybe range scan caching could be turned on separately?
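
A rough sketch (plain Python, hypothetical fetch callbacks) of the split
suggested above: cache the list of keys per range, cache rows individually,
and let a repeated range scan be served without churning the row cache:

  range_cache = {}   # (start_key, end_key) -> list of keys in that range
  row_cache = {}     # key -> row

  def cached_range_slice(fetch_range, fetch_row, start, end):
      keys = range_cache.get((start, end))
      if keys is None:
          rows = fetch_range(start, end)            # goes to the data files
          range_cache[(start, end)] = [k for k, _ in rows]
          row_cache.update(rows)                    # optionally warm the row cache
          return rows
      # Range already known: serve rows from the row cache, fetch any misses.
      return [(k, row_cache[k] if k in row_cache else fetch_row(k)) for k in keys]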

Mike


Re: How do vector clocks and conflicts work?

2010-04-06 Thread Mike Malone
On Tue, Apr 6, 2010 at 11:03 AM, Tatu Saloranta tsalora...@gmail.comwrote:

 On Tue, Apr 6, 2010 at 8:45 AM, Mike Malone m...@simplegeo.com wrote:
  As long as the conflict resolver knows that two writers each tried to
  increment, then it can increment twice. The conflict resolver must know
  about the semantics of increment or decrement or string append or
  binary patch or whatever other merge strategy you choose. You'll
  register your strategy with Cassandra and it will apply it. Presumably
  it will also maintain enough context about what you were trying to
  accomplish to allow the merge strategy plugin to do it properly.
 
 
  That is to say, my understanding was that vector clocks would be
  required but not sufficient for reconciliation of concurrent value
  updates.
 
  The way I envisioned eventually consistent counters working would
  require something slightly more sophisticated... but not too bad. As
  incr/decr operations happen on distributed nodes, each node would keep a
  (vector clock, delta) tuple for that node's local changes. When a client
  fetched the value of the counter the vector clock deltas and the
  reconciled count would be combined into a single result. Similarly, when
  a replication / hinted-handoff / read-repair reconciliation occurred the
  counts would be merged into a single (vector clock, count) tuple.
  Maybe there's a more elegant solution, but that's how I had been thinking
  about this particular problem.

 I doubt there is any simple and elegant solution -- if there was, it
 would have been invented in the 50s. :-)

 Given this, yes, something along these lines sounds realistic. It also
 sounds like implementation would greatly benefit (if not require)
 foundational support from core, as opposed to being done outside of
 Cassandra (which I understand you are suggesting). I wasn't sure if
 the idea was to try to do this completely separate (aside from vector
 clock support).


I'd probably put it in core. Or at least put some more generic support for
this sort of conflict resolution in core. I'm looking forward to seeing
Digg's patch for this stuff.

Mike


Re: Memcached protocol?

2010-04-05 Thread Mike Malone
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod p...@ayogo.com wrote:

 On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone m...@simplegeo.com wrote:
  That's useful information Mike. I am a bit curious about what the most
  common use cases are for atomic increment/decrement. I'm familiar with
  atomic add as a sort of locking mechanism.
 
  They're useful for caching denormalized counts of things. Especially
  things that change rapidly. Instead of invalidating the counter whenever
  an event occurs that would incr/decr the counter, you can incr/decr the
  cached count too.

 Do you think that a future cassandra increment/decrement would be
 incompatible with those use cases?

 It seems to me that in that use case, an eventually consistent counter
 is as useful as any other eventually consistent datum.


An eventually consistent count operation in Cassandra would be great, and it
would satisfy all of the use cases I would typically use counts for in
memcached. It's just a matter of reconciling inconsistencies with a more
sophisticated operation than "latest write wins" (specifically, the
reconciliation operation should apply all incr/decr ops).
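
A toy sketch (plain Python, no Cassandra involved) of that reconciliation
rule: merge every replica's increment/decrement deltas instead of picking the
newest value, so no operation is lost:

  def merge_counter(deltas_by_replica):
      # deltas_by_replica: replica id -> signed increments recorded locally.
      # Applying all of them is what makes the counter converge correctly.
      return sum(sum(deltas) for deltas in deltas_by_replica.values())

  replicas = {
      "node-a": [+1, +1, -1],
      "node-b": [+1],
      "node-c": [+1, +1],
  }
  print(merge_counter(replicas))   # 4, even though no replica saw every op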

Mike


Re: Ring management and load balance

2010-03-25 Thread Mike Malone
On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis jbel...@gmail.com wrote:

 The advantage to doing it the way Cassandra does is that you can keep
 keys sorted with OrderPreservingPartitioner for range scans.  grabbing
 one token of many from each node in the ring would prohibit that.

 So we rely on active load balancing to get to a good enough balance,
 say within 50%.  It doesn't need to be perfect.


This makes sense for the order preserving partitioner. But for the random
partitioner multiple tokens per node would certainly make balancing
easier... I haven't dug into that bit of the Cassandra implementation yet.
Would it be very difficult to support both modes of operation?

For what it's worth, we've already seen annoying behavior when adding nodes
to the cluster. It's obviously true that the absolute size of partitions
becomes smaller as the cluster grows, but if your relatively balanced 100
node cluster is at, say, 70% capacity and you add 10 more nodes you would
presumably want this additional capacity to be evenly distributed. And right
now that's pretty much impossible to do without rebalancing the entire
cluster.
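
As a back-of-the-envelope sketch (plain Python, random partitioner assumed,
hypothetical node and token counts), here is the kind of ownership skew one
token per node gives versus many tokens per node; with many tokens, a new
node picks up small slices from all over the ring instead of splitting a
single neighbor's range:

  import random

  RING = 2 ** 127   # token space of the random partitioner

  def ownership(tokens_by_node):
      # A node owns the arc from the previous token (exclusive) up to each
      # of its own tokens (inclusive), wrapping around the ring.
      ring = sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)
      owned = dict.fromkeys(tokens_by_node, 0)
      for (token, node), (prev, _) in zip(ring, [ring[-1]] + ring[:-1]):
          owned[node] += (token - prev) % RING
      return {n: o / RING for n, o in owned.items()}

  def random_tokens(nodes, tokens_per_node, seed=1):
      rng = random.Random(seed)
      return {f"n{i}": [rng.randrange(RING) for _ in range(tokens_per_node)]
              for i in range(nodes)}

  print(max(ownership(random_tokens(11, 1)).values()))    # one token: lopsided
  print(max(ownership(random_tokens(11, 64)).values()))   # many tokens: ~1/11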

Mike