Re: how does cassandra compare with mongodb?

2010-05-13 Thread philip andrew
MongoDB encourages you to define your schema in your application code by
using mapping classes. The implication is that it makes no sense to
define the schema twice, once in the database and once in your application code.
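
To make the "schema lives in the application" point concrete, here is a minimal sketch in Python. It is not MongoDB's API or any real mapper: the `User` class and the `to_document` / `from_document` helpers are hypothetical names invented for illustration, and the "database" is just a plain dict standing in for a schema-less document.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class User:
    # Hypothetical mapping class: the application, not the database,
    # declares which fields exist and what types they have.
    name: str
    email: str
    age: int = 0


def to_document(obj):
    """Turn a mapped object into a plain dict, the shape a schema-less store accepts."""
    return asdict(obj)


def from_document(cls, doc):
    """Rebuild a mapped object from a stored dict.

    Unknown keys are ignored; a value whose type does not match the
    class declaration raises TypeError. The integrity check therefore
    happens in application code, not in the database.
    """
    kwargs = {}
    for f in fields(cls):
        if f.name in doc:
            value = doc[f.name]
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} should be {f.type.__name__}")
            kwargs[f.name] = value
    return cls(**kwargs)
```

In this sketch the store never rejects a document; only `from_document` does, which is the trade-off discussed in the quoted message below.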

On Fri, May 14, 2010 at 3:48 AM, Steve Lihn stevel...@gmail.com wrote:

 What is changing? A more flexible schema or no need to restart (some kind
 of hot-reboot)?

 The Mongo guys claim that Mongo's advantage is a schema-less design. Basically
 you can have any data structure you want and you can change it any way you
 want. This is done in the name of flexibility, but I am not sure this is
 good practice. People argued for years that Perl is bad because it is
 typeless and Java is strongly typed and therefore better. Now the Java
 community is developing a database like Mongo that is schema-less. How does
 this complement the strong-typing argument?

 The fewer requirements are put on database schema design, the more burden is
 put on the application to maintain data integrity. Why is this a good trend?
 Can someone kindly explain?

 Steve




 On Thu, May 13, 2010 at 1:22 PM, Vijay vijay2...@gmail.com wrote:

 Cassandra requires the schema to be defined before the database starts;
 MongoDB can have any schema at run-time, just like a normal database.

 This is changing in 0.7

 Regards,
 /VJ





Re: Is SuperColumn necessary?

2010-05-06 Thread philip andrew
Please coin a new term if the existing terms are misleading. If it's not a
file system, then it's not good to call it a file system.

On Thu, May 6, 2010 at 3:50 PM, Torsten Curdt tcu...@vafer.org wrote:

 +1 on all of that

 On Thu, May 6, 2010 at 09:04, David Boxenhorn da...@lookin2.com wrote:
  That would be a good time to get rid of the confusing column term, which
  incorrectly suggests a two-dimensional tabular structure.
 
  Suggestions:
 
  1. A hypercube (or hypocube, if only two dimensions): replace key and
  column with 1st dimension, 2nd dimension, etc.
 
  2. A file system: replace key and column with directory and subdirectory
 
  3. A tuple tree: Column family replaced by top-level tuple, whose value is
  the set of keys, whose value is the set of supercolumns of the key, whose
  value is the set of columns for the supercolumn, etc.
 
  4. Etc.
 
  On Thu, May 6, 2010 at 2:28 AM, Mike Malone m...@simplegeo.com wrote:
 
  Nice, Ed, we're doing something very similar but less generic.
  Now replace all of the various methods for querying with a simple query
  interface that takes a Predicate, allow the user to specify (in
  storage-conf) which levels of the nested Columns should be indexed, and
  completely remove Comparators and have people subclass Column / implement
  IColumn and we'd really be on to something ;).
  Mock storage-conf.xml:
  <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
    <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
      <Column Name="ThingThatsNowSuperColumnName" Type="Long">
        <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
          <Column Name="ThingThatCantCurrentlyBeRepresented"/>
        </Column>
      </Column>
    </Column>
  </Column>
  Thrift:
struct NamePredicate {
  1: required list<binary> column_names,
}
struct SlicePredicate {
  1: required binary start,
  2: required binary end,
}
struct CountPredicate {
  1: required Predicate predicate,
  2: required i32 count=100,
}
struct AndPredicate {
  1: required Predicate left,
  2: required Predicate right,
}
struct SubColumnsPredicate {
  1: required Predicate columns,
  2: required Predicate subcolumns,
}
... OrPredicate, OtherUsefulPredicates ...
query(predicate, count, consistency_level) # Count here would be total
  count of leaf values returned, whereas CountPredicate specifies a column
  count for a particular sub-slice.
  Not fully baked... but I think this could really simplify stuff and make
  it more flexible. Downside is it may give people enough rope to hang
  themselves, but at least the predicate stuff is easily distributable.
  I'm thinking I'll play around with implementing some of this stuff myself
  if I have any free time in the near future.
  Mike
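
The composable-predicate idea in the message above can be sketched in a few lines of Python. This is only an illustration of the concept, not Cassandra's API and not the proposal's actual semantics: the `select` method, the inclusive slice bounds, and the choice to model a row as a plain dict of column name to value are all assumptions made here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NamePredicate:
    column_names: List[bytes]

    def select(self, names):
        # Keep only the explicitly requested column names.
        wanted = set(self.column_names)
        return [n for n in names if n in wanted]


@dataclass
class SlicePredicate:
    start: bytes
    end: bytes

    def select(self, names):
        # Assumption: both bounds are inclusive.
        return [n for n in names if self.start <= n <= self.end]


@dataclass
class CountPredicate:
    predicate: object
    count: int = 100

    def select(self, names):
        # Limit the wrapped predicate's result to the first `count` names.
        return self.predicate.select(names)[: self.count]


@dataclass
class AndPredicate:
    left: object
    right: object

    def select(self, names):
        # Apply `right` to whatever `left` lets through.
        return self.right.select(self.left.select(names))


def query(columns, predicate):
    """Run a predicate over one row, modelled as a dict of name -> value."""
    names = sorted(columns)  # columns are visited in sorted name order
    return {n: columns[n] for n in predicate.select(names)}
```

For example, `query(row, AndPredicate(SlicePredicate(b"b", b"d"), NamePredicate([b"c"])))` keeps only column `b"c"` if it falls inside the slice. Note that in this sketch `AndPredicate` is order-sensitive once a `CountPredicate` is involved, since the limit is applied at the point where it appears in the chain.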
 
  On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
  Very interesting, thanks!
 
  On Wed, May 5, 2010 at 1:31 PM, Ed Anuff e...@anuff.com wrote:
   Follow-up from last week's discussion: I've been playing around with a
   simple column comparator for composite column names that I put up on
   GitHub. I'd be interested to hear what people think of this approach.
  
   http://github.com/edanuff/CassandraCompositeType
  
   Ed
  
   On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff e...@anuff.com wrote:
  
   It might make sense to create a CompositeType subclass of AbstractType
   for the purpose of constructing and comparing these types of composite
   column names, so that you could more easily do that sort of thing rather
   than having to concatenate into one big string.
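
A small Python sketch shows why comparing composite names component by component differs from concatenating them into one string. It is not the CassandraCompositeType code and uses no Cassandra classes; the `concat_name` and `composite_name` helpers and the sample data are made up for illustration, with Python tuples standing in for a composite comparator.

```python
import bisect


def concat_name(user_id, field):
    # "One big string": components are joined and compared as text.
    return f"{user_id}:{field}"


def composite_name(user_id, field):
    # Composite name: each component keeps its own type and is
    # compared in turn (tuples compare element by element).
    return (user_id, field)


pairs = [(9, "email"), (10, "email"), (10, "age")]

# Text comparison puts user 10 before user 9, because "1" < "9".
sorted_concat = sorted(concat_name(u, f) for u, f in pairs)
# ['10:age', '10:email', '9:email']

# Component-wise comparison keeps the numeric order of the first part.
sorted_composite = sorted(composite_name(u, f) for u, f in pairs)
# [(9, 'email'), (10, 'age'), (10, 'email')]

# A prefix slice ("all columns for user 10") becomes a range lookup,
# because the one-element tuple (10,) sorts before every (10, field).
lo = bisect.bisect_left(sorted_composite, (10,))
hi = bisect.bisect_left(sorted_composite, (11,))
user_10_columns = sorted_composite[lo:hi]
# [(10, 'age'), (10, 'email')]
```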
  
   On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone m...@simplegeo.com wrote:
  
   The only thing SuperColumns appear to buy you (as someone pointed out to
   me at the Cassandra meetup - I think it was Eric Florenzano) is that you
   can use different comparator types for the Super/SubColumns, I guess..?
   But you should be able to do the same thing by creating your own Column
   comparator. I guess my point is that SuperColumns are mostly a
   convenience mechanism, as far as I can tell.
   Mike
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com
 
 
 



How to model 2D data in Cassandra?

2010-04-17 Thread philip andrew
Hi,

Let's say I wanted to store 2-dimensional data in the database, where each
object has an X and Y location in a very large space.

I want to query Cassandra for all objects within a rectangle.

My understanding is that my objects can only be indexed by one key, with one
key for each object in my table. The key can be a string, a number, or a time,
whichever the index supports.

So I guess I could find all objects in the range 500 < x < 510 if x is my
key. Also, if I had another table with y as the key, then I could find all
objects in the range 800 < y < 810, then bring them into my program and search
through them to find the intersection of both conditions for those objects.

Am I misunderstanding anything?

Thanks! Philip
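
The two-index approach described in this message can be sketched in Python as follows. This is an in-memory illustration only, with no Cassandra calls: the `TwoIndexStore` class and its method names are hypothetical, two sorted lists stand in for the two key-ordered tables, and the ranges are treated as half-open (`lo <= value < hi`) purely to keep the lookup code short.

```python
import bisect


class TwoIndexStore:
    """Two sorted indexes, one keyed by x and one keyed by y."""

    def __init__(self):
        self.by_x = []  # sorted list of (x, object_id)
        self.by_y = []  # sorted list of (y, object_id)

    def add(self, object_id, x, y):
        # Keep both indexes sorted as objects are inserted.
        bisect.insort(self.by_x, (x, object_id))
        bisect.insort(self.by_y, (y, object_id))

    @staticmethod
    def _range(index, lo, hi):
        # Ids whose coordinate c satisfies lo <= c < hi.
        # The one-element tuple (lo,) sorts before every (lo, object_id).
        start = bisect.bisect_left(index, (lo,))
        end = bisect.bisect_left(index, (hi,))
        return {object_id for _, object_id in index[start:end]}

    def query_rect(self, x_lo, x_hi, y_lo, y_hi):
        # One range scan per index, then intersect on the client side.
        in_x = self._range(self.by_x, x_lo, x_hi)
        in_y = self._range(self.by_y, y_lo, y_hi)
        return in_x & in_y


store = TwoIndexStore()
store.add("a", 505, 805)   # inside the rectangle
store.add("b", 505, 900)   # x matches, y does not
store.add("c", 100, 805)   # y matches, x does not
store.add("d", 509, 809)   # inside the rectangle
hits = store.query_rect(500, 510, 800, 810)
```

The cost of this design is that each range scan returns every object in a full-width strip of the space, so the client may fetch many candidates that the intersection then discards.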