Re: Data Model Index Text

2010-01-08 Thread Drew Schleck
If I am reading this right, you want to query for a word and find all of
the documents that contain it? There may be a better way to do this, but
the way the people at Facebook do it is with supercolumns. Inside the
supercolumn column family they have a column for every word, such as
"Michael" and "Jordan", and within each of those columns they have keys
that correspond to the ids of all of the documents containing that word.

I suppose if you do it this way you're forced to fetch the set of ids for
each word and intersect the sets in memory to find the documents that
contain all of the words, but if it's good enough for Facebook I suppose
it can't be too bad.

This video talks about it briefly:
http://www.facebook.com/video/video.php?v=540974400803
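
The layout described above can be sketched in plain Python. This is only an
in-memory illustration of the idea, not Cassandra or Facebook code: the
WordIndex class, its method names, and the sample documents are all made up
for the example, and a dict of sets stands in for the supercolumn column
family.

```python
from collections import defaultdict


class WordIndex:
    """In-memory stand-in for a supercolumn-style inverted index.

    Outer key: the word (one "column" per word).
    Inner keys: ids of the documents that contain the word.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Naive whitespace tokenizer; a real indexer would normalize more.
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def docs_with_all(self, *words):
        # Fetch one id set per word, then intersect them in memory.
        sets = [self._postings.get(w.lower(), set()) for w in words]
        if not sets:
            return set()
        return set.intersection(*sets)


index = WordIndex()
index.add_document("doc1", "Michael Jordan played basketball")
index.add_document("doc2", "Michael Faraday studied electromagnetism")
index.add_document("doc3", "The river Jordan")

print(sorted(index.docs_with_all("Michael", "Jordan")))
```

The intersection step is the part that has to happen on the client: each
word lookup returns a full set of ids, so very common words mean large sets
to pull back and compare.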

Drew

On Fri, Jan 8, 2010 at 14:12, ML_Seda sonnyh...@gmail.com wrote:

 Hey,

 I've been reading up on the Cassandra data model a bit, and would like to
 get some input from this forum on different techniques for a particular
 problem.

 Assume I need to index millions of text docs (e.g. research papers), and
 allow the ability to query them by a given word inside or around any of the
 indexed docs.  meaning if i search for terms i would like to get a list of
 docs in which these terms show up (e.g. Michael Jordan = Michael is the main
 term, and Jordan is next term n1.  The same can be applied by indicating
 previous terms to Michael)

 How do I model this in Cassandra?

 Would my Keys be a concat of the middle term + docid?  Will I be able to do
 queries by wildcarding the docid?

 Thanks.



Re: problem running lazyboy example: save() is not working?

2009-10-19 Thread Drew Schleck
Hey,
  I think I've fixed these before; let me know how the attached patch works
for you. The first issue, which caused the second True, is that the record
isn't undirtying itself correctly after save(). There's also a bug in
examples/record.py that stops it from running, because the variable key
doesn't exist.
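
As a rough illustration of what "undirtying" means here, the sketch below
shows a dict-based record that clears its change tracking once save() has
written everything out. It is not lazyboy's code: TrackedRecord, its
attributes, and the plain dict used as a store are hypothetical, and it is
written for Python 3.

```python
class TrackedRecord(dict):
    """Hypothetical dict-based record that tracks unsaved changes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Everything passed in at construction time is unsaved.
        self._modified = set(self.keys())
        self._deleted = set()

    def __setitem__(self, name, value):
        super().__setitem__(name, value)
        self._modified.add(name)
        self._deleted.discard(name)

    def __delitem__(self, name):
        super().__delitem__(name)
        self._deleted.add(name)
        self._modified.discard(name)

    def is_modified(self):
        return bool(self._modified or self._deleted)

    def save(self, store):
        # `store` is a plain dict standing in for the database.
        for name in self._modified:
            store[name] = self[name]
        for name in self._deleted:
            store.pop(name, None)
        # The step that matters: reset the dirty state after writing.
        self._modified.clear()
        self._deleted.clear()
        return self


store = {}
u = TrackedRecord(username="ieure")
print(u.is_modified())
u.save(store)
print(u.is_modified())
```

If the two clear() calls are skipped, is_modified() keeps returning True
after save(), which matches the symptom in the report.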

Drew

On Mon, Oct 19, 2009 at 04:48, Walter Gillett walter_gill...@yahoo.com wrote:

 I'm running the latest lazyboy against apache-cassandra-incubating-0.4.0,
 with python 2.6.2 on ubuntu 9.04. I have modified the cassandra file
 storage-conf.xml to add the Keyspace that lazyboy wants:

 <Keyspace Name="UserData">
   <ColumnFamily CompareWith="BytesType" Name="Users"/>
 </Keyspace>

 and have cleaned out all the data under the /var/lib/cassandra directories
 to ensure that cassandra gets a clean start. On running the lazyboy example
 like so: "python record.py", I get the output shown below, with an error
 about "Bad key passed to load()". (I've modified the example slightly to
 make the print statements more informative, but the executable code is
 unchanged.)

 On reading the code, it looks like things go off the tracks earlier than
 that. The example code first prints u.is_modified(), which is True because u
 contains an unsaved User object. Then comes u.save() to save the User to
 cassandra. Then print u.is_modified() again to show that now the value is
 False. Except that the value I get is actually True! I think that explains
 the downstream error in trying to load the object: it didn't get saved
 properly, so it can't be loaded.

 Any ideas what's going on? I tried the obvious trick of putting in sleep(5)
 in case there is a race condition but that didn't work. Is lazyboy
 compatible with this version of cassandra? The lazyboy example comes with
 Cassandra-0.4.0-py2.6.egg suggesting that the version is compatible, but
 clearly something is wrong. I have also included the cassandra console
 output below in case that's useful.

 Walter Gillett

 Running the example (with modified print statements):

 $ python record.py
 u.key:
 {'column_family': 'Users', 'keyspace': 'UserData', 'super_column': None,
 'key': 'd86e9901ba6845ba84d3d93b7ff4a7b8'}

 User(data):
 User: {'username': 'ieure', 'email': 'i...@digg.com'}

 u.is_modified()
 True

 u.is_modified()
 True

 Traceback (most recent call last):
   File "record.py", line 80, in <module>
     u_ = User().load(key)
   File "/usr/lib/python2.6/site-packages/lazyboy/record.py", line 128, in load
     assert isinstance(key, Key), "Bad key passed to load()"
 AssertionError: Bad key passed to load()

 Cassandra console output:

 $ $CASSANDRA/bin/cassandra -f
 Listening for transport dt_socket at address: 
 DEBUG - Loading settings from /opt/cassandra/bin/../conf/storage-conf.xml
 DEBUG - Syncing log with a period of 1000
 DEBUG - opening keyspace Keyspace1
 DEBUG - adding Super1 as 0
 DEBUG - adding Standard2 as 1
 DEBUG - adding Standard1 as 2
 DEBUG - adding StandardByUUID1 as 3
 DEBUG - adding LocationInfo as 4
 DEBUG - adding HintsColumnFamily as 5
 DEBUG - adding Users as 6
 DEBUG - opening keyspace system
 DEBUG - opening keyspace UserData
 INFO - Saved Token not found. Using 90758413375805622705049558609953482402
 DEBUG - Starting to listen on 127.0.0.1:7001
 DEBUG - Binding thrift service to localhost:9160
 INFO - Cassandra starting up...
 DEBUG - batch_insert
 DEBUG - insertBlocking writing key f9df5e4875b54281ab57d36efed723b0 to 12@
 [127.0.0.1:7000]
 DEBUG - Applying RowMutation(table='UserData',
 key='f9df5e4875b54281ab57d36efed723b0', modifications=[ColumnFamily(Users
 [[101, 109, 97, 105, 108],[117, 115, 101, 114, 110, 97, 109, 101],])])
 DEBUG - RowMutation(table='UserData',
 key='f9df5e4875b54281ab57d36efed723b0', modifications=[ColumnFamily(Users
 [[101, 109, 97, 105, 108],[117, 115, 101, 114, 110, 97, 109, 101],])])
 applied.  Sending response to 1...@127.0.0.1:7000
 DEBUG - Processing response on a callback from 1...@127.0.0.1:7000

From 76089688c4edc624dba64618d5f85adc1b07ea64 Mon Sep 17 00:00:00 2001
From: Drew Schleck drew.schl...@gmail.com
Date: Mon, 19 Oct 2009 19:03:25 -0700
Subject: [PATCH] Fixed record.py

---
 examples/record.py |    2 +-
 lazyboy/record.py  |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/examples/record.py b/examples/record.py
index 4df24c0..bda595d 100644
--- a/examples/record.py
+++ b/examples/record.py
@@ -79,7 +79,7 @@ u.save()   # -> {'username': 'ieure', 'email': 'i...@digg.com'}
 print u.is_modified()   # -> False
 
 # Load it in a new instance.
-u_ = User().load(key)
+u_ = User().load(u.key.clone())
 print u_   # -> {'username': 'ieure', 'email': 'i...@digg.com'}
 
 print u.is_modified()   # -> False
diff --git a/lazyboy/record.py b/lazyboy/record.py
index 0e2f100..0bc0c9c 100644
--- a/lazyboy/record.py
+++ b/lazyboy/record.py
@@ -161,12 +161,14 @@ class Record(CassandraBase, dict):
 for path in deleted:
 client.remove(self.key.keyspace, self.key.key

Re: Newbe´s question

2009-08-25 Thread Drew Schleck
For anyone using my branch of Lazyboy, Ian Eure pulled my work,
improved it, and more. You ought to switch back to his version.

Drew


Re: Pls, help with fetching of super-column's value

2009-08-19 Thread Drew Schleck
Try setting start to "" and end to "~". This is what I did to fix
Lazyboy and it seems to work alright for now.
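
A rough sketch of why "~" can serve as an upper bound, under two
assumptions that are mine rather than anything stated in the thread: that
the start value is an empty string, and that column names are printable
ASCII compared character by character. The slice_columns helper is
hypothetical and uses plain Python string comparison as a stand-in for the
column comparator; it does not call Cassandra.

```python
def slice_columns(names, start, finish):
    """Return the names in [start, finish], in sorted order.

    Plain Python string comparison stands in for the column comparator.
    """
    return [n for n in sorted(names) if start <= n <= finish]


names = ["email", "username", "Zip", "age_42"]

# "~" is one of the last printable ASCII characters, so every name
# that starts with a letter or digit sorts at or before it.
print(slice_columns(names, "", "~"))

# If an empty finish is taken literally, nothing falls inside the range.
print(slice_columns(names, "", ""))
```

This only holds while no column name starts with "~" or a non-ASCII
character, so it is a workaround rather than a real "no upper bound".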

2009/8/19 Teodor Sigaev teo...@sigaev.ru:
 start and finish in SliceRange are non-optional.  Try empty strings.

 This is a partial fix :) - it works and doesn't emit any exception but
 returns nothing.

 --
 Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/