Re: Data Model Index Text

2010-01-13 Thread Ryan Daum
On the topic of Lucandra, apart from having it work with 0.5 of Cassandra,
has any work been done to get it up to date with Lucene 2.9/3.0?

Also, I'm a bit concerned about its use of OrderPreservingPartitioner; is
there an architecture for storage that could be considered that would work
with RandomPartitioner?

Ryan


Re: Data Model Index Text

2010-01-13 Thread ML_Seda

I'm assuming I have to run the Thrift gen-java step from the Cassandra 0.4
release.  Is there any documentation or tutorial on how to get that up and
running?

I've checked both Cassandra and Lucandra into Eclipse, but the Lucandra
project is still unable to resolve some classes.  Is this because I need to
generate the Java client classes?

Thanks.

Jake Luciani wrote:
 
 It should work, but not a ton has changed in 2.9/3.0 AFAIK.  I'm going to
 work on updating Lucandra to work with the 0.5 branch, so I can try to
 update this as well.  BTW, if you want to see Lucandra in action, check out
 http://flocking.me (example: http://flocking.me/tjake )
 
 You can use a random partitioner if you store the entire index under a
 supercolumn (how it was originally implemented), but then you need to
 accept that the entire index will be in memory for any operation on that
 index (bad for big indexes).
 
 -Jake
 
-- 
View this message in context: 
http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4349520.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Model Index Text

2010-01-13 Thread Jonathan Ellis
On Wed, Jan 13, 2010 at 10:04 AM, ML_Seda sonnyh...@gmail.com wrote:

 I'm assuming I have to run the thrift gen-java from cassandra .4 release.  Is
 there any documentation or tutorial on how to get that up and running?

No, cassandra includes a copy of the thrift Java classes.  You don't
need to mess w/ the thrift compiler.

-Jonathan


Re: Data Model Index Text

2010-01-13 Thread Jake Luciani
You should be using the Ant build file to build Lucandra; see the README.

For Eclipse you need to add lucandra/gen-java to the source path (this
contains the Thrift stubs).

-Jake


Re: Data Model Index Text

2010-01-13 Thread Jake Luciani
Ah, yes. I made this change locally once. Let me try to find it.

On Wed, Jan 13, 2010 at 10:43 AM, Ryan Daum r...@thimbleware.com wrote:

 The only tricky point I saw with the switch to Lucene 3.0 was that the
 TokenStream API changed completely, and IndexWriter in your code depended
 on the old API.

 I've ruled out OrderPreservingPartitioner for other jobs of mine because
 the distribution of keys is likely not ideal across my cluster. I'm curious
 whether the keys truly distribute well with Lucandra.

 R
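
The key-distribution worry raised here can be illustrated with a toy model.
This is only a sketch under stated assumptions: a made-up 4-node cluster,
"ordered" placement modelled as equal alphabetical ranges over the first
letter, and "random" placement modelled as an MD5 hash modulo the node count.
Neither function is Cassandra's real token logic, and the names ordered_node,
hashed_node and the sample terms are hypothetical.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def ordered_node(key: str) -> int:
    """Toy order-preserving placement: split a-z into NODES equal ranges."""
    offset = min(max(ord(key[0].lower()) - ord("a"), 0), 25)
    return offset * NODES // 26


def hashed_node(key: str) -> int:
    """Toy random placement: MD5 of the key, modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NODES


def distribution(keys, place):
    """Count how many keys each node receives under a placement function."""
    counts = Counter(place(k) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# Hypothetical index terms: natural-language keys cluster on a few letters.
terms = ["search", "server", "storage", "stream", "schema", "shard",
         "supercolumn", "system", "table", "thrift", "token", "trunk"]

ordered_counts = distribution(terms, ordered_node)
hashed_counts = distribution(terms, hashed_node)
print("ordered:", ordered_counts)
print("hashed: ", hashed_counts)
```

With these sample terms the ordered scheme puts every key on a single node,
because all of them start with "s" or "t". The hashed scheme ignores spelling,
so its placement does not follow the skew in the terms themselves.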


Re: Data Model Index Text

2010-01-12 Thread ML_Seda

I do see the classes now, but all the way back in version .20.  Is there a
newer version of Lucandra?  It would be nice for us to use the latest
Cassandra (trunk).
-- 
View this message in context: 
http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4293071.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Model Index Text

2010-01-11 Thread ML_Seda

Thanks Drew.  That is correct.

That would be one of the queries (give me all documents in which every term
in a list is present).
But not only that: another query would allow users to search for the words
around a given word.

Keying in Michael would return a list of the words that follow the word
Michael in any document (e.g. Jordan, Jackson, etc.).  The same is done for
words before a given word.

Is Cassandra not optimal for this?  As pointed out by Ian, I will look into
Lucandra as well.  Thanks.
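
For illustration, the "words around a given word" query might be sketched as
two small in-memory indexes, one for following words and one for preceding
words. This is a minimal sketch using plain Python dicts, not a Cassandra
schema; build_neighbor_index and the sample documents are made up for the
example, and whitespace splitting stands in for real tokenization.

```python
from collections import defaultdict


def build_neighbor_index(docs):
    """Map each word to the words seen directly after and before it."""
    following = defaultdict(lambda: defaultdict(set))
    preceding = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        words = text.lower().split()
        # Walk adjacent word pairs and record the document they came from.
        for left, right in zip(words, words[1:]):
            following[left][right].add(doc_id)
            preceding[right][left].add(doc_id)
    return following, preceding


# Hypothetical sample documents.
docs = {
    "doc1": "Michael Jordan played basketball",
    "doc2": "Michael Jackson recorded albums",
    "doc3": "coach praised Michael Jordan",
}

following, preceding = build_neighbor_index(docs)
after_michael = {w: sorted(ids) for w, ids in following["michael"].items()}
before_michael = {w: sorted(ids) for w, ids in preceding["michael"].items()}
print(after_michael)
print(before_michael)
```

One could imagine the outer word as a row key and each neighbouring word as a
column name, but that mapping is an assumption rather than something stated in
this thread.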
-- 
View this message in context: 
http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4286704.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Model Index Text

2010-01-11 Thread ML_Seda

Is there a particular version of Cassandra required for Lucandra to work?

It's not able to resolve the Cassandra class, along with a few others.  I
have Cassandra trunk checked out, and Lucandra from the GitHub link provided
below.


Ian Holsman-3 wrote:
 
 Hi ML.
 This sounds more like a job for Solr, but if you want to do this with
 Cassandra, you should look at Jake's Lucandra:
 http://github.com/tjake/Lucandra
 
 You should also look at
 http://nicklothian.com/blog/2009/10/27/solr-cassandra-solandra/
 
 I wouldn't recommend building your own IR engine; just use one of the
 ones out there.
 
 regards
 Ian
 On Jan 9, 2010, at 9:12 AM, ML_Seda wrote:
 
 
 Hey,
 
 I've been reading up on the Cassandra data model a bit, and would like to
 get some input from this forum on different techniques for a particular
 problem.
 
 Assume I need to index millions of text docs (e.g. research papers), and
 allow the ability to query them by a given word inside or around any of the
 indexed docs.  Meaning, if I search for terms, I would like to get a list of
 docs in which these terms show up (e.g. Michael Jordan = Michael is the main
 term, and Jordan is next term n1.  The same can be applied by indicating
 previous terms to Michael).
 
 How do I model this in Cassandra?
 
 Would my keys be a concat of the middle term + docid?  Will I be able to do
 queries by wildcarding the docid?
 
 Thanks.
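
The "term + docid" key idea can be sketched as a prefix range scan over sorted
keys. This is only an illustration in plain Python: the separator, make_key
and docs_for_term are hypothetical names, and a sorted list stands in for a
key space that is kept in order (which is what an order-preserving
partitioner provides; hashed placement would not keep these keys adjacent).

```python
import bisect

SEP = "\x00"  # hypothetical separator; sorts before any printable character


def make_key(term: str, doc_id: str) -> str:
    """Build a composite key of the form term + SEP + doc_id."""
    return f"{term}{SEP}{doc_id}"


def docs_for_term(sorted_keys, term):
    """Return doc ids for all keys starting with term + SEP via a range scan."""
    start = term + SEP
    end = term + chr(ord(SEP) + 1)  # first string past every term + SEP key
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return [key.split(SEP, 1)[1] for key in sorted_keys[lo:hi]]


# Hypothetical (term, doc id) postings.
postings = [("michael", "doc1"), ("michael", "doc3"), ("michaela", "doc9"),
            ("jordan", "doc1"), ("jordan", "doc3"), ("jackson", "doc2")]
keys = sorted(make_key(term, doc_id) for term, doc_id in postings)

michael_docs = docs_for_term(keys, "michael")
jordan_docs = docs_for_term(keys, "jordan")
missing_docs = docs_for_term(keys, "nobody")
print(michael_docs, jordan_docs, missing_docs)
```

The separator keeps "michael" from matching "michaela", so "wildcarding the
docid" becomes a scan between two bounds rather than a pattern match.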
 --
 Ian Holsman
 i...@holsman.net
 
 
 
 
 

-- 
View this message in context: 
http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4288808.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Model Index Text

2010-01-11 Thread Jake Luciani
Lucandra currently uses the Cassandra 0.4 release series.


Re: Data Model Index Text

2010-01-11 Thread ML_Seda

Thanks Jake.

I don't see org.apache.cassandra.service.Cassandra in 0.4, which is
imported in BookmarksDemo.java.




-- 
View this message in context: 
http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4289009.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Model Index Text

2010-01-08 Thread Drew Schleck
If I am reading this right, you basically want to query for a word and
find all of the documents that contain it? While there may be a better
way to do this, the way the people at Facebook do it is with
supercolumns. Inside the supercolumn column family they have columns
for every word, such as Michael and Jordan, and within each of
those columns they have keys that correspond to the IDs of all of the
documents.

I suppose if you do it this way you're forced to figure out which
documents are contained in all of the sets in memory, but if it's good
enough for Facebook I suppose it can't be too bad.
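
The word-to-documents layout described above, together with the in-memory
intersection step, might look roughly like this. It is a sketch with plain
Python dicts and sets, not the Cassandra API or Facebook's actual code;
build_inverted_index, docs_with_all and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def docs_with_all(index, terms):
    """Intersect the posting sets of all terms, smallest set first."""
    sets = [index.get(term.lower(), set()) for term in terms]
    if not sets:
        return set()
    sets.sort(key=len)
    result = set(sets[0])
    for postings in sets[1:]:
        result &= postings
    return result


# Hypothetical sample documents.
docs = {
    "doc1": "Michael Jordan played basketball",
    "doc2": "Michael Jackson recorded albums",
    "doc3": "coach praised Michael Jordan",
}

index = build_inverted_index(docs)
both = docs_with_all(index, ["Michael", "Jordan"])
albums = docs_with_all(index, ["Michael", "albums"])
none = docs_with_all(index, ["Michael", "nothing"])
print(sorted(both), sorted(albums), sorted(none))
```

Starting from the smallest set keeps the intermediate result small, which
matters when each set has to be pulled into memory before intersecting.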

This video talks about it briefly:
http://www.facebook.com/video/video.php?v=540974400803

Drew
