Re: Data Model Index Text
On the topic of Lucandra: apart from getting it to work with Cassandra 0.5, has any work been done to bring it up to date with Lucene 2.9/3.0?

Also, I'm a bit concerned about its use of OrderPreservingPartitioner. Is there a storage architecture that could be considered that would work with RandomPartitioner?

Ryan
Re: Data Model Index Text
I'm assuming I have to run the Thrift gen-java from the Cassandra 0.4 release. Is there any documentation or tutorial on how to get that up and running? I've checked both Cassandra and Lucandra into Eclipse, but the Lucandra project is still unable to resolve some classes. Is this because I need to generate the Java client classes? Thanks.

Jake Luciani wrote:
It should work, but not a ton has changed in 2.9/3.0 AFAIK. I'm going to work on updating Lucandra to work with the 0.5 branch, and I can try to update this as well. BTW, if you want to see Lucandra in action, check out http://flocking.me (example: http://flocking.me/tjake ).

You can use a random partitioner if you store the entire index under a supercolumn (how it was originally implemented), but then you need to accept that the entire index will be in memory for any operation on that index (bad for big indexes).
-Jake

-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4349520.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
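The partitioner trade-off Jake describes can be illustrated with a minimal sketch. It assumes that an order-preserving partitioner uses the row key itself as the token, while a random partitioner derives the token from an MD5 hash of the key. The `term/docid` key layout is hypothetical and is not taken from Lucandra's actual schema.

```python
import hashlib

def order_preserving_token(key: str) -> str:
    # Assumption: the key itself is the token, so rows sort in key order.
    return key

def random_token(key: str) -> int:
    # Assumption: the token is the MD5 hash of the key, read as an integer.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def is_contiguous(ordered, prefix):
    # True when every key with the prefix sits in one unbroken run.
    positions = [i for i, k in enumerate(ordered) if k.startswith(prefix)]
    if not positions:
        return True
    return positions == list(range(positions[0], positions[0] + len(positions)))

# Hypothetical "term/docid" row keys.
keys = ["michael/doc1", "michael/doc7", "jordan/doc1",
        "jackson/doc3", "michael/doc2"]

by_key = sorted(keys, key=order_preserving_token)
by_hash = sorted(keys, key=random_token)

print(by_key)                              # all "michael/" rows are adjacent
print(is_contiguous(by_key, "michael/"))
print(by_hash)                             # same rows, ordered by hash token
```

Under key ordering, all rows for one term can be read with a single range scan. Under hashed tokens the same rows have no guaranteed adjacency, which is why the supercolumn layout keeps the whole index in one row instead.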
Re: Data Model Index Text
On Wed, Jan 13, 2010 at 10:04 AM, ML_Seda sonnyh...@gmail.com wrote:
I'm assuming I have to run the thrift gen-java from cassandra .4 release. Is there any documentation or tutorial on how to get that up and running?

No, Cassandra includes a copy of the Thrift Java classes. You don't need to mess with the Thrift compiler.
-Jonathan
Re: Data Model Index Text
You should be using the ant file to build Lucandra; see the README. For Eclipse, you need to add lucandra/gen-java to the source path (this contains the Thrift stubs).
-Jake
Re: Data Model Index Text
Ah, yes. I made this change locally once. Let me try to find it.

On Wed, Jan 13, 2010 at 10:43 AM, Ryan Daum r...@thimbleware.com wrote:
The only tricky point I saw with the switch to Lucene 3.0 was that the TokenStream API changed completely, and IndexWriter in your code depended on the old API.

I've ruled out OrderPreservingPartitioner for other jobs of mine because the distribution of keys is likely not ideal across my cluster. I'm curious whether the keys truly distribute well with Lucandra?
R
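Ryan's concern about key distribution can be made concrete with a small sketch. Everything here is an illustrative assumption: the four-node cluster, the split of the alphabet into equal ranges, and the use of a hash modulo in place of real token ranges on a ring.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size

def node_by_range(key: str) -> int:
    # Hypothetical order-preserving placement: split a-z into NODES equal ranges.
    first = key[0].lower()
    if not "a" <= first <= "z":
        return 0
    return (ord(first) - ord("a")) * NODES // 26

def node_by_hash(key: str) -> int:
    # Simplified hashed placement: MD5 of the key, modulo the node count.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES

# Index terms are rarely uniform; here they cluster around "s" and "t".
terms = ["search", "server", "schema", "shard", "solr",
         "storage", "supercolumn", "thrift"]

range_load = Counter(node_by_range(t) for t in terms)
hash_load = Counter(node_by_hash(t) for t in terms)

print("by key range:", dict(range_load))  # every term lands on node 2
print("by hash:     ", dict(hash_load))
```

With range placement, the skew in the terms themselves becomes skew in node load, so token ranges would need to be chosen to match the data. Hashing spreads keys without that tuning, at the cost of losing range scans.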
Re: Data Model Index Text
I do see the classes now, but all the way back in version .20. Is there a newer version of Lucandra? It would be nice for us to use the latest Cassandra (trunk).

-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4293071.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
Re: Data Model Index Text
Thanks Drew. That is correct. That would be one of the queries (give me all documents in which every term in a list is present).

But not only that: another query would allow users to search for words around a given word. Keying in Michael would return a list of the words that follow Michael in any document (e.g. Jordan, Jackson, etc.). The same is done for words before a given word. Is Cassandra not optimal for this?

As pointed out by Ian, I will look into Lucandra as well. Thanks.

-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4286704.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
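The "words around a given word" query can be sketched as a pair of neighbor indexes. This is a minimal in-memory illustration under stated assumptions, not a Cassandra schema: the function name, the whitespace tokenization, and the sample documents are all hypothetical.

```python
from collections import defaultdict

def build_neighbor_index(docs):
    # next_words[term][following_term] -> ids of docs where the pair occurs
    # prev_words[term][preceding_term] -> ids of docs where the pair occurs
    next_words = defaultdict(lambda: defaultdict(set))
    prev_words = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            next_words[left][right].add(doc_id)
            prev_words[right][left].add(doc_id)
    return next_words, prev_words

docs = {
    "d1": "Michael Jordan played basketball",
    "d2": "Michael Jackson sang",
    "d3": "I saw Michael Jordan",
}
next_words, prev_words = build_neighbor_index(docs)

print(sorted(next_words["michael"]))            # ['jackson', 'jordan']
print(sorted(next_words["michael"]["jordan"]))  # ['d1', 'd3']
print(sorted(prev_words["michael"]))            # ['saw']
```

Each outer key maps naturally onto a row and each neighbor onto a column, which is one way the two lookups could be laid out in a column store.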
Re: Data Model Index Text
Is there a particular version of Cassandra required for Lucandra to work? It's not able to resolve the Cassandra class, along with a few others. I have trunk Cassandra checked out, and Lucandra from the GitHub link provided below.

Ian Holsman-3 wrote:
Hi ML. This sounds more like a job for SOLR, but if you want to do this with Cassandra, you should look at Jake's Lucandra: http://github.com/tjake/Lucandra
You should also look at http://nicklothian.com/blog/2009/10/27/solr-cassandra-solandra/
I wouldn't recommend building your own IR engine; just use one of the ones out there.
regards
Ian

On Jan 9, 2010, at 9:12 AM, ML_Seda wrote:
Hey, I've been reading up on the Cassandra data model a bit, and would like to get some input from this forum on different techniques for a particular problem.

Assume I need to index millions of text docs (e.g. research papers), and allow the ability to query them by a given word inside or around any of the indexed docs. Meaning, if I search for terms, I would like to get a list of docs in which these terms show up (e.g. Michael Jordan = Michael is the main term, and Jordan is next term n1. The same can be applied by indicating previous terms to Michael).

How do I model this in Cassandra? Would my keys be a concat of the middle term + docid? Will I be able to do queries by wildcarding the docid? Thanks.

-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4275199.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

-- Ian Holsman i...@holsman.net

-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4288808.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
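The "concat of term + docid, then wildcard the docid" idea from the original question amounts to a prefix scan over sorted keys. A minimal sketch follows, assuming keys are kept in sorted order (as they would be under an order-preserving partitioner). The separator character and helper names are hypothetical.

```python
import bisect

SEP = "\x00"  # hypothetical delimiter that sorts below any printable character

def make_key(term: str, doc_id: str) -> str:
    # Composite key: term, delimiter, document id.
    return f"{term}{SEP}{doc_id}"

def prefix_scan(sorted_keys, term):
    # Return the doc ids of every key that starts with term + SEP.
    start_key = term + SEP
    end_key = term + chr(ord(SEP) + 1)  # smallest string past the prefix range
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, end_key)
    return [k.split(SEP, 1)[1] for k in sorted_keys[lo:hi]]

keys = sorted(make_key(t, d) for t, d in [
    ("michael", "doc1"), ("michael", "doc7"), ("michaelson", "doc2"),
    ("jordan", "doc1"), ("jordan", "doc3"),
])

print(prefix_scan(keys, "michael"))  # ['doc1', 'doc7']
print(prefix_scan(keys, "jordan"))   # ['doc1', 'doc3']
```

The delimiter keeps `michael` from matching `michaelson`, and the scan touches only the slice of keys for the requested term. This only works when keys are stored in key order, which is the reason the thread keeps returning to the partitioner choice.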
Re: Data Model Index Text
Lucandra currently uses the Cassandra 0.4 release series.
Re: Data Model Index Text
Thanks Jake. I don't see org.apache.cassandra.service.Cassandra in 0.4, which is imported in BookmarksDemo.java.
-- View this message in context: http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4289009.html Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
Re: Data Model Index Text
If I am reading this right, you basically want to query for a word and find all of the documents that contain it?

While there may be a better way to do this, the way the people at Facebook do it is with supercolumns. Inside the supercolumn column family they have columns for every word, such as Michael and Jordan, and within each of those columns they have keys that correspond to the ids of all of the documents.

I suppose if you do it this way you're forced to work out in memory which documents are contained in all of the sets, but if it's good enough for Facebook I suppose it can't be too bad. This video talks about it briefly: http://www.facebook.com/video/video.php?v=540974400803

Drew
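The layout Drew describes is an inverted index: one set of document ids per word, with multi-word queries answered by intersecting those sets in memory. A minimal sketch under stated assumptions follows. The function names, whitespace tokenization, and sample documents are hypothetical, and nothing here reflects Facebook's actual implementation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    # term -> set of ids of the documents containing that term
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def docs_with_all(index, terms):
    # Intersect the posting sets client-side, smallest set first.
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result

docs = {
    "d1": "Michael Jordan played basketball",
    "d2": "Michael Jackson sang",
    "d3": "Jordan river",
}
index = build_inverted_index(docs)

print(docs_with_all(index, ["Michael", "Jordan"]))  # {'d1'}
print(sorted(docs_with_all(index, ["Michael"])))    # ['d1', 'd2']
```

Starting from the smallest posting set keeps the intermediate result small, which matters when the sets for common words are large.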