Re: FieldCache
I think I'd try to use a bitset instead of a string for your categories.
Is that possible? Roughly how many categories do you have?

simon

On Sat, Oct 22, 2011 at 6:01 AM, Peyman Faratin wrote:
> Hi
>
> I have a field that is indexed as follows:
>
>     for (String c : article.getCategories()) {
>         doc.add(new Field("categories", c.toLowerCase(),
>                           Field.Store.YES, Field.Index.ANALYZED));
>     }
>
> I have a search space of 2 million docs and I need to access the category
> field of each hit doc. I would like to use FieldCache, but since I am
> indexing the field as a multi-valued field this is a problem.
>
> Is there a recommended solution to this problem?
>
> Thank you
>
> Peyman

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
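A minimal sketch of the bitset idea suggested above, using only the JDK. The class name CategoryBits and its methods are hypothetical and are not part of Lucene; the sketch assumes the set of categories is small enough that each one can be given its own bit position, and it says nothing about how the bits would be stored in the index.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): gives each category name a bit position. */
final class CategoryBits {
    private final Map<String, Integer> ordinals = new HashMap<String, Integer>();
    private final List<String> names = new ArrayList<String>();

    /** Encodes category names as a BitSet, registering unseen names on the fly. */
    BitSet encode(Iterable<String> categories) {
        BitSet bits = new BitSet();
        for (String c : categories) {
            String key = c.toLowerCase(Locale.ROOT);
            Integer ord = ordinals.get(key);
            if (ord == null) {
                // First time this category is seen: assign the next free bit.
                ord = names.size();
                ordinals.put(key, ord);
                names.add(key);
            }
            bits.set(ord);
        }
        return bits;
    }

    /** Decodes a BitSet back into category names, in bit-position order. */
    List<String> decode(BitSet bits) {
        List<String> out = new ArrayList<String>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.add(names.get(i));
        }
        return out;
    }
}
```

With one fixed-size value per document instead of several string values, the per-hit lookup becomes a single read followed by cheap bit tests. Whether that fits depends on the number of categories, which is what the question above is asking.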
Bet you didn't know Lucene can...
Hi All,

I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396). It's based on my observation that,
over the years, a number of us in the community have done some pretty cool
things using Lucene that don't fit under the core premise of full text
search. I've got a fair number of ideas for the talk (easily enough for 1
hour), but I wanted to reach out to hear your stories of ways you've (ab)used
Lucene and Solr to see if we couldn't extend the conversation to a bit more
than the conference and also see if I can't inject more ideas beyond the ones
I have. I don't need deep technical details, but just high level use case
and the basic insight that led you to believe Lucene could solve the problem.

Thanks in advance,
Grant

Grant Ingersoll
http://www.lucidimagination.com
Re: Bet you didn't know Lucene can...
Grant,

For years the ActiveMath learning environment has been using Lucene as its
storage engine. At the time (~2004), it was by far the best storage engine
available in a pure-Java world. It is still perfect in terms of performance.

We had an issue with certain versions (~1.x-2.0) in which stored fields were
not lazily loaded, so we still do not store the big fragments there.
However, for small fragments it is very, very efficient (~5000 queries a
second).

The objects stored are fragments of XML documents (the format is called
OMDoc; they are mostly hand-written).

Tell me if you need more details. I am sure the pure storage option is
something very common.

paul

On 22 Oct 2011, at 11:11, Grant Ingersoll wrote:
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
> (http://na11.apachecon.com/talks/18396).
> [...]
Re: No longer able to set merge factor since updating to Lucene 3.4
Hmm, this is because as of 3.2.0 the default MergePolicy is now
TieredMergePolicy.

But: if you pass Version.LUCENE_31 when you create the IndexWriterConfig you
should get the old default (LogMergePolicy) and then IW.setMergeFactor
should work.

But it's better to use TieredMergePolicy (it's able to pick better merges),
and instead set the merge settings directly on that class. That class
actually "splits" mergeFactor into two separate controls:

  - maxMergeAtOnce: how many segments to merge at a time
  - segmentsPerTier: how "aggressively" you need to merge; bigger numbers
    mean merging is delayed, but your index has more segments

Mike McCandless
http://blog.mikemccandless.com

On Fri, Oct 21, 2011 at 12:55 PM, Paul Taylor wrote:
> Hi, I upgraded from 3.1 to 3.4, and now it is complaining about a
> deprecated method:
>
> indexWriter.setMergeFactor();
>
> It says it can only be used with the default LogMergePolicy, but I never
> set the merge policy, so shouldn't I be using the default anyway?
>
> Paul
Re: Bet you didn't know Lucene can...
Hi Grant,

Not sure if this qualifies as a "bet you didn't know", but one could use
Lucene term vectors to construct document vectors for similarity,
clustering and classification tasks. I found this out recently (although
I am probably not the first one), and I think this could be quite
useful.

-sujit

On Sat, 2011-10-22 at 11:11 +0200, Grant Ingersoll wrote:
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
> (http://na11.apachecon.com/talks/18396).
> [...]
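To make the document-vector idea concrete, here is a rough JDK-only sketch of the similarity step. It assumes the term frequencies of each document have already been read out of the index into a plain Map from term to count; that extraction step is not shown, and the class name TermVectorSimilarity is hypothetical. Cosine similarity is one common choice for comparing such vectors, not something the message above prescribes.

```java
import java.util.Map;

/** Hypothetical helper: cosine similarity between two term-frequency vectors. */
final class TermVectorSimilarity {

    /** Returns a value in [0, 1]; 0.0 if either vector is empty or all zeros. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                // Only terms present in both documents contribute to the dot product.
                dot += e.getValue().doubleValue() * other.doubleValue();
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Euclidean length of the vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

The same pairwise score could feed a clustering or nearest-neighbour classifier; raw counts are used here only to keep the sketch short.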
using lucene to find neighbouring points in an n-dimensional space
My use case is the following: given an n-dimensional vector (only +ve
quadrants/points), find its closest neighbours. I would like to try this out
with Lucene's default ranking. Here is what a typical document looks like
(or same thing):

    doc1 = 1245:15 3490:20 8856:20 etc.

As reflected in the above example, the number of dimensions is high (~50K)
and the length of the vectors is small (< 40). I am thinking of constructing
a BooleanQuery in the following way (for doc1 as the query):

    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
    bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
    bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);

The problem is how to pass the dimension value (15, 20, 20 etc.) in the
TermQuery. One solution is to add as many TermQueries as the dimension
value, but I was wondering whether there is a better way to pass the
dimension weight. I can probably do the same during indexing, as latency is
not an issue at indexing time.

Any help is greatly appreciated.

-Thanks,
Prasenjit
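For reference, the weighted comparison being asked for can be written down outside Lucene in a few lines. The sketch below is JDK-only and hypothetical (the class SparseVectors is not a Lucene API): it parses the "dimension:weight" notation used in the message and scores a document against a query with a plain dot product. This is an assumption made for illustration and is not Lucene's default ranking, which also applies its own term and length normalisation. Within Lucene 3.x itself, a per-clause weight is normally expressed with Query.setBoost(float) on each TermQuery rather than by repeating clauses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: sparse vectors in "dim:weight dim:weight ..." notation. */
final class SparseVectors {

    /** Parses e.g. "1245:15 3490:20 8856:20" into a dimension-to-weight map. */
    static Map<Integer, Double> parse(String text) {
        Map<Integer, Double> v = new LinkedHashMap<Integer, Double>();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int colon = tok.indexOf(':');
            v.put(Integer.parseInt(tok.substring(0, colon)),
                  Double.parseDouble(tok.substring(colon + 1)));
        }
        return v;
    }

    /** Dot product over the dimensions the two vectors share. */
    static double dot(Map<Integer, Double> query, Map<Integer, Double> doc) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : query.entrySet()) {
            Double w = doc.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }
}
```

Because each vector has fewer than 40 entries, iterating over the query's entries and probing the document map keeps the cost per comparison small regardless of the ~50K total dimensions.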
Re: Bet you didn't know Lucene can...
Hi Grant,

These are two cases from work I've done that I can think of:

- Use Lucene to match products in a database with eBay auctions; the title
  of the auction is used as the query to Lucene.
- Use a servlet filter and Lucene to map well-formed URLs into a website to
  its individual (product) pages. A deeper URL results in a Lucene
  BooleanQuery with more clauses.

Hope this is enough (ab)use...

Wouter

> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
> (http://na11.apachecon.com/talks/18396).
> [...]
Re: Language Identifier with Lucene?
On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote:
> I usually use Nutch for this but, just for fun, I tried to create a language
> identifier based on Lucene only.

Talking of which: Google's Compact Language Detector

http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compact.html
Re: Bet you didn't know Lucene can...
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:
> Hi Grant,
>
> Not sure if this qualifies as a "bet you didn't know", but one could use
> Lucene term vectors to construct document vectors for similarity,
> clustering and classification tasks. I found this out recently (although
> I am probably not the first one), and I think this could be quite
> useful.

Yep, had these on my list!
Re: Bet you didn't know Lucene can...
Using Lucene as a recommendation engine.

On Sat, Oct 22, 2011 at 6:33 PM, Grant Ingersoll wrote:
>
> On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:
>
>> Hi Grant,
>>
>> Not sure if this qualifies as a "bet you didn't know", but one could use
>> Lucene term vectors to construct document vectors for similarity,
>> clustering and classification tasks. I found this out recently (although
>> I am probably not the first one), and I think this could be quite
>> useful.
>
> Yep, had these on my list!