Re: FieldCache

2011-10-22 Thread Simon Willnauer
I think I'd try to use a bitset instead of a string for your
categories. Is that possible? Roughly how many categories do you have?
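
To make the bitset idea concrete, here is a minimal sketch in plain Java. It assumes the full set of categories is known up front and has at most 64 members; `CategoryBits`, `encode` and `decode` are hypothetical names, not Lucene API. Each category gets a bit position, so all of a document's categories collapse into a single `long` per document instead of several string values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps category names to bit positions (0..63).
final class CategoryBits {
    private final Map<String, Integer> ordinals = new LinkedHashMap<String, Integer>();

    CategoryBits(List<String> allCategories) {
        for (String c : allCategories) {
            String key = c.toLowerCase();
            if (!ordinals.containsKey(key)) {
                ordinals.put(key, ordinals.size());
            }
        }
        if (ordinals.size() > 64) {
            throw new IllegalArgumentException("a long holds at most 64 categories");
        }
    }

    // Pack one document's categories into a single long.
    long encode(Iterable<String> categories) {
        long bits = 0L;
        for (String c : categories) {
            Integer ord = ordinals.get(c.toLowerCase());
            if (ord == null) {
                throw new IllegalArgumentException("unknown category: " + c);
            }
            bits |= 1L << ord;
        }
        return bits;
    }

    // Recover the category names from a packed value, in ordinal order.
    List<String> decode(long bits) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : ordinals.entrySet()) {
            if ((bits & (1L << e.getValue())) != 0) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

With more than 64 categories the same scheme would need a `java.util.BitSet` or several longs, which is why the category count matters.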

simon

On Sat, Oct 22, 2011 at 6:01 AM, Peyman Faratin wrote:
> Hi
>
> I have a field that is indexed as follows
>
> for(String c: article.getCategories()){
>        doc.add(new Field("categories", c.toLowerCase(),
>        Field.Store.YES, Field.Index.ANALYZED));
> }
>
> I have a search space of 2 million docs and I need to access the category
> field of each hit doc. I would like to use FieldCache, but since I am indexing
> the field as a multi-valued field this is a problem.
>
> Is there a recommended solution to this problem?
>
> thank you
>
> Peyman

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
Hi All,

I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396).  It's based on my observation that,
over the years, a number of us in the community have done some pretty cool
things using Lucene that don't fit under the core premise of full-text search.
I've got a fair number of ideas for the talk (easily enough for 1 hour), but I
wanted to reach out to hear your stories of ways you've (ab)used Lucene and
Solr, to see if we can extend the conversation a bit beyond the
conference and to see if I can add more ideas to the ones I have.  I
don't need deep technical details, just the high-level use case and the basic
insight that led you to believe Lucene could solve the problem.

Thanks in advance,
Grant


Grant Ingersoll
http://www.lucidimagination.com




Re: Bet you didn't know Lucene can...

2011-10-22 Thread Paul Libbrecht
Grant,

for years the ActiveMath learning environment has been using Lucene as its
storage engine.
At the time (~2004), it was by far the best storage engine available in a
pure-Java world.
It still performs perfectly well today.
We had an issue with some versions (~1.x-2.0) in which stored fields were not
lazily loaded, so we do not yet store the big fragments there. However, for
small fragments it is very efficient (~5000 queries a second).

The objects stored are fragments of XML documents (the format is called OMDoc, 
they're mostly hand-written).

Tell me if you need more details, I am sure the pure storage option is 
something very common.

paul


On 22 Oct 2011, at 11:11, Grant Ingersoll wrote:

> Hi All,
> 
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
> (http://na11.apachecon.com/talks/18396).  It's based on my observation, that 
> over the years, a number of us in the community have done some pretty cool 
> things using Lucene that don't fit under the core premise of full text 
> search.  I've got a fair number of ideas for the talk (easily enough for 1 
> hour), but I wanted to reach out to hear your stories of ways you've (ab)used 
> Lucene and Solr to see if we couldn't extend the conversation to a bit more 
> than the conference and also see if I can't inject more ideas beyond the ones 
> I have.  I don't need deep technical details, but just high level use case 
> and the basic insight that led you to believe Lucene could solve the problem.
> 
> Thanks in advance,
> Grant
> 
> 
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 





Re: No longer able to set merge factor since updating to Lucene 3.4

2011-10-22 Thread Michael McCandless
Hmm, this is because as of 3.2.0 the default MergePolicy is now
TieredMergePolicy.

But: if you pass Version.LUCENE_31 when you create the
IndexWriterConfig you should get the old default (LogMergePolicy) and
then IW.setMergeFactor should work.

But it's better to use TieredMergePolicy (it's able to pick better
merges), and instead set the merge settings directly on that class.
That class actually "splits" mergeFactor into two separate controls:
maxMergeAtOnce (how many segments to merge at a time) and
segmentsPerTier (how "aggressively" you need to merge -- bigger
numbers mean merging is delayed but your index has more segments).
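
As a rough illustration of the advice above, the sketch below sets the two controls on a TieredMergePolicy and hands it to the IndexWriterConfig. It assumes Lucene 3.4 on the classpath; the values 10 and 10.0, the in-memory directory and the class name `MergeSettingsSketch` are placeholders chosen for the example, not recommendations.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MergeSettingsSketch {
    public static void main(String[] args) throws Exception {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        // How many segments to merge at a time (illustrative value).
        mergePolicy.setMaxMergeAtOnce(10);
        // Bigger values delay merging and leave more segments in the index.
        mergePolicy.setSegmentsPerTier(10.0);

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        config.setMergePolicy(mergePolicy);

        // Placeholder directory; a real application would open its own index.
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, config);
        try {
            // Add documents here.
        } finally {
            writer.close();
        }
    }
}
```

This replaces the deprecated IW.setMergeFactor call rather than falling back to Version.LUCENE_31.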

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 21, 2011 at 12:55 PM, Paul Taylor wrote:
> Hi, I upgraded from 3.1 to 3.4, and now it is complaining about a deprecated
> method
>
> indexWriter.setMergeFactor();
>
> saying it can only be used with the default LogMergePolicy, but I never set
> the merge policy, so shouldn't I be using the default anyway?
>
> Paul



Re: Bet you didn't know Lucene can...

2011-10-22 Thread Sujit Pal
Hi Grant,

Not sure if this qualifies as a "bet you didn't know", but one could use
Lucene term vectors to construct document vectors for similarity,
clustering and classification tasks. I found this out recently (although
I am probably not the first one), and I think this could be quite
useful.
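
A minimal sketch of the idea, in plain Java so that it runs without an index: each document is reduced to a term-to-frequency map (the kind of data a stored term vector provides), and two documents are compared by cosine similarity. The class and method names are hypothetical, and reading the frequencies out of Lucene is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

final class DocVectors {
    // Build a term -> frequency map from whitespace-separated text.
    // Stands in for the terms and frequencies of a Lucene term vector.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.length() == 0) {
                continue;
            }
            Integer old = tf.get(token);
            tf.put(token, old == null ? 1 : old + 1);
        }
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

The same pairwise score can feed a clustering or nearest-neighbour classification step; raw frequencies are used here only to keep the sketch short.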

-sujit

On Sat, 2011-10-22 at 11:11 +0200, Grant Ingersoll wrote:
> Hi All,
> 
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
> (http://na11.apachecon.com/talks/18396).  It's based on my observation, that 
> over the years, a number of us in the community have done some pretty cool 
> things using Lucene that don't fit under the core premise of full text 
> search.  I've got a fair number of ideas for the talk (easily enough for 1 
> hour), but I wanted to reach out to hear your stories of ways you've (ab)used 
> Lucene and Solr to see if we couldn't extend the conversation to a bit more 
> than the conference and also see if I can't inject more ideas beyond the ones 
> I have.  I don't need deep technical details, but just high level use case 
> and the basic insight that led you to believe Lucene could solve the problem.
> 
> Thanks in advance,
> Grant
> 
> 
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 





using lucene to find neighbouring points in an n-dimensional space

2011-10-22 Thread prasenjit mukherjee
My use case is the following:
Given an n-dimensional vector (positive quadrant only), find its
closest neighbours. I would like to try this out with Lucene's default
ranking. Here is what a typical document will look like, as
dimension:value pairs:

doc1 = 1245:15 3490:20 8856:20 etc.

As the above example shows, the number of dimensions is high (~50K)
and the vectors are short (< 40 entries each).

I am thinking of constructing a BooleanQuery in the following way
(using doc1 as the query):

BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);

The problem is how to pass the dimension value (15, 20, 20 etc.)
in the TermQuery.

One solution is to add as many TermQueries as the dimension value,
but I was wondering whether there is a better way to pass the
dimension weight. I can probably do the same during indexing, as
latency is not an issue at indexing time.
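
The repetition idea described above can be sketched in plain Java: parse the dimension:value pairs and repeat each dimension id as many times as its value, so that the term frequency of the resulting text carries the weight. The names `SparseVectorText`, `parse` and `expand` are hypothetical, and whether Lucene's default ranking then approximates the intended distance is not something this sketch settles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SparseVectorText {
    // Parse "1245:15 3490:20" into an ordered dimension -> value map.
    static Map<String, Integer> parse(String vector) {
        Map<String, Integer> dims = new LinkedHashMap<String, Integer>();
        for (String pair : vector.trim().split("\\s+")) {
            if (pair.length() == 0) {
                continue;
            }
            String[] parts = pair.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected dimension:value, got " + pair);
            }
            dims.put(parts[0], Integer.parseInt(parts[1]));
        }
        return dims;
    }

    // Repeat each dimension id 'value' times, separated by single spaces.
    static String expand(Map<String, Integer> dims) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : dims.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }
}
```

For example, `expand(parse("1245:2 3490:3"))` yields the text `1245 1245 3490 3490 3490`, which could be indexed into the same field the TermQueries search, assuming an analyzer that splits on whitespace.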

Any help is greatly appreciated.

-Thanks,
Prasenjit




Re: Bet you didn't know Lucene can...

2011-10-22 Thread Wouter Heijke
Hi Grant,

Here are two cases from work I've done that I can think of:

-use Lucene to match products in a database with eBay auctions; the title
of the auction is used as the query to Lucene.

-use a servlet filter and Lucene to map well-formed URLs for a website
to its individual (product) pages. A deeper URL results in a Lucene
BooleanQuery with more clauses.

Hope this is enough (ab)use...

Wouter


> Hi All,
>
> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
> (http://na11.apachecon.com/talks/18396).  It's based on my observation,
> that over the years, a number of us in the community have done some pretty
> cool things using Lucene that don't fit under the core premise of full
> text search.  I've got a fair number of ideas for the talk (easily enough
> for 1 hour), but I wanted to reach out to hear your stories of ways you've
> (ab)used Lucene and Solr to see if we couldn't extend the conversation to
> a bit more than the conference and also see if I can't inject more ideas
> beyond the ones I have.  I don't need deep technical details, but just
> high level use case and the basic insight that led you to believe Lucene
> could solve the problem.
>
> Thanks in advance,
> Grant
>
> 
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
>






Re: Language Identifier with Lucene?

2011-10-22 Thread Petite Abeille

On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote:

> I usually use Nutch for this but, just for fun, I tried to create a language
> identifier based on Lucene only.

Talking of which:

Google's Compact Language Detector
http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compact.html





Re: Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll

On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:

> Hi Grant,
> 
> Not sure if this qualifies as a "bet you didn't know", but one could use
> Lucene term vectors to construct document vectors for similarity,
> clustering and classification tasks. I found this out recently (although
> I am probably not the first one), and I think this could be quite
> useful.

Yep, had these on my list!




Re: Bet you didn't know Lucene can...

2011-10-22 Thread Shashi Kant
Using Lucene as a recommendation engine.

On Sat, Oct 22, 2011 at 6:33 PM, Grant Ingersoll wrote:
>
> On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:
>
>> Hi Grant,
>>
>> Not sure if this qualifies as a "bet you didn't know", but one could use
>> Lucene term vectors to construct document vectors for similarity,
>> clustering and classification tasks. I found this out recently (although
>> I am probably not the first one), and I think this could be quite
>> useful.
>
> Yep, had these on my list!