arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
Hi,

I am trying to compile some arguments in favour of Lucene, as
management is deciding whether to standardize on Lucene or a competing
commercial product (we have a couple of products, one using Lucene,
another using a commercial product; guess which one I am using). I searched
the lists but could not find any such post, although I remember seeing
them in the past.

Has anybody kept links to such posts? Or does someone
know of a page that would help me?

I would like to show:
- traction of Lucene, which has really improved a lot in the last couple of years
- rich ecosystem (Solr...)
- references of other companies choosing Lucene/Solr over commercial
products (be it Fast or whatever)

thanks

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Overriding Lucene's term weights computation

2010-06-23 Thread Naama Kraus
Hi,

Is there a way for an application to index a document along with its "term
weighted vector" (Lucene's TermFreqVector)? That is, can it override the term
frequencies computed by Lucene with the application's own computed term weights
(not frequency-based)?
I don't think I want to use Scorer#score() for applying score changes, as
it is invoked at search time, which won't work for me.

Thanks for any insight,
Naama


Re: Stop words filter

2010-06-23 Thread Erick Erickson
On the chance that this is an XY problem
(http://people.apache.org/~hossman/#xyproblem),
why can't you use StopFilter and PorterStemFilter in
your filter chain rather than try to do this yourself?

Best
Erick

On Tue, Jun 22, 2010 at 10:49 PM, Vinicius Carvalho <
viniciusccarva...@gmail.com> wrote:

> Hello there! I've been using Lucene as a full-text search solution for some
> time. Although I'm familiar with Analyzers and Stemmers, I have never used
> them directly.
>
> I'm running a few experiments on sentiment analysis, and our implementation
> needs to perform stemming and stop word removal. I thought of using Lucene's
> built-in support to spare me some coding time.
>
> Is there any example? I'm trying
>
> TokenStream stream = analyzer.tokenStream("", new StringReader(inputStr));
>
> The problem is that I could not find a way to get the resulting tokens. I was
> expecting something like stream.getTokens:Token[] :P
>
> Could someone point me in the right direction?
>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.
>


Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Erick Erickson
One thing to consider is that you have access to the source,
so worst-case you won't be cut off at the knees by the commercial
vendor.

Case in point: Fast was acquired by Microsoft, who have since
dropped all future Unix development. Hope all Fast users
really like running their apps on Windows servers.

Here's a start for companies using Lucene:
http://wiki.apache.org/lucene-java/PoweredBy

HTH
Erick



URL Tokenization

2010-06-23 Thread Sudha Verma
Hi,

I am new to Lucene and I am using Lucene 3.0.2.

I am using Lucene to parse text which may contain URLs. I noticed that the
StandardTokenizer keeps email addresses in one token, but not URLs.
I also looked at the Solr wiki pages, and even though the wiki page for
solr.StandardTokenizerFactory says it keeps track of the URL token type, that
does not seem to be the case.

Is there an Analyzer implementation that can keep URLs intact as one
token? Or does anyone have an example of that for Solr or Lucene?

Thanks much,
Sudha


RE: URL Tokenization

2010-06-23 Thread Steven A Rowe
Hi Sudha,

There is such a tokenizer, named NewStandardTokenizer, in the most recent patch 
on the following JIRA issue: 

   https://issues.apache.org/jira/browse/LUCENE-2167

It keeps (HTTP(S), FTP, and FILE) URLs together as single tokens, and e-mails 
too, in accordance with the relevant IETF RFCs.

Steve
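
As a rough illustration of what keeping a URL together as a single token means, here is a minimal plain-Java sketch. It is not the NewStandardTokenizer from LUCENE-2167 and makes no attempt to follow the IETF RFCs; the class name UrlAwareTokenizer and the regular expression are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class UrlAwareTokenizer {

    // The URL alternative is listed first so that it wins over the plain
    // word alternative whenever a token starts with a known scheme.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:https?|ftp|file)://[^\\s<>\"]+|[\\p{L}\\p{N}]+");

    // Returns the tokens of the text in order; URLs stay in one piece.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("See http://lucene.apache.org/java/docs/ for docs")) {
            System.out.println(t);
        }
    }
}
```

In this sketch any punctuation that directly follows a URL (a trailing full stop, for example) stays attached to it, which is one of the details a grammar-based tokenizer would be expected to handle properly.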



RE: Overriding Lucene's term weights computation

2010-06-23 Thread Yuval Feinstein
Naama, Maybe you could use the new flexible indexing mechanism.
Some information is in this lecture:
http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
Alternatively, you may use payloads, but they seem like a worse fit.
Good Luck,
Yuval
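
On the payload alternative: a payload is an opaque byte array stored alongside a term position, so an application-computed weight would have to be serialized to bytes at index time and decoded again when it is read. A minimal sketch of that round trip in plain Java follows. The class name WeightPayload is hypothetical, and attaching the bytes to tokens in an actual Lucene filter is deliberately left out.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene.
final class WeightPayload {

    // Serializes one float weight into 4 bytes (big-endian, ByteBuffer's default).
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Reads the weight back from the first 4 bytes of the array.
    static float decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.75f);
        System.out.println(payload.length + " bytes, weight " + decode(payload));
    }
}
```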


Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Lucene/Solr choice typically means:

* lower cost of ownership (think about various crazy licensing models some of 
the commercial search vendors have: per doc, per server, per query, per 
year)

* faster implementation (just think about the duration of the sales/negotiation 
phase for commercial search vendors)

* flexibility -- it's open source, you can change whatever you want.  Try that 
with closed-source commercial search vendor's package.

* super fast and knowledgeable community  -- see 
http://www.jroller.com/otis/entry/lucene_solr_nutch_amazing_tech

* commercial support and experts still available -- see 
http://www.sematext.com/services/index.html

* adoption - small companies, medium companies, HUGE companies, secret 
organizations, everyone's using some form of Lucene -- see 
http://wiki.apache.org/lucene-java/PoweredBy , 
http://wiki.apache.org/solr/PublicServers

* maturity - Lucene is over 10 years old.  Solr is over 4 years old.

* future - look at JIRA, look at mailing list traffic, look at pace of 
development, look at CHANGES.txt

* searchable documentation and mailing list archives  -- 
http://search-lucene.com/


* ...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
Thanks guys, those links are cool. I welcome any other positive points
anyone can add, especially references of products/sites moving to
Lucene/Solr.

javier




Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Hans Merkl
Just curious. What commercial alternatives are out there?



Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Off the top of my head:

FAST
Endeca
Coveo
Attivio
Vivisimo
Google Search Appliance
(tell me when to stop)
Dieselpoint
IBM OmniFind
Exalead
Autonomy
dtSearch
ISYS
Oracle
...
...

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: arguments in favour of lucene over commercial competition

2010-06-23 Thread Itamar Syn-Hershko
Otis, I'm 99% sure Attivio is just a wrapper around Lucene...

And I personally wouldn't count full text search solutions such as Oracle's.

Itamar.




Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread jm
Yes, in my case the competition is one of those on the list...




Problems with homebrew ParallelWriter

2010-06-23 Thread Justin
Hi all,

We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own
ParallelWriter class in the meantime.  Apparently our indexes are falling out
of sync (I suspect my colleague is seeing error messages from
ParallelReader stating that the number of documents must be the same).

Here's a code snippet from our ParallelWriter which extends Object:

    writer1 = new IndexWriter(dir, analyzer, create,
                              new IndexWriter.MaxFieldLength(MFL));
    writer1.setMergePolicy(new LogDocMergePolicy());
    writer1.setMergeScheduler(new SerialMergeScheduler());
    writer1.setMaxBufferedDocs(MBD);
    writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);

My colleague suspects that merging or flushing is being triggered on something 
other than the doc count which leads to the writers' different behaviors.  I 
suspect our next step is to scatter breakpoints around Lucene source (we've got 
tr...@926791 to take advantage of latest NRT readers).

Does anyone have ideas on how the indexes would get out of sync?  Process 
close, committing, optimizing,... they all should work okay?

Thanks,
Justin


  




Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Erick Erickson
Otis's comments reminded me of one of the astonishing things
I've seen in the Lucene/SOLR ecosystem; I've seen issues
reported, commented on, fixed, and patches made available
*for free* in a matter of hours.

Of course, you have to be willing to use a patched version, but
it sure beats waiting six months for the commercial vendor
to get around to including it in the next release. If they think it's
important enough. Unless you bribe them.

And if you really feel the need to bribe someone, there are
Lucene/SOLR support companies that you
can hire for a very reasonable fee if you don't want to add a
feature yourself. When you need them rather than up-front.

Do note one thing. Open Source software is, IMO,
hit-or-miss. Not all open source projects are created equal, and
just because something's "open source" does not mean it's of
great quality. I happen to think that Lucene/SOLR is in the very
top percentile of quality and active development FWIW. Subscribe
to the dev list to see just how active development is.

Of course the quality of commercial products also varies widely,
but you can't see the source (it's proprietary, dont'cha know) to
judge for yourself.

Have a look at the continuous build. Look at the code coverage
of the unit tests. Ask vendor X to provide equivalent data. Be
ready for BS as a response. Don't accept it.

In my initial foray into Lucene several years ago, by the time
I'd sent a support request to the vendor of a commercial product
and received an answer telling me that I hadn't included the
correct license info and I'd have to provide it before they could
talk to me, I'd found Lucene, downloaded it, indexed
some of our data and run searches against it. Not to mention
that rather than waiting for days to get a response from the
commercial vendor, my questions on the Lucene user's list were
answered within a very few hours. With grace and tolerance
for my ignorance. *For free*.

Hmm, can you tell I'm somewhat of an enthusiast? ...
Disclosure: No, I don't work for any company that offers
support for SOLR/Lucene, no matter how much it may
sound like it 

Best
Erick



RE: Help with Numeric Range

2010-06-23 Thread Todd Nine
Hi Uwe,

  Thank you for your help; it is greatly appreciated.  Unfortunately, all of
my tests fail except for RangeInclusive.  I've changed the precision step to 6
as per your recommendation.  I had it at max to eliminate the precision step
as the cause of the test failure.  Essentially, all keys in Cassandra
are UTF-8 keys.  In Lucandra, the keys are constructed in the
following way:

1. Get the token stream for the field.  In this case it's a
NumericTokenStream with (numeric,valSize=64,precisionStep=6).
2. For all tokens in the stream, create a UTF-8 string in the following
format: \u
3. Set the term frequency to 1.

This gives us a list of tokens, prefixed with the field name and the
delimiter.  Then we do this:

For each term from above, create a key of the format
\u\u and write it to the TermInfo
column family.
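
For reference, the number of terms a single numeric value is split into can be worked out from valSize and precisionStep alone. The sketch below assumes one term is emitted for each shift 0, precisionStep, 2*precisionStep, ... below valSize, which agrees with the "64 bits / 4 precStep = 16 terms per value" figure in the quoted reply further down. The class and method names are made up for this example, and it does not reproduce Lucene's actual term encoding.

```java
// Hypothetical helper for illustration only; not part of Lucene or Lucandra.
final class PrecisionStepTerms {

    // Counts the terms per value under the assumption stated above:
    // one term for every shift that is still smaller than valSize.
    static int termsPerValue(int valSize, int precisionStep) {
        if (valSize <= 0 || precisionStep < 1) {
            throw new IllegalArgumentException("valSize and precisionStep must be positive");
        }
        int count = 0;
        // A long counter avoids overflow when precisionStep is Integer.MAX_VALUE.
        for (long shift = 0; shift < valSize; shift += precisionStep) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("step 4:   " + termsPerValue(64, 4));
        System.out.println("step 6:   " + termsPerValue(64, 6));
        System.out.println("step max: " + termsPerValue(64, Integer.MAX_VALUE));
    }
}
```

Under that assumption, valSize=64 with precisionStep=6 gives 11 terms per value (shifts 0 through 60), precisionStep=4 gives 16, and Integer.MAX_VALUE gives a single full-precision term.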

After debugging the implementation of the LucandraTermEnum, it is
correctly returning values that should match my numeric range query.
However, I never get the results in the TopDocs result set after they're
handed back to the numeric range query object.  Any ideas why this is
happening?

Thanks,
Todd




On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote:

> Hi Todd,
> 
> I am not sure if I understand your problem correctly. I am not familiar with
> Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and
> IndexReader according to the documentation, numeric queries should work. A
> NumericField internally creates a TokenStream and "analyzes" the number into
> several tokens, which are somewhat "half binary" (they are terms consisting of
> characters in the full 0..127 range for optimal UTF-8 compression with 3.x
> versions of Lucene). The exact encoding can be looked up in the NumericUtils
> class and its javadocs.
> 
> About your test case: the test looks good, so does it fail? If yes, where is
> the problem? You can also look into Lucene's test TestNumericRangeQuery64 for
> more examples, or modify its @BeforeClass to build a Lucandra index instead.
> 
> The test does one thing that is not intended to be done like that:
> numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);
> 
> You are using MAX_VALUE as the precision step, which would slow down all
> queries to the speed of old-style TermRangeQueries. It is always better to
> stick with the default of 4, which creates 64 bits / 4 precStep = 16 terms
> per value. Alternatively, for longs, 6 is a good precision step (see the
> NumericRangeQuery documentation). MAX_VALUE is only intended for fields that
> are not used in numeric ranges but, e.g., only for sorting. precisionStep is
> a performance tuning parameter; it has nothing to do with better/worse
> precision on terms or different query results. If you are using
> NumericRangeQuery with such a large precStep, you are not using the numeric
> features at all, so your test should not behave differently from a
> conventional TermRangeQuery with padded terms.
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -Original Message-
> > From: Todd Nine [mailto:t...@spidertracks.co.nz]
> > Sent: Wednesday, June 23, 2010 7:53 AM
> > To: java-user@lucene.apache.org
> > Subject: Help with Numeric Range
> > 
> > Hi all,
> >   I'm new to Lucene, as well as Cassandra.  I'm working on the Lucandra
> > project to modify it to add some extra functionality.  It hasn't been fully
> > tested with range queries, so I've created some tests and contributed them.
> > You can view my source here:
> > 
> > http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
> > 
> > First, is this a sensible test?  I'm specifically testing the case of longs
> > where I need millisecond precision on my searches.
> > 
> > Second, I see that numeric fields are built via terms.  I think the issue
> > lies in the encoding of these terms into bytes for the Cassandra keys.  Can
> > anyone point me to some documentation on numeric queries and terms, and how
> > they are encoded at the byte level based on the precision?
> > 
> > Thanks,
> > Todd
> 


Re: Problems with homebrew ParallelWriter

2010-06-23 Thread Shai Erera
How do you add documents to the index? Is it synchronized (such that
basically only one thread can add documents at a time)?
The same goes for removing documents as well.

Also, did you encounter any exceptions during the run? If, say, an addDoc
fails on one of the slices, then you need to revert that addDoc in all
previous slices ...

I remember running into such an exception when working on the Parallel Index
stuff, but I don't remember what caused it ...

About merging: note that if you use LogDocMP, then you can guarantee that
all slices will be in sync, but some merges could still happen on some
slices at a time you did not intend. For example, during a flush of
one addDoc on one of the slices, before the other slices' addDoc has finished.
But if you didn't see any exceptions and didn't terminate the process
mid-action, then this should not happen ...
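
The two points about adding documents (one thread at a time, and reverting an add on the slices that already succeeded when a later slice fails) can be sketched without any Lucene dependency roughly as follows. Slice, ListSlice and ParallelAdder are hypothetical names invented for this illustration; they are not the LUCENE-1879 API, and a real slice would wrap an IndexWriter rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one index slice; not a Lucene interface.
interface Slice {
    void add(String doc) throws Exception;
    void removeLast();
    int size();
}

// In-memory slice that can simulate a failure on one chosen document.
final class ListSlice implements Slice {
    private final List<String> docs = new ArrayList<>();
    private final String failOn; // document that triggers a failure, or null

    ListSlice(String failOn) {
        this.failOn = failOn;
    }

    public void add(String doc) throws Exception {
        if (doc.equals(failOn)) {
            throw new Exception("simulated failure on " + doc);
        }
        docs.add(doc);
    }

    public void removeLast() {
        docs.remove(docs.size() - 1);
    }

    public int size() {
        return docs.size();
    }
}

final class ParallelAdder {
    private final List<Slice> slices;

    ParallelAdder(List<Slice> slices) {
        this.slices = slices;
    }

    // synchronized: only one thread adds at a time, so every slice sees
    // documents in the same order. If any slice fails, the add is undone
    // on the slices that had already accepted the document.
    synchronized boolean addToAll(String doc) {
        int done = 0;
        try {
            for (Slice s : slices) {
                s.add(doc);
                done++;
            }
            return true;
        } catch (Exception e) {
            for (int i = 0; i < done; i++) {
                slices.get(i).removeLast();
            }
            return false;
        }
    }
}
```

With one normal slice and one that fails on a given document, a failed addToAll leaves both slices at the same document count, which is the invariant ParallelReader checks.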

I hope this helps. Unfortunately I had to shift focus from LUCENE-1879.
Perhaps I'll get back to it one day. But if you advanced on PI somehow,
perhaps you can diff the patch that's there and your code, and if you've
made progress, upload another patch?

Shai

On Thu, Jun 24, 2010 at 1:44 AM, Justin  wrote:

> Hi all,
>
> We've been waiting for LUCENE-1879 and LUCENE-2425 and have written our own
> ParallelWriter class in the meantime.  Apparently our indexes are falling
> out of sync (I suspect my colleague is seeing error messages come from
> ParallelReader stating that the number of documents must be the same).
>
> Here's a code snippet from our ParallelWriter which extends Object:
>
>     writer1 = new IndexWriter(dir, analyzer, create,
>                               new IndexWriter.MaxFieldLength(MFL));
>     writer1.setMergePolicy(new LogDocMergePolicy());
>     writer1.setMergeScheduler(new SerialMergeScheduler());
>     writer1.setMaxBufferedDocs(MBD);
>     writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
>
> My colleague suspects that merging or flushing is being triggered on
> something other than the doc count which leads to the writers' different
> behaviors.  I suspect our next step is to scatter breakpoints around Lucene
> source (we've got tr...@926791 to take advantage of latest NRT readers).
>
> Does anyone have ideas on how the indexes would get out of sync?  Process
> close, committing, optimizing,... they all should work okay?
>
> Thanks,
> Justin


Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
I won't comment on Attivio, as I think I might have signed some NDA with them.  
But they do claim to combine full-text search with DB-like joins.  Can't 
MarkLogic do that, too?


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Itamar Syn-Hershko 
> To: java-user@lucene.apache.org
> Sent: Wed, June 23, 2010 5:54:34 PM
> Subject: RE: arguments in favour of lucene over commercial competition
> 
> Otis, I'm 99% sure Attivio is just a wrapper around Lucene...
>
> And I personally wouldn't count full text search solutions such as
> Oracle's.
>
> Itamar.

> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Thursday, June 24, 2010 12:42 AM
> To: java-user@lucene.apache.org
> Subject: Re: arguments in favour of lucene over commercial competition
>
> Off the top of my head:
>
> FAST
> Endeca
> Coveo
> Attivio
> Vivisimo
> Google Search Appliance
> (tell me when to stop)
> Dieselpoint
> IBM OmniFind
> Exalead
> Autonomy
> dtSearch
> ISYS
> Oracle
> ...
> ...
>
>  Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> 
> - Original Message 
> > From: Hans Merkl <hme...@rightonpoint.us>
> > To: java-user <java-user@lucene.apache.org>
> > Sent: Wed, June 23, 2010 5:15:46 PM
> > Subject: Re: arguments in favour of lucene over commercial competition
> >
> > Just curious. What commercial alternatives are out there?
> >
> > On Wed, Jun 23, 2010 at 04:01, jm <jmugur...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying to compile some arguments in favour of lucene as
> > > management is deciding weather to standardize on lucene or a competing
> > > commercial product (we have a couple of produc, one using lucene,
> > > another using commercial product, imagine what am i using). I searched
> > > the lists but could not find any post, although I remember seeing such
> > > posts in the past.
> > >
> > > Does somebody kept such posts linked or something? Or does someone
> > > know of some page that would help me?
> > >
> > > I would like to show:
> > > - traction of lucene, really improving a lot last couple of years
> > > - rich ecosystem (solr...)
> > > - references of other companies choosing lucene/solr over commercial
> > > (be it Fast or whatever)
> > >
> > > thanks





RE: Help with Numeric Range

2010-06-23 Thread Uwe Schindler
Are you sure that the term enum returns the terms in the correct order? For all 
types of RangeQueries, the term enumeration has to be correctly sorted as 
specified in the docs; if it is not, the enumeration may be 
incomplete. It’s a good idea to turn on assertions for the lucene package, as 
the internal term enum asserts some term order things.
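
A quick way to sanity-check the ordering outside of Lucene might look like the sketch below. It is plain Java with no Lucene classes; TermOrderCheck is a made-up name, and the assumption that terms should be ordered the way String.compareTo orders them (UTF-16 code units) needs to be checked against the docs for the Lucene version in use. Assertions for the Lucene packages can be switched on with the standard JVM flag -ea:org.apache.lucene...

```java
import java.util.Arrays;
import java.util.List;

final class TermOrderCheck {
    /**
     * Returns the index of the first term that is not strictly greater than
     * its predecessor, or -1 if the whole sequence is sorted.
     * String.compareTo compares UTF-16 code units (assumed term order).
     */
    static int firstOutOfOrder(List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (terms.get(i - 1).compareTo(terms.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Terms in the order a (hypothetical) term enum handed them back.
        List<String> fromEnum = Arrays.asList("a", "c", "b");
        System.out.println(firstOutOfOrder(fromEnum)); // prints 2
    }
}
```

Feeding it the term texts in the order the custom term enum returns them would show the first position where the order breaks, if any.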

 

To be sure, have you compared the results with the same test run 
against pure Lucene? Maybe there is something wrong in the tests that we 
cannot see? Alternatively, you could try to take Lucene’s 
TestNumericRangeQuery64 and rewrite it for Lucandra, as this one passes for sure.

One other thing: Lucene 4.0 with flexible indexing will change to binary-only 
terms (BytesRef class); will you be able to handle that?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Thursday, June 24, 2010 7:36 AM
To: 't...@spidertracks.co.nz'
Subject: RE: Help with Numeric Range

Are you sure that the term enum returns the terms in the correct order? For all 
types of RangeQueries, the term enumeration has to be correctly sorted as 
specified in the docs; if it is not, the enumeration may be 
incomplete.

One other thing: Lucene 4.0 with flexible indexing will change to binary-only 
terms (BytesRef class); will you be able to handle that?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 

From: Todd Nine [mailto:t...@spidertracks.co.nz] 
Sent: Thursday, June 24, 2010 2:00 AM
To: Uwe Schindler
Cc: java-user@lucene.apache.org
Subject: RE: Help with Numeric Range

 

Hi Uwe,

  Thank you for your help, it is greatly appreciated.  Unfortunately, my tests 
all fail except for RangeInclusive.  I've changed the step to 6 as per your 
recommendation.  I had it at max to eliminate step precision as the cause of 
the test failure.  Essentially, all keys in Cassandra are UTF-8 keys.  In 
Lucandra, the keys are constructed in the following way.

1. Get the token stream for the field.  In this case it's a NumericTokenStream 
with (numeric,valSize=64,precisionStep=6)
2. For all tokens in the stream, create a UTF-8 string in the following format 
\u
3. Set the term frequency to 1

This gives us a list of tokens, prefixed with the field name and the delimiter. 
Then, for each term from above, we create a key of the format \u\u and write it 
to the TermInfo column family.

After debugging the implementation of the LucandraTermEnum, I can see that it is 
correctly returning values that should match my numeric range query.  However, 
I never get the results in the TopDocs result set after they're handed back to 
the numeric range query object.  Any ideas why this is happening?

Thanks,
Todd



On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote: 

 
Hi Todd,
 
I am not sure if I understand your problem correctly. I am not familiar with 
Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and 
IndexReader according to the documentation, numeric queries should work. A 
NumericField internally creates a TokenStream and "analyzes" the number into 
several tokens, which are somehow "half binary" (they are terms consisting of 
characters in the full 0..127 range for optimal UTF-8 compression with 3.x 
versions of Lucene). The exact encoding can be looked up in the NumericUtils 
class and its javadocs.

About your testcase: The test looks good, so does it fail? If yes, where is the 
problem? You can also look into Lucene's test TestNumericRangeQuery64 for more 
examples, or modify its @BeforeClass to build a Lucandra index instead.

The test does one thing that is not intended to be done like that:
numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);

You are using MAX_VALUE as precision step; this would slow down all queries to 
the speed of old-style TermRangeQueries. It is always better to stick with the 
default of 4, which creates 64 bits / 4 precStep = 16 terms per value. 
Alternatively for longs, 6 is a good precision step (see NumericRangeQuery 
documentation). MAX_VALUE is only intended for fields that do not do numeric 
ranges but are, e.g., only sorted on. precisionStep is a performance tuning 
parameter; it has nothing to do with better or worse precision on terms or 
different query results. If you are using NumericRangeQuery with this large a 
precStep, you are not using the numeric features at all, so your test should 
not behave differently from a conventional TermRangeQuery with padded terms.
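
The "64 bits / 4 precStep = 16 terms" arithmetic can be sketched in plain Java as below. PrecisionStepSketch is a made-up name, and the loop shape (one term per shift of 0, step, 2*step, ... below the value size) is an assumption about how the terms are generated; the actual byte-level encoding is in NumericUtils and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class PrecisionStepSketch {
    /** Shift values at which one value is indexed: 0, step, 2*step, ... below valSize. */
    static List<Integer> shifts(int valSize, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> result = new ArrayList<Integer>();
        // A long counter avoids int overflow when precisionStep is Integer.MAX_VALUE.
        for (long shift = 0; shift < valSize; shift += precisionStep) {
            result.add((int) shift);
        }
        return result;
    }

    /** Number of terms indexed per value for the given precision step. */
    static int termsPerValue(int valSize, int precisionStep) {
        return shifts(valSize, precisionStep).size();
    }

    public static void main(String[] args) {
        System.out.println(termsPerValue(64, 4));                 // prints 16
        System.out.println(termsPerValue(64, 6));                 // prints 11
        System.out.println(termsPerValue(64, Integer.MAX_VALUE)); // prints 1
    }
}
```

With MAX_VALUE only the full-precision term is left, which is why such a field gains nothing from the numeric range machinery.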
 
Uwe
 
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
 
 
> -Original Message-
> From: Todd Nine [mailto:t...@spidertracks.co.nz]
> Sent: Wednesday, June 23, 2010 7:53 AM
> To: java-user@lucene.apache.org
> Subject: Help with N

Re: arguments in favour of lucene over commercial competition

2010-06-23 Thread Otis Gospodnetic
Coincidentally, just after I replied to this thread I received an email from 
one of our customers.  In that email was a quote from one of the commercial 
search vendors.  My jaw didn't drop, because I've seen similar numbers from 
other commercial search vendors before.  I won't mention the customer 
or the vendor, but I can tell you that the amount could put a couple of kids 
through a top-notch private college in the U.S.  Talk about TCO reduction 
through use of open source!
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: jm 
> To: java-user@lucene.apache.org
> Sent: Wed, June 23, 2010 5:57:32 PM
> Subject: Re: arguments in favour of lucene over commercial competition
> 
> yes, in my case the competition is one of the list...
>
> On Wed, Jun 23, 2010 at 11:41 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> > Off the top of my head:
> >
> > FAST
> > Endeca
> > Coveo
> > Attivio
> > Vivisimo
> > Google Search Appliance
> > (tell me when to stop)
> > Dieselpoint
> > IBM OmniFind
> > Exalead
> > Autonomy
> > dtSearch
> > ISYS
> > Oracle
> > ...
> > ...
> >
> >  Otis
> >
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > - Original Message 
> >> From: Hans Merkl <hme...@rightonpoint.us>
> >> To: java-user <java-user@lucene.apache.org>
> >> Sent: Wed, June 23, 2010 5:15:46 PM
> >> Subject: Re: arguments in favour of lucene over commercial competition
> >>
> >> Just curious. What commercial alternatives are out there?
> >>
> >> On Wed, Jun 23, 2010 at 04:01, jm <jmugur...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am trying to compile some arguments in favour of lucene as
> >>> management is deciding weather to standardize on lucene or a competing
> >>> commercial product (we have a couple of produc, one using lucene,
> >>> another using commercial product, imagine what am i using). I searched
> >>> the lists but could not find any post, although I remember seeing such
> >>> posts in the past.
> >>>
> >>> Does somebody kept such posts linked or something? Or does someone
> >>> know of some page that would help me?
> >>>
> >>> I would like to show:
> >>> - traction of lucene, really improving a lot last couple of years
> >>> - rich ecosystem (solr...)
> >>> - references of other companies choosing lucene/solr over commercial
> >>> (be it Fast or whatever)
> >>>
> >>> thanks




Re: Overriding Lucene's term weights computation

2010-06-23 Thread Naama Kraus
OK, thanks Yuval. I'll take a look.
Could you (or anyone) please elaborate on why payloads "seem like a worse fit"?

TX, Naama

On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein wrote:

> Naama, Maybe you could use the new flexible indexing mechanism.
> Some information is in this lecture:
>
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> Alternatively, you may use payloads, but they seem like a worse fit.
> Good Luck,
> Yuval
>
> 
> From: Naama Kraus [naamakr...@gmail.com]
> Sent: Wednesday, June 23, 2010 1:38 PM
> To: java-user@lucene.apache.org
> Subject: Overriding Lucene's term weights computation
>
> Hi,
>
> Is there a way for an application to index a document along with its "term
> weighted vector" (Lucene's TermFreqVector). I.e., override the term
> frequencies computed by Lucene, with an application's computed term weights
> (non frequency based) ?
> I don't think I want to use Scorer#score() for applying score changes as
> this one is activated at search time which won't work for me.
>
> Thanks for any insight,
> Naama