Re: ManifoldCF in Action

2011-03-01 Thread Paul Libbrecht
Karl, can you give, in one paragraph, the difference between ManifoldCF and DIH? Thanks in advance, Paul. On 1 March 2011 at 23:23, karl.wri...@nokia.com wrote: > Dear Lucene/Solr user, > > It is possible you may not know of an Apache project called ManifoldCF, whose > purpose is to provide

ManifoldCF in Action

2011-03-01 Thread karl.wright
Dear Lucene/Solr user, It is possible you may not know of an Apache project called ManifoldCF, whose purpose is to provide content to Solr for indexing. If you have an interest in this project, this is to inform you that the ManifoldCF book from Manning Publications, titled ManifoldCF in Action, is no

BM 25 scoring with lucene

2011-03-01 Thread Lahiru Samarakoon
Hi all, Is there a BM25 scoring implementation that can be used with Lucene? How can I find and use the implementation mentioned in the following JIRA entry? https://issues.apache.org/jira/browse/LUCENE-2091 Thanks, Lahiru

Re: Help!

2011-03-01 Thread Lance Norskog
Check out the Mahout project: mahout.apache.org -> there is a Lucene-based text classifier project in there. Lance On Tue, Mar 1, 2011 at 9:25 PM, Sundus Hassan wrote: > I am doing my MS thesis on content-based text categorization. > For this purpose I intend to use Lucene. I need some > help/tutori

Help!

2011-03-01 Thread Sundus Hassan
I am doing my MS thesis on content-based text categorization. For this purpose I intend to use Lucene. I need some help/tutorial/guide regarding: 1) How to build and deploy Lucene? 2) Some basic information regarding the working of Lucene? 3) How to use Lucene in my project? Will be looking forward for r

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl Anywhere. Crawl Anywhere is a Java web crawler. It includes: * a crawler * a document processing pipeline * a Solr indexer The crawler has a web administration interface for managing the web sites to be crawled. Each web site crawl is configured with a lo

Re: How to define different similarity scores per field ?

2011-03-01 Thread Sujit Pal
Yes, for the other methods (except scorePayload), I just delegate to the corresponding method in DefaultSimilarity. The reason is that I don't have a way to trigger off the field name for these others. For me, I really only need to distinguish between DefaultSimilarity and PayloadSimilarity (wh

Re: How to define different similarity scores per field ?

2011-03-01 Thread Patrick Diviacco
I see, but I don't get one thing: you are actually customizing only the lengthNorm method, but not all the other methods that calculate the similarity scores. Those methods are still called, and they use the implementation from the DefaultSimilarity class, right? On 1 March 2011 21:12, Sujit

Re: How to define different similarity scores per field ?

2011-03-01 Thread Sujit Pal
One way to do this currently is to build a per-field similarity wrapper (that triggers off the field name). I believe there is some work going on with Lucene Similarity that would make it pluggable for this sort of thing, but in the meantime, this is what I did: public class MyPerFieldSimilarityWr
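The wrapper idea described in this message can be sketched without any Lucene dependency. This is only an illustration of the delegation pattern, not the class from the message (which is cut off in this archive): `FieldScorer`, `DefaultScorer`, `NoLengthNormScorer`, `PerFieldScorer` and the field names are all hypothetical stand-ins for a Lucene `Similarity` and its methods. The only detail borrowed from Lucene is the 1/sqrt(numTerms) length norm, which is assumed here to match what DefaultSimilarity computed at the time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a similarity implementation; not a Lucene type.
interface FieldScorer {
    // Returns a length normalization factor for a field containing numTerms terms.
    float lengthNorm(String fieldName, int numTerms);
}

// Default behaviour: 1/sqrt(numTerms), so longer fields get a smaller factor.
final class DefaultScorer implements FieldScorer {
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Alternative behaviour: ignore field length entirely.
final class NoLengthNormScorer implements FieldScorer {
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }
}

// Wrapper that triggers off the field name and falls back to a default scorer.
final class PerFieldScorer implements FieldScorer {
    private final FieldScorer fallback;
    private final Map<String, FieldScorer> perField = new HashMap<>();

    PerFieldScorer(FieldScorer fallback) {
        this.fallback = fallback;
    }

    // Registers a scorer for one field name; returns this so calls can be chained.
    PerFieldScorer register(String fieldName, FieldScorer scorer) {
        perField.put(fieldName, scorer);
        return this;
    }

    public float lengthNorm(String fieldName, int numTerms) {
        // Look up the scorer registered for this field, else use the fallback.
        return perField.getOrDefault(fieldName, fallback).lengthNorm(fieldName, numTerms);
    }
}
```

A wrapper built as `new PerFieldScorer(new DefaultScorer()).register("title", new NoLengthNormScorer())` would then apply no length normalization to the hypothetical `title` field and the default formula to every other field. Methods that do not receive a field name cannot be dispatched this way, which is the limitation mentioned in the follow-up reply.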

How to define different similarity scores per field ?

2011-03-01 Thread Patrick Diviacco
I need to define different similarity scores per document field. For example, for field A I want to use Lucene's tf-idf score, for the numerical field B I want to use a different metric (the difference between values), and so on. Thanks

Re: recurrent IO/CPU peaks

2011-03-01 Thread v . sevel
Hi, we developed a real-time logging system. We index 4.5 million events/day, spread over multiple servers, each with its own index. Every night we delete events from the index based on a retention policy, then we optimize. Each server takes between 1 and 2 hours to optimize. ideally, we wo

Re: recurrent IO/CPU peaks

2011-03-01 Thread Michael McCandless
On Tue, Mar 1, 2011 at 3:17 AM, wrote: > Hi, OK so I will not bother using TieredMergePolicy for now. I will do > some more tests with the contrib balanced merge policy, playing with the > optimize(maxNumSegments) to try decreasing the optimize time (which is an > issue for us today). My index co

The MoreLikeThisHandler could include highlighting ?

2011-03-01 Thread Amel Fraisse
Hello, Can the MoreLikeThisHandler include highlighting? Is it correct to define a MoreLikeThisHandler like this: ? true contenu Thank you for your help. Amel.

Re: finding the length of a field

2011-03-01 Thread Lahiru Samarakoon
Thanks Nick, will try that. On Tue, Mar 1, 2011 at 12:00 PM, Nick Pellow wrote: > Have you considered storing the length of the field in a Payload? > You could do that during analysis. > > Cheers, > Nick > > On 01/03/2011, at 5:06 PM, Lahiru Samarakoon wrote: > > > Hi Anshum, > > > > I am trying

Re: recurrent IO/CPU peaks

2011-03-01 Thread v . sevel
Hi, OK so I will not bother using TieredMergePolicy for now. I will do some more tests with the contrib balanced merge policy, playing with optimize(maxNumSegments) to try to decrease the optimize time (which is an issue for us today). My index contains 35 million documents. The size on dis