Karl,
can you give, in one paragraph, the difference between ManifoldCF and DIH?
thanks in advance
paul
On 1 March 2011 at 23:23, karl.wri...@nokia.com wrote:
> Dear Lucene/Solr user,
>
> It is possible you may not know of an Apache project called ManifoldCF, whose
> purpose is to provide
Dear Lucene/Solr user,
It is possible you may not know of an Apache project called ManifoldCF, whose
purpose is to provide content to Solr for indexing. If you have interest in this
project, this is to inform you that the ManifoldCF book from Manning
Publishing, titled ManifoldCF in Action, is no
Hi All,
Is there a BM25 scoring implementation that can be used with Lucene?
How can I find and use the implementation mentioned in the following JIRA entry?
https://issues.apache.org/jira/browse/LUCENE-2091
Thanks,
Lahiru
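For reference, the BM25 formula itself is small enough to sketch in plain Java. This is only an illustration of the textbook formula, not the code attached to LUCENE-2091 and not wired into Lucene's Similarity API; the class name, method names, and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for the example.

```java
public class Bm25Sketch {
    // Commonly quoted defaults; assumed here, tune them for your own collection.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** IDF variant with 1 added inside the log so the value never goes negative. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of a single query term to a single document's score. */
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq) {
        // Length normalization: longer-than-average documents are penalized.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 1000 documents, the term occurs in 10 of them.
        System.out.println(termScore(3, 120, 100, 1000, 10));
    }
}
```

A document's score for a query is the sum of termScore over the query terms it contains.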
Check out the Mahout project: mahout.apache.org -> there is a
lucene-based text classifier project in there.
Lance
On Tue, Mar 1, 2011 at 9:25 PM, Sundus Hassan wrote:
> I am doing my MS thesis on content-based text categorization.
> For this purpose I intend to use Lucene. I need some
> help/tutori
I am doing my MS thesis on content-based text categorization.
For this purpose I intend to use Lucene. I need some
help/tutorial/guide regarding:
1) How to build and deploy Lucene?
2) Some basic information on how Lucene works?
3) How to use Lucene in my project?
Will be looking forward for r
Hi,
I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java web
crawler. It includes:
* a crawler
* a document processing pipeline
* a Solr indexer
The crawler has a web administration interface to manage the web sites to be
crawled. Each web site crawl is configured with a lo
Yes, for the other methods (except scorePayload), I just delegate to
the corresponding method in DefaultSimilarity. The reason is that I
don't have a way to trigger off the field name for these others. For me,
I really only need to distinguish between DefaultSimilarity and
PayloadSimilarity (wh
I see, but there is one thing I don't get: you are actually customizing only
the lengthNorm method, not all the other methods that calculate the
similarity scores.
Those methods are still called, and they use the implementation in
DefaultSimilarity, right?
On 1 March 2011 21:12, Sujit
One way to do this currently is to build a per field similarity wrapper
(that triggers off the field name). I believe there is some work going
on with Lucene Similarity that would make it pluggable for this sort of
stuff, but in the meantime, this is what I did:
public class MyPerFieldSimilarityWr
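The idea of a wrapper that triggers off the field name can be illustrated without Lucene at all. The sketch below is not the wrapper from this message (which is cut off above) and does not use Lucene's Similarity class; FieldScorer, PerFieldScorer, and the two example scorers are hypothetical names invented for the illustration. It only shows the delegation pattern: look up a scorer by field name and fall back to a default.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldScoringSketch {
    /** Hypothetical stand-in for a similarity; NOT Lucene's Similarity class. */
    interface FieldScorer {
        double score(double a, double b);
    }

    /** Picks a scorer by field name and delegates to a default otherwise. */
    static final class PerFieldScorer {
        private final FieldScorer defaultScorer;
        private final Map<String, FieldScorer> perField = new HashMap<>();

        PerFieldScorer(FieldScorer defaultScorer) {
            this.defaultScorer = defaultScorer;
        }

        PerFieldScorer register(String field, FieldScorer scorer) {
            perField.put(field, scorer);
            return this;
        }

        double score(String field, double a, double b) {
            return perField.getOrDefault(field, defaultScorer).score(a, b);
        }
    }

    /** Example wiring: tf*idf-like default, value-difference metric for field "B". */
    static PerFieldScorer demoScorer() {
        FieldScorer tfIdfLike = (tf, idf) -> tf * idf;
        FieldScorer numericCloseness = (q, v) -> 1.0 / (1.0 + Math.abs(q - v));
        return new PerFieldScorer(tfIdfLike).register("B", numericCloseness);
    }

    public static void main(String[] args) {
        PerFieldScorer scorer = demoScorer();
        System.out.println(scorer.score("A", 2.0, 3.0)); // default scorer
        System.out.println(scorer.score("B", 3.0, 5.0)); // numeric scorer
    }
}
```

Any field without a registered scorer falls through to the default, which mirrors delegating everything else to DefaultSimilarity.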
I need to define different similarity scores per document field.
For example, for field A I want to use the Lucene tf-idf score; for the numerical
field B I want to use a different metric (the difference between values), and so
on.
thanks
Hi,
We developed a real-time logging system. We index 4.5 million
events/day, spread over multiple servers, each with its own index. Every
night we delete events from the index based on a retention policy, then
we optimize. Each server takes between 1 and 2 hours to optimize. Ideally,
we wo
On Tue, Mar 1, 2011 at 3:17 AM, wrote:
> Hi, OK so I will not bother using TieredMergePolicy for now. I will do
> some more tests with the contrib balanced merge policy, playing with the
> optimize(maxNumSegments) to try decreasing the optimize time (which is an
> issue for us today). My index co
Hello,
Can the MoreLikeThisHandler include highlighting?
Is it correct to define a MoreLikeThisHandler like this?
true
contenu
Thank you for your help.
Amel.
Thanks Nick, will try that.
On Tue, Mar 1, 2011 at 12:00 PM, Nick Pellow wrote:
> Have you considered storing the length of the field in a Payload?
> You could do that during analysis.
>
> Cheers,
> Nick
>
> On 01/03/2011, at 5:06 PM, Lahiru Samarakoon wrote:
>
> > Hi Anshum,
> >
> > I am trying
Hi, OK so I will not bother using TieredMergePolicy for now. I will do
some more tests with the contrib balanced merge policy, playing with the
optimize(maxNumSegments) to try decreasing the optimize time (which is an
issue for us today). My index contains 35 million documents. The size on
dis