Stefan Neufeind wrote:
sudhendra seshachala wrote:
I am experiencing a similar problem.
  What I have done is as follows.
  I have a different parse-plugin for each site (I have 3 sites to crawl and 
fetch data from), but I capture the data in one common format that I call the data repository.
  I have one index-plugin which indexes the data repository and one query-plugin 
which queries it.
  I don't have to run multiple instances. I just run one instance of the search 
engine.
  However, the parse configuration is different for each site, so I run a different 
crawler for each site.
  Then I index and merge all of them. So far the results are good, if not "WOW".
  I still have to figure out a way of ranking the pages. For example, I would like to 
be able to apply ranking on the data repository. Let me know if I was clear...

Hi,

not sure if I got you right with your last point, but it just came to my
mind:
It would be nice to be able to have something like
"If it's from indexA, give it 100 extra-points - if from indexB give it
50 extra-points". Or some "if indexA give it 20% extra-weight" or so.
But I don't believe this is easily doable. Or is it?

I have a similar problem with languages: I want to give priority to documents in
German and English, but still list documents in other languages
somewhere after those results. So I'd need to be able to give
"extra-points" on a "per-language"-basis, based on the indexed
language-field, right?


This is not only doable, but fairly easy. Just add these fields to the index through a custom IndexingFilter plugin, and then implement a corresponding QueryPlugin that will expand your query appropriately. The "prioritization" that you describe is equivalent to adding a non-required and non-prohibited clause to a Lucene query: such a clause never excludes a document, but documents that match it score higher. Please see how it's done in the existing index-more/query-more and index-basic/query-basic plugins.
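
To make the effect of such an optional clause concrete, here is a minimal toy model in plain Java. It is only a sketch and is not Nutch or Lucene code: the class names, the "lang" and "site" field names, the sample URLs and the boost values are all assumptions made up for illustration, and real Lucene scoring is more involved than the flat "extra points" used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are Nutch or Lucene APIs.
final class OptionalClauseSketch {

    // A document reduced to the extra fields an indexing plugin might have stored.
    static final class Doc {
        final String url;
        final String content;
        final Map<String, String> fields = new HashMap<>();

        Doc(String url, String content, String lang, String site) {
            this.url = url;
            this.content = content;
            fields.put("lang", lang); // hypothetical field names
            fields.put("site", site);
        }
    }

    // Optional clause: a match adds `boost` to the score, a miss excludes nothing.
    static final class OptionalClause {
        final String field;
        final String value;
        final double boost;

        OptionalClause(String field, String value, double boost) {
            this.field = field;
            this.value = value;
            this.boost = boost;
        }
    }

    // Returns 0.0 when the required term is missing, otherwise 1.0 plus all matching boosts.
    static double score(Doc doc, String requiredTerm, List<OptionalClause> optional) {
        if (!doc.content.contains(requiredTerm)) {
            return 0.0; // required clause failed, so the document is not a hit
        }
        double total = 1.0; // flat base score for matching the required clause
        for (OptionalClause c : optional) {
            if (c.value.equals(doc.fields.get(c.field))) {
                total += c.boost;
            }
        }
        return total;
    }

    // Keeps only documents containing the required term, best score first (ties keep input order).
    static List<String> search(List<Doc> docs, String requiredTerm, List<OptionalClause> optional) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.content.contains(requiredTerm)) {
                hits.add(d);
            }
        }
        hits.sort(Comparator.comparingDouble((Doc d) -> score(d, requiredTerm, optional)).reversed());
        List<String> urls = new ArrayList<>();
        for (Doc d : hits) {
            urls.add(d.url);
        }
        return urls;
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc("http://example.org/fr", "nutch crawler", "fr", "indexB"),
            new Doc("http://example.org/de", "nutch search", "de", "indexA"),
            new Doc("http://example.org/en", "lucene only", "en", "indexA"));
    }

    static List<OptionalClause> sampleClauses() {
        return Arrays.asList(
            new OptionalClause("lang", "de", 2.0),
            new OptionalClause("lang", "en", 2.0),
            new OptionalClause("site", "indexA", 1.0));
    }

    public static void main(String[] args) {
        // The French page still matches "nutch"; it is only ranked after the German one.
        System.out.println(search(sampleDocs(), "nutch", sampleClauses()));
    }
}
```

In this sketch the French page is still returned for "nutch", just after the German one, while the English page is left out only because it fails the required term. In a real plugin the optional clauses would be added to the Lucene query itself, as the existing query-more and query-basic plugins do, rather than scored by hand like this.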

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
