Stefan Neufeind wrote:
> sudhendra seshachala wrote:
> > I am experiencing a similar problem.
> > What I have done is as follows.
> > I have a different parse plugin for each site (I have 3 sites to crawl and
> > fetch data from), but I capture the data into the same format, which I call
> > "datarepository". I have one index plugin that indexes the data repository
> > and one query plugin on the data repository.
> > I don't have to run multiple instances; I just run one instance of the
> > search engine.
> > However, the parse configuration is different for each site, so I run a
> > different crawler for each site.
> > Then I index and merge all of them. So far the results are good, if not "WOW".
> > I still have to figure out a way of ranking the pages. For example, I would
> > like to be able to apply ranking on the data repository. Let me know if I
> > was clear...
> Hi,
> not sure if I got you right with your last point, but it just came to my
> mind:
> It would be nice to be able to have something like
> "If it's from indexA, give it 100 extra points; if from indexB, give it
> 50 extra points". Or something like "if indexA, give it 20% extra weight".
> But I don't believe this is easily doable. Or is it?
> I have a similar problem with languages: give priority to documents in
> German and English, but still list documents in other languages somewhere
> after those results. So I'd need to be able to give "extra points" on a
> per-language basis, based on the indexed language field, right?
This is not only doable, but fairly easy: just add these fields to the
index through a custom IndexingFilter plugin, and then implement a
corresponding QueryPlugin that expands your query appropriately. The
"prioritization" you describe is equivalent to adding a non-required and
non-prohibited (i.e. optional) clause to a Lucene query. Please see how
it's done in the existing index-more/query-more and
index-basic/query-basic plugins.
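To make the "optional clause" idea concrete, here is a self-contained toy model. It is not Nutch or Lucene code and does not use the real IndexingFilter or query-plugin interfaces; the class name `PreferenceBoostSketch`, the `lang`/`site` fields, and the boost values are all hypothetical. An index-time step stores extra fields, and at query time the user's term is a required clause while each preference is an optional clause that adds to the score of matching documents without filtering out the rest. Real Lucene scoring also weights and normalizes clause scores, so the plain additive arithmetic here is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreferenceBoostSketch {

    /** Toy stand-in for an indexed document: extra fields plus body text. */
    static final class Doc {
        final String id;
        final String body;
        final Map<String, String> fields = new HashMap<>();

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    /** Hypothetical optional clause "field:value" with an additive boost. */
    static final class Pref {
        final String field;
        final String value;
        final double boost;

        Pref(String field, String value, double boost) {
            this.field = field;
            this.value = value;
            this.boost = boost;
        }
    }

    /** Plays the role of a custom indexing filter: store lang/site as fields. */
    static Doc index(String id, String body, String lang, String site) {
        Doc d = new Doc(id, body);
        d.fields.put("lang", lang);
        d.fields.put("site", site);
        return d;
    }

    /**
     * Required clause: the body must contain the term, otherwise the document
     * is not a hit (returns -1). Optional clauses: each matching preference
     * adds its boost; a non-match adds nothing and excludes nothing.
     */
    static double score(Doc d, String term, List<Pref> prefs) {
        if (!d.body.toLowerCase().contains(term.toLowerCase())) {
            return -1;
        }
        double s = 1.0; // base score for matching the required clause
        for (Pref p : prefs) {
            if (p.value.equals(d.fields.get(p.field))) {
                s += p.boost;
            }
        }
        return s;
    }

    /** Plays the role of the expanded query: keep hits, rank by score. */
    static List<String> search(List<Doc> docs, String term, List<Pref> prefs) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (score(d, term, prefs) >= 0) {
                hits.add(d);
            }
        }
        hits.sort(Comparator.comparingDouble((Doc d) -> score(d, term, prefs)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                index("fr1", "Nutch plugins", "fr", "siteC"),
                index("de1", "Nutch plugins", "de", "siteA"),
                index("en1", "Nutch plugins", "en", "siteB"),
                index("de2", "unrelated page", "de", "siteA"));
        // Prefer German, then English; other languages still appear, just last.
        List<Pref> langPrefs = List.of(new Pref("lang", "de", 2.0), new Pref("lang", "en", 1.0));
        System.out.println(search(docs, "nutch", langPrefs)); // [de1, en1, fr1]
    }
}
```

Swapping the language preferences for site preferences (e.g. +100 for one site, +50 for another) gives the per-index "extra points" from the question. The "20% extra weight" variant would multiply the score instead, which this additive sketch does not model.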
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general