Sorry for the off-topic question, but how do you make Nutch-0.9 search multiple indexes?

On Thu, Sep 25, 2008 at 4:42 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> If you are using more than one index then dedup will not work across
> indexes.  A single index should dedup correctly unless the pages are not
> exact duplicates but near duplicates.  The dedup process works on URL and
> byte hash.  If the content differs by even 1 byte, the pages are not
> treated as duplicates.
>
> Near duplicate detection is another set of algorithms that haven't been
> implemented in Nutch yet.  On the query site you can set the hitsPerSite to
> 1 and it should limit your search results.
>
> Dennis
>
> Edward Quick wrote:
>>
>> Hi,
>>
>> Even though I ran nutch dedup on my index, I still have pages with
>> different urls but exactly the same content (see search result example
>> below). From what I read up on dedup this shouldn't happen though as it
>> deletes the url with the lowest score. Is there anything else I can try to
>> get rid of these?
>>
>> Thanks,
>> Ed.
>>
>> Item Document :- Client - TeraTerm Pro
>> ... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards
>> Online   Employee Self Service       ESS Home ... Description Document
>> Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix
>> Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where
>> printing or keymapping is an issue, TeraTerm ...
>>
>> http://www.somedomain.com/im/tech/technica.nsf/8918e269a19be23f802563ef004e8e7a/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
>> (cached) (explain) (anchors)
>>
>>
>>
>> Item Document :- Client - TeraTerm Pro
>> ... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards
>> Online   Employee Self Service       ESS Home ... Description Document
>> Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix
>> Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where
>> printing or keymapping is an issue, TeraTerm ...
>>
>> http://www.somedomain.com/im/tech/technica.nsf/dacff06c3e1dbc9780257273004e1e3b/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
>> (cached) (explain) (anchors)
>
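
To illustrate the exact-match behaviour described in the quoted reply (dedup keyed on a byte hash of the content, keeping the higher-scoring URL), here is a minimal sketch in Python. It is not Nutch's code: the function names, the page dictionaries and the choice of MD5 as the hash are assumptions made for the example, and the URL-based pass is left out.

```python
import hashlib


def content_hash(content: bytes) -> str:
    # Hash of the raw bytes; MD5 is an assumption chosen for illustration.
    return hashlib.md5(content).hexdigest()


def dedup_by_content(pages):
    """Keep one page per distinct content hash, preferring the highest score.

    `pages` is a list of dicts with hypothetical keys:
    "url" (str), "content" (bytes) and "score" (float).
    """
    best = {}
    for page in pages:
        key = content_hash(page["content"])
        kept = best.get(key)
        # Replace the kept page only if this one scores strictly higher.
        if kept is None or page["score"] > kept["score"]:
            best[key] = page
    return list(best.values())


pages = [
    {"url": "http://example.com/a", "content": b"TeraTerm Pro", "score": 1.0},
    {"url": "http://example.com/b", "content": b"TeraTerm Pro", "score": 2.0},
    # One extra byte: a near duplicate, so the hash differs and it survives.
    {"url": "http://example.com/c", "content": b"TeraTerm Pro ", "score": 0.5},
]

kept_urls = [p["url"] for p in dedup_by_content(pages)]
print(kept_urls)
```

The first two pages collapse into one because their bytes are identical, while the third survives because a single trailing space changes the hash. This is why pages that look the same in search results can still escape an exact-hash dedup.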



-- 
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
[EMAIL PROTECTED]
