Thank you very much!
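
For anyone landing on this thread later: the search.jsp lines Dennis quotes below read hitsPerSite straight from the request, so a malformed value would make Integer.parseInt throw. Here is a minimal sketch of a more forgiving parse. The class and method names (HitsPerSiteParam, parse) are hypothetical and not part of Nutch; only the parameter name and the default of 2 come from the quoted snippet.

```java
// Hypothetical helper, not part of Nutch. It mirrors the quoted search.jsp
// logic but falls back to the default instead of throwing on bad input.
final class HitsPerSiteParam {
    // Same default as in the quoted search.jsp snippet.
    static final int DEFAULT_HITS_PER_SITE = 2;

    // raw is the value of request.getParameter("hitsPerSite"); may be null.
    static int parse(String raw) {
        if (raw == null) {
            return DEFAULT_HITS_PER_SITE;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: negative values are meaningless here, so use the default.
            return value >= 0 ? value : DEFAULT_HITS_PER_SITE;
        } catch (NumberFormatException e) {
            return DEFAULT_HITS_PER_SITE;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1"));   // prints 1
        System.out.println(parse(null));  // prints 2
        System.out.println(parse("abc")); // prints 2
    }
}
```

Since the value comes from request.getParameter, adding hitsPerSite=1 to the query string of the search URL should be enough to get one hit per site without editing the JSP.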

On Thu, Sep 25, 2008 at 9:26 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> In search.jsp lines 116-119:
>
>  int hitsPerSite = 2;                            // max hits per site
>  String hitsPerSiteString = request.getParameter("hitsPerSite");
>  if (hitsPerSiteString != null)
>    hitsPerSite = Integer.parseInt(hitsPerSiteString);
>
> Hope that helps.
>
> Dennis
>
>
> vishal vachhani wrote:
>
>> Dennis,
>> I am facing the same problem: in my crawl, the content of some URLs is
>> identical but the URLs are different. Could you please tell me how I can
>> set hitsPerSite to 1?
>>
>> --Vishal
>>
>> On Thu, Sep 25, 2008 at 6:12 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>>
>>> If you are using more than one index then dedup will not work across
>>> indexes.  A single index should dedup correctly unless the pages are not
>>> exact duplicates but near duplicates.  The dedup process works on url and
>>> byte hash.  If the content differs by even 1 byte, the pages are not
>>> treated as duplicates.
>>>
>>> Near duplicate detection is another set of algorithms that haven't been
>>> implemented in Nutch yet.  On the query site you can set the hitsPerSite
>>> to 1 and it should limit your search results.
>>>
>>> Dennis
>>>
>>>
>>> Edward Quick wrote:
>>>
>>>> Hi,
>>>>
>>>> Even though I ran nutch dedup on my index, I still have pages with
>>>> different urls but exactly the same content (see search result example
>>>> below). From what I read up on dedup this shouldn't happen though, as it
>>>> deletes the url with the lowest score. Is there anything else I can try
>>>> to get rid of these?
>>>>
>>>> Thanks,
>>>> Ed.
>>>>
>>>> Item Document :- Client - TeraTerm Pro
>>>> ... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards
>>>> Online   Employee Self Service       ESS Home ... Description Document
>>>> Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix
>>>> Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where
>>>> printing or keymapping is an issue, TeraTerm ...
>>>>
>>>> http://www.somedomain.com/im/tech/technica.nsf/8918e269a19be23f802563ef004e8e7a/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
>>>> (cached) (explain) (anchors)
>>>>
>>>>
>>>>
>>>> Item Document :- Client - TeraTerm Pro
>>>> ... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards
>>>> Online   Employee Self Service       ESS Home ... Description Document
>>>> Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix
>>>> Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where
>>>> printing or keymapping is an issue, TeraTerm ...
>>>>
>>>> http://www.somedomain.com/im/tech/technica.nsf/dacff06c3e1dbc9780257273004e1e3b/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
>>>> (cached) (explain) (anchors)
>>>>
>>>>
>>
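
To illustrate Dennis's point above about exact versus near duplicates: a byte-level hash only matches when the content is identical down to the last byte. The sketch below is not Nutch's dedup code. It assumes an MD5 digest over the raw content bytes, and the class and method names (ContentHashDemo, md5Hex, sameContent) are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only, not Nutch code: compares page content by MD5 digest.
final class ContentHashDemo {
    // Returns the MD5 digest of the content as a lowercase hex string.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Two pages count as exact duplicates only if their digests are equal.
    static boolean sameContent(byte[] a, byte[] b) {
        return md5Hex(a).equals(md5Hex(b));
    }

    public static void main(String[] args) {
        byte[] pageA = "TeraTerm Pro".getBytes(StandardCharsets.UTF_8);
        byte[] pageB = "TeraTerm Pro".getBytes(StandardCharsets.UTF_8);
        byte[] pageC = "TeraTerm Pro ".getBytes(StandardCharsets.UTF_8); // one extra byte

        System.out.println(sameContent(pageA, pageB)); // prints true
        System.out.println(sameContent(pageA, pageC)); // prints false
    }
}
```

So two pages that look identical in the search results can still hash differently if the served bytes differ anywhere, which would explain why they survive dedup.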


-- 
Thanks and Regards,
Vishal Vachhani
M.tech, CSE dept
Indian Institute of Technology, Bombay
http://www.cse.iitb.ac.in/~vishalv
