Gal Nitzan wrote:
Hi Andrzej,

I have two questions regarding ParseOutputFormat.java:

1. On line 102 a String[] is used. Do you think it might be better to use an
ArrayList? It would save a few cycles down the road: you would no longer need
"validCount", and the "if" on line 121 could go as well. I can make a patch
if you think I'm correct on this.

I doubt it would save anything, and even if it did, the savings would be negligible. Adding a new entry to an ArrayList and growing its backing array has some cost, too.
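
For illustration only, here is a rough sketch of the two approaches being compared. It is not the code from ParseOutputFormat.java: the class name, the method names and the "valid entry" test are all made up for the example, and the trimming "if" is only my guess at what such a check typically looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from Nutch.
public class LinkCollectDemo {

    // Stand-in validity test (assumption): skip null or empty entries.
    static boolean isValid(String s) {
        return s != null && !s.isEmpty();
    }

    // Array approach: preallocate, count valid entries, trim at the end.
    static String[] collectWithArray(String[] candidates) {
        String[] kept = new String[candidates.length];
        int validCount = 0;
        for (String c : candidates) {
            if (isValid(c)) {
                kept[validCount++] = c;
            }
        }
        // The extra "if": copy only when some entries were skipped.
        if (validCount < kept.length) {
            String[] trimmed = new String[validCount];
            System.arraycopy(kept, 0, trimmed, 0, validCount);
            kept = trimmed;
        }
        return kept;
    }

    // List approach: no counter and no trimming, but every add() goes
    // through the list's own bookkeeping.
    static List<String> collectWithList(String[] candidates) {
        List<String> kept = new ArrayList<>(candidates.length);
        for (String c : candidates) {
            if (isValid(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] in = {"http://a/", null, "", "http://b/"};
        System.out.println(collectWithArray(in).length);
        System.out.println(collectWithList(in).size());
    }
}
```

Both variants do one pass over the input; the list version drops the counter and the final copy at the price of per-element bookkeeping, which is why any difference is likely to be very small either way.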

2. If I understand the functionality correctly, on line 87 a new CrawlDatum is
created for the fetched page. The interval is set to 0.0. Could you please
explain why it is set to 0.0?

That's only a special additional CrawlDatum, which serves as a signature 
container. You see, if we don't parse at the same time as we fetch then we 
can't put the signature in the same CrawlDatum (see the logic in 
Fetcher.FetcherThread.output()), so we need another instance, to pick up the 
signature when running updatedb.
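
To make the idea concrete, here is a very simplified model of it. This is not Nutch code: Datum, Status and merge() are hypothetical stand-ins for CrawlDatum and the updatedb logic, and the real classes differ in detail. The only point is that the extra record carries nothing but the signature, so its interval is never looked at.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; not the actual Nutch classes.
public class SignatureMergeDemo {

    enum Status { FETCHED, SIGNATURE }

    static final class Datum {
        final Status status;
        final float interval;   // meaningful only for real fetch records
        byte[] signature;       // may be null until the page is parsed

        Datum(Status status, float interval, byte[] signature) {
            this.status = status;
            this.interval = interval;
            this.signature = signature;
        }
    }

    // Merge all records collected for one URL, as an update step might.
    static Datum merge(List<Datum> recordsForUrl) {
        Datum result = null;
        byte[] signature = null;
        for (Datum d : recordsForUrl) {
            if (d.status == Status.SIGNATURE) {
                // Container record: take the signature, ignore the rest.
                signature = d.signature;
            } else {
                result = d;
            }
        }
        if (result != null && signature != null) {
            result.signature = signature;
        }
        return result;
    }

    public static void main(String[] args) {
        Datum fetched = new Datum(Status.FETCHED, 30.0f, null);
        Datum container = new Datum(Status.SIGNATURE, 0.0f, new byte[] {1, 2, 3});
        Datum merged = merge(Arrays.asList(fetched, container));
        System.out.println(merged.interval + " " + merged.signature.length);
    }
}
```

In this model the 0.0 interval of the container record never reaches the merged result; only the fetch record's interval survives.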

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
