[Nutch-general] Re: Problems with MapRed-

Andrzej Bialecki Wed, 01 Feb 2006 11:25:33 -0800

Mike Smith wrote:

I finally found out why this problem happens: there seems to be a problem with
the JS parser. I used this:


<name>plugin.includes</name>

<value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>

instead of the default one, which has JS in it, and I could fetch
http://www.globalmedlaw.com/Canadam.html at depth 2. But when I use

<name>plugin.includes</name>

<value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)</value>

the reduce fails at the end of fetching. I came up with this solution because
that page uses a redirected JS page to provide some dynamic content, and
after removing the JS plugin it worked fine. Now I am going to run a larger
crawl over 100,000 seed URLs to see whether this really solved the problem.

Have you had any problems with the JS parser?

That's an interesting observation. Could you perhaps check what the exception (if any) from the JS parser is when it's failing? It could be emitted into one of the tasktracker logs.
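
For what it's worth, here is a rough sketch of one way to do that check, in Python rather than anything Nutch-specific. It assumes the failure shows up as an ordinary Java stack trace in a plain-text log, and that the JS parser's frames contain the string "parse.js" (a guess based on the plugin name, so adjust the keyword if nothing matches). The function name and the log paths you pass in are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumption: a trace starts with a line naming an Exception/Error class and
# continues with indented "at ..." frames or "Caused by:" lines.
EXCEPTION_LINE = re.compile(r"\b[\w.$]+(Exception|Error)\b")
FRAME_LINE = re.compile(r"^\s+(at\s|\.\.\.\s\d+\smore)|^Caused by:")


def find_stack_traces(lines: Iterable[str], keyword: str = "parse.js") -> List[List[str]]:
    """Return the stack traces that mention `keyword` (case-insensitive)."""
    traces: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if current and FRAME_LINE.search(line):
            current.append(line)      # still inside the current trace
            continue
        if current:
            traces.append(current)    # trace ended on the previous line
            current = []
        if EXCEPTION_LINE.search(line):
            current = [line]          # a new trace starts here
    if current:
        traces.append(current)
    needle = keyword.lower()
    return [t for t in traces if any(needle in entry.lower() for entry in t)]


if __name__ == "__main__":
    # Usage: python find_traces.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for trace in find_stack_traces(handle):
                print(path)
                print("\n".join(trace))
                print()
```

Run it over whatever tasktracker log files you have; if it prints nothing, try a broader keyword such as "js", or an empty string to see every trace.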


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



