Hi,
I've been noticing some strange behavior from Nutch's parsing recently,
and I'm wondering if someone can walk me through setting up a
standalone (non-MapReduce) test that parses specific URLs or pages. I
can't find any examples that run a parse without contacting the
JobTracker and building a job configuration.
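For reference, here is roughly the kind of harness I'm imagining: feed
a page straight into ParseUtil with no job setup at all. The class
names below are my best guesses from the 0.9 source tree, so I may
well have the API slightly wrong:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.parse.Outlink;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseUtil;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

public class ParseTest {
  public static void main(String[] args) throws Exception {
    String url = "http://apocalypse.rulez.org/kozos/";
    // Inline the page fragment in question so no fetch is needed.
    String html = "<html><body>"
        + "<input type=submit value=Tag "
        + "onclick=\"AddTag(['args__KilluspalKrónika/2007-09-12'],"
        + " ['tagdiv17'], 'POST'); return false;\">"
        + "</body></html>";

    Configuration conf = NutchConfiguration.create();
    Content content = new Content(url, url, html.getBytes("UTF-8"),
        "text/html", new Metadata(), conf);
    Parse parse = new ParseUtil(conf).parse(content);

    // Print every outlink the parser extracted, so any bogus
    // "URLs" pulled out of the JavaScript show up immediately.
    for (Outlink out : parse.getData().getOutlinks()) {
      System.out.println(out.getToUrl());
    }
  }
}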
For a quick example:
<input type=submit value=Tag
onclick="AddTag(['args__KilluspalKrónika/2007-09-12', 'tag17',
'args__17'], ['tagdiv17'], 'POST');
document.getElementById('tagdiv17').innerHTML='Vákicsit!'; return false;">
Yes, it's AJAX, and it uses the Perl CGI::Ajax module.
Unfortunately, your crawler mistakes the string
'args__KilluspalKrónika/2007-09-12' for a URI; my guess is that the
slash in the string misleads it into treating it as one. So, later
on, your crawler tries to fetch
http://apocalypse.rulez.org/kozos/args__KilluspalKrónika/2007-09-12
(This was emailed to me by a webmaster.)
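For what it's worth, resolving that quoted string against the page's
base URL reproduces the reported fetch exactly, which supports the
relative-URI theory. A quick check with java.net.URL shows it (the
/kozos/ base is my assumption, inferred from the reported URL):

import java.net.URL;

public class ResolveDemo {
  public static void main(String[] args) throws Exception {
    // Resolve the JavaScript string the way an outlink extractor
    // would resolve a relative href against the page's base URL.
    URL base = new URL("http://apocalypse.rulez.org/kozos/");
    System.out.println(new URL(base, "args__KilluspalKrónika/2007-09-12"));
    // prints http://apocalypse.rulez.org/kozos/args__KilluspalKrónika/2007-09-12
  }
}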
Thanks for any help,
Ned