Hi,
I've been noticing some strange behavior from Nutch's parsing recently,
and I'm wondering if someone can walk me through setting up a
standalone (non-MapReduce) test that parses specific URLs or pages. I
can't find any examples that run a parse without contacting the
JobTracker and building a job configuration.
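For reference, here is roughly the kind of harness I'm imagining: feed
a page straight into ParseUtil with no job setup at all. The class
names below are my best guesses from the 0.9 source tree, so I may
well have the API slightly wrong:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.parse.Outlink;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseUtil;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

public class ParseTest {
  public static void main(String[] args) throws Exception {
    String url = "http://apocalypse.rulez.org/kozos/";
    // Inline the page fragment in question so no fetch is needed.
    String html = "<html><body>"
        + "<input type=submit value=Tag "
        + "onclick=\"AddTag(['args__KilluspalKrónika/2007-09-12'],"
        + " ['tagdiv17'], 'POST'); return false;\">"
        + "</body></html>";

    Configuration conf = NutchConfiguration.create();
    Content content = new Content(url, url, html.getBytes("UTF-8"),
        "text/html", new Metadata(), conf);
    Parse parse = new ParseUtil(conf).parse(content);

    // Print every outlink the parser extracted, so any bogus
    // "URLs" pulled out of the JavaScript show up immediately.
    for (Outlink out : parse.getData().getOutlinks()) {
      System.out.println(out.getToUrl());
    }
  }
}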
For a quick example:
<input type=submit value=Tag
onclick="AddTag(['args__KilluspalKrónika/2007-09-12', 'tag17',
'args__17'], ['tagdiv17'], 'POST');
document.getElementById('tagdiv17').innerHTML='Vákicsit!'; return false;">
Yes, it's AJAX, and it uses the Perl CGI::Ajax module.
Unfortunately, your crawler mistakes the string
'args__KilluspalKrónika/2007-09-12' for a URI; my guess is that the
slash in the string misleads it into treating it as one. So, later
on, your crawler tries to fetch
http://apocalypse.rulez.org/kozos/args__KilluspalKrónika/2007-09-12
(This was emailed to me by a webmaster.)
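For what it's worth, resolving that quoted string against the page's
base URL reproduces the reported fetch exactly, which supports the
relative-URI theory. A quick check with java.net.URL shows it (the
/kozos/ base is my assumption, inferred from the reported URL):

import java.net.URL;

public class ResolveDemo {
  public static void main(String[] args) throws Exception {
    // Resolve the JavaScript string the way an outlink extractor
    // would resolve a relative href against the page's base URL.
    URL base = new URL("http://apocalypse.rulez.org/kozos/");
    System.out.println(new URL(base, "args__KilluspalKrónika/2007-09-12"));
    // prints http://apocalypse.rulez.org/kozos/args__KilluspalKrónika/2007-09-12
  }
}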
Thanks for any help,
Ned