Hi Guenter,

the site I have trouble with is
http://www.dmgbielefeld.de/de,dmg,dmg-bielefeld

Some of the links on the site are extracted, but up to 80% are not. I have
switched the JavaScript plugin on.

Maybe you can take a look? That would help me.

"Guenter, Matthias" <[EMAIL PROTECTED]> wrote on 17.02.2006 09:04:12:

> Hi Elwin
> Can you provide samples of links that do not work, along with the code?
> And put them into JIRA?
> Kind regards
> Matthias
>
>
>
> -----Original Message-----
> From: Elwin [mailto:[EMAIL PROTECTED]
> Sent: Fri 17.02.2006 08:51
> To: [email protected]
> Subject: extract links problem with parse-html plugin
>
> It seems that the parse-html plugin may not process many pages well, because
> I have found that the plugin can't extract all valid links in a page when I
> test it in my code.
> I guess that it may be caused by the style of an HTML page? When I "view
> source" on an HTML page I tried to parse, I saw that some elements in the
> source are split up by unnecessary spaces. However, this situation is
> quite common on the pages of large portal sites or news sites.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
