Hi,
Sometimes the header of a page contains <link> tags that point to pages I am
not interested in, such as source code or settings data, for example
http://......../somexmlsettingsdata?type=xml
This link does not end with an .xml suffix, so I can't filter it out by
suffix. I would like Nutch to extract links only from the body of the page
and not from the header.
Is this possible? (I'm using Nutch 1.9.)

Thanks,
Shani
