Hi,

Sometimes the header of a page contains <link> tags that point to pages I'm not interested in, because they serve source data rather than content, for example http://......../somexmlsettingsdata?type=xml. This URL doesn't end with an .xml suffix, so I can't filter it out by suffix. What I want is for Nutch to take outlinks only from the page body and not from the <head>. Is this possible? (I'm using Nutch 1.9.)
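To make the desired behaviour concrete, here is a minimal Python sketch. It is not Nutch code or Nutch configuration, just an illustration of "collect links from <body> only, and drop URLs whose query says type=xml". The class name BodyLinkExtractor, the helper is_xml_by_query and the example.com URLs are hypothetical. As an additional assumption, the page is taken to have an explicit <body> tag; pages without one would yield no links from this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class BodyLinkExtractor(HTMLParser):
    """Hypothetical extractor: records href values of <a>/<area> tags
    that appear inside <body>. <link> tags in <head> are never recorded,
    because collection only starts after a <body> start tag is seen."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_body = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in ("a", "area"):
            href = dict(attrs).get("href")
            if href:  # skip missing or empty href attributes
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def is_xml_by_query(url):
    # Hypothetical rule: treat "?type=xml" as uninteresting even though
    # the path itself has no .xml suffix.
    return "xml" in parse_qs(urlparse(url).query).get("type", [])


html = """<html><head>
<link rel="alternate" href="/somexmlsettingsdata?type=xml">
</head><body>
<a href="/page1.html">Page 1</a>
<a href="/data?type=xml">Data</a>
</body></html>"""

parser = BodyLinkExtractor("http://example.com/")
parser.feed(html)
links = [u for u in parser.links if not is_xml_by_query(u)]
print(links)
```

The <head> link is ignored by position, and the body link with type=xml is dropped by inspecting the query string rather than the path suffix, so only http://example.com/page1.html remains.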
Thanks,
Shani
Intel Electronics Ltd.

