Hi, I am using Boilerpipe in Nutch to remove noise such as headers, footers, and advertisements, so that only the relevant content is left. However, the extracted content loses its HTML tags. I want to retain the HTML markup in the extracted content, for example block tags like div, td, and tr. Is there any way to keep the HTML tags in the content that Boilerpipe extracts in Nutch?
Thanks,
Vineet Yadav

