Hi,
I am using Boilerpipe in Nutch to remove noise such as headers, footers, and
advertisements, so that only the relevant content remains. However, the
extracted content loses its HTML markup. I want to retain the HTML tags in
the extracted content, for example structural tags such as div, td, and tr.
Is there any way to retain the HTML tags in the content that Boilerpipe
extracts in Nutch?

Thanks
Vineet Yadav
