Hi,

I have crawled PDFs using Nutch 1.7 and found that the "content" field has no
line breaks: all the paragraphs in each PDF are aggregated into a single
paragraph. Is it possible to crawl so that the "content" field keeps the line
breaks as they appear in the original PDF?

Please advise.

Thanks,
AL
