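Hello,

The Base64 encoding is expected. Map the file field as an attachment using the mapper-attachments plugin, which decodes the content and extracts the text for indexing:
https://github.com/elastic/elasticsearch-mapper-attachments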
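
As a rough sketch of what such a mapping might look like, the Python below builds a mapping body that declares the file field with the plugin's "attachment" type, and also decodes a sample _content value to confirm that what arrives is just Base64-encoded HTML. The index name "website", the type name "page", and the helper function names are placeholders made up for illustration; nothing here has been tested against an MCF setup.

```python
import base64
import json

# Hypothetical names, chosen only for illustration (not from this thread).
INDEX_NAME = "website"
DOC_TYPE = "page"


def attachment_mapping(doc_type, field="file"):
    """Build a mapping body that declares `field` as an attachment.

    "attachment" is the field type registered by the mapper-attachments plugin.
    """
    return {doc_type: {"properties": {field: {"type": "attachment"}}}}


def decode_content(doc, field="file"):
    """Decode the Base64 `_content` of a document to inspect what was sent."""
    return base64.b64decode(doc[field]["_content"]).decode("utf-8")


if __name__ == "__main__":
    # Mapping body to register before any documents are indexed.
    print(json.dumps(attachment_mapping(DOC_TYPE), indent=2))

    # A made-up document shaped like the one described in the question:
    # an object called file with the HTML in a property called _content.
    html = "<html><body>Hello</body></html>"
    sample = {
        "file": {
            "_name": "index.html",
            "_content": base64.b64encode(html.encode("utf-8")).decode("ascii"),
        }
    }
    print(decode_content(sample))
```

The printed mapping would be sent to the index's put-mapping endpoint before MCF starts indexing. The type of an existing field cannot be changed in place, so an index that already holds crawled documents would most likely need to be recreated first.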
Roman

On 1 December 2015 at 16:17, Corey, Stephen <[email protected]> wrote:
> I’m putting together a proof-of-concept for crawling our website content
> with MCF, and indexing it with ES. At a basic level, everything seems to be
> working. What I’m trying to understand is that when MCF indexes web content,
> the HTML is stored inside an object called file in a property called
> _content. When this is added to the ES index, all the HTML is Base64
> encoded. I believe this is preventing ES from properly searching the field.
>
> Is this Base64 encoding to be expected, or do I need to change something?
>
> Does anyone have a walkthrough of using MCF to crawl web content, and output
> to ES? I’ve seen many guides for both systems, but never something that
> combines the two. I’d prefer to avoid using Nutch for crawling, since it
> lacks any UI for management.
>
> Stephen Corey
> Technology Consultant
> East Carolina University
> [email protected]
