Hi,

> Can Nutch index custom HTTP headers?

Nutch stores the HTTP response headers if the property `store.http.headers` is true. The headers are saved as a single string, with the header lines joined by `\r\n`, under the key `_response.headers_` in the content metadata. You can send the complete header block to the indexer by enabling the plugin index-metadata and adding `_response.headers_` to `index.content.md`. This adds a field `_response.headers_` to the index:

    % bin/nutch indexchecker \
        -Dplugin.includes='protocol-okhttp|parse-html|index-metadata' \
        -Dstore.http.headers=true \
        -Dindex.content.md=_response.headers_ \
        'http://localhost/'
    fetching: http://localhost/
    ...
    _response.headers_ : HTTP/1.1 200 OK
    Date: Mon, 11 Mar 2019 16:03:41 GMT
    Server: Apache/2.4.29 (Ubuntu)
    Last-Modified: ...

However, there is no standard way to pick individual headers and send them to the indexer as fields of their own.

Best,
Sebastian

On 3/11/19 4:21 PM, hany.n...@hsbc.com.INVALID wrote:
> Hello,
>
> Can Nutch index custom HTTP headers?
>
> Kind regards,
> Hany Shehata
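Since Nutch offers no built-in way to index single headers, one possible workaround is to index the whole `_response.headers_` value as shown above and pick out individual headers afterwards. The snippet below is a minimal sketch that runs outside Nutch and is not part of it. `get_header` is a hypothetical helper. It assumes the header lines are separated by `\r\n` as described above and that you have already retrieved the stored value from your index. The sample value reuses the headers from the indexchecker output.

```shell
#!/bin/sh
# Sketch (not part of Nutch): extract one header from the _response.headers_
# value stored when store.http.headers=true. Header lines are joined by \r\n.
# How the value is fetched from the index is left open (assumption).

# get_header HEADER_BLOCK NAME
# Prints the value of the first header whose name matches NAME
# (case-insensitive). Prints nothing if the header is absent.
# The status line ("HTTP/1.1 200 OK") has no colon and is skipped.
get_header() {
  printf '%s\n' "$1" | tr -d '\r' | awk -v name="$2" '
    index($0, ":") > 0 &&
    tolower(substr($0, 1, index($0, ":") - 1)) == tolower(name) {
      value = substr($0, index($0, ":") + 1)  # everything after the first colon
      sub(/^[ \t]+/, "", value)               # drop leading whitespace
      print value
      exit
    }'
}

# Sample value built from the indexchecker output (hypothetical input).
headers=$(printf 'HTTP/1.1 200 OK\r\nDate: Mon, 11 Mar 2019 16:03:41 GMT\r\nServer: Apache/2.4.29 (Ubuntu)\r\n')

get_header "$headers" server   # prints: Apache/2.4.29 (Ubuntu)
get_header "$headers" Date     # prints: Mon, 11 Mar 2019 16:03:41 GMT
```

Header names are matched case-insensitively because HTTP header names are case-insensitive. Only the first colon on each line is treated as the separator, so values that contain colons, such as the time in `Date`, stay intact.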