Thank you, Sebastian. I'm able to get the HTTP headers as you explained below.
How can I index this value on Solr? What is the difference between protocol-okhttp and protocol-http?

Kind regards,
Hany Shehata
Enterprise Engineer
Green Six Sigma Certified
Solutions Architect, Marketing and Communications IT
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland

Tie line: 7148 7689 4698
External: +48 123 42 0698
Mobile: +48 723 680 278
E-mail: [email protected]

-----Original Message-----
From: Sebastian Nagel [mailto:[email protected]]
Sent: 11 March 2019 17:06
To: [email protected]
Subject: Re: Nutch and HTTP headers

Hi,

> Can Nutch index custom HTTP headers?

Nutch stores the HTTP response headers if the property `store.http.headers` is true. The headers are saved as string concatenated by `\r\n` under the key `_response.headers_` in the content metadata.

You can send the entire HTTP headers to the indexer using the plugin index-metadata and adding `_response.headers_` to `index.content.md`. It will add a field `_response.headers_` to the index:

% bin/nutch indexchecker \
    -Dplugin.includes='protocol-okhttp|parse-html|index-metadata' \
    -Dstore.http.headers=true \
    -Dindex.content.md=_response.headers_ \
    'http://localhost/'
fetching: http://localhost/
...
_response.headers_ :    HTTP/1.1 200 OK
 Date: Mon, 11 Mar 2019 16:03:41 GMT
 Server: Apache/2.4.29 (Ubuntu)
 Last-Modified: ...

But there is no standard way to pick single headers and send them to the indexer as arbitrary fields.

Best,
Sebastian

On 3/11/19 4:21 PM, [email protected] wrote:
> Hello,
>
> Can Nutch index custom HTTP headers?
>
> Kind regards,
> Hany Shehata
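Since the reply says the headers arrive as one string joined by `\r\n` in the `_response.headers_` field, and that there is no standard way to pick out single headers, one possible workaround is to split that string outside Nutch (for example in a post-processing step before or after indexing). The following is only a minimal sketch under that assumption: the function name `parse_response_headers` is hypothetical, it is plain Python and not part of Nutch or Solr, and it assumes the first line is the status line as in the indexchecker output above.

```python
from typing import Dict, List, Tuple


def parse_response_headers(raw: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a CRLF-joined HTTP response header string.

    Returns the status line (empty if none is found) and a dict mapping
    lower-cased header names to the list of their values.
    """
    # Drop empty lines and surrounding whitespace from each line.
    lines = [line.strip() for line in raw.split("\r\n") if line.strip()]

    status_line = ""
    # Assumption: a status line, if present, comes first and starts with "HTTP/".
    if lines and lines[0].startswith("HTTP/"):
        status_line = lines.pop(0)

    headers: Dict[str, List[str]] = {}
    for line in lines:
        # Split on the first colon only, since values (e.g. dates) contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line, skip it
        # Header names are case-insensitive; repeated headers keep all values.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status_line, headers


# Sample built from the header lines shown in the indexchecker output.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 11 Mar 2019 16:03:41 GMT\r\n"
    "Server: Apache/2.4.29 (Ubuntu)"
)
status, parsed = parse_response_headers(sample)
print(status)
print(parsed["server"])
```

Each entry of the resulting dict could then be mapped to its own index field by whatever step feeds Solr; how that mapping is wired up is left open here because the thread does not describe it.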

