Thank you, Sadiki. The patch is working as expected.

Kind regards,
Hany Shehata
Enterprise Engineer
Green Six Sigma Certified
Solutions Architect, Marketing and Communications IT
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland
__________________________________________________________________
Tie line: 7148 7689 4698
External: +48 123 42 0698
Mobile: +48 723 680 278
E-mail: hany.n...@hsbc.com
__________________________________________________________________
Protect our environment - please only print this if you have to!

-----Original Message-----
From: Sadiki Latty [mailto:sla...@uottawa.ca]
Sent: 26 March 2019 12:05
To: user@nutch.apache.org
Subject: RE: Meta tags are duplicated

Hey,

This is caused by using the Tika plugin together with MetatagParser. I am currently using this patch to resolve the issue:

https://issues.apache.org/jira/browse/NUTCH-1559

Cheers,
Sadiki Latty
Web Developer / Développeur Web
Technologies de l’information / Information Technology
Université d'Ottawa | University of Ottawa
1 Nicholas (801)
613-562-5800 ext. 7512

-----Original Message-----
From: hany.n...@hsbc.com.INVALID [mailto:hany.n...@hsbc.com.INVALID]
Sent: March 26, 2019 4:53 AM
To: user@nutch.apache.org
Subject: Meta tags are duplicated

Hello,

I'm using Nutch 1.15 and parsing/indexing meta tags with the parse-metatags plugin. The values always come out duplicated, which forced me to change the Solr fields to multiValued. Example:

    <field name="keywords" type="string" multiValued="true" indexed="true" stored="true"/>

Moreover, I ran indexchecker and can see the duplication there as well.

Any advice on how to remove this duplication?

Kind regards,
Hany Shehata

-----------------------------------------
SAVE PAPER - THINK BEFORE YOU PRINT!

This E-mail is confidential. It may also be legally privileged. If you are not the addressee you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return E-mail.

Internet communications cannot be guaranteed to be timely secure, error or virus-free. The sender does not accept liability for any errors or omissions.

***************************************************
This message originated from the Internet. Its originator may or may not be who they claim to be and the information contained in the message and any attachments may or may not be accurate.
****************************************************
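
For readers who cannot apply the NUTCH-1559 patch right away, the symptom described in the original question (each meta tag value arriving twice) can be illustrated with a small Python sketch. This is not the patch and not part of Nutch or Solr; the `doc` dictionary, its sample values, and the `dedupe_preserving_order` helper are hypothetical and only show order-preserving de-duplication of multi-valued fields in client-side code.

```python
def dedupe_preserving_order(values):
    """Return the values with repeats removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


# Hypothetical parsed document in which every meta tag value appears twice
doc = {
    "keywords": ["nutch, solr, crawling", "nutch, solr, crawling"],
    "description": ["Example page", "Example page"],
}

# Apply the helper to every field of the hypothetical document
cleaned = {field: dedupe_preserving_order(vals) for field, vals in doc.items()}
print(cleaned)
```

A workaround of this kind only hides the duplicates after the fact; the patch referenced in the thread addresses the cause in the parser.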