Hey,

This is caused by using the Tika plugin and MetatagParser. I am currently
using the patch attached to this issue to resolve it:

https://issues.apache.org/jira/browse/NUTCH-1559
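
If applying the patch is not an option, the repeated values could also be
collapsed before they reach the index. The following is only a minimal sketch
of that idea, assuming the values arrive as a plain String array. The class
MetaValueDedup and its dedupe method are hypothetical names for illustration;
they are not part of Nutch, Tika or Solr, and this is not what the patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Nutch or Solr.
final class MetaValueDedup {

    // Returns the distinct values in first-seen order. Values are trimmed
    // before comparison; null and empty entries are dropped.
    static List<String> dedupe(String[] values) {
        Set<String> seen = new LinkedHashSet<>();
        if (values != null) {
            for (String v : values) {
                if (v == null) {
                    continue;
                }
                String trimmed = v.trim();
                if (!trimmed.isEmpty()) {
                    seen.add(trimmed);
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```

A LinkedHashSet keeps the original order of the values, so a field that
legitimately has several distinct values is left unchanged apart from the
removed repeats.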

Cheers,

Sadiki Latty


-----Original Message-----
From: hany.n...@hsbc.com.INVALID [mailto:hany.n...@hsbc.com.INVALID] 
Sent: March 26, 2019 4:53 AM
To: user@nutch.apache.org
Subject: Meta tags are duplicated

Hello,

I'm using Nutch 1.15 and parsing/indexing meta tags using the parse-metatags
plugin.

The values always come out duplicated, which forced me to change the Solr
fields to multivalued.

Example:  <field name="keywords" type="string" multiValued="true" 
indexed="true" stored="true"/>

Moreover, I ran indexchecker and can see the duplication as well.

Any advice on how to remove this duplication?

Kind regards,
Hany Shehata