Thank you, Sadiki.

The patch is working as expected.

Kind regards, 
Hany Shehata
Enterprise Engineer
Green Six Sigma Certified
Solutions Architect, Marketing and Communications IT 
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland
__________________________________________________________________ 

Tie line: 7148 7689 4698 
External: +48 123 42 0698 
Mobile: +48 723 680 278 
E-mail: hany.n...@hsbc.com 
__________________________________________________________________ 
Protect our environment - please only print this if you have to!

-----Original Message-----
From: Sadiki Latty [mailto:sla...@uottawa.ca] 
Sent: 26 March 2019 12:05
To: user@nutch.apache.org
Subject: RE: Meta tags are duplicated

Hey,

This is caused by using the Tika plugin together with MetatagParser. I am 
currently using the patch from this issue to resolve it:

https://issues.apache.org/jira/browse/NUTCH-1559
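
For illustration only, and not taken from the patch: a minimal Java sketch of the general idea of collapsing repeated meta tag values before they are indexed. It assumes the duplication means the same string is recorded more than once for one tag. The class name MetatagDedupSketch and the method dedupe are hypothetical and are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch or of NUTCH-1559.
public class MetatagDedupSketch {

    // Returns the values with exact duplicates removed, keeping first-seen order.
    static List<String> dedupe(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // Assumption: the same "keywords" value was recorded twice for one page.
        List<String> keywords = Arrays.asList("crawler", "crawler");
        System.out.println(dedupe(keywords)); // prints [crawler]
    }
}
```

With a single value left per tag, the corresponding Solr field would presumably no longer need multiValued="true", though that depends on the rest of the schema.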

Cheers,

Sadiki Latty
Web Developer/ Développeur Web
Technologies de l’information / Information Technology Université d'Ottawa | 
University of Ottawa
1 Nicholas (801)
613-562-5800 ext. 7512


-----Original Message-----
From: hany.n...@hsbc.com.INVALID [mailto:hany.n...@hsbc.com.INVALID] 
Sent: March 26, 2019 4:53 AM
To: user@nutch.apache.org
Subject: Meta tags are duplicated

Hello....

I'm using Nutch 1.15 and parsing/indexing meta tags using the parse-metatags plugin.

The values always come out duplicated, which forced me to change the Solr 
fields to multiValued.

Example:  <field name="keywords" type="string" multiValued="true" indexed="true" stored="true"/>

Moreover, I ran indexchecker and can see the duplication as well.

Any advice on how to remove this duplication?

Kind regards,
Hany Shehata



-----------------------------------------
SAVE PAPER - THINK BEFORE YOU PRINT!

This E-mail is confidential.  

It may also be legally privileged. If you are not the addressee you may not 
copy, forward, disclose or use any part of it. If you have received this 
message in error, please delete it and all copies from your system and notify 
the sender immediately by return E-mail.

Internet communications cannot be guaranteed to be timely, secure, error-free or 
virus-free.
The sender does not accept liability for any errors or omissions.


***************************************************
This message originated from the Internet. Its originator
may or may not be who they claim to be and the information
contained in the message and any attachments may or may
not be accurate.
****************************************************

 

