problems with documents with noindex meta

Eyeris Rodriguez Rueda Wed, 10 May 2017 12:01:47 -0700

Hi all.
I need some help with this problem; sorry if it is trivial.
I have a small problem with some URLs that have the noindex meta tag but are 
still being indexed.


For example, this URL:
https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/

has the noindex meta tag, and for some reason it is not deleted:
<meta name="robots" content="noindex,follow"/>
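
To rule out a mistake on the page side, the presence of the directive can be 
checked independently of Nutch. The following is only a minimal sketch in 
Python using the standard library; the names RobotsMetaParser and has_noindex 
are made up for illustration and are not part of Nutch, and the sample HTML is 
a trimmed-down assumption of what the page head contains.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also end up here by default.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical, shortened version of the page head (an assumption).
sample = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
print(has_noindex(sample))
```

This only confirms that the tag is present in the HTML; it says nothing about 
how Nutch itself handles the directive at indexing time.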

I have read that Nutch should delete this document at indexing time, but that 
is not happening. This is the relevant property:

<property>
  <name>indexer.delete.robots.noindex</name>
  <value>true</value>
</property>

If I run parsechecker, the output has empty content, but the document is 
not deleted:

fetching: https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
robots.txt whitelist not configured.
parsing: https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
contentType: text/html
date :  Wed May 10 14:21:36 CDT 2017
agent : cubbot
type :  text/html
type :  text
type :  html
title : 3
url :   https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
content :       
tstamp :        Wed May 10 14:21:36 CDT 2017
domain :        uci.cu
digest :        25ed6b1b7be4cbb69a3405f5efe2f8a2
host :  humanos.uci.cu
name :  3
id :    https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
lang :  es

Any help or suggestion would be appreciated.