Hi all. I need some help with this problem; sorry if it is a trivial thing. Some URLs that have a noindex robots meta tag are being indexed.
For example, this URL has the noindex meta tag, but for some reason it is not deleted:

https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/

    <meta name="robots" content="noindex,follow"/>

I have read that Nutch should delete such a document at indexing time when the following property is set, but that is not happening:

    <property>
      <name>indexer.delete.robots.noindex</name>
      <value>true</value>
    </property>

If I run parsechecker, the output has empty content, but the document is not deleted:

    fetching: https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
    robots.txt whitelist not configured.
    parsing: https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
    contentType: text/html
    date :    Wed May 10 14:21:36 CDT 2017
    agent :   cubbot
    type :    text/html
    type :    text
    type :    html
    title :   3
    url :     https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
    content :
    tstamp :  Wed May 10 14:21:36 CDT 2017
    domain :  uci.cu
    digest :  25ed6b1b7be4cbb69a3405f5efe2f8a2
    host :    humanos.uci.cu
    name :    3
    id :      https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
    lang :    es

Any help or suggestions would be appreciated.
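To make the expected behaviour concrete, here is a minimal sketch of the decision that indexer.delete.robots.noindex is meant to enable: when the property is true and the page's robots meta content contains a noindex directive, the document should be deleted from the index instead of being indexed. The class and method names (RobotsNoIndexCheck, hasNoIndex, action) are hypothetical; this is not Nutch's actual code, and it assumes the robots meta content (e.g. "noindex,follow") has already been extracted from the page as a string.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration of the noindex check; NOT Nutch's implementation.
public class RobotsNoIndexCheck {

    // Returns true when the comma-separated robots meta content
    // (e.g. "noindex,follow") contains a "noindex" directive.
    static boolean hasNoIndex(String robotsMetaContent) {
        if (robotsMetaContent == null) {
            return false; // no robots meta tag on the page
        }
        return Arrays.stream(robotsMetaContent.split(","))
                .map(d -> d.trim().toLowerCase(Locale.ROOT))
                .anyMatch(d -> d.equals("noindex"));
    }

    // Assumption: deleteRobotsNoIndex mirrors the value of the
    // indexer.delete.robots.noindex property.
    static String action(boolean deleteRobotsNoIndex, String robotsMetaContent) {
        if (deleteRobotsNoIndex && hasNoIndex(robotsMetaContent)) {
            return "DELETE";
        }
        return "INDEX";
    }

    public static void main(String[] args) {
        System.out.println(action(true, "noindex,follow"));  // DELETE
        System.out.println(action(false, "noindex,follow")); // INDEX
        System.out.println(action(true, "index,follow"));    // INDEX
    }
}
```

Splitting on commas and comparing whole directives keeps the check case-insensitive and tolerant of spaces such as "NOINDEX, follow"; for the page above, with the property set to true, the expected outcome would be DELETE.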

