Hi. I am new to Nutch 1.9 (local mode) and Solr 4.10, and I have a problem with how the crawler identifies the titles of resources. In many cases Nutch does not detect the title of a web page even though the page has one. I ran parsechecker on one such page and Nutch did not detect any title.
This URL is an example. The page has a title, but Nutch does not detect it:

http://www.ecured.cu/index.php/Especial:CambiosEnEnlazadas/EcuRed:Enciclopedia_cubana

For this URL, this is the parsechecker output:

------------------------------------------------------------------------------
[root@cidicubanutch2 generales]# bin/nutch parsechecker http://www.ecured.cu/index.php/Especial:CambiosEnEnlazadas/EcuRed:Enciclopedia_cubana
fetching: http://www.ecured.cu/index.php/Especial:CambiosEnEnlazadas/EcuRed:Enciclopedia_cubana
parsing: http://www.ecured.cu/index.php/Especial:CambiosEnEnlazadas/EcuRed:Enciclopedia_cubana
contentType: text/html
signature: 32541e28e020f7c290735bfe2cc4c7b3
--------- Url ---------------
http://www.ecured.cu/index.php/Especial:CambiosEnEnlazadas/EcuRed:Enciclopedia_cubana
--------- ParseData ---------
Version: 5
Status: success(1,0)
Title:
Outlinks: 0
Content Metadata: Content-Language=es Content-Length=9504 Expires=Sat, 23 May 2015 17:35:15 GMT Connection=close X-Cache-Lookup=MISS from www.ecured.cu:80 Server=Apache X-Cache=MISS from www.ecured.cu X-Content-Type-Options=nosniff Cache-Control=s-maxage=10, must-revalidate, max-age=0, max-age=2592000 X-Frame-Options=DENY Date=Thu, 23 Apr 2015 17:35:15 GMT Vary=Accept-Encoding,Cookie,User-Agent nutch.crawl.score=0.0 Content-Encoding=gzip Via=1.0 www.ecured.cu (squid/3.1.10) Content-Type=text/html; charset=UTF-8
Parse Metadata: Custom-Tag=h1- Cambios relacionados con «EcuRed:Enciclopedia cubana» Custom-Tag=strong-(+1940) Custom-Tag=strong-(+4392) Custom-Tag=strong-50 Custom-Tag=strong-7 CharEncodingForConversion=utf-8 OriginalCharEncoding=utf-8 language=lt
[root@cidicubanutch2 generales]#
------------------------------------------------------------------------------

Note that the Title field is empty and no outlinks were found, although the parse status is success.

I have set the "title" field to required="true" in the schema.xml of both Nutch and Solr to prevent resources without a title from being indexed. I hope someone can help me.
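
To make the schema setup concrete, a required title field in schema.xml would look roughly like the line below. Only the field name and required="true" come from the configuration described above; the type, stored and indexed attributes are assumptions based on a typical Nutch/Solr schema and may differ in the actual file.

```xml
<!-- Sketch only: type, stored and indexed are assumed values, not copied from the actual schema.xml -->
<field name="title" type="text_general" stored="true" indexed="true" required="true"/>
```

With a definition like this, Solr rejects any document that arrives without a title value, so pages whose title Nutch fails to extract are not indexed.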

