Hi all,
I need to capture and index the response time for each URL that Nutch crawls.
I have added a responseTime field in Solr for this value.

Is there any way to do this with configuration only, or do I need to write my own
plugin to extract this key ("_rs_") from the CrawlDatum?
Any help with the steps would be appreciated.


I have set the http.store.responsetime property to true. What am I missing?



This is the property in my nutch-site.xml:

<property>
  <name>http.store.responsetime</name>
  <value>true</value>
  <description>Enables us to record the response time of the 
  host which is the time period between start connection to end 
  connection of a pages host. The response time in milliseconds
  is stored in CrawlDb in CrawlDatum's meta data under key &quot;_rs_&quot;
  </description>
</property>

After that I added the key to the property below, but when I run parsechecker I
don't see any data related to the response time in the output.

<property>
  <name>db.parsemeta.to.crawldb</name>
  <value>&quot;_rs_&quot;</value>
  <description>Comma-separated list of parse metadata keys to transfer to the
  crawldb (NUTCH-779).
  Assuming for instance that the languageidentifier plugin is enabled, setting
  the value to 'lang' will copy both the key 'lang' and its value to the
  corresponding entry in the crawldb.
  </description>
</property>