Hi,

    I want to extract location names from a Nutch query result page. For
example, I have a lot of crawled pages; I run a query in Nutch, and then I
want to extract the location names from the page that shows the query
results. How can I do this?
    My plan so far: first, I need to find the location of the exact query
result page. Then I can use the Tika HTML parser to get all the content of
that page as plain text. (Is that right?) Second, I am writing code to
extract location names from this plain text, which would act as a
"second-pass" parser for location names.
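
    To make the second step concrete, here is a minimal sketch of the kind of
"second-pass" extractor I have in mind. It assumes the page has already been
reduced to plain text and that I have a small hand-made list of place names
(a gazetteer). The class name LocationExtractor and the sample names are
only illustrative; nothing here comes from Nutch or Tika.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds known place names in plain text.
class LocationExtractor {
    private final Pattern pattern;

    LocationExtractor(List<String> gazetteer) {
        if (gazetteer.isEmpty()) {
            throw new IllegalArgumentException("gazetteer must not be empty");
        }
        // Try longer names first so "New York City" wins over "New York".
        List<String> sorted = new ArrayList<>(gazetteer);
        sorted.sort((a, b) -> b.length() - a.length());

        // Build one alternation of literal (quoted) names.
        StringBuilder alternation = new StringBuilder();
        for (String name : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name));
        }
        // Word boundaries keep "Paris" from matching inside "Parisian".
        pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                                  Pattern.CASE_INSENSITIVE);
    }

    // Returns each distinct matched name, in order of first appearance.
    List<String> extract(String plainText) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = pattern.matcher(plainText);
        while (m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }
}
```

With a gazetteer of "New York", "San Francisco" and "Paris", calling
extract("Offices in Paris and New York; Paris is the HQ.") should give
[Paris, New York]. A fixed list like this will miss any place that is not in
it, so it is only a starting point.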

    I am currently using Nutch 1.3, Solr, and Tika 0.9.

    Any suggestions would be helpful.

Thanks

-- 
Cheng Li
