Hi,
I want to extract location names from a Nutch query result page. For
example, I have a lot of crawled pages; I run a query in Nutch, and then
I want to extract the location names from the page that shows the query
results. How can I do this?
My plan is as follows. First, I need to know the location of the exact
query result page. Then I can use the Tika HTML parser to get all of its
content as plain text (is that right?). Second, I am writing code to
extract location names from this plain text, acting as a "second-time"
parser for location names.
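A minimal sketch of the two steps, assuming the query result page has already been fetched as an HTML string. To keep it dependency-free, a crude regex tag stripper is swapped in for Tika's HTML parser (Tika handles malformed markup and entities far more robustly). The gazetteer `KNOWN_LOCATIONS` and the class and method names are hypothetical, and dictionary lookup is only one possible way to write the "second-time" parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocationExtractor {

    // Hypothetical gazetteer; a real one would be loaded from a much larger place-name list.
    static final List<String> KNOWN_LOCATIONS =
            Arrays.asList("New York", "York", "Los Angeles", "Paris", "Beijing");

    // Step 1 (regex stand-in for Tika's HTML parser): drop <script>/<style>
    // blocks, replace the remaining tags with spaces, collapse whitespace.
    // Unlike Tika, this does not decode entities such as &amp;.
    static String htmlToText(String html) {
        String noCode = html.replaceAll("(?is)<script\\b.*?</script>|<style\\b.*?</style>", " ");
        String noTags = noCode.replaceAll("<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    // Step 2 ("second-time" parser): gazetteer lookup with whole-word matching.
    // Longer names come first in the alternation, so "New York" wins over "York"
    // at the same position. Results are distinct, in order of first appearance.
    static List<String> extractLocations(String text) {
        List<String> names = new ArrayList<>(KNOWN_LOCATIONS);
        names.sort((a, b) -> Integer.compare(b.length(), a.length()));
        List<String> quoted = new ArrayList<>();
        for (String name : names) {
            quoted.add(Pattern.quote(name));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b");
        Matcher matcher = pattern.matcher(text);
        Set<String> found = new LinkedHashSet<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String page = "<html><head><style>p{color:red}</style></head><body>"
                + "<h3>Result 1</h3><p>Flights from New York to Paris</p>"
                + "<script>var city = 'Beijing';</script>"
                + "<p>York and New York both appear</p></body></html>";
        String text = htmlToText(page);
        System.out.println(text);
        System.out.println(extractLocations(text)); // prints [New York, Paris, York]
    }
}
```

The "Beijing" inside the script block is dropped in step 1, and the standalone "York" is still found because matching resumes after each "New York" match rather than inside it.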
I am currently using Nutch 1.3, Solr, and Tika 0.9.
Any suggestions would be helpful.
Thanks
--
Cheng Li