***I might have posted this already; my mail server is playing up. Apologies if so.***
Hi there,

I've been playing with Nutch for a few weeks now, so I'm starting to come up with something usable, but I need some suggestions.

Here's the problem: crawl the web (maybe 50 sites or so) and extract physical addresses. I want to index the physical addresses found during the crawl, so that my search results return "Company Name, State" as the title. The summary can be whatever is found on that page. (This is just an example to simplify what I want to say.)

Looking at the Nutch code, it seems that to index this I have to parse the HTML content and look for the details I need to be searchable. At the moment only what is found in the META data is indexed, but I want to extend this with custom fields, such as company name, state, etc.

What's the best way to go about this? I want to write a plugin for it. Which classes do I start with, and how should I tackle this?

Thanks,
Fadzi
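For illustration, here is a minimal sketch of the extraction step only: turning page text into "Company Name, State" strings. It deliberately uses no Nutch API, because the parse-filter and indexing-filter interfaces have changed between Nutch versions. In a plugin, logic like this would presumably be called from a custom HTML parse filter, with the result stored in the parse metadata and then added as a field by an indexing filter. The class name `AddressExtractor` and the address format (a single line of the form "Company, street, city, ST 12345", i.e. a US two-letter state code followed by a 5-digit ZIP) are hypothetical assumptions, and real pages will need far more robust rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts "Company Name, State" pairs from plain page text.
 * Assumes US-style addresses on one line: "<Company>, <street>, <city>, <ST> <ZIP>".
 */
public class AddressExtractor {

    // Group 1: company name, a run of capitalised words immediately followed by a comma.
    // Then any characters on the same line, lazily, up to the first
    // two-letter upper-case state code (group 2) followed by a 5-digit ZIP (group 3).
    private static final Pattern ADDRESS = Pattern.compile(
        "([A-Z][\\w&.'-]*(?: [A-Z][\\w&.'-]*)*),[^\\n]*?\\b([A-Z]{2})\\s+(\\d{5})\\b");

    /** Returns one "Company Name, State" string per address found, in page order. */
    public static List<String> extract(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            results.add(m.group(1) + ", " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String page = "Contact us:\nAcme Widgets Inc, 12 Main St, Springfield, IL 62701\n";
        for (String title : extract(page)) {
            System.out.println(title); // candidate title for the search result
        }
    }
}
```

Running `main` prints `Acme Widgets Inc, IL`. A regular expression keeps the sketch short; if addresses on the target sites are marked up consistently (for example, inside a specific HTML element), walking the parsed DOM would likely be more reliable than matching raw text.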
