Matthias,
maybe you want to build an advanced local search plugin? Very interesting article!
http://cis.poly.edu/tr/tr-cis-2005-03.pdf could be a rich source of information on how to start.
From my point of view, the 'geo coding' via a gazetteer isn't that difficult; it is just named entity extraction (which our ie-lib handles quite well :-] ).
The extracted results then need to be looked up to transform them into geo coordinates. All of this can be done in an index filter plugin.
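Just to make the lookup step concrete, here is a rough sketch (not existing Nutch code, only an illustration): it assumes a simple tab-separated gazetteer file, and the GeoPoint helper class is made up for the example. An index filter plugin would call lookup() for each place name the entity extractor finds and add the resulting lat/lon as extra fields to the document.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  public class Gazetteer {

      /** Simple latitude/longitude holder (hypothetical helper class). */
      public static class GeoPoint {
          public final double lat;
          public final double lon;
          public GeoPoint(double lat, double lon) { this.lat = lat; this.lon = lon; }
      }

      private final Map<String, GeoPoint> places = new HashMap<String, GeoPoint>();

      /** Loads a gazetteer file with lines of the form: name<TAB>lat<TAB>lon */
      public Gazetteer(String file) throws IOException {
          BufferedReader in = new BufferedReader(new FileReader(file));
          String line;
          while ((line = in.readLine()) != null) {
              String[] cols = line.split("\t");
              if (cols.length < 3) continue;  // skip malformed lines
              places.put(cols[0].toLowerCase(),
                         new GeoPoint(Double.parseDouble(cols[1]),
                                      Double.parseDouble(cols[2])));
          }
          in.close();
      }

      /** Returns coordinates for an extracted place name, or null if unknown. */
      public GeoPoint lookup(String placeName) {
          return places.get(placeName.toLowerCase());
      }
  }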
However, the geo coding based on incoming links is the most interesting but also the most difficult job.
The problem with Nutch is that we have no way to add metadata to the web db. This is one of the features I would really love to see.
Caching such metadata elsewhere, e.g. in a database, does not scale and slows things down very much.
I was discussing this with Doug (at the OS Wizard 2004 conference), and I clearly understand that this feature is very difficult and would dramatically slow down the webdb.
However, I strongly believe that the possibility to add metadata to the web db is a major step forward.
Besides geo coding based on the geo positions of incoming links, we could use metadata to track the update intervals of web pages for better fetch lists, and for a set of other great functionalities.
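For example, adaptive fetch scheduling on top of such per-page metadata could be as simple as something like this (pure illustration, nothing in Nutch works this way today; the class and constants are made up):

  // Halve the interval when a page keeps changing, double it when it is stable,
  // and clamp the result to a sane range. The "page changed" flag would come
  // from metadata stored per page in the web db.
  public class FetchIntervalCalculator {

      private static final long MIN_INTERVAL = 1L * 24 * 60 * 60 * 1000;   // 1 day
      private static final long MAX_INTERVAL = 30L * 24 * 60 * 60 * 1000;  // 30 days

      public static long nextInterval(long currentInterval, boolean pageChanged) {
          long next = pageChanged
              ? currentInterval / 2   // page changes often: re-fetch sooner
              : currentInterval * 2;  // page is stable: back off
          return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, next));
      }
  }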
Maybe with the MapReduce port we can introduce flexible metadata in the web db, just as the index is flexible today.
Stefan
