Hi Manish,

On Sat, Dec 12, 2015 at 6:22 AM, <[email protected]> wrote:

>
> I am using Nutch 1.10 and I need to index the page locale. I can see there is
> a plugin available for identifying page language, but I need to index locale.
>
>
Well, I have a few answers.
1) Take a look at the index-geoip [0] plugin and the associated properties
in nutch-default [1]. This provides a rich metadata model for indexing all
sorts of geographical information. The downside is that it is based on the
IP address of the web server we connect to, which is not necessarily the
locale of the web page. So it is not ideal, and there is no guarantee it
will satisfy your requirements.
2) Have a look at Apache Any23 [2]. Any23 has extraction capabilities which
will pick up geo coordinate data if it is present as structured markup. You
can try out the web service at [3].
3) You can check out the Tika GeoTopicParser [4]. This is a bit bulky right
now but may also provide you with interesting results.
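
To make the caveat in 1) concrete, here is a minimal Java sketch of the only
input an IP-based lookup has to work with: the address that the page's host
resolves to. This is not code from the index-geoip plugin. The class and
method names are made up for illustration, and I am assuming the lookup is
keyed purely on the server address as described above.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: shows what an IP-based geo
// lookup can see for a given page URL.
class ServerIpSketch {

    // Resolve the host part of a page URL to the server's IP address.
    // Nothing about the page content (language, locale) is involved.
    static String serverIpFor(String pageUrl) throws UnknownHostException {
        String host = URI.create(pageUrl).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + pageUrl);
        }
        return InetAddress.getByName(host).getHostAddress();
    }

    public static void main(String[] args) throws Exception {
        // An IP literal is used as the default so no DNS lookup is needed.
        String url = args.length > 0 ? args[0] : "http://127.0.0.1/index.html";
        System.out.println(url + " -> " + serverIpFor(url));
    }
}
```

Two pages written for different locales but served from the same host give
the same address here, which is why the geo metadata describes where the
server is and not the locale of the page.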

hth
Lewis

[0] https://github.com/apache/nutch/tree/trunk/src/plugin/index-geoip
[1]
https://github.com/apache/nutch/blob/trunk/conf/nutch-default.xml#L1482-L1510
[2] http://any23.apache.org
[3] http://any23.org
[4] https://wiki.apache.org/tika/GeoTopicParser
