Thanks for replying,

My requirement is this: I have some pages where no language or locale 
info is provided in the html tag or its attributes.

What I understood from your earlier reply is that the locale can be 
calculated based on IP location, but what if that is not the case? I mean, 
what if the servers are not divided geographically and are managed centrally?

Thanks
Manish Verma
AML Search
+1 669 224 9924

> On Dec 14, 2015, at 10:22 AM, Lewis John Mcgibbney 
> <[email protected]> wrote:
> 
> Hi Manish,
> 
> On Sat, Dec 12, 2015 at 6:22 AM, <[email protected]> wrote:
> 
>> 
>> I am using Nutch 1.10 and I need to index page locale. I can see there is a
>> plugin available for identifying page language, but I need to index locale.
>> 
>> 
> Well I have a few answers.
> 1) Take a look at the index-geoip [0] plugin and associated properties
> within nutch-default [1]. This will provide you with a rich metadata model
> for indexing all sorts of geographical information. The downside here,
> though, is that this is based on the IP of the webserver to which we obtain
> a WebSocket connection. It is therefore not necessarily the webpage
> locale, which is not ideal and provides no guarantee of satisfying
> your requirements.
> 2) Have a look at Apache Any23 [2]. Any23 has extraction capabilities which
> will pick up Geo coordinate data if it is structured as Markup. You can try
> out the web service at [3]
> 3) You can check out the Tika GeoTopicParser [4]. This is a bit bulky right
> now but may also provide you with interesting results.
> 
> hth
> Lewis
> 
> [0] https://github.com/apache/nutch/tree/trunk/src/plugin/index-geoip
> [1]
> https://github.com/apache/nutch/blob/trunk/conf/nutch-default.xml#L1482-L1510
> [2] http://any23.apache.org
> [3] http://any23.org
> [4] https://wiki.apache.org/tika/GeoTopicParser
