I came across a language identifier plugin on PluginCentral while trying to figure out something else. Maybe this could be a starting point for you.
http://wiki.apache.org/nutch/PluginCentral

2008/1/16 Volkan Ebil <[EMAIL PROTECTED]>:
> The URL filter will solve the URL limitation problem, thanks. Does anyone know how
> I can add an if check to the crawl process that allows only sites that
> contain special characters like "ç,ü,ğ"? Should I study the parse algorithm?
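
In case it helps, here is a rough sketch of the kind of check asked about in the quoted message: count the Turkish-specific letters in a page's parsed text and keep the page only if enough of them appear. This is not Nutch code and uses no Nutch API. The names TURKISH_CHARS, count_turkish_chars and looks_turkish, and the min_hits threshold, are made up for illustration. In Nutch the same logic would presumably have to live in a Java plugin that runs after parsing. Also note that ç, ö and ü occur in other languages too, so this is only a crude heuristic.

```python
import unicodedata

# Hypothetical helper, not part of Nutch: letters used in Turkish that are
# outside plain ASCII. Some of them (ç, ö, ü) also appear in other languages.
TURKISH_CHARS = frozenset("çğıöşüÇĞİÖŞÜ")


def count_turkish_chars(text: str) -> int:
    """Count Turkish-specific letters in text."""
    # Normalize to NFC so that a base letter followed by a combining mark
    # (for example "c" + U+0327) is compared as the single character "ç".
    normalized = unicodedata.normalize("NFC", text)
    return sum(1 for ch in normalized if ch in TURKISH_CHARS)


def looks_turkish(text: str, min_hits: int = 3) -> bool:
    """Return True if text has at least min_hits Turkish-specific letters.

    The default threshold of 3 is an arbitrary assumption, not a tuned value.
    """
    return count_turkish_chars(text) >= min_hits
```

For example, looks_turkish("çay, gül, dağ") is True, while a page of plain ASCII text is rejected. A real language identifier would likely be more reliable than a character count like this.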
