Hi, I can't comment on the two approaches you mention, but take a look at the Tika CLI with its --language option, which reports a document's language without going through Solr.
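For what it's worth, here is a rough sketch of the "detect first, then pick a core" idea as a small client-side class. Everything in it is an assumption for illustration: the CoreRouter class, the core names and the toy detector are hypothetical, and the detector is just a function you plug in. With Tika on the classpath it could be something like text -> new LanguageIdentifier(text).getLanguage(), but I have not tested that against your setup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: chooses a Solr core name from a detected language code.
public class CoreRouter {
    private final Function<String, String> detector;   // text -> language code, e.g. "en"
    private final Map<String, String> coreByLanguage;  // language code -> core name
    private final String defaultCore;                  // used when the language is unknown

    public CoreRouter(Function<String, String> detector,
                      Map<String, String> coreByLanguage,
                      String defaultCore) {
        this.detector = detector;
        // Copy into a HashMap so lookups with a null language do not throw.
        this.coreByLanguage = new HashMap<>(coreByLanguage);
        this.defaultCore = defaultCore;
    }

    // Detect the language first, then decide which core the document goes to.
    public String coreFor(String text) {
        String language = detector.apply(text);
        return coreByLanguage.getOrDefault(language, defaultCore);
    }

    public static void main(String[] args) {
        // Toy stand-in for a real detector (assumption, for demonstration only).
        Function<String, String> toyDetector =
                text -> text.contains(" the ") ? "en" : "unknown";

        Map<String, String> cores = new HashMap<>();
        cores.put("en", "core_en"); // hypothetical core names
        cores.put("zh", "core_zh");

        CoreRouter router = new CoreRouter(toyDetector, cores, "core_default");
        System.out.println(router.coreFor("where is the library"));
    }
}
```

The client would then send the document to whichever core coreFor returns, so Solr itself stays unmodified.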
Hope it helps,
Oleg

On Mon, Feb 20, 2012 at 4:44 AM, bing <[email protected]> wrote:

> Hi, all,
>
> I am deploying a multicore Solr server running on Tomcat, where I want to
> do language detection at index/query time.
>
> Solr 3.5.0 wraps a Tika API that can do language detection. Currently, the
> default behavior of Solr 3.5.0 is that every time I index a document, Solr
> calls the Tika API at the same time to produce the language detection
> result, i.e. indexing and detection happen together. However, I would like
> to have the language detection result first and then decide which core to
> put the document in, i.e. detection happens before indexing.
>
> It seems that I need to do development in one of the following ways:
>
> 1. Modify Solr itself to change its default behavior;
> 2. Or write a Java client outside Solr and call it from the server
>    (JSP maybe) at index/query time.
>
> Can anyone who has faced a similar situation give some suggestions about
> the advantages and disadvantages of the two approaches? Any other
> alternatives? Thank you.
>
> Best
> Bing
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3759680.html
> Sent from the Solr - User mailing list archive at Nabble.com.
