I recommend that you search the archives of both this
list and the Lucene list. You'll find that this topic
has been discussed many times, and several approaches
have been outlined.

The searchable archives are linked to from here:
http://lucene.apache.org/java/docs/mailinglists.html.

Best
Erick

On Mon, Feb 16, 2009 at 12:42 AM, revathy arun <revas...@gmail.com> wrote:

> Hi,
> I have a scenario where I need to convert PDF content to text and then
> index it at run time. I do not know what language the PDF will be in.
> In this case, what is the best solution with respect to the content
> field type in the schema that the text content would be indexed into?
>
> That is, can I use the default tokenizer for all languages? Since I
> would not know the language, I would not be able to stem the tokens.
> How would this impact search? Is there any other solution?
>
> Rgds
>
