Hi,
I have a scenario where I need to convert PDF content to text and then
index it at run time. I do not know in advance what language the PDF will
be in. In this case, what is the best option for the field type of the
content field in the schema, where the extracted text would be indexed?

That is, can I use the default tokenizer for all languages? Since I would
not know the language, I would not be able to stem the tokens. How would
this impact search? Is there any other solution for this?

Regards
