Antony Bowesman wrote:
Hello,
I'm new to Lucene and wanted some advice on analyzers, stemmers and
language analysis. I've got LIA, so have read its chapters.
I am writing a framework that needs to be able to index documents from a
range of languages where just the character set of the docu
On Oct 13, 2006, at 3:42 AM, Antony Bowesman wrote:
I am writing a framework that needs to be able to index documents
from a range of languages where just the character set of the
document is known. Has anyone looked at or is using language
analysis to determine the language of a document
Hello Antony,
I have a similar problem. My collection contains mainly German
documents, but some in English and a few in French, Spanish and Latin. I
know that each language has its own stemming rules.
Language detection is not my domain. But I can imagine it could be
possible to detect the lang
Generally, stemming is not a method for index size reduction even though
that might be a side effect. It is very useful in search however...you would
generally want a search for skiing to also hit ski and skier (I can't spell
so don't get caught up on that). There are lots of those examples...if y
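
To make the skiing/ski/skier point concrete, here is a rough sketch of how stemming at both index time and query time lets the variants match each other. It is only an illustration under loud assumptions: `toy_stem`, `build_index`, `search` and the three-suffix rule list are all made up for this example, and they are neither Lucene's analyzers nor a real stemming algorithm such as Porter's.

```python
# Toy suffix stripper, for illustration only. Real stemmers use far more
# careful rules; this is NOT Lucene's implementation.
SUFFIXES = ("ing", "er", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word):
    """Lower-case a word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way the documents were stemmed, then look it up."""
    return index.get(toy_stem(query), set())


docs = {
    1: "ski holidays",
    2: "a skier fell",
    3: "skiing lessons",
    4: "snow report",
}
index = build_index(docs)
print(search(index, "skiing"))
```

The design point is that the same stemming step has to run on both the documents and the query, otherwise the stored terms and the query terms will not line up. Since the rules are language-specific, a rule list like this one would presumably have to be chosen per document language.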
This won't be *really* helpful, but I remember this being discussed at some
length a while ago. You'd be able to see some good info if you searched the
list archive, probably for language
I didn't pay much attention since this isn't something I'm concerned with
lately, so I can't be much real hel