Madhu,
Analyzer is the magic word here.
Lucene's StandardAnalyzer has a whole grammar to split words into
tokens. There are many more analyzers, most of which are language
specific (e.g. based on the Snowball or Porter stemmers; see the contribs
or the javadoc of core).
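
To show why the choice of analyzer matters, here is a rough sketch in plain Java. It does not use Lucene at all: the class ToyAnalyzerDemo and its two methods are names I made up for illustration, and neither method reproduces the grammar of StandardAnalyzer or any other real Lucene analyzer. It only shows that the same input yields different tokens depending on the splitting rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-ins for analyzers (hypothetical, NOT Lucene classes).
class ToyAnalyzerDemo {

    // Splits on whitespace only; case and punctuation are kept as-is.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Splits on every non-letter character and lowercases each token.
    static List<String> lowercaseLetterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox, XY&Z Corp.";
        // prints [The, Quick-Brown, fox,, XY&Z, Corp.]
        System.out.println(whitespaceTokens(text));
        // prints [the, quick, brown, fox, xy, z, corp]
        System.out.println(lowercaseLetterTokens(text));
    }
}
```

A query for "quick" would match the second token list but not the first, which is why the same analyzer (or a compatible one) is normally used at index time and at search time.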
For which language do you wish to u
This depends on the analyzer you are using. You can find the standard
analyzers in org.apache.lucene.analysis. To find out what they do, I
recommend the example in section 4.2.3 of "Lucene in Action", called
"AnalyzerDemo". If you don't have the book, you can also download the
examples from http://www.manning