pavithra kariyawasam created LUCENE-9044:
--------------------------------------------

             Summary: Currently Lucene doesn't have an analyzer for Sinhala. We 
have built an analyzer which consists of a language-dependent tokenizer, a 
stemming algorithm, and a list of stop words.
                 Key: LUCENE-9044
                 URL: https://issues.apache.org/jira/browse/LUCENE-9044
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 8.3
         Environment: Lucene
            Reporter: pavithra kariyawasam
             Fix For: 5.5.6
         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
SinhalaTokenizer.java, stopwords.txt

This component was developed based on three main pieces of research.
Lucene does not have a component for analyzing Sinhala documents, so our 
intention is to fill that gap with an Analyzer that can analyze them. 
The Sinhala Analyzer is implemented by performing Sinhala morphological 
analysis. Its three main functions are tokenizing the document content 
precisely, removing stop words, and converting terms to their base/root form 
accurately. These are built by considering the grammatical rules of Sinhala.
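
To illustrate the three stages described above (tokenize, remove stop words, 
stem), here is a minimal sketch in plain Java. It is not the attached 
SinhalaAnalyzer code and does not use the Lucene API. The class name 
AnalyzerSketch, the split pattern, and the strip-one-suffix rule are all 
assumptions made for illustration. The stop words and suffixes are passed in 
by the caller, and the values in main are Latin-script placeholders, not real 
Sinhala data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the attached SinhalaAnalyzer.
class AnalyzerSketch {

    // Stage 1: split on anything that is not a letter, combining mark,
    // digit, or zero-width joiner (assumed pattern; combining marks and
    // U+200D are kept so multi-code-point syllables stay in one token).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{M}\\p{N}\\u200D]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 3 helper: strip the first matching suffix, in the order given,
    // as long as a non-empty stem remains (assumed rule, not the real stemmer).
    static String stem(String token, List<String> suffixes) {
        for (String suffix : suffixes) {
            if (token.length() > suffix.length() && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Full pipeline: tokenize, drop stop words, then stem what is left.
    static List<String> analyze(String text, Set<String> stopWords,
                                List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue; // Stage 2: stop-word removal
            }
            terms.add(stem(token, suffixes));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Placeholder data; a real run would load stopwords.txt and
        // Sinhala suffix rules instead.
        Set<String> stopWords = Set.of("the");
        List<String> suffixes = List.of("ed", "s");
        System.out.println(analyze("the cats walked", stopWords, suffixes));
    }
}
```

In an actual Lucene contribution, these stages would normally be a Tokenizer 
followed by TokenFilter stages wired together inside an Analyzer subclass. The 
sketch only shows the order of operations.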



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

