[jira] [Updated] (LUCENE-9043) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9043:
-
Status: Patch Available  (was: Open)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an 
> analyzer which consists of a language-dependent tokenizer, a stemming 
> algorithm and a list of stop words.
> ---
>
> Key: LUCENE-9043
> URL: https://issues.apache.org/jira/browse/LUCENE-9043
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 8.3
> Reporter: pavithra kariyawasam
> Priority: Major
> Fix For: 5.5.6
>
> Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed on the basis of three main research efforts.
> Sinhala Analyzer, as its name implies, is a software library for analyzing 
> documents written in the Sinhala language. It is implemented by performing 
> Sinhala morphological analysis. Its three main functions are tokenizing the 
> document content precisely, removing stop words, and converting each term 
> to its base/root form accurately.
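
For readers of the archive, the three functions described above (tokenize, remove stop words, reduce to a base form) can be sketched as a small pipeline. This is a minimal illustration in plain Java, not the code in the attached SinhalaAnalyzer.java, SinhalaStemmer.java or SinhalaTokenizer.java, and it does not use the Lucene API. The class name SinhalaPipelineSketch, the minimum stem length, the single stop word and the single suffix are all assumptions chosen for the example; the real stopwords.txt and stemming rules are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative three-stage pipeline; NOT the code attached to LUCENE-9043.
final class SinhalaPipelineSketch {

    // Assumed lower bound on what may remain after suffix stripping.
    static final int MIN_STEM_LENGTH = 2;

    // Stage 1: split on anything that is not a letter, a combining mark,
    // or a zero-width (non-)joiner, which Sinhala conjuncts can contain.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\u200C\\u200D]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 3: strip the longest matching suffix, provided at least
    // MIN_STEM_LENGTH UTF-16 code units remain; otherwise keep the token.
    static String stem(String token, List<String> suffixes) {
        String best = "";
        for (String s : suffixes) {
            if (s.length() > best.length()
                    && token.endsWith(s)
                    && token.length() - s.length() >= MIN_STEM_LENGTH) {
                best = s;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words (stage 2), then stem.
    static List<String> analyze(String text, Set<String> stopWords,
                                List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue; // Stage 2: stop-word removal
            }
            terms.add(stem(token, suffixes));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical data for illustration only: one stop word ("and")
        // and one case ending. A real analyzer would load full lists.
        Set<String> stopWords = Set.of("සහ");
        List<String> suffixes = List.of("වල");
        System.out.println(analyze("ගස්වල සහ පොත්වල", stopWords, suffixes));
    }
}
```

In a Lucene contribution the same stages would normally be expressed as a Tokenizer followed by TokenFilters inside an Analyzer subclass, so that stop-word removal and stemming run per token on the stream rather than on an in-memory list.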



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9043) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9043:
-
Status: Open  (was: Patch Available)



