[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-
Status: Open  (was: Patch Available)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer 
> which consist of language dependent tokenizer, stemming algorithm and list of 
> stop words.
> ---
>
> Key: LUCENE-9044
> URL: https://issues.apache.org/jira/browse/LUCENE-9044
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
> Environment: Lucene
>Reporter: pavithra kariyawasam
>Priority: Major
>  Labels: features
> Fix For: 5.5.6
>
> Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed on the basis of three main research studies.
> Lucene does not have a component for analyzing Sinhala documents, so our 
> intention is to fill that gap with an Analyzer for Sinhala text. The Sinhala 
> Analyzer is implemented by performing Sinhala morphological analysis. Its 
> three main functions are tokenizing the document content, removing stop 
> words, and converting terms to their base/root form. These are built 
> according to the grammatical rules of Sinhala.
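
The three stages named in the description (tokenizing, stop word removal,
stemming) can be illustrated with a minimal, library-free sketch. This is not
the attached SinhalaAnalyzer, SinhalaTokenizer, or SinhalaStemmer code, which
is not shown in this thread, and it does not use the Lucene API. The class
name AnalyzerPipelineSketch, its methods, and the longest-suffix stripping
rule are all assumptions made for illustration. English placeholder words
stand in for the contents of stopwords.txt and for the Sinhala suffix rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the attached patch and not Lucene code.
public class AnalyzerPipelineSketch {

    // A separator is any run of characters that are not letters, combining
    // marks, digits, or the zero-width joiner. Keeping marks and the joiner
    // inside tokens matters for scripts that use vowel signs and conjuncts.
    private static final Pattern SEPARATORS =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\u200D]+");

    // Stage 1: split the text into tokens, dropping empty pieces.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATORS.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 3: strip the longest matching suffix, but only if at least two
    // UTF-16 code units of stem remain. A real stemmer would apply
    // language-specific morphological rules instead of this naive rule.
    public static String stem(String term, List<String> suffixes) {
        String best = "";
        for (String suffix : suffixes) {
            boolean leavesStem = term.length() - suffix.length() >= 2;
            if (term.endsWith(suffix) && leavesStem
                    && suffix.length() > best.length()) {
                best = suffix;
            }
        }
        return term.substring(0, term.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words (stage 2), then stem.
    public static List<String> analyze(String text, Set<String> stopWords,
                                       List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            terms.add(stem(token, suffixes));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Placeholder data, assumed for the example.
        Set<String> stopWords = Set.of("the", "of");
        List<String> suffixes = List.of("s", "ing");
        System.out.println(
                analyze("the reading of books, daily", stopWords, suffixes));
        // prints [read, book, daily]
    }
}
```

In a real Lucene analyzer these stages would normally be a Tokenizer followed
by a chain of TokenFilters rather than plain list operations. The sketch only
shows the order of the stages and the kind of work each one does.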



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-
Status: Patch Available  (was: Open)







[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-
Affects Version/s: (was: 8.3)







[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-
Review Patch?:   (was: Yes)







[jira] [Updated] (LUCENE-9043) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9043:
-
Status: Patch Available  (was: Open)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer 
> which consist of language dependent tokenizer, stemming algorithm and list of 
> stop words.
> ---
>
> Key: LUCENE-9043
> URL: https://issues.apache.org/jira/browse/LUCENE-9043
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 8.3
>Reporter: pavithra kariyawasam
>Priority: Major
> Fix For: 5.5.6
>
> Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed on the basis of three main research studies.
> The Sinhala Analyzer, as its name implies, is a software library for 
> analyzing documents written in the Sinhala language. It is implemented by 
> performing Sinhala morphological analysis. Its three main functions are 
> tokenizing the document content, removing stop words, and converting terms 
> to their base/root form.






[jira] [Updated] (LUCENE-9043) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9043:
-
Status: Open  (was: Patch Available)







[jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-13 Thread pavithra kariyawasam (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pavithra kariyawasam updated LUCENE-9044:
-
Issue Type: New Feature  (was: Improvement)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer 
> which consist of language dependent tokenizer, stemming algorithm and list of 
> stop words.
> ---
>
> Key: LUCENE-9044
> URL: https://issues.apache.org/jira/browse/LUCENE-9044
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 8.3
> Environment: Lucene
>Reporter: pavithra kariyawasam
>Priority: Major
>  Labels: features
> Fix For: 5.5.6
>
> Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed on the basis of three main research studies.
> Lucene does not have a component for analyzing Sinhala documents, so our 
> intention is to fill that gap with an Analyzer for Sinhala text. The Sinhala 
> Analyzer is implemented by performing Sinhala morphological analysis. Its 
> three main functions are tokenizing the document content, removing stop 
> words, and converting terms to their base/root form. These are built 
> according to the grammatical rules of Sinhala.






[jira] [Created] (LUCENE-9043) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-12 Thread pavithra kariyawasam (Jira)
pavithra kariyawasam created LUCENE-9043:


 Summary: Currently Lucene doesn't have an analyzer for Sinhala. We 
have built analyzer which consist of language dependent tokenizer, stemming 
algorithm and list of stop words.
 Key: LUCENE-9043
 URL: https://issues.apache.org/jira/browse/LUCENE-9043
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 8.3
Reporter: pavithra kariyawasam
 Fix For: 5.5.6
 Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
SinhalaTokenizer.java, stopwords.txt

This component was developed on the basis of three main research studies.

The Sinhala Analyzer, as its name implies, is a software library for 
analyzing documents written in the Sinhala language. It is implemented by 
performing Sinhala morphological analysis. Its three main functions are 
tokenizing the document content, removing stop words, and converting terms 
to their base/root form.







[jira] [Created] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built analyzer which consist of language dependent tokenizer, stemming algorithm and list of stop words.

2019-11-12 Thread pavithra kariyawasam (Jira)
pavithra kariyawasam created LUCENE-9044:


 Summary: Currently Lucene doesn't have an analyzer for Sinhala. We 
have built analyzer which consist of language dependent tokenizer, stemming 
algorithm and list of stop words.
 Key: LUCENE-9044
 URL: https://issues.apache.org/jira/browse/LUCENE-9044
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 8.3
 Environment: Lucene
Reporter: pavithra kariyawasam
 Fix For: 5.5.6
 Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
SinhalaTokenizer.java, stopwords.txt

This component was developed on the basis of three main research studies.
Lucene does not have a component for analyzing Sinhala documents, so our 
intention is to fill that gap with an Analyzer for Sinhala text. The Sinhala 
Analyzer is implemented by performing Sinhala morphological analysis. Its 
three main functions are tokenizing the document content, removing stop 
words, and converting terms to their base/root form. These are built 
according to the grammatical rules of Sinhala.


