[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2014-03-24 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944886#comment-13944886
 ] 

Michal Hlavac commented on LUCENE-5356:
---

Hi Ahmet,
I don't think this is a good place to ask questions like this. Please use Lucene's 
user mailing list. Thanks.

 more generic lucene-morfologik integration
 --

 Key: LUCENE-5356
 URL: https://issues.apache.org/jira/browse/LUCENE-5356
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.6
Reporter: Michal Hlavac
Assignee: Dawid Weiss
Priority: Minor
  Labels: newbie, patch
 Fix For: 4.8, 5.0

 Attachments:  LUCENE-5356.patch, LUCENE-5356.patch, LUCENE-5356.patch


 I have a small proposal for the morfologik lucene module. The current module is 
 tightly coupled with the Polish DICTIONARY enumeration.
 But other people (like me) can build their own dictionaries into an FSA and use 
 them with lucene. 
 You can find the proposal in the attachment, along with an example of its use in 
 an analyzer (SlovakLemmaAnalyzer).
 It takes the dictionary property as a String resource from the classpath, not 
 an enumeration.
 One change is that the dictionary variable must be set in MorfologikFilterFactory 
 (there is no default value).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2014-03-22 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944156#comment-13944156
 ] 

Ahmet Arslan commented on LUCENE-5356:
--

Hi [~hlavki], is it possible to use morfologik to convert 
https://github.com/coltekin/TRmorph to Java and create a stem filter?




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2014-03-21 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943049#comment-13943049
 ] 

Michal Hlavac commented on LUCENE-5356:
---

Dawid, is it possible to move on with this issue? Thanks.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2014-03-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943362#comment-13943362
 ] 

Dawid Weiss commented on LUCENE-5356:
-

Hi Michal. Sorry, it slipped my mind somehow. I'll look at it over the weekend. 
Thanks for reminding me.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-19 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852779#comment-13852779
 ] 

Dawid Weiss commented on LUCENE-5356:
-

I looked at the patch and wanted to apply it, but there are still some 
showstoppers for me.
- property deprecation was not handled the way I mentioned in my previous 
comment
- the default mode should be backwards compatible (no custom dictionary = 
Polish dictionary), so the test should also pass without passing 'pl' as the 
dictionary. A custom-dictionary test should be added.
- javadocs and comments need to be updated to reflect this change
- MorfologikLemmatizer is not needed at all, an IStemmer is enough (this class 
is a dummy delegate now)
- this is not the same:
{code}
-  me.setContextClassLoader(PolishStemmer.class.getClassLoader());
-  this.stemmer = new PolishStemmer();
+  me.setContextClassLoader(MorfologikLemmatizer.class.getClassLoader());
+  this.stemmer = new MorfologikLemmatizer(dict);
{code}
the context class loader should be left as it was (pointing to PolishStemmer); 
if the custom dictionary is within that classloader's scope (it should be) 
it'll be loaded.
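
For reference, the save/switch/restore shape of that context class loader handling can be sketched in isolation as below. This is only an illustration under assumptions: `ContextClassLoaderSwap` and `withContextClassLoader` are hypothetical names, not part of Lucene or Morfologik, and the real filter would pass `PolishStemmer.class.getClassLoader()` as the loader and construct the stemmer inside the action.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of Lucene or Morfologik. */
final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /**
     * Runs the action with the given loader installed as the current thread's
     * context class loader, then restores whatever loader was set before.
     */
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread me = Thread.currentThread();
        ClassLoader previous = me.getContextClassLoader();
        me.setContextClassLoader(loader);
        try {
            // Anything that resolves resources through the context class
            // loader (e.g. a dictionary lookup) would run here.
            return action.get();
        } finally {
            // Always restore, even if the action throws.
            me.setContextClassLoader(previous);
        }
    }
}
```

With this shape, a custom dictionary is found as long as it is visible from the loader that is installed, so the loader itself does not have to change when the dictionary does.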

 more generic lucene-morfologik integration
 --

 Key: LUCENE-5356
 URL: https://issues.apache.org/jira/browse/LUCENE-5356
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.6
Reporter: Michal Hlavac
Assignee: Dawid Weiss
Priority: Minor
  Labels: newbie, patch
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5356.patch, LUCENE-5356.patch





[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-19 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852852#comment-13852852
 ] 

Michal Hlavac commented on LUCENE-5356:
---

OK, I'll try to change what you mentioned. 
One of the main motivations was to remove morfologik-polish from the 
dependencies. That is not backwards compatible, but it is more generic. I don't 
need the Polish dictionary when I am using, e.g., an English dictionary.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-19 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852857#comment-13852857
 ] 

Dawid Weiss commented on LUCENE-5356:
-

You don't need it but it has to be backwards compatible because others may rely 
on it. So we can't just change how it currently works. Alternatively, you can 
provide an entirely different filter factory class.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-19 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852860#comment-13852860
 ] 

Michal Hlavac commented on LUCENE-5356:
---

Couldn't we change it at least in a major version release?




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-19 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852864#comment-13852864
 ] 

Dawid Weiss commented on LUCENE-5356:
-

We could, but it seems like something that could be implemented and backported 
to the branch as well. I would do it myself, but I don't want to steal your 
thunder ;)




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-04 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838733#comment-13838733
 ] 

Michal Hlavac commented on LUCENE-5356:
---

It's similar code to:
https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-polish/src/main/java/morfologik/stemming/PolishStemmer.java

 more generic lucene-morfologik integration
 --

 Key: LUCENE-5356
 URL: https://issues.apache.org/jira/browse/LUCENE-5356
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.6
Reporter: Michal Hlavac
Assignee: Dawid Weiss
Priority: Minor
  Labels: newbie, patch
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5356.patch





[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-04 Thread Michal Hlavac (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838734#comment-13838734
 ] 

Michal Hlavac commented on LUCENE-5356:
---

Another point is that lucene-morfologik no longer needs a dependency on the 
morfologik-polish library. That change is not included in the patch.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-04 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838745#comment-13838745
 ] 

Dawid Weiss commented on LUCENE-5356:
-

I know it's similar but in PolishStemmer the reason for having multiple 
delegates was that there actually *were* multiple delegates -- the code now 
doesn't make much sense and should be fixed there too.




[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration

2013-12-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838022#comment-13838022
 ] 

Dawid Weiss commented on LUCENE-5356:
-

A quick look at the patch:
{code}
   /** Schema attribute. */
-  @Deprecated
   public static final String DICTIONARY_SCHEMA_ATTRIBUTE = "dictionary";
{code}

We should not un-deprecate this property, especially since its new meaning 
differs from what it was before. The custom dictionary should be a separate 
property with new semantics.
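
One way to read that suggestion is sketched below: the deprecated `dictionary` attribute is left alone and the custom dictionary comes in through a second attribute. The attribute name `dictionary-resource`, the class `DictionaryArgs` and the `"pl"` placeholder are assumptions made for illustration and are not taken from the patch. Falling back to Polish when the new attribute is absent is likewise an assumed behaviour.

```java
import java.util.Map;

/** Hypothetical argument handling, not the code from the patch. */
final class DictionaryArgs {
    /** Assumed name for the new, separate property. */
    static final String DICTIONARY_RESOURCE_ATTRIBUTE = "dictionary-resource";

    /** Placeholder standing in for the built-in Polish dictionary. */
    static final String DEFAULT_DICTIONARY = "pl";

    private DictionaryArgs() {}

    /**
     * Returns the classpath resource named by the new attribute, or the
     * Polish default when the attribute is missing or blank.
     */
    static String resolve(Map<String, String> args) {
        String value = args.get(DICTIONARY_RESOURCE_ATTRIBUTE);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_DICTIONARY;
        }
        return value.trim();
    }
}
```

Because the old attribute is never consulted here, existing configurations that still set it would keep whatever behaviour they had before.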

All the logic in MorfologikLemmatizer seems awkward to me:
{code}
+    @Override
+    public Iterator<WordData> iterator() {
+        if (delegate.size() == 1) {
+            return delegate.get(0).iterator();
+        } else {
+            throw new RuntimeException("No iteration over compound stemmer forms: "
+                + Arrays.toString(delegate.toArray()));
+        }
+    }
{code}

How can this ever be != 1 if the only place you add a delegate is in the 
constructor?
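
If the class only ever wraps one stemmer, the list and the size check could in principle be dropped. A minimal sketch of that shape follows, using plain `Iterable<String>` as a stand-in for Morfologik's stemmer and `WordData` types. The class name `SingleDelegateLemmatizer` is hypothetical.

```java
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical wrapper that holds exactly one delegate instead of a list. */
final class SingleDelegateLemmatizer implements Iterable<String> {
    private final Iterable<String> delegate;

    SingleDelegateLemmatizer(Iterable<String> delegate) {
        // The single delegate is fixed at construction time.
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public Iterator<String> iterator() {
        // No size check is needed: there is always exactly one delegate.
        return delegate.iterator();
    }
}
```

Once the wrapper is this thin, it adds nothing over the delegate itself, so callers could just as well hold the delegate directly.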
