[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944886#comment-13944886 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

Hi Ahmet, I don't think this is a good way to ask questions like this. Please use Lucene's user mailing list. Thanks.

more generic lucene-morfologik integration
------------------------------------------

Key: LUCENE-5356
URL: https://issues.apache.org/jira/browse/LUCENE-5356
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.6
Reporter: Michal Hlavac
Assignee: Dawid Weiss
Priority: Minor
Labels: newbie, patch
Fix For: 4.8, 5.0
Attachments: LUCENE-5356.patch, LUCENE-5356.patch, LUCENE-5356.patch

I have a small proposal for the morfologik Lucene module. The current module is tightly coupled to the Polish DICTIONARY enumeration, but other people (like me) can compile their own dictionaries to FSA and use them with Lucene. You can find the proposal in the attachment, along with an example of its use in an analyzer (SlovakLemmaAnalyzer). It takes the dictionary property as a String naming a classpath resource, not as an enumeration value. One change is that the dictionary variable must be set in MorfologikFilterFactory (there is no default value).

--
This message was sent by Atlassian JIRA (v6.2#6252)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944156#comment-13944156 ]

Ahmet Arslan commented on LUCENE-5356:
--------------------------------------

Hi [~hlavki], is it possible to use morfologik to convert https://github.com/coltekin/TRmorph to Java and create a stem filter?
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943049#comment-13943049 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

Dawid, is it possible to move on with this issue? Thanks.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943362#comment-13943362 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

Hi Michal. Sorry, it slipped my mind somehow. I'll look at it over the weekend. Thanks for reminding me.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852779#comment-13852779 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

I looked at the patch and wanted to apply it, but there are still some showstoppers for me.

- property deprecation was not handled the way I mentioned in my previous comment
- the default mode should be backwards compatible (no custom dictionary = Polish dictionary), so the test should also pass without passing 'pl' as the dictionary. A custom-dictionary test should be added.
- javadocs and comments need to be updated to reflect this change
- MorfologikLemmatizer is not needed at all, an IStemmer is enough (this class is a dummy delegate now)
- this is not the same:
{code}
- me.setContextClassLoader(PolishStemmer.class.getClassLoader());
- this.stemmer = new PolishStemmer();
+ me.setContextClassLoader(MorfologikLemmatizer.class.getClassLoader());
+ this.stemmer = new MorfologikLemmatizer(dict);
{code}
the context class loader should be left as it was (pointing to PolishStemmer); if the custom dictionary is within that classloader's scope (it should be) it'll be loaded.
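The backwards-compatibility points in the review above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch or from Lucene: the class name, the `dictionary-resource` attribute name, and the `DEFAULT_POLISH` marker are all hypothetical. The idea is that the old attribute stays deprecated, the custom dictionary gets its own attribute, and an absent attribute falls back to the Polish dictionary.

```java
import java.util.Map;

/** Hypothetical sketch; none of these names are taken from the patch. */
final class DictionaryOption {
  /** The existing attribute stays deprecated; this sketch never reads it. */
  @Deprecated
  static final String DICTIONARY_SCHEMA_ATTRIBUTE = "dictionary";

  /** Assumed name for a separate attribute naming a classpath resource. */
  static final String CUSTOM_DICTIONARY_ATTRIBUTE = "dictionary-resource";

  /** Stand-in marker meaning "use the built-in Polish dictionary". */
  static final String DEFAULT_POLISH = "<built-in Polish dictionary>";

  private DictionaryOption() {}

  /**
   * Returns the classpath resource to load, or DEFAULT_POLISH when the
   * custom attribute is absent or blank, so existing configurations that
   * pass no dictionary keep their current behaviour.
   */
  static String resolve(Map<String, String> args) {
    String resource = args.get(CUSTOM_DICTIONARY_ATTRIBUTE);
    if (resource == null || resource.trim().isEmpty()) {
      return DEFAULT_POLISH;
    }
    return resource.trim();
  }
}
```

With this shape, a test that passes no dictionary at all and a test that passes a custom resource name exercise the two branches separately, which matches the two tests asked for in the review.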
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852852#comment-13852852 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

OK, I'll try to change what you suggest. One of the basic motivations was to remove morfologik-polish from the dependencies. It's not backwards compatible, but it's more generic. I don't need the Polish dictionary when I am using, e.g., an English dictionary.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852857#comment-13852857 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

You don't need it but it has to be backwards compatible because others may rely on it. So we can't just change how it currently works. Alternatively, you can provide an entirely different filter factory class.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852860#comment-13852860 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

Can't we change it even in a major version release?
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852864#comment-13852864 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

We could, but it seems like something that could be implemented and backported to the branch as well. I would do it myself, but I don't want to steal your thunder ;)
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838733#comment-13838733 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

It's similar code to: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-polish/src/main/java/morfologik/stemming/PolishStemmer.java
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838734#comment-13838734 ]

Michal Hlavac commented on LUCENE-5356:
---------------------------------------

Another point is that lucene-morfologik no longer needs a dependency on the morfologik-polish library. That change is not included in the patch.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838745#comment-13838745 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

I know it's similar but in PolishStemmer the reason for having multiple delegates was that there actually *were* multiple delegates -- the code now doesn't make much sense and should be fixed there too.
[jira] [Commented] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838022#comment-13838022 ]

Dawid Weiss commented on LUCENE-5356:
-------------------------------------

A quick look at the patch:
{code}
   /** Schema attribute. */
-  @Deprecated
   public static final String DICTIONARY_SCHEMA_ATTRIBUTE = "dictionary";
{code}
We should not un-deprecate this property, especially since its new meaning is different from what it was before. The custom dictionary should be a separate property, with new semantics.

All the logic in MorfologikLemmatizer seems awkward to me:
{code}
+  @Override
+  public Iterator<WordData> iterator() {
+    if (delegate.size() == 1) {
+      return delegate.get(0).iterator();
+    } else {
+      throw new RuntimeException("No iteration over compound stemmer forms: "
+          + Arrays.toString(delegate.toArray()));
+    }
+  }
{code}
How can this ever be != 1 if the only place you add a delegate is in the constructor?
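To illustrate the point about the delegate list: if a wrapper only ever receives one delegate, in its constructor, it can hold that delegate in a single field and the size check disappears. The sketch below uses hypothetical stand-in types (`EntrySource`, `SingleDelegateSource`) rather than morfologik's own interfaces, so it shows the shape of the simplification and is not a drop-in replacement for MorfologikLemmatizer.

```java
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical stand-in for a stemmer whose entries can be iterated. */
interface EntrySource extends Iterable<String> {}

/**
 * Holds exactly one delegate, fixed at construction time, so iterator()
 * needs no "size() == 1" branch and no exceptional path.
 */
final class SingleDelegateSource implements EntrySource {
  private final EntrySource delegate;

  SingleDelegateSource(EntrySource delegate) {
    // Fail early instead of at iteration time.
    this.delegate = Objects.requireNonNull(delegate, "delegate");
  }

  @Override
  public Iterator<String> iterator() {
    return delegate.iterator();
  }
}
```

At that point the wrapper adds nothing over the delegate itself, so callers could arguably use the delegate directly.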