[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059139#comment-13059139 ] Michał Dybizbański commented on LUCENE-2341: Thanks :) > explore morfologik integration > -- > > Key: LUCENE-2341 > URL: https://issues.apache.org/jira/browse/LUCENE-2341 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Robert Muir >Assignee: Dawid Weiss > Fix For: 4.0 > > Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, > LUCENE-2341.diff, LUCENE-2341.patch, LUCENE-2341.patch, > morfologik-fsa-1.5.2.jar, morfologik-polish-1.5.2.jar, > morfologik-stemming-1.5.2.jar > > > Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer > available: > http://sourceforge.net/projects/morfologik/ > This works differently than LUCENE-2298, and ideally would be another option > for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057119#comment-13057119 ] Dawid Weiss commented on LUCENE-2341:
You do like those pesky .toString() calls, don't you? :) I changed the code slightly to keep character sequences only; there is no need to create new objects. I also changed the implementation a bit to iterate from the start of the returned list: theoretically, lemmas should be ordered by probability (in practice that's not the case, but it may be in the future). All looks good, committed. Thanks!
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057034#comment-13057034 ] Dawid Weiss commented on LUCENE-2341:
Thanks Michał. I'll review it later today and commit it if there are no objections. As for the deleted line -- yes, it was intentional; we'll piggyback it in this patch unless somebody fixes it earlier, no problem. Dawid
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056442#comment-13056442 ] Dawid Weiss commented on LUCENE-2341:
I've cleaned up the patch, but I'd still address the two TODOs that I left in the code:
- Lowercasing should be done not at the external filter level, but inside the filter, as a fallback IF AND ONLY IF the original sequence is not found in the dictionary. Morfeusz and Morfologik do have uppercase surface forms and treat them differently (returning uppercase lemmas, for example). A test for this would be nice as well. Examples of uppercase/mixed-case surface forms: AGD, Aaron, Poznania.
- I'd expose another attribute with morphosyntactic annotations; this information is there anyway, so why not expose it.
I attached a git diff, but it should apply with patch -p1 < ... too. Michał, will you have the time to polish this off?
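The first TODO above (lowercase only as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the committed filter code: FallbackLookup and its map-backed dictionary are hypothetical stand-ins and do not use the Morfologik API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a stemmer dictionary; NOT the Morfologik API.
class FallbackLookup {
    private final Map<String, List<String>> dictionary = new HashMap<>();

    // Registers a surface form with its lemmas (toy data for illustration).
    FallbackLookup add(String surfaceForm, String... lemmas) {
        dictionary.put(surfaceForm, Arrays.asList(lemmas));
        return this;
    }

    // Looks the token up exactly as written; lowercases it only when
    // the exact form is missing from the dictionary.
    List<String> lemmas(String token) {
        List<String> exact = dictionary.get(token);
        if (exact != null) {
            return exact;
        }
        List<String> lowered = dictionary.get(token.toLowerCase(Locale.ROOT));
        return lowered != null ? lowered : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        FallbackLookup dict = new FallbackLookup()
                .add("Aaron", "Aaron")
                .add("domu", "dom");
        System.out.println(dict.lemmas("Aaron")); // exact entry is used
        System.out.println(dict.lemmas("Domu"));  // falls back to "domu"
    }
}
```

With entries for both "Poznania" and "poznania", the capitalized token would resolve through its own entry and never reach the lowercase one, which is the behaviour the TODO asks for; unconditional lowercasing ahead of the filter would lose that distinction.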
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056430#comment-13056430 ] Dawid Weiss commented on LUCENE-2341:
Working on the integration, will provide a final patch before committing. Thanks Michał.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056261#comment-13056261 ] Robert Muir commented on LUCENE-2341:
{quote} provided each thread obtains its own TokenStreamComponents through ReusableAnalyzerBase.createComponents (is this always the case? Looking at other filters, they don't look thread-safe either...) {quote}
Yes, Analyzer/ReusableAnalyzerBase take care of this with a thread-local, as long as each thread only needs to use one token stream at a time (which is true for all Lucene consumers); see: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/analysis/Analyzer.java
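The thread-local reuse described above can be pictured with plain JDK classes. This is only a sketch of the general pattern: PerThreadComponents and Components are hypothetical names, and the code does not reproduce how Analyzer is actually implemented.

```java
// Hypothetical illustration of per-thread reuse; NOT Lucene's Analyzer code.
class PerThreadComponents {

    // Mutable, non-thread-safe state, comparable to a filter's internal buffers.
    static final class Components {
        final StringBuilder buffer = new StringBuilder();
    }

    // Each thread lazily gets its own instance and keeps reusing it.
    private static final ThreadLocal<Components> CACHE =
            ThreadLocal.withInitial(Components::new);

    static Components get() {
        return CACHE.get();
    }

    // Helper for demonstration: fetches the instance seen by a fresh thread.
    static Components fetchFromNewThread() {
        final Components[] holder = new Components[1];
        Thread worker = new Thread(() -> holder[0] = get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }
}
```

Repeated calls to get() on one thread return the same object, while another thread sees a different one, so a filter only has to be safe for single-threaded use.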
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053079#comment-13053079 ] Dawid Weiss commented on LUCENE-2341:
bq. Dawid, do you think it's reasonable to optimize further and directly use the list returned by IStemmer.lookup (instead of copying with addAll)? My concern is that (at least in the current DictionaryLookup implementation) that list seems to be shared by distinct invocations of the lookup method, which would make a given IStemmer unusable in thread-safe code.
IStemmer implementations are not thread-safe anyway, so there is no problem in reusing that list. In fact, the returned WordData objects are reused internally as well, so you can't store them either (this is done to avoid GC overhead). So yes: I missed that, but you'll need to ensure IStemmer instances are not shared. This can be done in various ways (thread-local, etc.), but I think the simplest way would be to instantiate PolishStemmer at the MorfologikFilter level. This is cheap (the dictionary is loaded once anyway). You can then create two constructors in the analyzer: one taking a PolishStemmer.DICTIONARY and one using the default (I'd suggest MORFOLOGIK).
Exposing an IStemmer constructor will do more harm than good. Thinking ahead is good, but in this case I don't think there will be many people interested in subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure directly).
A simple test case spawning 5 or 10 threads in a parallel executor and crunching stems on the same analyzer would also be nice to ensure we have everything correct wrt multithreading, but it's not that crucial if you don't have the time to write it. Thanks!
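The two suggestions above (one stemmer per filter, plus a small multi-threaded check) could look roughly like this. All classes here are hypothetical stand-ins written for illustration: ReusingStemmer only imitates a stemmer that reuses its result list, and nothing below calls Morfologik or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stemmer that reuses its result list between calls (not thread-safe).
class ReusingStemmer {
    private final List<String> shared = new ArrayList<>();

    List<String> lookup(String word) {
        shared.clear();
        shared.add(word.toLowerCase(Locale.ROOT)); // toy "lemma"
        return shared;                             // valid only until the next lookup()
    }
}

// Each filter owns a private stemmer, so no mutable state crosses threads.
class OwnStemmerFilter {
    private final ReusingStemmer stemmer = new ReusingStemmer();

    String firstStem(String word) {
        return stemmer.lookup(word).get(0);
    }
}

class ConcurrentStemCheck {
    // Runs one filter per worker thread and returns the number of wrong stems.
    static int run(int threads, int wordsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                results.add(pool.submit(() -> {
                    OwnStemmerFilter filter = new OwnStemmerFilter();
                    int errors = 0;
                    for (int i = 0; i < wordsPerThread; i++) {
                        String word = "W" + id + "x" + i;
                        String expected = word.toLowerCase(Locale.ROOT);
                        if (!filter.firstStem(word).equals(expected)) {
                            errors++;
                        }
                    }
                    return errors;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("wrong stems: " + run(10, 1000));
    }
}
```

Because every worker builds its own filter, and therefore its own stemmer, the reused list is never visible to another thread; sharing a single ReusingStemmer across the workers would remove that guarantee.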
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052483#comment-13052483 ] Dawid Weiss commented on LUCENE-2341:
I've just published morfologik 1.5.2, Michał. It comes with two dictionaries (morfologik and morfeusz) that can be used as one (with fallback for missing words) or separately, but I would stick to using morfologik as the default dictionary (possibly with an option of using morfeusz?). POS tags use a different notation in these two resources, so mixing both is probably not a good idea. Will you update the patch? Thanks.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052451#comment-13052451 ] Robert Muir commented on LUCENE-2341:
{quote} Eventually it would be probably sensible to limit the automaton for use in Lucene to store surface forms and lemmas only (no POS tags) and merge both dictionaries into a single automaton... but this can be a future improvement. {quote}
Or alternatively, you can expose the POS tags for each stem to Lucene, right? The easiest way would be to put them into TypeAttribute (a string), but you could make your own strongly-typed attribute if that's a better fit. This could be useful for downstream processing.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052423#comment-13052423 ] Dawid Weiss commented on LUCENE-2341:
One note wrt the patch: I would use an explicit pointer over the list of returned WordData entries instead of adding them to a local list:
private List stemsAcc = new ArrayList();
Right now you're shifting the internal array on each call unnecessarily (just increment an int pointer instead):
+ termAtt.setEmpty().append(stemsAcc.remove(0).getStem().toString());
getStem() should also be enough since it returns a CharSequence, right? No need for an intermediate String.
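The pointer-based iteration suggested above can be sketched as follows. StemCursor is a hypothetical helper made up for this illustration; it is not the patch's code and uses a StringBuilder where the filter would use its term attribute.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical cursor over a stemmer's result list; NOT the patch's actual code.
class StemCursor {
    private List<? extends CharSequence> stems = Collections.emptyList();
    private int next = 0;

    // Keeps a reference to the lookup result instead of copying it.
    void load(List<? extends CharSequence> lookupResult) {
        stems = lookupResult;
        next = 0;
    }

    boolean hasNext() {
        return next < stems.size();
    }

    // Advances an int index (no remove(0), so no array shifting) and
    // appends the CharSequence directly, without an intermediate String.
    void appendNextTo(StringBuilder term) {
        term.setLength(0);
        term.append(stems.get(next++));
    }
}
```

On an ArrayList, remove(0) moves every remaining element one slot to the left, whereas advancing the index costs a constant amount of work per stem.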
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052421#comment-13052421 ] Dawid Weiss commented on LUCENE-2341:
I did some analyses on both dictionaries.
{noformat}
Number of lines (distinct surface forms):
3.662.366 morfologik.utf8
5.086.141 sgjp.utf8

Distinct words (not in both):
2.729.334 unique.utf8

- upper/lower case (morfologik has upper case forms, morfeusz only lower case surface forms)
  acerze Acerze
- very rare or jargon;
  abszminka abszytowałem acetobakteria acetarsolowi niebombiasto hakatystce hakatystycznościach warzże
- differences in spelling;
  abelard abélard
- acronyms and super-short stuff
  aap aar

Distinct normalized (lowercase):
2.564.366 lowered.utf8

Most of these are very infrequent words or inflection forms. There are minor differences or missing surface forms in both dictionaries, as in here (mz - morfeusz, mk - morfologik):

mz> hakersko
mz> hakerskość
mz> hakerskości
mz> hakerskością
mz> hakerskościach
mz> hakerskościami
mz> hakerskościom
mk> hakerstw
mk> hakerstwa
...
mk> hakowałyśmy
mk> hakowań
mk> hakowaniach
mk> hakowaniami
mk> hakowaniom
mz> hakowatość
mz> hakowatości
mz> hakowatością
mz> hakowatościach
mz> hakowatościami
mz> hakowatościom
{noformat}
So... the conclusion is pretty consistent with Zipf's law: the two dictionaries have fairly different coverage, even though they're quite large. We don't have a frequency dictionary for Polish, but I assume most of these surface forms are purely theoretical and occur super-rarely in practice.
This said, I think we should use BOTH dictionaries -- after all, there's no harm done if we overdo the lemmatization process a little bit, is there? So... my proposal would be this: I'll integrate Morfeusz's dictionary in Morfologik (as an alternative dictionary one can load and use). Eventually it would be probably sensible to limit the automaton for use in Lucene to store surface forms and lemmas only (no POS tags) and merge both dictionaries into a single automaton... but this can be a future improvement.
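The comment does not say how the "not in both" counts were produced. One way to compute such a comparison is a symmetric set difference, taken once on the raw forms and once after lowercasing. The sketch below assumes that approach; the class name and the tiny word lists are made up for illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the tooling actually used for the analysis is not shown in the thread.
class SurfaceFormDiff {

    // Forms that occur in exactly one of the two sets (symmetric difference).
    static Set<String> notInBoth(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        for (String form : b) {
            if (!result.remove(form)) { // present in both: drop it
                result.add(form);       // only in b: keep it
            }
        }
        return result;
    }

    // Case-normalized copy of a set of surface forms.
    static Set<String> lowercased(Set<String> forms) {
        Set<String> result = new TreeSet<>();
        for (String form : forms) {
            result.add(form.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> mk = new TreeSet<>(Arrays.asList("acerze", "Acerze", "hakerstwa"));
        Set<String> mz = new TreeSet<>(Arrays.asList("acerze", "hakersko"));
        System.out.println(notInBoth(mk, mz));
        System.out.println(notInBoth(lowercased(mk), lowercased(mz)));
    }
}
```

Lowercasing first removes differences that are purely a matter of case, which would explain why the normalized count reported above is smaller than the raw one.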
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052380#comment-13052380 ] Dawid Weiss commented on LUCENE-2341:
I'll take a look at the differences between Morfologik and Morfeusz right now, actually. I'll post the results once I have something.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052376#comment-13052376 ] Dawid Weiss commented on LUCENE-2341:
Thanks for the contribution, Michał.
Robert: the dictionary is licensed under MPL or CC-SA (to be selected by the user depending on one's needs). Do you know which one is preferable?
Michał: there is also another (much larger) dictionary that has been released recently and comes from the Morfeusz project: http://sgjp.pl/morfeusz/dopobrania.html This dictionary is actually licensed under the BSD license, so no legal worries at all. Both dictionaries are nearly identical (they differ slightly in their convention of morphosyntactic annotations), and Morfeusz's dictionary could be compiled into an automaton for use with Morfologik. Which way should we go? What do you think?
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052251#comment-13052251 ] Robert Muir commented on LUCENE-2341:
Sorry, about my second comment: I was confusing this with the stuff you have for the morfologik jar itself, which is correct :) What I should have said was: I think we should include this information in the top-level modules/analysis/LICENSE.txt and modules/analysis/NOTICE.txt
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052246#comment-13052246 ] Robert Muir commented on LUCENE-2341:
Hi Michał, this patch looks great! I took a quick glance; here are a couple of suggestions:
* In the MorfologikFilter, I think we should implement reset(), first calling the superclass reset(), then clearing the stemsAcc list. This ensures that all of the filter's state is cleared before it is reused. Under normal operation this should not be necessary, but some consumers in Lucene (e.g. LimitTokenCountFilter, and some similar code in the Highlighter) will only partially consume the stream up to some point, then suddenly stop. By clearing this list in reset() we ensure that there is no chance any leftover stems will appear in the next stream.
* Because the data is licensed under MPL, I think we should explicitly list a hyperlink to the source code, if possible, in the NOTICE.txt. I saw you included some wording in LICENSE.txt, but I think this should only say 'XYZ data is under this license', with the actual MPL license text. In the NOTICE.txt we should link to the source code, I think... there is some more information on this under the section Category B: Reciprocal Licenses at http://www.apache.org/legal/3party.html
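The reset() suggestion above can be illustrated without Lucene. The classes below are hypothetical stand-ins that only mirror the idea (call the superclass reset() first, then clear the buffered stems); they are not TokenFilter subclasses and not the patch's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical base stage; stands in for whatever sits upstream of the filter.
class Stage {
    void reset() {
        // upstream state would be reset here
    }
}

// Hypothetical filter-like stage that buffers several stems per input token.
class StemBufferingStage extends Stage {
    private final Deque<String> pending = new ArrayDeque<>();

    // Buffers all stems found for the current token.
    void buffer(List<String> stems) {
        pending.addAll(stems);
    }

    // Returns the next buffered stem, or null when nothing is left.
    String next() {
        return pending.poll();
    }

    @Override
    void reset() {
        super.reset();   // reset upstream state first
        pending.clear(); // then drop stems left over from a partially consumed stream
    }
}
```

If a consumer stops after the first of several buffered stems and the stage is then reused, the reset() call is what keeps the remaining stems from leaking into the next stream.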