[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567639#comment-13567639 ]
Kai Gülzau commented on SOLR-3013:
http://wiki.apache.org/solr/SolrUIMA is not mentioning these analyzers/tokenizers. Is there any documentation how to use these?

Add UIMA based tokenizers / filters that can be used in the schema.xml
Key: SOLR-3013
URL: https://issues.apache.org/jira/browse/SOLR-3013
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
Labels: uima, update_request_handler
Fix For: 4.0-ALPHA
Attachments: SOLR-3013.patch

Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264695#comment-13264695 ]
Tommaso Teofili commented on SOLR-3013:
due to the refactoring needed I think it makes sense to have this just in 4.0

Fix For: 4.0
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228789#comment-13228789 ]
Lance Norskog commented on SOLR-3013:
Is this committed?

Affects Versions: 3.5
Fix For: 3.6, 4.0
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228805#comment-13228805 ]
Erick Erickson commented on SOLR-3013:
Well, it's still marked Resolution: unresolved so I assume not.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228809#comment-13228809 ]
Yonik Seeley commented on SOLR-3013:
bq. Well, it's still marked Resolution: unresolved so I assume not.

As long as commit messages have the JIRA issue in there, you can just click on All to see all commit related activity for the issue.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228941#comment-13228941 ]
Tommaso Teofili commented on SOLR-3013:
yes, this is committed but it's not resolved yet as it needs to be adapted to 3.x as well.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219906#comment-13219906 ]
Tommaso Teofili commented on SOLR-3013:
thanks Steven, now fixing
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219962#comment-13219962 ]
Tommaso Teofili commented on SOLR-3013:
it should be ok now.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219615#comment-13219615 ]
Tommaso Teofili commented on SOLR-3013:
Now that LUCENE-3731 has been resolved I'll proceed with adding the needed factories for the Tokenizers in Solr.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219624#comment-13219624 ]
Tommaso Teofili commented on SOLR-3013:
Solr factories committed in r1295330
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219641#comment-13219641 ]
Steven Rowe commented on SOLR-3013:
Javadocs errors found on Jenkins, I think related to your commit, Tommaso? - from [https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12565/consoleText]:
{noformat}
[javadoc] Constructing Javadoc information...
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
[javadoc] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javadoc] ^
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
[javadoc] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javadoc] ^
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist
[javadoc] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javadoc] ^
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist
[javadoc] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javadoc] ^
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol
[javadoc] symbol : class AEProvider
[javadoc] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javadoc] private AEProvider aeProvider;
[javadoc] ^
[javadoc] Standard Doclet version 1.6.0
[javadoc] Building tree for all the packages and classes...
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[javadoc] Generating /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/org/apache/solr/util/package-summary.html...
[javadoc] Copying file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/core/src/java/org/apache/solr/util/doc-files/min-should-match.html to directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/org/apache/solr/util/doc-files...
[javadoc] Generating /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/serialized-form.html...
[javadoc] Copying file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/tools/prettify/stylesheet+prettify.css to file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/stylesheet+prettify.css...
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196999#comment-13196999 ]
Tommaso Teofili commented on SOLR-3013:
Considering the needed refactoring to put the tokenizers/analyzers in a dedicated Lucene analysis module I think the 'ae' package for creating AnalysisEngines should be moved to that module as well, so that there is a common mechanism for instantiating AnalysisEngines both in Lucene and Solr.
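One way to picture the "common mechanism" proposed in this comment is a single shared factory that hands out one engine per descriptor, so that a Lucene tokenizer and a Solr component asking for the same descriptor receive the same instance. The following is only a minimal sketch under that assumption: `EngineStub`, `SharedEngineFactory`, and the descriptor path are hypothetical names, and nothing here reflects the actual 'ae' package API or the UIMA SDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a UIMA AnalysisEngine; the real type comes from the UIMA SDK. */
class EngineStub {
    final String descriptorPath;

    EngineStub(String descriptorPath) {
        this.descriptorPath = descriptorPath;
    }
}

/** Hypothetical shared factory: one engine per descriptor path, reused by all callers. */
class SharedEngineFactory {
    private static final SharedEngineFactory INSTANCE = new SharedEngineFactory();

    // Cache keyed by descriptor path; ConcurrentHashMap keeps lookups thread-safe.
    private final Map<String, EngineStub> cache = new ConcurrentHashMap<>();

    private SharedEngineFactory() {
    }

    static SharedEngineFactory getInstance() {
        return INSTANCE;
    }

    /** Returns the cached engine for the descriptor, creating it on first request. */
    EngineStub getEngine(String descriptorPath) {
        return cache.computeIfAbsent(descriptorPath, EngineStub::new);
    }

    public static void main(String[] args) {
        // The descriptor path below is an assumed example value.
        EngineStub fromLucene = getInstance().getEngine("/uima/ExampleAE.xml");
        EngineStub fromSolr = getInstance().getEngine("/uima/ExampleAE.xml");
        System.out.println("same instance: " + (fromLucene == fromSolr));
    }
}
```

Keeping the cache in one module would mean engine creation happens once per descriptor regardless of which side asks first; whether the real module caches engines this way is not stated in the thread.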
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195986#comment-13195986 ]
Tommaso Teofili commented on SOLR-3013:
Chris, Robert, thanks for your comments, I'll integrate your suggestions in a new patch. I agree with the module proposal, as this was part of a follow-up issue/discussion I was going to raise. Maybe I can create a new issue in Lucene for a new module under modules/analysis/uima containing just the Lucene UIMA tokenizers, and then create a new patch for this issue that contains only the factories.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195990#comment-13195990 ]
Chris Male commented on SOLR-3013:
+1, Go for it.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195861#comment-13195861 ]
Tommaso Teofili commented on SOLR-3013:
If no one objects I'll commit this shortly.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195906#comment-13195906 ]
Chris Male commented on SOLR-3013:
Hey Tommaso,

Did a quick glance over the patch. Couple of things:
- Could UIMATypeAwareAnalyzerTest (and any other Analyzer/Tokenizer tests) use BaseTokenStreamTestCase? It has some useful utility methods to verify that your Analyzer works as expected
- UIMABaseAnalyzerTest could do the same, and could probably make use of newDirectory() etc to handle some of the boilerplate
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195908#comment-13195908 ]
Robert Muir commented on SOLR-3013:
in addition to what Chris said:
* it looks like some correctOffset() etc are missing (these would be detected by BaseTokenStreamTestCase.checkRandomData likely)
* the analysis components look as if they might be able to work with lucene too... maybe we could refactor the Tokenizer/Analyzer/etc in a new modules/analysis/uima that depends on uima? And Solr uima module would provide the factories to integrate
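The correctOffset() point can be illustrated without Lucene. When the text a tokenizer reads has already had characters removed upstream, token offsets computed on the filtered text no longer line up with the original document, so they have to be mapped back before being reported. The sketch below is a simplified, hypothetical model of that mapping; `OffsetCorrectionSketch` and its methods are illustrative names, not Lucene's implementation, which routes the correction through the Tokenizer and its input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of offset correction; not Lucene code. */
class OffsetCorrectionSketch {

    // Positions (in the original text) of every character that was removed, in ascending order.
    private final List<Integer> removedAt = new ArrayList<>();

    /** Removes every occurrence of the unwanted character and records where it was. */
    String stripChar(String original, char unwanted) {
        StringBuilder filtered = new StringBuilder();
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (c == unwanted) {
                removedAt.add(i);
            } else {
                filtered.append(c);
            }
        }
        return filtered.toString();
    }

    /** Maps an offset in the filtered text back to the matching offset in the original text. */
    int correctOffset(int filteredOffset) {
        int corrected = filteredOffset;
        for (int position : removedAt) {
            if (position <= corrected) {
                corrected++; // a removed character sits at or before this point, so shift right
            } else {
                break;
            }
        }
        return corrected;
    }

    public static void main(String[] args) {
        OffsetCorrectionSketch sketch = new OffsetCorrectionSketch();
        String original = "a--bc";
        String filtered = sketch.stripChar(original, '-');
        // The token "bc" spans [1, 3) in the filtered text; report it against the original.
        int start = sketch.correctOffset(1);
        int end = sketch.correctOffset(3);
        System.out.println(filtered + " -> " + original.substring(start, end));
    }
}
```

A tokenizer that reported the uncorrected span [1, 3) here would point at "--" in the original text instead of "bc", which is the kind of mismatch a randomized offset check is meant to surface.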
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195909#comment-13195909 ]
Chris Male commented on SOLR-3013:
{quote}
the analysis components look as if they might be able to work with lucene too... maybe we could refactor the Tokenizer/Analyzer/etc in a new modules/analysis/uima that depends on uima? And Solr uima module would provide the factories to integrate
{quote}
I absolutely agree.