[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2013-01-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567639#comment-13567639
 ] 

Kai Gülzau commented on SOLR-3013:
--

http://wiki.apache.org/solr/SolrUIMA does not mention these 
analyzers/tokenizers.
Is there any documentation on how to use them?

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 4.0-ALPHA

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-04-30 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264695#comment-13264695
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Due to the refactoring needed, I think it makes sense to have this only in 4.0.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-13 Thread Lance Norskog (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228789#comment-13228789
 ] 

Lance Norskog commented on SOLR-3013:
-

Is this committed?

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-13 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228805#comment-13228805
 ] 

Erick Erickson commented on SOLR-3013:
--

Well, it's still marked Resolution: unresolved so I assume not.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-13 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228809#comment-13228809
 ] 

Yonik Seeley commented on SOLR-3013:


bq. Well, it's still marked Resolution: unresolved so I assume not.

As long as commit messages have the JIRA issue in there, you can just click on 
All to see all commit related activity for the issue.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-13 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228941#comment-13228941
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Yes, this is committed, but it is not resolved yet because it still needs to 
be adapted to 3.x as well.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219906#comment-13219906
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Thanks Steven, fixing it now.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219962#comment-13219962
 ] 

Tommaso Teofili commented on SOLR-3013:
---

it should be ok now.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219615#comment-13219615
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Now that LUCENE-3731 has been resolved I'll proceed with adding the needed 
factories for the Tokenizers in Solr.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219624#comment-13219624
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Solr factories committed in r1295330




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-02-29 Thread Steven Rowe (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219641#comment-13219641
 ] 

Steven Rowe commented on SOLR-3013:
---

Javadoc errors showed up on Jenkins; I think they are related to your commit, 
Tommaso? From 
[https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12565/consoleText]:

{noformat}
  [javadoc] Constructing Javadoc information...
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
  [javadoc] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
  [javadoc]   ^
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
  [javadoc] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
  [javadoc]   ^
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist
  [javadoc] import org.apache.lucene.analysis.uima.ae.AEProvider;
  [javadoc]  ^
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist
  [javadoc] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
  [javadoc]  ^
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol
  [javadoc] symbol  : class AEProvider
  [javadoc] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
  [javadoc]   private AEProvider aeProvider;
  [javadoc]   ^
  [javadoc] Standard Doclet version 1.6.0
  [javadoc] Building tree for all the packages and classes...
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
  [javadoc] Generating /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/org/apache/solr/util/package-summary.html...
  [javadoc] Copying file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/core/src/java/org/apache/solr/util/doc-files/min-should-match.html to directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/org/apache/solr/util/doc-files...
  [javadoc] Generating /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/serialized-form.html...
  [javadoc] Copying file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/tools/prettify/stylesheet+prettify.css to file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/docs/api/stylesheet+prettify.css...
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
  [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
  

[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-31 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196999#comment-13196999
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Considering the needed refactoring to put the tokenizers/analyzers in a 
dedicated Lucene analysis module I think the 'ae' package for creating 
AnalysisEngines should be moved to that module as well, so that there is a 
common mechanism for instantiating AnalysisEngines both in Lucene and Solr.
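
The idea of one shared mechanism for instantiating engines can be sketched 
roughly as below. This is only an illustration in plain Java: EngineRegistry, 
its loader function and the descriptor paths are hypothetical names, not the 
actual 'ae' package API, and the "engine" is a plain String standing in for a 
UIMA AnalysisEngine. The point is that both Lucene and Solr code would ask the 
same registry, so each descriptor is loaded once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical shared registry: one engine instance per descriptor path. */
final class EngineRegistry<E> {
    private final Map<String, E> engines = new ConcurrentHashMap<>();
    private final Function<String, E> loader;

    /** The loader is an assumption: whatever code builds an engine from a descriptor. */
    EngineRegistry(Function<String, E> loader) {
        this.loader = loader;
    }

    /** Returns the cached engine for the descriptor, creating it on first use. */
    E get(String descriptorPath) {
        return engines.computeIfAbsent(descriptorPath, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        EngineRegistry<String> registry = new EngineRegistry<>(path -> {
            loads.incrementAndGet();          // count how often an engine is really built
            return "engine for " + path;      // placeholder for an expensive engine
        });

        // A tokenizer and an update processor asking for the same descriptor
        // share a single instance.
        String fromTokenizer = registry.get("/uima/SomeAggregateAE.xml");
        String fromProcessor = registry.get("/uima/SomeAggregateAE.xml");

        System.out.println(fromTokenizer == fromProcessor); // true
        System.out.println(loads.get());                    // 1
    }
}
```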




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-30 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195986#comment-13195986
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Chris, Robert, thanks for your comments, I'll integrate your suggestions in a 
new patch.
I agree with the module proposal; I was going to raise it in a follow-up 
issue/discussion anyway.
Maybe I can create a new issue in Lucene for a new module under 
modules/analysis/uima containing just the Lucene UIMA tokenizers, and then 
create a new patch for this issue that contains only the factories.





[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-30 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195990#comment-13195990
 ] 

Chris Male commented on SOLR-3013:
--

+1, Go for it.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195861#comment-13195861
 ] 

Tommaso Teofili commented on SOLR-3013:
---

If no one objects I'll commit this shortly.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195906#comment-13195906
 ] 

Chris Male commented on SOLR-3013:
--

Hey Tommaso,

Did a quick glance over the patch.  Couple of things:

- Could UIMATypeAwareAnalyzerTest (and any other Analyzer/Tokenizer tests) use 
BaseTokenStreamTestCase? It has some useful utility methods to verify that your 
Analyzer works as expected
- UIMABaseAnalyzerTest could do the same, and could probably make use of 
newDirectory() etc to handle some of the boilerplate
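
To give a feel for the kind of consistency check such a base test class 
automates, here is a rough stand-alone sketch. It does not use the Lucene test 
framework; OffsetSanityCheck, Token and firstProblem are hypothetical names 
made up for this illustration, and the real BaseTokenStreamTestCase verifies 
considerably more than these three offset invariants.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for the offset checks a token stream test performs. */
final class OffsetSanityCheck {

    /** Term text plus its character offsets into the analyzed input. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns null if all offsets are consistent, otherwise a description of the first problem. */
    static String firstProblem(String text, List<Token> tokens) {
        int previousStart = 0;
        for (Token t : tokens) {
            if (t.start < 0 || t.end < t.start) {
                return "invalid offsets for '" + t.term + "'";
            }
            if (t.end > text.length()) {
                return "end offset past end of input for '" + t.term + "'";
            }
            if (t.start < previousStart) {
                return "start offset goes backwards at '" + t.term + "'";
            }
            previousStart = t.start;
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "Solr meets UIMA";
        List<Token> good = Arrays.asList(
            new Token("Solr", 0, 4), new Token("meets", 5, 10), new Token("UIMA", 11, 15));
        List<Token> backwards = Arrays.asList(
            new Token("meets", 5, 10), new Token("Solr", 0, 4));

        System.out.println(firstProblem(text, good));      // null
        System.out.println(firstProblem(text, backwards)); // start offset goes backwards at 'Solr'
    }
}
```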





[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195908#comment-13195908
 ] 

Robert Muir commented on SOLR-3013:
---

in addition to what Chris said: 

* it looks like some correctOffset() calls are missing 
(BaseTokenStreamTestCase.checkRandomData would likely detect these)
* the analysis components look as if they might be able to work with Lucene 
too... maybe we could refactor the Tokenizer/Analyzer/etc into a new 
modules/analysis/uima that depends on UIMA? The Solr uima module would then 
provide the factories to integrate them
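
For context on the first point: a tokenizer that reads from a character filter 
sees offsets in the filtered text, and those have to be mapped back to the 
original input before they are reported. The sketch below shows that mapping 
in plain Java. OffsetCorrector and addCorrection are hypothetical names for 
this illustration and are not Lucene classes; the sample input and the 
cumulative differences are worked out by hand.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of mapping offsets in filtered text back to the original text. */
final class OffsetCorrector {
    // filtered offset -> number of original characters removed before that offset
    private final TreeMap<Integer, Integer> cumulativeDiffs = new TreeMap<>();

    /** Records that, from filteredOffset onwards, cumulativeDiff characters have been removed. */
    void addCorrection(int filteredOffset, int cumulativeDiff) {
        cumulativeDiffs.put(filteredOffset, cumulativeDiff);
    }

    /** Maps an offset in the filtered text to the matching offset in the original text. */
    int correct(int filteredOffset) {
        Map.Entry<Integer, Integer> entry = cumulativeDiffs.floorEntry(filteredOffset);
        return entry == null ? filteredOffset : filteredOffset + entry.getValue();
    }

    public static void main(String[] args) {
        String original = "<b>UIMA</b> rocks";
        String filtered = "UIMA rocks";          // markup stripped by some filter

        OffsetCorrector corrector = new OffsetCorrector();
        corrector.addCorrection(0, 3);           // "<b>" (3 chars) removed before offset 0
        corrector.addCorrection(4, 7);           // "</b>" (4 more chars) removed before offset 4

        int start = filtered.indexOf("rocks");   // offset in the filtered text
        int corrected = corrector.correct(start);

        // Without correction the start offset would point into the markup.
        System.out.println(original.startsWith("rocks", corrected)); // true
    }
}
```

A tokenizer that reports the uncorrected offset would highlight the wrong span 
of the original document, which is the kind of bug randomized offset checks 
tend to expose.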




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195909#comment-13195909
 ] 

Chris Male commented on SOLR-3013:
--

{quote}
the analysis components look as if they might be able to work with lucene 
too... maybe we could refactor the
Tokenizer/Analyzer/etc in a new modules/analysis/uima that depends on uima? And 
Solr uima module would 
provide the factories to integrate
{quote}

I absolutely agree.
