[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428583#comment-13428583 ]

Chris Male commented on LUCENE-3616:

With all the various typed XYZField implementations we have now, what do we see as the role of Field? Is it just serving as a parent class to the implementations, or do we expect users will be using it too?

Illegal Field Configurations should throw exceptions

Key: LUCENE-3616
URL: https://issues.apache.org/jira/browse/LUCENE-3616
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
Attachments: LUCENE-3616.patch

When working on LUCENE-3615, I came across:

{quote}
java.lang.IllegalArgumentException: field field is stored but does not have binaryValue, stringValue nor numericValue
at org.apache.lucene.index.codecs.DefaultStoredFieldsWriter.writeField(DefaultStoredFieldsWriter.java:177)
at org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:119)
at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:295)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:255)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1480)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1242)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1223)
at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:525)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:168)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
{quote}

which is due to using TextField.TYPE_STORED with a TokenStream. Since this is an illegal combination, we should throw an exception upon construction of the Field, not later when actually trying to do the indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
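
The fail-fast behaviour requested in the LUCENE-3616 description above can be sketched in plain Java. This is only an illustration under stated assumptions: `SketchField` and `SketchFieldType` are hypothetical stand-ins, not Lucene's `Field` or `FieldType`, and the boolean `hasTokenStream` merely stands in for holding a real TokenStream. The point is that a stored type combined with a token-stream-only value is rejected in the constructor instead of surfacing later during indexing.

```java
/** Hypothetical stand-in for a field type; only the "stored" flag matters here. */
final class SketchFieldType {
    final boolean stored;

    SketchFieldType(boolean stored) {
        this.stored = stored;
    }
}

/** Hypothetical field that validates its configuration when it is constructed. */
final class SketchField {
    final String name;
    final SketchFieldType type;
    final String stringValue;     // null when the value comes only from a token stream
    final boolean hasTokenStream; // stands in for holding a real TokenStream

    SketchField(String name, SketchFieldType type, String stringValue, boolean hasTokenStream) {
        if (name == null || type == null) {
            throw new IllegalArgumentException("name and type must not be null");
        }
        if (stringValue == null && !hasTokenStream) {
            throw new IllegalArgumentException("field \"" + name + "\" has no value");
        }
        // A token stream cannot be stored, so a stored field needs a concrete value.
        if (type.stored && stringValue == null) {
            throw new IllegalArgumentException(
                "field \"" + name + "\" is stored but has only a token stream value");
        }
        this.name = name;
        this.type = type;
        this.stringValue = stringValue;
        this.hasTokenStream = hasTokenStream;
    }

    /** Convenience for callers that only want to know whether a combination is accepted. */
    static boolean isLegal(String name, SketchFieldType type, String stringValue, boolean hasTokenStream) {
        try {
            new SketchField(name, type, stringValue, hasTokenStream);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this shape the illegal combination fails at the line that creates the field, so the stack trace points at the caller's mistake rather than at the stored-fields writer.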
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428591#comment-13428591 ]

Robert Muir commented on LUCENE-3616:

Chris: that's a good point. The current design seems to be that Field can do everything and the others are simply sugar on top.

Personally I think this is confusing and error-prone. That's why I wrote such a huge test, but it's silly. In my opinion if I have a ShortDocValuesField, it shouldn't have a setReader method :)
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428594#comment-13428594 ]

Chris Male commented on LUCENE-3616:

bq. In my opinion if I have a ShortDocValuesField, it shouldn't have a setReader method

Agreed. The setABC() methods are extremely confusing and add another level of validation (using your example, we have to validate that you're not setting a Reader on a NumericField).

Perhaps we can re-arrange this a little. If we genuinely feel that there are use cases out there that we haven't covered with the typed impls and that we don't want to cover, then why not make a GenericField or something, which is abstract and accepts just name, FieldType and maybe an Object value. We can then emphasize in the documentation that it is expert only, should only be subclassed in the extremely rare situations where our typed impls are insufficient, and won't be validated, so buyer-beware kind of thing.

We can then gut Field down to a very simple abstract class / interface, and promote our typed impls to being 1st class and the recommended entry points for users.

Of course, if we feel we have provided adequate support through the typed impls, then we can skip straight to the gutting.
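
One way to read the proposal in this comment is sketched below in plain Java. All class names (`SketchBaseField`, `SketchShortField`, `SketchTextField`) are hypothetical and are not Lucene classes; the sketch only shows a gutted abstract base with typed subclasses that expose just the setter that makes sense for them, so a short-valued field has no `setReader` to misuse.

```java
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical gutted base class: only what every field shares. */
abstract class SketchBaseField {
    private final String name;

    protected SketchBaseField(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.name = name;
    }

    final String name() {
        return name;
    }

    /** The value as an indexer would see it; each typed subclass decides its type. */
    abstract Object value();
}

/** Typed field holding a short; deliberately has no Reader-related methods. */
final class SketchShortField extends SketchBaseField {
    private short value;

    SketchShortField(String name, short value) {
        super(name);
        this.value = value;
    }

    void setShortValue(short value) {
        this.value = value;
    }

    @Override
    Object value() {
        return value; // autoboxed to java.lang.Short
    }
}

/** Typed field holding a Reader; deliberately has no numeric setters. */
final class SketchTextField extends SketchBaseField {
    private Reader reader;

    SketchTextField(String name, Reader reader) {
        super(name);
        setReader(reader);
    }

    void setReader(Reader reader) {
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        this.reader = reader;
    }

    @Override
    Object value() {
        return reader;
    }
}
```

Because the wrong setter does not exist on the typed class, the mistake becomes a compile error and no runtime validation is needed for it.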
[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ibrahim updated LUCENE-4216:

Attachment: myApp.zip

Please find the attached test case.

Token X exceeds length of provided text sized X

Key: LUCENE-4216
URL: https://issues.apache.org/jira/browse/LUCENE-4216
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 4.0-ALPHA
Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
Attachments: myApp.zip

I'm facing this exception:

org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at classes.myApp$16$1.run(myApp.java:1508)

I could not find anything wrong in my code after I started migrating from Lucene 3.6 to 4.0. I found similar issues with HTMLStripCharFilter, e.g. LUCENE-3690 and LUCENE-2208, but none with SimpleHTMLFormatter, so I'm raising this here to see whether there is really a bug or something is wrong in my code with v4.

The code that I'm using:

final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
...
final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, Line, analyzer);
final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get(Line), false, 10);

Please note that this is working fine with v3.6.

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4216.

Resolution: Not A Problem

The bugs are in your custom tokenizer. I would recommend looking at lucene-test-framework.jar (especially BaseTokenStreamTestCase) and writing some tests for it.

Problems I see at a glance:
* it doesn't implement reset(), so it's not safe at all. This is the main reason it doesn't work for you in 4.0, because Analysis reuse is mandatory and it doesn't reset its state.
* it doesn't implement end(), so multi-valued fields won't work
* it doesn't call correctOffset(), so charfilters won't work
* it removes tashkeel in the tokenizer itself without adjusting offsets, that's unsafe.

Really you can fix this easily, by:
1. instead of extending Tokenizer, extend CharTokenizer and implement isTokenChar via isArabicChar. Or just use StandardTokenizer, it tokenizes Arabic just fine.
2. instead of removing tashkeel in your tokenizer itself with your pattern ([\u0650\u064D\u064E\u064B\u064F\u064C\u0652\u0651]), just pass that pattern to PatternReplaceFilter.
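
The last point in the resolution comment (strip tashkeel after tokenizing, not inside the tokenizer) can be illustrated without any Lucene classes. The sketch below is an assumption-laden toy, not the Lucene analysis API: `OffsetSketch` and its `Token` class are hypothetical, and whitespace splitting stands in for a real tokenizer. It records start and end offsets against the original text first and only then normalizes the term, so a highlighter slicing the original text with those offsets would still cut at the right places.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class OffsetSketch {
    /** The tashkeel character class quoted in the comment above. */
    static final Pattern TASHKEEL =
        Pattern.compile("[\u0650\u064D\u064E\u064B\u064F\u064C\u0652\u0651]");

    /** A token: normalized term text plus offsets into the ORIGINAL text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace, keeps the original offsets, then strips tashkeel per token. */
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip leading whitespace.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String raw = text.substring(start, i);
                // Normalization changes the term only; the offsets stay untouched.
                out.add(new Token(TASHKEEL.matcher(raw).replaceAll(""), start, i));
            }
        }
        return out;
    }
}
```

Stripping the marks from the whole text before splitting would instead produce offsets into the shortened string, which no longer line up with the stored text that the highlighter receives.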
[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang updated LUCENE-4283:

Attachment: LUCENE-4283-small-interval-partially.patch
            LUCENE-4283-small-interval-fully.patch

Two patches: tidied up some code, and split out the partial decoding to see how much improvement comes from the smaller interval alone. *-fully.patch will refill a whole block of docs when docBuffer is used up; *-partially.patch will only decode an interval of the block when necessary.

Support more frequent skip with Block Postings Format

Key: LUCENE-4283
URL: https://issues.apache.org/jira/browse/LUCENE-4283
Project: Lucene - Core
Issue Type: Improvement
Reporter: Han Jiang
Priority: Minor
Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch

This change works on the new bulk branch.

Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level 0 skip point, we'll have to decode a whole block to read doc/freq data. Also, a higher level skip list will be created only for those terms with df > blockSize^k, which means that for most terms, skipping will just be a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will be a bottleneck.

For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
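
To make the skipInterval trade-off in the description above concrete, here is a toy single-level model in plain Java. It is an assumption-heavy sketch, not the BlockPostingsFormat code: `SkipSketch` and `scannedToAdvance` are hypothetical names, the postings are an uncompressed sorted `int[]`, and the cost of reading the skip entries themselves is ignored. It only counts how many postings are scanned linearly inside the chunk that contains the target.

```java
final class SkipSketch {
    /**
     * Counts the postings scanned to advance to the first doc id >= target,
     * assuming a skip entry exists for every chunk of skipInterval postings
     * and records the last doc id of that chunk.
     */
    static int scannedToAdvance(int[] docs, int skipInterval, int target) {
        if (skipInterval <= 0) {
            throw new IllegalArgumentException("skipInterval must be positive");
        }
        int scanned = 0;
        for (int start = 0; start < docs.length; start += skipInterval) {
            int end = Math.min(start + skipInterval, docs.length);
            // The skip entry tells us this whole chunk lies before the target.
            if (docs[end - 1] < target) {
                continue;
            }
            // The target, if present, is in this chunk: scan it linearly.
            for (int i = start; i < end; i++) {
                scanned++;
                if (docs[i] >= target) {
                    return scanned;
                }
            }
        }
        return scanned; // target is beyond the last posting
    }
}
```

For doc ids 0..1023 and target 300, a skip interval of 128 scans the 45 postings from 256 to 300, while an interval of 32 scans the 13 postings from 288 to 300. The model says nothing about the extra skip data a smaller interval has to write and read.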
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428675#comment-13428675 ]

Steven Rowe commented on SOLR-1725:

bq. Maybe something mentioned here http://stackoverflow.com/questions/6558055/is-osgi-fundamentally-incompatible-with-jsr-223-scripting-language-discovery is relevant?

Thanks for looking Erik, but I'm not sure if it's relevant.

I get an NPE on {{lucene.zones.apache.org}} using [this program|http://pastebin.com/iQEAwE3A] with the OpenJDK VM at {{/usr/local/openjdk6/}} (the default javac/java on that box). Jenkins Java 6 jobs running Ant use this same VM (via {{/home/hudson/tools/java/latest1.6 -> openjdk6 -> /usr/local/openjdk6}}).

I don't get it. How do these tests pass under Ant? I can't see any obviously named jars under {{solr/lib/}} that would provide the javascript engine...

Script based UpdateRequestProcessorFactory

Key: SOLR-1725
URL: https://issues.apache.org/jira/browse/SOLR-1725
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
Labels: UpdateProcessor
Fix For: 4.0
Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch

A script based UpdateRequestProcessorFactory (uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.

The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).

The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory.

Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic.

The following variables are defined as global variables for each script:
* {{req}} - The SolrQueryRequest
* {{rsp}} - The SolrQueryResponse
* {{logger}} - A logger that can be used for logging purposes in the script
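
The ordering and language-resolution rules in the description above can be sketched as follows. This is a hedged illustration, not the patch's code: `ScriptOrderSketch` and its methods are hypothetical. The only real API used is `javax.script.ScriptEngineManager.getEngineByExtension`, which returns `null` when the running JVM has no engine registered for that extension; an unchecked `null` of that kind is one plausible source of an NPE, though the comment above does not say where its NPE came from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.script.ScriptEngineManager;

final class ScriptOrderSketch {
    /** Parses the comma-separated "scripts" parameter and sorts the names lexicographically. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<>();
        for (String part : scriptsParam.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns the extension used to pick the script language; an extension is mandatory. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    /** True if this JVM provides a script engine for the extension (the lookup yields null otherwise). */
    static boolean engineAvailable(String extension) {
        return new ScriptEngineManager().getEngineByExtension(extension) != null;
    }
}
```

Checking `engineAvailable("js")` up front gives a clear configuration error instead of a failure deep inside request processing when the JVM ships no JavaScript engine.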
[jira] [Created] (SOLR-3708) Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.
Mark Miller created SOLR-3708:

Summary: Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.
Key: SOLR-3708
URL: https://issues.apache.org/jira/browse/SOLR-3708
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
[jira] [Updated] (SOLR-3708) Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.
[ https://issues.apache.org/jira/browse/SOLR-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-3708:

Attachment: SOLR-3708.patch

Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.

Key: SOLR-3708
URL: https://issues.apache.org/jira/browse/SOLR-3708
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
Attachments: SOLR-3708.patch
[jira] [Created] (SOLR-3709) Cache the url list created from the ClusterState in CloudSolrServer on each request.
Mark Miller created SOLR-3709:

Summary: Cache the url list created from the ClusterState in CloudSolrServer on each request.
Key: SOLR-3709
URL: https://issues.apache.org/jira/browse/SOLR-3709
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
[jira] [Commented] (SOLR-3709) Cache the url list created from the ClusterState in CloudSolrServer on each request.
[ https://issues.apache.org/jira/browse/SOLR-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428678#comment-13428678 ]

Mark Miller commented on SOLR-3709:

Part of patch in SOLR-3708

Cache the url list created from the ClusterState in CloudSolrServer on each request.

Key: SOLR-3709
URL: https://issues.apache.org/jira/browse/SOLR-3709
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
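
The idea behind SOLR-3708 and SOLR-3709 (use a hash code of the cluster state to decide whether a derived URL list can be reused) might look roughly like the plain-Java sketch below. Everything here is an assumption made for illustration: `UrlListCacheSketch` is hypothetical, a `List<String>` of live nodes stands in for ClusterState, and the URL format is invented. It is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlListCacheSketch {
    private int cachedStateHash;
    private List<String> cachedUrls; // null until the first build
    int rebuilds = 0;                // exposed only so the behaviour can be observed

    /** Returns the URL list, rebuilding it only when the state's hash code has changed. */
    List<String> urlsFor(List<String> liveNodes) {
        int hash = liveNodes.hashCode();
        if (cachedUrls == null || hash != cachedStateHash) {
            List<String> urls = new ArrayList<>();
            for (String node : liveNodes) {
                urls.add("http://" + node + "/solr"); // invented URL shape
            }
            cachedUrls = urls;
            cachedStateHash = hash;
            rebuilds++;
        }
        return cachedUrls;
    }
}
```

A hash comparison is cheap on every request, but two different states can in principle share a hash code, so a real implementation would have to decide whether that risk is acceptable or compare the states as well.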
[jira] [Created] (SOLR-3710) Change CloudSolrServer so that update requests are only sent to leaders by default.
Mark Miller created SOLR-3710:

Summary: Change CloudSolrServer so that update requests are only sent to leaders by default.
Key: SOLR-3710
URL: https://issues.apache.org/jira/browse/SOLR-3710
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428681#comment-13428681 ]

Jan Høydahl commented on LUCENE-3616:

The commit yesterday (1369196) causes the Group By tab of Solritas to stop working, where we try to group-by a string field.
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428682#comment-13428682 ]

Michael McCandless commented on LUCENE-4283:

I added some new tasks to luceneutil (AndHighLow, OrHighLow), and also separated tasks for Low/Med/HighTerm (and same for SpanNear/Phrase queries) so that we can see the impact on the different queries, and so that we actually test skipping (AndHighLow).

Then I ran a test w/ the 2nd (non-buggy, partial decode, 32 skipInterval patch):
{noformat}
Task               QPS base  StdDev base  QPS comp  StdDev comp  Pct diff
AndHighLow           631.54        10.72    101.44         0.70  -84% - -83%
AndHighMed            44.85         0.94     39.31         0.36  -14% -  -9%
AndHighHigh           18.39         0.27     16.16         0.08  -13% - -10%
MedSloppyPhrase       12.15         0.14     11.27         0.30  -10% -  -3%
MedSpanNear            9.11         0.10      8.58         0.10   -7% -  -3%
LowSpanNear            5.05         0.03      4.78         0.03   -6% -  -4%
MedPhrase              5.09         0.10      4.81         0.10   -9% -  -1%
LowPhrase              7.80         0.08      7.43         0.07   -6% -  -2%
HighSloppyPhrase       2.13         0.06      2.04         0.06  -10% -   1%
LowSloppyPhrase        5.28         0.11      5.09         0.15   -8% -   1%
HighTerm              22.85         0.11     22.08         0.56   -6% -   0%
LowTerm              526.19         3.56    510.53         9.14   -5% -   0%
MedTerm              138.34         0.51    134.66         3.58   -5% -   0%
HighPhrase             3.55         0.11      3.46         0.11   -8% -   3%
HighSpanNear           1.64         0.00      1.60         0.02   -3% -   0%
Fuzzy1                99.11         3.49     98.91         2.71   -6% -   6%
Fuzzy2                88.31         3.05     88.19         2.32   -6% -   6%
Respell               77.97         1.75     78.24         1.86   -4% -   5%
PKLookup             192.61         1.47    193.47         1.53   -1% -   2%
OrHighMed             25.14         1.23     25.28         1.16   -8% -  10%
OrHighHigh             9.22         0.47      9.30         0.45   -8% -  11%
OrHighLow             37.28         1.79     37.60         1.75   -8% -  10%
Wildcard              67.88         0.33     69.19         2.70   -2% -   6%
Prefix3               25.67         0.35     26.25         1.22   -3% -   8%
IntNRQ                 8.85         0.02      9.27         0.98   -6% -  15%
{noformat}
I'm confused why AndHighLow got slower... this patch should have lowered the per-skip cost.

Support more frequent skip with Block Postings Format

Key: LUCENE-4283
URL: https://issues.apache.org/jira/browse/LUCENE-4283
Project: Lucene - Core
Issue Type: Improvement
Reporter: Han Jiang
Priority: Minor
Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch

This change works on the new bulk branch. Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level 0 skip point, we'll have to decode a whole block to read doc/freq data. Also, a higher level skip list will be created only for terms with df > blockSize^k, which means that for most terms, skipping will just be a linear scan. If we increase the current blockSize for better bulk i/o performance, the current skip setting will be a bottleneck. For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
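To make the skip-level arithmetic in the description concrete, here is a minimal sketch. It is not the BlockPostingsFormat code: it simply assumes, as the description suggests, that a level-k skip list exists only when df > skipInterval^k, with the same multiplier at every level. The class and method names are hypothetical.

```java
// Hypothetical illustration only; this is not Lucene's skip list implementation.
final class SkipLevelSketch {

    // Counts skip levels under the stated assumption:
    // level k (k >= 1) exists only when df > skipInterval^k.
    static int numSkipLevels(long df, int skipInterval) {
        if (skipInterval < 2) {
            throw new IllegalArgumentException("skipInterval must be >= 2");
        }
        int levels = 0;
        long threshold = skipInterval;
        while (df > threshold) {
            levels++;
            // Stop before the next threshold could overflow a long.
            if (threshold > Long.MAX_VALUE / skipInterval) {
                break;
            }
            threshold *= skipInterval;
        }
        return levels;
    }

    public static void main(String[] args) {
        long[] dfs = {100, 5_000, 2_000_000};
        for (long df : dfs) {
            System.out.println("df=" + df
                + " levels(skipInterval=128)=" + numSkipLevels(df, 128)
                + " levels(skipInterval=32)=" + numSkipLevels(df, 32));
        }
    }
}
```

Under this assumption a term with df=100 gets no skip data at all when skipInterval is 128, but one level when it is 32, which is the kind of term the issue expects to benefit.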
[jira] [Updated] (SOLR-3710) Change CloudSolrServer so that update requests are only sent to leaders by default.
[ https://issues.apache.org/jira/browse/SOLR-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3710: -- Attachment: SOLR-3710.patch Change CloudSolrServer so that update requests are only sent to leaders by default. --- Key: SOLR-3710 URL: https://issues.apache.org/jira/browse/SOLR-3710 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3710.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428681#comment-13428681 ]

Jan Høydahl edited comment on LUCENE-3616 at 8/4/12 9:45 PM:

The commit yesterday (1369196) causes the Group By tab of Solritas to stop working, where we try to group-by a string field:
{noformat}
INFO: [collection1] webapp=/solr path=/browse params={group=true&group.field=manu_exact&queryOpts=group} status=500 QTime=44
Aug 4, 2012 11:25:40 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalArgumentException: You cannot set an index-time boost on an unindexed field, or one that omits norms
    at org.apache.lucene.document.Field.setBoost(Field.java:382)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:277)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:263)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:101)
    at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:790)
{noformat}
Perhaps the setBoost() in Solr's FieldType should be conditional depending on field type:
{code:title=FieldType.java line 275}
protected IndexableField createField(String name, String val, org.apache.lucene.document.FieldType type, float boost){
  Field f = new Field(name, val, type);
  f.setBoost(boost);
  return f;
}
{code}

was (Author: janhoy):
The commit yesterday (1369196) causes the Group By tab of Solritas to stop working, where we try to group-by a string field.
[jira] [Resolved] (SOLR-3439) Make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-3439. --- Resolution: Fixed Committed r1369433 to trunk and r1369478 to branch_4x Make SolrCell easier to use out of the box -- Key: SOLR-3439 URL: https://issues.apache.org/jira/browse/SOLR-3439 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction), Schema and Analysis Reporter: Jack Krupansky Assignee: Jan Høydahl Priority: Minor Fix For: 4.0, 5.0 Attachments: Lincoln-Gettysburg-Address.docx, Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, filetypes.zip Currently, SolrCell is configured to map Tika content (the main body of a document) to the text field which is the indexed-only (not stored) catch-all for default queries. That searches fine, but doesn't show the document content in the results, sometimes leading users to think that something is wrong. Sure, the user can easily add the field (and this is documented), but it would be a better user experience to have such a basic feature work right out of the box without any config editing and without the need for the user to read the fine print in the documentation. I propose that we add the content field to the example schema in the section of fields already defined to support SolrCell metadata. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3439) Make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428685#comment-13428685 ] Jan Høydahl commented on SOLR-3439: --- Any suggestions for what we should tell people to index to test SolrCell? I think the most fun is indexing my own docs folders :) I was thinking instead of bundling some synthetic docs in exampledocs, we could use a dump of the web site/wiki, javadocs or some other real docs?
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428691#comment-13428691 ]

Robert Muir commented on LUCENE-3616:

The bug is in grouping, which applies a 0.0f boost.

{quote}
Perhaps the setBoost() in Solr's FieldType should be conditional depending on field type:
{quote}

No, we should throw an exception. That's the whole point: to not silently discard users' boosts when they will have no effect.
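The fail-fast behaviour argued for here can be sketched without Lucene on the classpath. The classes below are simplified stand-ins (SketchFieldType and SketchField are hypothetical names, not Lucene's FieldType or Field); only the exception message is taken from the stack trace quoted earlier in this thread. Treating a boost of 1.0f as "no boost" is an assumption of the sketch.

```java
// Simplified stand-ins for illustration; these are not Lucene's actual classes.
final class BoostValidationDemo {
    public static void main(String[] args) {
        SketchField normed = new SketchField(new SketchFieldType(true, false));
        normed.setBoost(2.0f);
        System.out.println("boost accepted: " + normed.boost());

        // A field that omits norms, like the string field in the grouping failure.
        SketchField stringLike = new SketchField(new SketchFieldType(true, true));
        try {
            stringLike.setBoost(0.0f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

final class SketchFieldType {
    final boolean indexed;
    final boolean omitNorms;

    SketchFieldType(boolean indexed, boolean omitNorms) {
        this.indexed = indexed;
        this.omitNorms = omitNorms;
    }
}

final class SketchField {
    private final SketchFieldType type;
    private float boost = 1.0f;

    SketchField(SketchFieldType type) {
        this.type = type;
    }

    // Fail fast instead of silently discarding a boost that can have no effect.
    // Assumption: 1.0f means "no boost" and is always allowed.
    void setBoost(float boost) {
        if (boost != 1.0f && (!type.indexed || type.omitNorms)) {
            throw new IllegalArgumentException(
                "You cannot set an index-time boost on an unindexed field, or one that omits norms");
        }
        this.boost = boost;
    }

    float boost() {
        return boost;
    }
}
```

With a check of this shape, a caller that passes 0.0f where it means "no boost" fails immediately, so the fix belongs in the caller rather than in a conditional that hides the boost.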
[jira] [Created] (SOLR-3711) Velocity: Break or truncate long strings in facet output
Jan Høydahl created SOLR-3711:

Summary: Velocity: Break or truncate long strings in facet output
Key: SOLR-3711
URL: https://issues.apache.org/jira/browse/SOLR-3711
Project: Solr
Issue Type: Bug
Components: Response Writers
Reporter: Jan Høydahl
Fix For: 4.0, 5.0

In the Solritas /browse GUI, if facets contain very long strings (as content-type values tend to), the overlong text currently runs over the main column, and it is not pretty. Perhaps inserting a soft hyphen &shy; (http://en.wikipedia.org/wiki/Soft_hyphen) at position N in very long terms is a solution?
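A minimal sketch of the soft-hyphen idea follows. It is not Velocity template code and not part of Solritas; the class and method names are hypothetical, and it inserts a break opportunity after every n characters rather than at a single position, which is one possible reading of the proposal.

```java
// Hypothetical helper for illustration; not part of Solr or its Velocity templates.
final class SoftHyphenSketch {
    // U+00AD SOFT HYPHEN: invisible unless the renderer breaks the line at it.
    private static final char SOFT_HYPHEN = '\u00AD';

    // Inserts a soft hyphen after every n chars of the term.
    // Works on Java chars, so it could split a surrogate pair in non-BMP text.
    static String insertSoftHyphens(String term, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        StringBuilder sb = new StringBuilder(term.length() + term.length() / n);
        for (int i = 0; i < term.length(); i++) {
            if (i > 0 && i % n == 0) {
                sb.append(SOFT_HYPHEN);
            }
            sb.append(term.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String facet = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        String broken = insertSoftHyphens(facet, 20);
        System.out.println("original length: " + facet.length());
        System.out.println("with soft hyphens: " + broken.length());
    }
}
```

In an HTML page the same effect is usually obtained by emitting the &shy; entity at those positions; whether the output needs escaping depends on how the template writes the value.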
[jira] [Resolved] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4286.

Resolution: Fixed
Fix Version/s: 5.0, 4.0

Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

Key: LUCENE-4286
URL: https://issues.apache.org/jira/browse/LUCENE-4286
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
Fix For: 4.0, 5.0
Attachments: LUCENE-4286.patch, LUCENE-4286.patch

Add an optional flag to the CJKBigramFilter to tell it to also output unigrams. This would allow indexing of both bigrams and unigrams, and at query time the analyzer could analyze queries as bigrams unless the query contained a single Han unigram.

As an example, here is a configuration of a Solr fieldType with the analyzer for indexing with the indexUnigrams flag set and the analyzer for querying without the flag.

<fieldType name="CJK" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true"/>
  </analyzer>
</fieldType>

Use case: About 10% of our queries that contain Han characters are single character queries. The CJKBigram filter only outputs single characters when there are no adjacent bigrammable characters in the input. This means we have to create a separate field to index Han unigrams in order to address single character queries and then write application code to search that separate field if we detect a single character Han query. This is rather kludgey. With the optional flag, we could configure Solr as above.

This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter used to allow single word queries (although that uses word n-grams rather than character n-grams).
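What "output unigrams as well as bigrams" means for a run of Han characters can be sketched as follows. This is not the CJKBigramFilter implementation: the class and method are hypothetical, the sketch ignores positions, offsets and token types, and it treats each Java char as one character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Lucene's CJKBigramFilter.
final class BigramUnigramSketch {

    // For each character: emit it as a unigram (when requested), then the
    // bigram that starts at it, if a following character exists.
    static List<String> tokens(String hanRun, boolean indexUnigrams) {
        List<String> out = new ArrayList<>();
        int n = hanRun.length();
        for (int i = 0; i < n; i++) {
            // A lone character is emitted even without the flag, matching the
            // description: single characters are output only when there is no
            // adjacent bigrammable character.
            if (indexUnigrams || n == 1) {
                out.add(hanRun.substring(i, i + 1));
            }
            if (i + 1 < n) {
                out.add(hanRun.substring(i, i + 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters, written as escapes to avoid source-encoding issues.
        String run = "\u4e00\u4e8c\u4e09";
        System.out.println("index side: " + tokens(run, true).size() + " tokens");
        System.out.println("query side: " + tokens(run, false).size() + " tokens");
        System.out.println("single-char query: " + tokens("\u4e00", false).size() + " token");
    }
}
```

The point of the flag is visible in the last line: a single-character query yields a unigram, and that unigram only matches if the index side also emitted unigrams.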
[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2
[ https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3707: -- Attachment: SOLR-3707.patch Patch with updated classpath for Eclipse. Anything else needed before commit? Upgrade Solr to Tika 1.2 Key: SOLR-3707 URL: https://issues.apache.org/jira/browse/SOLR-3707 Project: Solr Issue Type: Improvement Components: contrib - LangId, contrib - Solr Cell (Tika extraction) Reporter: Jan Høydahl Assignee: Jan Høydahl Fix For: 4.0, 5.0 Attachments: SOLR-3707.patch, SOLR-3707.patch Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428698#comment-13428698 ]

Michael McCandless commented on LUCENE-4283:

I tested the -fully patch:
{noformat}
Task               QPS base  StdDev base  QPS comp  StdDev comp  Pct diff
AndHighLow           628.46         8.28    155.04         1.42  -75% - -74%
LowSpanNear            5.07         0.02      4.85         0.10   -6% -  -2%
MedSpanNear            9.12         0.07      8.86         0.22   -5% -   0%
OrHighMed             26.16         1.15     25.53         2.65  -16% -  12%
AndHighMed            44.92         0.88     43.94         0.30   -4% -   0%
OrHighLow             38.76         1.70     37.97         4.03  -16% -  13%
OrHighHigh             9.57         0.45      9.40         1.02  -16% -  14%
HighTerm              22.88         0.13     22.83         0.95   -4% -   4%
HighSloppyPhrase       2.14         0.10      2.14         0.11   -9% -  10%
LowSloppyPhrase        5.31         0.22      5.32         0.22   -7% -   8%
LowPhrase              7.85         0.09      7.87         0.21   -3% -   3%
HighSpanNear           1.65         0.01      1.66         0.04   -2% -   3%
Respell               77.70         1.24     78.14         2.12   -3% -   4%
MedTerm              138.26         0.52    139.07         5.52   -3% -   4%
PKLookup             193.63         2.06    195.98         2.84   -1% -   3%
MedSloppyPhrase       12.15         0.34     12.33         0.48   -5% -   8%
LowTerm              525.12         4.89    534.89        14.12   -1% -   5%
Fuzzy2                87.20         2.05     89.05         3.27   -3% -   8%
Fuzzy1                97.81         2.33     99.94         3.99   -4% -   8%
AndHighHigh           18.39         0.27     19.62         0.06    4% -   8%
MedPhrase              5.09         0.11      5.52         0.33    0% -  17%
Wildcard              67.59         0.58     73.76         3.37    3% -  15%
Prefix3               25.51         0.39     29.54         1.60    7% -  23%
HighPhrase             3.55         0.12      4.13         0.33    3% -  30%
IntNRQ                 8.79         0.08     10.67         1.52    3% -  40%
{noformat}
It seems like we are getting some gains for Med/HighPhrase, but AndHighLow is still way off.
[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428702#comment-13428702 ] Lance Norskog commented on LUCENE-4286: --- Is this a request by Han language readers?
[jira] [Updated] (SOLR-3691) SimplePostTool: Mode for indexing a web page
[ https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3691:

Attachment: SOLR-3691.patch

New patch:
* Fetches pages with GZIP/deflate
* Warns if user uses delay 10s
* Prints how many new links per level
* Normalizes URLs by stripping everything after #

SimplePostTool: Mode for indexing a web page

Key: SOLR-3691
URL: https://issues.apache.org/jira/browse/SOLR-3691
Project: Solr
Issue Type: Bug
Components: scripts and tools
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Fix For: 4.0, 5.0
Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch

The simple post.jar tool should both show some sample code as well as aid users in testing Solr from the command line. Missing is an easy way to index a web page.
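The last item in the patch notes, normalizing URLs by stripping everything after #, can be illustrated with a short sketch. This is not the SimplePostTool code from the patch; the class and method names are hypothetical, and the sketch also drops the # itself, which is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration; not taken from SimplePostTool.
final class UrlNormalizeSketch {

    // Drops the fragment ("#..." suffix) so that links differing only by
    // fragment are treated as the same page.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        String[] links = {
            "http://example.com/doc.html#intro",
            "http://example.com/doc.html#usage",
            "http://example.com/other.html"
        };
        // A set keeps each normalized page once, in first-seen order.
        Set<String> pages = new LinkedHashSet<>();
        for (String link : links) {
            pages.add(stripFragment(link));
        }
        System.out.println(pages.size() + " distinct pages: " + pages);
    }
}
```

Normalizing before the visited-set lookup is what keeps a crawler from fetching the same page once per in-page anchor.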
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428705#comment-13428705 ] Jan Høydahl commented on LUCENE-3616: - Thanks for fixing that one. Think there may be a few more, untested ones, only did a code search, no testing: SearchGroupsResultTransformer # 113,115 TopGroupsResultTransformer # 255,257 GroupedEndResultTransformer # 72 Illegal Field Configurations should throw exceptions Key: LUCENE-3616 URL: https://issues.apache.org/jira/browse/LUCENE-3616 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.0-ALPHA Reporter: Grant Ingersoll Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-3616.patch When working on LUCENE-3615, I came across: {quote} java.lang.IllegalArgumentException: field field is stored but does not have binaryValue, stringValue nor numericValue at org.apache.lucene.index.codecs.DefaultStoredFieldsWriter.writeField(DefaultStoredFieldsWriter.java:177) at org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:119) at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:295) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:255) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1480) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1242) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1223) at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:525) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:168) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) {quote} which is due to the using Textfield.TYPE_STORED when using a TokenStream. 
Since this is an illegal combination, we should throw an exception upon construction of the Field, not later when actually trying to do the indexing.
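
The fail-fast behaviour requested in LUCENE-3616 can be illustrated with a minimal sketch. This is not Lucene's Field class or the attached patch: SketchField, its constructor arguments, and the use of java.io.Reader as a stand-in for a TokenStream are all hypothetical names chosen for illustration. The only point shown is that the "stored, but only a token stream value" combination is rejected in the constructor rather than at indexing time.

```java
import java.io.Reader;

/**
 * Hypothetical field class (not Lucene's Field) that rejects an illegal
 * configuration in its constructor instead of at indexing time.
 */
final class SketchField {
    final String name;
    final Reader tokenSource; // stand-in for a TokenStream value
    final boolean stored;

    SketchField(String name, Reader tokenSource, boolean stored) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        if (tokenSource == null) {
            throw new IllegalArgumentException("tokenSource must not be null");
        }
        // A token stream can be indexed but has no value to store,
        // so "stored" plus a token-stream value is rejected up front.
        if (stored) {
            throw new IllegalArgumentException(
                "field \"" + name + "\" is stored but only has a token stream value");
        }
        this.name = name;
        this.tokenSource = tokenSource;
        this.stored = stored;
    }
}
```

With a check like this, the caller sees the IllegalArgumentException at the line that builds the field, instead of deep inside the indexing chain as in the stack trace quoted in the issue.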
[jira] [Created] (SOLR-3712) Analysis UI page: option to display payloads in text or numerical form
Lance Norskog created SOLR-3712:

Summary: Analysis UI page: option to display payloads in text or numerical form
Key: SOLR-3712
URL: https://issues.apache.org/jira/browse/SOLR-3712
Project: Solr
Issue Type: Improvement
Components: web gui
Reporter: Lance Norskog
Priority: Minor

In the Analysis page, please add the ability to display payloads as bytes, numbers or strings.
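
As a rough illustration of the three display forms requested in SOLR-3712, the sketch below renders one payload byte array as hex bytes, as a number, or as a string. PayloadFormatter and its Mode enum are hypothetical and are not part of Solr or its admin UI. The number form assumes a 4-byte big-endian integer and the string form assumes UTF-8; real payloads may use other encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: renders a raw payload in one of three display forms. */
final class PayloadFormatter {
    enum Mode { BYTES, NUMBER, STRING }

    static String format(byte[] payload, Mode mode) {
        switch (mode) {
            case BYTES: {
                // Space-separated two-digit hex, one group per byte.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < payload.length; i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append(String.format("%02x", payload[i] & 0xff));
                }
                return sb.toString();
            }
            case NUMBER:
                // Assumption: the payload is a 4-byte big-endian int.
                if (payload.length != 4) {
                    throw new IllegalArgumentException(
                        "expected 4 bytes, got " + payload.length);
                }
                return Integer.toString(ByteBuffer.wrap(payload).getInt());
            case STRING:
                // Assumption: the payload is UTF-8 encoded text.
                return new String(payload, StandardCharsets.UTF_8);
            default:
                throw new AssertionError(mode);
        }
    }
}
```

For example, PayloadFormatter.format(new byte[] {0, 0, 0, 42}, PayloadFormatter.Mode.NUMBER) yields "42", while the same bytes in BYTES mode yield "00 00 00 2a".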
[jira] [Created] (SOLR-3713) Analysis page: add payload data to Schema Browser term display
Lance Norskog created SOLR-3713:

Summary: Analysis page: add payload data to Schema Browser term display
Key: SOLR-3713
URL: https://issues.apache.org/jira/browse/SOLR-3713
Project: Solr
Issue Type: Improvement
Components: web gui
Reporter: Lance Norskog
Priority: Minor

In the Schema Browser UI, please add a way to display the payloads attached to any terms in the field. And as before, offer a way to display the payloads in byte, number or string form.
[jira] [Created] (LUCENE-4287) solr/example/solr-webapp included in git checkin
Lance Norskog created LUCENE-4287:

Summary: solr/example/solr-webapp included in git checkin
Key: LUCENE-4287
URL: https://issues.apache.org/jira/browse/LUCENE-4287
Project: Lucene - Core
Issue Type: Bug
Components: general/build
Reporter: Lance Norskog
Priority: Minor

I tried checking in /solr and /lucene without doing an 'ant clean'. It included solr/example/solr-webapp and data directories in solr/example. Is it intended to support checking in without a clean? After all, solr/build and lucene/build and other artifacts are proscribed. I think the top-level .gitignore should include solr/example/solr-webapp.
[jira] [Updated] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated LUCENE-2899:

Attachment: LUCENE-2899.patch

New patch for current build system on trunk 4.x.

Add OpenNLP Analysis capabilities as a module

Key: LUCENE-2899
URL: https://issues.apache.org/jira/browse/LUCENE-2899
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, opennlp_trunk.patch

Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:
* Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
* NamedEntity recognition as a TokenFilter

We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position.

I'd propose it go under: modules/analysis/opennlp
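
To give a feel for what "Sentence Detection as a Tokenizer" means in LUCENE-2899, here is a small sketch that emits one token per sentence. It does not use OpenNLP or the Lucene Tokenizer API, and it is not the code from the attached patches. It substitutes the JDK's rule-based java.text.BreakIterator for OpenNLP's model-based sentence detector, and SentenceSplitter is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits text into sentences, one "token" per sentence. */
final class SentenceSplitter {
    static List<String> sentences(String text) {
        // JDK rule-based sentence boundaries, used here in place of OpenNLP.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trim the whitespace that trails each sentence boundary.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }
}
```

A real Tokenizer would also have to report character offsets for each sentence, and a TokenFilter variant would need to buffer incoming tokens until a sentence boundary is seen, as the issue description notes.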
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428809#comment-13428809 ]

Lance Norskog commented on LUCENE-2899:

As it turns out, building is still confused: solr/example/solr-webapps comes and goes. This build parks the lucene-analyzer-opennlp jar in solr/contrib/opennlp/lucene-libs. example//solrconfig.xml includes a reference to ../../contrib/opennlp/lib and lucene-libs and .././dist. A jar-of-jars or a fully repacked jar in dist/ is the best way to ship this.

Committability status: forbidden API checks fail. Checksums and licenses validate. rat-sources validate.