date:20120804


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428591#comment-13428591
 ] 

Robert Muir commented on LUCENE-3616:
-

Chris: that's a good point.

The current design seems to be that Field can do everything and the others 
are simply sugar on top.

Personally I think this is confusing and error-prone.
That's why I wrote such a huge test, but it's silly.

In my opinion, if I have a ShortDocValuesField, it shouldn't have a setReader 
method :)


 Illegal Field Configurations should throw exceptions
 

 Key: LUCENE-3616
 URL: https://issues.apache.org/jira/browse/LUCENE-3616
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3616.patch


 When working on LUCENE-3615, I came across:
 {quote}
 java.lang.IllegalArgumentException: field field is stored but does not have binaryValue, stringValue nor numericValue
   at org.apache.lucene.index.codecs.DefaultStoredFieldsWriter.writeField(DefaultStoredFieldsWriter.java:177)
   at org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:119)
   at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:295)
   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:255)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1480)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1242)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1223)
   at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:194)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
   at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:525)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
   at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:168)
   at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:47)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71)
   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199)
   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 {quote}
 which is due to using TextField.TYPE_STORED with a TokenStream.  
 Since this is an illegal combination, we should throw an exception when the 
 Field is constructed, not later when we actually try to index it.
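To make the proposal concrete, here is a minimal, Lucene-free sketch of validating at construction time. `SketchFieldType`, `SketchField`, `fromString` and `fromTokens` are hypothetical names, not Lucene's API, and a pre-analyzed token list stands in for a real TokenStream. The point is only that the illegal "stored + token stream" combination is rejected when the field is built rather than deep inside the indexing chain.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a field type: only the flags needed here.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    SketchFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }
}

// Hypothetical field whose factories reject illegal combinations immediately,
// instead of failing later inside the indexing chain.
final class SketchField {
    final String name;
    final SketchFieldType type;
    final String stringValue;   // set for string-valued fields
    final List<String> tokens;  // set for pre-analyzed fields (stand-in for a TokenStream)

    private SketchField(String name, SketchFieldType type, String stringValue, List<String> tokens) {
        this.name = Objects.requireNonNull(name, "name");
        this.type = Objects.requireNonNull(type, "type");
        this.stringValue = stringValue;
        this.tokens = tokens;
    }

    static SketchField fromString(String name, String value, SketchFieldType type) {
        Objects.requireNonNull(value, "value");
        if (!type.indexed && !type.stored) {
            throw new IllegalArgumentException("field " + name + " is neither indexed nor stored");
        }
        return new SketchField(name, type, value, null);
    }

    static SketchField fromTokens(String name, List<String> tokens, SketchFieldType type) {
        Objects.requireNonNull(tokens, "tokens");
        // A token stream has no value that could be written to the stored fields.
        if (type.stored) {
            throw new IllegalArgumentException("field " + name + ": token stream values cannot be stored");
        }
        if (!type.indexed || !type.tokenized) {
            throw new IllegalArgumentException("field " + name + ": token stream values must be indexed and tokenized");
        }
        return new SketchField(name, type, null, List.copyOf(tokens));
    }
}
```

With this shape, the Test2BTerms mistake surfaces as an IllegalArgumentException at the line that builds the field.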

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions

2012-08-04 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428594#comment-13428594
 ] 

Chris Male commented on LUCENE-3616:


bq. In my opinion, if I have a ShortDocValuesField, it shouldn't have a setReader 
method

Agreed.  The setABC() methods are extremely confusing and add another level of 
validation (using your example, we have to validate that you're not setting a 
Reader on a NumericField).

Perhaps we can rearrange this a little.  If we genuinely feel there are use 
cases out there that we haven't covered with the typed impls, and that we don't 
want to cover, then why not make a GenericField or something similar, which is 
abstract and accepts just a name, a FieldType and maybe an Object value.  We can 
then emphasize in the documentation that it is expert-only, should only be 
subclassed in the extremely rare situations where our typed impls are 
insufficient, and won't be validated, so it's a buyer-beware kind of thing.  

We can then gut Field down to a very simple abstract class / interface, and 
promote our typed impls to first-class status as the recommended entry points 
for users.

Of course, if we feel we have already provided adequate support through the 
typed impls, then we can skip straight to the gutting.
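A minimal sketch of the arrangement described above, assuming typed fields expose only the setters that make sense for them and an abstract expert-only class takes a name, a type and an arbitrary value. `BaseField`, `ShortValueField`, `TextReaderField` and `GenericField` are hypothetical names; the real Lucene classes differ, and a plain `Object` stands in for FieldType.

```java
import java.io.Reader;

// Hypothetical minimal base: every field has a name; nothing else is assumed.
abstract class BaseField {
    final String name;
    BaseField(String name) { this.name = name; }
}

// Typed field for a short value: there is deliberately no setReader(),
// so "short field with a Reader" cannot even be expressed.
final class ShortValueField extends BaseField {
    private short value;
    ShortValueField(String name, short value) { super(name); this.value = value; }
    void setShortValue(short value) { this.value = value; }
    short shortValue() { return value; }
}

// Typed field for text supplied as a Reader: only a Reader setter exists.
final class TextReaderField extends BaseField {
    private Reader reader;
    TextReaderField(String name, Reader reader) { super(name); setReader(reader); }
    void setReader(Reader reader) {
        if (reader == null) throw new IllegalArgumentException("reader must not be null");
        this.reader = reader;
    }
    Reader readerValue() { return reader; }
}

// Expert-only escape hatch: a name, an opaque type descriptor and any value,
// with NO validation (buyer beware). Subclass only when no typed field fits.
abstract class GenericField extends BaseField {
    protected final Object fieldType; // stands in for a FieldType
    protected final Object value;
    protected GenericField(String name, Object fieldType, Object value) {
        super(name);
        this.fieldType = fieldType;
        this.value = value;
    }
}
```

The design choice is that illegal combinations on the typed path become compile errors rather than runtime checks, so only the generic escape hatch needs documentation warnings.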


[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-08-04 Thread Ibrahim (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ibrahim updated LUCENE-4216:


Attachment: myApp.zip

Please find the attached test case.

 Token X exceeds length of provided text sized X
 ---

 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
 Attachments: myApp.zip


 I'm facing this exception:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170
   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
   at classes.myApp$16$1.run(myApp.java:1508)
 When I started migrating from Lucene 3.6 to 4.0, I tried to find what was wrong 
 in my code, without success. I found similar issues with HTMLStripCharFilter 
 (e.g. LUCENE-3690, LUCENE-2208) but not with SimpleHTMLFormatter, so I'm 
 raising this here to see whether there really is a bug or something is wrong 
 in my code with v4. The code I'm using:
 final Highlighter highlighter = new Highlighter(new 
 SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
 ...
 final TokenStream tokenStream = 
 TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, Line, 
 analyzer);
 final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, 
 doc.get(Line), false, 10);
 Please note that this works fine with v3.6.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-4216) Token X exceeds length of provided text sized X

[
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir resolved LUCENE-4216.
-

Resolution: Not A Problem

The bugs are in your custom tokenizer. I would recommend looking at
lucene-test-framework.jar (especially BaseTokenStreamTestCase) and writing some
tests for it.

Problems I see at a glance:
* it doesn't implement reset(), so it's not safe at all. This is the main reason
it doesn't work for you in 4.0: analysis reuse is mandatory, and it
doesn't reset its state.
* it doesn't implement end(), so multi-valued fields won't work
* it doesn't call correctOffset(), so charfilters won't work
* it removes tashkeel in the tokenizer itself without adjusting offsets; that's
unsafe.

Really you can fix this easily, by:
1. instead of extending Tokenizer, extend CharTokenizer and implement
isTokenChar via isArabicChar. Or just use StandardTokenizer; it tokenizes
Arabic just fine.
2. instead of removing tashkeel in your tokenizer itself with your pattern
([\u0650\u064D\u064E\u064B\u064F\u064C\u0652\u0651]), just pass that pattern to
PatternReplaceFilter.
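The contract behind these points can be illustrated without Lucene. The following is a hypothetical, self-contained sketch (`ArabicTokenizerSketch` and its methods are not Lucene API): it clears all state in reset() before each new input, reports offsets into the original text, and removes tashkeel only from the term text, which is the job a downstream filter such as PatternReplaceFilter would do. The basic Arabic block U+0600..U+06FF is an assumed stand-in for `isArabicChar`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ArabicTokenizerSketch {
    // The tashkeel character class quoted in the comment above.
    private static final Pattern TASHKEEL =
        Pattern.compile("[\u0650\u064D\u064E\u064B\u064F\u064C\u0652\u0651]");

    static final class Token {
        final String term;
        final int start; // offset into the ORIGINAL text
        final int end;
        Token(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
    }

    private String input = "";
    private int pos = 0;

    // Like Tokenizer.reset(): clears ALL per-document state before reuse.
    void reset(String text) {
        input = text;
        pos = 0;
    }

    // Assumed stand-in for isArabicChar: the basic Arabic block.
    static boolean isTokenChar(char c) {
        return c >= '\u0600' && c <= '\u06FF';
    }

    // Returns the next token, or null when the input is exhausted.
    Token next() {
        while (pos < input.length() && !isTokenChar(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        int start = pos;
        while (pos < input.length() && isTokenChar(input.charAt(pos))) pos++;
        // Offsets describe the raw span; only the term text is normalized.
        String term = TASHKEEL.matcher(input.substring(start, pos)).replaceAll("");
        return new Token(term, start, pos);
    }

    List<Token> tokenize(String text) {
        reset(text);
        List<Token> out = new ArrayList<>();
        for (Token t = next(); t != null; t = next()) out.add(t);
        return out;
    }
}
```

Because offsets always point into the text that was actually supplied, a highlighter working on that same stored text never sees an offset beyond its length, even when the instance is reused.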


[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-04 Thread Han Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-4283:
--

Attachment: LUCENE-4283-small-interval-partially.patch
LUCENE-4283-small-interval-fully.patch

Two patches: I tidied up some code and split the partial decoding out, to see 
how much we gain from the smaller interval alone. *-fully.patch refills a whole 
block of docs when docBuffer is used up; *-partially.patch decodes only one 
interval of the block when necessary.

 Support more frequent skip with Block Postings Format
 -

 Key: LUCENE-4283
 URL: https://issues.apache.org/jira/browse/LUCENE-4283
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Priority: Minor
 Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, 
 LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, 
 LUCENE-4283-small-interval-partially.patch


 This change works on the new bulk branch.
 Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
 Every time the skipper reaches the last level-0 skip point, we have to 
 decode a whole block to read doc/freq data. Also, a higher-level skip list 
 is created only for terms with df > blockSize^k, which means that for most 
 terms, skipping is just a linear scan. If we increase the current blockSize 
 for better bulk I/O performance, the current skip setting will become a 
 bottleneck. 
 For ForPF, the encoded block can easily be split if we set 
 skipInterval=32*k. 
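A back-of-the-envelope sketch of why a smaller skipInterval can help, under a deliberately simplified cost model. The assumptions are: skip points sit every skipInterval postings, a skip lands on the last skip point at or before the target, a "full" refill decodes the whole enclosing block, and a "partial" refill decodes only the enclosing interval. The class and method names are hypothetical and do not correspond to BlockPostingsFormat internals.

```java
final class SkipCostSketch {
    // Ordinal position of the last skip point at or before target
    // (skip points assumed every skipInterval postings).
    static int skipPoint(int target, int skipInterval) {
        return (target / skipInterval) * skipInterval;
    }

    // Postings stepped over linearly after landing on the skip point, including target.
    static int linearScan(int target, int skipInterval) {
        return target - skipPoint(target, skipInterval) + 1;
    }

    // Values decoded after a skip: the whole block ("fully") or only the
    // skipInterval-sized slice holding the target ("partially").
    static int decodedValues(int blockSize, int skipInterval, boolean partial) {
        if (skipInterval <= 0 || blockSize % skipInterval != 0) {
            throw new IllegalArgumentException("blockSize must be a multiple of skipInterval");
        }
        return partial ? skipInterval : blockSize;
    }
}
```

For example, with blockSize=128 and a target at ordinal 200, skipInterval=128 lands at 128 and scans 73 postings, while skipInterval=32 lands at 192 and scans 9, decoding 32 values instead of 128 when decoding partially.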


[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-04 Thread Steven Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428675#comment-13428675
 ] 

Steven Rowe commented on SOLR-1725:
---

bq. Maybe something mentioned here 
http://stackoverflow.com/questions/6558055/is-osgi-fundamentally-incompatible-with-jsr-223-scripting-language-discovery
 is relevant?

Thanks for looking, Erik, but I'm not sure it's relevant.

I get an NPE on {{lucene.zones.apache.org}} using [this 
program|http://pastebin.com/iQEAwE3A] with the OpenJDK VM at 
{{/usr/local/openjdk6/}} (the default javac/java on that box).  Jenkins Java 6 
jobs running Ant use this same VM (via {{/home/hudson/tools/java/latest1.6 -> 
openjdk6 -> /usr/local/openjdk6}}).

I don't get it.  How do these tests pass under Ant?  I can't see any obviously 
named jars under {{solr/lib/}} that would provide the JavaScript engine...
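One plausible source of such an NPE, stated as an assumption since the pastebin program is not reproduced here: `ScriptEngineManager.getEngineByName` returns null, rather than throwing, when no engine is registered under that name, and the caller dereferences the result. A defensive lookup (the class and helper names are hypothetical) turns the missing engine into an explicit error that lists what the JVM actually discovered:

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

final class ScriptEngineCheck {
    // getEngineByName returns null (it does not throw) when no engine matches.
    static ScriptEngine requireEngine(String name) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(name);
        if (engine == null) {
            throw new IllegalStateException("No JSR-223 engine registered for '" + name
                + "'; available: " + availableEngines());
        }
        return engine;
    }

    // Engine names and aliases discovered by this JVM via the service loader.
    static List<String> availableEngines() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            names.add(f.getEngineName() + " " + f.getNames());
        }
        return names;
    }
}
```

Running `availableEngines()` on both the command-line VM and under Ant would show whether the two environments discover different engines.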



 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script-based UpdateRequestProcessorFactory (uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which takes a 
 comma-separated list of file names. It looks for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is determined by the lexicographical order of the script file 
 names (so {{scriptA.js}} is executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, 
 *.js files are treated as JavaScript), so an extension is mandatory.
 Each script file is expected to define one or more methods with the same 
 signatures as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
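The ordering and language-resolution rules in the description can be sketched in a few lines. This is a hypothetical helper, not the patch's code: it splits the {{scripts}} parameter, sorts the names lexicographically, and derives the language key from the file extension, rejecting names without one. In a real factory that key could then be passed to `ScriptEngineManager.getEngineByExtension`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ScriptListSketch {
    // Parse the comma-separated "scripts" parameter and return names in execution order.
    static List<String> executionOrder(String scriptsParam) {
        if (scriptsParam == null || scriptsParam.trim().isEmpty()) {
            throw new IllegalArgumentException("'scripts' parameter is mandatory");
        }
        List<String> names = new ArrayList<>();
        for (String s : scriptsParam.split(",")) {
            String name = s.trim();
            if (!name.isEmpty()) names.add(name);
        }
        Collections.sort(names); // lexicographic: scriptA.js before scriptB.js
        return names;
    }

    // The extension selects the language; a file without one is rejected.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }
}
```

Note that `Collections.sort` on strings compares UTF-16 code units, so upper-case names sort before lower-case ones; whether the factory intends that is not stated in the description.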


[jira] [Created] (SOLR-3708) Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.

Mark Miller created SOLR-3708:
-

 Summary: Add hashcode to ClusterState so that structures built 
based on the ClusterState can be easily cached.
 Key: SOLR-3708
 URL: https://issues.apache.org/jira/browse/SOLR-3708
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0
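A minimal sketch of the caching idea, assuming a simplified immutable state (a map from shard name to node URLs). `ClusterStateSketch` and `UrlListCache` are hypothetical, not Solr classes. The derived URL list is rebuilt only when the state changes: the precomputed hash is a cheap first check, and `equals()` guards against hash collisions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Immutable, simplified stand-in for ClusterState: shard name -> node URLs.
final class ClusterStateSketch {
    private final Map<String, List<String>> shards;
    private final int hash; // computed once, since the state never changes

    ClusterStateSketch(Map<String, List<String>> shards) {
        Map<String, List<String>> copy = new TreeMap<>();
        shards.forEach((k, v) -> copy.put(k, List.copyOf(v)));
        this.shards = Collections.unmodifiableMap(copy);
        this.hash = this.shards.hashCode();
    }

    Map<String, List<String>> shards() { return shards; }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
        return o instanceof ClusterStateSketch
            && hash == ((ClusterStateSketch) o).hash           // cheap check first
            && shards.equals(((ClusterStateSketch) o).shards); // then the real comparison
    }
}

// Caches a structure derived from the state (here: a flat URL list).
final class UrlListCache {
    private ClusterStateSketch lastState;
    private List<String> lastUrls;
    int rebuilds = 0; // for illustration only

    synchronized List<String> urls(ClusterStateSketch state) {
        if (!state.equals(lastState)) {
            List<String> urls = new ArrayList<>();
            for (List<String> nodes : state.shards().values()) urls.addAll(nodes);
            lastState = state;
            lastUrls = Collections.unmodifiableList(urls);
            rebuilds++;
        }
        return lastUrls;
    }
}
```

Relying on the hash alone would be cheaper but could silently reuse a stale list after a collision, which is why the sketch falls back to `equals()`.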





[jira] [Updated] (SOLR-3708) Add hashcode to ClusterState so that structures built based on the ClusterState can be easily cached.


 [ 
https://issues.apache.org/jira/browse/SOLR-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3708:
--

Attachment: SOLR-3708.patch


[jira] [Created] (SOLR-3709) Cache the url list created from the ClusterState in CloudSolrServer on each request.

Mark Miller created SOLR-3709:
-

 Summary: Cache the url list created from the ClusterState in 
CloudSolrServer on each request.
 Key: SOLR-3709
 URL: https://issues.apache.org/jira/browse/SOLR-3709
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0





[jira] [Commented] (SOLR-3709) Cache the url list created from the ClusterState in CloudSolrServer on each request.


[ 
https://issues.apache.org/jira/browse/SOLR-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428678#comment-13428678
 ] 

Mark Miller commented on SOLR-3709:
---

Part of patch in SOLR-3708


[jira] [Created] (SOLR-3710) Change CloudSolrServer so that update requests are only sent to leaders by default.

Mark Miller created SOLR-3710:
-

 Summary: Change CloudSolrServer so that update requests are only 
sent to leaders by default.
 Key: SOLR-3710
 URL: https://issues.apache.org/jira/browse/SOLR-3710
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0
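The routing rule in the summary can be sketched as a simple filter, assuming each replica carries a URL and a leader flag. `ReplicaInfo` and `RequestRouting` are hypothetical types, not CloudSolrServer API. Updates go only to leaders by default, while queries (or updates with the default switched off) may use any replica.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicaInfo {
    final String url;
    final boolean leader;
    ReplicaInfo(String url, boolean leader) { this.url = url; this.leader = leader; }
}

final class RequestRouting {
    // Candidate URLs for a request; updates are restricted to leaders unless disabled.
    static List<String> candidateUrls(List<ReplicaInfo> replicas, boolean isUpdate,
                                      boolean leadersOnlyForUpdates) {
        List<String> urls = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            if (!isUpdate || !leadersOnlyForUpdates || r.leader) urls.add(r.url);
        }
        if (urls.isEmpty()) {
            throw new IllegalStateException("no eligible replicas for this request");
        }
        return urls;
    }
}
```

Sending updates straight to a leader saves the extra hop a non-leader would otherwise need to forward the update.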





[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428681#comment-13428681
 ] 

Jan Høydahl commented on LUCENE-3616:
-

Yesterday's commit (1369196) broke the Group By tab of Solritas, which groups 
by a string field.

[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-04 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428682#comment-13428682
 ] 

Michael McCandless commented on LUCENE-4283:


I added some new tasks to luceneutil (AndHighLow, OrHighLow), and also
separated tasks for Low/Med/HighTerm (and same for SpanNear/Phrase
queries) so that we can see the impact on the different queries, and
so that we actually test skipping (AndHighLow).

Then I ran a test with the second patch (non-buggy, partial decode,
skipInterval=32):

{noformat}
            Task    QPS base  StdDev base    QPS comp  StdDev comp       Pct diff
      AndHighLow      631.54        10.72      101.44         0.70    -84% - -83%
      AndHighMed       44.85         0.94       39.31         0.36    -14% -  -9%
     AndHighHigh       18.39         0.27       16.16         0.08    -13% - -10%
 MedSloppyPhrase       12.15         0.14       11.27         0.30    -10% -  -3%
     MedSpanNear        9.11         0.10        8.58         0.10     -7% -  -3%
     LowSpanNear        5.05         0.03        4.78         0.03     -6% -  -4%
       MedPhrase        5.09         0.10        4.81         0.10     -9% -  -1%
       LowPhrase        7.80         0.08        7.43         0.07     -6% -  -2%
HighSloppyPhrase        2.13         0.06        2.04         0.06    -10% -   1%
 LowSloppyPhrase        5.28         0.11        5.09         0.15     -8% -   1%
        HighTerm       22.85         0.11       22.08         0.56     -6% -   0%
         LowTerm      526.19         3.56      510.53         9.14     -5% -   0%
         MedTerm      138.34         0.51      134.66         3.58     -5% -   0%
      HighPhrase        3.55         0.11        3.46         0.11     -8% -   3%
    HighSpanNear        1.64         0.00        1.60         0.02     -3% -   0%
          Fuzzy1       99.11         3.49       98.91         2.71     -6% -   6%
          Fuzzy2       88.31         3.05       88.19         2.32     -6% -   6%
         Respell       77.97         1.75       78.24         1.86     -4% -   5%
        PKLookup      192.61         1.47      193.47         1.53     -1% -   2%
       OrHighMed       25.14         1.23       25.28         1.16     -8% -  10%
      OrHighHigh        9.22         0.47        9.30         0.45     -8% -  11%
       OrHighLow       37.28         1.79       37.60         1.75     -8% -  10%
        Wildcard       67.88         0.33       69.19         2.70     -2% -   6%
         Prefix3       25.67         0.35       26.25         1.22     -3% -   8%
          IntNRQ        8.85         0.02        9.27         0.98     -6% -  15%
{noformat}
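For reading the table, a simple point estimate of each row's change is (QPS comp - QPS base) / QPS base. The sketch below (the class name is hypothetical) computes it; this is only a point estimate and not necessarily how luceneutil derives its Pct diff range, which also reflects the StdDev columns.

```java
final class PctDiff {
    // Relative change, in percent, from the base QPS to the comp QPS.
    static double pctChange(double baseQps, double compQps) {
        if (baseQps <= 0) throw new IllegalArgumentException("baseQps must be positive");
        return (compQps - baseQps) / baseQps * 100.0;
    }
}
```

For AndHighLow, `pctChange(631.54, 101.44)` is about -83.9, inside the reported -84% to -83% range; for AndHighMed, `pctChange(44.85, 39.31)` is about -12.4.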

I'm confused why AndHighLow got slower... this patch should have
lowered the per-skip cost.


 Support more frequent skip with Block Postings Format
 -

 Key: LUCENE-4283
 URL: https://issues.apache.org/jira/browse/LUCENE-4283
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Priority: Minor
 Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, 
 LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, 
 LUCENE-4283-small-interval-partially.patch


 This change works on the new bulk branch.
 Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
 Every time the skipper reaches the last level 0 skip point, we'll have to 
 decode a whole block to read doc/freq data. Also, a higher level skip list 
 will be created only for those df > blockSize^k, which means for most terms, 
 skipping will just be a linear scan. If we increase the current blockSize for 
 better bulk I/O performance, the current skip setting will become a bottleneck. 
 For ForPF, the encoded block can easily be split if we set 
 skipInterval=32*k. 
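To make the "linear scan" point concrete, here is a rough Java sketch that counts skip entries per level. It assumes level j holds one entry per skipInterval * skipMultiplier^j documents and, to match the df > blockSize^k remark, that the multiplier also equals blockSize; SkipLevels and entriesPerLevel are hypothetical names, and Lucene's real multi-level skip writer differs in details.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: skip entries per level for a term with document frequency df. */
final class SkipLevels {
    /**
     * Assumes level j holds one entry per skipInterval * skipMultiplier^j documents,
     * capped at maxLevels levels. Returns the entry count of each non-empty level.
     */
    static int[] entriesPerLevel(int df, int skipInterval, int skipMultiplier, int maxLevels) {
        if (skipInterval < 1 || skipMultiplier < 2) {
            throw new IllegalArgumentException("need skipInterval >= 1 and skipMultiplier >= 2");
        }
        List<Integer> counts = new ArrayList<>();
        long span = skipInterval; // documents covered by one entry on the current level
        for (int level = 0; level < maxLevels; level++) {
            long entries = df / span;
            if (entries == 0) {
                break; // this level and every higher one would be empty
            }
            counts.add((int) entries);
            span *= skipMultiplier;
        }
        int[] result = new int[counts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = counts.get(i);
        }
        return result;
    }
}
```

With df = 10000 and skipInterval = skipMultiplier = 128, only level 0 exists (78 entries walked one by one). With skipInterval = 32 and a hypothetical multiplier of 8, the same term gets three levels (312, 39 and 4 entries), so the skipper can jump much closer to its target before decoding anything.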

[jira] [Updated] (SOLR-3710) Change CloudSolrServer so that update requests are only sent to leaders by default.


 [ 
https://issues.apache.org/jira/browse/SOLR-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3710:
--

Attachment: SOLR-3710.patch

 Change CloudSolrServer so that update requests are only sent to leaders by 
 default.
 ---

 Key: SOLR-3710
 URL: https://issues.apache.org/jira/browse/SOLR-3710
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3710.patch




[jira] [Comment Edited] (LUCENE-3616) Illegal Field Configurations should throw exceptions


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428681#comment-13428681
 ] 

Jan Høydahl edited comment on LUCENE-3616 at 8/4/12 9:45 PM:
-

The commit yesterday (1369196) breaks the Group By tab of Solritas, which groups 
by a string field:

{noformat}
INFO: [collection1] webapp=/solr path=/browse 
params={group=true&group.field=manu_exact&queryOpts=group} status=500 QTime=44 
Aug 4, 2012 11:25:40 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalArgumentException: You cannot set an index-time 
boost on an unindexed field, or one that omits norms
at org.apache.lucene.document.Field.setBoost(Field.java:382)
at org.apache.solr.schema.FieldType.createField(FieldType.java:277)
at org.apache.solr.schema.FieldType.createField(FieldType.java:263)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:101)
at 
org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:790)
{noformat}

Perhaps the setBoost() in Solr's FieldType should be conditional depending on 
field type:
{code:title=FieldType.java line 275}
  protected IndexableField createField(String name, String val, org.apache.lucene.document.FieldType type, float boost){
    Field f = new Field(name, val, type);
    f.setBoost(boost);
    return f;
  }
{code}

  was (Author: janhoy):
The commit yesterday (1369196) causes the Group By tab of Solritas to 
stop working, where we try to group-by a string field.
  

[jira] [Resolved] (SOLR-3439) Make SolrCell easier to use out of the box


 [ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-3439.
---

Resolution: Fixed

Committed r1369433 to trunk and r1369478 to branch_4x

 Make SolrCell easier to use out of the box
 --

 Key: SOLR-3439
 URL: https://issues.apache.org/jira/browse/SOLR-3439
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction), Schema and 
 Analysis
Reporter: Jack Krupansky
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: Lincoln-Gettysburg-Address.docx, 
 Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch, SOLR-3439.patch, 
 SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, SOLR-3439.patch, 
 SOLR-3439.patch, filetypes.zip


 Currently, SolrCell is configured to map Tika content (the main body of a 
 document) to the text field which is the indexed-only (not stored) 
 catch-all for default queries. That searches fine, but doesn't show the 
 document content in the results, sometimes leading users to think that 
 something is wrong. Sure, the user can easily add the field (and this is 
 documented), but it would be a better user experience to have such a basic 
 feature work right out of the box without any config editing and without the 
 need for the user to read the fine print in the documentation.
 I propose that we add the content field to the example schema in the 
 section of fields already defined to support SolrCell metadata.

[jira] [Commented] (SOLR-3439) Make SolrCell easier to use out of the box


[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428685#comment-13428685
 ] 

Jan Høydahl commented on SOLR-3439:
---

Any suggestions for what we should tell people to index to test SolrCell? I 
think the most fun is indexing my own docs folders :) I was thinking that instead of 
bundling some synthetic docs in exampledocs, we could use a dump of the web 
site/wiki, the javadocs or some other real docs?

[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428691#comment-13428691
 ] 

Robert Muir commented on LUCENE-3616:
-

The bug is in grouping: it applies a 0.0f boost.

{quote}
Perhaps the setBoost() in Solr's FieldType should be conditional depending on 
field type:
{quote}

No, we should throw an exception. That's the whole point: not to silently discard 
users' boosts when they will have no effect.
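For illustration, the check being debated boils down to something like the simplified Java sketch below. FieldTypeSketch and FieldSketch are hypothetical stand-ins, not Lucene's Field/FieldType; only the exception message is copied from the log above.

```java
/** Hypothetical, simplified stand-in for the index options a field type carries. */
final class FieldTypeSketch {
    final boolean indexed;
    final boolean omitNorms;

    FieldTypeSketch(boolean indexed, boolean omitNorms) {
        this.indexed = indexed;
        this.omitNorms = omitNorms;
    }
}

/** Hypothetical field: rejects a boost that could never take effect instead of dropping it. */
final class FieldSketch {
    private final String name;
    private final FieldTypeSketch type;
    private float boost = 1.0f;

    FieldSketch(String name, FieldTypeSketch type) {
        this.name = name;
        this.type = type;
    }

    void setBoost(float boost) {
        // Index-time boosts end up in norms, so a non-default boost on a field that is
        // unindexed or omits norms would be silently lost: fail fast instead.
        if (boost != 1.0f && (!type.indexed || type.omitNorms)) {
            throw new IllegalArgumentException(
                "You cannot set an index-time boost on an unindexed field, or one that omits norms");
        }
        this.boost = boost;
    }

    String name() { return name; }

    float boost() { return boost; }
}
```

Under this model, making setBoost conditional on the field type would just move the silent discard somewhere else; the caller-side fix is presumably to pass the neutral 1.0f (or skip setBoost) when building fields for grouping results.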

[jira] [Created] (SOLR-3711) Velocity: Break or truncate long strings in facet output

Jan Høydahl created SOLR-3711:
-

 Summary: Velocity: Break or truncate long strings in facet output
 Key: SOLR-3711
 URL: https://issues.apache.org/jira/browse/SOLR-3711
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
 Fix For: 4.0, 5.0


In the Solritas /browse GUI, if facet values contain very long strings (as 
content-type values tend to), the overlong text currently runs over into the main 
column, which is not pretty.

Perhaps inserting a Soft Hyphen &shy; 
(http://en.wikipedia.org/wiki/Soft_hyphen) at position N in very long terms is 
a solution?
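A minimal Java sketch of the soft-hyphen idea, assuming the facet value is HTML-escaped in the same pass. SoftHyphenSketch and breakLongRuns are hypothetical names; the real change would presumably live in the Velocity templates rather than in Java code.

```java
/** Hypothetical helper, not the actual Velocity template code. */
final class SoftHyphenSketch {
    private static final String SHY = "&shy;"; // HTML soft hyphen entity

    /** Escapes text for HTML and allows a line break after every n non-whitespace chars. */
    static String breakLongRuns(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        int run = 0; // length of the current run of non-whitespace characters
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                run = 0; // the browser can already break here
            } else {
                if (run == n) { // the run is n chars long and continues: offer a break
                    out.append(SHY);
                    run = 0;
                }
                run++;
            }
            appendEscaped(out, c);
        }
        return out.toString();
    }

    private static void appendEscaped(StringBuilder out, char c) {
        switch (c) {
            case '&': out.append("&amp;"); break;
            case '<': out.append("&lt;"); break;
            case '>': out.append("&gt;"); break;
            case '"': out.append("&quot;"); break;
            default: out.append(c);
        }
    }
}
```

For example, breakLongRuns("abcdef", 3) yields "abc&shy;def"; the browser shows the hyphen only if it actually wraps there.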

[jira] [Resolved] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams


 [ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4286.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.0

 Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
 -

 Key: LUCENE-4286
 URL: https://issues.apache.org/jira/browse/LUCENE-4286
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4286.patch, LUCENE-4286.patch


 Add an optional flag to the CJKBigramFilter to tell it to also output 
 unigrams. This would allow indexing of both bigrams and unigrams, and at 
 query time the analyzer could analyze queries as bigrams unless the query 
 contained a single Han unigram.
 As an example, here is a configuration for a Solr fieldType with the analyzer for 
 indexing with the indexUnigrams flag set and the analyzer for querying 
 without the flag:
 <fieldType name="CJK" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true"/>
   </analyzer>
 </fieldType>
 Use case: About 10% of our queries that contain Han characters are single 
 character queries. The CJKBigram filter only outputs single characters when 
 there are no adjacent bigrammable characters in the input. This means we 
 have to create a separate field to index Han unigrams in order to address 
 single character queries, and then write application code to search that 
 separate field if we detect a single character Han query. This is rather 
 kludgey. With the optional flag, we could configure Solr as above.
 This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter 
 used to allow single word queries (although that uses word n-grams rather 
 than character n-grams).
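To show what the flag changes, here is a toy Java sketch over a single run of Han characters. It is not CJKBigramFilter, which works on a TokenStream and also tracks positions and offsets; HanGrams, grams and withUnigrams are hypothetical names, and the output order is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of bigram output with an optional unigram flag. */
final class HanGrams {
    static List<String> grams(String run, boolean withUnigrams) {
        List<String> out = new ArrayList<>();
        int[] cps = run.codePoints().toArray(); // work on code points, not UTF-16 chars
        if (cps.length == 1) {
            out.add(run); // a lone character has no neighbour to pair with
            return out;
        }
        for (int i = 0; i < cps.length; i++) {
            if (withUnigrams) {
                out.add(new String(cps, i, 1));
            }
            if (i + 1 < cps.length) {
                out.add(new String(cps, i, 2));
            }
        }
        return out;
    }
}
```

For the run 東京都, the bigrams alone are 東京 and 京都, so a one-character query 東 (which the query analyzer leaves as 東) finds nothing. With withUnigrams set, 東, 京 and 都 are indexed as well, and that query matches without a separate unigram field.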

[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2


 [ 
https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3707:
--

Attachment: SOLR-3707.patch

Patch with updated classpath for Eclipse.

Anything else needed before commit?

 Upgrade Solr to Tika 1.2
 

 Key: SOLR-3707
 URL: https://issues.apache.org/jira/browse/SOLR-3707
 Project: Solr
  Issue Type: Improvement
  Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
Reporter: Jan Høydahl
Assignee: Jan Høydahl
 Fix For: 4.0, 5.0

 Attachments: SOLR-3707.patch, SOLR-3707.patch


 Tika 1.2 has been released with these improvements: 
 http://tika.apache.org/1.2/index.html

[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-04 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428698#comment-13428698
 ] 

Michael McCandless commented on LUCENE-4283:


I tested the -fully patch:
{noformat}
            Task    QPS base StdDev base    QPS comp StdDev comp       Pct diff
      AndHighLow      628.46        8.28      155.04        1.42   -75% -  -74%
     LowSpanNear        5.07        0.02        4.85        0.10    -6% -   -2%
     MedSpanNear        9.12        0.07        8.86        0.22    -5% -    0%
       OrHighMed       26.16        1.15       25.53        2.65   -16% -   12%
      AndHighMed       44.92        0.88       43.94        0.30    -4% -    0%
       OrHighLow       38.76        1.70       37.97        4.03   -16% -   13%
      OrHighHigh        9.57        0.45        9.40        1.02   -16% -   14%
        HighTerm       22.88        0.13       22.83        0.95    -4% -    4%
HighSloppyPhrase        2.14        0.10        2.14        0.11    -9% -   10%
 LowSloppyPhrase        5.31        0.22        5.32        0.22    -7% -    8%
       LowPhrase        7.85        0.09        7.87        0.21    -3% -    3%
    HighSpanNear        1.65        0.01        1.66        0.04    -2% -    3%
         Respell       77.70        1.24       78.14        2.12    -3% -    4%
         MedTerm      138.26        0.52      139.07        5.52    -3% -    4%
        PKLookup      193.63        2.06      195.98        2.84    -1% -    3%
 MedSloppyPhrase       12.15        0.34       12.33        0.48    -5% -    8%
         LowTerm      525.12        4.89      534.89       14.12    -1% -    5%
          Fuzzy2       87.20        2.05       89.05        3.27    -3% -    8%
          Fuzzy1       97.81        2.33       99.94        3.99    -4% -    8%
     AndHighHigh       18.39        0.27       19.62        0.06     4% -    8%
       MedPhrase        5.09        0.11        5.52        0.33     0% -   17%
        Wildcard       67.59        0.58       73.76        3.37     3% -   15%
         Prefix3       25.51        0.39       29.54        1.60     7% -   23%
      HighPhrase        3.55        0.12        4.13        0.33     3% -   30%
          IntNRQ        8.79        0.08       10.67        1.52     3% -   40%
{noformat}

It seems like we are getting some gains for Med/HighPhrase, but AndHighLow is 
still way off.
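One rough way to read the Pct diff column (a reading heuristic, not something luceneutil itself reports) is to call a task clearly faster or slower only when both ends of its interval lie on the same side of zero. A small Java sketch; PctDiff and classify are hypothetical names.

```java
/** Hypothetical reading aid for a "Pct diff" interval such as "-75% - -74%". */
final class PctDiff {
    enum Verdict { SLOWER, FASTER, INCONCLUSIVE }

    static Verdict classify(int lowPct, int highPct) {
        if (lowPct > highPct) {
            throw new IllegalArgumentException("lowPct must not exceed highPct");
        }
        if (highPct < 0) {
            return Verdict.SLOWER; // whole interval below zero
        }
        if (lowPct > 0) {
            return Verdict.FASTER; // whole interval above zero
        }
        return Verdict.INCONCLUSIVE; // interval touches or straddles zero
    }
}
```

By this reading the -fully run is clearly slower for AndHighLow and LowSpanNear, clearly faster for AndHighHigh, Wildcard, Prefix3, HighPhrase and IntNRQ, while MedPhrase (0% - 17%) sits right on the boundary.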

[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams


[ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428702#comment-13428702
 ] 

Lance Norskog commented on LUCENE-4286:
---

Is this a request by Han language readers?



[jira] [Updated] (SOLR-3691) SimplePostTool: Mode for indexing a web page


 [ 
https://issues.apache.org/jira/browse/SOLR-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3691:
--

Attachment: SOLR-3691.patch

New patch:
* Fetches pages with GZIP/deflate
* Warns if the user sets a delay < 10s
* Prints how many new links per level
* Normalizes URLs by stripping everything after #
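The behaviours in the list above can be sketched in plain Java roughly as follows. CrawlHelpers and its methods are hypothetical and not the code in the patch; a real crawler would also send an "Accept-Encoding: gzip, deflate" request header so the server knows it may compress the response.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helpers mirroring the patch notes; not SimplePostTool's actual code. */
final class CrawlHelpers {
    static final long MIN_POLITE_DELAY_SECONDS = 10;

    /** Drops the fragment, so page#a and page#b count as the same URL. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    /** Wraps a response body according to its Content-Encoding header (may be null). */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                return new InflaterInputStream(raw); // expects zlib-wrapped deflate data
            default:
                return raw;
        }
    }

    /** True when the configured delay is short enough to deserve a politeness warning. */
    static boolean shouldWarnAboutDelay(long delaySeconds) {
        return delaySeconds < MIN_POLITE_DELAY_SECONDS;
    }

    /** Reads a whole stream as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString("UTF-8");
    }
}
```

Stripping the fragment before de-duplication matters because in-page anchors would otherwise make the same page look like many distinct links.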

 SimplePostTool: Mode for indexing a web page
 

 Key: SOLR-3691
 URL: https://issues.apache.org/jira/browse/SOLR-3691
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Reporter: Jan Høydahl
Assignee: Jan Høydahl
 Fix For: 4.0, 5.0

 Attachments: SOLR-3691.patch, SOLR-3691.patch, SOLR-3691.patch


 The simple post.jar tool should both show some sample code and aid 
 users in testing Solr from the command line. What is missing is an easy way to index 
 a web page.

[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions


[ 
https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428705#comment-13428705
 ] 

Jan Høydahl commented on LUCENE-3616:
-

Thanks for fixing that one. I think there may be a few more; I only 
did a code search, no testing:
SearchGroupsResultTransformer # 113,115
TopGroupsResultTransformer # 255,257
GroupedEndResultTransformer # 72

[jira] [Created] (SOLR-3712) Analysis UI page: option to display payloads in text or numerical form

Lance Norskog created SOLR-3712:
---

 Summary: Analysis UI page: option to display payloads in text or 
numerical form
 Key: SOLR-3712
 URL: https://issues.apache.org/jira/browse/SOLR-3712
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Lance Norskog
Priority: Minor


On the Analysis page, please add the ability to display payloads as bytes, 
numbers, or strings.
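A minimal sketch of the three display modes requested above, written as a hypothetical helper (PayloadDisplay is not part of Solr). It assumes a numeric payload is a 4-byte big-endian int and a textual payload is UTF-8; payloads written with other encodings would need their own decoders.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the NUMBER and STRING encodings below are assumptions.
final class PayloadDisplay {
    enum Mode { BYTES, NUMBER, STRING }

    static String format(byte[] payload, Mode mode) {
        switch (mode) {
            case BYTES:
                return toHex(payload);
            case NUMBER:
                // Assumes a 4-byte big-endian int (ByteBuffer's default order);
                // any other length falls back to the hex form.
                if (payload.length == 4) {
                    return Integer.toString(ByteBuffer.wrap(payload).getInt());
                }
                return toHex(payload);
            case STRING:
                // Assumes UTF-8; malformed sequences are replaced with U+FFFD.
                return new String(payload, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // Space-separated, two-digit lowercase hex, e.g. "00 ff".
    private static String toHex(byte[] payload) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < payload.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", payload[i] & 0xff));
        }
        return sb.toString();
    }
}
```

Falling back to hex keeps the display lossless when a payload does not match the assumed numeric layout.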


[jira] [Created] (SOLR-3713) Analysis page: add payload data to Schema Browser term display

Lance Norskog created SOLR-3713:
---

 Summary: Analysis page: add payload data to Schema Browser term display
 Key: SOLR-3713
 URL: https://issues.apache.org/jira/browse/SOLR-3713
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Lance Norskog
Priority: Minor


In the Schema Browser UI, please add a way to display the payloads attached to 
the terms in a field. As before, offer a way to display the payloads in byte, 
number, or string form.


[jira] [Created] (LUCENE-4287) solr/example/solr-webapp included in git checkin

Lance Norskog created LUCENE-4287:
-

 Summary: solr/example/solr-webapp included in git checkin
 Key: LUCENE-4287
 URL: https://issues.apache.org/jira/browse/LUCENE-4287
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Lance Norskog
Priority: Minor


I tried checking in /solr and /lucene without first running 'ant clean', and the 
commit included solr/example/solr-webapp and the data directories in solr/example.

Is checking in without a clean meant to be supported? After all, solr/build, 
lucene/build, and other build artifacts are already excluded.

I think the top-level .gitignore should include solr/example/solr-webapp.


[jira] [Updated] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module


 [ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated LUCENE-2899:
--

Attachment: LUCENE-2899.patch

New patch for current build system on trunk  4.x.

 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899.patch, opennlp_trunk.patch


 Now that OpenNLP is an ASF project and has a nice license, it would be good 
 to have a submodule (under analysis) that exposes its capabilities. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp


[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module