[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133872#comment-13133872 ]

Simon Willnauer commented on LUCENE-3509:
-----------------------------------------

We should expose this via the low-level DocValues implementation, and maybe not via IWC. I think a consistent way would be enabling this in MemoryCodec and using the more RAM-efficient variant by default. This is just like CFS, which is disabled in SepCodec.

> Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3509
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3509
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: LUCENE-3509.patch
>
> Spinoff from LUCENE-3496: we are seeing much better performance if the required bits for PackedInts are rounded up to 8/16/32/64. We should add this option to IWC and default to rounding up, i.e. more RAM, faster lookups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
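The rounding described in the issue can be sketched as follows. This is an illustration of the RAM/CPU trade-off being discussed, not the actual PackedInts implementation; the helper name is hypothetical:

```java
// Sketch of the bits-per-value rounding discussed above: trade extra RAM
// for faster lookups by widening each packed value to a machine-friendly
// size (byte/short/int/long) instead of the minimal bit width.
public class BitsRounding {
    /** Round the minimally required bits per value up to 8, 16, 32, or 64. */
    static int roundUpBits(int requiredBits) {
        if (requiredBits <= 8) return 8;
        if (requiredBits <= 16) return 16;
        if (requiredBits <= 32) return 32;
        return 64;
    }

    public static void main(String[] args) {
        // e.g. values needing 13 bits would be stored in 16-bit slots:
        System.out.println(roundUpBits(13)); // prints 16
        System.out.println(roundUpBits(3));  // prints 8
    }
}
```

With rounding disabled, a value needing 13 bits is packed into exactly 13 bits (less RAM, more shifting/masking per lookup); rounded up to 16 bits, each lookup becomes a simple aligned read.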
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 720 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/720/

1 tests failed.

REGRESSION:  org.apache.lucene.index.TestIndexWriterReader.testAddIndexesAndDoDeletesThreads

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/test/5/test3469534702tmp/_59_4.tim (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/test/5/test3469534702tmp/_59_4.tim (Too many open files in system)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:99)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:122)
	at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58)
	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:433)
	at org.apache.lucene.index.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:119)
	at org.apache.lucene.index.codecs.pulsing.PulsingCodec.fieldsProducer(PulsingCodec.java:114)
	at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:114)
	at org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:182)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:91)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:112)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:91)
	at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:123)
	at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:89)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:699)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:536)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:329)
	at org.apache.lucene.index.TestIndexWriterReader.testAddIndexesAndDoDeletesThreads(TestIndexWriterReader.java:395)
	at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:610)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)

Build Log (for compile errors):
[...truncated 1879 lines...]
Re: [JENKINS] Lucene-trunk - Build # 1709 - Failure
I committed a fix.

Mike McCandless

http://blog.mikemccandless.com

On Sun, Oct 23, 2011 at 11:54 PM, Apache Jenkins Server
<jenk...@builds.apache.org> wrote:
> Build: https://builds.apache.org/job/Lucene-trunk/1709/
>
> 1 tests failed.
>
> REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting
>
> Error Message:
> GC overhead limit exceeded
>
> Stack Trace:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at java.lang.Integer.toString(Integer.java:329)
>        at org.apache.lucene.index.codecs.simpletext.SimpleTextFieldsWriter$SimpleTextPostingsWriter.addPosition(SimpleTextFieldsWriter.java:147)
>        at org.apache.lucene.index.codecs.PostingsConsumer.merge(PostingsConsumer.java:97)
>        at org.apache.lucene.index.codecs.TermsConsumer.merge(TermsConsumer.java:114)
>        at org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:53)
>        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:573)
>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
>        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3679)
>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3315)
>        at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1969)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1963)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1959)
>        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1566)
>        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1318)
>        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1299)
>        at org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:924)
>        at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:610)
>        at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
>        at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
>
> Build Log (for compile errors):
> [...truncated 13042 lines...]
RE: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
Mike,

We have an annotation for this... No assume needed anymore. :-)

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
> Sent: Monday, October 24, 2011 1:00 PM
> To: comm...@lucene.apache.org
> Subject: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> Author: mikemccand
> Date: Mon Oct 24 11:00:29 2011
> New Revision: 1188089
>
> URL: http://svn.apache.org/viewvc?rev=1188089&view=rev
> Log: don't use MemCodec/SimpleText for this test
>
> Modified:
>     lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java?rev=1188089&r1=1188088&r2=1188089&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (original)
> +++ lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java Mon Oct 24 11:00:29 2011
> @@ -23,14 +23,15 @@
>  import java.util.ArrayList;
>  import java.util.Collections;
>  import java.util.List;
>  import java.util.Random;
> -import java.util.concurrent.atomic.AtomicInteger;
>  import java.util.concurrent.atomic.AtomicBoolean;
> +import java.util.concurrent.atomic.AtomicInteger;
>
>  import org.apache.lucene.analysis.*;
>  import org.apache.lucene.document.Document;
>  import org.apache.lucene.document.FieldType;
>  import org.apache.lucene.document.StringField;
>  import org.apache.lucene.document.TextField;
> +import org.apache.lucene.index.codecs.CodecProvider;
>  import org.apache.lucene.search.IndexSearcher;
>  import org.apache.lucene.search.ScoreDoc;
>  import org.apache.lucene.search.TermQuery;
> @@ -896,6 +897,8 @@ public class TestIndexWriterDelete exten
>    }
>
>    public void testIndexingThenDeleting() throws Exception {
> +    assumeFalse("This test cannot run with Memory codec", CodecProvider.getDefault().getFieldCodec("field").equals("Memory"));
> +    assumeFalse("This test cannot run with SimpleText codec", CodecProvider.getDefault().getFieldCodec("field").equals("SimpleText"));
>      final Random r = random;
>      Directory dir = newDirectory();
>      // note this test explicitly disables payloads
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133990#comment-13133990 ]

Uwe Schindler commented on LUCENE-1536:
---------------------------------------

I will commit this tomorrow, if nobody objects, and then we will work on further issues to improve the Weight.scorer() API, CachingWrapperFilter, ... There is no slowdown, only speedups, with room to improve.

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
> I ran some performance tests, comparing applying a filter via the random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to an iterator was a very sizable performance hit.
>
> Some notes on the test:
> * Index is the first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad-core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> * I test across multiple queries. "1-X" means an OR query, e.g. "1-4" means 1 OR 2 OR 3 OR 4, whereas "+1-4" is an AND query, i.e. 1 AND 2 AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)).
> * Method "high" means I use the random-access filter API in IndexSearcher's main loop. Method "low" means I use the random-access filter API down in SegmentTermDocs (just like deleted docs today).
> * Baseline (QPS) is current trunk, where the filter is applied as an iterator up high (i.e. in IndexSearcher's search loop).
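The two strategies being benchmarked can be sketched as follows. The interfaces here are deliberately simplified stand-ins (a `BitSet` playing the role of the filter), not Lucene's actual Filter/DocIdSet API:

```java
import java.util.BitSet;

// Sketch of the two filter-application strategies compared above.
// Illustrative only; Lucene's real API uses DocIdSet/DocIdSetIterator.
public class FilterStrategies {
    /** Iterator style: advance a cursor over the filter's set bits for each candidate. */
    static int countHitsIterator(int[] matchingDocs, BitSet filter) {
        int hits = 0;
        for (int doc : matchingDocs) {
            // nextSetBit stands in for DocIdSetIterator.advance(doc)
            int next = filter.nextSetBit(doc);
            if (next == doc) hits++;
        }
        return hits;
    }

    /** Random-access style: a single O(1) bit check per candidate doc. */
    static int countHitsRandomAccess(int[] matchingDocs, BitSet filter) {
        int hits = 0;
        for (int doc : matchingDocs) {
            if (filter.get(doc)) hits++;  // cheap, especially for dense filters
        }
        return hits;
    }
}
```

Both return the same counts; the benchmark question is purely which access pattern is faster at a given filter density.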
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133994#comment-13133994 ]

Robert Muir commented on LUCENE-1536:
-------------------------------------

+1, let's commit this one and make progress here.
Re: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
But then I should break the test into a new class, right?

/me was being lazy... and this test only uses the one field...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Oct 24, 2011 at 7:29 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Mike,
>
> We have an annotation for this... No assume needed anymore. :-)
>
> Uwe
[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134031#comment-13134031 ]

Michael McCandless commented on LUCENE-3509:
--------------------------------------------

I think enabling this at the codec impl level makes sense. But I'd prefer the defaulting to match what we do for FieldCache, i.e. default to fasterButMoreRAM.
[jira] [Resolved] (LUCENE-3501) random sampler is not random (and so facet SamplingWrapperTest occasionally fails)
[ https://issues.apache.org/jira/browse/LUCENE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-3501.
---------------------------------

    Resolution: Fixed

Fix merged to 3x: 1188129.
Thanks Gilad and Shai for helping to fix this.

> random sampler is not random (and so facet SamplingWrapperTest occasionally fails)
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3501
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3501
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3501.patch
>
> RandomSample is not random at all: it does not even import java.util.Random, and its behavior is deterministic. In addition, the test testCountUsingSamping() never retries as it was supposed to (for taking care of the hoped-for randomness).
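A sampler that is actually random needs a java.util.Random source, ideally seedable so a failing run can be reproduced. A minimal sketch (hypothetical class and method names, not the facet module's RandomSample):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of random sampling with java.util.Random, the ingredient
// the issue above notes was missing. Illustrative; not Lucene's RandomSample.
public class SimpleSampler {
    /** Keep each doc id independently with probability sampleRatio. */
    static List<Integer> sample(int[] docIds, double sampleRatio, long seed) {
        Random random = new Random(seed);  // seedable, so tests can retry reproducibly
        List<Integer> sampled = new ArrayList<>();
        for (int doc : docIds) {
            if (random.nextDouble() < sampleRatio) {
                sampled.add(doc);
            }
        }
        return sampled;
    }
}
```

A deterministic sampler (no Random at all) always picks the same subset, which is why a test that was written to "retry for randomness" could never actually see a different sample.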
RE: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
That's right, this is still an open issue :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Monday, October 24, 2011 2:43 PM
> To: dev@lucene.apache.org
> Subject: Re: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> But then I should break the test into a new class, right?
>
> /me was being lazy... and this test only uses the one field...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1536:
----------------------------------

    Attachment: LUCENE-1536.patch

Here is the updated patch after some changes in trunk. It also adds the missCount checks back to the Caching*Filters; I lost them during cleanup.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10981 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10981/

1 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.update.AutoCommitTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:469)
	at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:527)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:437)

Build Log (for compile errors):
[...truncated 7846 lines...]
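The leak check that fires here enforces a simple discipline: every directory a test opens must be closed before the suite ends. A minimal sketch of that discipline, with a hypothetical stand-in class for Lucene's Closeable Directory:

```java
// Sketch of the resource discipline the failing test is missing: close
// what you open, even when the body throws. DummyDirectory is a stand-in
// for Lucene's Directory; the afterClass leak check flags any left open.
public class CloseDiscipline {
    static class DummyDirectory implements AutoCloseable {
        boolean open = true;
        @Override public void close() { open = false; }
    }

    static boolean runTest() {
        DummyDirectory dir = new DummyDirectory();
        try {
            // ... exercise the index through dir ...
            return dir.open;
        } finally {
            dir.close();  // without this, the leak check reports the test
        }
    }
}
```

The try/finally (or, on Java 7+, try-with-resources) guarantees the close runs on every exit path, which is exactly what the `afterClassLuceneTestCaseJ4` check verifies.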
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3526:
--------------------------------

    Attachment: LUCENE-3526_test.patch

Updated set of tests: I changed TestRegexpRandom2 to sometimes use an empty field name for better testing. This seems to trigger its own problems:

{noformat}
[junit] Testcase: testRegexps(org.apache.lucene.search.TestRegexpRandom2): FAILED
[junit] Terms are out of order: field= (number 0) lastField= (number -1) text= lastText=
[junit] junit.framework.AssertionFailedError: Terms are out of order: field= (number 0) lastField= (number -1) text= lastText=
[junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
[junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
[junit] 	at org.apache.lucene.index.codecs.preflexrw.TermInfosWriter.add(TermInfosWriter.java:213)
[junit] 	at org.apache.lucene.index.codecs.preflexrw.PreFlexFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexFieldsWriter.java:192)
[junit] 	at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:409)
[junit] 	at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
{noformat}

I had thought to work around the original issue with this hack-patch, but I still get that failure... perhaps it's a bad assert/something unrelated?

{noformat}
Index: src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java
===================================================================
--- src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java (revision 1188010)
+++ src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java (working copy)
@@ -711,7 +711,12 @@
       } else {
         getTermsDict().seekEnum(termEnum, term, true);
       }
-      skipNext = true;
+      if (internedFieldName == "") {
+        // hackedy-hack: we aren't actually positioned yet
+        skipNext = false;
+      } else {
+        skipNext = true;
+      }
       unicodeSortOrder = sortTermsByUnicode();
{noformat}

> preflex codec returns wrong terms if you use an empty field name
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3526
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch
>
> Spinoff from LUCENE-3473. I have a standalone test for this... the termsenum is returning a bogus extra empty-term (I assume it has no postings, I didn't try). This causes the checkindex test in LUCENE-3473 to fail, because there are 4 terms instead of 3.
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3526:
--------------------------------

    Attachment: LUCENE-3526_test.patch

OK, here's a patch... all tests pass now.

The assert failure in the writer was a bad assert; we previously had:

{noformat}
// If there is a field named "" (empty string) then we
// will get 0 on this comparison, yet, it's OK.  But
// it's not OK if two different field numbers map to
// the same name.
if (cmp != 0 || lastFieldNumber != -1)
  return cmp;
{noformat}

which is nice, but it doesn't cover the case of an empty term PLUS an empty string field: Term("", ""). In this case we would fall through and return 0, which is wrong.
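The underlying invariant is that terms must be ordered by field name first, then by term text, and the empty string must get no special early exit. A minimal sketch of such a comparison (illustrative only, not the actual 3.x TermInfosWriter assert):

```java
// Sketch of a field-then-term ordering check that handles Term("", "")
// correctly: compare field names first, and break ties on the term text
// instead of returning 0 prematurely for the empty field name.
public class TermOrderCheck {
    static int compareTermOrder(String lastField, String lastText,
                                String field, String text) {
        int cmp = lastField.compareTo(field);
        if (cmp != 0) {
            return cmp;  // different fields decide the order outright
        }
        // Same field name (possibly ""): fall through to the term text.
        return lastText.compareTo(text);
    }
}
```

With this shape, `compareTermOrder("", "", "", "")` is 0 only because both field and text genuinely match; a guard like the buggy one above, which bailed out before reaching the text comparison, could report 0 for pairs that are actually out of order.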
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3526: Attachment: LUCENE-3526.patch Oops, wrong patch. Here is the correct one.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134071#comment-13134071 ] Robert Muir commented on LUCENE-3526: - I will add an additional test to 3.x for Term("", "") and see if it has any bad asserts like this, and add it to the patch.
IndexableField(Type) interfaces, abstract classes and back compat.
Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces only for simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. On the one side, interfaces are cleaner design-wise, but if we wish to add new methods later, they make back compatibility hard to support. Abstract classes allow for back compat, but they are perhaps a bit less clean b/c they often tie an implementation to the broader API. In the past, we've been bitten by interfaces b/c, let's face it, we can't predict the future (Fieldable is the most notorious -- and this stuff has a very Fieldable feel to it -- but there are others; please see the archives for past discussions). I think in an ideal world, interfaces are kept quite compact, you use multiple of them, and then you provide a base abstract class that implements said interfaces and provides most of the implementation for most people. Logically, this doesn't always work out. An alternative is to mark it all as experimental and punt for now. In the end, I just want to make sure we have the discussion about it so that we don't find ourselves having to wait until 5.x in order to add a new method to one of these interfaces. Alternatively, perhaps we won't need to at all, or perhaps we think no one other than core Lucene will implement these. Just trying to avoid past pain and headaches in the future. -Grant
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think it's good you brought this up, Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before, thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. -- lucidimagination.com
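The delegator trap Robert describes can be reproduced in a few lines of plain Java. This is only a sketch with made-up class names (not actual Lucene code): a filter/delegator written against an older version of an abstract class silently inherits a method added later, so the wrapped instance's override is bypassed at runtime, whereas the same method added to an interface would break the delegator at compile time and force the author to forward it.

```java
// Hypothetical classes illustrating the LUCENE-2828-style trap; not Lucene code.
public class DelegatorTrap {
    static abstract class Reader {
        abstract String term();
        // Method added in a later release, with a default delegating impl:
        String termText() { return term(); }
    }

    static class CustomReader extends Reader {
        String term() { return "custom"; }
        // Overrides the new method with special behavior:
        String termText() { return "CUSTOM"; }
    }

    // Written before termText() existed, so it forwards only term():
    static class FilterReader extends Reader {
        final Reader in;
        FilterReader(Reader in) { this.in = in; }
        String term() { return in.term(); }
    }

    public static void main(String[] args) {
        Reader r = new FilterReader(new CustomReader());
        // The inherited default calls FilterReader.term(), silently skipping
        // the wrapped CustomReader.termText() override:
        System.out.println(r.termText());  // prints "custom", not "CUSTOM"
    }
}
```

Had `Reader` been an interface gaining a `termText()` method, `FilterReader` would simply stop compiling, which is the "compile-time break" being argued for.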
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, its asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on interfaces in minor versions? My personal take is that if we do, we primarily just need to communicate it ahead of time -- ideally at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134083#comment-13134083 ] Robert Muir commented on LUCENE-3526: - There are more serious problems in 3.x here.
* If you create new Field("", ""), you get an IllegalArgumentException from Field's ctor: name and value cannot both be empty.
* But there are tons of other ways to index an empty term for the empty field (for example, initially make it garbage and then .setValue(), or via a tokenstream).
* If you do this and you have assertions enabled, you will trip the same assert bug I fixed in trunk here.
* If you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
So we need to figure out what the semantics should be for 3.x: is Term("", "") allowed or not?
RE: IndexableField(Type) interfaces, abstract classes and back compat.
Hi, Beyond that, we should add the final modifier to all methods that simply delegate to other methods of the same class. This is another trap when trying to be backwards compatible. An easy-to-use method that simply supplies some defaults for specific parameters of a telescopic other one should always be final. If somebody subclasses, he can then only override the large extended telescope and doesn't need to take care of the easy-to-use methods. I revised lots of classes for that, but there are still some worse cases, e.g. in IndexReader. If we don't make such delegating methods final, we also have the same backwards compatibility problem as with tokenStream or FilteredIndexReader. This is just meant as an additional comment about stuff that easily goes wrong when making APIs. Make everything final that's not intended to be modified in subclasses (or make the whole class final). Most methods don't need to be overridden; only open them up for subclassing when there is *really* a use case! We can remove final later easily, but initially we should prevent subclassing. This would remove lots of VirtualMethod usages in 3.x (my abstraction for the TokenStream backwards layer). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, October 24, 2011 4:02 PM To: dev@lucene.apache.org Subject: Re: IndexableField(Type) interfaces, abstract classes and back compat. On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. 
I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, its asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
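Uwe's rule about final convenience overloads can be sketched like this (a hypothetical class, not an actual Lucene API): the easy-to-use overload is final and only fills in defaults, so subclasses have exactly one extension point and the shortcut can never drift out of sync with the full "telescopic" method.

```java
// Hypothetical example of the "final telescoping method" pattern Uwe describes.
public class TelescopingExample {
    static class Searcher {
        // Easy-to-use overload: final, just supplies a default and delegates.
        public final String search(String query) {
            return search(query, 10);
        }
        // The full "telescopic" method is the single point subclasses override.
        public String search(String query, int limit) {
            return query + " (limit=" + limit + ")";
        }
    }

    static class LoggingSearcher extends Searcher {
        // Overriding only the full method automatically changes the behavior
        // of the final search(String) shortcut as well:
        @Override
        public String search(String query, int limit) {
            return "[logged] " + super.search(query, limit);
        }
    }

    public static void main(String[] args) {
        System.out.println(new LoggingSearcher().search("lucene"));
        // prints "[logged] lucene (limit=10)"
    }
}
```

If `search(String)` were not final, a subclass could override just the shortcut and the two entry points would silently diverge, which is exactly the tokenStream/FilteredIndexReader style of back-compat bug.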
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134121#comment-13134121 ] James Dyer commented on SOLR-2848: -- Robert, I think your first suggestion (moving configuration and response formatting out of the *SolrSpellChecker) is desirable and doable, but I wanted to keep this issue focused on increasing test coverage and on making DirectSolrSpellChecker mirror what AbstractLuceneSpellChecker already does so that it can pass. Obviously, if every SpellChecker plug-in implemented or extended something that had a getStringDistance or getAccuracy method, then we wouldn't need to do instanceof and cast. Once again, a big structural change like this seems inappropriate in a bug fix, especially as we are not introducing these checks for the first time. This is a long-standing problem. It looks to me like the internal levenshtein is just a dummy class designed to technically meet the API requirements while not actually doing anything. But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. This is how it can know how to order the varying suggestions from multiple shards after the fact. I see from the notes in DirectSpellChecker that using the internal StringDistance yields performance improvements over using a pluggable one. I did not look enough to determine whether the internal levenshtein could be modified to re-compute these internally-generated distance calculations and be usable externally, without sacrificing the performance gain. If possible, this would probably be our best bet, eliminating the Exception hack and any possible discrepancies that using 2 different StringDistance classes would cause. Do you agree? 
DirectSolrSpellChecker fails in distributed environment --- Key: SOLR-2848 URL: https://issues.apache.org/jira/browse/SOLR-2848 Project: Solr Issue Type: Bug Components: SolrCloud, spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2848.patch While working on SOLR-2585, it was brought to my attention that DirectSolrSpellChecker has no test coverage involving a distributed environment. Here I am adding a random element to DistributedSpellCheckComponentTest to alternate between the IndexBased and Direct spell checkers. Doing so revealed bugs in using DirectSolrSpellChecker in a distributed environment. The fixes here roughly mirror those made to the IndexBased spell checker with SOLR-2083.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134123#comment-13134123 ] Robert Muir commented on SOLR-2848: --- {quote} But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. {quote} This is the part I don't understand... we already have the scores in the results, so why recompute?
[jira] [Commented] (SOLR-2804) Logging error causes entire DIH process to fail
[ https://issues.apache.org/jira/browse/SOLR-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134141#comment-13134141 ] Adam Neal commented on SOLR-2804: - Are you using the multithreading in the DIH? I have the same problem, but when I remove the maxthreads attribute the indexing completes successfully.
Logging error causes entire DIH process to fail --- Key: SOLR-2804 URL: https://issues.apache.org/jira/browse/SOLR-2804 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0
Environment: java version 1.6.0_26, Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425), Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode); MacBook Pro (MacBookPro8,2), Intel Core i7 2.2 GHz, 1 processor / 4 cores, L2 cache (per core): 256 KB, L3 cache: 6 MB, 4 GB memory; Mac OS X 10.6.8 (10K549), Darwin kernel 10.8.0
Reporter: Pulkit Singhal Labels: dih Original Estimate: 48h Remaining Estimate: 48h
SEVERE: Full Import failed: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
at org.apache.solr.common.util.NamedList.getName(NamedList.java:127)
at org.apache.solr.common.util.NamedList.toString(NamedList.java:263)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:57)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:265)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:372)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:440)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:421)
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134142#comment-13134142 ] James Dyer commented on SOLR-2848: -- finishStage() is being run on the master shard. It receives spelling results from all of the shards and then has to integrate them together. Solr doesn't return the scores with spelling suggestions back to the client. I suppose the authors of SOLR-785 could have modified the response Solr sends back to its clients. However, it probably seemed inexpensive enough to just re-compute the string distance after the fact (indeed, Lucene in Action, 2nd ed., sect. 8.5.3 mentions doing the same thing, so I take it this is a common thing to do). The problem we have now is that we've got a spellchecker that doesn't fully implement a StringDistance all the time. I'd imagine the best bet is to try and change that. Possibly the slight discrepancies our current patch leaves are not serious enough to fix? If neither option is good, then we'd probably have to modify the Solr response to include scores.
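The "re-compute the distance after the fact" approach James describes can be sketched standalone. All names here are hypothetical (Solr's real classes differ); the point is that the coordinating node receives bare suggestions from each shard with no scores on the wire, so it re-scores them locally with a Levenshtein-based similarity before merging:

```java
import java.util.*;

// Hypothetical sketch of re-scoring per-shard suggestions on the master shard.
public class MergeSuggestions {
    // Plain Levenshtein distance, normalized to a 0..1 similarity
    // (higher is closer), in the style of Lucene's LevensteinDistance.
    static float similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return 1.0f - ((float) d[a.length()][b.length()]
                / Math.max(a.length(), b.length()));
    }

    @SafeVarargs
    static List<String> merge(String query, List<String>... shardSuggestions) {
        // Dedupe suggestions coming back from every shard...
        Set<String> all = new TreeSet<>();
        for (List<String> s : shardSuggestions) all.addAll(s);
        List<String> merged = new ArrayList<>(all);
        // ...then order by re-computed similarity to the original query:
        merged.sort(Comparator.comparingDouble((String s) -> -similarity(query, s)));
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge("lucene",
                Arrays.asList("lucere", "lucent"),
                Arrays.asList("lucene1", "lumen")));
    }
}
```

This is also where the discrepancy James mentions comes from: if the distance used here differs from the one the shards used to generate suggestions, the merged ordering can disagree with any single shard's ordering.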
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134150#comment-13134150 ] Robert Muir commented on SOLR-2848: --- {quote} I'd imagine the best bet is to try and change that. {quote} OK, let's do this, such that the distance impl is a real one computing Levenshtein like Lucene does and not a fake one. Then it's one less hack. Want to open a LUCENE issue for this? I can help if you want.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134160#comment-13134160 ] Robert Muir commented on SOLR-2848: --- {quote} The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time. {quote} We should fix that hack as I mentioned (it's just a hack, caused by me, sorry!). But then we should think about how to make sure that SpellChecker subclasses always work correctly distributed if we aren't going to change the wire format. Rather than instanceof/StringDistance, maybe we could add a merge() method that would be more general?
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134182#comment-13134182 ] James Dyer commented on SOLR-2848: -- {quote} OK, Lets do this, such that the distance impl is a real one computing levenshtein like Lucene does {quote} I opened LUCENE-3527. {quote} Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general? {quote} Are you thinking each *SolrSpellChecker should have a merge() that finishStage() calls? This sounds reasonable to me.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134187#comment-13134187 ] Robert Muir commented on SOLR-2848: --- Yeah, this way a spellchecker can decide how it merges results (since we aren't going to put any 'score' in the wire format or require it). So, for example, the default impl of AbstractLuceneSpellChecker's merge() would use getComparator() and such (we can just put this in the abstract class).
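The merge() hook being discussed could look roughly like this. All signatures here are hypothetical (Solr's actual SolrSpellChecker differs): the component's finishStage() would call merge() and each checker would bring its own ordering policy, so no instanceof/StringDistance casting is needed in the component.

```java
import java.util.*;

// Hypothetical sketch of a per-spellchecker merge() hook; not actual Solr API.
public class MergeHookSketch {
    static abstract class SpellChecker {
        // finishStage() on the coordinating node would call this instead of
        // doing instanceof checks and casting to obtain a StringDistance.
        abstract List<String> merge(List<List<String>> shardSuggestions, int count);
    }

    // A default policy a base class could supply: dedupe, then order with a
    // comparator the checker owns (TreeSet ordering stands in for getComparator()).
    static class LexicalSpellChecker extends SpellChecker {
        @Override
        List<String> merge(List<List<String>> shardSuggestions, int count) {
            TreeSet<String> all = new TreeSet<>();
            for (List<String> s : shardSuggestions) all.addAll(s);
            List<String> out = new ArrayList<>(all);
            return out.subList(0, Math.min(count, out.size()));
        }
    }

    public static void main(String[] args) {
        SpellChecker sc = new LexicalSpellChecker();
        System.out.println(sc.merge(Arrays.asList(
                Arrays.asList("lucent", "lucid"),
                Arrays.asList("lucid", "lucene")), 2));
    }
}
```

A distance-based checker would override merge() to re-score suggestions instead, which keeps the wire format unchanged while letting every subclass behave correctly in distributed mode.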
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134192#comment-13134192 ] Michael McCandless commented on LUCENE-3183: I think the hack is actually correct, but maybe change it to check termEnum.position = 0? So this was a case we missed from LUCENE-3183 (maybe there are more!?), where we decided for the corner case of empty field and term text, the caller must handle that the returned enum is unpositioned (in exchange for not adding an if per next). And maybe add the same comment about LUCENE-3183 on top of that logic? TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Assignee: Michael McCandless Attachments: LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 
[junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134205#comment-13134205 ] Michael McCandless commented on LUCENE-3526: I think the hack is actually correct, but maybe change it to check termEnum.position = 0? So this was a case we missed from LUCENE-3183 (maybe there are more!?), where we decided for the corner case of empty field and term text, the caller must handle that the returned enum is unpositioned (in exchange for not adding an if per next). And maybe add the same comment about LUCENE-3183 on top of that logic? preflex codec returns wrong terms if you use an empty field name Key: LUCENE-3526 URL: https://issues.apache.org/jira/browse/LUCENE-3526 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-3526.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch spinoff from LUCENE-3473. I have a standalone test for this... the termsenum is returning a bogus extra empty-term (I assume it has no postings, i didnt try). This causes the checkindex test in LUCENE-3473 to fail, because there are 4 terms instead of 3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134206#comment-13134206 ] Michael McCandless commented on LUCENE-3183: Woops, above comment was meant for LUCENE-3526.
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3526: --- Attachment: LUCENE-3526.patch Patch, putting back the safer-but-an-if-per-scan check from LUCENE-3183; this fixed another test failure.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134211#comment-13134211 ] Robert Muir commented on LUCENE-3526: - +1, i'm running the tests a lot, this seems solid.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134238#comment-13134238 ] Robert Muir commented on LUCENE-3526: - I committed this, thanks Mike! Now to figure out wtf to do for 3.x...
Re: IndexableField(Type) interfaces, abstract classes and back compat.
Thanks for raising this Grant. My feeling is we can stick with an interface here, and mark it @experimental. This is a very-low-level-very-expert API. Most users will use the sugar field impls (TextField, BinaryField, NumericField, etc.). Expert users will build their own FieldType and pass that to Field. Waaay expert users will skip our user-space Document/Field/FieldType entirely and code directly to this low level minimal indexing API. For example maybe their app sucks streamed bytes off a socket, parses out fields and immediately hands that data off to IndexWriter for indexing (never making FieldTypes/Fields/Documents). So I think such way-expert users can handle hard breaks on the API, and would likely want to see the hard break so they know there's something to fix / new to add to indexing. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 24, 2011 at 10:02 AM, Grant Ingersoll gsing...@apache.org wrote: On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think it's good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. 
Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant
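Robert's delegator argument can be made concrete with a toy example (hypothetical classes, not Lucene's): when an abstract base class later gains a method with a default body, an existing delegator still compiles but silently stops delegating; had the base been an interface gaining an abstract method, the delegator would fail to compile until its author handled the new method.

```java
// Toy illustration of the delegator-over-abstract-class hazard
// (hypothetical types; not Lucene's actual classes).
abstract class AbstractReader {
    abstract String name();
    // Method added in a later release, with a default body: existing
    // delegators keep compiling, but do NOT forward this call.
    String version() { return "base"; }
}

class FilterReader extends AbstractReader {
    private final AbstractReader in;
    FilterReader(AbstractReader in) { this.in = in; }
    @Override String name() { return in.name(); }
    // version() is silently inherited: it answers "base" instead of
    // delegating to `in` -- the LUCENE-2828 style of bug. If
    // AbstractReader were an interface and version() a new abstract
    // method, this class would fail to compile, forcing a decision.
}
```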
[jira] [Commented] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134247#comment-13134247 ] Uwe Schindler commented on LUCENE-3473: --- Robert: In your patch there is an additional test for CheckIndex on the old indexes. This is implicitly already done by testSearchOldIndex, which calls _TestUtil's checkIndex as a first step. So this test is a duplicate and slows things down, right? CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms --- Key: LUCENE-3473 URL: https://issues.apache.org/jira/browse/LUCENE-3473 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.4, 4.0 Reporter: Robert Muir Attachments: LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true case maybe (which seems to be testing something else)? It would be nice to verify this also for terms dicts that don't support ord. We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex
[jira] [Commented] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134248#comment-13134248 ] Robert Muir commented on LUCENE-3473: - Uwe yes: i was actually adding this test only for debugging... I'll remove it (it does not give us any additional test coverage)
[jira] [Updated] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3473: Attachment: LUCENE-3473.patch updated patch, now that LUCENE-3526 is fixed, all tests passed. * removed the useless TestBackwardsCompatibility test (i was just debugging) * fixed TestRollingUpdates to not combine PreFlexCodec and MemoryCodec in PerFieldCodecWrapper (this is stupid, and causes my assert to trip)
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11003 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11003/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest.testMultiCore Error Message: Index directory exists after core unload with deleteIndex=true Stack Trace: junit.framework.AssertionFailedError: Index directory exists after core unload with deleteIndex=true at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.solr.client.solrj.MultiCoreExampleTestBase.testMultiCore(MultiCoreExampleTestBase.java:163) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435) Build Log (for compile errors): [...truncated 1 lines...]
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134278#comment-13134278 ] Robert Muir commented on LUCENE-3526: - I'm gonna close this issue and open a separate issue for Term("", "") on 3.x...
[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectively
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134293#comment-13134293 ] Martijn van Groningen commented on LUCENE-3509: --- I also prefer to have a default that matches the FieldCache. I will change the patch so that the option is at the codec impl level (DefaultDocValuesConsumer). Add settings to IWC to optimize IDV indices for CPU or RAM respectively -- Key: LUCENE-3509 URL: https://issues.apache.org/jira/browse/LUCENE-3509 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: LUCENE-3509.patch Spinoff from LUCENE-3496: we are seeing much better performance if the required bits for PackedInts are rounded up to 8/16/32/64. We should add this option to IWC and default to rounding up, i.e. more RAM, faster lookups.
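The rounding discussed in LUCENE-3509 can be sketched with a small helper (hypothetical code, not Lucene's PackedInts API): a value that needs, say, 20 bits gets stored in 32, trading RAM for reads aligned to native integer widths.

```java
// Toy sketch of rounding a required per-value bit count up to the next
// of 8/16/32/64, as described in the issue. Not Lucene's actual API.
class BitsRounding {
    /** Hypothetical helper: smallest of 8, 16, 32, 64 that fits requiredBits. */
    static int roundUpBits(int requiredBits) {
        if (requiredBits <= 8)  return 8;
        if (requiredBits <= 16) return 16;
        if (requiredBits <= 32) return 32;
        return 64;
    }

    public static void main(String[] args) {
        // 20 bits/value packed tightly saves RAM; rounding to 32 makes
        // each value a plain int read -- the CPU-vs-RAM trade-off.
        System.out.println(roundUpBits(20)); // prints 32
    }
}
```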
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Oct 24, 2011, at 1:01 PM, Michael McCandless wrote: Thanks for raising this Grant. My feeling is we can stick with an interface here, and mark it @experimental. This is a very-low-level-very-expert API. :-) We thought the same of Fieldable once upon a time! At any rate, +1 on all of this. I think we are much more sane about this stuff now! Most users will use the sugar field impls (TextField, BinaryField, NumericField, etc.). Expert users will build their own FieldType and pass that to Field. Waaay expert users will skip our user-space Document/Field/FieldType entirely and code directly to this low level minimal indexing API. For example maybe their app sucks streamed bytes off a socket, parses out fields and immediately hands that data off to IndexWriter for indexing (never making FieldTypes/Fields/Documents). So I think such way-expert users can handle hard breaks on the API, and would likely want to see the hard break so they know they're something to fix / new to add to indexing. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 24, 2011 at 10:02 AM, Grant Ingersoll gsing...@apache.org wrote: On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. 
In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant Grant Ingersoll http://www.lucidimagination.com
[jira] [Commented] (LUCENE-3528) TestNRTManager hang
[ https://issues.apache.org/jira/browse/LUCENE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134328#comment-13134328 ] Robert Muir commented on LUCENE-3528: - {noformat} [junit] 2011-10-24 14:28:25 [junit] Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode): [junit] [junit] Thread-2 daemon prio=10 tid=0x7fe66005b800 nid=0x22d6 waiting on condition [0x7fe66d854000] [junit]java.lang.Thread.State: WAITING (parking) [junit] at sun.misc.Unsafe.park(Native Method) [junit] - parking to wait for 0xe0002120 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) [junit] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) [junit] at org.apache.lucene.index.NRTManager.waitOnGenCondition(NRTManager.java:251) [junit] at org.apache.lucene.index.NRTManager.waitForGeneration(NRTManager.java:232) [junit] at org.apache.lucene.index.NRTManager.waitForGeneration(NRTManager.java:196) [junit] at org.apache.lucene.index.TestNRTManager.addDocuments(TestNRTManager.java:95) [junit] at org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1.run(ThreadedIndexingAndSearchingTestCase.java:223) [junit] [junit] NRT Reopen Thread daemon prio=10 tid=0x7fe660024800 nid=0x22d5 in Object.wait() [0x7fe66d955000] [junit]java.lang.Thread.State: TIMED_WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.Object.wait(Object.java:443) [junit] at org.apache.lucene.index.NRTManagerReopenThread.run(NRTManagerReopenThread.java:162) [junit] - locked 0xe0006000 (a org.apache.lucene.index.NRTManagerReopenThread) [junit] [junit] Low Memory Detector daemon prio=10 tid=0x7fe668001000 nid=0x22ba runnable [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread1 daemon prio=10 tid=0x40f8b000 nid=0x22b7 waiting on condition [0x] 
[junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread0 daemon prio=10 tid=0x40f88000 nid=0x22b5 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Signal Dispatcher daemon prio=10 tid=0x40f86000 nid=0x22b2 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Finalizer daemon prio=10 tid=0x40f69000 nid=0x229e in Object.wait() [0x7fe66e7d7000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [junit] - locked 0xe0002528 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [junit] at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) [junit] [junit] Reference Handler daemon prio=10 tid=0x40f62000 nid=0x2298 in Object.wait() [0x7fe66e8d8000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.Object.wait(Object.java:485) [junit] at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) [junit] - locked 0xe0006090 (a java.lang.ref.Reference$Lock) [junit] [junit] main prio=10 tid=0x40ef6000 nid=0x2240 in Object.wait() [0x7fe673e7d000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe0002090 (a org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1) [junit] at java.lang.Thread.join(Thread.java:1186) [junit] - locked 0xe0002090 (a org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1) [junit] at java.lang.Thread.join(Thread.java:1239) [junit] at org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase.runTest(ThreadedIndexingAndSearchingTestCase.java:524) [junit] at org.apache.lucene.index.TestNRTManager.testNRTManager(TestNRTManager.java:37) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at
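The parked thread in the dump above is blocked inside a generation-wait. A minimal sketch of that pattern (toy code under assumed semantics, not NRTManager's actual implementation) shows how a waiter hangs exactly as in the trace if the reopen side never advances the generation or the signal is missed:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Toy generation tracker: a searcher thread parks on a Condition until
// the reopen thread has caught up to a target generation.
class GenerationTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long currentGen = 0;

    void waitForGeneration(long targetGen) throws InterruptedException {
        lock.lock();
        try {
            while (currentGen < targetGen) {
                advanced.await(); // parks, as in the thread dump above
            }
        } finally {
            lock.unlock();
        }
    }

    void advanceTo(long gen) {
        lock.lock();
        try {
            if (gen > currentGen) {
                currentGen = gen;
                advanced.signalAll(); // wake every waiter at or below gen
            }
        } finally {
            lock.unlock();
        }
    }
}
```

If `advanceTo` is never called (a stalled reopen thread), `waitForGeneration` parks forever, which is the WAITING state the dump captures.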
[jira] [Created] (LUCENE-3528) TestNRTManager hang
TestNRTManager hang --- Key: LUCENE-3528 URL: https://issues.apache.org/jira/browse/LUCENE-3528 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir didn't check 3.x yet, just encountered this one running the tests
[jira] [Created] (LUCENE-3529) creating empty field + empty term leads to invalid index
creating empty field + empty term leads to invalid index Key: LUCENE-3529 URL: https://issues.apache.org/jira/browse/LUCENE-3529 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.4 Reporter: Robert Muir Spinoff from LUCENE-3526. * if you create new Field("", ""), you get IllegalArgumentException from Field's ctor: name and value cannot both be empty * But there are tons of other ways to index an empty term for the empty field (for example initially make it garbage then .setValue(""), or via tokenstream). * If you do this, and you have assertions enabled, you will trip an assert (the assert is fixed in trunk, in LUCENE-3526) * But if you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
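The CheckIndex error quoted above fires when a term's stored docFreq disagrees with the postings actually found for it. A hedged sketch of that invariant, with illustrative types rather than Lucene's real data structures, looks like:

```java
import java.util.List;
import java.util.Map;

// Toy version of the consistency check: for every term, the stored
// docFreq must equal the number of documents seen when walking its
// postings. An empty term carrying docFreq=1 but no postings fails it.
class PostingsChecker {
    /** postings: term -> docIDs seen; docFreqs: term -> stored docFreq. */
    static boolean verify(Map<String, List<Integer>> postings,
                          Map<String, Integer> docFreqs) {
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            List<Integer> docs = postings.getOrDefault(e.getKey(), List.of());
            if (e.getValue() != docs.size()) {
                return false; // docFreq != num docs seen -> corrupt index
            }
        }
        return true;
    }
}
```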
[jira] [Updated] (LUCENE-3529) creating empty field + empty term leads to invalid index
[ https://issues.apache.org/jira/browse/LUCENE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3529: Attachment: LUCENE-3529_test.patch attached is a test (committed to trunk). I also fixed the assert and removed the bogus check in Field's ctor. But the checkIndex fails (as it does before, if you index this term with assertions disabled). So next step is to figure out a fix...
[jira] [Created] (SOLR-2849) Solr maven dependencies: logging
Solr maven dependencies: logging Key: SOLR-2849 URL: https://issues.apache.org/jira/browse/SOLR-2849 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 4.0 Reporter: David Smiley Priority: Trivial I was looking at my Maven-based project's solr-core dependencies (trunk) and observed some issues that I think should be fixed in Solr's Maven poms. I ran {{mvn dependency:tree}} -- the output is further below. There are three changes I see needed, related to logging: * slf4j-jdk14 should be runtime scope, and optional. * httpclient depends on commons-logging. Exclude this dependency from the httpclient dependency, and add a dependency on jcl-over-slf4j with compile scope. * Zookeeper depends on Log4j, unfortunately. There is an issue to change this to SLF4J: ZOOKEEPER-850. In the meantime we should exclude it and use log4j-over-slf4j with compile scope, at the solrj pom. As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588 Steve, if you'd like me to create the patch, I will.
{code}
[INFO] +- org.apache.solr:solr-core:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.solr:solr-solrj:jar:4.0-SNAPSHOT:compile
[INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.3.3:compile
[INFO] |  |     +- log4j:log4j:jar:1.2.15:compile
[INFO] |  |     |  \- javax.mail:mail:jar:1.4:compile
[INFO] |  |     |     \- javax.activation:activation:jar:1.1:compile
[INFO] |  |     \- jline:jline:jar:0.9.94:compile
[INFO] |  +- org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-highlighter:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-memory:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-misc:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-queryparser:jar:4.0-SNAPSHOT:compile
[INFO] |  |  \- org.apache.lucene:lucene-sandbox:jar:4.0-SNAPSHOT:compile
[INFO] |  |     \- jakarta-regexp:jakarta-regexp:jar:1.4:compile
[INFO] |  +- org.apache.lucene:lucene-spatial:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-suggest:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-grouping:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.solr:solr-commons-csv:jar:4.0-SNAPSHOT:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  \- commons-logging:commons-logging:jar:1.0.4:compile
[INFO] |  +- commons-io:commons-io:jar:1.4:compile
[INFO] |  +- org.apache.velocity:velocity:jar:1.6.4:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  \- oro:oro:jar:2.0.8:compile
[INFO] |  +- org.apache.velocity:velocity-tools:jar:2.0:compile
[INFO] |  |  +- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  +- commons-chain:commons-chain:jar:1.1:compile
[INFO] |  |  +- commons-validator:commons-validator:jar:1.3.1:compile
[INFO] |  |  +- dom4j:dom4j:jar:1.1:compile
[INFO] |  |  +- sslext:sslext:jar:1.2-0:compile
[INFO] |  |  +- org.apache.struts:struts-core:jar:1.3.8:compile
[INFO] |  |  |  \- antlr:antlr:jar:2.7.2:compile
[INFO] |  |  +- org.apache.struts:struts-taglib:jar:1.3.8:compile
[INFO] |  |  \- org.apache.struts:struts-tiles:jar:1.3.8:compile
[INFO] |  +- org.slf4j:slf4j-jdk14:jar:1.6.1:compile
[INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
{code}
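The httpclient change proposed above would look roughly like the following pom fragment. This is a sketch for illustration: coordinates and versions are taken from the dependency tree above, and the jcl-over-slf4j version is assumed to match the slf4j-jdk14 one.

```xml
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
  <exclusions>
    <!-- keep commons-logging off the classpath -->
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- route JCL calls through slf4j instead -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.6.1</version>
  <scope>compile</scope>
</dependency>
```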
System model for Apache solr
Hi guys, First of all, thanks for this good project. I would like to know whether there are any papers or documents describing a theoretical model of the response time of Apache Solr or Apache Lucene. I am writing an article and would like to compare my experimental data against such a model. Best regards.
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134357#comment-13134357 ] Jason Rutherglen commented on SOLR-2849: {quote}As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588{quote} +1, move it out!
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134359#comment-13134359 ] Steven Rowe commented on SOLR-2849: --- bq. Steve, if you'd like to me to create the patch, I will. Sure, please do.
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134361#comment-13134361 ] Erik Hatcher commented on SOLR-2849: bq. As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588 I hear ya loud and clear. I'll aim to move it out over the next week or so. There's some test hiccup in moving it out, IIRC; let me dust it off and get it relocated. As for the Struts dependency, that's just a transitive POM listing, not a run- (or compile-) time dependency. We certainly don't ship any Struts JARs with Solr, and it all works fine.
Request for Feedback for Patch to Allow DIH to Archive Files
Hi - We are using Solr to process XML input files via the Data Import Handler. I didn't see a way to move the XML files out of the way after processing, so I wrote a small extension to allow this. The How to Contribute page (http://wiki.apache.org/solr/HowToContribute) says to pitch the request to the developer list in order to decide whether or not to submit a patch. As such, here goes: the new code extends FileDataSource and wraps the underlying reader such that when the close method on the input stream is called, the file is moved to a configurable archive directory. It is unclear to me whether this is the correct place to put it (I pondered changing the FileListEntityProcessor, but this somehow felt safer). I realize that a more robust implementation would consider the success status of the file being processed and would also allow for configurable policies rather than a concrete implementation. Nonetheless, I didn't want the perfect to be the enemy of the good. Please peruse the attached source code file and provide feedback as to the merit of the idea, whether I ought to submit a JIRA ticket/patch, and whether my approach is correct. Thanks! Josh Harness Attachment: ArchivingFileDataSource.java
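The idea described above (wrap the reader so that close() relocates the source file) can be sketched in plain Java. The class and names below are hypothetical simplifications for illustration; the real extension would subclass Solr's FileDataSource:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: a FilterReader that moves the underlying file into an archive
// directory once the consumer closes the stream.
class ArchivingReader extends FilterReader {
    private final Path source;
    private final Path archiveDir;

    ArchivingReader(Path source, Path archiveDir) throws IOException {
        super(Files.newBufferedReader(source));
        this.source = source;
        this.archiveDir = archiveDir;
    }

    @Override
    public void close() throws IOException {
        super.close();  // release the file handle before moving the file
        Files.createDirectories(archiveDir);
        // Move the processed file out of the way, overwriting any stale copy.
        Files.move(source, archiveDir.resolve(source.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

As the email notes, a production version would also want to handle the failure case (e.g. move to an error directory instead of the archive).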
Patch submission for DataImportHandler's FileListEntityProcessor to sort files
Hello, I noticed what appears to be a bug in DataImportHandler's FileListEntityProcessor. Specifically, it relies on Java's File.list() method to retrieve a list of files from the configured dataimport directory, but list() does not guarantee a sort order. This means that if you have two files that update the same record, the results are non-deterministic. Typically, list() does in fact return them lexicographically sorted, but this is not guaranteed. An example of how you can get into trouble is to imagine the following: xyz.xml -- created one hour ago, contains updates to records Foo and Bar. abc.xml -- created one minute ago, contains updates to records Bar and Baz. In this case, the newest file, abc.xml, would (likely, but not guaranteed) be run first, updating the Bar and Baz records. Next, the older file, xyz.xml, would update Foo and overwrite Bar with outdated changes. The HowToContribute wiki page suggested I send my request here before opening an actual bug ticket, so please let me know if there's anything else I can or should do to get this patch submitted and approved. I've attached a patch of FileListEntityProcessor, along with an updated test; please let me know if it's kosher. Thank you, Gabriel.

Index: src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java
===================================================================
--- src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java (revision 1188246)
+++ src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java (working copy)
@@ -36,12 +36,19 @@
   @Test
   @SuppressWarnings("unchecked")
   public void testSimple() throws IOException {
+    final String CREATED_FIRST = "b.xml";
+    final String CREATED_SECOND = "a.xml";
     File tmpdir = File.createTempFile("test", "tmp", TEMP_DIR);
     tmpdir.delete();
     tmpdir.mkdir();
     tmpdir.deleteOnExit();
+    createFile(tmpdir, "b.xml", "b.xml".getBytes(), false);
+    try {
+      Thread.sleep(1000);
+    } catch (Exception e) {
+      // Don't care if interrupted. Pass.
+    }
     createFile(tmpdir, "a.xml", "a.xml".getBytes(), false);
-    createFile(tmpdir, "b.xml", "b.xml".getBytes(), false);
     createFile(tmpdir, "c.props", "c.props".getBytes(), false);
     Map attrs = createMap(
             FileListEntityProcessor.FILE_NAME, "xml$",
@@ -58,6 +65,9 @@
       fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
     }
     assertEquals(2, fList.size());
+
+    assertTrue("File created first should have appeared first", fList.get(0).endsWith(CREATED_FIRST));
+    assertTrue("File created second should have appeared second", fList.get(1).endsWith(CREATED_SECOND));
   }

   @Test
Index: src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java
===================================================================
--- src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java (revision 1188246)
+++ src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java (working copy)
@@ -219,25 +219,24 @@
   }

   private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {
-    // Fetch an array of file objects that pass the filter, however the
-    // returned array is never populated; accept() always returns false.
-    // Rather we make use of the fileDetails array which is populated as
-    // a side affect of the accept method.
-    dir.list(new FilenameFilter() {
-      public boolean accept(File dir, String name) {
-        File fileObj = new File(dir, name);
-        if (fileObj.isDirectory()) {
-          if (recursive) getFolderFiles(fileObj, fileDetails);
-        } else if (fileNamePattern == null) {
-          addDetails(fileDetails, dir, name);
-        } else if (fileNamePattern.matcher(name).find()) {
-          if (excludesPattern != null && excludesPattern.matcher(name).find())
-            return false;
-          addDetails(fileDetails, dir, name);
-        }
-        return false;
-      }
-    });
+    File[] files = dir.listFiles();
+    Arrays.sort(files, new Comparator<File>() {
+      public int compare(File f1, File f2) {
+        return ((Long) f1.lastModified()).compareTo(f2.lastModified());
+      }
+    });
+
+    for (File fileObj : files) {
+      String name = fileObj.getName();
+      if (fileObj.isDirectory()) {
+        if (recursive) getFolderFiles(fileObj, fileDetails);
+      } else if (fileNamePattern == null) {
+        addDetails(fileDetails, dir, name);
+      } else if (fileNamePattern.matcher(name).find()) {
+        if (excludesPattern == null || !excludesPattern.matcher(name).find())
+          addDetails(fileDetails, dir, name);
+      }
+    }
+  }

   private void addDetails(List<Map<String, Object>> files, File dir, String name) {
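The core of the change above can be shown as a self-contained sketch. FileSorter is a hypothetical class for illustration (the real patch inlines this in getFolderFiles); it adds a null guard, since File.listFiles() returns null on I/O error, and avoids the Long boxing used in the patch:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Deterministic directory listing: oldest file first by lastModified().
final class FileSorter {
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles();
        if (files == null) return new File[0];  // unreadable dir or not a directory
        Arrays.sort(files, new Comparator<File>() {
            public int compare(File f1, File f2) {
                long d1 = f1.lastModified(), d2 = f2.lastModified();
                // Manual three-way compare: era-appropriate (pre-Long.compare)
                // and free of autoboxing.
                return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
            }
        });
        return files;
    }
}
```

With this ordering, the older xyz.xml from the example always runs before the newer abc.xml, so the newest update to Bar wins.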
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch Attached you will find a new patch for trunk. I made some improvements to the copy operations and the CompoundToken class: - copy operations no longer create useless String objects or clones of String's internal char[] (this slows down indexing a lot) - the algorithmic hyphenator uses the CTA's char[] directly, as it did for Token before (see above), and uses the optimized append() - the broken non-Unicode-conform lowercasing was removed; instead, the CharArraySet is created case-insensitive. If you pass in your own CharArraySet, it has to be case-insensitive; if not, the filter will fail (what to do? Robert, how do we handle that otherwise?) - As all tokens are again CTAs, the CAS lookup is fast again. - Some whitespace cleanup in the test and leftovers in the base source file (Lucene requires 2 spaces, no tabs) Robert, if you could look into it, it would be great. I did not test it with Solr, but to me it looks correct. Uwe Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then will reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 731 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/731/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:222) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:177) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246) at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:206) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read(BufferedInputStream.java:254) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) Build Log (for compile errors): [...truncated 14448 lines...]
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch More cleanup: - As the original token is always preserved, it is not put into the list at all and is returned without modification (no extra copy operations) - removed the deprecated makeDictionary method and corrected matchVersion usage.
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch One more time the filter was revisited and partly rewritten: - it no longer clones the orginal token, as decompounding is done when TokenStream is on this token, which does not change. The decompounder simply takes termAtt/offsetAtt and produces new CompoundToken instances out of it, added to the LinkedList. The original is returned unmodified by a simple return true. This filter actually only creates new opjects when compounds are found, all other tokens are passed as is. - CompoundToken is now a simple wrapper around the characters and the offsets, copied out of the unmodified original termAtt. I think thats the most effective implementation of this filters. I think it's ready to commit. Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then will reset only the default Token attributes (term, position, flags, etc) resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.. -- This message is automatically generated by JIRA. 
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10989 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10989/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.AutoCommitTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:469) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:527) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:437) Build Log (for compile errors): [...truncated 7871 lines...]
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[jira] [Updated] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2205: --- Attachment: LUCENE-2205.patch New patch, iterated from Aaron's last patch. I moved the DataInput/Output impls into PagedBytes, so they can directly operate on the byte[] blocks. I also don't write skipOffset unless df >= skipInterval. I think this is ready! Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure. --- Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: LUCENE-2205.patch, RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, patch-final.txt, rawoutput.txt Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering: on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has improved by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well: -Dorg.apache.lucene.index.TermInfosReader=default or small. I have also written a blog post about this patch; here is the link:
http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
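The packing technique described in the issue can be sketched with plain JDK I/O: several parallel per-term arrays are serialized into a single byte[], with an int[] recording where each entry starts, so entry i occupies bytes[index[i] .. index[i+1]). This is an illustrative sketch of the general idea, not the actual LUCENE-2205 code (which operates on PagedBytes blocks):

```java
import java.io.*;

// Sketch: pack parallel arrays (term text + a long pointer per term)
// into one byte[] with an int[] offset index, replacing three object arrays
// with two flat arrays.
public class PackedTermIndex {
    private final byte[] bytes;
    private final int[] index;

    public PackedTermIndex(String[] terms, long[] pointers) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        index = new int[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            index[i] = out.size();       // record where entry i starts
            out.writeUTF(terms[i]);      // term text
            out.writeLong(pointers[i]);  // e.g. the index pointer for the term
        }
        index[terms.length] = out.size();
        bytes = buf.toByteArray();
    }

    public String term(int i) throws IOException {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes, index[i], index[i + 1] - index[i]));
        return in.readUTF();
    }

    public long pointer(int i) throws IOException {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes, index[i], index[i + 1] - index[i]));
        in.readUTF();  // skip past the term text
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        PackedTermIndex idx = new PackedTermIndex(
            new String[] {"apache", "lucene", "solr"}, new long[] {0L, 1234L, 99999L});
        System.out.println(idx.term(1) + " @ " + idx.pointer(1));  // lucene @ 1234
    }
}
```

The memory win comes from avoiding per-entry object headers and pointers: one contiguous byte[] plus one int[] instead of hundreds of thousands of String/TermInfo objects.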
[jira] [Updated] (LUCENE-3515) Possible slowdown of indexing/merging on 3.x vs trunk
[ https://issues.apache.org/jira/browse/LUCENE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3515: --- Fix Version/s: (was: 3.5) This bug was only present in 4.0. Possible slowdown of indexing/merging on 3.x vs trunk - Key: LUCENE-3515 URL: https://issues.apache.org/jira/browse/LUCENE-3515 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3515.patch, LUCENE-3515.patch, LUCENE-index-34.patch, LUCENE-index-40.patch, TestGenerationTime.java.3x, TestGenerationTime.java.40, stdout-snow-leopard.tar.gz Opening an issue to pursue the possible slowdown Marc Sturlese uncovered.
[jira] [Created] (SOLR-2850) Do not refine facets when minCount == 1
Do not refine facets when minCount == 1 --- Key: SOLR-2850 URL: https://issues.apache.org/jira/browse/SOLR-2850 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 3.4 Environment: Ubuntu, distributed Reporter: Matt Smith Currently there is a special case in the code to not refine facets if minCount==0. It seems this could be extended to minCount == 1, as there would be no need to take the extra step of refining facets if minCount is 1.
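The reasoning behind the proposal can be sketched as follows: in distributed faceting, refinement is a second round-trip that asks shards for exact counts of candidate terms. Any term a shard returns already has a count of at least 1, so with minCount <= 1 every merged candidate trivially meets the threshold and the second pass adds nothing. The sketch below is illustrative only, not Solr's actual FacetComponent:

```java
import java.util.*;

// Illustrative sketch (not Solr's FacetComponent) of merging per-shard
// facet counts. With minCount <= 1, every term any shard returned already
// satisfies the threshold, so the refinement round-trip can be skipped.
public class FacetMergeDemo {
    static Map<String, Integer> merge(List<Map<String, Integer>> shardCounts, int minCount) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shardCounts)
            shard.forEach((term, c) -> merged.merge(term, c, Integer::sum));
        merged.values().removeIf(c -> c < minCount);  // apply the threshold
        return merged;
    }

    static boolean needsRefinement(int minCount) {
        // the existing special case (minCount == 0), extended to 1 as proposed
        return minCount > 1;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> shards = List.of(
            Map.of("java", 3, "lucene", 1),
            Map.of("java", 2, "solr", 1));
        System.out.println(merge(shards, 1));     // every returned term qualifies
        System.out.println(needsRefinement(1));   // false: skip the refine step
    }
}
```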
[jira] [Resolved] (LUCENE-3529) creating empty field + empty term leads to invalid index
[ https://issues.apache.org/jira/browse/LUCENE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3529. - Resolution: Fixed Fix Version/s: 3.5 Thanks Mike, your fix from 3183 was correct all along... we should have just gone with it originally... creating empty field + empty term leads to invalid index Key: LUCENE-3529 URL: https://issues.apache.org/jira/browse/LUCENE-3529 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.4 Reporter: Robert Muir Fix For: 3.5 Attachments: LUCENE-3529.patch, LUCENE-3529_test.patch Spinoff from LUCENE-3526. * if you create new Field("", ""), you get IllegalArgumentException from Field's ctor: name and value cannot both be empty * But there are tons of other ways to index an empty term for the empty field (for example initially make it garbage then .setValue(), or via tokenstream). * If you do this, and you have assertions enabled, you will trip an assert (the assert is fixed in trunk, in LUCENE-3526) * But if you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
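The ctor guard mentioned in the first bullet can be illustrated in isolation. The standalone class below only mirrors the described check (reject the case where BOTH name and value are empty); it is not the real Lucene Field class, and the method name is hypothetical:

```java
// Sketch of the validation described in the issue: Field's ctor rejects an
// empty name combined with an empty value, but other code paths (setValue,
// a TokenStream producing an empty term) could still slip an empty term in.
public class FieldGuardDemo {
    static void checkNameAndValue(String name, String value) {
        if (name.isEmpty() && value.isEmpty()) {
            throw new IllegalArgumentException("name and value cannot both be empty");
        }
    }

    public static void main(String[] args) {
        checkNameAndValue("body", "");   // fine: only the value is empty
        try {
            checkNameAndValue("", "");   // the rejected case from the issue
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This also shows why the ctor check alone was insufficient: the guard runs only at construction time, so the extra CheckIndex-level protection added by the fix matters.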
[jira] [Resolved] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3473. - Resolution: Fixed Fix Version/s: 4.0 3.5 Assignee: Robert Muir CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms --- Key: LUCENE-3473 URL: https://issues.apache.org/jira/browse/LUCENE-3473 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.4, 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true case maybe (which seems to be testing something else)? It would be nice to verify this also for terms dicts that don't support ord. We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex.
[jira] [Commented] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134676#comment-13134676 ] Robert Muir commented on LUCENE-3508: - Just one idea: if the base has makeDictionary(String[]), then maybe deprecate-3x-remove-trunk the stupid String[] ctors and just take the CharArraySet? I think this would remove about half the ctors in both base and subclasses, and I think these ctors are stupid myself. Otherwise, looks great! Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3440: --- Attachment: LUCENE-3440.patch New patch, still has failures in test, though. FastVectorHighlighter: IDF-weighted terms for ordered fragments Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5, 4.0 Reporter: sebastian L. Priority: Minor Labels: FastVectorHighlighter Fix For: 3.5, 4.0 Attachments: LUCENE-3.5-SNAPSHOT-3440-8.patch, LUCENE-3440.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html The FastVectorHighlighter assigns an equal weight to every term found in a fragment, which causes a higher ranking for fragments with a high number of words or, in the worst case, a high number of very common words, than for fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query. The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas: - A better approach would be moving the whole fragment scoring into a separate class. - Switch scoring via parameter. - Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not. - edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher. -- This message is automatically generated by JIRA.
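The patch's scoring formula (total weight += IDF per unique term in the fragment, times the query boost) can be sketched as below. The idf formula used here (Lucene's classic 1 + ln(N/(df+1))) and all numbers are assumptions for the demo, not taken from the patch itself:

```java
import java.util.*;

// Sketch of IDF-weighted fragment scoring: instead of counting every matched
// term equally, each UNIQUE term in a fragment contributes its IDF once,
// scaled by the query boost. Rare query terms thus outrank piles of common ones.
public class FragmentScoreDemo {
    static double idf(int numDocs, int docFreq) {
        // classic Lucene-style idf; an assumption for this demo
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static double fragmentWeight(Set<String> uniqueTermsInFragment,
                                 Map<String, Integer> docFreq,
                                 int numDocs, float queryBoost) {
        double total = 0.0;
        for (String term : uniqueTermsInFragment) {
            // total weight = total weight + IDF(unique term) * boost of query
            total += idf(numDocs, docFreq.getOrDefault(term, 0)) * queryBoost;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("the", 900, "lucene", 3);
        int numDocs = 1000;
        // a fragment full of common words scores lower than one hitting a rare term
        System.out.println(fragmentWeight(Set.of("the"), df, numDocs, 1.0f));
        System.out.println(fragmentWeight(Set.of("lucene"), df, numDocs, 1.0f));
    }
}
```

Counting each unique term once per fragment is what prevents a fragment stuffed with repetitions of one common word from beating a fragment that covers all query terms.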
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134717#comment-13134717 ] Aaron McCurry commented on LUCENE-2205: --- Awesome! Good job! Thank you for working on this with me! Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure. --- Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: LUCENE-2205.patch, RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, patch-final.txt, rawoutput.txt Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering: on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has improved by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well: -Dorg.apache.lucene.index.TermInfosReader=default or small. I have also written a blog post about this patch; here is the link: http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html -- This message is automatically generated by JIRA.