[jira] [Created] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
Robert Muir created LUCENE-5677:
---
Summary: Simplify position handling in DefaultIndexingChain
Key: LUCENE-5677
URL: https://issues.apache.org/jira/browse/LUCENE-5677
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5677.patch

There are currently a ton of conditionals checking for various problems, as well as a horribly confusing unbalanced decrement + increment, and in general the code is a nightmare to follow. To make it worse, besides being confusing it doesn't handle all cases: e.g. a negative position increment gap from the analyzer will just result in total chaos (corruption etc). I think an easier way to implement this is to init FieldInvertState.position to -1, and for the logic to be:
{code}
position += posIncr;
// check that position >= 0
// check that position >= lastPosition
lastPosition = position;
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
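The proposed accounting is small enough to sketch standalone. This is plain Java, not the actual DefaultIndexingChain code; the class and method names are hypothetical, and the exact exception types are an assumption:

```java
// Standalone sketch of the proposed position accounting: start at -1 so the
// first token's increment (usually 1) lands on position 0, and reject any
// token whose computed position is negative or moves backwards.
public class PositionAccounting {
    private int position = -1;      // as proposed in the issue: init to -1
    private int lastPosition = -1;

    public int advance(int posIncr) {
        position += posIncr;
        if (position < 0) {
            // also catches a first token with posIncr == 0: -1 + 0 == -1
            throw new IllegalArgumentException("position must be >= 0, got " + position);
        }
        if (position < lastPosition) {
            throw new IllegalArgumentException("position went backwards: " + position);
        }
        lastPosition = position;
        return position;
    }

    public static void main(String[] args) {
        PositionAccounting acc = new PositionAccounting();
        System.out.println(acc.advance(1)); // first token: position 0
        System.out.println(acc.advance(2)); // increment gap of 2: position 2
        try {
            acc.advance(-5); // negative increment past the start: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a single pair of checks after the increment, the negative-increment-gap corruption case described above falls out of the same code path as every other bad increment.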
[jira] [Updated] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5677:
Attachment: LUCENE-5677.patch

Here's a quick prototype. Tests seem happy.
Re: BaseTokenStreamTestCase
Got it. Will do so, and amend my JIRA ticket to include this as well as tests. Thanks!

On Sat, May 17, 2014 at 2:21 AM, Uwe Schindler u...@thetaphi.de wrote:

Hi,

you have to capture state on the first token before inserting new ones. When inserting a new token, **solely** call restoreState(); clearAttributes() is not needed before restoreState(). If you don't do this, your filter will work incorrectly if other filters come **after** it. The assertion in BaseTokenStreamTestCase is therefore correct and really mandatory. There are many filters that show how to do this token inserting correctly.

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

*From:* Nitzan Shaked [mailto:nitzan.sha...@gmail.com]
*Sent:* Friday, May 16, 2014 6:28 AM
*To:* dev@lucene.apache.org
*Subject:* BaseTokenStreamTestCase

Hi all,

While writing the unit tests for a new token filter I came across an issue(?) with BaseTokenStreamTestCase.assertTokenStreamContents(): it goes to some length to ensure that clearAttributes() was called for every token produced by the filter under test. I suppose this helps most of the time, but my filter sometimes produces more than one output token for a given input token. I don't want to care about what attributes the input token carries, and so I don't clear attributes between producing the output tokens for a given input token: I only change the attributes I care about (in my case this is charTerm right now, and nothing else, not even positionIncrement). This makes my unit tests unable to use BaseTokenStreamTestCase.assertTokenStreamContents(). I certainly do not want to add captureState() and clearAttributes()/restoreState() calls just so I can pass the unit tests. I would rather change assertTokenStreamContents() to support my use case, by adding a boolean and making the required changes everywhere else. Thoughts?

Nitzan
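The capture/restore pattern Uwe describes can be illustrated without Lucene at all. Below, a plain mutable map stands in for Lucene's AttributeSource, and the class and field names are hypothetical; the point is the order of operations: snapshot the source token's full state first, then build each injected token from that snapshot and overwrite only the attributes that differ:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of captureState()/restoreState() in a token filter that
// injects an extra token per input token (e.g. a synonym-like variant).
public class InjectingFilter {
    // A token's attributes, modeled as a simple mutable map (hypothetical
    // stand-in for Lucene's AttributeSource state).
    static Map<String, Object> token(String term, int posIncr) {
        Map<String, Object> t = new HashMap<>();
        t.put("term", term);
        t.put("posIncr", posIncr);
        return t;
    }

    static List<Map<String, Object>> filter(List<Map<String, Object>> input) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> tok : input) {
            out.add(tok); // emit the original token unchanged
            // captureState(): snapshot BEFORE synthesizing the new token
            Map<String, Object> captured = new HashMap<>(tok);
            // restoreState() + overwrite only what this filter cares about:
            // the injected token inherits everything else from the snapshot
            Map<String, Object> injected = new HashMap<>(captured);
            injected.put("term", captured.get("term") + "_alt");
            injected.put("posIncr", 0); // stacked at the same position
            out.add(injected);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of(token("foo", 1))));
    }
}
```

Because every injected token starts from a full snapshot rather than from whatever a downstream filter last left in place, filters that come **after** this one see consistent attribute state, which is exactly what the BaseTokenStreamTestCase assertion enforces.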
[jira] [Commented] (LUCENE-5663) Fix FSDirectory.open API
[ https://issues.apache.org/jira/browse/LUCENE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000695#comment-14000695 ]
Uwe Schindler commented on LUCENE-5663:
---

I think the main problem here is just the name open(). The problem is that NIOFSDir.open() reads like "open a NIOFSDir". I agree with Hoss that this is the standard, well-known factory pattern, and this problem with it applies to other cases, too (you can also call {{Lucene43Codec.forName(Lucene3x)}}, which is also bullshit). But there it is obvious from the method name that forName relates to a factory. So people should really listen to their Eclipse warning (better would be to have it as an error, and Java should not allow access to static methods on subclasses). The better fix, in my opinion, is to just rename the method to a better name: {{FSDirectory.newPlatformDefault(...);}} Then there is no need to shadow them, and it's more obvious that this is a factory method. In 4.x we can still provide a deprecated open(), which is shadowed in the subclasses and throws UOE there.

Fix FSDirectory.open API
---
Key: LUCENE-5663
URL: https://issues.apache.org/jira/browse/LUCENE-5663
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

Spinoff from LUCENE-5658
{quote}
This does not use NIOFSDir! open() is a static factory method on FSDirectory, just inherited to NIOFSDirectory.
{quote}
I think it's confusing we have this method on FSDirectory, so it's visible in subclasses. We should at least consider doing this in another way so it's not confusing.
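The trap being discussed is a plain Java language behavior and is easy to reproduce in miniature. The class names below mirror the thread but are hypothetical stand-ins, not the real Lucene classes:

```java
// Demonstrates why an inherited static factory is confusing: the call in main
// *reads* like it opens an NIOFSDir, but static methods are not overridden --
// the subclass name merely forwards to the factory defined on the base class.
class FSDir {
    static FSDir open() {
        return new FSDir(); // picks a "platform default", not the callee's type
    }
}

class NIOFSDir extends FSDir {
}

public class StaticShadowDemo {
    public static void main(String[] args) {
        FSDir d = NIOFSDir.open(); // compiles fine, but does NOT return an NIOFSDir
        System.out.println(d instanceof NIOFSDir); // prints false
    }
}
```

Renaming the factory to something like newPlatformDefault(), as suggested above, removes the misleading "open a NIOFSDir" reading entirely, since the name no longer matches an action a subclass could plausibly perform on itself.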
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000706#comment-14000706 ]
Uwe Schindler commented on LUCENE-4371:
---

Looks cool. I was a bit confused about ByteBufferIndexInput, because this one already has {{slice(...)}}. We should add {{@Override}} here, because it now implements an abstract method. I still have to think about whether close works as expected, but this did not change from before. Maybe this is my misunderstanding, but it is really confusing: are slices always closed by consumer code (unlike clones), or not? If yes, all looks fine, but we should document this: clones do not need to be closed, but what about slices? I think we use the same FileDescriptor, so we also don't need to close the slices?

consider refactoring slicer to indexinput.slice
---
Key: LUCENE-4371
URL: https://issues.apache.org/jira/browse/LUCENE-4371
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Attachments: LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch

From LUCENE-4364:
{quote}
In my opinion, we should maybe check if we can remove the whole Slicer in all IndexInputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in.
{quote}
Re: svn commit: r1595425 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/core/ lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
Grrr, thanks Rob.

Mike McCandless
http://blog.mikemccandless.com

On Sat, May 17, 2014 at 1:22 AM, rm...@apache.org wrote:

Author: rmuir
Date: Sat May 17 05:22:33 2014
New Revision: 1595425
URL: http://svn.apache.org/r1595425
Log: improve test

Modified:
  lucene/dev/branches/branch_4x/ (props changed)
  lucene/dev/branches/branch_4x/lucene/ (props changed)
  lucene/dev/branches/branch_4x/lucene/core/ (props changed)
  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java

Modified: lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java?rev=1595425&r1=1595424&r2=1595425&view=diff
==
--- lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java (original)
+++ lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java Sat May 17 05:22:33 2014
@@ -36,6 +36,7 @@
 import org.apache.lucene.analysis.MockTo
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.Analyzer.TokenStreamComponents;
 import org.apache.lucene.document.BinaryDocValuesField;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
@@ -1509,6 +1510,7 @@ public class TestIndexWriterExceptions e
       String value = null;
       doc.add(new StoredField("foo", value));
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1532,6 +1534,7 @@ public class TestIndexWriterExceptions e
       // set to null value
       theField.setStringValue(null);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1556,6 +1559,7 @@ public class TestIndexWriterExceptions e
       Field theField = new StoredField("foo", v);
       doc.add(theField);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (NullPointerException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1580,6 +1584,7 @@ public class TestIndexWriterExceptions e
       byte v[] = null;
       theField.setBytesValue(v);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (NullPointerException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1604,6 +1609,7 @@ public class TestIndexWriterExceptions e
       Field theField = new StoredField("foo", v);
       doc.add(theField);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1628,6 +1634,7 @@ public class TestIndexWriterExceptions e
       BytesRef v = null;
       theField.setBytesValue(v);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000715#comment-14000715 ]
Uwe Schindler commented on LUCENE-4371:
---

Btw, thanks for hiding the concrete FSDir IndexInputs and especially making them final! Great step. The protected access annoyed me for a long time, but for backwards compatibility I never removed it (although I am sure nobody was ever able to subclass them correctly!). In ByteBufferIndexInput.slice() the return value is a package-protected class, so we should change this to the general IndexInput as in the abstract base class, otherwise the Javadocs will look broken. This applies to the other classes and their clone(), too. The caller only needs the abstract IndexInput (especially if the impl class is invisible).
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000713#comment-14000713 ]
Michael McCandless commented on LUCENE-5677:
---

+1, much better!
Re: Consolidate IndexWriter.deleteDocuments()
+1

Mike McCandless
http://blog.mikemccandless.com

On Fri, May 16, 2014 at 7:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I was looking at IW.deleteDocs() API, and was wondering why do we have both
: deleteDocuments(Term) and deleteDocuments(Term...). Why can't we have just
: the vararg one? Same applies to deleteDocuments(Query).

+1

I think those method signatures just haven't been cleaned up since the introduction of varargs? (ie: Lucene 2.9 was Java 1.4 compatible and had Array versions of both of those methods instead of the more general vararg versions we have now)

-Hoss
http://www.lucidworks.com/
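The consolidation proposed in the thread is a one-line change in spirit: a single varargs overload accepts both the one-term and many-term calls. A minimal sketch, with a hypothetical Term class standing in for the real org.apache.lucene.index.Term:

```java
// One varargs signature subsumes deleteDocuments(Term) + deleteDocuments(Term...):
// Java packs a lone argument into a length-1 array automatically.
public class VarargsDemo {
    static class Term {
        final String field, text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    static int deleteDocuments(Term... terms) {
        // a real IndexWriter would buffer each term as a pending delete;
        // here we just report how many terms were passed
        return terms.length;
    }

    public static void main(String[] args) {
        System.out.println(deleteDocuments(new Term("id", "1")));                      // 1
        System.out.println(deleteDocuments(new Term("id", "1"), new Term("id", "2"))); // 2
    }
}
```

One caveat worth remembering when removing the single-argument overload: existing compiled callers of deleteDocuments(Term) would need a recompile, since the varargs form has a different method descriptor at the bytecode level.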
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000722#comment-14000722 ]
Michael McCandless commented on LUCENE-4371:
---

+1, this is an awesome simplification!
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000737#comment-14000737 ]
Robert Muir commented on LUCENE-4371:
---

{quote}
We should add @Override here, because it now implements abstract method.
{quote}
Oh, thanks, I forgot this.
{quote}
I think we use the same FileDescriptor, so we also don't need to close the slices?
{quote}
Slices are just like clones. So for example CFSDirectory holds an input over the entire .cfs file, and when you ask to open a file within the cfs it returns a slice (clone) of it. When you close the cfs it closes the real one.
{quote}
In ByteBufferIndexInput.slice() the return value is a package-protected class, so we should change this to the general IndexInput like in the abstract base class, otherwise the Javadocs will look broken.
{quote}
What javadocs? This is not a public class :)
Re: svn commit: r1595425 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/core/ lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
it was my bug, i recently added these tests

On Sat, May 17, 2014 at 4:32 AM, Michael McCandless luc...@mikemccandless.com wrote:

Grrr, thanks Rob.

Mike McCandless
http://blog.mikemccandless.com
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_55) - Build # 10320 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10320/
Java: 32bit/jdk1.7.0_55 -client -XX:+UseG1GC

1 tests failed.
REGRESSION: org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues

Error Message:

Stack Trace:
java.lang.AssertionError
  at __randomizedtesting.SeedInfo.seed([B2F2E797848F720B:51C934959CBDA284]:0)
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertFalse(Assert.java:68)
  at org.junit.Assert.assertFalse(Assert.java:79)
  at org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues(TestFieldCacheVsDocValues.java:188)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
  at java.lang.Thread.run(Thread.java:745)

Build Log:
[...truncated 8634 lines...]
[junit4] Suite: org.apache.lucene.uninverting.TestFieldCacheVsDocValues
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheVsDocValues -Dtests.method=testHugeBinaryValues -Dtests.seed=B2F2E797848F720B -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=sr_RS -Dtests.timezone=America/Cambridge_Bay -Dtests.file.encoding=UTF-8
[junit4] FAILURE
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000748#comment-14000748 ]
Uwe Schindler commented on LUCENE-4371:
---

bq. What javadocs? This is not a public class

You are right, because MMapIndexInput is private, too!
[jira] [Updated] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4371:
Attachment: LUCENE-4371.patch

Added missing @Override.

By the way, I noticed something when refactoring the code: slicing/cloning currently has no safety (except for MMap). We should think about this for NIO/Simple too: simple range checks that the slice is in bounds, and maybe that the channel is open. CFSDir could check some of this too, because its handle is now an ordinary input. But I didn't want to stir up controversy in this refactor (it is unrelated to this patch). I think there is no performance impact of adding such checks to NIO/Simple because they already must suffer a buffer refill here anyway. So maybe we can just open a followup.
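The range checks suggested here are cheap because they happen once at slice() time rather than on every read. A minimal sketch of the idea with a plain-Java stand-in for IndexInput (hypothetical class names, not Lucene code):

```java
// Sketch of bounds-checked slicing: validate offset/length against the parent
// when the slice is created, instead of failing obscurely later on a read.
public class SliceDemo {
    static class Input {
        final byte[] data;
        final int offset, length;

        Input(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        Input slice(int off, int len) {
            if (off < 0 || len < 0 || off + len > this.length) {
                throw new IllegalArgumentException(
                    "slice(" + off + "," + len + ") out of bounds for length " + this.length);
            }
            // like a clone: shares the underlying data, no separate close needed
            return new Input(data, this.offset + off, len);
        }

        byte byteAt(int pos) {
            return data[offset + pos];
        }
    }

    public static void main(String[] args) {
        Input whole = new Input(new byte[]{10, 20, 30, 40}, 0, 4);
        Input s = whole.slice(1, 2);
        System.out.println(s.byteAt(0)); // 20
        try {
            whole.slice(3, 5); // extends past the end: rejected up front
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that because the slice shares the parent's backing data, the slice-of-a-slice case composes naturally (offsets accumulate), which also matches the "slices are just like clones" lifecycle described earlier in the thread.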
Re:
For sure. Lucene's explain is really expensive and is not intended for production use, only for rare troubleshooting. As a mitigation measure, you can scroll the result set in small portions more efficiently, like Hoss recently explained at SearchHub. For this kind of problem it's usually possible to create specialized custom collectors doing something particular. Have a good day!

On Sat, May 17, 2014 at 3:01 AM, Tom Burton-West tburt...@umich.edu wrote:

Hello all,

I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem getting all the results with under 4GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each) memory use reaches about 29.1 GB and the garbage collector gives up:

org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded

I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery?

Tom

query: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
without debugQuery=on: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2

num   #instances  #bytes          Class description
--
1:    585,559     10,292,067,456  byte[]
2:    743,639     18,874,349,592  char[]
3:    53,821      91,936,328      long[]
4:    70,430      69,234,400      int[]
5:    51,348      27,111,744      org.apache.lucene.util.fst.FST$Arc[]
6:    286,357     20,617,704      org.apache.lucene.util.fst.FST$Arc
7:    715,364     17,168,736      java.lang.String
8:    79,561      12,547,792      * ConstMethodKlass
9:    18,909      11,404,696      short[]
10:   345,854     11,067,328      java.util.HashMap$Entry
11:   8,823       10,351,024      * ConstantPoolKlass
12:   79,561      10,193,328      * MethodKlass
13:   228,587     9,143,480       org.apache.lucene.document.FieldType
14:   228,584     9,143,360       org.apache.lucene.document.Field
15:   368,423     8,842,152       org.apache.lucene.util.BytesRef
16:   210,342     8,413,680       java.util.TreeMap$Entry
17:   81,576      8,204,648       java.util.HashMap$Entry[]
18:   107,921     7,770,312       org.apache.lucene.util.fst.FST$Arc
19:   13,020      6,874,560       org.apache.lucene.util.fst.FST$Arc[]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
[jira] [Updated] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5677:
Attachment: LUCENE-5677.patch

Slightly tweaked patch: it just handles the offsets with the same logic, for consistency, and adds a test for a crazy offset gap. This removes another conditional and makes it simpler. I also pulled out the 'boost omitNorms check' into the caller, because it's unrelated to inverting the token stream. We should try to keep invert() simple.
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_55) - Build # 10320 - Failure!
Test bug: i committed a fix. On Sat, May 17, 2014 at 8:12 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10320/ Java: 32bit/jdk1.7.0_55 -client -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues Error Message: Stack Trace: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([B2F2E797848F720B:51C934959CBDA284]:0) at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.junit.Assert.assertFalse(Assert.java:79) at org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues(TestFieldCacheVsDocValues.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at java.lang.Thread.run(Thread.java:745) Build Log: [...truncated 8634 lines...] [junit4] Suite: org.apache.lucene.uninverting.TestFieldCacheVsDocValues [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheVsDocValues
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000783#comment-14000783 ] ASF subversion and git services commented on LUCENE-5677: - Commit 1595469 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1595469 ] LUCENE-5677: simplify position handling in DefaultIndexingChain -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000782#comment-14000782 ] David Smiley commented on LUCENE-5666: -- Oh, right. I'll repost it here for everyone's benefit:

{noformat}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache. For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). Insanity is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc). (Mike McCandless, Robert Muir)
{noformat}

I looked up DocValues, which is new to me, but the commit message references LUCENE-5573, which seems mis-attributed. I'm kinda surprised FieldCache isn't deprecated. It could be marked \@lucene.internal. At least... its name doesn't seem appropriate anymore. Maybe UninvertedCache. But perhaps a rename like that would introduce too much change for now, even though it's trunk. It could use some javadocs stating that DocValues.java should generally be used instead.

Add UninvertingReader - Key: LUCENE-5666 URL: https://issues.apache.org/jira/browse/LUCENE-5666 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 5.0 Attachments: LUCENE-5666.patch Currently the fieldcache is not pluggable at all. It would be better if everything used the docvalues apis.
This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc etc. FieldCache can be accessed via the docvalues apis, using the FilterReader api. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000786#comment-14000786 ] ASF subversion and git services commented on LUCENE-5677: - Commit 1595475 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1595475 ] LUCENE-5677: simplify position handling in DefaultIndexingChain -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5677. - Resolution: Fixed Fix Version/s: 4.9, 5.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000784#comment-14000784 ] Robert Muir commented on LUCENE-5666: - I think you missed the point: it does not have any javadocs because it's package-private. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4371. - Resolution: Fixed Fix Version/s: 5.0 consider refactoring slicer to indexinput.slice --- Key: LUCENE-4371 URL: https://issues.apache.org/jira/browse/LUCENE-4371 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: 5.0 Attachments: LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch From LUCENE-4364: {quote} In my opinion, we should maybe check, if we can remove the whole Slicer in all Indexinputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in. {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000794#comment-14000794 ] ASF subversion and git services commented on LUCENE-4371: - Commit 1595480 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1595480 ] LUCENE-4371: Replace IndexInputSlicer with IndexInput.slice -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
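The refactoring committed above (replacing a separate IndexInputSlicer object with a slice(...) method on the input itself) can be illustrated with a self-contained sketch in plain Java. This is a toy over a byte[], not Lucene's actual IndexInput API; the names are assumptions for illustration:

```java
// Minimal stand-in for the IndexInput.slice(...) idea: a random-access
// view over bytes where slice(offset, length) returns another view of
// the same type sharing the backing data, so no slicer object is needed.
class ByteSliceInput {
    private final byte[] data;
    private final int offset;
    private final int length;

    ByteSliceInput(byte[] data) { this(data, 0, data.length); }

    private ByteSliceInput(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() { return length; }

    byte readByte(int pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
        return data[offset + pos];
    }

    /** The core of the refactoring: slicing returns the same type, and slices nest. */
    ByteSliceInput slice(int sliceOffset, int sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length) {
            throw new IllegalArgumentException("invalid slice");
        }
        return new ByteSliceInput(data, offset + sliceOffset, sliceLength);
    }
}
```

Because a slice is itself sliceable and bounds-checked against its own window, compound-file readers can hand out sub-file views without any extra factory type.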
[jira] [Created] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
Uwe Schindler created LUCENE-5678: - Summary: Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput Key: LUCENE-5678 URL: https://issues.apache.org/jira/browse/LUCENE-5678 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Uwe Schindler Assignee: Uwe Schindler We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Attachment: LUCENE-5678.patch Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2809) searcher leases
[ https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2809: --- Attachment: SOLR-2809.patch OK, here's a proof-of-concept lease manager that implements the core functionality. It's nice and small since it just uses the existing searcher management code. The remaining work would be integration with the HTTP API:
- SolrCore would have a LeaseManager
- if a lease key is passed in, look up the searcher in the lease manager rather than getting the most recently registered searcher
- at the end of a request, take out the lease if requested, and return the lease key to the client

searcher leases --- Key: SOLR-2809 URL: https://issues.apache.org/jira/browse/SOLR-2809 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Attachments: SOLR-2809.patch Leases/reservations on searcher instances would give us the ability to use the same searcher across phases of a distributed search, or for clients to send multiple requests and have them hit a consistent/unchanging view of the index. The latter requires something extra to ensure that the load balancer contacts the same replicas as before. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
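The lease idea above can be modeled with a self-contained sketch in plain Java (hypothetical names, not the actual SOLR-2809 patch): a lease maps a generated key to a searcher so later phases of a distributed request, or follow-up client requests, can look up the same snapshot instead of the most recently registered one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy lease manager: lease(searcher) hands back a key; lookup(key)
// returns the same searcher until release(key) lets it go. A real
// implementation would also reference-count and expire leases.
class LeaseManager<S> {
    private final Map<Long, S> leases = new HashMap<>();
    private final AtomicLong nextKey = new AtomicLong();

    synchronized long lease(S searcher) {
        long key = nextKey.incrementAndGet();
        leases.put(key, searcher);
        return key;
    }

    /** Returns the leased searcher, or null if the lease was released or never existed. */
    synchronized S lookup(long key) { return leases.get(key); }

    synchronized void release(long key) { leases.remove(key); }
}
```

In the integration described above, the key returned by lease(...) is what would travel back to the client and be passed in on subsequent requests.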
[jira] [Commented] (SOLR-5970) Create collection API always has status 0
[ https://issues.apache.org/jira/browse/SOLR-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000814#comment-14000814 ] Mark Miller commented on SOLR-5970: --- Collections API responses really need an overhaul, I think. One of those things that has gotten no real attention. Very hard to process the response currently, I think. I do think we need fine-grained results available of some kind, unless we change how things work - for instance, you can create a collection and it fails to create on 4 nodes and succeeds on 3 - that collection will exist regardless, the way things currently work - it just won't be what you wanted. That's a lot more effort to improve, I think, but an all-or-nothing system would be nicer at some point IMO.

Create collection API always has status 0 - Key: SOLR-5970 URL: https://issues.apache.org/jira/browse/SOLR-5970 Project: Solr Issue Type: Bug Reporter: Abraham Elmahrek The responses below are from a successful create collection API (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection) call and an unsuccessful create collection API call. It seems the 'status' is always 0.

Success: {u'responseHeader': {u'status': 0, u'QTime': 4421}, u'success': {u'': {u'core': u'test1_shard1_replica1', u'responseHeader': {u'status': 0, u'QTime': 3449}}}}

Failure: {u'failure': {u'': u"org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'test43_shard1_replica1': Unable to create core: test43_shard1_replica1 Caused by: Could not find configName for collection test43 found:[test1]"}, u'responseHeader': {u'status': 0, u'QTime': 17149}}

It seems like the status should be 400 or something similar for an unsuccessful attempt? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
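The reporter's complaint can be illustrated with a small self-contained sketch (plain Java over generic maps; the helper and the 400/500 codes are assumptions for illustration, not Solr code): derive an overall status from the presence of a non-empty 'failure' section instead of trusting the header's status 0, which only reflects that the request itself was received.

```java
import java.util.Map;

// Toy version of the check the reporter wants: a create-collection
// response whose body contains a "failure" section should not report
// an overall status of 0 just because the request was accepted.
class CollectionResponseStatus {
    static int effectiveStatus(Map<String, Object> response) {
        Object failure = response.get("failure");
        if (failure instanceof Map && !((Map<?, ?>) failure).isEmpty()) {
            return 400;  // at least one core failed to create
        }
        Object header = response.get("responseHeader");
        if (header instanceof Map) {
            Object status = ((Map<?, ?>) header).get("status");
            if (status instanceof Integer) return (Integer) status;
        }
        return 500;  // malformed response: no usable header
    }
}
```

As Mark notes above, per-node results still matter: a derived non-zero status tells the client something failed, but the 'failure' map is what says where.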
Re:
Thanks Mikhail, I understand it's expensive, but it appears that it is not freeing up memory after each debugQuery is run. That seems like it should be avoidable (I say that without having looked at the code). Should I open a JIRA about a possible memory leak? Tom

On Sat, May 17, 2014 at 8:20 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: For sure. Lucene's explain is really expensive and is not purposed for production use, but only for rare troubleshooting. As a mitigation measure you can scroll the result set in small portions more efficiently, as Hoss recently explained at SearchHub. In this kind of problem, it's usually possible to create specialized custom collectors doing something particular. Have a good day!

On Sat, May 17, 2014 at 3:01 AM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem with getting all the results with under 4GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each) memory use reaches about 29.1 GB and the garbage collector gives up: org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery?
Tom

query: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
without debugQuery=on: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2

 num     #instances          #bytes  Class description
 -----------------------------------------------------
   1:       585,559  10,292,067,456  byte[]
   2:       743,639  18,874,349,592  char[]
   3:        53,821      91,936,328  long[]
   4:        70,430      69,234,400  int[]
   5:        51,348      27,111,744  org.apache.lucene.util.fst.FST$Arc[]
   6:       286,357      20,617,704  org.apache.lucene.util.fst.FST$Arc
   7:       715,364      17,168,736  java.lang.String
   8:        79,561      12,547,792  * ConstMethodKlass
   9:        18,909      11,404,696  short[]
  10:       345,854      11,067,328  java.util.HashMap$Entry
  11:         8,823      10,351,024  * ConstantPoolKlass
  12:        79,561      10,193,328  * MethodKlass
  13:       228,587       9,143,480  org.apache.lucene.document.FieldType
  14:       228,584       9,143,360  org.apache.lucene.document.Field
  15:       368,423       8,842,152  org.apache.lucene.util.BytesRef
  16:       210,342       8,413,680  java.util.TreeMap$Entry
  17:        81,576       8,204,648  java.util.HashMap$Entry[]
  18:       107,921       7,770,312  org.apache.lucene.util.fst.FST$Arc
  19:        13,020       6,874,560  org.apache.lucene.util.fst.FST$Arc[]

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re:
On Sat, May 17, 2014 at 12:11 PM, Tom Burton-West tburt...@umich.edu wrote: I understand it's expensive, but it appears that it is not freeing up memory after each debugQuery is run. That seems like it should be avoidable (I say that without having looked at the code). Should I open a JIRA about a possible memory leak? Yes, please do! -Yonik http://heliosearch.org - facet functions, subfacets, off-heap filters & fieldcache - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Description: We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF) and checksums. was: We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF).
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000801#comment-14000801 ] Uwe Schindler edited comment on LUCENE-5678 at 5/17/14 5:56 PM: Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream in combination with java.util.zip.CheckedOutputStream (for the checksum)! was (Author: thetaphi): Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
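The combination Uwe describes can be sketched as a self-contained stand-in in plain Java (assumed class and method names; the real FSIndexOutput differs): a FileOutputStream wrapped in BufferedOutputStream and java.util.zip.CheckedOutputStream, with getFilePointer() served from an internal counter since seeking is no longer allowed.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

// Sketch of an append-only output over FileOutputStream: no seeking,
// position tracked by an internal counter (never asks the file), and
// a running CRC32 maintained by CheckedOutputStream as suggested above.
class CountingChecksumOutput implements Closeable {
    private final CheckedOutputStream out;
    private long bytesWritten;   // internal counter backing getFilePointer()

    CountingChecksumOutput(File file) throws IOException {
        this.out = new CheckedOutputStream(
            new BufferedOutputStream(new FileOutputStream(file)), new CRC32());
    }

    void writeByte(byte b) throws IOException {
        out.write(b & 0xFF);
        bytesWritten++;
    }

    void writeBytes(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
    }

    long getFilePointer() { return bytesWritten; }

    long getChecksum() { return out.getChecksum().getValue(); }

    @Override public void close() throws IOException { out.close(); }
}
```

Because the stream is append-only, the counter and the file position can never disagree, which is exactly why RandomAccessFile buys nothing here.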
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000850#comment-14000850 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1595530 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1595530 ] LUCENE-5675: checkpoint current dirty state

ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir

Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc; this stuff is all implicit. As far as the API goes, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
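The max-version pruning idea can be modeled without an FST in a self-contained sketch (hypothetical names, plain Java): keep the maximum version stored in a segment; if the queried version is newer than that maximum, the segment cannot possibly hold the id at that version, so the lookup is skipped entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the version-aware lookup: a per-segment map of id -> version
// plus a precomputed max version. A real implementation would push the max
// down FST subtrees for finer pruning, but the cheap rejection test is the same.
class VersionedIdSegment {
    private final Map<String, Long> versions = new HashMap<>();
    private long maxVersion = Long.MIN_VALUE;

    void add(String id, long version) {
        versions.put(id, version);
        maxVersion = Math.max(maxVersion, version);
    }

    /** True only if this segment might hold id at a version >= minVersion. */
    boolean mightContain(String id, long minVersion) {
        if (minVersion > maxVersion) return false;  // cheap prune: no seek at all
        Long v = versions.get(id);
        return v != null && v >= minVersion;
    }
}
```

The prune is what replaces a bloom filter here: for freshly versioned writes (minVersion above every stored version), whole segments are rejected in O(1) with no per-term memory.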
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Attachment: (was: LUCENE-5678.patch) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000867#comment-14000867 ] Michael McCandless commented on LUCENE-5678: I tested index time for full Wikipedia; it's output intensive, and it looks like no perf change w/ the patch, though the numbers are a little noisy from run to run ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000885#comment-14000885 ] Michael McCandless commented on LUCENE-5678:
Indexing perf of new patch looks fine too!
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
Hi, I cleaned up most of the code. This now makes BufferedIndexOutput obsolete (once I fix RateLimiter, which buffers a second time!). But before I do this, we should check the perf, because this is now completely different code. I also fixed HdfsDirectory to use this new class. The only remaining use of BufferedIndexOutput is in RateLimiter. I think in the future we should plug the rate limiter in deeper, at the OutputStream level (subclass BufferedOutputStream to limit the rate), and allow plugging that into the FSDir impl.
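The suggestion to push rate limiting down to the OutputStream level might look something like the following (a hypothetical sketch, not code from the patch; maybePause() and the pause bookkeeping are invented for illustration):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: rate-limit by subclassing BufferedOutputStream,
// so there is only one layer of buffering instead of two.
class RateLimitedOutputStream extends BufferedOutputStream {
    private final long minPauseCheckBytes; // bytes between pause checks
    private long bytesSinceLastPause;
    int pauseCount; // for illustration; a real limiter would sleep instead

    RateLimitedOutputStream(OutputStream out, long minPauseCheckBytes) {
        super(out);
        this.minPauseCheckBytes = minPauseCheckBytes;
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        bytesSinceLastPause += len;
        if (bytesSinceLastPause >= minPauseCheckBytes) {
            bytesSinceLastPause = 0;
            maybePause();
        }
    }

    // Placeholder: a real implementation would compute a sleep time
    // from the configured MB/sec and Thread.sleep() accordingly.
    protected void maybePause() {
        pauseCount++;
    }
}
```

Since the subclass piggybacks on the buffer the stream already has, the double buffering the comment complains about disappears, and the FSDirectory implementation could accept such a stream as a pluggable wrapper.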
[jira] [Commented] (SOLR-5650) When mixing adds and deletes, it appears there is a corner case where peersync can bring back a deleted update.
[ https://issues.apache.org/jira/browse/SOLR-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000891#comment-14000891 ] ASF subversion and git services commented on SOLR-5650:
Commit 1595547 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595547 ]
SOLR-5650: add changes entries

When mixing adds and deletes, it appears there is a corner case where peersync can bring back a deleted update.
Key: SOLR-5650
URL: https://issues.apache.org/jira/browse/SOLR-5650
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.7, 5.0
Attachments: SOLR-5650.patch, SOLR-5650.patch, solr.log.tar.gz
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000889#comment-14000889 ] Ryan Ernst commented on LUCENE-5650:
Sorry about that. The nocommit was left by mistake. The failure was a goof on my part. I've put a fix for it in the branch.

createTempDir and associated functions no longer create java.io.tmpdir
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch

The recent refactoring of all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writable. Lucene uses {{.}} as {{java.io.tmpdir}}, and in the test security manager the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
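The pre-refactoring behavior being described, creating the base temp dir on demand instead of only asserting on it, could be sketched like this (a hypothetical illustration; TempDirHelper and baseTempDir are invented names, not Lucene's actual test code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: create the temp base dir if missing,
// then validate it, mirroring the old LuceneTestCase.TEMP_DIR behavior.
class TempDirHelper {
    static Path baseTempDir(String tmpDir) throws IOException {
        Path base = Paths.get(tmpDir);
        // the old TEMP_DIR created the directory if it did not exist
        Files.createDirectories(base); // no-op when it already exists
        if (!Files.isDirectory(base) || !Files.isWritable(base)) {
            throw new IOException("temp dir not usable: " + base);
        }
        return base;
    }
}
```

Creating first and validating second is what makes a relative setting like {{./temp}} work, since the dir materializes inside the per-JVM working dir before any assertion sees it.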
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000888#comment-14000888 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595546 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595546 ]
LUCENE-5650: fix some solr tests
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
New patch to make sure the BufferedOutputStream is flushed on close(), without ignoring exceptions.
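The flush-on-close behavior described here can be illustrated with a small sketch (hypothetical code, not the patch; SafeClosingOutput is an invented name): flush explicitly before closing so a failed flush surfaces as an exception instead of being swallowed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: close() flushes the buffer first and lets any
// IOException propagate, rather than ignoring it.
class SafeClosingOutput implements AutoCloseable {
    private final BufferedOutputStream out;

    SafeClosingOutput(OutputStream target) {
        this.out = new BufferedOutputStream(target);
    }

    void writeByte(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void close() throws IOException {
        // flush() first so a short or failed write is reported here;
        // then close the underlying stream. Neither error is ignored.
        out.flush();
        out.close();
    }
}
```

Silently dropping a flush failure on close is dangerous for an index file, since the caller would believe bytes reached disk that never left the buffer.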
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000896#comment-14000896 ] ASF subversion and git services commented on LUCENE-5675:
Commit 1595548 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1595548 ]
LUCENE-5675: testRandom seems to be passing

ID postings format
Key: LUCENE-5675
URL: https://issues.apache.org/jira/browse/LUCENE-5675
Project: Lucene - Core
Issue Type: New Feature
Reporter: Robert Muir

Today the primary-key lookup in Lucene is not that great for systems like Solr and Elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can help avoid seeks by telling you the term does not exist for a segment, but this technique (based on the FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer very efficiently that there is no term T with version V in that segment. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq; this stuff is all implicit. As far as the API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. A consumer of the codec can then just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000902#comment-14000902 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595551 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595551 ]
LUCENE-5650: fix one more solr test
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
New patch: now fixes RateLimiter and nukes BufferedIndexOutput. The RateLimiter was quite easy to fix. I only changed the single-byte write so that it does not perform a volatile read of getMinPauseCheckBytes() on every call. With this small change we no longer need to double-buffer using BufferedIndexOutput. I think this should be fine now.
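The single-byte-write optimization described here can be sketched as follows (a hypothetical illustration of the general technique, not the actual RateLimiter code; ByteRateTracker and its fields are invented names): keep a plain, non-volatile countdown and only touch the volatile threshold when the countdown runs out.

```java
// Hypothetical sketch: avoid a volatile read per byte by refilling a
// plain countdown ("budget") from the volatile only when it hits zero.
class ByteRateTracker {
    private volatile long minPauseCheckBytes; // may be updated by another thread
    private long budget;  // plain field, cheap to decrement per byte
    long volatileReads;   // for illustration: counts reads of the volatile

    ByteRateTracker(long minPauseCheckBytes) {
        this.minPauseCheckBytes = minPauseCheckBytes;
    }

    void writeByte() {
        if (budget == 0) {
            volatileReads++;
            budget = minPauseCheckBytes; // the only volatile read on this path
            // a real limiter would also decide here whether to pause
        }
        budget--;
    }
}
```

The hot path is then a plain-field decrement, which is why the extra buffering layer in BufferedIndexOutput stops paying for itself.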
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000961#comment-14000961 ] David Smiley commented on LUCENE-5666:
Ok, I see that now; it's good.

Add UninvertingReader
Key: LUCENE-5666
URL: https://issues.apache.org/jira/browse/LUCENE-5666
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 5.0
Attachments: LUCENE-5666.patch

Currently the FieldCache is not pluggable at all. It would be better if everything used the docvalues APIs. This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc. FieldCache can be accessed via the docvalues APIs, using the FilterReader API.
[jira] [Commented] (SOLR-5854) facet.limit can limit the output of facet.pivot when facet.sort is on
[ https://issues.apache.org/jira/browse/SOLR-5854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000992#comment-14000992 ] Brett Lucey commented on SOLR-5854:
Hmm. I think you might be using the facet.sort parameter incorrectly. If you visit http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort it states that the two expected values are count or index. I've been in the facet code somewhat recently and haven't seen anything that would imply that what you are trying to do with facet.sort would work.

facet.limit can limit the output of facet.pivot when facet.sort is on
Key: SOLR-5854
URL: https://issues.apache.org/jira/browse/SOLR-5854
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.4
Reporter: Gennaro Frazzingaro

Given the query
{code}
{
  facet: true,
  facet.pivot: field1,field2,
  facet.pivot.mincount: 1,
  facet.sort: field1 asc, field2 asc,
  q:,
  rows: 1000,
  start: 0,
}
{code}
not all results are returned. Removing facet.sort or setting facet.limit=-1 corrects the problem.
[jira] [Commented] (SOLR-5079) Create ngroups for pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000993#comment-14000993 ] Brett Lucey commented on SOLR-5079:
I don't think this patch would work (as is) if the dataset is sharded. Addressing the issue in a sharded dataset could be somewhat challenging as well, since you would need some mechanism to avoid double-counting a value that might be present on more than one shard. Have you considered this use case?

Create ngroups for pivot faceting
Key: SOLR-5079
URL: https://issues.apache.org/jira/browse/SOLR-5079
Project: Solr
Issue Type: Improvement
Affects Versions: 4.5, 5.0
Reporter: Sandro Mario Zbinden
Labels: facet, pivot
Attachments: SOLR-5079.patch
Original Estimate: 4h
Remaining Estimate: 4h

To save network traffic it would be great to know how many entries a facet list contains without loading the complete facet list. This issue is created because of an out-of-memory in loading the pivot facet with facet.limit set to -1.
The facet.pivot result would then look like q=&facet.pivot=cat,id&*facet.pivot.ngroup=true*
{code:xml}
<arr name="cat,id">
  <lst>
    <str name="field">cat</str>
    <str name="value">a</str>
    <int name="count">20</int>
    <arr name="pivot">
      <lst>
        <str name="field">id</str>
        <int name="value">69</int>
        <int name="count">10</int>
      </lst>
      <lst>
        <str name="field">id</str>
        <int name="value">71</int>
        <int name="count">10</int>
      </lst>
    </arr>
    <int name="ngroup">2</int> <!-- The new ngroup param -->
  </lst>
</arr>
{code}
If you add another new param, for example facet.pivot.visible, the result could create less traffic, especially if there are a lot of ids and facet.limit=-1 is set: q=&facet.pivot=cat,id&*facet.ngroup=true&f.id.facet.pivot.visible=false*
{code:xml}
<arr name="cat,id">
  <lst>
    <str name="field">cat</str>
    <str name="value">a</str>
    <int name="count">20</int>
    <!-- No pivot list of id -->
    <int name="ngroup">2</int>
  </lst>
</arr>
{code}
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000997#comment-14000997 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595562 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595562 ]
LUCENE-5650: fix solrj test
[jira] [Created] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
Shai Erera created LUCENE-5679:
Summary: Consolidate IndexWriter.deleteDocuments()
Key: LUCENE-5679
URL: https://issues.apache.org/jira/browse/LUCENE-5679
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera

Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments().