[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b106) - Build # 7485 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7485/ Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:491) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:724) Build Log: [...truncated 776 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5E1E0079E35D52E
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768098#comment-13768098 ] Littlestar commented on LUCENE-5218: PagedBytes.java#fillSlice maybe wrong start??
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768098#comment-13768098 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 6:24 AM: - PagedBytes.java#fillSlice maybe wrong start?? {noformat}
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
{noformat} was (Author: cnstar9988): PagedBytes.java#fillSlice maybe wrong start??
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ...
4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
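As a side note for readers following the fillSlice discussion above, here is a minimal, self-contained sketch of the index/offset arithmetic the quoted method performs, which shows where an ArrayIndexOutOfBoundsException can come from. The class name, the blockBits value, and the start position are illustrative assumptions, not the actual PagedBytes internals:

// Minimal sketch of the block index/offset arithmetic used by PagedBytes.Reader.fillSlice.
// blockBits and the start position are illustrative values, not the real configuration.
public class FillSliceSketch {
  public static void main(String[] args) {
    final int blockBits = 15;                 // assume 32 KB blocks for illustration
    final int blockSize = 1 << blockBits;
    final long blockMask = blockSize - 1;

    long start = 70_000L;                     // absolute byte position in the paged store
    int index  = (int) (start >> blockBits);  // which block the slice starts in
    int offset = (int) (start & blockMask);   // position inside that block

    // fillSlice assumes a slice spans at most two adjacent blocks (length <= blockSize + 1),
    // so an ArrayIndexOutOfBoundsException like the one reported means index (or index + 1)
    // points past the last allocated block, i.e. the requested start lies beyond the bytes
    // that were actually written.
    System.out.println("block=" + index + " offset=" + offset);
  }
}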
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768107#comment-13768107 ] Shai Erera commented on LUCENE-5215: Ok I'll add just segmentSuffix Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
Indeed, I did look at Parquet and had the same feeling as Otis: there is a striking similarity with the terminology used around stored fields. If I got it right, a Parquet file stores sets of documents in chunks, just like Lucene does, but each chunk is column-stride. Maybe it is possible to apply this idea to compressing stored fields (chunks in column-stride fashion)? On Sun, Sep 15, 2013 at 11:17 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I was reading the Parquet announcement from July: https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop And a few things caught my attention - Dictionary encoding and (dynamic) bit packing. This smells like something Adrien likes to eat for breakfast. Over in the Hadoop ecosystem Parquet interest has picked up: http://search-hadoop.com/?q=parquet I thought I'd point it out as I haven't seen anyone bring this up. I imagine there are ideas to be borrowed there. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768115#comment-13768115 ] Steve Rowe commented on LUCENE-5217: bq. This is really hard to configure and maintain I agree. bq. maven supports wildcard exclusions: MNG-3832 I did not know that. bq. I think it just means we have to require a minimum of maven 3 instead of also supporting 2. Since this has been out for 3 years (in fact older than the ant 1.8.2 that we require), I don't see this as a significant imposition on anyone? +1, though this will be a viral change, unlike the Ant upgrade: for Ant, we only forced Lucene/Solr source users to upgrade, but for Maven, everybody who depends on binary Lucene/Solr artifacts will have to upgrade their own projects to Maven 3 - I think. I'll do some testing to confirm. disable transitive dependencies in maven config --- Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Our ivy configuration does this: each dependency is specified and so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead the maven setup is configured to download the internet: and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldnt find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions: MNG-3832, we can disable this transitive shit completely. We should do this, so its configuration is the exact parallel of ivy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
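To make the wildcard exclusions referenced above (MNG-3832) concrete, here is a hedged POM sketch of the feature being discussed; the dependency coordinates are placeholders rather than anything from the actual Lucene/Solr POMs, and this wildcard syntax requires Maven 3:

<!-- Illustrative only: disable all transitive dependencies of one direct dependency.
     Coordinates are placeholders; wildcard exclusions need Maven 3 (MNG-3832). -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>some-direct-dependency</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>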
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b106) - Build # 7485 - Failure!
I failed to reproduce with the reported seed, master seed, random seeds ... all with iters. I'll dig. Shai On Mon, Sep 16, 2013 at 9:07 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7485/ Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:491) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:724) Build Log:
[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768152#comment-13768152 ] Robert Muir commented on LUCENE-5217: - I wont comment on viral change :) But I think this is a totally fair thing to do for 5.0, since its a new major release. disable transitive dependencies in maven config --- Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Our ivy configuration does this: each dependency is specified and so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead the maven setup is configured to download the internet: and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldnt find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions: MNG-3832, we can disable this transitive shit completely. We should do this, so its configuration is the exact parallel of ivy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
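To make the GCD_COMPRESSED remark above concrete, here is a small hedged sketch of the idea (not the actual Lucene45 codec code): when every value can be written as base + gcd * x, only the small quotients x need to be bit-packed, which is why coarse-precision timestamps stored in milliseconds compress well.

import java.math.BigInteger;

// Illustrative sketch of GCD-style compression for a block of longs; not the Lucene45 codec.
public class GcdSketch {
  public static void main(String[] args) {
    // Coarse-precision dates stored as milliseconds (all multiples of 3,600,000 apart).
    long[] values = {1379289600000L, 1379293200000L, 1379296800000L};

    long min = Long.MAX_VALUE;
    for (long v : values) min = Math.min(min, v);

    // Greatest common divisor of all deltas from the minimum value.
    long gcd = 0;
    for (long v : values) {
      gcd = BigInteger.valueOf(gcd).gcd(BigInteger.valueOf(v - min)).longValue();
    }

    // Each value is now min + gcd * quotient; only the small quotients need to be packed.
    for (long v : values) {
      long quotient = gcd == 0 ? 0 : (v - min) / gcd;
      System.out.println(v + " = " + min + " + " + gcd + " * " + quotient);
    }
  }
}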
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768204#comment-13768204 ] Simon Willnauer commented on LUCENE-5189: - I only briefly looked at the changed in DW, DWPT, IW BDS and I have 2 questions: - SegmentWriteState flushState; in DWPT is unused - can we remove it? (I generally want this class to have only final members as well if possible) - In DW the `updateNumericDocValue` method is synchronized - I don't think it needs to. The other two deletes methods don't need to be synced either - maybe we can open another issue to remove the synchronization? It won't be possible to just drop it but it won't be much work. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering :) nice one! Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 807 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/807/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9834 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2C69D4309AF28FAE -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.6 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.6-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768253#comment-13768253 ] Michael McCandless commented on LUCENE-5218: Which JVM are you using? background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768274#comment-13768274 ] Shai Erera commented on LUCENE-5189: Jenkins reported this failure, which I'm unable to reproduce with and without the seed (master and child), with iters. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) ... Build Log: [...truncated 776 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5E1E0079E35D52E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=US-ASCII [junit4] FAILURE 1.40s J0 | TestNumericDocValuesUpdates.testManyReopensAndFields [junit4] Throwable #1: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 [junit4]at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) [junit4]at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) [junit4]at java.lang.Thread.run(Thread.java:724) [junit4] 2 NOTE: test params are: codec=Asserting, sim=RandomSimilarityProvider(queryNorm=false,coord=no): {}, locale=tr, timezone=Etc/GMT-6 [junit4] 2 NOTE: Linux 3.2.0-53-generic amd64/Oracle Corporation 1.8.0-ea (64-bit)/cpus=8,threads=1,free=66621176,total=210272256 [junit4] 2 NOTE: All tests run in this JVM: [TestSegmentReader, TestStressNRT, TestSort, TestShardSearching, TestEliasFanoSequence, TestBytesRefHash, TestPhrasePrefixQuery, TestLucene45DocValuesFormat, TestFastCompressionMode, TestEliasFanoDocIdSet, TestSearchForDuplicates, TestFixedBitSet, TestIsCurrent, TestFilteredSearch, TestFieldCacheSanityChecker, TestSegmentTermEnum, TestDeletionPolicy, TestSimpleExplanations, TestRegexpRandom, TestIndexCommit, TestCloseableThreadLocal, TestNumericRangeQuery32, TestTwoPhaseCommitTool, TestIndexWriterOnDiskFull, TestPhraseQuery, TestSearchAfter, TestParallelReaderEmptyIndex, TestMaxTermFrequency, TestFlushByRamOrCountsPolicy, TestSimilarity, TestNumericRangeQuery64, TestByteSlices, TestSameScoresWithThreads, TestDocValuesWithThreads, TestMockAnalyzer, TestArrayUtil, TestPostingsOffsets, TestCompressingTermVectorsFormat, TestSentinelIntSet, TestCustomNorms, TestExternalCodecs, TestNumericDocValuesUpdates] [junit4] Completed on J0 in 83.46s, 24 tests, 1 failure FAILURES! 
{noformat} Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768271#comment-13768271 ] Shai Erera commented on LUCENE-5189: bq. SegmentWriteState flushState; in DWPT is unused +1 to remove it. Indeed it's unused, but because it's package-private, eclipse doesn't complain about it. bq. In DW the `updateNumericDocValue` method is synchronized I followed the other two delete methods. I'm fine with opening a separate issue to remove the synchronization, especially if it's not trivial. bq. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering Thanks, it was very helpful to have deletes already covered like that. I only had to follow their breadcrumbs :). Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768322#comment-13768322 ] Littlestar commented on LUCENE-5218: java version 1.7.0_25 I also build openjdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768322#comment-13768322 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 2:09 PM: - java version 1.7.0_25 I also build jdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. was (Author: cnstar9988): java version 1.7.0_25 I also build openjdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768344#comment-13768344 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 2:22 PM: - my app continue insert records, may be 10-1 records per seconds. lucene index with a lots of small segments, so I call forceMerge(80) before each call. was (Author: cnstar9988): my app continue insert records, may be 10-1 records per seconds. lucene index with very small segments, so I call forceMerge(80) before each call. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768344#comment-13768344 ] Littlestar commented on LUCENE-5218: my app continue insert records, may be 10-1 records per seconds. lucene index with very small segments, so I call forceMerge(80) before each call. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768362#comment-13768362 ] David Smiley commented on SOLR-2548: I just committed to trunk; I'll wait a day just in case and for any more feedback before applying to 4x. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768380#comment-13768380 ] Michael McCandless commented on LUCENE-5218: Don't use 7u40: there is apparently a JVM bug that can cause index corruption like this (LUCENE-5212). But 7u25 should be safe. If you use only 7u25, and start from a new index, you can reproduce this exception? Can you run CheckIndex on the resulting index and post the output? background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
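For anyone wanting to follow the CheckIndex suggestion above, here is a minimal hedged sketch of running it from code against a 4.x index; the index path is a placeholder, and the same check can also be run from the CheckIndex command-line entry point.

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hedged sketch: open the suspect index and print CheckIndex diagnostics.
// "/path/to/index" is a placeholder for the directory from the failing forceMerge(80) run.
public class CheckIndexSketch {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(new File("/path/to/index"))) {
      CheckIndex checker = new CheckIndex(dir);
      checker.setInfoStream(System.out, true);      // verbose per-segment output
      CheckIndex.Status status = checker.checkIndex();
      System.out.println(status.clean ? "Index is clean" : "Index has problems");
    }
  }
}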
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768381#comment-13768381 ] Mark Miller commented on SOLR-5150: --- I'm just going to commit the current fix and worry about any performance improvements in another issue. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
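A small hedged sketch of the difference described in this issue, using the stock Hadoop FSDataInputStream API; the method names on the stream are Hadoop's, but the surrounding framing is illustrative and not the actual HdfsDirectory patch:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative only: contrasts the two read styles discussed in SOLR-5150.
public class PositionalReadSketch {

  // Problematic pattern: seek + read, which needs external synchronization across clones
  // and may return fewer bytes than requested (the return value must not be ignored).
  static void seekThenRead(FSDataInputStream in, long pos, byte[] buf) throws IOException {
    synchronized (in) {
      in.seek(pos);
      int n = in.read(buf, 0, buf.length);   // may be a short read
      if (n < buf.length) {
        // the caller would have to keep reading until buf is full
      }
    }
  }

  // Preferred pattern: positional readFully, which takes the offset explicitly,
  // always fills the buffer, and needs no shared seek state.
  static void positionalReadFully(FSDataInputStream in, long pos, byte[] buf) throws IOException {
    in.readFully(pos, buf, 0, buf.length);
  }
}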
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768387#comment-13768387 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523693 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1523693 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5082) Implement ie=charset parameter
[ https://issues.apache.org/jira/browse/SOLR-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768383#comment-13768383 ] David Smiley commented on SOLR-5082: Uwe, why did you give me credit with you on this in CHANGES.txt? By the way, I was looking through the code for this. Why in decodeBuffer() do you call remove() from the buffer iterator on every item; couldn't you not do that and simply call clear() when the loop is done? If you made that change, I think ArrayList would perform better for this buffer than LinkedList. Implement ie=charset parameter -- Key: SOLR-5082 URL: https://issues.apache.org/jira/browse/SOLR-5082 Project: Solr Issue Type: Improvement Affects Versions: 4.4 Reporter: Shawn Heisey Assignee: Uwe Schindler Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-5082.patch, SOLR-5082.patch Allow a user to send a query or update to Solr in a character set other than UTF-8 and inform Solr what charset to use with an ie parameter, for input encoding. This was discussed in SOLR-4265 and SOLR-4283. Changing the default charset is a bad idea because distributed search (SolrCloud) relies on UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
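A hedged illustration of the suggestion above; the buffer field and decode() helper are hypothetical stand-ins, not the SOLR-5082 code:

    // current pattern: remove each element through the iterator while decoding
    for (Iterator<ByteBuffer> it = buffer.iterator(); it.hasNext(); ) {
      decode(it.next());
      it.remove();        // cheap on a LinkedList, O(n) per call on an ArrayList
    }

    // suggested pattern: decode everything, then clear once at the end;
    // with a single bulk clear(), an ArrayList iterates faster than a LinkedList
    for (ByteBuffer b : buffer) {
      decode(b);
    }
    buffer.clear();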
Re: [JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 807 - Failure!
jvm crash: [junit4] JVM J0: stdout was not empty, see: /Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20130916_101823_672.sysout [junit4] JVM J0: stdout (verbatim) [junit4] # [junit4] # A fatal error has been detected by the Java Runtime Environment: [junit4] # [junit4] # SIGSEGV (0xb) at pc=0x000103acfa2b, pid=388, tid=104711 [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_40-b43) (build 1.7.0_40-b43) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b56 mixed mode bsd-amd64 ) [junit4] # Problematic frame: [junit4] # C [libjava.dylib+0x9a2b] JNU_NewStringPlatform+0x1d3 On Mon, Sep 16, 2013 at 6:22 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/807/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9834 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2C69D4309AF28FAE -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.6 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.6-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768393#comment-13768393 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523694 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523694 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768400#comment-13768400 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523698 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1523698 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-5150. --- Resolution: Fixed HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768442#comment-13768442 ] Elran Dvir commented on SOLR-5084: -- Hi all, Did anyone have a chance to examine the latest patch? Thanks. new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Erick Erickson Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity, Risk, etc.) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement, I have prototyped a new type of field: EnumField where the inputs are a closed predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
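For readers following along, the configuration is shaped roughly like currency.xml: a named, ordered list of values plus a field type that points at it. The snippet below is a hypothetical sketch of that idea, not necessarily the exact format of the attached enumsConfig.xml or schema_example.xml:

    <!-- enumsConfig.xml: document order defines the sort order, not lexicographic order -->
    <enumsConfig>
      <enum name="severity">
        <value>Low</value>
        <value>Medium</value>
        <value>High</value>
        <value>Critical</value>
      </enum>
    </enumsConfig>

    <!-- schema.xml: a field type backed by the enum definition above -->
    <fieldType name="severityType" class="solr.EnumField"
               enumsConfigFile="enumsConfig.xml" enumName="severity"/>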
RE: SimplePostToolTest very slow
: the main problem with any security manager is: To check if a connection : is allowed, it has to resolve DNS and look the IP up in the policy. Can we update the security policy to fail fast anytime a DNS lookup happens? even if it happens implicitly in situations like this (URL.hashCode) so we can more easily find problems like this via test Exceptions instead of via slow tests? (I'm not saying it's a good idea to do this -- i don't know -- it might make more trouble than it's worth ... i'm just trying to understand if it's possible) -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
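A rough sketch of what "fail fast on DNS" could look like (purely illustrative; this is not an existing Lucene test rule): SecurityManager.checkConnect is invoked with port == -1 when the caller is only trying to resolve a host name, so throwing there would surface implicit lookups such as URL.hashCode() as test exceptions.

    public class NoDnsSecurityManager extends SecurityManager {
      @Override
      public void checkConnect(String host, int port) {
        if (port == -1) {   // -1 means "resolve this host", not "connect to it"
          throw new SecurityException("implicit DNS lookup for host: " + host);
        }
        super.checkConnect(host, port);
      }
      @Override
      public void checkConnect(String host, int port, Object context) {
        if (port == -1) {
          throw new SecurityException("implicit DNS lookup for host: " + host);
        }
        super.checkConnect(host, port, context);
      }
    }

As the reply later in this thread points out, such a manager would also trip on the tests' legitimate lookup of their own hostname, which is a big part of why it may be more trouble than it's worth.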
[jira] [Commented] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768474#comment-13768474 ] ASF subversion and git services commented on SOLR-5241: --- Commit 1523725 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1523725 ] SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5241. Resolution: Fixed Fix Version/s: 4.6 5.0 bq. ... Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. ... we could, this was my point earlier when i asked rmuir why [ff01::114] was better - since we're never opening a socket i didn't understand the diff. now that i do understand the diff however, i definitely think [ff01::114] is better -- not because of anything in the test now, but because it helps protect us from the risk of someone working on the test in the future and accidentally changing something so that it *does* start trying to open sockets. so i've committed the most recent patch as is. Thanks everybody for your help. SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.6 Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
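To make the "it never opens a socket, it only looks up hostnames" point concrete: java.net.URL's equals()/hashCode() resolve the host, a bracketed address literal leaves nothing to resolve, and java.net.URI compares purely textually. The URLs below are illustrative:

    import java.net.URI;
    import java.net.URL;

    public class UrlHashCodeDemo {
      public static void main(String[] args) throws Exception {
        URL slow = new URL("http://example.com/update");     // hashCode()/equals() resolve the host, so a possible DNS stall
        URL safe = new URL("http://[ff01::114]/update");     // address literal: nothing to resolve
        URI text = URI.create("http://example.com/update");  // URI.hashCode() never resolves the host
        System.out.println(slow.hashCode() + " " + safe.hashCode() + " " + text.hashCode());
      }
    }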
Re: SimplePostToolTest very slow
Solr tests will all completely fail in that case then: just like they do when i run them on my laptop with internet disconnected. that's because it looks up its own hostname: which involves reverse/forward dns lookups. On Mon, Sep 16, 2013 at 1:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : the main problem with any security manager is: To check if a connection : is allowed, it has to resolve DNS and look the IP up in the policy. Can we update the security policy to fail fast anytime a DNS lookup happens? even if it happens implicitly in situations like this (URL.hashCode) so we can more easily find problems like this via test Exceptions instead of via slow tests? (I'm not saying it's a good idea to do this -- i don't know -- it might make more trouble than it's worth ... i'm just trying to understand if it's possible) -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
You guys got it, of course. :) I liked the sound of being able to detect how to pack things at run time and switch between multiple approaches over time or at least that's how I interpreted the announcement. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:29 AM, Adrien Grand jpou...@gmail.com wrote: Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
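A self-contained illustration of the GCD_COMPRESSED idea mentioned above (illustrative only, not the Lucene45 source): when every value can be written as base + gcd * x, it is enough to store base and gcd once and bit-pack the small x values.

    static long gcd(long a, long b) {
      return b == 0 ? a : gcd(b, a % b);
    }

    static void analyze(long[] values) {
      long base = Long.MAX_VALUE;
      for (long v : values) base = Math.min(base, v);
      long g = 0;
      for (long v : values) g = gcd(g, v - base);   // gcd of all deltas from the minimum
      // write base and g to the header once, then bit-pack (v - base) / g per value;
      // e.g. dates stored in millis with only second precision collapse to g = 1000
    }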
[jira] [Commented] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768478#comment-13768478 ] ASF subversion and git services commented on SOLR-5241: --- Commit 1523726 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523726 ] SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups (merge r1523725) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
To some extent that already happens in a rough way in things like BlockPackedWriter (and also postings lists). For example these things encode blocks (e.g. 128 in the postings, maybe 1024 in docvalues, i forget), and if they encounter blocks of all the same value, they just write a bit marking that and encode the value once. On Mon, Sep 16, 2013 at 1:18 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: You guys got it, of course. :) I liked the sound of being able to detect how to pack things at run time and switch between multiple approaches over time or at least that's how I interpreted the announcement. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:29 AM, Adrien Grand jpou...@gmail.com wrote: Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
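A small sketch of the block trick described above (illustrative, not the BlockPackedWriter source): if every value in a block is identical, write a marker and the value once instead of bit-packing the whole block.

    static void writeBlock(long[] block, java.io.DataOutput out) throws java.io.IOException {
      boolean allEqual = true;
      for (long v : block) {
        if (v != block[0]) { allEqual = false; break; }
      }
      if (allEqual) {
        out.writeByte(0);          // marker: the whole block is one repeated value
        out.writeLong(block[0]);   // the value is written exactly once
      } else {
        out.writeByte(1);          // marker: a bit-packed block follows
        // ... bit-pack 'block' using the minimal number of bits per value ...
      }
    }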
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768518#comment-13768518 ] Hoss Man commented on SOLR-2548: David: i'm not suggesting we rush this -- but if your changes aren't going to make it into 4.5, we should track them in a new issue that can have it's own record in CHANGES.txt so it's clear what versions of Solr have what version of the code. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #449: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/449/ 83 tests failed. FAILED: org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:13F466DD345B91A3]:0) at org.apache.lucene.expressions.TestDemoExpressions.doTestLotsOfBindings(TestDemoExpressions.java:174) at org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings(TestDemoExpressions.java:156) FAILED: org.apache.lucene.expressions.TestDemoExpressions.test Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:5DC4FDA4CB21DF11]:0) at org.apache.lucene.expressions.TestDemoExpressions.test(TestDemoExpressions.java:85) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testSortValues Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:F364EFF69B4C4982]:0) at org.apache.lucene.expressions.TestDemoExpressions.testSortValues(TestDemoExpressions.java:100) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:2DBE21DF8775F2C1]:0) at org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding(TestDemoExpressions.java:118) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:E49206CA6799B14F]:0) at org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression(TestDemoExpressions.java:136) FAILED: org.apache.lucene.expressions.TestExpressionSorts.testQueries Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([BD0F8AC4302D298F:E181461F2A449C21]:0) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:146) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:130) at org.apache.lucene.expressions.TestExpressionSorts.testQueries(TestExpressionSorts.java:101) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testValidExternals Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at 
__randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7A90B2D1B9FBFC41]:0) at org.apache.lucene.expressions.TestExpressionValidation.testValidExternals(TestExpressionValidation.java:33) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion3 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7DA332E3CD41F10]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion3(TestExpressionValidation.java:103) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7AB01C912ED84010]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2(TestExpressionValidation.java:90) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion Error Message: Could not
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768528#comment-13768528 ] David Smiley commented on SOLR-2548: I thought about that. I figure that if I'm cautious about this such as by committing to trunk first, as I did, then there shouldn't be consternation about porting this to branch_45. Besides, I have more confidence in understanding the code that I committed vs. what it replaced. But I take your point that *if* for some reason it doesn't go to v4.5 then, sure, use another issue. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768629#comment-13768629 ] wolfgang hoschek commented on SOLR-1301: cdk-morphlines-solr-core and cdk-morphlines-solr-cell should remain separate and be available through separate maven modules so that clients such as Flume Solr Sink and Hbase Indexer can continue to choose to depend (or not depend) on them. For example, not everyone wants tika and it's dependency chain. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
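A heavily simplified sketch of the design described above, for orientation only: a Hadoop RecordWriter that converts each (key, value) pair into a SolrInputDocument, batches adds to an EmbeddedSolrServer, and commits when the reduce task closes. The field names, batch size, and Text/Text key types are illustrative assumptions, not the patch's actual code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    class SolrRecordWriterSketch extends RecordWriter<Text, Text> {
      private final EmbeddedSolrServer solr;                // one embedded core per reduce task
      private final List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      private static final int BATCH_SIZE = 1000;           // illustrative batching threshold

      SolrRecordWriterSketch(EmbeddedSolrServer solr) {
        this.solr = solr;
      }

      @Override
      public void write(Text key, Text value) throws IOException {
        SolrInputDocument doc = new SolrInputDocument();    // "converter" step: (key, value) becomes a document
        doc.addField("id", key.toString());
        doc.addField("text", value.toString());
        batch.add(doc);
        if (batch.size() >= BATCH_SIZE) {
          flush();
        }
      }

      private void flush() throws IOException {
        try {
          solr.add(batch);                                  // periodic batched add, no network hop
          batch.clear();
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
      }

      @Override
      public void close(TaskAttemptContext context) throws IOException {
        try {
          flush();
          solr.commit();                                    // commit once, when the reduce task finishes
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
      }
    }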
[jira] [Commented] (SOLR-5234) Allow SolrResourceLoader to load resources from URLs
[ https://issues.apache.org/jira/browse/SOLR-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768616#comment-13768616 ] Markus Jelsma commented on SOLR-5234: - So I assume this will provide similar functionality to SOLR-5234 when in cloud mode? Allow SolrResourceLoader to load resources from URLs Key: SOLR-5234 URL: https://issues.apache.org/jira/browse/SOLR-5234 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-5234.patch, SOLR-5234.patch This would allow multiple Solr instances to share large configuration files. It would also help resolve problems caused by attempting to store 1Mb files in zookeeper. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768661#comment-13768661 ] Erick Erickson commented on SOLR-5084: -- I have a flight coming up, I'll see if I can give it a look-see. new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Erick Erickson Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity. Risk etc) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement, I have prototyped a new type of field: EnumField where the inputs are a closed predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768662#comment-13768662 ] wolfgang hoschek commented on SOLR-1301: Seems like the patch still misses tika-xmp. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions
[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768737#comment-13768737 ] Dawid Weiss commented on LUCENE-5214: - I looked through the patch but I didn't get it, too late ;) I'll give it another shot later. Anyway, the idea is very interesting though -- I wonder how much left-context (regardless of this implementation) one needs for the right prediction (reminds me of Markov chains and generative poetry :) Add new FreeTextSuggester, to handle long tail suggestions Key: LUCENE-5214 URL: https://issues.apache.org/jira/browse/LUCENE-5214 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.6 Attachments: LUCENE-5214.patch The current suggesters are all based on a finite space of possible suggestions, i.e. the ones they were built on, so they can only suggest a full suggestion from that space. This means if the current query goes outside of that space then no suggestions will be found. The goal of FreeTextSuggester is to address this, by giving predictions based on an ngram language model, i.e. using the last few tokens from the user's query to predict likely following token. I got the idea from this blog post about Google's suggest: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html This is very much still a work in progress, but it seems to be working. I've tested it on the AOL query logs, using an interactive tool from luceneutil to show the suggestions, and it seems to work well. It's fun to use that tool to explore the word associations... I don't think this suggester would be used standalone; rather, I think it'd be a fallback for times when the primary suggester fails to find anything. You can see this behavior on google.com, if you type the fast and the , you see entire queries being suggested, but then if the next word you type is burning then suddenly you see the suggestions are only based on the last word, not the entire query. It uses ShingleFilter under-the-hood to generate the token ngrams; once LUCENE-5180 is in it will be able to properly handle a user query that ends with stop-words (e.g. wizard of ), and then stores the ngrams in an FST. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
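To make the "fall back to fewer context words" behaviour concrete, here is a toy, self-contained sketch of ngram backoff. It is not the FreeTextSuggester API; the counts map and token handling are illustrative:

    import java.util.*;
    import java.util.stream.Collectors;

    class NgramBackoffSketch {
      // context ("the fast and") -> next token -> observed count
      final Map<String, Map<String, Long>> counts = new HashMap<String, Map<String, Long>>();

      List<String> suggest(List<String> prev, int maxContext) {
        // try the longest available context first, then back off one token at a time
        for (int n = Math.min(maxContext, prev.size()); n >= 1; n--) {
          String context = String.join(" ", prev.subList(prev.size() - n, prev.size()));
          Map<String, Long> followers = counts.get(context);
          if (followers != null && !followers.isEmpty()) {
            return followers.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
          }
        }
        return Collections.emptyList();   // nothing matched even a single-token context
      }
    }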
[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5215: --- Attachment: LUCENE-5215.patch Patch adds FieldInfos generation: * SegmentInfoPerCommit manages fieldInfosGen; SegmentInfos read/write it, like delGen. ** Updated SegmentInfos format jdocs * ReaderAndLiveDocs writes a new FIS generation if there are DV updates, also updates existing FIs dvGen. ** We now support updating documents in segments where the field wasn't indexed (sparse DV). * New Lucene46Codec and Lucene46FieldInfosFormat for writing the dvGen per field in the fnm file. ** Updated package.html ** Updated FieldInfosFormat jdocs ** Deprecated Lucene45Codec, moved Lucene42FieldInfosWriter to test-framework, added Lucene45RWCodec * Added a static utility method to SegmentReader to readFieldInfos from SIPC, since a couple of places in the code needed to execute same logic. * Added segmentSuffix to FieldsReader/Writer. Most of the changes in the patch are due to the new Lucene46Codec. I couldn't test FIS.gen without making all the other changes (Lucene45Codec deprecation etc.) because I didn't feel running tests with e.g. -Dtests.codec=Lucene46 is enough. So the patch is big, but if you want to review the FIS.gen changes, you should look at: Lucene46Codec, Lucene46FieldInfosFormat, ReaderAndLiveDocs, SIPC, SIS. Core tests pass, so I think it's ready for a review. Also, do I understand correctly that a 4.5 index for TestBackcompat will be created when we release 4.6 (if that issue makes it to 4.6)? Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5215.patch In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #972: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/972/ 83 tests failed. FAILED: org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:E0EB452872B80D84]:0) at org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression(TestDemoExpressions.java:134) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testSortValues Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:F71DAC148E6DF549]:0) at org.apache.lucene.expressions.TestDemoExpressions.testSortValues(TestDemoExpressions.java:98) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:29C7623D92544E0A]:0) at org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding(TestDemoExpressions.java:116) FAILED: org.apache.lucene.expressions.TestDemoExpressions.test Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:59BDBE46DE0063DA]:0) at org.apache.lucene.expressions.TestDemoExpressions.test(TestDemoExpressions.java:83) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:178D253F217A2D68]:0) at org.apache.lucene.expressions.TestDemoExpressions.doTestLotsOfBindings(TestDemoExpressions.java:172) at org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings(TestDemoExpressions.java:154) FAILED: org.apache.lucene.expressions.TestExpressionSorts.testQueries Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([A405898F38B743F3:F88B455422DEF65D]:0) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:144) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:128) at org.apache.lucene.expressions.TestExpressionSorts.testQueries(TestExpressionSorts.java:99) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at 
__randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:B94A06B61EEAFE2E]:0) at org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal2(TestExpressionValidation.java:56) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:49B93C7AEB2A3FC2]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2(TestExpressionValidation.java:90) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:B9449032CFC5C3D]:0) at org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal(TestExpressionValidation.java:44) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testSelfRecursion Error
[jira] [Updated] (SOLR-4221) Custom sharding
[ https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4221: - Fix Version/s: 5.0 4.5 Custom sharding --- Key: SOLR-4221 URL: https://issues.apache.org/jira/browse/SOLR-4221 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Noble Paul Fix For: 4.5, 5.0 Attachments: SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch Features to let users control everything about sharding/routing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1301: -- Attachment: SOLR-1301.patch This is likely the last patch I'll put up for a bit - I'm on vacation from Wed-Mon. Patch Notes: ant precommit passes again. I've fixed the forbidden api calls and a couple minor javadoc issues in the new morphlines code. Also fixed a more problematic javadocs issue due to broken links from the morphlines code to extraction code due to extending extraction classes. I've added tika-xmp to the extraction dependencies. I don't like that tests can pass when some necessary run-time jars are missing - we will likely need to look into adding simple tests that cause each necessary jar to be used - or even just have hack tests that try and create a class in the offending jars or something. I'll save that for a follow up issue though - the solr cell morphlines tests actually upped the number of dependencies tests hit quite a bit at least. There is also a test speed issue that is not on the critical path - on my fast machine that does 8 tests in parallel, this adds about 4-5 minutes to the tests. It would be good to try and minimize some of the longer tests for std runs, and keep them as is for @nightly runs. That can wait post commit though. That leaves the following 2 critical path items to deal with: * Get the tests to run without a hacked test.policy file. * Dist packaging. This includes things like creation of the final MapReduceIndexerTool jar file and dealing with it's dependencies, as well as the location of the morphlines code and how it is distributed. Other than that we are looking pretty good - all tests passing and precommit passing. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. 
This data is then added to a batch, which is periodically submitted to the EmbeddedSolrServer. When a reduce task completes and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue; you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache
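To make the design above concrete, here is a minimal sketch of the kind of converter SolrRecordWriter delegates to. The class name, method signature, and field names are illustrative assumptions for a CSV-style input, not the SolrDocumentConverter interface actually defined in the patch; it only shows the step that turns a Hadoop (key, value) pair into a SolrInputDocument.
{noformat}
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.solr.common.SolrInputDocument;

/** Hypothetical converter: one CSV line (keyed by its file offset) becomes one document. */
public class CsvLineConverter {
  public SolrInputDocument convert(LongWritable key, Text value) {
    String[] cols = value.toString().split(",");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", key.get());                          // unique key per input record
    doc.addField("title", cols.length > 0 ? cols[0] : "");  // assumed schema fields
    doc.addField("body", cols.length > 1 ? cols[1] : "");
    return doc;
  }
}
{noformat}
Per the description above, SolrRecordWriter would invoke such a converter for every reduce output pair, buffer the resulting documents into a batch, and periodically hand the batch to the EmbeddedSolrServer.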
Re: Can we use TREC data set in open source?
Inline below On Sep 9, 2013, at 10:53 PM, Han Jiang jiangha...@gmail.com wrote: Back in 2007 Grant contacted NIST about making the TREC collection available to our community: http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser I think another attempt at this is really important to our project and to people who use Lucene. All these years, speed has mainly been tuned on Wikipedia, but Wikipedia is not a very 'standard' benchmark: * it doesn't represent how real-world search works; * it cannot be used to evaluate the relevance of our scoring models; * researchers tend to do experiments on other data sets, and it is usually hard to know whether Lucene is performing at its best. And personally I agree with this line: I think it would encourage Lucene users/developers to think about relevance as much as we think about speed. There's been much work to make Lucene's scoring models pluggable in 4.0, and it would be great to explore this further. It is very appealing to see a high-performance library work alongside state-of-the-art ranking methods. As for the TREC data sets, the problems we have met are: 1. NIST/TREC does not own the original collections, so it might be necessary to contact the organizations that do, such as: http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html http://lemurproject.org/clueweb12/ 2. Currently there is no open-source license for any of the data sets, so they won't be as 'open' as Wikipedia is. As Grant proposed, one possibility is to make a data set accessible only to committers instead of all users. That is not very open source, but the TREC data sets are public and usually available to researchers, so people can still reproduce performance tests. I'm quite curious: has anyone explored getting an open-source license for one of those data sets? And is our community still interested in this issue after all these years? It continues to be of interest to me. I've had various conversations throughout the years on it. Most people like the idea, but are not sure how to distribute it in an open way (ClueWeb currently ships as four 1TB disks) and I am also not sure how they would handle any copyright/redaction claims against it. There is, of course, little incentive for those involved to solve these problems, either, as most people who are interested sign the form and pay the $600 for the disks. I've had a number of conversations about how I view this as a significant barrier to open research, especially in under-served countries, and to open source. People sympathize with me, but then move on. To this day, I think the only way it will happen is for the community to build a completely open system, perhaps based on Common Crawl or our own crawl, host it ourselves, and develop judgments, etc. We tried to get this off the ground with the Open Relevance Project, but there was never a sustainable effort, so I have little hope for it at this point (but I would love to be proven wrong). For it to succeed, I think we would need the backing of a university with students interested in curating such a collection, the judgments, etc. I think we could figure out how to distribute the data either as an AWS public data set or possibly via the ASF or similar (although I am pretty sure the ASF would balk at multi-TB downloads). Happy to hear other ideas. Grant Ingersoll | @gsingers http://www.lucidworks.com
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768967#comment-13768967 ] Robert Muir commented on LUCENE-5212: - https://bugs.openjdk.java.net/browse/JDK-8024830 java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: hs_err_pid32714.log, jenkins.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie commented on SOLR-4470: - Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all Solr resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? Here are the changes I made in Jetty to enable basic auth: etc/webdefault.xml (perhaps protecting everything is overly general?): <security-constraint> <web-resource-collection> <web-resource-name>Solr authenticated application</web-resource-name> <url-pattern>/</url-pattern> </web-resource-collection> <auth-constraint> <role-name>access-role</role-name> </auth-constraint> </security-constraint> <login-config> <auth-method>BASIC</auth-method> <realm-name>Access Realm</realm-name> </login-config> etc/jetty.xml: <Call name="addBean"> <Arg> <New class="org.eclipse.jetty.security.HashLoginService"> <Set name="name">Access Realm</Set> <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set> <Set name="refreshInterval">0</Set> </New> </Arg> </Call> etc/realm.properties (redacted for obvious reasons :)) user: password, access-role And the changes to Solr-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml (on host1, same as above w/o the -DzkHost param) And then the error I'm getting (on host2, the second node, only; host1, the leader, is fine): INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover.
core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.5, 5.0 Attachments: SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, SOLR-4470.patch, SOLR-4470.patch We want to protect any HTTP resource (URL). We want to require credentials no matter what kind of HTTP request you make to a Solr node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes also make internal requests to other Solr nodes, and for those to work credentials need to be provided there as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing) We would like to aim at a solution where the original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to a configured internal
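For anyone reproducing the Jetty setup above, the following is a minimal SolrJ sketch (assuming Solr 4.x) of how an external client can present basic-auth credentials to a protected node through HttpClientUtil. It is not the internal credential-forwarding mechanism this patch adds - the quoted ctl.sh relies on the patch's internalAuthCredentials* system properties for that - and the host, core name, and credentials are placeholders.
{noformat}
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class BasicAuthClientExample {
  public static void main(String[] args) throws Exception {
    // Build an HttpClient that sends basic-auth credentials with every request.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "user");
    params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "password");
    HttpClient httpClient = HttpClientUtil.createClient(params);

    // Without the credentials, the Jetty config above answers with 401 Unauthorized.
    HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr/collection1", httpClient);
    System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
    server.shutdown();
  }
}
{noformat}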
[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie edited comment on SOLR-4470 at 9/17/13 12:02 AM: --- Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config h6. etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call h6. etc/realm.properties (redacted for obvious reasons :)) user: password, access-role h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) was (Author: sapphiremirage): Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. 
Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? Here are the changes I made in Jetty to enable basic auth: etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call etc/realm.properties (redacted for obvious reasons :)) user: password, access-role And the changes to SOLR-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf
[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie edited comment on SOLR-4470 at 9/17/13 12:04 AM: --- Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): {noformat} security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config {noformat} h6. etc/jetty.xml: {noformat} Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call {noformat} h6. etc/realm.properties (redacted for obvious reasons :)) {noformat} user: password, access-role {noformat} h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): {noformat} SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml {noformat} (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): {noformat} INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) {noformat} was (Author: sapphiremirage): Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. 
After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config h6. etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call h6. etc/realm.properties (redacted for obvious reasons :)) user: password, access-role h5. Changes to
[jira] [Updated] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-5240: --- Attachment: SOLR-5240.patch Here's the simplest patch that fixes it - removing any executor thread limit when in ZK mode. Note that this deadlock-until-timeout situation can also easily happen even when replicas of a particular shard aren't on the same node. All that is required is to have more than 3 cores per node. SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
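To see why a bounded core-loading executor can wedge a node, here is a self-contained java.util.concurrent sketch - not the actual CoreContainer code, and the pool size and task roles are made up. Three "cores" block waiting for a fourth that never gets a thread, so nothing finishes until a timeout; an unlimited pool, as the patch uses for ZK mode, lets everything load.
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoreLoadDeadlockSketch {
  public static void main(String[] args) throws Exception {
    // Bounded pool of 3 loader threads; try Executors.newCachedThreadPool() (no limit) instead.
    final ExecutorService loaders = Executors.newFixedThreadPool(3);
    final CountDownLatch fourthCoreUp = new CountDownLatch(1);

    Runnable waitingCore = new Runnable() {
      public void run() {
        try {
          fourthCoreUp.await();   // like a core waiting to see other replicas come up
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    };
    for (int i = 0; i < 3; i++) {
      loaders.submit(waitingCore);      // these occupy every available thread
    }
    loaders.submit(new Runnable() {     // the core the others are waiting for:
      public void run() {               // it sits in the queue and never runs
        fourthCoreUp.countDown();
      }
    });

    loaders.shutdown();
    // Prints false with the bounded pool (deadlock until timeout), true with an unlimited pool.
    System.out.println("all cores loaded: " + loaders.awaitTermination(2, TimeUnit.SECONDS));
    loaders.shutdownNow();              // interrupt the stuck waiters so the JVM can exit
  }
}
{noformat}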
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769015#comment-13769015 ] Mark Miller commented on SOLR-5240: --- +1 - any other fix seems somewhat complicated. SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sea Marie updated SOLR-4470: Comment: was deleted (was: Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): {noformat} security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config {noformat} h6. etc/jetty.xml: {noformat} Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call {noformat} h6. etc/realm.properties (redacted for obvious reasons :)) {noformat} user: password, access-role {noformat} h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): {noformat} SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml {noformat} (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): {noformat} INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. 
core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) {noformat} ) Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.5, 5.0 Attachments: SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, SOLR-4470.patch, SOLR-4470.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can faily easy be acheived as described on http://wiki.apache.org/solr/SolrSecurity. This problem is that Solr-nodes also make internal request to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers. E.g. for search and update request. But there are also internal requests * that only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have relation to an outside super-request (e.g. replica synching stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769099#comment-13769099 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523871 from [~yo...@apache.org] in branch 'dev/trunk' [ https://svn.apache.org/r1523871 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769096#comment-13769096 ] Robert Muir commented on LUCENE-5212: - I crashed again but with a core file (set 'ulimit -c unlimited'). zip file with core dump and hs_err is here: http://people.apache.org/~rmuir/crash.zip (its too large for JIRA, sorry) For some more context, it always happens fairly early in the test run: so when it doesnt crash at this exact point, you can ^C and run again until it does. Here was my commands with output below: (i tried to simplify the procedure to make it easy to reproduce, but its not easy, it took me quite a few tries) {noformat} # note: we are pulling the exact revision that jenkins failed on, because things have changed in lucene codebase over the weekend svn co -r 1523179 https://svn.apache.org/repos/asf/lucene/dev/trunk # just go to core tests cd trunk/lucene/core # # now the following two commands: just run again and again until it crashes. # rm -rf ../../.caches/ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC {noformat} Here was the output: {noformat} rmuir@beast:~/workspace/trunk/lucene/core$ rm -rf ../../.caches/ rmuir@beast:~/workspace/trunk/lucene/core$ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC Buildfile: /home/rmuir/workspace/trunk/lucene/core/build.xml -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ :: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-core: compile-test-framework: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-lucene-core: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: compile-codecs: [echo] Building codecs... ivy-availability-check: [echo] Building codecs... ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: common.init: compile-lucene-core: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: common.compile-core: compile-core: compile-test: install-junit4-taskdef: validate: test: [junit4:pickseed] Seed property 'tests.seed' already defined: 43A1116E7F98BED4 [mkdir] Created dir: /home/rmuir/workspace/trunk/.caches/test-stats/core [junit4] JUnit4 says ciao! Master seed: 43A1116E7F98BED4 [junit4] Executing 367 suites with 1 JVM. [junit4] [junit4] Started J0 PID(26780@beast). 
[junit4] Suite: org.apache.lucene.store.TestHugeRamFile [junit4] Completed in 1.26s, 1 test [junit4] [junit4] Suite: org.apache.lucene.search.TestTimeLimitingCollector [junit4] Completed in 3.26s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping [junit4] Completed in 0.68s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestPostingsOffsets [junit4] Completed in 0.78s, 11 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestRegexpQuery [junit4] Completed in 0.11s, 7 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestTryDelete [junit4] Completed in 0.03s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.util.TestDoubleBarrelLRUCache [junit4] Completed in 1.02s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.analysis.TestGraphTokenizers [junit4] Completed in 3.01s, 21 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestIndexWriterMerging [junit4] Completed in 10.21s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestSearchAfter [junit4] Completed in 2.05s, 1 test [junit4] [junit4] Suite: org.apache.lucene.index.TestNoMergeScheduler [junit4] Completed in 0.02s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.TestSearchForDuplicates [junit4] Completed in 0.08s, 1 test [junit4] [junit4] Suite: org.apache.lucene.util.TestBytesRef [junit4] Completed in 0.02s, 5 tests [junit4] [junit4] Suite: org.apache.lucene.index.Test4GBStoredFields [junit4] IGNOR/A 0.02s | Test4GBStoredFields.test [junit4] Assumption #1: 'nightly' test group is disabled (@Nightly) [junit4] Completed in 0.03s, 1 test, 1 skipped [junit4] [junit4] Suite:
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769105#comment-13769105 ] Robert Muir commented on LUCENE-5212: - I ran again and here is the output where it does not crash, but instead corrupts: {noformat} rmuir@beast:~/workspace/trunk/lucene/core$ rm -rf ../../.caches/ rmuir@beast:~/workspace/trunk/lucene/core$ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC Buildfile: /home/rmuir/workspace/trunk/lucene/core/build.xml -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ :: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-core: compile-test-framework: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-lucene-core: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: compile-codecs: [echo] Building codecs... ivy-availability-check: [echo] Building codecs... ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: common.init: compile-lucene-core: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: common.compile-core: compile-core: compile-test: install-junit4-taskdef: validate: test: [junit4:pickseed] Seed property 'tests.seed' already defined: 43A1116E7F98BED4 [mkdir] Created dir: /home/rmuir/workspace/trunk/.caches/test-stats/core [junit4] JUnit4 says ciao! Master seed: 43A1116E7F98BED4 [junit4] Executing 367 suites with 1 JVM. [junit4] [junit4] Started J0 PID(27313@beast). 
[junit4] Suite: org.apache.lucene.store.TestHugeRamFile [junit4] Completed in 1.55s, 1 test [junit4] [junit4] Suite: org.apache.lucene.search.TestTimeLimitingCollector [junit4] Completed in 3.23s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping [junit4] Completed in 0.57s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestPostingsOffsets [junit4] Completed in 0.78s, 11 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestRegexpQuery [junit4] Completed in 0.11s, 7 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestTryDelete [junit4] Completed in 0.04s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.util.TestDoubleBarrelLRUCache [junit4] Completed in 1.02s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.analysis.TestGraphTokenizers [junit4] Completed in 3.05s, 21 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestIndexWriterMerging [junit4] Completed in 10.27s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestSearchAfter [junit4] 1 CheckIndex failed [junit4] 1 Segments file=segments_2 numSegments=2 version=5.0 format= [junit4] 1 1 of 2: name=_0 docCount=156 [junit4] 1 codec=Lucene45 [junit4] 1 compound=false [junit4] 1 numFiles=30 [junit4] 1 size (MB)=0.157 [junit4] 1 diagnostics = {timestamp=1379384646861, os=Linux, os.version=3.5.0-27-generic, source=flush, lucene.version=5.0-SNAPSHOT, os.arch=amd64, java.version=1.7.0_40, java.vendor=Oracle Corporation} [junit4] 1 no deletions [junit4] 1 test: open reader.OK [junit4] 1 test: fields..OK [15 fields] [junit4] 1 test: field norms.OK [2 fields] [junit4] 1 test: terms, freq, prox...OK [8628 terms; 10659 terms/docs pairs; 423 tokens] [junit4] 1 test: stored fields...OK [1 total field count; avg 0.006 fields per doc] [junit4] 1 test: term vectorsOK [3 total vector count; avg 1 term/freq vector fields per doc] [junit4] 1 test: docvalues...OK [5 docvalues fields; 1 BINARY; 2 NUMERIC; 2 SORTED; 0 SORTED_SET] [junit4] 1 [junit4] 1 2 of 2: name=_1 docCount=57 [junit4] 1 codec=Lucene45 [junit4] 1 compound=false [junit4] 1 numFiles=30 [junit4] 1 size (MB)=0.06 [junit4] 1 diagnostics = {timestamp=1379384646900, os=Linux, os.version=3.5.0-27-generic, source=flush, lucene.version=5.0-SNAPSHOT, os.arch=amd64, java.version=1.7.0_40, java.vendor=Oracle Corporation} [junit4] 1 no
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769109#comment-13769109 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523872 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523872 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769110#comment-13769110 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523873 from [~yo...@apache.org] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1523873 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769112#comment-13769112 ] Robert Muir commented on LUCENE-5212: - FYI: I ran this procedure about 10 times with the suggested workaround from https://bugs.openjdk.java.net/browse/JDK-8024830 and tests always pass: -XX:-UseLoopPredicate java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: hs_err_pid32714.log, jenkins.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
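Presumably the workaround was applied by appending the flag to the -Dargs string used in the reproduction runs above; the exact invocation below is an assumption, with quoting added so the shell passes all three JVM flags through:
{noformat}
ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs="-XX:-UseCompressedOops -XX:+UseParallelGC -XX:-UseLoopPredicate"
{noformat}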
[jira] [Resolved] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5240. Resolution: Fixed Fix Version/s: 5.0 SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5, 5.0 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5219) Make SynonymFilterFactory format attribute pluggable
Ryan Ernst created LUCENE-5219: -- Summary: Make SynonymFilterFactory format attribute pluggable Key: LUCENE-5219 URL: https://issues.apache.org/jira/browse/LUCENE-5219 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan Ernst It would be great to allow custom synonym formats to work with SynonymFilterFactory. There is already a comment in the code to make it pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
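As a rough illustration of what a pluggable format would plug in, here is a sketch that builds a SynonymMap from a made-up tab-separated format using the existing SynonymMap.Builder API (Lucene 4.x-era signatures). The format, class name, and the exact hook SynonymFilterFactory would expose are assumptions; the issue itself only proposes making the format attribute pluggable.
{noformat}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

/** Hypothetical format: each line is "input<TAB>output"; whitespace separates multi-word terms. */
public class TabSynonymLoader {
  public static SynonymMap load(Reader in) throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);   // dedup duplicate rules
    BufferedReader reader = new BufferedReader(in);
    String line;
    while ((line = reader.readLine()) != null) {
      String[] pair = line.split("\t");
      if (pair.length != 2) continue;                            // skip malformed lines
      CharsRef input = SynonymMap.Builder.join(pair[0].split("\\s+"), new CharsRef());
      CharsRef output = SynonymMap.Builder.join(pair[1].split("\\s+"), new CharsRef());
      builder.add(input, output, true);                          // keep the original term too
    }
    return builder.build();
  }
}
{noformat}
With the format attribute pluggable, SynonymFilterFactory could presumably point at a custom parser like this instead of being limited to the built-in solr and wordnet formats.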