date:20130813

mygithubit created LUCENE-5169:
--

 Summary: UniDic 2.1.2 support
 Key: LUCENE-5169
 URL: https://issues.apache.org/jira/browse/LUCENE-5169
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: mygithubit
Priority: Minor


I made some amendments to support UniDic 2.1.2 into kuromoji.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support


 [ 
https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mygithubit updated LUCENE-5169:
---

Attachment: unidic.patch

 UniDic 2.1.2 support
 

 Key: LUCENE-5169
 URL: https://issues.apache.org/jira/browse/LUCENE-5169
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: mygithubit
Priority: Minor
 Attachments: unidic.patch


 I made some amendments to support UniDic 2.1.2 into kuromoji.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mygithubit updated LUCENE-5169:
---

Summary: UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)  (was: 
UniDic 2.1.2 support)

 UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)
 --

 Key: LUCENE-5169
 URL: https://issues.apache.org/jira/browse/LUCENE-5169
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: mygithubit
Priority: Minor
 Attachments: unidic.patch


 I made some amendments to support UniDic 2.1.2 into kuromoji.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mygithubit updated LUCENE-5169:
---

Description: 
I made some amendments to support UniDic 2.1.2 into kuromoji.

The attached patch is against lucene_solr_4_4 branch.

  was:I made some amendments to support UniDic 2.1.2 into kuromoji.


 UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)
 --

 Key: LUCENE-5169
 URL: https://issues.apache.org/jira/browse/LUCENE-5169
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: mygithubit
Priority: Minor
 Attachments: unidic.patch


 I made some amendments to support UniDic 2.1.2 into kuromoji.
 The attached patch is against lucene_solr_4_4 branch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5166) PostingsHighlighter fails with IndexOutOfBoundsException

2013-08-13 Thread Manuel Amoabeng (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737983#comment-13737983
 ] 

Manuel Amoabeng commented on LUCENE-5166:
-

Thank you for the quick help!

 PostingsHighlighter fails with IndexOutOfBoundsException
 

 Key: LUCENE-5166
 URL: https://issues.apache.org/jira/browse/LUCENE-5166
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.4
Reporter: Manuel Amoabeng
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5166-2.patch, LUCENE-5166.patch, 
 LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch, 
 LUCENE-5166.patch


 Given a document with a match at a startIndex  PostingsHighlighter.maxlength 
 and an endIndex  PostingsHighlighter.maxLength, DefaultPassageFormatter will 
 throw an IndexOutOfBoundsException when DefaultPassageFormatter.append() is 
 invoked. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-13 Thread Elran Dvir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738031#comment-13738031
 ] 

Elran Dvir commented on SOLR-5084:
--

The patch is finally attached.
I'll attach a patch with unit tests ASAP. 

 new field type - EnumField
 --

 Key: SOLR-5084
 URL: https://issues.apache.org/jira/browse/SOLR-5084
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
 Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
 Solr-5084.patch, Solr-5084.patch


 We have encountered a use case in our system where we have a few fields 
 (Severity. Risk etc) with a closed set of values, where the sort order for 
 these values is pre-determined but not lexicographic (Critical is higher than 
 High). Generically this is very close to how enums work.
 To implement, I have prototyped a new type of field: EnumField where the 
 inputs are a closed predefined  set of strings in a special configuration 
 file (similar to currency.xml).
 The code is based on 4.2.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-5084) new field type - EnumField

2013-08-13 Thread Elran Dvir (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elran Dvir updated SOLR-5084:
-

Attachment: Solr-5084.patch

 new field type - EnumField
 --

 Key: SOLR-5084
 URL: https://issues.apache.org/jira/browse/SOLR-5084
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
 Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
 Solr-5084.patch, Solr-5084.patch


 We have encountered a use case in our system where we have a few fields 
 (Severity. Risk etc) with a closed set of values, where the sort order for 
 these values is pre-determined but not lexicographic (Critical is higher than 
 High). Generically this is very close to how enums work.
 To implement, I have prototyped a new type of field: EnumField where the 
 inputs are a closed predefined  set of strings in a special configuration 
 file (similar to currency.xml).
 The code is based on 4.2.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer


 [ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5170:
--

Fix Version/s: 4.5
   5.0
 Assignee: Uwe Schindler

 Add getter for reuse strategy to Analyzer
 -

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5170) Add getter for reuse strategy to Analyzer

Uwe Schindler created LUCENE-5170:
-

 Summary: Add getter for reuse strategy to Analyzer
 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler


If you write an Analyzer that wraps another one (but without using 
AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. This 
is not possible as there is no way to get the reuse startegy (private field and 
no getter).

An example is ES's NamedAnalyzer, see my comment: 
[https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]

This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer

[
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Uwe Schindler updated LUCENE-5170:
--

Attachment: LUCENE-5170.patch

Patch.

Maybe we should rethink AnalyzerWrapper, too. So it would use the strategy of
the wrapped Analyzer, too, unless you have something field-specific? In that
case your would pass explicit reuse strategy in the ctor, but the default is
the one of the inner analyzer.

Add getter for reuse strategy to Analyzer
-

Key: LUCENE-5170
URL: https://issues.apache.org/jira/browse/LUCENE-5170
Project: Lucene - Core
Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5

Attachments: LUCENE-5170.patch

If you write an Analyzer that wraps another one (but without using
AnalyzerWrapper) you may need use the same reuse strategy in your wrapper.
This is not possible as there is no way to get the reuse startegy (private
field and no getter).
An example is ES's NamedAnalyzer, see my comment:
[https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer

2013-08-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738077#comment-13738077
 ] 

Simon Willnauer commented on LUCENE-5170:
-

+1

 Add getter for reuse strategy to Analyzer
 -

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list


[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738081#comment-13738081
 ] 

Erick Erickson commented on SOLR-5057:
--

I'll give this another go-over in the next day or two.

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-5057.patch, SOLR-5057.patch, SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two case query with the same meaning below. But the case2 can't use 
 the queryResultCache when case1 is executed.
 case1: q=*:*fq=field1:value1fq=field2:value2
 case2: q=*:*fq=field2:value2fq=field1:value1
 I think queryResultCache should not be related with the order of fq's list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738089#comment-13738089
 ] 

Robert Muir commented on LUCENE-5170:
-

+1. I agree we should rethink AnalyzerWrapper too.

my preference: just make it a mandatory arg to the protected ctor of this class.



 Add getter for reuse strategy to Analyzer
 -

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5084) new field type - EnumField

[
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738092#comment-13738092
]

Erick Erickson commented on SOLR-5084:
--

@Elran

bq: Why do you say the assumption is the type is restricted to single value?...

Parts of the discussion mentioned sorting, which is undefined on multivalued
fields. If sorting is _required_ for an enum-type field then it shouldn't be
mutliValued. There's no reason it _needs_ to be restricted to single values,
it's fine for the enum type to be just like any other field; it's up to the
user to only put one value in the field if it's to be used to sorting.

Mostly getting it straight in my head what the characteristics are, not saying
it _should_ be single-valued-only...

Erick

new field type - EnumField
--

Key: SOLR-5084
URL: https://issues.apache.org/jira/browse/SOLR-5084
Project: Solr
Issue Type: New Feature
Reporter: Elran Dvir
Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch,
Solr-5084.patch, Solr-5084.patch

We have encountered a use case in our system where we have a few fields
(Severity. Risk etc) with a closed set of values, where the sort order for
these values is pre-determined but not lexicographic (Critical is higher than
High). Generically this is very close to how enums work.
To implement, I have prototyped a new type of field: EnumField where the
inputs are a closed predefined set of strings in a special configuration
file (similar to currency.xml).
The code is based on 4.2.1.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5084) new field type - EnumField

[
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738096#comment-13738096
]

Robert Muir commented on SOLR-5084:
---

Wait: i said sort order (not sorting).

So to me the multivalued case of an enum field makes total sense (it is kinda
like java's EnumSet). And the sort order defines what is used in faceting,
range queries, and so on.

new field type - EnumField
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects

2013-08-13 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738097#comment-13738097
]

Michael McCandless commented on LUCENE-4906:

I think the challenge here is that these are not just advanced uses;
they are super expert uses, and I don't feel like that justifies the
added cost of generics for normal users.

There are definitely times when generics make sense but I don't think
this case applies ...

I agree the Object approach is rather old fashioned ... but it should
still work for these super expert cases. So, it's not ideal, but it's
a step forward at least (progress not perfection) ... I'd like to
commit the Object approach so we move forward.

If future use cases emerge that makes the generics use-case more
common we can always re-visit this (this API is experimental; we are
free to change it), so none of this is set in stone ...

PostingsHighlighter's PassageFormatter should allow for rendering to
arbitrary objects
--

Key: LUCENE-4906
URL: https://issues.apache.org/jira/browse/LUCENE-4906
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Attachments: LUCENE-4906.patch, LUCENE-4906.patch

For example, in a server, I may want to render the highlight result to
JsonObject to send back to the front-end. Today since we render to string, I
have to render to JSON string and then re-parse to JsonObject, which is
inefficient...
Or, if (Rob's idea:) we make a query that's like MoreLikeThis but it pulls
terms from snippets instead, so you get proximity-influenced salient/expanded
terms, then perhaps that renders to just an array of tokens or fragments or
something from each snippet.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5084) new field type - EnumField

[
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738098#comment-13738098
]

Erick Erickson commented on SOLR-5084:
--

Ahhh, OK. Then Hoss says sorting, so no wonder I'm confused!

There's no reason one couldn't sort by a field of this type, right? Frankly
though, it seems kind of low-utility since there are probably only going to be
a few values in the common use-case, but I'd guess it's still a possibility...

new field type - EnumField
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k

2013-08-13 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4583:
---

Attachment: LUCENE-4583.patch

bq. I'm confused by the following comment:

I fixed the comment; it's because those DVFormats use PagedBytes.fillSlice, 
which cannot handle more than 2 pages.

New patch w/ that fix ...

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-08-13 Thread Han Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738105#comment-13738105
 ] 

Han Jiang commented on LUCENE-3069:
---

Hi, currently, we have problem when migrating the codes to trunk:

The API refactoring on PostingsReader/WriterBase now splits term metadata into 
two parts:
monotonic long[] and generical byte[], the former is known by term dictionary 
for better
d-gap encoding. 

So we need a 'longsSize' in field summary, to tell reader the fixed length of 
this monotonic
long[]. However, this API change actually breaks backward compability: the old 
4.x indices didn't 
support this, and for some codec like Lucene40, since their writer part are 
already deprecated, 
their tests won't pass.

It seems like we can put all the metadata in generic byte[] and let PBF do its 
own buffering 
(like we do in old API: nextTerm() ), however we'll have to add logics for 
this, in every PBF then.

So... can we solve this problem more elegantly?

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 5.0, 4.5

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5084) new field type - EnumField

[
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738106#comment-13738106
]

Robert Muir commented on SOLR-5084:
---

I think sorting is a major use case. With some of these previous examples like
risk or issue tracker status, you want to sort by the field and for 'high' risk
to sort after 'low', maybe 'closed' after 'created' and so on.

new field type - EnumField
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738113#comment-13738113
 ] 

Uwe Schindler commented on LUCENE-5170:
---

Robert: After reviewing the code:
The fixed-nonchangeable default in AnalyzerWrapper is PerField, which is a 
large overhead and should only be used in stuff like PerFieldAnalyzerWrapper 
(this class should call super(PerField) in its own ctor). But for other use 
cases of AnalyzerWrapper I have to use global strategy or the one of a wrapped 
analyzer). It looks like the current impl in AnalyzerWrapper is somehow 
assuming you want to wrap per field.

I would suggest to make it mandatory in Lucene trunk, and add the missing ctor 
in Lucene 4.x, too. The default one should be deprecated with a hint that it 
might be a bad idea to use this default.

My use case is:
I have lots of predefined Analyzers for several languages or functionality in 
my search application. I have some additional AnalyzerWrappers around that 
simply turn any other analyzer into a phonetic one or ASCIIFolding one (so I 
can use that with another field). So, my wrapper just takes one of these 
per-language Analyzers and wraps with another additional TokenFilter. As the 
underlying Analyzer is global reuse, I need to make the wrapper global, too - 
currently impossible. Per field is a waste of resources in this case.

So I would suggest to make the base class AnalyzerWrapper copy the ctor of the 
superclass Analyzer and deprecate the default ctor in 4.x. For my above example 
(to wrap another analyzer), I still need the resuse strategy of the inner 
analyzer, so I need set getter on Analyzer.java, too (see current patch).

 Add getter for reuse strategy to Analyzer
 -

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


 [ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5170:
--

  Component/s: modules/analysis
   core/other
Affects Version/s: 4.4

 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


 [ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5170:
--

Summary: Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's 
reuse strategy configureable  (was: Add getter for reuse strategy to Analyzer)

 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5170) Add getter for reuse strategy to Analyzer


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738113#comment-13738113
 ] 

Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 11:58 AM:
-

Robert: After reviewing the code:
The fixed-nonchangeable default in AnalyzerWrapper is PerField, which is a 
large overhead and should only be used in stuff like PerFieldAnalyzerWrapper 
(this class should call super(PerField) in its own ctor). But for other use 
cases of AnalyzerWrapper I have to use global strategy or the one of a wrapped 
analyzer). It looks like the current impl in AnalyzerWrapper is somehow 
assuming you want to wrap per field.

I would suggest to make it mandatory in Lucene trunk, and add the missing ctor 
in Lucene 4.x, too. The default one should be deprecated with a hint that it 
might be a bad idea to use this default.

My use case is:
I have lots of predefined Analyzers for several languages or functionality in 
my search application. I have some additional AnalyzerWrappers around that 
simply turn any other analyzer into a phonetic one or ASCIIFolding one (so I 
can use that with another field). So, my wrapper just takes one of these 
per-language Analyzers and wraps with another additional TokenFilter. As the 
underlying Analyzer is global reuse, I need to make the wrapper global, too - 
currently impossible. Per field is a waste of resources in this case.

Only PerFieldAnalyzerWrapper should use PerField strategy hardcoded (as it is 
per field), the base class not!

So I would suggest to make the base class AnalyzerWrapper copy the ctor of the 
superclass Analyzer and deprecate the default ctor in 4.x. For my above example 
(to wrap another analyzer), I still need the resuse strategy of the inner 
analyzer, so I need set getter on Analyzer.java, too (see current patch).

  was (Author: thetaphi):
Robert: After reviewing the code:
The fixed-nonchangeable default in AnalyzerWrapper is PerField, which is a 
large overhead and should only be used in stuff like PerFieldAnalyzerWrapper 
(this class should call super(PerField) in its own ctor). But for other use 
cases of AnalyzerWrapper I have to use global strategy or the one of a wrapped 
analyzer). It looks like the current impl in AnalyzerWrapper is somehow 
assuming you want to wrap per field.

I would suggest to make it mandatory in Lucene trunk, and add the missing ctor 
in Lucene 4.x, too. The default one should be deprecated with a hint that it 
might be a bad idea to use this default.

My use case is:
I have lots of predefined Analyzers for several languages or functionality in 
my search application. I have some additional AnalyzerWrappers around that 
simply turn any other analyzer into a phonetic one or ASCIIFolding one (so I 
can use that with another field). So, my wrapper just takes one of these 
per-language Analyzers and wraps with another additional TokenFilter. As the 
underlying Analyzer is global reuse, I need to make the wrapper global, too - 
currently impossible. Per field is a waste of resources in this case.

So I would suggest to make the base class AnalyzerWrapper copy the ctor of the 
superclass Analyzer and deprecate the default ctor in 4.x. For my above example 
(to wrap another analyzer), I still need the resuse strategy of the inner 
analyzer, so I need set getter on Analyzer.java, too (see current patch).
  
 Add getter for reuse strategy to Analyzer
 -

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738120#comment-13738120
 ] 

Robert Muir commented on LUCENE-5170:
-

{quote}
I would suggest to make it mandatory in Lucene trunk, and add the missing ctor 
in Lucene 4.x, too. The default one should be deprecated with a hint that it 
might be a bad idea to use this default.
{quote}

Yes, this is exactly what i think we should do. i really should be a mandatory 
parameter today (but cannot really work without also having the getter 
available!)

 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738132#comment-13738132
 ] 

Uwe Schindler commented on LUCENE-5170:
---

There is a major problem:
*Strategy is no strategy at all, it holds state!*

So my idea to make the getter available is wrong, because it would make the 
private state of the analyzer public to the outside! So this is a misdesign 
in the API. The correct way to do this would be:

Make the strategy a ENUM like class (no state). The ThreadLocal should not be 
sitting on the strategy, the strategy should only implement the strategy, not 
also take care of storing the data in the ThreadLocal.

I have no idea how to fix this - it looks like we need to backwards break to 
fix this!

 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738143#comment-13738143
 ] 

Uwe Schindler commented on LUCENE-5170:
---

The strategy pattern is defined like this, no state involved: 
http://en.wikipedia.org/wiki/Strategy_pattern

 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738143#comment-13738143
 ] 

Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 12:39 PM:
-

The definition of the strategy pattern can be found here, no state involved: 
http://en.wikipedia.org/wiki/Strategy_pattern

  was (Author: thetaphi):
The strategy pattern is defined like this, no state involved: 
http://en.wikipedia.org/wiki/Strategy_pattern
  
 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST

Anna Björk Nikulásdóttir created LUCENE-5171:


 Summary: AnalyzingSuggester and FuzzySuggester should be able to 
share same FST
 Key: LUCENE-5171
 URL: https://issues.apache.org/jira/browse/LUCENE-5171
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.3.1, 4.4
Reporter: Anna Björk Nikulásdóttir


In my code I use both suggesters for the same FST. I use 
AnalyzerSuggester#store() to create the FST and later on 
AnalyzingSuggester#load() and FuzzySuggester#load() to use it.
This approach works very well but it unnecessarily creates 2 fst instances 
resulting in 2x memory consumption.

It seems that for the time being both suggesters use the same FST format.

The following trivial method in AnalyzingSuggester provides the possibility to 
share the same FST among different instances of AnalyzingSuggester. It has been 
tested in the above scenario:

  public boolean shareFstFrom(AnalyzingSuggester instance)
  {
if (instance.fst == null) {
  return false;
}
this.fst = instance.fst;
this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput;
this.hasPayloads = instance.hasPayloads;

return true;
  }


One could use it like this:

  analyzingSugg = new AnalyzingSuggester(...);
  fuzzySugg = new FuzzySuggester(...);
  analyzingSugg.load(someInputStream);
  fuzzySugg = analyzingSugg.shareFstFrom(analyzingSugg);




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4231 - Still Failing

2013-08-13 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4231/

All tests passed

Build Log:
[...truncated 34759 lines...]
-documentation-lint:
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for malformed docs...
 [exec] 
 [exec] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build/docs/solr-core/overview-summary.html
 [exec]   missing: org.apache.solr.search.join
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:389:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:60:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build.xml:563:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build.xml:579:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/common-build.xml:2149:
 exec returned: 1

Total time: 80 minutes 13 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST


 [ 
https://issues.apache.org/jira/browse/LUCENE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anna Björk Nikulásdóttir updated LUCENE-5171:
-

Priority: Minor  (was: Major)

 AnalyzingSuggester and FuzzySuggester should be able to share same FST
 --

 Key: LUCENE-5171
 URL: https://issues.apache.org/jira/browse/LUCENE-5171
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.4, 4.3.1
Reporter: Anna Björk Nikulásdóttir
Priority: Minor

 In my code I use both suggesters for the same FST. I use 
 AnalyzerSuggester#store() to create the FST and later on 
 AnalyzingSuggester#load() and FuzzySuggester#load() to use it.
 This approach works very well but it unnecessarily creates 2 fst instances 
 resulting in 2x memory consumption.
 It seems that for the time being both suggesters use the same FST format.
 The following trivial method in AnalyzingSuggester provides the possibility 
 to share the same FST among different instances of AnalyzingSuggester. It has 
 been tested in the above scenario:
   public boolean shareFstFrom(AnalyzingSuggester instance)
   {
 if (instance.fst == null) {
   return false;
 }
 this.fst = instance.fst;
 this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput;
 this.hasPayloads = instance.hasPayloads;
 return true;
   }
 One could use it like this:
   analyzingSugg = new AnalyzingSuggester(...);
   fuzzySugg = new FuzzySuggester(...);
   analyzingSugg.load(someInputStream);
   fuzzySugg = analyzingSugg.shareFstFrom(analyzingSugg);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 727 - Still Failing!

2013-08-13 Thread Policeman Jenkins Server

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/727/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 34761 lines...]
-documentation-lint:
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for malformed docs...
 [exec] 
 [exec] 
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/docs/solr-core/overview-summary.html
 [exec]   missing: org.apache.solr.search.join
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/build.xml:389: 
The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/build.xml:60: 
The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build.xml:563:
 The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build.xml:579:
 The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/common-build.xml:2149:
 exec returned: 1

Total time: 164 minutes 25 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5172) FuzzySuggester should boost terms with minimal Levenshtein Distance

Anna Björk Nikulásdóttir created LUCENE-5172:


 Summary: FuzzySuggester should boost terms with minimal 
Levenshtein Distance
 Key: LUCENE-5172
 URL: https://issues.apache.org/jira/browse/LUCENE-5172
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.3.1, 4.4
Reporter: Anna Björk Nikulásdóttir


For my use case I need both suggesters: AnalyzingSuggester and FuzzySuggester 
because FuzzySuggester does not boost terms with minimal Levenshtein distance. 
Post processing of FuzzySuggester results is somewhat heavy if only one wants 
to find direct prefix suggestions. So I first use AnalyzingSuggester to find 
prefix suggestions and optionally FuzzySuggester afterwards if 
AnaylzingSuggester did not yield appropriate results.

It would be really useful if FuzzySuggester could boost/sort suggestion results 
in order of Levenshtein distances. Then I only would need FuzzySuggester.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5173) Add checkindex piece of LUCENE-5116

Robert Muir created LUCENE-5173:
---

 Summary: Add checkindex piece of LUCENE-5116
 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir


LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
the case you merge in empty or all-deleted stuff).

I considered it just an inconsistency, but it could cause confusing exceptions 
to real users too if there was a regression here. (see solr users list:Split 
Shard Error - maxValue must be non-negative). 


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5174) On disk FST objects

2013-08-13 Thread ASF subversion and git services (JIRA)

Anna Björk Nikulásdóttir created LUCENE-5174:


 Summary: On disk FST objects
 Key: LUCENE-5174
 URL: https://issues.apache.org/jira/browse/LUCENE-5174
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 4.3.1, 4.4
Reporter: Anna Björk Nikulásdóttir


If one wants to support multiple language suggestions at the same time via 
AnalyzingSuggester/FuzzySuggester on Android, it's almost not possible for the 
time being, because all suggesters use in memory resident FST's. And of course 
each language needs its own FST. On Android there are VM memory restrictions of 
32MB for older devices like the Nexus S. Making the math: a good language FST 
is roughly 11-15MB in size. Supporting even 2 languages at the same time is 
therefore difficult taking into account that FST's are not the only part of a 
common Android app.

A possible approach to a solution via memory mapping and DirectByteBuffer has 
been proposed by Mike Mc Candless on Lucene ML:

[http://mail-archives.apache.org/mod_mbox/lucene-java-user/201308.mbox/%3CCAL8PwkbHdeEvk+e47H6v6_=Ln36yhE2RY=m7rqbfp+h50u5...@mail.gmail.com%3E]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5116) IW.addIndexes doesn't prune all deleted segments


[ 
https://issues.apache.org/jira/browse/LUCENE-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738221#comment-13738221
 ] 

ASF subversion and git services commented on LUCENE-5116:
-

Commit 1513487 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1513487 ]

LUCENE-5116: Simplify test to use MatchNoBits instead own impl

 IW.addIndexes doesn't prune all deleted segments
 

 Key: LUCENE-5116
 URL: https://issues.apache.org/jira/browse/LUCENE-5116
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5116.patch, LUCENE-5116_test.patch


 at the least, this can easily create segments with maxDoc == 0.
 It seems buggy: elsewhere we prune these segments out, so its expected to 
 have a commit point with no segments rather than a segment with 0 documents...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116

2013-08-13 Thread ASF subversion and git services (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5173:


Attachment: LUCENE-5116.patch

Simple patch: also adds an assert to SegmentMerger.

we can only check this if the index is 4.5+, because thats when LUCENE-5116 was 
fixed.


 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5116.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5116) IW.addIndexes doesn't prune all deleted segments


[ 
https://issues.apache.org/jira/browse/LUCENE-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738225#comment-13738225
 ] 

ASF subversion and git services commented on LUCENE-5116:
-

Commit 1513488 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513488 ]

Merged revision(s) 1513487 from lucene/dev/trunk:
LUCENE-5116: Simplify test to use MatchNoBits instead own impl

 IW.addIndexes doesn't prune all deleted segments
 

 Key: LUCENE-5116
 URL: https://issues.apache.org/jira/browse/LUCENE-5116
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5116.patch, LUCENE-5116_test.patch


 at the least, this can easily create segments with maxDoc == 0.
 It seems buggy: elsewhere we prune these segments out, so its expected to 
 have a commit point with no segments rather than a segment with 0 documents...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116


[ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738244#comment-13738244
 ] 

Robert Muir commented on LUCENE-5173:
-

deleted the wrongly-named patch, sorry :)

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116


 [ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5173:


Attachment: LUCENE-5173.patch

just a slight simplification of the logic

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116


 [ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5173:


Attachment: (was: LUCENE-5116.patch)

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configureable


[ 
https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738258#comment-13738258
 ] 

Robert Muir commented on LUCENE-5170:
-

{quote}
Make the strategy a ENUM like class (no state). The ThreadLocal should not be 
sitting on the strategy, the strategy should only implement the strategy, not 
also take care of storing the data in the ThreadLocal.
{quote}

I like this idea, i think it could simplify the thing a lot.

{quote}
I have no idea how to fix this - it looks like we need to backwards break to 
fix this!
{quote}

Personally i support that in this case: because i think we can minimize the 
breaks at the end of the day.

For example if we switch to enums, in 4.x, we could still allow 'instantiation' 
but its just useless (since the object is stateless) and deprecated. and the 
'constants' would be declared like MultiTermQuery rewrite?


 Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse 
 strategy configureable
 --

 Key: LUCENE-5170
 URL: https://issues.apache.org/jira/browse/LUCENE-5170
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other, modules/analysis
Affects Versions: 4.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5170.patch


 If you write an Analyzer that wraps another one (but without using 
 AnalyzerWrapper) you may need use the same reuse strategy in your wrapper. 
 This is not possible as there is no way to get the reuse startegy (private 
 field and no getter).
 An example is ES's NamedAnalyzer, see my comment: 
 [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
 This would add a getter, just a 3-liner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116


[ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738263#comment-13738263
 ] 

Uwe Schindler commented on LUCENE-5173:
---

Patch is fine. I like that the checkindex allows older segments with empty 
size, but once a segment was merged it can no longer be empty.

Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger 
always strictly throws away empty segments (not that somebody can somehow with 
a crazy alcoholic mergepolicy create those segments again).

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-08-13 Thread Han Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

Patch with backward compability fix on Lucene41PBF (TempPostingsReader is 
actually a fork of Lucene41PostingsReader).

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 5.0, 4.5

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116


[ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738299#comment-13738299
 ] 

Robert Muir commented on LUCENE-5173:
-

{quote}
Maybe the assert in SegmentMerge should be a hard check, unless SegmentMerger 
always strictly throws away empty segments (not that somebody can somehow with 
a crazy alcoholic mergepolicy create those segments again).
{quote}

Or, maybe  mergeState.segmentInfo.setDocCount(setDocMaps()) should happen in 
the ctor of SegmentMerger instead of line 1 of merge()?

And it could a simple boolean method like shouldMerge(): returns docCount  0, 
called by addIndexes and mergeMiddle?

this way the logic added to addIndexes in LUCENE-5116 wouldnt even need to be 
there, and we'd feel better that we arent writing such 0 document segments 
(which codecs are not prepared to handle today).


 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-5139) Make Core Admin more user friendly when in SolrCloud mode.

2013-08-13 Thread Stefan Matheis (steffkes) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-5139:


Component/s: web gui

 Make Core Admin more user friendly when in SolrCloud mode.
 --

 Key: SOLR-5139
 URL: https://issues.apache.org/jira/browse/SOLR-5139
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud, web gui
Reporter: Mark Miller
 Fix For: 4.5, 5.0


 The CoreAdmin in the UI can easily get users into trouble - especially since 
 we don't yet have a collection management API. The info displayed is useful 
 though, and sometimes it makes sense to have access to the commands on a per 
 core level as well.
 We should improve the situation though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5101) Invalid UTF-8 character 0xfffe during shard update

2013-08-13 Thread Federico Chiacchiaretta (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738302#comment-13738302
 ] 

Federico Chiacchiaretta commented on SOLR-5101:
---

Hi,
should this issue be reopened or filed elsewhere?
I'd like to track changes to Solr that may affect this issue (i.e. switch to 
javabin for updates).

Thanks,
Federico Chiacchiaretta

 Invalid UTF-8 character 0xfffe during shard update
 --

 Key: SOLR-5101
 URL: https://issues.apache.org/jira/browse/SOLR-5101
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.3
 Environment: Ubuntu 12.04.2
 java version 1.6.0_27
 OpenJDK Runtime Environment (IcedTea6 1.12.5) (6b27-1.12.5-0ubuntu0.12.04.1)
 OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Reporter: Federico Chiacchiaretta

 On data import from a PostgreSQL db, I get the following error in solr.log:
 ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException; shard 
 update error RetryNode: 
 http://172.16.201.173:8983/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Invalid UTF-8 character 0xfffe at char #416, byte #127)
at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
 This prevents the document from being successfully added to the index, and a 
 few documents targeting the same shard are also missing.
 This happens silently, because data import completes successfully, and the 
 whole number of documents reported as Added includes those who failed (and 
 are actually lost).
 Is there a known workaround for this issue?
 Regards,
 Federico Chiacchiaretta

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to ArithmeticException: / by zero


[ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738311#comment-13738311
 ] 

James Dyer commented on SOLR-5122:
--

Hoss,

I appreciate your reporting this  taking care of this as much as possible.  Do 
you know offhand a failing seed for this test?  (I've been away for awhile and 
might not have the jenkins log easily available.)  I will look at this.  
Likely, I need to require docs to be collected in order and mistakenly thought 
this was unnecessary.

 spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
 to ArithmeticException: / by zero
 

 Key: SOLR-5122
 URL: https://issues.apache.org/jira/browse/SOLR-5122
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
 Attachments: SOLR-5122.patch


 As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
 and this (aparently) led to a failure in testEstimatedHitCounts.
 As far as i can tell: the test assumes that specific values would be returned 
 as the _estimated_ hits for a colleation, and it appears that the change in 
 MergePolicy however resulted in different segments with different term stats, 
 causing the estimation code to produce different values then what is expected.
 I made a quick attempt to improve the test to:
  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
 set such that the estimate' should actually be exact (ie: 
 collateMaxCollectDocs  == 0 or collateMaxCollectDocs greater then the num 
 docs in the index
  * randomize the values used for collateMaxCollectDocs and confirm that the 
 estimates are never more then the num docs in the index
 This lead to an odd ArithmeticException: / by zero error in the test, which 
 seems to suggest that there is a genuine bug in the code for estimating the 
 hits that only gets tickled in certain 
 mergepolicy/segment/collateMaxCollectDocs combinations.
 *Update:* This appears to be a general problem with collecting docs out of 
 order and the estimation of hits -- i believe even if there is no divide by 
 zero error, the estimates are largely meaningless since the docs are 
 collected out of order.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to ArithmeticException: / by zero


 [ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer reassigned SOLR-5122:


Assignee: James Dyer

 spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
 to ArithmeticException: / by zero
 

 Key: SOLR-5122
 URL: https://issues.apache.org/jira/browse/SOLR-5122
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
Assignee: James Dyer
 Attachments: SOLR-5122.patch


 As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
 and this (aparently) led to a failure in testEstimatedHitCounts.
 As far as i can tell: the test assumes that specific values would be returned 
 as the _estimated_ hits for a colleation, and it appears that the change in 
 MergePolicy however resulted in different segments with different term stats, 
 causing the estimation code to produce different values then what is expected.
 I made a quick attempt to improve the test to:
  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
 set such that the estimate' should actually be exact (ie: 
 collateMaxCollectDocs  == 0 or collateMaxCollectDocs greater then the num 
 docs in the index
  * randomize the values used for collateMaxCollectDocs and confirm that the 
 estimates are never more then the num docs in the index
 This lead to an odd ArithmeticException: / by zero error in the test, which 
 seems to suggest that there is a genuine bug in the code for estimating the 
 hits that only gets tickled in certain 
 mergepolicy/segment/collateMaxCollectDocs combinations.
 *Update:* This appears to be a general problem with collecting docs out of 
 order and the estimation of hits -- i believe even if there is no divide by 
 zero error, the estimates are largely meaningless since the docs are 
 collected out of order.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Jack Krupansky (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738321#comment-13738321
 ] 

Jack Krupansky commented on SOLR-5017:
--

Is this feature intended for both traditional Solr sharding as well as 
SolrCloud?

If it is intended for SolrCloud as well, how does delete-by-id work, in the 
sense that the delete command does not include the field needed to determine 
routing?


 Allow sharding based on the value of a field
 

 Key: SOLR-5017
 URL: https://issues.apache.org/jira/browse/SOLR-5017
 Project: Solr
  Issue Type: Sub-task
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 4.5, 5.0

 Attachments: SOLR-5017.patch


 We should be able to create a collection where sharding is done based on the 
 value of a given field
 collections can be created with shardField=fieldName, which will be persisted 
 in DocCollection in ZK
 implicit DocRouter would look at this field instead of _shard_ field
 CompositeIdDocRouter can also use this field instead of looking at the id 
 field. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116

2013-08-13 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738344#comment-13738344
 ] 

Michael McCandless commented on LUCENE-5173:


+1, I like consolidating the logic into a single shouldMerge().

And I don't think codecs should be required to handle the 0 doc segment case: 
we should never send such a segment to them.

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738375#comment-13738375
 ] 

Noble Paul commented on SOLR-5017:
--

This is only for SolrCloud 

deleteById/getById would expect the param \_route_ or shard.keys (deprecated) 
without which it will have to fan out a distributed request. it works without 
complaining but will be inefficient

 Allow sharding based on the value of a field
 

 Key: SOLR-5017
 URL: https://issues.apache.org/jira/browse/SOLR-5017
 Project: Solr
  Issue Type: Sub-task
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 4.5, 5.0

 Attachments: SOLR-5017.patch


 We should be able to create a collection where sharding is done based on the 
 value of a given field
 collections can be created with shardField=fieldName, which will be persisted 
 in DocCollection in ZK
 implicit DocRouter would look at this field instead of _shard_ field
 CompositeIdDocRouter can also use this field instead of looking at the id 
 field. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116

[
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738388#comment-13738388
]

Uwe Schindler commented on LUCENE-5173:
---

I agree with both. My complaint was the following:
The assert was not correct, as asserts should only be used for real assertions
withing the same class. For this special check, there is something outside of
SegmentMerger that could maybe insert empty readers into the merge queue, so
those should be thrown away while merging or when sergmentmerger initializes
(so move to ctor is a good idea). I am thinking about crazy stuff like a merge
policy that wraps with a FilterAtomicReader to filter while merging (like
IndexSorter) - which is possible with the current API.

So the segments should be removed on creating the SegmentMerger when all
readers to merge are already in the ListAtomicReader.

In the IndexWriter#addIndexes we may then just need the top-level check to not
even start a merge.

Add checkindex piece of LUCENE-5116
---

Key: LUCENE-5173
URL: https://issues.apache.org/jira/browse/LUCENE-5173
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-5173.patch

LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in
the case you merge in empty or all-deleted stuff).
I considered it just an inconsistency, but it could cause confusing
exceptions to real users too if there was a regression here. (see solr
users list:Split Shard Error - maxValue must be non-negative).

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to ArithmeticException: / by zero

2013-08-13 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738395#comment-13738395
 ] 

Hoss Man commented on SOLR-5122:


The initial jenkins failure i saw was At revision 1511278...

https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/343/
https://mail-archives.apache.org/mod_mbox/lucene-dev/201308.mbox/%3Calpine.DEB.2.02.1308070919170.13959@frisbee%3E

{quote}
I can reproduce this -- it's probably related to the MP randomization i 
put in ... looks like it's doing exact numeric comparisons based on term 
stats.  I'll take a look later today...

ant test  -Dtestcase=SpellCheckCollatorTest 
-Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true   -Dtests.locale=nl 
-Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII
{quote}

...regardless of he initial failure though, if you try out the patch i attached 
to try and improve the test coverage, then the reproduce line from the 
failure i posted along iwth that patch still reproduces on trunk (but you do 
have to manually uncomment the {{@Ignore}}...

{code}
ant test  -Dtestcase=SpellCheckCollatorTest 
-Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=nl 
-Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII
{code}

 spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
 to ArithmeticException: / by zero
 

 Key: SOLR-5122
 URL: https://issues.apache.org/jira/browse/SOLR-5122
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
Assignee: James Dyer
 Attachments: SOLR-5122.patch


 As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
 and this (aparently) led to a failure in testEstimatedHitCounts.
 As far as i can tell: the test assumes that specific values would be returned 
 as the _estimated_ hits for a colleation, and it appears that the change in 
 MergePolicy however resulted in different segments with different term stats, 
 causing the estimation code to produce different values then what is expected.
 I made a quick attempt to improve the test to:
  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
 set such that the estimate' should actually be exact (ie: 
 collateMaxCollectDocs  == 0 or collateMaxCollectDocs greater then the num 
 docs in the index
  * randomize the values used for collateMaxCollectDocs and confirm that the 
 estimates are never more then the num docs in the index
 This lead to an odd ArithmeticException: / by zero error in the test, which 
 seems to suggest that there is a genuine bug in the code for estimating the 
 hits that only gets tickled in certain 
 mergepolicy/segment/collateMaxCollectDocs combinations.
 *Update:* This appears to be a general problem with collecting docs out of 
 order and the estimation of hits -- i believe even if there is no divide by 
 zero error, the estimates are largely meaningless since the docs are 
 collected out of order.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to ArithmeticException: / by zero

2013-08-13 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5122:
---

Attachment: SOLR-5122.patch

updated patch to trunk and included the commenting out of the {{@Ignore}} so 
all ou need to do is apply this patch to reproduce with the previously 
mentioned seed.

 spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
 to ArithmeticException: / by zero
 

 Key: SOLR-5122
 URL: https://issues.apache.org/jira/browse/SOLR-5122
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
Assignee: James Dyer
 Attachments: SOLR-5122.patch, SOLR-5122.patch


 As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
 and this (aparently) led to a failure in testEstimatedHitCounts.
 As far as i can tell: the test assumes that specific values would be returned 
 as the _estimated_ hits for a colleation, and it appears that the change in 
 MergePolicy however resulted in different segments with different term stats, 
 causing the estimation code to produce different values then what is expected.
 I made a quick attempt to improve the test to:
  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
 set such that the estimate' should actually be exact (ie: 
 collateMaxCollectDocs  == 0 or collateMaxCollectDocs greater then the num 
 docs in the index
  * randomize the values used for collateMaxCollectDocs and confirm that the 
 estimates are never more then the num docs in the index
 This lead to an odd ArithmeticException: / by zero error in the test, which 
 seems to suggest that there is a genuine bug in the code for estimating the 
 hits that only gets tickled in certain 
 mergepolicy/segment/collateMaxCollectDocs combinations.
 *Update:* This appears to be a general problem with collecting docs out of 
 order and the estimation of hits -- i believe even if there is no divide by 
 zero error, the estimates are largely meaningless since the docs are 
 collected out of order.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

Tom Burton-West created LUCENE-5175:
---

 Summary: Add parameter to lower-bound TF normalization for BM25 
(for long documents)
 Key: LUCENE-5175
 URL: https://issues.apache.org/jira/browse/LUCENE-5175
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Tom Burton-West
Priority: Minor


In the article When Documents Are Very Long, BM25 Fails! a fix for the 
problem is documented.  There was a TODO note in BM25Similarity to add this 
fix. I will attach a patch that implements the fix shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread ASF subversion and git services (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tom Burton-West updated LUCENE-5175:

Attachment: LUCENE-5175.patch

Patch adds optional parameter delta to lower-bound tf normalization. Attached
also are unit tests.

Still need to add tests of the explanation/scoring for cases 1) no norms, and
2) no delta

If no delta parameter is supplied, the math works out to the equivalent of the
regular BM25 formula as far as the score, but I think there is an extra step
or two to get there. I'll see if I can get some benchmarks running to see if
there is any significant performance issue.

Add parameter to lower-bound TF normalization for BM25 (for long documents)
---

Key: LUCENE-5175
URL: https://issues.apache.org/jira/browse/LUCENE-5175
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Tom Burton-West
Priority: Minor
Attachments: LUCENE-5175.patch

In the article When Documents Are Very Long, BM25 Fails! a fix for the
problem is documented. There was a TODO note in BM25Similarity to add this
fix. I will attach a patch that implements the fix shortly.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins


[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738415#comment-13738415
 ] 

ASF subversion and git services commented on SOLR-3076:
---

Commit 1513577 from [~yo...@apache.org] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513577 ]

SOLR-3076: block join parent and child queries

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738422#comment-13738422
 ] 

Shalin Shekhar Mangar commented on SOLR-5017:
-

Shard splitting doesn't support collections configured with a hash router and 
routeField. I'll put up a test and fix.

 Allow sharding based on the value of a field
 

 Key: SOLR-5017
 URL: https://issues.apache.org/jira/browse/SOLR-5017
 Project: Solr
  Issue Type: Sub-task
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 4.5, 5.0

 Attachments: SOLR-5017.patch


 We should be able to create a collection where sharding is done based on the 
 value of a given field
 collections can be created with shardField=fieldName, which will be persisted 
 in DocCollection in ZK
 implicit DocRouter would look at this field instead of _shard_ field
 CompositeIdDocRouter can also use this field instead of looking at the id 
 field. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4799) SQLEntityProcessor for zipper join

2013-08-13 Thread ASF subversion and git services (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738424#comment-13738424
]

James Dyer commented on SOLR-4799:
--

Mikhail, This seems like a great feature, but I haven't looked at it. As I
said, I do not feel it wise to add features that won't neatly plug-in the
current DIH infrastructure until we improve the code. Really, I would love to
chop out features (Debug mode, delta updates, streaming from a POST request,
etc), and make it work independently from Solr before we build more into it.
But I've been busy with other things and haven't had much time.

By the way, have you any experience with Apache Flume? In your opinion, could
it become DIH's successor? A Solr Sink was added earlier in the year that will
index disparate data. I haven't looked much at it, but my first impression is
that it is a big, complicated tool whereas DIH is smaller and simpler and a the
2 would have different use-cases. Also, not so sure it has any support yet for
RDBMS.

SQLEntityProcessor for zipper join
--

Key: SOLR-4799
URL: https://issues.apache.org/jira/browse/SOLR-4799
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Reporter: Mikhail Khludnev
Priority: Minor
Labels: dih
Attachments: SOLR-4799.patch

DIH is mostly considered as a playground tool, and real usages end up with
SolrJ. I want to contribute few improvements target DIH performance.
This one provides performant approach for joining SQL Entities with miserable
memory at contrast to
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
The idea is:
* parent table is explicitly ordered by it’s PK in SQL
* children table is explicitly ordered by parent_id FK in SQL
* children entity processor joins ordered resultsets by ‘zipper’ algorithm.
Do you think it’s worth to contribute it into DIH?
cc: [~goksron] [~jdyer]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-13 Thread Alan Woodward (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738434#comment-13738434
]

Alan Woodward commented on SOLR-4718:
-

Rather than adding Zookeeper stuff to ConfigSolr.fromSolrHome(), can we create
a new static method ConfigSolr.fromZookeeper()? And then push the system
property checks back out into SolrDispatchFilter or wherever fromSolrHome is
being called. Keeps each fromXXX method just doing one thing.

I wonder if it's worth refactoring the ByteArrayInputStream re-reading dance
into fromInputStream as well. It's a bit of a hack anyway, and I don't like
having it in more than once place.

Allow solr.xml to be stored in zookeeper

Key: SOLR-4718
URL: https://issues.apache.org/jira/browse/SOLR-4718
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-4718.patch, SOLR-4718.patch

So the near-final piece of this puzzle is to make solr.xml be storable in
Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm
working on it now.
More interesting is how to get the configuration into ZK in the first place,
enhancements to ZkCli? Or boostrap-conf? Other? I'm punting on that for this
patch.
Second level is how to tell Solr to get the file from ZK. Some possibilities:
1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where
the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
- easy to script
cons - can't run multiple JVMs pointing to different files. Is this
really a problem?
2 New solr.xml element. Something like:
solr
solrcloud
str name=zkHostzkurl/str
str name=zkSolrXmlPathwhatever/str
/solrcloud
solr
Really, this form would hinge on the presence or absence of zkSolrXmlPath.
If present, go up and look for the indicated solr.xml file on ZK. Any
properties in the ZK version would overwrite anything in the local copy.
NOTE: I'm really not very interested in supporting this as an option for
old-style solr.xml unless it's _really_ easy. For instance, what if the local
solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since
old-style is going away, this doesn't seem like it's worth the effort.
pros - No new mechanisms
cons - once again requires that there be a solr.xml file on each client.
Admittedly for installations that didn't care much about multiple JVMs, it
could be a stock file that didn't change...
For now, I'm going to just manually push solr.xml to ZK, then read it based
on a sysprop. That'll get the structure in place while we debate. Not going
to check this in until there's some consensus though.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests


[ 
https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738454#comment-13738454
 ] 

ASF subversion and git services commented on SOLR-4952:
---

Commit 1513586 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1513586 ]

SOLR-4952: use solrconfig.snippet.randomindexconfig.xml in the QueryElevation 
tests

 audit test configs to use solrconfig.snippet.randomindexconfig.xml in more 
 tests
 

 Key: SOLR-4952
 URL: https://issues.apache.org/jira/browse/SOLR-4952
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man

 in SOLR-4942 i updated every solrconfig.xml to either...
 * include solrconfig.snippet.randomindexconfig.xml where it was easy to do so
 * use the useCompoundFile sys prop if it already had an {{indexConfig}} 
 section, or if including the snippet wasn't going to be easy (ie: contrib 
 tests)
 As an improvment on this:
 * audit all core configs not already using 
 solrconfig.snippet.randomindexconfig.xml and either:
 ** make them use it, ignoring any previously unimportant explicit 
 incdexConfig settings
 ** make them use it, using explicit sys props to overwrite random values in 
 cases were explicit indexConfig values are important for test
 ** add a comment why it's not using the include snippet in cases where the 
 explicit parsing is part of hte test
 * try figure out a way for contrib tests to easily include the same file 
 and/or apply the same rules as above

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests

2013-08-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738459#comment-13738459
 ] 

ASF subversion and git services commented on SOLR-4952:
---

Commit 1513587 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513587 ]

SOLR-4952: use solrconfig.snippet.randomindexconfig.xml in the QueryElevation 
tests (merge r1513586)

 audit test configs to use solrconfig.snippet.randomindexconfig.xml in more 
 tests
 

 Key: SOLR-4952
 URL: https://issues.apache.org/jira/browse/SOLR-4952
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man

 in SOLR-4942 i updated every solrconfig.xml to either...
 * include solrconfig.snippet.randomindexconfig.xml where it was easy to do so
 * use the useCompoundFile sys prop if it already had an {{indexConfig}} 
 section, or if including the snippet wasn't going to be easy (ie: contrib 
 tests)
 As an improvment on this:
 * audit all core configs not already using 
 solrconfig.snippet.randomindexconfig.xml and either:
 ** make them use it, ignoring any previously unimportant explicit 
 incdexConfig settings
 ** make them use it, using explicit sys props to overwrite random values in 
 cases were explicit indexConfig values are important for test
 ** add a comment why it's not using the include snippet in cases where the 
 explicit parsing is part of hte test
 * try figure out a way for contrib tests to easily include the same file 
 and/or apply the same rules as above

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738465#comment-13738465
 ] 

Mark Miller commented on SOLR-4718:
---

+1

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or boostrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
 solr
   solrcloud
  str name=zkHostzkurl/str
  str name=zkSolrXmlPathwhatever/str
   /solrcloud
 solr
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738471#comment-13738471
 ] 

Steve Rowe commented on SOLR-4856:
--

I don't use Eclipse, so it may be that there is something else wrong that isn't 
apparent on casual inspection, but I can't reproduce the problem you're 
reporting here.

On my Macbook Pro with OS X 10.8.4, when I run {{ant eclipse}} from a Bash 
cmdline on {{branch_4x}} (using ant v1.8.2 and Oracle Java 1.7.0_25), the 
generated {{.project}} file contents start with: 

{code:xml}
?xml version=1.0 encoding=UTF-8?
projectDescription
namebranch_4x/name
comment/comment
projects
/projects
buildSpec
buildCommand
nameorg.eclipse.jdt.core.javabuilder/name
arguments
/arguments
/buildCommand
/buildSpec
natures
natureorg.eclipse.jdt.core.javanature/nature
/natures
filteredResources
 ...
{code}


 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-13 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738470#comment-13738470
]

Mark Miller commented on SOLR-4718:
---

We should also probably be strict about the property values for the setting -
eg zookeeper works, solrhome works, null works (as solrhome), anything else
fails with an error.

Allow solr.xml to be stored in zookeeper

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)


[ 
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738474#comment-13738474
 ] 

Robert Muir commented on LUCENE-5175:
-

I can benchmark your patch with luceneutil Tom.
I know this thing is sensitive for some reason.

Really if there is a performance issue, worst case we can just call it BM25L or 
something?

Thanks for doing this!

 Add parameter to lower-bound TF normalization for BM25 (for long documents)
 ---

 Key: LUCENE-5175
 URL: https://issues.apache.org/jira/browse/LUCENE-5175
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Tom Burton-West
Priority: Minor
 Attachments: LUCENE-5175.patch


 In the article When Documents Are Very Long, BM25 Fails! a fix for the 
 problem is documented.  There was a TODO note in BM25Similarity to add this 
 fix. I will attach a patch that implements the fix shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116


 [ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5173:


Attachment: LUCENE-5173_ugly.patch

here is a ugly patch, there must be a better way... sorry :)

I wonder if its too paranoid: however playing with the old patch I think i hit 
my own assert with testThreadInterruptDeadLock...

I will investigate that more, to see under what conditions we are doing these 0 
doc merges today.

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch, LUCENE-5173_ugly.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread ASF subversion and git services (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738627#comment-13738627
]

Tom Burton-West commented on LUCENE-5175:
-

Thanks Robert,

In the article, they claim that the change doesn't have a performance impact.
On the other hand, I'm not familiar enough with Java performance to be able to
eyeball it, and it looks to me like we added one or more floating point
operations, so it would be good to benchmark, especially since the scoring alg
gets run against every hit, and we might have millions of hits for a poorly
chosen query. (And if we switch to page-level indexing we could have hundreds
of millions of hits).

I was actually considering making it a subclass instead of just modifying
BM25Similarity, so that it would be easy to benchmark, and if it turns out that
there is a significant perf difference, that users could choose which
implementation to use. I saw that computeWeight in BM25Similarity was final
and decided I didn't know enough about why this is final to either refactor to
create a base class, or change the method in order to subclass.

Is luceneutil the same as lucene benchmark? I've been wanting to learn how to
use lucene benchmark for some time.

Tom

Add parameter to lower-bound TF normalization for BM25 (for long documents)
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests


[ 
https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738640#comment-13738640
 ] 

ASF subversion and git services commented on SOLR-4952:
---

Commit 1513611 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1513611 ]

SOLR-4952: get all manged schema tests using 
solrconfig.snippet.randomindexconfig.xml - mainly by removing several 
solrconfig-*-managed-schema.xml files and using sys props in 
solrconfig-managed-schema.xml

 audit test configs to use solrconfig.snippet.randomindexconfig.xml in more 
 tests
 

 Key: SOLR-4952
 URL: https://issues.apache.org/jira/browse/SOLR-4952
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man

 in SOLR-4942 i updated every solrconfig.xml to either...
 * include solrconfig.snippet.randomindexconfig.xml where it was easy to do so
 * use the useCompoundFile sys prop if it already had an {{indexConfig}} 
 section, or if including the snippet wasn't going to be easy (ie: contrib 
 tests)
 As an improvment on this:
 * audit all core configs not already using 
 solrconfig.snippet.randomindexconfig.xml and either:
 ** make them use it, ignoring any previously unimportant explicit 
 incdexConfig settings
 ** make them use it, using explicit sys props to overwrite random values in 
 cases were explicit indexConfig values are important for test
 ** add a comment why it's not using the include snippet in cases where the 
 explicit parsing is part of hte test
 * try figure out a way for contrib tests to easily include the same file 
 and/or apply the same rules as above

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

[
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738664#comment-13738664
]

Robert Muir commented on LUCENE-5175:
-

Hi Tom:

I know for a fact i tried to remove the crazy cache (I created the monster)
that this thing creates, and it always hurts performance for example.

But I don't think we need to worry too much because:
# We should benchmark it the way you have it first and just see what we are
dealing with
# IF there is a problem, we could try to open it up to subclassing better,
maybe it even improves the API
# There is also the option of just having specialized SimScorers for the
delta=0 case.

So I am confident we will find a good solution.

As far as luceneutil we tried creating a README
(http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/README.txt)
to get started.

The basic idea is you pull down 2 different checkouts of lucene-trunk and setup
a competition between the two. There are two options important here: one is
to set the similarity for each competitor, the other can disable score
comparisons (I havent yet examined the patch to tell if they might differ
slightly, e.g. order of floating point ops and stuff).

But thats typically how i benchmark two Sim impls against each other.

Add parameter to lower-bound TF normalization for BM25 (for long documents)
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738672#comment-13738672
 ] 

Uwe Schindler commented on SOLR-4856:
-

Hi, could or be that the problem is because you check out from inside Eclipse 
using subclipse? Ant does not overwrite an already existing project file to no 
loose custom project settings. Make sure that after checkout all eclipse files 
are deleted, you can use ant clean-eclipse to do this.

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread Kranti Parisa (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738682#comment-13738682
 ] 

Kranti Parisa commented on SOLR-4856:
-

yes, it's fine now. thanks.

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Closed] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread Kranti Parisa (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kranti Parisa closed SOLR-4856.
---

Resolution: Not A Problem

it was an environmental issue

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests


[ 
https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738687#comment-13738687
 ] 

ASF subversion and git services commented on SOLR-4952:
---

Commit 1513616 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513616 ]

SOLR-4952: get all manged schema tests using 
solrconfig.snippet.randomindexconfig.xml - mainly by removing several 
solrconfig-*-managed-schema.xml files and using sys props in 
solrconfig-managed-schema.xml (merge r1513611)

 audit test configs to use solrconfig.snippet.randomindexconfig.xml in more 
 tests
 

 Key: SOLR-4952
 URL: https://issues.apache.org/jira/browse/SOLR-4952
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man

 in SOLR-4942 i updated every solrconfig.xml to either...
 * include solrconfig.snippet.randomindexconfig.xml where it was easy to do so
 * use the useCompoundFile sys prop if it already had an {{indexConfig}} 
 section, or if including the snippet wasn't going to be easy (ie: contrib 
 tests)
 As an improvment on this:
 * audit all core configs not already using 
 solrconfig.snippet.randomindexconfig.xml and either:
 ** make them use it, ignoring any previously unimportant explicit 
 incdexConfig settings
 ** make them use it, using explicit sys props to overwrite random values in 
 cases were explicit indexConfig values are important for test
 ** add a comment why it's not using the include snippet in cases where the 
 explicit parsing is part of hte test
 * try figure out a way for contrib tests to easily include the same file 
 and/or apply the same rules as above

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to ArithmeticException: / by zero


[ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738708#comment-13738708
 ] 

James Dyer commented on SOLR-5122:
--

The scenarios tested in testEstimatedHitCounts() seem to always pick a 
collector that does not accept docs out-of-order 
(TopFieldCollector$OneComparatorNonScoringCollector).  The problem looks like 
when a new segment/scorer is set, we get a new set of doc id's.  So prior to 
random merges, the test naively assummed everything was on 1 segment.  Now with 
multiple, all bets are off and I don't think we can be estimating hits.

I think the best fix is to dial back the functionality here and not offer hit 
estimates at all.  The functionality still would be beneficial in cases the 
user did not require hit-counts to be returned at all (for instance, ~rmuir 
mentioned using this feature with suggesters).  

Another option is to add together the doc ids for the various scorers that are 
looked at and pretend this is your max doc id.  I'm torn here because I'd hate 
to remove functionality that has been released but on the other hand if it is 
always going to give lousy estimates then why fool people?

Thoughts?

 spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
 to ArithmeticException: / by zero
 

 Key: SOLR-5122
 URL: https://issues.apache.org/jira/browse/SOLR-5122
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
Assignee: James Dyer
 Attachments: SOLR-5122.patch, SOLR-5122.patch


 As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
 and this (aparently) led to a failure in testEstimatedHitCounts.
 As far as i can tell: the test assumes that specific values would be returned 
 as the _estimated_ hits for a colleation, and it appears that the change in 
 MergePolicy however resulted in different segments with different term stats, 
 causing the estimation code to produce different values then what is expected.
 I made a quick attempt to improve the test to:
  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
 set such that the estimate' should actually be exact (ie: 
 collateMaxCollectDocs  == 0 or collateMaxCollectDocs greater then the num 
 docs in the index
  * randomize the values used for collateMaxCollectDocs and confirm that the 
 estimates are never more then the num docs in the index
 This lead to an odd ArithmeticException: / by zero error in the test, which 
 seems to suggest that there is a genuine bug in the code for estimating the 
 hits that only gets tickled in certain 
 mergepolicy/segment/collateMaxCollectDocs combinations.
 *Update:* This appears to be a general problem with collecting docs out of 
 order and the estimation of hits -- i believe even if there is no divide by 
 zero error, the estimates are largely meaningless since the docs are 
 collected out of order.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738720#comment-13738720
 ] 

Steve Rowe commented on SOLR-4856:
--

Kranti, can you describe the environmental issue?  Someone else encountering 
the problem might benefit.

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins


[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738729#comment-13738729
 ] 

Uwe Schindler commented on SOLR-3076:
-

Hi Yonik, hi Mikhail,

the committed version seems to be much better than the top-level cache one! 
Many, many thanks for committing that one! This was my only problem with it. 
But as you say, we should really work on getting Solr to no longer use 
top-level caches for filters and facets. Also Filters should not use OpenBitSet 
anymore, instead FixedBitSet or one of the new compressed bitsets (maybe 
off-heap). FixedBitSet is also better supported by internal APIs, as some 
algorithms can directly use it (e.g. in BooleanFilter), not sure if this is 
relevant for Solr.

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ

2013-08-13 Thread ASF subversion and git services (JIRA)

Erick Erickson created SOLR-5141:


 Summary: the VelocityResponseWriter can't find lucene.IOUtils from 
within IntelliJ
 Key: SOLR-5141
 URL: https://issues.apache.org/jira/browse/SOLR-5141
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0


IRC chat with Steve Rowe pointed me at how to fix this, will check in 
momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ


[ 
https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738740#comment-13738740
 ] 

ASF subversion and git services commented on SOLR-5141:
---

Commit 1513628 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1513628 ]

SOLR-5141. lucene.IOUtils needs to be available for VelocityResopnseWriter in 
IntelliJ

 the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
 -

 Key: SOLR-5141
 URL: https://issues.apache.org/jira/browse/SOLR-5141
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0


 IRC chat with Steve Rowe pointed me at how to fix this, will check in 
 momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread Kranti Parisa (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738752#comment-13738752
 ] 

Kranti Parisa commented on SOLR-4856:
-

Earlier, I did checkout using Subclipse. But now, I tried on command line svn 
checkout and then did ant eclipse and the .project file does have the 
javabuilder commands. Not sure what was wrong with my eclipse environment for 
the Subclipse based checkout. Anyways, I think command line checkout is more 
safer and cleaner.

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac

2013-08-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738758#comment-13738758
 ] 

Uwe Schindler commented on SOLR-4856:
-

As I said before.  Ant eclipse does not overwrite an already existing project 
file.  When you check out from inside Eclipse one is generated by Eclipse 
itself. 

Use Ant clean-eclipse to remove all eclipse specific files from checkout before 
regenerating.

 ant eclipse is not generating .project file correctly on mac
 

 Key: SOLR-4856
 URL: https://issues.apache.org/jira/browse/SOLR-4856
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4
 Environment: Mac OS X 10.8.2
 Eclipse Version: Juno Service Release 2
 Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

 STEPS:
 
 - Checkout from the branch_4x (Using Subclipse inside Eclipse Juno)
 - On the Terminal (command line) ran ant eclipse
 - Generated the eclipse .project, .classpath, .settings files
 - Refresh the project in Eclipse (I can see the files in the Navigator View) 
 along with the actual source code checked out from SVN
 - Open .project file and there is no buildSpec, natures elements in there
 - Hence not able to build it properly and use ctrl+clicks for the references
 I manually edited the .project file to have the following
 buildSpec
   buildCommand
   nameorg.eclipse.jdt.core.javabuilder/name
   arguments
   /arguments
   /buildCommand
   /buildSpec
   natures
   natureorg.eclipse.jdt.core.javanature/nature
   /natures
 Shouldn't it be automatically added to the .project file at the first place 
 when we run ant eclipse ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

lazily-loaded cores and SolrCloud

2013-08-13 Thread Erick Erickson

There was a question on the user's list today about making lazily-loaded
(aka transient) cores work with SolrCloud where I basically punted and said
not designed with that in mind. I've kind of avoided thinking about this
as the use-case; the transient code wasn't written with SolrCloud in mind.

But what is the general reaction to that pairing? Mostly I'm looking for
feedback at the level of no way that could work without invasive changes
to SolrCloud, don't even go there or sure, just allow ZK to get a list of
all cores and it'll be fine, the user is responsible for the quirks
though. Some questions that come to my mind:

 Is a core that's not loaded be considered live by ZK? Would simply
returning a list of all cores (both loaded and not loaded) be sufficient
for ZK? (this list is already available so the admin UI can list all cores).

 Does SolrCloud distributed update processing go through (or could be made
to go through) the path that autoloads a core?

 Ditto for querying. I suspect the answer to both is that it'll just
happen.

 Would the idea of waiting for all the cores to load on all the nodes for
an update be totally unacceptable? We already have the distributed deadlock
potential, this seems to make that more likely by lengthening out the time
the semaphore in question is held.

 Would re-synching/leader election be an absolute nightmare? I can imagine
that if all the cores for a particular shard weren't loaded at startup,
there'd be a terrible time waiting for leader election for instance.

 Stuff I haven't thought of

Mostly I'm trying to get a sense of the community here about whether
supporting transient cores in SolrCloud mode would be something that would
be easy/do-able/really_hard/totally_unacceptable.

Thanks,
Erick

[jira] [Created] (SOLR-5142) Block Indexing / Join Improvements

2013-08-13 Thread Yonik Seeley (JIRA)

Yonik Seeley created SOLR-5142:
--

 Summary: Block Indexing / Join Improvements
 Key: SOLR-5142
 URL: https://issues.apache.org/jira/browse/SOLR-5142
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Fix For: 4.5, 5.0


Follow-on main issue for general block indexing / join improvements

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-13 Thread Yonik Seeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-3076.


Resolution: Fixed

Closing... 
I opened SOLR-5142 for additional work.

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ


[ 
https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738789#comment-13738789
 ] 

ASF subversion and git services commented on SOLR-5141:
---

Commit 1513640 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513640 ]

=SOLR-5141 the VelocityResponseWriter can't find lucene.IOUtils from within 
IntelliJ

 the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
 -

 Key: SOLR-5141
 URL: https://issues.apache.org/jira/browse/SOLR-5141
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0


 IRC chat with Steve Rowe pointed me at how to fix this, will check in 
 momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ


 [ 
https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5141.
--

Resolution: Fixed

Thanks for the pointers Steve!

 the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
 -

 Key: SOLR-5141
 URL: https://issues.apache.org/jira/browse/SOLR-5141
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0


 IRC chat with Steve Rowe pointed me at how to fix this, will check in 
 momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-13 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738801#comment-13738801
 ] 

Mikhail Khludnev commented on SOLR-3076:


Yonik,
Thanks and Congratulations!

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Closed] (SOLR-4836) overwrite=true support for block updates

2013-08-13 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev closed SOLR-4836.
--

Resolution: Won't Fix

as far as I understand it's already covered by SOLR-3076

 overwrite=true support for block updates
 

 Key: SOLR-4836
 URL: https://issues.apache.org/jira/browse/SOLR-4836
 Project: Solr
  Issue Type: Sub-task
  Components: update
Reporter: Mikhail Khludnev
 Fix For: 4.5, 5.0


 functional extension for SOLR-3076. 
 I just want to propose an approach for the subj. We can treat uniqueKey as 
 a key for whole block not for single document, sadly it's not really backward 
 compatible. Otherwise we can introduce uniqueBlockKey tag in schema.xml. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116


 [ 
https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5173:


Attachment: LUCENE-5173.patch

Here's a cleaned up version... maybe its OK.

As far as the stuff i saw with the first patch on this issue, maybe it was due 
to running tests from eclipse (I beasted TestIndexWriter with it from 
curiousity, but nothing came out)... its old news anyway I guess.

 Add checkindex piece of LUCENE-5116
 ---

 Key: LUCENE-5173
 URL: https://issues.apache.org/jira/browse/LUCENE-5173
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5173.patch, LUCENE-5173.patch, 
 LUCENE-5173_ugly.patch


 LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in 
 the case you merge in empty or all-deleted stuff).
 I considered it just an inconsistency, but it could cause confusing 
 exceptions to real users too if there was a regression here. (see solr 
 users list:Split Shard Error - maxValue must be non-negative). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: lazily-loaded cores and SolrCloud

2013-08-13 Thread Yonik Seeley

At a high level, I think the idea is fine (and I've seen a number of
people that wanted it).
The question is more around one of implementation... would it make a
mess of things or not.
The answer to that I think is probably mostly related to issues around
how zookeeper is currently handled.
I don't see any issues with other things like spinning up a core when
a request comes in for it.

-Yonik
http://lucidworks.com

On Tue, Aug 13, 2013 at 4:26 PM, Erick Erickson erickerick...@gmail.com wrote:
 There was a question on the user's list today about making lazily-loaded
 (aka transient) cores work with SolrCloud where I basically punted and said
 not designed with that in mind. I've kind of avoided thinking about this
 as the use-case; the transient code wasn't written with SolrCloud in mind.

 But what is the general reaction to that pairing? Mostly I'm looking for
 feedback at the level of no way that could work without invasive changes to
 SolrCloud, don't even go there or sure, just allow ZK to get a list of all
 cores and it'll be fine, the user is responsible for the quirks though.
 Some questions that come to my mind:

 Is a core that's not loaded be considered live by ZK? Would simply
 returning a list of all cores (both loaded and not loaded) be sufficient for
 ZK? (this list is already available so the admin UI can list all cores).

 Does SolrCloud distributed update processing go through (or could be made
 to go through) the path that autoloads a core?

 Ditto for querying. I suspect the answer to both is that it'll just
 happen.

 Would the idea of waiting for all the cores to load on all the nodes for
 an update be totally unacceptable? We already have the distributed deadlock
 potential, this seems to make that more likely by lengthening out the time
 the semaphore in question is held.

 Would re-synching/leader election be an absolute nightmare? I can imagine
 that if all the cores for a particular shard weren't loaded at startup,
 there'd be a terrible time waiting for leader election for instance.

 Stuff I haven't thought of

 Mostly I'm trying to get a sense of the community here about whether
 supporting transient cores in SolrCloud mode would be something that would
 be easy/do-able/really_hard/totally_unacceptable.

 Thanks,
 Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-13 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738797#comment-13738797
 ] 

Yonik Seeley commented on SOLR-3076:


bq. Filters should not use OpenBitSet anymore, instead FixedBitSet

Hey, it isn't my fault that Lucene chose to fork OpenBitSet ;-)

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)


[ 
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738818#comment-13738818
 ] 

Tom Burton-West commented on LUCENE-5175:
-

I wondered about that crazy cache, in that it makes the implementation 
dependent on the norms implementation.  

BTW: It looks to me with Lucene's default norms that there are only about 130 
or so document lengths.  If there is no boosting going on the byte value has 
to get to 124 for a doclenth = 1, so there are only 255-124 =131 possible 
different lengths.

i=124 norm=1.0,doclen=1.0

 Add parameter to lower-bound TF normalization for BM25 (for long documents)
 ---

 Key: LUCENE-5175
 URL: https://issues.apache.org/jira/browse/LUCENE-5175
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Tom Burton-West
Priority: Minor
 Attachments: LUCENE-5175.patch


 In the article When Documents Are Very Long, BM25 Fails! a fix for the 
 problem is documented.  There was a TODO note in BM25Similarity to add this 
 fix. I will attach a patch that implements the fix shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper

[
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-4718:
-

Attachment: SOLR-4718.patch

I have to run right now, but is this what you two had in mind?

Not all of the new tests run, but I have to leave for the evening and wanted to
see if this is down the right path.

Haven't dealt with the bytestream yet.

Allow solr.xml to be stored in zookeeper

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper