[jira] [Commented] (SOLR-4679) HTML line breaks (br) are removed during indexing; causes wrong search results

2013-04-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626309#comment-13626309
 ] 

Christoph Straßer commented on SOLR-4679:
-

Thank you for checking Tika.

As far as I understand, http://wiki.apache.org/solr/ExtractingRequestHandler 
extracts XHTML, not text. Tika's XHTML output looks okay too. 

The root issue is - like you said - probably somewhere within Solr.

{noformat}
D:\temp\20130409>java -jar tika-app-1.3.jar --xml external.htm
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="193"/>
<meta name="Content-Encoding" content="windows-1252"/>
<meta name="Content-Type" content="text/html; charset=windows-1252"/>
<meta name="resourceName" content="external.htm"/>
<meta name="dc:title" content="Test mit HTML-Zeilenschaltungen"/>
<title>Test mit HTML-Zeilenschaltungen</title>
</head>
<body><p>
word1
word2

Some other words, a special name like linz
and another special name - vienna
</p>

</body></html>
{noformat}
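The suspected behaviour can be illustrated with a plain SAX handler (a minimal sketch using only the JDK, not Solr's or Tika's actual code; the class and method names here are made up for the example): if the text collector inserts whitespace whenever it sees a br element, "word1<br/>word2" extracts as two words instead of one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects character data from XHTML and maps each <br/> to a space,
// so adjacent words separated only by a line break stay separate.
public class BrAwareTextExtractor extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // <br> carries no character data; emit a space in its place
        if ("br".equalsIgnoreCase(localName) || "br".equalsIgnoreCase(qName)) {
            text.append(' ');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public static String extract(String xhtml) throws Exception {
        BrAwareTextExtractor handler = new BrAwareTextExtractor();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text.toString();
    }
}
```

With this handler, `extract("<p>word1<br/>word2</p>")` yields "word1 word2"; a handler that simply ignores the br element yields "word1word2", which matches the indexed content reported above.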

 HTML line breaks (br) are removed during indexing; causes wrong search 
 results
 

 Key: SOLR-4679
 URL: https://issues.apache.org/jira/browse/SOLR-4679
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.2
 Environment: Windows Server 2008 R2, Java 6, Tomcat 7
Reporter: Christoph Straßer
 Attachments: external.htm, Solr_HtmlLineBreak_Linz_NotFound.png, 
 Solr_HtmlLineBreak_Vienna.png


 HTML line breaks (<br>, <BR>, <br/>, ...) seem to be removed during 
 extraction of content from HTML files. They need to be replaced with an 
 empty space.
 Test file:
 <html>
 <head>
 <title>Test mit HTML-Zeilenschaltungen</title>
 </head>
 <p>
 word1<br>word2<br/>
 Some other words, a special name like linz<br>and another special name - 
 vienna
 </p>
 </html>
 The Solr content attribute contains the following text:
 Test mit HTML-Zeilenschaltungen
 word1word2
 Some other words, a special name like linzand another special name - vienna
 So we are not able to find the word "linz".
 We use the ExtractingRequestHandler to put content into Solr. 
 (wiki.apache.org/solr/ExtractingRequestHandler)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4137) FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder

2013-04-09 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626319#comment-13626319
 ] 

Markus Jelsma commented on SOLR-4137:
-

Simon, we'll try to reproduce the problem without LUCENE-4899 and will report 
whether we can and whether the patch works.

 FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder
 --

 Key: SOLR-4137
 URL: https://issues.apache.org/jira/browse/SOLR-4137
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 3.6.1
Reporter: Marcel

 Under some circumstances, BaseFragmentsBuilder generates a 
 StringIndexOutOfBoundsException inside the makeFragment method: 
 the starting offset is higher than the end offset.
 I did a small patch checking the offsets and posted it on 
 Stack Overflow: 
 http://stackoverflow.com/questions/12456448/solr-highlight-bug-with-usefastvectorhighlighter
 The code in 4.0 seems to be the same as in 3.6.1.
 Example of how to reproduce the behaviour:
 There is a word called www.DAKgesundAktivBonus.de inside the index. If you 
 search for "dak bonus", some offset calculations go wrong.
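The Stack Overflow workaround reportedly checks the offsets before cutting the fragment. A minimal sketch of such a guard (the class and method names here are hypothetical, not the actual BaseFragmentsBuilder code): clamp both offsets into range and never let the start exceed the end, so substring() cannot throw.

```java
// Toy illustration of offset clamping before String.substring(),
// the kind of guard the linked patch applies inside makeFragment().
public class FragmentBounds {
    public static String safeFragment(String source, int start, int end) {
        // clamp both offsets into [0, length], and force end >= start,
        // instead of letting an inverted pair throw
        start = Math.max(0, Math.min(start, source.length()));
        end = Math.max(start, Math.min(end, source.length()));
        return source.substring(start, end);
    }
}
```

An inverted pair like (4, 2) then yields an empty fragment rather than a StringIndexOutOfBoundsException; the real fix in Lucene would of course also need to address why the offsets come out inverted in the first place.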




[jira] [Commented] (SOLR-4137) FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder

2013-04-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626322#comment-13626322
 ] 

Simon Willnauer commented on SOLR-4137:
---

Thanks Markus, this would be awesome!

 FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder
 --

 Key: SOLR-4137
 URL: https://issues.apache.org/jira/browse/SOLR-4137
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 3.6.1
Reporter: Marcel

 Under some circumstances, BaseFragmentsBuilder generates a 
 StringIndexOutOfBoundsException inside the makeFragment method: 
 the starting offset is higher than the end offset.
 I did a small patch checking the offsets and posted it on 
 Stack Overflow: 
 http://stackoverflow.com/questions/12456448/solr-highlight-bug-with-usefastvectorhighlighter
 The code in 4.0 seems to be the same as in 3.6.1.
 Example of how to reproduce the behaviour:
 There is a word called www.DAKgesundAktivBonus.de inside the index. If you 
 search for "dak bonus", some offset calculations go wrong.




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_17) - Build # 5084 - Still Failing!

2013-04-09 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5084/
Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin 
{#4 seed=[7DB836996B6A4C58:49C7E98A6C01A087]}

Error Message:
Didn't match 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@192e709
 in Rect(minX=4.0,maxX=230.0,minY=-128.0,maxY=98.0) Expect: [0] (of 1)

Stack Trace:
java.lang.AssertionError: Didn't match 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@192e709
 in Rect(minX=4.0,maxX=230.0,minY=-128.0,maxY=98.0) Expect: [0] (of 1)
at 
__randomizedtesting.SeedInfo.seed([7DB836996B6A4C58:49C7E98A6C01A087]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:186)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:722)




Build Log:
[...truncated 7983 lines...]
[junit4:junit4] Suite: 

Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception

2013-04-09 Thread Lyuba Romanchuk
Hi all,

We run Solr (4.2 and 5.0) in a real-time environment with big data. Each
day two Solr cores are generated that can reach ~8-10 GB, depending on the
insertion rates and on the hardware.

Currently, all cores are loaded on Solr startup.

The query rate is not high, but responses must be quick and must be
returned even for old data and over a large time frame.

There are a lot of simple queries (facet/facet.pivot for small distributed
fields), but there are also heavy queries like facet.pivot for large-scale
distributed fields. We use distributed search to query the cores, and,
usually, a query covers 1-2 weeks (around 7-28 cores).

After some large queries (with facet.pivot for wide distributed fields) we
sometimes encounter a java.lang.OutOfMemoryError: Java heap space
exception.

The software is to be deployed to customer sites, so increasing memory is
not always possible, and customers may be willing to accept slower
responses for the larger queries if we can provide them.

We looked at the LotsOfCores functionality that was added in 4.1 and 4.2.
It enables defining an upper limit on loaded cores and unloading them on
an LRU basis when the cache gets full. However, in our case a more general
use case seems to be needed:

* Only cores that are used for updates/inserts must be loaded at all times.
Other cores, which are queried only, should be loaded / unloaded on demand
while the query runs, until completion – according to memory demands.

* Each facet, facet.pivot must be estimated for memory consumption. In case
there is not enough memory to run the query for all cores concurrently it
must be separated into sequential queries, unloading already queried or
irrelevant cores (but not permanent cores) and loading older cores to
complete the query.

* Occasionally, the oldest cores should be unloaded according to a
configurable policy (for example, one type of high volume cores will be
kept loaded for 1 week, while smaller cores can remain loaded for a month).
The policy will allow for data we know is queried less but is higher volume
to be kept live over shorter time periods.

We are considering adding the following functionality to Solr (optional,
turned on by new configs):

The flow of the SolrCore.execute() function will be changed:

   - Change the status of the core to "USED"
   - Call a waitForResource(SolrRequestHandler, SolrQueryRequest) function:
      - Estimate the required memory for this query/handler on this core
      - If there are not enough free resources to run the query:
         - If all cores are permanent and can't be unloaded, throw an
           OutOfMemoryError exception (here the status of the core should
           be changed to "UNUSED")
         - Else, try to unload unused, non-permanent cores:
            - If unloading unused cores didn't release enough resources
              and no core can be unloaded, throw an OutOfMemoryError
              exception (here the status of the core should be changed to
              "UNUSED")
            - If unloading unused cores didn't release enough resources
              and there are cores that can be unloaded, wait with a
              timeout until some resource is released, then check again
              until the required resource is available or the exception
              is thrown
      - Reserve the resource
   - Call the current SolrCore.execute()
   - Change the status of the core to "UNUSED"
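The reserve/wait/throw flow above could be sketched as a simple memory gate. All names here (ResourceGate, reserve, release) are hypothetical illustrations, not proposed Solr APIs, and "unloading cores" is stubbed as a reclaimable byte count:

```java
// Sketch of the proposed waitForResource() logic: reserve memory for a
// query, reclaiming it from unloadable cores or waiting for concurrent
// queries to finish, and fail only when neither is possible.
public class ResourceGate {
    private long freeBytes;
    private final long waitMillis;

    public ResourceGate(long freeBytes, long waitMillis) {
        this.freeBytes = freeBytes;
        this.waitMillis = waitMillis;
    }

    // 'reclaimableBytes' stands in for unused, non-permanent cores that
    // could be unloaded; the real implementation would unload them here.
    public synchronized boolean reserve(long required, long reclaimableBytes) {
        long deadline = System.currentTimeMillis() + waitMillis;
        while (freeBytes < required) {
            if (reclaimableBytes > 0) {
                freeBytes += reclaimableBytes;   // "unload" idle cores
                reclaimableBytes = 0;
                continue;
            }
            if (System.currentTimeMillis() >= deadline) {
                // the proposal throws OutOfMemoryError at this point
                throw new IllegalStateException("not enough memory for query");
            }
            try {
                wait(Math.max(1, deadline - System.currentTimeMillis()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        freeBytes -= required;
        return true;
    }

    public synchronized void release(long bytes) {
        freeBytes += bytes;
        notifyAll();   // wake queries waiting for resources
    }
}
```

The missing hard part, as the proposal notes, is estimating the required memory per facet/facet.pivot request; this sketch only covers the bookkeeping around that estimate.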

We would like to get some initial feedback on the design / functionality
we're proposing, as we feel it really benefits real-time, high-volume
indexing systems such as ours. We are also happy to contribute the code
back if you feel there is a need for this functionality.

Best regards,

Lyuba


[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-04-09 Thread Stein J. Gran (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626417#comment-13626417
 ] 

Stein J. Gran commented on SOLR-2894:
-

Andrew, which version does the latest patch apply to? I've tried applying it to 
trunk, branch_4x and 4.2.1 without any luck so far. I'm planning on testing 
this patch in a SolrCloud environment with lots of pivot facet queries.

For trunk I get this:
patching file `solr/core/src/java/org/apache/solr/request/SimpleFacets.java'
Hunk #1 succeeded at 323 with fuzz 2 (offset 51 lines).
Hunk #2 FAILED at 374.
1 out of 2 hunks FAILED -- saving rejects to solr/core/src/java/org/apache/solr/
request/SimpleFacets.java.rej

The .rej file looks similar for trunk and the 4.2.1 tag.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.3

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Updated] (SOLR-4670) Core mismatch in concurrent documents creation

2013-04-09 Thread Alberto Ferrini (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Ferrini updated SOLR-4670:
--

Affects Version/s: 4.2.1

 Core mismatch in concurrent documents creation
 --

 Key: SOLR-4670
 URL: https://issues.apache.org/jira/browse/SOLR-4670
 Project: Solr
  Issue Type: Bug
  Components: multicore, SolrCloud
Affects Versions: 4.0, 4.1, 4.2, 4.2.1
 Environment: CPU: 32x AMD Opteron(TM) Processor 6276
 RAM: 132073620 kB
 OS: Red Hat Enterprise Linux Server release 5.7 (Tikanga)
 JDK 1.6.0_21
 JBoss [EAP] 4.3.0.GA_CP09
 Apache Solr 4.x
 Apache ZooKeeper 3.4.5
Reporter: Alberto Ferrini
  Labels: concurrency, multicore, solrcloud, zookeeper

 The issue can be reproduced in this way:
 - Install SolrCloud with at least 2 nodes
 - Install ZooKeeper with at least 2 nodes
 - Create 30 cores
 - After each core creation, create 20 randomly generated documents in a random 
 existing core with 2 concurrent threads on all Solr nodes (for example, 
 document 1 in core 3 on node 1, document 2 in core 5 on node 1, document 3 in 
 core 3 on node 2, etc...).
 - After all cores are created, query each core for all documents and compare 
 the inserted data with the query result
 Some documents end up in a different core than the one they were created in.




[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626429#comment-13626429
 ] 

Adrien Grand commented on LUCENE-4858:
--

Thanks for updating the patch, Shai.

bq. Adrien, do we have anything else to do here, or are we ready to go? If so, 
I'll add a CHANGES entry and commit later.

The patch looks good to me. Maybe NumericDocValuesSorter.getID() could just 
return 'fieldName'? I think it's not necessary to describe the doc values type, 
since the types are exclusive and doc values are the natural way to sort 
documents by field values in Lucene. Otherwise +1.

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Created] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Sirvan Yahyaei (JIRA)
Sirvan Yahyaei created LUCENE-4918:
--

 Summary: Highlighter closes the given IndexReader
 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Priority: Minor
 Fix For: 4.3


If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
line 519 of WeightedSpanTermExtractor should be changed from 
IOUtils.close(reader) to IOUtils.close(this.reader).
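The bug distills to ordinary Java name shadowing (a toy example, not the actual WeightedSpanTermExtractor code; the class and method names are made up): inside a method whose parameter is named 'reader', the bare name resolves to the parameter, so closing 'reader' closes the caller's object, and 'this.reader' is needed to close the member.

```java
// Minimal demonstration of a parameter shadowing a field of the same name.
public class ShadowClose {
    public static class Resource {
        boolean closed;
        void close() { closed = true; }
    }

    private final Resource reader = new Resource();

    // 'reader' is the parameter, so this closes the caller's resource
    public void buggy(Resource reader) { reader.close(); }

    // 'this.reader' unambiguously closes the member
    public void fixed(Resource reader) { this.reader.close(); }

    public Resource member() { return reader; }
}
```

The one-token fix proposed in the issue (reader to this.reader) corresponds exactly to the difference between buggy() and fixed() here.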




[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626435#comment-13626435
 ] 

Shai Erera commented on LUCENE-4858:


bq. Maybe NumericDocValuesSorter.getID() could just return 'fieldName'?

The reason I did that is in case someone wants to sort by a stored field 
and a numeric field which have the same name. I know the chance is probably very 
low, but numericdv_field is really unique, as you cannot have two numeric DV 
fields with the same name but different meaning.

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Sirvan Yahyaei (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirvan Yahyaei updated LUCENE-4918:
---

Attachment: LuceneHighlighter.java

I have attached a simple class to show the issue.

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Priority: Minor
 Fix For: 4.3

 Attachments: LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-04-09 Thread Sviatoslav Lisenkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626441#comment-13626441
 ] 

Sviatoslav Lisenkin commented on SOLR-2894:
---

Hello, everyone. 
I applied the latest patch two weeks ago (rev. 1465879), faced issues with 
merging in the SimpleFacets class near the 'incomingMinCount' variable, and 
fixed them manually (just renaming). Simple pivot faceting via the web UI and a 
sample Solr installation with two nodes worked fine. I would really appreciate 
it if someone had a chance to test it under load etc.
I hope this patch (and feature) will be included in the upcoming release.


 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.3

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[jira] [Assigned] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4918:
---

Assignee: Simon Willnauer

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.3

 Attachments: LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Commented] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626442#comment-13626442
 ] 

Simon Willnauer commented on LUCENE-4918:
-

Argh, I guess that was my fault. I will add a test and dig.

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.3

 Attachments: LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Created] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)
Renaud Delbru created LUCENE-4919:
-

 Summary: IntsRef, BytesRef and CharsRef returns incorrect hashcode 
when filled with 0
 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3


The IntsRef, BytesRef and CharsRef implementations do not follow the java 
Arrays.hashCode implementation, and return an incorrect hashcode when filled with 
0. 
For example, an IntsRef with { 0 } will return the same hashcode as an 
IntsRef with { 0, 0 }.
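The collision is easy to reproduce with two rolling hashes that differ only in their seed (a minimal sketch; zeroSeedHash mirrors the reported behaviour on the assumption that the current hash starts from 0, while oneSeedHash follows java.util.Arrays.hashCode, which starts from 1):

```java
// With h starting at 0, appending a zero never changes the hash
// (31 * 0 + 0 == 0), so {0} and {0,0} collide. Starting at 1, as
// java.util.Arrays.hashCode does, each appended element (even a zero)
// multiplies the accumulated value, folding the length into the hash.
public class RefHash {
    // seed 0: collides on arrays of zeros
    public static int zeroSeedHash(int[] a, int offset, int length) {
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    // seed 1: Arrays.hashCode-style, the approach the attached patch takes
    public static int oneSeedHash(int[] a, int offset, int length) {
        int h = 1;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

For example, oneSeedHash({0}) is 31 while oneSeedHash({0, 0}) is 961, whereas zeroSeedHash returns 0 for both.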




[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru updated LUCENE-4919:
--

Description: 
IntsRef, BytesRef and CharsRef implementation does not follow the java 
Arrays.hashCode implementation, and returns incorrect hashcode when filled with 
0. 
For example, an IntsRef with \{ 0 \} will return the same hashcode than an 
IntsRef with \{ 0, 0 \}.

  was:
IntsRef, BytesRef and CharsRef implementation does not follow the java 
Arrays.hashCode implementation, and returns incorrect hashcode when filled with 
0. 
For example, an IntsRef with { 0 } will return the same hashcode than an 
IntsRef with { 0, 0 }.


 IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
 

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3


 The IntsRef, BytesRef and CharsRef implementations do not follow the java 
 Arrays.hashCode implementation, and return an incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with \{ 0 \} will return the same hashcode as an 
 IntsRef with \{ 0, 0 \}.




[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru updated LUCENE-4919:
--

Attachment: LUCENE-4919.patch

Here is a patch for IntsRef, BytesRef and CharsRef including unit tests. The 
new hashcode implementation is identical to the one found in Arrays.hashCode.
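As a rough illustration of what an Arrays.hashCode-style implementation over an offset/length slice might look like (a sketch only, with an invented class and method name, not the actual LUCENE-4919 patch code):

```java
import java.util.Arrays;

public class SliceHash {
    // Arrays.hashCode-style hash over a slice: seed 1, multiplier 31.
    // Illustrative sketch, not the actual LUCENE-4919 patch code.
    public static int sliceHashCode(int[] ints, int offset, int length) {
        int result = 1; // seeding with 1 makes the length affect the result
        for (int i = offset; i < offset + length; i++) {
            result = 31 * result + ints[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // The { 0 } vs { 0, 0 } collision from the issue description goes away.
        System.out.println(sliceHashCode(new int[]{0}, 0, 1));    // 31
        System.out.println(sliceHashCode(new int[]{0, 0}, 0, 2)); // 961
        // For a full-array slice it matches java.util.Arrays.hashCode.
        System.out.println(sliceHashCode(new int[]{7, 9}, 0, 2)
                == Arrays.hashCode(new int[]{7, 9}));             // true
    }
}
```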

 IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
 

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation does not follow the java 
 Arrays.hashCode implementation, and returns incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru updated LUCENE-4919:
--

Description: 
IntsRef, BytesRef and CharsRef implementation do not follow the java 
Arrays.hashCode implementation, and return incorrect hashcode when filled with 
0. 
For example, an IntsRef with { 0 } will return the same hashcode as an 
IntsRef with { 0, 0 }.

  was:
IntsRef, BytesRef and CharsRef implementation does not follow the java 
Arrays.hashCode implementation, and returns incorrect hashcode when filled with 
0. 
For example, an IntsRef with { 0 } will return the same hashcode as an 
IntsRef with { 0, 0 }.

Summary: IntsRef, BytesRef and CharsRef return incorrect hashcode when 
filled with 0  (was: IntsRef, BytesRef and CharsRef returns incorrect hashcode 
when filled with 0)

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626450#comment-13626450
 ] 

Robert Muir commented on LUCENE-4919:
-

The hashcode here is not arbitrary, as mentioned in the javadocs:

{noformat}
  /** Calculates the hash code as required by TermsHash during indexing.
   * <p>It is defined as:
   * <pre class="prettyprint">
   *  int hash = 0;
   *  for (int i = offset; i &lt; offset + length; i++) {
   *    hash = 31*hash + bytes[i];
   *  }
   * </pre>
   */
{noformat}

There is code in BytesRefHash that relies upon this.
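The practical difference between the documented zero-seed scheme and java.util.Arrays.hashCode can be checked with a small self-contained sketch (illustrative code, not Lucene's actual classes):

```java
import java.util.Arrays;

public class HashSeedDemo {
    // The hash documented in the javadoc above: seed 0, hash = 31*hash + v.
    static int zeroSeedHash(int[] a) {
        int hash = 0;
        for (int v : a) {
            hash = 31 * hash + v;
        }
        return hash;
    }

    public static void main(String[] args) {
        // With seed 0, every all-zero array hashes to 0, whatever its length.
        System.out.println(zeroSeedHash(new int[]{0}));       // 0
        System.out.println(zeroSeedHash(new int[]{0, 0}));    // 0
        // Arrays.hashCode seeds with 1, so the length changes the result.
        System.out.println(Arrays.hashCode(new int[]{0}));    // 31
        System.out.println(Arrays.hashCode(new int[]{0, 0})); // 961
    }
}
```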

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626453#comment-13626453
 ] 

Robert Muir commented on LUCENE-4919:
-

This patch also doesn't fix code in UnicodeUtil that relies upon this.

I think I'm against the change: the whole issue seems wrong to me, as the 
hashcode already does what it documents it should do, and a lot of things rely 
upon the current function.

I don't understand why the javadocs for BytesRef.hashCode make it seem like it 
should be doing something else.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626454#comment-13626454
 ] 

Renaud Delbru commented on LUCENE-4919:
---

Hi Robert,

From my understanding, this applies only to BytesRef (even if this behavior 
sounds dangerous to me). However, why are IntsRef and CharsRef following the 
same behavior?

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626458#comment-13626458
 ] 

Renaud Delbru commented on LUCENE-4919:
---

I see that BytesRef is used a bit everywhere in various contexts, contexts 
which are different from the TermsHash context. This hashcode behavior might 
cause unexpected problems, as I am sure most of the users of BytesRef are 
unaware of this particular behavior.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626461#comment-13626461
 ] 

Robert Muir commented on LUCENE-4919:
-

The current hashcode seems to correspond with String.hashCode.

I'm not against this change on some theoretical basis, only mentioning that to 
me there is no bug (it does exactly what it says it should do), and that 
changing it without being thorough will only create bugs since things rely upon 
this.

Any patch to change the hashcode needs to update all these additional things, 
such as methods in UnicodeUtil, BytesRefHash collision probing, javadocs in 
TermToBytesRefAttribute, and anything else that relies upon this: otherwise it 
only causes more harm than good.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626462#comment-13626462
 ] 

Simon Willnauer commented on LUCENE-4919:
-

I am not getting why this should return the same value as Arrays.hashCode; can 
you elaborate on this a bit?

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4918:


Attachment: LUCENE-4918.patch

here is a patch

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4918.patch, LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626471#comment-13626471
 ] 

Renaud Delbru commented on LUCENE-4919:
---

Ok, I understand Robert. That sounds like a big task. I can try to make a first 
pass over it in the next few days if you think it is worth it (personally I 
would feel more reassured knowing that the hashcode follows a more common 
behavior).

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4918:


Lucene Fields: New,Patch Available  (was: New)
Affects Version/s: 4.2.1
Fix Version/s: 5.0

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2, 4.2.1
Reporter: Sirvan Yahyaei
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4918.patch, LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626475#comment-13626475
 ] 

Robert Muir commented on LUCENE-4919:
-

I have no opinion: I'm not a hashing guy. I'm just mentioning the change is 
pretty serious.

Additionally I'm unhappy the hashcode is part of the API: so I don't think it 
should be changed in a minor release (e.g. things like TermToBytesRefAttribute 
expose this as an API requirement). But I think trunk is fine.

On the other hand I know the current situation has some bad worst-case behavior 
that users might actually hit (e.g. indexing increasing numerics), but I don't 
see how this patch addresses that. It seems to me that if we want to go through 
all the trouble to improve the hashing (which would be a good thing), we should 
solve that too, maybe involving a totally different hashing scheme like what 
they did with Java (I don't know).


 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626478#comment-13626478
 ] 

Dawid Weiss commented on LUCENE-4919:
-

This isn't a bug, it's a definition like any other. In general any definition 
of hash(X), even hash(X) = 42, is valid (obviously with poor 
space-distributing properties...). The question of which particular hash 
function to pick and what inputs it should consume (number of elements, values 
of elements) is kind of difficult -- when you include more elements in the 
computation the distribution for certain inputs may be better, but you'll 
probably lose some performance on the average case.


 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626477#comment-13626477
 ] 

Renaud Delbru commented on LUCENE-4919:
---

@Simon: I discovered the issue when using IntsRef: during query processing, I 
am streaming arrays of integers using IntsRef. I was relying on the hashCode to 
compute a unique identifier for the content of a particular IntsRef until I 
started to see unexpected results in my unit tests. Then I saw that the same 
behaviour is found in the other *Ref classes. 
I could live without it and bypass the problem by changing my implementation 
(and computing my own hash code). But I thought this behaviour is not very 
clear to the user and could potentially be dangerous, and therefore worth 
sharing with you.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Resolved] (LUCENE-4918) Highlighter closes the given IndexReader

2013-04-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-4918.
-

Resolution: Fixed

committed thanks!

 Highlighter closes the given IndexReader
 

 Key: LUCENE-4918
 URL: https://issues.apache.org/jira/browse/LUCENE-4918
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.2, 4.2.1
Reporter: Sirvan Yahyaei
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4918.patch, LuceneHighlighter.java


 If IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, 
 WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter 
 reader (IndexReader) instead of closing the member variable 'reader'. To fix, 
 line 519 of WeightedSpanTermExtractor should be changed from 
 IOUtils.close(reader) to IOUtils.close(this.reader).




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626480#comment-13626480
 ] 

Renaud Delbru commented on LUCENE-4919:
---

Maybe a simpler solution would be to clearly state this behavior in the 
javadoc of all these methods.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626481#comment-13626481
 ] 

Dawid Weiss commented on LUCENE-4919:
-

 I was relying on the hashCode to compute a unique identifier for the content 
 of a particular IntsRef

This is generally an invalid assumption for *any* hashing function with a 
limited target space, unless you have something that implements minimal 
perfect hashing -- but that is typically data-specific (and even precomputed 
in advance).


 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626483#comment-13626483
 ] 

Dawid Weiss commented on LUCENE-4919:
-

Btw. Arrays.hashCode is also not a unique identifier for the contents of an 
array so if you're using it this way your code... well, it has a problem. :)
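For the record, Arrays.hashCode itself collides on distinct inputs; for example the int arrays { 0, 31 } and { 1, 0 } hash identically (a quick check added here for illustration, not from the thread):

```java
import java.util.Arrays;

public class ArraysHashCollision {
    public static void main(String[] args) {
        // Arrays.hashCode({a, b}) = 31 * (31 * 1 + a) + b, so
        // {0, 31} -> 31*31 + 31 = 992 and {1, 0} -> 31*32 + 0 = 992 collide.
        int h1 = Arrays.hashCode(new int[]{0, 31});
        int h2 = Arrays.hashCode(new int[]{1, 0});
        System.out.println(h1 + " " + h2 + " " + (h1 == h2)); // 992 992 true
    }
}
```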

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626486#comment-13626486
 ] 

Renaud Delbru commented on LUCENE-4919:
---

I agree with you Dawid, but this particular behaviour increases the chance of 
getting the same hash for a certain type of input. Anyway, I think the general 
decision is to not change their hashCode behaviour ;o), and I am fine with it. 
Feel free to close the issue.
Thanks, and sorry for the distraction.

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 IntsRef, BytesRef and CharsRef implementation do not follow the java 
 Arrays.hashCode implementation, and return incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with { 0 } will return the same hashcode as an 
 IntsRef with { 0, 0 }.




[jira] [Updated] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4581:


Attachment: SOLR-4581.patch

The bug is reproducible even after reverting SOLR-2850.

Adding facet.method=fc gives the correct response, but omitting facet.method or 
using facet.method=enum gives the wrong sort order.

I'm not familiar enough with the faceting code to fix this. Perhaps someone 
else can take a look.

 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-420.126">1</int>
   <int name="-285.672">1</int>
   <int name="-1.218">1</int>
 </lst>{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-1.218">1</int>
   <int name="-285.672">1</int>
   <int name="-420.126">1</int>
 </lst>{noformat}
 I would have expected that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as was the case in Solr 4.1.
 Is this related to SOLR-2850? 




Fwd: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception

2013-04-09 Thread Lyuba Romanchuk
It seems like bullets don't look nice, so I'm sending the explanation without
bullets.

The flow of the SolrCore.execute() function will be changed:

Change the status of the core to “USED”, call the
waitForResource(SolrRequestHandler, SolrQueryRequest) function, then
perform the current SolrCore.execute() flow and change the status of the
core back to “UNUSED”.

In the waitForResource(SolrRequestHandler, SolrQueryRequest) function,
first estimate the memory required for this query/handler on this core.
If there are not enough free resources to run the query, and there are
still not enough after unloading all unused, non-permanent cores, throw
an OutOfMemoryError and change the status of the core to “UNUSED”;
otherwise wait with a timeout until some resource is released, then check
again until the required resource is available or the exception is
thrown.

Best regards,

Lyuba

-- Forwarded message --
From: Lyuba Romanchuk lyuba.romanc...@gmail.com
Date: Tue, Apr 9, 2013 at 11:47 AM
Subject: Adding new functionality to avoid java.lang.OutOfMemoryError:
Java heap space exception
To: dev@lucene.apache.org


Hi all,

We run solr (4.2 and 5.0) in a real time environment with big data. Each
day two Solr cores are generated that can reach ~8-10g, depending on the
insertion rates and on different hardware.

Currently, all cores are loaded on solr startup.

The query rate is not high but the response must be quick and must be
returned even for old data and over a large time frame.

There are a lot of simple queries (facet/facet.pivot for small distributed
fields), but there are also heavy queries like facet.pivot for large-scale
distributed fields. We use distributed search to query the cores and
usually query over 1-2 weeks (around 7-28 cores).

After some large queries (with facet.pivot for wide distributed fields) we
sometimes encounter a java.lang.OutOfMemoryError: Java heap space
exception.

The software is to be deployed to customer sites so increasing memory would
not always be possible, and the customers may want to get slower responses
for the larger queries, if we can provide them.

We looked at the LotsOfCores functionality that was added in 4.1 and 4.2.
It allows defining an upper limit on loaded cores and unloads them on an
LRU basis when the cache fills up. However, our case seems to need a more
general mechanism:

* Only cores that are used for updates/inserts must be loaded at all times.
Other cores, which are queried only, should be loaded / unloaded on demand
while the query runs, until completion – according to memory demands.

* Each facet / facet.pivot request must have its memory consumption
estimated. If there is not enough memory to run the query on all cores
concurrently, it must be split into sequential queries, unloading
already-queried or irrelevant cores (but not permanent cores) and loading
older cores to complete the query.

* Occasionally, the oldest cores should be unloaded according to a
configurable policy (for example, one type of high-volume core is kept
loaded for a week, while smaller cores can remain loaded for a month).
The policy allows data that we know is queried less but is higher volume
to be kept live for shorter periods.

We are considering adding the following functionality to Solr (optional –
turned on by new configs):

The flow of SolrCore.execute() function will be changed:


   - Change status of the core to “USED”
   - Call waitForResource(SolrRequestHandler, SolrQueryRequest):
      - estimate the required memory for this query/handler on this core
      - if there are not enough free resources to run the query then
         - if all cores are permanent and can’t be unloaded then
            - throw an OutOfMemoryError exception // here the status of
              the core should be changed to “UNUSED”
         - else
            - try to unload unused, non-permanent cores
            - if unloading unused cores didn’t release enough resources
              and no core can be unloaded then
               - throw an OutOfMemoryError exception // here the status
                 of the core should be changed to “UNUSED”
            - if unloading unused cores didn’t release enough resources
              and there are cores that can be unloaded then
               - wait with a timeout until some resource is released
               - check again until the required resource is available or
                 the exception is thrown
      - reserve the resource
   - Call the current SolrCore.execute()
   - Change status of the core to “UNUSED”
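The reserve/wait/release cycle above can be sketched in Java. Everything here (the MemoryGate class, the byte budget, the exception type) is a hypothetical illustration of the proposal, not a Solr API:

```java
// Hypothetical sketch of the proposed resource gating. MemoryGate and the
// byte estimates are illustrative names, not part of Solr.
public class MemoryGate {
    private long freeBytes; // free heap budget available to queries

    public MemoryGate(long freeBytes) {
        this.freeBytes = freeBytes;
    }

    /** Try to reserve the estimated bytes; false if the budget is exhausted. */
    public synchronized boolean tryReserve(long bytes) {
        if (bytes <= freeBytes) {
            freeBytes -= bytes;
            return true;
        }
        return false;
    }

    /**
     * Wait (bounded) until the estimate fits, then reserve it. The timeout
     * path corresponds to "throw an OutOfMemoryError" in the proposal, once
     * unloading cores can free nothing more.
     */
    public synchronized void waitForResource(long bytes, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!tryReserve(bytes)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new RuntimeException("not enough memory for query");
            }
            wait(remaining); // released by release() via notifyAll()
        }
    }

    /** Called when a query completes or an unused core is unloaded. */
    public synchronized void release(long bytes) {
        freeBytes += bytes;
        notifyAll();
    }
}
```

The core-status changes ("USED"/"UNUSED") would wrap calls to waitForResource/release in SolrCore.execute().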

We would like to get some initial feedback on the design / functionality
we’re proposing as we feel this really benefits real-time, high volume
indexing systems such as ours. We are also happy to contribute the code
back if you feel there is a need for this functionality.

Best regards,

Lyuba


[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager

2013-04-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626519#comment-13626519
 ] 

Michael McCandless commented on LUCENE-3786:


OK I discussed these tricky issues with Shai ... with the non-NRT case
(app commits and then calls .maybeRefresh) there are some big
challenges.

First off, the app must always commit IW first then TW.  But second
off, even if it does that, there is at least this multi-threaded case
where .maybeRefresh can screw up:

  * Thread 1 (indexer) commits IW1
  * Thread 1 (indexer) commits TW1
  * Thread 2 (indexer) commits IW2
  * Thread 3 (searcher) maybeRefresh opens IW2
  * Thread 3 (searcher) maybeRefresh opens TW1
  * Thread 1 (indexer) commits TW2

That will then lead to confusing AIOOBEs during facet counting...

Net/net I think there's too much hair around supporting the non-NRT
case, and I think for starters we should just support NRT, ie you must
pass IW and TW to STM's ctor.  Then STM is agnostic to what commits
are being done ... commit is only for durability purposes.

We must still document that you cannot do IW.deleteAll /
TW.replaceTaxonomy (I'll add it).

bq. Why does the test use newFSDirectory?

Just because it's using the LineFileDocs, which have biggish docs in
them.  Add in -Dtests.nightly, -Dtests.multiplier=3, and it could
maybe be we are pushing the 512 MB RAM limit...

bq. Manager.decRef()-- I think you should searcher.reader.incRef() if 
taxoReader.decRef() failed?

Hmm this isn't so simple: that decRef could have closed the reader.  I
suppose I could do a best effort tryIncRef so that if the app
somehow catches the exception and retries the decRef we don't
prematurely close the reader ...
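The "best effort tryIncRef" idea can be shown with a self-contained sketch; RefCounted here is a stand-in for Lucene's ref-counted readers (only the tryIncRef concept comes from Lucene), so treat this as an illustration, not the actual patch:

```java
// Self-contained sketch: roll back the first decRef if the second one fails,
// so a retried release won't decrement the reader twice. RefCounted is a
// stand-in class, not a Lucene type.
import java.util.concurrent.atomic.AtomicInteger;

class RefCounted {
    final AtomicInteger refs = new AtomicInteger(1);
    boolean failNextDecRef = false; // test hook simulating an I/O error

    void decRef() {
        if (failNextDecRef) {
            failNextDecRef = false;
            throw new RuntimeException("simulated I/O error");
        }
        refs.decrementAndGet();
    }

    /** Best-effort increment: fails if the ref count already hit zero. */
    boolean tryIncRef() {
        int r;
        while ((r = refs.get()) > 0) {
            if (refs.compareAndSet(r, r + 1)) return true;
        }
        return false; // already closed; cannot revive it
    }
}

class PairReleaser {
    /** Release both refs; on taxo failure, best-effort re-incRef the reader. */
    static void release(RefCounted reader, RefCounted taxoReader) {
        reader.decRef();
        try {
            taxoReader.decRef();
        } catch (RuntimeException e) {
            reader.tryIncRef(); // may fail if decRef already closed the reader
            throw e;
        }
    }
}
```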

bq. It's odd that acquire() throws IOE ... I realize it's because the decRef 
call in tryIncRef. I don't know if it's critical, but if it is, you may want to 
throw RuntimeEx?

I think it's OK to add IOE to the signature?


 Create SearcherTaxoManager
 --

 Key: LUCENE-3786
 URL: https://issues.apache.org/jira/browse/LUCENE-3786
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-3786-3x-nocommit.patch, LUCENE-3786.patch


 If an application wants to use an IndexSearcher and TaxonomyReader in a 
 SearcherManager-like fashion, it cannot use a separate SearcherManager, and 
 say a TaxonomyReaderManager, because the IndexSearcher and TaxoReader 
 instances need to be in sync. That is, the IS-TR pair must match, or 
 otherwise the category ordinals that are encoded in the search index might 
 not match the ones in the taxonomy index.
 This can happen if someone reopens the IndexSearcher's IndexReader, but does 
 not refresh the TaxonomyReader, and the category ordinals that exist in the 
 reopened IndexReader are not yet visible to the TaxonomyReader instance.
 I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which 
 manages an IndexSearcher and TaxonomyReader pair. Then an application will 
 call:
 {code}
 SearcherTaxoPair pair = manager.acquire();
 try {
   IndexSearcher searcher = pair.searcher;
   TaxonomyReader taxoReader = pair.taxoReader;
   // do something with them
 } finally {
   manager.release(pair);
   pair = null;
 }
 {code}




[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626529#comment-13626529
 ] 

Adrien Grand commented on LUCENE-4858:
--

bq. The reason I did that is in case someone will want to sort by a stored 
field and numeric field which have same names.

A Sorter which sorts by stored field values would indeed need to add more 
information to its ID (at least to say that it is a stored field).

bq. numericdv_field is really unique, as you cannot have two numeric DV 
fields with the same name, but different meaning.

Since doc values types are exclusive, could we then just say that these are doc 
values without mentioning the type? I think this would help keep up with the 
evolution of doc values types (for example, there used to be BYTES_FIXED_SORTED 
and BYTES_VAR_SORTED, which have since been merged into SORTED) and/or additions 
(SORTED_SET). I would also prefer something even more human-readable 
(like DocValues(fieldName=$fieldName,order=asc|desc)?).



 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Assigned] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-4581:
--

Assignee: Yonik Seeley

 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
Assignee: Yonik Seeley
 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-420.126">1</int>
   <int name="-285.672">1</int>
   <int name="-1.218">1</int>
 </lst>{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-1.218">1</int>
   <int name="-285.672">1</int>
   <int name="-420.126">1</int>
 </lst>{noformat}
 I would have expected that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as it is in Solr 4.1.
 Is this related to SOLR-2850? 




[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager

2013-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626543#comment-13626543
 ] 

Shai Erera commented on LUCENE-3786:


bq. I think it's OK to add IOE to the signature?

Ok.

bq. that decRef could have closed the reader

Hmm ... if we assume that this TR/IR pair is managed only by that manager, then 
an IOE thrown from decRef could only be caused by closing the reader, right?
So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is 
already closed, right? Therefore there's no point to even tryIncRef?

bq. Just because it's using the LineFileDocs

Ahh ok. As I said, I didn't read the test through. I will review the patch 
after you post a new version.

 Create SearcherTaxoManager
 --

 Key: LUCENE-3786
 URL: https://issues.apache.org/jira/browse/LUCENE-3786
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-3786-3x-nocommit.patch, LUCENE-3786.patch


 If an application wants to use an IndexSearcher and TaxonomyReader in a 
 SearcherManager-like fashion, it cannot use a separate SearcherManager, and 
 say a TaxonomyReaderManager, because the IndexSearcher and TaxoReader 
 instances need to be in sync. That is, the IS-TR pair must match, or 
 otherwise the category ordinals that are encoded in the search index might 
 not match the ones in the taxonomy index.
 This can happen if someone reopens the IndexSearcher's IndexReader, but does 
 not refresh the TaxonomyReader, and the category ordinals that exist in the 
 reopened IndexReader are not yet visible to the TaxonomyReader instance.
 I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which 
 manages an IndexSearcher and TaxonomyReader pair. Then an application will 
 call:
 {code}
 SearcherTaxoPair pair = manager.acquire();
 try {
   IndexSearcher searcher = pair.searcher;
   TaxonomyReader taxoReader = pair.taxoReader;
   // do something with them
 } finally {
   manager.release(pair);
   pair = null;
 }
 {code}




Re: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception

2013-04-09 Thread Erick Erickson
On a quick glance, I think this would be difficult. How could one
estimate memory without loading the core? Facets in particular are
sensitive to the number of unique terms in the field. One could
probably work it backwards, that is, load the cores as necessary and
_measure_ the memory consumption. You'd then have to store that
information someplace, though.

It seems like you can get relatively close to this by specifying a set
of cores with transient=false and the rest with transient=true,
but that's certainly not going to satisfy the complex requirements
you've outlined.
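For reference, the transient-core knobs Erick mentions look roughly like this in a legacy-style solr.xml (attribute names as documented for the Solr 4.x LotsOfCores work; treat this as an untested sketch, and the core names are made up):

```xml
<solr persistent="true">
  <!-- transientCacheSize caps how many transient cores stay loaded (LRU) -->
  <cores adminPath="/admin/cores" transientCacheSize="8">
    <!-- always loaded: the cores receiving updates -->
    <core name="current" instanceDir="current"
          transient="false" loadOnStartup="true"/>
    <!-- query-only cores: loaded on demand, unloadable under pressure -->
    <core name="2013-04-01" instanceDir="2013-04-01"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>
```

As Erick notes, this gives an LRU cap on loaded cores but not per-query memory estimation.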

That said, it feels like your design is a band-aid: clients are still
going to put too much information on too little hardware. But you know
your problem space better than I do.

But before you start working there, be aware that this code is
evolving fairly quickly. SOLR-4662 should have the structure in
reasonably stable condition, and I hope to get that done this coming
weekend. You might want to wait until that gets committed to do more
than exploratory work as the code base may change out from underneath
you.

Good luck!
Erick

On Tue, Apr 9, 2013 at 7:02 AM, Lyuba Romanchuk
lyuba.romanc...@gmail.com wrote:
 It seems like the bullets didn't render nicely, so I'm re-sending the
 explanation without them.

 The flow of SolrCore.execute() function will be changed:

 Change the status of the core to “USED” and call
 waitForResource(SolrRequestHandler, SolrQueryRequest) function, after that
 perform the current SolrCore.execute() flow and change status of the core to
 “UNUSED”.

 In waitForResource(SolrRequestHandler, SolrQueryRequest) function,
 initially, estimate the required memory for this query/handler on this core.
 If there is no enough free resources to run the query and after unloading
 all unused, not permanent cores still there is no enough resource throw an
 OutOfMemoryError  exception and change the status of the core to “UNUSED”;
 else wait with timeout till some resource is released and then check again
 until the required resource is available or the exception is thrown.

 Best regards,

 Lyuba


 -- Forwarded message --
 From: Lyuba Romanchuk lyuba.romanc...@gmail.com
 Date: Tue, Apr 9, 2013 at 11:47 AM
 Subject: Adding new functionality to avoid java.lang.OutOfMemoryError: Java
 heap space exception
 To: dev@lucene.apache.org


 Hi all,

 We run solr (4.2 and 5.0) in a real time environment with big data. Each day
 two Solr cores are generated that can reach ~8-10g, depending on the
 insertion rates and on different hardware.

 Currently, all cores are loaded on solr startup.

 The query rate is not high but the response must be quick and must be
 returned even for old data and over a large time frame.

 There are a lot of simple queries (facet/facet.pivot for small distributed
 fields) but there are also heavy queries like facet.pivot for a large-scale
 distributed fields. We use distributed search to query the cores and,
 usually, the query over 1-2 weeks (around 7-28 cores).

 After some large queries (with facet.pivot for wide distributed fields) we
 sometimes encounter a java.lang.OutOfMemoryError: Java heap space
 exception.

 The software is to be deployed to customer sites so increasing memory would
 not always be possible, and the customers may want to get slower responses
 for the larger queries, if we can provide them.

 We looked at the LotsOfCores functionality that was added in 4.1 and 4.2. It
 enables defining an upper limit of online cores and unloading them when the
 cache gets full on a LRU basis. However in our case it seems a more general
 use case is needed:

 * Only cores that are used for updates/inserts must be loaded at all times.
 Other cores, which are queried only, should be loaded / unloaded on demand
 while the query runs, until completion – according to memory demands.

 * Each facet, facet.pivot must be estimated for memory consumption. In case
 there is not enough memory to run the query for all cores concurrently it
 must be separated into sequential queries, unloading already queried or
 irrelevant cores (but not permanent cores) and loading older cores to
 complete the query.

 * Occasionally, the oldest cores should be unloaded according to a
 configurable policy (for example, one type of high volume cores will be kept
 loaded for 1 week, while smaller cores can remain loaded for a month). The
 policy will allow for data we know is queried less but is higher volume to
 be kept live over shorter time periods.

 We are considering adding the following functionality to Solr (optional –
 turned on by new configs):

 The flow of SolrCore.execute() function will be changed:

 Change status of the core to “USED”
 Call waitForResource(SolrRequestHandler, SolrQueryRequest) function

 estimate the required memory for this query/handler on this core
 if there is no enough free resources to run the query then

 if all cores are permanent and can’t be unloaded then

 throw a 

[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626546#comment-13626546
 ] 

Shai Erera commented on LUCENE-4858:


bq. Since doc values types are exclusive, could we then just say that these are 
doc values without mentioning the type?

+1. I mistakenly thought you can add both a numeric and binary DocValues to a 
document, under the same name. I prefer slightly less verbosity, but just 
because I think the fieldName and order part are redundant. So 
DocValues($field,asc|desc)?

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626548#comment-13626548
 ] 

Adrien Grand commented on LUCENE-4858:
--

Sounds good to me!

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Updated] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-04-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4880:


Attachment: LUCENE-4880.patch

Attached is a fix with tests.

 Difference in offset handling between IndexReader created by MemoryIndex and 
 one created by RAMDirectory
 

 Key: LUCENE-4880
 URL: https://issues.apache.org/jira/browse/LUCENE-4880
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.2
 Environment: Windows 7 (probably irrelevant)
Reporter: Timothy Allison
 Attachments: LUCENE-4880.patch, 
 MemoryIndexVsRamDirZeroLengthTermTest.java


 MemoryIndex skips tokens that have length == 0 when building the index; the 
 result is that it does not increment the token offset (nor does it store the 
 position offsets if that option is set) for tokens of length == 0.  A regular 
 index (via, say, RAMDirectory) does not appear to do this.
 When using the ICUFoldingFilter, it is possible to have a term of zero length 
 (the \u0640 character separated by spaces).  If that occurs in a document, 
 the offsets returned at search time differ between the MemoryIndex and a 
 regular index.  




[jira] [Closed] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0

2013-04-09 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru closed LUCENE-4919.
-

Resolution: Not A Problem

 IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
 ---

 Key: LUCENE-4919
 URL: https://issues.apache.org/jira/browse/LUCENE-4919
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
 Fix For: 4.3

 Attachments: LUCENE-4919.patch


 The IntsRef, BytesRef and CharsRef implementations do not follow the java 
 Arrays.hashCode implementation, and return an incorrect hashcode when filled 
 with 0. 
 For example, an IntsRef with \{ 0 \} will return the same hashcode as an 
 IntsRef with \{ 0, 0 \}.
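The reported collision can be illustrated with a stand-in hash that seeds at 0, compared against java.util.Arrays.hashCode, which seeds at 1 so trailing zeros change the result. The seedZeroHash function below is illustrative, not the actual Lucene code:

```java
// Demonstrates why an all-zero array collides under a hash seeded at 0,
// while Arrays.hashCode (seeded at 1) distinguishes the lengths.
import java.util.Arrays;

class RefHashDemo {
    // Stand-in for the reported *Ref behavior: seed 0, multiplier 31.
    static int seedZeroHash(int[] a) {
        int h = 0;
        for (int v : a) h = 31 * h + v; // all-zero arrays collapse to 0
        return h;
    }

    public static void main(String[] args) {
        int[] one = {0}, two = {0, 0};
        System.out.println(seedZeroHash(one) == seedZeroHash(two));     // true: collision
        System.out.println(Arrays.hashCode(one) == Arrays.hashCode(two)); // false
    }
}
```

(The issue was closed "Not A Problem", since the hashcodes are only ever compared between refs produced in the same context.)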




[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626592#comment-13626592
 ] 

Yonik Seeley commented on SOLR-4581:


It looks like this is a new bug due to the new faceting code introduced in 
SOLR-3855


 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
Assignee: Yonik Seeley
 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-420.126">1</int>
   <int name="-285.672">1</int>
   <int name="-1.218">1</int>
 </lst>{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-1.218">1</int>
   <int name="-285.672">1</int>
   <int name="-420.126">1</int>
 </lst>{noformat}
 I would have expected that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as it is in Solr 4.1.
 Is this related to SOLR-2850? 




[jira] [Resolved] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-04-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4880.
-

   Resolution: Fixed
Fix Version/s: 4.3
   5.0

Thanks Timothy!

 Difference in offset handling between IndexReader created by MemoryIndex and 
 one created by RAMDirectory
 

 Key: LUCENE-4880
 URL: https://issues.apache.org/jira/browse/LUCENE-4880
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.2
 Environment: Windows 7 (probably irrelevant)
Reporter: Timothy Allison
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4880.patch, 
 MemoryIndexVsRamDirZeroLengthTermTest.java


 MemoryIndex skips tokens that have length == 0 when building the index; the 
 result is that it does not increment the token offset (nor does it store the 
 position offsets if that option is set) for tokens of length == 0.  A regular 
 index (via, say, RAMDirectory) does not appear to do this.
 When using the ICUFoldingFilter, it is possible to have a term of zero length 
 (the \u0640 character separated by spaces).  If that occurs in a document, 
 the offsets returned at search time differ between the MemoryIndex and a 
 regular index.  




[jira] [Updated] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4858:
---

Attachment: LUCENE-4858.patch

Patch adds CHANGES and improves getID impls. I think it's ready. I'll run some 
tests and if everything's ok, commit.

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626606#comment-13626606
 ] 

Robert Muir commented on LUCENE-949:


Hello Timothy, can you turn these changes into a patch?

See http://wiki.apache.org/lucene-java/HowToContribute#Creating_a_patch

Thanks!

 AnalyzingQueryParser can't work with leading wildcards.
 ---

 Key: LUCENE-949
 URL: https://issues.apache.org/jira/browse/LUCENE-949
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 2.2
Reporter: Stefan Klein
 Attachments: AnalyzingQueryParser.java


 The getWildcardQuery method in AnalyzingQueryParser.java needs the following 
 changes to accept leading wildcards:
   protected Query getWildcardQuery(String field, String termStr) throws ParseException
   {
       String useTermStr = termStr;
       String leadingWildcard = null;
       if ("*".equals(field))
       {
           if ("*".equals(useTermStr))
               return new MatchAllDocsQuery();
       }
       boolean hasLeadingWildcard = (useTermStr.startsWith("*") ||
               useTermStr.startsWith("?")) ? true : false;
       if (!getAllowLeadingWildcard() && hasLeadingWildcard)
           throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
       if (getLowercaseExpandedTerms())
       {
           useTermStr = useTermStr.toLowerCase();
       }
       if (hasLeadingWildcard)
       {
           leadingWildcard = useTermStr.substring(0, 1);
           useTermStr = useTermStr.substring(1);
       }
       List tlist = new ArrayList();
       List wlist = new ArrayList();
       /*
        * somewhat a hack: find/store wildcard chars in order to put them back
        * after analyzing
        */
       boolean isWithinToken = (!useTermStr.startsWith("?") &&
               !useTermStr.startsWith("*"));
       isWithinToken = true;
       StringBuffer tmpBuffer = new StringBuffer();
       char[] chars = useTermStr.toCharArray();
       for (int i = 0; i < useTermStr.length(); i++)
       {
           if (chars[i] == '?' || chars[i] == '*')
           {
               if (isWithinToken)
               {
                   tlist.add(tmpBuffer.toString());
                   tmpBuffer.setLength(0);
               }
               isWithinToken = false;
           }
           else
           {
               if (!isWithinToken)
               {
                   wlist.add(tmpBuffer.toString());
                   tmpBuffer.setLength(0);
               }
               isWithinToken = true;
           }
           tmpBuffer.append(chars[i]);
       }
       if (isWithinToken)
       {
           tlist.add(tmpBuffer.toString());
       }
       else
       {
           wlist.add(tmpBuffer.toString());
       }
       // get Analyzer from superclass and tokenize the term
       TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
       org.apache.lucene.analysis.Token t;
       int countTokens = 0;
       while (true)
       {
           try
           {
               t = source.next();
           }
           catch (IOException e)
           {
               t = null;
           }
           if (t == null)
           {
               break;
           }
           if (!"".equals(t.termText()))
           {
               try
               {
                   tlist.set(countTokens++, t.termText());
               }
               catch (IndexOutOfBoundsException ioobe)
               {
                   countTokens = -1;
               }
           }
       }
       try
       {
           source.close();
       }
       catch (IOException e)
       {

[jira] [Resolved] (SOLR-4677) Improve Solr's use of spec version.

2013-04-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-4677.
---

Resolution: Fixed

 Improve Solr's use of spec version.
 ---

 Key: SOLR-4677
 URL: https://issues.apache.org/jira/browse/SOLR-4677
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
 Fix For: 4.3, 5.0

 Attachments: SOLR-4677.patch, SOLR-4677.patch, SOLR-4677.patch


 Solr 4.2.1 went out with an impl version of 4.2.1 and a spec version of 4.2.0.
 This is because you must update the spec version in common-build.xml while 
 the impl is set by the version you pass as a sys prop when doing 
 prepare-release.
 Do we need this spec version? Does it serve any purpose? I think we should 
 either stop dealing with it or just set it the same way as the impl...or?




[jira] [Created] (SOLR-4693) Create a collections API to delete/cleanup a Slice

2013-04-09 Thread Anshum Gupta (JIRA)
Anshum Gupta created SOLR-4693:
--

 Summary: Create a collections API to delete/cleanup a Slice
 Key: SOLR-4693
 URL: https://issues.apache.org/jira/browse/SOLR-4693
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Anshum Gupta


Have a collections API that cleans up a given shard.
Among other places, this would be useful after the shard split call, to manage 
the parent/original slice.




[jira] [Commented] (LUCENE-4903) Add AssertingScorer

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626628#comment-13626628
 ] 

Robert Muir commented on LUCENE-4903:
-

{quote}
The problem is that scorers are hard to track: scoring usually happens by 
calling Scorer.score(Collector), which itself calls 
Collector.setScorer(Scorer). Since the asserting scorer delegates to the 
wrapped one, the asserting scorer gets lost; this is why Collector.setScorer 
tries to get it back by using a weak hash map.

I'm not totally happy with it either and would really like to make 
Scorer.score(Collector) use methods from the asserting scorer directly. We 
can't rely on Scorer.score(Collector)'s default implementation since it relies 
on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this 
method.
{quote}

Could we alternatively use VirtualMethod to detect if 
score(Collector)/score(Collector,int,int) are overridden in the underlying 
scorer? If they aren't, then it's safe for AssertingScorer to use its own 
implementation (possibly with more checks). 
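A standalone sketch of the override check Robert suggests, with plain reflection standing in for Lucene's VirtualMethod utility; the scorer classes below are invented for illustration:

```java
// Detect whether a subclass overrides a method from its base class.
// If score() is NOT overridden, a wrapper could safely substitute its own
// (assertion-adding) implementation of the bulk-scoring loop.
class OverrideCheck {
    public static class BaseScorer {
        public void score() {}
    }
    public static class PlainScorer extends BaseScorer {}      // inherits score()
    public static class BulkScorer extends BaseScorer {
        @Override public void score() {}                        // custom score()
    }

    static boolean overridesScore(Class<? extends BaseScorer> clazz) {
        try {
            // getMethod resolves to the most-derived public declaration of score()
            return clazz.getMethod("score").getDeclaringClass() != BaseScorer.class;
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overridesScore(PlainScorer.class)); // false
        System.out.println(overridesScore(BulkScorer.class));  // true
    }
}
```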

 Add AssertingScorer
 ---

 Key: LUCENE-4903
 URL: https://issues.apache.org/jira/browse/LUCENE-4903
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4903.patch


 I think we would benefit from having an AssertingScorer that would assert 
 that scorers are advanced correctly, return valid scores (eg. not NaN), ...




[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626637#comment-13626637
 ] 

Adrien Grand commented on LUCENE-4858:
--

+1

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Commented] (LUCENE-4903) Add AssertingScorer

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626638#comment-13626638
 ] 

Adrien Grand commented on LUCENE-4903:
--

This is a good idea, I didn't know of this class. I'll update the patch!

 Add AssertingScorer
 ---

 Key: LUCENE-4903
 URL: https://issues.apache.org/jira/browse/LUCENE-4903
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4903.patch


 I think we would benefit from having an AssertingScorer that would assert 
 that scorers are advanced correctly, return valid scores (eg. not NaN), ...




[jira] [Commented] (SOLR-2366) Facet Range Gaps

2013-04-09 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626639#comment-13626639
 ] 

Jeroen Steggink commented on SOLR-2366:
---

I'm also very interested in a variable range gap feature.

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 4.3

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different-sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM), and everything else (150KM+).  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)
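A standalone sketch of the variable-width range bucketing described above (illustrative only, not Solr's faceting code; all names are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Count values into arbitrarily sized buckets, e.g. 0-5, 5-150, 150+ km.
// bounds holds the upper edges of all but the last bucket;
// labels has bounds.length + 1 entries (the last bucket is open-ended).
class VariableRangeFacet {
    static Map<String, Integer> facet(double[] values, double[] bounds, String[] labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) counts.put(label, 0);
        for (double v : values) {
            int bucket = labels.length - 1;           // default: open-ended bucket
            for (int i = 0; i < bounds.length; i++) {
                if (v < bounds[i]) { bucket = i; break; }
            }
            counts.merge(labels[bucket], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] dist = {1.2, 4.9, 30, 200, 2};
        System.out.println(facet(dist, new double[]{5, 150},
                new String[]{"walking", "driving", "other"}));
        // {walking=3, driving=1, other=1}
    }
}
```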




[jira] [Created] (LUCENE-4920) CLONE - TermsFilter should use AutomatonQuery

2013-04-09 Thread sani kumar (JIRA)
sani  kumar created LUCENE-4920:
---

 Summary: CLONE - TermsFilter should use AutomatonQuery
 Key: LUCENE-4920
 URL: https://issues.apache.org/jira/browse/LUCENE-4920
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: sani  kumar


I think we could see perf gains if TermsFilter sorted the terms, built a 
minimal automaton, and used TermsEnum.intersect to visit the terms...

This idea came up on the dev list recently.
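The suggested approach -- sort the filter terms, then visit the term dictionary in a single pass -- can be sketched as a plain merge-intersection. This is illustrative only; Lucene's actual TermsEnum.intersect and automaton machinery are not used here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Intersect a sorted term dictionary with a set of filter terms in one merge
// pass, instead of one dictionary lookup per term. The sorted term list is a
// stand-in for the "minimal automaton" the issue mentions.
class SortedIntersect {
    static List<String> intersect(String[] dictionary, String[] terms) {
        String[] sorted = terms.clone();
        Arrays.sort(sorted);
        List<String> matches = new ArrayList<>();
        int i = 0, j = 0;
        while (i < dictionary.length && j < sorted.length) {
            int cmp = dictionary[i].compareTo(sorted[j]);
            if (cmp == 0) { matches.add(dictionary[i]); i++; j++; }
            else if (cmp < 0) i++;   // dictionary behind: advance it
            else j++;                // query term behind: advance it
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] dict = {"apple", "banana", "kiwi", "pear", "plum"};
        System.out.println(intersect(dict, new String[]{"pear", "apple", "fig"}));
        // [apple, pear]
    }
}
```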




[jira] [Resolved] (LUCENE-4858) Early termination with SortingMergePolicy

2013-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-4858.


   Resolution: Fixed
Fix Version/s: 5.0
Lucene Fields: New,Patch Available  (was: New)

Committed to trunk and 4x. Thanks Adrien for the fun collaboration!

 Early termination with SortingMergePolicy
 -

 Key: LUCENE-4858
 URL: https://issues.apache.org/jira/browse/LUCENE-4858
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, 
 LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch


 Spin-off of LUCENE-4752, see 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
  and 
 https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
 When an index is sorted per-segment, queries that sort according to the index 
 sort order could be early terminated.




[jira] [Updated] (SOLR-4663) Log an error if more than one core points to the same data dir.

2013-04-09 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4663:
-

Attachment: SOLR-4663.patch

I'll commit this later today unless there are objections, and assuming all the 
tests pass (running nightly now).

 Log an error if more than one core points to the same data dir.
 ---

 Key: SOLR-4663
 URL: https://issues.apache.org/jira/browse/SOLR-4663
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-4663.patch, SOLR-4663.patch, SOLR-4663.patch


 In large multi-core setups, having mistakes whereby two or more cores point 
 to the same data dir seems quite possible. We should at least complain very 
 loudly in the logs if this is detected.
 Should be a very straightforward check at core discovery time.
 Is this serious enough to keep Solr from coming up at all?




[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626682#comment-13626682
 ] 

Yonik Seeley commented on SOLR-4581:


OK, the code from SOLR-3855 had a bug where the IEEE float bits were 
used/compared directly for sort order, which is not correct for negative 
numbers.
I'm testing a patch now, expect to commit shortly.
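A standalone illustration of the bug and the usual fix: flipping the magnitude bits of negative floats so that integer comparison of the bit patterns agrees with float order (the transform Lucene's NumericUtils uses; this sketch is not the actual SOLR-3855 code):

```java
// Raw IEEE bit patterns sort negative floats in reverse: a negative float
// with a larger magnitude has a larger bit pattern. Flipping the 31
// magnitude bits when the sign bit is set restores the correct order.
class SortableFloatBits {
    static int sortableBits(float f) {
        int bits = Float.floatToIntBits(f);
        return bits ^ ((bits >> 31) & 0x7fffffff); // flip magnitude bits iff negative
    }

    public static void main(String[] args) {
        float a = -420.126f, b = -1.218f;
        // Buggy: comparing raw bit patterns says a > b, which is wrong
        System.out.println(Integer.compare(Float.floatToIntBits(a), Float.floatToIntBits(b)) < 0);
        // Correct: sortable bits preserve float order (-420.126 < -1.218)
        System.out.println(Integer.compare(sortableBits(a), sortableBits(b)) < 0);
    }
}
```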

 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
Assignee: Yonik Seeley
 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-420.126">1</int>
   <int name="-285.672">1</int>
   <int name="-1.218">1</int>
 </lst>{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-1.218">1</int>
   <int name="-285.672">1</int>
   <int name="-420.126">1</int>
 </lst>{noformat}
 I would have expected that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as it is in Solr 4.1.
 Is this related to SOLR-2850? 




[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order

2013-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4904:
---

Attachment: LUCENE-4904.patch

I hacked this up real quickly, so I could be missing something. Patch adds a 
ReverseOrderSorter which wraps a Sorter and on sort() returns a DocMap that 
reverses whatever the wrapped Sorter DocMap returned.

I still didn't figure out how to plug that sorter into existing tests, so it 
could be that this approach doesn't work. Will look at it later.
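The wrap-and-reverse idea can be sketched with a plain array standing in for Lucene's Sorter.DocMap (names here are invented for illustration):

```java
// A DocMap maps oldDoc -> newDoc under some sort. Reversing the sort just
// mirrors each new position: newDoc becomes (maxDoc - 1 - newDoc).
class ReverseDocMap {
    static int[] reverse(int[] docMap) {
        int maxDoc = docMap.length;
        int[] reversed = new int[maxDoc];
        for (int oldDoc = 0; oldDoc < maxDoc; oldDoc++) {
            reversed[oldDoc] = maxDoc - 1 - docMap[oldDoc]; // mirror the new position
        }
        return reversed;
    }

    public static void main(String[] args) {
        // wrapped sorter orders docs as 2, 0, 1 -> docMap = {1, 2, 0}
        System.out.println(java.util.Arrays.toString(reverse(new int[]{1, 2, 0})));
        // [1, 0, 2]  (reversed order: 1, 0, 2)
    }
}
```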

 Sorter API: Make NumericDocValuesSorter able to sort in reverse order
 -

 Key: LUCENE-4904
 URL: https://issues.apache.org/jira/browse/LUCENE-4904
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Trivial
  Labels: newdev
 Fix For: 4.3

 Attachments: LUCENE-4904.patch


 Today it is only able to sort in ascending order.




[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Alexander Buhr (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626704#comment-13626704
 ] 

Alexander Buhr commented on SOLR-4581:
--

happy to hear this :)
thx!

 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
Assignee: Yonik Seeley
 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-420.126">1</int>
   <int name="-285.672">1</int>
   <int name="-1.218">1</int>
 </lst>{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}<lst name="ListPrice_EUR_INV">
   <int name="-1.218">1</int>
   <int name="-285.672">1</int>
   <int name="-420.126">1</int>
 </lst>{noformat}
 I would have expected that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as it is in Solr 4.1.
 Is this related to SOLR-2850? 




[jira] [Created] (SOLR-4694) DataImporter uses wrong format for 'last_index_time'

2013-04-09 Thread Arul Kalaipandian (JIRA)
Arul Kalaipandian created SOLR-4694:
---

 Summary: DataImporter uses wrong format for 'last_index_time'
 Key: SOLR-4694
 URL: https://issues.apache.org/jira/browse/SOLR-4694
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.2
Reporter: Arul Kalaipandian
Priority: Blocker


DataImporter uses the wrong format for the first import (no dataimport.properties 
in the /conf folder).

{code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd 
hh24:mi:ss').
{code}
It's similar to SOLR-1496.





[jira] [Updated] (SOLR-4694) DataImporter uses wrong format for 'last_index_time'

2013-04-09 Thread Arul Kalaipandian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arul Kalaipandian updated SOLR-4694:


Description: 
DataImporter uses the wrong format for the first import (no dataimport.properties 
in the /conf folder).

{code}
   R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
  formatted as  follows,
   R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 
1970','-mm-dd hh24:mi:ss').
{code}
It's similar to SOLR-1496.


  was:
DataImporter uses the wrong format for the first import (no dataimport.properties 
in the /conf folder).

{code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd 
hh24:mi:ss').
{code}
It's similar to SOLR-1496.



 DataImporter uses wrong format for 'last_index_time'
 

 Key: SOLR-4694
 URL: https://issues.apache.org/jira/browse/SOLR-4694
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.2
Reporter: Arul Kalaipandian
Priority: Blocker
  Labels: formatDate

 DataImporter uses the wrong format for the first import (no dataimport.properties 
 in the /conf folder).
 {code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
   formatted as  follows,
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 
 1970','-mm-dd hh24:mi:ss').
 {code}
 It's similar to SOLR-1496.




[jira] [Resolved] (SOLR-4581) sort-order of facet-counts depends on facet.mincount

2013-04-09 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-4581.


   Resolution: Fixed
Fix Version/s: 5.0
   4.3

committed.

 sort-order of facet-counts depends on facet.mincount
 

 Key: SOLR-4581
 URL: https://issues.apache.org/jira/browse/SOLR-4581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Buhr
Assignee: Yonik Seeley
 Fix For: 4.3, 5.0

 Attachments: SOLR-4581.patch, SOLR-4581.patch


 I just upgraded to Solr 4.2 and cannot explain the following behaviour:
 I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. 
 The solr-response for the query 
 {noformat}'solr/Products/select?q=*%3A*wt=xmlindent=truefacet=truefacet.field=ListPrice_EUR_INVf.ListPrice_EUR_INV.facet.sort=index'{noformat}
 includes the following facet-counts:
 {noformat}lst name=ListPrice_EUR_INV
   int name=-420.1261/int
   int name=-285.6721/int
   int name=-1.2181/int
 /lst{noformat}
 If I also set the parameter *'facet.mincount=1'* in the query, the order of 
 the facet-counts is reversed.
 {noformat}lst name=ListPrice_EUR_INV
   int name=-1.2181/int
   int name=-285.6721/int
   int name=-420.1261/int
 /lst{noformat}
 I would have expected, that the sort-order of the facet-counts is not 
 affected by the facet.mincount parameter, as it is in Solr 4.1.
 Is this related to SOLR-2850? 




[jira] [Created] (SOLR-4695) Fix core admin SPLIT action to be useful with non-cloud setups

2013-04-09 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-4695:
---

 Summary: Fix core admin SPLIT action to be useful with non-cloud 
setups
 Key: SOLR-4695
 URL: https://issues.apache.org/jira/browse/SOLR-4695
 Project: Solr
  Issue Type: Bug
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.3


The core admin SPLIT action assumes that the core being split is ZooKeeper 
aware. It will throw an NPE if invoked against a non-cloud Solr setup.

It should be fixed to work with non-cloud setups, and documents in such an index 
should be distributed alternately into sub-indexes instead of using hashes.
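A minimal sketch of the "distributed alternately" behavior, assuming simple round-robin assignment (illustrative only, not Solr's index-splitting code):

```java
import java.util.ArrayList;
import java.util.List;

// Without hash ranges, documents can simply be dealt round-robin into the
// target sub-indexes: doc i goes to sub-index (i % numSubIndexes).
class RoundRobinSplit {
    static List<List<String>> split(List<String> docs, int numSubIndexes) {
        List<List<String>> subs = new ArrayList<>();
        for (int i = 0; i < numSubIndexes; i++) subs.add(new ArrayList<>());
        for (int i = 0; i < docs.size(); i++) {
            subs.get(i % numSubIndexes).add(docs.get(i)); // alternate target index
        }
        return subs;
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("d0", "d1", "d2", "d3", "d4"), 2));
        // [[d0, d2, d4], [d1, d3]]
    }
}
```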




[jira] [Commented] (SOLR-3755) shard splitting

2013-04-09 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626753#comment-13626753
 ] 

Shalin Shekhar Mangar commented on SOLR-3755:
-

Committed three changes:
# Set update log to buffering mode before it is published (fixes bug with extra 
doc count on sub-shard)
# Use deleteIndex=true while unloading sub-shard cores (if a sub-shard in 
construction state already exists at the start of the splitshard operation)
# Made ChaosMonkeyShardSplitTest consistent with ShardSplitTest -- use the correct 
router and replica count, assert that sub-shards are active and parent shards are 
inactive, etc.

Anshum suggested over chat that we should think about combining ShardSplitTest 
and ChaosMonkeyShardSplit tests into one to avoid code duplication. I'll try to 
see if we can do that.

 shard splitting
 ---

 Key: SOLR-3755
 URL: https://issues.apache.org/jira/browse/SOLR-3755
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Shalin Shekhar Mangar
 Fix For: 4.3, 5.0

 Attachments: SOLR-3755-combined.patch, 
 SOLR-3755-combinedWithReplication.patch, SOLR-3755-CoreAdmin.patch, 
 SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, 
 SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, 
 SOLR-3755.patch, SOLR-3755.patch, SOLR-3755-testSplitter.patch, 
 SOLR-3755-testSplitter.patch


 We can currently easily add replicas to handle increases in query volume, but 
 we should also add a way to add additional shards dynamically by splitting 
 existing shards.




[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_17) - Build # 5040 - Failure!

2013-04-09 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5040/
Java: 32bit/jdk1.7.0_17 -server -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 14695 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:381: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:88: The following 
files contain @author tags, tabs or nocommits:
* solr/core/src/test/org/apache/solr/request/TestFaceting.java

Total time: 52 minutes 53 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_17 -server -XX:+UseConcMarkSweepGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (SOLR-4679) HTML line breaks (br) are removed during indexing; causes wrong search results

2013-04-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626775#comment-13626775
 ] 

Hoss Man commented on SOLR-4679:


Right ... I wonder if somewhere in the flow of SAX events these newlines are 
being treated as ignorable whitespace ... I can't imagine why they would be, 
but that's the best guess I have at the moment.

 HTML line breaks (br) are removed during indexing; causes wrong search 
 results
 

 Key: SOLR-4679
 URL: https://issues.apache.org/jira/browse/SOLR-4679
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.2
 Environment: Windows Server 2008 R2, Java 6, Tomcat 7
Reporter: Christoph Straßer
 Attachments: external.htm, Solr_HtmlLineBreak_Linz_NotFound.png, 
 Solr_HtmlLineBreak_Vienna.png


 HTML line breaks (<br>, <BR>, <br/>, ...) seem to be removed during 
 extraction of content from HTML files. They need to be replaced with a 
 space.
 Test-File:
 <html>
 <head>
 <title>Test mit HTML-Zeilenschaltungen</title>
 </head>
 <p>
 word1<br>word2<br/>
 Some other words, a special name like linz<br>and another special name - 
 vienna
 </p>
 </html>
 The Solr-content-attribute contains the following text:
 Test mit HTML-Zeilenschaltungen
 word1word2
 Some other words, a special name like linzand another special name - vienna
 So we are not able to find the word linz.
 We use the ExtractingRequestHandler to put content into Solr. 
 (wiki.apache.org/solr/ExtractingRequestHandler)
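The fix the reporter asks for -- treating line-break tags as whitespace -- can be sketched with plain string handling (illustrative only; the real fix would live in the Tika/SAX extraction pipeline, not in a regex over raw HTML):

```java
// Replace <br> variants with a space before indexing, so words separated
// only by a line break ("word1<br>word2") stay separate tokens.
class BrToSpace {
    static String replaceLineBreaks(String html) {
        // matches <br>, <BR>, <br/>, <br />, ...
        return html.replaceAll("(?i)<br\\s*/?>", " ");
    }

    public static void main(String[] args) {
        System.out.println(replaceLineBreaks("word1<br>word2<br/>")); // "word1 word2 "
        System.out.println(replaceLineBreaks("linz<BR>and"));         // "linz and"
    }
}
```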




[jira] [Created] (LUCENE-4921) Create a DocValuesFormat for sparse doc values

2013-04-09 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4921:


 Summary: Create a DocValuesFormat for sparse doc values
 Key: LUCENE-4921
 URL: https://issues.apache.org/jira/browse/LUCENE-4921
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Adrien Grand
Priority: Trivial


We could have a special DocValuesFormat in lucene/codecs to better handle 
sparse doc values.

See http://search-lucene.com/m/HUeYW1RlEtc




[jira] [Commented] (LUCENE-4921) Create a DocValuesFormat for sparse doc values

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626796#comment-13626796
 ] 

Robert Muir commented on LUCENE-4921:
-

a good baseline could be something as simple as passing COMPACT to the default 
DVConsumer?

or we could provide something that works entirely differently... there are a 
lot of possibilities.

 Create a DocValuesFormat for sparse doc values
 --

 Key: LUCENE-4921
 URL: https://issues.apache.org/jira/browse/LUCENE-4921
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Adrien Grand
Priority: Trivial
  Labels: gsoc2013, newdev

 We could have a special DocValuesFormat in lucene/codecs to better handle 
 sparse doc values.
 See http://search-lucene.com/m/HUeYW1RlEtc




[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1490 - Failure

2013-04-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/1490/

All tests passed

Build Log:
[...truncated 14050 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:381:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:88:
 The following files contain @author tags, tabs or nocommits:
* solr/core/src/test/org/apache/solr/request/TestFaceting.java

Total time: 62 minutes 20 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Updated] (SOLR-4431) Developer Curb Appeal: easier URL to get to Cloud UI

2013-04-09 Thread Mark Bennett (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bennett updated SOLR-4431:
---

Attachment: SOLR-4431.patch

Adds 3 convenience URLs that generate 302 redirects.

Also includes an ivy entry to fetch one additional Jetty jar and an ini entry to include it.

These are not mimicking older Solr URLs.  The nice thing is that they could 
easily be modified in the future if we need to change the URL structure again.
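A rough sketch of what such convenience redirects amount to (the alias table below is hypothetical and for illustration only; the attached patch actually wires 302 redirects into the example Jetty configuration):

```java
import java.util.Map;

public class AdminRedirects {
    // Hypothetical alias table, assumed for illustration: short paths
    // mapped to the real admin UI anchors they would 302-redirect to.
    static final Map<String, String> ALIASES = Map.of(
        "/cloud", "/solr/#/~cloud",
        "/admin", "/solr/#/");

    // Resolve a request path to its redirect target, or return it unchanged.
    static String resolve(String path) {
        return ALIASES.getOrDefault(path, path);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/cloud"));
    }
}
```

Keeping the mapping in one table is what makes the URL structure easy to change later, as noted above.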

 Developer Curb Appeal: easier URL to get to Cloud UI
 

 Key: SOLR-4431
 URL: https://issues.apache.org/jira/browse/SOLR-4431
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.1
Reporter: Mark Bennett
 Attachments: SOLR-4431.patch


 Currently the URL to get the cloud UI is 
 http://172.16.10.236:8983/solr/#/~cloud
 The path and anchor portion is very strange: /solr/#/~cloud
 Ideally it would just be /cloud
 Or even just /, and if it's in cloud mode, take the admin to the right 
 place.
 If there's some internal important structural reason for /solr, # and 
 ~cloud sections, perhaps each would need to be addressed.
 Another option would be to possibly put something in the default Jetty xml file 
 to handle this as some type of redirect or registered handler.




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_17) - Build # 5090 - Still Failing!

2013-04-09 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5090/
Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 14399 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:375: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:88: The following 
files contain @author tags, tabs or nocommits:
* solr/core/src/test/org/apache/solr/request/TestFaceting.java

Total time: 48 minutes 4 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order

2013-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4904:
---

Attachment: LUCENE-4904.patch

Added ReverseOrderSorter to IndexSortingTest (was after all very easy), which 
uncovered a bug in my original implementation. It's now working and tests are 
happy.

I basically think this is ready, would appreciate some review.

 Sorter API: Make NumericDocValuesSorter able to sort in reverse order
 -

 Key: LUCENE-4904
 URL: https://issues.apache.org/jira/browse/LUCENE-4904
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Trivial
  Labels: newdev
 Fix For: 4.3

 Attachments: LUCENE-4904.patch, LUCENE-4904.patch


 Today it is only able to sort in ascending order.




[JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 1131 - Still Failing

2013-04-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-java7/1131/

All tests passed

Build Log:
[...truncated 14569 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/build.xml:381:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/build.xml:88:
 The following files contain @author tags, tabs or nocommits:
* solr/core/src/test/org/apache/solr/request/TestFaceting.java

Total time: 60 minutes 8 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (SOLR-4431) Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.

2013-04-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626889#comment-13626889
 ] 

Hoss Man commented on SOLR-4431:


-1

a) this patch would change the behaviour only of the example jetty server, 
causing solr to (knowingly!) behave radically differently if you deployed to a 
different servlet container.

b) as explicitly mentioned before, this change would cause problems for people 
trying to create solr cores (or handlers in the default solr core) named 
"cloud" (or any other names that get taken up by other aliases like this that 
might get added if we go down this road)...

bq. switching these UI URLs from things like /~cloud to /cloud would cause 
problems for anyone who might want to have a collection named "cloud" 

The included Admin UI is a nice-to-have, but improving its ease of use or 
prettiness must not come at the expense of reduced configurability or 
expressiveness of the underlying API URLs.

If people want an admin UI for solr that has short and pretty URLs, then it 
should be something deployed as an independent war (or written in ruby or 
whatever).



 Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.
 

 Key: SOLR-4431
 URL: https://issues.apache.org/jira/browse/SOLR-4431
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.1
Reporter: Mark Bennett
 Attachments: SOLR-4431.patch


 Currently the URL to get the cloud UI is 
 http://172.16.10.236:8983/solr/#/~cloud
 The path and anchor portion is very strange: /solr/#/~cloud
 Ideally it would just be /cloud
 Or even just /, and if it's in cloud mode, take the admin to the right 
 place.
 If there's some internal important structural reason for /solr, # and 
 ~cloud sections, perhaps each would need to be addressed.
 Another option would be to possibly put something in the default Jetty xml file 
 to handle this as some type of redirect or registered handler.




[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.6.0_43) - Build # 5041 - Still Failing!

2013-04-09 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5041/
Java: 64bit/jdk1.6.0_43 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 13967 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:381: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:88: The following 
files contain @author tags, tabs or nocommits:
* solr/core/src/test/org/apache/solr/request/TestFaceting.java

Total time: 55 minutes 40 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_43 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Created] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-09 Thread David Smiley (JIRA)
David Smiley created LUCENE-4922:


 Summary: A SpatialPrefixTree based on the Hilbert Curve and 
variable grid sizes
 Key: LUCENE-4922
 URL: https://issues.apache.org/jira/browse/LUCENE-4922
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley


My wish-list for an ideal SpatialPrefixTree has these properties:
* Hilbert Curve ordering
* Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
all in-between)
* Compact binary encoding (so-called Morton number)
* Works for geodetic (i.e. lat & lon) and non-geodetic

Some bonus wishes for use in geospatial:
* Use an equal-area projection such that each cell has an equal area to all 
others at the same level.
* When advancing a grid level, if a cell's width is less than half its height, 
then divide it as 4 vertically stacked instead of 2 by 2. The point is to avoid 
super-skinny cells, which occur towards the poles and degrade performance.

All of this requires some basic performance benchmarks to measure the effects 
of these characteristics.
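The "compact binary encoding" item can be illustrated with a plain Morton (Z-order) interleaving sketch; a Hilbert-curve encoding would reorder the cells but uses the same kind of bit packing. This is illustrative code, not part of the spatial module:

```java
public class Morton {
    // Interleave the low 16 bits of x and y into one 32-bit Morton code:
    // bit i of x lands at bit 2i, bit i of y at bit 2i+1.
    static int encode(int x, int y) {
        int code = 0;
        for (int i = 0; i < 16; i++) {
            code |= (x & (1 << i)) << i;       // x bit i -> even position 2i
            code |= (y & (1 << i)) << (i + 1); // y bit i -> odd position 2i+1
        }
        return code;
    }

    public static void main(String[] args) {
        // x=5 (binary 101) and y=3 (binary 011) interleave to binary 11011
        System.out.println(encode(5, 3));
    }
}
```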




[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order

2013-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4904:
---

Attachment: LUCENE-4904.patch

Patch on latest trunk (previous one had issues applying).

 Sorter API: Make NumericDocValuesSorter able to sort in reverse order
 -

 Key: LUCENE-4904
 URL: https://issues.apache.org/jira/browse/LUCENE-4904
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Trivial
  Labels: newdev
 Fix For: 4.3

 Attachments: LUCENE-4904.patch, LUCENE-4904.patch, LUCENE-4904.patch


 Today it is only able to sort in ascending order.




[jira] [Updated] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-09 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4922:
-

Assignee: David Smiley
  Labels: gsoc2013 mentor newdev  (was: gsoc2013 newdev)

 A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
 --

 Key: LUCENE-4922
 URL: https://issues.apache.org/jira/browse/LUCENE-4922
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
  Labels: gsoc2013, mentor, newdev

 My wish-list for an ideal SpatialPrefixTree has these properties:
 * Hilbert Curve ordering
 * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
 all in-between)
 * Compact binary encoding (so-called Morton number)
 * Works for geodetic (i.e. lat & lon) and non-geodetic
 Some bonus wishes for use in geospatial:
 * Use an equal-area projection such that each cell has an equal area to all 
 others at the same level.
 * When advancing a grid level, if a cell's width is less than half its 
 height, then divide it as 4 vertically stacked instead of 2 by 2. The point 
 is to avoid super-skinny cells, which occur towards the poles and degrade 
 performance.
 All of this requires some basic performance benchmarks to measure the effects 
 of these characteristics.




[jira] [Created] (SOLR-4696) All threads become blocked resulting in hang when bulk adding

2013-04-09 Thread matt knecht (JIRA)
matt knecht created SOLR-4696:
-

 Summary: All threads become blocked resulting in hang when bulk 
adding
 Key: SOLR-4696
 URL: https://issues.apache.org/jira/browse/SOLR-4696
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.2, 4.1, 4.2.1
 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
KVM, 4xCPU, 5GB RAM, 4GB heap.
Reporter: matt knecht


During a bulk load, after about 150,000 documents are loaded, thread usage 
spikes and Solr no longer processes any documents.  Any additional documents 
added result in a new thread until the pool is exhausted.




[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding

2013-04-09 Thread matt knecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

matt knecht updated SOLR-4696:
--

Attachment: solr.jstack.2
solr.jstack.1

jstack output from Solr once the problem manifests.  I stopped adding documents 
before running out of threads.  One jstack from each Solr node (4 cores, 2 shards).

 All threads become blocked resulting in hang when bulk adding
 -

 Key: SOLR-4696
 URL: https://issues.apache.org/jira/browse/SOLR-4696
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.1, 4.2, 4.2.1
 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic
 Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
 KVM, 4xCPU, 5GB RAM, 4GB heap.
Reporter: matt knecht
  Labels: hang
 Attachments: solr.jstack.1, solr.jstack.2


 During a bulk load after about 150,000 documents load, thread usage spikes, 
 solr no longer processes any documents.  Any additional documents added 
 result in a new thread until the pool is exhausted.




[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding

2013-04-09 Thread matt knecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

matt knecht updated SOLR-4696:
--

Attachment: screenshot-1.jpg

jconsole overview.  Solr stops processing new documents, CPU usage drops, 
threads grow as new docs are submitted that go into immediate wait.

 All threads become blocked resulting in hang when bulk adding
 -

 Key: SOLR-4696
 URL: https://issues.apache.org/jira/browse/SOLR-4696
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.1, 4.2, 4.2.1
 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic
 Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
 KVM, 4xCPU, 5GB RAM, 4GB heap.
Reporter: matt knecht
  Labels: hang
 Attachments: screenshot-1.jpg, solr.jstack.1, solr.jstack.2


 During a bulk load after about 150,000 documents load, thread usage spikes, 
 solr no longer processes any documents.  Any additional documents added 
 result in a new thread until the pool is exhausted.




[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding

2013-04-09 Thread matt knecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

matt knecht updated SOLR-4696:
--

Environment: 
Ubuntu 12.04.2 LTS 3.5.0-27-generic
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
KVM, 4xCPU, 5GB RAM, 4GB heap.
4 cores, 2 shards, 2 nodes, tomcat7

  was:
Ubuntu 12.04.2 LTS 3.5.0-27-generic
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
KVM, 4xCPU, 5GB RAM, 4GB heap.


 All threads become blocked resulting in hang when bulk adding
 -

 Key: SOLR-4696
 URL: https://issues.apache.org/jira/browse/SOLR-4696
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.1, 4.2, 4.2.1
 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic
 Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
 KVM, 4xCPU, 5GB RAM, 4GB heap.
 4 cores, 2 shards, 2 nodes, tomcat7
Reporter: matt knecht
  Labels: hang
 Attachments: screenshot-1.jpg, solr.jstack.1, solr.jstack.2


 During a bulk load after about 150,000 documents load, thread usage spikes, 
 solr no longer processes any documents.  Any additional documents added 
 result in a new thread until the pool is exhausted.




[jira] [Commented] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order

2013-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626982#comment-13626982
 ] 

Adrien Grand commented on LUCENE-4904:
--

We can add this ReverseOrderSorter, but as far as NumericDocValuesSorter is 
concerned, I would rather have the abstraction at the level of the 
DocComparator than the Sorter. This would allow 
{{Sorter.sort(int,DocComparator)}} to quickly return null without allocating 
(potentially lots of) memory for the doc maps if the reader is already sorted. 
Additionally, this would allow for more readable diagnostics, such as 
DocValues(fieldName,desc) instead of Reverse(DocValues(fieldName,asc)).
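The "return null when already sorted" idea can be sketched as follows; the method shape and names here are assumptions for illustration, not Lucene's actual Sorter/DocComparator classes:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SorterSketch {
    // Returns a newDoc -> oldDoc map, or null when the docs are already in
    // order, so that no doc-map memory needs to be allocated at all.
    static int[] sort(List<Long> valuePerDoc, Comparator<Long> cmp) {
        boolean alreadySorted = true;
        for (int i = 1; i < valuePerDoc.size(); i++) {
            if (cmp.compare(valuePerDoc.get(i - 1), valuePerDoc.get(i)) > 0) {
                alreadySorted = false;
                break;
            }
        }
        if (alreadySorted) {
            return null; // nothing to do, and nothing allocated
        }
        // Sort doc ids by their value, then flatten to a primitive map.
        Integer[] docs = new Integer[valuePerDoc.size()];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Arrays.sort(docs, (a, b) -> cmp.compare(valuePerDoc.get(a), valuePerDoc.get(b)));
        int[] newToOld = new int[docs.length];
        for (int i = 0; i < docs.length; i++) newToOld[i] = docs[i];
        return newToOld;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            sort(List.of(3L, 1L, 2L), Comparator.<Long>naturalOrder())));
    }
}
```

Putting the direction inside the comparator (ascending vs. descending) rather than wrapping the whole sorter is what keeps the early-exit check and the diagnostics simple.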


 Sorter API: Make NumericDocValuesSorter able to sort in reverse order
 -

 Key: LUCENE-4904
 URL: https://issues.apache.org/jira/browse/LUCENE-4904
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Trivial
  Labels: newdev
 Fix For: 4.3

 Attachments: LUCENE-4904.patch, LUCENE-4904.patch, LUCENE-4904.patch


 Today it is only able to sort in ascending order.




[jira] [Commented] (SOLR-4431) Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.

2013-04-09 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626992#comment-13626992
 ] 

Shawn Heisey commented on SOLR-4431:


-1 here too, can't think of any additional reasons beyond those Hoss has stated.

Everything except this patch's specific tie to jetty would be solved if we 
could put the API to access cores and collections into its own url path, such 
as /api/corename (horrible name, just for illustration purposes).  That is an 
idea with its own problems, though.  The current URL scheme is so widely used 
that there'd be no way we could remove backward compatibility until at least 
6.0.  Note: I actually think this would be a good move, but I would not expect 
to see a lot of support for it.


 Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.
 

 Key: SOLR-4431
 URL: https://issues.apache.org/jira/browse/SOLR-4431
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.1
Reporter: Mark Bennett
 Attachments: SOLR-4431.patch


 Currently the URL to get the cloud UI is 
 http://172.16.10.236:8983/solr/#/~cloud
 The path and anchor portion is very strange: /solr/#/~cloud
 Ideally it would just be /cloud
 Or even just /, and if it's in cloud mode, take the admin to the right 
 place.
 If there's some internal important structural reason for /solr, # and 
 ~cloud sections, perhaps each would need to be addressed.
 Another option would be to possibly put something in the default Jetty xml file 
 to handle this as some type of redirect or registered handler.




[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility

2013-04-09 Thread yuanyun.cn (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627022#comment-13627022
 ] 

yuanyun.cn commented on SOLR-4658:
--

Steve, thanks for your excellent work.
I ran into one small issue when using this feature: in our schema we define a 
fieldType with a tokenizer, MyPathHierarchyTokenizerFactory, which lives in 
the package org.apache.lucene.analysis. -- This is not ideal, but the factory 
class has been in that package for a long time.
{code:xml} 
<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.MyPathHierarchyTokenizerFactory" delimiter="\" replace="/"/>
  </analyzer>
</fieldType>
{code} 

After the upgrade, the name is shortened to solr.MyPathHierarchyTokenizerFactory 
by org.apache.solr.schema.FieldType.getShortName(String):
private static final Pattern SHORTENABLE_PACKAGE_PATTERN = 
Pattern.compile("org\\.apache\\.(?:lucene\\.analysis(?=.).*|solr\\.(?:analysis|schema))\\.([^.]+)$");

Later, when I restart my Solr server, it fails with the following error:
Caused by: org.apache.solr.common.SolrException: Error loading class 
'solr.MyPathHierarchyTokenizerFactory'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:440)

This is because SolrResourceLoader.findClass tries to load the class from 
subpackages of org.apache.solr; it can't find it there, so it throws 
ClassNotFoundException:
base = "org.apache.solr";
String name = base + '.' + subpackage + newName;
return clazz = Class.forName(name, true, classLoader).asSubclass(expectedType);

I think we can change SHORTENABLE_PACKAGE_PATTERN to: 
Pattern.compile("org\\.apache\\.(?:solr\\.(?:analysis|schema))\\.([^.]+)$");
After changing SHORTENABLE_PACKAGE_PATTERN like this, it works for me.
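The shortening behaviour described above can be reproduced with a small standalone demo; the getShortName method here is an approximation reconstructed from the quoted pattern, not the actual Solr source:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortNameDemo {
    // Pattern as quoted above from org.apache.solr.schema.FieldType (Solr 4.x).
    static final Pattern SHORTENABLE_PACKAGE_PATTERN = Pattern.compile(
        "org\\.apache\\.(?:lucene\\.analysis(?=.).*|solr\\.(?:analysis|schema))\\.([^.]+)$");

    // Approximation of FieldType.getShortName: collapse a shortenable
    // package prefix to the "solr." alias.
    static String getShortName(String fullyQualifiedName) {
        Matcher m = SHORTENABLE_PACKAGE_PATTERN.matcher(fullyQualifiedName);
        return m.matches() ? "solr." + m.group(1) : fullyQualifiedName;
    }

    public static void main(String[] args) {
        // A custom factory under org.apache.lucene.analysis is also shortened,
        // which SolrResourceLoader cannot resolve back to its real package.
        System.out.println(getShortName("org.apache.lucene.analysis.MyPathHierarchyTokenizerFactory"));
    }
}
```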

 In preparation for dynamic schema modification via REST API, add a managed 
 schema facility
 

 Key: SOLR-4658
 URL: https://issues.apache.org/jira/browse/SOLR-4658
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.3, 5.0

 Attachments: SOLR-4658.patch, SOLR-4658.patch


 The idea is to have a set of configuration items in {{solrconfig.xml}}:
 {code:xml}
 <schema managed="true" mutable="true" 
 managedSchemaResourceName="managed-schema"/>
 {code} 
 It will be a precondition for future dynamic schema modification APIs that 
 {{mutable="true"}}.  {{solrconfig.xml}} parsing will fail if 
 {{mutable="true"}} but {{managed="false"}}.
 When {{managed="true"}}, and the resource named in 
 {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade 
 the schema to managed: the non-managed schema resource (typically 
 {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} 
 under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at 
 {{/configs/$configName/}}, and the non-managed schema resource is renamed by 
 appending {{.bak}}, e.g. {{schema.xml.bak}}.
 Once the upgrade has taken place, users can get the full schema from the 
 {{/schema?wt=schema.xml}} REST API, and can use this as the basis for 
 modifications which can then be used to manually downgrade back to 
 non-managed schema: put the {{schema.xml}} in place, then add {{<schema 
 managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} 
 element, since {{managed="false"}} is the default).
 If users take no action, then Solr behaves the same as always: the example 
 {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
 For a discussion of rationale for this feature, see 
 [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the 
 thread Dynamic schema design: feedback requested 
 [http://markmail.org/message/76zj24dru2gkop7b]:
  
 {quote}
 Ignoring for a moment what format is used to persist schema information, I 
 think it's important to have a conceptual distinction between "data" that 
 is managed by applications and manipulated by a REST API, and "config" 
 that is managed by the user and loaded by solr on init -- or via an 
 explicit "reload config" REST API.
 Past experience with how users perceive(d) solr.xml has heavily reinforced 
 this opinion: on one hand, it's a place users must specify some config 
 information -- so people want to be able to keep it in version control 
 with other config files.  On the other hand it's a "live" data file that 
 is rewritten by solr when cores are added.  (God help you if you want to do a 
 rolling deploy of a new version of solr.xml where you've edited some of the 
 config values while simultaneously clients are creating new SolrCores.)
 As we move forward towards having REST APIs that 

[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding

2013-04-09 Thread matt knecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

matt knecht updated SOLR-4696:
--

Attachment: solrconfig.xml

solrconfig mostly default, except for:

 <autoCommit> 
   <!-- 30 minute auto commit -->
   <maxTime>180</maxTime> 
   <maxTime>10</maxTime>
   <openSearcher>false</openSearcher> 
 </autoCommit>
   <autoSoftCommit> 
 <maxTime>1000</maxTime>
   </autoSoftCommit>


 All threads become blocked resulting in hang when bulk adding
 -

 Key: SOLR-4696
 URL: https://issues.apache.org/jira/browse/SOLR-4696
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.1, 4.2, 4.2.1
 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic
 Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
 KVM, 4xCPU, 5GB RAM, 4GB heap.
 4 cores, 2 shards, 2 nodes, tomcat7
Reporter: matt knecht
  Labels: hang
 Attachments: screenshot-1.jpg, solrconfig.xml, solr.jstack.1, 
 solr.jstack.2


 During a bulk load after about 150,000 documents load, thread usage spikes, 
 solr no longer processes any documents.  Any additional documents added 
 result in a new thread until the pool is exhausted.




[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627026#comment-13627026
 ] 

Robert Muir commented on SOLR-4658:
---

I mentioned this same bug as it applies to similarities on the dev list a week 
or so ago!

 In preparation for dynamic schema modification via REST API, add a managed 
 schema facility
 

 Key: SOLR-4658
 URL: https://issues.apache.org/jira/browse/SOLR-4658
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.3, 5.0

 Attachments: SOLR-4658.patch, SOLR-4658.patch


 The idea is to have a set of configuration items in {{solrconfig.xml}}:
 {code:xml}
 <schema managed="true" mutable="true"
 managedSchemaResourceName="managed-schema"/>
 {code}
 It will be a precondition for future dynamic schema modification APIs that 
 {{mutable="true"}}.  {{solrconfig.xml}} parsing will fail if 
 {{mutable="true"}} but {{managed="false"}}.
 When {{managed="true"}}, and the resource named in 
 {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade 
 the schema to managed: the non-managed schema resource (typically 
 {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} 
 under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at 
 {{/configs/$configName/}}, and the non-managed schema resource is renamed by 
 appending {{.bak}}, e.g. {{schema.xml.bak}}.
 Once the upgrade has taken place, users can get the full schema from the 
 {{/schema?wt=schema.xml}} REST API, and can use this as the basis for 
 modifications which can then be used to manually downgrade back to 
 non-managed schema: put the {{schema.xml}} in place, then add {{<schema 
 managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} 
 element, since {{managed="false"}} is the default).
 If users take no action, then Solr behaves the same as always: the example 
 {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
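 A minimal side-by-side sketch of the two modes (a reconstruction based on the 
 snippet above; the comments are editorial, not from Solr's example config):
 {code:xml}
 <!-- managed: schema is persisted as "managed-schema" and may later be -->
 <!-- modified via the planned REST APIs -->
 <schema managed="true" mutable="true"
         managedSchemaResourceName="managed-schema"/>

 <!-- non-managed (the default): a hand-edited schema.xml is authoritative -->
 <schema managed="false"/>
 {code}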
 For a discussion of rationale for this feature, see 
 [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the 
 thread Dynamic schema design: feedback requested 
 [http://markmail.org/message/76zj24dru2gkop7b]:
  
 {quote}
 Ignoring for a moment what format is used to persist schema information, I 
 think it's important to have a conceptual distinction between "data" that 
 is managed by applications and manipulated by a REST API, and "config" 
 that is managed by the user and loaded by solr on init -- or via an 
 explicit "reload config" REST API.
 Past experience with how users perceive(d) solr.xml has heavily reinforced 
 this opinion: on one hand, it's a place users must specify some config 
 information -- so people want to be able to keep it in version control 
 with other config files.  On the other hand it's a live data file that 
 is rewritten by solr when cores are added.  (God help you if you want to do a 
 rolling deploy of a new version of solr.xml where you've edited some of the 
 config values while simultaneously clients are creating new SolrCores.)
 As we move forward towards having REST APIs that treat schema information 
 as "data" that can be manipulated, I anticipate the same types of 
 confusion, misunderstanding, and grumblings if we try to use the same 
 pattern of treating the existing schema.xml (or some new schema.json) as a 
 hybrid configs & data file.  "Edit it by hand if you want, the /schema/* 
 REST API will too!"  ... Even assuming we don't make any of the same 
 technical mistakes that have caused problems with solr.xml round-tripping 
 in the past (ie: losing comments, reading new config options that we 
 forget to write back out, etc...) I'm fairly certain there is still going 
 to be a lot of things that will look weird and confusing to people.
 (XML may have been designed to be both human readable & writable and 
 machine readable & writable, but practically speaking it's hard to have a 
 single XML file be machine and human readable & writable.)
 I think it would make a lot of sense -- not just in terms of 
 implementation but also for end user clarity -- to have some simple, 
 straightforward to understand caveats about maintaining schema 
 information...
 1) If you want to keep schema information in an authoritative config file 
 that you can manually edit, then the /schema REST API will be read only. 
 2) If you wish to use the /schema REST API for read and write operations, 
 then schema information will be persisted under the covers in a data store 
 whose format is an implementation detail just like the index file format.
 3) If you are using a schema config file and you wish to switch to using 
 the /schema 

[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #296: POMs out of sync

2013-04-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/296/

1 tests failed.
FAILED:  org.apache.solr.cloud.ChaosMonkeyShardSplitTest.testDistribSearch

Error Message:
Wrong doc count on shard1_1 expected:<49> but was:<50>

Stack Trace:
java.lang.AssertionError: Wrong doc count on shard1_1 expected:<49> but was:<50>
at 
__randomizedtesting.SeedInfo.seed([4DC904688FE1877E:CC2F8A70F8BEE742]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.solr.cloud.ChaosMonkeyShardSplitTest.doTest(ChaosMonkeyShardSplitTest.java:274)




Build Log:
[...truncated 23442 lines...]



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b84) - Build # 5043 - Still Failing!

2013-04-09 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5043/
Java: 64bit/jdk1.8.0-ea-b84 -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin 
{#8 seed=[9E472C74036F0BA9:AB033BAA3F573984]}

Error Message:
Didn't match 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@52e14672
 in Rect(minX=16.0,maxX=232.0,minY=-54.0,maxY=128.0) Expect: [3, 4] (of 1)

Stack Trace:
java.lang.AssertionError: Didn't match 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@52e14672
 in Rect(minX=16.0,maxX=232.0,minY=-54.0,maxY=128.0) Expect: [3, 4] (of 1)
at 
__randomizedtesting.SeedInfo.seed([9E472C74036F0BA9:AB033BAA3F573984]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:186)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:487)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:722)




Build Log:
[...truncated 8097 lines...]
[junit4:junit4] 

[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager

2013-04-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627065#comment-13627065
 ] 

Michael McCandless commented on LUCENE-3786:


{quote}
bq. that decRef could have closed the reader

Hmm ... if we assume that this TR/IR pair is managed only by that manager, then 
an IOE thrown from decRef could only be caused by closing the reader, right?
So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is 
closed already right? Therefore there's no point to even tryIncRef?
{quote}

You're right ... so I just left the two decRefs in the patch ...
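The conclusion above (just run both decRefs in order; don't tryIncRef) can be
sketched with plain reference counters. {{PairRelease}}, {{irRefs}}, and
{{trRefs}} are hypothetical stand-ins for the IndexReader/TaxonomyReader pair;
this is not the Lucene implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two independent reference counts that the manager must release together,
// mimicking the IR/TR pair managed solely by SearcherTaxoManager.
public class PairRelease {
    static AtomicInteger irRefs = new AtomicInteger(1);
    static AtomicInteger trRefs = new AtomicInteger(1);

    static void release() {
        // Decrement the IndexReader count first; even if that "closes" the
        // reader (count reaches 0), we still decrement the TaxonomyReader
        // count rather than attempting any re-acquisition.
        irRefs.decrementAndGet();
        trRefs.decrementAndGet();
    }

    public static void main(String[] args) {
        release();
        if (irRefs.get() != 0 || trRefs.get() != 0) {
            throw new AssertionError("both refs should reach 0");
        }
        System.out.println("ok");
    }
}
```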

 Create SearcherTaxoManager
 --

 Key: LUCENE-3786
 URL: https://issues.apache.org/jira/browse/LUCENE-3786
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-3786-3x-nocommit.patch, LUCENE-3786.patch, 
 LUCENE-3786.patch


 If an application wants to use an IndexSearcher and TaxonomyReader in a 
 SearcherManager-like fashion, it cannot use a separate SearcherManager and, 
 say, a TaxonomyReaderManager, because the IndexSearcher and TaxoReader 
 instances need to be in sync. That is, the IS-TR pair must match, or 
 otherwise the category ordinals that are encoded in the search index might 
 not match the ones in the taxonomy index.
 This can happen if someone reopens the IndexSearcher's IndexReader, but does 
 not refresh the TaxonomyReader, and the category ordinals that exist in the 
 reopened IndexReader are not yet visible to the TaxonomyReader instance.
 I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which 
 manages an IndexSearcher and TaxonomyReader pair. Then an application will 
 call:
 {code}
 SearcherTaxoPair pair = manager.acquire();
 try {
   IndexSearcher searcher = pair.searcher;
   TaxonomyReader taxoReader = pair.taxoReader;
   // do something with them
 } finally {
   manager.release(pair);
   pair = null;
 }
 {code}



[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility

2013-04-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627087#comment-13627087
 ] 

Steve Rowe commented on SOLR-4658:
--

Hi yuanyun,

Thanks for the bug report.

The problem isn't that {{SHORTENABLE_PACKAGE_PATTERN}} includes factories under 
{{org.apache.lucene.analysis}} - most of the shared Lucene/Solr analysis 
factories live there now - but rather that users can use the same package for 
their own code, which is what you've done.

The issue is serialization: as currently written, the user's class="whatever" 
is lost, and the serialization code attempts to reconstitute it on output.  I 
think the fix is to stop guessing what it should be, and just reuse the exact 
string supplied by the user in the original file when persisting the schema.

I'll make a patch.
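The round-trip problem can be illustrated with a toy shortener. The package
prefix and the {{solr.}} aliasing rule below are simplifications for
illustration, not Solr's actual serialization code; {{ShortenDemo}} is a
hypothetical name.

```java
// Toy version of the serialization bug described above: on output, the
// writer "re-shortens" any factory under org.apache.lucene.analysis to a
// solr.* alias, so a user class that merely lives in that package is
// persisted under the wrong name. Persisting the user's original string
// verbatim avoids the guess entirely.
public class ShortenDemo {
    static String shortenOnOutput(String className) {
        String prefix = "org.apache.lucene.analysis.";
        if (className.startsWith(prefix)) {
            return "solr." + className.substring(prefix.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // A user-supplied class that happens to use the shared package:
        String userClass = "org.apache.lucene.analysis.MyCustomFilterFactory";
        // The guessed alias no longer names the user's class:
        System.out.println(shortenOnOutput(userClass)); // solr.MyCustomFilterFactory
    }
}
```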

 In preparation for dynamic schema modification via REST API, add a managed 
 schema facility
 

 Key: SOLR-4658
 URL: https://issues.apache.org/jira/browse/SOLR-4658
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.3, 5.0

 Attachments: SOLR-4658.patch, SOLR-4658.patch



[jira] [Commented] (SOLR-4686) HTMLStripCharFilter and Highlighter generates invalid HTML

2013-04-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627153#comment-13627153
 ] 

Steve Rowe commented on SOLR-4686:
--

Hi Holger,

I wrote the latest version of HTMLStripCharFilter, and the behavior you 
describe is expected, though obviously not good.

The problem is that when a CharFilter replaces an input sequence with a 
differently-sized output sequence, it has to decide how to map the offsets 
back.  All of the CharFilters I've looked at map the end offsets for smaller 
output sequences to the end offset of the larger input sequence.  I suppose a 
CharFilter could make different choices, though, as long as it did so 
consistently.

HTMLStripCharFilter could change offset mappings for end tags to point at the 
offset of the *beginning* of the input sequence, while keeping offset mappings 
for start tags the same as they are now for all tags: at the offset of the 
*end* of the input sequence.  {{<a>xxx</a>}} would then be highlighted as 
{{<a><em>xxx</em></a>}}.

But fixing this one issue won't solve the general problem.  An example: if 
HTMLStripCharFilter were to change offset mappings for end tags as described 
above, {{<b>x</b><i>xx</i>}} would still result in 
{{<b><em>x</b><i>xx</em></i>}}, which is problematic in a way that 
modifications to HTMLStripCharFilter can't fix.

It's worth noting that HTMLTidy can fix up your example, but doesn't properly 
handle my example - I tested with the cmdline version on OS X.

My surface reading of Highlighter and Formatter classes makes me think that 
there is no natural plugin point right now for an HTML-aware boundary insertion 
mechanism.  

I suspect that the low complaint volume to date is a result of the lenient 
HTML parsing browsers do; even though the output HTML is invalid, it (usually?) 
looks okay anyway.
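The offset behavior described above can be reproduced with plain string
slicing. The offsets 3 and 10 are taken from the SOLR-4686 report;
{{OffsetDemo}} is a hypothetical illustration and uses no Lucene classes.

```java
// Given stripped output "xxx" whose start offset maps to 3 and whose end
// offset maps to 10 (past the </a>), inserting highlight tags at those
// offsets in the original markup produces crossed, invalid HTML.
public class OffsetDemo {
    static String highlight(String html, int start, int end) {
        // Build left-to-right so the original offsets stay valid.
        return html.substring(0, start) + "<em>"
             + html.substring(start, end) + "</em>"
             + html.substring(end);
    }

    public static void main(String[] args) {
        String out = highlight("<a>xxx</a>", 3, 10);
        // The match end maps past </a>, so the tags cross:
        System.out.println(out); // <a><em>xxx</a></em>
    }
}
```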

 HTMLStripCharFilter and Highlighter generates invalid HTML
 --

 Key: SOLR-4686
 URL: https://issues.apache.org/jira/browse/SOLR-4686
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 4.1
Reporter: Holger Floerke
  Labels: HTML, highlighter

 Using the HTMLStripCharFilter may yield an invalid HTML highlight.
 The HTMLStripCharFilter has special treatment of inline elements (e.g. <a>, 
 <b>, ...). For these elements the CharFilter ignores the tag and does not 
 insert any split character.
 If you index
 
 <a>xxx</a>
 
 you get the word xxx starting at position 3 and ending at position 10(!) 
 If you highlight a search on xxx, you will get
 
 <a><em>xxx</a></em>
 
 which is invalid HTML.



[jira] [Updated] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index

2013-04-09 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4738:
---

Attachment: LUCENE-4738.patch

New patch with several things:

  * I folded in Rob's patch on LUCENE-2727, to have MockDirWrapper
sometimes throw IOExc in openInput and createOutput to get better
test coverage of "out of file descriptors"-like situations

  * Added a new TestIndexWriterOutOfFileDescriptors

  * Changes DirReader.indexExists back to before LUCENE-2812; I think
it's just too dangerous to try to be too smart about whether an
index exists or not, so now the method returns true if it sees any
segments file.  (These smarts were causing failures in the new
test, and caused LUCENE-4870).

  * Fixes IndexWriter so that if OpenMode is CREATE it will work even
if a corrupt index is present.  But if it's CREATE_OR_APPEND, or
APPEND then a corrupt index will cause an exc so app must manually
resolve.


 Killed JVM when first commit was running will generate a corrupted index
 

 Key: LUCENE-4738
 URL: https://issues.apache.org/jira/browse/LUCENE-4738
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
 Java: java version 1.7.0_05
 Lucene: lucene-core-4.0.0 
Reporter: Billow Gao
 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, 
 LUCENE-4738_test.patch


 1. Start a NEW IndexWriterBuilder on an empty folder,
add some documents to the index
 2. Call commit
 3. When the 0-byte segments_1 file has been created, kill the JVM.
 We end up with a corrupted index with an empty segments_1.
 The issue only occurs when the first commit crashes.
 Also, if you try to open an IndexSearcher on a new index before the first 
 commit on the index has finished, you will see an exception like:
 ===
 org.apache.lucene.index.IndexNotFoundException: no segments* file found in 
 org.apache.lucene.store.MMapDirectory@C:\tmp\testdir 
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: 
 [write.lock, _0.fdt, _0.fdx]
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
 ===
 So when a new index was created, we should first create an empty index. We 
 should not wait for the commit/close call to create the segment file.
 If we had an empty index there, it wouldn't leave a corrupted index when 
 there was a power failure during the first commit. 
 And a concurrent IndexSearcher could access the index (no match is better 
 than an exception).



[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index

2013-04-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627186#comment-13627186
 ] 

Robert Muir commented on LUCENE-4738:
-

Patch looks great. I agree with the approach; it's way too dangerous what we try 
to do today.

I also like the additional testing we have here (e.g. random FNFE, since so 
many places treat them specially).

My only comment is that loadFirstCommit confuses me (as a variable name). Is 
there something more intuitive?

 Killed JVM when first commit was running will generate a corrupted index
 

 Key: LUCENE-4738
 URL: https://issues.apache.org/jira/browse/LUCENE-4738
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
 Java: java version 1.7.0_05
 Lucene: lucene-core-4.0.0 
Reporter: Billow Gao
 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, 
 LUCENE-4738_test.patch





[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index

2013-04-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627190#comment-13627190
 ] 

Michael McCandless commented on LUCENE-4738:


bq. Is there something more intuitive?

Hmm maybe firstCommitExists?  IW only sets this to false if it was unable to 
load the segments file in CREATE.

 Killed JVM when first commit was running will generate a corrupted index
 

 Key: LUCENE-4738
 URL: https://issues.apache.org/jira/browse/LUCENE-4738
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
 Java: java version 1.7.0_05
 Lucene: lucene-core-4.0.0 
Reporter: Billow Gao
 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, 
 LUCENE-4738_test.patch





[jira] [Commented] (SOLR-4686) HTMLStripCharFilter and Highlighter generates invalid HTML

2013-04-09 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627191#comment-13627191
 ] 

Steve Rowe commented on SOLR-4686:
--

I've read that the [Jericho HTML 
parser|http://jericho.htmlparser.net/docs/index.html], implemented in Java, 
reports tag offsets, unlike many other HTML parsers, and that could be useful 
in implementing the HTML-aware boundary insertion mechanism I mentioned 
earlier.  

 HTMLStripCharFilter and Highlighter generates invalid HTML
 --

 Key: SOLR-4686
 URL: https://issues.apache.org/jira/browse/SOLR-4686
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 4.1
Reporter: Holger Floerke
  Labels: HTML, highlighter

