[jira] [Updated] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5532: Attachment: LUCENE-5532.patch same patch, just with some reordering of things in RunAutomaton.equals for faster speed. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
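For illustration, here is a minimal sketch of the safe-publication idea behind the fix; the class, fields, and helper below are hypothetical stand-ins, not the actual Automaton/RunAutomaton code. The point is simply that the numbered-states array is built and trimmed completely before it is assigned to the field that other threads read.
{code}
// Hypothetical sketch of the publication ordering, not the real Lucene source.
class NumberedStates {
  private volatile int[] numberedStates;            // read concurrently, e.g. by hashCode()

  int[] getNumberedStates() {
    int[] states = numberedStates;
    return states != null ? states : buildNumberedStates();
  }

  private int[] buildNumberedStates() {
    int[] work = new int[4];                          // may be oversized while building
    int upto = 0;
    for (int s = 0; s < liveStateCount(); s++) {
      if (upto == work.length) {
        work = java.util.Arrays.copyOf(work, work.length * 2);
      }
      work[upto++] = s;
    }
    int[] done = java.util.Arrays.copyOf(work, upto); // size down once building is finished
    numberedStates = done;                            // publish only the finished array
    return done;
  }

  private int liveStateCount() { return 7; }          // placeholder
}
{code}
With the assignment moved after the final trim, a concurrent reader either sees null (and builds its own view) or a fully consistent array; it can never observe the oversized intermediate state.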
[jira] [Updated] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5513: --- Attachment: LUCENE-5513.patch Patch makes the following refactoring changes (all internal API): * DocValuesUpdate abstract class w/ common implementation for NumericDocValuesUpdate and BinaryDocValuesUpdate. * DocValuesFieldUpdates holds the doc+updates for a single field. It mostly defines the API for the Numeric* and Binary* implementations. * DocValuesFieldUpdates.Container holds numeric+binary updates for a set of fields. It is, as its name says, a container of updates used by ReaderAndUpdates. ** It helps avoid bloating the API with more maps being passed around, and it simplifies BufferedUpdatesStream and IndexWriter.commitMergedDeletes. ** It also serves as a factory based on the update's Type. * Finished TestBinaryDVUpdates. * Added TestMixedDVUpdates which ports some of the 'big' tests from both TestNDV/BDVUpdates and mixes some NDV and BDV updates. ** I'll beast it some to make sure all edge cases are covered. I may take a crack at simplifying IW.commitMergedDeletes even more by pulling a lot of duplicate code into a method. This is impossible now because those sections modify more than one state variable, but I'll try to stuff these variables into a container to make this method saner to read. Otherwise, I think it's ready. Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60-ea-b07) - Build # 9716 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9716/ Java: 32bit/jdk1.7.0_60-ea-b07 -server -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:34990 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:34990 within 45000 ms at __randomizedtesting.SeedInfo.seed([8CEE065EE8AE1FEE:D0888469FF17FD2]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937526#comment-13937526 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578262 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578262 ] LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc array with length of at most equal to the specified size instead of length equal to at most from + size as was before. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
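To illustrate the change in array sizing, here is a simplified, hypothetical merge loop (not the actual TopDocs.merge implementation): hits before the requested page are popped and discarded instead of being stored, so the returned array holds at most size entries rather than from + size.
{code}
import java.util.PriorityQueue;
import org.apache.lucene.search.ScoreDoc;

// Hypothetical helper: byScore yields the globally best remaining hit first.
static ScoreDoc[] mergePage(PriorityQueue<ScoreDoc> byScore, int from, int size) {
  int available = Math.max(0, byScore.size() - from);
  ScoreDoc[] page = new ScoreDoc[Math.min(size, available)]; // at most size, not from + size
  for (int i = 0; i < from + size && !byScore.isEmpty(); i++) {
    ScoreDoc hit = byScore.poll();
    if (i >= from) {
      page[i - from] = hit;  // keep only hits on the requested page
    }
  }
  return page;
}
{code}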
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5476: --- Attachment: LUCENE-5476.patch New patch. I'm still not really sure about the scorings, but please take a look at it. Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
Rob Audenaerde created LUCENE-5533: -- Summary: TaxonomyFacetSumIntAssociations overflows, unpredicted results Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses an {{int[]}} to store values. If you sum a lot of integers in the IntAssociations, the {{int}} will overflow. The easiest fix seems to be to change the {{value[]}} to a {{long[]}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
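As a concrete illustration of the overflow (the per-document value of 5,000 and the document count are arbitrary example numbers): summing the same associations into an int wraps around, while a long accumulator keeps the correct total.
{code}
// Summing 1,000,000 association values of 5,000 each: the true total is
// 5,000,000,000, which does not fit in 32 bits.
int intSum = 0;
long longSum = 0L;
for (int i = 0; i < 1000000; i++) {
  int associationValue = 5000;
  intSum += associationValue;   // wraps around: ends up as 705032704
  longSum += associationValue;  // correct: 5000000000
}
System.out.println(intSum + " vs " + longSum);
{code}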
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937530#comment-13937530 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578267 from [~martijn.v.groningen] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578267 ] Merged revision 1578262 from trunk: LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc array with length of at most equal to the specified size instead of length equal to at most from + size as was before. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-5515. --- Resolution: Fixed Committed to trunk and 4x branch. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5871) Ability to see the list of fields that matched the query with scores
Alexander S. created SOLR-5871: -- Summary: Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to show users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes | No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query; it would also be nice to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has many nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5871: --- Description: Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. was: Hello, I need the ability to show users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5849) write.lock is not removed by LogReplayer
[ https://issues.apache.org/jira/browse/SOLR-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pavan patel closed SOLR-5849. - Resolution: Invalid Marking this bug invalid as this is not a Solr issue. In my application we had two cores and we were initializing the SolrCoreContainer twice in our factory class. Because of that, the write.lock issue occurred. write.lock is not removed by LogReplayer Key: SOLR-5849 URL: https://issues.apache.org/jira/browse/SOLR-5849 Project: Solr Issue Type: Bug Environment: Windows 7, Tomcat 7.0.52, Solr 4.3.0, jdk1.7.0_51 Reporter: pavan patel In my application I am using EmbeddedSolrServer inside Tomcat. I have the below configuration for my core: <lockType>simple</lockType> <unlockOnStartup>true</unlockOnStartup> <updateLog> <str name="dir">${solr.ulog.dir:}</str> </updateLog> <autoCommit> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit> The issue I am facing is that when I restart Tomcat and there is any uncommitted data in the tlog, I get the below exception: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@F:\mydir\Install\solr\conf\alerts\data\index\write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:644) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77) at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64) at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:197) at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:148) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) After the restart I am not able to index anything into Solr. I debugged the code and found that the LogReplayer during startup creates the SolrIndexWriter on the core, and that creates the write.lock file. Once all the leftover tlogs are indexed, the write.lock remains there; it is not deleted.
So when my application tries to add a document, the SolrIndexWriter is not able to obtain the lock because write.lock already exists. This seems to be a bug in Solr 4.3.0, because I believe the SolrIndexWriter created during log replay is not closed, and that is what leaves the write.lock behind in the data directory. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
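Since the root cause was a second CoreContainer (and therefore a second IndexWriter competing for the same lock), the usual remedy is to make the embedded server a process-wide singleton. The sketch below is illustrative only; the exact CoreContainer construction call differs across Solr 4.x versions.
{code}
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// Hedged sketch: create the container once and reuse it for every request.
public final class SolrHolder {
  private static volatile EmbeddedSolrServer server;

  private SolrHolder() {}

  public static EmbeddedSolrServer get(String solrHome, String coreName) {
    if (server == null) {
      synchronized (SolrHolder.class) {
        if (server == null) {
          CoreContainer container = new CoreContainer(solrHome); // constructor varies by Solr version
          container.load();
          server = new EmbeddedSolrServer(container, coreName);
        }
      }
    }
    return server;
  }
}
{code}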
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937580#comment-13937580 ] Gilad Barkai commented on LUCENE-5476: -- About the scores (the only part I got to review thus far), the scores should be a non-sparse float array. E.g., if there are 1M documents and the original set contains 1000 documents, the score[] array would be of length 1000. If the sampled set only has 10 documents, the score[] array should be of length 10. The relevant part: {code} if (getKeepScores()) { scores[doc] = docs.scores[doc]; } {code} should be changed so that the scores[] size and index are relative to the sampled set and not the original results. Also, could the size of the score[] array be the number of bins? Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
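A rough sketch of what that would look like (the names here are hypothetical, not the patch's actual fields): size and index the kept scores by position within the sampled set rather than by global doc id, so the array stays dense and only as long as the sample.
{code}
// Hypothetical helper, not the patch itself: keep one dense score slot per sampled doc.
static float[] keepSampledScores(int[] sampledDocIds, float[] scoresByDocId) {
  float[] sampledScores = new float[sampledDocIds.length];
  for (int i = 0; i < sampledDocIds.length; i++) {
    sampledScores[i] = scoresByDocId[sampledDocIds[i]];
  }
  return sampledScores;
}
{code}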
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937585#comment-13937585 ] Simon Willnauer commented on LUCENE-5532: - +1 to the patch - I agree with the change to move away from acceptsSameLanguage! The speedups are also good. Let's make sure we put this in the changes-in-runtime-behavior section of CHANGES.txt. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937594#comment-13937594 ] Uwe Schindler commented on LUCENE-5532: --- +1 to the patch. I am not sure about this, since I have not followed the latest test-framework updates: it looks to me that the thread does not use a custom name, or is the thread group inherited by the test framework? I would also change the thread to simply {{Assert.fail()}} on an Exception in the thread. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937611#comment-13937611 ] Robert Muir commented on LUCENE-5532: - Uwe, it just inherits. It's similar to many tests in the .index package that work this way. If we want to do something else, we should ban some methods. But I can name the threads if you want :) As for Assert.fail, wouldn't this lose the stack trace of the original exception? In the case of this test failing due to a thread-safety issue, I think that's useful for debugging. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937616#comment-13937616 ] Tim Allison commented on LUCENE-5205: - [~rcmuir] and community, given recent interest in LUCENE-2878, should we stop work on this? [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. 
Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937617#comment-13937617 ] Robert Muir commented on LUCENE-5205: - Tim I don't think so. I think actually it makes sense to have real current use cases for spans to ensure everything is really done correctly. This is just my opinion. I've fallen behind on the issue only because I've been busy lately. [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. 
Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937619#comment-13937619 ] Uwe Schindler commented on LUCENE-5532: --- bq. Uwe, it just inherits. It's similar to many tests in the .index package that work this way. If we want to do something else, we should ban some methods. But I can name the threads if you want All fine, just wanted to be sure. And we don't have Thread#init() on the forbidden list :-) bq. As for Assert.fail, wouldn't this lose the stack trace of the original exception? In the case of this test failing due to a thread-safety issue, I think that's useful for debugging Yes, it will lose it. I don't like RuntimeExceptions wrapping others. You can ideally do Rethrow.rethrow(ex); in tests this is, I think, the preferred way. You will get the original Exception in the thread stack dump, unwrapped. This is why we have the Rethrow class in the test framework. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937623#comment-13937623 ] Robert Muir commented on LUCENE-5532: - Rethrow is good: I'll use that! AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4878) Change default directory for infostream from CWD to dataDir
[ https://issues.apache.org/jira/browse/SOLR-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-4878: --- Fix Version/s: (was: 4.7) 4.8 Change default directory for infostream from CWD to dataDir --- Key: SOLR-4878 URL: https://issues.apache.org/jira/browse/SOLR-4878 Project: Solr Issue Type: Bug Affects Versions: 4.3 Reporter: Shawn Heisey Assignee: Shawn Heisey Fix For: 4.8 Attachments: SOLR-4878.patch, SOLR-4878.patch The default directory for the infoStream file is CWD. In a multicore system where all the cores share similar configs, the output from all cores is likely to end up in the same file. Although this is sometimes the desired outcome, it seems less than ideal. If you've got cores that literally share the same config file, or you're using SolrCloud where more than one core on the system uses the same config set, you won't have the option of putting different files in different configs. If the default directory were dataDir rather than CWD, each core would get its own infostream file. You could still get the original behavior by specifying an absolute path. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937672#comment-13937672 ] Michael McCandless commented on LUCENE-5532: +1 I love the use of startingGun in the test :) AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937673#comment-13937673 ] Michael McCandless commented on LUCENE-5515: This seems worth mentioning in CHANGES? Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937677#comment-13937677 ] Michael McCandless commented on LUCENE-5533: +1 But maybe we should break out a LongTaxonomyFacets instead? I.e., the more common case of simple facet counting would never overflow an int since a Lucene shard can have at most 2.1B docs. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937680#comment-13937680 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578300 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578300 ] LUCENE-5515: Added missing CHANGES.TXT entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937681#comment-13937681 ] Shai Erera commented on LUCENE-5476: Rob, I reviewed the patch and I agree with Gilad - the way you handle the scores array is wrong. It's not random access by doc. I believe that if you added a test, it would show up quickly. But perhaps we can keep scores out of this collector ... we can always add it later. So I don't mind if you want to wrap up w/o scores for now. Can you then fix the patch to always set keepScores=false? Also, I noticed a few sops (System.out.println calls) left in the test. Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937683#comment-13937683 ] Michael McCandless commented on LUCENE-4396: Da, that's a great discovery. So, in the case where at least one MUST clause is present, BS will in fact collect in-order, and then BS could be embedded in other queries that want a sub-scorer. This may force us to more strongly separate the notion of forcing doc-at-a-time scoring (LUCENE-2684), since today the sneaky way to do this is to return false from your Collector.acceptsDocsOutOfOrder. I think you should be careful in your proposal to keep this issue well-scoped. I.e., the overall goal is to let BS handle MUST clauses in certain cases (a heuristic needs to decide this), and then a nice-to-have is to enable BS to also be a sub-scorer in some cases. BooleanScorer should sometimes be used for MUST clauses --- Key: LUCENE-4396 URL: https://issues.apache.org/jira/browse/LUCENE-4396 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. If there is one or more MUST clauses we always use BooleanScorer2. But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count). Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 100 docs then you want to .advance() all the SHOULD clauses. I won't have near term time to work on this so feel free to take it if you are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
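For readers less familiar with the "sneaky way" mentioned above, here is a minimal sketch of how a Lucene 4.x Collector advertises its ordering requirement; the collector class itself is hypothetical, but returning false from acceptsDocsOutOfOrder is what forces an in-order scorer to be chosen.
{code}
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Hypothetical collector: it only counts hits, but demands in-order collection.
public class InOrderCountingCollector extends Collector {
  private int count;

  @Override public void setScorer(Scorer scorer) {}
  @Override public void collect(int doc) { count++; }                 // docs arrive in increasing order
  @Override public void setNextReader(AtomicReaderContext context) {}
  @Override public boolean acceptsDocsOutOfOrder() { return false; }  // refuse out-of-order scoring
  public int getCount() { return count; }
}
{code}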
[jira] [Resolved] (SOLR-5265) Add backward compatibility tests to JavaBinCodec's format.
[ https://issues.apache.org/jira/browse/SOLR-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5265. -- Resolution: Fixed Fix Version/s: 5.0 Add backward compatibility tests to JavaBinCodec's format. -- Key: SOLR-5265 URL: https://issues.apache.org/jira/browse/SOLR-5265 Project: Solr Issue Type: Test Reporter: Adrien Grand Assignee: Noble Paul Priority: Blocker Fix For: 4.8, 5.0 Attachments: SOLR-5265.patch, SOLR-5265.patch, SOLR-5265.patch, SOLR-5265.patch, javabin_backcompat.bin Since Solr guarantees backward compatibility of JavaBinCodec's format between releases, we should have tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
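For context, a minimal sketch of the kind of back-compat check this adds; the file path and the expected contents are illustrative, not necessarily what the committed test uses. Bytes written by an older JavaBinCodec are stored once, and every run asserts that the current codec still decodes them.
{code}
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.common.util.JavaBinCodec;

// Hedged sketch of a back-compat check against a previously written binary file.
public class JavaBinBackCompatCheck {
  public static void main(String[] args) throws Exception {
    InputStream in = new FileInputStream("src/test-files/solrj/javabin_backcompat.bin"); // illustrative path
    try {
      Object decoded = new JavaBinCodec().unmarshal(in); // must still read bytes written by an older release
      System.out.println("decoded: " + decoded);
    } finally {
      in.close();
    }
  }
}
{code}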
[GitHub] lucene-solr pull request: Update GreekStemmer.java
GitHub user pitsios-s opened a pull request: https://github.com/apache/lucene-solr/pull/43 Update GreekStemmer.java Added javadoc to the functions stem , endsWith , endsWithVowel and endsWithVowelNoY for the purposes of the software engineering class. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Team-DP/lucene-solr trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit 4bb599eabfe9355b823ad9809eca06c61db53e1b Author: Stamatis Pitsios stamatis@gmail.com Date: 2014-03-17T11:15:10Z Update GreekStemmer.java Added javadoc to the functions stem , endsWith , endsWithVowel and endsWithVowelNoY for the purposes of the software engineering class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937685#comment-13937685 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578305 from [~martijn.v.groningen] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578305 ] LUCENE-5515: Added missing CHANGES.TXT entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-5837: Assignee: Noble Paul (was: Mark Miller) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not Override the equals implementation. The issue will Override equals and hashCode methods to the 3 classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5837. -- Resolution: Won't Fix Fix Version/s: (was: 4.8) (was: 5.0) The equals methods were added to the test cases, so this is not required. Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not Override the equals implementation. The issue will Override equals and hashCode methods to the 3 classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937690#comment-13937690 ] Shai Erera commented on LUCENE-5533: bq. But maybe we should break out a LongTaxonomyFacets instead? +1, and especially as it's only relevant to association-based faceting. Rob, is this something you really hit or just a random code review? I agree when you sum integers there's a risk of overflowing, but I'm afraid if we introduce LongTaxoFacets users might want to use it just in case. The risk is that a single ord will overflow, right? I wonder if we should use a packed long buffer instead of a plain long[] ... that's optimization though. First let's agree that this is something that needs fixing. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937694#comment-13937694 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578308 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578308 ] LUCENE-5515: Added author to CHANGES.txt entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937696#comment-13937696 ] Shai Erera edited comment on LUCENE-5513 at 3/17/14 11:26 AM: -- Fixed silly bug in BinaryDocValuesFieldUpdates.merge(). was (Author: shaie): Fixed stupid bug in BinaryDocValuesFieldUpdates.merge(). Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5513: --- Attachment: LUCENE-5513.patch Fixed stupid bug in BinaryDocValuesFieldUpdates.merge(). Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5872) Eliminate overseer queue
Noble Paul created SOLR-5872: Summary: Eliminate overseer queue Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is * Provide batching of operations for the main clusterstate,json so that state updates are minimized * Avoid race conditions and ensure order Now , as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is , whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) # read the fresh state and version of zk node # construct the new state # perform a compare and set # if compare and set fails go to step 1 This should be limited to all operations performed on external collections because batching would be required for others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
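A minimal sketch of the proposed compare-and-set loop, written directly against the plain ZooKeeper client API. The path and the update function are placeholders; this is not the actual Overseer or ZkStateReader code.

{code:java}
// Minimal sketch of the compare-and-set loop proposed above, using the plain
// ZooKeeper client API. Path and update function are placeholders.
import java.util.function.UnaryOperator;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class StateCasSketch {
  static void updateState(ZooKeeper zk, String path, UnaryOperator<byte[]> update)
      throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      byte[] current = zk.getData(path, false, stat);   // 1. read fresh state + version
      byte[] proposed = update.apply(current);          // 2. construct the new state
      try {
        zk.setData(path, proposed, stat.getVersion());  // 3. compare-and-set on the version
        return;                                         //    success -> done
      } catch (KeeperException.BadVersionException e) {
        // 4. another writer got there first -> go back to step 1 and retry
      }
    }
  }
}
{code}

The version argument to setData is what removes the need for a single serializing queue: a concurrent writer makes the call fail with BadVersionException and the loser simply re-reads and retries.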
[jira] [Created] (LUCENE-5534) GreekStemmer javadocs
Robert Muir created LUCENE-5534: --- Summary: GreekStemmer javadocs Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5873) Improve JavaBinCodec's backward compatibility tests
Varun Thacker created SOLR-5873: --- Summary: Improve JavaBinCodec's backward compatibility tests Key: SOLR-5873 URL: https://issues.apache.org/jira/browse/SOLR-5873 Project: Solr Issue Type: Improvement Reporter: Varun Thacker SOLR-5265 added backward compatibility tests, but it tries to read a pre-written binary file to check if there is a break or not. If we add more types to JavaBinCodec the test will need to be updated too, which will be error-prone again. This is what [~hakeber] proposed on IRC - - A test that I was thinking of: we could have a jenkins job that ran a script that checked out the previous version of lucene and the latest - Then use the solr/cloud-dev scripts to start a cloud cluster - Index some docs - Stop a node at a time, replace webapp with the latest in a rolling upgrade fashion - Then we have a full rolling upgrade test This would be a better approach for back compat tests. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937704#comment-13937704 ] ASF subversion and git services commented on LUCENE-5534: - Commit 1578315 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1578315 ] LUCENE-5534: add javadocs to GreekStemmer (closes #43) GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: Update GreekStemmer.java
Github user rmuir commented on the pull request: https://github.com/apache/lucene-solr/pull/43#issuecomment-37806850 Thank you very much! I just committed this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: Update GreekStemmer.java
Github user asfgit closed the pull request at: https://github.com/apache/lucene-solr/pull/43 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937705#comment-13937705 ] ASF subversion and git services commented on LUCENE-5534: - Commit 1578317 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578317 ] LUCENE-5534: add javadocs to GreekStemmer (closes #43) GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5534. - Resolution: Fixed Fix Version/s: 5.0 4.8 GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.8, 5.0 Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937723#comment-13937723 ] Dawid Weiss commented on LUCENE-5532: - or is the thread group inherited by the test framework? The test suite runs in its own test group so any thread (unless explicitly assigned to another group) will inherit that group from its parent. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937732#comment-13937732 ] Alexander S. commented on SOLR-4787: Thank you, Kranti Parisa, I am far from java development, how can I apply this patch and build solr for linux? I tried to patch, it creates a new folder joins in solr/contrib, installed ivy and launched ant compile but got this error: {quote} common.compile-core: [mkdir] Created dir: /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java [javac] Compiling 3 source files to /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883: error: reached end of file while parsing [javac] return this.delegate.acceptsDocsOutOfOrder(); [javac]^ [javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884: error: reached end of file while parsing [javac] 2 errors [javac] 1 warning BUILD FAILED /home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed; see the compiler error output for details. Total time: 8 minutes 55 seconds {quote} Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. 
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which
Re: Reducing the number of warnings in the codebase
Hi; As the one who suggested using Sonar I want to add a few more points. First of all, tools like this report metrics beyond code warnings, and those metrics are worth surfacing even if we don't act on them. For example, code complexity is sometimes a good measure for reviewing your code. I should mention that code warnings are not listed as one flat list in Sonar; they are separated into categories. Major is an important category to pay attention to, and you can ignore the minor warnings if you want. When I use Sonar to check my team's code I sometimes find false-positive warnings; those rules can easily be dropped from Sonar. My suggestion is this: whether or not we act on Sonar's output, we should integrate our project with the Sonar instance available at Apache. I've opened a Jira issue for it: https://issues.apache.org/jira/browse/SOLR-5869 and *I volunteer to work on it.* All in all I think these tools (like PMD etc.) are sometimes really helpful. If you chase down the reason behind every reported bug, by the end you will have effectively worked through Effective Java, because the reports reference its items. I know there are false positives, but such things can be discarded easily. The other point is that these tools produce nice graphs showing the direction of your project even if you don't use their bug warnings, code-coverage metrics and so on. I've created issues for bugs (I've checked all the major ones) and *applied patches for them* previously. Some of them are: SOLR-5836, SOLR-5838, SOLR-5839, SOLR-5840, SOLR-5841, LUCENE-5506, LUCENE-5508, LUCENE-5509 Thanks; Furkan KAMACI 2014-03-16 23:34 GMT+02:00 Benson Margulies bimargul...@gmail.com: I think we avoid bikeshedding by making incremental changes. If you offer a commit to turn off serial version UID whining, I'll +1 it. And then we iterate, in small doses, agreeing to either spike the warning or change the code. In passing, I will warn you that the IDEs can be very stubborn; in some cases, there is no way to avoid some amount of whining. Eclipse used to insist on warning on every @SuppressWarnings that it didn't understand. It might still. On Sun, Mar 16, 2014 at 5:29 PM, Shawn Heisey s...@elyograg.org wrote: A starting comment: We could bikeshed for *years*. General thought: The more I think about it, the more I like the notion of confining most of the cleanup to trunk. Actual bug fixes and changes that are relatively non-invasive should be backported. On 3/16/2014 2:48 PM, Uwe Schindler wrote: Just because some tool expresses distaste, doesn't imply that everyone here agrees that it's a problem we should fix. Yes that is my biggest problem. Lots of warnings by Eclipse are just bullshit because of the code style in Lucene and for example the way we do things - e.g., it complains about missing close() all the time, just because we use IOUtils.closeWhileHandlingExceptions() for that. My original thought on this was that we should use a combination of SuppressWarnings and actual code changes to eliminate most of the warnings that show up in the well-supported IDEs when they are configured with *default* settings. Uwe brings up a really good point that there are a number of completely useless warnings, but I think there's still value in looking through EVERY default IDE warning and evaluating each one on a case-by-case basis to decide whether that specific warning should be fixed or ignored. It could be a sort of background task with an open Jira for tracking commits. It could also be something that we decide isn't worth the effort.
In my experience, the default Sonar rulesets contain many things that people here are prone to disagree with. Start with serialVersionUID: do we care? Why would we care? In what cases to we really believe that a sane person would be using Java serialization with a Lucene/Solr class? We officially don't support serialization, so all warnings are useless. It's just Eclipse that complains for no reason. Project-specific IDE settings for errors/warnings (set by the ant build target) will go a long way towards making the whole situation better. For the current stable branch, we should include settings for anything that we want to ignore on trunk, but only a subset of those problems that get elevated to error status. Sonar can also be a bit cranky; it arranges for various tools to run via mechanisms that sometimes conflict with the ways you might run them yourself. So I'd suggest a process like: 1. Someone proposes a set of (e.g.) checkstyle rules to live by. 2. That ruleset is refined by experiment. 3. We make violations fail the build. Then lather, rinse, repeat for other tools. Yes I agree. I am strongly against PMD or CheckStyle without our own rules. Forbiddeen-apis was invented because of the brokenness of PMD and CheckStyle to detect default Locale/Charset/Timezone violations (and
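For context on the missing close() complaint mentioned in this thread: Lucene often hands resources to a utility closer rather than closing them in a visible try-with-resources block, which default IDE resource-leak checks do not recognize. A rough sketch of that pattern is below, assuming Lucene's org.apache.lucene.util.IOUtils (the thread spells the method with a trailing 's'; recent versions name it closeWhileHandlingException).

{code:java}
// Rough sketch of the pattern discussed above: the stream is closed, but by a
// utility method rather than a plain try-with-resources block, so an IDE's
// default resource-leak check flags it. Assumes org.apache.lucene.util.IOUtils.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.lucene.util.IOUtils;

class CloseExample {
  static void printFirstByte(String src) throws IOException {
    InputStream in = Files.newInputStream(Paths.get(src));   // IDE: "potential resource leak"
    boolean success = false;
    try {
      System.out.println(in.read());
      success = true;
    } finally {
      if (success) {
        IOUtils.close(in);                        // rethrows any close() failure
      } else {
        IOUtils.closeWhileHandlingException(in);  // suppresses close() failure, keeps original exception
      }
    }
  }
}
{code}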
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937747#comment-13937747 ] Alexander S. commented on SOLR-4787: Nvm, there were 3 missing } at the end of HashSetJoinQParserPlugin.java, the build was successful, testing now. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist
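The filter query shown in the description can also be issued from client code; the sketch below does so with the Solr 4.x SolrJ API. The core URL, collection and field names are taken from the example above, and it assumes the hjoin parser and contrib jars are registered in solrconfig.xml as described. Treat it as an illustration rather than a tested recipe.

{code:java}
// Sketch of issuing the hjoin filter query from the description via SolrJ
// (Solr 4.x API). Host/core URL, collection and field names follow the
// example above; the hjoin parser must already be registered in solrconfig.xml.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

class HjoinExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    // Main query filtered by an hjoin against collection2, which is itself filtered by $qq.
    q.addFilterQuery("{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
    q.set("qq", "group:5");
    QueryResponse rsp = server.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
    server.shutdown();
  }
}
{code}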
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5476: --- Attachment: LUCENE-5476.patch Removed scores. Added javadoc explaining what happens to scores. Removed System.out.println Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
Rob Audenaerde created LUCENE-5535: -- Summary: DrillDownQuery not working with AssociateFacetFields? Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5535: --- Attachment: AssociationsFacetsWithDrilldownExample.java DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60-ea-b07) - Build # 9719 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9719/ Java: 32bit/jdk1.7.0_60-ea-b07 -client -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:59868 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:59868 within 45000 ms at __randomizedtesting.SeedInfo.seed([7B2763E3A20565BB:FAC1EDFBD55A0587]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937767#comment-13937767 ] Shai Erera commented on LUCENE-5535: Have you looked at AssociationsFacetsExample under lucene/demo? It has a drilldown() example too. Also, I ran the example code you attached and it produced: {noformat} Sum associations example: - tags: dim=tags path=[] value=-1 childCount=2 lucene (4) solr (2) genre: dim=genre path=[] value=-1.0 childCount=2 computing (1.62) software (0.34) Count withouth associations: - tags: dim=tags path=[] value=-1 childCount=2 lucene (2) solr (1) {noformat} Where is the problem? DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2733) DIH - Ignoring Error when closing connection when send command abort in jdbc 5.1.17
[ https://issues.apache.org/jira/browse/SOLR-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937766#comment-13937766 ] Manjunath commented on SOLR-2733: - I did get the same error but different occasion. Here is the stack trace {code:xml} ERROR org.apache.solr.handler.dataimport.JdbcDataSource – Ignoring Error when closing connection java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@479abcd4 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries. at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924) at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3314) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2477) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809) at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5165) at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5048) at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4654) at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1630) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421) at org.apache.solr.handler.dataimport.DebugLogger$2.close(DebugLogger.java:180) at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:294) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:283) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) {code} DIH -
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937770#comment-13937770 ] Rob Audenaerde commented on LUCENE-5533: I hit this pretty easily. I tried to build an aggregator that sums the associated values for a given search. In my testcase, I used 1M documents. The n-th document had {{n}} as its associated int value, so the average int value is 500,000 and the total sum is roughly 500,000 x 1M = 5x10^11, far beyond the int limit of 2,147,483,647. I currently switched to using floats, which for me gives results that are accurate enough and also allows for numbers greater than {{Integer.MAX_VALUE}}, so I'm not really sure it is a problem. Maybe there should be a {{RuntimeException}} if the accumulated value overflows? TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses an {{int[]}} to store values. If you sum a lot of integers in the IntAssociations, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937783#comment-13937783 ] Rob Audenaerde commented on LUCENE-5535: Maybe then I made a mistake somewhere else, because this is my result: {noformat} Sum associations example: - tags: null genre: null Count withouth associations: - tags: dim=tags path=[] value=-1 childCount=2 lucene (2) solr (1) {noformat} I'll try to double check asap. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5536) TaxonomyFacetSumInt/FloatAssociations should not rollup()
Shai Erera created LUCENE-5536: -- Summary: TaxonomyFacetSumInt/FloatAssociations should not rollup() Key: LUCENE-5536 URL: https://issues.apache.org/jira/browse/LUCENE-5536 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Shai Erera Stumbled upon this by accident when I reviewed the code. The previous associations impl never rolled up. The assumption is that association values are given to exact categories and have no hierarchical meaning. For instance, if a document is associated with two categories {{Category/CS/Algo}} and {{Category/CS/DataStructure}} with weights {{0.95}} and {{0.43}} respectively, it is not associated with {{Category/CS}} with weight {{1.38}}! :) If the app wants the association values to apply to parents in the hierarchy as well, it needs to specify that explicitly (as in passing the hierarchy categories with their own association values). I will fix the bug and also make sure the app cannot trip it by accidentally marking these categories hierarchical, or that if it does (because e.g. it indexes the categories for both counting and assoc values) then we don't apply the association to all the categories in the hierarchy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937785#comment-13937785 ] Shai Erera commented on LUCENE-5533: I see, it is a pretty extreme case indeed! We never hit overflow problems in the past :). The problem w/ raising RuntimeException is it means adding an {{if}} to every aggregation, which is costly and for a really extreme case. I think it's better if you write your own TaxoFacetSumLongAssoc to use a long[], packed-ints, float[] or whatever and raise these exceptions yourself? Also, perhaps it's ok to e.g. stop at weight=1B/2.1B to denote that this category is already very important and all categories beyond this weight are equally important? Not sure of your usecase and if it makes sense, but juts a thought. That too can easily be done in your own Facets impl. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937793#comment-13937793 ] Shai Erera commented on LUCENE-5535: Phew :). I'll resolve the issue then. Feel free to reopen if it still doesn't work. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937798#comment-13937798 ] Shai Erera commented on LUCENE-5535: I think you may be tripping LUCENE-5522, which I fixed a few days ago. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5535. Resolution: Duplicate Assignee: Shai Erera DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Assignee: Shai Erera Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937789#comment-13937789 ] Rob Audenaerde commented on LUCENE-5535: I think I used an older revision :/ DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5874) Unsafe cast in RouteException
David Arthur created SOLR-5874: -- Summary: Unsafe cast in RouteException Key: SOLR-5874 URL: https://issues.apache.org/jira/browse/SOLR-5874 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.6.1 Reporter: David Arthur When a non-Exception is thrown somewhere in the CloudSolrServer, I get a XXX cannot be cast to java.lang.Exception {code} java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast to java.lang.Exception at org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException.init(CloudSolrServer.java:484) at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:351) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) {code} Should probably cast to Throwable, or do a check and wrap non-Exceptions in an Exception first -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
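A minimal sketch of the wrap-or-cast approach suggested in the report (illustration only, not the actual CloudSolrServer.RouteException code):

{code:java}
// Minimal sketch of the fix suggested above: accept a Throwable and only cast
// it to Exception when it actually is one; otherwise wrap it. Illustration
// only -- not the actual CloudSolrServer.RouteException code.
class SafeWrapSketch {
  static Exception asException(Throwable t) {
    if (t instanceof Exception) {
      return (Exception) t;        // safe cast
    }
    // Errors such as NoClassDefFoundError are wrapped instead of cast.
    return new Exception(t);
  }
}
{code}

With a guard like this, an Error such as the NoClassDefFoundError in the stack trace above is carried as the cause of a new Exception instead of triggering a ClassCastException.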
[jira] [Updated] (SOLR-5664) /browse: Show all highlighting fragments
[ https://issues.apache.org/jira/browse/SOLR-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-5664: -- Description: Currently if there are more highlighting fragments for the features field in the example, only the first one is rendered in the /browse GUI (was: Currently if there are more highlighting fragments, only the first one is rendered in the /browse GUI) /browse: Show all highlighting fragments Key: SOLR-5664 URL: https://issues.apache.org/jira/browse/SOLR-5664 Project: Solr Issue Type: Bug Components: contrib - Velocity Reporter: Jan Høydahl Fix For: 4.8 Currently if there are more highlighting fragments for the features field in the example, only the first one is rendered in the /browse GUI -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed SOLR-3613. - Resolution: Won't Fix As Solr is moving away from being a deployable war, this issue becomes less relevant. Closing. Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.8 Attachments: SOLR-3613.patch Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4495) solr.xml sharedLib attribtue should take a list of paths
[ https://issues.apache.org/jira/browse/SOLR-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937840#comment-13937840 ] Jan Høydahl commented on SOLR-4495: --- What do people feel is best here? * (A) {{;}} separated string in {{str name=sharedLib}} * (B) Multiple occurrences of the tag {{str name=sharedLib}} solr.xml sharedLib attribtue should take a list of paths Key: SOLR-4495 URL: https://issues.apache.org/jira/browse/SOLR-4495 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jan Høydahl Labels: classpath, solr.xml Fix For: 4.8 Attachments: SOLR-4495.patch solr.xml's sharedLib is a great way to add plugins that should be shared across all cores/collections. For increased flexibility I would like for it to take a list of paths. Then I'd put Solr's own contrib libs in one shared folder solrJars and custom plugins with deps in another customerJars. That eases Solr upgrades, then we can simply wipe and replace all jars in solrJars during upgrade. I realize that solr.xml is going away, and so the same request will be valid for whatever replaces solr.xml, whether it be system prop or properties file. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4495) solr.xml sharedLib attribute should take a list of paths
[ https://issues.apache.org/jira/browse/SOLR-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4495: -- Summary: solr.xml sharedLib attribute should take a list of paths (was: solr.xml sharedLib attribtue should take a list of paths) solr.xml sharedLib attribute should take a list of paths Key: SOLR-4495 URL: https://issues.apache.org/jira/browse/SOLR-4495 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jan Høydahl Labels: classpath, solr.xml Fix For: 4.8 Attachments: SOLR-4495.patch solr.xml's sharedLib is a great way to add plugins that should be shared across all cores/collections. For increased flexibility I would like for it to take a list of paths. Then I'd put Solr's own contrib libs in one shared folder solrJars and custom plugins with deps in another customerJars. That eases Solr upgrades, then we can simply wipe and replace all jars in solrJars during upgrade. I realize that solr.xml is going away, and so the same request will be valid for whatever replaces solr.xml, whether it be system prop or properties file. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5872) Eliminate overseer queue
[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937842#comment-13937842 ] Yonik Seeley commented on SOLR-5872: bq. as we move the individual collection states out of the main clusterstate.json [...] This will make a difference on clusters with many smaller collections, but not on the single big collection. It seems like we still want scalability in both directions (wrt number of collections, and the size a single collection can be). Eliminate overseer queue - Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is * Provide batching of operations for the main clusterstate,json so that state updates are minimized * Avoid race conditions and ensure order Now , as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is , whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) # read the fresh state and version of zk node # construct the new state # perform a compare and set # if compare and set fails go to step 1 This should be limited to all operations performed on external collections because batching would be required for others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3135) New binary request/response format using Avro
[ https://issues.apache.org/jira/browse/SOLR-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3135: -- Fix Version/s: (was: 4.8) New binary request/response format using Avro - Key: SOLR-3135 URL: https://issues.apache.org/jira/browse/SOLR-3135 Project: Solr Issue Type: New Feature Components: Response Writers, search Reporter: Jan Høydahl Labels: Avro, RequestHandler, ResponseWriter, serialization Solr does not have a binary request/response format which can be supported by any client/programming language. The JavaBin format is Java only and is also not standards based. The proposal (spinoff from SOLR-1535 and SOLR-2204) is to investigate creation of an [Apache Avro|http://avro.apache.org/] based serialization format. First goal is to produce Avro [Schemas|http://avro.apache.org/docs/current/#schemas] for Request and Response and then provide {{AvroRequestHandler}} and {{AvroResponseWriter}}. Secondary goal is to use it for replication. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937845#comment-13937845 ] Alexander S. commented on SOLR-4787: Kranti, Do I need to update anything in my solr config/schema? I've just tried the patched version and it still ignores the fq parameter. I was using solr 4.7.0. Thanks, Alex Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. 
The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr
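A minimal SolrJ sketch of issuing the example hjoin filter query described above. It assumes the plugin has been registered in solrconfig.xml as shown, and that collection2 and the integer id_i join field exist; the core name and base URL are placeholders.
{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HjoinExample {
  public static void main(String[] args) throws SolrServerException {
    // Talk to the "main" core; collection2 is the fromIndex, as in the example above.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    // The hjoin filter query from the description: search collection2 for user:customer1,
    // apply the nested fq ($qq), and join back on the id_i field using 6 threads.
    q.addFilterQuery("{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
    q.set("qq", "group:5");

    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());
  }
}
{noformat}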
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937853#comment-13937853 ] Furkan KAMACI commented on SOLR-5852: - [~elyograg] ZooKeeper's ConnectStringParser already checks the chroot and other invalid inputs. We can delegate that validation to ZooKeeper: if ZooKeeper's checks ever change, our CloudSolrServer will not be affected, because we simply pass the string through and ZooKeeper handles it. I think we can also handle chroot with the current approach. ZooKeeper expects a connect string like 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a, so I can improve the javadoc to say: if there is a chroot, append it once after the last host string (this is how the original ZooKeeper code works). If anybody sends multiple chroot definitions, or anything else invalid, ZooKeeper will return an error. Another approach is to accept 127.0.0.1:3000/app/a,127.0.0.1:3001/app/a,127.0.0.1:3002/app/a and parse out the chroot ourselves, checking that it is present and identical for all hosts. Add CloudSolrServer helper method to connect to a ZK ensemble - Key: SOLR-5852 URL: https://issues.apache.org/jira/browse/SOLR-5852 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-5852.patch, SOLR-5852_FK.patch We should have a CloudSolrServer constructor which takes a list of ZK servers to connect to. Something like {noformat} public CloudSolrServer(String... zkHost); {noformat} - Document the current constructor better to mention that to connect to a ZK ensemble you can pass a comma-delimited list of ZK servers like zk1:2181,zk2:2181,zk3:2181 - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
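A small sketch of the convention discussed above: hosts are comma-delimited and a single optional chroot is appended once, after the last host, which is the form ZooKeeper's ConnectStringParser accepts. The helper class and method names are hypothetical, purely to illustrate the proposed javadoc wording.
{noformat}
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: build a ZooKeeper connect string from separate hosts
 *  plus one optional chroot appended after the last host. */
public class ZkConnectStrings {

  static String buildConnectString(List<String> hosts, String chroot) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hosts.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append(hosts.get(i));
    }
    if (chroot != null && !chroot.isEmpty()) {
      if (!chroot.startsWith("/")) {
        throw new IllegalArgumentException("chroot must start with '/': " + chroot);
      }
      sb.append(chroot); // ZooKeeper itself rejects malformed or repeated chroots
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a
    System.out.println(buildConnectString(
        Arrays.asList("127.0.0.1:3000", "127.0.0.1:3001", "127.0.0.1:3002"), "/app/a"));
  }
}
{noformat}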
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937868#comment-13937868 ] Kranti Parisa commented on SOLR-4787: - Alex, Are you using HashSetJoin? Did you configure in solrconfig.xml? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the the
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937878#comment-13937878 ] Erick Erickson commented on SOLR-1604: -- OK, to finish this off, we need a Wiki/Confluence page, calling for volunteers: Some points that should be mentioned I think: how to set up/use (simple really, defType) A number of examples inOrder=true|false as a local param mentioned Anyone's experience with how it performs, especially with things like single-letter wildcards (e.g. j* smith) Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: query parsers, search Affects Versions: 1.4 Reporter: Ahmet Arslan Assignee: Erick Erickson Priority: Minor Fix For: 4.8, 5.0 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
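As a starting point for the usage examples requested above, here is a hedged SolrJ sketch. It assumes the parser ends up registered under the name complexphrase; the exact name and setup depend on how the plugin from this issue is wired into solrconfig.xml.
{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ComplexPhraseExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Local-param form: a wildcard inside a phrase, with inOrder as a local param.
    SolrQuery q = new SolrQuery("{!complexphrase inOrder=true}name:\"j* smith\"");
    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());

    // defType form, as mentioned in the comment above.
    SolrQuery q2 = new SolrQuery("name:\"j* smith\"");
    q2.set("defType", "complexphrase");
    solr.query(q2);
  }
}
{noformat}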
[jira] [Updated] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated SOLR-5852: Attachment: SOLR-5852_FK.patch I've improved the javadoc. We can use whether SOLR-4620 or this. On the other hand I can implement another patch according to second approach at my previous comment. Add CloudSolrServer helper method to connect to a ZK ensemble - Key: SOLR-5852 URL: https://issues.apache.org/jira/browse/SOLR-5852 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-5852.patch, SOLR-5852_FK.patch, SOLR-5852_FK.patch We should have a CloudSolrServer constructor which takes a list of ZK servers to connect to. Something Like {noformat} public CloudSolrServer(String... zkHost); {noformat} - Document the current constructor better to mention that to connect to a ZK ensemble you can pass a comma-delimited list of ZK servers like zk1:2181,zk2:2181,zk3:2181 - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
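A short sketch contrasting the existing single-string constructor with the varargs constructor proposed in this issue. The varargs form is shown commented out because it is only a proposal in the attached patches, not a released API.
{noformat}
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class CloudSolrServerEnsemble {
  public static void main(String[] args) throws Exception {
    // Existing API: a single comma-delimited zkHost string, optionally with one chroot at the end.
    CloudSolrServer current = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
    current.setDefaultCollection("collection1");

    // Proposed helper from this issue (not in released SolrJ): a varargs constructor
    // that would accept the ensemble as separate host strings.
    // CloudSolrServer proposed = new CloudSolrServer("zk1:2181", "zk2:2181", "zk3:2181");
  }
}
{noformat}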
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937883#comment-13937883 ] Steven Bower commented on SOLR-5488: For the field facet test this is definitely an ordering thing. I started looking at this but haven't finished; I think I removed the @Ignore, which is why it started failing every time. That said, I found some rather interesting issues internally that may have been causing some of the intermittent failures. Are these runs with the most recent patch I applied? Still working on it; I will update when I get further. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937887#comment-13937887 ] Alexander S. commented on SOLR-4787: Hi, I am using simple join, this way: {!join from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory.
[jira] [Created] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
Steve Rowe created SOLR-5875: Summary: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
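To make the shape of the bug concrete, here is a hedged sketch of the access pattern being fixed: per-shard unmarshalling work hoisted out of the per-document loop. The types, method names, and the "price" sort field are hypothetical stand-ins, not the actual QueryComponent code.
{noformat}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the SOLR-5875 access pattern; not Solr's real code. */
class PerShardUnmarshalSketch {

  /** Stand-in for the expensive step: decoding a shard's marshalled sort field values. */
  static Map<String, List<Object>> unmarshalSortValues(Map<String, List<Object>> marshalled) {
    return new HashMap<String, List<Object>>(marshalled); // imagine real decoding work here
  }

  /** Buggy shape: the whole shard payload is unmarshalled again for every document,
   *  and the full structure ends up referenced from every document. */
  static void perDoc(int numDocs, Map<String, List<Object>> marshalled) {
    for (int doc = 0; doc < numDocs; doc++) {
      Map<String, List<Object>> all = unmarshalSortValues(marshalled); // repeated numDocs times
      Object sortValue = all.get("price").get(doc);                    // only this is needed
    }
  }

  /** Fixed shape: unmarshal once per shard, then index into it per document. */
  static void perShard(int numDocs, Map<String, List<Object>> marshalled) {
    Map<String, List<Object>> all = unmarshalSortValues(marshalled);   // done once
    for (int doc = 0; doc < numDocs; doc++) {
      Object sortValue = all.get("price").get(doc);
    }
  }
}
{noformat}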
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937896#comment-13937896 ] Erick Erickson commented on SOLR-5488: -- Hey Steve! Sorry, should have updated things yesterday. Yes, these are with all the latest patches applied. That said, I re-wound based on Houston's comments and undid the changes to fieldFacets.txt (which were local anyway, of course I didn't check them in). So essentially, just trunk with your latest patch and removing @Ignore and/or @BadApple. The FieldFacetTest is the more interesting since it fails all the time. Why that would be related to the assertU around the commits I have no clue. That seems out in left field somewhere. I'll be able to look at any changes intermittently starting this evening CA time, got a busy day ahead. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937895#comment-13937895 ] Kranti Parisa commented on SOLR-4787: - NestedJoins (fqs) are implemented in HashSetJoin. so after applying the patch you will need to configure it in solrconfig.xml queryParser name=hjoin class=org.apache.solr.search.joins.HashSetJoinQParserPlugin/ and use {!hjoin from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}, so you are trying to do a self join on the same core? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. 
This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the
[jira] [Updated] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5875: - Attachment: SOLR-5875.patch Simple patch with fix. [~alexey] has confirmed that this solved the excessive memory use issue he saw. Committing shortly. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5872) Eliminate overseer queue
[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937901#comment-13937901 ] Mark Miller commented on SOLR-5872: --- I'm not fully sold on this yet. Compare and set is how this was first implemented and it has its own issues - hence the work Sami did to move to the queue. Potter has noticed the overseer is fairly slow at working through state updates. I think that should be investigated first. Eliminate overseer queue - Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is to: * Provide batching of operations for the main clusterstate.json so that state updates are minimized * Avoid race conditions and ensure order Now, as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is: whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) should # read the fresh state and version of the zk node # construct the new state # perform a compare and set # if the compare and set fails, go to step 1 This should be limited to operations performed on external collections, because batching would still be required for the others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
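A minimal sketch of the compare-and-set loop proposed in the issue description, using plain ZooKeeper versioned writes; updateState() is a placeholder for "construct the new collection state from the freshly read bytes".
{noformat}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Sketch of a ZooKeeper compare-and-set loop for a cluster-state znode. */
public class ZkCompareAndSet {

  static void updateClusterState(ZooKeeper zk, String path) throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      byte[] current = zk.getData(path, false, stat);   // 1. read fresh state and version
      byte[] updated = updateState(current);            // 2. construct the new state
      try {
        zk.setData(path, updated, stat.getVersion());   // 3. compare-and-set on the znode version
        return;
      } catch (KeeperException.BadVersionException e) {
        // 4. someone else wrote in between -> go back to step 1
      }
    }
  }

  static byte[] updateState(byte[] current) {
    return current; // placeholder: real code would modify the collection's state here
  }
}
{noformat}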
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937915#comment-13937915 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578434 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1578434 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937917#comment-13937917 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578435 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578435 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. (merged trunk r1578434) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937921#comment-13937921 ] Alexander S. commented on SOLR-4787: Ok, thx, I'll try with hjoin. And yes, I am trying to do it on the same core. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the the
[jira] [Resolved] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-5875. -- Resolution: Fixed Committed to trunk, branch_4x and the lucene_solr_4_7 branch. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937920#comment-13937920 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578438 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1578438 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. (merged trunk r1578434) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5875: - Fix Version/s: 5.0 4.8 QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5750) Backup/Restore API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937924#comment-13937924 ] Robert Parker commented on SOLR-5750: - You should have the option of backing up/replicating a live searchable collection on SolrCloud A to a live searchable collection across a WAN on SolrCloud B, each with their own separate ZooKeeper ensemble. You should also be able to rename the collection on the fly so that the live searchable collection on SolrCloud A is called collectionA and its live updated searchable replication copy is known as collectionB so as to allow a single remote instance of SolrCloud to be multi-homed to act as a replication target for multiple other Solr instances' collections, even if those collections happen to have the same name on each of their source instances. Also, WAN compression/optimization would be helpful as well. Backup/Restore API for SolrCloud Key: SOLR-5750 URL: https://issues.apache.org/jira/browse/SOLR-5750 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Shalin Shekhar Mangar Fix For: 4.8, 5.0 We should have an easy way to do backups and restores in SolrCloud. The ReplicationHandler supports a backup command which can create snapshots of the index but that is too little. The command should be able to backup: # Snapshots of all indexes or indexes from the leader or the shards # Config set # Cluster state # Cluster properties # Aliases # Overseer work queue? A restore should be able to completely restore the cloud i.e. no manual steps required other than bringing nodes back up or setting up a new cloud cluster. SOLR-5340 will be a part of this issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
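For reference, a small SolrJ sketch of invoking the existing per-core ReplicationHandler backup command that the description calls "too little" for a full SolrCloud backup. The core URL, backup location, and numberToKeep value are placeholders.
{noformat}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CoreBackupExample {
  public static void main(String[] args) throws Exception {
    // Per-core snapshot via the existing ReplicationHandler.
    SolrServer core = new HttpSolrServer("http://localhost:8983/solr/collection1_shard1_replica1");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "backup");
    params.set("location", "/backups/collection1"); // assumed backup directory
    params.set("numberToKeep", "2");

    QueryRequest req = new QueryRequest(params);
    req.setPath("/replication");
    core.request(req);
  }
}
{noformat}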
Re: Sorry for JIRA spam
I don’t think it’s bad to have JIRA bump the fix version at all - you just want to suppress an individual email for each change if it’s going to be that many. -- Mark Miller about.me/markrmiller On March 16, 2014 at 9:45:42 AM, David Smiley (@MITRE.org) (dsmi...@mitre.org) wrote: Sorry for all the email spam last night, folks. I released Lucene/Solr 4.7 in JIRA last night. I updated the instructions here https://wiki.apache.org/lucene-java/ReleaseTodo#Update_JIRA to explicitly indicate *not* to have JIRA bump the Fix-version values. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Sorry for JIRA spam
What I mean is that when you click the “Release” menu choice next to the version, JIRA optionally asks you if it should bump the fix versions (I forget the precise language). It didn’t say that in doing so it would send a ton of email, and it didn’t give an option to suppress email. Separately from this, the release instructions we have in our wiki describe how to advance the fix-versions in a way that suppresses email. But if beforehand you let JIRA do it for you with just one-click as part of releasing the version, then it’ll send out the mass email. ~ David From: Mark Miller-3 [via Lucene] ml-node+s472066n4124847...@n3.nabble.com Date: Monday, March 17, 2014 at 11:41 AM To: Smiley, David W. dsmi...@mitre.org Subject: Re: Sorry for JIRA spam I don’t think it’s bad to have JIRA bump the fix version at all - you just want to supress an individual email for each change if its going to be that many. -- Mark Miller about.me/markrmiller On March 16, 2014 at 9:45:42 AM, David Smiley (@MITRE.org) ([hidden email]) wrote: Sorry for all the email spam last night, folks. I Released Lucene Solr 4.7 in JIRA last night. I updated the instructions here https://wiki.apache.org/lucene-java/ReleaseTodo#Update_JIRA to explicitly indicate *not* to have JIRA bump the Fix-version values. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: [hidden email] For additional commands, e-mail: [hidden email] - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545p4124848.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Lucene/Solr 4.7.1
I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937946#comment-13937946 ] Erick Erickson commented on SOLR-5875: -- Hmmm, is it possible to have the original person who posted the problem give it a test run? For something like this it'd be good to have some proof that if fixes the problem described. Just a thought. Erick QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5052) bitset codec for off heap filters
[ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937947#comment-13937947 ] Mikhail Khludnev commented on LUCENE-5052: -- bq. it'd be better if the postings format wrapped another postings format, and then only used the bitset when the docFreq was high enough There are two orthogonal concerns: * the particular format - let's generalize the bitset format to a no-tf format, and use WAH8 or Elias-Fano with off-heap access (TODO). That way it also works for sparse postings; * the API - how can a consumer express the intention to use the no-tf format? e.g. a TermFilter, or TermsEnum.docs() with a special flag. I'd like to clarify the use-case for this issue (the issue summary might need to be improved). It aims at Solr's fq, or even Heliosearch's GC-lightness. I suppose the user can decide which fields to index with the no-tf format; these are string fields. Then, when the user requests filtering on those fields, no scoring is needed, for sure. [~mikemccand] Hence, I don't think conditional triggering is a good choice; in any case I don't know how to do it. I might not understand well how the pulsing codec is used (the implementation idea is clear, though) - can you point me to its usage? Thanks! bitset codec for off heap filters - Key: LUCENE-5052 URL: https://issues.apache.org/jira/browse/LUCENE-5052 Project: Lucene - Core Issue Type: New Feature Components: core/codecs Reporter: Mikhail Khludnev Labels: features Fix For: 5.0 Attachments: LUCENE-5052.patch, bitsetcodec.zip, bitsetcodec.zip Colleagues, when we filter we don't care about any of the scoring factors, i.e. norms, positions, tf, but it should be fast. The obvious way to handle this is to decode the postings list and cache it in heap (CachingWrapperFilter, Solr's DocSet). Both consuming heap and decoding are expensive. Let's write a postings list as a bitset if df is greater than the segment's maxDoc/8 (what about skip lists? and overall performance?). Besides the codec implementation, the trickiest part to me is to design an API for this. How can we let the app know that a term query doesn't need to be cached in heap, but can be held as an mmapped bitset? WDYT? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
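A hedged sketch (Lucene 4.x APIs) of the density heuristic from the description: materialize a term's postings as a bitset only when its docFreq exceeds maxDoc/8, otherwise fall back to the regular postings. This only illustrates the consumer-side idea, not the proposed codec or its off-heap storage.
{noformat}
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.FixedBitSet;

public class DenseTermBitSet {

  /** Returns a FixedBitSet of matching docs if the term is dense enough, else null. */
  static FixedBitSet bitSetIfDense(AtomicReader reader, String field, BytesRef term) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) return null;
    TermsEnum te = terms.iterator(null);
    if (!te.seekExact(term)) return null;

    int maxDoc = reader.maxDoc();
    if (te.docFreq() <= maxDoc / 8) {
      return null; // sparse term: not worth a full bitset
    }
    // Dense term: decode the postings once into a bitset (no freqs/positions needed).
    FixedBitSet bits = new FixedBitSet(maxDoc);
    DocsEnum docs = te.docs(reader.getLiveDocs(), null, DocsEnum.FLAG_NONE);
    for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = docs.nextDoc()) {
      bits.set(doc);
    }
    return bits;
  }
}
{noformat}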
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937953#comment-13937953 ] Steve Rowe commented on SOLR-5875: -- Erick, as I mentioned above, [~alexey] gave it a test run and it fixed the memory issue he saw. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the lead
[ https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937957#comment-13937957 ] Mark Miller commented on SOLR-5796: --- Do we have a JIRA issue for the instability you mention in the failover? I can guess what it is... but we should track it and harden it. With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the leader. Key: SOLR-5796 URL: https://issues.apache.org/jira/browse/SOLR-5796 Project: Solr Issue Type: Bug Components: SolrCloud Environment: Found on branch_4x Reporter: Timothy Potter Assignee: Mark Miller Fix For: 4.8, 5.0 Attachments: SOLR-5796.patch I'm doing some testing with a 4-node SolrCloud cluster against the latest rev in branch_4x having many collections, 150 to be exact, each having 4 shards with rf=3, so 450 cores per node. Nodes are decent in terms of resources: -Xmx6g with 4 CPU - m3.xlarge's in EC2. The problem occurs when rebooting one of the nodes, say as part of a rolling restart of the cluster. If I kill one node and then wait for an extended period of time, such as 3 minutes, then all of the leaders on the downed node (roughly 150) have time to failover to another node in the cluster. When I restart the downed node, since leaders have all failed over successfully, the new node starts up and all cores assume the replica role in their respective shards. This is goodness and expected. However, if I don't wait long enough for the leader failover process to complete on the other nodes before restarting the downed node, then some bad things happen. Specifically, when the dust settles, many of the previous leaders on the node I restarted get stuck in the conflicting state seen in the ZkController, starting around line 852 in branch_4x: {quote} 852 while (!leaderUrl.equals(clusterStateLeaderUrl)) { 853 if (tries == 60) { 854 throw new SolrException(ErrorCode.SERVER_ERROR, 855 There is conflicting information about the leader of shard: 856 + cloudDesc.getShardId() + our state says: 857 + clusterStateLeaderUrl + but zookeeper says: + leaderUrl); 858 } 859 Thread.sleep(1000); 860 tries++; 861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, shardId, 862 timeoutms); 863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), timeoutms) 864 .getCoreUrl(); 865 } {quote} As you can see, the code is trying to give a little time for this problem to work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be long enough for a busy cluster that has many collections. Now, one might argue that 450 cores per node is asking too much of Solr, however I think this points to a bigger issue of the fact that a node coming up isn't aware that it went down and leader election is running on other nodes and is just being slow. Moreover, once this problem occurs, it's not clear how to fix it besides shutting the node down again and waiting for leader failover to complete. It's also interesting to me that /clusterstate.json was updated by the healthy node taking over the leader role but the /collections/collleaders/shard# was not updated? I added some debugging and it seems like the overseer queue is extremely backed up with work. Maybe the solution here is to just wait longer but I also want to get some feedback from the community on other options? I know there are some plans to help scale the Overseer (i.e. 
SOLR-5476), so maybe that helps; I'm trying to add more debugging to see if this is really due to Overseer backlog (which I suspect it is). In general, I'm a little confused by the keeping of leader state in multiple places in ZK. Is there any background information on why we have leader state in /clusterstate.json and in the leader path znode? Also, here are some interesting side observations: a. If I use rf=2, then this problem doesn't occur, as leader failover happens more quickly and there's less Overseer work? May be a red herring here, but I can consistently reproduce it with rf=3 and not with rf=2 ... I suppose that is because there are only 300 cores per node versus 450, and that's just enough less work for this issue to work itself out. b. To support that many cores, I had to set -Xss256k to reduce the stack size, as Solr uses a lot of threads during startup (the high point was around 800).
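A minimal, self-contained sketch of the idea discussed above, assuming nothing about the real ZkController beyond what the quoted snippet shows: the same "wait until clusterstate.json and the leader znode agree" loop, but with a configurable deadline instead of the hard-coded 60 x 1s bound, so a cluster with a backed-up Overseer can be given more time. The class and method names are hypothetical; the Supplier arguments stand in for zkStateReader.getLeaderUrl(...) and getLeaderProps(...).getCoreUrl().
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class LeaderConvergenceWait {

  /**
   * Polls both views of the shard leader until they agree or the deadline passes.
   * This is a sketch of the wait loop, not the actual ZkController code.
   */
  public static void awaitAgreement(Supplier<String> clusterStateLeaderUrl,
                                    Supplier<String> zkLeaderUrl,
                                    long maxWait, TimeUnit unit) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(maxWait);
    String fromClusterState = clusterStateLeaderUrl.get();
    String fromZk = zkLeaderUrl.get();
    while (!fromZk.equals(fromClusterState)) {
      if (System.nanoTime() > deadline) {
        throw new IllegalStateException("Conflicting leader info: clusterstate.json says "
            + fromClusterState + " but the leader znode says " + fromZk);
      }
      Thread.sleep(1000); // poll once a second, like the original loop
      fromClusterState = clusterStateLeaderUrl.get();
      fromZk = zkLeaderUrl.get();
    }
  }
}
{code}
If the deadline were driven by a cluster property rather than a compile-time constant, operators running 450 cores per node could tune it without patching, which is essentially the "just wait longer" option raised above.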
[jira] [Commented] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937968#comment-13937968 ] ASF subversion and git services commented on SOLR-5800: --- Commit 1578444 from [~steffkes] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1578444 ] SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used (merge r1576652) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used. Key: SOLR-5800 URL: https://issues.apache.org/jira/browse/SOLR-5800 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.7 Reporter: Timothy Potter Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5800-sample.json, SOLR-5800.patch I have an example in Solr In Action that uses the PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0. Specifically, the fieldType is:
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" preserveOriginal="0" catenateWords="1" generateNumberParts="1" catenateNumbers="0" catenateAll="0" types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
The PatternReplaceCharFilterFactory (PRCF) is used to collapse repeated letters in a term down to a max of 2, so that something like #yummmmm becomes #yumm. When I run some text through this analyzer using the Analysis form, the output is as if the resulting text is unavailable to the tokenizer. In other words, the only results displayed in the output on the form are those for the PRCF. This example stopped working in 4.7.0 and I've verified it worked correctly in 4.6.1. Initially, I thought this might be an issue with the actual analysis, but the analyzer actually works when indexing / querying. Then, looking at the JSON response in the Developer console in Chrome, I see the JSON that comes back includes output for all the components in my chain (see below) ... so it looks like a UI rendering issue to me?
{"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c ...
the JSON returned to the browser has evidence that the full analysis chain was applied, so this seems to just be a rendering issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
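For anyone wanting to confirm that the analysis itself is fine and only the UI rendering is at fault, here is a minimal sketch (a hypothetical demo class, not part of the issue) that runs the same char filter pattern from the field type above directly against a string using Lucene's PatternReplaceCharFilter:
{code:java}
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;

public class PatternReplaceCharFilterDemo {
  public static void main(String[] args) throws Exception {
    // ([a-zA-Z])\1+ -> $1$1 collapses any run of a repeated letter down to two
    Pattern collapseRepeats = Pattern.compile("([a-zA-Z])\\1+");
    Reader in = new PatternReplaceCharFilter(collapseRepeats, "$1$1",
        new StringReader("#Yummmmmm :) Drinking a latte"));

    // Drain the filtered reader to see the pre-tokenization text
    StringBuilder out = new StringBuilder();
    int ch;
    while ((ch = in.read()) != -1) {
      out.append((char) ch);
    }
    System.out.println(out); // expected: "#Yumm :) Drinking a latte"
  }
}
{code}
The printed text is what the WhitespaceTokenizer should receive, which matches the tokens visible in the raw JSON response above, supporting the conclusion that the bug is purely in how the Admin UI renders the result.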
[jira] [Updated] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-5800: Fix Version/s: 4.7.1 Admin UI - Analysis form doesn't render results correctly when a CharFilter is used. Key: SOLR-5800 URL: https://issues.apache.org/jira/browse/SOLR-5800 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.7 Reporter: Timothy Potter Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5800-sample.json, SOLR-5800.patch I have an example in Solr In Action that uses the PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0. Specifically, the fieldType is:
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" preserveOriginal="0" catenateWords="1" generateNumberParts="1" catenateNumbers="0" catenateAll="0" types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
The PatternReplaceCharFilterFactory (PRCF) is used to collapse repeated letters in a term down to a max of 2, so that something like #yummmmm becomes #yumm. When I run some text through this analyzer using the Analysis form, the output is as if the resulting text is unavailable to the tokenizer. In other words, the only results displayed in the output on the form are those for the PRCF. This example stopped working in 4.7.0 and I've verified it worked correctly in 4.6.1. Initially, I thought this might be an issue with the actual analysis, but the analyzer actually works when indexing / querying. Then, looking at the JSON response in the Developer console in Chrome, I see the JSON that comes back includes output for all the components in my chain (see below) ... so it looks like a UI rendering issue to me?
{"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c ...
the JSON returned to the browser has evidence that the full analysis chain was applied, so this seems to just be a rendering issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 4.7.1
Sounds good to me. -- Mark Miller about.me/markrmiller On March 17, 2014 at 11:53:14 AM, Steve Rowe (sar...@gmail.com) wrote: I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 4.7.1
Thanks for doing this Steve! I've merged SOLR-5800 to the branch -Stefan On Monday, March 17, 2014 at 4:52 PM, Steve Rowe wrote: I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5873) Improve JavaBinCodec's backward compatibility tests
[ https://issues.apache.org/jira/browse/SOLR-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937985#comment-13937985 ] Mark Miller commented on SOLR-5873: --- Wrong Mark Miller pinged ;) I'm one of the last ones that comes up - markrmil...@gmail.com username rather than hakeber. Improve JavaBinCodec's backward compatibility tests --- Key: SOLR-5873 URL: https://issues.apache.org/jira/browse/SOLR-5873 Project: Solr Issue Type: Improvement Reporter: Varun Thacker SOLR-5265 added backward compatibility tests, but it tries to read a pre-written binary file to check if there is a break or not. If we add more types to JavaBinCodec the test will need to be updated too, which will be error-prone again. This is what [~hakeber] proposed on IRC:
- A test that I was thinking of: we could have a jenkins job that ran a script that checked out the previous version of Lucene and the latest
- Then use the solr/cloud-dev scripts to start a cloud cluster
- Index some docs
- Stop a node at a time, replacing the webapp with the latest in a rolling-upgrade fashion
- Then we have a full rolling upgrade test
This would be a better approach for back-compat tests. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
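As a complement to the rolling-upgrade idea above, the file-based check that SOLR-5265 added boils down to a javabin round trip. A minimal sketch of that shape follows; the class name and sample fields are made up, and in the real back-compat test the bytes would come from a binary file written by an older release rather than from the current codec:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.JavaBinCodec;

public class JavaBinRoundTripSketch {
  public static void main(String[] args) throws Exception {
    SolrDocument doc = new SolrDocument();
    doc.addField("id", "1");
    doc.addField("title", "back-compat check");

    // Serialize to the javabin wire format
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new JavaBinCodec().marshal(doc, bytes);

    // Deserialize again; a back-compat test would compare this against the expected object
    Object readBack = new JavaBinCodec()
        .unmarshal(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(readBack);
  }
}
{code}
The weakness called out in the issue is exactly that this kind of fixed-file comparison must be regenerated whenever new types are added, whereas a rolling-upgrade cluster test exercises the old-writer/new-reader path end to end.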
[jira] [Commented] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937991#comment-13937991 ] Mark Miller commented on SOLR-5837: --- If you look at the previous commits, you will see that this also had a CHANGES entry introduced under Other that needs to be removed. Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not override equals. This issue will add equals and hashCode overrides to the three classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
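For illustration, a minimal sketch of the kind of value-based equals/hashCode the patch introduces, shown on a made-up SolrInputField-like holder rather than the real classes (the actual patch decides details such as boost handling and collection-valued fields):
{code:java}
import java.util.Objects;

public class FieldValue {
  private final String name;
  private final Object value;
  private final float boost;

  public FieldValue(String name, Object value, float boost) {
    this.name = name;
    this.value = value;
    this.boost = boost;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FieldValue)) return false;
    FieldValue other = (FieldValue) o;
    // Two fields are equal when name, value and boost all match
    return Float.compare(boost, other.boost) == 0
        && Objects.equals(name, other.name)
        && Objects.equals(value, other.value);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals: hash the same components compared above
    return Objects.hash(name, value, boost);
  }
}
{code}
With overrides of this shape in place, tests like the ones in SOLR-5265 can compare whole documents directly instead of walking fields by hand.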