[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103601#comment-17103601
 ] 

David Smiley commented on SOLR-13289:
-

Should we really add {{numFoundExact="true"}} on responses where the user 
didn't even specify a parameter to control this new feature? I prefer not 
adding the noise.

I like the name {{numFoundExact}} in the response compared to the others we 
explored – a last-minute change, I see. Wouldn't we want the controlling 
parameter to use "numFound" likewise instead of "hits"? I propose 
{{minNumFoundToBeExact}}. The word "hits" isn't particularly widespread in 
Solr, except for cache hits.

I spent some time today reviewing what you pushed more closely, and especially 
testing my theory that there is a problem with interactions with the Collapse 
PostFilter/Collector. +There is, albeit not a big problem.+ Essentially the 
Collapse PostFilter must see and cache all docs before passing those it deems 
appropriate on to the rest of the collectors. The TopDocs Collector is downstream 
of it, and TDC tries to tell the Scorer to do the approximation stuff, but it is 
in vain because, by this point, all the docs have already been accumulated and 
cached by Collapse. Beyond the possible wasted computation, it ultimately results 
in Solr saying that the results weren't exact when they actually are. 
 I pushed a commit to my fork to demonstrate the problem:
 
[https://github.com/dsmiley/lucene-solr/commit/8803db97a5e4deb0ad5f3bdaabd02cd3b302a09f]
 Interestingly, I see some other test failures there.

I think the solution is in 
{{org.apache.solr.search.SolrIndexSearcher#getDocListNC}}, in the second half of 
the method (the {{lastDocRequested <= 0}}, i.e. top-X results, case): right before 
{{buildTopDocsCollector}} is invoked, call 
{{cmd.setMinExactHits(Integer.MAX_VALUE);}} only if {{pf.postFilter.scoreMode}} 
isn't null and isn't TOP_SCORES, i.e. it's one of the two COMPLETE options. 
COMPLETE means the Scorer needs to yield all matching docs.
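
A rough sketch of what I mean (untested; I'm assuming the post filter exposes its 
score mode roughly as referenced above, so the exact field/accessor names may 
differ in the real code):

{code:java}
// In getDocListNC's lastDocRequested <= 0 branch, just before buildTopDocsCollector:
if (pf.postFilter != null) {
  ScoreMode postFilterScoreMode = pf.postFilter.scoreMode(); // assumed accessor
  if (postFilterScoreMode != null && postFilterScoreMode != ScoreMode.TOP_SCORES) {
    // COMPLETE / COMPLETE_NO_SCORES: the post filter (e.g. Collapse) will visit every
    // matching doc anyway, so WAND approximation can't help and the count is exact.
    cmd.setMinExactHits(Integer.MAX_VALUE);
  }
}
{code}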

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.






[jira] [Commented] (LUCENE-8716) Test logging can bleed from one suite to another, cause failures due to sysout limits

2020-05-09 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103497#comment-17103497
 ] 

Erick Erickson commented on LUCENE-8716:


This is still a puzzle. I have a filter set up to collect failures that mention 
this limit. I just upgraded log4j2 to 2.13.2; see SOLR-14466. I'll start 
looking at these failures again if/when more come through.

> Test logging can bleed from one suite to another, cause failures due to 
> sysout limits
> -
>
> Key: LUCENE-8716
> URL: https://issues.apache.org/jira/browse/LUCENE-8716
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Chris M. Hostetter
>Assignee: Erick Erickson
>Priority: Major
> Attachments: thetaphi_Lucene-Solr-master-Linux_23743.log.txt
>
>
> in solr land, {{HLLUtilTest}} is an incredibly tiny, simple test that tests 
> a utility method w/o using any other solr features or doing any logging - as 
> such it extends {{LuceneTestCase}} directly, and doesn't use any of the 
> typical solr test framework/plumbing or {{@SuppressSysoutChecks}}
> on a recent jenkins build, {{HLLUtilTest}} failed due to too much sysoutput 
> -- all of which seems to have come from the previous test run on that JVM -- 
> {{TestStressReorder}} -- suggesting that somehow the sysout from one test 
> suite can bleed over into the next suite?






[jira] [Commented] (SOLR-13075) Harden SaslZkACLProviderTest.

2020-05-09 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103493#comment-17103493
 ] 

Erick Erickson commented on SOLR-13075:
---

It's pretty clear I won't get to this, so unassigning.

> Harden SaslZkACLProviderTest.
> -
>
> Key: SOLR-13075
> URL: https://issues.apache.org/jira/browse/SOLR-13075
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mark Miller
>Priority: Major
> Attachments: SOLR-13075.patch
>
>







[jira] [Assigned] (SOLR-13075) Harden SaslZkACLProviderTest.

2020-05-09 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-13075:
-

Assignee: (was: Erick Erickson)

> Harden SaslZkACLProviderTest.
> -
>
> Key: SOLR-13075
> URL: https://issues.apache.org/jira/browse/SOLR-13075
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mark Miller
>Priority: Major
> Attachments: SOLR-13075.patch
>
>







[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1501: SOLR-13289: Add Refguide changes

2020-05-09 Thread GitBox


dsmiley commented on a change in pull request #1501:
URL: https://github.com/apache/lucene-solr/pull/1501#discussion_r422539014



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -361,3 +361,42 @@ This is what happens if a similar request is sent that 
adds `echoParams=all` to
   }
 }
 
+
+== minExactHits Parameter

Review comment:
   Wouldn't "minNumFoundExact" be a better parameter name because it aligns 
with the "numFound" that we expose?  I searched for "hits" in Solr string 
literals and the ref guide.  It's not used much; more often refers to cache 
hits.

##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -361,3 +361,42 @@ This is what happens if a similar request is sent that 
adds `echoParams=all` to
   }
 }
 
+
+== minExactHits Parameter
+When this parameter is used, Solr will count the number of hits accurately at 
least up to this value. After that, Solr can skip over documents that don't 
have a score high enough to enter the top N. This can greatly improve the 
performance of search queries. On the other hand, when this parameter is used, 
the `numFound` may not be exact, and may instead be an approximation.
+The `numFoundExact` boolean attribute is included in all responses, indicating 
whether the `numFound` value is exact or an approximation. If it's an 
approximation, the real number of hits for the query is guaranteed to be greater 
than or equal to `numFound`.
+
+More about approximate document counting and `minExactHits`:
+* The documents returned in the response are guaranteed to be the docs with 
the top scores. This parameter will not skip documents that are to be returned 
in the response; it will only skip counting docs that match the query but whose 
score is too low to make the top N.
+* Providing `minExactHits` doesn't guarantee that Solr will use approximate 
hit counting (and thus provide the speedup). Some types of queries, or other 
parameters (such as when facets are requested), will require accurate counting. 
The value of `numFoundExact` indicates whether the approximation was used.
+* Approximate counting can only be used when sorting by `score desc` first 
(which is the default sort in Solr). Other fields can be used after `score 
desc`, but if any other type of sorting is used before score, then the 
approximation won't be applied.
+* When doing distributed queries across multiple shards, each shard will 
accurately count hits up to `minExactHits` (which means the query could be 
hitting `numShards * minExactHits` docs and `numFound` in the response would 
still be accurate).
+For example:
+
+[source,text]
q=quick brown fox&minExactHits=100&rows=10
+
+[source,json]
+
+"response": {
+"numFound": 153,
+"start": 0,
+"hitCountExact": false,

Review comment:
   Didn't we agree on "numFoundExact"?








[jira] [Commented] (SOLR-14426) forbidden api error during precommit DateMathFunction

2020-05-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103452#comment-17103452
 ] 

David Smiley commented on SOLR-14426:
-

Moving these classes to within another class disturbed the API (i.e. the very 
identity of these classes).  If a class is only used within the same source file 
then nesting makes sense to me, but not otherwise.  The other solution I don't 
see discussed here is giving them their own source file.  When I look at 
FacetContext in particular, it appears it's used widely and thus would make a 
good top-level class, not one nested inside FacetRequest.  Please revisit this, 
Mike.

> forbidden api error during precommit DateMathFunction
> -
>
> Key: SOLR-14426
> URL: https://issues.apache.org/jira/browse/SOLR-14426
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Build
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When running `./gradlew precommit` I'll occasionally see
> {code}
> * What went wrong:
> Execution failed for task ':solr:contrib:analytics:forbiddenApisMain'.
> > de.thetaphi.forbiddenapis.ForbiddenApiException: Check for forbidden API 
> > calls failed while scanning class 
> > 'org.apache.solr.analytics.function.mapping.DateMathFunction' 
> > (DateMathFunction.java): java.lang.ClassNotFoundException: 
> > org.apache.solr.analytics.function.mapping.DateMathValueFunction (while 
> > looking up details about referenced class 
> > 'org.apache.solr.analytics.function.mapping.DateMathValueFunction')
> {code}
> `./gradlew clean` fixes this, but I don't understand what happens or why. 
> Feels like a Gradle issue?






[jira] [Resolved] (SOLR-10562) TestSolrCLIRunExample failures indicating documents just indexed are not all searchable.

2020-05-09 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-10562.
---
Resolution: Fixed

This test hasn't failed since September of 2019, so I'll close it. Apparently 
SOLR-11035 and perhaps other changes since then have fixed this.

> TestSolrCLIRunExample failures indicating documents just indexed are not all 
> searchable.
> 
>
> Key: SOLR-10562
> URL: https://issues.apache.org/jira/browse/SOLR-10562
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 6.6, 7.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: 1_1.res, 1_2.res, 2_1.res, 2_2.res, debug.patch, 
> runcli_12.log
>
>
> I've been beating the heck out of some test cases for fear that
> SOLR-10007 really messed things up, and I can get a pretty regular test
> failure for TestSolrCLIRunExample.testInteractiveSolrCloudExample, but
> it doesn't make sense.
> So I went back to a revision _before_ SOLR-10007 and it still fails.
> But the failure is "impossible". I put a bunch of log.error messages
> in and, for experimental purposes, a for loop in the test. Here are the
> lines that fail in the original:
> {code}
> for (int idx = 0; idx < 10; ++idx) {
>   SolrInputDocument doc = ...; // construct a SolrInputDocument
>   cloudClient.add(doc);
> }
> cloudClient.commit();
> QueryResponse qr = cloudClient.query(new SolrQuery("str_s:a"));
> if (qr.getResults().getNumFound() != numDocs) {
>   fail("Expected " + numDocs + " to be found in the " + collectionName +
>       " collection but only found " + qr.getResults().getNumFound());
> }
> {code}
> If I put the above (not the commit, just the query and the test) in a
> loop that checks the query up to 10 times with a 1-second sleep whenever
> numDocs != getNumFound(), quite regularly I'll see a message in the log file
> that getNumFound() != numDocs, but after a few loops getNumFound() ==
> numDocs and the test succeeds.
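> A sketch of that check-and-retry loop (illustrative only, not the exact test
> code; numDocs and collectionName as above):
> {code}
> QueryResponse qr = null;
> for (int attempt = 0; attempt < 10; ++attempt) {
>   qr = cloudClient.query(new SolrQuery("str_s:a"));
>   if (qr.getResults().getNumFound() == numDocs) {
>     break; // the docs finally became visible
>   }
>   log.error("attempt {}: getNumFound()={} but expected {}",
>       attempt, qr.getResults().getNumFound(), numDocs);
>   Thread.sleep(1000); // wait a second, then re-query
> }
> {code}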
> cloudClient is what you'd expect:
> cloudClient = 
> getCloudSolrClient(executor.solrCloudCluster.getZkServer().getZkAddress());
> So unless I'm hallucinating, any test that relies on
> cloudClient.commit() ensuring that all docs sent to the cluster are
> searchable will potentially fail on occasion.
> I looked over the JIRAs briefly and don't see any mention of a
> similar problem, but I may have missed it.
> The logging I'm writing from the update handler _seems_ to show it doing the 
> right thing, just late.
> I'll attach some data along with a "patch" that generates the logging 
> information. I also attempted to submit a single batch rather than 10 
> individual docs, and that fails too.






[jira] [Resolved] (SOLR-12301) Umbrella issue: paramaterize logging calls in Solr, use consistent naming conventions for the logger

2020-05-09 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-12301.
---
Fix Version/s: 8.6
   Resolution: Fixed

This will be slightly untidy for the rest of Solr 8x. LUCENE-7788 fixed all the 
logging calls in both master and 8.6 as per this JIRA. 

However, only the master gradle "check" and "precommit" actions have checks 
that will flag these kinds of problems and fail, so in the interim some of 
these may creep back into future 8x releases. This shouldn't be much of a 
problem for any code that has gradle check or precommit run before merging into 
8x.

Back-porting to Ant is more trouble than this issue is worth, given that 8x has 
been cleaned up.

> Umbrella issue: paramaterize logging calls in Solr, use consistent naming 
> conventions for the logger
> 
>
> Key: SOLR-12301
> URL: https://issues.apache.org/jira/browse/SOLR-12301
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>
> See the discussion at SOLR-12286 for a lot of background, but the short form 
> is that logging calls of the form
> log.info("somehting" + "something");
>  and
>  log.info("soemthing {}", object.someFunction());
> where someFunction includes toString()
> generate useless garbage/work even when the message is not printed.
> log.info("somehting {}", "something");
>  and
>  log.info("soemthing {}", object);
> do not. The first form is something of a relic, and there are even some uses 
> of the second.
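> To make the distinction explicit, a small annotated sketch (illustrative only;
> the "expensive" object here is hypothetical):
> {code}
> // Bad: the String concatenation happens up front, whether or not INFO is enabled.
> log.info("something" + expensive.toString());
> // Also bad: expensive.someFunction() (which calls toString()) runs eagerly,
> // even though a {} placeholder is used.
> log.info("something {}", expensive.someFunction());
> // Good: only the reference is passed; toString() runs only if the message
> // is actually logged.
> log.info("something {}", expensive);
> {code}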
> This will entail a LOT of changes, but almost all secretarial. I'll take it 
> in chunks.






[jira] [Resolved] (SOLR-11934) Visit Solr logging, it's too noisy.

2020-05-09 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-11934.
---
Fix Version/s: 8.6
   Resolution: Fixed

After going around a few times, the egregious samples I analyzed were ones that 
fired off very frequent updates of single records, sometimes followed by 
external commits, so _of course_ the ratio of update messages to total messages 
was super-high. For ill-behaved applications like this, our advice should be to 
set the particular classes that report huge numbers of messages to the "WARN" level.

That said, opening a new searcher generated 5-6 different messages at INFO 
level, which is unnecessary. All but one of them have been changed to log at 
DEBUG level, and the remaining one has been altered to include the autowarm time.
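
Concretely, the kind of per-class override I mean in log4j2.xml (a sketch: the 
logger name below is just an example of a chatty class, and the appender name 
will be whatever your configuration already defines):

{code:xml}
<Loggers>
  <!-- Quiet one specific noisy class without changing the global level -->
  <Logger name="org.apache.solr.update.processor.LogUpdateProcessorFactory" level="WARN"/>
  <Root level="INFO">
    <AppenderRef ref="MainLogFile"/>
  </Root>
</Loggers>
{code}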

> Visit Solr logging, it's too noisy.
> ---
>
> Key: SOLR-11934
> URL: https://issues.apache.org/jira/browse/SOLR-11934
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we have way too much INFO level logging. Or, perhaps more correctly, 
> Solr logging needs to be examined and messages logged at an appropriate level.
> We log every update at an INFO level for instance. But I think we log LIR at 
> INFO as well. As a sysadmin I don't care to have my logs polluted with a 
> message for every update, but if I'm trying to keep my system healthy I want 
> to see LIR messages and try to understand why.
> Plus, in large installations logging at INFO level is creating a _LOT_ of 
> files.
> What I want to discuss on this JIRA is
> 1> What kinds of messages do we want to log at WARN, INFO, DEBUG, and TRACE 
> levels?
> 2> Who's the audience at each level? For a running system that's functioning, 
> sysops folks would really like WARN messages that mean something needs 
> attention, for instance. If I'm troubleshooting, should I turn on INFO? DEBUG? 
> TRACE?
> So let's say we get some kind of agreement as to the above. Then I propose 
> three things
> 1> Someone (and probably me but all help gratefully accepted) needs to go 
> through our logging and assign appropriate levels. This will take quite a 
> while, I intend to work on it in small chunks.
> 2> Actually answer whether unnecessary objects are created when something 
> like log.info("whatever {}", someObjectOrMethodCall); is invoked. Is this 
> independent of the logging implementation used? SLF4J and Log4j seem a 
> bit contradictory on this.
> 3> Maybe regularize log, logger, LOG as variable names, but that's a nit.
> As a tactical approach, I suggest we tag each LoggerFactory.getLogger in 
> files we work on with //SOLR-(whatever number is assigned when I create 
> this). We can remove them all later, but since I expect to approach this 
> piecemeal it'd be nice to keep track of which files have been done already.
> Finally, I really really really don't want to do this all at once. There are 
> 5-6 thousand log messages. Even at 1,000 a week that's 6 weeks; even starting 
> now, it would probably span the 7.3 release.
> This will probably be an umbrella issue so we can keep all the commits 
> straight and people can volunteer to "fix the files in core" as a separate 
> piece of work (hint).
> There are several existing JIRAs about logging in general, let's link them in 
> here as well.
> Let the discussion begin!






[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

2020-05-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103371#comment-17103371
 ] 

ASF subversion and git services commented on SOLR-11934:


Commit 67be31cdcf147630835ac9346ee72e934b124f39 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=67be31c ]

SOLR-11934: Visit Solr logging, it's too noisy.


> Visit Solr logging, it's too noisy.
> ---
>
> Key: SOLR-11934
> URL: https://issues.apache.org/jira/browse/SOLR-11934
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

2020-05-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103370#comment-17103370
 ] 

ASF subversion and git services commented on SOLR-11934:


Commit 15be0db58696d379c6f7e1a6d0afa18dd7cdd43d in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=15be0db ]

SOLR-11934: Visit Solr logging, it's too noisy.


> Visit Solr logging, it's too noisy.
> ---
>
> Key: SOLR-11934
> URL: https://issues.apache.org/jira/browse/SOLR-11934
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (LUCENE-9365) Fuzzy query has a false negative when prefix length == search term length

2020-05-09 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103280#comment-17103280
 ] 

Adrien Grand commented on LUCENE-9365:
--

bq. Maybe we should disallow prefix == term.text().length() for FuzzyQuery?  It 
is sort of strange to use FuzzyQuery in this way

I guess that the downside of this approach is that it pushes the burden onto the 
application? I would expect users to have a global prefix length that they 
apply to all fuzzy queries. Disallowing prefix >= term.length would require 
consumers of the API to check the length of the string that they are searching 
for, which would be prone to the usual errors, e.g. should it count Java 
chars or Unicode code points?

bq. FuzzyQuery currently has checks for `prefix >= termLength` and collapses to 
a SingleTermEnum for that case.

Maybe that's the bug and it should only do it when the prefix length is 
strictly greater than the term length?
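
For reference, a minimal standalone repro sketch of the reported case (the field 
name and directory choice are illustrative; it just exercises the public 
FuzzyQuery(term, maxEdits, prefixLength) constructor):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;

public class FuzzyPrefixRepro {
  public static void main(String[] args) throws Exception {
    ByteBuffersDirectory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new StringField("f", "bbab", Field.Store.NO)); // indexed value "bbab"
      w.addDocument(doc);
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Search string "bba", maxEdits=1, prefixLength=3 (== search term length).
      // One would expect "bbab" (edit distance 1) to match; the report says it doesn't.
      FuzzyQuery q = new FuzzyQuery(new Term("f", "bba"), 1, 3);
      System.out.println("hits = " + searcher.search(q, 10).totalHits.value);
    }
  }
}
{code}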

> Fuzzy query has a false negative when prefix length == search term length 
> --
>
> Key: LUCENE-9365
> URL: https://issues.apache.org/jira/browse/LUCENE-9365
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Reporter: Mark Harwood
>Priority: Major
>
> When using FuzzyQuery the search string `bba` does not match doc value `bbab` 
> with an edit distance of 1 and prefix length of 3.
> In FuzzyQuery an automaton is created for the "suffix" part of the search 
> string which in this case is an empty string.
> In this scenario, maybe the FuzzyQuery should rewrite to a WildcardQuery of 
> the following form:
> {code:java}
> searchString + "?" 
> {code}
> ... where there's an appropriate number of ? characters according to the edit 
> distance.






[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-09 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103189#comment-17103189
 ] 

Lucene/Solr QA commented on LUCENE-9328:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  
8s{color} | {color:green} grouping in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  9s{color} 
| {color:red} join in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
11s{color} | {color:green} queries in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m  5s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | lucene.search.join.TestBlockJoinSelector |
|   | solr.TestGroupingSearch |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9328 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002452/LUCENE-9328.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / d9f9d6dd47c |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| unit | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/272/artifact/out/patch-unit-lucene_join.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/272/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/272/testReport/ |
| modules | C: lucene/grouping lucene/join lucene/queries lucene/test-framework 
solr/core U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/272/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365






[GitHub] [lucene-solr] CaoManhDat commented on a change in pull request #1470: SOLR-14354: Async or using threads in better way for HttpShardHandler

2020-05-09 Thread GitBox


CaoManhDat commented on a change in pull request #1470:
URL: https://github.com/apache/lucene-solr/pull/1470#discussion_r422459139



##
File path: 
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
##
@@ -130,77 +134,64 @@ public void submit(final ShardRequest sreq, final String 
shard, final Modifiable
 final Tracer tracer = GlobalTracer.getTracer();
 final Span span = tracer != null ? tracer.activeSpan() : null;
 
-Callable task = () -> {
+params.remove(CommonParams.WT); // use default (currently javabin)
+params.remove(CommonParams.VERSION);
+QueryRequest req = makeQueryRequest(sreq, params, shard);
+req.setMethod(SolrRequest.METHOD.POST);
 
-  ShardResponse srsp = new ShardResponse();
-  if (sreq.nodeName != null) {
-srsp.setNodeName(sreq.nodeName);
-  }
-  srsp.setShardRequest(sreq);
-  srsp.setShard(shard);
-  SimpleSolrResponse ssr = new SimpleSolrResponse();
-  srsp.setSolrResponse(ssr);
-  long startTime = System.nanoTime();
+LBSolrClient.Req lbReq = 
httpShardHandlerFactory.newLBHttpSolrClientReq(req, urls);
+
+ShardResponse srsp = new ShardResponse();
+if (sreq.nodeName != null) {
+  srsp.setNodeName(sreq.nodeName);
+}
+srsp.setShardRequest(sreq);
+srsp.setShard(shard);
+SimpleSolrResponse ssr = new SimpleSolrResponse();
+srsp.setSolrResponse(ssr);
+
+pending.incrementAndGet();
+// if there are no shards available for a slice, urls.size()==0
+if (urls.size() == 0) {
+  // TODO: what's the right error code here? We should use the same thing 
when

Review comment:
   I do not; I just copied and pasted this from the old code.




