[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248302#comment-17248302
 ] 

David Smiley commented on SOLR-14923:
-

Hey, nice PR!  It's not as hacky as I feared it might be -- just an 
AtomicBoolean (no potentially unbounded Map).  I really appreciate your 
performance benchmarks to prove this out.

I'm going to look a bit further this weekend to see whether the
openRealtimeSearcher call can be avoided in more cases. For example, in
RTG.getInputDocument, you added the potential open _before_ the check for
whether the doc from the updateLog is null ({{sid == null}}), but shouldn't we
skip it when {{sid}} isn't null? Thus move it down right below that check,
inside the branch. Also, maybe in DUP we can sometimes restrict further when
this logic happens -- perhaps only when the document coming in is an atomic
update. I'll investigate that.
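
To illustrate, roughly this re-ordering (a sketch only; {{lookupFromUpdateLog}}, {{mightHaveChildDocs}} and {{fetchFromIndex}} are hypothetical stand-ins for the actual code in RTG.getInputDocument):
{code:java}
SolrInputDocument sid = lookupFromUpdateLog(core, idBytes); // hypothetical tlog lookup
if (sid == null) {
  // Only pay for the expensive realtime-searcher reopen when the update log
  // missed; if sid != null we never need the index at all.
  if (mightHaveChildDocs) { // hypothetical condition from the PR
    core.getUpdateHandler().getUpdateLog().openRealtimeSearcher();
  }
  sid = fetchFromIndex(core, idBytes); // hypothetical index lookup
}
{code}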

Earlier in this issue, you tried modifying UpdateLog.openRealtimeSearcher to 
move the searcher re-open outside of the synchronized block.  That makes sense 
to me; we should do that. 
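
Roughly this shape (a sketch, not the actual change; the helpers are made up):
{code:java}
public void openRealtimeSearcher() {
  synchronized (this) {
    // Only the bookkeeping that must stay consistent with the log happens
    // under the UpdateLog lock.
    clearRealtimeCaches(); // hypothetical helper
  }
  // The expensive reopen no longer blocks concurrent add/delete callers.
  openNewRealtimeSearcherOnCore(); // hypothetical helper wrapping the core reopen
}
{code}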

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd whether the UpdateLog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and
> is executed in a synchronized block of the UpdateLog instance, so all other
> operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, so
> it makes no difference whether 'waitFlush', 'waitSearcher' or 'softCommit' is
> true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-11 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-15029:
-
Fix Version/s: master (9.0)
   8.8

> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently (via SOLR-12412), when a leader sees an index writing error during
> an update, it will give up leadership by deleting the replica and adding a
> new replica. One stated benefit of this was that, because we are using the
> overseer and a known code path, this is done asynchronously and very
> efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely
> delete the index dir and attempt to sync from a good peer. Even in this case,
> however, it might be better to let fingerprinting and other index delta
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file
> system (possible with shared file systems, e.g. S3, HDFS, some k8s setups)
> and the required solution is some kind of reconnect, this approach has
> several shortcomings: the core delete and create operations are going to fail,
> leaving dangling replicas. Further, the data is still present, so there is no
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing
> shard terms language. I believe we would be able to set all replicas
> currently equal to leader term T to T+1, and then trigger a new leader
> election. The current leader would know it is ineligible, while the other
> replicas that were current before the failed update would be eligible. This
> improvement would entail adding an additional possible operation to the terms
> state machine.
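> A rough sketch of that operation (hypothetical method, not part of the
> current ZkShardTerms API):
> {code:java}
> // Bump every replica that shares the leader's term T to T+1; the current
> // leader stays at T and is therefore ineligible in the next election.
> void demoteLeaderViaTerms(String leaderName) {
>   long leaderTerm = getTerm(leaderName);                  // T
>   for (String replica : getReplicaNames()) {              // hypothetical accessor
>     if (!replica.equals(leaderName) && getTerm(replica) == leaderTerm) {
>       setTerm(replica, leaderTerm + 1);                   // hypothetical CAS write to ZK
>     }
>   }
>   triggerLeaderElection();                                // hypothetical
> }
> {code}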



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248250#comment-17248250
 ] 

Mike Drob edited comment on SOLR-15029 at 12/11/20, 11:40 PM:
--

Without further comment, I will plan to commit this on Tuesday, and backport
to 8.8.


was (Author: mdrob):
Without further comment, I will plan to commit this on Tuesday.

> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently (via SOLR-12412), when a leader sees an index writing error during
> an update, it will give up leadership by deleting the replica and adding a
> new replica. One stated benefit of this was that, because we are using the
> overseer and a known code path, this is done asynchronously and very
> efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely
> delete the index dir and attempt to sync from a good peer. Even in this case,
> however, it might be better to let fingerprinting and other index delta
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file
> system (possible with shared file systems, e.g. S3, HDFS, some k8s setups)
> and the required solution is some kind of reconnect, this approach has
> several shortcomings: the core delete and create operations are going to fail,
> leaving dangling replicas. Further, the data is still present, so there is no
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing
> shard terms language. I believe we would be able to set all replicas
> currently equal to leader term T to T+1, and then trigger a new leader
> election. The current leader would know it is ineligible, while the other
> replicas that were current before the failed update would be eligible. This
> improvement would entail adding an additional possible operation to the terms
> state machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248250#comment-17248250
 ] 

Mike Drob commented on SOLR-15029:
--

Without further comment, I will plan to commit this on Tuesday.

> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently (via SOLR-12412), when a leader sees an index writing error during
> an update, it will give up leadership by deleting the replica and adding a
> new replica. One stated benefit of this was that, because we are using the
> overseer and a known code path, this is done asynchronously and very
> efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely
> delete the index dir and attempt to sync from a good peer. Even in this case,
> however, it might be better to let fingerprinting and other index delta
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file
> system (possible with shared file systems, e.g. S3, HDFS, some k8s setups)
> and the required solution is some kind of reconnect, this approach has
> several shortcomings: the core delete and create operations are going to fail,
> leaving dangling replicas. Further, the data is still present, so there is no
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing
> shard terms language. I believe we would be able to set all replicas
> currently equal to leader term T to T+1, and then trigger a new leader
> election. The current leader would know it is ineligible, while the other
> replicas that were current before the failed update would be eligible. This
> improvement would entail adding an additional possible operation to the terms
> state machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248237#comment-17248237
 ] 

Thomas Wöckinger commented on SOLR-14923:
-

[~dsmiley] I have run some performance tests and the results are very
promising: when indexing (only adding new documents) with 16 threads, 14 to 15
threads are fully utilized.

The results are the same as without nested documents.

I have also done some profiling using JMC; as expected, there is no contention
from DistributedUpdateProcessor.

There is still heavy contention on the UpdateLog.add() method, but that will
be hard work to optimize. Maybe it would be better to remove this part if RTG
is not used that much, but that's another story.

I hope you have time to review soon. Thx in advance.

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd whether the UpdateLog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and
> is executed in a synchronized block of the UpdateLog instance, so all other
> operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, so
> it makes no difference whether 'waitFlush', 'waitSearcher' or 'softCommit' is
> true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-11 Thread GitBox


muse-dev[bot] commented on a change in pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r541316136



##
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##
@@ -692,55 +718,44 @@ public void close() {
   /**
    * Best effort to give up the leadership of a shard in a core after hitting a tragic exception
    * @param cd The current core descriptor
-   * @param tragicException The tragic exception from the {@code IndexWriter}
    */
-  public void giveupLeadership(CoreDescriptor cd, Throwable tragicException) {
-    assert tragicException != null;
+  public void giveupLeadership(CoreDescriptor cd) {
     assert cd != null;
-    DocCollection dc = getClusterState().getCollectionOrNull(cd.getCollectionName());
+
+    String collection = cd.getCollectionName();
+    DocCollection dc = getClusterState().getCollectionOrNull(collection);
     if (dc == null) return;
 
     Slice shard = dc.getSlice(cd.getCloudDescriptor().getShardId());
Review comment:
   *NULL_DEREFERENCE:*  object returned by `cd.getCloudDescriptor()` could 
be null and is dereferenced at line 729.
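
   A minimal guard would address this, assuming an early return is acceptable here (a sketch, not the committed fix):
   ```java
   CloudDescriptor cloud = cd.getCloudDescriptor();
   if (cloud == null) return; // avoid the NPE flagged above
   Slice shard = dc.getSlice(cloud.getShardId());
   ```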





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2020-12-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248193#comment-17248193
 ] 

David Eric Pugh commented on SOLR-14792:


[~ehatcher] do you think this issue is complete?   IIRC, you did the work on 
this.

> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15044) Update JSON syntax: detect nested documents via schema

2020-12-11 Thread David Smiley (Jira)
David Smiley created SOLR-15044:
---

 Summary: Update JSON syntax: detect nested documents via schema
 Key: SOLR-15044
 URL: https://issues.apache.org/jira/browse/SOLR-15044
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley


When sending JSON formatted documents to Solr, particularly to
/update/json/commands instead of /update/json/docs (those are API v2 paths), it
tries to differentiate whether a nested structure is a nested document or an
atomic update -- it's rather ambiguous.  Presently the logic simply checks for
the presence of an "id", but it may not be there (it is auto-generated later
when absent).  It ought to simply look in the schema to see whether the field
exists.  If it doesn't, then it can't be an atomic update, so treat it as a
nested document.

This was raised [on this comment in another JIRA
issue|https://issues.apache.org/jira/browse/SOLR-12362?focusedCommentId=16526338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16526338].
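
A sketch of the check I have in mind ({{getFieldOrNull}} is the existing schema lookup; the helper itself is hypothetical):
{code:java}
// A nested value under a key that is not a schema field cannot be an
// atomic-update target, so treat it as a child document.
private boolean looksLikeChildDocument(String fieldName, IndexSchema schema) {
  return schema.getFieldOrNull(fieldName) == null;
}
{code}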



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing

2020-12-11 Thread Mark Robert Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248164#comment-17248164
 ] 

Mark Robert Miller commented on SOLR-14788:
---

Okay, today marks what I will call the end of phase 1 for the ref branch. That
does not mean I’m done, but I’m done hunting. No more trying to learn more,
figure out what needs to be pushed harder, spending huge nights of effort and
time. No more dragons, no more input from me about what’s wrong with Solr or
what its state is, no more Overseer honesty. My hunt is over; my attempts to
share any silly tips, over.

The rest is the motions.

But before I put down the microphone, I have prepared my final move, Zero Hand.
But the first move is First Hand. It would be way more fun and exciting if
someone else ran a little gauntlet first, and then I could squash it with First
Hand without running all the cards. But no worries. I’ll demonstrate it, and
then double down with 99 Hands. And if anything remains, Zero Hand clears the
area.

There won’t be such great commentary anymore though, so anyone interested will
just have to dully watch.



> Solr: The Next Big Thing
> 
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Robert Miller
>Assignee: Mark Robert Miller
>Priority: Critical
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} 
> duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking, "Sounds like a bottleneck Mark." And indeed it will be to some 
> extent. Which is why once stage one is completed, I will flip The Policeman 
> to off duty. When off duty, I'm always* *occasionally*{color} *down for some 
> vigilante justice, but I won't be walking the beat, all that stuff about sit 
> back and relax goes out the window.*_
> {quote}
>  
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to 
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid 
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
> Noble have already contributed support and testing and have plans for 
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up 
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-11 Thread GitBox


madrob commented on a change in pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r541219690



##
File path: 
solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java
##
@@ -1306,7 +1306,7 @@ private static void forceLeaderElection(SolrQueryRequest req, CollectionsHandler
     try (ZkShardTerms zkShardTerms = new ZkShardTerms(collectionName, slice.getName(), zkController.getZkClient())) {
       // if an active replica is the leader, then all is fine already
       Replica leader = slice.getLeader();
-      if (leader != null && leader.getState() == State.ACTIVE) {
+      if (leader != null && leader.getState() == State.ACTIVE && zkShardTerms.getHighestTerm() == zkShardTerms.getTerm(leader.getName())) {

Review comment:
   I think I need to revert this part since I'm not touching terms anymore 
anyway.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-11 Thread GitBox


madrob commented on a change in pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r541218852



##
File path: solr/core/src/java/org/apache/solr/util/TestInjection.java
##
@@ -337,6 +342,39 @@ public static boolean injectFailUpdateRequests() {
 
     return true;
   }
+
+  public static boolean injectLeaderTragedy(SolrCore core) {

Review comment:
   I'm not sure. The other methods in the class have it, so I followed the 
pattern.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15037) config update listener reloads solr core while schema is changed

2020-12-11 Thread Tiziano Degaetano (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tiziano Degaetano updated SOLR-15037:
-
Security: (was: Public)

> config update listener reloads solr core while schema is changed
> 
>
> Key: SOLR-15037
> URL: https://issues.apache.org/jira/browse/SOLR-15037
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 8.7
> Environment: Solr Cloud 8.7 Java OpenJDK 11 
>Reporter: Tiziano Degaetano
>Priority: Major
>  Labels: pull-request-available
> Attachments: fixCoreReload.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This makes the update schema command return without waiting for the core to
> be fully reloaded. Subsequent requests will use an old schema until the
> reload is done.
> This has been the case since:
>  
> [https://github.com/apache/lucene-solr/commit/669aff2108f0a8b298cd0afc23d20c658ab53a9d]
> see also
> [http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202011.mbox/%3CAM0PR01MB42434E9DEB99F01CD88A4A02F3E30%40AM0PR01MB4243.eurprd01.prod.exchangelabs.com%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9621) pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()

2020-12-11 Thread Michael Froh (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248133#comment-17248133
 ] 

Michael Froh edited comment on LUCENE-9621 at 12/11/20, 7:21 PM:
-

Regarding the assertion failure, it looks like the call to
{{adjustPendingNumDocs}} in {{rollbackInternalNoCommit}} is being called with 0
(as {{totalMaxDoc}} and {{rollbackMaxDoc}} are both 0).

It feels to me like when we roll back on tragedy, the {{IndexWriter}} is known 
to be in a bad state, so it's not really surprising that {{pendingNumDocs}} and 
{{segmentInfos.totalMaxDoc()}} are out of sync. Maybe the fix is to skip that 
assertion when called from {{maybeCloseOnTragicEvent}}, so that it doesn't mask 
the real tragedy?
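
Something along these lines (a sketch only; the {{fromTragedy}} flag is hypothetical):
{code:java}
// In rollbackInternal: don't let the bookkeeping assert fire when we're already
// rolling back due to a tragic event, so the original tragedy (the OOME) is
// what actually surfaces.
assert fromTragedy || pendingNumDocs.get() == segmentInfos.totalMaxDoc()
    : "pendingNumDocs " + pendingNumDocs.get() + " != "
      + segmentInfos.totalMaxDoc() + " totalMaxDoc";
{code}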


was (Author: msfroh):
Regarding the assertion failure, it looks like the call to 
{{adjustPendingNumDocs}} in {{rollbackInternalNoCommit}} is being call with 0 
(as both {{totalMaxDoc}} and {{rollbackMaxDoc}} are both 0).

It feels to me like when we roll back on tragedy, the {{IndexWriter}} is known 
to be in a bad state, so it's not really surprising that {{pendingNumDocs}} and 
{{segmentInfos.totalMaxDoc()}} are out of sync. Maybe the fix is to skip that 
assertion when called from {{maybeCloseOnTragicEvent, so that it doesn't mask 
the real tragedy?}}

> pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()
> --
>
> Key: LUCENE-9621
> URL: https://issues.apache.org/jira/browse/LUCENE-9621
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.6.3
>Reporter: Michael Froh
>Priority: Major
>
> While implementing a test to trigger an OutOfMemoryError on flush() in 
> https://github.com/apache/lucene-solr/pull/2088, I noticed that the OOME was 
> followed by an assertion failure on rollback with the following stacktrace:
> {code:java}
> java.lang.AssertionError: pendingNumDocs 1 != 0 totalMaxDoc
>   at 
> __randomizedtesting.SeedInfo.seed([ABBF17C4E0FCDEE5:DDC8E99910AFC8FF]:0)
>   at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2398)
>   at 
> org.apache.lucene.index.IndexWriter.maybeCloseOnTragicEvent(IndexWriter.java:5196)
>   at 
> org.apache.lucene.index.IndexWriter.tragicEvent(IndexWriter.java:5186)
>   at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3932)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
>   at 
> org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:496)
> {code}
> We should probably look into how exactly we behave with this kind of tragedy 
> on flush().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9621) pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()

2020-12-11 Thread Michael Froh (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248133#comment-17248133
 ] 

Michael Froh commented on LUCENE-9621:
--

Regarding the assertion failure, it looks like the call to
{{adjustPendingNumDocs}} in {{rollbackInternalNoCommit}} is being called with 0
(as {{totalMaxDoc}} and {{rollbackMaxDoc}} are both 0).

It feels to me like when we roll back on tragedy, the {{IndexWriter}} is known
to be in a bad state, so it's not really surprising that {{pendingNumDocs}} and
{{segmentInfos.totalMaxDoc()}} are out of sync. Maybe the fix is to skip that
assertion when called from {{maybeCloseOnTragicEvent}}, so that it doesn't mask
the real tragedy?

> pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()
> --
>
> Key: LUCENE-9621
> URL: https://issues.apache.org/jira/browse/LUCENE-9621
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.6.3
>Reporter: Michael Froh
>Priority: Major
>
> While implementing a test to trigger an OutOfMemoryError on flush() in 
> https://github.com/apache/lucene-solr/pull/2088, I noticed that the OOME was 
> followed by an assertion failure on rollback with the following stacktrace:
> {code:java}
> java.lang.AssertionError: pendingNumDocs 1 != 0 totalMaxDoc
>   at 
> __randomizedtesting.SeedInfo.seed([ABBF17C4E0FCDEE5:DDC8E99910AFC8FF]:0)
>   at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2398)
>   at 
> org.apache.lucene.index.IndexWriter.maybeCloseOnTragicEvent(IndexWriter.java:5196)
>   at 
> org.apache.lucene.index.IndexWriter.tragicEvent(IndexWriter.java:5186)
>   at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3932)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
>   at 
> org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:496)
> {code}
> We should probably look into how exactly we behave with this kind of tragedy 
> on flush().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9634) Highlighting of degenerate spans on fields *with offsets* doesn't work properly

2020-12-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248125#comment-17248125
 ] 

Dawid Weiss commented on LUCENE-9634:
-

I don't know how to fix this one, so I'm leaving it open.

> Highlighting of degenerate spans on fields *with offsets* doesn't work 
> properly
> ---
>
> Key: LUCENE-9634
> URL: https://issues.apache.org/jira/browse/LUCENE-9634
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> The match highlighter works fine with degenerate interval positions when the
> {{OffsetsFromPositions}} strategy is used to compute offsets, but will show
> incorrect offset ranges if offsets are read directly from the
> {{MatchIterator}} ({{OffsetsFromMatchIterator}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9633) Improve match highlighter behavior for degenerate intervals (on non-existing positions)

2020-12-11 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9633.
-
Resolution: Fixed

> Improve match highlighter behavior for degenerate intervals (on non-existing 
> positions)
> ---
>
> Key: LUCENE-9633
> URL: https://issues.apache.org/jira/browse/LUCENE-9633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Interval functions can produce match spans on non-existing or otherwise 
> degenerate token positions. For example,
> {code}
> extend(foo 5 5)
> {code}
> would create an interval to the left and right of each term foo, regardless 
> of whether such positions actually exist in the token stream.
> This issue improves the match highlighter to still work in such cases. This
> is actually fun to play with, as you can highlight and visualize actual
> interval spans even for functions that expand or manipulate other sources'
> context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9633) Improve match highlighter behavior for degenerate intervals (on non-existing positions)

2020-12-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248124#comment-17248124
 ] 

ASF subversion and git services commented on LUCENE-9633:
-

Commit a6481439556c2f9dcb038d233d7eee4179f4 in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a648143 ]

LUCENE-9633: Improve match highlighter behavior for degenerate intervals (on 
non-existing positions). (#2127)



> Improve match highlighter behavior for degenerate intervals (on non-existing 
> positions)
> ---
>
> Key: LUCENE-9633
> URL: https://issues.apache.org/jira/browse/LUCENE-9633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Interval functions can produce match spans on non-existing or otherwise 
> degenerate token positions. For example,
> {code}
> extend(foo 5 5)
> {code}
> would create an interval to the left and right of each term foo, regardless 
> of whether such positions actually exist in the token stream.
> This issue improves the match highlighter to still work in such cases. This
> is actually fun to play with, as you can highlight and visualize actual
> interval spans even for functions that expand or manipulate other sources'
> context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss merged pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals

2020-12-11 Thread GitBox


dweiss merged pull request #2127:
URL: https://github.com/apache/lucene-solr/pull/2127


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15043) Add more type safety to noggit

2020-12-11 Thread Mike Drob (Jira)
Mike Drob created SOLR-15043:


 Summary: Add more type safety to noggit
 Key: SOLR-15043
 URL: https://issues.apache.org/jira/browse/SOLR-15043
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Mike Drob


The noggit parser returns Object in many places where we could return a more
specific or possibly a generic type, rather than an opaque Object that has to
be cast to what we already know it must be.
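
For example (a hypothetical shape, not an existing noggit method):
{code:java}
// Instead of returning Object and forcing every caller to cast, a typed
// accessor centralizes the cast in one place and fails fast on mismatch.
public <T> T getValAs(Class<T> expected) throws IOException {
  Object val = getVal(); // the existing untyped result
  return expected.cast(val);
}
{code}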



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15042) Provide API to get current index writer without throwing IOException

2020-12-11 Thread Mike Drob (Jira)
Mike Drob created SOLR-15042:


 Summary: Provide API to get current index writer without throwing 
IOException
 Key: SOLR-15042
 URL: https://issues.apache.org/jira/browse/SOLR-15042
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Mike Drob


We have a lot of code that calls {{SolrCoreState.getIndexWriter(null)}}, which
should never throw an exception, but we have to handle a checked IOException
anyway. We should create a new method that only returns the current index
writer without creating one, and does not throw; this may simplify a lot of
calling code.
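
Something like this, sketched (hypothetical method and field names):
{code:java}
// Proposed addition to SolrCoreState: return the writer only if one is already
// open; never create one, never throw a checked exception.
public IndexWriter getExistingIndexWriterOrNull() {
  synchronized (writerLock) { // hypothetical lock
    return indexWriter;       // may be null if no writer has been opened yet
  }
}
{code}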



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9621) pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()

2020-12-11 Thread Michael Froh (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248119#comment-17248119
 ] 

Michael Froh edited comment on LUCENE-9621 at 12/11/20, 6:55 PM:
-

I added a {{printStackTrace}} to {{onTragicEvent}} and got the following:
{code:java}
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:125)
at 
org.apache.lucene.index.FieldInfos$Builder.finish(FieldInfos.java:645)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:291)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:660)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3899)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
at 
org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:499)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
{code}
This is the leak that I called out and fixed in
https://issues.apache.org/jira/browse/LUCENE-9617. If we add documents and call
{{deleteAll}} on the same {{IndexWriter}} repeatedly, it leaks field numbers
and tries to allocate a huge array in {{FieldInfos}} to accommodate the largest
known field number.
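
A minimal sketch of the leaking pattern (assuming distinct field names across iterations, as in the randomized test):
{code:java}
try (Directory dir = new ByteBuffersDirectory();
     IndexWriter w = new IndexWriter(dir, new IndexWriterConfig())) {
  for (int i = 0; i < 100_000; i++) {
    Document doc = new Document();
    doc.add(new StringField("field" + i, "value", Field.Store.NO));
    w.addDocument(doc);
    w.deleteAll(); // documents go away, but the field numbers do not
  }
  w.flush(); // FieldInfos sizes an array by the largest field number ever seen
}
{code}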


was (Author: msfroh):
I added a {{printStackTrace}} to {{onTragicEvent}} and got the following:
{code:java}
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:125)
at 
org.apache.lucene.index.FieldInfos$Builder.finish(FieldInfos.java:645)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:291)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:660)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3899)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
at 
org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:499)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Commented] (LUCENE-9621) pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()

2020-12-11 Thread Michael Froh (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248119#comment-17248119
 ] 

Michael Froh commented on LUCENE-9621:
--

I added a {{printStackTrace}} to {{onTragicEvent}} and got the following:
{code:java}
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:125)
at 
org.apache.lucene.index.FieldInfos$Builder.finish(FieldInfos.java:645)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:291)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:660)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3899)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
at 
org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:499)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
{code}
This is the leak that I called out and fixed in
https://issues.apache.org/jira/browse/LUCENE-9617. If we call {{deleteAll}} on
the same {{IndexWriter}} repeatedly, it leaks field numbers and tries to
allocate a huge array in {{FieldInfos}} to accommodate the largest known field
number.

> pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()
> --
>
> Key: LUCENE-9621
> URL: https://issues.apache.org/jira/browse/LUCENE-9621
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.6.3
>Reporter: Michael Froh
>Priority: Major
>
> While implementing a test to trigger an OutOfMemoryError on flush() in 
> https://github.com/apache/lucene-solr/pull/2088, I noticed that the OOME was 
> followed by an assertion failure on rollback with the following stacktrace:
> {code:java}
> java.lang.AssertionError: pendingNumDocs 1 != 0 totalMaxDoc
>   at 
> __randomizedtesting.SeedInfo.seed([ABBF17C4E0FCDEE5:DDC8E99910AFC8FF]:0)
>   at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2398)
>   at 
> org.apache.lucene.index.IndexWriter.maybeCloseOnTragicEvent(IndexWriter.java:5196)
>   at 
> org.apache.lucene.index.IndexWriter.tragicEvent(IndexWriter.java:5186)
>   at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3932)
>   at 

[jira] [Commented] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2020-12-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248099#comment-17248099
 ] 

David Eric Pugh commented on SOLR-14792:


Definitely not arguing against the usefulness of the Velocity component for
those who adopt it!   I've used it to solve some very hard problems myself ;-)




> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2020-12-11 Thread Walter Underwood (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248095#comment-17248095
 ] 

Walter Underwood commented on SOLR-14792:
-

Some of the things we add to the Velocity UI that aren't in the public UI:
 * Facets on filter fields, like document type or origin
 * Range facet on indexed_datetime, to check the freshness of the collection
 * For each document, links to the back-end API for that (returns JSON) and the 
production page for it
 * Values of fields used in ranking that are not shown in the public results 
page, like order counts and order forecast
 * Field values (with facets) for things that are still being implemented, like 
a new taxonomy

Yes, I know it is available in contrib, but the Velocity UI is very useful to 
us.

> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2020-12-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248089#comment-17248089
 ] 

David Eric Pugh commented on SOLR-14792:


It is still available; just install it as a package:
https://github.com/erikhatcher/solritas.

I'd love to see the Solr Admin's built-in query interface move to the newest
APIs and become a lot more robust for the workflows of a search developer.
Something much more akin to the experience with Postman.


> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource

2020-12-11 Thread GitBox


madrob commented on pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-743349014


   > Can you give me a hint where I can add such test (and maybe find some 
inspiration from existing tests)?
   
   You can add a test to
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/search/function/TestFunctionQuery.java
-- the `text` field in the schema for that test is already configured to use
the stopwords `stopworda` and `stopwordb`.
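
   Something like this could be a starting point (an untested sketch; it assumes the fixed behavior is a match-all function with a constant 0 score when the inner query analyzes away):
   ```java
   public void testQueryFuncOnStopwordOnlyInnerQuery() throws Exception {
     clearIndex();
     assertU(adoc("id", "1", "text", "hello"));
     assertU(commit());
     // "stopworda" is removed by the analyzer, so the inner query is null;
     // the function should degrade gracefully instead of throwing an NPE.
     assertJQ(req("q", "{!func}query($qq)", "qq", "text:stopworda"),
         "/response/numFound==1");
   }
   ```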



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] vonbox commented on a change in pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource

2020-12-11 Thread GitBox


vonbox commented on a change in pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#discussion_r541134314



##
File path: solr/core/src/java/org/apache/solr/search/FunctionQParser.java
##
@@ -361,7 +361,9 @@ protected ValueSource parseValueSource(int flags) throws SyntaxError {
         ((FunctionQParser)subParser).setParseMultipleSources(true);
       }
       Query subQuery = subParser.getQuery();
-      if (subQuery instanceof FunctionQuery) {
+      if (subQuery == null) {
+        valueSource = new DoubleConstValueSource(0.0f);
+      } else if (subQuery instanceof FunctionQuery) {
         valueSource = ((FunctionQuery) subQuery).getValueSource();
       } else {
         valueSource = new QueryValueSource(subQuery, 0.0f);

Review comment:
   I pushed a commit adding the null check to the QueryValueSource 
constructor





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-11 Thread Michael Gibney (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gibney updated SOLR-10732:
--
Attachment: SOLR-10732.patch

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch, SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimized this
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}
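> As a concrete illustration of the suggested short-circuit (a sketch):
> {code:java}
> // First line of RangeFacetProcessor.rangeCount(DocSet subset, ...):
> if (subset.size() == 0) {
>   return 0; // skips the SchemaField fetch, range-query construction, and searcher hit
> }
> {code}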



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-11 Thread Michael Gibney (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gibney updated SOLR-10732:
--
Attachment: SOLR-10732.patch

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimized this
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2020-12-11 Thread Walter Underwood (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248075#comment-17248075
 ] 

Walter Underwood commented on SOLR-14792:
-

This is the first I've seen of this, unfortunately. We have a Velocity UI for
every collection. That is for the search team, not the official front end.

Odd to see this dropped with no replacement.

> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-11 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248077#comment-17248077
 ] 

Michael Gibney commented on SOLR-10732:
---

Thanks, [~munendrasn], that makes sense. I don't have any objections, and your 
change is straightforward and low-risk.

Following up on my comment about optimizing "higher up in the program logic, to 
prune as much execution as possible (and when it's clearer how/why we got to 
the point of having an empty domain)", I started trying to approach this from the 
top down and quickly got rather deeper into it than I'd intended. I'll post the 
resulting patch here for the sake of comparison; feel free to incorporate any 
aspects as you see fit, but to be clear I think it'd be perfectly reasonable to 
go ahead with your initial patch.

The new patch optimizes wrt the configured {{mincount}} (where applicable) and 
covers all legacy facet types (I think). It's still pretty straightforward, but 
compared to your initial proposal it's admittedly more complex. I think it 
potentially prunes quite a bit more execution; but notably, as a consequence 
of optimizing higher up, and wrt {{mincount}} (as opposed to only 
{{domain.size()==0}}), it carries potential risks that your patch doesn't. 
(Also note: it will currently fail precommit due to some {{nocommit}} comments 
on assertions temporarily present to check for functional parity with your 
initial patch.)
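
To sketch the difference (hypothetical code; names are made up and this is not 
the patch's actual shape): the initial proposal guards only on an empty domain, 
while the new patch also short-circuits against the configured mincount:

{code:java}
// Initial proposal: short-circuit only when the facet domain is empty.
if (docs.size() == 0) {
  return emptyCounts();
}

// New patch (roughly): short-circuit whenever the domain is already smaller
// than mincount, since no bucket drawn from it can possibly reach that count.
// This subsumes the check above for mincount >= 1, but it must agree with
// mincount semantics for every facet type, hence the extra risk noted above.
if (docs.size() < mincount) {
  return emptyCounts();
}
{code}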



> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} I 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimized this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}






[jira] [Updated] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-11 Thread Michael Gibney (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gibney updated SOLR-10732:
--
Attachment: (was: SOLR-10732.patch)

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} I 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimized this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}






[jira] [Commented] (LUCENE-9621) pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()

2020-12-11 Thread Simon Willnauer (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247965#comment-17247965
 ] 

Simon Willnauer commented on LUCENE-9621:
-

Do we also have a stacktrace of the OOM showing where it happened?

> pendingNumDocs doesn't match totalMaxDoc if tragedy on flush()
> --
>
> Key: LUCENE-9621
> URL: https://issues.apache.org/jira/browse/LUCENE-9621
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.6.3
>Reporter: Michael Froh
>Priority: Major
>
> While implementing a test to trigger an OutOfMemoryError on flush() in 
> https://github.com/apache/lucene-solr/pull/2088, I noticed that the OOME was 
> followed by an assertion failure on rollback with the following stacktrace:
> {code:java}
> java.lang.AssertionError: pendingNumDocs 1 != 0 totalMaxDoc
>   at 
> __randomizedtesting.SeedInfo.seed([ABBF17C4E0FCDEE5:DDC8E99910AFC8FF]:0)
>   at 
> org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2398)
>   at 
> org.apache.lucene.index.IndexWriter.maybeCloseOnTragicEvent(IndexWriter.java:5196)
>   at 
> org.apache.lucene.index.IndexWriter.tragicEvent(IndexWriter.java:5186)
>   at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3932)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3874)
>   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3853)
>   at 
> org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllRepeated(TestIndexWriterDelete.java:496)
> {code}
> We should probably look into how exactly we behave with this kind of tragedy 
> on flush().






[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247908#comment-17247908
 ] 

Thomas Wöckinger commented on SOLR-14923:
-

[~dsmiley] I have a first solution and created a PR; it stores the field in 
the RTG instance. Please review; awaiting your suggestions.

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are 
> used.
> org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd whether the UpdateLog caches should be refreshed.
> This check returns true if any child document is included in the 
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and 
> is executed in a synchronized block on the UpdateLog instance, so all other 
> operations on the UpdateLog are blocked as well.
> Because every important UpdateLog method (add, delete, ...) runs in a 
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, 
> so it makes no difference whether 'waitFlush', 'waitSearcher' or 
> 'softCommit' is true or false.
> The described behavior makes the use of ChildDocuments pointless, because the 
> performance is unacceptable.
>  
>  
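
To illustrate the contention pattern described above, here is a contrived 
sketch (not Solr's actual UpdateLog; all names are hypothetical):

{code:java}
// One expensive call under the UpdateLog monitor serializes all indexing
// threads: while openRealtimeSearcher() holds the lock, every add()/delete()
// from every thread blocks on the same monitor, so multi-threaded updates
// degrade to single-threaded throughput.
class UpdateLogSketch {
  synchronized void add(Object cmd) {
    // cheap bookkeeping; normally a quick in-and-out of the monitor
  }

  synchronized void openRealtimeSearcher() {
    // expensive searcher re-open performed while the monitor is held
  }
}
{code}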






[GitHub] [lucene-solr] thomaswoeckinger opened a new pull request #2142: SOLR-14923: Reload RealtimeSearcher on next getInputDocument if forced

2020-12-11 Thread GitBox


thomaswoeckinger opened a new pull request #2142:
URL: https://github.com/apache/lucene-solr/pull/2142


   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247865#comment-17247865
 ] 

Thomas Wöckinger commented on SOLR-14923:
-

[~dsmiley] I started to investigate this issue. I introduced an AtomicBoolean 
in the RTG that is set by the DistributedUpdateProcessor instead of calling 
ulog.openRealtimeSearcher(), and is evaluated only in public static 
SolrInputDocument getInputDocument(...) (line 633).

That method is static and receives the SolrCore in use as a parameter, so I 
cannot use a static member in the RTG: the flag must be set and evaluated per 
core, otherwise it does not fit together.

The current solution is tracked by the UpdateLog, which is instantiated per 
UpdateHandler and thus per SolrCore, so it fits together.

So the question is: where should I put this AtomicBoolean (it must be 
context-specific)?

An alternative would be a static Map with a key that separates the contexts, 
but that seems ugly.

Any suggestions?
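
For illustration, a per-core holder for such a flag might look like this (a 
minimal sketch with hypothetical names; not the actual PR code):

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// One instance per SolrCore: set by the update processor instead of eagerly
// re-opening the realtime searcher, and consumed lazily by the next RTG read.
class RealtimeSearcherFlag {
  private final AtomicBoolean mustRefresh = new AtomicBoolean(false);

  // called by DistributedUpdateProcessor in place of ulog.openRealtimeSearcher()
  void markStale() {
    mustRefresh.set(true);
  }

  // called at the top of RTG.getInputDocument(); returns true exactly once per
  // markStale(), so only the first subsequent read pays for the re-open
  boolean consumeRefresh() {
    return mustRefresh.getAndSet(false);
  }
}
{code}

Keeping the instance per core (e.g. on the UpdateLog, which already exists per 
SolrCore) keeps the flag context-specific without resorting to a static Map.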

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance
>
> Parallel indexing does not make sense at the moment when child documents are 
> used.
> org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd whether the UpdateLog caches should be refreshed.
> This check returns true if any child document is included in the 
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and 
> is executed in a synchronized block on the UpdateLog instance, so all other 
> operations on the UpdateLog are blocked as well.
> Because every important UpdateLog method (add, delete, ...) runs in a 
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, 
> so it makes no difference whether 'waitFlush', 'waitSearcher' or 
> 'softCommit' is true or false.
> The described behavior makes the use of ChildDocuments pointless, because the 
> performance is unacceptable.
>  
>  






[jira] [Commented] (LUCENE-9564) Format code automatically and enforce it

2020-12-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247738#comment-17247738
 ] 

Dawid Weiss commented on LUCENE-9564:
-

I'm sorry; I remember this but haven't had time to push it forward. I'll 
create a branch with the necessary infrastructure and perhaps apply it to one 
of the smaller projects (or packages) so that you guys (or whoever wants to) 
can join in.

The time-consuming part is applying the formatter and then manually verifying 
(the code and the diff) what it may have done wrong or where the code's shape 
can be improved. Sometimes a very long nested expression can be extracted into 
a local variable to make the code look nicer when formatted, etc.
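
A contrived Java example of the kind of reshaping meant here (all names are 
invented for illustration):

{code:java}
// Before: one deeply nested expression that an automatic formatter can only
// wrap awkwardly across several lines.
String label =
    registry.lookup(group.resolve(name)).getConfig().getDisplay().getLabel();

// After: extracting a local variable gives the formatter natural break
// points, and the formatted result reads much better.
DisplayConfig display = registry.lookup(group.resolve(name)).getConfig().getDisplay();
String label = display.getLabel();
{code}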

> Format code automatically and enforce it
> 
>
> Key: LUCENE-9564
> URL: https://issues.apache.org/jira/browse/LUCENE-9564
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is a trivial change but a bold move. And I'm sure it's not for everyone.
> I started using google java format [1] in my projects a while ago and have 
> never looked back since. It is an oracle-style formatter (doesn't allow 
> customizations or deviations from the defined 'ideal') - this takes some 
> getting used to - but it also eliminates *all* the potential differences 
> between IDEs, configs, etc.  And the formatted code typically looks much 
> better than hand-edited one. It is also verifiable on precommit (so you can't 
> commit code that deviates from what you'd get from automated formatting 
> output).
> The biggest benefit I see is that refactorings become such a joy and keep the 
> code neat, everywhere. Before you commit you just reformat everything 
> automatically, no matter how much you messed it up.
> This isn't a change for everyone. I myself love hand-edited, neat code... but 
> the reality is that with IDE support for automated code changes and so many 
> people with different styles working on the same codebase, keeping it neat is 
> a big pain. 
> Checkstyle and other tools are fine for ensuring certain rules but they don't 
> take the burden of formatting off your shoulders. This tool does. 
> Like I said - I had *great* reservations about using it at the beginning but 
> over time got so used to it that I almost can't live without it now. It's 
> like magic - you play with the code in any way you like, then run formatting 
> and it's nice and neat.
> The downside is that automated formatting does imply potential merge problems 
> when backporting patches (or to any currently existing branches).
> Like I said, it is a bold move. Just throwing this for your consideration.
> -I've added a PR that adds spotless but it's not ready; some files would have 
> to be excluded as they currently violate header rules.-
> A more interesting thing is here where the current code is automatically 
> reformatted - this branch is for eyeballing only.
> https://github.com/dweiss/lucene-solr/compare/LUCENE-9564...dweiss:LUCENE-9564-example
> [1] https://google.github.io/styleguide/javaguide.html


