[GitHub] [lucene-solr] dsmiley commented on pull request #1582: Remove some needless toAbsolutePath calls

2020-06-16 Thread GitBox


dsmiley commented on pull request #1582:
URL: https://github.com/apache/lucene-solr/pull/1582#issuecomment-645122868


   Yes; tests pass.  I re-acquainted myself with these specific changes now and 
they all make sense to me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #1514: SOLR-13749: Change cross-collection join query syntax to {!join method=crossCollection ...}

2020-06-16 Thread GitBox


dsmiley commented on pull request #1514:
URL: https://github.com/apache/lucene-solr/pull/1514#issuecomment-645120762


   Last-minute request: please also rename solrUrlWhitelist to avoid the word 
"whitelist" (think current events); I propose "allowSolrUrls". There's a 
discussion going on internally, one that probably should be public, about this 
sort of thing. Another patch adds a similar setting for file system paths, 
deliberately named "allowPaths", which influenced the name I'm suggesting here.






[GitHub] [lucene-solr] noblepaul opened a new pull request #1586: SOLR-14576 : Do not use SolrCore as keys in a WeakHashMap

2020-06-16 Thread GitBox


noblepaul opened a new pull request #1586:
URL: https://github.com/apache/lucene-solr/pull/1586


   






[jira] [Created] (SOLR-14576) HttpCacheHeaderUtil.etagCoreCache should not use a SolrCore as key

2020-06-16 Thread Noble Paul (Jira)
Noble Paul created SOLR-14576:
-

 Summary: HttpCacheHeaderUtil.etagCoreCache should not use a SolrCore as key
 Key: SOLR-14576
 URL: https://issues.apache.org/jira/browse/SOLR-14576
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul


GC performance is affected when the key is a complex data structure. We can 
make it

{code}
private static WeakIdentityMap etagCoreCache = WeakIdentityMap.newConcurrentHashMap();
{code}


instead of

{code}
private static WeakIdentityMap etagCoreCache = WeakIdentityMap.newConcurrentHashMap();
{code}
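For illustration, the cost difference between equality-based and identity-based keying can be sketched in plain Java (a toy model, not the actual Solr code: `ComplexKey` stands in for SolrCore, and the JDK's `IdentityHashMap` stands in for the identity semantics of Lucene's WeakIdentityMap):

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityKeyDemo {
    // Stand-in for a complex key such as SolrCore: hashCode() is assumed costly.
    static class ComplexKey {
        static int hashCalls = 0;
        @Override public int hashCode() { hashCalls++; return 42; }
    }

    // Counts how often the map needs the key's hashCode() for one put + one get.
    public static int hashCallsFor(Map<ComplexKey, String> map) {
        ComplexKey.hashCalls = 0;
        ComplexKey key = new ComplexKey();
        map.put(key, "etag");
        map.get(key);
        return ComplexKey.hashCalls;
    }

    public static void main(String[] args) {
        // An equality-based map invokes hashCode() on every access...
        System.out.println(hashCallsFor(new HashMap<>()));         // 2
        // ...an identity-based map never does.
        System.out.println(hashCallsFor(new IdentityHashMap<>())); // 0
    }
}
```

With reference-identity keys, the map never touches the key object's own hashCode/equals, which is the point of switching away from a plain equality-keyed weak map here.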

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] msokolov commented on a change in pull request #1585: LUCENE-8962: Allow waiting for all merges in a merge spec

2020-06-16 Thread GitBox


msokolov commented on a change in pull request #1585:
URL: https://github.com/apache/lucene-solr/pull/1585#discussion_r441170320



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -4289,7 +4287,7 @@ private synchronized void mergeFinish(MergePolicy.OneMerge merge) {
   @SuppressWarnings("try")
   private synchronized void closeMergeReaders(MergePolicy.OneMerge merge, boolean suppressExceptions) throws IOException {
 final boolean drop = suppressExceptions == false;
-try (Closeable finalizer = merge::mergeFinished) {
+try (Closeable finalizer = () -> merge.mergeFinished(suppressExceptions==false)) {

Review comment:
   how about some spaces around the `==` operator?








[GitHub] [lucene-solr] msfroh commented on a change in pull request #1585: LUCENE-8962: Allow waiting for all merges in a merge spec

2020-06-16 Thread GitBox


msfroh commented on a change in pull request #1585:
URL: https://github.com/apache/lucene-solr/pull/1585#discussion_r441148085



##
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##
@@ -399,6 +427,23 @@ public String segString(Directory dir) {
   }
   return b.toString();
 }
+
+/**
+ * Waits if necessary for at most the given time for all merges.
+ */
+boolean await(long timeout, TimeUnit unit) {
+  try {
+CompletableFuture<Void> future = CompletableFuture.allOf(merges.stream()
+    .map(m -> m.completable).collect(Collectors.toList()).toArray(new CompletableFuture[0]));
+future.get(timeout, unit);
+return true;
+  } catch (InterruptedException e) {
+Thread.interrupted();

Review comment:
   In my change, I remember I originally tried handling the 
`InterruptedException` while waiting for the `CountDownLatch` in a similar way, 
but it caused an intermittent test failure (in `TestIndexWriterWithThreads`, 
maybe?). You might run into the same test failure once you start calling this 
`await` method from within `IndexWriter`.
   
   I was able to get the test passing by rethrowing it wrapped in a 
`ThreadInterruptedException`. 
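   The rethrow pattern described above can be sketched as follows (a hedged sketch, not the actual PR code: `WrappedInterrupt` stands in for Lucene's `ThreadInterruptedException`, and the handling of `ExecutionException` is illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AwaitSketch {
    // Stand-in for Lucene's unchecked ThreadInterruptedException wrapper.
    static class WrappedInterrupt extends RuntimeException {
        WrappedInterrupt(InterruptedException cause) { super(cause); }
    }

    // Waits up to the given time for the combined future; rethrows interrupts
    // instead of swallowing them, as suggested in the review.
    static boolean await(CompletableFuture<?> all, long timeout, TimeUnit unit) {
        try {
            all.get(timeout, unit);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag...
            throw new WrappedInterrupt(e);      // ...and propagate, don't swallow
        } catch (ExecutionException e) {
            return true;   // the awaited work failed, but the wait itself is over
        } catch (TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        System.out.println(await(done, 1, TimeUnit.SECONDS)); // true
    }
}
```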








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1585: LUCENE-8962: Allow waiting for all merges in a merge spec

2020-06-16 Thread GitBox


dsmiley commented on a change in pull request #1585:
URL: https://github.com/apache/lucene-solr/pull/1585#discussion_r441140226



##
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##
@@ -399,6 +427,23 @@ public String segString(Directory dir) {
   }
   return b.toString();
 }
+
+/**
+ * Waits if necessary for at most the given time for all merges.
+ */
+boolean await(long timeout, TimeUnit unit) {
+  try {
+CompletableFuture<Void> future = CompletableFuture.allOf(merges.stream()

Review comment:
   What I learned today: `CompletableFuture.allOf()` -- cool, thanks!  
So elegant.
   
   Let me try and return the favor:
   You can go directly to an array, avoiding toList:
   ```
   CompletableFuture future = CompletableFuture.allOf(merges.stream()
   .map(m -> m.completable).toArray(CompletableFuture[]::new));
   ```
   BTW IntelliJ pointed this out and had an automatic replacement when I 
hovered my cursor over "collect".
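   Both variants can be compared side by side in a self-contained sketch (the `List<CompletableFuture<Void>>` stands in for the `merges` collection; names are illustrative):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AllOfDemo {
    public static CompletableFuture<Void> viaList(List<CompletableFuture<Void>> fs) {
        // Original: collect into a List, then copy the List into an array.
        return CompletableFuture.allOf(
                fs.stream().collect(Collectors.toList()).toArray(new CompletableFuture[0]));
    }

    public static CompletableFuture<Void> viaArray(List<CompletableFuture<Void>> fs) {
        // Suggested: stream straight into an array, skipping the intermediate List.
        return CompletableFuture.allOf(fs.stream().toArray(CompletableFuture[]::new));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Void>> fs = List.of(
                CompletableFuture.completedFuture(null),
                CompletableFuture.completedFuture(null));
        // Both produce an equivalent combined future.
        System.out.println(viaList(fs).isDone() && viaArray(fs).isDone()); // true
    }
}
```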








[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-16 Thread Alex Klibisz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137925#comment-17137925
 ] 

Alex Klibisz commented on LUCENE-9378:
--

[~mgibney] 

> quick clarification regarding "for every doc in the lucene shard": do your 
>benchmarks illustrating the regression evaluate the vector query over the full 
>domain (i.e., literally every (live) doc in the index, without any 
>pre-filtering of the search domain)?

It's reading every live doc in the Elasticsearch index, which consists of 
multiple Lucene shards. I'm not controlling the order. From the perspective of 
my plugin, I'm just getting a docId from Elasticsearch and using 
`advanceNext(docId)` to look up the binary value. Also, there are no deleted 
docs in this particular case, though there could be in practice.

Here's the exact snippet: 
[https://github.com/alexklibisz/elastiknn/blob/benchmarks-4/plugin/src/main/scala/com/klibisz/elastiknn/query/ExactQuery.scala#L25-L29]

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
> Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a (~30%) reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene80DocValuesFormat 
> (COMPRESSED and UNCOMPRESSED) and allowing users to create a custom Codec 
> subclassing the default Codec and pick the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value

2020-06-16 Thread Haoyu Zhai (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137920#comment-17137920
 ] 

Haoyu Zhai commented on LUCENE-8574:


I've attached a unit test showing a case that the current code cannot handle. 
It seems the patch attached to this issue cannot handle it either (since the 
DoubleValues generated for the same LeafReaderContext are not the same, we 
still get tons of DoubleValues created).

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.5, 8.0
>Reporter: Michael McCandless
>Assignee: Robert Muir
>Priority: Major
> Attachments: LUCENE-8574.patch, unit_test.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit 
> cache, so that nested expressions that reference the same common variable 
> would compute the value for that variable the first time it was referenced 
> and then use that cached value for all subsequent invocations, within one 
> hit.  I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference 
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2 
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a 
> given hit), but today it's computed twice.  The problem is combinatoric if b 
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
>  
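The caching behavior the issue asks for can be sketched outside Lucene (a toy per-hit memoization, not the actual ExpressionFunctionValues code; all names are illustrative). Using the example expressions above, evaluating x computes b only once per hit:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PerHitCacheDemo {
    // Counts how often the underlying variable "b" is actually computed.
    static int bComputations = 0;

    static double b(int doc) { bComputations++; return doc * 1.0; }

    // Per-hit cache: a fresh map per document, so a variable referenced by
    // several sub-expressions is computed at most once per hit.
    static double evalX(int doc) {
        Map<String, Double> perHit = new HashMap<>();
        Function<Integer, Double> cachedB =
                d -> perHit.computeIfAbsent("b", k -> b(d));
        double c = cachedB.apply(doc) + 2;  // c = b + 2
        double d = cachedB.apply(doc) * 2;  // d = b * 2
        return c + d;                       // x = c + d
    }

    public static void main(String[] args) {
        double x = evalX(3);                         // b(3)=3, c=5, d=6, x=11
        System.out.println(x + " " + bComputations); // 11.0 1
    }
}
```

Without the per-hit map, the two `apply` calls would each invoke b, and the cost grows combinatorially as the description explains.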






[jira] [Updated] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value

2020-06-16 Thread Haoyu Zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoyu Zhai updated LUCENE-8574:
---
Attachment: unit_test.patch

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.5, 8.0
>Reporter: Michael McCandless
>Assignee: Robert Muir
>Priority: Major
> Attachments: LUCENE-8574.patch, unit_test.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit 
> cache, so that nested expressions that reference the same common variable 
> would compute the value for that variable the first time it was referenced 
> and then use that cached value for all subsequent invocations, within one 
> hit.  I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference 
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2 
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a 
> given hit), but today it's computed twice.  The problem is combinatoric if b 
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
>  






[GitHub] [lucene-solr] s1monw commented on a change in pull request #1576: Alternative approach to LUCENE-8962

2020-06-16 Thread GitBox


s1monw commented on a change in pull request #1576:
URL: https://github.com/apache/lucene-solr/pull/1576#discussion_r441125936



##
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##
@@ -399,8 +423,19 @@ public String segString(Directory dir) {
   }
   return b.toString();
 }
+
+boolean await(long timeout, TimeUnit unit) {
+  for (OneMerge merge : merges) {
+if (merge.await(timeout, unit) == false) {

Review comment:
   I fixed this in the followup PR 
https://github.com/apache/lucene-solr/pull/1585








[GitHub] [lucene-solr] s1monw commented on pull request #1576: Alternative approach to LUCENE-8962

2020-06-16 Thread GitBox


s1monw commented on pull request #1576:
URL: https://github.com/apache/lucene-solr/pull/1576#issuecomment-644997637


   @msfroh I opened https://github.com/apache/lucene-solr/pull/1585 to make it 
easier to do this. 






[GitHub] [lucene-solr] s1monw commented on pull request #1585: LUCENE-8962: Allow waiting for all merges in a merge spec

2020-06-16 Thread GitBox


s1monw commented on pull request #1585:
URL: https://github.com/apache/lucene-solr/pull/1585#issuecomment-644992788


   @mikemccand @msokolov @msfroh I prepared some of my changes so we can pull 
them in and incorporate them into the overall change.






[GitHub] [lucene-solr] s1monw opened a new pull request #1585: LUCENE-8962: Allow waiting for all merges in a merge spec

2020-06-16 Thread GitBox


s1monw opened a new pull request #1585:
URL: https://github.com/apache/lucene-solr/pull/1585


   This change adds infrastructure to allow straightforward waiting
   on one or more merges or an entire merge specification. This is
   a basis for LUCENE-8962.






[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2020-06-16 Thread Varun Thacker (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137909#comment-17137909
 ] 

Varun Thacker commented on LUCENE-9322:
---

I've taken only the VectorField parts from your PR in 
[https://github.com/apache/lucene-solr/pull/1584]. This was mostly me trying 
to see if it makes sense to break out the work.

> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.






[GitHub] [lucene-solr] vthacker opened a new pull request #1584: Add VectorField: WIP

2020-06-16 Thread GitBox


vthacker opened a new pull request #1584:
URL: https://github.com/apache/lucene-solr/pull/1584


   https://issues.apache.org/jira/projects/LUCENE-ABCD
   
   # Description
   
   Add VectorField 
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-16 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137820#comment-17137820
 ] 

Michael Gibney commented on LUCENE-9378:


[~alexklibisz], quick clarification regarding "for every doc in the lucene 
shard": do your benchmarks illustrating the regression evaluate the vector 
query over the full domain (i.e., literally every (live) doc in the index, 
without any pre-filtering of the search domain)?

This question is related to [~jpountz]'s comment above: "decompress all values 
when we need a single one in a block". It would make sense that docId-order 
access to docValues over the full domain could be faster (e.g., full-domain 
facets?); selective docId-order access (i.e. over a filtered domain) could be 
slower; arbitrary (non-docId-order) access over the full domain would likely be 
a worst-case scenario (wrt the impact of block size), all else being equal. The 
last of these would affect bulk-export-type use cases, accessing docValues for 
each doc in arbitrary order.
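The access-pattern trade-off can be made concrete with a toy model (not Lucene's actual codec; the block size and single-block cache are made up): serving one value requires decompressing its whole block, so docId-order scans decompress each block once, while arbitrary-order access decompresses nearly once per read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BlockAccessDemo {
    static final int BLOCK_SIZE = 32;   // illustrative, not Lucene's real block size
    static int decompressions = 0;
    static int cachedBlock = -1;

    // Serving doc i requires block i/BLOCK_SIZE to be decompressed; only the
    // most recent block is kept, which favors docId-order access.
    static int read(int docId) {
        int block = docId / BLOCK_SIZE;
        if (block != cachedBlock) { decompressions++; cachedBlock = block; }
        return docId;  // stand-in for the stored binary value
    }

    static int countDecompressions(List<Integer> docIds) {
        decompressions = 0; cachedBlock = -1;
        for (int d : docIds) read(d);
        return decompressions;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1024; i++) docs.add(i);
        int sequential = countDecompressions(docs);   // one per block: 32
        Collections.shuffle(docs, new java.util.Random(0));
        int shuffled = countDecompressions(docs);     // nearly one per read
        System.out.println(sequential + " vs " + shuffled);
    }
}
```

This is why bulk-export use cases touching docs in arbitrary order would be the worst case with respect to block size, as noted above.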

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
> Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a (~30%) reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene80DocValuesFormat 
> (COMPRESSED and UNCOMPRESSED) and allowing users to create a custom Codec 
> subclassing the default Codec and pick the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Commented] (SOLR-14553) I get a stack trace when running Solrj when calling the query method of HttpSolrClient

2020-06-16 Thread Jim Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137793#comment-17137793
 ] 

Jim Anderson commented on SOLR-14553:
-

My apologies for the misplaced email. I realized the mistake and started
using the Solr user mailing list before I decided to move away from Solr.

Regards,
Jim

On Tue, Jun 16, 2020 at 12:54 PM Jason Gerlowski (Jira) 



> I get a stack trace when running Solrj when calling the query method of 
> HttpSolrClient
> --
>
> Key: SOLR-14553
> URL: https://issues.apache.org/jira/browse/SOLR-14553
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: 7.3.1
>Reporter: Jim Anderson
>Priority: Major
>  Labels: HttpSolrClient, solrj
> Attachments: SolrRead.java, bugSteps.ods
>
>
> I started a thread last week about running solrj using Solr 8.5.1. My program 
> was crashing. I was using it with Nutch 1.16. The documentation recommended 
> that Nutch 1.17 be used with Solr 8.5.1. Nutch 1.17 is still a developer 
> version and not released, so I have downloaded Solr 7.3.1 to use 
> with Nutch 1.16.
> I have been able to build up a small index in Solr using Nutch and I can see 
> the Solr core containing the index in the solr admin window.
> Next, I tried to query Solr using solrj, using an example I found at:
> https://lucene.apache.org/solr/guide/8_1/using-solrj.html#common-configuration-options
> I wrote my own version of the example and tried running it and I get the 
> following
> stacktrace:
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255
> )
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244
> )
>  at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>  at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
>  at bfs.solrRead.SolrRead.main(SolrRead.java:40)
>  
> This problem occurs on a linux platform, and the output from 'uname -a' is:
> Linux roe 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 
> GNU/Linux
> I have logged my steps and can probably reproduce the problem, but it does 
> take some time. Along the way I had some warnings and potential error 
> messages (I'm too new to Solr to know whether the messages were errors or 
> not), so the crash could be due to bad data in the index.






[jira] [Resolved] (SOLR-14573) Fix or suppress warnings in solrj/src/test

2020-06-16 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14573.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Fix or suppress warnings in solrj/src/test
> --
>
> Key: SOLR-14573
> URL: https://issues.apache.org/jira/browse/SOLR-14573
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>
> Bah. The target testClasses shows over 1,000 _more_ warnings.
> I'm going to do this a little differently. Rather than do a directory at a 
> time, I'll just fix a bunch, push, fix a bunch more, push all on this Jira 
> until I'm done.






[jira] [Commented] (SOLR-8274) Add per-request MDC logging based on user-provided value.

2020-06-16 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137783#comment-17137783
 ] 

David Smiley commented on SOLR-8274:


I'm an OpenTracing noob too, but what drives me is the desire for the principle 
of having one way to transmit a tracking ID. Some of the work [~caomanhdat] 
did here to make OpenTracing work ought to be useful for passing along any 
tracing ID, at least in principle and I hope in practice. Let's not do 
redundant work (redundant code paths for similar things). I appreciate that 
OpenTracing specifically might really only be about a tracing server and 
sampling; I'm not sure.

> Add per-request MDC logging based on user-provided value.
> -
>
> Key: SOLR-8274
> URL: https://issues.apache.org/jira/browse/SOLR-8274
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-8274.patch
>
>
> *Problem 1* Currently, there's no way (AFAIK) to find all log messages 
> associated with a particular request.
> *Problem 2* There's also no easy way for multi-tenant Solr setups to find all 
> log messages associated with a particular customer/tenant.
> Both of these problems would be more manageable if Solr could be configured 
> to record an MDC tag based on a header, or some other user provided value.
> This would allow admins to group together logs about a single request.  If 
> the same header value is repeated multiple times this functionality could 
> also be used to group together arbitrary requests, such as those that come 
> from a particular user, etc.
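The proposal can be sketched with a request-scoped tag in plain Java (a minimal stand-in for an MDC; real Solr logging would presumably go through slf4j's MDC, and all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class MdcSketch {
    // Minimal MDC stand-in: a request-scoped tag carried in a ThreadLocal
    // and stamped onto every log line written while handling the request.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();
    static final List<String> LOG = new ArrayList<>();

    static void log(String msg) {
        String id = REQUEST_ID.get();
        LOG.add((id == null ? "-" : id) + " " + msg);
    }

    // Simulates handling a request that carries a user-provided header value.
    static void handleRequest(String headerValue) {
        REQUEST_ID.set(headerValue);
        try {
            log("query parsed");
            log("query executed");
        } finally {
            REQUEST_ID.remove();  // never leak the tag into the next request
        }
    }

    public static void main(String[] args) {
        handleRequest("tenant-42");
        LOG.forEach(System.out::println);
        // tenant-42 query parsed
        // tenant-42 query executed
    }
}
```

Grepping the log for the tag then groups all messages for one request or tenant, which is exactly what the issue asks for.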






[jira] [Commented] (SOLR-14573) Fix or suppress warnings in solrj/src/test

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137643#comment-17137643
 ] 

ASF subversion and git services commented on SOLR-14573:


Commit 6357b3bdaaa0e2c2b1acfe838e55db3a83e38e73 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6357b3b ]

SOLR-14573: Fix or suppress warnings in solrj/src/test


> Fix or suppress warnings in solrj/src/test
> --
>
> Key: SOLR-14573
> URL: https://issues.apache.org/jira/browse/SOLR-14573
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Bah. The target testClasses shows over 1,000 _more_ warnings.
> I'm going to do this a little differently. Rather than do a directory at a 
> time, I'll just fix a bunch, push, fix a bunch more, push all on this Jira 
> until I'm done.






[jira] [Commented] (SOLR-14573) Fix or suppress warnings in solrj/src/test

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137675#comment-17137675
 ] 

ASF subversion and git services commented on SOLR-14573:


Commit c5a29169c5de71e9eb6916139243d5530800d050 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c5a2916 ]

SOLR-14573: Fix or suppress warnings in solrj/src/test


> Fix or suppress warnings in solrj/src/test
> --
>
> Key: SOLR-14573
> URL: https://issues.apache.org/jira/browse/SOLR-14573
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Bah. The target testClasses shows over 1,000 _more_ warnings.
> I'm going to do this a little differently. Rather than do a directory at a 
> time, I'll just fix a bunch, push, fix a bunch more, push all on this Jira 
> until I'm done.






[GitHub] [lucene-solr] msfroh commented on pull request #1576: Alternative approach to LUCENE-8962

2020-06-16 Thread GitBox


msfroh commented on pull request #1576:
URL: https://github.com/apache/lucene-solr/pull/1576#issuecomment-644899966


   This approach makes sense to me. 
   
   I like how much simpler the addition of MergeSpecification.await() makes 
things, versus the CountDownLatch hackery of the previous approach. Also, 
updatePendingMerges returning the MergeSpecification is much cleaner than 
explicitly computing a merge from within prepareCommitInternal.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


dsmiley commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r441014049



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,14 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+* `RealTimeGetComponent`, described in the section 
<>.
+* `ClusteringComponent`, described in the section 
<>.
+* `SuggestComponent`, described in the section 
<>.
+* `AnalyticsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+* `ResponseLogComponent`, used to record which documents are returned to the 
user via the Solr log, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/ResponseLogComponent.html[ResponseLogComponent]
 javadocs.
+* `PhrasesIdentificationComponent`, used to identify & score "phrases" found 
in the input string, based on shingles in indexed fields, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 javadocs.
+
+Lastly, you may be interested in some other components created by the 
community and listed on the https://solr.cool/#searchcomponents[Solr Cool] 
website.

Review comment:
   Yeah let's separate external plugin advocacy to another issue and 
address that comprehensively.








[GitHub] [lucene-solr] jpountz commented on a change in pull request #1541: RegExp - add case insensitive matching option

2020-06-16 Thread GitBox


jpountz commented on a change in pull request #1541:
URL: https://github.com/apache/lucene-solr/pull/1541#discussion_r440900284



##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##
@@ -489,6 +497,19 @@ public RegExp(String s) throws IllegalArgumentException {
 this(s, ALL);
   }
   
+  /**
+   * Constructs new RegExp from a string. Same as
+   * RegExp(s, ALL).
+   * 
+   * @param s regexp string
+   * @param caseSensitive case sensitive matching
+   * @exception IllegalArgumentException if an error occurred while parsing the
+   *  regular expression
+   */
+  public RegExp(String s, boolean caseSensitive) throws 
IllegalArgumentException {
+this(s, ALL, caseSensitive);
+  }  

Review comment:
   same here, maybe it's fine to require passing syntax flags when you want 
to configure case sensitivity?

##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##
@@ -499,10 +520,30 @@ public RegExp(String s) throws IllegalArgumentException {
*  regular expression
*/
   public RegExp(String s, int syntax_flags) throws IllegalArgumentException {
+this(s, syntax_flags, true);
+  }
+  /**
+   * Constructs new RegExp from a string.
+   * 
+   * @param s regexp string
+   * @param syntax_flags boolean 'or' of optional syntax constructs to be
+   *  enabled
+   * @param caseSensitive case sensitive matching
+   * @exception IllegalArgumentException if an error occurred while parsing the
+   *  regular expression
+   */
+  public RegExp(String s, int syntax_flags, boolean caseSensitive) throws 
IllegalArgumentException {
 originalString = s;
-flags = syntax_flags;
+// Trim any bits unrelated to syntax flags
+syntax_flags  = syntax_flags & 0xff;
+if (caseSensitive) {
+  flags = syntax_flags;
+} else {  
+  // Add in the case-insensitive setting
+  flags = syntax_flags  | UNICODE_CASE_INSENSITIVE;

Review comment:
   ```suggestion
 flags = syntax_flags | UNICODE_CASE_INSENSITIVE;
   ```

##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##
@@ -743,6 +792,30 @@ private Automaton 
toAutomatonInternal(Map<String,Automaton> automata,
 }
 return a;
   }
+  private Automaton toCaseInsensitiveChar(int codepoint, int 
maxDeterminizedStates) {
+Automaton case1 = Automata.makeChar(codepoint);
+int altCase = Character.isLowerCase(codepoint) ? 
Character.toUpperCase(codepoint) : Character.toLowerCase(codepoint);
+Automaton result;
+if (altCase != codepoint) {
+  result = Operations.union(case1, Automata.makeChar(altCase));
+  result = MinimizationOperations.minimize(result, maxDeterminizedStates); 
 
+} else {
+  result = case1;  
+}  
+return result;
+  }
+  
+  private Automaton toCaseInsensitiveString(int maxDeterminizedStates) {
+List<Automaton> list = new ArrayList<>();
+s.codePoints().forEach(
+p -> {
+  list.add(toCaseInsensitiveChar(p, maxDeterminizedStates));
+}
+);

Review comment:
   I'd rather like a regular for loop, this is a bit abusing lambdas to my 
taste. :)
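
A plain-loop equivalent of the codePoints()/forEach iteration discussed above can be sketched like this (self-contained illustration only; the class and method names are made up, and the RegExp-internal toCaseInsensitiveChar call is represented by a comment):

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointLoop {
    // Collect the code points of a string using a regular for loop,
    // stepping by Character.charCount to handle surrogate pairs.
    public static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp); // in RegExp this would be list.add(toCaseInsensitiveChar(cp, max))
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(codePoints("Σσς")); // [931, 963, 962]
    }
}
```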

##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##
@@ -743,6 +792,30 @@ private Automaton 
toAutomatonInternal(Map<String,Automaton> automata,
 }
 return a;
   }
+  private Automaton toCaseInsensitiveChar(int codepoint, int 
maxDeterminizedStates) {
+Automaton case1 = Automata.makeChar(codepoint);
+int altCase = Character.isLowerCase(codepoint) ? 
Character.toUpperCase(codepoint) : Character.toLowerCase(codepoint);
+Automaton result;
+if (altCase != codepoint) {
+  result = Operations.union(case1, Automata.makeChar(altCase));
+  result = MinimizationOperations.minimize(result, maxDeterminizedStates); 
 
+} else {
+  result = case1;  
+}  
+return result;
+  }

Review comment:
   I think that this is incorrect as there is no 1:1 mapping between 
lowercase and uppercase letters, for instance `ς` and `σ` both have `Σ` as 
their uppercase variant. And if someone uses `Σ` in their regexes, `ς` wouldn't 
match as `toLowerCase(Σ)` returns `σ`.
   
   Should we make this only about ASCII for now, like Java's Pattern class? 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/util/regex/Pattern.html#CASE_INSENSITIVE
 We could add support for full Unicode later but this doesn't look like a low 
hanging fruit to me?
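
The non-1:1 mapping is easy to verify with the JDK's own case-mapping methods (self-contained demo; the class name is made up for illustration):

```java
// Both 'ς' (final sigma, U+03C2) and 'σ' (U+03C3) uppercase to 'Σ' (U+03A3),
// so a single toUpperCase/toLowerCase pairing cannot cover every
// case-insensitive match.
public class SigmaCaseDemo {
    public static void main(String[] args) {
        System.out.println(Character.toUpperCase('ς')); // Σ
        System.out.println(Character.toUpperCase('σ')); // Σ
        // Lowercasing 'Σ' yields 'σ', never 'ς':
        System.out.println(Character.toLowerCase('Σ')); // σ
    }
}
```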

##
File path: lucene/core/src/java/org/apache/lucene/search/RegexpQuery.java
##
@@ -68,6 +68,19 @@ public RegexpQuery(Term term) {
 this(term, RegExp.ALL);
   }
   
+  /**
+   * Constructs a query for terms matching term.
+   * 
+   * By default, all regular expression features are enabled.
+   * 
+   * 
+   * @param term regular expression.
+   * @param 

[jira] [Resolved] (SOLR-14553) I get a stack trace when running Solrj when calling the query method of HttpSolrClient

2020-06-16 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-14553.

Resolution: Invalid

> I get a stack trace when running Solrj when calling the query method of 
> HttpSolrClient
> --
>
> Key: SOLR-14553
> URL: https://issues.apache.org/jira/browse/SOLR-14553
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: 7.3.1
>Reporter: Jim Anderson
>Priority: Major
>  Labels: HttpSolrClient, solrj
> Attachments: SolrRead.java, bugSteps.ods
>
>
> I started a thread last week about running solrj using Solr 8.5.1. My program 
> was crashing. I was using it with Nutch 1.16. In the documentation, it was 
> recommended that Nutch 1.17 be used with Solr 8.5.1. Nutch 1.17 is still a 
> developer version and not released, so I have downloaded Solr 7.3.1 to use 
> with Nutch 1.16.
> I have been able to build up a small index in Solr using Nutch and I can see 
> the Solr core containing the index in the solr admin window.
> Next, I tried to query Solr using solrj, using an example I found at:
> https://lucene.apache.org/solr/guide/8_1/using-solrj.html#common-configuration-options
> I wrote my own version of the example and tried running it and I get the 
> following
> stacktrace:
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255
> )
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244
> )
>  at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>  at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
>  at bfs.solrRead.SolrRead.main(SolrRead.java:40)
>  
> This problem occurs on a linux platform, and the output from 'uname -a' is:
> Linux roe 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 
> GNU/Linux
> I have logged my steps and can probably reproduce the problem, but it does 
> take some time. Along the way I had some warnings and potential error 
> messages (I'm too new to Solr to know if the messages were errors or not) so 
> the crash could be due to bad data in the index.






[jira] [Commented] (SOLR-14553) I get a stack trace when running Solrj when calling the query method of HttpSolrClient

2020-06-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136816#comment-17136816
 ] 

Jason Gerlowski commented on SOLR-14553:


Hey Jim, sorry you're having troubles.  Unfortunately, we don't use our JIRA 
here as a support portal - it's only for reproducible (or strongly suspected) 
bugs in Solr itself.

The mailing list is the best place for you to get help.  It's the right format, 
and it's also got a much wider readership than the smaller set of people who 
read the jira list.  I looked for the mail thread you mentioned in your 
description, but couldn't find it, though I found others you created for other 
issues in your recent project.  Please ping that thread again or start a new 
one, and when you do, please include the full stack trace you get on the client 
side.  The stack you've posted above is missing the top few lines that mention 
the name of the actual exception and its message. Also include all of the 
"Caused By" sections, if there are any.

Best of luck!

> I get a stack trace when running Solrj when calling the query method of 
> HttpSolrClient
> --
>
> Key: SOLR-14553
> URL: https://issues.apache.org/jira/browse/SOLR-14553
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: clients - java
>Affects Versions: 7.3.1
>Reporter: Jim Anderson
>Priority: Major
>  Labels: HttpSolrClient, solrj
> Attachments: SolrRead.java, bugSteps.ods
>
>
> I started a thread last week about running solrj using Solr 8.5.1. My program 
> was crashing. I was using it with Nutch 1.16. In the documentation, it was 
> recommended that Nutch 1.17 be used with Solr 8.5.1. Nutch 1.17 is still a 
> developer version and not released, so I have downloaded Solr 7.3.1 to use 
> with Nutch 1.16.
> I have been able to build up a small index in Solr using Nutch and I can see 
> the Solr core containing the index in the solr admin window.
> Next, I tried to query Solr using solrj, using an example I found at:
> https://lucene.apache.org/solr/guide/8_1/using-solrj.html#common-configuration-options
> I wrote my own version of the example and tried running it and I get the 
> following
> stacktrace:
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255
> )
>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244
> )
>  at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>  at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
>  at bfs.solrRead.SolrRead.main(SolrRead.java:40)
>  
> This problem occurs on a linux platform, and the output from 'uname -a' is:
> Linux roe 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 
> GNU/Linux
> I have logged my steps and can probably reproduce the problem, but it does 
> take some time. Along the way I had some warnings and potential error 
> messages (I'm too new to Solr to know if the messages were errors or not) so 
> the crash could be due to bad data in the index.






[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2020-06-16 Thread Varun Thacker (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136805#comment-17136805
 ] 

Varun Thacker commented on LUCENE-9322:
---

Hello [~jtibshirani] ! Thanks for tackling this

 

> Support for storing and retrieving individual float vectors.

How would we feel about breaking this part out and committing it separately? I 
believe this is the part that adds VectorField? The PR on SOLR-14397 also added a 
DenseVectorField (a Solr field), so maybe we could reuse VectorField (although 
there is some nuance, since DenseVectorField currently supports string and 
vector encoding, and a code comment mentions bfloat16 as well)

> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.
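
For goal #2, the baseline that approximate methods are measured against is an exhaustive scan; a minimal sketch of that brute-force nearest-neighbor search (illustrative only, names made up; the ANN formats under discussion exist precisely to avoid this O(n) scan):

```java
import java.util.List;

public class BruteForceKnn {
    // Squared Euclidean distance between two equal-length vectors.
    public static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Return the index of the stored vector closest to the query.
    public static int nearest(List<float[]> index, float[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < index.size(); i++) {
            double d = dist(index.get(i), query);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<float[]> index = List.of(
            new float[]{0f, 0f}, new float[]{1f, 1f}, new float[]{5f, 5f});
        System.out.println(nearest(index, new float[]{0.9f, 1.2f})); // 1
    }
}
```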






[GitHub] [lucene-solr] epugh commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


epugh commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r440992193



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,14 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+* `RealTimeGetComponent`, described in the section 
<>.
+* `ClusteringComponent`, described in the section 
<>.
+* `SuggestComponent`, described in the section 
<>.
+* `AnalyticsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+* `ResponseLogComponent`, used to record which documents are returned to the 
user via the Solr log, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/ResponseLogComponent.html[ResponseLogComponent]
 javadocs.
+* `PhrasesIdentificationComponent`, used to identify & score "phrases" found 
in the input string, based on shingles in indexed fields, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 javadocs.
+
+Lastly, you may be interested in some other components created by the 
community and listed on the https://solr.cool/#searchcomponents[Solr Cool] 
website.

Review comment:
   I was wondering about actually linking to the specific types of plugins 
on the Solr Cool site? 
   
   https://github.com/solr-extensions/solr-extensions.github.io/issues/19
   
   I was wondering about not adding that link in this PR, and doing a separate 
one with links from the relevant portions of the ref guide.   Or maybe we need 
to have one place in ref guide with "related projects" or something?   I'm 
hopeful that solr.cool ends up being a clearing house of community created 
extensions to Solr of all kinds ;-)








[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


gerlowskija commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r440983349



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,15 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+[cols="20,40,40",options="header"]
+|===
+|Component Name |Class Name |More Information
+|phrases |`solr.PhrasesIdentificationComponent` | Learn more in the 
{solr-javadocs}org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 java class.

Review comment:
   Sounds good.  You've clearly put more thought into it than I have, so 
I'll be happy with whatever you land on.








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


dsmiley commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r440982370



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,14 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+* `RealTimeGetComponent`, described in the section 
<>.
+* `ClusteringComponent`, described in the section 
<>.
+* `SuggestComponent`, described in the section 
<>.
+* `AnalyticsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+* `ResponseLogComponent`, used to record which documents are returned to the 
user via the Solr log, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/ResponseLogComponent.html[ResponseLogComponent]
 javadocs.
+* `PhrasesIdentificationComponent`, used to identify & score "phrases" found 
in the input string, based on shingles in indexed fields, described in the 
{solr-javadocs}solr-core/org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 javadocs.
+
+Lastly, you may be interested in some other components created by the 
community and listed on the https://solr.cool/#searchcomponents[Solr Cool] 
website.

Review comment:
   It's great to mention outside resources, though I'd like to point out 
that SearchComponent is merely one of *many* abstractions you could write a 
plugin for.  Maybe we should mention "Solr Cool" in exactly one suitable place 
(about Solr plugins generally) instead of potentially many places.








[jira] [Comment Edited] (LUCENE-9379) Directory based approach for index encryption

2020-06-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136783#comment-17136783
 ] 

Bruno Roustant edited comment on LUCENE-9379 at 6/16/20, 4:25 PM:
--

So I plan to implement an EncryptingDirectory extending FilterDirectory.

 

+Encryption method:+

AES CTR (counter)
 * This mode is approved by NIST. 
([https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29])
 * AES encryption has the same size as the original clear text (though the last 
block is padded to 128 bits). So we can use the same file pointers.
 * CTR mode allows random access to encrypted blocks (128 bits blocks).
 * IV (initialisation vector) must be random, and is stored at the beginning of 
the encrypted file because it can be public. No need to repeat the IV for each 
block (less disk impact compared to CBC mode).
 * It is appropriate to encrypt streams.
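
A minimal sketch of these CTR properties using the JDK's javax.crypto (assumed class and method names; the block-counter derivation mirrors the standard CTR convention of treating the 16-byte IV as a big-endian counter incremented once per block):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;

public class CtrRandomAccessDemo {
    static final int BLOCK = 16; // AES block size in bytes

    // Counter for block `blockIndex`: the base IV interpreted as a
    // big-endian 128-bit integer, incremented once per block (mod 2^128).
    public static byte[] counterFor(byte[] baseIv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, baseIv).add(BigInteger.valueOf(blockIndex));
        byte[] raw = ctr.toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16], iv = new byte[16], plain = new byte[4 * BLOCK];
        rnd.nextBytes(key); rnd.nextBytes(iv); rnd.nextBytes(plain);

        SecretKeySpec k = new SecretKeySpec(key, "AES");
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, k, new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain); // ciphertext has the same size as the plaintext

        // Random access: decrypt only block 2 by seeding a cipher with that
        // block's counter -- no need to process blocks 0 and 1 first.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, k, new IvParameterSpec(counterFor(iv, 2)));
        byte[] block2 = dec.doFinal(Arrays.copyOfRange(cipherText, 2 * BLOCK, 3 * BLOCK));
        System.out.println(Arrays.equals(block2,
            Arrays.copyOfRange(plain, 2 * BLOCK, 3 * BLOCK))); // true
    }
}
```

In a real EncryptingDirectory, the random IV sketched here would be the one stored at the beginning of the encrypted file, and the counter derivation would let an IndexInput seek to any file position.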

 

+API:+ 

I don’t anticipate any API change.

 

+How to provide encryption keys:+

EncryptingDirectory would require a delegate Directory, an encryption key 
supplier, and a Cipher pool (for performance).

For the callers to pass the encryption keys, I see two ways:

1- In Solr, declare a DirectoryFactory in solrconfig.xml that creates 
EncryptingDirectory. This factory is able to determine the encryption key per 
file based on the path. It is the responsibility of this factory to access the 
keys (e.g. stored in a safe DB, received with an admin handler, read from 
properties, etc.). The Cipher pool is held by the DirectoryFactory.

2- More generally the EncryptingDirectory can be created to wrap a Directory 
when opening a segment (e.g. in PostingsFormat/DocValuesFormat 
fieldsConsumer()/fieldsProducer(), in StoredFieldFormat 
fieldsReader()/fieldsWriter(), etc). In this case the 
PostingsFormat/DocValuesFormat/StoredFieldFormat extension determines the 
encryption key based on the SegmentInfo. A custom Codec can be created to 
handle encrypting formats. The Cipher pool is held either in the Codec or in 
the Format.

 

+Code:+

I will take inspiration from Apache commons-crypto's CtrCryptoOutputStream, 
although not directly using it, because it is an OutputStream while we need an 
IndexOutput. And we can probably simplify, since we have a specific use case 
compared to that library's broader usage.


was (Author: broustant):
So I plan to implement an EncryptingDirectory extending FilterDirectory.

 

+Encryption method:+

AES CTR (counter)
 * This mode is approved by NIST. 
([https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29])
 * AES encryption has the same size as the original clear text (though the last 
block is padded to 128 bits). So we can use the same file pointers.
 * CTR mode allows random access to encrypted blocks (128 bits blocks).
 * IV (initialisation vector) must be random, and is stored at the beginning of 
the encrypted file because it can be public.
 * It is appropriate to encrypt streams.

 

+API:+ 

I don’t anticipate any API change.

 

+How to provide encryption keys:+

EncryptingDirectory would require a delegate Directory, an encryption key 
supplier, and a Cipher pool (for performance).

For the callers to pass the encryption keys, I see two ways:

1- In Solr, declare a DirectoryFactory in solrconfig.xml that creates 
EncryptingDirectory. This factory is able to determine the encryption key per 
file based on the path. It is the responsibility of this factory to access the 
keys (e.g. stored in a safe DB, received with an admin handler, read from 
properties, etc.). The Cipher pool is held by the DirectoryFactory.

2- More generally the EncryptingDirectory can be created to wrap a Directory 
when opening a segment (e.g. in PostingsFormat/DocValuesFormat 
fieldsConsumer()/fieldsProducer(), in StoredFieldFormat 
fieldsReader()/fieldsWriter(), etc). In this case the 
PostingsFormat/DocValuesFormat/StoredFieldFormat extension determines the 
encryption key based on the SegmentInfo. A custom Codec can be created to 
handle encrypting formats. The Cipher pool is held either in the Codec or in 
the Format.

 

+Code:+

I will take inspiration from Apache commons-crypto's CtrCryptoOutputStream, 
although not directly using it, because it is an OutputStream while we need an 
IndexOutput. And we can probably simplify, since we have a specific use case 
compared to that library's broader usage.

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>
> The goal is to provide optional encryption of the index, with a scope limited 
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on 

[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-06-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136783#comment-17136783
 ] 

Bruno Roustant commented on LUCENE-9379:


So I plan to implement an EncryptingDirectory extending FilterDirectory.

 

+Encryption method:+

AES CTR (counter)
 * This mode is approved by NIST. 
([https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29])
 * AES encryption has the same size as the original clear text (though the last 
block is padded to 128 bits). So we can use the same file pointers.
 * CTR mode allows random access to encrypted blocks (128 bits blocks).
 * IV (initialisation vector) must be random, and is stored at the beginning of 
the encrypted file because it can be public.
 * It is appropriate to encrypt streams.

 

+API:+ 

I don’t anticipate any API change.

 

+How to provide encryption keys:+

EncryptingDirectory would require a delegate Directory, an encryption key 
supplier, and a Cipher pool (for performance).

For the callers to pass the encryption keys, I see two ways:

1- In Solr, declare a DirectoryFactory in solrconfig.xml that creates 
EncryptingDirectory. This factory is able to determine the encryption key per 
file based on the path. It is the responsibility of this factory to access the 
keys (e.g. stored in a safe DB, received with an admin handler, read from 
properties, etc.). The Cipher pool is held by the DirectoryFactory.

2- More generally the EncryptingDirectory can be created to wrap a Directory 
when opening a segment (e.g. in PostingsFormat/DocValuesFormat 
fieldsConsumer()/fieldsProducer(), in StoredFieldFormat 
fieldsReader()/fieldsWriter(), etc). In this case the 
PostingsFormat/DocValuesFormat/StoredFieldFormat extension determines the 
encryption key based on the SegmentInfo. A custom Codec can be created to 
handle encrypting formats. The Cipher pool is held either in the Codec or in 
the Format.

 

+Code:+

I will take inspiration from Apache commons-crypto's CtrCryptoOutputStream, 
although not directly using it, because it is an OutputStream while we need an 
IndexOutput. And we can probably simplify, since we have a specific use case 
compared to that library's broader usage.

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>
> The goal is to provide optional encryption of the index, with a scope limited 
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without 
> modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as 
> possible.
> Determine how callers provide encryption keys. They must not be stored on 
> disk.






[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-06-16 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136761#comment-17136761
 ] 

Gus Heck commented on SOLR-13749:
-

8.6 is now [being 
scheduled|https://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/browser],
 so it's probably important to finish any last documentation or touch-ups for 
this so it can be merged and included in the release. 

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "cross-collection join filter" (XCJF) query parser. It 
> can do a call out to a remote collection to get a set of join keys to be 
> used as a filter against the local collection.
> The second one is the hash range query parser, which lets you specify a field 
> name and a hash range; only the documents whose values hash into that range 
> will be returned.
> This query parser will do an intersection based on join keys between 2 
> collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is setup with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither of zkHost or solrUrl are specified, the local Zookeeper cluster 
> will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional 
> )|
> |from|Required|The join key field name in the external collection ( required 
> )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered 
> valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate 
> results.  
> After the ttl period has expired, the XCJF query will re-execute the join 
> against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local 
> param.|
>  
> Example solrconfig.xml changes:
>  
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
>  
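Putting the parameter table above together, a request against the local collection might look like the following sketch (collection and field names are hypothetical, and the exact parser name registered when this merges may differ):

```
q={!xcjf collection=remoteCollection from=vin_s to=vin_s ttl=3600 v="color_s:blue"}
```

Here `v` carries the query executed against the remote collection, as the parameter table recommends, and the join keys it returns filter the local collection on `vin_s`.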

[jira] [Resolved] (LUCENE-9353) Move metadata of the terms dictionary to its own file

2020-06-16 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9353.
--
Fix Version/s: 8.6
   Resolution: Fixed

> Move metadata of the terms dictionary to its own file
> -
>
> Key: LUCENE-9353
> URL: https://issues.apache.org/jira/browse/LUCENE-9353
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently opening a terms index requires jumping to the end of the terms 
> index and terms dictionaries to decode some metadata such as sumTtf or file 
> pointers where information for a given field is located. It'd be nicer to 
> have it in a separate file, which would also have the benefit of letting us 
> verify checksums for this part of the content.






[jira] [Commented] (LUCENE-9353) Move metadata of the terms dictionary to its own file

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136678#comment-17136678
 ] 

ASF subversion and git services commented on LUCENE-9353:
-

Commit 0dac659cd1707e39176c7ae21b33e78fbc39cbab in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0dac659 ]

LUCENE-9353: Move terms metadata to its own file. (#1473)


> Move metadata of the terms dictionary to its own file
> -
>
> Key: LUCENE-9353
> URL: https://issues.apache.org/jira/browse/LUCENE-9353
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently opening a terms index requires jumping to the end of the terms 
> index and terms dictionaries to decode some metadata such as sumTtf or file 
> pointers where information for a given field is located. It'd be nicer to 
> have it in a separate file, which would also have the benefit of letting us 
> verify checksums for this part of the content.






[jira] [Comment Edited] (SOLR-8274) Add per-request MDC logging based on user-provided value.

2020-06-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136677#comment-17136677
 ] 

Jason Gerlowski edited comment on SOLR-8274 at 6/16/20, 2:04 PM:
-

I looked a bit at the OpenTracing docs in the ref-guide, but I'll admit I don't 
fully understand OpenTracing.  The sense I got from reading them was that 
OpenTracing:

# Required some metric backend to receive tracing info (Datadog, etc.)
# Was geared towards understanding aggregate performance data, rather than 
having it reliably be present on any given request. (the docs mention only 
sampling a small percentage of requests by default to avoid affecting 
performance.)

I'm not sure those takeaways are correct - I'd appreciate being corrected if 
they're not.  But if I've got that right, then it seems like using MDC as this 
ticket suggests or using the different approach in SOLR-14566 could be valuable 
supplements to what OpenTracing provides.  OpenTracing and logging-improvements 
would both be valuable in tandem - they don't duplicate functionality, and 
users might well want one without configuring the other.

But like I said, I might be wrong there. 


was (Author: gerlowskija):
I looked a bit at the OpenTracing docs in the ref-guide, but I'll admit I don't 
fully understand OpenTracing.  The sense I got from reading them was that 
OpenTracing:

# Required some metric backend to receive tracing info (Datadog, etc.)
# Was geared towards understanding aggregate performance data, rather than 
having it reliably be present on any given request. (the docs mention only 
sampling a small percentage of requests by default to avoid affecting 
performance.)

I'm not sure those takeaways are correct - I'd appreciate being corrected if 
they're not.  But if I've got that right, then it seems like using MDC as this 
ticket suggests or using the different approach in SOLR-14566 could be valuable 
supplements to what OpenTracing provides.  OpenTracing and logging-improvements 
would both be valuable in tandem, and don't duplicate functionality.

But like I said, I might be wrong there. 

> Add per-request MDC logging based on user-provided value.
> -
>
> Key: SOLR-8274
> URL: https://issues.apache.org/jira/browse/SOLR-8274
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-8274.patch
>
>
> *Problem 1* Currently, there's no way (AFAIK) to find all log messages 
> associated with a particular request.
> *Problem 2* There's also no easy way for multi-tenant Solr setups to find all 
> log messages associated with a particular customer/tenant.
> Both of these problems would be more manageable if Solr could be configured 
> to record an MDC tag based on a header, or some other user provided value.
> This would allow admins to group together logs about a single request.  If 
> the same header value is repeated multiple times this functionality could 
> also be used to group together arbitrary requests, such as those that come 
> from a particular user, etc.






[jira] [Commented] (SOLR-8274) Add per-request MDC logging based on user-provided value.

2020-06-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136677#comment-17136677
 ] 

Jason Gerlowski commented on SOLR-8274:
---

I looked a bit at the OpenTracing docs in the ref-guide, but I'll admit I don't 
fully understand OpenTracing.  The sense I got from reading them was that 
OpenTracing:

# Required some metric backend to receive tracing info (Datadog, etc.)
# Was geared towards understanding aggregate performance data, rather than 
having it reliably be present on any given request. (the docs mention only 
sampling a small percentage of requests by default to avoid affecting 
performance.)

I'm not sure those takeaways are correct - I'd appreciate being corrected if 
they're not.  But if I've got that right, then it seems like using MDC as this 
ticket suggests or using the different approach in SOLR-14566 could be valuable 
supplements to what OpenTracing provides.  OpenTracing and logging-improvements 
would both be valuable in tandem, and don't duplicate functionality.

But like I said, I might be wrong there. 

> Add per-request MDC logging based on user-provided value.
> -
>
> Key: SOLR-8274
> URL: https://issues.apache.org/jira/browse/SOLR-8274
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-8274.patch
>
>
> *Problem 1* Currently, there's no way (AFAIK) to find all log messages 
> associated with a particular request.
> *Problem 2* There's also no easy way for multi-tenant Solr setups to find all 
> log messages associated with a particular customer/tenant.
> Both of these problems would be more manageable if Solr could be configured 
> to record an MDC tag based on a header, or some other user provided value.
> This would allow admins to group together logs about a single request.  If 
> the same header value is repeated multiple times this functionality could 
> also be used to group together arbitrary requests, such as those that come 
> from a particular user, etc.






[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-16 Thread Alex Klibisz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136668#comment-17136668
 ] 

Alex Klibisz commented on LUCENE-9378:
--

[~jpountz] Sure, I'll explain below:

My plugin is doing nearest neighbors search on sparse and dense vectors. 
Neighbors are "near" based on a similarity score like L1, L2, Angular, Hamming, 
or Jaccard similarity.

Also, when I say "Vector" I mean it in the math/physics sense, not the "Vector" 
style of data structure. The only relevant data structure for storing a vector 
is a simple array of floats or ints.

I'm storing the contents of each vector in binary doc values. So for dense 
floating point vectors, I store the literal numbers (e.g. 0.9,0.22,1.234,...) 
as a `float[]`. And for sparse boolean vectors, I store indices where the 
vector is "true" (1,22,99,101,...) as a `int[]`.

In both cases the ints and floats are serialized as a byte array using the 
sun.misc.Unsafe module and passed to Lucene as a `new BinaryDocValuesField()`. 
From the perspective of Lucene, the serialization protocol shouldn't matter. I 
could just as well be using an ObjectOutputStream, DataOutputStream, etc. 
Typical vector length is ~1000, so 1000 4-byte ints/floats produces `byte[]` 
with length 4000. I also experimented with variable-length encoding schemes, 
but determined it wasn't saving much space at all since Lucene was already 
compressing the byte array.
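From Lucene's side only the final byte[] matters, as noted above. As a hedged sketch (class and method names are made up; this is not the plugin's actual sun.misc.Unsafe-based code), the fixed-width round trip could equally use java.nio.ByteBuffer:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class VectorBytes {

  // Pack a dense float vector at 4 bytes per component.
  public static byte[] toBytes(float[] vec) {
    ByteBuffer buf = ByteBuffer.allocate(vec.length * Float.BYTES);
    for (float v : vec) {
      buf.putFloat(v);
    }
    return buf.array();
  }

  // Reverse of toBytes: read the components back out.
  public static float[] toFloats(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    float[] vec = new float[bytes.length / Float.BYTES];
    for (int i = 0; i < vec.length; i++) {
      vec[i] = buf.getFloat();
    }
    return vec;
  }

  public static void main(String[] args) {
    float[] vec = {0.9f, 0.22f, 1.234f};
    byte[] bytes = toBytes(vec);      // a ~1000-dim vector yields ~4000 bytes
    System.out.println(bytes.length); // prints 12
    System.out.println(Arrays.equals(toFloats(bytes), vec)); // prints "true"
  }
}
```

The resulting byte[] is what gets wrapped and indexed, e.g. as new BinaryDocValuesField(field, new BytesRef(bytes)).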

My benchmark just repeatedly runs queries against a corpus of these stored 
vectors. So it's a loop like:
 * get a query vector
 * for every doc in the lucene shard
 ** read the vector corresponding to the doc from binary doc values (this is 
the LZ4.decompress() part that got much slower with the upgrade to 8.5.0).
 ** convert the bytearray to an array of floats or ints
 ** compute the similarity score of the array against the query vector (this is 
the `sortedIntersectionCount` in the screenshots I posted).
 ** return the score

The code is in a very experimental state, but if it helps I can try to clean it 
up and make it reproducible for others.

It seems like a nice solution would be the ability to configure or disable the 
level of compression when I store a BinaryDocValuesField.

Or maybe there is another way to store these vectors that avoids the compression 
overhead? I'm open to other options. I'm much more familiar with Elasticsearch 
internals than I am with Lucene internals.

 

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
> Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused (~30%) reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt-in for this compression 
> feature instead of always being enabled which can have a substantial query 
> time cost as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's related issues for adding benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages

2020-06-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136667#comment-17136667
 ] 

Jason Gerlowski commented on SOLR-14566:


Huh, didn't know about that.  And yeah, that raises some questions.

My primary drive here is to have something that's recorded by default.  Having 
information recorded by the DebugComponent is nice, but in practice customers 
are unlikely to have it configured when a QTime spike or other performance 
problem happens on their live cluster.  Whoever is tasked with looking at their 
logs after the fact and intuiting a cause isn't going to benefit from 
DebugComponent.

So whatever we want this key to look like - whether it's a UUID, whether it's 
the NOW timestamp, whether it's the format that DC uses 
({{hostName-coreName-millisSinceEpoch-requestCounter}}), I'd really like to see 
it recorded by default and not limited to an infrequently used Component.

If no one objects, I can move the requestId stuff out of DC and into 
SearchHandler where the NOW logic currently lives.  But I wonder if there won't 
be objections - presumably whoever wrote DC put stuff in there because there 
was a consensus at the time that it _shouldn't_ be recorded by default, 
probably for perf reasons.  That's why I initially proposed using the NOW 
timestamp - because it sidesteps the question of id-gen perf entirely.  But 
maybe no one cares - in which case I'll update this PR in a few days to use 
DC's "requestId" format.  Or at least do a perf test to see if there's any 
noticeable difference there.

> Record "NOW" on "coordinator" log messages
> --
>
> Key: SOLR-14566
> URL: https://issues.apache.org/jira/browse/SOLR-14566
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, in SolrCore.java we log each search request that comes through 
> each core as it is finishing.  This includes the path, query-params, QTime, 
> and status.  In the case of a distributed search both the "coordinator" node 
> and each of the per-shard requests produce a log message.
> When Solr is fielding many identical queries, such as those created by a 
> healthcheck or dashboard, it can be hard when examining logs to link the 
> per-shard requests with the "cooordinator" request that came in upstream.
> One thing that would make this easier is if the {{NOW}} param added to 
> per-shard requests is also included in the log message from the 
> "coordinator".  While {{NOW}} isn't unique strictly speaking, it often is in 
> practice, and along with the query-params would allow debuggers to associate 
> shard requests with coordinator requests a large majority of the time.
> An alternative approach would be to create a {{qid}} or {{query-uuid}} when 
> the coordinator starts its work that can be logged everywhere.  This provides 
> a stronger expectation around uniqueness, but would require UUID generation 
> on the coordinator, which may be non-negligible work at high QPS (maybe? I 
> have no idea).  It also loses the neatness of reusing data already present on 
> the shard requests.  






[jira] [Commented] (LUCENE-9353) Move metadata of the terms dictionary to its own file

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136644#comment-17136644
 ] 

ASF subversion and git services commented on LUCENE-9353:
-

Commit 87a3bef50f8c08404ee8bd66ca868caf5dd072cb in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87a3bef ]

LUCENE-9353: Move terms metadata to its own file. (#1473)



> Move metadata of the terms dictionary to its own file
> -
>
> Key: LUCENE-9353
> URL: https://issues.apache.org/jira/browse/LUCENE-9353
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently opening a terms index requires jumping to the end of the terms 
> index and terms dictionaries to decode some metadata such as sumTtf or file 
> pointers where information for a given field is located. It'd be nicer to 
> have it in a separate file, which would also have the benefit of letting us 
> verify checksums for this part of the content.






[jira] [Commented] (LUCENE-9353) Move metadata of the terms dictionary to its own file

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136643#comment-17136643
 ] 

ASF subversion and git services commented on LUCENE-9353:
-

Commit 87a3bef50f8c08404ee8bd66ca868caf5dd072cb in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87a3bef ]

LUCENE-9353: Move terms metadata to its own file. (#1473)



> Move metadata of the terms dictionary to its own file
> -
>
> Key: LUCENE-9353
> URL: https://issues.apache.org/jira/browse/LUCENE-9353
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently opening a terms index requires jumping to the end of the terms 
> index and terms dictionaries to decode some metadata such as sumTtf or file 
> pointers where information for a given field is located. It'd be nicer to 
> have it in a separate file, which would also have the benefit of letting us 
> verify checksums for this part of the content.






[GitHub] [lucene-solr] jpountz merged pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-06-16 Thread GitBox


jpountz merged pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (SOLR-14558) SolrLogPostTool should record all lines

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136630#comment-17136630
 ] 

ASF subversion and git services commented on SOLR-14558:


Commit 740bfc9183f62fe3e9cee368f900329c088bb384 in lucene-solr's branch 
refs/heads/branch_8x from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=740bfc9 ]

SOLR-14558: Record all log lines in SolrLogPostTool (#1570)


> SolrLogPostTool should record all lines
> ---
>
> Key: SOLR-14558
> URL: https://issues.apache.org/jira/browse/SOLR-14558
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, SolrLogPostTool recognizes a predefined set of "types" of log 
> messages: queries, errors, commits, etc.  This makes it easy to find and 
> explore the traffic your cluster is seeing.
> But it would also be cool if we indexed all records, even if many of 
> them are just assigned a catch-all "other" type_s value.  We won't be able to 
> parse out detailed values from the log messages the way we would for 
> type_s=query for example, but we can still store the line and timestamp.  
> Gives much better search over the logs than dropping down to "grep" for 
> anything that's not one of the predefined types.
>  






[GitHub] [lucene-solr] s1monw merged pull request #1573: Cleanup TermsHashPerField

2020-06-16 Thread GitBox


s1monw merged pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573


   









[jira] [Commented] (LUCENE-9405) IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted

2020-06-16 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136616#comment-17136616
 ] 

Michael McCandless commented on LUCENE-9405:


Thanks [~simonw]!

> IndexWriter incorrectly calls closeMergeReaders twice when the merged segment 
> is 100% deleted
> -
>
> Key: LUCENE-9405
> URL: https://issues.apache.org/jira/browse/LUCENE-9405
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: master (9.0), 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is the first spinoff from a [controversial PR to add a new index-time 
> feature to Lucene to merge small segments during 
> commit|https://github.com/apache/lucene-solr/pull/1552].  This can 
> substantially reduce the number of small index segments to search.
> See specifically [this discussion 
> there|https://github.com/apache/lucene-solr/pull/1552#discussion_r440298695].
> {{IndexWriter}} seems to be missing a {{success = true}} inside 
> {{mergeMiddle}} in the case where all segments being merged have 100% 
> deletions and the segments will simply be dropped.
> In this case, in master today, I think we are incorrectly calling 
> {{closeMergedReaders}} twice, first with {{suppressExceptions = false}} and 
> second time with {{true}}.
> There is a [dedicated test case here showing the 
> issue|https://github.com/apache/lucene-solr/commit/cab5ef5e6f2bdcda59fd669a298ec137af9d],
>  but that test case relies on changes in the controversial feature (added 
> {{MergePolicy.findFullFlushMerges}}). I think it should be possible to make 
> another test case show the bug without that controversial feature, and I am 
> unsure why our existing randomized tests have not uncovered this yet ...






[GitHub] [lucene-solr] mikemccand commented on pull request #1573: Cleanup TermsHashPerField

2020-06-16 Thread GitBox


mikemccand commented on pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573#issuecomment-644734269


   > thanks @mikemccand - I will run tests again and push.
   
   ++









[GitHub] [lucene-solr] s1monw commented on pull request #1573: Cleanup TermsHashPerField

2020-06-16 Thread GitBox


s1monw commented on pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573#issuecomment-644732898


   thanks @mikemccand - I will run tests again and push.









[GitHub] [lucene-solr] mikemccand commented on pull request #1573: Cleanup TermsHashPerField

2020-06-16 Thread GitBox


mikemccand commented on pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573#issuecomment-644731914


   I tested indexing throughput on `luceneutil` with `wikimediumall`, using a 
single indexing thread and `SerialMergeScheduler`:
   
   ```
   [mike@beast3 facet]$ grep "GB/hour" /l/logs/simon?
   /l/logs/simon0:Indexer: 46.44432470391602 GB/hour plain text
   /l/logs/simon1:Indexer: 46.267723012921515 GB/hour plain text
   /l/logs/simon2:Indexer: 46.26414201429784 GB/hour plain text
   [mike@beast3 facet]$ grep "GB/hour" /l/logs/trunk?
   /l/logs/trunk0:Indexer: 45.632881600179495 GB/hour plain text
   /l/logs/trunk1:Indexer: 46.09383252131896 GB/hour plain text
   /l/logs/trunk2:Indexer: 45.439666582156924 GB/hour plain text
   ```
   
   Net/net this change might be a bit faster, or just noise, so all clear to 
push!  Thanks @s1monw.









[jira] [Commented] (SOLR-14558) SolrLogPostTool should record all lines

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136599#comment-17136599
 ] 

ASF subversion and git services commented on SOLR-14558:


Commit a7792b129b096245d70ab960f57d15842c60bbd6 in lucene-solr's branch 
refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a7792b1 ]

SOLR-14558: Record all log lines in SolrLogPostTool (#1570)



> SolrLogPostTool should record all lines
> ---
>
> Key: SOLR-14558
> URL: https://issues.apache.org/jira/browse/SOLR-14558
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, SolrLogPostTool recognizes a predefined set of "types" of log 
> messages: queries, errors, commits, etc.  This makes it easy to find and 
> explore the traffic your cluster is seeing.
> But it would also be cool if we indexed all records, even if many of 
> them are just assigned a catch-all "other" type_s value.  We won't be able to 
> parse out detailed values from the log messages the way we would for 
> type_s=query for example, but we can still store the line and timestamp.  
> Gives much better search over the logs than dropping down to "grep" for 
> anything that's not one of the predefined types.
>  
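
A minimal sketch of the catch-all classification described above (the patterns here are hypothetical stand-ins, not SolrLogPostTool's actual parsing rules):

```java
import java.util.regex.Pattern;

public class LogLineClassifier {
    // Hypothetical patterns standing in for the tool's real per-type parsers
    private static final Pattern QUERY  = Pattern.compile("path=/select");
    private static final Pattern COMMIT = Pattern.compile("start commit");
    private static final Pattern ERROR  = Pattern.compile("\\bERROR\\b");

    static String typeOf(String line) {
        if (QUERY.matcher(line).find())  return "query";
        if (COMMIT.matcher(line).find()) return "commit";
        if (ERROR.matcher(line).find())  return "error";
        return "other"; // catch-all: the line is still indexed with its timestamp
    }

    public static void main(String[] args) {
        System.out.println(typeOf("2020-06-16 INFO  path=/select q=*:*"));
        System.out.println(typeOf("2020-06-16 WARN  something unusual"));
    }
}
```

The point of the fallback branch is exactly what the issue asks for: no line is dropped, even when no detailed fields can be parsed out of it.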



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija merged pull request #1570: SOLR-14558: Record all log lines in SolrLogPostTool

2020-06-16 Thread GitBox


gerlowskija merged pull request #1570:
URL: https://github.com/apache/lucene-solr/pull/1570


   






[jira] [Commented] (SOLR-14571) Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136591#comment-17136591
 ] 

Lucene/Solr QA commented on SOLR-14571:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:black}{color} | {color:black} {color} | {color:black}  0m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14571 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13005760/SOLR-14571.patch |
| Optional Tests |  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / a108f90869c |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| modules | C: solr solr/webapp U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/764/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Index download speed while replicating is fixed at 5.1 in replication.html
> --
>
> Key: SOLR-14571
> URL: https://issues.apache.org/jira/browse/SOLR-14571
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, master (9.0), 8.5.2
>Reporter: Florin Babes
>Priority: Trivial
>  Labels: AdminUI, Replication
> Attachments: SOLR-14571.patch
>
>
> Hello,
> While checking ways to optimize the speed of replication, I've noticed that 
> the index download speed is fixed at 5.1 in replication.html. Is there a 
> reason for that? 






[GitHub] [lucene-solr] epugh commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


epugh commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r440778391



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,15 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+[cols="20,40,40",options="header"]
+|===
+|Component Name |Class Name |More Information
+|phrases |`solr.PhrasesIdentificationComponent` | Learn more in the 
{solr-javadocs}org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 java class.

Review comment:
   Thanks @gerlowskija. Looking at the page a day later, I think a table more 
in the vein of `Component Name, Class Name (?), Purpose, More Information` 
(where More Information is a set of links) might be the way to go. Or, dropping 
the Class Name as kind of an implementation detail that you can look up on the 
More Information page. Which maybe gets me back to just one big list! I'll mull 
some more.








[GitHub] [lucene-solr] epugh commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


epugh commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r440777282



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,15 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:

Review comment:
   These each have a detailed section in the Ref Guide, and in fact I was 
thinking of moving the `Get` component, which is covered in the Ref Guide, up 
here as well. Having said that, I'm rethinking the way the page lists the 
components, and may take a bigger pass through it.








[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1581: SOLR-14572 document missing SearchComponents

2020-06-16 Thread GitBox


gerlowskija commented on a change in pull request #1581:
URL: https://github.com/apache/lucene-solr/pull/1581#discussion_r44077



##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,15 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:

Review comment:
   I'm surprised you put these components in a separate table, rather than 
appending them to the existing list right above here.  Any particular reason 
you kept this list separate from the list above?

##
File path: 
solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
##
@@ -169,3 +169,15 @@ Many of the other useful components are described in 
sections of this Guide for
 * `TermVectorComponent`, described in the section 
<>.
 * `QueryElevationComponent`, described in the section 
<>.
 * `TermsComponent`, described in the section 
<>.
+
+Other components that ship with Solr include:
+
+[cols="20,40,40",options="header"]
+|===
+|Component Name |Class Name |More Information
+|phrases |`solr.PhrasesIdentificationComponent` | Learn more in the 
{solr-javadocs}org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent]
 java class.

Review comment:
   I understand the desire to avoid duplication here with the Javadoc 
content, but it might still be nice to have a 1 sentence description, so 
readers don't have to click through just to figure out whether they're 
interested.
   
   That said, even without that, this PR raises the visibility of these 
components which is a Good Thing.  So I'm happy either way.








[jira] [Resolved] (LUCENE-9404) simplify checksum calculation of ByteBuffersIndexOutput

2020-06-16 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9404.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> simplify checksum calculation of ByteBuffersIndexOutput
> ---
>
> Key: LUCENE-9404
> URL: https://issues.apache.org/jira/browse/LUCENE-9404
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9404.patch
>
>
> I think this class can avoid its current logic/copying and just call 
> CRC32.update(ByteBuffer) which is optimized for both array and direct buffers?
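
The proposed simplification, handing a ByteBuffer straight to the JDK checksum instead of copying bytes out first, can be sketched with plain JDK APIs (this is an illustration, not the actual ByteBuffersIndexOutput code):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class CrcByteBufferDemo {
    public static void main(String[] args) {
        byte[] data = "hello lucene".getBytes();

        // JDK 9+: CRC32.update(ByteBuffer) handles both heap and direct buffers
        CRC32 direct = new CRC32();
        direct.update(ByteBuffer.wrap(data));

        // The older copy-style byte[] path, for comparison
        CRC32 viaArray = new CRC32();
        viaArray.update(data, 0, data.length);

        // Both paths must agree on the checksum
        if (direct.getValue() != viaArray.getValue()) {
            throw new AssertionError("checksums differ");
        }
        System.out.println("crc32 = " + Long.toHexString(direct.getValue()));
    }
}
```

Because the ByteBuffer overload is intrinsified for direct buffers as well, the caller no longer needs its own staging array.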






[jira] [Commented] (LUCENE-9404) simplify checksum calculation of ByteBuffersIndexOutput

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136524#comment-17136524
 ] 

ASF subversion and git services commented on LUCENE-9404:
-

Commit a108f90869cdb030db6ba14653036a4fee58ff68 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a108f90 ]

LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput

Rather than copying from buffers, we can pass the buffers directly to the 
checksum with good performance in JDK9+








[jira] [Comment Edited] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136513#comment-17136513
 ] 

Marius Grama edited comment on LUCENE-9407 at 6/16/20, 10:26 AM:
-

{quote}
the instanceof happening here in the elasticsearch is an anti-pattern, we 
shouldn't design our API around it.
{quote}

[~rcmuir] thanks for the reply.

 

For the following use case:

{quote}

If the percolator deals with 


org.apache.lucene.document.LatLonPointInPolygonQuery

then it should probably suffice making use of  the 

org.apache.lucene.document.LatLonShapeBoundingBoxQuery

for finding the search queries that have polygons containing the LatLonPoint of 
the location field of the document being percolated.

{quote}

could you give me a hint on how to approach the problem without using 
{{instanceof}} ?

 

 


was (Author: mariusneo):
{quote}
the instanceof happening here in the elasticsearch is an anti-pattern, we 
shouldn't design our API around it.
{quote}

[~rcmuir] thanks for the reply.

For the following use case:

{quote}
If the percolator deals with 
{{org.apache.lucene.document.LatLonPointInPolygonQuery}}
then it should probably suffice making use of the 
{{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
for finding the search queries that have polygons containing the LatLonPoint of 
the location field of the document being percolated.
{quote}

could you give me a hint on how to approach the problem without using 
{{instanceof}} ?

 

 

> Change the visibility of LatLonXQuery classes to public 
> 
>
> Key: LUCENE-9407
> URL: https://issues.apache.org/jira/browse/LUCENE-9407
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.5.2
>Reporter: Marius Grama
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Problem description
>  
>  A few years ago the geospatial queries  classes have been refactored to be 
> package-private:
>   
> {code:java}
> final class LatLonPointInPolygonQuery extends Query
> {code}
>  
>  I get that there must be a reason for making use of package-private 
> constructors in the geospatial query classes, but I'm wondering whether it 
> would hurt to leave the classes still public.
>   
>  Having the classes package-private means that they can't be used outside of 
> the 
>   
>   
>  {{package org.apache.lucene.document;}}
>   
>  This is the PR in which the refactoring was made:
>  
> [https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
>   
>   
>   
> h2. Background
> h3. Elasticsearch Percolator dealing with geospatial queries 
> In the elasticsearch code (specifically in the percolator functionality) I 
> have noticed that when using polygon queries it currently isn't possible to 
> do a reverse search on the search-queries index.
>   
>  This means that all the geospatial queries are applied against the 
> elasticsearch memory index in order to check for a percolation match.
>   
>  If the percolator deals with 
>   
>  {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
>   
>  then it should probably suffice making use of  the 
>   
>  {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
>   
>  for finding the search queries that have polygons containing the LatLonPoint 
> of the location field of the document being percolated.
> h2. Proposed solution
>  
>  Increase the visibility of the LatLonXQuery classes to {{public}} so that 
> they can 
>  be used in other packages (e.g. elasticsearch percolator code).
>  
> *NOTE* that the constructors of the classes are still package-protected, 
> which is why the classes won't be able to be instantiated outside of their 
> original package.
>  
>  In the elasticsearch percolator code I'd have to make explicit use of the 
> LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
> queries to be used in the percolation process:
>   
>  
> [https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
>   
> h3. Pull request
>  
> [https://github.com/apache/lucene-solr/pull/1583]
>  
>   






[jira] [Commented] (LUCENE-9403) tune BufferedChecksum.DEFAULT_BUFFERSIZE

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136516#comment-17136516
 ] 

ASF subversion and git services commented on LUCENE-9403:
-

Commit 4decd5aa9c59c1a64ce43a3182784595636b1273 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4decd5a ]

LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE

Increase the buffersize used for ChecksumIndexInput for better crc performance.


> tune BufferedChecksum.DEFAULT_BUFFERSIZE
> 
>
> Key: LUCENE-9403
> URL: https://issues.apache.org/jira/browse/LUCENE-9403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9403.patch
>
>
> This is currently set to 256 bytes, so that's the amount of data we pass to 
> crc.update() at once. 
> I tried different sizes with https://github.com/benalexau/hash-bench and 
> JDK14:
> {noformat}
> HashBench.withArray crc32-jre   256  avgt5   81.349 ±  8.364  
> ns/op
> HashBench.withArray crc32-jre   512  avgt5   95.204 ± 10.057  
> ns/op
> HashBench.withArray crc32-jre  1024  avgt5  120.081 ±  8.471  
> ns/op
> HashBench.withArray crc32-jre  2048  avgt5  173.505 ±  8.857  
> ns/op
> HashBench.withArray crc32-jre  8192  avgt5  487.721 ± 11.435  
> ns/op
> {noformat}
> based on this let's bump the buffersize from 256 to 1024? I think we want to 
> avoid huge buffers but still keep the CPU overhead low. It only impacts 
> ChecksumIndexInputs (e.g. speed of checkIntegrity() calls at merge) because 
> IndexOutputs do not need this buffer.
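
Dividing the per-call times in the benchmark above by the buffer size makes the trade-off explicit: per-byte cost keeps falling as the buffer grows, but with diminishing returns, which is the argument for a moderate bump to 1024 rather than a huge buffer. A small sketch using the figures quoted above:

```java
public class CrcBufferSizing {
    public static void main(String[] args) {
        // Buffer sizes and avg ns/call from the hash-bench numbers quoted above
        int[] sizes = {256, 512, 1024, 2048, 8192};
        double[] nsPerCall = {81.349, 95.204, 120.081, 173.505, 487.721};

        double prev = Double.MAX_VALUE;
        for (int i = 0; i < sizes.length; i++) {
            double perByte = nsPerCall[i] / sizes[i];
            System.out.printf("%5d bytes -> %.4f ns/byte%n", sizes[i], perByte);
            // Per-byte cost shrinks monotonically, but each doubling helps less
            if (perByte >= prev) throw new AssertionError("expected decreasing ns/byte");
            prev = perByte;
        }
    }
}
```

At 256 bytes the per-byte cost is roughly 0.32 ns, at 1024 roughly 0.12 ns, so most of the win is captured without the memory overhead of an 8 KB buffer.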






[jira] [Resolved] (LUCENE-9403) tune BufferedChecksum.DEFAULT_BUFFERSIZE

2020-06-16 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-9403.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> tune BufferedChecksum.DEFAULT_BUFFERSIZE
> 
>
> Key: LUCENE-9403
> URL: https://issues.apache.org/jira/browse/LUCENE-9403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-9403.patch
>
>
> This is currently set to 256 bytes, so that's the amount of data we pass to 
> crc.update() at once. 
> I tried different sizes with https://github.com/benalexau/hash-bench and 
> JDK14:
> {noformat}
> HashBench.withArray crc32-jre   256  avgt5   81.349 ±  8.364  
> ns/op
> HashBench.withArray crc32-jre   512  avgt5   95.204 ± 10.057  
> ns/op
> HashBench.withArray crc32-jre  1024  avgt5  120.081 ±  8.471  
> ns/op
> HashBench.withArray crc32-jre  2048  avgt5  173.505 ±  8.857  
> ns/op
> HashBench.withArray crc32-jre  8192  avgt5  487.721 ± 11.435  
> ns/op
> {noformat}
> based on this let's bump the buffersize from 256 to 1024? I think we want to 
> avoid huge buffers but still keep the CPU overhead low. It only impacts 
> ChecksumIndexInputs (e.g. speed of checkIntegrity() calls at merge) because 
> IndexOutputs do not need this buffer.






[jira] [Comment Edited] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136513#comment-17136513
 ] 

Marius Grama edited comment on LUCENE-9407 at 6/16/20, 10:25 AM:
-

{quote}
the instanceof happening here in the elasticsearch is an anti-pattern, we 
shouldn't design our API around it.
{quote}

[~rcmuir] thanks for the reply.

For the following use case:

{quote}
If the percolator deals with 
{{org.apache.lucene.document.LatLonPointInPolygonQuery}}
then it should probably suffice making use of the 
{{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
for finding the search queries that have polygons containing the LatLonPoint of 
the location field of the document being percolated.
{quote}

could you give me a hint on how to approach the problem without using 
{{instanceof}} ?

 

 


was (Author: mariusneo):
>  the instanceof happening here in the elasticsearch is an anti-pattern, we 
>shouldn't design our API around it.

 

[~rcmuir] thanks for the reply.

 

For the following use case:

> If the percolator deals with 
{{> org.apache.lucene.document.LatLonPointInPolygonQuery}}
> then it should probably suffice making use of  the 
{{> org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
> for finding the search queries that have polygons containing the LatLonPoint 
> of the location field of the document being percolated.

 

could you give me a hint on how to approach the problem without using 
{{instanceof}} ?

 

Thank you for taking the time to look over this issue.







[jira] [Commented] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136513#comment-17136513
 ] 

Marius Grama commented on LUCENE-9407:
--

>  the instanceof happening here in the elasticsearch is an anti-pattern, we 
>shouldn't design our API around it.

 

[~rcmuir] thanks for the reply.

 

For the following use case:

> If the percolator deals with 
{{> org.apache.lucene.document.LatLonPointInPolygonQuery}}
> then it should probably suffice making use of  the 
{{> org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
> for finding the search queries that have polygons containing the LatLonPoint 
> of the location field of the document being percolated.

 

could you give me a hint on how to approach the problem without using 
{{instanceof}} ?

 

Thank you for taking the time to look over this issue.







[jira] [Commented] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136507#comment-17136507
 ] 

Robert Muir commented on LUCENE-9407:
-

-1: the instanceof happening here in the elasticsearch is an anti-pattern, we 
shouldn't design our API around it.
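
For context, the usual instanceof-free alternative is double dispatch through a visitor, the shape Lucene's own Query.visit(QueryVisitor) API takes. A minimal dependency-free sketch, using hypothetical stand-in classes rather than the real Lucene types:

```java
import java.util.List;

// Hypothetical stand-ins for the geo query classes; the real Lucene
// counterpart of this pattern is Query.visit(QueryVisitor).
interface GeoQuery {
    void visit(GeoQueryVisitor visitor);
}

interface GeoQueryVisitor {
    void visitPointInPolygon(double[] lats, double[] lons);
    void visitOther(GeoQuery query);
}

final class PointInPolygonQuery implements GeoQuery {
    final double[] lats, lons;
    PointInPolygonQuery(double[] lats, double[] lons) { this.lats = lats; this.lons = lons; }
    // Double dispatch: the query picks the visitor method, no instanceof needed
    @Override public void visit(GeoQueryVisitor v) { v.visitPointInPolygon(lats, lons); }
}

final class BoundingBoxQuery implements GeoQuery {
    @Override public void visit(GeoQueryVisitor v) { v.visitOther(this); }
}

public class VisitorDemo {
    public static void main(String[] args) {
        List<GeoQuery> queries = List.of(
            new PointInPolygonQuery(new double[] {0, 1, 1}, new double[] {0, 0, 1}),
            new BoundingBoxQuery());
        for (GeoQuery q : queries) {
            q.visit(new GeoQueryVisitor() {
                @Override public void visitPointInPolygon(double[] lats, double[] lons) {
                    System.out.println("polygon with " + lats.length + " vertices");
                }
                @Override public void visitOther(GeoQuery query) {
                    System.out.println("some other query: " + query.getClass().getSimpleName());
                }
            });
        }
    }
}
```

With this shape, a caller such as a percolator's query analyzer can extract polygon geometry without widening class visibility or testing concrete types.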




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9396) Improve truncation detection for points

2020-06-16 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9396.
--
Fix Version/s: 8.6
   Resolution: Fixed

> Improve truncation detection for points
> ---
>
> Key: LUCENE-9396
> URL: https://issues.apache.org/jira/browse/LUCENE-9396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the refactoring of LUCENE-9148, it becomes possible to improve 
> corruption detection by serializing the length of the index and data files in 
> the meta file instead of relying on CodecUtil#retrieveChecksum.
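> The idea above can be sketched in a few lines. This is an illustrative
> stand-in only — the class and method names below are hypothetical, not the
> actual Lucene implementation: once the expected length of a data file is
> recorded in the meta file, truncation can be detected up front by comparing
> lengths, instead of only failing later when the footer checksum is read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: compare the on-disk file length against the
// length recorded in the metadata before reading any of the data.
class TruncationCheck {
  // Returns true when the actual length matches the recorded length;
  // a mismatch signals truncation or corruption of the data file.
  static boolean lengthMatches(Path dataFile, long expectedLength) throws IOException {
    return Files.size(dataFile) == expectedLength;
  }
}
```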






[jira] [Commented] (LUCENE-9396) Improve truncation detection for points

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136504#comment-17136504
 ] 

ASF subversion and git services commented on LUCENE-9396:
-

Commit 2711c288421945aa7a1f22b77ee3a672ed5db7e4 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2711c28 ]

LUCENE-9396: Improve truncation detection for points. (#1557)



> Improve truncation detection for points
> ---
>
> Key: LUCENE-9396
> URL: https://issues.apache.org/jira/browse/LUCENE-9396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the refactoring of LUCENE-9148, it becomes possible to improve 
> corruption detection by serializing the length of the index and data files in 
> the meta file instead of relying on CodecUtil#retrieveChecksum.






[jira] [Commented] (LUCENE-9396) Improve truncation detection for points

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136496#comment-17136496
 ] 

ASF subversion and git services commented on LUCENE-9396:
-

Commit 2b61b205fc76b25091a0078cf4ca27cd2e530aff in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2b61b20 ]

LUCENE-9396: Improve truncation detection for points. (#1557)



> Improve truncation detection for points
> ---
>
> Key: LUCENE-9396
> URL: https://issues.apache.org/jira/browse/LUCENE-9396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With the refactoring of LUCENE-9148, it becomes possible to improve 
> corruption detection by serializing the length of the index and data files in 
> the meta file instead of relying on CodecUtil#retrieveChecksum.






[GitHub] [lucene-solr] jpountz merged pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-16 Thread GitBox


jpountz merged pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-16 Thread GitBox


jpountz commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-644651408


   I implemented your suggestion.






[jira] [Updated] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marius Grama updated LUCENE-9407:
-
Description: 
h2. Problem description

 
 A few years ago the geospatial query classes were refactored to be 
package-private:
  
{code:java}
final class LatLonPointInPolygonQuery extends Query
{code}
 
 I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
  
 Having the classes package-private means that they can't be used outside of 
the 
  
  
 {{package org.apache.lucene.document;}}
  
 This is the PR in which the refactoring was made:
 
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
  
  
  
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 

In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
  
 This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
  
 If the percolator deals with 
  
 {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
  
 then it should probably suffice making use of  the 
  
 {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
  
 for finding the search queries that have polygons containing the LatLonPoint 
of the location field of the document being percolated.
h2. Proposed solution

 
 Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
 be used in other packages (e.g. elasticsearch percolator code).

 

**NOTE** that the constructors of the classes remain package-private, which is 
why the classes can't be instantiated outside of their original package.


  
 In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
  
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
  
h3. Pull request

 

[https://github.com/apache/lucene-solr/pull/1583]

 
  

  was:
h2. Problem description

 
 A few years ago the geospatial query classes were refactored to be 
package-private:
  
{code:java}
final class LatLonPointInPolygonQuery extends Query
{code}
 
 I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
  
 Having the classes package-private means that they can't be used outside of 
the 
  
  
 {{package org.apache.lucene.document;}}
  
 This is the PR in which the refactoring was made:
 
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
  
  
  
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 

In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
  
 This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
  
 If the percolator deals with 
  
 {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
  
 then it should probably suffice making use of  the 
  
 {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
  
 for finding the search queries that have polygons containing the LatLonPoint 
of the location field of the document being percolated.
h2. Proposed solution

 
 Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
 be used in other packages (e.g. elasticsearch percolator code)
  
 In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
  
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
  
h3. Pull request

 

[https://github.com/apache/lucene-solr/pull/1583]


  
  


> Change the visibility of LatLonXQuery classes to public 
> 
>
> Key: LUCENE-9407
> URL: https://issues.apache.org/jira/browse/LUCENE-9407
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.5.2
>Reporter: Marius Grama
>Priority: Major
>  Time Spent: 10m
>  

[jira] [Updated] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marius Grama updated LUCENE-9407:
-
Description: 
h2. Problem description

 
 A few years ago the geospatial query classes were refactored to be 
package-private:
  
{code:java}
final class LatLonPointInPolygonQuery extends Query
{code}
 
 I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
  
 Having the classes package-private means that they can't be used outside of 
the 
  
  
 {{package org.apache.lucene.document;}}
  
 This is the PR in which the refactoring was made:
 
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
  
  
  
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 

In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
  
 This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
  
 If the percolator deals with 
  
 {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
  
 then it should probably suffice making use of  the 
  
 {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
  
 for finding the search queries that have polygons containing the LatLonPoint 
of the location field of the document being percolated.
h2. Proposed solution

 
 Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
 be used in other packages (e.g. elasticsearch percolator code).

 

*NOTE* that the constructors of the classes remain package-private, which is 
why the classes can't be instantiated outside of their original package.

 
 In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
  
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
  
h3. Pull request

 

[https://github.com/apache/lucene-solr/pull/1583]

 
  

  was:
h2. Problem description

 
 A few years ago the geospatial query classes were refactored to be 
package-private:
  
{code:java}
final class LatLonPointInPolygonQuery extends Query
{code}
 
 I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
  
 Having the classes package-private means that they can't be used outside of 
the 
  
  
 {{package org.apache.lucene.document;}}
  
 This is the PR in which the refactoring was made:
 
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
  
  
  
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 

In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
  
 This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
  
 If the percolator deals with 
  
 {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
  
 then it should probably suffice making use of  the 
  
 {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
  
 for finding the search queries that have polygons containing the LatLonPoint 
of the location field of the document being percolated.
h2. Proposed solution

 
 Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
 be used in other packages (e.g. elasticsearch percolator code).

 

**NOTE** that the constructors of the classes remain package-private, which is 
why the classes can't be instantiated outside of their original package.


  
 In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
  
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
  
h3. Pull request

 

[https://github.com/apache/lucene-solr/pull/1583]

 
  


> Change the visibility of LatLonXQuery classes to public 
> 
>
> Key: LUCENE-9407
> URL: https://issues.apache.org/jira/browse/LUCENE-9407
> Project: Lucene - Core
>  Issue Type: 

[jira] [Updated] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marius Grama updated LUCENE-9407:
-
Description: 
h2. Problem description

 
 A few years ago the geospatial query classes were refactored to be 
package-private:
  
{code:java}
final class LatLonPointInPolygonQuery extends Query
{code}
 
 I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
  
 Having the classes package-private means that they can't be used outside of 
the 
  
  
 {{package org.apache.lucene.document;}}
  
 This is the PR in which the refactoring was made:
 
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
  
  
  
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 

In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
  
 This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
  
 If the percolator deals with 
  
 {{org.apache.lucene.document.LatLonPointInPolygonQuery}}
  
 then it should probably suffice making use of  the 
  
 {{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
  
 for finding the search queries that have polygons containing the LatLonPoint 
of the location field of the document being percolated.
h2. Proposed solution

 
 Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
 be used in other packages (e.g. elasticsearch percolator code)
  
 In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
  
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
  
h3. Pull request

 

[https://github.com/apache/lucene-solr/pull/1583]


  
  

  was:
h2. Problem description
 
A few years ago the geospatial query classes were refactored to be 
package-private:
 
{code}
final class LatLonPointInPolygonQuery extends Query
{code}
 
I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
 
Having the classes package-private means that they can't be used outside of the 
 
 
{{package org.apache.lucene.document;}}
 
This is the PR in which the refactoring was made:
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
 
 
 
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 
In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
 
This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
 
If the percolator deals with 
 
{{org.apache.lucene.document.LatLonPointInPolygonQuery}}
 
then it should probably suffice making use of  the 
 
{{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
 
for finding the search queries that have polygons containing the LatLonPoint of 
the location field of the document being percolated.
h2. Proposed solution
 
Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
be used in other packages (e.g. elasticsearch percolator code)
 
In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
 
 
 


> Change the visibility of LatLonXQuery classes to public 
> 
>
> Key: LUCENE-9407
> URL: https://issues.apache.org/jira/browse/LUCENE-9407
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.5.2
>Reporter: Marius Grama
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Problem description
>  
>  A few years ago the geospatial query classes were refactored to be 
> package-private:
>   
> {code:java}
> final class LatLonPointInPolygonQuery extends Query
> {code}
>  
>  I get that there must be a reason for making use of 

[GitHub] [lucene-solr] mariusneo opened a new pull request #1583: LUCENE-9407: change the visibility of the LatLonXQuery classes to public

2020-06-16 Thread GitBox


mariusneo opened a new pull request #1583:
URL: https://github.com/apache/lucene-solr/pull/1583


   LUCENE-9407: change the visibility of the LatLonXQuery classes to public in 
order to allow using them outside of the org.apache.lucene.document package
   
   
   # Description
   
   A few years ago the geospatial query classes were refactored to be 
package-private.
   This restriction doesn't allow these classes to be used outside of the 
org.apache.lucene.document package.
   
   # Solution
   
   Changed the visibility of the LatLonXQuery classes to public so that they 
can be used outside of the org.apache.lucene.document package (e.g. 
elasticsearch percolator)
   
   # Tests
   
   No tests are required because no functionality has been changed.
   
   






[jira] [Created] (LUCENE-9407) Change the visibility of LatLonXQuery classes to public

2020-06-16 Thread Marius Grama (Jira)
Marius Grama created LUCENE-9407:


 Summary: Change the visibility of LatLonXQuery classes to public 
 Key: LUCENE-9407
 URL: https://issues.apache.org/jira/browse/LUCENE-9407
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 8.5.2
Reporter: Marius Grama


h2. Problem description
 
A few years ago the geospatial query classes were refactored to be 
package-private:
 
{code}
final class LatLonPointInPolygonQuery extends Query
{code}
 
I get that there must be a reason for making use of package-private 
constructors in the geospatial query classes, but I'm wondering whether it 
would hurt to leave the classes still public.
 
Having the classes package-private means that they can't be used outside of the 
 
 
{{package org.apache.lucene.document;}}
 
This is the PR in which the refactoring was made:
[https://github.com/apache/lucene-solr/commit/2264600ffe4649abb0edbe7a6882ffc82f6e918b]
 
 
 
h2. Background
h3. Elasticsearch Percolator dealing with geospatial queries 
In the elasticsearch code (specifically in the percolator functionality) I 
have noticed that when using polygon queries it currently isn't possible 
to do a reversed search on the search queries index.
 
This means that all the geospatial queries are applied against the 
elasticsearch memory index in order to check for a percolation match.
 
If the percolator deals with 
 
{{org.apache.lucene.document.LatLonPointInPolygonQuery}}
 
then it should probably suffice to make use of the 
 
{{org.apache.lucene.document.LatLonShapeBoundingBoxQuery}}
 
for finding the search queries that have polygons containing the LatLonPoint of 
the location field of the document being percolated.
h2. Proposed solution
 
Increase the visibility of the LatLonXQuery classes to {{public}} so that they 
can 
be used in other packages (e.g. elasticsearch percolator code)
 
In the elasticsearch percolator code I'd have to make use explicitly of the 
LatLonPointInPolygonQuery class (_instanceof_) when analyzing the search 
queries to be used in the percolation process:
 
[https://github.com/elastic/elasticsearch/blob/master/modules/percolator/src/main/java/org/elasticsearch/percolator/QueryAnalyzer.java#L186]
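The instanceof-based analysis described above can be sketched as follows. This
is an illustrative stand-in only: the classes below merely mimic the shape of
the Lucene query types, and GeoQueryAnalyzer is a hypothetical name — making
LatLonPointInPolygonQuery public is what would allow real code outside
org.apache.lucene.document to write this kind of check.

```java
// Stand-in classes mimicking the Lucene query types named above
// (illustrative only, not the real org.apache.lucene.document classes).
class Query {}
class LatLonPointInPolygonQuery extends Query {}
class LatLonShapeBoundingBoxQuery extends Query {}

// Hypothetical analyzer deciding how a percolator could handle a
// registered search query, dispatching on the concrete query class.
class GeoQueryAnalyzer {
  static String analyze(Query query) {
    if (query instanceof LatLonPointInPolygonQuery) {
      // A polygon query could be indexed via its bounding box so that a
      // reversed (bounding-box) search can pre-filter candidate queries.
      return "index-bounding-box";
    }
    // Fall back to verifying the query against the memory index.
    return "verify-against-memory-index";
  }
}
```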
 
 
 






[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136454#comment-17136454
 ] 

Adrien Grand commented on LUCENE-9378:
--

[~alexklibisz] Can you give more details about what you are storing in binary 
doc values and what your benchmark does?

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
> Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused (~30%) reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt-in for this compression 
> feature instead of always being enabled which can have a substantial query 
> time cost as we saw during our upgrade. [~mikemccand] suggested one possible 
> approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and 
> UNCOMPRESSED) and allowing users to create a custom Codec subclassing the 
> default Codec and pick the format they want.
> Idea is similar to Lucene50StoredFieldsFormat which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's related issues for adding benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-06-16 Thread GitBox


jpountz commented on a change in pull request #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r440675787



##
File path: 
lucene/core/src/java/org/apache/lucene/search/FilteringNumericComparator.java
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import org.apache.lucene.index.LeafReaderContext;
+
+import java.io.IOException;
+
+/**
+ * A wrapper over {@code NumericComparator} that provides a leaf comparator 
that can filter non-competitive docs.
+ */
+class FilteringNumericComparator extends 
FilteringFieldComparator {
+  public FilteringNumericComparator(NumericComparator in, boolean reverse, 
boolean singleSort) {
+super(in, reverse, singleSort);
+  }
+
+  @Override
+  public final FilteringLeafFieldComparator 
getLeafComparator(LeafReaderContext context) throws IOException {
+LeafFieldComparator inLeafComparator = in.getLeafComparator(context);
+Class comparatorClass = inLeafComparator.getClass();
+if (comparatorClass == FieldComparator.LongComparator.class) {
+  return new 
FilteringNumericLeafComparator.FilteringLongLeafComparator((FieldComparator.LongComparator)
 inLeafComparator, context,
+  ((LongComparator) inLeafComparator).field, reverse, singleSort, 
hasTopValue);
+} else if (comparatorClass == FieldComparator.IntComparator.class) {
+  return new 
FilteringNumericLeafComparator.FilteringIntLeafComparator((FieldComparator.IntComparator)
 inLeafComparator, context,
+  ((IntComparator) inLeafComparator).field, reverse, singleSort, 
hasTopValue);
+} else if (comparatorClass == FieldComparator.DoubleComparator.class) {
+  return new 
FilteringNumericLeafComparator.FilteringDoubleLeafComparator((FieldComparator.DoubleComparator)
 inLeafComparator, context,
+  ((DoubleComparator) inLeafComparator).field, reverse, singleSort, 
hasTopValue);
+} else if (comparatorClass == FieldComparator.FloatComparator.class) {
+  return new 
FilteringNumericLeafComparator.FilteringFloatLeafComparator((FieldComparator.FloatComparator)
 inLeafComparator, context,
+  ((FloatComparator) inLeafComparator).field, reverse, singleSort, 
hasTopValue);
+} else {
+  assert false: "Unexpected class for [FieldComparator]!";

Review comment:
   add the class to the error message?
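   A minimal illustration of that suggestion — building the assertion message
with the offending class name included, so a failing assert immediately says
which comparator slipped through. The helper class and method names here are
hypothetical, not part of the actual patch:

```java
// Hypothetical helper: format the "unexpected comparator" assertion
// message with the concrete class name included.
class ComparatorDispatch {
  static String unexpectedClassMessage(Class<?> comparatorClass) {
    return "Unexpected class for [FieldComparator]: " + comparatorClass.getName();
  }
}
```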

##
File path: 
lucene/core/src/java/org/apache/lucene/search/FilteringNumericLeafComparator.java
##
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import org.apache.lucene.document.DoublePoint;
+import org.apache.lucene.document.FloatPoint;
+import org.apache.lucene.document.IntPoint;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PointValues;
+import org.apache.lucene.util.DocIdSetBuilder;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+/**
+ * A {@code FilteringLeafFieldComparator} that provides a functionality to 
skip over non-competitive documents
+ * for numeric fields indexed with points.
+ */
+abstract class FilteringNumericLeafComparator implements 
FilteringLeafFieldComparator {
+  protected final LeafFieldComparator in;
+  protected final boolean reverse;
+  protected final boolean singleSort;
+  private final boolean hasTopValue;
+  private final PointValues pointValues;
+  private final 

[GitHub] [lucene-solr] jimczi commented on a change in pull request #1577: LUCENE-9390: JapaneseTokenizer discards token that is all punctuation characters only

2020-06-16 Thread GitBox


jimczi commented on a change in pull request #1577:
URL: https://github.com/apache/lucene-solr/pull/1577#discussion_r440654020



##
File path: 
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java
##
@@ -1917,4 +1917,15 @@ private static boolean isPunctuation(char ch) {
 return false;
 }
   }
+
+  private static boolean isAllCharPunctuation(char[] ch, int offset, int 
length) {
+boolean flag = true;
+for (int i = offset; i < offset + length; i++) {
+  if (!isPunctuation(ch[i])) {
+flag = false;
+break;
+  }
+}
+return flag;

Review comment:
   return `true` ?

##########
File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java
##########
@@ -1917,4 +1917,15 @@ private static boolean isPunctuation(char ch) {
       return false;
     }
   }
+
+  private static boolean isAllCharPunctuation(char[] ch, int offset, int length) {
+    boolean flag = true;
+    for (int i = offset; i < offset + length; i++) {
+      if (!isPunctuation(ch[i])) {
+        flag = false;

Review comment:
   nit: you can return `false` directly ?
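Taken together, the two review suggestions collapse the flag into direct returns. A minimal standalone sketch of that shape — note that `isPunctuation` below is a hypothetical stand-in for illustration only, since the real method is private to `JapaneseTokenizer` and consults Unicode character blocks:

```java
public class PunctuationCheck {
  // Hypothetical stand-in for JapaneseTokenizer.isPunctuation (which is
  // private and checks Unicode blocks): here anything that is not a letter
  // or digit counts as punctuation, purely for illustration.
  private static boolean isPunctuation(char ch) {
    return !Character.isLetterOrDigit(ch);
  }

  // The reviewer's suggested shape: return directly instead of tracking a flag.
  static boolean isAllCharPunctuation(char[] ch, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      if (!isPunctuation(ch[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isAllCharPunctuation("!?.".toCharArray(), 0, 3)); // true
    System.out.println(isAllCharPunctuation("a!".toCharArray(), 0, 2));  // false
  }
}
```

As in the patched version, an empty range (length 0) yields `true`; the early `return false` preserves the original short-circuit behavior of `break`.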





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9400) Tessellator might fail when several holes share the same vertex

2020-06-16 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-9400.
--
Fix Version/s: 8.6
 Assignee: Ignacio Vera
   Resolution: Fixed

> Tessellator might fail when several holes share the same vertex
> ---
>
> Key: LUCENE-9400
> URL: https://issues.apache.org/jira/browse/LUCENE-9400
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.6
>
> Attachments: image-2020-06-10-09-54-46-316.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We found a new case where tessellation might fail on a valid polygon. The 
> issue shows up when a polygon has several holes sharing the same vertex. In 
> this case, the merging logic for holes might fail, creating an invalid 
> polygon. For example, a polygon like this one:
>  
> !image-2020-06-10-09-54-46-316.png|width=260,height=266!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1567: LUCENE-9402: Let MultiCollector handle minCompetitiveScore

2020-06-16 Thread GitBox


jpountz commented on a change in pull request #1567:
URL: https://github.com/apache/lucene-solr/pull/1567#discussion_r440628129



##########
File path: lucene/core/src/test/org/apache/lucene/search/MultiCollectorTest.java
##########
@@ -163,4 +163,115 @@ public void testCacheScoresIfNecessary() throws IOException {
     reader.close();
     dir.close();
   }
+  
+  public void testScorerWrappingForTopScores() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
+    iw.addDocument(new Document());
+    DirectoryReader reader = iw.getReader();
+    iw.close();
+    final LeafReaderContext ctx = reader.leaves().get(0);
+    Collector c1 = collector(ScoreMode.TOP_SCORES, MultiCollector.MinCompetitiveScoreAwareScorable.class);
+    Collector c2 = collector(ScoreMode.TOP_SCORES, MultiCollector.MinCompetitiveScoreAwareScorable.class);
+    MultiCollector.wrap(c1, c2).getLeafCollector(ctx).setScorer(new ScoreAndDoc());
+
+    c1 = collector(ScoreMode.TOP_SCORES, ScoreCachingWrappingScorer.class);
+    c2 = collector(ScoreMode.COMPLETE, ScoreCachingWrappingScorer.class);
+    MultiCollector.wrap(c1, c2).getLeafCollector(ctx).setScorer(new ScoreAndDoc());
+
+    reader.close();
+    dir.close();
+  }
+  
+  public void testMinCompetitiveScore() throws IOException {
+    float[] currentMinScores = new float[3];
+    float[] minCompetitiveScore = new float[1];
+    Scorable scorer = new Scorable() {
+
+      @Override
+      public float score() throws IOException {
+        return 0;
+      }
+
+      @Override
+      public int docID() {
+        return 0;
+      }
+
+      @Override
+      public void setMinCompetitiveScore(float minScore) throws IOException {
+        minCompetitiveScore[0] = minScore;
+      }
+    };
+    Scorable s0 = new MultiCollector.MinCompetitiveScoreAwareScorable(scorer, 0, currentMinScores);
+    Scorable s1 = new MultiCollector.MinCompetitiveScoreAwareScorable(scorer, 1, currentMinScores);
+    Scorable s2 = new MultiCollector.MinCompetitiveScoreAwareScorable(scorer, 2, currentMinScores);
+    assertEquals(0f, minCompetitiveScore[0], 0);
+    s0.setMinCompetitiveScore(0.5f);
+    assertEquals(0f, minCompetitiveScore[0], 0);
+    s1.setMinCompetitiveScore(0.8f);
+    assertEquals(0f, minCompetitiveScore[0], 0);
+    s2.setMinCompetitiveScore(0.3f);
+    assertEquals(0.3f, minCompetitiveScore[0], 0);
+    s2.setMinCompetitiveScore(0.1f);
+    assertEquals(0.3f, minCompetitiveScore[0], 0);
+    s1.setMinCompetitiveScore(Float.MAX_VALUE);
+    assertEquals(0.3f, minCompetitiveScore[0], 0);
+    s2.setMinCompetitiveScore(Float.MAX_VALUE);
+    assertEquals(0.5f, minCompetitiveScore[0], 0);
+    s0.setMinCompetitiveScore(Float.MAX_VALUE);
+    assertEquals(Float.MAX_VALUE, minCompetitiveScore[0], 0);
+  }
+  
+  public void testCollectionTermination() throws IOException {
+    Directory dir = newDirectory();
+    RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
+    iw.addDocument(new Document());
+    DirectoryReader reader = iw.getReader();
+    iw.close();
+    final LeafReaderContext ctx = reader.leaves().get(0);
+    DummyCollector c1 = new DummyCollector() {
+      @Override
+      public void collect(int doc) throws IOException {
+        if (doc == 1) {
+          throw new CollectionTerminatedException();
+        }
+        super.collect(doc);
+      }
+
+    };
+
+    DummyCollector c2 = new DummyCollector() {
+      @Override
+      public void collect(int doc) throws IOException {
+        if (doc == 2) {
+          throw new CollectionTerminatedException();
+        }
+        super.collect(doc);
+      }
+
+    };
+
+    Collector mc = MultiCollector.wrap(c1, c2);
+    LeafCollector lc = mc.getLeafCollector(ctx);
+    lc.setScorer(new ScoreAndDoc());
+    lc.collect(0); // OK
+    assertTrue("c1's collect should be called", c1.collectCalled);
+    assertTrue("c2's collect should be called", c2.collectCalled);
+    c1.collectCalled = false;
+    c2.collectCalled = false;
+    lc.collect(1); // OK, but c1 should terminate

Review comment:
   maybe create a new variant of this test that calls setScorer again after this collect call?
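The assertions in testMinCompetitiveScore encode a "min of mins" rule: a document stays competitive for the MultiCollector as long as any sub-collector still wants it, so the shared scorer may only be told the smallest per-collector minimum. A simplified standalone model of that bookkeeping (an assumption-laden sketch, not Lucene's real MinCompetitiveScoreAwareScorable):

```java
// Simplified model (an assumption, NOT Lucene's actual class) of the
// "min of mins" behavior the test above asserts: the shared scorer only
// sees the smallest per-sub-collector minimum, and only when it rises.
public class MinOfMins {
  static float[] current = new float[3]; // one slot per sub-collector
  static float forwarded = 0f;           // what the shared scorer last saw

  static void setMin(int idx, float score) {
    // Min-competitive scores only ever increase, so ignore lower values.
    current[idx] = Math.max(current[idx], score);
    float min = Float.MAX_VALUE;
    for (float f : current) {
      min = Math.min(min, f);
    }
    if (min > forwarded) {
      forwarded = min; // forward only when the overall minimum rises
    }
  }

  public static void main(String[] args) {
    setMin(0, 0.5f);
    setMin(1, 0.8f);
    System.out.println(forwarded); // 0.0 — collector 2 still accepts anything
    setMin(2, 0.3f);
    System.out.println(forwarded); // 0.3 — now the smallest minimum
  }
}
```

Replaying the test's sequence against this model reproduces its expected values: the forwarded minimum stays at 0 until every sub-collector has raised its own, then tracks the smallest one.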








[jira] [Commented] (LUCENE-9400) Tessellator might fail when several holes share the same vertex

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136373#comment-17136373
 ] 

ASF subversion and git services commented on LUCENE-9400:
-

Commit d78f430eca3588ac35d4994de6670637016a90b2 in lucene-solr's branch 
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d78f430 ]

LUCENE-9400: Tessellator might fail when several holes share the same vertex 
(#1562)



> Tessellator might fail when several holes share the same vertex
> ---
>
> Key: LUCENE-9400
> URL: https://issues.apache.org/jira/browse/LUCENE-9400
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: image-2020-06-10-09-54-46-316.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We found a new case where tessellation might fail on a valid polygon. The 
> issue shows up when a polygon has several holes sharing the same vertex. In 
> this case, the merging logic for holes might fail, creating an invalid 
> polygon. For example, a polygon like this one:
>  
> !image-2020-06-10-09-54-46-316.png|width=260,height=266!






[GitHub] [lucene-solr] iverase merged pull request #1562: LUCENE-9400: Tessellator might fail when several holes share the same vertex

2020-06-16 Thread GitBox


iverase merged pull request #1562:
URL: https://github.com/apache/lucene-solr/pull/1562


   






[jira] [Commented] (LUCENE-9400) Tessellator might fail when several holes share the same vertex

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136372#comment-17136372
 ] 

ASF subversion and git services commented on LUCENE-9400:
-

Commit 75491ab3814cf1544987296ce958137f2fc75e8a in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=75491ab ]

LUCENE-9400: Tessellator might fail when several holes share the same vertex 
(#1562)



> Tessellator might fail when several holes share the same vertex
> ---
>
> Key: LUCENE-9400
> URL: https://issues.apache.org/jira/browse/LUCENE-9400
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: image-2020-06-10-09-54-46-316.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We found a new case where tessellation might fail on a valid polygon. The 
> issue shows up when a polygon has several holes sharing the same vertex. In 
> this case, the merging logic for holes might fail, creating an invalid 
> polygon. For example, a polygon like this one:
>  
> !image-2020-06-10-09-54-46-316.png|width=260,height=266!






[jira] [Commented] (SOLR-8274) Add per-request MDC logging based on user-provided value.

2020-06-16 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136353#comment-17136353
 ] 

David Smiley commented on SOLR-8274:


Note that a recent issue, SOLR-14566, proposes a particular approach, and 
there is some interesting conversation there.

> Add per-request MDC logging based on user-provided value.
> -
>
> Key: SOLR-8274
> URL: https://issues.apache.org/jira/browse/SOLR-8274
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-8274.patch
>
>
> *Problem 1* Currently, there's no way (AFAIK) to find all log messages 
> associated with a particular request.
> *Problem 2* There's also no easy way for multi-tenant Solr setups to find all 
> log messages associated with a particular customer/tenant.
> Both of these problems would be more manageable if Solr could be configured 
> to record an MDC tag based on a header, or some other user provided value.
> This would allow admins to group together logs about a single request.  If 
> the same header value is repeated multiple times this functionality could 
> also be used to group together arbitrary requests, such as those that come 
> from a particular user, etc.
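The idea in the description — tag every log line for a request with a caller-supplied value — is exactly what an MDC (mapped diagnostic context) such as SLF4J's provides. A tiny self-contained model of it (a ThreadLocal-backed stand-in, not actual Solr code; the "requestTag" key and the header scenario are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposal (a ThreadLocal-backed stand-in for an MDC such as
// SLF4J's — NOT actual Solr code; the "requestTag" key is invented): every
// log line emitted while handling a request carries a caller-supplied value.
public class RequestMdc {
  private static final ThreadLocal<Map<String, String>> CTX =
      ThreadLocal.withInitial(HashMap::new);

  static void put(String key, String value) { CTX.get().put(key, value); }
  static void remove(String key) { CTX.get().remove(key); }

  // Prefix each message with the current tag, or "-" when none is set.
  static String log(String msg) {
    return "[" + CTX.get().getOrDefault("requestTag", "-") + "] " + msg;
  }

  public static void main(String[] args) {
    put("requestTag", "tenant-42"); // e.g. copied from an HTTP header
    try {
      System.out.println(log("query start")); // [tenant-42] query start
    } finally {
      remove("requestTag"); // always clear when the request finishes
    }
    System.out.println(log("background task")); // [-] background task
  }
}
```

Grepping logs for the tag then groups all messages for one request, or for one tenant when the same value is reused across requests.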






[jira] [Updated] (SOLR-14571) Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florin Babes updated SOLR-14571:

Status: Open  (was: Patch Available)

> Index download speed while replicating is fixed at 5.1 in replication.html
> --
>
> Key: SOLR-14571
> URL: https://issues.apache.org/jira/browse/SOLR-14571
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, master (9.0), 8.5.2
>Reporter: Florin Babes
>Priority: Trivial
>  Labels: AdminUI, Replication
> Attachments: SOLR-14571.patch
>
>
> Hello,
> While checking ways to optimize the speed of replication, I've noticed that 
> the index download speed is fixed at 5.1 in replication.html. Is there a 
> reason for that? 






[jira] [Updated] (SOLR-14571) Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florin Babes updated SOLR-14571:

Status: Patch Available  (was: Open)

> Index download speed while replicating is fixed at 5.1 in replication.html
> --
>
> Key: SOLR-14571
> URL: https://issues.apache.org/jira/browse/SOLR-14571
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, master (9.0), 8.5.2
>Reporter: Florin Babes
>Priority: Trivial
>  Labels: AdminUI, Replication
> Attachments: SOLR-14571.patch
>
>
> Hello,
> While checking ways to optimize the speed of replication, I've noticed that 
> the index download speed is fixed at 5.1 in replication.html. Is there a 
> reason for that? 






[jira] [Updated] (SOLR-14571) Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florin Babes updated SOLR-14571:

Attachment: SOLR-14571.patch

> Index download speed while replicating is fixed at 5.1 in replication.html
> --
>
> Key: SOLR-14571
> URL: https://issues.apache.org/jira/browse/SOLR-14571
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, master (9.0), 8.5.2
>Reporter: Florin Babes
>Priority: Trivial
>  Labels: AdminUI, Replication
> Attachments: SOLR-14571.patch
>
>
> Hello,
> While checking ways to optimize the speed of replication, I've noticed that 
> the index download speed is fixed at 5.1 in replication.html. Is there a 
> reason for that? 






[jira] [Updated] (SOLR-14571) Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florin Babes updated SOLR-14571:

Attachment: (was: SOLR-14571.patch)

> Index download speed while replicating is fixed at 5.1 in replication.html
> --
>
> Key: SOLR-14571
> URL: https://issues.apache.org/jira/browse/SOLR-14571
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.0, master (9.0), 8.5.2
>Reporter: Florin Babes
>Priority: Trivial
>  Labels: AdminUI, Replication
>
> Hello,
> While checking ways to optimize the speed of replication, I've noticed that 
> the index download speed is fixed at 5.1 in replication.html. Is there a 
> reason for that? 






[jira] [Updated] (SOLR-14384) Stack SolrRequestInfo

2020-06-16 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-14384:

Fix Version/s: 8.6
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Stack SolrRequestInfo
> -
>
> Key: SOLR-14384
> URL: https://issues.apache.org/jira/browse/SOLR-14384
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Sometimes SolrRequestInfo needs to be suspended/overridden with a new one that 
> is used temporarily. Examples are in the {{[subquery]}} transformer, in 
> warming of caches, and in QuerySenderListener (another type of warming), and 
> maybe others.  This can be annoying to do correctly, and in at least one place it 
> isn't done correctly.  SolrRequestInfoSuspender shows some complexity.  In 
> this issue, [~dsmiley] proposes using a stack internally to SolrRequestInfo 
> that is push'ed and pop'ed.  It's not the only way to solve this but it's one 
> way.
>  See linked issues for the context and discussion.
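The push/pop discipline proposed here can be modeled in a few lines (an illustrative sketch only, not Solr's actual SolrRequestInfo, which carries request/response objects rather than strings): "set" pushes onto a per-thread stack and "clear" pops it, so a temporary override automatically restores the previous value.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only (NOT Solr's actual SolrRequestInfo): "set"
// pushes onto a per-thread stack and "clear" pops it, so a temporary
// override restores the previous value when it finishes.
public class RequestInfoStack {
  private static final ThreadLocal<Deque<String>> STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  static void set(String info) { STACK.get().push(info); } // MUST pair with clear()
  static void clear() { STACK.get().pop(); }
  static String get() { return STACK.get().peek(); }

  public static void main(String[] args) {
    set("outer-request");
    set("subquery"); // temporary override, e.g. the [subquery] transformer
    System.out.println(get()); // subquery
    clear();                   // pop restores the outer request automatically
    System.out.println(get()); // outer-request
    clear();
  }
}
```

This is why the commit notes that every "set" now MUST pair with a "clear": an unbalanced call would leak the override onto the next request handled by that thread.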






[jira] [Commented] (SOLR-14384) Stack SolrRequestInfo

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136333#comment-17136333
 ] 

ASF subversion and git services commented on SOLR-14384:


Commit 35bdf9b413512fa4b2e360df14991f27462ecb6f in lucene-solr's branch 
refs/heads/branch_8x from Nazerke Seidan
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35bdf9b ]

SOLR-14384: SolrRequestInfo now stacks internally.
* "set" now MUST pair with a "clear"
* fixes SolrIndexSearcher.warm which should have re-instated previous SRI
* cleans up some SRI set/clear users

Closes #1527

(cherry picked from commit 2da71c2a405483e2cf5270dfc20cbd760cd66486)


> Stack SolrRequestInfo
> -
>
> Key: SOLR-14384
> URL: https://issues.apache.org/jira/browse/SOLR-14384
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Sometimes SolrRequestInfo needs to be suspended/overridden with a new one that 
> is used temporarily. Examples are in the {{[subquery]}} transformer, in 
> warming of caches, and in QuerySenderListener (another type of warming), and 
> maybe others.  This can be annoying to do correctly, and in at least one place it 
> isn't done correctly.  SolrRequestInfoSuspender shows some complexity.  In 
> this issue, [~dsmiley] proposes using a stack internally to SolrRequestInfo 
> that is push'ed and pop'ed.  It's not the only way to solve this but it's one 
> way.
>  See linked issues for the context and discussion.


