[jira] [Resolved] (SOLR-13739) Managed resource observers have to be added only once

2019-09-13 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13739.
-
Fix Version/s: 8.3
 Assignee: David Smiley
   Resolution: Fixed

Thanks for contributing, Thomas!

> Managed resource observers have to be added only once
> -
>
> Key: SOLR-13739
> URL: https://issues.apache.org/jira/browse/SOLR-13739
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 8.0, 8.1, master (9.0), 8.2, 8.1.1
>Reporter: Thomas Wöckinger
>Assignee: David Smiley
>Priority: Major
>  Labels: easyfix, performance, pull-request-available, 
> ready-to-commit
> Fix For: 8.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> On huge schema modifications, which mostly happen during creation of a new 
> collection, the same observer instance of a ResourceLoaderAware component is 
> added again and again.
> This leads to a runtime behaviour of n²/2, where n is the number of schema 
> operations multiplied by the number of ResourceLoaderAware components, instead 
> of being proportional to the number of ResourceLoaderAware components.
> E.g. with 1000 schema operations and 2 ResourceLoaderAware components this 
> leads to 2,000,000 operations instead of 2000.
> Even worse, the corresponding resource is registered again and again, which 
> can take some time; e.g. ManagedSynonymGraphFilterFactory needs about 5s on 
> each call (depending on the size of the synonyms).
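A minimal sketch of the registration idea behind the fix, assuming set semantics 
are acceptable for observers; the class and method names below are hypothetical, 
not taken from the patch:

{code:java}
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.solr.rest.ManagedResourceObserver;

public class ObserverRegistry {
  // An identity-based Set makes re-registering the same instance a no-op, so
  // repeated schema operations no longer pile up duplicate observers.
  private final Set<ManagedResourceObserver> observers =
      Collections.newSetFromMap(new IdentityHashMap<>());

  public void addObserver(ManagedResourceObserver observer) {
    observers.add(observer); // idempotent: duplicates are silently ignored
  }
}
{code}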



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8921) IndexSearcher.termStatistics should not require TermStates but docFreq and totalTermFreq

2019-09-13 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929311#comment-16929311
 ] 

David Smiley commented on LUCENE-8921:
--

Barring further feedback, I'll commit what Bruno has on Tuesday next week.  As 
discussed in GitHub, the 8.x backport will retain the current method, made 
final and deprecated, and it will call the new method to do the work.

> IndexSearcher.termStatistics should not require TermStates but docFreq and 
> totalTermFreq
> 
>
> Key: LUCENE-8921
> URL: https://issues.apache.org/jira/browse/LUCENE-8921
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1
>Reporter: Bruno Roustant
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> IndexSearcher.termStatistics(Term term, TermStates context) is the way to 
> create a TermStatistics. It requires a TermStates param although it only 
> cares about the docFreq and totalTermFreq.
>  
> For customizations that want to create TermStatistics based on docFreq and 
> totalTermFreq, but that do not have a TermStates available, this method forces 
> them to create a TermStates instance (which is not very lightweight) only to 
> pass two ints.
> termStatistics could be modified to the following signature:
> termStatistics(Term term, int docFreq, int totalTermFreq)
> Since it would change the API, it could be done in master for next major 
> release.
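A minimal sketch of the proposed overload, written as a standalone helper since 
its final placement on IndexSearcher may differ; widening totalTermFreq to long 
is my assumption, not part of the proposal:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermStatistics;

public final class TermStatsUtil {
  private TermStatsUtil() {}

  // Build TermStatistics directly from the two counts; no TermStates needed.
  public static TermStatistics termStatistics(Term term, int docFreq, long totalTermFreq) {
    if (docFreq == 0) {
      return null; // no stats for a term that matches nothing
    }
    return new TermStatistics(term.bytes(), docFreq, totalTermFreq);
  }
}
{code}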



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1954) Highlighter component should expose snippet character offsets and the score.

2019-09-10 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-1954:
---
Attachment: SOLR-1954.patch
  Assignee: David Smiley
Status: Open  (was: Open)

At the Lucene/Solr Hackday event, I worked on this for the Unified highlighter. 
 I'm attaching a patch that is very much WIP but basically works.  It adds a 
"hl.extended" boolean flag which will mean a structured detailed response in 
place of the list of snippets.  TODOs:
* Expose more info; I just did a couple things.
* Probably make the format nicer.  There are definitely some rough edges in this 
code; TODOs and WIP bits remain, and there's still tidying up to do.
* SolrJ QueryResponse
* Ref guide
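As a sketch of how the flag might be requested from SolrJ; note that 
"hl.extended" exists only in this WIP patch, so this is purely illustrative:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

public class ExtendedHighlightRequest {
  public static void main(String[] args) {
    SolrQuery query = new SolrQuery("name:sdram");
    query.set("hl", true);
    query.set("hl.method", "unified");
    query.set("hl.extended", true); // structured details instead of bare snippet strings
  }
}
{code}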

> Highlighter component should expose snippet character offsets and the score.
> 
>
> Key: SOLR-1954
> URL: https://issues.apache.org/jira/browse/SOLR-1954
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: SOLR-1954.patch, SOLR-1954_start_and_end_offsets.patch
>
>
> The Highlighter Component does not currently expose the snippet character 
> offsets nor the score.  There is a TODO in DefaultSolrHighlighter indicating 
> the intention to add this eventually.  This information is needed when doing 
> highlighting on external content.  The data is there, so it's pretty easy to 
> output it in some way.  The challenge is deciding on the output and its 
> ramifications on backwards compatibility.  The current highlighter component 
> response structure doesn't lend itself to adding any new data, unfortunately. 
>  I wish the original implementer had some foresight.  Unfortunately all the 
> highlighting tests assume this structure.  Here is a snippet of the current 
> response structure in Solr's sample data searching for "sdram" for reference:
> {code:xml}
> <lst name="highlighting">
>  <lst name="VS1GB400C3">
>   <arr name="name">
>    <str>CORSAIR ValueSelect 1GB 184-Pin DDR <em>SDRAM</em> 
> Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
>   </arr>
>  </lst>
> </lst>
> {code}
> Perhaps as a little hack, we introduce a pseudo field called 
> text_startCharOffset which is the concatenation of the matching field and 
> "_startCharOffset".  This would be an array of ints.  Likewise, there would 
> be another array for endCharOffset and score.
> Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926002#comment-16926002
 ] 

David Smiley commented on SOLR-13745:
-

Aha; this ObjectReleaseTracker looks super easy to use.  It's activated when 
assertions are enabled.  Cool; maybe I'll file an issue for it.
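A minimal sketch of the usage pattern as I understand it; both calls are 
assert-gated, so tracking only happens when assertions are enabled (e.g. in 
tests):

{code:java}
import java.io.Closeable;
import java.io.IOException;

import org.apache.solr.common.util.ObjectReleaseTracker;

public class TrackedResource implements Closeable {
  public TrackedResource() {
    assert ObjectReleaseTracker.track(this); // record allocation (tests only)
  }

  @Override
  public void close() throws IOException {
    assert ObjectReleaseTracker.release(this); // unrecord; leftovers fail the suite
  }
}
{code}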

> Test should close resources: AtomicUpdateProcessorFactoryTest 
> --
>
> Key: SOLR-13745
> URL: https://issues.apache.org/jira/browse/SOLR-13745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> This test hangs after it runs because there are directory or request 
> resources (not sure yet which) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8638) Remove deprecated code in master

2019-09-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925655#comment-16925655
 ] 

David Smiley commented on LUCENE-8638:
--

The branch doesn't pass precommit now because javadocs in LuceneTestCase refer 
to the getBaseTempDirForTestClass method you removed.

> Remove deprecated code in master
> 
>
> Key: LUCENE-8638
> URL: https://issues.apache.org/jira/browse/LUCENE-8638
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: master (9.0)
>
>
> There are a number of deprecations in master that should be removed. This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13138) Remove deprecated code in master

2019-09-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925654#comment-16925654
 ] 

David Smiley commented on SOLR-13138:
-

Two Jira issues but one branch, and thus presumably one commit that spans 
projects?  Ehh; this wouldn't be my preference but I guess it's okay.

> Remove deprecated code in master
> 
>
> Key: SOLR-13138
> URL: https://issues.apache.org/jira/browse/SOLR-13138
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Alan Woodward
>Priority: Major
>
> There are a number of deprecations in master that should be removed.  This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924960#comment-16924960
 ] 

David Smiley commented on SOLR-13745:
-

Aha; very interesting. Yeah, I agree with your assessment. It'd be nice if 
failing to close a SolrQueryRequest could be caught in tests, but at least we're 
good at enforcing the check at the SolrIndexSearcher level. I'm glad you're 
chasing down these issues.

> Test should close resources: AtomicUpdateProcessorFactoryTest 
> --
>
> Key: SOLR-13745
> URL: https://issues.apache.org/jira/browse/SOLR-13745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> This test hangs after it runs because there are directory or request 
> resources (not sure yet which) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13677) All Metrics Gauges should be unregistered by the objects that registered them

2019-09-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924616#comment-16924616
 ] 

David Smiley commented on SOLR-13677:
-

+Yes, let's revert this now then+.  I appreciate that [~ab] is setting a good 
bar for quality software!

> All Metrics Gauges should be unregistered by the objects that registered them
> -
>
> Key: SOLR-13677
> URL: https://issues.apache.org/jira/browse/SOLR-13677
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The life cycle of metrics producers is managed by the core (mostly). So, if 
> the lifecycle of an object is different from that of the core itself, these 
> objects will never be unregistered from the metrics registry. This will lead 
> to memory leaks.
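A minimal sketch of the intended ownership pattern, using the plain Dropwizard 
API rather than Solr's metrics plumbing; the class here is hypothetical:

{code:java}
import java.io.Closeable;
import java.util.Map;

import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

class CacheMetrics implements Closeable {
  private final MetricRegistry registry;
  private final String name;

  CacheMetrics(MetricRegistry registry, String name, Map<?, ?> cache) {
    this.registry = registry;
    this.name = name;
    registry.register(name, (Gauge<Integer>) cache::size);
  }

  @Override
  public void close() {
    // Unregistered by the same object that registered it, so the gauge's
    // lifetime matches its producer's rather than the core's.
    registry.remove(name);
  }
}
{code}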



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-06 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13745.
-
Fix Version/s: 8.3
   Resolution: Fixed

> Test should close resources: AtomicUpdateProcessorFactoryTest 
> --
>
> Key: SOLR-13745
> URL: https://issues.apache.org/jira/browse/SOLR-13745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> This test hangs after it runs because there are directory or request 
> resources (not sure yet which) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8753) New PostingFormat - UniformSplit

2019-09-06 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8753.
--
Fix Version/s: 8.3
   Resolution: Fixed

Thanks [~bruno.roustant] and [~juan.duran]!

BTW in the 8.x backport, precommit failed because JDK 8 doesn't like the stray 
"" in the package-info.java files, so I removed them.

> New PostingFormat - UniformSplit
> 
>
> Key: LUCENE-8753
> URL: https://issues.apache.org/jira/browse/LUCENE-8753
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.3
>
> Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 
> objectives:
>  - Clear design and simple code.
>  - Easily extensible, for both the logic and the index format.
>  - Light memory usage with a very compact FST.
>  - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
> (the attached pdf visually explains the technique in more detail)
>  The principle is to split the list of terms into blocks and use a FST to 
> access the block, but not as a prefix trie, rather with a seek-floor pattern. 
> For the selection of the blocks, there is a target average block size (number 
> of terms), with an allowed delta variation (10%) to compare the terms and 
> select the one with the minimal distinguishing prefix.
>  There are also several optimizations inside the block to make it more 
> compact and speed up the loading/scanning.
> The performance obtained is interesting with the luceneutil benchmark, 
> comparing UniformSplit with BlockTree. Find it in the first comment and also 
> attached for better formatting.
> Although the precise percentages vary between runs, three main points:
>  - TermQuery and PhraseQuery are improved.
>  - PrefixQuery and WildcardQuery are ok.
>  - Fuzzy queries are clearly less performant, because BlockTree is so 
> optimized for them.
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time 
> is reduced by 20%. So this PostingsFormat scales to lots of docs, as 
> BlockTree does.
> This initial version passes all Lucene tests. Use “ant test 
> -Dtests.codec=UniformSplitTesting” to test with this PostingsFormat.
> Subjectively, we think we have fulfilled our goal of code simplicity. And we 
> have already exercised this PostingsFormat extensibility to create a 
> different flavor for our own use-case.
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley
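To make the block-selection idea concrete, here is a rough sketch of how a block 
boundary could be chosen within the allowed delta; this is my own illustration 
under assumed names, not code from the patch:

{code:java}
import java.util.List;

import org.apache.lucene.util.BytesRef;

public final class BlockSplitSketch {
  private BlockSplitSketch() {}

  // Within +/-10% of the target size, cut the block at the term whose
  // distinguishing prefix vs. its predecessor is shortest.
  static int pickBlockEnd(List<BytesRef> sortedTerms, int start, int targetSize) {
    int lo = Math.max(start + 1, start + (int) (targetSize * 0.9));
    int hi = Math.min(sortedTerms.size(), start + (int) (targetSize * 1.1));
    int best = Math.min(sortedTerms.size(), start + targetSize);
    int bestPrefix = Integer.MAX_VALUE;
    for (int i = lo; i < hi; i++) {
      int prefix = distinguishingPrefixLength(sortedTerms.get(i - 1), sortedTerms.get(i));
      if (prefix < bestPrefix) {
        bestPrefix = prefix;
        best = i; // exclusive end: the block is [start, i)
      }
    }
    return best;
  }

  // Length of the shortest prefix of b that distinguishes it from a (a < b).
  static int distinguishingPrefixLength(BytesRef a, BytesRef b) {
    int len = Math.min(a.length, b.length);
    int i = 0;
    while (i < len && a.bytes[a.offset + i] == b.bytes[b.offset + i]) {
      i++;
    }
    return i + 1;
  }
}
{code}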



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-09-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924482#comment-16924482
 ] 

David Smiley commented on SOLR-13728:
-

I'm puzzled, but let's discuss further in SOLR-13745 -- an issue I both filed 
and fixed within the last half hour for the problem you identified.

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-13728.patch
>
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  When we know we don't support it, we can fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-06 Thread David Smiley (Jira)
David Smiley created SOLR-13745:
---

 Summary: Test should close resources: 
AtomicUpdateProcessorFactoryTest 
 Key: SOLR-13745
 URL: https://issues.apache.org/jira/browse/SOLR-13745
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley
Assignee: David Smiley


This test hangs after it runs because there are directory or request 
resources (not sure yet which) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-09-06 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13728.
-
Resolution: Fixed

[~hossman] I'm confident you meant to comment on SOLR-13523 (June 20th), not 
this one.

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-13728.patch
>
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  When we know we don't support it, we can fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-09-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924433#comment-16924433
 ] 

David Smiley commented on SOLR-13728:
-

I'll investigate [~hossman].  I had run the tests right before committing, and 
looked for CI failures this morning.  So this is a mystery.

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-13728.patch
>
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  When we know we don't support it, we can fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13738) UnifiedHighlighter can't highlight GraphQuery

2019-09-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13738:

Component/s: highlighter

> UnifiedHighlighter can't highlight GraphQuery
> -
>
> Key: SOLR-13738
> URL: https://issues.apache.org/jira/browse/SOLR-13738
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Affects Versions: 8.2
>Reporter: Jochen Barth
>Priority: Major
>
> Mikhail Khludnev said this is a bug.
> Here is the complete error message for the query below.
> Just tested with 8.1.1: it works.
> {quote}
> 2019-08-30 12:40:40.476 ERROR (qtp2116511124-65) [   x:Suchindex] 
> o.a.s.h.RequestHandlerBase java.lang.ClassCastException: class 
> org.apache.lucene.search.IndexSearcher cannot be cast to class 
> org.apache.solr.search.SolrIndexSearcher (or
> g.apache.lucene.search.IndexSearcher and 
> org.apache.solr.search.SolrIndexSearcher are in unnamed module of loader 
> org.eclipse.jetty.webapp.WebAppClassLoader @5ed190be)
> at 
> org.apache.solr.search.join.GraphQuery.createWeight(GraphQuery.java:115)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:137)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
> at 
> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:110)
> at 
> org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:641)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:510)
> at 
> org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
> at 
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:505)
> at 

[jira] [Commented] (SOLR-13738) UnifiedHighlighter can't highlight GraphQuery

2019-09-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923921#comment-16923921
 ] 

David Smiley commented on SOLR-13738:
-

Likely caused by LUCENE-8848.  Ironically, the goal of that one is to highlight 
more query types, so it's a bit odd that in this case it did just the 
opposite.  Based on the stack trace, it appears what might help is passing the 
SolrIndexSearcher that UnifiedHighlighter has into UHComponents where the 
FieldOffsetStrategy could grab it instead of creating a new IndexSearcher.  All 
this said, I do wonder how/why it worked before.  An investigation for another 
day.

In the meantime, you might avoid this error by toggling some settings, like not 
using the weight matcher mode.
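For example, a request like the following should take the older (non-Weight) 
offset path; I believe hl.weightMatches is the relevant toggle, though I haven't 
verified that it sidesteps this particular bug:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

public class HlWeightMatchesWorkaround {
  public static void main(String[] args) {
    SolrQuery query = new SolrQuery("{!graph from=parent to=id}text:foo");
    query.set("hl", true);
    query.set("hl.method", "unified");
    // Opt out of the Weight/matches-based strategy that performs the
    // createWeight cast shown in the stack trace above.
    query.set("hl.weightMatches", false);
  }
}
{code}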

> UnifiedHighlighter can't highlight GraphQuery
> -
>
> Key: SOLR-13738
> URL: https://issues.apache.org/jira/browse/SOLR-13738
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Jochen Barth
>Priority: Major
>
> Mikhail Khludnev said this is a bug.
> Here is the complete error message for the query below.
> Just tested with 8.1.1: it works.
> {quote}
> 2019-08-30 12:40:40.476 ERROR (qtp2116511124-65) [   x:Suchindex] 
> o.a.s.h.RequestHandlerBase java.lang.ClassCastException: class 
> org.apache.lucene.search.IndexSearcher cannot be cast to class 
> org.apache.solr.search.SolrIndexSearcher (or
> g.apache.lucene.search.IndexSearcher and 
> org.apache.solr.search.SolrIndexSearcher are in unnamed module of loader 
> org.eclipse.jetty.webapp.WebAppClassLoader @5ed190be)
> at 
> org.apache.solr.search.join.GraphQuery.createWeight(GraphQuery.java:115)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:137)
> at 
> org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
> at 
> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:110)
> at 
> org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:641)
> at 
> org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:510)
> at 
> org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
> at 
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at 
> 

[jira] [Updated] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-09-05 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13728:

Fix Version/s: 8.3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-13728.patch
>
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  When we know we don't support it, we can fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8965) ConcurrentMergeScheduler should maybe sometimes do synchronously

2019-09-04 Thread David Smiley (Jira)
David Smiley created LUCENE-8965:


 Summary: ConcurrentMergeScheduler should maybe sometimes do 
synchronously
 Key: LUCENE-8965
 URL: https://issues.apache.org/jira/browse/LUCENE-8965
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: David Smiley
Assignee: David Smiley


It can be beneficial for the ConcurrentMergeScheduler to sometimes _not_ do 
concurrent merges (i.e. sometimes do a serial merge).  When the provided 
"OneMerge" is _small_ and when the MergeTrigger is FULL_FLUSH (i.e. on a 
commit), a new searcher would benefit from seeing fewer segments.  If index 
replication is used, this setting can reduce the net replication by merging a 
little more eagerly sometimes.
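To sketch the idea (against the 8.x MergeScheduler API; the class name is 
hypothetical, and a real version would also gate on OneMerge.estimatedMergeBytes 
so that only small merges run inline):

{code:java}
import java.io.IOException;

import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;

public class SometimesSerialMergeScheduler extends ConcurrentMergeScheduler {
  @Override
  public synchronized void merge(IndexWriter writer, MergeTrigger trigger,
                                 boolean newMergesFound) throws IOException {
    if (trigger == MergeTrigger.FULL_FLUSH) {
      // Mirror SerialMergeScheduler: the committing thread runs the merges
      // itself, so the new searcher sees fewer segments.
      MergePolicy.OneMerge merge;
      while ((merge = writer.getNextMerge()) != null) {
        writer.merge(merge);
      }
    } else {
      super.merge(writer, trigger, newMergesFound);
    }
  }
}
{code}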



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2019-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922749#comment-16922749
 ] 

David Smiley commented on LUCENE-8962:
--

The existing MergeTrigger mechanism is a way to conditionally activate this 
stuff.  We already did that; I forgot to mention it.  Thus it's only a 
commit (not a flush) that activates this behavior.  I'll create an issue for 
the merge scheduler and link it here.

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will write and accumulate many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter'}}s 
> refresh to optionally kick off merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

2019-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922432#comment-16922432
 ] 

David Smiley commented on SOLR-9658:


Sure; I don't mean to stand in your way.

> Caches should have an optional way to clean if idle for 'x' mins
> 
>
> Key: SOLR-9658
> URL: https://issues.apache.org/jira/browse/SOLR-9658
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch, 
> SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch
>
>
> If a cache is idle for long, it consumes precious memory. It should be 
> configurable to clear the cache if it was not accessed for 'x' secs. The 
> cache configuration can have an extra config {{maxIdleTime}}; if we wish it 
> to be cleaned after 10 mins of inactivity, set it to {{maxIdleTime=600}}. 
> [~dragonsinth] would it be a solution for the memory leak you mentioned?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-13705) Double-checked Locking Should Not be Used

2019-09-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-13705:
---

Assignee: David Smiley

> Double-checked Locking Should Not be Used
> -
>
> Key: SOLR-13705
> URL: https://issues.apache.org/jira/browse/SOLR-13705
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.2
>Reporter: Furkan KAMACI
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Using double-checked locking for the lazy initialization of any other type of 
> primitive or mutable object risks a second thread using an uninitialized or 
> partially initialized member while the first thread is still creating it, and 
> crashing the program.
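For reference, a minimal generic illustration of the safe form; this is textbook 
code, not taken from the Solr patch:

{code:java}
public class LazyHolder {
  // volatile is the essential fix: without it, a second thread may observe a
  // non-null reference to a partially constructed instance.
  private static volatile LazyHolder instance;

  private LazyHolder() {}

  public static LazyHolder getInstance() {
    LazyHolder result = instance; // one volatile read on the fast path
    if (result == null) {
      synchronized (LazyHolder.class) {
        result = instance;
        if (result == null) {
          instance = result = new LazyHolder();
        }
      }
    }
    return result;
  }
}
{code}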



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2019-09-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921796#comment-16921796
 ] 

David Smiley commented on LUCENE-8962:
--

At Salesforce I worked on a custom merge policy to handle small segments better 
than TieredMergePolicy's choices do.  What's disappointing about 
TMP is that it insists on merging getSegmentsPerTier() (10) segments, _even 
when they are small_ (below getFloorSegmentMB()).  Instead we wanted some 
"cheap merges" of a smaller number of segments (even as few as 3 for us) that 
solely consist of the small segments.  This cut our average segment count in 
half, although it cost us more I/O -- a trade-off we were happy with.  I'd like to 
open-source this, perhaps as a direct change to TMP with defaults to do a 
similar amount of I/O but averaging fewer segments.  The difficult part is 
doing simulations to prove out the theories.

Additionally, I worked on a custom MergeScheduler that executed those "cheap 
merges" synchronously (directly in the calling thread) while having the regular 
other merges pass through to the concurrent scheduler.  The rationale wasn't 
tied to NRT but I could see NRT benefiting from this if getting an NRT searcher 
calls out to the merge code (I don't know if it does).

Perhaps your use-case could benefit from this as well.  Unlike what you propose 
in the description, it doesn't involve changes/features to Lucene itself.  WDYT?
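A rough sketch of the "cheap merges" shape, against the 8.x MergePolicy API as I 
recall it; the names and thresholds are mine, not the Salesforce code:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

public class CheapSmallSegmentMergePolicy extends TieredMergePolicy {
  private static final long SMALL_BYTES = 4L << 20; // assumed "small" cutoff
  private static final int MIN_CHEAP_SEGMENTS = 3;

  @Override
  public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
                                       MergeContext ctx) throws IOException {
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo sci : infos) {
      if (!ctx.getMergingSegments().contains(sci) && size(sci, ctx) <= SMALL_BYTES) {
        small.add(sci);
      }
    }
    if (small.size() >= MIN_CHEAP_SEGMENTS) {
      // Cheap merge: only the small segments, even just a few of them.
      MergeSpecification spec = new MergeSpecification();
      spec.add(new OneMerge(small));
      return spec;
    }
    return super.findMerges(trigger, infos, ctx);
  }
}
{code}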

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will write and accumulate many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter'}}s 
> refresh to optionally kick off merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13655) Cut Over Collections.unmodifiableSet usages to Set.*

2019-09-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921733#comment-16921733
 ] 

David Smiley commented on SOLR-13655:
-

This is a trivial master-only change since it requires Java 11.  Perhaps we 
should delay such +sweeping+ changes until master (9.0) is released?  We 
needn't back this out, but we should at least put the brakes on other such 
things.  I am appreciative of your efforts, guys.
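For anyone following along, the cutover itself is mechanical; the catch is that 
Set.of is Java 9+, hence master-only:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SetCutoverExample {
  // Java 8 idiom, what the codebase used before:
  static final Set<String> BEFORE =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("id", "_root_")));

  // Java 9+ replacement; note Set.of also rejects nulls and duplicate elements.
  static final Set<String> AFTER = Set.of("id", "_root_");
}
{code}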

> Cut Over Collections.unmodifiableSet usages to Set.*
> 
>
> Key: SOLR-13655
> URL: https://issues.apache.org/jira/browse/SOLR-13655
> Project: Solr
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

2019-09-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921723#comment-16921723
 ] 

David Smiley commented on SOLR-9658:


Do we as a project really want to implement our own caches or is it about time 
that we use other popular caches like 
[Caffeine|https://github.com/ben-manes/caffeine]?  Don't get me wrong, I love 
writing code, and working on caches is fun, but I'd rather we use one of our 
many already existing dependencies for this common task.  SOLR-8241 is about 
adding Caffeine to Solr.
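As a sketch of why Caffeine fits: the maxIdleTime semantics discussed here map 
onto Caffeine's expireAfterAccess, roughly like this (value type hypothetical):

{code:java}
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class IdleCacheSketch {
  // expireAfterAccess gives the "clean if idle for x" behavior for free;
  // 600 seconds mirrors the maxIdleTime=600 example in this issue.
  static final Cache<String, Object> CACHE = Caffeine.newBuilder()
      .maximumSize(512)
      .expireAfterAccess(600, TimeUnit.SECONDS)
      .build();
}
{code}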

> Caches should have an optional way to clean if idle for 'x' mins
> 
>
> Key: SOLR-9658
> URL: https://issues.apache.org/jira/browse/SOLR-9658
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch, 
> SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch
>
>
> If a cache is idle for long, it consumes precious memory. It should be 
> configurable to clear the cache if it was not accessed for 'x' secs. The 
> cache configuration can have an extra config {{maxIdleTime}}; if we wish it 
> to be cleaned after 10 mins of inactivity, set it to {{maxIdleTime=600}}. 
> [~dragonsinth] would it be a solution for the memory leak you mentioned?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-3486) The memory size of Solr caches should be configurable

2019-09-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed SOLR-3486.
--

> The memory size of Solr caches should be configurable
> -
>
> Key: SOLR-3486
> URL: https://issues.apache.org/jira/browse/SOLR-3486
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LFUMap.java, SOLR-3486.patch, SOLR-3486.patch
>
>
> It is currently possible to configure the sizes of Solr caches based on the 
> number of entries of the cache. The problem is that the memory size of cached 
> values may vary a lot over time (depending on IndexReader.maxDoc and the 
> queries that are run) although the JVM heap size does not.
> Having a configurable max size in bytes would also help optimize cache 
> utilization, making it possible to store more values provided that they have 
> a small memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3486) The memory size of Solr caches should be configurable

2019-09-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-3486.

Resolution: Duplicate

> The memory size of Solr caches should be configurable
> -
>
> Key: SOLR-3486
> URL: https://issues.apache.org/jira/browse/SOLR-3486
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LFUMap.java, SOLR-3486.patch, SOLR-3486.patch
>
>
> It is currently possible to configure the sizes of Solr caches based on the 
> number of entries of the cache. The problem is that the memory size of cached 
> values may vary a lot over time (depending on IndexReader.maxDoc and the 
> queries that are run) although the JVM heap size does not.
> Having a configurable max size in bytes would also help optimize cache 
> utilization, making it possible to store more values provided that they have 
> a small memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-09-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8403.
--
Resolution: Won't Fix

> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked; however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs are enabled. So this then isn't 
> 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-09-03 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-8403.


> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked; however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs are enabled. So this then isn't 
> 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13709) Race condition on core reload while core is still loading?

2019-09-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921141#comment-16921141
 ] 

David Smiley commented on SOLR-13709:
-

bq. have the reload operation wait until core loading was complete. I'll give 
that a try with some debugging code in place just to prove the hypothesis. 
Thinking more, it seems like swap, unload, and create should all block until 
the coreContainer has completed loading as well. Actually, it seems like all 
core API commands should wait until after CoreContainer.load() is done.

+1 to all that

> Race condition on core reload while core is still loading?
> --
>
> Key: SOLR-13709
> URL: https://issues.apache.org/jira/browse/SOLR-13709
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Erick Erickson
>Priority: Major
> Attachments: apache_Lucene-Solr-Tests-8.x_449.log.txt
>
>
> A recent jenkins failure from {{TestSolrCLIRunExample}} seems to suggest that 
> there may be a race condition when attempting to re-load a SolrCore while the 
> core is currently in the process of (re)loading that can leave the SolrCore 
> in an unusable state.
> Details to follow...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13690) Migrate field type configurations in default/example schema files to look up factories by "name"

2019-09-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920433#comment-16920433
 ] 

David Smiley commented on SOLR-13690:
-

[~tomoko] it's very likely that this issue, or a similar related one you are 
working on, broke the build.  See the CI failures, and in particular run 
TestConfigSetsAPI.testUserAndTestDefaultConfigsetsAreSame().

> Migrate field type configurations in default/example schema files to look up 
> factories by "name"
> 
>
> Key: SOLR-13690
> URL: https://issues.apache.org/jira/browse/SOLR-13690
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13690.patch, SOLR-13690.patch, Screenshot from 
> 2019-08-30 01-09-43.png
>
>
> This is a follow-up task for SOLR-13593.
> To encourage users to use the "name" attribute in field type configurations, 
> we should migrate all managed-schema files bundled with Solr.
> There are 8 managed-schemas (except for test resources) in solr.
> {code}
> lucene-solr-mirror $ find solr -name "managed-schema" | grep -v test
> solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema
> solr/server/solr/configsets/_default/conf/managed-schema
> solr/example/files/conf/managed-schema
> solr/example/example-DIH/solr/solr/conf/managed-schema
> solr/example/example-DIH/solr/db/conf/managed-schema
> solr/example/example-DIH/solr/atom/conf/managed-schema
> solr/example/example-DIH/solr/mail/conf/managed-schema
> solr/example/example-DIH/solr/tika/conf/managed-schema
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-08-31 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920296#comment-16920296
 ] 

David Smiley commented on LUCENE-8753:
--

Oh; I was just looking at PR #828, by the way.  As I write this, it does not 
include the "shared terms" in #676.  Please incorporate that into #828; okay?  
One PR to review/commit.  The other two can then be closed as obsolete.

> New PostingFormat - UniformSplit
> 
>
> Key: LUCENE-8753
> URL: https://issues.apache.org/jira/browse/LUCENE-8753
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 
> objectives:
>  - Clear design and simple code.
>  - Easily extensible, for both the logic and the index format.
>  - Light memory usage with a very compact FST.
>  - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
> (the attached pdf visually explains the technique in more detail)
>  The principle is to split the list of terms into blocks and use a FST to 
> access the block, but not as a prefix trie, rather with a seek-floor pattern. 
> For the selection of the blocks, there is a target average block size (number 
> of terms), with an allowed delta variation (10%) to compare the terms and 
> select the one with the minimal distinguishing prefix.
>  There are also several optimizations inside the block to make it more 
> compact and speed up the loading/scanning.
> The performance obtained is interesting with the luceneutil benchmark, 
> comparing UniformSplit with BlockTree. Find it in the first comment and also 
> attached for better formatting.
> Although the precise percentages vary between runs, three main points:
>  - TermQuery and PhraseQuery are improved.
>  - PrefixQuery and WildcardQuery are ok.
>  - Fuzzy queries are clearly less performant, because BlockTree is so 
> optimized for them.
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time 
> is reduced by 20%. So this PostingsFormat scales to lots of docs, as 
> BlockTree does.
> This initial version passes all Lucene tests. Use “ant test 
> -Dtests.codec=UniformSplitTesting” to test with this PostingsFormat.
> Subjectively, we think we have fulfilled our goal of code simplicity. And we 
> have already exercised this PostingsFormat extensibility to create a 
> different flavor for our own use-case.
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-08-31 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920294#comment-16920294
 ] 

David Smiley commented on LUCENE-8753:
--

Bruno:  Why a new PR?  Please don't do that next time unless the new PR is 
quite different in approach.  I know you needed to update the PR to account for 
changes in Lucene, but that doesn't require a new PR; just rebase it and 
force-push.

I encourage others to take a look if they wish.  If there aren't further issues 
to address, I'd like to commit this later next week (e.g. maybe Sept 8th).

> New PostingFormat - UniformSplit
> 
>
> Key: LUCENE-8753
> URL: https://issues.apache.org/jira/browse/LUCENE-8753
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 
> objectives:
>  - Clear design and simple code.
>  - Easily extensible, for both the logic and the index format.
>  - Light memory usage with a very compact FST.
>  - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
> (the attached pdf explains the technique visually in more detail)
>  The principle is to split the list of terms into blocks and use an FST to 
> access the block, but not as a prefix trie, rather with a seek-floor pattern. 
> For the selection of the blocks, there is a target average block size (number 
> of terms), with an allowed delta variation (10%) within which to compare the 
> terms and select the one with the minimal distinguishing prefix.
>  There are also several optimizations inside the block to make it more 
> compact and speed up the loading/scanning.
> The performance obtained is interesting with the luceneutil benchmark, 
> comparing UniformSplit with BlockTree. Find it in the first comment and also 
> attached for better formatting.
> Although the precise percentages vary between runs, three main points stand out:
>  - TermQuery and PhraseQuery are improved.
>  - PrefixQuery and WildcardQuery are ok.
>  - Fuzzy queries are clearly less performant, because BlockTree is so 
> optimized for them.
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time 
> is reduced by 20%. So this PostingsFormat scales to lots of docs, like 
> BlockTree does.
> This initial version passes all Lucene tests. Use “ant test 
> -Dtests.codec=UniformSplitTesting” to test with this PostingsFormat.
> Subjectively, we think we have fulfilled our goal of code simplicity. And we 
> have already exercised this PostingsFormat extensibility to create a 
> different flavor for our own use-case.
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree

2019-08-30 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8758:
-
Fix Version/s: (was: 8.x)
   8.3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks Amish.

> Class Field levelN is not populated correctly in QuadPrefixTree
> ---
>
> Key: LUCENE-8758
> URL: https://issues.apache.org/jira/browse/LUCENE-8758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0
>Reporter: Dominic Page
>Assignee: David Smiley
>Priority: Trivial
>  Labels: beginner
> Fix For: 8.3
>
> Attachments: LUCENE-8758.patch
>
>
> QuadPrefixTree in Lucene prepopulates these arrays:
> {{levelW = new double[maxLevels];}}
> {{levelH = new double[maxLevels];}}
> {{*levelS = new int[maxLevels];*}}
> {{*levelN = new int[maxLevels];*}}
> Like this
> {{for (int i = 1; i < levelW.length; i++) {}}
> {{ levelW[i] = levelW[i - 1] / 2.0;}}
> {{ levelH[i] = levelH[i - 1] / 2.0;}}
> {{ *levelS[i] = levelS[i - 1] * 2;*}}
> {{ *levelN[i] = levelN[i - 1] * 4;*}}
> {{}}}
> The field {{levelN[]}} overflows after level 14 (where the value reaches 
> 1073741824), yet maxLevels is limited to {{MAX_LEVELS_POSSIBLE = 50}}.
> The field {{levelN}} appears not to be used anywhere. Likewise, the field 
> {{levelS[]}} is only used in the {{printInfo}} method. I would propose either 
> to remove both {{levelN[]}} and {{levelS[]}}, or to change the datatype:
> {{levelN = new long[maxLevels];}}
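
For illustration, a small self-contained sketch of the overflow described 
above; the values mirror the quoted loop, and the {{long}} array shows the 
proposed datatype fix:

{code:java}
// levelN grows 4x per level, so a 32-bit int overflows once the value
// passes 2^30 = 1073741824 (reached at level 14 when levelN[0] = 4).
public class LevelNOverflowDemo {
  public static void main(String[] args) {
    int maxLevels = 20;
    int[] levelN = new int[maxLevels];
    long[] levelNAsLong = new long[maxLevels]; // proposed fix: use long
    levelN[0] = 4;
    levelNAsLong[0] = 4;
    for (int i = 1; i < maxLevels; i++) {
      levelN[i] = levelN[i - 1] * 4;           // wraps around at i = 15
      levelNAsLong[i] = levelNAsLong[i - 1] * 4;
      if (levelN[i] != levelNAsLong[i]) {
        System.out.println("int overflow at level " + i + ": got " + levelN[i]
            + " instead of " + levelNAsLong[i]);
        break;
      }
    }
  }
}
{code}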



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-08-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919200#comment-16919200
 ] 

David Smiley commented on SOLR-12638:
-

Thanks Moshe; you are right. Still, I think there is probably more we can do to 
make Solr friendlier in this regard.  If the user forgets about the 
specific/unusual settings of the root field in the schema and nonetheless 
tries to do a partial update, then it's a bad/confusing user experience.  To 
help, I think we can throw an error in this case: SOLR-13728.
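
A hypothetical sketch of that kind of guard (names and placement are 
assumptions, not the actual SOLR-13728 patch):

{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;

class NestedUpdateGuardSketch {
  // Hypothetical: fail an atomic update up front when the schema cannot
  // round-trip nested documents, instead of silently losing the children.
  static void checkNestedUpdateSupported(IndexSchema schema) {
    SchemaField root = schema.getFieldOrNull("_root_");
    boolean recoverable = root != null && (root.stored() || root.hasDocValues());
    if (!recoverable) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "Atomic update of a document with children requires the _root_ field"
              + " to be stored or to have docValues");
    }
  }
}
{code}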


> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought; I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-08-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13728:

Attachment: SOLR-13728.patch
Status: Patch Available  (was: Patch Available)

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: SOLR-13728.patch
>
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  We can detect when we don't support it and fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-08-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13728:

Status: Patch Available  (was: Open)

> Fail partial updates if it would inadvertently remove nested docs
> -
>
> Key: SOLR-13728
> URL: https://issues.apache.org/jira/browse/SOLR-13728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
> updates) to nested documents.  However, this feature only works if the schema 
> meets certain conditions.  We can detect when we don't support it and fail the 
> request – which is what I propose here.  This is much friendlier than wiping 
> out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13728) Fail partial updates if it would inadvertently remove nested docs

2019-08-29 Thread David Smiley (Jira)
David Smiley created SOLR-13728:
---

 Summary: Fail partial updates if it would inadvertently remove 
nested docs
 Key: SOLR-13728
 URL: https://issues.apache.org/jira/browse/SOLR-13728
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley
Assignee: David Smiley


In SOLR-12638 Solr gained the ability to do partial updates (aka atomic 
updates) to nested documents.  However, this feature only works if the schema 
meets certain conditions.  We can detect when we don't support it and fail the 
request – which is what I propose here.  This is much friendlier than wiping 
out existing documents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree

2019-08-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8758:
-
Status: Patch Available  (was: Open)

> Class Field levelN is not populated correctly in QuadPrefixTree
> ---
>
> Key: LUCENE-8758
> URL: https://issues.apache.org/jira/browse/LUCENE-8758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0
>Reporter: Dominic Page
>Assignee: David Smiley
>Priority: Trivial
>  Labels: beginner
> Fix For: 8.x
>
> Attachments: LUCENE-8758.patch
>
>
> QuadPrefixTree in Lucene prepopulates these arrays:
> {{levelW = new double[maxLevels];}}
> {{levelH = new double[maxLevels];}}
> {{*levelS = new int[maxLevels];*}}
> {{*levelN = new int[maxLevels];*}}
> Like this
> {{for (int i = 1; i < levelW.length; i++) {}}
> {{ levelW[i] = levelW[i - 1] / 2.0;}}
> {{ levelH[i] = levelH[i - 1] / 2.0;}}
> {{ *levelS[i] = levelS[i - 1] * 2;*}}
> {{ *levelN[i] = levelN[i - 1] * 4;*}}
> {{}}}
> The field {{levelN[]}} overflows after level 14 (where the value reaches 
> 1073741824), yet maxLevels is limited to {{MAX_LEVELS_POSSIBLE = 50}}.
> The field {{levelN}} appears not to be used anywhere. Likewise, the field 
> {{levelS[]}} is only used in the {{printInfo}} method. I would propose either 
> to remove both {{levelN[]}} and {{levelS[]}}, or to change the datatype:
> {{levelN = new long[maxLevels];}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree

2019-08-29 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-8758:


Assignee: David Smiley

> Class Field levelN is not populated correctly in QuadPrefixTree
> ---
>
> Key: LUCENE-8758
> URL: https://issues.apache.org/jira/browse/LUCENE-8758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0
>Reporter: Dominic Page
>Assignee: David Smiley
>Priority: Trivial
>  Labels: beginner
> Fix For: 8.x
>
> Attachments: LUCENE-8758.patch
>
>
> QuadPrefixTree in Lucene prepopulates these arrays:
> {{levelW = new double[maxLevels];}}
> {{levelH = new double[maxLevels];}}
> {{*levelS = new int[maxLevels];*}}
> {{*levelN = new int[maxLevels];*}}
> Like this
> {{for (int i = 1; i < levelW.length; i++) {}}
> {{ levelW[i] = levelW[i - 1] / 2.0;}}
> {{ levelH[i] = levelH[i - 1] / 2.0;}}
> {{ *levelS[i] = levelS[i - 1] * 2;*}}
> {{ *levelN[i] = levelN[i - 1] * 4;*}}
> {{}}}
> The field {{levelN[]}} overflows after level 14 (where the value reaches 
> 1073741824), yet maxLevels is limited to {{MAX_LEVELS_POSSIBLE = 50}}.
> The field {{levelN}} appears not to be used anywhere. Likewise, the field 
> {{levelS[]}} is only used in the {{printInfo}} method. I would propose either 
> to remove both {{levelN[]}} and {{levelS[]}}, or to change the datatype:
> {{levelN = new long[maxLevels];}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-08-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918913#comment-16918913
 ] 

David Smiley commented on LUCENE-8403:
--

RE a separate field:  That's a valid approach, yes.  I/Michael should have 
acknowledged that up front.  However the trade-off is that it would mean 
analyzing the text all over again[1], and mucking with the higher level 
features to use a separate field for the term vector (e.g. in a highlighter).

[1]: It'd be neat if somehow one IndexableField could _listen_ for analysis 
events processed from another field.  It's probably possible to hack something 
up that works today assuming you know the order of fields.  This might be used 
not only for populating term vectors but also for populating SortedSetDocValues 
sourced from analyzed terms.

RE "There's no good technical reason to introduce a layering violation".  It 
debatable if term vectors need to be seen as "layered".  I understand that you 
do, and hence your strong opposition about the proposal here.

> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked, however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs enabled. So this then doesn't 
> work as 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-08-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918711#comment-16918711
 ] 

David Smiley commented on LUCENE-8403:
--

{quote}I understand the approaches – your approach seems to be a longer term 
solution (I am not sure of the complexity implications though).
{quote}
I don't think it's long term; I expect it's a simple flag to inform CheckIndex 
that it shouldn't check something in this case.  Perhaps, if you want to 
explore this, you might see if it's that simple.  The biggest part would be a 
test including a custom format that exercises this flag to ensure CheckIndex 
doesn't freak out.

> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked, however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs enabled. So this then doesn't 
> work as 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8951) Create issues@ and builds@ lists and update notifications

2019-08-28 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917738#comment-16917738
 ] 

David Smiley commented on LUCENE-8951:
--

Ugh; why do mailing lists have to be such a usability nightmare.  Ancient 
software.

> Create issues@ and builds@ lists and update notifications
> -
>
> Key: LUCENE-8951
> URL: https://issues.apache.org/jira/browse/LUCENE-8951
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> Issue to plan and execute decision from dev mailing list 
> [https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
>  # Create mailing lists as an announce only list (/)
>  # Subscribe all emails that will be allowed to post (/)
>  # Update websites with info about the new lists
>  # Announce to dev@ list that the change will happen
>  # Modify Jira and Github bots to post to issues@ list instead of dev@
>  # Modify Jenkins (including Policeman and other) to post to builds@
>  # Announce to dev@ list that the change is effective



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9418) Statistical Phrase Identifier

2019-08-27 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916697#comment-16916697
 ] 

David Smiley commented on SOLR-9418:


I just want to point out that there's no trace of this in the Solr Reference 
Guide, and as such it is basically a hidden feature.

> Statistical Phrase Identifier
> -
>
> Key: SOLR-9418
> URL: https://issues.apache.org/jira/browse/SOLR-9418
> Project: Solr
>  Issue Type: New Feature
>Reporter: Akash Mehta
>Assignee: Hoss Man
>Priority: Major
> Fix For: 7.5, 8.0
>
> Attachments: SOLR-9418.patch, SOLR-9418.patch, SOLR-9418.patch, 
> SOLR-9418.zip
>
>
> h2. *Summary:*
> The Statistical Phrase Identifier is a Solr contribution that takes in a 
> string of text and then leverages a language model (an Apache Lucene/Solr 
> inverted index) to predict how the inputted text should be divided into 
> phrases. The intended purpose of this tool is to parse short-text queries 
> into phrases prior to executing a keyword search (as opposed to parsing out 
> each keyword as a single term).
> It is being generously donated to the Solr project by CareerBuilder, with the 
> original source code and a quickly demo-able version located here:  
> [https://github.com/careerbuilder/statistical-phrase-identifier]
> h2. *Purpose:*
> Assume you're building a job search engine, and one of your users searches 
> for the following:
>  _machine learning research and development Portland, OR software engineer 
> AND hadoop, java_
> Most search engines will natively parse this query into the following boolean 
> representation:
>  _(machine AND learning AND research AND development AND Portland) OR 
> (software AND engineer AND hadoop AND java)_
> While this query may still yield relevant results, it is clear that the 
> intent of the user wasn't understood very well at all. By leveraging the 
> Statistical Phrase Identifier on this string prior to query parsing, you can 
> instead expect the following parsing:
> _{machine learning} \{and} \{research and development} \{Portland, OR} 
> \{software engineer} \{AND} \{hadoop,} \{java}_
> It is then possible to modify all the multi-word phrases prior to executing 
> the search:
>  _"machine learning" and "research and development" "Portland, OR" "software 
> engineer" AND hadoop, java_
> Of course, you could do your own query parsing to specifically handle the 
> boolean syntax, but the following would eventually be interpreted correctly 
> by Apache Solr and most other search engines:
>  _"machine learning" AND "research and development" AND "Portland, OR" AND 
> "software engineer" AND hadoop AND java_ 
> h2. *History:*
> This project was originally implemented by the search team at CareerBuilder 
> in the summer of 2015 for use as part of their semantic search system. In the 
> summer of 2016, Akash Mehta implemented a much simpler version as a proof of 
> concept based upon publicly available information about the CareerBuilder 
> implementation (the first attached patch).  In July of 2018, CareerBuilder 
> open sourced their original version 
> ([https://github.com/careerbuilder/statistical-phrase-identifier]) 
> and agreed to also donate the code to the Apache Software Foundation as a 
> Solr contribution. A Solr patch with the CareerBuilder version was added to 
> this issue on September 5th, 2018, and community feedback and contributions 
> are encouraged.
> This issue was originally titled the "Probabilistic Query Parser", but the 
> name has now been updated to "Statistical Phrase Identifier" to avoid 
> ambiguity with Solr's query parsers (per some of the feedback on this issue), 
> as the implementation is actually just a mechanism for identifying phrases 
> statistically from a string and is NOT a Solr query parser. 
> h2. *Example usage:*
> h3. (See contrib readme or configuration files in the patch for full 
> configuration details)
> h3. *{{Request:}}*
> {code:java}
> http://localhost:8983/solr/spi/parse?q=darth vader obi wan kenobi anakin 
> skywalker toad x men magneto professor xavier{code}
> h3. *{{Response:}}* 
> {code:java}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":25},
>     "top_parsed_query":"{darth vader} {obi wan kenobi} {anakin skywalker} 
> {toad} {x men} {magneto} {professor xavier}",
>     "top_parsed_phrases":[
>       "darth vader",
>       "obi wan kenobi",
>       "anakin skywalker",
>       "toad",
>       "x-men",
>       "magneto",
>       "professor xavier"],
>       "potential_parsings":[{
>       "parsed_phrases":["darth vader",
>       "obi wan kenobi",
>       "anakin skywalker",
>       "toad",
>       "x-men",
>       "magneto",
>       

[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-08-27 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916687#comment-16916687
 ] 

David Smiley commented on LUCENE-8403:
--

bq. Does it make sense for me to adapt the patch to support pattern based 
filtering?

I think you should discuss the idea here first since there's a blocker.

What I'd most prefer, if [~rcmuir] might approve, is a way for the 
TermVectorsFormat to somehow advertise that its contents do not align with a 
PostingsFormat.  Straw-man: perhaps a {{TermVectorsFormat.isFiltered()}} 
method.  In such a case, CheckIndex could still check that the TVF API works 
(it would call CheckIndex.checkFields(tvFields, ...)) but it would not compare 
it to terms() -- logic gated by the {{doSlowChecks}} param in 
{{testTermVectors()}}.  This would be very general and allow all manner of 
variations a term vector might have from the analyzed text.  

A less general approach is one akin to Hoss's suggestion that the TVF 
advertises *which* terms are consistent.  Though not a list, which is way too 
inflexible; more like a callback method such as 
{{TermVectorsFormat.acceptsTerm(BytesRef)}}.

I don't think IndexWriterConfig should be modified as I think this is too 
expert to warrant that.

Atri, curious, what exactly was the error message string thrown by CheckIndex 
for a filtered term?

Side note: TermVectorsWriter's API is dated; it ought to look more like the 
postings writing API.  I have some old notes on a plan to tackle that.
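
To make the two straw-men concrete, a hypothetical sketch (neither method 
exists in Lucene today; this is the shape of the idea, not proposed code):

{code:java}
import org.apache.lucene.util.BytesRef;

// Hypothetical additions to TermVectorsFormat, shown on a stand-in class.
abstract class FilteredTermVectorsFormatSketch {

  // Option 1 (general): advertise that term vector contents may not align
  // with the postings, so CheckIndex skips the cross-check while still
  // exercising the term vectors API itself.
  boolean isFiltered() {
    return false;
  }

  // Option 2 (less general, akin to Hoss's suggestion): a callback telling
  // CheckIndex which terms are guaranteed to be consistent with the postings.
  boolean acceptsTerm(BytesRef term) {
    return true;
  }
}
{code}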

> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked, however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs enabled. So this then doesn't 
> work as 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present

2019-08-26 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916316#comment-16916316
 ] 

David Smiley commented on LUCENE-8403:
--

Atri, I appreciate that you put some effort into this, but your patch wouldn't 
work for the use case that inspired the creation of this feature request.  The 
terms to be omitted from the term vector are matchable by a pattern; it's not a 
fixed pre-determined list.  For example, imagine filtering all terms that start 
or end with a special character.

But this issue is stuck without addressing the concern Robert raises -- 
CheckIndex.  I don't recall the particulars of where in CheckIndex.java it 
complains, but try it out on your patch to see.  Given that CheckIndex runs 
randomly and automatically within tests, I suspect your patch will ultimately 
fail given enough iterations.
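
For illustration, a pattern-based rule of the kind described above might look 
like this (hypothetical, not part of any patch):

{code:java}
class TermVectorFilterSketch {
  // Illustrative only: omit from the term vector any term that starts or
  // ends with a non-alphanumeric character.
  static boolean keepInTermVector(String term) {
    return !term.isEmpty()
        && Character.isLetterOrDigit(term.charAt(0))
        && Character.isLetterOrDigit(term.charAt(term.length() - 1));
  }
}
{code}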

> Support 'filtered' term vectors - don't require all terms to be present
> ---
>
> Key: LUCENE-8403
> URL: https://issues.apache.org/jira/browse/LUCENE-8403
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Braun
>Priority: Minor
> Attachments: LUCENE-8403.patch
>
>
> The genesis of this was a conversation and idea from [~dsmiley] several years 
> ago.
> In order to optimize term vector storage, we may not actually need all tokens 
> to be present in the term vectors - and if so, ideally our codec could just 
> opt not to store them.
> I attempted to fork the standard codec and override the TermVectorsFormat and 
> TermVectorsWriter to ignore storing certain Terms within a field. This 
> worked, however, CheckIndex checks that the terms present in the standard 
> postings are also present in the TVs, if TVs enabled. So this then doesn't 
> work as 'valid' according to CheckIndex.
> Can the TermVectorsFormat be made in such a way to support configuration of 
> tokens that should not be stored (benefits: less storage, more optimal 
> retrieval per doc)? Is this valuable to the wider community? Is there a way 
> we can design this to not break CheckIndex's contract while at the same time 
> lessening storage for unneeded tokens?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree

2019-08-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915422#comment-16915422
 ] 

David Smiley commented on LUCENE-8758:
--

Go for it!

> Class Field levelN is not populated correctly in QuadPrefixTree
> ---
>
> Key: LUCENE-8758
> URL: https://issues.apache.org/jira/browse/LUCENE-8758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0
>Reporter: Dominic Page
>Priority: Trivial
>  Labels: beginner
> Fix For: 8.x
>
>
> QuadPrefixTree in Lucene prepopulates these arrays:
> {{levelW = new double[maxLevels];}}
> {{levelH = new double[maxLevels];}}
> {{*levelS = new int[maxLevels];*}}
> {{*levelN = new int[maxLevels];*}}
> Like this
> {{for (int i = 1; i < levelW.length; i++) {}}
> {{ levelW[i] = levelW[i - 1] / 2.0;}}
> {{ levelH[i] = levelH[i - 1] / 2.0;}}
> {{ *levelS[i] = levelS[i - 1] * 2;*}}
> {{ *levelN[i] = levelN[i - 1] * 4;*}}
> {{}}}
> The field {{levelN[]}} overflows after level 14 (where the value reaches 
> 1073741824), yet maxLevels is limited to {{MAX_LEVELS_POSSIBLE = 50}}.
> The field {{levelN}} appears not to be used anywhere. Likewise, the field 
> {{levelS[]}} is only used in the {{printInfo}} method. I would propose either 
> to remove both {{levelN[]}} and {{levelS[]}}, or to change the datatype:
> {{levelN = new long[maxLevels];}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11601) geodist fails for some fields when field is in parenthesis instead of sfield param

2019-08-23 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-11601:

Fix Version/s: 8.3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for contributing the fix, Amrit.  Feel free to prod us when we neglect 
to pay attention to your waiting patches :-)

Note I moved the test from TestGeoJSONResponseWriter to TestSolr4Spatial2 
because it had nothing to do with geojson.

> geodist fails for some fields when field is in parenthesis instead of sfield 
> param
> --
>
> Key: SOLR-11601
> URL: https://issues.apache.org/jira/browse/SOLR-11601
> Project: Solr
>  Issue Type: Improvement
>  Components: spatial
>Affects Versions: 6.6
>Reporter: Clemens Wyss
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: SOLR-11601.patch, SOLR-11601.patch, SOLR-11601.patch
>
>
> Im switching my schemas from derprecated solr.LatLonType to 
> solr.LatLonPointSpatialField.
> Now my sortquery (which used to work with solr.LatLonType):
> *sort=geodist(b4_location__geo_si,47.36667,8.55) asc*
> raises the error
> {color:red}*"sort param could not be parsed as a query, and is not a field 
> that exists in the index: geodist(b4_location__geo_si,47.36667,8.55)"*{color}
> Invoking sort using syntax 
> {color:#14892c}sfield=b4_location__geo_si=47.36667,8.55=geodist() asc
> works as expected though...{color}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12393) ExpandComponent only calculates the score of expanded docs when sorted by score

2019-08-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913780#comment-16913780
 ] 

David Smiley commented on SOLR-12393:
-

I reviewed the code but admittedly not the test.  I don't think we should 
examine {{groupExpandCollector.scoreMode()}} to determine if the scores are 
already present.  Instead, just look to see whether the scores are already 
there -- they will be NaN if not.  This addresses the two comment lines you 
added, which can then be removed.
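
A minimal sketch of that NaN check (names here are illustrative, not the 
patch's actual code):

{code:java}
import java.util.function.Supplier;

class ExpandScoreSketch {
  // Scores that were never computed (docs sorted by something other than
  // relevance) are left as NaN, so NaN means "compute the score now".
  static float ensureScore(float storedScore, Supplier<Float> recompute) {
    return Float.isNaN(storedScore) ? recompute.get() : storedScore;
  }
}
{code}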

> ExpandComponent only calculates the score of expanded docs when sorted by 
> score
> ---
>
> Key: SOLR-12393
> URL: https://issues.apache.org/jira/browse/SOLR-12393
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Reporter: David Smiley
>Priority: Major
> Attachments: SOLR-12393.patch, SOLR-12393.patch, SOLR-12393.patch
>
>
> If you use the ExpandComponent to show expanded docs and if you want the 
> score back (specified in "fl"), it will be NaN if the expanded docs are 
> sorted by anything other than the default score descending.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12393) ExpandComponent only calculates the score of expanded docs when sorted by score

2019-08-21 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912815#comment-16912815
 ] 

David Smiley commented on SOLR-12393:
-

I'll review this, Munendra; thanks for looking.  What's different in the latest 
patch, which is much smaller?

> ExpandComponent only calculates the score of expanded docs when sorted by 
> score
> ---
>
> Key: SOLR-12393
> URL: https://issues.apache.org/jira/browse/SOLR-12393
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Reporter: David Smiley
>Priority: Major
> Attachments: SOLR-12393.patch, SOLR-12393.patch, SOLR-12393.patch
>
>
> If you use the ExpandComponent to show expanded docs and if you want the 
> score back (specified in "fl"), it will be NaN if the expanded docs are 
> sorted by anything other than the default score descending.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8755) QuadPrefixTree robustness: can throw exception while indexing a point at high precision

2019-08-20 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8755.
--
Fix Version/s: 8.3
 Assignee: David Smiley
   Resolution: Fixed

_Thanks Chongchen Chen!_

I added several items to CHANGES.txt:
* upgrade notes: because users need to know they should add a version to the 
SpatialPrefixTreeFactory args now.  I updated the javadocs for this as well.
* optimization: for each level of the grid we removed a division (a little 
thing I did) and removed creating a Rectangle and various code using it.  It's 
now much less computation for point data.  I did some benchmarking in the 
benchmark module, tweaking spatial.alg, and it appears there's a 7% 
improvement, but YMMV of course.
* bug fix: as described above

> QuadPrefixTree robustness: can throw exception while indexing a point at high 
> precision
> ---
>
> Key: LUCENE-8755
> URL: https://issues.apache.org/jira/browse/LUCENE-8755
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial-extras
>Reporter: senthil nathan
>Assignee: David Smiley
>Priority: Critical
> Fix For: 8.3
>
> Attachments: LUCENE-8755.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When trying to index the document below with Apache Solr 7.5.0 I am getting 
> java.lang.IndexOutOfBoundsException; this data causes the whole full 
> import to fail. I have also included my schema for your reference. 
>  
> Data:
> [
> { "street_description":"SAMPLE_TEXT", "pao_start_number":6, 
> "x_coordinate":244502.06, "sao_text":"FIRST FLOOR", "logical_status":"1", 
> "street_record_type":1, "id":"AA60L12-ENG", 
> "street_description_str":"SAMPLE_TEXT", "lpi_logical_status":"1", 
> "administrative_area":"SAMPLE_TEXT & HOVE", "uprn":"8899889", 
> "town_name":"TEST TOWN", "street_description_full":"60 DEMO ", 
> "y_coordinate":639062.07, "postcode_locator":"AB1 1BB", "location":"244502.06 
> 639062.07" }
> ]
>  
> Configuration in managed-schema.xml
> (the schema snippet was garbled in archiving; the surviving attributes are a 
> spatial fieldType with geo="false" maxDistErr="0.09" 
> worldBounds="ENVELOPE(0,70,130,0)" distErrPct="0.15", followed by a series of 
> field definitions with indexed/stored flags)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-8755) QuadPrefixTree robustness: can throw exception while indexing a point at high precision

2019-08-07 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8755:
-
Comment: was deleted

(was: As I write this, there is strangely no automated linking here to the PR, 
so I will specify it: https://github.com/apache/lucene-solr/pull/824)

> QuadPrefixTree robustness: can throw exception while indexing a point at high 
> precision
> ---
>
> Key: LUCENE-8755
> URL: https://issues.apache.org/jira/browse/LUCENE-8755
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial-extras
>Reporter: senthil nathan
>Priority: Critical
> Attachments: LUCENE-8755.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When trying to index the document below with Apache Solr 7.5.0 I am getting 
> java.lang.IndexOutOfBoundsException; this data causes the whole full 
> import to fail. I have also included my schema for your reference. 
>  
> Data:
> [
> { "street_description":"SAMPLE_TEXT", "pao_start_number":6, 
> "x_coordinate":244502.06, "sao_text":"FIRST FLOOR", "logical_status":"1", 
> "street_record_type":1, "id":"AA60L12-ENG", 
> "street_description_str":"SAMPLE_TEXT", "lpi_logical_status":"1", 
> "administrative_area":"SAMPLE_TEXT & HOVE", "uprn":"8899889", 
> "town_name":"TEST TOWN", "street_description_full":"60 DEMO ", 
> "y_coordinate":639062.07, "postcode_locator":"AB1 1BB", "location":"244502.06 
> 639062.07" }
> ]
>  
> Configuration in managed-schema.xml
> (the schema snippet was garbled in archiving; the surviving attributes are a 
> spatial fieldType with geo="false" maxDistErr="0.09" 
> worldBounds="ENVELOPE(0,70,130,0)" distErrPct="0.15", followed by a series of 
> field definitions with indexed/stored flags)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8755) QuadPrefixTree robustness: can throw exception while indexing a point at high precision

2019-08-07 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902540#comment-16902540
 ] 

David Smiley commented on LUCENE-8755:
--

As I write this, there is strangely no automated linking here to the PR, so I 
will specify it: https://github.com/apache/lucene-solr/pull/824

> QuadPrefixTree robustness: can throw exception while indexing a point at high 
> precision
> ---
>
> Key: LUCENE-8755
> URL: https://issues.apache.org/jira/browse/LUCENE-8755
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial-extras
>Reporter: senthil nathan
>Priority: Critical
> Attachments: LUCENE-8755.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When trying to index the document below with Apache Solr 7.5.0 I am getting 
> java.lang.IndexOutOfBoundsException; this data causes the whole full 
> import to fail. I have also included my schema for your reference. 
>  
> Data:
> [
> { "street_description":"SAMPLE_TEXT", "pao_start_number":6, 
> "x_coordinate":244502.06, "sao_text":"FIRST FLOOR", "logical_status":"1", 
> "street_record_type":1, "id":"AA60L12-ENG", 
> "street_description_str":"SAMPLE_TEXT", "lpi_logical_status":"1", 
> "administrative_area":"SAMPLE_TEXT & HOVE", "uprn":"8899889", 
> "town_name":"TEST TOWN", "street_description_full":"60 DEMO ", 
> "y_coordinate":639062.07, "postcode_locator":"AB1 1BB", "location":"244502.06 
> 639062.07" }
> ]
>  
> Configuration in managed-schema.xml
> (the schema snippet was garbled in archiving; the surviving attributes are a 
> spatial fieldType with geo="false" maxDistErr="0.09" 
> worldBounds="ENVELOPE(0,70,130,0)" distErrPct="0.15", followed by a series of 
> field definitions with indexed/stored flags)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-08-06 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8883.
--
   Resolution: Fixed
Fix Version/s: 8.3

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" section, so no surprise here: the New Features 
> category has issues that ought to be listed as improvements.  I think the 
> order varies as well.  I propose that on new releases, the initial state of 
> the next release in CHANGES.txt have these sections.  They can easily be 
> removed at the upcoming release if there are no such entries, or they could 
> stay empty.  It seems addVersion.py is the code that sets this up and it 
> could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11866) Support efficient subset matching in query elevation rules

2019-08-05 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899974#comment-16899974
 ] 

David Smiley commented on SOLR-11866:
-

Thanks Munendra... I pushed the commit just now; I had committed locally 
without pushing it like 30min ago.  This is one aspect of git that's annoying.

> Support efficient subset matching in query elevation rules
> --
>
> Key: SOLR-11866
> URL: https://issues.apache.org/jira/browse/SOLR-11866
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Leverages the SOLR-11865 refactoring by introducing a 
> SubsetMatchElevationProvider in QueryElevationComponent. This provider calls 
> a new util class TrieSubsetMatcher to efficiently match all query elevation 
> rules whose subset is contained in the current query's list of terms.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-08-05 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899969#comment-16899969
 ] 

David Smiley commented on SOLR-13593:
-

Both name & class seems error prone; can't we simply disallow this (throw 
error) from the start?

> Allow to specify analyzer components by their SPI names in schema definition
> 
>
> Key: SOLR-13593
> URL: https://issues.apache.org/jira/browse/SOLR-13593
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Tomoko Uchida
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now each analysis factory has an explicitly documented SPI name which is 
> stored in the static "NAME" field (LUCENE-8778).
>  Solr uses the factories' simple class names in schema definitions (like 
> class="solr.WhitespaceTokenizerFactory"), but we should be able to also use 
> more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldType name="text_ws" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> would be
> {code:xml}
> <fieldType name="text_ws" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="lowercase"/>
>   </analyzer>
> </fieldType>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11866) Support efficient subset matching in query elevation rules

2019-08-05 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-11866:

   Resolution: Fixed
Fix Version/s: 8.3
   Status: Resolved  (was: Patch Available)

Thanks Bruno!

My commit to master stupidly included a file for some WIP, and so I reverted 
that.
You might notice a slight change of a line or two in QEC where I did an inline 
or similarly trivial thing.  On the 8x backport I had to be more specific with 
the parameterized types.

> Support efficient subset matching in query elevation rules
> --
>
> Key: SOLR-11866
> URL: https://issues.apache.org/jira/browse/SOLR-11866
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Leverages the SOLR-11865 refactoring by introducing a 
> SubsetMatchElevationProvider in QueryElevationComponent. This provider calls 
> a new util class TrieSubsetMatcher to efficiently match all query elevation 
> rules whose subset is contained in the current query's list of terms.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5381) Split Clusterstate and scale

2019-08-02 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899026#comment-16899026
 ] 

David Smiley commented on SOLR-5381:


Shall we mark this closed now?  I see this as done, save for one item.  And I 
know a ton of work has been done on improving Overseer efficiency since 2013.  
CC [~noble.paul]

> Split Clusterstate and scale 
> -
>
> Key: SOLR-5381
> URL: https://issues.apache.org/jira/browse/SOLR-5381
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> clusterstate.json is a single point of contention for all components in 
> SolrCloud. It would be hard to scale SolrCloud beyond a few thousand nodes 
> because there are too many updates and too many nodes that need to be notified 
> of the changes. As the number of nodes goes up, the size of clusterstate.json 
> keeps going up and it will soon exceed the limit imposed by ZK.
> The first step is to store the shards information in separate nodes and each 
> node can just listen to the shard node it belongs to. We may also need to 
> split each collection into its own node and the clusterstate.json just 
> holding the names of the collections .
> This is an umbrella issue



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8906) Lucene50PostingsReader.postings() casts BlockTermState param to private IntBlockTermState

2019-08-01 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-8906.
--
   Resolution: Fixed
 Assignee: David Smiley
Fix Version/s: 8.3

Merged PR.  Thanks Bruno!

> Lucene50PostingsReader.postings() casts BlockTermState param to private 
> IntBlockTermState
> -
>
> Key: LUCENE-8906
> URL: https://issues.apache.org/jira/browse/LUCENE-8906
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Bruno Roustant
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Lucene50PostingsReader is the public API that offers the postings() method to 
> read the postings. Any PostingsFormat can use it (as well as 
> Lucene50PostingsWriter) to read/write postings.
> But the postings() method asks for a (public) BlockTermState param which is 
> internally cast to the private IntBlockTermState. This BlockTermState is 
> provided by Lucene50PostingsReader.newTermState().
> public PostingsEnum postings(FieldInfo fieldInfo, BlockTermState termState, 
> PostingsEnum reuse, int flags)
> This actually makes it impossible for a custom PostingsFormat that customizes 
> the block file structure to use this postings() method by providing its own 
> (Int)BlockTermState, because it cannot access the FP fields of the 
> IntBlockTermState returned by PostingsReaderBase.newTermState().
> Proposed change:
>  * Either make IntBlockTermState public, as well as its fields.
>  * Or replace it by an interface in the postings() method. In this case the 
> IntBlockTermState fields currently accessed directly would be replaced by 
> getter/setter.
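
A hypothetical sketch of the second option (illustrative names only; this 
interface does not exist in Lucene):

{code:java}
// Hypothetical: the file-pointer (FP) fields currently accessed directly on
// the private IntBlockTermState would instead go through an interface that a
// custom (Int)BlockTermState could implement.
public interface BlockTermStateAccess {
  long getDocStartFP();
  void setDocStartFP(long fp);

  long getPosStartFP();
  void setPosStartFP(long fp);

  long getPayStartFP();
  void setPayStartFP(long fp);
}
{code}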



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8941) Build wildcard matches more lazily

2019-08-01 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898394#comment-16898394
 ] 

David Smiley commented on LUCENE-8941:
--

It'd be clearer if TermsEnumDisjunctionMatchesIterator had a line of 
documentation somewhere that more explicitly points out that it is merely 
lazily wrapping the MatchesIterator because it may not be needed.  Also, maybe 
pulling out the initialization code into a method named as such, like init(), 
would overall be clearer.

It's a shame FilterMatchesIterator cannot be used here, given all but one 
method simply delegates.  We can't because the input is declared to be final.  
Do you think it's worth loosening that so that we can use it?

I confess I don't see how the test here validates the laziness.  I anticipated 
you were going to create a boolean AND query including the MTQ and some other 
simple term query that sometimes doesn't match.
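
For reference, the lazy-wrapping pattern under discussion, in a generic 
self-contained sketch (illustrative names, not the patch's actual code):

{code:java}
import java.io.IOException;

// The expensive delegate (e.g. a disjunction over all matching terms) is only
// built when first needed, so a caller that merely asks "did anything match?"
// never pays for it.
final class LazySketch<T> {
  interface IOSupplier<V> { V get() throws IOException; }

  private final IOSupplier<T> init;
  private T delegate;

  LazySketch(IOSupplier<T> init) {
    this.init = init;
  }

  T delegate() throws IOException {
    if (delegate == null) {
      delegate = init.get(); // built on first use only
    }
    return delegate;
  }
}
{code}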

> Build wildcard matches more lazily
> --
>
> Key: LUCENE-8941
> URL: https://issues.apache.org/jira/browse/LUCENE-8941
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8941.patch
>
>
> When retrieving a Matches object from a multi-term query, such as an 
> AutomatonQuery or TermInSetQuery, we currently find all matching term 
> iterators up-front, to return a disjunction over all of them.  This can be 
> inefficient if we're only interested in finding out if anything matched, and 
> are iterating over a different field to retrieve offsets.
> We can improve this by returning immediately when the first matching term is 
> found, and only collecting other matching terms when we start iterating.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13523) Atomic Update results in NullPointerException

2019-07-31 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897607#comment-16897607
 ] 

David Smiley commented on SOLR-13523:
-

I think that warrants its own discussion on dev@lucene.apache.org, with 
modifications to the release process so that this case is systematically 
handled instead of by happenstance.

> Atomic Update results in NullPointerException
> -
>
> Key: SOLR-13523
> URL: https://issues.apache.org/jira/browse/SOLR-13523
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 8.1
> Environment: * Operating system: Win10 v1803 build 17143.766
>  * Java version:
> java 11.0.1 2018-10-16 LTS
> Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
> Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
>  * solr-spec: 8.1.1
>  * solr-impl: 8.1.1 fcbe46c28cef11bc058779afba09521de1b19bef - ab - 
> 2019-05-22 15:20:01
>  * lucene-spec: 8.1.1
>  * lucene-impl: 8.1.1 fcbe46c28cef11bc058779afba09521de1b19bef - ab - 
> 2019-05-22 15:15:24
>Reporter: Kieran Devlin
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1.2
>
> Attachments: SOLR-13523.patch, SOLR-13523.patch, SOLR-13523.patch, 
> SOLR-13523.patch, SOLR-13523_WIP_bug_hunt.patch, XUBrk.png, Xn1RW.png, 
> reproduce.sh
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Partially update a document via an atomic update, when I do so, the web sever 
> responds with a 500 status with the stack trace:
> {code:java}
> { "responseHeader":{ "status":500, "QTime":1}, "error":{ 
> "trace":"java.lang.NullPointerException\r\n\tat 
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:301)\r\n\tat
>  
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:398)\r\n\tat
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:697)\r\n\tat
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:372)\r\n\tat
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:337)\r\n\tat
>  
> org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)\r\n\tat
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:337)\r\n\tat
>  
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:223)\r\n\tat
>  
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\r\n\tat
>  
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
>  
> org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)\r\n\tat
>  
> 

[jira] [Updated] (SOLR-13669) [CVE-2019-0193] Remote Code Execution via DataImportHandler

2019-07-31 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13669:

Description: 
The DataImportHandler, an optional but popular module to pull in data from 
databases and other sources, has a feature in which the whole DIH configuration 
can come from a request's "dataConfig" parameter. The debug mode of the DIH 
admin screen uses this to allow convenient debugging / development of a DIH 
config. Since a DIH config can contain scripts, this parameter is a security 
risk. Starting with version 8.2.0 of Solr, use of this parameter requires 
setting the Java System property "enable.dih.dataConfigParam" to true.

Mitigations:
* Upgrade to 8.2.0 or later, which is secure by default.
* or, edit solrconfig.xml to configure all DataImportHandler usages with an 
"invariants" section listing the "dataConfig" parameter set to an empty string. 
* Ensure your network settings are configured so that only trusted traffic 
communicates with Solr, especially to the DIH request handler.  This is a best 
practice for all of Solr.

Credits:
* Michael Stepankin

References:
* https://issues.apache.org/jira/browse/SOLR-13669
* https://cwiki.apache.org/confluence/display/solr/SolrSecurity

  was:
The DataImportHandler, an optional but popular module to pull in data from 
databases and other sources, has a feature in which the whole DIH configuration 
can come from a request's "dataConfig" parameter. The debug mode of the DIH 
admin screen uses this to allow convenient debugging / development of a DIH 
config. Since a DIH config can contain scripts, this parameter is a security 
risk. Starting with version 8.2.0 of Solr, use of this parameter requires 
setting the Java System property "enable.dih.dataConfigParam" to true.

Mitigations:
* Upgrade to 8.2.0 or later, which is secure by default.
* or, edit solrconfig.xml to configure all DataImportHandler usages with an 
"invariants" section listing the "dataConfig" parameter set to an empty string. 
* Ensure your network settings are configured so that only trusted traffic 
communicates with Solr, especially to the DIH request handler.  This is a best 
practice for all of Solr.

Credits:
* Michael Stepankin

References:
* https://issues.apache.org/jira/browse/SOLR-13158
* https://cwiki.apache.org/confluence/display/solr/SolrSecurity


> [CVE-2019-0193] Remote Code Execution via DataImportHandler
> ---
>
> Key: SOLR-13669
> URL: https://issues.apache.org/jira/browse/SOLR-13669
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Reporter: Michael Stepankin
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1.2
>
>
> The DataImportHandler, an optional but popular module to pull in data from 
> databases and other sources, has a feature in which the whole DIH 
> configuration can come from a request's "dataConfig" parameter. The debug 
> mode of the DIH admin screen uses this to allow convenient debugging / 
> development of a DIH config. Since a DIH config can contain scripts, this 
> parameter is a security risk. Starting with version 8.2.0 of Solr, use of 
> this parameter requires setting the Java System property 
> "enable.dih.dataConfigParam" to true.
> Mitigations:
> * Upgrade to 8.2.0 or later, which is secure by default.
> * or, edit solrconfig.xml to configure all DataImportHandler usages with an 
> "invariants" section listing the "dataConfig" parameter set to an empty 
> string (a config sketch follows below).  
> * Ensure your network settings are configured so that only trusted traffic 
> communicates with Solr, especially to the DIH request handler.  This is a 
> best practice for all of Solr.
> Credits:
> * Michael Stepankin
> References:
> * https://issues.apache.org/jira/browse/SOLR-13669
> * https://cwiki.apache.org/confluence/display/solr/SolrSecurity
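
For the solrconfig.xml mitigation above, a hedged sketch of what the invariants 
section could look like (the handler name and config file below follow the stock 
DIH example registration; verify against your own config):

{code:xml}
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
  <!-- Pin dataConfig to an empty string so a request cannot supply its own -->
  <lst name="invariants">
    <str name="dataConfig"></str>
  </lst>
</requestHandler>
{code}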



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13669) [CVE-2019-0193] Remote Code Execution via DataImportHandler

2019-07-31 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13669:

Security: Public  (was: Private (Security Issue))

> [CVE-2019-0193] Remote Code Execution via DataImportHandler
> ---
>
> Key: SOLR-13669
> URL: https://issues.apache.org/jira/browse/SOLR-13669
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Reporter: Michael Stepankin
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1.2
>
>
> The DataImportHandler, an optional but popular module to pull in data from 
> databases and other sources, has a feature in which the whole DIH 
> configuration can come from a request's "dataConfig" parameter. The debug 
> mode of the DIH admin screen uses this to allow convenient debugging / 
> development of a DIH config. Since a DIH config can contain scripts, this 
> parameter is a security risk. Starting with version 8.2.0 of Solr, use of 
> this parameter requires setting the Java System property 
> "enable.dih.dataConfigParam" to true.
> Mitigations:
> * Upgrade to 8.2.0 or later, which is secure by default.
> * or, edit solrconfig.xml to configure all DataImportHandler usages with an 
> "invariants" section listing the "dataConfig" parameter set to an empty 
> string.  
> * Ensure your network settings are configured so that only trusted traffic 
> communicates with Solr, especially to the DIH request handler.  This is a 
> best practice for all of Solr.
> Credits:
> * Michael Stepankin
> References:
> * https://issues.apache.org/jira/browse/SOLR-13158
> * https://cwiki.apache.org/confluence/display/solr/SolrSecurity



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13658) Discuss adding the new "var" construct to the forbidden API list.

2019-07-30 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896578#comment-16896578
 ] 

David Smiley commented on SOLR-13658:
-

+1 to prevent use of "var" until Solr 8 is finished.  The dubious benefit is 
not worth the backport pain; I've already had to fix this in one issue.  The 
question can be brought up again after Solr 8, but I'm hesitant.

> Discuss adding the new "var" construct to the forbidden API list.
> -
>
> Key: SOLR-13658
> URL: https://issues.apache.org/jira/browse/SOLR-13658
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Personally, I'm strongly against allowing the "var" construct in Lucene/Solr 
> code. I think it's a wonderful opportunity to introduce bugs that won't be 
> found until runtime as well as making maintenance significantly harder. I 
> don't even think for a project like Solr it would save any time overall...
> So let's discuss this ahead of time and see if we can reach a consensus. I'll 
> start the discussion off:
> My baseline argument is that for a large complex project, especially ones 
> with many different people coding, I want the compiler to give me all the 
> help possible. And the "var" construct takes away some of that help.
> I’ve seen this argument go around at least 4 times in my career. The argument 
> that “it takes longer to write if you have to type all this stuff” is bogus. 
> Last I knew, 80% of the time spent is in maintaining/reading it. So the 
> argument “I can write faster” means I can save some fraction of the 20% of 
> the time writing the original code but spend many times that figuring out 
> what the code is actually doing the other 80% of the time.
> The IDE makes _writing_ this slightly faster, admittedly.
> {code:java}
> Whatever what = new Whatever();
> var kidding = what.getComplex();
> var blivet = kidding.get("stuff");
> {code}
> But once that’s done, if I’m reading the code again I don't have any clue what
> {code:java}
> kidding or blivet
> {code}
> are. Here's the signature for getComplex:
> {code:java}
> Map<String, Map<Integer, String>> getComplex()
> {code}
> I have to go over to the definition (which I admit is easier than it used to 
> be in the bad old days, but still) to find out.
> HERE'S THE PART I REALLY OBJECT TO!
> The above I could probably live with, maybe we could get the IntelliJ 
> developers and see if they can make hover show the inference. What I will 
> kick and scream about is introducing bugs that are not found until runtime. 
> Even this obvious stupidity fails with a ClassCastException:
> {code:java}
> var corny = new TreeMap<String, String>();
> corny.put("one", "two");
> corny.get(1);
> {code}
> But it's much worse when using classes from somewhere else. For instance, 
> change the underlying class in the first example to return
> {code:java}
> Map<String, TreeMap<String, String>>{code}
> . 
>  This code that used to work now throws an error, _but it compiles_.
> {code:java}
> var kidding = what.getComplex();
> var blivet = kidding.get("stuff");
> var blah = kidding.get("stuff").get(1); //  generates ClassCastException: 
> class java.lang.String cannot be cast to class java.lang.Integer
> {code}
> So in order to save some time writing (that I claim will be lost multiple 
> times over when maintaining the code) we'll introduce run-time errors that 
> will take a bunch _more_ time to figure out, and won’t be found during unit 
> tests unless and until we have complete code coverage.
> If there's a way to ensure that this kind of thing can't get into the code 
> and we implement that, I could be persuaded, but let's make that an explicit 
> requirement (and find a suitable task for the build system, precommit or 
> whatever).
> The floor is open...



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose

2019-07-29 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895597#comment-16895597
 ] 

David Smiley commented on LUCENE-8776:
--

I think your response/question is about tokenstream contracts with offsets and 
not about highlighting, and that's well established territory already discussed 
in this thread.

> Start offset going backwards has a legitimate purpose
> -
>
> Key: LUCENE-8776
> URL: https://issues.apache.org/jira/browse/LUCENE-8776
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.6
>Reporter: Ram Venkat
>Priority: Major
>
> Here is the use case where startOffset can go backwards:
> Say there is a line "Organic light-emitting-diode glows", and I want to run 
> span queries and highlight them properly. 
> During index time, light-emitting-diode is split into three words, which 
> allows me to search for 'light', 'emitting' and 'diode' individually. The 
> three words occupy adjacent positions in the index, as 'light' adjacent to 
> 'emitting' and 'light' at a distance of two words from 'diode' need to match 
> this word. So, the order of words after splitting are: Organic, light, 
> emitting, diode, glows. 
> But, I also want to search for 'organic' being adjacent to 
> 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. 
> The way I solved this was to also generate 'light-emitting-diode' at two 
> positions: (a) In the same position as 'light' and (b) in the same position 
> as 'glows', like below:
> ||organic||light||emitting||diode||glows||
> | |light-emitting-diode| |light-emitting-diode| |
> |0|1|2|3|4|
> The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets 
> are obviously the same. This works beautifully in Lucene 5.x in both 
> searching and highlighting with span queries. 
> But when I try this in Lucene 7.6, it hits the condition "Offsets must not go 
> backwards" at DefaultIndexingChain:818. This IllegalArgumentException is 
> being thrown without any comments on why this check is needed. As I explained 
> above, startOffset going backwards is perfectly valid, to deal with word 
> splitting and span operations on these specialized use cases. On the other 
> hand, it is not clear what value is added by this check and which highlighter 
> code is affected by offsets going backwards. This same check is done at 
> BaseTokenStreamTestCase:245. 
> I see others talk about how this check found bugs in WordDelimiter etc. but 
> it also prevents legitimate use cases. Can this check be removed?  
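
To make the described layout concrete, here is a minimal sketch of a 
TokenStream that emits it (offsets computed against the exact sample sentence; 
an illustration of the rejected token layout, not production analyzer code):

{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Emits the token layout from the table above for the sentence
// "Organic light-emitting-diode glows".  The combined token appears twice
// with posIncr=0, both times with offsets [8,28) -- so its second emission
// has a startOffset (8) lower than the preceding token's (23), which is
// exactly what the DefaultIndexingChain check rejects in 7.x.
final class LedTokenStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  // {term, posIncr, startOffset, endOffset}
  private final Object[][] tokens = {
      {"organic", 1, 0, 7},
      {"light", 1, 8, 13},
      {"light-emitting-diode", 0, 8, 28},  // same position as "light"
      {"emitting", 1, 14, 22},
      {"diode", 1, 23, 28},
      {"light-emitting-diode", 0, 8, 28},  // same position as "diode"
      {"glows", 1, 29, 34},
  };
  private int i = 0;

  @Override
  public boolean incrementToken() throws IOException {
    if (i >= tokens.length) {
      return false;
    }
    clearAttributes();
    termAtt.setEmpty().append((String) tokens[i][0]);
    posIncAtt.setPositionIncrement((int) tokens[i][1]);
    offsetAtt.setOffset((int) tokens[i][2], (int) tokens[i][3]);
    i++;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    i = 0;
  }
}
{code}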



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-07-28 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894821#comment-16894821
 ] 

David Smiley commented on SOLR-12638:
-

RE schema:  Yes that's right Adi.  No special additional attributes need to be 
specified on nest_path; internally NestPathField uses appropriate settings so 
that you needn't worry about it.

When providing documents to Solr when you have a nest_path field, don't use 
\_childDocuments_ any longer.  I'm not sure if using that syntax has issues 
with partial updates; maybe.  Instead, use a fake field name that represents 
the relationship meaningfully.  Some examples are in the ref guide on that very 
page you linked (search "content"), and a small sketch follows below.
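
For instance, a hedged sketch of an input document using a meaningful 
relationship field instead of \_childDocuments_ (field names here are purely 
illustrative):

{code}
{ "id": "book_1",
  "title_s": "A Book",
  "reviews": [
    { "id": "review_1", "stars_i": 5, "comment_t": "Great read" },
    { "id": "review_2", "stars_i": 3, "comment_t": "Just okay" }
  ]
}
{code}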

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose

2019-07-26 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893852#comment-16893852
 ] 

David Smiley commented on LUCENE-8776:
--

[~venkat11] I am curious if you've tried the latest version of the 
UnifiedHighlighter that can be configured to use the "WeightMatches" API.  This 
is toggled via the 
{{org.apache.lucene.search.uhighlight.UnifiedHighlighter.HighlightFlag#WEIGHT_MATCHES}}
 flag.  This isn't the default at the Lucene level but ought to be changed to 
be.  You may notice some highlighting differences, like for phrases and spans 
in which phrases as a whole get one pair of open/close tags instead of the 
constituent words.  There's a chance that this mode ameliorates your 
highlighting woes with offsets.
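
In case it helps, a sketch of one way to opt in, via the protected getFlags() 
hook (a sketch against the 8.x API as I recall it, not a drop-in recipe):

{code:java}
import java.util.EnumSet;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter.HighlightFlag;

// Returns a highlighter with WEIGHT_MATCHES added to the default flags.
static UnifiedHighlighter withWeightMatches(IndexSearcher searcher, Analyzer analyzer) {
  return new UnifiedHighlighter(searcher, analyzer) {
    @Override
    protected Set<HighlightFlag> getFlags(String field) {
      Set<HighlightFlag> flags = EnumSet.copyOf(super.getFlags(field));
      flags.add(HighlightFlag.WEIGHT_MATCHES);
      return flags;
    }
  };
}
{code}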

> Start offset going backwards has a legitimate purpose
> -
>
> Key: LUCENE-8776
> URL: https://issues.apache.org/jira/browse/LUCENE-8776
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.6
>Reporter: Ram Venkat
>Priority: Major
>
> Here is the use case where startOffset can go backwards:
> Say there is a line "Organic light-emitting-diode glows", and I want to run 
> span queries and highlight them properly. 
> During index time, light-emitting-diode is split into three words, which 
> allows me to search for 'light', 'emitting' and 'diode' individually. The 
> three words occupy adjacent positions in the index, as 'light' adjacent to 
> 'emitting' and 'light' at a distance of two words from 'diode' need to match 
> this word. So, the order of words after splitting are: Organic, light, 
> emitting, diode, glows. 
> But, I also want to search for 'organic' being adjacent to 
> 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. 
> The way I solved this was to also generate 'light-emitting-diode' at two 
> positions: (a) In the same position as 'light' and (b) in the same position 
> as 'glows', like below:
> ||organic||light||emitting||diode||glows||
> | |light-emitting-diode| |light-emitting-diode| |
> |0|1|2|3|4|
> The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets 
> are obviously the same. This works beautifully in Lucene 5.x in both 
> searching and highlighting with span queries. 
> But when I try this in Lucene 7.6, it hits the condition "Offsets must not go 
> backwards" at DefaultIndexingChain:818. This IllegalArgumentException is 
> being thrown without any comments on why this check is needed. As I explained 
> above, startOffset going backwards is perfectly valid, to deal with word 
> splitting and span operations on these specialized use cases. On the other 
> hand, it is not clear what value is added by this check and which highlighter 
> code is affected by offsets going backwards. This same check is done at 
> BaseTokenStreamTestCase:245. 
> I see others talk about how this check found bugs in WordDelimiter etc. but 
> it also prevents legitimate use cases. Can this check be removed?  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-07-26 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893833#comment-16893833
 ] 

David Smiley commented on LUCENE-8369:
--

IMO to most users anything beyond points, rectangles, and point-radius is 
exotic/specialized.  Many search apps don't even have any spatial at all for 
that matter.

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-07-25 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892757#comment-16892757
 ] 

David Smiley commented on SOLR-12638:
-

CC [~moshebla]
I found the cause of this bug or maybe non-bug depending on how you want to 
look at it.   For partial updates to nested documents to work, it is _not only_ 
necessary to have the \_nest_path_ field in the schema, but the \_root_ field 
must be stored=true.   If either are false then any such partial updates will 
wipe out child documents.  So the cause here is actually very similar to 
SOLR-13523.  
org.apache.solr.update.processor.NestedAtomicUpdateTest#testBlockAtomicStack 
exercises partial updates and it passes... but it passes because this test 
suite uses the test schema-nest.xml with root stored=true.

When I was working on SOLR-13523 I started to work on wiping out all traces of 
stored=true in all schemas but it got to be a chunk of work that was a bit 
distracting from a simple bug fix so I had tabled it.  But that would have 
surfaced this problem then had I continued.  Perhaps _for now_, root with 
stored=true this is simply a requirement for partial updates to work in the 
presence of nested docs.  We don't document that and it's a problem.  I also 
think this requirement stinks... I'd much rather this feature work without 
having to toggle the stored attribute because I think it's a source of errors 
(e.g. this issue), something to document, something to want to test in 
different ways, and ultimately not truly necessary if we code this better.
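
For anyone hitting this in the meantime, a hedged schema.xml sketch of the 
combination that currently works (shaped after the test schema-nest.xml; adjust 
to your own schema):

{code:xml}
<!-- Both fields must be present; note stored="true" on _root_ -->
<field name="_root_" type="string" indexed="true" stored="true" docValues="false"/>
<field name="_nest_path_" type="_nest_path_"/>
<fieldType name="_nest_path_" class="solr.NestPathField"/>
{code}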

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-07-25 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892745#comment-16892745
 ] 

David Smiley commented on LUCENE-8369:
--

Lots of awesome functionality _commonly_ needed in search is in our modules -- 
like highlighting, autocomplete, and spellcheck, to name a few.  Why should 
spatial be an exception?

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-24 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8883:
-
Attachment: LUCENE-8883.patch
Status: Open  (was: Open)

In the latest patch I added an "Improvements" category (Lucene & Solr) and I 
renamed that is_bugfix variable.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements", so no surprise here, the New Features category 
> has issues that ought to be listed as improvements.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-07-24 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892157#comment-16892157
 ] 

David Smiley commented on SOLR-12638:
-

Partial updates of any part of a nested doc (i.e. even to a parent) are not 
supported unless the schema has \_nest_path – something I see we need to 
document in updating-parts-of-documents.adoc and in 
indexing-nested-documents.adoc.  So the apparent bug I discovered would 
definitely not result in a fix for your circumstance, sorry.  I _can_ imagine 
how a "root-only schema" might support partial updates but that's hypothetical.

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-07-23 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891514#comment-16891514
 ] 

David Smiley commented on SOLR-12638:
-

Ronen, your explanation wasn't clear to me since using "parentFilter" param in 
the child doc is not supported in a nested schema (schemas with a nest path 
field), and Solr will complain about that.  Furthermore the link to Yonik's old 
tutorial pre-dated this feature and would need to be updated to use a 
relationship name like "reviews" instead of specifically "_childDocuments_" to 
use the new nested schema stuff, and you didn't mention doing that.  
Nonetheless I tweaked the curl statement to add the data in this way and was 
able to see that the atomic update did *not* leave the children in place, which 
is troubling and not what I expected.  I'll look further to see what's going on.

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-16 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886112#comment-16886112
 ] 

David Smiley commented on LUCENE-8883:
--

Thanks for the review Christine; I'll rename that variable.
Adrien I can add Optimizations as well; I'm torn either way and accept your 
preference.
Then I can commit this I think.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements", so no surprise here, the New Features category 
> has issues that ought to be listed as improvements.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-13631) Java 11 date parsing causing NPEs

2019-07-14 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed SOLR-13631.
---

> Java 11 date parsing causing NPEs
> -
>
> Key: SOLR-13631
> URL: https://issues.apache.org/jira/browse/SOLR-13631
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> Across all my machines with Fedora 30, I encounter 20+ test failures on 
> master with the following NPEs. Same when starting Solr (it doesn't start up).
> {code}
>[junit4]   2> Caused by: java.time.format.DateTimeParseException: Text 
> '2019-07-14T07:09:13.341Z' could not be parsed: null
>[junit4]   2>  at 
> java.base/java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:2017)
>[junit4]   2>  at 
> java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1920)
>[junit4]   2>  at 
> org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.parseInstant(ParseDateFieldUpdateProcessorFactory.java:230)
>[junit4]   2>  at 
> org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.validateFormatter(ParseDateFieldUpdateProcessorFactory.java:214)
>[junit4]   2>  ... 44 more
> {code}
> Here are my environment details:
> OS: Fedora 30
> Java version:
> {code}
> [ishan@chromebox core] $ java -version
> openjdk version "11.0.3" 2019-04-16
> OpenJDK Runtime Environment 18.9 (build 11.0.3+7)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.3+7, mixed mode, sharing)
> {code}
> Kernel: 5.1.16-300.fc30.x86_64
> JDK was installed via "sudo dnf install java-11-openjdk-devel".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13631) Java 11 date parsing causing NPEs

2019-07-14 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13631.
-
Resolution: Duplicate

I dug into this; see my first comment on SOLR-13606.

> Java 11 date parsing causing NPEs
> -
>
> Key: SOLR-13631
> URL: https://issues.apache.org/jira/browse/SOLR-13631
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> Across all my machines with Fedora 30, I encounter 20+ test failures on 
> master with the following NPEs. Same when starting Solr (it doesn't start up).
> {code}
>[junit4]   2> Caused by: java.time.format.DateTimeParseException: Text 
> '2019-07-14T07:09:13.341Z' could not be parsed: null
>[junit4]   2>  at 
> java.base/java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:2017)
>[junit4]   2>  at 
> java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1920)
>[junit4]   2>  at 
> org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.parseInstant(ParseDateFieldUpdateProcessorFactory.java:230)
>[junit4]   2>  at 
> org.apache.solr.update.processor.ParseDateFieldUpdateProcessorFactory.validateFormatter(ParseDateFieldUpdateProcessorFactory.java:214)
>[junit4]   2>  ... 44 more
> {code}
> Here are my environment details:
> OS: Fedora 30
> Java version:
> {code}
> [ishan@chromebox core] $ java -version
> openjdk version "11.0.3" 2019-04-16
> OpenJDK Runtime Environment 18.9 (build 11.0.3+7)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.3+7, mixed mode, sharing)
> {code}
> Kernel: 5.1.16-300.fc30.x86_64
> JDK was installed via "sudo dnf install java-11-openjdk-devel".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-07-11 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883498#comment-16883498
 ] 

David Smiley commented on SOLR-12638:
-

Yes all documents (parent/child) can be atomically updated.

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.1
>
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-10 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882617#comment-16882617
 ] 

David Smiley commented on LUCENE-8883:
--

Optimization really does seem useful & meaningful, and I admit it's not easy to 
differentiate a New Feature from an Improvement.  

I do wish we also were in the habit of prefixing both CHANGES entries and 
commit messages with the module or feature in question, so that we/users 
needn't parse it out in our heads when reading them.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements", so no surprise here, the New Features category 
> has issues that ought to be listed as improvements.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-10 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8883:
-
Attachment: LUCENE-8883.patch
Status: Open  (was: Open)

Nice unix piping :)

In this updated patch I added the "-----" underline beneath the 
headings because I reconsidered and found it pleasant on the eyes.  And I added 
the "(No changes)" below that for each because you asked.

For a bug fix release, the only heading is "Bug Fixes" (both Lucene & Solr).

Lucene's heading list:

['API Changes', 'New Features', 'Improvements', 'Bug Fixes', 'Other']

Solr's:

['Upgrade Notes', 'New Features', 'Improvements', 'Bug Fixes', 'Other Changes']

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch, LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements", so no surprise here, the New Features category 
> has issues that ought to be listed as improvements.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10377) Improve readability of the explain output for JSON format

2019-07-10 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882580#comment-16882580
 ] 

David Smiley commented on SOLR-10377:
-

I really don't think everyone would like structured=true by default; I'm "meh" 
at best.  It's bulky.  I dunno if there's a good solution here.  I wonder if 
maybe when the admin UI renders the JSON, it might insert literal carriage 
returns after the occurrences of "\n" (actual backslash then 'n').  Or maybe 
somehow we could insert a pop-up with it rendered correctly.  Shrug... that's 
the best I can think up right now.  One way to ameliorate the situation is to 
add debug.explain.structured=true to the admin UI so you could click it easily 
and not have to know it exists.  Maybe that's the best we can do that also 
doesn't feel like a hack.
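
For reference, the parameter can already be tried by hand, e.g. (collection 
name illustrative):

{code}
http://localhost:8983/solr/techproducts/select?q=name:dns&debugQuery=true&debug.explain.structured=true
{code}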

> Improve readability of the explain output for JSON format
> -
>
> Key: SOLR-10377
> URL: https://issues.apache.org/jira/browse/SOLR-10377
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
>
> Today when I ask solr for the debug query output In json with indent I get 
> this:
> {code}
> 1: " 3.545981 = sum of: 3.545981 = weight(name:dns in 0) [SchemaSimilarity], 
> result of: 3.545981 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 
> 0.5)) from: 2.0 = docFreq 24.0 = docCount 1.54 = tfNorm, computed as (freq * 
> (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 
> 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 
> 1.0 = fieldLength ",
> 2: " 7.4202514 = sum of: 7.4202514 = sum of: 2.7921112 = weight(name:domain 
> in 1) [SchemaSimilarity], result of: 2.7921112 = score(doc=1,freq=1.0 = 
> termFreq=1.0 ), product of: 2.3025851 = idf, computed as log(1 + (docCount - 
> docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 24.0 = docCount 
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
> fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 = parameter k1 
> 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 2.7921112 = 
> weight(name:name in 1) [SchemaSimilarity], result of: 2.7921112 = 
> score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 2.3025851 = idf, computed 
> as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 
> 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + 
> k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 
> = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of: 1.8360289 
> = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 1.5141277 = idf, 
> computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 5.0 = 
> docFreq 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / 
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = 
> termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = 
> fieldLength "
> {code}
> When I run the same query with "wt=ruby" I get a much nicer output
> {code}
> '2'=>'
> 7.4202514 = sum of:
>   7.4202514 = sum of:
> 2.7921112 = weight(name:domain in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 2.7921112 = weight(name:name in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of:
>   1.8360289 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 1.5141277 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   5.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 

[jira] [Commented] (SOLR-10377) Improve readability of the explain output for JSON format

2019-07-10 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882004#comment-16882004
 ] 

David Smiley commented on SOLR-10377:
-

Thanks for the reminder of that obscure parameter.  I think the main use-case 
is a user on Solr's admin screen, defaulting to JSON, and choosing the 
debugQuery checkbox to determine what happened with the score.  When the 
default wt (writer type / response format) was XML, the output was 
understandable.  With JSON it is not.

> Improve readability of the explain output for JSON format
> -
>
> Key: SOLR-10377
> URL: https://issues.apache.org/jira/browse/SOLR-10377
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
>
> Today when I ask solr for the debug query output In json with indent I get 
> this:
> {code}
> 1: " 3.545981 = sum of: 3.545981 = weight(name:dns in 0) [SchemaSimilarity], 
> result of: 3.545981 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 
> 0.5)) from: 2.0 = docFreq 24.0 = docCount 1.54 = tfNorm, computed as (freq * 
> (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 
> 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 
> 1.0 = fieldLength ",
> 2: " 7.4202514 = sum of: 7.4202514 = sum of: 2.7921112 = weight(name:domain 
> in 1) [SchemaSimilarity], result of: 2.7921112 = score(doc=1,freq=1.0 = 
> termFreq=1.0 ), product of: 2.3025851 = idf, computed as log(1 + (docCount - 
> docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 24.0 = docCount 
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
> fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 = parameter k1 
> 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 2.7921112 = 
> weight(name:name in 1) [SchemaSimilarity], result of: 2.7921112 = 
> score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 2.3025851 = idf, computed 
> as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 
> 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + 
> k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 
> = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of: 1.8360289 
> = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 1.5141277 = idf, 
> computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 5.0 = 
> docFreq 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / 
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = 
> termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = 
> fieldLength "
> {code}
> When I run the same query with "wt=ruby" I get a much nicer output
> {code}
> '2'=>'
> 7.4202514 = sum of:
>   7.4202514 = sum of:
> 2.7921112 = weight(name:domain in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 2.7921112 = weight(name:name in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of:
>   1.8360289 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 1.5141277 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   5.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> ',
>   '1'=>'
> 3.545981 = sum of:
>   3.545981 = weight(name:dns in 0) 

[jira] [Commented] (SOLR-13578) Implement a generic Resource Manager for monitoring and controlling limited resources

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881574#comment-16881574
 ] 

David Smiley commented on SOLR-13578:
-

This looks useful.  For example, I believe [~mkhludnev] was looking at 
BackupRepository impls that might want a shared thread pool, or something like 
that.  I've also worked on customer-specific stuff involving shared connection 
pools to other systems (e.g. some DB or whatever).

> Implement a generic Resource Manager for monitoring and controlling limited 
> resources
> -
>
> Key: SOLR-13578
> URL: https://issues.apache.org/jira/browse/SOLR-13578
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
>
> Many common resources such as CPUs, threads, file descriptors, heap, etc. are 
> shared between multiple SolrCore-s within a CoreContainer.
> Most of these resources can already be monitored for usage using metrics. 
> However, in most cases Solr doesn't have any control mechanism to actually do 
> something about excessive use (or extreme under-utilization) of a resource by 
> any particular SolrCore or CoreContainer. Furthermore, even when a control 
> mechanism exists it's usually available only as a static configuration 
> parameter (eg. max cache size) and changing it requires at least a core 
> reload, or restarting the JVM.
> This issue is especially important for multi-tenant applications where the 
> admin cannot assume voluntary co-operation of users and needs more 
> fine-grained tools to prevent DOS attacks, either accidental or purposeful.
> This is an umbrella issue that proposes the following:
>  * adding a generic ResourceManager component to Solr, which would run at a 
> CoreContainer level and would be able to monitor and enforce both global 
> limits and a "fair" division of resources among competing SolrCore-s.
>  * extending key existing components so that their resource consumption 
> aspects can be dynamically controlled.
>  * adding a number of management plugins that implement specific strategies 
> for managing eg. the cache sizes according to the specified "fairness" and 
> global limits.
>  * the API should allow for implementation of this control loop both in Solr 
> and as an outside mechanism.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13579) Create resource management API

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881572#comment-16881572
 ] 

David Smiley commented on SOLR-13579:
-

Oh I see this is a sub-task, and the parent task has a fine description.

> Create resource management API
> --
>
> Key: SOLR-13579
> URL: https://issues.apache.org/jira/browse/SOLR-13579
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: SOLR-13579.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13579) Create resource management API

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881571#comment-16881571
 ] 

David Smiley commented on SOLR-13579:
-

Could you please add an issue description?  The title is not so 
self-explanatory as to excuse you from writing one.  It's a little unclear to me 
what the objective is.  Node-wide cache management seems to be just one example 
or is that the whole point?  What might other purposes be?  I could use my 
imagination but I'd rather the proposal spell it out for us.  Thanks.

> Create resource management API
> --
>
> Key: SOLR-13579
> URL: https://issues.apache.org/jira/browse/SOLR-13579
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: SOLR-13579.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13606) DateTimeFormatter Exception on Create Core

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881518#comment-16881518
 ] 

David Smiley commented on SOLR-13606:
-

Nonetheless it's bad that the JDK throws an NPE, although it's not Solr's fault.  
This very same stack trace had been vexing us where I work at Salesforce until 
I determined it was due to a customized time zone database, fixable by 
setting the system property {{java.locale.providers=JRE,SPI}}. I suspect we'll 
chase this down into the JDK so that some future Java update eventually stops 
throwing NPEs. 
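
For anyone hitting the same trace: the workaround is a JVM-level setting, 
either as a startup flag or set programmatically before the locale providers 
are first initialized. A minimal sketch (my illustration, not Solr code):

{code:java}
// Equivalent to launching the JVM with: -Djava.locale.providers=JRE,SPI
// Must run before anything touches the java.time locale providers.
public class LocaleProvidersWorkaround {
  static {
    System.setProperty("java.locale.providers", "JRE,SPI");
  }
}
{code}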

> DateTimeFormatter Exception on Create Core
> --
>
> Key: SOLR-13606
> URL: https://issues.apache.org/jira/browse/SOLR-13606
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 8.1.1
> Environment: Red Hat 8.0
> Java 11
> Solr 8.1.1
>Reporter: Joseph Krauss
>Priority: Critical
>  Labels: newbie
>
> I have a fresh install of RH 8.0 with Java 11 JDK and I've run into an issue 
> with Solr 8.0.0 and 8.1.1 when attempting to create a core. I'm guessing 
> here, but the error appears to be an issue with the date format. From what 
> I've read Java date parser is expecting a period between seconds and 
> milliseconds? Hopefully, there's something simple I overlooked when I 
> configured the environment for solr. 
> Caused by: java.time.format.DateTimeParseException: Text 
> '2019-07-03T20:00:{color:#FF}00.050Z{color}'
> Oracle Corporation OpenJDK 64-Bit Server VM 11.0.3 11.0.3+7-LTS
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'testarms': 
> Unable to create core [testarms] Caused by: null
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1187)
>   at 
> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:92)
>   at 
> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
>   at 
> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
>   at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>   at 
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:796)
>   at 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:762)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:522)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.jetty.server.Server.handle(Server.java:502)
>   at 

[jira] [Reopened] (SOLR-10377) Improve readability of the explain output for JSON format

2019-07-09 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reopened SOLR-10377:
-

> Improve readability of the explain output for JSON format
> -
>
> Key: SOLR-10377
> URL: https://issues.apache.org/jira/browse/SOLR-10377
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
>
> Today when I ask solr for the debug query output In json with indent I get 
> this:
> {code}
> 1: " 3.545981 = sum of: 3.545981 = weight(name:dns in 0) [SchemaSimilarity], 
> result of: 3.545981 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 
> 0.5)) from: 2.0 = docFreq 24.0 = docCount 1.54 = tfNorm, computed as (freq * 
> (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 
> 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 
> 1.0 = fieldLength ",
> 2: " 7.4202514 = sum of: 7.4202514 = sum of: 2.7921112 = weight(name:domain 
> in 1) [SchemaSimilarity], result of: 2.7921112 = score(doc=1,freq=1.0 = 
> termFreq=1.0 ), product of: 2.3025851 = idf, computed as log(1 + (docCount - 
> docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 24.0 = docCount 
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
> fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 = parameter k1 
> 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 2.7921112 = 
> weight(name:name in 1) [SchemaSimilarity], result of: 2.7921112 = 
> score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 2.3025851 = idf, computed 
> as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 
> 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + 
> k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 
> = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of: 1.8360289 
> = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 1.5141277 = idf, 
> computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 5.0 = 
> docFreq 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / 
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = 
> termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = 
> fieldLength "
> {code}
> When I run the same query with "wt=ruby" I get a much nicer output
> {code}
> '2'=>'
> 7.4202514 = sum of:
>   7.4202514 = sum of:
> 2.7921112 = weight(name:domain in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 2.7921112 = weight(name:name in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of:
>   1.8360289 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 1.5141277 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   5.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> ',
>   '1'=>'
> 3.545981 = sum of:
>   3.545981 = weight(name:dns in 0) [SchemaSimilarity], result of:
> 3.545981 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
>   2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
> 2.0 = docFreq
> 24.0 = docCount
>   1.54 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b 
> * fieldLength / avgFieldLength)) from:
> 

[jira] [Commented] (SOLR-10377) Improve readability of the explain output for JSON format

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881507#comment-16881507
 ] 

David Smiley commented on SOLR-10377:
-

The indentation problem has been long-standing no matter what browser 
I use. "The ordering" is fine. Using the techproducts data set with the query 
name:ddr I get this one-liner explanation for the first doc:

{{ "VS1GB400C3":"\n0.89796925 = weight(name:ddr in 1) [SchemaSimilarity], 
result of:\n 0.89796925 = score(freq=2.0), product of:\n 1.8382795 = idf, 
computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:\n 3 = n, number of 
documents containing term\n 21 = N, total number of documents with field\n 
0.48848352 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) 
from:\n 2.0 = freq, occurrences of term within document\n 1.2 = k1, term 
saturation parameter\n 0.75 = b, length normalization parameter\n 15.0 = dl, 
length of field\n 7.5238094 = avgdl, average length of field\n", }}
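
As a sanity check on those numbers (my own arithmetic, not part of the issue), 
the factors in that explanation reproduce exactly from the quoted formulas:

{noformat}
idf   = \ln(1 + (N - n + 0.5)/(n + 0.5))
      = \ln(1 + (21 - 3 + 0.5)/(3 + 0.5)) = \ln(6.2857...) \approx 1.8382795
tf    = freq / (freq + k_1 (1 - b + b \cdot dl/avgdl))
      = 2 / (2 + 1.2 (0.25 + 0.75 \cdot 15/7.5238094)) \approx 0.48848352
score = idf \cdot tf \approx 1.8382795 \times 0.48848352 \approx 0.89796925
{noformat}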

> Improve readability of the explain output for JSON format
> -
>
> Key: SOLR-10377
> URL: https://issues.apache.org/jira/browse/SOLR-10377
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
>
> Today when I ask solr for the debug query output In json with indent I get 
> this:
> {code}
> 1: " 3.545981 = sum of: 3.545981 = weight(name:dns in 0) [SchemaSimilarity], 
> result of: 3.545981 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 
> 0.5)) from: 2.0 = docFreq 24.0 = docCount 1.54 = tfNorm, computed as (freq * 
> (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 
> 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 
> 1.0 = fieldLength ",
> 2: " 7.4202514 = sum of: 7.4202514 = sum of: 2.7921112 = weight(name:domain 
> in 1) [SchemaSimilarity], result of: 2.7921112 = score(doc=1,freq=1.0 = 
> termFreq=1.0 ), product of: 2.3025851 = idf, computed as log(1 + (docCount - 
> docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 24.0 = docCount 
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * 
> fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 = parameter k1 
> 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 2.7921112 = 
> weight(name:name in 1) [SchemaSimilarity], result of: 2.7921112 = 
> score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 2.3025851 = idf, computed 
> as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 2.0 = docFreq 
> 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + 
> k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = termFreq=1.0 1.2 
> = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = fieldLength 
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of: 1.8360289 
> = score(doc=1,freq=1.0 = termFreq=1.0 ), product of: 1.5141277 = idf, 
> computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from: 5.0 = 
> docFreq 24.0 = docCount 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / 
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from: 1.0 = 
> termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 7.0 = avgFieldLength 4.0 = 
> fieldLength "
> {code}
> When I run the same query with "wt=ruby" I get a much nicer output
> {code}
> '2'=>'
> 7.4202514 = sum of:
>   7.4202514 = sum of:
> 2.7921112 = weight(name:domain in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 2.7921112 = weight(name:name in 1) [SchemaSimilarity], result of:
>   2.7921112 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 2.3025851 = idf, computed as log(1 + (docCount - docFreq + 0.5) / 
> (docFreq + 0.5)) from:
>   2.0 = docFreq
>   24.0 = docCount
> 1.2125984 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - 
> b + b * fieldLength / avgFieldLength)) from:
>   1.0 = termFreq=1.0
>   1.2 = parameter k1
>   0.75 = parameter b
>   7.0 = avgFieldLength
>   4.0 = fieldLength
> 1.8360289 = weight(name:system in 1) [SchemaSimilarity], result of:
>   1.8360289 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
> 1.5141277 = idf, computed as log(1 + 

[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881500#comment-16881500
 ] 

David Smiley commented on SOLR-13593:
-

I really like this change!  Love "name".  I think you can make these changes 
everywhere in the next release without waiting for 9.0.  I do strongly suggest 
separate commits, though: one for the essence of the change (what's in your PR 
now), and another for changing this pattern all over the place.

> Allow to specify analyzer components by their SPI names in schema definition
> 
>
> Key: SOLR-13593
> URL: https://issues.apache.org/jira/browse/SOLR-13593
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: Tomoko Uchida
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now each analysis factory has an explicitly documented SPI name, which is stored 
> in the static "NAME" field (LUCENE-8778).
>  Solr uses factories' simple class name in schema definition (like 
> class="solr.WhitespaceTokenizerFactory"), but we should be able to also use 
> more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> 
>   
> 
>  />
> 
>   
> 
> {code}
> would be
> {code:xml}
> 
>   
> 
> 
> 
>   
> 
> {code}
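
The SPI lookup this proposal builds on already exists in Lucene (LUCENE-8778); 
here is a rough sketch of resolving factories by SPI name instead of class name 
(illustrative wiring, not the attached change):

{code:java}
import java.util.HashMap;
import org.apache.lucene.analysis.util.TokenFilterFactory;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class SpiNameDemo {
  public static void main(String[] args) {
    // Resolve analysis factories by their SPI NAME, not the simple class name.
    TokenizerFactory tokenizer = TokenizerFactory.forName("whitespace", new HashMap<>());
    TokenFilterFactory filter = TokenFilterFactory.forName("lowercase", new HashMap<>());
    // All registered SPI names:
    System.out.println(TokenizerFactory.availableTokenizers());
  }
}
{code}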



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13257) Enable replica routing affinity for better cache usage

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881478#comment-16881478
 ] 

David Smiley commented on SOLR-13257:
-

Cool!

> Enable replica routing affinity for better cache usage
> --
>
> Key: SOLR-13257
> URL: https://issues.apache.org/jira/browse/SOLR-13257
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Michael Gibney
>Priority: Minor
> Attachments: AffinityShardHandlerFactory.java, SOLR-13257.patch, 
> SOLR-13257.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For each shard in a distributed request, Solr currently routes each request 
> randomly via 
> [ShufflingReplicaListTransformer|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/component/ShufflingReplicaListTransformer.java]
>  to a particular replica. In setups with replication factor >1, this normally 
> results in a situation where subsequent requests (which one would hope/expect 
> to leverage cached results from previous related requests) end up getting 
> routed to a replica that hasn't seen any related requests.
> The problem can be replicated by issuing a relatively expensive query (maybe 
> containing common terms?). The first request initializes the 
> {{queryResultCache}} on the consulted replicas. If replication factor >1 and 
> there are a sufficient number of shards, subsequent requests will likely be 
> routed to at least one replica that _hasn't_ seen the query before. The 
> replicas with uninitialized caches become a bottleneck, and from the client's 
> perspective, many subsequent requests appear not to benefit from caching at 
> all.
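
The essence of the idea, reduced to a sketch (my illustration, not the attached 
AffinityShardHandlerFactory): derive the replica choice deterministically from 
the request instead of shuffling, so identical requests keep landing on the 
same warmed replica.

{code:java}
import java.util.List;

public class StableRouting {
  /** The same query string always picks the same replica, keeping its caches warm. */
  static <R> R chooseReplica(List<R> replicas, String queryString) {
    return replicas.get(Math.floorMod(queryString.hashCode(), replicas.size()));
  }
}
{code}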



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8069) Allow index sorting by field length

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881476#comment-16881476
 ] 

David Smiley commented on LUCENE-8069:
--

Also: is luceneutil assuming the search query doesn't want the total number of 
hits, and could then use impacts?  Yet this is not how most people use Lucene...

Would a contribution here (not yet present) be something that allowed you to 
sort on the norm _of a particular field_?  Hmm.  I guess people wanting these 
benefits today without any changes to Lucene could simply add a norm-like field 
(e.g. sum of raw char lengths of all tokenized fields) and then configure 
Lucene to sort on that.  Would that work?
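
That workaround is easy to sketch with stock Lucene APIs; the field name "len" 
and the use of StandardAnalyzer are my own choices:

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class LengthSortedIndex {
  static IndexWriterConfig config() {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    // Sort the whole index so the shortest documents come first.
    iwc.setIndexSort(new Sort(new SortField("len", SortField.Type.LONG)));
    return iwc;
  }

  static Document doc(String text) {
    Document doc = new Document();
    doc.add(new TextField("body", text, Field.Store.NO));
    // Norm-like field: raw char length of the tokenized field(s).
    doc.add(new NumericDocValuesField("len", text.length()));
    return doc;
  }
}
{code}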

> Allow index sorting by field length
> ---
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect best matches first. 
> Depending on the similarity implementation, this might even allow to early 
> terminate collection of top documents on term queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8069) Allow index sorting by field length

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881474#comment-16881474
 ] 

David Smiley commented on LUCENE-8069:
--

Index sorting means slower indexing (naturally).  How much slower was the 
indexing here?  

> Allow index sorting by field length
> ---
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect best matches first. 
> Depending on the similarity implementation, this might even allow to early 
> terminate collection of top documents on term queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881452#comment-16881452
 ] 

David Smiley commented on LUCENE-8883:
--

Thanks Adrien for determining this list of actual usages!  Just curious; did 
you do this with a script or just haphazardly with repeated "Find" or similar?

I suppose we need differences between Lucene and Solr with this list.  Solr 
doesn't have "API Changes" as a section but it does use "Upgrade Notes" which 
is conceptually similar.  I'll have this list passed into this method so 
that we can specialize per-project.

For bug fix releases, I suppose we can detect this via the last octet being 
non-zero?  And then we only generate "Bug Fixes"?

I'm aware "Optimizations" has been a popular heading in Lucene's CHANGES.txt 
but might it be better re-categorized as "Improvements", and thus make the 
mapping of JIRA issue types to CHANGES heading the same?  Or maybe for 
consistency we could get this new type added to JIRA if it's so important?

I'll add "(No changes)".

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" section, so no surprise here: the New Features category 
> has issues that ought to be listed as such.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13375) Dimensional Routed Aliases

2019-07-08 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880918#comment-16880918
 ] 

David Smiley commented on SOLR-13375:
-

Fascinating bug to track down; congrats on that!  I hope it might help some 
other tests to be less flaky.

> Dimensional Routed Aliases
> --
>
> Key: SOLR-13375
> URL: https://issues.apache.org/jira/browse/SOLR-13375
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Attachments: SOLR-13375.patch, SOLR-13375.patch, SOLR-13375.patch
>
>
> Current available routed aliases are restricted to a single field. This 
> feature will allow Solr to provide data driven collection access, creation 
> and management based on multiple fields in a document. The collections will 
> be queried and updated in a unified manner via an alias. Current routing is 
> restricted to the values of a single field. The particularly useful 
> combination at this time will be Category X Time routing but Category X 
> Category may also be useful. More importantly, if additional routing schemes 
> are created in the future (either as contributions or as custom code by 
> users) combination among these should be supported. 
> It is expected that not all combinations will be useful, and I expect to leave 
> the determination of usefulness up to the user. Some routing 
> schemes may need to be limited to be the leaf/last routing scheme for 
> technical reasons, though I'm not entirely convinced of that yet. If so, a 
> flag will be added to the RoutedAlias interface.
> Initial desire is to support two levels, though if arbitrary levels can be 
> supported easily that will be done.
> This could also have been called CompositeRoutedAlias, but that creates a TLA 
> clash with CategoryRoutedAlias.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-08 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8883:
-
Attachment: LUCENE-8883.patch
Status: Open  (was: Open)

Here's the patch.  Note I have never written Python code before, so it'd be 
helpful if someone who has could eyeball these changes.  The changes were simple 
enough, and there was enough existing Python code here to learn from, that I 
think I did it right.  I did run the changes and saw them work as I intended.

All the patch does is add the names of the headers with a blank line 
in-between.  I did not add a "--" line below each; I see Lucene hasn't been 
doing this but Solr has, and I like Lucene's approach better just barely.  Also 
I didn't add "(No changes)"; seems needless / self-evident.  I could have added 
an "Upgrade Notes" section but opted not to... I think this won't be as much of 
an issue but I could easily go either way.

Alexandre:  Are you proposing additional Python scripts to basically do all 
CHANGES.txt manipulation?  I'm not sure what to think of that... I'm lukewarm I 
guess.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" section, so no surprise here: the New Features category 
> has issues that ought to be listed as such.  I think the order varies as well.  
> I propose that on new releases, the initial state of the next release in 
> CHANGES.txt have these sections.  They can easily be removed at the upcoming 
> release if there are no such sections, or they could stay as empty.  It seems 
> addVersion.py is the code that sets this up and it could be enhanced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13605) HttpSolrClient.Builder.withHttpClient() is useless for the purpose of setting client scoped so/connect timeouts

2019-07-03 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878271#comment-16878271
 ] 

David Smiley commented on SOLR-13605:
-

Thorough analysis Hoss; well researched!

> HttpSolrClient.Builder.withHttpClient() is useless for the purpose of setting 
> client scoped so/connect timeouts
> ---
>
> Key: SOLR-13605
> URL: https://issues.apache.org/jira/browse/SOLR-13605
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Major
>
> TL;DR: trying to use {{HttpSolrClient.Builder.withHttpClient}} is useless for 
> the purpose of specifying an {{HttpClient}} with the default "timeouts" 
> you want to use on all requests, because of how {{HttpSolrClient.Builder}} 
> and {{HttpClientUtil.createDefaultRequestConfigBuilder()}} hardcode values 
> that get set on every {{HttpRequest}}.
> This internally affects code that uses things like 
> {{UpdateShardHandler.getDefaultHttpClient()}}, 
> {{UpdateShardHandler.getUpdateOnlyHttpClient()}} 
> {{UpdateShardHandler.getRecoveryOnlyHttpClient()}}, etc...
> 
> While looking into the patch in SOLR-13532, I realized that the way 
> {{HttpSolrClient.Builder}} and it's super class {{SolrClientBuilder}} work, 
> the following code doesn't do what a reasonable person would expect...
> {code:java}
> SolrParams clientParams = params(HttpClientUtil.PROP_SO_TIMEOUT, 12345,
>  HttpClientUtil.PROP_CONNECTION_TIMEOUT, 
> 67890);
> HttpClient httpClient = HttpClientUtil.createClient(clientParams);
> HttpSolrClient solrClient = new HttpSolrClient.Builder(ANY_BASE_SOLR_URL)
> .withHttpClient(httpClient)
> .build();
> {code}
> When {{solrClient}} is used to execute a request, neither of the properties 
> passed to {{HttpClientUtil.createClient(...)}} will matter - the 
> {{HttpSolrClient.Builder}} (via inheritence from {{SolrClientBuilder}} has 
> the following hardcoded values...
> {code:java}
>   // SolrClientBuilder
>   protected Integer connectionTimeoutMillis = 15000;
>   protected Integer socketTimeoutMillis = 12;
> {code}
> ...which unless overridden by calls to {{withConnectionTimeout()}} and 
> {{withSocketTimeout()}} will get set on the {{HttpSolrClient}} object, and 
> used on every request...
> {code:java}
> // protected HttpSolrClient constructor
> this.connectionTimeout = builder.connectionTimeoutMillis;
> this.soTimeout = builder.socketTimeoutMillis;
> {code}
> It would be tempting to try and do something like this to work around the 
> problem...
> {code:java}
> SolrParams clientParams = params(HttpClientUtil.PROP_SO_TIMEOUT, 12345,
>  HttpClientUtil.PROP_CONNECTION_TIMEOUT, 
> 67890);
> HttpClient httpClient = HttpClientUtil.createClient(clientParams);
> HttpSolrClient solrClient = new HttpSolrClient.Builder(ANY_BASE_SOLR_URL)
> .withHttpClient(httpClient)
> .withSocketTimeout(null)
> .withConnectionTimeout(null)
> .build();
> {code}
> ...except for 2 problems:
>  # In {{HttpSolrClient.executeMethod}}, if the values of 
> {{this.connectionTimeout}} or {{this.soTimeout}} are null, then the values 
> from {{HttpClientUtil.createDefaultRequestConfigBuilder();}} get used, which 
> has its own hardcoded defaults.
>  # {{withSocketTimeout}} and {{withConnectionTimeout}} take an int, not a 
> (nullable) Integer.
> So then maybe something like this would work? - particularly since at the 
> {{HttpClient}} / {{HttpRequest}} / {{RequestConfig}} level, a "-1" set on the 
> {{HttpRequest}}'s {{RequestConfig}} is supposed to mean "use the (client) 
> default" ...
> {code:java}
> SolrParams clientParams = params(HttpClientUtil.PROP_SO_TIMEOUT, 12345,
>  HttpClientUtil.PROP_CONNECTION_TIMEOUT, 
> 67890);
> HttpClient httpClient = HttpClientUtil.createClient(clientParams);
> HttpSolrClient client = new HttpSolrClient.Builder(ANY_BASE_SOLR_URL)
> .withHttpClient(httpClient)
> .withSocketTimeout(-1)
> .withConnectionTimeout(-1)
> .build();
> {code}
> ...except that if we do *that* we get an IllegalArgumentException...
> {code:java}
>   // SolrClientBuilder
>   public B withConnectionTimeout(int connectionTimeoutMillis) {
> if (connectionTimeoutMillis < 0) {
>   throw new IllegalArgumentException("connectionTimeoutMillis must be a 
> non-negative integer.");
> }
> {code}
> This is madness, and eliminates most/all of the known value of using 
> {{.withHttpClient}}
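
One possible fix direction, sketched (not the committed change): have the 
builder track whether a timeout was explicitly set, and only then override the 
{{RequestConfig}}, so the supplied {{HttpClient}}'s own defaults win otherwise.

{code:java}
// Sketch: null means "caller never set it".
private Integer connectionTimeoutMillis = null;
private Integer socketTimeoutMillis = null;

// ...then in executeMethod(), override only explicitly-set values:
if (connectionTimeoutMillis != null) {
  requestConfigBuilder.setConnectTimeout(connectionTimeoutMillis);
}
if (socketTimeoutMillis != null) {
  requestConfigBuilder.setSocketTimeout(socketTimeoutMillis);
}
{code}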



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-13604) if updateLog isn't in config, complain loudly if it's needed for SolrCloud

2019-07-03 Thread David Smiley (JIRA)
David Smiley created SOLR-13604:
---

 Summary: if updateLog isn't in config, complain loudly if it's 
needed for SolrCloud
 Key: SOLR-13604
 URL: https://issues.apache.org/jira/browse/SOLR-13604
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 7.1
Reporter: David Smiley


The updateLog can be commented out, in which case the SolrCloud log will only contain this error:

2019-07-03 16:15:55.907 ERROR (qtp214649627-92) [c:gettingstarted s:shard1 
r:core_node4 x:gettingstarted_shard1_replica_n2] o.a.s.c.SyncStrategy No 
UpdateLog found - cannot sync

I suspect it's truly required, in which case Solr should complain more loudly about 
this -- throw an exception.  For single-replica collections, I suppose it's 
fine (not an error?).  Granted, there could then be an issue when adding replicas 
later (unless they're pull type; those are fine).  Even if a multi-replica SolrCloud 
can usefully work without an updateLog (news to me), then maybe we shouldn't log an 
error; still, a missing updateLog would probably be an oversight whose root cause 
would be hard to track down.

I wish the updateLog altogether could be more optional but that's another 
conversation.
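
A sketch of what "complain loudly" could look like at core load time; the 
placement and wording are hypothetical, not actual Solr code:

{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.update.UpdateHandler;

// Hypothetical startup validation -- illustrative only.
final class UpdateLogCheck {
  static void requireUpdateLogInCloudMode(CoreDescriptor cd, UpdateHandler uh) {
    if (cd.getCloudDescriptor() != null && uh.getUpdateLog() == null) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "SolrCloud requires an <updateLog/> in solrconfig.xml; it is missing or commented out");
    }
  }
}
{code}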



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-07-03 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877843#comment-16877843
 ] 

David Smiley commented on LUCENE-8895:
--

Dare I ask:  Did you intend to deprecate only the first getByOutput method and 
not the other?  It's my favorite method, after all :)

Tip: when creating issues related to each other, link them in Jira so the 
"watchers" of the first issue know about the new issue.

> Switch all FSTs to use direct addressing optimization
> -
>
> Key: LUCENE-8895
> URL: https://issues.apache.org/jira/browse/LUCENE-8895
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
> Fix For: 8.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See discussion in LUCENE-8781 about turning on array-with-gaps encoding 
> everywhere. Let's conduct any further discussion here so we can use an open 
> issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-8781) Explore FST direct array arc encoding

2019-07-02 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reopened LUCENE-8781:
--
  Assignee: Mike Sokolov  (was: Dawid Weiss)

Just checking: you will continue to leave getByOutput even though it's 10% slower?  
Fine.  BTW, I think that is one of the FST's cool/useful options.

I'm going to re-open to reflect that work is actually still in progress.  CC 
[~ivera], the 8.2 RM

> Explore FST direct array arc encoding 
> --
>
> Key: LUCENE-8781
> URL: https://issues.apache.org/jira/browse/LUCENE-8781
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Assignee: Mike Sokolov
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: FST-2-4.png, FST-6-9.png, FST-size.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This issue is for exploring an alternate FST encoding of Arcs as full-sized 
> arrays so Arcs are addressed directly by label, avoiding binary search that 
> we use today for arrays of Arcs. PR: 
> https://github.com/apache/lucene-solr/pull/657
> h3. Testing
> ant test passes. I added some unit tests that were helpful in uncovering bugs 
> while implementing; such bugs are more difficult to chase down when uncovered by 
> the randomized testing we already do. They don't really test anything new; 
> they're just more focused.
> I'm not sure why, but ant precommit failed for me with:
> {noformat}
>  ...lucene-solr/solr/common-build.xml:536: Check for forbidden API calls 
> failed while scanning class 
> 'org.apache.solr.metrics.reporters.SolrGangliaReporterTest' 
> (SolrGangliaReporterTest.java): java.lang.ClassNotFoundException: 
> info.ganglia.gmetric4j.gmetric.GMetric (while looking up details about 
> referenced class 'info.ganglia.gmetric4j.gmetric.GMetric')
> {noformat}
> I also got Test2BFST running (it was originally timing out due to excessive 
> calls to ramBytesUsage(), which seems to have gotten slow), and it passed; 
> that change isn't included here.
> h4. Micro-benchmark
> I timed lookups in FST via FSTEnum.seekExact in a unit test under various 
> conditions. 
> h5. English words
> A test of looking up existing words in a dictionary of ~17 English words 
> shows improvements; the numbers listed are % change in FST size, time to look 
> up (FSTEnum.seekExact) words that are in the dict, and time to look up random 
> strings that are not in the dict. The comparison is against the current 
> codebase with the optimization disabled. A separate comparison showed no 
> significant change between the baseline (no opto applied) and the current master 
> FST impl with no code changes applied.
> ||  load=2||   load=4 ||  load=16 ||
> | +4, -6, -7  | +18, -11, -8 | +22, -11.5, -7 |
> The "load factor" used for those measurements controls when direct array arc 
> encoding is used;
> namely when the number of outgoing arcs was > load * (max label - min label).
> h5. sequential and random terms
> The same test, with terms being a sequence of integers as strings shows a 
> larger improvement, around 20% (load=4). This is presumably the best case for 
> this delta, where every Arc is encoded as a direct lookup.
> When random lowercase ASCII strings are used, a smaller improvement of around 
> 4% is seen.
> h4. luceneutil
> Testing w/luceneutil (wikimediumall) we see improvements mostly in the 
> PKLookup case. Other results seem noisy, with perhaps a small improvement in 
> some of the queries.
> {noformat}
> TaskQPS base  StdDevQPS opto  StdDev  
>   Pct diff
>   OrHighHigh6.93  (3.0%)6.89  (3.1%)   
> -0.5% (  -6% -5%)
>OrHighMed   45.15  (3.9%)   44.92  (3.5%)   
> -0.5% (  -7% -7%)
> Wildcard8.72  (4.7%)8.69  (4.6%)   
> -0.4% (  -9% -9%)
>   AndHighLow  274.11  (2.6%)  273.58  (3.1%)   
> -0.2% (  -5% -5%)
>OrHighLow  241.41  (1.9%)  241.11  (3.5%)   
> -0.1% (  -5% -5%)
>   AndHighMed   52.23  (4.1%)   52.41  (5.3%)
> 0.3% (  -8% -   10%)
>  MedTerm 1026.24  (3.1%) 1030.52  (4.3%)
> 0.4% (  -6% -8%)
> HighTerm .10  (3.4%) 1116.70  (4.0%)
> 0.5% (  -6% -8%)
>HighTermDayOfYearSort   14.59  (8.2%)   14.73  (9.3%)
> 1.0% ( -15% -   20%)
>  AndHighHigh   13.45  (6.2%)   13.61  (4.4%)
> 1.2% (  -8% -   12%)
>HighTermMonthSort   63.09 (12.5%)   64.13 (10.9%)
> 1.6% ( -19% -   28%)
>  LowTerm 1338.94  (3.3%) 1383.90  (5.5%)
> 3.4% (  -5% -   12%)

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-29 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875599#comment-16875599
 ] 

David Smiley commented on LUCENE-8781:
--

[~dweiss] or [~jpountz] do you have any opinion on whether this should simply 
be the default?  I think it should be; see my comment above; search for 
"nonetheless".  For example, why not on the FST50 codec?

> Explore FST direct array arc encoding 
> --
>
> Key: LUCENE-8781
> URL: https://issues.apache.org/jira/browse/LUCENE-8781
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Assignee: Dawid Weiss
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: FST-2-4.png, FST-6-9.png, FST-size.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This issue is for exploring an alternate FST encoding of Arcs as full-sized 
> arrays so Arcs are addressed directly by label, avoiding binary search that 
> we use today for arrays of Arcs. PR: 
> https://github.com/apache/lucene-solr/pull/657
> h3. Testing
> ant test passes. I added some unit tests that were helpful in uncovering bugs 
> while implementing; such bugs are more difficult to chase down when uncovered by 
> the randomized testing we already do. They don't really test anything new; 
> they're just more focused.
> I'm not sure why, but ant precommit failed for me with:
> {noformat}
>  ...lucene-solr/solr/common-build.xml:536: Check for forbidden API calls 
> failed while scanning class 
> 'org.apache.solr.metrics.reporters.SolrGangliaReporterTest' 
> (SolrGangliaReporterTest.java): java.lang.ClassNotFoundException: 
> info.ganglia.gmetric4j.gmetric.GMetric (while looking up details about 
> referenced class 'info.ganglia.gmetric4j.gmetric.GMetric')
> {noformat}
> I also got Test2BFST running (it was originally timing out due to excessive 
> calls to ramBytesUsage(), which seems to have gotten slow), and it passed; 
> that change isn't included here.
> h4. Micro-benchmark
> I timed lookups in FST via FSTEnum.seekExact in a unit test under various 
> conditions. 
> h5. English words
> A test of looking up existing words in a dictionary of ~17 English words 
> shows improvements; the numbers listed are % change in FST size, time to look 
> up (FSTEnum.seekExact) words that are in the dict, and time to look up random 
> strings that are not in the dict. The comparison is against the current 
> codebase with the optimization disabled. A separate comparison showed no 
> significant change between the baseline (no opto applied) and the current master 
> FST impl with no code changes applied.
> ||  load=2||   load=4 ||  load=16 ||
> | +4, -6, -7  | +18, -11, -8 | +22, -11.5, -7 |
> The "load factor" used for those measurements controls when direct array arc 
> encoding is used;
> namely when the number of outgoing arcs was > load * (max label - min label).
> h5. sequential and random terms
> The same test, with terms being a sequence of integers as strings shows a 
> larger improvement, around 20% (load=4). This is presumably the best case for 
> this delta, where every Arc is encoded as a direct lookup.
> When random lowercase ASCII strings are used, a smaller improvement of around 
> 4% is seen.
> h4. luceneutil
> Testing w/luceneutil (wikimediumall) we see improvements mostly in the 
> PKLookup case. Other results seem noisy, with perhaps a small improvement in 
> some of the queries.
> {noformat}
> TaskQPS base  StdDevQPS opto  StdDev  
>   Pct diff
>   OrHighHigh6.93  (3.0%)6.89  (3.1%)   
> -0.5% (  -6% -5%)
>OrHighMed   45.15  (3.9%)   44.92  (3.5%)   
> -0.5% (  -7% -7%)
> Wildcard8.72  (4.7%)8.69  (4.6%)   
> -0.4% (  -9% -9%)
>   AndHighLow  274.11  (2.6%)  273.58  (3.1%)   
> -0.2% (  -5% -5%)
>OrHighLow  241.41  (1.9%)  241.11  (3.5%)   
> -0.1% (  -5% -5%)
>   AndHighMed   52.23  (4.1%)   52.41  (5.3%)
> 0.3% (  -8% -   10%)
>  MedTerm 1026.24  (3.1%) 1030.52  (4.3%)
> 0.4% (  -6% -8%)
> HighTerm .10  (3.4%) 1116.70  (4.0%)
> 0.5% (  -6% -8%)
>HighTermDayOfYearSort   14.59  (8.2%)   14.73  (9.3%)
> 1.0% ( -15% -   20%)
>  AndHighHigh   13.45  (6.2%)   13.61  (4.4%)
> 1.2% (  -8% -   12%)
>HighTermMonthSort   63.09 (12.5%)   64.13 (10.9%)
> 1.6% ( -19% -   28%)
>  LowTerm 1338.94  (3.3%) 1383.90  (5.5%)
> 3.4% (  -5% -   12%)
> PKLookup  

[jira] [Commented] (LUCENE-8855) Add Accountable to some Query implementations

2019-06-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874402#comment-16874402
 ] 

David Smiley commented on LUCENE-8855:
--

Yeah I agree!  The bug distinction on SOLR-13003 is debatable, and it had 
multiple possible solutions (like simply not supporting the mem setting on that 
cache).  It feels wrong to pull non-bugs like this into a point release for 
that.

> Add Accountable to some Query implementations
> -
>
> Key: LUCENE-8855
> URL: https://issues.apache.org/jira/browse/LUCENE-8855
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.2
>
> Attachments: LUCENE-8855.patch, LUCENE-8855.patch, LUCENE-8855.patch, 
> LUCENE-8855.patch, LUCENE-8855.patch
>
>
> Query implementations should also support {{Accountable}} API in order to 
> monitor the memory consumption e.g. in caches where either keys or values are 
> {{Query}} instances.
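
A minimal sketch of the pattern for a cache key (illustrative, not one of the 
patched classes; the RamUsageEstimator helpers shown are assumed available):

{code:java}
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;

// Illustrative cache key pairing a Query with a heap estimate.
final class AccountableQueryKey implements Accountable {
  private static final long BASE =
      RamUsageEstimator.shallowSizeOfInstance(AccountableQueryKey.class);
  private final Query query;

  AccountableQueryKey(Query query) { this.query = query; }

  @Override
  public long ramBytesUsed() {
    // Use the query's own accounting when it opts in, else a conservative default.
    long q = (query instanceof Accountable)
        ? ((Accountable) query).ramBytesUsed()
        : RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED;
    return BASE + q;
  }
}
{code}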



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4197) EDismax allows end users to use local params in q= to override global params

2019-06-26 Thread David Smiley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-4197.

   Resolution: Fixed
Fix Version/s: 7.2

> EDismax allows end users to use local params in q= to override global params
> 
>
> Key: SOLR-4197
> URL: https://issues.apache.org/jira/browse/SOLR-4197
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.5, 3.6, 4.0
>Reporter: Peter Wolanin
>Priority: Major
> Fix For: 7.2
>
>
> Edismax is advertised as suitable to be used to "process advanced user input 
> directly".  Thus, it would seem reasonable to have an application directly 
> pass user input in the q= parameter to a back-end Solr server.
> However, it seems that users can enter local params at the start of q= which 
> override the global params that the application (e.g. website) may have set 
> on the query string.  Confirmed with Erik Hatcher that this is somewhat 
> unexpected behavior (though one could argue it's an expected feature of any 
> query parser)
> Proposed fix - add a parameter (e.g. that can be used as an invariant) that 
> can be passed to inhibit Solr from using local params from the q= parameter.
> This is somewhat related to SOLR-1687



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


