[GitHub] [lucene-solr] mocobeta commented on issue #885: LUCENE-8981: update Kuromoji javadocs, adding experimental tags to Di…

2019-09-16 Thread GitBox
mocobeta commented on issue #885: LUCENE-8981: update Kuromoji javadocs, adding 
experimental tags to Di…
URL: https://github.com/apache/lucene-solr/pull/885#issuecomment-532025182
 
 
   Thanks for the elaborated DictionaryBuilder Javadoc!





[GitHub] [lucene-solr] mocobeta commented on a change in pull request #885: LUCENE-8981: update Kuromoji javadocs, adding experimental tags to Di…

2019-09-16 Thread GitBox
mocobeta commented on a change in pull request #885: LUCENE-8981: update 
Kuromoji javadocs, adding experimental tags to Di…
URL: https://github.com/apache/lucene-solr/pull/885#discussion_r324951538
 
 

 ##
 File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseTokenizer.java
 ##
 @@ -219,9 +219,9 @@ public JapaneseTokenizer(UserDictionary userDictionary, boolean discardPunctuati
   }
 
   /**
-   * Create a new JapaneseTokenizer, supplying a custom system dictionary and unknown dictionary.
-   * 
-   * Uses the default AttributeFactory.
+   * Create a new JapaneseTokenizer, supplying a custom system dictionary and unknown dictionary.
+   * This constructor provides an entry point for users that want to constructcustom language models
 
 Review comment:
   Seems a whitespace is missing: constructcustom -> construct custom





[jira] [Commented] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.

2019-09-16 Thread Mark Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930977#comment-16930977
 ] 

Mark Miller commented on SOLR-13452:


I think with this kind of test load we need to start clearly thinking about the 
tests as Nightly and non-Nightly (Nightly is not a great name, IMO). Non-Nightly 
test runs need to be geared towards stability and developers running tests 
somewhat interactively. Stuff that is going to take forever should be run by 
CI; the non-Nightly runs should be built for performance.

With this free CI stuff from GitHub, I think there is a lot of room to solve 
this nicely.

I'd also like to explore adding integrationTest in the future, but that's not 
near term for me.
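
(A sketch of the split being suggested, assuming Lucene's test framework: the {{@Nightly}} annotation and the {{tests.nightly}} switch exist in LuceneTestCase, while the class and scenario below are made up.)

{code:java}
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.Nightly;

// Expensive, fault-injected scenarios are gated behind @Nightly so the default
// (non-Nightly) run stays fast and stable for interactive use; CI picks these
// up by running with -Dtests.nightly=true.
@Nightly
public class HeavyChaosTest extends LuceneTestCase {
  public void testClusterUnderRandomFaultInjection() {
    // long-running randomized cluster scenario, executed by CI only
  }
}
{code}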

> Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
> -
>
> Key: SOLR-13452
> URL: https://issues.apache.org/jira/browse/SOLR-13452
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: gradle-build.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I took some things from the great work that Dat did in 
> [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a 
> little further.
>  
> When working with gradle in sub modules directly, I recommend 
> [https://github.com/dougborg/gdub]
> This gradle branch uses the following plugin for version locking, version 
> configuration and version consistency across modules: 
> [https://github.com/palantir/gradle-consistent-versions]
>  
> https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_7






[jira] [Commented] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.

2019-09-16 Thread Mark Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930973#comment-16930973
 ] 

Mark Miller commented on SOLR-13452:


I'll have to do something about test performance and duration. It's gotten 
pretty, pretty bad on master, and now that we won't often be running the super 
long tests early, it can be quite devastating.

Even on master we are not as good as we should be on that, because some tests 
can easily be randomly much slower or faster.

In general, we are strangling our tests with over-randomization and fault 
injection. It would be good to know that the tests work without a hundred mock, 
crazy, random things severely impacting a 4-Jetty-node cluster on a single 
machine.

Our heavy integration tests should only do crazy random stuff on key 
performance paths under @Nightly, unless they are designed for that, so that 
regular runs are not sometimes insanely slow, and so that we can better 
understand what is stable in a stable environment versus what is unstable in a 
crazy fault-injected, mocked, random environment that we don't understand well 
but currently apply to every test, great and small.

> Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
> -
>
> Key: SOLR-13452
> URL: https://issues.apache.org/jira/browse/SOLR-13452
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: gradle-build.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I took some things from the great work that Dat did in 
> [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a 
> little further.
>  
> When working with gradle in sub modules directly, I recommend 
> [https://github.com/dougborg/gdub]
> This gradle branch uses the following plugin for version locking, version 
> configuration and version consistency across modules: 
> [https://github.com/palantir/gradle-consistent-versions]
>  
> https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_7






[jira] [Commented] (SOLR-13238) BlobHandler generates non-padded md5

2019-09-16 Thread Jeff Walraven (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930971#comment-16930971
 ] 

Jeff Walraven commented on SOLR-13238:
--

[~janhoy] Thanks! I will keep that in mind for future PRs :)

> BlobHandler generates non-padded md5
> 
>
> Key: SOLR-13238
> URL: https://issues.apache.org/jira/browse/SOLR-13238
> Project: Solr
>  Issue Type: Bug
>  Components: blobstore
>Affects Versions: 6.0, 6.6.5, 7.0, 7.6, 7.7
>Reporter: Jeff Walraven
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Introduced in SOLR-6787
> The blob handler currently uses the following logic for generating/storing 
> the md5 for uploads:
> {code:java}
> MessageDigest m = MessageDigest.getInstance("MD5");
> m.update(payload.array(), payload.position(), payload.limit());
> String md5 = new BigInteger(1, m.digest()).toString(16);
> {code}
> Unfortunately, this method does not pad the result: any digest whose most 
> significant byte is less than 0x10 loses its leading zero. This means that on 
> many occasions it could end up with an md5 hash of 31 characters instead of 32. 
> I have opened a PR with the following recommended change:
> {code:java}
> String md5 = new String(Hex.encodeHex(m.digest()));
> {code}
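
(A minimal standalone sketch of the bug, not part of the ticket; it assumes commons-codec on the classpath, which Solr already ships. MD5("a") is the RFC 1321 test vector 0cc175b9c0f1b6a831c399e269772661, whose leading zero the BigInteger path drops.)

{code:java}
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import org.apache.commons.codec.binary.Hex;

public class Md5PaddingDemo {
  public static void main(String[] args) throws Exception {
    MessageDigest m = MessageDigest.getInstance("MD5");
    byte[] digest = m.digest("a".getBytes(StandardCharsets.UTF_8));

    String viaBigInteger = new BigInteger(1, digest).toString(16);
    String viaHex = new String(Hex.encodeHex(digest));

    System.out.println(viaBigInteger.length()); // 31 -- the leading zero is lost
    System.out.println(viaHex.length());        // 32 -- always padded
  }
}
{code}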






[jira] [Commented] (SOLR-13767) Upgrade jackson to 2.9.9

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930963#comment-16930963
 ] 

ASF subversion and git services commented on SOLR-13767:


Commit fce0a5d45b16cba8e758cda534bae0a1d89d8ed1 in lucene-solr's branch 
refs/heads/branch_8x from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fce0a5d ]

SOLR-13767: Upgrade jackson to 2.9.9 (#886)

(cherry picked from commit b617769614a5dedf2bcbb317fcddc73711ac407f)


> Upgrade jackson to 2.9.9
> 
>
> Key: SOLR-13767
> URL: https://issues.apache.org/jira/browse/SOLR-13767
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (SOLR-13767) Upgrade jackson to 2.9.9

2019-09-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-13767.

Resolution: Fixed

> Upgrade jackson to 2.9.9
> 
>
> Key: SOLR-13767
> URL: https://issues.apache.org/jira/browse/SOLR-13767
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Commented] (SOLR-13767) Upgrade jackson to 2.9.9

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930952#comment-16930952
 ] 

ASF subversion and git services commented on SOLR-13767:


Commit b617769614a5dedf2bcbb317fcddc73711ac407f in lucene-solr's branch 
refs/heads/master from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b617769 ]

SOLR-13767: Upgrade jackson to 2.9.9 (#886)



> Upgrade jackson to 2.9.9
> 
>
> Key: SOLR-13767
> URL: https://issues.apache.org/jira/browse/SOLR-13767
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[GitHub] [lucene-solr] janhoy merged pull request #886: SOLR-13767: Upgrade jackson to 2.9.9

2019-09-16 Thread GitBox
janhoy merged pull request #886: SOLR-13767: Upgrade jackson to 2.9.9
URL: https://github.com/apache/lucene-solr/pull/886
 
 
   





[jira] [Updated] (SOLR-13767) Upgrade jackson to 2.9.9

2019-09-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-13767:
---
Fix Version/s: 8.3

> Upgrade jackson to 2.9.9
> 
>
> Key: SOLR-13767
> URL: https://issues.apache.org/jira/browse/SOLR-13767
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[GitHub] [lucene-solr] janhoy opened a new pull request #886: SOLR-13767: Upgrade jackson to 2.9.9

2019-09-16 Thread GitBox
janhoy opened a new pull request #886: SOLR-13767: Upgrade jackson to 2.9.9
URL: https://github.com/apache/lucene-solr/pull/886
 
 
   # Description
   
   2.9.9 was released in May 2019
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).





[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-16 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930913#comment-16930913
 ] 

Lucene/Solr QA commented on SOLR-13272:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  3m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  3m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  3m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
45s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13272 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980425/SOLR-13272.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 30aad17 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/549/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/549/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classical facet component but has no 
> support in JSON facet requests.
>  In cases of block join and aggregations, this would be helpful.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Created] (SOLR-13767) Upgrade jackson to 2.9.9

2019-09-16 Thread Jira
Jan Høydahl created SOLR-13767:
--

 Summary: Upgrade jackson to 2.9.9
 Key: SOLR-13767
 URL: https://issues.apache.org/jira/browse/SOLR-13767
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jan Høydahl
Assignee: Jan Høydahl









[jira] [Commented] (SOLR-13762) Support binary values when using XMLCodec

2019-09-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930858#comment-16930858
 ] 

Jason Gerlowski commented on SOLR-13762:


It looks like there's some [prior 
discussion|https://issues.apache.org/jira/browse/SOLR-1116] around how binary 
fields should be encoded to fit within XML and other text-based formats. The 
consensus there was base64, which I think lines up with your PR? It's also 
interesting that someone (at some point) thought about fitting fields with 
binary content into text-based response formats. I wonder whether this still 
works for fields of type {{BinaryField}}, or whether this broke at some point. 
If the code in SOLR-1116 (the link above) is still valid, we should probably 
try to fix things here in a similar manner (unless there's some particular 
reason not to do so).
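
(A minimal sketch of the base64 round-trip being discussed, using java.util.Base64 rather than the actual Solr codec classes involved here.)

{code:java}
import java.util.Base64;

public class BinaryFieldRoundTrip {
  public static void main(String[] args) {
    // An arbitrary binary payload that would not survive as raw bytes in XML.
    byte[] payload = {0x00, (byte) 0xFF, 0x10, 0x7F};

    // Encode to base64 text so the value can be embedded in an XML document...
    String xmlSafe = Base64.getEncoder().encodeToString(payload);

    // ...and decode it back to the original bytes on the other side.
    byte[] decoded = Base64.getDecoder().decode(xmlSafe);

    System.out.println(xmlSafe);             // AP8Qfw==
    System.out.println(decoded.length == 4); // true
  }
}
{code}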

Additionally, I know this issue is scoped somewhat narrowly to XML, but I 
wonder if we couldn't also fix this problem in JSON with a similar amount of 
work. I still need to do more digging into the SOLR-1116 commit, and into how 
these codecs work today, to say whether that's possible. Maybe it's not.

> Support binary values when using XMLCodec
> -
>
> Key: SOLR-13762
> URL: https://issues.apache.org/jira/browse/SOLR-13762
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers, Response Writers, Server, SolrJ, 
> UpdateRequestProcessors
>Affects Versions: master (9.0), 8.3
>Reporter: Thomas Wöckinger
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As Solr can handle binary fields, it should be possible to use the XML codec 
> to encode and decode them.






[jira] [Updated] (SOLR-13765) Deadlock on Solr cloud request causing 'Too many open files'

2019-09-16 Thread Lei Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wu updated SOLR-13765:
--
Summary: Deadlock on Solr cloud request causing 'Too many open files'  
(was: Deadlock on Solr cloud request)

> Deadlock on Solr cloud request causing 'Too many open files'
> 
>
> Key: SOLR-13765
> URL: https://issues.apache.org/jira/browse/SOLR-13765
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Lei Wu
>Priority: Major
>
> Hi there,
> We are seeing a deadlock issue with Solr cloud requests.
> Say we have a collection with one shard and two replicas for that shard. For 
> whatever reason the cluster appears to be active but each individual replica 
> is down. When a request comes in, Solr (replica 1) tries to find a remote 
> node (replica 2) to handle the request since the local core (replica 1) is 
> down, and when the other node (replica 2) receives the request it does the 
> same, forwarding the request back to the original node (replica 1). This 
> causes a deadlock and eventually uses up all the sockets, causing 
> `{color:#FF}Too many open files{color}`.
> Not sure what the purpose of finding an inactive node to handle the request 
> in HttpSolrCall.getRemoteCoreUrl is, but taking that out seems to fix the 
> problem.
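
(A purely illustrative sketch of the loop described above, not Solr code; the two-node setup and the descriptor limit are made up. Each forward opens another connection, and nothing ever breaks the cycle.)

{code:java}
public class ForwardingLoopSketch {
  public static void main(String[] args) {
    boolean[] coreIsDown = {true, true}; // replica 1 and replica 2 are both down
    int current = 0;                     // the request arrives at replica 1
    int openSockets = 0;

    // Each node sees its local core as down and forwards to the other node.
    while (coreIsDown[current]) {
      current = 1 - current;    // hand the request to the other replica
      openSockets++;            // each hop holds another socket open
      if (openSockets > 1024) { // stand-in for the OS file-descriptor limit
        throw new IllegalStateException("Too many open files");
      }
    }
  }
}
{code}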






[jira] [Updated] (SOLR-13765) Deadlock on Solr cloud request causing 'Too many open files' error

2019-09-16 Thread Lei Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wu updated SOLR-13765:
--
Summary: Deadlock on Solr cloud request causing 'Too many open files' error 
 (was: Deadlock on Solr cloud request causing 'Too many open files')

> Deadlock on Solr cloud request causing 'Too many open files' error
> --
>
> Key: SOLR-13765
> URL: https://issues.apache.org/jira/browse/SOLR-13765
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Lei Wu
>Priority: Major
>
> Hi there,
> We are seeing a deadlock issue with Solr cloud requests.
> Say we have a collection with one shard and two replicas for that shard. For 
> whatever reason the cluster appears to be active but each individual replica 
> is down. When a request comes in, Solr (replica 1) tries to find a remote 
> node (replica 2) to handle the request since the local core (replica 1) is 
> down, and when the other node (replica 2) receives the request it does the 
> same, forwarding the request back to the original node (replica 1). This 
> causes a deadlock and eventually uses up all the sockets, causing 
> `{color:#FF}Too many open files{color}`.
> Not sure what the purpose of finding an inactive node to handle the request 
> in HttpSolrCall.getRemoteCoreUrl is, but taking that out seems to fix the 
> problem.






[jira] [Updated] (SOLR-13765) Deadlock on Solr cloud request

2019-09-16 Thread Lei Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wu updated SOLR-13765:
--
Description: 
Hi there,

We are seeing an issue about deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket causing `{color:#FF}Too many open 
files{color}`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem

  was:
Hi there,

We are seeing an issue about deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket cause `Too many open files`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem


> Deadlock on Solr cloud request
> --
>
> Key: SOLR-13765
> URL: https://issues.apache.org/jira/browse/SOLR-13765
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Lei Wu
>Priority: Major
>
> Hi there,
> We are seeing an issue about deadlock on Solr cloud request. 
> Say we have a collection with one shard and two replicas for that shard. For 
> whatever reason the cluster appears to be active but each individual replica 
> is down. And when a request comes in, Solr (replica 1) tries to find a remote 
> node (replica 2) to handle the request since the local core (replica 1) is 
> down and when the other node (replica 2) receives the request it does the 
> same to forward the request back to the original node (replica 1). This 
> causes deadlock and eventually uses all the socket causing 
> `{color:#FF}Too many open files{color}`.
> Not sure what the purpose of finding an inactive node to handle request in 
> HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem






[jira] [Updated] (SOLR-13765) Deadlock on Solr cloud request

2019-09-16 Thread Lei Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wu updated SOLR-13765:
--
Description: 
Hi there,

We are seeing an issue about Deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket cause `Too many open files`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem

  was:
Hi there,

We are seeing an issue about Deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket cause `
Too many open files
`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem


> Deadlock on Solr cloud request
> --
>
> Key: SOLR-13765
> URL: https://issues.apache.org/jira/browse/SOLR-13765
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Lei Wu
>Priority: Major
>
> Hi there,
> We are seeing an issue about Deadlock on Solr cloud request. 
> Say we have a collection with one shard and two replicas for that shard. For 
> whatever reason the cluster appears to be active but each individual replica 
> is down. And when a request comes in, Solr (replica 1) tries to find a remote 
> node (replica 2) to handle the request since the local core (replica 1) is 
> down and when the other node (replica 2) receives the request it does the 
> same to forward the request back to the original node (replica 1). This 
> causes deadlock and eventually uses all the socket cause `Too many open 
> files`.
> Not sure what the purpose of finding an inactive node to handle request in 
> HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem






[jira] [Updated] (SOLR-13765) Deadlock on Solr cloud request

2019-09-16 Thread Lei Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wu updated SOLR-13765:
--
Description: 
Hi there,

We are seeing an issue about deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket cause `Too many open files`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem

  was:
Hi there,

We are seeing an issue about Deadlock on Solr cloud request. 

Say we have a collection with one shard and two replicas for that shard. For 
whatever reason the cluster appears to be active but each individual replica is 
down. And when a request comes in, Solr (replica 1) tries to find a remote node 
(replica 2) to handle the request since the local core (replica 1) is down and 
when the other node (replica 2) receives the request it does the same to 
forward the request back to the original node (replica 1). This causes deadlock 
and eventually uses all the socket cause `Too many open files`.

Not sure what the purpose of finding an inactive node to handle request in 
HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem


> Deadlock on Solr cloud request
> --
>
> Key: SOLR-13765
> URL: https://issues.apache.org/jira/browse/SOLR-13765
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Lei Wu
>Priority: Major
>
> Hi there,
> We are seeing an issue about deadlock on Solr cloud request. 
> Say we have a collection with one shard and two replicas for that shard. For 
> whatever reason the cluster appears to be active but each individual replica 
> is down. And when a request comes in, Solr (replica 1) tries to find a remote 
> node (replica 2) to handle the request since the local core (replica 1) is 
> down and when the other node (replica 2) receives the request it does the 
> same to forward the request back to the original node (replica 1). This 
> causes deadlock and eventually uses all the socket cause `Too many open 
> files`.
> Not sure what the purpose of finding an inactive node to handle request in 
> HttpSolrCall.getRemoteCoreUrl but taking that out seems to fix the problem






[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API

2019-09-16 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13764:

Description: 
h2. Context

Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy man's 
Spans/Phrases. Note: It's not about ranges nor facets.
h2. Problem

There's no way to search by IntervalQuery via JSON Query DSL.
h2. Suggestion
 * Create a classic QParser {{ {!interval df=text_content}a_json_param }}, i.e. one can combine a few such refs in {{json.query.bool}}
 * It accepts just the name of a JSON param; nothing like this exists yet.
 * This param carries plain JSON which is accessible via {{req.getJSON()}}

{code}
{
  query: {bool:{should:[
    {interval:i_1},
    {interval:{query:i_2, df:title}}
  ]}},
  params:{
    df: description_t,
    i_1:{phrase:"lorem ipsum"},
    i_2:{ unordered: [{term:"bar"},{phrase:"bag ban"}]}
  }
}
{code}
h2. Challenges
 * I have no idea about a particular JSON DSL for these queries; the Lucene API seems easily JSON-able. Proposals are welcome.
 * Another awkward thing is combining analysis and the low-level query API, e.g. what if one requests a term for one word and analysis yields two tokens; and vice versa, requesting a phrase might end up with a single token stream.
 * Putting json into Jira ticket description

h2. Q: Why don't..

.. put the intervals DSL right into {{json.query}}, avoiding these odd param refs? 
 A: It requires heavy lifting in {{JsonQueryConverter}}, which is streamlined for handling good old HTTP parameterized queries.

  was:
h2. Context

Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy man's 
Spans/Phrases. Note: It's not about ranges nor facets.
h2. Problem

There's no way to search by IntervalQuery via JSON Query DSL.
h2. Suggestion
 * Create classic QParser {{{!interval df=text_content}a_json_param}}, ie one 
can combine a few such refs in {{json.query.bool}}
 * It accepts just a name of JSON params, nothing like this happens yet.
 * This param carries plain json which is accessible via {{req.getJSON()}}

{{{}}
{{  query: {bool:{should:[}}
{{    \{interval:i_1},}}
{{    \{interval:{query:i_2, df:title}}}
{{  }]}},}}
{{  params:{}}
{{    df: description_t,}}
{{    i_1:\{phrase:"lorem ipsum"},}}
{{    i_2:\{ unordered: [{term:"bar"},\{phrase:"bag ban"}]}}}
{{  }}}
{{}}}
h2. Challenges
 * I have no idea about particular JSON DSL for these queries, Lucene API seems 
like easy JSON-able. Proposals are welcome.
 * Another awkward things is combining analysis and low level query API. eg 
what if one request term for one word and analysis yield two tokens, and vice 
versa requesting phrase might end up with single token stream.

h2. Q: Why don't..

.. put intervals DSL right into {{json.query}}, avoiding these odd param refs? 
 A: It requires heavy lifting for {{JsonQueryConverter}} which is streamlined 
for handling old good http parametrized queires.


> Parse Interval Query from JSON API
> --
>
> Key: SOLR-13764
> URL: https://issues.apache.org/jira/browse/SOLR-13764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>
> h2. Context
> Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy 
> man's Spans/Phrases. Note: It's not about ranges nor facets.
> h2. Problem
> There's no way to search by IntervalQuery via JSON Query DSL.
> h2. Suggestion
>  * Create a classic QParser {{ {!interval df=text_content}a_json_param }}, i.e. one can combine a few such refs in {{json.query.bool}}
>  * It accepts just the name of a JSON param; nothing like this exists yet.
>  * This param carries plain JSON which is accessible via {{req.getJSON()}}
> {code}
> {
>   query: {bool:{should:[
>     {interval:i_1},
>     {interval:{query:i_2, df:title}}
>   ]}},
>   params:{
>     df: description_t,
>     i_1:{phrase:"lorem ipsum"},
>     i_2:{ unordered: [{term:"bar"},{phrase:"bag ban"}]}
>   }
> }
> {code}
> h2. Challenges
>  * I have no idea about a particular JSON DSL for these queries; the Lucene API seems easily JSON-able. Proposals are welcome.
>  * Another awkward thing is combining analysis and the low-level query API, e.g. what if one requests a term for one word and analysis yields two tokens; and vice versa, requesting a phrase might end up with a single token stream.
>  * Putting json into Jira ticket description
> h2. Q: Why don't..
> .. put the intervals DSL right into {{json.query}}, avoiding these odd param refs? 
>  A: It requires heavy lifting in {{JsonQueryConverter}}, which is streamlined 
> for handling good old HTTP parameterized queries.




[jira] [Created] (SOLR-13764) Parse Interval Query from JSON API

2019-09-16 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created SOLR-13764:
---

 Summary: Parse Interval Query from JSON API
 Key: SOLR-13764
 URL: https://issues.apache.org/jira/browse/SOLR-13764
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: query parsers
Reporter: Mikhail Khludnev


h2. Context

Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy man's 
Spans/Phrases. Note: It's not about ranges nor facets.
h2. Problem

There's no way to search by IntervalQuery via JSON Query DSL.
h2. Suggestion
 * Create a classic QParser {{ {!interval df=text_content}a_json_param }}, i.e. one can combine a few such refs in {{json.query.bool}}
 * It accepts just the name of a JSON param; nothing like this exists yet.
 * This param carries plain JSON which is accessible via {{req.getJSON()}}

{code}
{
  query: {bool:{should:[
    {interval:i_1},
    {interval:{query:i_2, df:title}}
  ]}},
  params:{
    df: description_t,
    i_1:{phrase:"lorem ipsum"},
    i_2:{ unordered: [{term:"bar"},{phrase:"bag ban"}]}
  }
}
{code}
h2. Challenges
 * I have no idea about a particular JSON DSL for these queries; the Lucene API seems easily JSON-able. Proposals are welcome.
 * Another awkward thing is combining analysis and the low-level query API, e.g. what if one requests a term for one word and analysis yields two tokens; and vice versa, requesting a phrase might end up with a single token stream.

h2. Q: Why don't..

.. put the intervals DSL right into {{json.query}}, avoiding these odd param refs? 
 A: It requires heavy lifting in {{JsonQueryConverter}}, which is streamlined 
for handling good old HTTP parameterized queries.






[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-16 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930799#comment-16930799
 ] 

Ishan Chattopadhyaya commented on SOLR-13661:
-

Based on feedback from Activate conference attendees and some internal design 
discussions, here's what we plan to do:
# Support multiple jars in the same package
# Packages should support verify operations after installation/update
# Put package metadata (like setup commands etc.) into the blob store


> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
>  Labels: package
>
> Solr needs a unified cohesive package management system so that users can 
> deploy/redeploy plugins in a safe manner. This is an umbrella issue to 
> eventually build that solution






[jira] [Commented] (SOLR-13762) Support binary values when using XMLCodec

2019-09-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930791#comment-16930791
 ] 

Jason Gerlowski commented on SOLR-13762:


Dumb question: do you know which field types this all affects? Or are there a 
few fields where this is definitely an issue? I'm asking so I can try each of 
them out and make sure we have tests where it makes sense. But I'm also 
looking for an easy way to reproduce, so if you've got a concrete way to 
trigger this, feel free to post a script or something as an attachment.

I'd like to review this, but it'll take me some time to read up on the Jira 
history and code involved. I'm aiming to get some feedback out by next 
weekend. If anyone else with more context wants to jump in, please do.

> Support binary values when using XMLCodec
> -
>
> Key: SOLR-13762
> URL: https://issues.apache.org/jira/browse/SOLR-13762
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers, Response Writers, Server, SolrJ, 
> UpdateRequestProcessors
>Affects Versions: master (9.0), 8.3
>Reporter: Thomas Wöckinger
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As Solr can handle binary fields, it should be possible to use the XML codec 
> to encode and decode them.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930765#comment-16930765
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 35fe0be42c8260f817adee5a2e196e9f41677e42 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35fe0be ]

SOLR-13105: More transform docs 3


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930760#comment-16930760
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit f44940b271a39e077be2e1f11e3f7832d44c7f7e in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f44940b ]

SOLR-13105: More transform docs 2


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (LUCENE-8972) CharFilter version of ICUTransformFilter, to better support dictionary-based tokenization

2019-09-16 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930759#comment-16930759
 ] 

Robert Muir commented on LUCENE-8972:
-

Yes, this would be another thing, good one for tests. But the whole idea is 
sound, I think you should be able to make it work!

> CharFilter version of ICUTransformFilter, to better support dictionary-based 
> tokenization
> -
>
> Key: LUCENE-8972
> URL: https://issues.apache.org/jira/browse/LUCENE-8972
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: master (9.0), 8.2
>Reporter: Michael Gibney
>Priority: Minor
>
> The ICU Transliteration API is currently exposed through Lucene only 
> post-tokenizer, via ICUTransformFilter. Some tokenizers (particularly 
> dictionary-based) may assume pre-normalized input (e.g., for Chinese 
> characters, there may be an assumption of traditional-only or simplified-only 
> input characters, at the level of either all input, or 
> per-dictionary-defined-token).
> The potential usefulness of a CharFilter that exposes the ICU Transliteration 
> API was suggested in a [thread on the Solr mailing 
> list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201807.mbox/%3C4DAB7BA7-42A8-4009-8B49-60822B00DE7D%40wunderwood.org%3E],
>  and my hope is that this issue can facilitate more detailed discussion of 
> the proposed addition.
> A concrete example of mixed traditional/simplified characters that are 
> currently tokenized differently by the ICUTokenizer are:
>  * 红楼梦 (SSS)
>  * 紅樓夢 (TTT)
>  * 紅楼夢 (TST)
> The first two tokens (simplified-only and traditional-only, respectively) are 
> included in the [CJ dictionary that backs 
> ICUTokenizer|https://raw.githubusercontent.com/unicode-org/icu/release-62-1/icu4c/source/data/brkitr/dictionaries/cjdict.txt],
>  but the last (a mixture of traditional and simplified characters) is not, 
> and is not recognized as a token. Even _if_ we assume this to be an 
> intentional omission from the dictionary that results in behavior that could 
> be desirable for some use cases, there are surely some use cases that would 
> benefit from a more permissive dictionary-based tokenization strategy (such 
> as could be supported by pre-tokenizer transliteration).
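
(A hedged illustration of what pre-tokenizer transliteration buys here, using plain ICU4J rather than the proposed CharFilter, which does not exist yet. "Traditional-Simplified" is a standard ICU transform ID; the example strings are the ones from the description above.)

{code:java}
import com.ibm.icu.text.Transliterator;

public class PreTokenizerTransformSketch {
  public static void main(String[] args) {
    // The same Transliterator API that ICUTransformFilter already wraps,
    // applied to whole input strings before any tokenization happens.
    Transliterator t = Transliterator.getInstance("Traditional-Simplified");

    System.out.println(t.transliterate("红楼梦")); // SSS: already simplified, unchanged
    System.out.println(t.transliterate("紅樓夢")); // TTT: mapped to 红楼梦
    System.out.println(t.transliterate("紅楼夢")); // TST: mapped to 红楼梦, now in the dictionary
  }
}
{code}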






[jira] [Comment Edited] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930735#comment-16930735
 ] 

Munendra S N edited comment on SOLR-13725 at 9/16/19 5:47 PM:
--

 [^SOLR-13725.patch] 
[~gerlowskija]
I have removed the check from {{setLimit}}. I checked {{setMinCount}}; SolrJ 
doesn't allow mincount to be 0.
Currently, Solr doesn't support mincount=0 for numeric fieldtypes in terms 
facets, but it is supported for other types. Shouldn't we just allow setting 
mincount to 0 and handle this case at the server (Solr already throws an 
error on mincount=0 for numeric types)?


was (Author: munendrasn):
 [^SOLR-13725.patch] 
[~gerlowskija]
I have removed check from {{setLimit}}. I checked {{setMinCount}}, solrJ 
doesn't allow mincount to be 0.
Currently, solr doesn't support mincount=0 for numeric fieldtypes in terms 
facet but, for other types it is supported. Shouldn't we just allow setting 
mincount to 0 and handle this case at Server(Solr already throws error on 
mincount=0 for numeric types)

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: SOLR-13725.patch, SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.
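
(A small SolrJ sketch of the behavior the patch enables. Hedged: the field and facet names are made up; before the patch, the {{setLimit(-1)}} call below threw IllegalArgumentException.)

{code:java}
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;
import org.apache.solr.client.solrj.request.json.TermsFacetMap;

public class UnlimitedTermsFacet {
  public static void main(String[] args) {
    // -1 is passed through to Solr, which interprets it as "no limit".
    TermsFacetMap categories = new TermsFacetMap("category").setLimit(-1);

    JsonQueryRequest request = new JsonQueryRequest()
        .setQuery("*:*")
        .withFacet("categories", categories);
    // request.process(solrClient, "techproducts") would run the query.
  }
}
{code}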






[jira] [Updated] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13725:

Attachment: (was: SOLR-13725.patch)

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[jira] [Updated] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13725:

Attachment: SOLR-13725.patch

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[jira] [Updated] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13725:

Status: Open  (was: Open)

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[jira] [Assigned] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N reassigned SOLR-13725:
---

Assignee: Munendra S N

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Assignee: Munendra S N
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[jira] [Commented] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930735#comment-16930735
 ] 

Munendra S N commented on SOLR-13725:
-

 [^SOLR-13725.patch] 
[~gerlowskija]
I have removed the check from {{setLimit}}. I also checked {{setMinCount}}; SolrJ 
doesn't allow mincount to be 0.
Currently, Solr doesn't support mincount=0 for numeric field types in the terms 
facet, but it is supported for other types. Shouldn't we just allow setting 
mincount to 0 and handle this case at the server? (Solr already throws an error on 
mincount=0 for numeric types.)
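
For reference, a minimal sketch of the relaxed setter, assuming the fix is simply 
dropping the check (the attached patch is authoritative):
{code:java}
// Hypothetical sketch; see SOLR-13725.patch for the actual change.
public TermsFacetMap setLimit(int maximumBuckets) {
  // No non-negativity check: Solr interprets a negative limit as "no limit".
  put("limit", maximumBuckets);
  return this;
}
{code}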

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[GitHub] [lucene-solr] janhoy commented on a change in pull request #860: SOLR-13734 JWTAuthPlugin to support multiple issuers

2019-09-16 Thread GitBox
janhoy commented on a change in pull request #860: SOLR-13734 JWTAuthPlugin to 
support multiple issuers
URL: https://github.com/apache/lucene-solr/pull/860#discussion_r324801451
 
 

 ##
 File path: solr/solr-ref-guide/src/major-changes-in-solr-9.adoc
 ##
 @@ -44,3 +44,5 @@ A thorough review of the list in Major Changes in Earlier 
8.x Versions as well a
 === Authentication & Security Changes in Solr 9
 
 * BasicAuthPlugin property 'blockUnknown' now defaults to 'true'. This change 
is backward incompatible. If you need the pre-9.0 default behavior, you need to 
explicitly set 'blockUnknown':'false' in security.json.
+
+* JWTAuthPlugin configuration option `requireSub` is no longer needed and will 
cause an error if used.
 
 Review comment:
   Note to self: This is not part of this PR, must remember to remove 
requireSub totally in master after merge to branch_8x





[jira] [Updated] (SOLR-13725) TermsFacetMap.setLimit() unnecessarily rejects negative parameter value

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13725:

Attachment: SOLR-13725.patch

> TermsFacetMap.setLimit() unnecessarily rejects negative parameter value
> ---
>
> Key: SOLR-13725
> URL: https://issues.apache.org/jira/browse/SOLR-13725
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.2
>Reporter: Richard Walker
>Priority: Trivial
> Attachments: SOLR-13725.patch
>
>
> SolrJ's {{TermsFacetMap.setLimit(int maximumBuckets)}} rejects a negative 
> parameter value with an IllegalArgumentException "Parameter 'maximumBuckets' 
> must be non-negative".
> But a negative value for the limit parameter is accepted by Solr server, and 
> is meaningful: i.e., it means "no limit".
> The {{setLimit()}} method shouldn't reject a negative parameter value.






[GitHub] [lucene-solr] janhoy commented on issue #860: SOLR-13734 JWTAuthPlugin to support multiple issuers

2019-09-16 Thread GitBox
janhoy commented on issue #860: SOLR-13734 JWTAuthPlugin to support multiple 
issuers
URL: https://github.com/apache/lucene-solr/pull/860#issuecomment-531881327
 
 
   > When JIRAs have github PR's what's the appropriate place for higher-level, 
non-line-specific review comments? Is there a consensus on this? Is one more 
discoverable than the other?
   
   Good question. I'd hope we could move totally to GitHub issues+PRs and scrap 
JIRA. But for now my thought is that if a big PR will attract many 
comments in GitHub, it is best to keep general comments in the PR as 
well, to keep everything in one place, and then update the JIRA once in a while with 
general progress, e.g. "planning to merge in 3 days", to attract more attention.
   
   Thanks for your review. See JIRA for comments :) 
   
   > Would it be possible to deprecate iss, wellKnownUrl etc outside of the 
issuers hash in this PR?
   
   I'll wait until both the "REST API issuers support" and "Admin UI choose 
issuer to log in with" are solved, so deprecation will not happen in this PR.





[jira] [Commented] (SOLR-13734) JWTAuthPlugin to support multiple issuers

2019-09-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930727#comment-16930727
 ] 

Jan Høydahl commented on SOLR-13734:


{quote}What's the purpose of the distinction between the "primary" issuer and 
the secondary issuers under the {{issuers}} key? I imagine the "primary" issuer 
is just kept around for back-compat purposes?
{quote}
It's a back-compat solution for 8.x for sure. But also, since there is no 
REST API support for adding to the issuers array, we cannot yet deprecate it. 
Another reason is that the Admin UI login is not written to choose between 
multiple IdPs (could be done in a follow-up issue), so the Admin UI 
will always use the first (primary) issuer. Once those two features are 
complete, we could deprecate the top-level keys in 9.x.
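
For illustration, the kind of security.json structure under discussion might look 
like the sketch below; the key names come from this thread, but the exact schema 
is an assumption, not the final API:
{code:java}
{
  "authentication": {
    "class": "solr.JWTAuthPlugin",
    "wellKnownUrl": "https://primary-idp.example.com/.well-known/openid-configuration",
    "issuers": [
      {
        "name": "corpIdp2",
        "wellKnownUrl": "https://idp2.example.com/.well-known/openid-configuration",
        "iss": "https://idp2.example.com"
      }
    ]
  }
}
{code}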

> JWTAuthPlugin to support multiple issuers
> -
>
> Key: SOLR-13734
> URL: https://issues.apache.org/jira/browse/SOLR-13734
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: security
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: JWT, authentication, pull-request-available
> Fix For: 8.3
>
> Attachments: jwt-authentication-plugin.html
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In some large enterprise environments, there is more than one [Identity 
> Provider|https://en.wikipedia.org/wiki/Identity_provider] to issue tokens for 
> users. The equivalent example from the public internet is logging in to a 
> website and choosing between multiple pre-defined IdPs (such as Google, GitHub, 
> Facebook etc) in the OAuth2/OIDC flow.
> In the enterprise the IdPs could be public ones but most likely they will be 
> private IdPs in various networks inside the enterprise. Users will interact 
> with a search application, e.g. one providing enterprise wide search, and 
> will authenticate with one out of several IdPs depending on their local 
> affiliation. The search app will then request an access token (JWT) for the 
> user and issue requests to Solr using that token.
> The JWT plugin currently supports exactly one IdP. This JIRA will extend 
> support for multiple IdPs for access token validation only. To limit the 
> scope of this Jira, Admin UI login must still happen to the "primary" IdP. 
> Supporting multiple IdPs for Admin UI login can be done in followup issues.






[GitHub] [lucene-solr] msokolov opened a new pull request #885: LUCENE-8981: update Kuromoji javadocs, adding experimental tags to Di…

2019-09-16 Thread GitBox
msokolov opened a new pull request #885: LUCENE-8981: update Kuromoji javadocs, 
adding experimental tags to Di…
URL: https://github.com/apache/lucene-solr/pull/885
 
 
   …ctionaryBuilder and JapaneseTokenizer ctor





[jira] [Created] (LUCENE-8981) Update javadocs to reflect experimental status of Kuromoji DictionaryBuilder

2019-09-16 Thread Mike Sokolov (Jira)
Mike Sokolov created LUCENE-8981:


 Summary: Update javadocs to reflect experimental status of 
Kuromoji DictionaryBuilder
 Key: LUCENE-8981
 URL: https://issues.apache.org/jira/browse/LUCENE-8981
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Mike Sokolov


This is a follow-up to LUCENE-8971






[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-16 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930715#comment-16930715
 ] 

Munendra S N commented on SOLR-13272:
-

 [^SOLR-13272.patch] 
I have renamed {{intervals}} to {{ranges}}
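
Assuming the change is only the key rename, the request example from the issue 
description would become something like this (assumed syntax; the attached patch 
is authoritative):
{code:java}
json.facet={pubyear:{type : interval, field : pubyear_i,
    ranges:[{key:"2000-2200", value:"[2000,2200]"}]}}
{code}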

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classic facet component but has no support 
> in JSON facet requests.
>  In cases of block join and aggregations, this would be helpful.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Updated] (SOLR-13272) Interval facet support for JSON faceting

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13272:

Status: Patch Available  (was: Open)

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classic facet component but has no support 
> in JSON facet requests.
>  In cases of block join and aggregations, this would be helpful.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[GitHub] [lucene-solr] msokolov commented on issue #862: LUCENE-8971: Enable constructing JapaneseTokenizer with custom dictio…

2019-09-16 Thread GitBox
msokolov commented on issue #862: LUCENE-8971: Enable constructing 
JapaneseTokenizer with custom dictio…
URL: https://github.com/apache/lucene-solr/pull/862#issuecomment-531872041
 
 
   Yes, that makes sense. I'll post a new CR soon with updated javadocs





[jira] [Updated] (SOLR-13272) Interval facet support for JSON faceting

2019-09-16 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13272:

Attachment: SOLR-13272.patch

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classic facet component but has no support 
> in JSON facet requests.
>  In cases of block join and aggregations, this would be helpful.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Resolved] (SOLR-13159) Autoscaling not distributing collection evenly

2019-09-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved SOLR-13159.
--
Fix Version/s: 8.3
   Resolution: Fixed

Added a note to the RefGuide. Thanks Gus for investigating this!

> Autoscaling not distributing collection evenly
> --
>
> Key: SOLR-13159
> URL: https://issues.apache.org/jira/browse/SOLR-13159
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 8.0
>Reporter: Gus Heck
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13159.patch, autoscaling.json, clstat.json
>
>
> I recently ran into a very strange behavior described in detail in the mail 
> linked at the bottom of this description. In short: 
>  # Default settings didn't distribute nodes evenly on a brand-new 50-node 
> cluster.
>  # Can't seem to write rules producing suggestions to distribute them evenly. 
>  # Suggestions are made that then fail despite a quiet cluster with no changes.
> Also of note was diagnostic output containing this seemingly impossible 
> result with 2 cores counted and no replicas listed:
> {code:java}
> {
> "node": "solr-2.customer.redacted.com:8983_solr",
> "isLive": true,
> "cores": 2,
> "freedisk": 140.03918838500977,
> "totaldisk": 147.5209503173828,
> "replicas": {}
> },{code}
> I will attach anonymized cluster status output and autoscaling.json shortly 
> This issue may be related to SOLR-13142
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201901.mbox/%3CCAEUNc48HRZA7qo-uKtJQEtZnO9VG9OErQZGzoOmCTBe7C9zvNw%40mail.gmail.com%3E
>  









[jira] [Commented] (SOLR-13159) Autoscaling not distributing collection evenly

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930692#comment-16930692
 ] 

ASF subversion and git services commented on SOLR-13159:


Commit d3671fd0d2bfda4ded0b905be5955b5ccfafff79 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d3671fd ]

SOLR-13159: Add a warning about DNS resolution in SolrCloud clusters.


> Autoscaling not distributing collection evenly
> --
>
> Key: SOLR-13159
> URL: https://issues.apache.org/jira/browse/SOLR-13159
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 8.0
>Reporter: Gus Heck
>Assignee: Andrzej Bialecki 
>Priority: Major
> Attachments: SOLR-13159.patch, autoscaling.json, clstat.json
>
>
> I recently ran into a very strange behavior described in detail in the mail 
> linked at the bottom of this description. In short: 
>  # Default settings didn't distribute nodes evenly on a brand-new 50-node 
> cluster.
>  # Can't seem to write rules producing suggestions to distribute them evenly. 
>  # Suggestions are made that then fail despite a quiet cluster with no changes.
> Also of note was diagnostic output containing this seemingly impossible 
> result with 2 cores counted and no replicas listed:
> {code:java}
> {
> "node": "solr-2.customer.redacted.com:8983_solr",
> "isLive": true,
> "cores": 2,
> "freedisk": 140.03918838500977,
> "totaldisk": 147.5209503173828,
> "replicas": {}
> },{code}
> I will attach anonymized cluster status output and autoscaling.json shortly 
> This issue may be related to SOLR-13142
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201901.mbox/%3CCAEUNc48HRZA7qo-uKtJQEtZnO9VG9OErQZGzoOmCTBe7C9zvNw%40mail.gmail.com%3E
>  






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930688#comment-16930688
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit cc208c835305fa103e35a08e1656b38356ef20fc in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cc208c8 ]

SOLR-13105: More transform docs


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930686#comment-16930686
 ] 

ASF subversion and git services commented on SOLR-9658:
---

Commit 2f701c6787f9f216e5065e7f7fa1e7ea01126e22 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2f701c6 ]

SOLR-9658: Max idle time support for SolrCache implementations.


> Caches should have an optional way to clean if idle for 'x' mins
> 
>
> Key: SOLR-9658
> URL: https://issues.apache.org/jira/browse/SOLR-9658
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch, 
> SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch
>
>
> If a cache is idle for long, it consumes precious memory. It should be 
> configurable to clear the cache if it was not accessed for 'x' secs. The 
> cache configuration can have an extra config {{maxIdleTime}}. If we wish it 
> to be cleaned after 10 mins of inactivity, set it to {{maxIdleTime=600}}. 
> [~dragonsinth] would it be a solution for the memory leak you mentioned?
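
For illustration, this is roughly how such a setting could appear as a cache 
attribute in solrconfig.xml (the attribute name follows the description above; 
treat the exact placement and syntax as an assumption, not the committed form):
{code:xml}
<!-- Hypothetical example: evict entries idle for more than 10 minutes -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"
             maxIdleTime="600"/>
{code}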






[jira] [Resolved] (SOLR-12075) TestLargeCluster is too flaky

2019-09-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved SOLR-12075.
--
Resolution: Fixed

Fixed as a part of SOLR-12923.

> TestLargeCluster is too flaky
> -
>
> Key: SOLR-12075
> URL: https://issues.apache.org/jira/browse/SOLR-12075
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Major
>
> This test is failing a lot in jenkins builds, with two types of failures:
>  * specific test method failures - this may be caused by either bugs in the 
> autoscaling code, bugs in the simulator or timing issues. It should be 
> possible to narrow down the cause by using different speeds of simulated time.
>  * suite-level failures due to leaked threads - most of these failures 
> indicate the ongoing Policy calculations, eg:
> {code}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from 
> SUITE scope at org.apache.solr.cloud.autoscaling.sim.TestLargeCluster: 
>   1) Thread[id=21406, name=AutoscalingActionExecutor-7277-thread-1, 
> state=RUNNABLE, group=TGRP-TestLargeCluster]
>at java.util.ArrayList.iterator(ArrayList.java:834)
>at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:131)
>at org.apache.solr.common.util.Utils.makeDeepCopy(Utils.java:110)
>at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:92)
>at org.apache.solr.common.util.Utils.makeDeepCopy(Utils.java:108)
>at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:92)
>at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:74)
>at org.apache.solr.client.solrj.cloud.autoscaling.Row.copy(Row.java:91)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.lambda$getMatrixCopy$1(Policy.java:297)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session$$Lambda$466/1757323495.apply(Unknown
>  Source)
>at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.getMatrixCopy(Policy.java:298)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.copy(Policy.java:287)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Row.removeReplica(Row.java:156)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:60)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:34)
>at 
> org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:129)
>at 
> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:98)
>at 
> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:307)
>at 
> org.apache.solr.cloud.autoscaling.ScheduledTriggers$$Lambda$439/951218654.run(Unknown
>  Source)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
>at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1677458082.run(Unknown
>  Source)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>at java.lang.Thread.run(Thread.java:748)
>   at __randomizedtesting.SeedInfo.seed([C6FA0364D13DAFCC]:0)
> {code}
> It's possible that somewhere an InterruptedException is caught and not 
> propagated so that the Policy calculations don't terminate when the thread is 
> interrupted when closing parent components.






[jira] [Updated] (SOLR-13742) Allow optional redaction of data saved by 'bin/solr autoscaling -save'

2019-09-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-13742:
-
Fix Version/s: 8.3

> Allow optional redaction of data saved by 'bin/solr autoscaling -save'
> --
>
> Key: SOLR-13742
> URL: https://issues.apache.org/jira/browse/SOLR-13742
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Fix For: 8.3
>
>
> Currently we can redact only the data that is printed out to the console at 
> the end of simulation. The tool should support also saving redacted data.






[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539

2019-09-16 Thread GitBox
thomaswoeckinger commented on issue #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-531841216
 
 
   @gerlowskija: would be great if we can get this into 8.3, which will start 
in about 2 weeks





[jira] [Commented] (LUCENE-8972) CharFilter version of ICUTransformFilter, to better support dictionary-based tokenization

2019-09-16 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930664#comment-16930664
 ] 

Michael Gibney commented on LUCENE-8972:


Thanks for the feedback/advice, [~rcmuir]. Along the same lines as what you 
mention, I think some attention also needs to be paid to the 
resolution/accuracy of offset correction. I'm going to take a crack at this and 
hope to have something shortly.

> CharFilter version of ICUTransformFilter, to better support dictionary-based 
> tokenization
> -
>
> Key: LUCENE-8972
> URL: https://issues.apache.org/jira/browse/LUCENE-8972
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: master (9.0), 8.2
>Reporter: Michael Gibney
>Priority: Minor
>
> The ICU Transliteration API is currently exposed through Lucene only 
> post-tokenizer, via ICUTransformFilter. Some tokenizers (particularly 
> dictionary-based) may assume pre-normalized input (e.g., for Chinese 
> characters, there may be an assumption of traditional-only or simplified-only 
> input characters, at the level of either all input, or 
> per-dictionary-defined-token).
> The potential usefulness of a CharFilter that exposes the ICU Transliteration 
> API was suggested in a [thread on the Solr mailing 
> list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201807.mbox/%3C4DAB7BA7-42A8-4009-8B49-60822B00DE7D%40wunderwood.org%3E],
>  and my hope is that this issue can facilitate more detailed discussion of 
> the proposed addition.
> A concrete example of mixed traditional/simplified characters that are 
> currently tokenized differently by the ICUTokenizer are:
>  * 红楼梦 (SSS)
>  * 紅樓夢 (TTT)
>  * 紅楼夢 (TST)
> The first two tokens (simplified-only and traditional-only, respectively) are 
> included in the [CJ dictionary that backs 
> ICUTokenizer|https://raw.githubusercontent.com/unicode-org/icu/release-62-1/icu4c/source/data/brkitr/dictionaries/cjdict.txt],
>  but the last (a mixture of traditional and simplified characters) is not, 
> and is not recognized as a token. Even _if_ we assume this to be an 
> intentional omission from the dictionary that results in behavior that could 
> be desirable for some use cases, there are surely some use cases that would 
> benefit from a more permissive dictionary-based tokenization strategy (such 
> as could be supported by pre-tokenizer transliteration).
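
To make the proposal concrete, here is a deliberately naive sketch of a 
pre-tokenizer transliteration CharFilter. The class is hypothetical: it buffers 
the whole input up front and does not correct offsets, which a real 
implementation would have to handle:
{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import com.ibm.icu.text.Transliterator;

import org.apache.lucene.analysis.CharFilter;

/**
 * Hypothetical sketch only. Buffers the entire input, transliterates it,
 * and serves the result. A real implementation must map offsets back to
 * the original text so that highlighting still works.
 */
public class NaiveICUTransformCharFilter extends CharFilter {
  private final Transliterator transliterator;
  private Reader transformed;

  public NaiveICUTransformCharFilter(Reader in, Transliterator transliterator) {
    super(in);
    this.transliterator = transliterator;
  }

  @Override
  public int read(char[] cbuf, int off, int len) throws IOException {
    if (transformed == null) {
      // Drain the wrapped reader and transliterate the whole text at once,
      // e.g. with Transliterator.getInstance("Traditional-Simplified").
      StringBuilder sb = new StringBuilder();
      char[] buf = new char[1024];
      for (int n = input.read(buf); n != -1; n = input.read(buf)) {
        sb.append(buf, 0, n);
      }
      transformed = new StringReader(transliterator.transliterate(sb.toString()));
    }
    return transformed.read(cbuf, off, len);
  }

  @Override
  protected int correct(int currentOff) {
    return currentOff; // identity mapping: offsets are NOT corrected here
  }
}
{code}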






[jira] [Created] (SOLR-13763) Improve the tracking of "freedisk" in autoscaling simulations

2019-09-16 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki  created SOLR-13763:


 Summary: Improve the tracking of "freedisk" in autoscaling 
simulations
 Key: SOLR-13763
 URL: https://issues.apache.org/jira/browse/SOLR-13763
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 8.3


The "freedisk" node metric is tracked closely when adding / removing / moving 
replicas but it's not tracked for simulated updates, even though the 
corresponding simulated replica sizes are.

This causes some inconsistencies in "freedisk" calculation and reporting, which 
may affect the results of simulations.






[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #665: Fixes SOLR-13539

2019-09-16 Thread GitBox
thomaswoeckinger commented on a change in pull request #665: Fixes SOLR-13539
URL: https://github.com/apache/lucene-solr/pull/665#discussion_r324729013
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/handler/loader/XMLLoader.java
 ##
 @@ -429,7 +434,18 @@ public SolrInputDocument readDoc(XMLStreamReader parser) 
throws XMLStreamExcepti
 break;
   } else if ("field".equals(parser.getLocalName())) {
 // should I warn in some text has been found too
-Object v = isNull ? null : text.toString();
+Object v;
 
 Review comment:
   > Ah, I think I see. Without this piece of the change, the new tests would 
fail because particular field types have binary data, and as-is we don't handle 
that correctly in SolrJ/EmbeddedSolrServer for wt=xml.
   > 
   Yes, of course!
   
   > But figuring out how Solr should handle binary field data when using 
`wt=xml` is probably worth its own jira, for a few reasons: (1) Other people in 
the community are likely to have opinions and want to chime in. (2) It also, to 
be honest about my limitations, strays into areas of the code I don't know as 
well. (3) It's independent conceptually from the problem we started out trying 
to solve here (adding some atomic update tests to cover what we currently 
support).
   > 
   Added new issue SOLR-13762 and PR #883 
   
   > I'd like to see it get fixed, and I can work with you if you open a JIRA 
specifically for the binary-xml stuff. I just don't think that fixing it here 
is the right approach. Can you remove the binary-xml related changes, and 
either comment out the added tests that will fail, or add a TODO to add them 
once the underlying XML issue is fixed. (If you file a separate JIRA and 
reference that in your TODO comment, others will have the context if they want 
it).
   
   I was thinking the same, so it was not much work anyway, and it is 
better separated now.
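
   For context, the shape of the change being split out is roughly the following 
sketch (purely illustrative: the `isBase64` flag is hypothetical, and the actual 
handling is worked out in SOLR-13762 / PR #883):
{code:java}
// Hypothetical sketch of reading a field value that may carry binary content.
Object v;
if (isNull) {
  v = null;
} else if (isBase64) {                 // assumed flag for base64-encoded content
  v = java.util.Base64.getDecoder().decode(text.toString());
} else {
  v = text.toString();
}
{code}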
   
   





[GitHub] [lucene-solr] gerlowskija edited a comment on issue #860: SOLR-13734 JWTAuthPlugin to support multiple issuers

2019-09-16 Thread GitBox
gerlowskija edited a comment on issue #860: SOLR-13734 JWTAuthPlugin to support 
multiple issuers
URL: https://github.com/apache/lucene-solr/pull/860#issuecomment-531812473
 
 
   Cross-posting this comment from JIRA.  (When JIRAs have github PR's what's 
the appropriate place for higher-level, non-line-specific review comments?  Is 
there a consensus on this?  Is one more discoverable than the other?)
   
   
   
   What's the purpose of the distinction between the "primary" issuer and the 
secondary issuers under the issuers key? I imagine the "primary" issuer is just 
kept around for back-compat purposes? If it's just for back-compat, I 
understand it needs to be done, but it's a shame. The JSON would be easier to 
understand (IMO) if all of the issuer-specific properties lived only under the 
new `issuers` key post-9.0. Would it be possible to deprecate iss, wellKnownUrl 
etc outside of the issuers hash in this PR? That'd make it easier to get to the 
cleaner alternative at some point down the road...
   
   Outside of that though, looks good to me.





[jira] [Commented] (SOLR-9658) Caches should have an optional way to clean if idle for 'x' mins

2019-09-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930630#comment-16930630
 ] 

ASF subversion and git services commented on SOLR-9658:
---

Commit e04917dc9f66bad9cebfc945cac7f39f5ff1f0c2 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e04917d ]

SOLR-9658: Max idle time support for SolrCache implementations.


> Caches should have an optional way to clean if idle for 'x' mins
> 
>
> Key: SOLR-9658
> URL: https://issues.apache.org/jira/browse/SOLR-9658
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch, 
> SOLR-9658.patch, SOLR-9658.patch, SOLR-9658.patch
>
>
> If a cache is idle for long, it consumes precious memory. It should be 
> configurable to clear the cache if it was not accessed for 'x' secs. The 
> cache configuration can have an extra config {{maxIdleTime}}. If we wish it 
> to be cleaned after 10 mins of inactivity, set it to {{maxIdleTime=600}}. 
> [~dragonsinth] would it be a solution for the memory leak you mentioned?






[GitHub] [lucene-solr] gerlowskija commented on issue #860: SOLR-13734 JWTAuthPlugin to support multiple issuers

2019-09-16 Thread GitBox
gerlowskija commented on issue #860: SOLR-13734 JWTAuthPlugin to support 
multiple issuers
URL: https://github.com/apache/lucene-solr/pull/860#issuecomment-531812473
 
 
   Cross-posting this comment from JIRA.  (When JIRAs have github PR's what's 
the appropriate place for higher-level, non-line-specific review comments?  Is 
there a consensus on this?  Is one more discoverable than the other?)
   
   
   
   What's the purpose of the distinction between the "primary" issuer and the 
secondary issuers under the issuers key? I imagine the "primary" issuer is just 
kept around for back-compat purposes? If it's just for back-compat, I 
understand it needs to be done, but it's a shame. The JSON would be easier to 
understand (IMO) if all of the issuer-specific properties lived only under the 
new `issuers` key post-9.0. Would it be possible to deprecate iss, wellKnownUrl 
etc outside of the issuers hash in this PR? That'd make it easier to get to the 
cleaner alternative at some point down the road...





[jira] [Resolved] (SOLR-13761) Cannot index documents into Solr 8.1.1

2019-09-16 Thread Cassandra Targett (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett resolved SOLR-13761.
--
Resolution: Invalid

You should take this problem to the solr-user mailing list before filing a 
Jira. We accept Jira issues for identified bugs or enhancements only - 
troubleshooting support is done only on the mailing lists: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc

The error is pointing to an incorrect field definition, likely involving 
docValues. There were many changes here between 6.6 and 8.1, so when you ask 
the list please include details about the field type definition and how you 
upgraded.

> Cannot index documents into Solr 8.1.1
> --
>
> Key: SOLR-13761
> URL: https://issues.apache.org/jira/browse/SOLR-13761
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: 8.1.1
> Environment: I tried to index only one document via Solrj and also 
> using the Solr Admin UI but got the same error all the time.
>Reporter: Bhuvaneshwar Venkatraman
>Priority: Major
>
> I created a Solr 8.1.1 cloud with ZooKeeper, similar to the Solr 6.6.2 cloud 
> currently in use. All configurations and schema files are exactly alike, but when I 
> try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
> from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
> options=DOCS* for a specific field of type *string*. It is a 
> required field so cannot be omitted. 
> For another collection in the same core (Solr 8.1.1), Solr throws *cannot 
> change docValues type from SORTED_NUMERIC to SORTED for field 
> "ANOTHER_FIELD_NAME"* for a field of type *string.*
> *Note:* The same documents index perfectly in the existing Solr, i.e., 6.6.2.






[jira] [Commented] (SOLR-13734) JWTAuthPlugin to support multiple issuers

2019-09-16 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930623#comment-16930623
 ] 

Jason Gerlowski commented on SOLR-13734:


What's the purpose of the distinction between the "primary" issuer and the 
secondary issuers under the {{issuers}} key?  I imagine the "primary" issuer is 
just kept around for back-compat purposes?  If it's just for back-compat, I 
understand it needs to be done, but it's a shame.  The JSON would be easier to 
understand (IMO) if everything lived under the new {{issuers}} key post-9.0.  
Would it be possible to mark {{iss}}, {{jwkUrl}} etc as deprecated outside of 
the {{issuers}} hash in this PR?  That'd make it easier to get to the cleaner 
alternative at some point down the road...

> JWTAuthPlugin to support multiple issuers
> -
>
> Key: SOLR-13734
> URL: https://issues.apache.org/jira/browse/SOLR-13734
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: security
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: JWT, authentication, pull-request-available
> Fix For: 8.3
>
> Attachments: jwt-authentication-plugin.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some large enterprise environments, there is more than one [Identity 
> Provider|https://en.wikipedia.org/wiki/Identity_provider] to issue tokens for 
> users. The equivalent example from the public internet is logging in to a 
> website and choosing between multiple pre-defined IdPs (such as Google, GitHub, 
> Facebook etc) in the OAuth2/OIDC flow.
> In the enterprise the IdPs could be public ones but most likely they will be 
> private IdPs in various networks inside the enterprise. Users will interact 
> with a search application, e.g. one providing enterprise wide search, and 
> will authenticate with one out of several IdPs depending on their local 
> affiliation. The search app will then request an access token (JWT) for the 
> user and issue requests to Solr using that token.
> The JWT plugin currently supports exactly one IdP. This JIRA will extend 
> support for multiple IdPs for access token validation only. To limit the 
> scope of this Jira, Admin UI login must still happen to the "primary" IdP. 
> Supporting multiple IdPs for Admin UI login can be done in followup issues.






[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-09-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930608#comment-16930608
 ] 

Bruno Roustant edited comment on LUCENE-8920 at 9/16/19 2:29 PM:
-

Open-addressing benchmark to store byte labels in an array of size between 8 
and 256.

"Array" means array size.

"tries" means max number of slot comparisons allowed (when a slot we want to 
put a label to is not empty - move to the next slot).

"loadFactor" = max load factor (num labels / array size) for which we can add 
all the labels by respecting "tries". Above this max load factor, the 
open-addressing fails to store all labels and falls back to binary search. This 
max load factor is an average of the max load factor for 1000 random 
constructions done by the benchmark.

"memory" = space increase factor = 1 / loadFactor.

 

Array=8, tries=1, loadFactor=0.400125, memory x 2.499219
 Array=8, tries=2, loadFactor=0.63825, memory x 1.5667841
 Array=8, tries=3, loadFactor=0.78675, memory x 1.2710518
 Array=12, tries=1, loadFactor=0.34391674, memory x 2.9076805
 Array=12, tries=2, loadFactor=0.54183316, memory x 1.8455865
 Array=12, tries=3, loadFactor=0.6964172, memory x 1.4359208
 Array=16, tries=1, loadFactor=0.29825, memory x 3.352892
 Array=16, tries=2, loadFactor=0.5039375, memory x 1.9843731
 Array=16, tries=3, loadFactor=0.646625, memory x 1.5464914
 Array=16, tries=4, loadFactor=0.7493125, memory x 1.3345567
 Array=24, tries=1, loadFactor=0.25141662, memory x 3.9774618
 Array=24, tries=2, loadFactor=0.44929162, memory x 2.225726
 Array=24, tries=3, loadFactor=0.58083373, memory x 1.7216631
 Array=24, tries=4, loadFactor=0.6867924, memory x 1.4560441
 Array=32, tries=1, loadFactor=0.22528125, memory x 4.4388957
 Array=32, tries=2, loadFactor=0.4059375, memory x 2.4634335
 Array=32, tries=3, loadFactor=0.53625, memory x 1.8648019
 Array=32, tries=4, loadFactor=0.63653123, memory x 1.5710148
 Array=32, tries=5, loadFactor=0.7165625, memory x 1.3955517
 Array=48, tries=1, loadFactor=0.1885, memory x 5.2910066
 Array=48, tries=2, loadFactor=0.36193773, memory x 2.762906
 Array=48, tries=3, loadFactor=0.49154142, memory x 2.0344167
 Array=48, tries=4, loadFactor=0.5871248, memory x 1.7032154
 Array=48, tries=5, loadFactor=0.6700212, memory x 1.4924902
 Array=64, tries=1, loadFactor=0.17101562, memory x 5.8474193
 Array=64, tries=2, loadFactor=0.33707812, memory x 2.9666712
 Array=64, tries=3, loadFactor=0.46217188, memory x 2.1636972
 Array=64, tries=4, loadFactor=0.56409377, memory x 1.7727549
 Array=64, tries=5, loadFactor=0.6392813, memory x 1.5642567
 Array=64, tries=6, loadFactor=0.6963125, memory x 1.4361368
 Array=96, tries=1, loadFactor=0.15294789, memory x 6.5381746
 Array=96, tries=2, loadFactor=0.31204194, memory x 3.2046974
 Array=96, tries=3, loadFactor=0.42829168, memory x 2.3348575
 Array=96, tries=4, loadFactor=0.5277291, memory x 1.8949116
 Array=96, tries=5, loadFactor=0.60920805, memory x 1.6414753
 Array=96, tries=6, loadFactor=0.66278124, memory x 1.5087935
 Array=128, tries=1, loadFactor=0.15130469, memory x 6.6091805
 Array=128, tries=2, loadFactor=0.3054219, memory x 3.2741597
 Array=128, tries=3, loadFactor=0.43253124, memory x 2.3119717
 Array=128, tries=4, loadFactor=0.5283203, memory x 1.8927912
 Array=128, tries=5, loadFactor=0.6043203, memory x 1.6547517
 Array=128, tries=6, loadFactor=0.66267186, memory x 1.5090425
 Array=128, tries=7, loadFactor=0.70567185, memory x 1.4170892
 Array=192, tries=1, loadFactor=0.14355215, memory x 6.9661093
 Array=192, tries=2, loadFactor=0.32579714, memory x 3.0693946
 Array=192, tries=3, loadFactor=0.42888057, memory x 2.3316514
 Array=192, tries=4, loadFactor=0.5238486, memory x 1.9089485
 Array=192, tries=5, loadFactor=0.59191173, memory x 1.6894411
 Array=192, tries=6, loadFactor=0.65411395, memory x 1.5287856
 Array=192, tries=7, loadFactor=0.6975258, memory x 1.4336387
 Array=256, tries=1, loadFactor=1.0, memory x 1.0
 Array=256, tries=2, loadFactor=1.0, memory x 1.0
 Array=256, tries=3, loadFactor=1.0, memory x 1.0
 Array=256, tries=4, loadFactor=1.0, memory x 1.0
 Array=256, tries=5, loadFactor=1.0, memory x 1.0
 Array=256, tries=6, loadFactor=1.0, memory x 1.0
 Array=256, tries=7, loadFactor=1.0, memory x 1.0
 Array=256, tries=8, loadFactor=1.0, memory x 1.0


was (Author: bruno.roustant):
Open-addressing benchmark to store byte labels in an array of size between 8 
and 256.

"Array" means array size.

"tries" means max number of slot comparisons allowed (when a slot we want to 
put a label to is not empty - move to the next slot).

"loadFactor" = max load factor (num labels / array size) for which we can add 
all the labels by respecting "tries". Above this max load factor, the 
open-addressing fails to store all labels and falls back to binary search. This 
max load factor is an average of 1000 random constructions done by the benchmark.

[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-09-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930601#comment-16930601
 ] 

Bruno Roustant edited comment on LUCENE-8920 at 9/16/19 2:26 PM:
-

I tried a quick benchmark to evaluate the load factor (num labels / array size) 
we can expect for open-addressing. This gives us a good estimate of the space 
requirement.

This load factor depends on the number of tries (called L above) we accept - 
this defines the worst case perf. For each power-of-two array size, the 
benchmark attempts various L, and outputs the max load factor (above this load 
factor, the open-addressing aborts and we fallback to binary search) (benchmark 
print in next post below).

Outcome:
 * Load factor slightly above 50% for good perf (for L = log(N)/2 + 1), as 
expected for open-addressing in literature.
 * It is also possible to use a power-of-two x 1.5 array size with efficient 
hash (to stay at load factor 50% in all cases).
 * We have to encode the “empty slot” (no label in a given array slot). 
Probably with both a 0 label and 0 value (if the node needs that, then abort 
and fallback to binary search).

This means we have to expect a space increase of 2x (compared to binary 
search), for better perf than binary search (L = log(N)/2 + 1, which is the 
worst case, most of the time the open-addressing stops when encountering an 
empty slot before L).

 

To me this balance between space and performance cannot be hardcoded. This 
depends on the use-case. There should be a balance tuning parameter in the FST 
constructor (e.g. max-perf, perf-over-space, space-over-perf, min-space). And 
based on this balance we could set the values of a couple of thresholds that 
define when to use direct-addressing, open-addressing, binary-search, 
compact-list.
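
To make the scheme concrete, a minimal sketch of the probe being benchmarked, 
assuming linear probing, a cheap multiplicative hash, and label 0 reserved for 
the empty slot (all of these are assumptions for illustration only):
{code:java}
// Probe at most `tries` slots for `label` (0-255); a miss means the caller
// falls back to binary search. Label 0 is assumed to encode an empty slot,
// so a node that actually needs label 0 would also have to fall back.
static int findSlot(byte[] labels, int label, int tries) {
  int size = labels.length;
  int slot = Math.floorMod(label * 0x9E3779B9, size);  // multiplicative hash
  for (int i = 0; i < tries; i++) {
    if ((labels[slot] & 0xFF) == label) {
      return slot;                   // found the label
    }
    if (labels[slot] == 0) {
      return -1;                     // empty slot: label is absent
    }
    slot = (slot + 1) % size;        // linear probing: try the next slot
  }
  return -1;                         // exceeded `tries`: fall back
}
{code}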


was (Author: bruno.roustant):
I tried a quick benchmark to evaluate the load factor (num labels / array size) 
we can expect for open-addressing. This gives us a good estimate of the space 
requirement.

This load factor depends on the number of tries (called L above) we accept - 
this defines the worst case perf. For each power-of-two array size, the 
benchmark tries various L, and outputs the max load factor (above this load 
factor, the open-addressing aborts and we fallback to binary search) (benchmark 
print in next post below).

Outcome:
 * Load factor slightly above 50% for good perf (for L = log(N)/2 + 1), as 
expected for open-addressing in literature.
 * It is also possible to use a power-of-two x 1.5 array size with efficient 
hash (to stay at load factor 50% in all cases).
 * We have to encode the “empty slot” (no label in a given array slot). 
Probably with both a 0 label and 0 value (if the node needs that, then abort 
and fallback to binary search).

This means we have to expect a space increase of 2x (compared to binary 
search), for better perf than binary search (L = log(N)/2 + 1, which is the 
worst case, most of the time the open-addressing stops when encountering an 
empty slot before L).

 

To me this balance between space and performance cannot be hardcoded. This 
depends on the use-case. There should be a balance tuning parameter in the FST 
constructor (e.g. max-perf, perf-over-space, space-over-perf, min-space). And 
based on this balance we could set the values of a couple of thresholds that 
define when to use direct-addressing, open-addressing, binary-search, 
compact-list.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?



--
This message was sent by Atlassian Jira

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-09-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930608#comment-16930608
 ] 

Bruno Roustant commented on LUCENE-8920:


Open-addressing benchmark to store byte labels in an array of size between 8 
and 256.

"Array" means the array size.

"tries" means the max number of slot comparisons allowed (when the slot we 
want to put a label into is not empty, we move to the next slot).

"loadFactor" = the max load factor (num labels / array size) for which we can 
add all the labels while respecting "tries". Above this max load factor, the 
open-addressing fails to store all the labels and falls back to binary 
search. This max load factor is an average over 1000 random constructions 
done by the benchmark.

"memory" = the space increase factor = 1 / loadFactor.

 

Array=8, tries=1, loadFactor=0.400125, memory x 2.499219
Array=8, tries=2, loadFactor=0.63825, memory x 1.5667841
Array=8, tries=3, loadFactor=0.78675, memory x 1.2710518
Array=12, tries=1, loadFactor=0.34391674, memory x 2.9076805
Array=12, tries=2, loadFactor=0.54183316, memory x 1.8455865
Array=12, tries=3, loadFactor=0.6964172, memory x 1.4359208
Array=16, tries=1, loadFactor=0.29825, memory x 3.352892
Array=16, tries=2, loadFactor=0.5039375, memory x 1.9843731
Array=16, tries=3, loadFactor=0.646625, memory x 1.5464914
Array=16, tries=4, loadFactor=0.7493125, memory x 1.3345567
Array=24, tries=1, loadFactor=0.25141662, memory x 3.9774618
Array=24, tries=2, loadFactor=0.44929162, memory x 2.225726
Array=24, tries=3, loadFactor=0.58083373, memory x 1.7216631
Array=24, tries=4, loadFactor=0.6867924, memory x 1.4560441
Array=32, tries=1, loadFactor=0.22528125, memory x 4.4388957
Array=32, tries=2, loadFactor=0.4059375, memory x 2.4634335
Array=32, tries=3, loadFactor=0.53625, memory x 1.8648019
Array=32, tries=4, loadFactor=0.63653123, memory x 1.5710148
Array=32, tries=5, loadFactor=0.7165625, memory x 1.3955517
Array=48, tries=1, loadFactor=0.1885, memory x 5.2910066
Array=48, tries=2, loadFactor=0.36193773, memory x 2.762906
Array=48, tries=3, loadFactor=0.49154142, memory x 2.0344167
Array=48, tries=4, loadFactor=0.5871248, memory x 1.7032154
Array=48, tries=5, loadFactor=0.6700212, memory x 1.4924902
Array=64, tries=1, loadFactor=0.17101562, memory x 5.8474193
Array=64, tries=2, loadFactor=0.33707812, memory x 2.9666712
Array=64, tries=3, loadFactor=0.46217188, memory x 2.1636972
Array=64, tries=4, loadFactor=0.56409377, memory x 1.7727549
Array=64, tries=5, loadFactor=0.6392813, memory x 1.5642567
Array=64, tries=6, loadFactor=0.6963125, memory x 1.4361368
Array=96, tries=1, loadFactor=0.15294789, memory x 6.5381746
Array=96, tries=2, loadFactor=0.31204194, memory x 3.2046974
Array=96, tries=3, loadFactor=0.42829168, memory x 2.3348575
Array=96, tries=4, loadFactor=0.5277291, memory x 1.8949116
Array=96, tries=5, loadFactor=0.60920805, memory x 1.6414753
Array=96, tries=6, loadFactor=0.66278124, memory x 1.5087935
Array=128, tries=1, loadFactor=0.15130469, memory x 6.6091805
Array=128, tries=2, loadFactor=0.3054219, memory x 3.2741597
Array=128, tries=3, loadFactor=0.43253124, memory x 2.3119717
Array=128, tries=4, loadFactor=0.5283203, memory x 1.8927912
Array=128, tries=5, loadFactor=0.6043203, memory x 1.6547517
Array=128, tries=6, loadFactor=0.66267186, memory x 1.5090425
Array=128, tries=7, loadFactor=0.70567185, memory x 1.4170892
Array=192, tries=1, loadFactor=0.14355215, memory x 6.9661093
Array=192, tries=2, loadFactor=0.32579714, memory x 3.0693946
Array=192, tries=3, loadFactor=0.42888057, memory x 2.3316514
Array=192, tries=4, loadFactor=0.5238486, memory x 1.9089485
Array=192, tries=5, loadFactor=0.59191173, memory x 1.6894411
Array=192, tries=6, loadFactor=0.65411395, memory x 1.5287856
Array=192, tries=7, loadFactor=0.6975258, memory x 1.4336387
Array=256, tries=1, loadFactor=1.0, memory x 1.0
Array=256, tries=2, loadFactor=1.0, memory x 1.0
Array=256, tries=3, loadFactor=1.0, memory x 1.0
Array=256, tries=4, loadFactor=1.0, memory x 1.0
Array=256, tries=5, loadFactor=1.0, memory x 1.0
Array=256, tries=6, loadFactor=1.0, memory x 1.0
Array=256, tries=7, loadFactor=1.0, memory x 1.0
Array=256, tries=8, loadFactor=1.0, memory x 1.0
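
For reference, a hedged reconstruction of what the benchmark core could look 
like (hypothetical code, not the actual benchmark; the hash function in 
particular is an arbitrary choice here):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical benchmark core: shuffle the 256 distinct byte labels, insert
// them with linear probing and at most `tries` slot comparisons each, and
// count how many fit before the first failure. loadFactor is then
// placed / (double) arraySize, averaged over many random runs.
static int labelsPlacedBeforeFailure(int arraySize, int tries, Random random) {
  // Occupancy is tracked out-of-band here; the real encoder would need an
  // in-band "empty slot" marker instead, as discussed in the comment above.
  boolean[] occupied = new boolean[arraySize];
  List<Integer> labels = new ArrayList<>();
  for (int label = 0; label < 256; label++) {
    labels.add(label);
  }
  Collections.shuffle(labels, random);
  int placed = 0;
  for (int label : labels) {
    int slot = label % arraySize; // arbitrary hash; also covers the x1.5 sizes
    boolean inserted = false;
    for (int i = 0; i < tries; i++) {
      if (!occupied[slot]) {
        occupied[slot] = true;
        inserted = true;
        break;
      }
      slot = (slot + 1) % arraySize; // linear probing
    }
    if (!inserted) {
      break; // the real encoder would fall back to binary search here
    }
    placed++;
  }
  return placed;
}
{code}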

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing 

[GitHub] [lucene-solr] mikemccand commented on issue #877: LUCENE-8978: Maximal Bottom Score Based Early Termination

2019-09-16 Thread GitBox
mikemccand commented on issue #877: LUCENE-8978: Maximal Bottom Score Based 
Early Termination
URL: https://github.com/apache/lucene-solr/pull/877#issuecomment-531797932
 
 
   We should expect to see an improvement to red-line QPS with this change, 
right?  I.e. it is spending less CPU per query, since the thread slices are 
communicating with one another, as they collect, about how good a new hit 
must be to be competitive.
   
   Versus today, where each query thread collects the full top N and only in 
the end does a partial merge sort to pick the total top N.
   
   We should maybe expect query latencies (at well below red-line QPS) to get a 
little worse than concurrent search today, because we are adding a bit of 
thread contention?
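   
   For illustration only (hypothetical code, not the PR's actual 
implementation; the names are invented): each slice can publish the bottom 
score of its full priority queue, and the global threshold is the max of the 
published bottoms, since any full queue already guarantees N hits at or above 
its bottom.
   
   ```java
   import java.util.concurrent.atomic.AtomicLong;
   
   // Hypothetical sketch: slices share the best "bottom" score seen so far.
   // Scores are non-negative, so Float.floatToIntBits preserves their order.
   final class GlobalBottomScore {
     private final AtomicLong maxBits = new AtomicLong(0);
   
     // Called by a slice once its queue is full, with its current bottom score.
     void publish(float bottom) {
       int b = Float.floatToIntBits(bottom);
       long cur;
       while (b > (cur = maxBits.get()) && !maxBits.compareAndSet(cur, b)) {
         // lost a race with another slice; re-read and retry
       }
     }
   
     // A new hit must beat this score to stay competitive in the merged top N.
     float globalBottom() {
       return Float.intBitsToFloat((int) maxBits.get());
     }
   }
   ```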


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13761) Cannot index documents into Solr 8.1.1

2019-09-16 Thread Bhuvaneshwar Venkatraman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvaneshwar Venkatraman updated SOLR-13761:

Description: 
I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
is in use. All configurations and schema files are exactly alike, but when I 
try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
options=DOCS* for a specific field which is of type *string*. It is a required 
field, so it cannot be omitted. 

For another Collection in the same core (Solr 8.1.1), Solr throws *cannot change 
docValues type from SORTED_NUMERIC to SORTED for field "ANOTHER_FIELD_NAME"* to 
the field of type *string.*

*Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2

  was:
I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
is in use. All configurations and schema files are exactly alike, but when I 
try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
options=DOCS* for a specific field which is of type *string*. It is a required 
field, so it cannot be omitted. 

For another Collection in the same core (Solr 8.1.1), Solr throws *cannot change 
docValues type from SORTED_NUMERIC to SORTED for field "ANOTHER_FIELD_NAME"* to 
field of type *string.*

*Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2


> Cannot index documents into Solr 8.1.1
> --
>
> Key: SOLR-13761
> URL: https://issues.apache.org/jira/browse/SOLR-13761
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: 8.1.1
> Environment: I tried to index only one document via Solrj and also 
> using the Solr Admin UI but got the same error all the time.
>Reporter: Bhuvaneshwar Venkatraman
>Priority: Major
>
> I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
> is in use. All configurations and schema files are exactly alike, but when I 
> try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
> from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
> options=DOCS* for a specific field which is of type *string*. It is a 
> required field, so it cannot be omitted. 
> For another Collection in the same core (Solr 8.1.1), Solr throws *cannot 
> change docValues type from SORTED_NUMERIC to SORTED for field 
> "ANOTHER_FIELD_NAME"* to the field of type *string.*
> *Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13761) Cannot index documents into Solr 8.1.1

2019-09-16 Thread Bhuvaneshwar Venkatraman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvaneshwar Venkatraman updated SOLR-13761:

Description: 
I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
is in use. All configurations and schema files are exactly alike, but when I 
try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
options=DOCS* for a specific field which is of type *string*. It is a required 
field, so it cannot be omitted. 

For another Collection in the same core (Solr 8.1.1), Solr throws *cannot change 
docValues type from SORTED_NUMERIC to SORTED for field "ANOTHER_FIELD_NAME"* to 
field of type *string.*

*Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2

  was:
I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
is in use. All configurations and schema files are exactly alike, but when I 
try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
options=DOCS* for a specific field which is of type *string*. It is a required 
field, so it cannot be omitted. 

For another Collection Solr throws *cannot change docValues type from 
SORTED_NUMERIC to SORTED for field "ANOTHER_FIELD_NAME"* to field of type 
*string.*

*Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2


> Cannot index documents into Solr 8.1.1
> --
>
> Key: SOLR-13761
> URL: https://issues.apache.org/jira/browse/SOLR-13761
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: 8.1.1
> Environment: I tried to index only one document via Solrj and also 
> using the Solr Admin UI but got the same error all the time.
>Reporter: Bhuvaneshwar Venkatraman
>Priority: Major
>
> I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
> is in use. All configurations and schema files are exactly alike, but when I 
> try to index the same documents Solr throws *cannot change field "FIELD_NAME" 
> from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent index 
> options=DOCS* for a specific field which is of type *string*. It is a 
> required field, so it cannot be omitted. 
> For another Collection in the same core (Solr 8.1.1), Solr throws *cannot 
> change docValues type from SORTED_NUMERIC to SORTED for field 
> "ANOTHER_FIELD_NAME"* to field of type *string.*
> *Note:* It is indexing perfectly in the existing Solr i.e., 6.6.2



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8980) Optimise SegmentTermsEnum.seekExact performance

2019-09-16 Thread Guoqiang Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Jiang updated LUCENE-8980:
---
Description: 
*Description*

In Elasticsearch, each document has an _id field that uniquely identifies it, 
which is indexed so that documents can be looked up from Lucene. When users 
write to Elasticsearch with self-generated _id values, even if the conflict 
rate is very low, Elasticsearch has to check _id uniqueness through the 
Lucene API for each document, which results in poor write performance.

 

*Solution*

1. Choose a better _id generator before writing ES

Different _id formats have a great impact on write performance. We have 
verified this in production cluster. Users can refer to the following blog and 
choose a better _id generator.

[http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html]

2. Optimise with min/maxTerm metrics in Lucene

As Lucene stores min/maxTerm metrics for each segment and field, we can use 
those metrics to optimise the performance of the Lucene lookup API. When 
calling SegmentTermsEnum.seekExact() to look up a term in one segment, we can 
check whether the term falls in the range of minTerm and maxTerm, so that we 
skip useless segments as soon as possible.
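
A hedged sketch of this check (hypothetical helper, not the actual patch; 
only Terms.getMin()/getMax() and BytesRef.compareTo() are existing Lucene 
APIs):

{code:java}
import java.io.IOException;
import org.apache.lucene.index.Terms;
import org.apache.lucene.util.BytesRef;

// Hypothetical early-exit: reject a seekExact when the target term cannot be
// in this segment's [minTerm, maxTerm] range for the field.
static boolean mayContain(Terms terms, BytesRef term) throws IOException {
  BytesRef min = terms.getMin(); // smallest term in this segment/field
  BytesRef max = terms.getMax(); // largest term in this segment/field
  if (min != null && term.compareTo(min) < 0) return false;
  if (max != null && term.compareTo(max) > 0) return false;
  return true; // in range: proceed with the normal seekExact lookup
}
{code}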
 

*Tests*

I have made some write benchmark using _id in UUID V1 format, and the benchmark 
result is as follows:
||Branch||Write speed after 4h||CPU cost||Overall improvement||Write speed after 8h||CPU cost||Overall improvement||
|Original Lucene|29.9w/s|68.4%|N/A|26.7w/s|66.6%|N/A|
|Optimised Lucene|34.5w/s (+15.4%)|63.8% (-6.7%)|+22.1%|31.5w/s (+18.0%)|61.5% (-7.7%)|+25.7%|

As shown above, after 8 hours of continuous writing, write speed improves by 
18.0%, CPU cost decreases by 7.7%, and overall performance improves by 25.7%. 
The Elasticsearch GET API and ids query would get similar performance 
improvements.

It should be noted that the benchmark test needs to be run for several hours 
continuously, because the performance improvement is not apparent when the 
data is completely cached or the number of segments is too small.

  was:
*Description*

In Elasticsearch, each document has an _id field that uniquely identifies it, 
which is indexed so that documents can be looked up from Lucene. When users 
write to Elasticsearch with self-generated _id values, even if the conflict 
rate is very low, ES has to check _id uniqueness through the Lucene API for 
each document, which results in poor write performance. 

 

*Solution*

1. Choose a better _id generator

Different _id formats have a great impact on write performance. We have 
verified this in production cluster. Users can refer to the following blog and 
choose a better _id generator.

[http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html]

2. Optimise with min/maxTerm metrics in Lucene

As Lucene stores min/maxTerm metrics for each segment and field, we can use 
those metrics to optimise the performance of the Lucene lookup API.

 

*Tests*

I have made some write benchmark using _id in UUID V1 format, and the benchmark 
result is as follows:
||Branch||Write speed after 4h||CPU cost||Overall improvement||Write speed after 8h||CPU cost||Overall improvement||
|Original Lucene|29.9w/s|68.4%|N/A|26.7w/s|66.6%|N/A|
|Optimised Lucene|34.5w/s (+15.4%)|63.8% (-6.7%)|+22.1%|31.5w/s (+18.0%)|61.5% (-7.7%)|+25.7%|

As shown above, after 8 hours of continuous writing, write performance improves 
by 18.0%, CPU overhead decreases by 7.7%, and overall performance improves by 
25.7%. The Elasticsearch GET API and ids query would get similar performance 
improvements.

It should be noted that the benchmark test needs to run for several hours 
continuously, because the performance improvement is not apparent when the 
data is completely cached or the number of segments is too small.


> Optimise SegmentTermsEnum.seekExact performance
> ---
>
> Key: LUCENE-8980
> URL: https://issues.apache.org/jira/browse/LUCENE-8980
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.2
>Reporter: Guoqiang Jiang
>Priority: Major
>  Labels: performance
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> In Elasticsearch, each document has an _id field that uniquely identifies it, 
> which is indexed so that documents can be looked up from Lucene. When users 
> write to Elasticsearch with self-generated _id values, even if the conflict 
> rate is very low, Elasticsearch has to check _id uniqueness through the 
> Lucene API for each document, which results in poor write performance.
>  
> *Solution*
> 1. Choose a better _id generator before writing ES
> Different _id formats have a great impact on write performance. We have 
> 

[GitHub] [lucene-solr] jgq2008303393 opened a new pull request #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-16 Thread GitBox
jgq2008303393 opened a new pull request #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884
 
 
   
   
   
   # Description
   In Elasticsearch, each document has an _id field that uniquely identifies 
it, which is indexed so that documents can be looked up from Lucene. When 
users write to Elasticsearch with self-generated _id values, even if the 
conflict rate is very low, Elasticsearch has to check _id uniqueness through 
the Lucene API for each document, which results in poor write performance. 
   
   # Solution
   1. Choose a better _id generator before writing ES
   Different _id formats have a great impact on write performance. We have 
verified this in production cluster. Users can refer to the following blog and 
choose a better _id generator.
   
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
   2. Optimise with min/maxTerm metrics in Lucene
   As Lucene stores min/maxTerm metrics for each segment and field, we can 
use those metrics to optimise the performance of the Lucene lookup API. When 
calling SegmentTermsEnum.seekExact() to look up a term in one segment, we can 
check whether the term falls in the range of minTerm and maxTerm, so that we 
skip useless segments as soon as possible.
   
   
   # Tests
   I have made some write benchmark using _id in UUID V1 format, and the 
benchmark result is as follows:
   
   | Branch | Write speed after 4h | CPU cost | Overall improvement | Write speed after 8h | CPU cost | Overall improvement |
   | --- | :---: | :---: | :---: | :---: | :---: | :---: |
   | Original Lucene | 29.9w/s | 68.4% | N/A | 26.7w/s | 66.6% | N/A |
   | Optimised Lucene | 34.5w/s (+15.4%) | 63.8% (-6.7%) | +22.1% | 31.5w/s (+18.0%) | 61.5% (-7.7%) | +25.7% |
   
   As shown above, after 8 hours of continuous writing, write speed improves by 
18.0%, CPU cost decreases by 7.7%, and overall performance improves by 25.7%. 
The Elasticsearch GET API and ids query would get similar performance 
improvements.
   
   It should be noted that the benchmark test needs to be run for several 
hours continuously, because the performance improvement is not apparent when 
the data is completely cached or the number of segments is too small.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thomaswoeckinger commented on issue #883: SOLR-13762: Add binary support to XMLCodec

2019-09-16 Thread GitBox
thomaswoeckinger commented on issue #883: SOLR-13762: Add binary support to 
XMLCodec
URL: https://github.com/apache/lucene-solr/pull/883#issuecomment-531720937
 
 
   @gerlowskija please review


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thomaswoeckinger opened a new pull request #883: SOLR-13762: Add binary support to XMLCodec

2019-09-16 Thread GitBox
thomaswoeckinger opened a new pull request #883: SOLR-13762: Add binary support 
to XMLCodec
URL: https://github.com/apache/lucene-solr/pull/883
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-13762) Support binary values when using XMLCodec

2019-09-16 Thread Jira
Thomas Wöckinger created SOLR-13762:
---

 Summary: Support binary values when using XMLCodec
 Key: SOLR-13762
 URL: https://issues.apache.org/jira/browse/SOLR-13762
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: query parsers, Response Writers, Server, SolrJ, 
UpdateRequestProcessors
Affects Versions: master (9.0), 8.3
Reporter: Thomas Wöckinger


As Solr can handle binary fields, it should be possible to use XML as a codec 
to encode and decode them.
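
For illustration, base64 is the usual way to carry binary values inside XML; 
a hedged sketch of the round-trip (whether XMLCodec will use exactly this 
representation is an assumption here):

{code:java}
import java.util.Base64;

// Hypothetical round-trip of a binary field value through a text-safe form.
byte[] payload = new byte[] {0x01, 0x02, (byte) 0xFF};
String encoded = Base64.getEncoder().encodeToString(payload); // "AQL/"
byte[] decoded = Base64.getDecoder().decode(encoded); // identical to payload
{code}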



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-13761) Cannot index documents into Solr 8.1.1

2019-09-16 Thread Bhuvaneshwar Venkatraman (Jira)
Bhuvaneshwar Venkatraman created SOLR-13761:
---

 Summary: Cannot index documents into Solr 8.1.1
 Key: SOLR-13761
 URL: https://issues.apache.org/jira/browse/SOLR-13761
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Schema and Analysis
Affects Versions: 8.1.1
 Environment: I tried to index only one document via Solrj and also 
using the Solr Admin UI but got the same error all the time.
Reporter: Bhuvaneshwar Venkatraman


I created a cloud Solr 8.1.1 with zookeeper similar to cloud Solr 6.6.2 which 
is in use. All configurations and schema files are exactly alike, but when I 
but when I try to index the same documents Solr throws *cannot change field 
"FIELD_NAME" from* *index options=DOCS_AND_FREQS_AND_POSITIONS to inconsistent 
index options=DOCS* for a specific field which is of type *string*. It is a 
required field, so it cannot be omitted. 

For another Collection Solr throws *cannot change docValues type from 
SORTED_NUMERIC to SORTED for field "ANOTHER_FIELD_NAME"* to field of type 
*string.*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-16 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930268#comment-16930268
 ] 

Mikhail Khludnev commented on SOLR-13272:
-

+1. Thank you, [~apoorvprecisely] and [~munendrasn]! 

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13272.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classical facet component but has no 
> support in JSON facet requests.
>  In cases of block join and aggregations, this would be helpful
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org