[jira] [Commented] (SOLR-13677) All Metrics Gauges should be unregistered by the objects that registered them

2019-09-25 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938274#comment-16938274
 ] 

Ishan Chattopadhyaya commented on SOLR-13677:
-

This is a blocker for 8.3, but is still unassigned. [~ab], based on the private 
conversations, should I assign it to you?

(Or, [~noble.paul], should I assign it back to you, since you were attempting 
the fix?)

> All Metrics Gauges should be unregistered by the objects that registered them
> -
>
> Key: SOLR-13677
> URL: https://issues.apache.org/jira/browse/SOLR-13677
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Noble Paul
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The life cycles of metrics producers are managed by the core (mostly). So, if 
> an object's lifecycle is different from that of the core itself, that object 
> will never be unregistered from the metrics registry. This will lead to 
> memory leaks.
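As an illustration of the ownership pattern the issue asks for (whoever registers a gauge also unregisters it), here is a hedged, self-contained sketch. A plain map stands in for Solr's Dropwizard-backed metric registry, and the class and gauge names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: the object that registers a gauge removes it again on close(),
// so gauges never outlive their owner even when the owner's lifecycle is
// shorter than the core's.
public class GaugeLifecycle {
    // Stand-in for the real metrics registry.
    static final Map<String, Supplier<Object>> REGISTRY = new ConcurrentHashMap<>();

    // Hypothetical object whose lifecycle differs from the core's.
    static class SearcherLike implements AutoCloseable {
        private final String gaugeName;

        SearcherLike(String name) {
            this.gaugeName = name;
            REGISTRY.put(gaugeName, () -> 42);  // register on creation
        }

        @Override
        public void close() {
            REGISTRY.remove(gaugeName);          // unregister on close -> no leak
        }
    }

    public static void main(String[] args) {
        try (SearcherLike s = new SearcherLike("searcher.liveDocs")) {
            System.out.println("registered=" + REGISTRY.size());  // prints registered=1
        }
        System.out.println("after close=" + REGISTRY.size());     // prints after close=0
    }
}
```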



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13272) Interval facet support for JSON faceting

2019-09-25 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N resolved SOLR-13272.
-
Resolution: Fixed

Thanks [~apoorvprecisely]
Special thanks to [~ichattopadhyaya] and [~mkhl] for the reviews

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13272-doc.patch, SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classical facet component but not in 
> JSON facet requests.
>  It would be helpful in cases of block joins and aggregations.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597
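For illustration, the request format above could be sent as an ordinary HTTP query. This hedged sketch only composes the URL; the base URL and the collection name "books" are assumptions for the example, not from the issue:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the interval-facet request from the issue as a Solr
// /select URL with a URL-encoded json.facet parameter.
public class IntervalFacetRequest {
    public static String buildUrl() {
        String jsonFacet =
            "{pubyear:{type:interval,field:pubyear_i,"
          + "intervals:[{key:\"2000-2200\",value:\"[2000,2200]\"}]}}";
        // Hypothetical host and collection name.
        return "http://localhost:8983/solr/books/select?q=*:*&rows=0&json.facet="
             + URLEncoder.encode(jsonFacet, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```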






[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938267#comment-16938267
 ] 

ASF subversion and git services commented on SOLR-13272:


Commit d23303649ace3056b78833d97c00f87bc0c7bcdb in lucene-solr's branch 
refs/heads/branch_8x from Munendra S N
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d233036 ]

SOLR-13272: add documentation for arbitrary range in JSON facet


> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13272-doc.patch, SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classical facet component but not in 
> JSON facet requests.
>  It would be helpful in cases of block joins and aggregations.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938261#comment-16938261
 ] 

ASF subversion and git services commented on SOLR-13272:


Commit 42e64ffd53def30dd2bfe2ddb955e9c8d8e83329 in lucene-solr's branch 
refs/heads/master from Munendra S N
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=42e64ff ]

SOLR-13272: add documentation for arbitrary range in JSON facet


> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13272-doc.patch, SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Interval faceting is supported in the classical facet component but not in 
> JSON facet requests.
>  It would be helpful in cases of block joins and aggregations.
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Updated] (SOLR-13788) Resolve multiple IPs from specified zookeeper URL

2019-09-25 Thread Ween Jiann (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ween Jiann updated SOLR-13788:
--
Component/s: SolrCloud

> Resolve multiple IPs from specified zookeeper URL
> -
>
> Key: SOLR-13788
> URL: https://issues.apache.org/jira/browse/SOLR-13788
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 8.1.1
>Reporter: Ween Jiann
>Priority: Minor
>  Labels: features
>
> Use DNS lookup to get the IPs of the servers listed in ZK_HOST or -z param. 
> This would help cloud deployment as DNS is often used to group services 
> together.
> [https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html]
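A minimal sketch of the requested behavior, assuming a resolver built on java.net.InetAddress; "localhost" stands in for a headless-service name like zk-headless:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a single ZooKeeper DNS name into a
// "host:port,host:port,..." connect string by resolving all of its
// address records, so scaling the ensemble only changes DNS, not config.
public class ZkHostExpander {
    public static String expand(String host, int port) throws UnknownHostException {
        List<String> parts = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            parts.add(addr.getHostAddress() + ":" + port);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) throws UnknownHostException {
        // With a real headless service this would print one entry per pod IP.
        System.out.println(expand("localhost", 2181));
    }
}
```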






[jira] [Comment Edited] (SOLR-13788) Resolve multiple IPs from specified zookeeper URL

2019-09-25 Thread Ween Jiann (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938236#comment-16938236
 ] 

Ween Jiann edited comment on SOLR-13788 at 9/26/19 3:46 AM:


I'm trying to modify the Helm chart for Solr so that it works correctly for 
Kubernetes (k8s) deployments. For this to happen, Solr needs a change in the 
way it resolves ZooKeeper hostnames.

Let me explain...

The standard way to configure Solr is to list every ZooKeeper hostname/IP in 
one of:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * the {{-z}} param

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

However, for cloud deployments, in particular on k8s with a Helm chart, this 
is not ideal: the user has to modify ZK_HOST every time they scale the number 
of ZooKeeper nodes up or down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

This cannot be done automatically in helm/k8s. In k8s, this parameter should 
remain static, meaning it should not change after the chart is deployed.

For example (k8s): ZK_HOST="zk-headless:2181".

What a chart can do is create a service with a DNS name such as zk-headless 
that resolves to the IPs of all the ZooKeeper nodes; as ZooKeeper scales, the 
set of IPs resolved from zk-headless changes. I'm asking whether there could 
be an improvement that allows Solr to resolve multiple ZooKeeper IPs from a 
single name.

I will also raise this question on the user's list.


was (Author: lwj5):
I'm trying to modify the helm chart for solr such that it works for kubernetes 
(k8s) deployment correctly. There needs to be a particular change in the way 
solr resolves zookeepers hostname in order for this to happen.

 

Let me explain...

The standard way to configure solr is by listing all the zookeeper hostname/IP 
in either:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * {{-z param}}

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

 

However, when it comes to cloud deployment, in particular on k8s using helm 
chart, this is not an ideal situation as the user is required to modify zk_host 
each time they scale the number of zookeeper up/down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

This cannot be done automatically using in helm/k8s. In k8s, this parameter 
should remain static, meaning that it should not be changed after deployment of 
the chart.

For example (k8s): ZK_HOST="zk-headless:2181".

 

What a chart can do is to create a service with a DNS name such as zk-headless 
that contains all the IP of the zookeeper, and as zookeeper scales, the number 
of IP resolved from zk-headless changes. I'm asking if there could be an 
improvement that allows solr to resolve multiple zookeeper IPs from a single 
name.

 

I will also raise this question on the user's list.

> Resolve multiple IPs from specified zookeeper URL
> -
>
> Key: SOLR-13788
> URL: https://issues.apache.org/jira/browse/SOLR-13788
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.1.1
>Reporter: Ween Jiann
>Priority: Minor
>  Labels: features
>
> Use DNS lookup to get the IPs of the servers listed in ZK_HOST or -z param. 
> This would help cloud deployment as DNS is often used to group services 
> together.
> [https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html]






[jira] [Comment Edited] (SOLR-13788) Resolve multiple IPs from specified zookeeper URL

2019-09-25 Thread Ween Jiann (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938236#comment-16938236
 ] 

Ween Jiann edited comment on SOLR-13788 at 9/26/19 3:46 AM:


I'm trying to modify the Helm chart for Solr so that it works correctly for 
Kubernetes (k8s) deployments. For this to happen, Solr needs a change in the 
way it resolves ZooKeeper hostnames.

Let me explain...

The standard way to configure Solr is to list every ZooKeeper hostname/IP in 
one of:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * the {{-z}} param

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

However, for cloud deployments, in particular on k8s with a Helm chart, this 
is not ideal: the user has to modify ZK_HOST every time they scale the number 
of ZooKeeper nodes up or down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

This cannot be done automatically in helm/k8s. In k8s, this parameter should 
remain static, meaning it should not change after the chart is deployed.

For example (k8s): ZK_HOST="zk-headless:2181".

What a chart can do is create a service with a DNS name such as zk-headless 
that resolves to the IPs of all the ZooKeeper nodes; as ZooKeeper scales, the 
set of IPs resolved from zk-headless changes. I'm asking whether there could 
be an improvement that allows Solr to resolve multiple ZooKeeper IPs from a 
single name.

I will also raise this question on the user's list.


was (Author: lwj5):
I'm trying to modify the helm chart for solr such that it works for kubernetes 
(k8s) deployment correctly. There needs to be a particular change in the way 
solr resolves zookeepers hostname in order for this to happen.

 

Let me explain...

The standard way to configure solr is by listing all the zookeeper hostname/IP 
in either:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * {{-z param}}

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

 

However, when it comes to cloud deployment, in particular on k8s using helm 
chart, this is not an ideal situation as the user is required to modify zk_host 
each time they scale the number of zookeeper up/down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

And this cannot be done automatically using in helm. In k8s, this parameter 
should remain static, meaning that it should not be changed after deployment of 
the chart.

For example (k8s): ZK_HOST="zk-headless:2181".

 

What a chart can do is to create a service with a DNS name such as zk-headless 
that contains all the IP of the zookeeper, and as zookeeper scales, the number 
of IP resolved from zk-headless changes. I'm asking if there could be an 
improvement that allows solr to resolve multiple zookeeper IPs from a single 
name.

 

I will also raise this question on the user's list.

> Resolve multiple IPs from specified zookeeper URL
> -
>
> Key: SOLR-13788
> URL: https://issues.apache.org/jira/browse/SOLR-13788
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.1.1
>Reporter: Ween Jiann
>Priority: Minor
>  Labels: features
>
> Use DNS lookup to get the IPs of the servers listed in ZK_HOST or -z param. 
> This would help cloud deployment as DNS is often used to group services 
> together.
> [https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html]






[jira] [Comment Edited] (SOLR-13788) Resolve multiple IPs from specified zookeeper URL

2019-09-25 Thread Ween Jiann (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938236#comment-16938236
 ] 

Ween Jiann edited comment on SOLR-13788 at 9/26/19 3:44 AM:


I'm trying to modify the Helm chart for Solr so that it works correctly for 
Kubernetes (k8s) deployments. For this to happen, Solr needs a change in the 
way it resolves ZooKeeper hostnames.

Let me explain...

The standard way to configure Solr is to list every ZooKeeper hostname/IP in 
one of:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * the {{-z}} param

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

However, for cloud deployments, in particular on k8s with a Helm chart, this 
is not ideal: the user has to modify ZK_HOST every time they scale the number 
of ZooKeeper nodes up or down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

This cannot be done automatically in helm/k8s. In k8s, this parameter should 
remain static, meaning it should not change after the chart is deployed.

For example (k8s): ZK_HOST="zk-headless:2181".

What a chart can do is create a service with a DNS name such as zk-headless 
that resolves to the IPs of all the ZooKeeper nodes; as ZooKeeper scales, the 
set of IPs resolved from zk-headless changes. I'm asking whether there could 
be an improvement that allows Solr to resolve multiple ZooKeeper IPs from a 
single name.

I will also raise this question on the user's list.


was (Author: lwj5):
I'm trying to modify the helm chart for solr such that it works for kubernetes 
(k8s) deployment properly. There needs to be a particular change in the way 
solr resolves zookeepers hostname in order for this to happen.

 

Let me explain...

The standard way to configure solr is by listing all the zookeeper hostname/IP 
in either:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * {{-z param}}

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

 

However, when it comes to cloud deployment, in particular on k8s using helm 
chart, this is not an ideal situation as the user is required to modify zk_host 
each time they scale the number of zookeeper up/down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

And this cannot be done automatically using in helm. In k8s, this parameter 
should remain static, meaning that it should not be changed after deployment of 
the chart.

For example (k8s): ZK_HOST="zk-headless:2181".

 

What a chart can do is to create a service with a DNS name such as zk-headless 
that contains all the IP of the zookeeper, and as zookeeper scales, the number 
of IP resolved from zk-headless changes. I'm asking if there could be an 
improvement that allows solr to resolve multiple zookeeper IPs from a single 
name.

 

I will also raise this question on the user's list.

> Resolve multiple IPs from specified zookeeper URL
> -
>
> Key: SOLR-13788
> URL: https://issues.apache.org/jira/browse/SOLR-13788
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.1.1
>Reporter: Ween Jiann
>Priority: Minor
>  Labels: features
>
> Use DNS lookup to get the IPs of the servers listed in ZK_HOST or -z param. 
> This would help cloud deployment as DNS is often used to group services 
> together.
> [https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html]






[jira] [Commented] (SOLR-13788) Resolve multiple IPs from specified zookeeper URL

2019-09-25 Thread Ween Jiann (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938236#comment-16938236
 ] 

Ween Jiann commented on SOLR-13788:
---

I'm trying to modify the Helm chart for Solr so that it works properly for 
Kubernetes (k8s) deployments. For this to happen, Solr needs a change in the 
way it resolves ZooKeeper hostnames.

Let me explain...

The standard way to configure Solr is to list every ZooKeeper hostname/IP in 
one of:
 * {{solr.in.sh}} or {{solr.in.cmd}}
 * {{zoo.cfg}}
 * the {{-z}} param

For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

However, for cloud deployments, in particular on k8s with a Helm chart, this 
is not ideal: the user has to modify ZK_HOST every time they scale the number 
of ZooKeeper nodes up or down.

For example (scale down): ZK_HOST="zk1:2181,zk2:2181".

For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

And this cannot be done automatically in helm. In k8s, this parameter should 
remain static, meaning it should not change after the chart is deployed.

For example (k8s): ZK_HOST="zk-headless:2181".

What a chart can do is create a service with a DNS name such as zk-headless 
that resolves to the IPs of all the ZooKeeper nodes; as ZooKeeper scales, the 
set of IPs resolved from zk-headless changes. I'm asking whether there could 
be an improvement that allows Solr to resolve multiple ZooKeeper IPs from a 
single name.

I will also raise this question on the user's list.

> Resolve multiple IPs from specified zookeeper URL
> -
>
> Key: SOLR-13788
> URL: https://issues.apache.org/jira/browse/SOLR-13788
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.1.1
>Reporter: Ween Jiann
>Priority: Minor
>  Labels: features
>
> Use DNS lookup to get the IPs of the servers listed in ZK_HOST or -z param. 
> This would help cloud deployment as DNS is often used to group services 
> together.
> [https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html]






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938181#comment-16938181
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 0ec7f7e945a20269b1da01b00b530799bee60b77 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0ec7f7e ]

SOLR-13105: Update text inline tocs 11


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13794) Delete solr/core/src/test-files/solr/configsets/_default

2019-09-25 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938158#comment-16938158
 ] 

Ishan Chattopadhyaya commented on SOLR-13794:
-

I'm heads-down on SOLR-13662 and some other issues, so I'll only be able to 
review the patch (and your concerns about the nocommits) next week.

> Delete solr/core/src/test-files/solr/configsets/_default
> 
>
> Key: SOLR-13794
> URL: https://issues.apache.org/jira/browse/SOLR-13794
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-13794.patch, SOLR-13794_code_only.patch
>
>
> For as long as we've had a {{_default}} configset in solr, we've also had a 
> copy of that default in {{core/src/test-files/}} - as well as a unit test 
> that confirms they are identical.
> It's never really been clear to me *why* we have this duplication, instead of 
> just having the test-framework take the necessary steps to ensure that 
> {{server/solr/configsets/_default}} is properly used when running tests.
> I'd like to propose we eliminate the duplication since it only ever seems to 
> cause problems (notably spurious test failures when people modify the 
> {{_default}} configset w/o remembering that they need to make identical edits 
> to the {{test-files}} clone) and instead have {{SolrTestCase}} set the 
> (already existing & supported) {{solr.default.confdir}} system property to 
> point to the (already existing) {{ExternalPaths.DEFAULT_CONFIGSET}}






[jira] [Commented] (SOLR-13794) Delete solr/core/src/test-files/solr/configsets/_default

2019-09-25 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938156#comment-16938156
 ] 

Ishan Chattopadhyaya commented on SOLR-13794:
-

+1 to the proposition. I wasn't aware at the time of how I could've used the 
user-facing _default configset in tests, and hence copied it over. I admit I 
didn't look hard enough. Thanks for tackling this.

> Delete solr/core/src/test-files/solr/configsets/_default
> 
>
> Key: SOLR-13794
> URL: https://issues.apache.org/jira/browse/SOLR-13794
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-13794.patch, SOLR-13794_code_only.patch
>
>
> For as long as we've had a {{_default}} configset in solr, we've also had a 
> copy of that default in {{core/src/test-files/}} - as well as a unit test 
> that confirms they are identical.
> It's never really been clear to me *why* we have this duplication, instead of 
> just having the test-framework take the necessary steps to ensure that 
> {{server/solr/configsets/_default}} is properly used when running tests.
> I'd like to propose we eliminate the duplication since it only ever seems to 
> cause problems (notably spurious test failures when people modify the 
> {{_default}} configset w/o remembering that they need to make identical edits 
> to the {{test-files}} clone) and instead have {{SolrTestCase}} set the 
> (already existing & supported) {{solr.default.confdir}} system property to 
> point to the (already existing) {{ExternalPaths.DEFAULT_CONFIGSET}}






[jira] [Updated] (SOLR-13794) Delete solr/core/src/test-files/solr/configsets/_default

2019-09-25 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13794:
--
Attachment: SOLR-13794.patch
SOLR-13794_code_only.patch
Status: Open  (was: Open)

FWIW: The original motivation for the duplication seems to have come from this 
comment/suggestion from shalin (emphasis added by me)...

https://issues.apache.org/jira/browse/SOLR-10272?focusedCommentId=16064813&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16064813
{quote}Since our collection creation API now uses _default configset by 
default, our tests should also do the same if no explicit configset name is 
specified. The _default for tests should be identical to the ones that users 
will use. This ensures that if a functionality is tested using the _default 
configset, it will actually work in the hands of our users. _If we need to 
duplicate _default in two places to achieve this, then we should add a test to 
assert that both are same._ This only really affects new tests since I am 
assuming existing ones have been cut over to "conf" already.
{quote}
In a discussion about ensuring that {{_default}} was used correctly as the 
"default configset" when Solr was running in tests, shalin suggested that *if* 
duplication was necessary then we should have a test verifying that the copies 
were identical – but I can find no discussion indicating duplication *is* 
necessary.

The attached patch leverages the existing {{ExternalPaths.DEFAULT_CONFIGSET}} 
heuristically determined path for the default configset and uses it to set the 
{{solr.default.confdir}} property _unless it is already set by the test 
environment_ (so that external users leveraging the test-framework in their 
own tests can set it as they see fit).
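The "set only if unset" behavior described above can be sketched in isolation as follows; the property name is from the issue, while the path literal and class name are illustrative:

```java
// Sketch: point solr.default.confdir at a known configset path only when
// the test environment has not already set it, so external users of the
// test-framework keep full control over the property.
public class DefaultConfDir {
    static final String PROP = "solr.default.confdir";

    public static void setIfUnset(String defaultPath) {
        if (System.getProperty(PROP) == null) {
            System.setProperty(PROP, defaultPath);
        }
    }

    public static void main(String[] args) {
        // Hypothetical path; in Solr this would come from
        // ExternalPaths.DEFAULT_CONFIGSET.
        setIfUnset("/path/to/server/solr/configsets/_default/conf");
        System.out.println(System.getProperty(PROP));
    }
}
```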

It also updates the existing 
{{TestConfigSetsAPI.testUserAndTestDefaultConfigsetsAreSame()}}, replacing the 
existing logic for comparing the two directories (which is now meaningless 
since there is only one directory) with a much simpler test that ensures the 
value used by {{ZkController}} when bootstrapping matches 
{{ExternalPaths.DEFAULT_CONFIGSET}} – something that should (now) be guaranteed 
when running solr-core tests.

All tests pass with these changes. The only nocommits in the current patch 
relate to the removal of {{ZkController.getDefaultConfigDirFromClasspath(...)}}, 
which is how Solr currently locates the duplicate 
{{solr/core/src/test-files/solr/configsets/_default}} (by counting on 
{{solr/core/src/test-files}} being on the classpath and looking for 
{{solr/configsets/_default}}). I think this is safe to remove, but I suppose 
it's possible that end users have come to depend on this undocumented feature 
of looking for this specific path on the classpath?

[~ichattopadhyaya] & [~shalinmangar] 
 * Do you see anything wrong with this approach?
 * is there some motivation for this duplication that I'm overlooking?
 * do you have any thoughts/concerns regarding the nocommits?

—

(NOTE: I've actually attached 2 patches – one includes all the changes; for 
readability the second is just the code-change subset of the first, w/o the 
"noise" of {{git rm -r solr/core/src/test-files/solr/configsets/_default}})

> Delete solr/core/src/test-files/solr/configsets/_default
> 
>
> Key: SOLR-13794
> URL: https://issues.apache.org/jira/browse/SOLR-13794
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-13794.patch, SOLR-13794_code_only.patch
>
>
> For as long as we've had a {{_default}} configset in solr, we've also had a 
> copy of that default in {{core/src/test-files/}} - as well as a unit test 
> that confirms they are identical.
> It's never really been clear to me *why* we have this duplication, instead of 
> just having the test-framework take the necessary steps to ensure that 
> {{server/solr/configsets/_default}} is properly used when running tests.
> I'd like to propose we eliminate the duplication since it only ever seems to 
> cause problems (notably spurious test failures when people modify the 
> {{_default}} configset w/o remembering that they need to make identical edits 
> to the {{test-files}} clone) and instead have {{SolrTestCase}} set the 
> (already existing & supported) {{solr.default.confdir}} system property to 
> point to the (already existing) {{ExternalPaths.DEFAULT_CONFIGSET}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-13794) Delete solr/core/src/test-files/solr/configsets/_default

2019-09-25 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-13794:
-

 Summary: Delete solr/core/src/test-files/solr/configsets/_default
 Key: SOLR-13794
 URL: https://issues.apache.org/jira/browse/SOLR-13794
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter


For as long as we've had a {{_default}} configset in solr, we've also had a 
copy of that default in {{core/src/test-files/}} - as well as a unit test that 
confirms they are identical.

It's never really been clear to me *why* we have this duplication, instead of 
just having the test-framework take the necessary steps to ensure that 
{{server/solr/configsets/_default}} is properly used when running tests.

I'd like to propose we eliminate the duplication since it only ever seems to 
cause problems (notably spurious test failures when people modify the 
{{_default}} configset w/o remembering that they need to make identical edits 
to the {{test-files}} clone) and instead have {{SolrTestCase}} set the (already 
existing & supported) {{solr.default.confdir}} system property to point to the 
(already existing) {{ExternalPaths.DEFAULT_CONFIGSET}}







[jira] [Updated] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Anshum Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated LUCENE-8984:
-
Lucene Fields:   (was: New,Patch Available)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0), 8.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L651]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}
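The bias described in the quote above can be made concrete with a small sketch. The formula below is the classic smoothed idf, used here only for illustration, not Lucene's exact similarity implementation; the document counts are made-up examples.

```java
// Illustration of the reported bias: when numDocs >> docs-with-field,
// idf barely separates rare terms from common ones within the field,
// so term frequency dominates the "interesting terms" selection.
public class IdfBiasSketch {
  // Classic smoothed idf, for illustration only.
  static double idf(int docFreq, int numDocs) {
    return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
  }

  public static void main(String[] args) {
    int maxDoc = 1_000_000;      // all docs in the index
    int docsWithField = 10_000;  // docs that actually populate "description"
    // Relative spread between a rare term (df=10) and a common term (df=5000):
    double spreadUsingMaxDoc = idf(10, maxDoc) / idf(5_000, maxDoc);
    double spreadUsingField  = idf(10, docsWithField) / idf(5_000, docsWithField);
    System.out.printf("maxDoc spread=%.2f, per-field spread=%.2f%n",
        spreadUsingMaxDoc, spreadUsingField);
  }
}
```

Using the per-field doc count gives a noticeably larger relative spread between rare and common terms, which is what the fix restores.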






[jira] [Updated] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Anshum Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated LUCENE-8984:
-
Fix Version/s: 8.3

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0), 8.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L651]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938106#comment-16938106
 ] 

ASF subversion and git services commented on LUCENE-8984:
-

Commit 3c3d5b1172fe9221a44482a4a0ca04b9fd5f2246 in lucene-solr's branch 
refs/heads/branch_8x from Anshum Gupta
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3c3d5b1 ]

LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871) (#901)

* LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871)

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L651]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938105#comment-16938105
 ] 

ASF subversion and git services commented on LUCENE-8984:
-

Commit 3c3d5b1172fe9221a44482a4a0ca04b9fd5f2246 in lucene-solr's branch 
refs/heads/branch_8x from Anshum Gupta
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3c3d5b1 ]

LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871) (#901)

* LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871)

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L651]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[GitHub] [lucene-solr] anshumg merged pull request #901: LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871)

2019-09-25 Thread GitBox
anshumg merged pull request #901: LUCENE-8984: MoreLikeThis MLT is biased for 
uncommon fields (#871)
URL: https://github.com/apache/lucene-solr/pull/901
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [lucene-solr] anshumg opened a new pull request #901: LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (#871)

2019-09-25 Thread GitBox
anshumg opened a new pull request #901: LUCENE-8984: MoreLikeThis MLT is biased 
for uncommon fields (#871)
URL: https://github.com/apache/lucene-solr/pull/901
 
 
   Cherrypick from master





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-25 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938084#comment-16938084
 ] 

Noble Paul commented on SOLR-13101:
---

We can have multiple file-share systems. Just use a different name so that we 
don't confuse users.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.
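The "local disk as a cache" idea in the description above can be sketched as follows. The names are illustrative, not Lucene's actual Directory API, and an in-memory Map stands in for the local filesystem and the blob store.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of treating local disk as a cache in front of a
// high-latency blob store; not Lucene's Directory API.
public class CachingBlobReader {
  private final Map<String, byte[]> localCache = new HashMap<>();
  int remoteFetches = 0; // exposed so the caching behavior is observable

  // Serve a segment file from the local cache when present; otherwise pay
  // the remote round-trip once and keep the bytes locally.
  byte[] read(String fileName, Map<String, byte[]> remoteStore) {
    return localCache.computeIfAbsent(fileName, name -> {
      remoteFetches++;
      return remoteStore.get(name);
    });
  }
}
```

Repeated reads of the same segment file hit the cache, so only the first read pays the remote latency; the locking question in the description (multiple writers to one index) is deliberately out of scope here.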






[GitHub] [lucene-solr] noblepaul opened a new pull request #900: test 13722 . branch

2019-09-25 Thread GitBox
noblepaul opened a new pull request #900: test 13722 . branch
URL: https://github.com/apache/lucene-solr/pull/900
 
 
   





[GitHub] [lucene-solr] dsmiley commented on a change in pull request #856: LUCENE-8965 SometimesConcurrentMergeScheduler

2019-09-25 Thread GitBox
dsmiley commented on a change in pull request #856: LUCENE-8965 
SometimesConcurrentMergeScheduler
URL: https://github.com/apache/lucene-solr/pull/856#discussion_r328342889
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/index/SometimesConcurrentMergeScheduler.java
 ##
 @@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.Observable;
+import java.util.Observer;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.lucene.util.ThreadInterruptedException;
+
+/**
+ * A variant of CMS: If there are cheap merges, wait for a merge to complete 
before continuing. This
+ * has the benefit of greatly increasing the odds that an {@link 
org.apache.lucene.search.IndexSearcher}
+ * will see fewer segments. Normally, CMS does all merges concurrently, and so 
it won't be until the
+ * next commit that the IndexSearcher benefits from fewer segments. The 
trade-off is less
+ * concurrency, and there will be some delay on a segment flush for a cheap 
merge if present.
+ *
+ * @author dsmiley
+ * @since solr.7
 
 Review comment:
   Whoops; it was a mistake that I left this when copying from internal work.  
Different coding practices.





[jira] [Commented] (SOLR-13710) Persist package jars locally & expose them over http

2019-09-25 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938011#comment-16938011
 ] 

Yonik Seeley commented on SOLR-13710:
-

I agree, the proliferation and relationship of these various blob apis is 
confusing.

> Persist package jars locally & expose them over http
> 
>
> Key: SOLR-13710
> URL: https://issues.apache.org/jira/browse/SOLR-13710
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * All jars for packages downloaded are stored in a dir SOLR_HOME/blobs. 
> * The file names will be the sha256 hash of the files.
> * Before downloading a jar from a location, it's first checked for in the 
> local directory
> * POST a jar to http://localhost:8983/api/cluster/blob to distribute it in 
> the cluster
> * A new API end point {{http://localhost:8983/api/node/blob}} will list the 
> available jars
> example
> {code:json}
> {
> "blob":["e1f9e23988c19619402f1040c9251556dcd6e02b9d3e3b966a129ea1be5c70fc",
> "79298d7d5c3e60d91154efe7d72f4536eac46698edfa22ab894b85492d562ed4"]
> }
> {code}
> * The jar will be downloadable at 
> {{http://localhost:8983/api/node/blob/}} 
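The naming and caching scheme described above can be sketched as follows. The behavior is inferred from the issue description, not taken from the actual Solr code, and a Set stands in for a directory listing of SOLR_HOME/blobs.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Sketch of the scheme in the description: a jar's file name is the sha256
// hash of its bytes, so a download can be skipped whenever a file with that
// name already exists under SOLR_HOME/blobs.
public class BlobNameSketch {
  static String sha256Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) sb.append(String.format("%02x", b));
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // SHA-256 is always available on the JVM
    }
  }

  // localBlobs stands in for a listing of the SOLR_HOME/blobs directory.
  static boolean needsDownload(Set<String> localBlobs, byte[] jarBytes) {
    return !localBlobs.contains(sha256Hex(jarBytes));
  }
}
```

Content-addressed names make the local check trivial: identical bytes always map to the same file name, so a cache hit is exact.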






[jira] [Created] (SOLR-13793) HTTPSolrCall makes cascading calls even when all replicas are down for a collection

2019-09-25 Thread Kesharee Nandan Vishwakarma (Jira)
Kesharee Nandan Vishwakarma created SOLR-13793:
--

 Summary: HTTPSolrCall makes cascading calls even when all replicas 
are down for a collection
 Key: SOLR-13793
 URL: https://issues.apache.org/jira/browse/SOLR-13793
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 6.6
Reporter: Kesharee Nandan Vishwakarma


The REMOTEQUERY action in HTTPSolrCall ends up making too many cascading 
remoteQuery calls when all the replicas of a collection are in the down state. 
This results in an increase in thread count, unresponsive Solr nodes, and 
eventually the nodes hosting this collection dropping out of live nodes.

*Example scenario*: Consider a cluster with 3 nodes (solr1, solrw1, 
solr-overseer1). A collection is present on solr1 and solrw1, but both replicas 
are in the down state. When a search request is made to solr-overseer1, since no 
replica is present locally, a remote query is made to solr1 (inactive 
slices/coreUrls are also considered). solr1 also doesn't see an active replica 
locally, so it forwards to solrw1; solrw1 in turn forwards the request back to 
solr1. This goes on until both solr1 and solrw1 become unresponsive. Logs for 
this are attached.

This is happening because we are considering [inactive 
slices|https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L913
 ], [inactive coreUrl| 
https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L929]
 while forwarding requests to nodes.

*Steps to reproduce*:
# Bring down all replicas of a collection, but ensure the nodes containing them 
are up.
# Make any search call to any of the Solr nodes for this collection.
 
*Possible fixes*: 
# Ensure we select only active slices/coreUrls before making remote queries.
# Put a limit on cascading calls, probably bounded by the number of replicas.
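The first proposed fix (combined with a hop cap) could be sketched as below. All names are illustrative, not Solr's actual HttpSolrCall API.

```java
import java.util.List;
import java.util.Optional;

// Hedged sketch of the proposed fixes: only active replicas are candidates
// for a remote query, and cascading forwards are capped by a hop count.
public class RemoteQuerySketch {
  record Replica(String coreUrl, String state) {}

  static final int MAX_HOPS = 3; // e.g. bounded by the number of replicas

  static Optional<String> pickTarget(List<Replica> replicas, int hops) {
    if (hops >= MAX_HOPS) return Optional.empty(); // stop the cascade
    return replicas.stream()
        .filter(r -> "active".equals(r.state())) // never forward to a down replica
        .map(Replica::coreUrl)
        .findFirst();
  }
}
```

With all replicas down, the filter yields no target, so the request fails fast on the first node instead of ping-ponging between solr1 and solrw1.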
 
{noformat} 
solrw1_1 | at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 solrw1_1 | at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
 solrw1_1 | at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
solrw1_1 | at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
 solrw1_1 | at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
 solrw1_1 | at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) 
solrw1_1 | at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
 solrw1_1 | at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) 
solrw1_1 | at org.eclipse.jetty.server.Server.handle(Server.java:534) solrw1_1 
| at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) solrw1_1 
| at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
solrw1_1 | at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
 solrw1_1 | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) 
solrw1_1 | at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) 
solrw1_1 | at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
 solrw1_1 | at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
 solrw1_1 | at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
 solrw1_1 | at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
 solrw1_1 | at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
solrw1_1 | at java.lang.Thread.run(Thread.java:748) solrw1_1 | Caused by: 
java.net.SocketTimeoutException: Read timed out solrw1_1 | at 
java.net.SocketInputStream.socketRead0(Native Method) solrw1_1 | at 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116) solrw1_1 | at 
java.net.SocketInputStream.read(SocketInputStream.java:171) solrw1_1 | at 
java.net.SocketInputStream.read(SocketInputStream.java:141) solrw1_1 | at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 solrw1_1 | at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) 
solrw1_1 | at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 solrw1_1 | at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
 solrw1_1 | at 
{noformat}

[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937998#comment-16937998
 ] 

David Smiley commented on SOLR-13101:
-

Solr already has a [Blob 
Store|https://lucene.apache.org/solr/guide/8_0/blob-store-api.html] for jars 
(plugins).  And as Noble points out, SOLR-13710 introduces _a second_ Blob 
Store that appears duplicative of the former, committed to 8.x. 
 Eventually either could be used for not just plugins but resources (e.g. 
language models, etc.) generally.  Does it make sense to use the same name 
"blob store" for index data?  That would imply not just a common name but some 
common APIs as well that work seamlessly.  I'm not sure if these use cases fit 
well together or not.  If we separate them, I suggest we abandon this nebulous 
word "blob" and be more specific: a "Resource Store" and an "Index Store".  
What do others think?

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[jira] [Commented] (SOLR-13710) Persist package jars locally & expose them over http

2019-09-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937995#comment-16937995
 ] 

David Smiley commented on SOLR-13710:
-

[~noble.paul] what is the plan for this issue relative to the existing [Blob 
Store|https://lucene.apache.org/solr/guide/8_0/blob-store-api.html]?  I think 
such an explanation deserves to be articulated _in this issue_, because it 
introduces a concept duplicative of something that already exists.

> Persist package jars locally & expose them over http
> 
>
> Key: SOLR-13710
> URL: https://issues.apache.org/jira/browse/SOLR-13710
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * All jars for packages downloaded are stored in a dir SOLR_HOME/blobs. 
> * The file names will be the sha256 hash of the files.
> * Before downloading a jar from a location, it's first checked for in the 
> local directory
> * POST a jar to http://localhost:8983/api/cluster/blob to distribute it in 
> the cluster
> * A new API end point {{http://localhost:8983/api/node/blob}} will list the 
> available jars
> example
> {code:json}
> {
> "blob":["e1f9e23988c19619402f1040c9251556dcd6e02b9d3e3b966a129ea1be5c70fc",
> "79298d7d5c3e60d91154efe7d72f4536eac46698edfa22ab894b85492d562ed4"]
> }
> {code}
> * The jar will be downloadable at 
> {{http://localhost:8983/api/node/blob/}} 






[GitHub] [lucene-solr] diegoceccarelli commented on a change in pull request #300: SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas Patch)

2019-09-25 Thread GitBox
diegoceccarelli commented on a change in pull request #300: SOLR-11831: Skip 
second grouping step if group.limit is 1 (aka Las Vegas Patch)
URL: https://github.com/apache/lucene-solr/pull/300#discussion_r328272478
 
 

 ##
 File path: 
lucene/grouping/src/java/org/apache/lucene/search/grouping/FirstPassGroupingCollector.java
 ##
 @@ -139,10 +139,18 @@ public ScoreMode scoreMode() {
   // System.out.println("  group=" + (group.groupValue == null ? "null" : 
group.groupValue.toString()));
   SearchGroup searchGroup = new SearchGroup<>();
   searchGroup.groupValue = group.groupValue;
+  // We pass this around so that we can get the corresponding solr id when 
serializing the search group to send to the federator
+  searchGroup.topDocLuceneId = group.topDoc;
   searchGroup.sortValues = new Object[sortFieldCount];
   for(int sortFieldIDX=0;sortFieldIDX

[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-25 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937957#comment-16937957
 ] 

Noble Paul commented on SOLR-13661:
---

#1 & #2 can be implemented in the future, and there is nothing in the current 
impl that prevents us from doing any of it. 

 

#3 is not something I'm sure about

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SOLR-13661) A package management system for Solr

2019-09-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937954#comment-16937954
 ] 

David Smiley commented on SOLR-13661:
-

There are a few particular things I'm not sure I've seen here yet that I want 
to ensure _could be supported_ (i.e. without major redesign/rework).

(1) Declare plugin/package dependencies for the whole configSet, such as via 
additional options to solrconfig.xml "" tags.  While this would not work 
with hot deployment of plugins since a core reload is needed, this would at 
least support the entire range of plugin interfaces we have that work from 
within a SolrCore, and thus support all of the schema as well _without_ making 
users add new tags to all these various elements.

(2) Use the new blob store for _resources_ (data) as an alternative to the 
configSet.  The configSet lives in ZooKeeper and as-such is a poor fit for, 
say, machine learning models, language models, or any other large immutable 
files needed within a running SolrCore.  Such resource data might be versioned 
like plugins but would be considered data and not code.  Plugins could declare 
dependencies on resources, or might be declared directly by the configSet in 
some manner.  Detaching from the configSet is also good for re-use of large 
files across configs.  I could imagine a SolrResourceLoader that interprets 
special prefixes like "blob://mymodel/foo.dat" or some-such to generically plug 
into our existing resource spots.

(3) Pluggable blob store implementation and/or support for a shared file system 
across the cluster with a configurable path.
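The prefix-dispatch idea in point (2) -- a resource loader that interprets special prefixes like "blob://" -- could be sketched roughly as follows. This is a hypothetical illustration: the class and method names are invented for the example and are not Solr's actual SolrResourceLoader API.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: route resource lookups on a URI-like scheme prefix, so
// "blob://mymodel/foo.dat" is served from a blob store while plain names fall
// back to the configSet. All names here are illustrative, not Solr's API.
public class PrefixResourceResolver {
  private final Map<String, Function<String, String>> schemes; // scheme -> loader
  private final Function<String, String> configSetFallback;

  public PrefixResourceResolver(Map<String, Function<String, String>> schemes,
                                Function<String, String> configSetFallback) {
    this.schemes = schemes;
    this.configSetFallback = configSetFallback;
  }

  public String resolve(String name) {
    int idx = name.indexOf("://");
    if (idx > 0) {
      Function<String, String> loader = schemes.get(name.substring(0, idx));
      if (loader != null) {
        return loader.apply(name.substring(idx + 3)); // strip the "scheme://" prefix
      }
    }
    return configSetFallback.apply(name); // no recognized prefix: use the configSet
  }
}
```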

> A package management system for Solr
> 
>
> Key: SOLR-13661
> URL: https://issues.apache.org/jira/browse/SOLR-13661
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: package
> Attachments: plugin-usage.png, repos.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Here's the design doc:
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?usp=sharing






[jira] [Commented] (SOLR-13791) Remove BeanUtils reference from ivy-versions.properties

2019-09-25 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937945#comment-16937945
 ] 

Lucene/Solr QA commented on SOLR-13791:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} Check licenses {color} | {color:green} 
 0m  4s{color} | {color:green} Check licenses check-licenses passed {color} |
| {color:red}-1{color} | {color:red} Check licenses {color} | {color:red}  0m 
32s{color} | {color:red} Check licenses check-licenses failed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m  4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:black}{color} | {color:black} {color} | {color:black}  1m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13791 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981338/SOLR-13791-01.patch |
| Optional Tests |  checklicenses  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 0d0af505a03 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Check licenses | 
https://builds.apache.org/job/PreCommit-SOLR-Build/559/artifact/out/patch-check-licenses-solr.txt
 |
| modules | C: lucene U: lucene |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/559/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Remove BeanUtils reference from ivy-versions.properties
> ---
>
> Key: SOLR-13791
> URL: https://issues.apache.org/jira/browse/SOLR-13791
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andras Salamon
>Priority: Major
> Attachments: SOLR-13791-01.patch
>
>
> SOLR-12617 removed Commons BeanUtils, but {{lucene/ivy-versions.properties}} 
> still have a reference to beanutils, because SOLR-9515 added this line back.
> We can remove this line.






[jira] [Comment Edited] (SOLR-13787) An annotation based system to write v2 only APIs

2019-09-25 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937932#comment-16937932
 ] 

Noble Paul edited comment on SOLR-13787 at 9/25/19 5:36 PM:


This is not a wholesale rewrite of v2 APIs.

Basically, there are 2 types of v2 APIs. 
 # Without a json payload
 # With a json payload

#1 really doesn't need a spec file and we can avoid it. 

#2 would need a spec file which contains the json schema of the payload. The 
schema is best expressed in json schema format.


was (Author: noble.paul):
This is not a wholesale rewrite of v2 APIs.

Basically, there are 2 types of v2 APIs. 
 # Without a json payload
 # With a json payload

#1 can avoid a spec file and we can avoid it. 

#2 would need a spec file which contains the json schema of the payload. The 
schema is best expressed in json schema format.

> An annotation based system to write v2 only APIs
> 
>
> Key: SOLR-13787
> URL: https://issues.apache.org/jira/browse/SOLR-13787
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> example v2 API may look as follows
> {code:java}
> @EndPoint(
>  spec = "cluster.package",
>  method = POST,
>  permission = PKG_EDIT
> )
> static class PkgEdit {
>  @Command(name = "add")
>  public void add(CallInfo callInfo) throws Exception {
>  }
>  @Command(name = "update")
>  public void update(CallInfo callInfo) throws Exception {
> }
>  @Command(name = "delete")
>  boolean deletePackage(CallInfo params) throws Exception {
> }
> {code}
> This expects you to already have the API spec json 
>  
> The annotations are:
>  
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target({ElementType.TYPE})
> public @interface EndPoint {
>   /**name of the API spec file without the '.json' suffix
>*/
>   String spec();
>   /**Http method
>*/
>   SolrRequest.METHOD method();
>   /**The well known permission name if any
>*/
>   PermissionNameProvider.Name permission();
> }
> {code}
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.METHOD)
> public @interface Command {
>   String name() default "";
> }
> {code}
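A rough, self-contained sketch of how such an annotation-driven dispatcher could route a command name to the matching {{@Command}} method via reflection. Simplified names throughout; this is not the actual Solr implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Illustrative dispatcher: scan a target object's public methods for @Command
// and map each command name to its method, then invoke by name at runtime.
public class CommandDispatcher {
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface Command {
    String name() default "";
  }

  // Stand-in for a class like PkgEdit in the issue description.
  public static class PkgEdit {
    @Command(name = "add")
    public String add(String payload) { return "added:" + payload; }

    @Command(name = "delete")
    public String delete(String payload) { return "deleted:" + payload; }
  }

  private final Object target;
  private final Map<String, Method> commands = new HashMap<>();

  public CommandDispatcher(Object target) {
    this.target = target;
    for (Method m : target.getClass().getMethods()) {
      Command c = m.getAnnotation(Command.class);
      if (c != null) {
        // fall back to the method name when the annotation omits one
        commands.put(c.name().isEmpty() ? m.getName() : c.name(), m);
      }
    }
  }

  public Object invoke(String command, String payload) {
    Method m = commands.get(command);
    if (m == null) throw new IllegalArgumentException("unknown command: " + command);
    try {
      return m.invoke(target, payload);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
```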






[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs

2019-09-25 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937932#comment-16937932
 ] 

Noble Paul commented on SOLR-13787:
---

This is not a wholesale rewrite of v2 APIs.

Basically, there are 2 types of v2 APIs. 
 # Without a json payload
 # With a json payload

#1 really doesn't need a spec file, so we can avoid it. 

#2 would need a spec file which contains the json schema of the payload. The 
schema is best expressed in json schema format.

> An annotation based system to write v2 only APIs
> 
>
> Key: SOLR-13787
> URL: https://issues.apache.org/jira/browse/SOLR-13787
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> example v2 API may look as follows
> {code:java}
> @EndPoint(
>  spec = "cluster.package",
>  method = POST,
>  permission = PKG_EDIT
> )
> static class PkgEdit {
>  @Command(name = "add")
>  public void add(CallInfo callInfo) throws Exception {
>  }
>  @Command(name = "update")
>  public void update(CallInfo callInfo) throws Exception {
> }
>  @Command(name = "delete")
>  boolean deletePackage(CallInfo params) throws Exception {
> }
> {code}
> This expects you to already have the API spec json 
>  
> The annotations are:
>  
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target({ElementType.TYPE})
> public @interface EndPoint {
>   /**name of the API spec file without the '.json' suffix
>*/
>   String spec();
>   /**Http method
>*/
>   SolrRequest.METHOD method();
>   /**The well known permission name if any
>*/
>   PermissionNameProvider.Name permission();
> }
> {code}
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.METHOD)
> public @interface Command {
>   String name() default "";
> }
> {code}






[GitHub] [lucene-solr] atris opened a new pull request #899: LUCENE-8989: Allow IndexSearcher To Handle Rejected Execution

2019-09-25 Thread GitBox
atris opened a new pull request #899: LUCENE-8989: Allow IndexSearcher To 
Handle Rejected Execution
URL: https://github.com/apache/lucene-solr/pull/899
 
 
   When executing queries using Executors, we should gracefully handle the case 
when Executor rejects a task and run the task on the caller thread.
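A minimal sketch of the fallback described above -- submit to the executor first, and run the task on the caller thread if it is rejected. The helper name is invented for illustration; this is not the actual IndexSearcher change.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Sketch: degrade gracefully when an executor rejects a task by running the
// task on the calling thread instead of failing the whole search.
public class GracefulExecutor {
  public static void executeOrRunLocally(Executor executor, Runnable task) {
    try {
      executor.execute(task);
    } catch (RejectedExecutionException e) {
      task.run(); // executor saturated or shut down: run on the caller thread
    }
  }
}
```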


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Updated] (SOLR-13792) SolrZkClient should include more MDC info when zkCallback threads process a WatchedEvent

2019-09-25 Thread Hoss Man (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-13792:

Attachment: SOLR-13792.patch
Status: Open  (was: Open)


I initially set out to see what it would take just to ensure that the 
{{MDCLoggingContext.setNode(...)}} information was set when {{SolrZkClient}} 
submitted tasks to the {{zkCallback}} {{MDCAwareThreadPoolExecutor}}, but the 
more I looked at it, the more I started to think that what might make a lot 
more sense in general is for {{SolrZkClient}} to ensure that the MDC context 
used when submitting a Runnable (to the {{zkCallback}} 
{{MDCAwareThreadPoolExecutor}}) contains all the MDC information of the thread 
that *registered* the {{Watcher}} in the first place.

This could be useful to external solrj-based {{SolrZkClient}} users that might 
use MDC when registering their own {{Watcher}} instances, but would also be 
useful in Solr: not only would it ensure that "node level" Watchers (the most 
common) at least have the {{n:host_port}} details in their log messages, it 
would also help ensure that if any individual-request or SolrCore-level code 
registered a {{Watcher}}, we'd have that collection/core level detail from the 
request in the log messages as well.

The attached patch takes things in this direction by making a copy of the MDC 
context info in the {{ProcessWatchWithExecutor}} that 
{{SolrZkClient.wrapWatcher}} creates for every {{Watcher}} provided by its 
caller.  This MDC context copy is then used when the Watcher is processed, but 
the code ensures that any MDC values set at that time can override them (just 
in case down the road ZooKeeper ever uses MDC).

On the whole I think it's an improvement -- but one thing I'm not sure about is 
if/how these MDC values should be accounted for in 
{{ProcessWatchWithExecutor.hashCode()}} and 
{{ProcessWatchWithExecutor.equals()}}, given the comments in that class. (I've 
included a nocommit regarding this.)
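The capture-and-restore behavior described above can be sketched in isolation. A plain ThreadLocal map stands in for the real MDC here so the example is self-contained; class and method names are invented for illustration, not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: copy the registering thread's MDC-style context when a watcher is
// wrapped, and restore it when the callback runs -- letting any values set at
// process time override the registration-time copy.
public class ContextCapture {
  static final ThreadLocal<Map<String, String>> MDC =
      ThreadLocal.withInitial(HashMap::new);

  public static Runnable wrap(Runnable watcherCallback) {
    // copy the submitter's context at registration time
    final Map<String, String> registered = new HashMap<>(MDC.get());
    return () -> {
      Map<String, String> merged = new HashMap<>(registered);
      merged.putAll(MDC.get()); // values set at process time win over the copy
      Map<String, String> previous = MDC.get();
      MDC.set(merged);
      try {
        watcherCallback.run();
      } finally {
        MDC.set(previous); // don't leak context into the pool thread
      }
    };
  }
}
```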



Some sample log output from an arbitrary cloud test 
(TestMiniSolrCloudClusterSSL) w/o the patch...

{noformat}
   [junit4]   2> 8511 INFO  (qtp666514211-77) [n:127.0.0.1:43335_solr 
c:first_collection s:shard1 r:core_node4 x:first_collection_shard1_replica_n1 ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores 
params={qt=/admin/cores=org.apache.solr.cloud.TestMiniSolrCloudClusterSSL=true=first_collection=2=NRT=solrconfig-tlog.xml=core_node4=first_collection_shard1_replica_n1=CREATE=3=shard1=javabin}
 status=0 QTime=2840
   [junit4]   2> 8514 INFO  (qtp1436582928-73) [n:127.0.0.1:39513_solr 
c:first_collection s:shard3 r:core_node6 x:first_collection_shard3_replica_n3 ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores 
params={qt=/admin/cores=org.apache.solr.cloud.TestMiniSolrCloudClusterSSL=true=first_collection=2=NRT=solrconfig-tlog.xml=core_node6=first_collection_shard3_replica_n3=CREATE=3=shard3=javabin}
 status=0 QTime=2851
   [junit4]   2> 8530 INFO  (qtp1436582928-75) [n:127.0.0.1:39513_solr ] 
o.a.s.h.a.CollectionsHandler Wait for new collection to be active for at most 
45 seconds. Check all shard replicas
   [junit4]   2> 8611 INFO  (zkCallback-36-thread-1) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: [3])
   [junit4]   2> 8611 INFO  (zkCallback-36-thread-2) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: [3])
   [junit4]   2> 8611 INFO  (zkCallback-40-thread-3) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: [3])
   [junit4]   2> 8611 INFO  (zkCallback-36-thread-3) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: [3])
   [junit4]   2> 8611 INFO  (zkCallback-44-thread-3) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: [3])
   [junit4]   2> 8612 INFO  (zkCallback-44-thread-1) [ ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/first_collection/state.json] for collection 
[first_collection] has occurred - updating... (live nodes size: 

[jira] [Created] (SOLR-13792) SolrZkClient should include more MDC info when zkCallback threads process a WatchedEvent

2019-09-25 Thread Hoss Man (Jira)
Hoss Man created SOLR-13792:
---

 Summary: SolrZkClient should include more MDC info when zkCallback 
threads process a WatchedEvent
 Key: SOLR-13792
 URL: https://issues.apache.org/jira/browse/SOLR-13792
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man


One of the biggest headaches when debugging multi-node cloud tests is 
disambiguating the log messages and which nodes they came from.

For many threads, the MDC context info makes this a non-issue, but in the case 
of "zkCallback" threads it can be virtually impossible to tell which "node" of 
the cluster each of the zkCallback threads belongs to, because they don't have 
MDC info ({{SolrZkClient}} already uses an {{MDCAwareThreadPoolExecutor}} to 
process the ZK {{WatchedEvent}} callbacks, and {{MDCAwareThreadPoolExecutor}} 
ensures that the _submitter's_ MDC values are used in the Thread that executes 
the Runnable -- but in this case the "submitter" is the ZooKeeper thread).

From a test debugging standpoint, it would be very useful if more MDC context 
info about the *node* existed when {{zkCallback}} threads execute.







[GitHub] [lucene-solr] atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in 
LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#issuecomment-535102724
 
 
   @mikemccand Updated, please see





[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs

2019-09-25 Thread Anshum Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937887#comment-16937887
 ] 

Anshum Gupta commented on SOLR-13787:
-

[~noble.paul] - for the sake of clarity, can you add some more context about 
the annotations, as well as the spec file? If there's already a sample or 
document available outside of this JIRA, a link would do.

Considering that the last time I took a look at and/or did anything with v2 
APIs was a while ago, I feel this might be a little complicated for what you 
are trying to accomplish. At the same time, I may be wrong here and some 
documentation about the annotation/spec file would make things better w.r.t. 
understanding this proposal.

> An annotation based system to write v2 only APIs
> 
>
> Key: SOLR-13787
> URL: https://issues.apache.org/jira/browse/SOLR-13787
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> example v2 API may look as follows
> {code:java}
> @EndPoint(
>  spec = "cluster.package",
>  method = POST,
>  permission = PKG_EDIT
> )
> static class PkgEdit {
>  @Command(name = "add")
>  public void add(CallInfo callInfo) throws Exception {
>  }
>  @Command(name = "update")
>  public void update(CallInfo callInfo) throws Exception {
> }
>  @Command(name = "delete")
>  boolean deletePackage(CallInfo params) throws Exception {
> }
> {code}
> This expects you to already have the API spec json 
>  
> The annotations are:
>  
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target({ElementType.TYPE})
> public @interface EndPoint {
>   /**name of the API spec file without the '.json' suffix
>*/
>   String spec();
>   /**Http method
>*/
>   SolrRequest.METHOD method();
>   /**The well known permission name if any
>*/
>   PermissionNameProvider.Name permission();
> }
> {code}
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.METHOD)
> public @interface Command {
>   String name() default "";
> }
> {code}






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937884#comment-16937884
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit f6194a53832ecb4c7534950b9e8a2f56db201bb1 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f6194a5 ]

SOLR-13105: Update text inline tocs 10


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328208652
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -832,5 +920,25 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
   return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, 
ScoreMode.COMPLETE_NO_SCORES, disi));
 }
 
+// Perform a cache load asynchronously
+private void cacheAsynchronously(LeafReaderContext context, 
IndexReader.CacheHelper cacheHelper) throws RejectedExecutionException {
+  /*
+   * If the current query is already being asynchronously cached,
+   * do not trigger another cache operation
+   */
+  if (inFlightAsyncLoadQueries.add(in.getQuery()) == false) {
+return;
+  }
+
+  FutureTask task = new FutureTask<>(() -> {
+DocIdSet localDocIdSet = cache(context);
+putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+//Remove the key from inflight -- the key is loaded now
+inFlightAsyncLoadQueries.remove(in.getQuery());
 
 Review comment:
   Can you `assert` the return value here?  I.e. we should *always* succeed in 
removing this query from the set.
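The pattern under review -- a concurrent set guarding in-flight cache loads, with the suggested assert that the final remove always succeeds -- can be sketched in isolation. Class and member names here are illustrative, not the actual LRUQueryCache code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a concurrent set tracks queries whose async cache load is in flight;
// a second load of the same query is suppressed, and finishing a load asserts
// that the entry was still present (it should never vanish by itself).
public class InFlightTracker {
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  public boolean tryStart(String query) {
    return inFlight.add(query); // false: a load for this query is already running
  }

  public void finish(String query) {
    boolean removed = inFlight.remove(query);
    assert removed : "in-flight entry vanished for " + query;
  }
}
```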





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328207678
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -813,8 +881,28 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean performSynchronousCaching = !(executor != null);
 
 Review comment:
   Fix double negative here too?





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328208798
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -832,5 +889,25 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
   return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, 
ScoreMode.COMPLETE_NO_SCORES, disi));
 }
 
+// Perform a cache load asynchronously
+private void cacheAsynchronously(LeafReaderContext context, 
IndexReader.CacheHelper cacheHelper) {
+  /*
+   * If the current query is already being asynchronously cached,
+   * do not trigger another cache operation
+   */
+  if (inFlightAsyncLoadQueries.add(in.getQuery()) == false) {
+return;
+  }
+
+  FutureTask task = new FutureTask<>(() -> {
+DocIdSet localDocIdSet = cache(context);
+putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+//remove the key from inflight -- the key is loaded now
+inFlightAsyncLoadQueries.remove(in.getQuery());
+return null;
+  });
+  executor.execute(task);
 
 Review comment:
   Yeah it is, thanks for opening separate issue ...





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328206271
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -732,8 +779,29 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean performSynchronousCaching = !(executor != null);
 
 Review comment:
   Can you remove the double-negative here?  I.e.:
   
   ```
 boolean performSynchronousCaching = executor == null
   ```
   
   Also maybe rename to `cacheSynchronously`?





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328208409
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -813,8 +881,28 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean performSynchronousCaching = !(executor != null);
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (!performSynchronousCaching) {
+try {
+  cacheAsynchronously(context, cacheHelper);
+} catch (RejectedExecutionException e) {
 
 Review comment:
   Hmm, can you shrink-wrap this exception handling?  I.e., handle it down in 
`cacheAsynchronously` where we ask the executor to execute the task, instead of 
up here?
   
   Maybe then return a `boolean` from `cacheAsynchronously` to know (here) if 
caller should then cache synchronously ...
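The shrink-wrapped shape suggested here might look like the following sketch: the exception is handled where the task is submitted, and a boolean tells the caller whether to fall back to synchronous caching. Names are illustrative, not the actual LRUQueryCache code.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Sketch of the suggested refactor: handle RejectedExecutionException at the
// submission site and report success/failure, so the caller can decide to
// cache synchronously when the executor refuses the task.
public class AsyncCacheHelper {
  /** @return true if the load was handed to the executor, false if rejected */
  public static boolean cacheAsynchronously(Executor executor, Runnable load) {
    try {
      executor.execute(load);
      return true;
    } catch (RejectedExecutionException e) {
      return false; // caller should fall back to synchronous caching
    }
  }
}
```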





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
mikemccand commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r328207416
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -732,8 +779,29 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
   if (docIdSet == null) {
 if (policy.shouldCache(in.getQuery())) {
-  docIdSet = cache(context);
-  putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+  boolean performSynchronousCaching = !(executor != null);
+
+  // If asynchronous caching is requested, perform the same and return
+  // the uncached iterator
+  if (!performSynchronousCaching) {
+try {
+  cacheAsynchronously(context, cacheHelper);
+} catch (RejectedExecutionException e) {
+  // Trigger synchronous caching
+  performSynchronousCaching = true;
+}
+
+// If async caching failed, synchronous caching will
+// be performed, hence do not return the uncached value
+if (!performSynchronousCaching) {
 
 Review comment:
   Change to `performSynchronousCaching == false`?





[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Anshum Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937869#comment-16937869
 ] 

Anshum Gupta commented on LUCENE-8984:
--

Thanks [~jim.ferenczi] for fixing the test and build! 

I will backport both of these commits to 8x.

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}
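The bias can be seen with a small back-of-the-envelope computation (numbers invented for the example; the formula is a simplified idf, not Lucene's exact smoothed one): when numDocs is taken from the whole index, the idf values of common and rare field terms stay close together, so term frequency dominates the tf*idf ranking:

```java
public class IdfBiasDemo {
  static double idf(long numDocs, long docFreq) {
    // simplified idf; Lucene's ClassicSimilarity adds smoothing terms
    return Math.log((double) numDocs / docFreq);
  }

  public static void main(String[] args) {
    long totalDocs = 1_000_000;   // all documents in the index
    long docsWithField = 10_000;  // documents that actually have "description"
    long dfCommon = 9_000;        // term occurring in nearly every field-bearing doc
    long dfRare = 500;

    // Scored against the whole index, both terms get large, similar idfs,
    // so tf dominates the tf*idf product:
    double wholeCommon = idf(totalDocs, dfCommon);
    double wholeRare = idf(totalDocs, dfRare);

    // Scoped to the documents that have the field, the common term's idf
    // collapses toward zero and the rare term clearly stands out:
    double fieldCommon = idf(docsWithField, dfCommon);
    double fieldRare = idf(docsWithField, dfRare);

    if (wholeRare / wholeCommon > 2.0) throw new AssertionError();
    if (fieldRare / fieldCommon < 10.0) throw new AssertionError();
    System.out.printf("whole-index ratio=%.2f field-scoped ratio=%.2f%n",
        wholeRare / wholeCommon, fieldRare / fieldCommon);
  }
}
```

This matches the complaint in the quote: with the whole-index numDocs, the "key" terms degenerate into the highest-tf terms.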



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Anshum Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated LUCENE-8984:
-
Fix Version/s: (was: 8.3)

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[jira] [Updated] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Anshum Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated LUCENE-8984:
-
Fix Version/s: 8.3
   master (9.0)

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0), 8.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Shawn Heisey (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937851#comment-16937851
 ] 

Shawn Heisey commented on SOLR-8241:


bq. Why does Solr have both limits enabled at once?

If I understand it correctly (and it's always possible that I don't), the idea 
is to allow a cache of up to N entries, but if a smaller number of entries 
already consumes the configured byte limit of heap, to evict down to that 
smaller number instead.
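That dual-limit behavior can be sketched as a toy access-order LRU (invented names; Solr's real cache implementations are more elaborate), evicting until both the entry-count and byte limits are satisfied:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative-only LRU map enforcing the dual limit described above:
 * evict when EITHER the entry count or the total byte estimate is exceeded.
 */
public class DualLimitLru {
  private final int maxEntries;
  private final long maxBytes;
  private long currentBytes;
  private final LinkedHashMap<String, byte[]> map =
      new LinkedHashMap<>(16, 0.75f, true); // access-order LRU

  DualLimitLru(int maxEntries, long maxBytes) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
  }

  void put(String key, byte[] value) {
    byte[] old = map.put(key, value);
    if (old != null) currentBytes -= old.length;
    currentBytes += value.length;
    // evict the eldest entry until both limits are satisfied
    while (map.size() > maxEntries || currentBytes > maxBytes) {
      Map.Entry<String, byte[]> eldest = map.entrySet().iterator().next();
      currentBytes -= eldest.getValue().length;
      map.remove(eldest.getKey());
    }
  }

  int size() { return map.size(); }

  public static void main(String[] args) {
    DualLimitLru cache = new DualLimitLru(100, 1024);
    for (int i = 0; i < 10; i++) {
      cache.put("k" + i, new byte[300]); // 300 bytes each
    }
    // the byte limit (1024) bites long before the 100-entry limit:
    if (cache.size() != 3) throw new AssertionError(cache.size());
    System.out.println("entries retained: " + cache.size());
  }
}
```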

I personally wouldn't have a problem with throwing an error if both limits are 
specified, but that opinion might go against what others think.


> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency






[jira] [Resolved] (SOLR-13784) EmbeddedSolrServer coreName should be optional

2019-09-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13784.
-
Fix Version/s: 8.3
 Assignee: David Smiley
   Resolution: Fixed

> EmbeddedSolrServer coreName should be optional
> --
>
> Key: SOLR-13784
> URL: https://issues.apache.org/jira/browse/SOLR-13784
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> The coreName constructor argument to EmbeddedSolrServer should be optional 
> because it's possible to use EmbeddedSolrServer with admin commands.  That 
> used to not be possible historically but it is today.  Since it's mandatory, 
> I have to use some dummy value which is ugly.






[jira] [Commented] (SOLR-13784) EmbeddedSolrServer coreName should be optional

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937845#comment-16937845
 ] 

ASF subversion and git services commented on SOLR-13784:


Commit 74cfacee96e646373abc267a4ba48c05a326c442 in lucene-solr's branch 
refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74cface ]

SOLR-13784: EmbeddedSolrServer coreName optional

(cherry picked from commit 0d0af505a034a04e3d86cd24447b5a747bfa23c0)


> EmbeddedSolrServer coreName should be optional
> --
>
> Key: SOLR-13784
> URL: https://issues.apache.org/jira/browse/SOLR-13784
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Minor
>
> The coreName constructor argument to EmbeddedSolrServer should be optional 
> because it's possible to use EmbeddedSolrServer with admin commands.  That 
> used to not be possible historically but it is today.  Since it's mandatory, 
> I have to use some dummy value which is ugly.






[jira] [Commented] (SOLR-13784) EmbeddedSolrServer coreName should be optional

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937843#comment-16937843
 ] 

ASF subversion and git services commented on SOLR-13784:


Commit 0d0af505a034a04e3d86cd24447b5a747bfa23c0 in lucene-solr's branch 
refs/heads/master from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0d0af50 ]

SOLR-13784: EmbeddedSolrServer coreName optional


> EmbeddedSolrServer coreName should be optional
> --
>
> Key: SOLR-13784
> URL: https://issues.apache.org/jira/browse/SOLR-13784
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Minor
>
> The coreName constructor argument to EmbeddedSolrServer should be optional 
> because it's possible to use EmbeddedSolrServer with admin commands.  That 
> used to not be possible historically but it is today.  Since it's mandatory, 
> I have to use some dummy value which is ugly.






[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Ben Manes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937836#comment-16937836
 ] 

Ben Manes commented on SOLR-8241:
-

You're right, only one size threshold is supported. Why does Solr have both 
limits enabled at once?

Internally {{maximumSize}} uses a {{Weigher}} that returns {{1}}. The weight is 
normally stored on the entry, but since it's a known constant we can codegen a 
version that drops that field. This gives a number-of-items limit using the 
same logic required to support a limit with variably sized entries.
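The equivalence can be shown in miniature (plain Java, not Caffeine's internals): a weigher that returns 1 makes total weight identical to entry count, so a weight cap of N is exactly an N-entry cap:

```java
public class UnitWeigherDemo {
  interface Weigher<V> { int weigh(V value); }

  public static void main(String[] args) {
    // maximumSize is the degenerate case of maximumWeight:
    Weigher<String> unit = v -> 1;             // what maximumSize uses implicitly
    Weigher<String> byLength = String::length; // a variable-size weigher

    String[] entries = {"a", "bb", "ccc"};
    int unitTotal = 0, lengthTotal = 0;
    for (String e : entries) {
      unitTotal += unit.weigh(e);
      lengthTotal += byLength.weigh(e);
    }

    // With the unit weigher, "total weight" is just the entry count,
    // so the same eviction logic enforces a number-of-items limit.
    if (unitTotal != entries.length) throw new AssertionError();
    if (lengthTotal != 6) throw new AssertionError();
    System.out.println("unit weight total = " + unitTotal);
  }
}
```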

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency






[jira] [Assigned] (SOLR-13774) Add Lucene/Solr OpenJDK Compatibility Matrix to Ref Guide

2019-09-25 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-13774:
-

Assignee: Erick Erickson

> Add Lucene/Solr OpenJDK Compatibility Matrix to Ref Guide
> -
>
> Key: SOLR-13774
> URL: https://issues.apache.org/jira/browse/SOLR-13774
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Affects Versions: 8.1.1
> Environment: EC2 t2.2xlarge
> Ubuntu 16.04.2 LTS
> Solr source downloaded from: [https://archive.apache.org/dist/lucene/solr]
> OpenJDK binaries downloaded from: [https://jdk.java.net|https://jdk.java.net/]
> OpenJDK version information is included in the documentation.
>  
>Reporter: Nick
>Assignee: Erick Erickson
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create a reusable build system that runs the Lucene/Solr ant test suite 
> against different versions of the OpenJDK binaries. Generate a table with the 
> results (BUILD SUCCESSFUL or BUILD FAILED) and incorporate the output into 
> the Ref Guide here: solr/solr-ref-guide/src/solr-system-requirements.adoc






[jira] [Updated] (SOLR-13791) Remove BeanUtils reference from ivy-versions.properties

2019-09-25 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-13791:
--
Attachment: SOLR-13791-01.patch
Status: Open  (was: Open)

> Remove BeanUtils reference from ivy-versions.properties
> ---
>
> Key: SOLR-13791
> URL: https://issues.apache.org/jira/browse/SOLR-13791
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andras Salamon
>Priority: Major
> Attachments: SOLR-13791-01.patch
>
>
> SOLR-12617 removed Commons BeanUtils, but {{lucene/ivy-versions.properties}} 
> still has a reference to beanutils, because SOLR-9515 added the line back.
> We can remove this line.






[jira] [Updated] (SOLR-13791) Remove BeanUtils reference from ivy-versions.properties

2019-09-25 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-13791:
--
Status: Patch Available  (was: Open)

> Remove BeanUtils reference from ivy-versions.properties
> ---
>
> Key: SOLR-13791
> URL: https://issues.apache.org/jira/browse/SOLR-13791
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andras Salamon
>Priority: Major
> Attachments: SOLR-13791-01.patch
>
>
> SOLR-12617 removed Commons BeanUtils, but {{lucene/ivy-versions.properties}} 
> still has a reference to beanutils, because SOLR-9515 added the line back.
> We can remove this line.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937786#comment-16937786
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 42a46b1d6d87a7e63a97914c8b50168cfdf74038 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=42a46b1 ]

SOLR-13105: Update text inline tocs 8


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937787#comment-16937787
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 4bfbeee27464c0288c49af1f309565ee80f19e6f in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4bfbeee ]

SOLR-13105: Update text inline tocs 9


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937784#comment-16937784
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 077aad7e66ee934776ee7006123c64305d7a546c in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=077aad7 ]

SOLR-13105: Update text inline tocs 7


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937780#comment-16937780
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit dacbf48b2117f30f2e773fe8807ccc3d1b32382b in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=dacbf48 ]

SOLR-13105: Update text analytics docs


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13789) Solr core reload with config change upstream put solr instance in bad state

2019-09-25 Thread Nilesh Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937774#comment-16937774
 ] 

Nilesh Singh commented on SOLR-13789:
-

This is an edge-case scenario where a custom cache takes time to warm up; no 
JUnit test was added because it requires a core with such a custom cache.

Manual testing:

Product setup: our product uses a custom cache that reads the indexes during 
replication and populates data structures used by the application during 
search/responseWriter execution. The application uses a master/slave 
(Indexer/Application) architecture for replication; we run around 300 
application instances in our production environment.

Issue: we modified the Solr schema on our Indexer and the change propagated to 
the Application instances. Due to this bug the core was reloaded to apply the 
new schema, but the new core got registered before the custom cache was ready. 
As a result most of the application instances went into a bad state and started 
returning 500s.

Resolution/verification: with the fix applied, the core reload operation checks 
the searcher status before registering itself. We manually reproduced the issue 
above multiple times to verify the patch.

> Solr core reload with config change upstream put solr instance in bad state
> ---
>
> Key: SOLR-13789
> URL: https://issues.apache.org/jira/browse/SOLR-13789
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.6.5, 8.1.1
>Reporter: Nilesh Singh
>Priority: Critical
>  Labels: patch
> Attachments: SOLR-13789.patch
>
>
> In a master/slave setup, if schema.xml is changed on the master, Solr tries 
> to reload the core with the new config during replication on the slave 
> instance.
> During the reload operation Solr creates a new searcher, but it makes the new 
> core active without checking the searcher's status.
> The problem arises when the custom caches attached to the searcher have not 
> been warmed up; in some scenarios custom caches may take some time to warm up 
> in the background.
>  
> This issue becomes critical during schema changes on live Solr instances, 
> since Solr returns bad responses until the searcher caches are fully ready.
> Another issue: the reload and openNewSearcherAndUpdateCommitPoint methods 
> warm two searchers in parallel, which wastes CPU and memory.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2019-09-25 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937764#comment-16937764
 ] 

Andrzej Bialecki  commented on SOLR-8241:
-

[~ben.manes] Existing Solr cache implementations allow using a combination of 
limits on maximum size (number of items) and maximum heap size (number of 
bytes), with entries being force-evicted as soon as either limit is exceeded. I 
can see how to use {{Weigher}} to implement the latter, but I also spotted this 
in the {{Caffeine.weigher(...)}}:
{code}
requireState(!strictParsing || this.maximumSize == UNSET_INT,
"weigher can not be combined with maximum size", this.maximumSize);
{code}
This would suggest that it's not possible to implement this combination of max 
size / max total weight limits?
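To make the combination of limits concrete, here is a minimal Python sketch (not the Caffeine API and not Solr's actual implementation) of an LRU cache that evicts whenever either a maximum-entry limit or a maximum-total-weight limit is exceeded, which is the semantics Solr's existing caches provide:

```python
from collections import OrderedDict

class BoundedLruCache:
    """LRU cache enforcing both a max item count and a max total weight
    (a sketch of Solr's combined maxSize / maxRamBytes behaviour)."""

    def __init__(self, max_items, max_weight, weigher):
        self.max_items = max_items
        self.max_weight = max_weight
        self.weigher = weigher          # callable (key, value) -> weight
        self.data = OrderedDict()       # oldest entry first
        self.total_weight = 0

    def put(self, key, value):
        if key in self.data:
            self.total_weight -= self.weigher(key, self.data.pop(key))
        self.data[key] = value
        self.total_weight += self.weigher(key, value)
        # Evict least-recently-used entries until BOTH limits are satisfied.
        while len(self.data) > self.max_items or self.total_weight > self.max_weight:
            old_key, old_value = self.data.popitem(last=False)
            self.total_weight -= self.weigher(old_key, old_value)

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return None
```

Whichever limit is hit first triggers eviction, which is the behaviour the quoted `requireState` check in Caffeine appears to disallow when strict parsing is enabled.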

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Ben Manes
>Assignee: Andrzej Bialecki 
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: EvictionBenchmark.png, GetPutBenchmark.png, 
> SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, 
> SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, 
> solr_caffeine.patch.gz, solr_jmh_results.json
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency






[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs

2019-09-25 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937733#comment-16937733
 ] 

Andrzej Bialecki  commented on SOLR-13787:
--

I think the idea that we discussed during the committer's meeting was to 
eliminate the need for creating JSON specs. As it is this proposal adds 
complexity because now you need to both annotate *and* create a spec file. 

> An annotation based system to write v2 only APIs
> 
>
> Key: SOLR-13787
> URL: https://issues.apache.org/jira/browse/SOLR-13787
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> example v2 API may look as follows
> {code:java}
> @EndPoint(
>  spec = "cluster.package",
>  method = POST,
>  permission = PKG_EDIT
> )
> static class PkgEdit {
>  @Command(name = "add")
>  public void add(CallInfo callInfo) throws Exception {
>  }
>  @Command(name = "update")
>  public void update(CallInfo callInfo) throws Exception {
> }
>  @Command(name = "delete")
>  boolean deletePackage(CallInfo params) throws Exception {
> }
> {code}
> This expects you to already have the API spec json 
>  
> The annotations are:
>  
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target({ElementType.TYPE})
> public @interface EndPoint {
>   /**name of the API spec file without the '.json' suffix
>*/
>   String spec();
>   /**Http method
>*/
>   SolrRequest.METHOD method();
> >   /**The well-known permission name, if any
>*/
>   PermissionNameProvider.Name permission();
> }
> {code}
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.METHOD)
> public @interface Command {
>   String name() default "";
> }
> {code}






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937718#comment-16937718
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit a451526c502abe84f205a70e57f6175e6767088f in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a451526 ]

SOLR-13105: Update with inline tocs 6


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937717#comment-16937717
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit ded5ffe09f96af13a0b5ec7103aebd24f6d6c092 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ded5ffe ]

SOLR-13105: Update with inline tocs 5


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937716#comment-16937716
 ] 

ASF subversion and git services commented on SOLR-13105:


Commit 835aab55d5760c0074e2cd76b78d563aa3db9178 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=835aab5 ]

SOLR-13105: Update with inline tocs 4


> A visual guide to Solr Math Expressions and Streaming Expressions
> -
>
> Key: SOLR-13105
> URL: https://issues.apache.org/jira/browse/SOLR-13105
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 
> 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, 
> Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 
> AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
>
> Visualization is now a fundamental element of Solr Streaming Expressions and 
> Math Expressions. This ticket will create a visual guide to Solr Math 
> Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* 
> visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* 
> results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable 
> functions. These functions provide a system for searching, transforming, 
> analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the 
> documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and 
> machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>  
> A few sample visualizations are attached to the ticket.






[jira] [Commented] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task

2019-09-25 Thread Atri Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937679#comment-16937679
 ] 

Atri Sharma commented on LUCENE-8989:
-

I think the safest bet here is to run the rejected task on the caller thread. 
Any objections/thoughts?
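The caller-runs fallback can be sketched in a few lines of Python (a hedged illustration of the idea, not Lucene's IndexSearcher code; in Java this corresponds to ThreadPoolExecutor's CallerRunsPolicy):

```python
from concurrent.futures import ThreadPoolExecutor

def submit_or_run_inline(executor, fn, *args):
    # Try to hand the task to the pool; if submission is rejected
    # (in Python, submitting after shutdown raises RuntimeError),
    # run the task on the caller thread instead.
    try:
        future = executor.submit(fn, *args)
    except RuntimeError:
        return fn(*args)
    return future.result()

executor = ThreadPoolExecutor(max_workers=1)
print(submit_or_run_inline(executor, lambda x: x * 2, 21))  # 42 (ran on the pool)
executor.shutdown()
print(submit_or_run_inline(executor, lambda x: x + 1, 41))  # 42 (ran inline after rejection)
```

The caller thread makes progress either way, at the cost of the rejected slice being searched without parallelism.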

> IndexSearcher Should Handle Rejection of Concurrent Task
> 
>
> Key: LUCENE-8989
> URL: https://issues.apache.org/jira/browse/LUCENE-8989
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> As discussed in [https://github.com/apache/lucene-solr/pull/815,] 
> IndexSearcher should handle the case when the executor rejects the execution 
> of a task (unavailability of threads?).






[jira] [Created] (SOLR-13790) LRUStatsCache size explosion

2019-09-25 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki  created SOLR-13790:


 Summary: LRUStatsCache size explosion
 Key: SOLR-13790
 URL: https://issues.apache.org/jira/browse/SOLR-13790
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.2, 7.7.2, 8.3
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 7.7.3, 8.3


On a sizeable cluster with multi-shard multi-replica collections, when 
{{LRUStatsCache}} was in use we encountered excessive memory usage, which 
consequently led to severe performance problems.

On a closer examination of the heapdumps it became apparent that when 
{{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of 
{{FastLRUCache}} using the passed {{shard}} argument - however, the value of 
this argument is not a simple shard name but instead it's a randomly ordered 
list of ALL replica URLs for this shard.

As a result, due to the combinatorial number of possible keys, over time the map 
in {{LRUStatsCache.perShardTermStats}} grew to contain ~2 million entries...

The fix seems to be simply to extract the shard name and cache using this name 
instead of the full string value of the {{shard}} parameter. Existing unit 
tests also need much improvement.
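A minimal sketch of the proposed fix, assuming replica URLs are joined with '|' in the shard parameter (the separator and helper name here are illustrative, not the actual patch):

```python
def shard_cache_key(shard_param):
    # The buggy code keyed the per-shard cache on the raw shard parameter,
    # which is a randomly ordered list of replica URLs, so every ordering
    # created a fresh cache entry. Normalizing (here by sorting; the real
    # fix resolves the actual shard name) collapses all orderings to one key.
    return "|".join(sorted(shard_param.split("|")))
```

With a normalized key, the number of cache entries is bounded by the number of shards rather than by the number of replica-URL permutations.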






[GitHub] [lucene-solr] noblepaul opened a new pull request #898: SOLR-13661 A package management system for Solr

2019-09-25 Thread GitBox
noblepaul opened a new pull request #898: SOLR-13661 A package management 
system for Solr
URL: https://github.com/apache/lucene-solr/pull/898
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search

2019-09-25 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937635#comment-16937635
 ] 

Jim Ferenczi commented on LUCENE-8978:
--

+1 to backport, yes

> "Max Bottom" Based Early Termination For Concurrent Search
> --
>
> Key: LUCENE-8978
> URL: https://issues.apache.org/jira/browse/LUCENE-8978
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> When running a search concurrently, collectors which have collected the 
> number of hits requested locally i.e. their local priority queue is full can 
> then globally publish their bottom hit's score, and other collectors can then 
> use that score as the filter. If multiple collectors have full priority 
> queues, the maximum of all bottom scores will be considered as the global 
> bottom score.
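The scheme described above can be sketched as follows (an illustrative Python model, not Lucene's collector code): each collector keeps a local top-k min-heap, and once its queue is full it publishes its bottom score; the global filter is the maximum of all published bottoms.

```python
import heapq

class Collector:
    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of the local top-k scores; heap[0] is the bottom

    def bottom(self):
        # Only a full queue publishes its bottom score.
        return self.heap[0] if len(self.heap) == self.k else None

    def collect(self, score, global_bottom):
        # Skip hits that cannot beat the globally published bottom score.
        if global_bottom is not None and score <= global_bottom:
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, score)
        elif score > self.heap[0]:
            heapq.heapreplace(self.heap, score)

def global_bottom(collectors):
    """Max of the bottom scores of all collectors whose queues are full."""
    bottoms = [c.bottom() for c in collectors if c.bottom() is not None]
    return max(bottoms) if bottoms else None
```

Taking the maximum of the bottoms is safe because any hit below that value cannot enter the queue that published it, and so cannot appear in the merged global top-k.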






[jira] [Created] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task

2019-09-25 Thread Atri Sharma (Jira)
Atri Sharma created LUCENE-8989:
---

 Summary: IndexSearcher Should Handle Rejection of Concurrent Task
 Key: LUCENE-8989
 URL: https://issues.apache.org/jira/browse/LUCENE-8989
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Atri Sharma


As discussed in [https://github.com/apache/lucene-solr/pull/815,] IndexSearcher 
should handle the case when the executor rejects the execution of a task 
(unavailability of threads?).






[jira] [Comment Edited] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937618#comment-16937618
 ] 

Jim Ferenczi edited comment on LUCENE-8984 at 9/25/19 10:49 AM:


I pushed a patch for the test failure. [~anshum]  [~andyhind] don't forget to 
apply the patch if/when you backport to branch_8x ;).


was (Author: jim.ferenczi):
I pushed a patch for the test failure. [~andyhind] don't forget to apply the 
patch if/when you backport to branch_8x ;).

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}
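A small numeric sketch of the bias (the formula follows Lucene's classic idf shape, log(N/(df+1)) + 1; the document counts below are hypothetical):

```python
import math

def classic_idf(doc_freq, doc_count):
    # Lucene classic-similarity idf shape: log(N / (df + 1)) + 1.
    return math.log(doc_count / (doc_freq + 1)) + 1.0

num_docs_total = 1_000_000   # all docs in the index
num_docs_field = 1_000       # docs that actually have "description"
df = 500                     # docs whose "description" contains the term

inflated = classic_idf(df, num_docs_total)  # idf as MLT computes it today
expected = classic_idf(df, num_docs_field)  # idf restricted to the field
```

When numDocs is much larger than the number of documents with the field, the inflated idf dominates and varies little between terms, so term selection degenerates toward raw term frequency, exactly the effect described in the quote.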






[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937618#comment-16937618
 ] 

Jim Ferenczi commented on LUCENE-8984:
--

I pushed a patch for the test failure. [~andyhind] don't forget to apply the 
patch if/when you backport to branch_8x ;).

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937615#comment-16937615
 ] 

ASF subversion and git services commented on LUCENE-8984:
-

Commit a333b6dee3d2cbd157fea250873b900bde880c51 in lucene-solr's branch 
refs/heads/master from jimczi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a333b6d ]

LUCENE-8984: Fix ut by cleaning up resources after test


> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with “description”, so the key terms end up 
> being just the terms with the highest term frequencies.
> It is inconsistent because the MLT-search then uses these extracted key terms 
> and scores all documents using an idf which is computed only on the subset of 
> documents with “description”. So one part of the MLT uses a different numDocs 
> than another part. This sounds like an odd choice, and not expected at all, 
> and I wonder if I’m missing something.
> {quote}






[GitHub] [lucene-solr] atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in 
LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#issuecomment-534957255
 
 
   @mikemccand Updated the PR, please see and let me know.





[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting

2019-09-25 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937571#comment-16937571
 ] 

Mikhail Khludnev commented on SOLR-13272:
-

Reviewed pdf after applying patch locally. Awesome. Thank you [~munendrasn], 
[~apoorvprecisely]!

> Interval facet support for JSON faceting
> 
>
> Key: SOLR-13272
> URL: https://issues.apache.org/jira/browse/SOLR-13272
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Apoorv Bhawsar
>Assignee: Munendra S N
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13272-doc.patch, SOLR-13272.patch, SOLR-13272.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Interval facet is supported in classical facet component but has no support 
> in json facet requests.
>  In cases of block join and aggregations, this would be helpful
> Assuming request format -
> {code:java}
> json.facet={pubyear:{type : interval,field : 
> pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}}
> {code}
>  
>  PR https://github.com/apache/lucene-solr/pull/597






[jira] [Commented] (SOLR-10532) The suggest.build and suggest.reload params should be distributed to all replicas

2019-09-25 Thread dhirajforyou (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937548#comment-16937548
 ] 

dhirajforyou commented on SOLR-10532:
-

I hit a similar issue. Is there any workaround for this?

> The suggest.build and suggest.reload params should be distributed to all 
> replicas
> -
>
> Key: SOLR-10532
> URL: https://issues.apache.org/jira/browse/SOLR-10532
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud, Suggester
>Reporter: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 7.0
>
>
> This is inspired by a discussion on solr-user. Today, the suggest.build and 
> suggest.reload parameters are all local to the replica receiving the request. 
> This is both confusing and annoying to users because the expectation is that 
> doing so will build/reload the suggest index on all replicas of a collection 
> but the reality is that it happens only on one replica of each shard as per 
> the normal distributed query process. 
> We should distribute the build and reload param to all replicas of a 
> collection before actually processing the query.






[jira] [Assigned] (LUCENE-8988) Maximal -- Minimum Based Early Termination For TopFieldCollector

2019-09-25 Thread Atri Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Atri Sharma reassigned LUCENE-8988:
---

Assignee: Atri Sharma

> Maximal -- Minimum Based Early Termination For TopFieldCollector
> 
>
> Key: LUCENE-8988
> URL: https://issues.apache.org/jira/browse/LUCENE-8988
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Major
>
> Use LUCENE-8978 to implement the same logic for TopFieldCollector






[jira] [Created] (LUCENE-8988) Maximal -- Minimum Based Early Termination For TopFieldCollector

2019-09-25 Thread Atri Sharma (Jira)
Atri Sharma created LUCENE-8988:
---

 Summary: Maximal -- Minimum Based Early Termination For 
TopFieldCollector
 Key: LUCENE-8988
 URL: https://issues.apache.org/jira/browse/LUCENE-8988
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Atri Sharma


Use LUCENE-8978 to implement the same logic for TopFieldCollector






[jira] [Resolved] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search

2019-09-25 Thread Atri Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Atri Sharma resolved LUCENE-8978.
-
Resolution: Fixed

> "Max Bottom" Based Early Termination For Concurrent Search
> --
>
> Key: LUCENE-8978
> URL: https://issues.apache.org/jira/browse/LUCENE-8978
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> When running a search concurrently, collectors which have collected the 
> number of hits requested locally i.e. their local priority queue is full can 
> then globally publish their bottom hit's score, and other collectors can then 
> use that score as the filter. If multiple collectors have full priority 
> queues, the maximum of all bottom scores will be considered as the global 
> bottom score.






[GitHub] [lucene-solr] atris commented on issue #897: LUCENE-8978: Maximal Of Minimum Scores Based Concurrent Early Termination

2019-09-25 Thread GitBox
atris commented on issue #897: LUCENE-8978: Maximal Of Minimum Scores Based 
Concurrent Early Termination
URL: https://github.com/apache/lucene-solr/pull/897#issuecomment-534904623
 
 
   Thanks @jimczi !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search

2019-09-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937497#comment-16937497
 ] 

ASF subversion and git services commented on LUCENE-8978:
-

Commit 25f88c5a63af6b367aa3d424ce5b60ab25f717b4 in lucene-solr's branch 
refs/heads/master from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=25f88c5 ]

LUCENE-8978: Maximal Of Minimum Scores Based Concurrent Early Termination (#897)

* LUCENE-8978: Maximal Of Minimum Scores Based Concurrent Early
Termination

This commit introduces a mechanism to allow threads to early terminate segments 
based on globally shared maximum of minimum scores.

> "Max Bottom" Based Early Termination For Concurrent Search
> --
>
> Key: LUCENE-8978
> URL: https://issues.apache.org/jira/browse/LUCENE-8978
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> When running a search concurrently, collectors which have collected the 
> number of hits requested locally i.e. their local priority queue is full can 
> then globally publish their bottom hit's score, and other collectors can then 
> use that score as the filter. If multiple collectors have full priority 
> queues, the maximum of all bottom scores will be considered as the global 
> bottom score.






[GitHub] [lucene-solr] atris merged pull request #897: LUCENE-8978: Maximal Of Minimum Scores Based Concurrent Early Termination

2019-09-25 Thread GitBox
atris merged pull request #897: LUCENE-8978: Maximal Of Minimum Scores Based 
Concurrent Early Termination
URL: https://github.com/apache/lucene-solr/pull/897
 
 
   






[GitHub] [lucene-solr] atris commented on a change in pull request #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-09-25 Thread GitBox
atris commented on a change in pull request #815: LUCENE-8213: Introduce 
Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/815#discussion_r327971441
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##
 @@ -832,5 +889,25 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
   return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, 
ScoreMode.COMPLETE_NO_SCORES, disi));
 }
 
+// Perform a cache load asynchronously
+private void cacheAsynchronously(LeafReaderContext context, 
IndexReader.CacheHelper cacheHelper) {
+  /*
+   * If the current query is already being asynchronously cached,
+   * do not trigger another cache operation
+   */
+  if (inFlightAsyncLoadQueries.add(in.getQuery()) == false) {
+return;
+  }
+
+  FutureTask<Void> task = new FutureTask<>(() -> {
+DocIdSet localDocIdSet = cache(context);
+putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+//remove the key from inflight -- the key is loaded now
+inFlightAsyncLoadQueries.remove(in.getQuery());
+return null;
+  });
+  executor.execute(task);
 
 Review comment:
   Fixed. I think that actually is a valid problem even in `IndexSearcher`? I 
will open another issue to track that.
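The pattern in the diff above — a concurrent set guarding against scheduling the same cache load twice, with the key cleared once the load finishes — can be sketched independently of Lucene like this. Class and method names are illustrative, not the actual LRUQueryCache API; the sketch also clears the key in a finally block so a failed load cannot leave it stuck in flight, which is one way to address the review concern:

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of asynchronous cache loading with in-flight deduplication.
class AsyncLoader<K, V> {
    private final Set<K> inFlight = ConcurrentHashMap.newKeySet();
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    AsyncLoader(ExecutorService executor) {
        this.executor = executor;
    }

    // Returns true if a load was scheduled, false if one is already in flight.
    boolean loadAsync(K key, Callable<V> loader) {
        if (!inFlight.add(key)) {
            return false; // another thread is already caching this key
        }
        executor.execute(() -> {
            try {
                cache.putIfAbsent(key, loader.call());
            } catch (Exception e) {
                // drop the entry on failure; a later call may retry the load
            } finally {
                inFlight.remove(key); // always clear, even if the load failed
            }
        });
        return true;
    }

    V get(K key) {
        return cache.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AsyncLoader<String, Integer> loader = new AsyncLoader<>(pool);
        loader.loadAsync("query1", () -> 42); // schedules the load
        loader.loadAsync("query1", () -> 42); // deduplicated if still in flight
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(loader.get("query1"));
    }
}
```

The `finally` placement matters: if the key were removed only after a successful `putIfAbsent`, an exception during the load would leave it in the in-flight set forever and block all future caching of that query.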






[jira] [Commented] (LUCENE-8984) MoreLikeThis MLT is biased for uncommon fields

2019-09-25 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937432#comment-16937432
 ] 

Ignacio Vera commented on LUCENE-8984:
--

This change seems to make CI unhappy:

{code}
05:52:34[junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
{one_percent=BlockTreeOrds(blocksize=128), text2=FSTOrd50, text=FSTOrd50}, 
docValues:{}, maxPointsInLeafNode=983, maxMBSortInHeap=7.314239663019871, 
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@48031d4f),
 locale=fr-CI, timezone=Antarctica/Davis
05:52:34[junit4]   2> NOTE: Linux 4.15.0-1044-gcp amd64/Oracle Corporation 
11.0.2 (64-bit)/cpus=16,threads=1,free=473368128,total=536870912
05:52:34[junit4]   2> NOTE: All tests run in this JVM: 
[TestBoolValOfNumericDVs, TestDocValuesFieldSources, TestIndexReaderFunctions, 
TestFunctionRangeQuery, TestIntervals, TestMoreLikeThis]
05:52:34[junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestMoreLikeThis -Dtests.seed=502A5EC44CFFA041 -Dtests.slow=true 
-Dtests.badapples=true -Dtests.locale=fr-CI -Dtests.timezone=Antarctica/Davis 
-Dtests.asserts=true -Dtests.file.encoding=UTF8
05:52:34[junit4] ERROR   0.00s J1 | TestMoreLikeThis (suite) <<<
05:52:34[junit4]> Throwable #1: 
com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope 
SUITE failed to close. Resource was registered from thread Thread[id=35, 
name=TEST-TestMoreLikeThis.testSmallSampleFromCorpus-seed#[502A5EC44CFFA041], 
state=RUNNABLE, group=TGRP-TestMoreLikeThis], registration stack trace below.
05:52:34[junit4]>   at 
__randomizedtesting.SeedInfo.seed([502A5EC44CFFA041]:0)
05:52:34[junit4]>   at 
java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
05:52:34[junit4]>   at 
com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:157)
05:52:34[junit4]>   at 
org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:777)
05:52:34[junit4]>   at 
org.apache.lucene.util.LuceneTestCase.wrapDirectory(LuceneTestCase.java:1464)
05:52:34[junit4]>   at 
org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1333)
05:52:34[junit4]>   at 
org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:1315)
05:52:34[junit4]>   at 
org.apache.lucene.queries.mlt.TestMoreLikeThis.testSmallSampleFromCorpus(TestMoreLikeThis.java:136)
05:52:34[junit4]>   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
05:52:34[junit4]>   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
05:52:34[junit4]>   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
05:52:34[junit4]>   at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
05:52:34[junit4]>   at 
java.base/java.lang.Thread.run(Thread.java:834)
05:52:34[junit4]> Caused by: java.lang.AssertionError: Directory not 
closed: MockDirectoryWrapper(ByteBuffersDirectory@1a46a392 
lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@3e971b3a)
05:52:34[junit4]>   at 
org.apache.lucene.util.CloseableDirectory.close(CloseableDirectory.java:45)
05:52:34[junit4]>   at 
com.carrotsearch.randomizedtesting.RandomizedContext.closeResources(RandomizedContext.java:225)
05:52:34[junit4]>   ... 2 more
{code}

> MoreLikeThis MLT is biased for uncommon fields
> --
>
> Key: LUCENE-8984
> URL: https://issues.apache.org/jira/browse/LUCENE-8984
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andy Hind
>Assignee: Anshum Gupta
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MLT always uses the total doc count and not the count of docs with the 
> specific field
>  
> To quote Maria Mestre from the discussion on the mailing list - 29/01/19
>  
> {quote}The issue I have is that when retrieving the key scored terms 
> (interestingTerms), the code uses the total number of documents in the index, 
> not the total number of documents with populated “description” field. This is 
> where it’s done in the code: 
> [https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_lucene-2Dsolr_blob_master_lucene_queries_src_java_org_apache_lucene_queries_mlt_MoreLikeThis.java-23L651=DwIFaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=XIYHWqjoenB2nuyYPl8m6c5xBIOD8PZJ4CWx0j6tQjA=gYOyL1Msgk2dpzigOsIvXq3CiFF0T7ApMLBVVDKW2dQ=v4mgEvgP3HWtMZcL3FTiKeY2nBOPJpTypmCpCBwPkQs=]
> The effect of this choice is that the “idf” does not vary much, given that 
> numDocs >> number of documents with