Re: Proximity Search with phrases

2020-12-03 Thread Radu Gheorghe
Hi Mark,

I don’t really get your use-case. Maybe you can provide another example?

In any case, maybe the surround query parser would help? 
https://lucene.apache.org/solr/guide/8_4/other-parsers.html#surround-query-parser

Or span queries in general via the XML query parser? 
https://lucene.apache.org/solr/guide/8_4/other-parsers.html#xml-query-parser
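For the surround option, here is a minimal sketch of how the query string could be built. It assumes the request is sent with defType=surround (or a {!surround} prefix) and df pointing at the text field. Each phrase becomes nested ordered-adjacency w(...) operators, and the two phrases are joined with an unordered 10n(...) proximity operator. The class and method names are hypothetical. As far as I know, surround does not run query-time analysis, so the terms must match the indexed tokens exactly. Verify the exact distance semantics of the numeric prefix against your own data.

```java
// Hypothetical helper: builds a surround query string for "phrase 1" near "phrase 2".
final class SurroundProximity {

    /** Turns "word1 word2" into w(word1, word2); a single word is returned unchanged. */
    static String phrase(String text) {
        String[] terms = text.trim().split("\\s+");
        String query = terms[0];
        for (int i = 1; i < terms.length; i++) {
            // unnumbered w(x, y): y must directly follow x, so folding left rebuilds the phrase
            query = "w(" + query + ", " + terms[i] + ")";
        }
        return query;
    }

    /** Unordered proximity ("n") between two phrases, e.g. 10n(w(a, b), w(c, d)). */
    static String near(String first, String second, int distance) {
        if (distance < 1) {
            throw new IllegalArgumentException("distance must be at least 1");
        }
        return distance + "n(" + phrase(first) + ", " + phrase(second) + ")";
    }

    public static void main(String[] args) {
        // Send the result with defType=surround and df=<your field>
        System.out.println(near("word1 word2", "word3 word4", 10));
    }
}
```

This prints 10n(w(word1, word2), w(word3, word4)). Use 10w instead of 10n if "phrase 1" must come before "phrase 2".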

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

> On 27 Nov 2020, at 14:25, Mark R  wrote:
> 
> Use case: Is it possible to perform a proximity search using phrases, for 
> example: "phrase 1" within 10 words of "phrase 2"?
> 
> SOLR Version: 8.4.1
> 
> Query using: "(\"word1 word2\"(\"word3 word4\")"~10
> 
> While this returns results, it seems to be evaluating the individual words against each other.
> 
> Are stop words removed when querying? I assume yes.
> 
> Thanks in advance
> 
> Mark
> 
> 



Re: Shard Lock

2020-12-03 Thread Radu Gheorghe
Wild shot here: two Solr instances started on the same data directory?
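If that guess is right, one quick check is whether another process already holds the OS-level lock on the index's write.lock file. That is the file Lucene's native lock factory locks, and the "lockType: native" in the error points that way. Below is a hedged, stdlib-only sketch with a hypothetical class name. Run it against the index directory named in the error, ideally while that Solr node is stopped, because the probe briefly takes the lock itself when nobody else holds it. OS file locks may also behave differently on network filesystems.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical diagnostic: is indexDir/write.lock currently held by someone?
final class IndexLockCheck {

    static boolean isLocked(Path indexDir) throws IOException {
        Path lockFile = indexDir.resolve("write.lock");
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();   // non-blocking exclusive lock attempt
            if (lock == null) {
                return true;                     // another process holds it
            }
            lock.release();                      // it was free; give it back at once
            return false;
        } catch (OverlappingFileLockException e) {
            return true;                         // held elsewhere inside this same JVM
        } catch (NoSuchFileException e) {
            return false;                        // no lock file, so nothing is locked
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);           // the index directory from the error
        System.out.println(dir + " locked: " + isLocked(dir));
    }
}
```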

Best regards,
Radu

> On 1 Dec 2020, at 06:25, sambasivarao giddaluri 
>  wrote:
> 
> When I checked opt/solr/volumes/data/cores/, both 
> k04o95kz_shard2_replica_n10 and k04o95kz_shard3_replica_n16 replicas were 
> missing; I have no idea how they got deleted.
> 
> On Mon, Nov 30, 2020 at 4:13 PM sambasivarao giddaluri 
>  wrote:
> Hi All,
> We are getting the exception below from Solr, in a setup with 3 ZooKeeper nodes, 
> 3 Solr nodes and 3 replicas. It was working fine and we got this exception 
> unexpectedly.
> 
>   • k04o95kz_shard2_replica_n10: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Index dir 
> '/opt/solr/volumes/data/cores/k04o95kz_shard2_replica_n10/data/index.20201126040543992'
>  of core 'k04o95kz_shard2_replica_n10' is already locked. The most likely 
> cause is another Solr server (or another solr core in this server) also 
> configured to use this directory; other possible causes may be specific to 
> lockType: native
>   • k04o95kz_shard3_replica_n16: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Index dir 
> '/opt/solr/volumes/data/cores/k04o95kz_shard3_replica_n16/data/index.20201126040544142'
>  of core 'k04o95kz_shard3_replica_n16' is already locked. The most likely 
> cause is another Solr server (or another solr core in this server) also 
> configured to use this directory; other possible causes may be specific to 
> lockType: native
> 
> Any advice?
> 
> Thanks 
> sam



Re: Facet to part of search results

2020-12-03 Thread Radu Gheorghe


> On 3 Dec 2020, at 20:18, Shawn Heisey  wrote:
> 
> On 12/3/2020 9:55 AM, Jae Joo wrote:
>> Is there any way to apply facet to the partial search result?
>> For ex, we have 10m return by "dog" and like to apply facet to first 10K.
>> Possible?
> 
> The point of facets is to provide accurate numbers.
> 
> What would it mean to only apply to the first 10K?  If there are 10 million 
> documents in the query results that contain "dog" then the facet should say 
> 10 million, not 10K.  I do not understand what you're trying to do.
> 

Maybe sampling? I’m not aware of a built-in way to do that. But you could index 
a random float between, say, 0 and 100 and then filter out a sample by filtering 
for number
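
One way to sketch the random-sample idea assumes a hypothetical float field, called sample_f here, that your indexing code fills with a uniform random value in [0, 100). To facet on roughly 10K of 10M matches, add a filter that keeps values below 100 × 10,000 / 10,000,000 = 0.1. The facet counts then describe a random ~0.1% sample rather than the "first" 10K by relevance. Multiply them by total/wanted if you want estimates for the full result set.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helpers for sampling-based faceting; "sample_f" is an assumed field name.
final class RandomSample {

    /** Index-time value for the sample field: uniform in [0, 100). */
    static float sampleValue() {
        return ThreadLocalRandom.current().nextFloat() * 100f;
    }

    /** Filter query keeping roughly `wanted` of `total` matching docs (exclusive upper bound). */
    static String sampleFilter(String field, long wanted, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        double upper = Math.min(100.0, 100.0 * wanted / total);
        return field + ":[0 TO " + upper + "}";
    }

    public static void main(String[] args) {
        // fq for ~10K out of ~10M "dog" matches
        System.out.println(sampleFilter("sample_f", 10_000, 10_000_000));
    }
}
```

This prints sample_f:[0 TO 0.1}, which would go into an fq alongside the "dog" query.
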

Re: facet.method=smart

2020-12-03 Thread Radu Gheorghe
Hi Jae,

No, it’s not smarter than explicitly defining a method, for example enum for a 
low-cardinality field.

Think of “smart” as a default path, and explicit definitions as some “hints”. 
You can see that default path in this function: 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetField.java#L74
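
For illustration, this shows where such a hint goes in a JSON Facet request. The terms facet accepts a method parameter, which is smart when omitted, and enum is one of the explicit values. The facet name and field below are hypothetical. The FacetField.java code linked above is the place to check when a given hint is actually honored.

```java
// Hypothetical helper: JSON Facet request body with an explicit terms-facet method hint.
final class TermsFacetHint {

    static String requestBody(String facetName, String field, String method) {
        return "{\"query\":\"*:*\",\"limit\":0,\"facet\":{\"" + facetName
                + "\":{\"type\":\"terms\",\"field\":\"" + field
                + "\",\"method\":\"" + method + "\"}}}";
    }

    public static void main(String[] args) {
        // POST this to /solr/<collection>/query; drop "method" to fall back to "smart"
        System.out.println(requestBody("categories", "category", "enum"));
    }
}
```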

Note that I’ve added a PR with a bit more explanation of the “hints” here:
https://github.com/apache/lucene-solr/pull/2057
If you’re missing some info, please feel free to comment (here or there) and I 
can add more.

Best regards,
Radu

> On 30 Nov 2020, at 22:46, Jae Joo  wrote:
> 
> Is "smart" really smarter than one explicitly defined?
> 
> For the "enum" type, would it be faster to define facet.method=enum than smart?
> 
> Jae



Re: increasing number of threads for faceting in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Munendra,

It is great that I can get things faster by using a bigger gap (fewer buckets)
and by increasing the number of threads. How to change the gap I know: one can
replace "gap": "+1HOUR" with "gap": "+1MONTH". What should I change
in the request below to increase the number of threads from one to 20?

Cheers,
Arturas

On Thu, Dec 3, 2020 at 1:54 PM Munendra S N  wrote:

> Hi,
>
> Currently, JSON facets have support for specifying the number of threads.
> In the above request, the range facet is computed over 2 years with a gap
> of 1 hour. By reducing the number of buckets, computation should become
> much faster
>
> Regards,
> Munendra S N
>
>
>
> On Thu, Dec 3, 2020 at 1:52 PM Arturas Mazeika  wrote:
>
> > Hi Solr-Users,
> >
> > I am trying to better understand the solr capabilities, how one can
> > formulate queries in JSON format as well as tweak parameters. Currently I
> > have a logs collection (ca 6GB large) with a dozen of attributes running in
> > single server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start -h
> > localhost -p  -m 4g)
> >
> > I am playing with faceting functionality in solr and query a couple of
> > attributes there. My typical query is:
> >
> > GET http://localhost:/solr/db/query
> >  HTTP/1.1
> > content-type: application/json
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "t" : {
> > "type":  "terms",
> > "field": "fcomp",
> > "sort":  "index",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > }
> > },
> > }
> > }
> >
> > not surprisingly, it takes a bit to compute the result, so I tried to
> > increase the number of threads. How do I do it in JSON format? I tried
> > adding
> >
> > {
> > "params": {
> > "facet.threads": 8
> > },
> > "query"  : "*:*",
> > ...
> > }
> >
> > and checked the jstack  of the solr java process, but I still see only
> > one thread working.  Can I configure params through the params section?
> >
> > I also tried
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "t" : {
> > "type":  "terms",
> > "field": "fcomp",
> > "sort":  "index",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > },
> > "threads":8
> > },
> > }
> > }
> >
> > but this ran in one thread as well. Can I influence the number of threads
> > in the "facet" section of JSON?
> >
> > Cheers,
> > Arturas
> >
>


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Michael,

I wish I were able to do a percent of what you are doing. Where does your
inspiration come from? It is not from the manuals, because I've checked
those. How do you come up with this piece of art? Did you check this from
the source code? Which lines revealed these secrets? I am eternally
grateful for your help!

Michael, maybe you happen to know how I can plug the facet.threads
parameter into that JSON body below, so the query uses more threads to
compute the answer? I am dying of curiosity.

Cheers,
Arturas

On Thu, Dec 3, 2020 at 7:59 PM Michael Gibney 
wrote:

> I think the first "error" case in your set of examples above is closest to
> being correct. For "query" facet type, I think you want to explicitly
> specify `"type":"query"`, and specify the query itself in the `"q"` param,
> i.e.:
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": {
> "type":  "query",
> "q":  "cfname2:aip",
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
> }
>
> On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika  wrote:
>
> > Hi Michael,
> >
> > Thanks for helping me to figure this out.
> >
> > If I fire:
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": { "query":  "cfname2:aip", }
> >
> > }
> > }
> >
> > I get
> >
> > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }
> >
> > (works). If I fire
> >
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> >
> > I get
> >
> > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > "docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [ {
> > "val": "2018-05-02T17:00:00Z", "count": 150 },
> >
> > (works). If I fire:
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": { "query":  "cfname2:aip",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> > }
> > }
> >
> > I get
> >
> > "error": { "metadata": [ "error-class",
> > "org.apache.solr.common.SolrException", "root-error-class",
> > "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
> > name, like {type:range, field:price, ...} but got null , path=/facet",
> > "code": 400 } }
> >
> > If I fire
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": { "query":  "cfname2:aip",
> >
> > "facet": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> > }
> >
> > I get
> >
> > "error": { "metadata": [ "error-class",
> > "org.apache.solr.common.SolrException", "root-error-class",
> > "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
> > name, like {type:range, field:price, ...} but got null , path=/facet",
> > "code": 400 } }
> >
> > What else can I try out?
> >
> > Cheers,
> > Arturas
> >
> > On Thu, Dec 3, 2020 at 3:55 PM Michael Gibney
> > wrote:
> >
> > > Arturas,
> > > I think your syntax is wrong for the range subfacet? -- the configuration
> > > of the range facet should be directly under the `tt` key, rather than
> > > nested under `t_buckets` in the request. (The response introduces a
> > > "buckets" attribute that is not part of the request syntax).
> > > Michael
> > >
> > > On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika
> > > wrote:
> > >
> > > > Hi Solr Team,
> > > >
> > > > I am trying to check how I can formulate facet queries using JSON format. I
> > > > can successfully formulate query, range, term queries,

parsing multivalued fields in Value Source Parser

2020-12-03 Thread Manna,Tridib
Hello All,

I am writing a custom function query that needs to parse a multivalued 
field. I am getting this exception: org.apache.solr.common.SolrException: can 
not use FieldCache on multivalued field
The function query works as expected with a single-valued field.

How can I parse a multivalued field with FunctionQParser (or any other way) 
and get all the values of that field for further processing in my custom 
function?

TIA,

Tridib Manna


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Michael Gibney
I think the first "error" case in your set of examples above is closest to
being correct. For "query" facet type, I think you want to explicitly
specify `"type":"query"`, and specify the query itself in the `"q"` param,
i.e.:
{
  "query": "*:*",
  "limit": 0,

  "facet": {
    "aip": {
      "type": "query",
      "q": "cfname2:aip",
      "facet": {
        "t_buckets": {
          "type":  "range",
          "field": "t",
          "sort":  { "t": "asc" },
          "start": "2018-05-02T17:00:00.000Z",
          "end":   "2020-11-16T21:00:00.000Z",
          "gap":   "+1HOUR",
          "limit": 1
        }
      }
    }
  }
}

On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika  wrote:

> Hi Michael,
>
> Thanks for helping me to figure this out.
>
> If I fire:
>
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip", }
>
> }
> }
>
> I get
>
> "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }
>
> (works). If I fire
>
>
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
>
> I get
>
> "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> "docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [ {
> "val": "2018-05-02T17:00:00Z", "count": 150 },
>
> (works). If I fire:
>
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip",
>
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
> }
>
> I get
>
> "error": { "metadata": [ "error-class",
> "org.apache.solr.common.SolrException", "root-error-class",
> "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
> name, like {type:range, field:price, ...} but got null , path=/facet",
> "code": 400 } }
>
> If I fire
>
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip",
>
> "facet": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
>
> I get
>
> "error": { "metadata": [ "error-class",
> "org.apache.solr.common.SolrException", "root-error-class",
> "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
> name, like {type:range, field:price, ...} but got null , path=/facet",
> "code": 400 } }
>
> What else can I try out?
>
> Cheers,
> Arturas
>
> On Thu, Dec 3, 2020 at 3:55 PM Michael Gibney 
> wrote:
>
> > Arturas,
> > I think your syntax is wrong for the range subfacet? -- the configuration
> > of the range facet should be directly under the `tt` key, rather than
> > nested under `t_buckets` in the request. (The response introduces a
> > "buckets" attribute that is not part of the request syntax).
> > Michael
> >
> > On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika
> > wrote:
> >
> > > Hi Solr Team,
> > >
> > > I am trying to check how I can formulate facet queries using JSON format. I
> > > can successfully formulate query, range, term queries, as well as nested
> > > term queries. How can I formulate a nested facet query involving "query" as
> > > well as "range" formulations? The following does not work:
> > >
> > >
> > > GET http://localhost:/solr/db/query HTTP/1.1
> > > content-type: application/json
> > >
> > > {
> > > "query"  : "*:*",
> > > "limit"  : 0,
> > > "facet": {
> > > "a1": { "query":  "cfname2:1" },
> > > "a2": { "query":  "cfname2:2" },
> > > "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
> > > "a4": { "query":  "cfname2:4" },
> > > "a5": { "query":  "cfname2:5" },
> > > "a6": { "query":  "cfname2:6" },
> > >
> > > "tt": {
> > > "t_buckets": {
> > > "type":  "range",
> > > "field": "t",
> > > "sort": { "t": "asc" },
> > > "start": 

Re: Solrj supporting term vector component ?

2020-12-03 Thread Shawn Heisey

On 12/3/2020 10:20 AM, Deepu wrote:

> I am planning to use Term vector component for one of the use cases, as per
> below solr documentation link solrj not supporting Term Vector Component,
> do you have any other suggestions to use TVC in java application?
>
> https://lucene.apache.org/solr/guide/8_4/the-term-vector-component.html#solrj-and-the-term-vector-component


SolrJ will support just about any query you might care to send, you just 
have to give it all the required parameters when building the request. 
All the results will be available, though you'll almost certainly have 
to provide code yourself that rips apart the NamedList into usable info.
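
As an illustration of "give it all the required parameters", here is a stdlib-only sketch of the Term Vector Component parameters for one request. It assumes the /tvrh handler from the example configsets and a hypothetical field name, which presumably needs termVectors enabled in the schema. With SolrJ, the same pairs would go onto a SolrQuery via set(name, value), together with setRequestHandler("/tvrh"). The term vectors should then come back under the "termVectors" key of QueryResponse.getResponse(), as a NamedList you walk yourself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: Term Vector Component request parameters, as a plain query string.
final class TermVectorRequest {

    static Map<String, String> params(String query, String field) {
        Map<String, String> p = new LinkedHashMap<>();   // keeps insertion order
        p.put("q", query);
        p.put("fl", "id");
        p.put("tv", "true");        // enable the component
        p.put("tv.fl", field);      // fields to return term vectors for
        p.put("tv.tf", "true");     // term frequencies
        p.put("tv.df", "true");     // document frequencies
        return p;
    }

    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);        // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // e.g. GET http://<host>:<port>/solr/<collection>/tvrh?<this string>
        System.out.println(queryString(params("id:123", "text")));
    }
}
```
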


What is being said in the documentation is that there are not any 
special objects or methods for doing term vector queries.  It's not 
saying that it can't be done.


Thanks,
Shawn


Re: Facet to part of search results

2020-12-03 Thread Shawn Heisey

On 12/3/2020 9:55 AM, Jae Joo wrote:

> Is there any way to apply facet to the partial search result?
> For ex, we have 10m return by "dog" and like to apply facet to first 10K.
> Possible?


The point of facets is to provide accurate numbers.

What would it mean to only apply to the first 10K?  If there are 10 
million documents in the query results that contain "dog" then the facet 
should say 10 million, not 10K.  I do not understand what you're trying 
to do.


Shawn


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Michael,

Thanks for helping me to figure this out.

If I fire:

{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"aip": { "query":  "cfname2:aip", }

}
}

I get

"response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
"docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }

(works). If I fire


{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
"limit": 1
}
}
}

I get

"response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
"docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [ {
"val": "2018-05-02T17:00:00Z", "count": 150 },

(works). If I fire:

{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"aip": { "query":  "cfname2:aip",

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
"limit": 1
}
}
}
}
}

I get

"error": { "metadata": [ "error-class",
"org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
name, like {type:range, field:price, ...} but got null , path=/facet",
"code": 400 } }

If I fire

{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"aip": { "query":  "cfname2:aip",

"facet": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
"limit": 1
}
}
}
}

I get

"error": { "metadata": [ "error-class",
"org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
name, like {type:range, field:price, ...} but got null , path=/facet",
"code": 400 } }

What else can I try out?

Cheers,
Arturas

On Thu, Dec 3, 2020 at 3:55 PM Michael Gibney 
wrote:

> Arturas,
> I think your syntax is wrong for the range subfacet? -- the configuration
> of the range facet should be directly under the `tt` key, rather than
> nested under `t_buckets` in the request. (The response introduces a
> "buckets" attribute that is not part of the request syntax).
> Michael
>
> On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika  wrote:
>
> > Hi Solr Team,
> >
> > I am trying to check how I can formulate facet queries using JSON format. I
> > can successfully formulate query, range, term queries, as well as nested
> > term queries. How can I formulate a nested facet query involving "query" as
> > well as "range" formulations? The following does not work:
> >
> >
> > GET http://localhost:/solr/db/query HTTP/1.1
> > content-type: application/json
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "a1": { "query":  "cfname2:1" },
> > "a2": { "query":  "cfname2:2" },
> > "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
> > "a4": { "query":  "cfname2:4" },
> > "a5": { "query":  "cfname2:5" },
> > "a6": { "query":  "cfname2:6" },
> >
> > "tt": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > }
> > }
> > }
> >
> > Single (not nested facets separately on individual queries as well as for
> > range) work in flying colors.
> >
> > Cheers,
> > Arturas
> >
>


Solrj supporting term vector component ?

2020-12-03 Thread Deepu
Dear Community Members,

I am planning to use the Term Vector Component for one of our use cases. According to
the Solr documentation linked below, SolrJ does not support the Term Vector Component.
Do you have any other suggestions for using the TVC in a Java application?

https://lucene.apache.org/solr/guide/8_4/the-term-vector-component.html#solrj-and-the-term-vector-component


Thanks,
Deepu


Facet to part of search results

2020-12-03 Thread Jae Joo
Is there any way to apply faceting to only part of the search results?
For example, a search for "dog" returns 10M documents and we would like to facet on only the first 10K.
Possible?

Jae


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Michael Gibney
Arturas,
I think your syntax is wrong for the range subfacet? -- the configuration
of the range facet should be directly under the `tt` key, rather than
nested under `t_buckets` in the request. (The response introduces a
"buckets" attribute that is not part of the request syntax).
Michael

On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika  wrote:

> Hi Solr Team,
>
> I am trying to check how I can formulate facet queries using JSON format. I
> can successfully formulate query, range, term queries, as well as nested
> term queries. How can I formulate a nested facet query involving "query" as
> well as "range" formulations? The following does not work:
>
>
> GET http://localhost:/solr/db/query HTTP/1.1
> content-type: application/json
>
> {
> "query"  : "*:*",
> "limit"  : 0,
> "facet": {
> "a1": { "query":  "cfname2:1" },
> "a2": { "query":  "cfname2:2" },
> "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
> "a4": { "query":  "cfname2:4" },
> "a5": { "query":  "cfname2:5" },
> "a6": { "query":  "cfname2:6" },
>
> "tt": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> }
> }
> }
> }
>
> Single (not nested facets separately on individual queries as well as for
> range) work in flying colors.
>
> Cheers,
> Arturas
>


Re: Solr8.7 - How to optmize my index ?

2020-12-03 Thread Erick Erickson
Dave:

Yeah, every time there’s generic advice, there’s some situations where it’s not 
the best choice ;).

In your situation, you’re trading off some space savings for moving up to 450G 
all at once, which sounds like it is worthwhile to you, although I’d check perf 
numbers sometime.

You may want to check out expungeDeletes. That will deal only with segments 
with more than 10% deleted docs, and may get you most of the benefits of 
optimize without the problems. Specifically, let’s say you have a segment right 
at the limit (5G by default) that has exactly one deleted doc. Optimize will 
rewrite that, expungeDeletes will not. It’s an open question whether there’s 
any practical difference, ‘cause if all the segments in your index have > 10% 
deleted documents, they all get rewritten in either case….
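
A toy model of that difference, restricted to segments already at the ~5G maximum size, where merging segments together is not a factor. Under that assumption, optimize rewrites a max-sized segment that has even one deleted doc, while expungeDeletes only rewrites segments whose deleted share is above 10%. This is the rule as stated above, not Lucene's actual TieredMergePolicy code, and the segment names and sizes are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Toy model (not Lucene code): which max-sized segments each operation would rewrite.
final class DeletesModel {

    static final class Segment {
        final String name;
        final long maxDoc;      // docs in the segment, deleted ones included
        final long deleted;

        Segment(String name, long maxDoc, long deleted) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        double deletedPct() {
            return 100.0 * deleted / maxDoc;
        }
    }

    /** expungeDeletes, as described: only segments with more than 10% deleted docs. */
    static List<String> expungeDeletes(List<Segment> segments) {
        return segments.stream()
                .filter(s -> s.deletedPct() > 10.0)
                .map(s -> s.name)
                .collect(Collectors.toList());
    }

    /** optimize, simplified for max-sized segments: any deleted doc triggers a rewrite. */
    static List<String> optimize(List<Segment> segments) {
        return segments.stream()
                .filter(s -> s.deleted > 0)
                .map(s -> s.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Segment> segs = Arrays.asList(
                new Segment("big_one_delete", 1_000_000, 1),       // one deleted doc
                new Segment("heavily_deleted", 200_000, 50_000));  // 25% deleted
        System.out.println("expungeDeletes: " + expungeDeletes(segs)); // [heavily_deleted]
        System.out.println("optimize:       " + optimize(segs));       // [big_one_delete, heavily_deleted]
    }
}
```
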

And the mechanism for optimize changed pretty significantly in Solr 7.5. The 
short form is that before that, the result was a single massive segment, whereas 
since then the default max segment size of 5G is respected (although you can 
still force a single segment if you take explicit action).

Here are two articles that explain it all:
Before Solr 7.5: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
Solr 7.5 and later: 
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

Best,
Erick

> On Dec 2, 2020, at 11:05 PM, Dave  wrote:
> 
> I’m going to go against the advice SLIGHTLY, it really depends on how you 
> have things set up as far as your solr server hosting is done. If you’re 
> searching off the same solr server you’re indexing to, yeah don’t ever 
> optimize it will take care of itself, people much smarter than us, like 
> Erick/Walter/Yonik, have spent time on this and if they say don’t do it don't 
> do it. 
> 
> In my particular use case I do see a measured improvement from optimizing 
> every three or four months.  In my case a large portion, over 75% of the 
> documents, which each measure around 500 KB to 3 MB, get reindexed every month, 
> as the fields in the documents change every month, while documents are added 
> to it daily as well.  So when I can go from a 650gb index to a 450gb once in 
> a while it makes a difference if I only have 500gb of memory to work with on 
> the searchers and can fit all the segments straight to memory. Also I use the 
> old set up of master slave, so my indexing server, when it’s optimizing has 
> no impact on the searching servers.  Once the optimized index gets warmed 
> back up in the searcher I do notice improvement in my qtimes (I like to 
> think) however I’ve been using my same integration process of occasional hard 
> optimizations since 1.4, and it might just be that I like to watch the index 
> inflate three times the size then shrivel up. Old habits die hard. 
> 
>> On Dec 2, 2020, at 10:28 PM, Matheo Software  
>> wrote:
>> 
>> Hi Erick,
>> Hi Walter,
>> 
>> Thanks for this information,
>> 
>> I will carefully read the Solr article you gave me.
>> I thought it was important to always delete and optimize the collection.
>> 
>> More information concerning my collection:
>> Index size is about 390 GB for 130M docs (3-5 KB / doc), around 25 fields 
>> (indexed, stored).
>> Every Tuesday I update around 1M docs, and every Thursday I add new 
>> docs (around 50,000).
>> 
>> Many thanks !
>> 
>> Regards,
>> Bruno
>> 
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com] 
>> Sent: Wednesday, December 2, 2020 14:07
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr8.7 - How to optmize my index ?
>> 
>> expungeDeletes is unnecessary, optimize is a superset of expungeDeletes.
>> The key difference is commit=true. I suspect if you’d waited until your 
>> indexing process added another doc and committed, you’d have seen the index 
>> size drop.
>> 
>> Just to check, you send the command to my_core but talk about collections.
>> Specifying the collection is sufficient, but I’ll assume that’s a typo and 
>> you’re really saying my_collection.
>> 
>> I agree with Walter like I always do, you shouldn’t be running optimize 
>> without some proof that it’s helping. About the only time I think it’s 
>> reasonable is when you have a static index, unless you can demonstrate 
>> improved performance. The optimize button was removed precisely because it 
>> was so tempting. In much earlier versions of Lucene, it made a demonstrable 
>> difference so was put front and center. In more recent versions of Solr 
>> optimize doesn’t help nearly as much so it was removed.
>> 
>> You say you have 38M deleted documents. How many documents total? If this is 
>> 50% of your index, that’s one thing. If it’s 5%, it’s certainly not worth 
>> the effort. You’re rewriting 466G of index, if you’re not seeing 
>> demonstrable performance improvements, that’s a lot of wasted effort…
>> 
>> See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
>> and the linked article for what 

Re: ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

2020-12-03 Thread Erick Erickson
Exactly _how_ are you indexing? In particular, how often are commits happening?

If you’re committing too often, Solr can block until some of the background 
merges are complete. This can happen particularly when you are doing hard 
commits in rapid succession, either through, say, committing from the client 
(which I recommend against in almost all cases) or having your autocommit 
intervals set too short.

Your autocommit settings should be as long as your application can tolerate;
committing is expensive.

Here’s some background:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The other possibility is if you have very long GC pauses, so I’d also
monitor the GC activity and see if you have stop-the-world GC
pauses exceeding 20 seconds coincident with this problem.
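
To check the GC theory, one rough approach is to scan Solr's GC log for stop-the-world pauses longer than the 20 s in the error. I believe Solr writes that log as solr_gc.log in its logs directory by default. The sketch assumes JDK 9+ unified logging, where pause lines contain "Pause" and end with a duration such as 12.345ms. JDK 8 style logs look different and would need another pattern. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds long stop-the-world pauses in a unified-logging GC log.
final class LongGcPauses {

    // Assumes lines such as "... GC(12) Pause Young (Normal) ... 24M->4M(256M) 12.345ms"
    private static final Pattern TRAILING_MS = Pattern.compile("(\\d+(?:\\.\\d+)?)ms\\s*$");

    /** Returns the durations (ms) of "Pause" lines whose duration exceeds thresholdMs. */
    static List<Double> longPauses(List<String> lines, double thresholdMs) {
        List<Double> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains("Pause")) {
                continue;                        // e.g. concurrent phases do not stop the world
            }
            Matcher m = TRAILING_MS.matcher(line);
            if (m.find()) {
                double ms = Double.parseDouble(m.group(1));
                if (ms > thresholdMs) {
                    result.add(ms);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the GC log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Pauses over 20 s (ms): " + longPauses(lines, 20_000));
    }
}
```
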

Best,
Erick

> On Dec 3, 2020, at 6:12 AM, Sebastian Lutter  
> wrote:
> 
> Hi!
> 
> I run a three-node Solr 8.5.1 cluster and experienced a bug when updating 
> the index (adding a document):
> 
> {
>   "responseHeader":{
> "rf":3,
> "status":500,
> "QTime":22938},
>   "error":{
> "msg":"Task queue processing has stalled for 20205 ms with 0 remaining 
> elements to process.",
> "trace":"java.io.IOException: Task queue processing has stalled for 20205 
> ms with 0 remaining elements to process.\n\tat 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.blockUntilFinished(ConcurrentUpdateHttp2SolrClient.java:501)\n\tat
>  
> org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:87)\n\tat
> org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:265)\n\tat
> org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:251)\n\tat
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:201)\n\tat
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)\n\tat
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:72)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
> 

Re: increasing number of threads for faceting in JSON format

2020-12-03 Thread Munendra S N
Hi,

Currently, JSON facets do not support specifying the number of threads.
In the above request, the range facet is computed over more than 2 years
with a gap of 1 hour, and this is repeated for every bucket of the outer
terms facet. By reducing the number of buckets (for example, with a
larger gap), computation should become much faster.
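To put the bucket count in numbers, here is a minimal Python sketch that counts how many buckets the range facet produces between the "start" and "end" of the request for a +1HOUR gap versus a +1DAY gap. It assumes Solr's default hardend=false (a partial last bucket still counts as a full bucket) and treats both timestamps as UTC; the helper name bucket_count is purely illustrative.

```python
import math
from datetime import datetime, timedelta

# Range bounds copied from the request in this thread (treated as UTC).
START = datetime(2018, 5, 2, 17, 0)
END = datetime(2020, 11, 16, 21, 0)


def bucket_count(start: datetime, end: datetime, gap: timedelta) -> int:
    """Number of fixed-size buckets needed to cover [start, end).

    A partial final bucket is rounded up, mirroring hardend=false,
    where the last bucket extends past "end".
    """
    return math.ceil((end - start) / gap)


hourly = bucket_count(START, END, timedelta(hours=1))
daily = bucket_count(START, END, timedelta(days=1))
print(f"+1HOUR: {hourly} buckets per fcomp term")
print(f"+1DAY:  {daily} buckets per fcomp term")
```

With these bounds it reports 22300 hourly buckets against 930 daily ones, and each of those buckets is computed once per fcomp term of the outer facet.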

Regards,
Munendra S N



On Thu, Dec 3, 2020 at 1:52 PM Arturas Mazeika  wrote:

> Hi Solr-Users,
>
> I am trying to better understand Solr's capabilities: how one can
> formulate queries in JSON format and how to tweak parameters. Currently I
> have a logs collection (ca. 6 GB) with a dozen attributes, running in
> single-server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start -h
> localhost -p  -m 4g)
>
> I am experimenting with Solr's faceting functionality and querying a couple
> of attributes with it. My typical query is:
>
> GET http://localhost:/solr/db/query HTTP/1.1
> content-type: application/json
>
> {
> "query"  : "*:*",
> "limit"  : 0,
> "facet": {
> "t" : {
> "type":  "terms",
> "field": "fcomp",
> "sort":  "index",
>
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> }
> }
> },
> }
> }
>
> Not surprisingly, it takes a while to compute the result, so I tried to
> increase the number of threads. How do I do that in JSON format? I tried
> adding
>
> {
> "params": {
> "facet.threads": 8
> },
> "query"  : "*:*",
> ...
> }
>
> and checked the jstack output of the Solr Java process, but I still see only
> one thread working. Can I configure params through the "params" section?
>
> I also tried
>
> {
> "query"  : "*:*",
> "limit"  : 0,
> "facet": {
> "t" : {
> "type":  "terms",
> "field": "fcomp",
> "sort":  "index",
>
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> }
> },
> "threads":8
> },
> }
> }
>
> but this also ran in a single thread. Can I influence the number of threads
> in the "facet" section of the JSON request?
>
> Cheers,
> Arturas
>


ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

2020-12-03 Thread Sebastian Lutter
Hi!

I run a three-node Solr 8.5.1 cluster and ran into a bug when
updating the index (adding a document):

{
  "responseHeader":{
    "rf":3,
    "status":500,
    "QTime":22938},
  "error":{
    "msg":"Task queue processing has stalled for 20205 ms with 0 remaining elements to process.",
    "trace":"java.io.IOException: Task queue processing has stalled for 20205 ms with 0 remaining elements to process.\n\tat
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.blockUntilFinished(ConcurrentUpdateHttp2SolrClient.java:501)\n\tat
org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:87)\n\tat
org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:265)\n\tat
org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:251)\n\tat
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:201)\n\tat
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:72)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat
java.base/java.lang.Thread.run(Unknown Source)\n",
    "code":500}}

I have no way to reproduce the issue yet; it happened in a
production environment.


nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Solr Team,

I am trying to work out how to formulate facet queries in JSON format. I
can successfully formulate query, range, and terms facets, as well as nested
terms facets. How can I formulate a nested facet involving both "query" and
"range" facets? The following does not work:


GET http://localhost:/solr/db/query HTTP/1.1
content-type: application/json

{
"query"  : "*:*",
"limit"  : 0,
"facet": {
"a1": { "query":  "cfname2:1" },
"a2": { "query":  "cfname2:2" },
"a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
"a4": { "query":  "cfname2:4" },
"a5": { "query":  "cfname2:5" },
"a6": { "query":  "cfname2:6" },

"tt": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
}
}
}
}

Single, non-nested facets (on the individual queries separately as well as
on the range) work with flying colors.

Cheers,
Arturas
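A likely cause, which the thread itself does not confirm, is that "tt" declares no "type", so Solr has no way to know that it is meant to be a facet wrapping "t_buckets". The minimal Python sketch below only builds the request body; it nests the range facet under an explicit query facet ("type": "query", with the query in "q" and sub-facets under "facet"), uses "*:*" as a hypothetical query for "tt", and leaves out the original "sort" entry, since range buckets are returned in ascending order anyway.

```python
import json

# Range facet from the original request, without the "sort" entry.
t_buckets = {
    "type": "range",
    "field": "t",
    "start": "2018-05-02T17:00:00.000Z",
    "end": "2020-11-16T21:00:00.000Z",
    "gap": "+1HOUR",
}

# a1, a2, a4, a5, a6: the same short-form query facets as in the question.
facets = {f"a{i}": {"query": f"cfname2:{i}"} for i in (1, 2, 4, 5, 6)}
facets["a3"] = {"field": "cfname2", "type": "terms", "prefix": "3"}

# "tt" now states its type explicitly; "*:*" is a hypothetical choice of query.
facets["tt"] = {
    "type": "query",
    "q": "*:*",
    "facet": {"t_buckets": t_buckets},
}

body = {"query": "*:*", "limit": 0, "facet": facets}

# Serialize for use as the JSON body of the request to /solr/db/query.
print(json.dumps(body, indent=2))
```

If the range facet does not actually need to sit under a query, an alternative is to move "t_buckets" up to the top level of "facet", next to "a1" to "a6".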


increasing number of threads for faceting in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Solr-Users,

I am trying to better understand Solr's capabilities: how one can
formulate queries in JSON format and how to tweak parameters. Currently I
have a logs collection (ca. 6 GB) with a dozen attributes, running in
single-server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start -h
localhost -p  -m 4g)

I am experimenting with Solr's faceting functionality and querying a couple
of attributes with it. My typical query is:

GET http://localhost:/solr/db/query HTTP/1.1
content-type: application/json

{
"query"  : "*:*",
"limit"  : 0,
"facet": {
"t" : {
"type":  "terms",
"field": "fcomp",
"sort":  "index",

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
}
}
},
}
}

Not surprisingly, it takes a while to compute the result, so I tried to
increase the number of threads. How do I do that in JSON format? I tried
adding

{
"params": {
"facet.threads": 8
},
"query"  : "*:*",
...
}

and checked the jstack output of the Solr Java process, but I still see only
one thread working. Can I configure params through the "params" section?

I also tried

{
"query"  : "*:*",
"limit"  : 0,
"facet": {
"t" : {
"type":  "terms",
"field": "fcomp",
"sort":  "index",

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
}
},
"threads":8
},
}
}

but this also ran in a single thread. Can I influence the number of threads
in the "facet" section of the JSON request?

Cheers,
Arturas