Re: Optimizing fq query performance

2019-04-17 Thread John Davis
I did a few tests with our instance (Solr 7.4.0), and field:* vs field:[* TO
*] doesn't seem materially different compared to has_field:1. If no one
knows why Lucene would optimize one but not the other, it's not clear that it
even optimizes one at all.
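
For reference, the three variants being compared, as fq clauses (has_field is
assumed to be a flag populated at index time):

fq=field1:*           (wildcard existence check; expands to every term)
fq=field1:[* TO *]    (all-inclusive range existence check)
fq=has_field1:1       (precomputed boolean flag)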

On Wed, Apr 17, 2019 at 4:27 PM Shawn Heisey  wrote:

> On 4/17/2019 1:21 PM, John Davis wrote:
> > If what you describe is the case for the range query [* TO *], why would
> > Lucene not optimize field:* in a similar way?
>
> I don't know.  Low-level Lucene operation is a mystery to me.
>
> I have seen first-hand that the range query is MUCH faster than the
> wildcard query.
>
> Thanks,
> Shawn
>


Re: Solr 8.0.0 date search issue

2019-04-17 Thread Anuj Bhargava
Added a new field definition in the Schema file and modified the existing one
accordingly (the XML itself was stripped by the mailing list archive).
Still, the following do not work:
fq=date_upload:NOW
fq=date_upload:NOW-7DAYS
fq=date_upload:NOW-30DAYS

The following has started working
fq=date_upload:[2018-12-01 TO 2019-04-17]
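
For reference, Solr date fields accept full ISO-8601 instants or date math, so
forms like the following should parse (note that '2018-12-01T:00:00:00Z' from
the tests quoted below has a stray colon after the T, which by itself explains
the "Invalid Date in Date Math String" error):

fq=date_upload:[2018-12-01T00:00:00Z TO 2019-04-17T00:00:00Z]
fq=date_upload:[NOW-30DAYS TO NOW]
fq=date_upload:[NOW/DAY TO NOW]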

On Thu, 18 Apr 2019 at 08:15, Anuj Bhargava  wrote:

> I have an issue while searching on the Date field date_upload
>
> My Schema file has the following entry for DATE Field
>
> (field definition stripped by the archive)
>
> My data-config.xml has the following entry -
>
> <dataSource driver="com.mysql.jdbc.Driver" batchSize="-1"
>     autoReconnect="true" socketTimeout="0"
>     connectTimeout="0" encoding="UTF-8"
>     url="jdbc:mysql://xxx.xxx.xxx.xx:3306/news?zeroDateTimeBehavior=convertToNull"
>     user="admin" password="admin"/>
> <entity pk="posting_id" query="SELECT * FROM news10"
>     deltaImportQuery="SELECT * FROM news10 WHERE posting_id = '${dataimporter.delta.posting_id}'"
>     deltaQuery="SELECT posting_id FROM news10 WHERE last_modified > '${dataimporter.last_index_time}'">
>
> The following do not work -
> fq=date_upload:NOW  (does not work)
http://localhost:8983/solr/Nlive/select?fq=date_c%3ANOW&q=*%3A*
>
> fq=date_upload:NOW-1DAY  (does not work)
> fq=date_upload:(NOW-30DAYS)  (does not work)
>
> fq=date_upload:[2018-12-01T:00:00:00Z TO 2019-04-17T00:00:00Z]  (does not
> work)
> *"msg":"Invalid Date in Date Math String:'2018-12-01T:00:00:00Z'",*
>
> fq=date_upload:[2018-12-01 TO 2019-04-17] gives the following error:
>
> *{  "responseHeader":{"status":400,"QTime":1,"params":{
> "q":"*:*",  "fq":"date_upload:[2018-12-01 TO 2019-04-17]",
> "_":"1555386354522"}},  "error":{"metadata":[
> "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Invalid Date String:'2018-12-01'","code":400}}*
>
> However, this does give results -
> fq=date_upload:[* TO NOW]
>
> http://localhost:8983/solr/Nlive/select?fq=date_c%3A%5B*%20TO%20NOW%5D&q=*%3A*
>


Solr 8.0.0 date search issue

2019-04-17 Thread Anuj Bhargava
I have an issue while searching on the Date field date_upload

My Schema file has the following entry for DATE Field

(field definition stripped by the archive)

My data-config.xml has the following entry -

<dataSource driver="com.mysql.jdbc.Driver" batchSize="-1"
    autoReconnect="true" socketTimeout="0"
    connectTimeout="0" encoding="UTF-8"
    url="jdbc:mysql://xxx.xxx.xxx.xx:3306/news?zeroDateTimeBehavior=convertToNull"
    user="admin" password="admin"/>
<entity pk="posting_id" query="SELECT * FROM news10"
    deltaImportQuery="SELECT * FROM news10 WHERE posting_id = '${dataimporter.delta.posting_id}'"
    deltaQuery="SELECT posting_id FROM news10 WHERE last_modified > '${dataimporter.last_index_time}'">

The following do not work -
fq=date_upload:NOW  (does not work)
http://localhost:8983/solr/Nlive/select?fq=date_c%3ANOW&q=*%3A*

fq=date_upload:NOW-1DAY  (does not work)
fq=date_upload:(NOW-30DAYS)  (does not work)

fq=date_upload:[2018-12-01T:00:00:00Z TO 2019-04-17T00:00:00Z]  (does not
work)
*"msg":"Invalid Date in Date Math String:'2018-12-01T:00:00:00Z'",*

fq=date_upload:[2018-12-01 TO 2019-04-17] gives the following error:

*{  "responseHeader":{"status":400,"QTime":1,"params":{
"q":"*:*",  "fq":"date_upload:[2018-12-01 TO 2019-04-17]",
"_":"1555386354522"}},  "error":{"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"Invalid Date String:'2018-12-01'","code":400}}*

However, this does give results -
fq=date_upload:[* TO NOW]
http://localhost:8983/solr/Nlive/select?fq=date_c%3A%5B*%20TO%20NOW%5D&q=*%3A*


Re: JSON Facet query to retrieve count of all collections in Solr 8.0.0

2019-04-17 Thread Zheng Lin Edwin Yeo
Hi Jason,

The same problem still persists after restarting my Solr nodes. The only
time the problem didn't occur was when I disabled basic authentication.

I have tried a few "/select?q=*:*" queries, and they do not exhibit the same
problem. Even a similar query with only 1 shard does not have the problem.

https://localhost:8983/solr/collection1/select?q=testing&shards=https://localhost:8983/solr/collection1&rows=0&json.facet={categories
: {type : terms, field : content_type, limit : 100}}


It is only when there are 2 or more shards that the problem occurs.

https://localhost:8983/solr/collection1/select?q=testing&shards=https://localhost:8983/solr/collection1,https://localhost:8983/solr/collection2&rows=0&json.facet={categories
: {type : terms, field : content_type, limit : 100}}
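
For anyone reproducing this, the same request can be sent with explicit Basic
Auth credentials via curl; user and password here are placeholders matching
the masked security.json quoted below (-g disables curl's brace globbing so
the json.facet body passes through, -k skips certificate checks):

curl -g -k -u user1:password "https://localhost:8983/solr/collection1/select?q=testing&rows=0&shards=https://localhost:8983/solr/collection1,https://localhost:8983/solr/collection2&json.facet={categories:{type:terms,field:content_type,limit:100}}"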


Regards,
Edwin


On Thu, 18 Apr 2019 at 01:15, Jason Gerlowski  wrote:

> Agreed, I'd be surprised if this behavior was specific to JSON
> Faceting.  Though I'm surprised it's happening at all, so...
>
> Anyway, that's easy for you to test though.  Try a few "/select?q=*:*"
> queries and see whether they also exhibit this behavior.  One other
> question: does the behavior persist after restarting your Solr nodes?
>
> Good luck,
>
> Jason
>
> On Wed, Apr 17, 2019 at 4:05 AM Zheng Lin Edwin Yeo
>  wrote:
> >
> > Hi,
> >
> > For your info, I have enabled basic authentication and SSL in all the 3
> > versions, and I'm not sure if the issue is more on the authentication
> side
> > instead of the JSON Facet query?
> >
> > Regards,
> > Edwin
> >
> > On Wed, 17 Apr 2019 at 06:54, Zheng Lin Edwin Yeo 
> > wrote:
> >
> > > Hi Jason,
> > >
> > > Yes, that is correct.
> > >
> > > Below is the format of my security.json. I have changed the masked
> > > password for security purposes.
> > >
> > > {
> > > "authentication":{
> > >"blockUnknown": true,
> > >"class":"solr.BasicAuthPlugin",
> > >"credentials":{"user1":"hyHXXuJSqcZdNgdSTGUvrQZRpqrYFUQ2ffmlWQ4GUTk=
> > > E0w3/2FD+rlxulbPm2G7i9HZqT+2gMBzcyJCcGcMWwA="}
> > > },
> > > "authorization":{
> > >"class":"solr.RuleBasedAuthorizationPlugin",
> > >"user-role":{"user1":"admin"},
> > >"permissions":[{"name":"security-edit",
> > >   "role":"admin"}]
> > > }}
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Tue, 16 Apr 2019 at 23:12, Jason Gerlowski 
> > > wrote:
> > >
> > >> Hi Edwin,
> > >>
> > >> To clarify what you're running into:
> > >>
> > >> - on 7.6, this query works all the time
> > >> - on 7.7 this query works all the time
> > >> - on 8.0, this query works the first time you run it, but subsequent
> > >> runs return a 401 error?
> > >>
> > >> Is that correct?  It might be helpful for others if you could share
> > >> your security.json.
> > >>
> > >> Best,
> > >>
> > >> Jason
> > >>
> > >> On Mon, Apr 15, 2019 at 10:40 PM Zheng Lin Edwin Yeo
> > >>  wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I am using the below JSON Facet to retrieve the count of all the
> > >> different
> > >> > collections in one query.
> > >> >
> > >> >
> > >>
> https://localhost:8983/solr/collection1/select?q=testing&shards=https://localhost:8983/solr/collection1,https://localhost:8983/solr/collection2,https://localhost:8983/solr/collection3,https://localhost:8983/solr/collection4,https://localhost:8983/solr/collection5,https://localhost:8983/solr/collection6&rows=0&json.facet={categories
> > >> > : {type : terms, field : content_type, limit : 100}}
> > >> >
> > >> >
> > >> > Previously, in Solr 7.6 and Solr 7.7, this query can work correctly
> and
> > >> we
> > >> > are able to produce the correct output.
> > >> >
> > >> > {
> > >> >   "responseHeader":{
> > >> > "zkConnected":true,
> > >> > "status":0,
> > >> > "QTime":24},
> > >> >
>  "response":{"numFound":41200,"start":0,"maxScore":12.993215,"docs":[]
> > >> >   },
> > >> >   "facets":{
> > >> > "count":41200,
> > >> > "categories":{
> > >> >   "buckets":[{
> > >> >   "val":"collection1",
> > >> >   "count":26213},
> > >> > {
> > >> >   "val":"collection2",
> > >> >   "count":12075},
> > >> > {
> > >> >   "val":"collection3",
> > >> >   "count":1947},
> > >> > {
> > >> >   "val":"collection4",
> > >> >   "count":850},
> > >> > {
> > >> >   "val":"collection5",
> > >> >   "count":111},
> > >> > {
> > >> >   "val":"collection6",
> > >> >   "count":4}]}}}
> > >> >
> > >> >
> > >> > However, in the new Solr 8.0.0, this query can only work once.
> > >> > Subsequently, we will get the following error of 'require
> > >> authentication':
> > >> >
> > >> > {
> > >> >   "responseHeader":{
> > >> > "zkConnected":true,
> > >> > "status":401,
> > >> > "QTime":11},
> > >> >   "error":{
> > >> > "metadata":[
> > >> >
> > >> >
> > >>
> "error-class","org.apache.solr.client.solrj.impl.Http2SolrClient$RemoteSolrException",
> > >> >
> > >> >
> > >>
> 

Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

2019-04-17 Thread Erick Erickson
Specifically, a _leader_ being put into the down or recovering state is almost 
always because ZooKeeper cannot ping it and get a response back before it times 
out. This also points to large GC pauses on the Solr node. Using something like 
GCViewer on the GC logs from the time of the problem will help a lot.
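
For reference, Solr's startup scripts normally write a GC log alongside the
other logs (e.g. solr_gc.log); a sketch of pointing GCViewer at it, with the
path and jar version being assumptions:

java -jar gcviewer-1.36.jar /var/solr/logs/solr_gc.log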

A _follower_ can go into recovery when an update takes too long but that’s 
“leader initiated recovery” and originates _from_ the leader, which is much 
different than the leader going into a down state.

Best,
Erick

> On Apr 17, 2019, at 7:54 AM, Shawn Heisey  wrote:
> 
> On 4/17/2019 6:25 AM, vishal patel wrote:
>> Why did shard1 take 1.8 minutes for the update? And if the update took that
>> long, why did replica1 try to become leader? Is it required to update any
>> timeout?
> 
> There's no information here that can tell us why the update took so long.  My 
> best guess would be long GC pauses due to the heap size being too small.  But 
> there might be other causes.
> 
> Indexing a single document should be VERY fast.  Even a large document should 
> only take a handful of milliseconds.
> 
> If the request included "commit=true" as a parameter, then it might be the 
> commit that was slow, not the indexing.  You'll need to check the logs to 
> determine that.
> 
> The reason that the leader changed was almost certainly the fact that the 
> update took so long.  SolrCloud would have decided that the node was down if 
> any operation took that long.
> 
> Thanks,
> Shawn



Blockjoin with Filter Query on Child Doc Result Set

2019-04-17 Thread Jeffrey Walraven
Hello,

Is there a good way to do Solr parent block joins with filter queries applied
to the child result set (i.e., where the fq restricts the same child documents
matched by the main child query, rather than all children)?

Solr has a convenient way of doing filter queries on the result set of
parent block joins. E.g.

q={!parent which=doc_type:parent}content:dog&fq={!parent
which=doc_type:parent}comment:cat

But note that this only filters the *parent* result set based on the
results of the children. The full children result set is used for the
filter both times.

To clarify, if I insert a document with child A and B

|{ "doc_type":"parent", "id":"1", __children__: [ { "id":"A" }, {
"id":"B" } ] } |

I wish to be able to search for A and filter B to get nothing in the
results, since the child doc match for id was on A (not B). Current
behavior is that the result comes back since both children are searched
in either the query or the filter query.


To contrast, normal document filter behavior is to remove non-matching
documents from the result set. E.g. if there are two documents with id 1
& 2 and I search id:1 with a filter query of id:2 I will get no
documents back.

I know that the proper behavior can be achieved using a single query:

q={!parent which=doc_type:parent}id:A AND id:B

But there are certain cases where a filter query may be preferred, especially
when multiple parent block-join queries are used in the q (or fq) value and
you wish to filter over them with an fq.

If there is no current implementation for this in Solr, I would appreciate
guidance on whether it would be reasonable to implement.
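
One possibly relevant feature, worth verifying against the Ref Guide for your
version: the block-join parent parser accepts a "filters" local parameter that
applies extra fq-style clauses to the child scope before the join, roughly:

q={!parent which=doc_type:parent filters=$child.fq}id:A&child.fq=id:B

Because the child filter intersects the child query before the join, this form
should return nothing for the A/B example above, which sounds like the
behavior wanted here.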

Thanks,

Jeffrey Walraven


By the way, I posted this question on stackoverflow a few months back:
https://stackoverflow.com/questions/52600471/solr-parent-blockjoin-with-filter-query-on-children
But I wanted to see if anybody on the solr mailing list had any
solutions/ideas on how to approach this.



Re: Optimizing fq query performance

2019-04-17 Thread Shawn Heisey

On 4/17/2019 1:21 PM, John Davis wrote:

If what you describe is the case for the range query [* TO *], why would
Lucene not optimize field:* in a similar way?


I don't know.  Low-level Lucene operation is a mystery to me.

I have seen first-hand that the range query is MUCH faster than the 
wildcard query.


Thanks,
Shawn


Re: Optimizing fq query performance

2019-04-17 Thread John Davis
If what you describe is the case for the range query [* TO *], why would
Lucene not optimize field:* in a similar way?

On Wed, Apr 17, 2019 at 10:36 AM Shawn Heisey  wrote:

> On 4/17/2019 10:51 AM, John Davis wrote:
> > Can you clarify why field:[* TO *] is a lot more efficient than field:*
>
> It's a range query.  For every document, Lucene just has to answer two
> questions -- is the value more than any possible value and is the value
> less than any possible value.  The answer will be yes if the field
> exists, and no if it doesn't.  With one million documents, there are two
> million questions that Lucene has to answer.  Which probably seems like
> a lot ... but keep reading.  (Side note:  It wouldn't surprise me if
> Lucene has an optimization specifically for the all inclusive range such
> that it actually only asks one question, not two)
>
> With a wildcard query, there are as many questions as there are values
> in the field.  Every question is asked for every single document.  So if
> you have a million documents and there are three hundred thousand
> different values contained in the field across the whole index, that's
> 300 billion questions.
>
> Thanks,
> Shawn
>


Re: Optimizing fq query performance

2019-04-17 Thread Shawn Heisey

On 4/17/2019 10:51 AM, John Davis wrote:

Can you clarify why field:[* TO *] is a lot more efficient than field:*


It's a range query.  For every document, Lucene just has to answer two 
questions -- is the value more than any possible value and is the value 
less than any possible value.  The answer will be yes if the field 
exists, and no if it doesn't.  With one million documents, there are two 
million questions that Lucene has to answer.  Which probably seems like 
a lot ... but keep reading.  (Side note:  It wouldn't surprise me if 
Lucene has an optimization specifically for the all inclusive range such 
that it actually only asks one question, not two)


With a wildcard query, there are as many questions as there are values 
in the field.  Every question is asked for every single document.  So if 
you have a million documents and there are three hundred thousand 
different values contained in the field across the whole index, that's 
300 billion questions.


Thanks,
Shawn


Re: JSON Facet query to retrieve count of all collections in Solr 8.0.0

2019-04-17 Thread Jason Gerlowski
Agreed, I'd be surprised if this behavior was specific to JSON
Faceting.  Though I'm surprised it's happening at all, so...

Anyway, that's easy for you to test though.  Try a few "/select?q=*:*"
queries and see whether they also exhibit this behavior.  One other
question: does the behavior persist after restarting your Solr nodes?

Good luck,

Jason

On Wed, Apr 17, 2019 at 4:05 AM Zheng Lin Edwin Yeo
 wrote:
>
> Hi,
>
> For your info, I have enabled basic authentication and SSL in all the 3
> versions, and I'm not sure if the issue is more on the authentication side
> instead of the JSON Facet query?
>
> Regards,
> Edwin
>
> On Wed, 17 Apr 2019 at 06:54, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Jason,
> >
> > Yes, that is correct.
> >
> > Below is the format of my security.json. I have changed the masked
> > password for security purposes.
> >
> > {
> > "authentication":{
> >"blockUnknown": true,
> >"class":"solr.BasicAuthPlugin",
> >"credentials":{"user1":"hyHXXuJSqcZdNgdSTGUvrQZRpqrYFUQ2ffmlWQ4GUTk=
> > E0w3/2FD+rlxulbPm2G7i9HZqT+2gMBzcyJCcGcMWwA="}
> > },
> > "authorization":{
> >"class":"solr.RuleBasedAuthorizationPlugin",
> >"user-role":{"user1":"admin"},
> >"permissions":[{"name":"security-edit",
> >   "role":"admin"}]
> > }}
> >
> > Regards,
> > Edwin
> >
> > On Tue, 16 Apr 2019 at 23:12, Jason Gerlowski 
> > wrote:
> >
> >> Hi Edwin,
> >>
> >> To clarify what you're running into:
> >>
> >> - on 7.6, this query works all the time
> >> - on 7.7 this query works all the time
> >> - on 8.0, this query works the first time you run it, but subsequent
> >> runs return a 401 error?
> >>
> >> Is that correct?  It might be helpful for others if you could share
> >> your security.json.
> >>
> >> Best,
> >>
> >> Jason
> >>
> >> On Mon, Apr 15, 2019 at 10:40 PM Zheng Lin Edwin Yeo
> >>  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am using the below JSON Facet to retrieve the count of all the
> >> different
> >> > collections in one query.
> >> >
> >> >
> >> https://localhost:8983/solr/collection1/select?q=testing&shards=https://localhost:8983/solr/collection1,https://localhost:8983/solr/collection2,https://localhost:8983/solr/collection3,https://localhost:8983/solr/collection4,https://localhost:8983/solr/collection5,https://localhost:8983/solr/collection6&rows=0&json.facet={categories
> >> > : {type : terms, field : content_type, limit : 100}}
> >> >
> >> >
> >> > Previously, in Solr 7.6 and Solr 7.7, this query can work correctly and
> >> we
> >> > are able to produce the correct output.
> >> >
> >> > {
> >> >   "responseHeader":{
> >> > "zkConnected":true,
> >> > "status":0,
> >> > "QTime":24},
> >> >   "response":{"numFound":41200,"start":0,"maxScore":12.993215,"docs":[]
> >> >   },
> >> >   "facets":{
> >> > "count":41200,
> >> > "categories":{
> >> >   "buckets":[{
> >> >   "val":"collection1",
> >> >   "count":26213},
> >> > {
> >> >   "val":"collection2",
> >> >   "count":12075},
> >> > {
> >> >   "val":"collection3",
> >> >   "count":1947},
> >> > {
> >> >   "val":"collection4",
> >> >   "count":850},
> >> > {
> >> >   "val":"collection5",
> >> >   "count":111},
> >> > {
> >> >   "val":"collection6",
> >> >   "count":4}]}}}
> >> >
> >> >
> >> > However, in the new Solr 8.0.0, this query can only work once.
> >> > Subsequently, we will get the following error of 'require
> >> authentication':
> >> >
> >> > {
> >> >   "responseHeader":{
> >> > "zkConnected":true,
> >> > "status":401,
> >> > "QTime":11},
> >> >   "error":{
> >> > "metadata":[
> >> >
> >> >
> >> "error-class","org.apache.solr.client.solrj.impl.Http2SolrClient$RemoteSolrException",
> >> >
> >> >
> >> "root-error-class","org.apache.solr.client.solrj.impl.Http2SolrClient$RemoteSolrException"],
> >> > "msg":"Error from server at null: Expected mime type
> >> > application/octet-stream but got text/html. \n\n >> > http-equiv=\"Content-Type\"
> >> > content=\"text/html;charset=utf-8\"/>\nError 401 require
> >> > authentication\n\nHTTP ERROR
> >> 401\nProblem
> >> > accessing /solr/collection6/select. Reason:\nrequire
> >> > authentication\n\n\n",
> >> > "code":401}}
> >> >
> >> > This issue does not occur in Solr 7.6 and Solr 7.7, even though I have
> >> set
> >> > up the same authentication for all the versions.
> >> >
> >> > What could be the issue that causes this?
> >> >
> >> > Regards,
> >> > Edwin
> >>
> >


Re: Optimizing fq query performance

2019-04-17 Thread John Davis
Can you clarify why field:[* TO *] is a lot more efficient than field:*
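
For reference on the PostFilter conditions described below: a filter only runs
as a PostFilter when the parser implements it, caching is off, and the cost is
high enough; a sketch using frange (which does implement PostFilter), with an
illustrative field and value:

fq={!frange l=1 cache=false cost=150}termfreq(field2,'value')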

On Sun, Apr 14, 2019 at 12:14 PM Shawn Heisey  wrote:

> On 4/13/2019 12:58 PM, John Davis wrote:
> > We noticed a sizable performance degradation when we add certain fq
> filters
> > to the query even though the result set does not change between the two
> > queries. I would've expected solr to optimize internally by picking the
> > most constrained fq filter first, but maybe my understanding is wrong.
>
> All filters cover the entire index, unless the query parser that you're
> using implements the PostFilter interface, the filter cost is set high
> enough, and caching is disabled.  All three of those conditions must be
> met in order for a filter to only run on results instead of the entire
> index.
>
> http://yonik.com/advanced-filter-caching-in-solr/
> https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/
>
> Most query parsers don't implement the PostFilter interface.  The lucene
> and edismax parsers do not implement PostFilter.  Unless you've
> specified the query parser in the fq parameter, it will use the lucene
> query parser, and it cannot be a PostFilter.
>
> > Here's an example:
> >
> > query1: fq = 'field1:* AND field2:value'
> > query2: fq = 'field2:value'
>
> If the point of the "field1:*" query clause is "make sure field1 exists
> in the document" then you would be a lot better off with this query clause:
>
> field1:[* TO *]
>
> This is an all-inclusive range query.  It works with all field types
> where I have tried it, and that includes TextField types.   It will be a
> lot more efficient than the wildcard query.
>
> Here's what happens with "field1:*".  If the cardinality of field1 is
> ten million different values, then the query that gets constructed for
> Lucene will literally contain ten million values.  And every single one
> of them will need to be compared to every document.  That's a LOT of
> comparisons.  Wildcard queries are normally very slow.
>
> Thanks,
> Shawn
>


Re: Cannot set pollInterval in SolrCloud for PULL or TLOG replica

2019-04-17 Thread Dmitry Vorotilin
It looks like `/solr/<core>/replication?command=disablepoll` doesn't work
in cloud mode, so there's no way to change the polling interval, nor to tell
replicas to stop polling.
My own conclusion: if you have bulk updates and commit with
openSearcher=true only at the end, PULL/TLOG replicas aren't your choice; the
only option you have is NRT, which burns CPU on all machines and slows down
all select queries.
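
For comparison, legacy master/slave replication exposed this directly on the
slave side; a sketch of that handler config (masterUrl and interval values
illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>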

On Wed, Apr 17, 2019 at 3:25 PM Dmitry Vorotilin 
wrote:

> Hi Vadim, thank you. Seems like we both had similar questions.
> So I think that all confirms it's not configurable for now. That's in
> fact a pity, because the whole point of PULL/TLOG replicas is to save CPU and
> not reindex docs on every node, but the current behavior of reopening the
> searcher every time ruins it all, at least for bulk updates. The
> only solution I see now is to use manual replication and trigger it on
> every node after the leader has optimized the index; this configuration was
> available in legacy master-slave mode...
>
> On Tue, Apr 16, 2019 at 6:30 PM Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
>
>> Hi, Dmitri
>> There was discussion here a while ago...
>>
>> http://lucene.472066.n3.nabble.com/Soft-commit-and-new-replica-types-td4417253.html
>> May be it helps you somehow.
>>
>> --
>> Vadim
>>
>>
>> > -Original Message-
>> > From: Dmitry Vorotilin [mailto:d.voroti...@gmail.com]
>> > Sent: Tuesday, April 16, 2019 9:41 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Cannot set pollInterval in SolrCloud for PULL or TLOG replica
>> >
>> > Hi everyone,
>> >
>> > We have SolrCloud cluster with 3 zk and 3 solr nodes. It's 1 shard only
>> and
>> > all replicas are PULL.
>> > We have bulk updates so like once a day we reindex all cores (no soft
>> > commits, only hard commit every 15s), do commit with openSearcher=true
>> > and
>> > all our indexes become available for search.
>> >
>> > The issue is that for PULL replication when leader reindexing starts it
>> > downloads index every
>> > hard commit / 2 seconds (o.a.s.h.ReplicationHandler Poll scheduled at an
>> > interval of 7000ms) then puts index into proper directory and just
>> reopens
>> > searcher so that we see no changes on leader because there was no commit
>> > with openSearcher=true yet and that index keeps growing on PULL
>> replicas.
>> >
>> > Judging by this page
>> > > > replication-in-solr>
>> > there's no setting for pollInterval or when to start replication on
>> slaves
>> > in SolrCloud and the info is rather confusing because in cloud we still
>> use
>> > the same handlers which we cannot configure.
>> >
>> > We changed replication from NRT to PULL because we don't need realtime
>> > and
>> > burn CPU with bulk updates on every machine, but this constantly
>> catching
>> > up index on slaves isn't any better...
>> >
>> > Do you know any way to fix it?
>>
>>


Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

2019-04-17 Thread Shawn Heisey

On 4/17/2019 6:25 AM, vishal patel wrote:

Why did shard1 take 1.8 minutes for the update? And if the update took that 
long, why did replica1 try to become leader? Is it required to update any 
timeout?


There's no information here that can tell us why the update took so 
long.  My best guess would be long GC pauses due to the heap size being 
too small.  But there might be other causes.


Indexing a single document should be VERY fast.  Even a large document 
should only take a handful of milliseconds.


If the request included "commit=true" as a parameter, then it might be 
the commit that was slow, not the indexing.  You'll need to check the 
logs to determine that.


The reason that the leader changed was almost certainly the fact that 
the update took so long.  SolrCloud would have decided that the node was 
down if any operation took that long.


Thanks,
Shawn


Re: Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Shawn Heisey

On 4/17/2019 3:52 AM, Ritesh Kumar wrote:

Field type in old configuration - string (solr.StrField), indexed and
stored set to true.
Field type in new configuration - solr.SortableTextField (docValues enabled)


On your schema, you have changed the field class -- from StrField to 
SortableTextField.  Which, by the way, isn't going to work without a 
reindex even if there are no docValues problems.  You also changed the 
docValues flag, which can sometimes change the docValues type at the 
Lucene level.


If a field has its Lucene docValues type changed, indexing is going to 
fail.  The index will have to be completely deleted, restarted, and 
rebuilt from scratch.  If the index directory is not deleted completely, 
then the error you saw will continue even through a reindex.
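
Concretely, the clean sequence looks something like this (paths are
assumptions; adjust to the core in question):

bin/solr stop -all
rm -rf /var/solr/data/mycore/data/index
bin/solr start
(then re-run the full indexing job from the source system)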


Thanks,
Shawn


Re: Understanding Performance of Function Query

2019-04-17 Thread Sidharth Negi
This does indeed reduce the time, but doesn't quite do what I wanted. This
approach penalizes docs based on the "coord" factor. In other words, for a
doc with score=5 on just one query (and nothing on the others), the resulting
score would now be 5/3, since only one clause matches.

1. I wonder why the above query works at all. I can't find that query syntax
anywhere in any docs or books on Solr; can you point me to your source for
this syntax?

2. Which parser is used to parse the larger query? The parsedQuery field
(using debug=true) gives no info about the parser used for it.

3. What if I did not want to sum the scores of q1, q2, q3, but rather
wanted to use their values in some other way (e.g. sqrt(q1) + sqrt(q2) +
0.6*q3)? Is there no way of cleanly implementing a flow of computations to
be done on sub-query scores?
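
One approach worth testing: the query() function exposes a sub-query's score
to function queries, so a computation flow like the one above can be written
roughly as follows (qf values illustrative):

q={!func}sum(sqrt(query($q1)),sqrt(query($q2)),product(0.6,query($q3)))
&q1={!edismax qf=title}dog
&q2={!edismax qf=body}dog
&q3={!edismax qf=tags}dog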

On Tue, Apr 9, 2019 at 7:40 PM Erik Hatcher  wrote:

> maybe something like q=
>
> ({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ...
> v=$q3})
>
>  and setting q1, q2, q3 as needed (or all to the same maybe with different
> qf’s and such)
>
>   Erik
>
> > On Apr 9, 2019, at 09:12, sidharth228  wrote:
> >
> > I did infact use "bf" parameter for individual edismax queries.
> >
> > However, the reason I can't condense these edismax queries into a single
> > edismax query is because each of them uses different fields in "qf".
> >
> > Basically what I'm trying to do is this: each of these edismax queries
> (q1,
> > q2, q3) has a logic, and scores docs using it. I am then trying to
> combine
> > the scores (to get an overall score) from these scores later by summing
> > them.
> >
> > What options do I have of implementing this?
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Cannot set pollInterval in SolrCloud for PULL or TLOG replica

2019-04-17 Thread Dmitry Vorotilin
Hi Vadim, thank you seems like we both had similar questions.
So I think that all confirms that it's not configurable for now. That's in
fact a pity because it only makes sense to use PULL/TLOG replicas in order
to save CPU and not reindex docs on every node but current situation with
reopening searcher every time ruins it all at least for bulk updates. The
only solution I see now is to use manual replication and trigger it on
every node after leader optimized index and this configuration was
available on master-salve legacy...

On Tue, Apr 16, 2019 at 6:30 PM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi, Dmitri
> There was discussion here a while ago...
>
> http://lucene.472066.n3.nabble.com/Soft-commit-and-new-replica-types-td4417253.html
> May be it helps you somehow.
>
> --
> Vadim
>
>
> > -Original Message-
> > From: Dmitry Vorotilin [mailto:d.voroti...@gmail.com]
> > Sent: Tuesday, April 16, 2019 9:41 AM
> > To: solr-user@lucene.apache.org
> > Subject: Cannot set pollInterval in SolrCloud for PULL or TLOG replica
> >
> > Hi everyone,
> >
> > We have SolrCloud cluster with 3 zk and 3 solr nodes. It's 1 shard only
> and
> > all replicas are PULL.
> > We have bulk updates so like once a day we reindex all cores (no soft
> > commits, only hard commit every 15s), do commit with openSearcher=true
> > and
> > all our indexes become available for search.
> >
> > The issue is that for PULL replication when leader reindexing starts it
> > downloads index every
> > hard commit / 2 seconds (o.a.s.h.ReplicationHandler Poll scheduled at an
> > interval of 7000ms) then puts index into proper directory and just
> reopens
> > searcher so that we see no changes on leader because there was no commit
> > with openSearcher=true yet and that index keeps growing on PULL replicas.
> >
> > (link to the Ref Guide index replication page stripped by the archive)
> >  > replication-in-solr>
> > there's no setting for pollInterval or when to start replication on
> slaves
> > in SolrCloud and the info is rather confusing because in cloud we still
> use
> > the same handlers which we cannot configure.
> >
> > We changed replication from NRT to PULL because we don't need realtime
> > and
> > burn CPU with bulk updates on every machine, but this constantly catching
> > up index on slaves isn't any better...
> >
> > Do you know any way to fix it?
>
>


Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

2019-04-17 Thread vishal patel

We have 2 shards and 2 replicas on our production server. Somehow replica1 
became leader while a commit process was running in shard1.
Log ::

***shard1***
2019-04-08 12:52:09.930 INFO  
(searcherExecutor-30-thread-1-processing-n:shard1:8983_solr x:productData 
s:shard1 c:productData r:core_node1) [c:productData s:shard1 r:core_node1 
x:productData] o.a.s.c.QuerySenderListener QuerySenderListener done.
2019-04-08 12:54:01.397 INFO  (qtp1239731077-1359101) [c:product s:shard1 
r:core_node1 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  
webapp=/solr path=/update params={wt=javabin=2}{add=[PRO23241768 
(1630250393598427136)]} 0 111711

***replica1***
2019-04-08 12:52:09.581 INFO  (qtp1239731077-1021605) [c:product s:shard1 
r:core_node3 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  
webapp=/solr path=/update 
params={update.distrib=FROMLEADER=shard1:8983/solr/product/=javabin=2}{add=[PRO23241768
 (1630250393598427136)]} 0 0
2019-04-08 12:52:19.717 INFO  
(zkCallback-4-thread-207-processing-n:replica1:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader A live node change: [WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes], has occurred - updating... (live 
nodes size: [4])

PRO23241768 was successfully updated at 12:52:09.581 in replica1, but the 
update time was 12:54:01.397 in shard1. It took around 1.86 minutes (111711 
ms). In between, replica1 tried to become leader at 12:52:19.717 and 
succeeded.

My production solr.xml:
<int name="zkClientTimeout">${zkClientTimeout:60}</int>
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:60}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:6}</int>

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">${socketTimeout:60}</int>
  <int name="connTimeout">${connTimeout:6}</int>
</shardHandlerFactory>

Collections: product and productData.

Versions:
Solr: 6.1.0
ZooKeeper: 3.4.6


Why did shard1 take 1.8 minutes for the update? And if the update took that 
long, why did replica1 try to become leader? Is it required to update any 
timeout?

Note: PRO23241768 was a soft commit, and the log level was INFO.


Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Ritesh Kumar
Hello Team,

I have been trying to upgrade Solr 6.3.0 to 7.5.0 and I do not want to
re-index. I tried it using the IndexUpgrader tool. The tool did its part and
the current index is now in the current file format.
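
For reference, the upgrader is invoked along these lines (jar names and index
path here are assumptions for illustration):

java -cp lucene-core-7.5.0.jar:lucene-backward-codecs-7.5.0.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits \
  /var/solr/data/mycore/data/index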

The problem I am facing is with fields which have docValues enabled in the
current configuration but was not in the earlier configuration.
The error I get is
*java.lang.IllegalStateException: unexpected docvalues type NONE for field
'abc' (expected one of [SORTED, SORTED_SET]). Re-index with correct
docvalues type.*

Field type in old configuration - string (solr.StrField), indexed and
stored set to true.
Field type in new configuration - solr.SortableTextField (docValues enabled)

Is there any way I can upgrade with the current field configuration without
having to re-index?

Best,

Ritesh Kumar


Re: local params only with defType=lucene?

2019-04-17 Thread Nicolas Franck
Yup

Changes in Solr 7.2: local parameters only parsed when defType is either 
"lucene" or "func"

cf. https://lucene.apache.org/solr/guide/7_3/solr-upgrade-notes.html#solr-7-2
cf. https://issues.apache.org/jira/browse/SOLR-11501
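
In practice that means leaving defType at its default (lucene) and switching
parsers inside q itself, so the query from the original question should work
as-is:

q={!edismax qf=MEDIA_ID v=283813390}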


On 17 Apr 2019, at 10:35, Michael Aleythe, Sternwald wrote:

Hi everybody,

Is it correct that local parameters ( q={!edismax qf=MEDIA_ID v=283813390} ) in 
Solr only work when the lucene query parser is defined for the main query? I 
tried with dismax/edismax but it did not work. The documentation is not clear 
on this point.

Best regards
Michael Aleythe



local params only with defType=lucene?

2019-04-17 Thread Michael Aleythe, Sternwald
Hi everybody,

Is it correct that local parameters ( q={!edismax qf=MEDIA_ID v=283813390} ) in 
Solr only work when the lucene query parser is defined for the main query? I 
tried with dismax/edismax but it did not work. The documentation is not clear 
on this point.

Best regards
Michael Aleythe


Re: JSON Facet query to retrieve count of all collections in Solr 8.0.0

2019-04-17 Thread Zheng Lin Edwin Yeo
Hi,

For your info, I have enabled basic authentication and SSL in all 3
versions, and I'm not sure whether the issue is more on the authentication
side than with the JSON Facet query.

Regards,
Edwin

On Wed, 17 Apr 2019 at 06:54, Zheng Lin Edwin Yeo 
wrote:

> Hi Jason,
>
> Yes, that is correct.
>
> Below is the format of my security.json. I have changed the masked
> password for security purposes.
>
> {
> "authentication":{
>"blockUnknown": true,
>"class":"solr.BasicAuthPlugin",
>"credentials":{"user1":"hyHXXuJSqcZdNgdSTGUvrQZRpqrYFUQ2ffmlWQ4GUTk=
> E0w3/2FD+rlxulbPm2G7i9HZqT+2gMBzcyJCcGcMWwA="}
> },
> "authorization":{
>"class":"solr.RuleBasedAuthorizationPlugin",
>"user-role":{"user1":"admin"},
>"permissions":[{"name":"security-edit",
>   "role":"admin"}]
> }}
>
> Regards,
> Edwin
>
> On Tue, 16 Apr 2019 at 23:12, Jason Gerlowski 
> wrote:
>
>> Hi Edwin,
>>
>> To clarify what you're running into:
>>
>> - on 7.6, this query works all the time
>> - on 7.7 this query works all the time
>> - on 8.0, this query works the first time you run it, but subsequent
>> runs return a 401 error?
>>
>> Is that correct?  It might be helpful for others if you could share
>> your security.json.
>>
>> Best,
>>
>> Jason
>>
>> On Mon, Apr 15, 2019 at 10:40 PM Zheng Lin Edwin Yeo
>>  wrote:
>> >
>> > Hi,
>> >
>> > I am using the below JSON Facet to retrieve the count of all the
>> different
>> > collections in one query.
>> >
>> >
>> https://localhost:8983/solr/collection1/select?q=testing&shards=https://localhost:8983/solr/collection1,https://localhost:8983/solr/collection2,https://localhost:8983/solr/collection3,https://localhost:8983/solr/collection4,https://localhost:8983/solr/collection5,https://localhost:8983/solr/collection6&rows=0&json.facet={categories
>> > : {type : terms, field : content_type, limit : 100}}
>> >
>> >
>> > Previously, in Solr 7.6 and Solr 7.7, this query can work correctly and
>> we
>> > are able to produce the correct output.
>> >
>> > {
>> >   "responseHeader":{
>> > "zkConnected":true,
>> > "status":0,
>> > "QTime":24},
>> >   "response":{"numFound":41200,"start":0,"maxScore":12.993215,"docs":[]
>> >   },
>> >   "facets":{
>> > "count":41200,
>> > "categories":{
>> >   "buckets":[{
>> >   "val":"collection1",
>> >   "count":26213},
>> > {
>> >   "val":"collection2",
>> >   "count":12075},
>> > {
>> >   "val":"collection3",
>> >   "count":1947},
>> > {
>> >   "val":"collection4",
>> >   "count":850},
>> > {
>> >   "val":"collection5",
>> >   "count":111},
>> > {
>> >   "val":"collection6",
>> >   "count":4}]}}}
>> >
>> >
>> > However, in the new Solr 8.0.0, this query can only work once.
>> > Subsequently, we will get the following error of 'require
>> authentication':
>> >
>> > {
>> >   "responseHeader":{
>> > "zkConnected":true,
>> > "status":401,
>> > "QTime":11},
>> >   "error":{
>> > "metadata":[
>> >
>> >
>> "error-class","org.apache.solr.client.solrj.impl.Http2SolrClient$RemoteSolrException",
>> >
>> >
>> "root-error-class","org.apache.solr.client.solrj.impl.Http2SolrClient$RemoteSolrException"],
>> > "msg":"Error from server at null: Expected mime type
>> > application/octet-stream but got text/html. \n\n> > http-equiv=\"Content-Type\"
>> > content=\"text/html;charset=utf-8\"/>\nError 401 require
>> > authentication\n\nHTTP ERROR
>> 401\nProblem
>> > accessing /solr/collection6/select. Reason:\nrequire
>> > authentication\n\n\n",
>> > "code":401}}
>> >
>> > This issue does not occur in Solr 7.6 and Solr 7.7, even though I have
>> set
>> > up the same authentication for all the versions.
>> >
>> > What could be the issue that causes this?
>> >
>> > Regards,
>> > Edwin
>>
>


Re: "dismax" parameter "bq" filters instead of boosting

2019-04-17 Thread Alexandre Rafalovitch
I am not sure whether it is a bug or not actually. I never really used
dismax. Perhaps others can comment on that.

Regards,
   Alex.

On Wed, 17 Apr 2019 at 09:59, Nicolas Franck  wrote:
>
> Ok, thanks for your investigation ;-) That was quick.
>
> So you consider this as a bug, as it was fixed for edismax parser?
>
> I thought the parameter q.op only applied to the terms in the main
> query (parameter "q"), making ..
>
>   jakarta apache
>
> to be interpreted as
>
>   +jakarta +apache
>
> when q.op = AND
>
> The documentation of bq at least describes it as an "optional" query that only
> influences the score, not the result list.
>
>
> > On 16 Apr 2019, at 23:59, Alexandre Rafalovitch  wrote:
> >
> > If you set q.op=OR (and not as 'AND' you defined in your config), you
> > will see the difference between your last two queries. The second last
> > one will show 6 items and the last one still 5.
> >
> > As is, with your custom config, booster query is added as one more
> > clause in the search. q.op=AND forces it to be a compulsory clause,
> > rather than an optional (boosting one).
> >
> > FQ is always a forced compulsory clause. Maybe it accepts boosts, but
> > all scores are ignored anyway (it is just 0 for fail and anything else
> > for pass).
> >
> > Adding 'debug=all' into the query parameters (or defaults) would help
> > you see that for yourself.
> >
> > But it does seem (in the 7.2.1 I have here) that edismax wraps
> > both query parts in individual brackets. Maybe there was a bug that
> > was fixed in eDismax only. No ideas there, except that most of the
> > effort goes into eDismax these days rather than dismax.
> >
> > Regards,
> >   Alex
> > P.s. My suggestion was actually to give the queries against STOCK
> > examples. That would have made all these parameters explicit and more
> > obvious. And perhaps would have allowed you to discover the minimum
> > parameter set causing the issue without all those other qf and pf in
> > the game.
> >
> > On Tue, 16 Apr 2019 at 16:13, Nicolas Franck  
> > wrote:
> >>
> >> I agree, but I thought my thread was lost in the long list of issues.
> >>
> >> I prepared a simple case for solr 8.0:
> >>
> >>  basic_dismax_set/config:
> >>
> >> schema.xml and solrconfig.xml
> >>
> >>  basic_dismax_set/data:
> >>
> >> records_pp.json
> >>
> >> Total 6 records:
> >>
> >> http://localhost:8983/solr/test/select?echoParams=all
> >>
> >> 5 records match format:book
> >>
> >> http://localhost:8983/solr/test/select?echoParams=all&q=format:book&defType=lucene
> >>
> >> and 1 format:film
> >>
> >> http://localhost:8983/solr/test/select?echoParams=all&q=format:film&defType=lucene
> >>
> >> But when I try this (defType is dismax) ..:
> >>
> >> http://localhost:8983/solr/test/select?echoParams=all&bq=format:book^2
> >>
> >> the result list is filtered on format:book (total of 5 records)
> >>
> >> This url gives the same result by the way:
> >>
> >> http://localhost:8983/solr/test/select?echoParams=all&fq=format:book^2
> >>
> >> while the character ^ isn't supposed to work in fq, right?
> >>
> >> The same result in both Solr 7.4.0 and Solr 8.0
> >>
> >> Thanks in advance
> >>
>


Re: Solr nested objects (child documents)

2019-04-17 Thread roiwexler
Latest. Why is that important?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: "dismax" parameter "bq" filters instead of boosting

2019-04-17 Thread Nicolas Franck
Ok, thanks for your investigation ;-) That was quick.

So you consider this as a bug, as it was fixed for edismax parser?

I thought the parameter q.op only applied to the terms in the main
query (parameter "q"), making ..

  jakarta apache

to be interpreted as

  +jakarta +apache

when q.op = AND

The documentation of bq at least describes it as an "optional" query that only
influences the score, not the result list.
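
To illustrate with the thread's own example (defType=dismax; values
illustrative): with q.op=AND the bq clause effectively filters, while with
q.op=OR it only boosts:

/select?defType=dismax&q=jakarta&q.op=AND&bq=format:book^2
/select?defType=dismax&q=jakarta&q.op=OR&bq=format:book^2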


> On 16 Apr 2019, at 23:59, Alexandre Rafalovitch  wrote:
> 
> If you set q.op=OR (and not as 'AND' you defined in your config), you
> will see the difference between your last two queries. The second last
> one will show 6 items and the last one still 5.
> 
> As is, with your custom config, booster query is added as one more
> clause in the search. q.op=AND forces it to be a compulsory clause,
> rather than an optional (boosting one).
> 
> FQ is always a forced compulsory clause. Maybe it accepts boosts, but
> all scores are ignored anyway (it is just 0 for fail and anything else
> for pass).
> 
> Adding 'debug=all' into the query parameters (or defaults) would help
> you see that for yourself.
> 
> But it does seem (in the 7.2.1 I have here) that edismax wraps
> both query parts in individual brackets. Maybe there was a bug that
> was fixed in eDismax only. No ideas there, except that most of the
> effort goes into eDismax these days rather than dismax.
> 
> Regards,
>   Alex
> P.s. My suggestion was actually to give the queries against STOCK
> examples. That would have made all these parameters explicit and more
> obvious. And perhaps would have allowed you to discover the minimum
> parameter set causing the issue without all those other qf and pf in
> the game.
> 
> On Tue, 16 Apr 2019 at 16:13, Nicolas Franck  wrote:
>> 
>> I agree, but I thought my thread was lost in the long list of issues.
>> 
>> I prepared a simple case for solr 8.0:
>> 
>>  basic_dismax_set/config:
>> 
>> schema.xml and solrconfig.xml
>> 
>>  basic_dismax_set/data:
>> 
>> records_pp.json
>> 
>> Total 6 records:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all
>> 
>> 5 records match format:book
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&q=format:book&defType=lucene
>> 
>> and 1 format:film
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&q=format:film&defType=lucene
>> 
>> But when I try this (defType is dismax) ..:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&bq=format:book^2
>> 
>> the result list is filtered on format:book (total of 5 records)
>> 
>> This url gives the same result by the way:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&fq=format:book^2
>> 
>> while the character ^ isn't supposed to work in fq, right?
>> 
>> The same result in both Solr 7.4.0 and Solr 8.0
>> 
>> Thanks in advance
>>