RE: heavy reads from disk when off-heap ram is constrained

2020-02-27 Thread Markus Jelsma
Hello Kyle,

This is actually what the manual [1] clearly warns about. Snippet copied from
the manual:

"When setting the maximum heap size, be careful not to let the JVM consume all 
available physical memory. If the JVM process space grows too large, the 
operating system will start swapping it, which will severely impact 
performance. In addition, the operating system uses memory space not allocated 
to processes for file system cache and other purposes. This is especially 
important for I/O-intensive applications, like Lucene/Solr. The larger your 
indexes, the more you will benefit from filesystem caching by the OS. It may 
require some experimentation to determine the optimal tradeoff between heap 
space for the JVM and memory space for the OS to use."
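
For example, on a standalone install the heap is typically capped via bin/solr's
-m flag or SOLR_HEAP in solr.in.sh (a minimal sketch; the values are
hypothetical and depend on index size and physical RAM):

  # solr.in.sh -- keep the heap well below physical RAM so the OS retains
  # memory for the filesystem cache that Lucene relies on.
  SOLR_HEAP="8g"   # e.g. on a 32 GB box, ~24 GB stays available to the OS cache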

Please check it out, there are more useful hints to be found there.

Regards,
Markus

[1] 
https://lucene.apache.org/solr/guide/8_4/jvm-settings.html#JVMSettings-ChoosingMemoryHeapSettings

 
-Original message-
> From:lstusr 5u93n4 
> Sent: Thursday 27th February 2020 18:45
> To: solr-user@lucene.apache.org
> Subject: heavy reads from disk when off-heap ram is constrained
> 
> Hi All,
> 
> Something we learned recently that might be useful to the community.
> 
> We're running solr in docker, and we've constrained each of our containers
> to have access to 10G of the host's ram. Also, through `docker stats`, we
> can see the Block IO (filesystem reads/writes) that the solr process is
> doing.
> 
> On a test system with three nodes, three shards, each with two NRT
> replicas, and indexing a reference set of a million documents:
> 
>  - When allocating half of the container's available ram to the jvm (i.e.
> starting solr with -m 5g) we see a read/write distribution of roughly
> 400M/2G on each solr node.
> 
>  - When allocating ALL of the container's available ram to the jvm (i.e.
> starting solr with -m 10g) we see a read/write distribution of around 10G /
> 2G on each solr node, and the latency on the underlying disk soars.
> 
> The takeaway here is that Solr really does need non-jvm RAM to function,
> and if you're having performance issues, "adding more ram to the jvm" isn't
> always the right way to get things going faster.
> 
> Best,
> 
> Kyle
> 


RE: Repeatable search term bug in Solr 8?

2020-02-27 Thread Markus Jelsma
Hello Phil,

Solr never returns "The website encountered an unexpected error. Please try
again later." as an error. To get to the root of the problem, you should at
least post the error logs that Solr actually throws, if it throws any at all.

You either have an application error or an actual Solr problem. Neither is
certain from this information.

It would be helpful if you could reproduce the actual queries on Solr itself,
without the application layer, and then, if an error occurs, share it with the
community.
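
For example, one failing term can be replayed against Solr directly with curl
(a sketch; host, port and collection name are placeholders):

  # Bypass Drupal / Search API entirely; debugQuery shows how Solr
  # parses the term and where any exception is raised.
  curl 'http://localhost:8983/solr/drupal/select?q=w-2%20agency%20directory&debugQuery=true'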

Regards,
Markus

 
 
-Original message-
> From:Staley, Phil R - DCF 
> Sent: Thursday 27th February 2020 22:32
> To: 'solr-user@lucene.apache.org' 
> Subject: Repeatable search term bug in Solr 8?
> 
> All,
> 
> We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now getting 
> reports of certain patterns of search terms resulting in an error that reads, 
> "The website encountered an unexpected error. Please try again later."
> 
> Below is a list of example terms that repeatably result in this error and a 
> similar list that works fine.  The problem pattern seems to be a search term 
> that contains 2 or 3 characters followed by a space, followed by additional 
> text.
> 
> To confirm that the problem is version 8 of SOLR, I have updated our local 
> and UAT sites with the latest Drupal updates that did include an update to 
> the Search API Solr module and tested the terms below under SOLR 7.7.2, 
> 8.3.1, and 8.4.1.  Under version 7.7.2 everything works fine. Under either 
> of the version 8 releases, the problem returns.
> 
> Thoughts?
> 
> Search terms that result in error
> 
>   *   w-2 agency directory
>   *   agency w-2 directory
>   *   w-2 agency
>   *   w-2 directory
>   *   w2 agency directory
>   *   w2 agency
>   *   w2 directory
> 
> Search terms that do not result in error
> 
>   *   w-22 agency directory
>   *   agency directory w-2
>   *   agency w-2directory
>   *   agencyw-2 directory
>   *   w-2
>   *   w2
>   *   agency directory
>   *   agency
>   *   directory
>   *   -2 agency directory
>   *   2 agency directory
>   *   w-2agency directory
>   *   w2agency directory
> 
> 
> 
> 


Repeatable search term bug in Solr 8?

2020-02-27 Thread Staley, Phil R - DCF
All,

We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now getting 
reports of certain patterns of search terms resulting in an error that reads, 
"The website encountered an unexpected error. Please try again later."

Below is a list of example terms that repeatably result in this error and a 
similar list that works fine.  The problem pattern seems to be a search term 
that contains 2 or 3 characters followed by a space, followed by additional 
text.

To confirm that the problem is version 8 of SOLR, I have updated our local and 
UAT sites with the latest Drupal updates that did include an update to the 
Search API Solr module and tested the terms below under SOLR 7.7.2, 8.3.1, and 
8.4.1.  Under version 7.7.2 everything works fine. Under either of the version 
8 releases, the problem returns.

Thoughts?

Search terms that result in error

  *   w-2 agency directory
  *   agency w-2 directory
  *   w-2 agency
  *   w-2 directory
  *   w2 agency directory
  *   w2 agency
  *   w2 directory

Search terms that do not result in error

  *   w-22 agency directory
  *   agency directory w-2
  *   agency w-2directory
  *   agencyw-2 directory
  *   w-2
  *   w2
  *   agency directory
  *   agency
  *   directory
  *   -2 agency directory
  *   2 agency directory
  *   w-2agency directory
  *   w2agency directory





Re: Graph Query Parser Syntax

2020-02-27 Thread sambasivarao giddaluri
Hi All,
Any suggestions?


On Fri, Feb 14, 2020 at 5:20 PM sambasivarao giddaluri <
sambasiva.giddal...@gmail.com> wrote:

> Hi All,
> In our project we have to use multiple graph queries with AND and OR
> conditions, but the graph query parser does not work for the scenario below.
> Can anyone suggest how to overcome this kind of problem? It is blocking our
> pre-prod release.
> We are also using traversalFilter, but our use case still needs multiple OR
> and AND graph queries.
>
>
>
> *works*
> {!graph from=parentId to=parentId returnRoot=false}id:abc
> *works*
> ({!graph from=parentId to=parentId returnRoot=false}id:abc)
> *works*
> ({!graph from=parentId to=parentId returnRoot=false}id:abc AND name:test)
> *works*
> {!graph from=parentId to=parentId returnRoot=false}(id:abc AND name:test)
>
> *Fails Syntax Error *
> ({!graph from=parentId to=parentId returnRoot=false}(id:abc AND
> name:test))
>
> *Fails Syntax Error  *
> ({!graph from=parentId to=parentId returnRoot=false}(id:abc AND
> name:test))  OR (({!graph from=parentId to=parentId
> returnRoot=false}(description :abc AND name:test))
>
>
> '(id:abc': Encountered "<EOF>" at line 1, column 13.
> Was expecting one of:
>     <AND> ... <OR> ... <NOT> ... "+" ... "-" ... <BAREOPER> ...
>     "(" ... ")" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
>     <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ...
>     "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ...",
>
> Regards
> sam
>
>
>
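
One workaround worth trying (a sketch, not verified against this schema): keep
parentheses out of each graph clause by passing the boolean subquery through
the v local parameter, and combine the clauses as nested queries:

  # Each {!graph} subquery is dereferenced via $gq1/$gq2, so the outer
  # query parser never sees the inner parentheses.
  curl 'http://localhost:8983/solr/collection/select' \
    --data-urlencode 'q=_query_:"{!graph from=parentId to=parentId returnRoot=false v=$gq1}" OR _query_:"{!graph from=parentId to=parentId returnRoot=false v=$gq2}"' \
    --data-urlencode 'gq1=id:abc AND name:test' \
    --data-urlencode 'gq2=description:abc AND name:test'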


heavy reads from disk when off-heap ram is constrained

2020-02-27 Thread lstusr 5u93n4
Hi All,

Something we learned recently that might be useful to the community.

We're running solr in docker, and we've constrained each of our containers
to have access to 10G of the host's ram. Also, through `docker stats`, we
can see the Block IO (filesystem reads/writes) that the solr process is
doing.

On a test system with three nodes, three shards, each with two NRT
replicas, and indexing a reference set of a million documents:

 - When allocating half of the container's available ram to the jvm (i.e.
starting solr with -m 5g) we see a read/write distribution of roughly
400M/2G on each solr node.

 - When allocating ALL of the container's available ram to the jvm (i.e.
starting solr with -m 10g) we see a read/write distribution of around 10G /
2G on each solr node, and the latency on the underlying disk soars.

The takeaway here is that Solr really does need non-jvm RAM to function,
and if you're having performance issues, "adding more ram to the jvm" isn't
always the right way to get things going faster.
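
As a concrete sketch (hypothetical image tag and flags; adjust to your setup),
the healthier split looks like:

  # The container gets 10G, but the JVM heap is capped at 5G, leaving the
  # other ~5G of the container's RAM to the OS page cache for index files.
  docker run -d --memory=10g -e SOLR_HEAP=5g solr:8.4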

Best,

Kyle


Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-27 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Paras,

Thank you for this response! Yes, you are being clear :)

Regarding the assumptions you make for MRR, do you have any research papers to 
confirm that these user behaviors have been observed? I only ask because this 
paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf talks about how 
users often skip results and go straight to vanilla search even though their 
query is displayed in the top of the suggestions list (section 3.2 "QAC User 
Behavior Analysis"), among other behaviors that go against general IR 
intuition. This is only one paper, of course, but it seems that user research 
of QAC is hard to come by otherwise.

So acceptance rate = # of suggestions taken / total queries issued ?
And Selection to Display = # of suggestions taken (this would only be 1, if the 
not-taken suggestions are given 0s) / total suggestions displayed ?

If the above is true, wouldn't Selection to Display be binary? I.e. it's either 
1/# of suggestions displayed (assuming this is a constant) or 0?

Best,
Audrey



From: Paras Lehana 
Sent: Thursday, February 27, 2020 2:58:25 AM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation

Hi Audrey,

For MRR, we assume that if a suggestion is selected, it's relevant. It's
also assumed that the user will always click the highest relevant
suggestion. Thus, we record the selected position for each selection. If
I'm still not understanding your question correctly, feel free to contact
me personally (hangouts?).
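
In symbols (the standard formulation, with a click treated as the relevance
label, per the assumption above):

  MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}

where rank_i is the position of the suggestion selected for query i. For
example, selections at positions 1, 3 and 2 across three queries give
(1/1 + 1/3 + 1/2) / 3 ≈ 0.61.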

And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?


I was expecting you to ask this - I should have explained a bit more.
Acceptance Rate is the share of all searches that go through Auto-Suggest.
The value for Selection to Display, on the other hand, is 1 if a selection is
made given that suggestions were displayed, otherwise 0. Here the cases where
suggestions are displayed form the universal set. Acceptance Rate counts a 0
even for searches where no selection was made because no suggestions were
shown, while S/D does not count those - it only counts cases where suggestions
were displayed.
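
As formulas (restating the definitions above, where a "selection" is a click
on a suggestion):

  Acceptance Rate = (searches completed via a suggestion) / (all searches)
  Selection/Display (S/D) = (searches with a selection) / (searches where suggestions were displayed)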

Hope I'm clear. :)

On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> This article
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also
> indicates that MRR needs binary relevance labels, p. 114: "To this end, we
> selected a random sample of 198 (query, context) pairs from the set of
> 7,311 pairs, and manually tagged each of them as related (i.e., the query
> is related to the context; 60% of the pairs) and unrelated (40% of the
> pairs)."
>
> On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" <
> audrey.lorberf...@ibm.com> wrote:
>
> Thank you, Walter & Paras!
>
> So, from the MRR equation, I was under the impression the suggestions
> all needed a binary label (0,1) indicating relevance.* But it's great to
> know that you guys use proxies for relevance, such as clicks.
>
> *The reason I think MRR has to have binary relevance labels is this
> Wikipedia article:
> https://en.wikipedia.org/wiki/Mean_reciprocal_rank
> , where it states below the formula that rank_i "refers to the rank
> position of the first relevant document for the i-th query." If the
> suggestions are not labeled as relevant (1) or not relevant (0), then how
> do you compute the rank of the first RELEVANT document?
>
> I'll check out these readings asap, thank you!
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?
>
> Best,
> Audrey
>
> On 2/25/20, 1:11 AM, "Walter Underwood"  wrote:
>
> Here is a blog article with a worked example for MRR based on
> customer clicks.
>
>
> https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
>
> At my place of work, we compare the CTR and MRR of queries using
> suggestions to those that do not use suggestions. Solr autosuggest based on a
> lexicon of book titles is highly effective for us.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
>
> 

About Learning To Rank

2020-02-27 Thread 李世明


Excuse me, how can we embed a linear formula calculation in the outer layer of LTR?

We need to do a calculation on top of the LTR score.




Thanks

Regarding Installation of Solr 5.1.0 as PaaS in Azure

2020-02-27 Thread Shrinivasan Mohanakrishnan
Hi Team,

Is it possible to install Solr 5.1.0 as PaaS in Azure? My Sitecore is 8.2
update 5, which supports only Solr 5.1.0. I tried to install it following
various articles, but it throws a 404 error. Can anyone please help with this?

Thanks in Advance.

Regards,
Shrinivasan.




Javadocs are not linkable

2020-02-27 Thread Thomas Scheffler
Hi,

I recently noticed that the SOLR javadocs hosted by lucene are not linkable, as
the "package-list" file is not downloadable. Is this on purpose?

$ curl https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list


<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list/">here</a>.</p>
</body></html>



It’s the same issue with older versions. My maven build fails with:

MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - javadoc: error - Error fetching URL: 
https://lucene.apache.org/solr/8_3_0/solr-solrj/

kind regards

Thomas

Re: Upgrading from 6.5.0 to 8.4.1

2020-02-27 Thread Erick Erickson
Despite deleting all the docs, what I suspect is that you still have segments 
that have no live docs in them, but still preserve the 6x marker.
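
One way to check this (a sketch; jar and index paths are hypothetical and vary
by install) is Lucene's CheckIndex, which reports the version each segment was
written with:

  # Any segment still reporting a 6.x version must be rewritten (i.e. the
  # docs fully reindexed) before an 8.x Solr will open the index.
  java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-*.jar \
    org.apache.lucene.index.CheckIndex \
    /var/solr/data/mycollection_shard1_replica_n1/data/index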

That said, what’s the point of the 7.7.2 step? Why not just index once into a 
pristine 8.4.1 cluster? You could even do this on a few machines, serving no 
traffic, and when you were satisfied repurpose your current machines by 
installing Solr 8.4.1 on them and using ADDREPLICA to build out your cluster.

Oh, and I strongly recommend that you do _not_ simply change the
LuceneMatchVersion and reuse your current config. Please start with a new
configset and port just the changes from your 6x config to it.

Best,
Erick

> On Feb 27, 2020, at 8:02 AM, Pavel Polivka  wrote:
> 
> Hello,
> 
> I am doing upgrade of SolrCloud cluster from 6.5.0 to 8.4.1.
> 
> My process is:
> 
> Upgrade to 7.7.2.
> Reconfigure my solrconfig to have <luceneMatchVersion>7.7.2</luceneMatchVersion>,
> reload collections.
> Delete all docs in all collections and index them again.
> 
> Upgrade to 8.4.1.
> This step always fails for me with a message in the logs saying that the index
> format was probably created by a 6.x version.
> My thinking is that the reindex I do in 7.7.2 is not enough.
> 
> Is there a way to check in what format my index is?
> 
> Any ideas if I am doing anything wrong?
> 
> 
> Thanks.
> 
> 
> Pavel Polivka


Re: Upgrading from 6.5.0 to 8.4.1

2020-02-27 Thread Jörn Franke
You did a reload and not a reindex?
Probably the best approach is to delete the collection fully, create it anew,
and then index.

> On 27.02.2020 at 14:02, Pavel Polivka wrote:
> 
> Hello,
> 
> I am doing upgrade of SolrCloud cluster from 6.5.0 to 8.4.1.
> 
> My process is:
> 
> Upgrade to 7.7.2.
> Reconfigure my solrconfig to have <luceneMatchVersion>7.7.2</luceneMatchVersion>,
> reload collections.
> Delete all docs in all collections and index them again.
> 
> Upgrade to 8.4.1.
> This step always fails for me with a message in the logs saying that the index
> format was probably created by a 6.x version.
> My thinking is that the reindex I do in 7.7.2 is not enough.
> 
> Is there a way to check in what format my index is?
> 
> Any ideas if I am doing anything wrong?
> 
> 
> Thanks.
> 
> 
> Pavel Polivka


Upgrading from 6.5.0 to 8.4.1

2020-02-27 Thread Pavel Polivka
Hello,

I am doing upgrade of SolrCloud cluster from 6.5.0 to 8.4.1.

My process is:

Upgrade to 7.7.2.
Reconfigure my solrconfig to have <luceneMatchVersion>7.7.2</luceneMatchVersion>,
reload collections.
Delete all docs in all collections and index them again.

Upgrade to 8.4.1.
This step always fails for me with a message in the logs saying that the index
format was probably created by a 6.x version.
My thinking is that the reindex I do in 7.7.2 is not enough.

Is there a way to check in what format my index is?

Any ideas if I am doing anything wrong?


Thanks.


Pavel Polivka


Re: Rule of thumb for determining maxTime of AutoCommit

2020-02-27 Thread Dwane Hall
Hey Kaya,

How are you adding documents to your index? Do you control this yourself, or do
you have multiple clients (curl, SolrJ, calls directly to /update*) updating
data in your index? I suspect (based on your hard and soft commit settings)
that a client may be causing your soft commits by making updates with the
commitWithin parameter (whose default behaviour is a soft commit
[openSearcher=true]). If this is the case and you don't want to allow client
commits, you can prevent this behaviour by adding a processor to the
updateRequestProcessorChain in your solrconfig.xml and then managing your
commits with the settings you've described below
(https://lucene.apache.org/solr/guide/8_4/shards-and-indexing-data-in-solrcloud.html#ignoring-commits-from-client-applications-in-solrcloud).
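
As a sketch of that approach (collection name is a placeholder), the processor
can also be added through the Config API instead of hand-editing
solrconfig.xml:

  # Ignore explicit commits/optimizes from clients; commits then happen
  # only via the autoCommit/autoSoftCommit settings in solrconfig.xml.
  curl -X POST -H 'Content-Type: application/json' \
    'http://localhost:8983/solr/mycollection/config' -d '{
      "add-updateprocessor": {
        "name": "ignore-commit-from-client",
        "class": "solr.IgnoreCommitOptimizeUpdateProcessorFactory",
        "statusCode": 200
      }
    }'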


I always go back to Erick Erickson's great Lucidworks blog post for all things
indexing related, so if you haven't seen it I highly recommend checking it out.
There's a ton of great info in this post, and it may take a few reads to
completely digest it all; towards the end he provides some index and query
scenarios and some sensible hard and soft commit settings to address some
common use cases.
To this day his words in this post still ring true in my ear: "hard commits are
about durability, soft commits are about visibility"
(https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/)

Good luck,

Dwane


From: Emir Arnautović 
Sent: Thursday, 27 February 2020 9:23 PM
To: solr-user@lucene.apache.org 
Subject: Re: Rule of thumb for determining maxTime of AutoCommit

Hi Kaya,
Since you do not have soft commits, you must have explicit commits somewhere,
since your hard commits are configured not to open a searcher.

Re warming up: yes - you are right. You need to check your queries and warmup
numbers in the cache configs. What you need to check is how long warmup takes,
and if it takes too long, reduce the number of warmup queries/items. I think
there is a cumulative warming time in the admin console, or if you prefer a
proper Solr monitoring tool, you can check out our Solr integration:
https://apps.sematext.com/demo 
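
For example, warm-up times can also be read from the MBeans endpoint (a sketch;
the core name is a placeholder):

  # warmupTime (in ms) should stay well below the interval between commits.
  curl -s 'http://localhost:8983/solr/mycore/admin/mbeans?stats=true&wt=json&indent=true' \
    | grep warmupTime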

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2020, at 03:00, Kayak28  wrote:
>
> Hello, Emir:
>
> Thank you for your reply.
> I do understand that the frequency of creating a searcher depends on how much
> realtime search is required.
>
> As you advise me, I have checked a soft-commit configuration.
> It is configured as:
> <autoSoftCommit><maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime></autoSoftCommit>
>
> If I am correct, I have not set autoSoftCommit, which means autoSoftCommit
> does not create a new searcher.
> Next, I will take a look at my explicit commit.
>
> I am curious about what you call a "warming strategy."
> Is there any good resource to learn about tuning warming up?
>
> As far as I know about warming up, there are two warming-up functions in
> Solr.
> One is static warming, for which you can configure queries in solrconfig.xml.
> The other is dynamic warming, which replays queries from the old cache.
>
> How should I tune them?
> What is the first step to look at?
> (I am kinda guessing the answer can vary depending on the system, the
> service, etc... )
>
>
>
> Sincerely,
> Kaya Ota
>
>
>
On Wed, 26 Feb 2020 at 17:36, Emir Arnautović wrote:
>
>> Hi Kaya,
>> The answer is simple: as much delay as your requirements allow between
>> data being indexed and changes being visible. It is sometimes seconds, and
>> sometimes hours or even a day is tolerable. On each commit your caches are
>> invalidated and warmed (if it is configured like that), so in order to get
>> better use of caches, you should commit as rarely as possible.
>>
>> The setting that you provided is about hard commits, and those are
>> configured not to open a new searcher, so such a commit does not cause an
>> “exceeded limit” error. You either have soft auto commits configured or you do
>> explicit commits when updating documents. Check and tune those, and if you
>> do explicit commits, remove them if possible. If you cannot afford less
>> frequent commits, you have to tune your warming strategy to make sure it
>> does not take as much time as the period between two commits.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 26 Feb 2020, at 06:16, Kayak28  wrote:
>>>
>>> Hello, Solr Community:
>>>
>>> The other day, I had an error "exceeded limit of maxWarmingSearchers=2."
>>> I know this error occurs when multiple commits (which open a new searcher)
>>> are requested too frequently.
>>>
>>> As far as I read the Solr wiki, it recommends having more of an interval
>>> between each commit, making commits less frequent.
>>> Using autoCommit, I would like to decrease the commit frequency, but I am
>>> not sure how much I should increase the value of maxTime in autoCommit?

Position And Offset in Solr Query

2020-02-27 Thread Daming
Hi all,


I am working on a Chinese app. I am struggling with the question of whether a
PhraseQuery takes the StartOffset and EndOffset of a term into account in SOLR.


For instance:
Text: ??
Indexing: ??|??||||??||??
position: 1 | 2 | 3 | 4 | 4 | 5 | 5 | 5
querying: ??|??|||??


We have seen that PhraseQuery works based on position. It works as expected,
except that it recalls nothing when the query is "|??".


Will PhraseQuery match documents according to the 'StartOffset/EndOffset' of the
terms? If so, it would filter out the above document (|??||??).


The question is HOW I can set something on PhraseQuery so that it works with the
terms' offsets.
Or maybe there are other approaches to do that.


Thanks very much in advance.



---
Daming
Apache SkyWalking

Re: Mix Index having child-free and nested documents

2020-02-27 Thread Naman Jain
Here is the query:
/solr/test/select?q={!parent%20which=doc_type:Parent%20score=max}%20%20{!boost%20b=100.0%20}color:Red%20{!dismax%20qf=title%20v=%27Regular%27%20score=total}&fl=id,product_class_type,title,score,color&wt=json


On Thu, Feb 27, 2020 at 2:37 PM Naman Jain  wrote:

> I have a solr core which has a mix of child-free and with-child documents,
> sample xml:
> <add>
>   <doc>
>     <field name="id">4</field>
>     <field name="title">Regular Shirt</field>
>     <field name="doc_type">Parent</field>
>     <field name="pcs_color">Black</field>
>   </doc>
>   <doc>
>     <field name="id">8</field>
>     <field name="title">Solid Rug</field>
>     <field name="doc_type">Parent</field>
>     <field name="pcs_color">Solid</field>
>   </doc>
>   <doc>
>     <field name="id">1</field>
>     <field name="title">Regular color Shirts</field>
>     <field name="doc_type">Parent</field>
>     <doc>
>       <field name="id">2</field>
>       <field name="doc_type">Child</field>
>       <field name="pcs_color">Red</field>
>     </doc>
>     <doc>
>       <field name="id">3</field>
>       <field name="doc_type">Child</field>
>       <field name="pcs_color">Blue</field>
>     </doc>
>   </doc>
>   <doc>
>     <field name="id">5</field>
>     <field name="title">Rugs</field>
>     <field name="doc_type">Parent</field>
>     <doc>
>       <field name="id">6</field>
>       <field name="doc_type">Child</field>
>       <field name="pcs_color">Abstract</field>
>     </doc>
>     <doc>
>       <field name="id">7</field>
>       <field name="doc_type">Child</field>
>       <field name="pcs_color">Printed</field>
>     </doc>
>   </doc>
> </add>
> 
> Now I want to write a query which fetches all items [with children and
> without children] with color:red and title having shirt.
> Here is my query:
>
> /solr/test/select?q={!parent%20which=doc_type:Parent%20score=max}%20%20{!boost%20b=100.0%20}pcs_color:Red%20{!dismax%20qf=title%20v=%27Regular%27%20score=total}&fl=id,product_class_type,title,score,pcs_color&wt=json
> 
>
> Error:
> "msg": "Child query must not match same docs with parent filter. Combine
> them as must clauses (+) to find a problem doc. docId=4, class
> org.apache.lucene.search.DisjunctionSumScorer",
> I think docId 4 matches the query, but as it is a parent, Solr is
> throwing the error.
>
> How to proceed here?
>
>
>
>


Mix Index having child-free and nested documents

2020-02-27 Thread Naman Jain
I have a solr core which has a mix of child-free and with-child documents,
sample xml:


<add>
  <doc>
    <field name="id">4</field>
    <field name="title">Regular Shirt</field>
    <field name="doc_type">Parent</field>
    <field name="pcs_color">Black</field>
  </doc>
  <doc>
    <field name="id">8</field>
    <field name="title">Solid Rug</field>
    <field name="doc_type">Parent</field>
    <field name="pcs_color">Solid</field>
  </doc>
  <doc>
    <field name="id">1</field>
    <field name="title">Regular color Shirts</field>
    <field name="doc_type">Parent</field>
    <doc>
      <field name="id">2</field>
      <field name="doc_type">Child</field>
      <field name="pcs_color">Red</field>
    </doc>
    <doc>
      <field name="id">3</field>
      <field name="doc_type">Child</field>
      <field name="pcs_color">Blue</field>
    </doc>
  </doc>
  <doc>
    <field name="id">5</field>
    <field name="title">Rugs</field>
    <field name="doc_type">Parent</field>
    <doc>
      <field name="id">6</field>
      <field name="doc_type">Child</field>
      <field name="pcs_color">Abstract</field>
    </doc>
    <doc>
      <field name="id">7</field>
      <field name="doc_type">Child</field>
      <field name="pcs_color">Printed</field>
    </doc>
  </doc>
</add>




Now I want to write a query which fetches all items [with children and without
children] with color:red and title having shirt.
Here is my query:
/solr/test/select?q={!parent%20which=doc_type:Parent%20score=max}%20%20{!boost%20b=100.0%20}pcs_color:Red%20{!dismax%20qf=title%20v=%27Regular%27%20score=total}&fl=id,product_class_type,title,score,pcs_color&wt=json


Error:
"msg": "Child query must not match same docs with parent filter. Combine
them as must clauses (+) to find a problem doc. docId=4, class
org.apache.lucene.search.DisjunctionSumScorer",
I think docId 4 matches the query, but as it is a parent, Solr is
throwing the error.

How to proceed here?
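
One sketch of the fix the error message suggests (untested; field names taken
from the sample above): make the child-side query and the parent filter
provably disjoint by restricting the child query to child documents only:

  # +doc_type:Child guarantees the inner query can never match a parent
  # document, so it cannot collide with which=doc_type:Parent.
  curl 'http://localhost:8983/solr/test/select' \
    --data-urlencode 'q={!parent which=doc_type:Parent score=max v=$cq}' \
    --data-urlencode 'cq=+doc_type:Child +pcs_color:Red' \
    --data-urlencode 'fl=id,title,score'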


Re: Rule of thumb for determining maxTime of AutoCommit

2020-02-27 Thread Emir Arnautović
Hi Kaya,
Since you do not have soft commits, you must have explicit commits somewhere,
since your hard commits are configured not to open a searcher.

Re warming up: yes - you are right. You need to check your queries and warmup
numbers in the cache configs. What you need to check is how long warmup takes,
and if it takes too long, reduce the number of warmup queries/items. I think
there is a cumulative warming time in the admin console, or if you prefer a
proper Solr monitoring tool, you can check out our Solr integration:
https://apps.sematext.com/demo 

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Feb 2020, at 03:00, Kayak28  wrote:
> 
> Hello, Emir:
> 
> Thank you for your reply.
> I do understand that the frequency of creating a searcher depends on how much
> realtime search is required.
> 
> As you advise me, I have checked a soft-commit configuration.
> It is configured as:
> <autoSoftCommit><maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime></autoSoftCommit>
> 
> If I am correct, I have not set autoSoftCommit, which means autoSoftCommit
> does not create a new searcher.
> Next, I will take a look at my explicit commit.
> 
> I am curious about what you call a "warming strategy."
> Is there any good resource to learn about tuning warming up?
> 
> As far as I know about warming up, there are two warming-up functions in
> Solr.
> One is static warming, for which you can configure queries in solrconfig.xml.
> The other is dynamic warming, which replays queries from the old cache.
> 
> How should I tune them?
> What is the first step to look at?
> (I am kinda guessing the answer can vary depending on the system, the
> service, etc... )
> 
> 
> 
> Sincerely,
> Kaya Ota
> 
> 
> 
On Wed, 26 Feb 2020 at 17:36, Emir Arnautović wrote:
> 
>> Hi Kaya,
>> The answer is simple: as much delay as your requirements allow between
>> data being indexed and changes being visible. It is sometimes seconds, and
>> sometimes hours or even a day is tolerable. On each commit your caches are
>> invalidated and warmed (if it is configured like that), so in order to get
>> better use of caches, you should commit as rarely as possible.
>>
>> The setting that you provided is about hard commits, and those are
>> configured not to open a new searcher, so such a commit does not cause an
>> “exceeded limit” error. You either have soft auto commits configured or you do
>> explicit commits when updating documents. Check and tune those, and if you
>> do explicit commits, remove them if possible. If you cannot afford less
>> frequent commits, you have to tune your warming strategy to make sure it
>> does not take as much time as the period between two commits.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 26 Feb 2020, at 06:16, Kayak28  wrote:
>>> 
>>> Hello, Solr Community:
>>> 
>>> The other day, I had an error "exceeded limit of maxWarmingSearchers=2."
>>> I know this error occurs when multiple commits (which open a new searcher)
>>> are requested too frequently.
>>>
>>> As far as I read the Solr wiki, it recommends having more of an interval
>>> between each commit, making commits less frequent.
>>> Using autoCommit, I would like to decrease the commit frequency, but I am
>>> not sure how much I should increase the value of maxTime in autoCommit?
>>> 
>>> My current configuration is the following:
>>> 
>>> <autoCommit>
>>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>> 
>>> 
>>> 
>>> How do you determine how much to increase the value in this case?
>>> Is there any rule of thumb advice to configure commit frequency?
>>> 
>>> Any help will be appreciated.
>>> 
>>> Sincerely,
>>> Kaya Ota
>> 
>> 



Re: Time out problems with the Solr server 8.4.1

2020-02-27 Thread Massimiliano Randazzo
Thank you,

I will proceed with installing the system directly on the server where I have
the data folder, removing NFS, and I will let you know.

On Thu, 27 Feb 2020 at 10:52, Dario Rigolin <
dario.rigo...@comperio.it> wrote:

> I think the issue is NFS. If you move everything to an NVMe or SSD local to the
> server, the indexing process will work fine.
> NFS is the wrong filesystem for Solr.
>
> I hope this helps.
>
> On Thu, 27 Feb 2020 at 00:03, Massimiliano Randazzo <
> massimiliano.randa...@gmail.com> wrote:
>
> > On Wed, 26 Feb 2020 at 23:42, Vincenzo D'Amore <
> > v.dam...@gmail.com> wrote:
> >
> > > Hi Massimiliano,
> > >
> > > it’s not clear how much memory you have configured for your Solr
> > instance.
> > >
> >
> > SOLR_HEAP="20480m"
> > SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
> > GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
> >   -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
> >
> > > And I would avoid an nfs mount for the datadir.
> > >
> > > Ciao,
> > > Vincenzo
> > >
> > > --
> > > mobile: 3498513251
> > > skype: free.dev
> > >
> > > > On 26 Feb 2020, at 19:44, Massimiliano Randazzo <
> > > massimiliano.randa...@gmail.com> wrote:
> > > >
> > > > On Wed, 26 Feb 2020 at 19:30, Dario Rigolin <
> > > > dario.rigo...@comperio.it> wrote:
> > > >
> > > >> You can avoid committing and let Solr do autocommit at certain times.
> > > >> Or use soft commit if you have search queries to answer at the same time.
> > > >> 550,000 pages of 3,500 words isn't a big deal for a solr server; what's
> > > >> the hardware configuration?
> > > > The solr instance runs on a server with the following configuration:
> > > > 12 core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> > > > 64GB Ram
> > > > solr's DataDir is on a volume of another server that I mounted via
> NFS
> > (I
> > > > was thinking of moving the solr server to the server where the
> DataDir
> > > > resides even if it has lower characteristics 8 core Intel(R) Xeon(R)
> > CPU
> > > >   E5506  @ 2.13GHz 24GB Ram)
> > > >
> > > > What's your single solr document - a single newspaper? a single page?
> > > >
> > > > the single solr document refers to the single word of the document
> > > >
> > > >
> > > >> Do you have a solrcloud with 8 nodes? Or are you sending the same
> > > >> document to 8 single solr servers?
> > > > I have 8 servers that process 550,000 newspapers and all of them write
> > > > on 1 solr server only
> > > >
> > > >
> > > >>> On Wed, 26 Feb 2020 at 19:22, Massimiliano Randazzo <
> > > >>> massimiliano.randa...@gmail.com> wrote:
> > > >>> Good morning
> > > >>> I have the following situation: I have to index the OCR of about 550,000
> > > >>> pages of newspapers, averaging 3,500 words per page, and since I make
> > > >>> a document per word, the records are many.
> > > >>> At the moment I have 1 instance of Solr and 8 servers that read and write
> > > >>> all on the same instance at the same time. At the beginning everything is
> > > >>> fine; after a while, when I add, delete or commit, it gives me a TimeOut
> > > >>> error towards the solr server.
> > > >>> I suspect the problem is due to the fact that I do many commit
> > > >>> operations of many docs at a time (practically, if the newspaper is 30
> > > >>> pages I do 105,000 adds and at the end I commit); if everyone does this,
> > > >>> and the 8 servers do it at almost the same time, I think this creates
> > > >>> problems for Solr.
> > > >>> What can I do to solve the problem?
> > > >>> Do I make a commit for each add?
> > > >>> Is it possible to configure the solr server to apply the add and delete
> > > >>> commands and to commit them autonomously, based on the available
> > > >>> resources, as it seems to do for the optimize command?
> > > >>> Reading the documentation I found this configuration to implement, but I
> > > >>> don't know if it solves my problem:
> > > >>> <deletionPolicy class="solr.SolrDeletionPolicy">
> > > >>>   <str name="maxCommitsToKeep">1</str>
> > > >>>   <str name="maxOptimizedCommitsToKeep">0</str>
> > > >>>   <str name="maxCommitAge">1DAY</str>
> > > >>> </deletionPolicy>
> > > >>> <infoStream>false</infoStream>
> > > >>> Thanks for your consideration
> > > >>> Massimiliano Randazzo
> > > >> --
> > > >> Dario Rigolin
> > > >> Comperio srl - CTO
> > > >> Mobile: +39 347 7232652 - Office: +39 0425 471482
> > > >> Skype: dario.rigolin
> > > >
> > > >
> > > > --
> > > > Massimiliano Randazzo
> > > >
> > > > Analista Programmatore,
> > > > Sistemista Senior
> > > > Mobile +39 335 6488039
> > > > email: massimiliano.randa...@gmail.com
> > > > pec: massimiliano.randa...@pec.net
> > >
> >
> >
> > --
> > Massimiliano Randazzo
> >
> > Analista Programmatore,
> > Sistemista Senior
> > Mobile +39 335 6488039
> > email: massimiliano.randa...@gmail.com
> > pec: massimiliano.randa...@pec.net
> >
>
>
> --
>
> Dario Rigolin
> Comperio srl - CTO
> Mobile: +39 347 7232652 - Office: 

Re: Time out problems with the Solr server 8.4.1

2020-02-27 Thread Dario Rigolin
I think the issue is NFS. If you move everything to an NVMe or SSD local to the
server, the indexing process will work fine.
NFS is the wrong filesystem for Solr.

I hope this helps.

On Thu, 27 Feb 2020 at 00:03, Massimiliano Randazzo <
massimiliano.randa...@gmail.com> wrote:

> On Wed, 26 Feb 2020 at 23:42, Vincenzo D'Amore <
> v.dam...@gmail.com> wrote:
>
> > Hi Massimiliano,
> >
> > it’s not clear how much memory you have configured for your Solr
> instance.
> >
>
> SOLR_HEAP="20480m"
> SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
> GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
>   -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
>
> > And I would avoid an nfs mount for the datadir.
> >
> > Ciao,
> > Vincenzo
> >
> > --
> > mobile: 3498513251
> > skype: free.dev
> >
> > > On 26 Feb 2020, at 19:44, Massimiliano Randazzo <
> > massimiliano.randa...@gmail.com> wrote:
> > >
> > > On Wed, 26 Feb 2020 at 19:30, Dario Rigolin <
> > > dario.rigo...@comperio.it> wrote:
> > >
> > >> You can avoid committing and let Solr do autocommit at certain times.
> > >> Or use soft commit if you have search queries to answer at the same time.
> > >> 550,000 pages of 3,500 words isn't a big deal for a solr server; what's
> > >> the hardware configuration?
> > > The solr instance runs on a server with the following configuration:
> > > 12 core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> > > 64GB Ram
> > > solr's DataDir is on a volume of another server that I mounted via NFS
> (I
> > > was thinking of moving the solr server to the server where the DataDir
> > > resides even if it has lower characteristics 8 core Intel(R) Xeon(R)
> CPU
> > >   E5506  @ 2.13GHz 24GB Ram)
> > >
> > > What's your single solr document - a single newspaper? a single page?
> > >
> > > the single solr document refers to the single word of the document
> > >
> > >
> > >> Do you have a solrcloud with 8 nodes? Or are you sending the same
> > >> document to 8 single solr servers?
> > > I have 8 servers that process 550,000 newspapers and all of them write
> > > on 1 solr server only
> > >
> > >
> > >>> On Wed, 26 Feb 2020 at 19:22, Massimiliano Randazzo <
> > >>> massimiliano.randa...@gmail.com> wrote:
> > >>> Good morning
> > >>> I have the following situation: I have to index the OCR of about 550,000
> > >>> pages of newspapers, averaging 3,500 words per page, and since I make
> > >>> a document per word, the records are many.
> > >>> At the moment I have 1 instance of Solr and 8 servers that read and write
> > >>> all on the same instance at the same time. At the beginning everything is
> > >>> fine; after a while, when I add, delete or commit, it gives me a TimeOut
> > >>> error towards the solr server.
> > >>> I suspect the problem is due to the fact that I do many commit
> > >>> operations of many docs at a time (practically, if the newspaper is 30
> > >>> pages I do 105,000 adds and at the end I commit); if everyone does this,
> > >>> and the 8 servers do it at almost the same time, I think this creates
> > >>> problems for Solr.
> > >>> What can I do to solve the problem?
> > >>> Do I make a commit for each add?
> > >>> Is it possible to configure the solr server to apply the add and delete
> > >>> commands and to commit them autonomously, based on the available
> > >>> resources, as it seems to do for the optimize command?
> > >>> Reading the documentation I found this configuration to implement, but I
> > >>> don't know if it solves my problem:
> > >>> <deletionPolicy class="solr.SolrDeletionPolicy">
> > >>>   <str name="maxCommitsToKeep">1</str>
> > >>>   <str name="maxOptimizedCommitsToKeep">0</str>
> > >>>   <str name="maxCommitAge">1DAY</str>
> > >>> </deletionPolicy>
> > >>> <infoStream>false</infoStream>
> > >>> Thanks for your consideration
> > >>> Massimiliano Randazzo
> > >> --
> > >> Dario Rigolin
> > >> Comperio srl - CTO
> > >> Mobile: +39 347 7232652 - Office: +39 0425 471482
> > >> Skype: dario.rigolin
> > >
> > >
> > > --
> > > Massimiliano Randazzo
> > >
> > > Analista Programmatore,
> > > Sistemista Senior
> > > Mobile +39 335 6488039
> > > email: massimiliano.randa...@gmail.com
> > > pec: massimiliano.randa...@pec.net
> >
>
>
> --
> Massimiliano Randazzo
>
> Analista Programmatore,
> Sistemista Senior
> Mobile +39 335 6488039
> email: massimiliano.randa...@gmail.com
> pec: massimiliano.randa...@pec.net
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Haystack US tickets on sale!

2020-02-27 Thread Charlie Hull

Hi all,

Very happy to announce that Haystack US 2020, the search relevance
conference, is now open for business! See www.haystackconf.com for details
of the event, running during the week of April 27th in Charlottesville,
including associated training. We have a fantastic lineup of speakers
due to be published soon; there will be fun social events, book signings
and more. Earlybird discounts are active until the end of March.


(If you can't wait that long we're also running some Solr training in 
March in London 
https://www.eventbrite.co.uk/e/think-like-a-relevance-engineer-solr-march-2020-london-uk-tickets-92942813457 
and holding our London Solr Meetup that same week 
https://www.meetup.com/Apache-Lucene-Solr-London-User-Group/)


Cheers

Charlie

--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Re: Ordering in Nested Document

2020-02-27 Thread Gajendra Dadheech
Thanks, Mikhail, for the reply.
One more question here:
Let's say the xml is like this:


<add>
  <doc>
    <field name="id">4</field>
    <field name="title">Regular Shirt</field>
    <field name="doc_type">Parent</field>
    <field name="color">Black</field>
  </doc>
  <doc>
    <field name="id">8</field>
    <field name="title">Solid Rug</field>
    <field name="doc_type">Parent</field>
    <field name="color">Solid</field>
  </doc>
  <doc>
    <field name="id">1</field>
    <field name="title">Regular color Shirts</field>
    <field name="doc_type">Parent</field>
    <doc>
      <field name="id">2</field>
      <field name="doc_type">Child</field>
      <field name="color">Red</field>
    </doc>
    <doc>
      <field name="id">3</field>
      <field name="doc_type">Child</field>
      <field name="color">Blue</field>
    </doc>
  </doc>
  <doc>
    <field name="id">5</field>
    <field name="title">Rugs</field>
    <field name="doc_type">Parent</field>
    <doc>
      <field name="id">6</field>
      <field name="doc_type">Child</field>
      <field name="color">Abstract</field>
    </doc>
    <doc>
      <field name="id">7</field>
      <field name="doc_type">Child</field>
      <field name="color">Printed</field>
    </doc>
  </doc>
</add>




Now I want to write a query which fetches all items [with children -> 1,5 and
without children -> 4,8] with color:red and title having shirt.
Here is the query:
/solr/test/select?q={!parent%20which=doc_type:Parent%20score=max}%20%20{!boost%20b=100.0%20}color:Red%20{!dismax%20qf=title%20v=%27Regular%27%20score=total}&fl=id,product_class_type,title,score,color&wt=json



Error:
"msg": "Child query must not match same docs with parent filter. Combine
them as must clauses (+) to find a problem doc. docId=4, class
org.apache.lucene.search.DisjunctionSumScorer",
I think docId 4 matches the query, but as it is a parent document,
Solr is throwing the error. Is this kind of schema supported in Solr 7.6 [a mix
of child-free and nested documents]?



On Mon, Feb 24, 2020 at 5:24 PM Mikhail Khludnev  wrote:

> You may try. Content-type should be absolutely the same across parents and
> child-free. It may work now.
> Earlier, mixing blocks and childfrees in one index wasn't supported.
>
> On Mon, Feb 24, 2020 at 2:57 AM Gajendra Dadheech 
> wrote:
>
> > That extra s was intentional, should have added a better name.
> >
> > So ideally we shouldn't have childfree and blocks together while indexing?
> > Or should they not be together in the whole index, i.e. we should have at
> > least one child doc for every doc if any doc has one?
> >
> > On Mon, Feb 24, 2020 at 4:24 PM Mikhail Khludnev 
> wrote:
> >
> > > Hello, Gajendra.
> > > Pics don't come through the mailing list.
> > > May it be caused by the unnecessary s in *s*parentDocument?
> > > At least earlier, mixing childfrees and blocks wasn't allowed and caused
> > > some troubles. Usually, a child stub is used to keep childfrees in the
> > > index.
> > >
> > > On Mon, Feb 24, 2020 at 2:22 AM Gajendra Dadheech  >
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > I want to ingest the documents below, where there is a mix of nested and
> > > > un-nested documents:
> > > > 
> > > >   
> > > >   5
> > > >   5
> > > >   5Solr adds block join support
> > > >   sparentDocument
> > > >   
> > > >  
> > > >   1
> > > >   1
> > > >   Solr adds block join support
> > > >   parentDocument
> > > >   
> > > >   2
> > > >   1
> > > >   SolrCloud supports it too!
> > > >   childDocument
> > > >   
> > > >   
> > > >   
> > > >   3
> > > >   3
> > > >   New Lucene and Solr release is out
> > > >   parentDocument
> > > >   
> > > > 4
> > > > 4
> > > > Lots of new features
> > > > childDocument
> > > >   
> > > >   
> > > > 
> > > >
> > > >
> > > > Output of block join query after ingesting above docs:
> > > > [image: image.png]
> > > >
> > > > So doc id 5 is getting linked to doc id 1. Is this expected behavior? I
> > > > believe id 5 should be a different document tree.
> > > >
> > > > Shall I ingest them in some order?
> > > >
> > > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>