Re: clearing document cache || solr 6.6

2019-01-29 Thread Shawn Heisey
On 1/29/2019 11:27 PM, sachin gk wrote: Is there a way to clear the *document cache* after we commit to the indexer? All Solr caches are invalidated when you issue a commit with openSearcher set to true. The default setting is true, and normally it doesn't get set to false unless you
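
A minimal sketch of what such a commit request could look like, built with the Python standard library only. The base URL http://localhost:8983/solr and the collection name mycollection are assumptions for illustration and do not come from the thread; commit and openSearcher are the update request parameters the reply refers to.

```python
from urllib.parse import urlencode


def commit_url(base_url: str, collection: str, open_searcher: bool = True) -> str:
    """Build the URL for an explicit commit against a Solr update handler.

    base_url and collection are hypothetical placeholders; only the URL is
    built here, nothing is sent to a server.
    """
    params = {
        "commit": "true",
        # openSearcher=true opens a new searcher, which is what discards the old caches
        "openSearcher": "true" if open_searcher else "false",
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


print(commit_url("http://localhost:8983/solr", "mycollection"))
```

Passing open_searcher=False produces the variant that makes the commit durable without opening a new searcher, in which case the existing caches stay in use.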

Re: Solrcloud TimeoutException: Idle timeout expired

2019-01-29 Thread Deepak Goel
Document is not being passed. It has zero content. It could be due to no memory left in the heap. For this, please check the GC logs. On Tue, 29 Jan 2019, 08:54 Schaum Mallik wrote: > I am seeing this error in our logs. Our Solr heap is set to more than 10G. > Any clues which anyone can provide will be very helpful. >

Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-29 Thread Shawn Heisey
On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote: We have the following TieredMergePolicyFactory configuration in our solrconfig.xml 10 10 10 These three settings are the really important ones. Except for maxMergeAtOnceExplicit, you have these at

Re: Error using collapse parser with /export

2019-01-29 Thread Rahul Goswami
I checked again and looks like all documents with the same "id_field" reside on the same shard, in which case I would expect collapse parser to work. Here is my complete query: http://localhost:8983/solr/mycollection/stream/?expr=search(mycollection ,sort="field1 asc,field2

clearing document cache || solr 6.6

2019-01-29 Thread sachin gk
Hi All, Is there a way to clear the *document cache* after we commit to the indexer? -- Regards, Sachin

Re: SPLITSHARD not working as expected

2019-01-29 Thread Rahul Goswami
Thanks for the reply, Jan. I have been referring to the documentation for SPLITSHARD on 7.2.1, which seems to be missing some important information present in 7.6

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi everyone, We have tried to do the setup and indexing on the latest Solr 7.6.0 However, we faced exactly the same issue as what we faced in Solr 7.5.0, in which the search for customers collection slowed down once we indexed policies collection. Regards, Edwin On Wed, 30 Jan 2019 at 01:19,

Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi, Does anyone have any insights on this? Thank you in advance. Regards, Edwin On Tue, 29 Jan 2019 at 01:14, Zheng Lin Edwin Yeo wrote: > Hi, > > We have the following TieredMergePolicyFactory configuration in our > solrconfig.xml > > > 10 > 10 > 10 > 10 >

Re: The parent shard will never be delete/clean?

2019-01-29 Thread zhenyuan wei
That is cool, I'll try it. Thanks! Andrzej Białecki wrote on Wed, 23 Jan 2019 at 8:53 PM: > Solr 7.4.0 added a periodic maintenance task that cleans up old inactive > parent shards left after the split. “Old” means 2 days by default. > > > On 22 Jan 2019, at 15:31, Jason Gerlowski wrote: > > > > Hi, >

Creating shard with core.properties

2019-01-29 Thread Bharath Kumar
Hi All, I am trying to create a shard in Solr 7.6.0 using just a core.properties file (like auto-discovering the shard) with legacyCloud set to false. But I am getting an error message like the one below even though I specify the coreNodeName in the core.properties file: "coreNodeName " + coreNodeName

Solrcloud TimeoutException: Idle timeout expired

2019-01-29 Thread Schaum Mallik
I am seeing this error in our logs. Our Solr heap is set to more than 10G. Any clues which anyone can provide will be very helpful. Thank you null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 12/12 ms at

Re: Solr 5.2.1 replication hangs possibly during segment merge after a delete operation

2019-01-29 Thread Ravi Prakash
Thanks. I am not explicitly asking solr to optimize. I do send -commit yes in the POST command when I execute the delete query. In the master-slave node where replication is hung I see this: On the master: -bash-4.1$ ls -al data/index/segments_* -rw-rw-r--. 1 u g 1269 Jan 29 16:23

Re: HttpParser URI is too large

2019-01-29 Thread levtannen
Thank you Jan. This solution worked. The warning message "URI is too large >81920" disappeared. But this fix unleashed another problem: the INFO message that was suppressed by the previous error is now displayed in all its length. And it is way too long because it lists all 100 collections. I

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Walter Underwood
Is this a sharded Solr Cloud collection? If so, you can try using global IDF. That should make the scores more similar on different nodes. https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_ wunder Walter Underwood

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread David Hastings
Maybe instead of using the Solr score in your metrics, find a way to use the document's location in the results? You can never trust the score to be consistent; it's constantly changing as the index changes. On Tue, Jan 29, 2019 at 1:29 PM Ashish Bisht wrote: > Hi Erick, > > Our business

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick, Our business wanted the score not to be based entirely on the default relevancy algorithm, but on a mix of Solr relevancy and user metrics (80% + 20%). Each result doc's score is calculated against the max score as a fraction of 80. The remaining 20 comes from user metrics. Finally, the sort happens on the new score. But say we
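
The arithmetic described in this message can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: the function name blended_score is hypothetical, and the user metric is assumed to be already normalised to the range 0 to 1, which the message does not specify.

```python
def blended_score(solr_score: float, max_score: float, user_metric: float,
                  relevancy_weight: float = 80.0, metric_weight: float = 20.0) -> float:
    """Combine a Solr relevancy score with a user metric on a 0-100 scale.

    The Solr score is scaled against the maximum score of the result set so
    that it contributes at most relevancy_weight points; the user metric
    (assumed to lie in [0, 1]) contributes at most metric_weight points.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0.0 <= user_metric <= 1.0:
        raise ValueError("user_metric must be between 0 and 1")
    return relevancy_weight * (solr_score / max_score) + metric_weight * user_metric


# Hypothetical results: (document id, Solr score, user metric)
results = [("doc-a", 10.0, 0.0), ("doc-b", 5.0, 0.5), ("doc-c", 9.0, 1.0)]
max_score = max(score for _, score, _ in results)

# Final ordering is by the blended value, highest first
ranked = sorted(results, key=lambda r: blended_score(r[1], max_score, r[2]), reverse=True)
print([doc_id for doc_id, _, _ in ranked])
```

Because the relevancy part is divided by the maximum score returned by whichever replica served the request, small score differences between replicas carry straight through into the blended value.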

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Paul, Thanks for the reply and suggestion Yes, we have installed RamMap, and are analyzing the results from there. The problem we are facing is that once the query for that collection becomes slow, it will not be fast again even after we restart Solr or the entire machine. Regards, Edwin On

Re: How to specify custom update chain in a SolrJ request

2019-01-29 Thread Chris Wareham
Answering myself, the solution is to update my code as follows: UpdateRequest request = new UpdateRequest(); request.setParam("update.chain", "skipexisting"); for (Map.Entry user : users.entrySet()) { SolrInputDocument document = new SolrInputDocument(); document.addField("id",

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn, No worries, and thanks for your clarification. We made these changes in order to use the Unified Highlighter, with hl.offsetSource = POSTING, and add "light" term vectors. The settings come from what is written in the Solr guide on highlighting, which says the following: *Postings*:

Re: MLT - unexpected design choice

2019-01-29 Thread Maria Mestre
Hi Alessandro and Matt, Thanks so much for your help! @Alessandro: I will do so, thank you :-) > On 29 Jan 2019, at 12:26, Alessandro Benedetti wrote: > > Hi Maria, > this is actually a great catch! > I have been working a lot on the More Like This and this mistake never > caught my

How to specify custom update chain in a SolrJ request

2019-01-29 Thread Chris Wareham
I'm trying to update records in my Solr core, and have configured a custom update chain that skips updates to records that don't exist: true true My SolrJ update code is currently: for (Map.Entry user : users.entrySet()) { SolrInputDocument document = new

SOLR 7.5 returns http response 304 to SOLR admin UI query - is this correct when httpCaching never304="true" is a set?

2019-01-29 Thread Standen Guy
Hi All, I have recently upgraded to SOLR 7.5 from SOLR 4.10.3 and believe I have noticed a change in the way HTTP caching is operating. I have installed the vanilla SOLR 7.5 on Windows 2012 R2 I have run the techproducts example where the solrconfig includes : I

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Erick Erickson
No, this is not a bug but a consequence of the design. ExactStats can help, but there is no guarantee that different replicas will compute the exact same score. Scores should be very close however. You haven't explained why you need the scores to match. 99% of the time, worrying about scores at

Re: Solr 5.2.1 replication hangs possibly during segment merge after a delete operation

2019-01-29 Thread Shawn Heisey
On 1/28/2019 5:39 PM, Ravi Prakash wrote: I have a situation where I am trying to set up a once-daily cron job on the master node to delete old documents from the index based on our retention policy. This reply may not do you any good. Just wanted you to know up front that I might not be

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/29/2019 5:25 AM, Shawn Heisey wrote: Adding termVectors will make the index bigger.  Potentially much bigger. This will increase the overall RAM requirement of the server, especially if the server is handling software other than Solr.  Anything that makes the index bigger can affect

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote: My guess is after we change our searchFields_tcs schema which is: *From*: *To:* Adding termVectors will make the index bigger. Potentially much bigger. This will increase the overall RAM requirement of the server, especially if the

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
Hi If the reason for the difference in speed is that the index is being read from disk, I would expect that the first query would be slow, but subsequent queries on the same collection should speed up. A query on the other collection could then be slower. In this case I would say that this is

Re: MLT - unexpected design choice

2019-01-29 Thread Alessandro Benedetti
Hi Maria, this is actually a great catch! I have been working a lot on the More Like This and this mistake never caught my attention. I agree with you, feel free to open a Jira Issue. First of all, what you say makes sense. Secondly, it is the standard way used in the similarity

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
References, sorry: [1] https://support.microsoft.com/en-ca/help/976618/you-experience-performance-issues-in-applications-and-services-when-th [2] https://docs.microsoft.com/en-us/sysinternals/downloads/rammap -Original Message- From: Dodd, Paul Sutton (UB) Sent: Tuesday, 29.

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for your reply. However, we did not delete our index when the screenshot was taken. All the indexes are still in Solr. My guess is that it happened after we changed our searchFields_tcs schema, which is: *From*: *To:* The above change was done in order to use the Solr recommended unified

Limit facet terms based on a substring using the JSON facet API

2019-01-29 Thread Tom Van Cuyck
Hi, In the old Solr facet API there are the facet.contains and facet.contains.ignoreCase parameters to limit the facet values to those terms containing the specified substring. Is there an equivalent option in the JSON facet API? Or is there a way to obtain the same behavior with the JSON API? I
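
For reference, a request using the two classic parameters named in the question could be assembled like this. It only illustrates the old-style behaviour being asked about and does not show a JSON facet API equivalent. The base URL, the collection name mycollection, and the field name category are assumptions for illustration.

```python
from urllib.parse import urlencode


def facet_contains_url(base_url: str, collection: str, field: str,
                       substring: str, ignore_case: bool = False) -> str:
    """Build a classic (non-JSON) facet request restricted to terms containing a substring.

    Only the URL is built; base_url, collection and field are hypothetical.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),            # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.contains", substring),
    ]
    if ignore_case:
        params.append(("facet.contains.ignoreCase", "true"))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


print(facet_contains_url("http://localhost:8983/solr", "mycollection",
                         "category", "book", ignore_case=True))
```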

Re: MLT - unexpected design choice

2019-01-29 Thread Matt Pearce
Hi Maria, Would it help to add a filter to your query to restrict the results to just those where the description field is populated? Eg. add fq=description:[* TO *] to your query parameters. Apologies if I'm misunderstanding the problem! Best, Matt On 28/01/2019 16:29, Maria Mestre
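
A small sketch of how the suggested filter could be appended to an existing set of query parameters. The helper name and the example query id:123 are hypothetical; the fq value is the one given in the message.

```python
from urllib.parse import urlencode


def with_populated_field_filter(params: dict, field: str) -> str:
    """Return a URL-encoded query string with an fq that keeps only documents
    where the given field has a value (an open-ended range query on the field)."""
    query = list(params.items())
    query.append(("fq", f"{field}:[* TO *]"))
    return urlencode(query)


print(with_populated_field_filter({"q": "id:123"}, "description"))
```

Since fq only restricts the result set and does not contribute to scoring, adding it this way should leave the relative ordering of the remaining documents unchanged.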

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/26/2019 4:48 PM, Zheng Lin Edwin Yeo wrote: Thanks for your reply. Below are the replies to your email: 1) We have tried to set the heap size to 8g previously when we faced the same issue, and changing to 7g does not help either. 2) We are using standard disk at the moment. 3) In the

Re: PatternReplaceFilterFactory problem

2019-01-29 Thread Chris Wareham
Thanks for the help - changing the field type of the destination for the copy fields to "text_en" solved the problem. I'd foolishly assumed that the analysis of the source fields was applied then the resulting tokens passed to the copy field, which doesn't really make sense now that I think

Re: Large Number of Collections takes down Solr 7.3

2019-01-29 Thread Hendrik Haddorp
How much memory do the Solr instances have? Any more details on what happens when the Solr instances start to fail? We are using multiple Solr clouds to keep the collection count low(er). On 29.01.2019 06:53, Gus Heck wrote: Does it all have to be in a single cloud? On Mon, Jan 28, 2019,

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick, To test this scenario I added the replica again and for a few days have been monitoring metrics like Num Docs, Max Doc, Deleted Docs from the *Overview* section of the core. Checked the *Segments Info* section too. Everything looks in sync. http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn / Jan, Do we have any further insights about this problem? The same problem still happens even after we make the changes and re-index all the data. Regards, Edwin On Sun, 27 Jan 2019 at 07:48, Zheng Lin Edwin Yeo wrote: > Hi Shawn, > > Thanks for your reply. Below are the replies to