Minimum set of jars to run EmbeddedSolrServer

2020-09-28 Thread Alexandre Rafalovitch
Hello, Does anybody know (or has anybody even experimented with) what the minimum set of jars needed to run EmbeddedSolrServer is? If I just include solr-core, that pulls in a huge number of jars. I don't need - for example - Lucene analyzers for Korean and Japanese for this application. But what else do I

ApacheCon at Home 2020 starts tomorrow!

2020-09-28 Thread Anshum Gupta
Hey everyone! ApacheCon at Home 2020 starts tomorrow. The event is 100% virtual, and free to register. What’s even better is that this year we have reintroduced the Lucene/Solr/Search track at ApacheCon. With 2 full days of sessions covering various Lucene, Solr, and Search, I hope you are able

Re: solr performance with >1 NUMAs

2020-09-28 Thread Shawn Heisey
On 9/28/2020 12:17 PM, Wei wrote: Thanks Shawn. Looks like Java 11 is the way to go with -XX:+UseNUMA. Do you see any backward compatibility issue for Solr 8 with Java 11? Can we run Solr 8 built with JDK 8 in Java 11 JRE, or need to rebuild solr with Java 11 JDK? I do not know of any problems

Vulnerabilities in SOLR 8.6.2

2020-09-28 Thread Narayanan, Lakshmi
Hello Solr-User Support team, We have installed the Solr 8.6.2 package into a Docker container in our DEV environment. Prior to using it, our security team scanned the Docker image using Sysdig and found a lot of Critical/High/Medium vulnerabilities. The full list is in the attached spreadsheet

Re: Solr storage of fields <-> indexed data

2020-09-28 Thread Edward Turner
That's really good and helpful info, thank you. Perfect. Best wishes, Edd On Mon, 28 Sep 2020, 5:53 pm Shawn Heisey wrote: > On 9/28/2020 8:56 AM, Edward Turner wrote: > > By removing the copyfields, we've found that our index sizes have reduced > > by ~40% in some cases, which is great!

Re: solr performance with >1 NUMAs

2020-09-28 Thread Wei
Thanks Shawn. Looks like Java 11 is the way to go with -XX:+UseNUMA. Do you see any backward compatibility issue for Solr 8 with Java 11? Can we run Solr 8 built with JDK 8 in Java 11 JRE, or need to rebuild solr with Java 11 JDK? Best, Wei On Sat, Sep 26, 2020 at 6:44 PM Shawn Heisey wrote: >

Worker node / collection creation, parallelized streams

2020-09-28 Thread uyilmaz
Hi all, Today I was fiddling with a streaming expression that takes too long to finish and times out. First of all, is it normal for it to time out, rather than just taking too long? Then I read about the parallelized streaming expressions, which take a worker number as a parameter. We have

Add Hosts in SolrCloud

2020-09-28 Thread Massimiliano Randazzo
Hello everybody, I have a SolrCloud consisting of 4 servers, and a collection with 2 shards and a replication factor of 2:
Collection: bookReaderAttilioHortis
Shard count: 2
configName: BookReader
replicationFactor: 2
maxShardsPerNode: 2
router: compositeId
autoAddReplicas: false
I would like to add 2 more

Re: Solr storage of fields <-> indexed data

2020-09-28 Thread Shawn Heisey
On 9/28/2020 8:56 AM, Edward Turner wrote: By removing the copyfields, we've found that our index sizes have reduced by ~40% in some cases, which is great! We're just curious now as to exactly how this can be ... That's not surprising. My question is, given the following two schemas, if we

Returning fields a specific order

2020-09-28 Thread gnandre
Hi, I have a use-case where I want to compare stored field values of Solr documents from two different Solr instances. I can use a diff tool to compare them, but only if they return the fields in a specific order in the response. I tried setting fl param with all the fields specified in

Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2

2020-09-28 Thread gnandre
Thanks, this is helpful. I agree. q.op param should not affect fq parameter. I think this is a feature and not a bug. On Wed, Sep 23, 2020 at 4:39 PM Erik Hatcher wrote: > In 6.3 it did that? It shouldn't have. q and fq shouldn't share > parameters. fq's themselves shouldn't, IMO, have

Re: Solr storage of fields <-> indexed data

2020-09-28 Thread Erick Erickson
Fields are placed in the index totally separately from each other, so it’s no wonder that removing the copyField results in this kind of savings. And they have to be separate. Consider what comes out of the end of the analysis chain. The same input could produce totally different output. As a
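The point about separate per-field storage can be sketched with a toy model. This is only an illustration under stated assumptions: `text_analyzer`, `string_analyzer`, `build_index`, and the field names `title_s` and `title_txt` are hypothetical stand-ins, not Solr or Lucene classes. It shows one input value producing different terms under two analysis chains, so each field needs its own term dictionary and postings.

```python
from collections import defaultdict

def text_analyzer(value):
    # Hypothetical "text" chain: lowercase, then split on whitespace.
    return value.lower().split()

def string_analyzer(value):
    # Hypothetical "string" chain: the whole value is one untouched term.
    return [value]

def build_index(docs, schema):
    """Build one inverted index per field.

    docs:   {doc_id: {source_field: value}}
    schema: {index_field: (source_field, analyzer)}; two index fields
            reading the same source field model a copyField.
    """
    index = {field: defaultdict(set) for field in schema}
    for doc_id, doc in docs.items():
        for field, (source, analyzer) in schema.items():
            for term in analyzer(doc[source]):
                index[field][term].add(doc_id)
    return index

docs = {"1": {"title": "Solr In Action"}}
schema = {
    "title_s": ("title", string_analyzer),
    "title_txt": ("title", text_analyzer),
}
index = build_index(docs, schema)
```

In this sketch `index["title_s"]` holds the single term `Solr In Action`, while `index["title_txt"]` holds `solr`, `in`, and `action`. Nothing is shared between the two fields, which is consistent with an index shrinking once a copyField destination is dropped.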

Solr storage of fields <-> indexed data

2020-09-28 Thread Edward Turner
Hi all, We have recently switched to using edismax + qf fields, and no longer use copyfields to allow us to easily search over values in multiple fields (by copying multiple fields' values to the copyfield destinations, and then performing queries over the destination field). By removing the

Re: SOLR Cursor Pagination Issue

2020-09-28 Thread Erick Erickson
I said nothing about docId changing. _any_ sort criteria changing is an issue. You’re sorting by score. Well, as you index documents, the new docs change the values used to calculate scores for _all_ documents, thus changing the sort order and potentially causing unexpected results

Re: SOLR Cursor Pagination Issue

2020-09-28 Thread vmakovsky
Hi, Erick I have a python script that sends requests with CursorMark. This script checks data against the following Expected series criteria: Collected series: Number of requests: Collected unique series: The request looks like this: select?indent=off=edismax=json={!key=NUM_DOCS}NOT

Re: Unable to upload updated solr config set

2020-09-28 Thread Erick Erickson
Until then, you can use bin/solr zk upconfig…. Best, Erick > On Sep 28, 2020, at 10:06 AM, Houston Putman wrote: > > Until the next Solr minor version is released you will not be able to > overwrite an existing configSet with a new configSet of the same name. > > The ticket for this feature

Re: Unable to upload updated solr config set

2020-09-28 Thread Houston Putman
Until the next Solr minor version is released you will not be able to overwrite an existing configSet with a new configSet of the same name. The ticket for this feature is SOLR-10391, and it will be included in the 8.7.0 release. Until then you

Re: Corrupted records after successful commit

2020-09-28 Thread Mr Havercamp
Yes, id is unique key. > I bet that if you redefined your updateHandler to give it some name other than “/update” in solrconfig.xml two things would happen: Hmm, nice. I didn't think of that but that would definitely identify the problem. We do have other scripts writing to the index but they

Re: Corrupted records after successful commit

2020-09-28 Thread Erick Erickson
Is your “id” field your <uniqueKey>, and is it tokenized? It shouldn’t be; use something like “string” or keywordTokenizer. Definitely do NOT use, say, text_general. It’s very unlikely that records are not being flushed on commit, I’m 99.99% certain that’s a red herring and that this is a problem in

Re: Corrupted records after successful commit

2020-09-28 Thread Mr Havercamp
Thanks Erick. My knowledge is fairly limited but 1) sounds feasible. Some logs: I write a bunch of records to Solr: 2020-09-28 11:01:01.255 INFO (qtp918312414-21) [ x:vnc] o.a.s.u.p.LogUpdateProcessorFactory [vnc] webapp=/solr path=/update params={json.nl=flat=false=json}{add=[

Re: SOLR Cursor Pagination Issue

2020-09-28 Thread Erick Erickson
Define “incorrect” please. Also, showing the exact query you use would be helpful. That said, indexing data at the same time you are using CursorMark is not guaranteed to find all documents. Consider a sort with date asc, id asc. doc53 has a date of 2001 and you’ve already returned the doc.
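The scenario above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not Solr's implementation or API: `fetch_page` and `walk` are hypothetical helpers, the cursor is modelled as the last `(date, id)` sort key returned, and the two updates made between pages (doc53 moving to a later date, doc55 moving to an earlier one) are assumed for the example, since the message is cut off at that point.

```python
def fetch_page(index, cursor, rows):
    """Return up to `rows` (date, id) keys sorting strictly after `cursor`.

    index:  {doc_id: date}, the current state of the toy index
    cursor: last (date, id) key already returned, or None for the first page
    """
    ordered = sorted((date, doc_id) for doc_id, date in index.items())
    if cursor is not None:
        ordered = [key for key in ordered if key > cursor]
    return ordered[:rows]

def walk(index, rows, update_after_first_page):
    """Page through the index, applying updates once the first page is read."""
    seen = []
    cursor = None
    first = True
    while True:
        page = fetch_page(index, cursor, rows)
        if not page:
            break
        seen.extend(doc_id for _, doc_id in page)
        cursor = page[-1]
        if first:
            # Simulate indexing that happens while the client is paging.
            index.update(update_after_first_page)
            first = False
    return seen

index = {"doc51": 1999, "doc52": 2000, "doc53": 2001,
         "doc54": 2002, "doc55": 2003}
# Assumed updates: doc53 moves past the cursor, doc55 moves behind it.
seen = walk(index, rows=3,
            update_after_first_page={"doc53": 2020, "doc55": 1990})
```

In this model doc53 is returned twice, because its new sort key lands after the cursor, and doc55 is never returned, because its new sort key lands before it. The cursor itself behaves correctly; the sort values changed underneath it.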

Re: Corrupted records after successful commit

2020-09-28 Thread Erick Erickson
There are several possibilities: 1> you simply have some process incorrectly updating documents. 2> you’ve changed your schema sometime without completely deleting your old index and re-indexing all documents from scratch. I recommend in fact indexing into a new collection and using collection

Corrupted records after successful commit

2020-09-28 Thread Mr Havercamp
Hi, We're seeing strange behaviour when records have been committed. It doesn't happen all the time but enough that the index is very inconsistent. What happens: 1. We commit a doc to Solr, 2. The doc shows in the search results, 3. Later (may be immediate, may take minutes, may take hours),

Re: What does current mean?

2020-09-28 Thread Kayak28
Hello, Wei-san Thank you for answering my question. That pretty much makes sense to me. Sincerely, Kaya On Sun, Sep 27, 2020 at 9:16 Wei wrote: > My understanding is that current means whether there is data pending to be > committed. > > Best, > Wei > > On Sat, Sep 26, 2020 at 5:09 PM Kayak28 wrote: > > > Hello,

SOLR Cursor Pagination Issue

2020-09-28 Thread vmakovsky
Good afternoon, Could you please suggest a solution? During the data update process in SolrCloud, requests with a cursor mark return incorrect data. I suppose that the results do not follow each other during indexing, because the data doesn't have enough time to be replicated

Re: Solr waitForMerges() causing leaderless shard during shutdown

2020-09-28 Thread Andrzej Białecki
Hi Ramsey, This is an interesting scenario. I vaguely remember someone (Cao Manh Dat?) working on a similar issue - I’m not sure if newer versions of Solr already fixed that but it would be helpful to create a Jira issue to investigate it and verify that it’s indeed fixed in a more recent Solr