Re: wildcards match end-of-word?

2020-02-11 Thread Walter Underwood
“kinase*” does match “kinase”. On the page you linked to, it defines “*” as matching "Multiple characters (matches zero or more sequential characters)”. If it is not matching, you may be using a stemmer on that field or doing some other processing that changes the tokens. wunder Walter
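A rough illustration of the point above, using Python's glob matching as a stand-in for Solr's wildcard handling (an analogy only, not Lucene's implementation). The token lists are hypothetical; "kinas" is assumed to be what a stemmer might store in the index for "kinase".

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, indexed_tokens):
    """Return the indexed tokens that a glob-style pattern matches."""
    return [token for token in indexed_tokens if fnmatchcase(token, pattern)]


# Hypothetical index contents without and with stemming.
unstemmed_tokens = ["kinase", "kinases"]
stemmed_tokens = ["kinas"]  # assumed stemmer output for both words

# "*" matches zero or more characters, so "kinase" itself is a hit.
print(wildcard_hits("kinase*", unstemmed_tokens))

# If the index only holds the stem, the unanalyzed wildcard term finds nothing.
print(wildcard_hits("kinase*", stemmed_tokens))
```

If the field is stemmed, comparing the indexed tokens against the wildcard prefix (for example on the Analysis screen of the admin UI) is one way to check whether this is what is happening.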

Re: per-field count of documents matched?

2020-02-11 Thread Erick Erickson
Hmmm, you could do a facet query (or a series of them). facet.query=LastName:stone&facet.query=Street:stone etc. That’d automatically only tally for the docs that match. You could also consider a custom search component. For the exact case you describe, it’s actually fairly simple. The postings list
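A minimal sketch of the facet-query suggestion, building the request parameters with one facet.query per field. The helper name, the field list, and the choice of rows=0 (counts only, no documents) are assumptions for illustration, not something stated in the thread.

```python
from urllib.parse import urlencode


def per_field_facet_params(term, fields, main_query=None):
    """Build a query string with one facet.query per field for the given term."""
    params = [
        ("q", main_query or term),
        ("rows", "0"),  # assumption: only the counts are wanted
        ("facet", "true"),
    ]
    # Repeating the facet.query parameter yields one count per field.
    params += [("facet.query", f"{field}:{term}") for field in fields]
    return urlencode(params)


print(per_field_facet_params("stone", ["LastName", "Street"]))
# q=stone&rows=0&facet=true&facet.query=LastName%3Astone&facet.query=Street%3Astone
```

Each facet.query count is computed over the documents matched by the main query, which is why the tallies only cover matching docs.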

wildcards match end-of-word?

2020-02-11 Thread Fischer, Stephen
Hi, I am a solr newbie. I was surprised to discover that a search for kinase* returned fewer results than kinase. Then I read the wildcard documentation, and saw why. kinase*

RE: [External] Re: per-field count of documents matched?

2020-02-11 Thread Fischer, Stephen
Thanks very much! By the way, we are using eDisMax, and the queries our UI supports don't include fancy Booleans, so your ideas just might work Thanks again, Steve -Original Message- From: Erick Erickson Sent: Tuesday, February 11, 2020 7:16 PM To: solr-user@lucene.apache.org

Re: solr-injection

2020-02-11 Thread Jörn Franke
Do not have users accessing Solr directly. Have your own secure web frontend/ own APIs for it. In this way you can control secure access. Secure Solr with https and Kerberos. Have for your web frontend only access rights needed and for your admins only the access rights they need. Automate
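As one small piece of such a frontend, user input can be escaped before it is placed into a query string. The sketch below escapes the characters the Lucene query syntax treats as special; the function name is hypothetical, and escaping alone is only one layer and does not replace the access controls described above.

```python
import re

# Characters with special meaning in the Lucene/Solr query syntax.
_SPECIAL_CHARS = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_query_term(user_text):
    """Backslash-escape query syntax characters in untrusted input."""
    return _SPECIAL_CHARS.sub(r"\\\1", user_text)


print(escape_query_term("x) OR (*:*"))
```

The frontend would also decide which request parameters it forwards at all, rather than passing the user's parameters through to Solr unchanged.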

SolrJ 8.2: Too many Connection evictor threads

2020-02-11 Thread Andreas Kahl
Hello everyone, we just updated our Solr from 5.4 to 8.2. The server runs fine, but in our client applications we are seeing issues with thousands of threads created with the name "Connection evictor". Can you give a hint how to limit those threads? Should we better use HttpSolrClient

solr-injection

2020-02-11 Thread Martin Frank Hansen (MHQ)
Hi, I was wondering how others are handling Solr injection in their solutions? After reading this post: https://www.waratek.com/apache-solr-injection-vulnerability-customer-alert/ I can see how important it is to update to Solr-8.2 or higher. Has anyone been successful in injecting

Possible performance issue in my environment setup

2020-02-11 Thread Rudenko, Artur
I am currently investigating a performance issue in our environment (20M large PARENT documents and 800M nested small CHILD documents). The system inserts about 400K PARENT documents and 16M CHILD documents per day. This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM each,

Issue to upgrade solr cloud

2020-02-11 Thread Yogesh Chaudhari
Hi All, we are currently using Solr 5.2.1 on our production server and want to upgrade to Solr 7.7.2. We have been using Solr 5.2.1 for the last 5 years and have millions of documents on the production server. We have Solr cloud with 2 shards and 3 replicas on the production server. I have upgraded Solr 5.2.1 to

Re: Issue to upgrade solr cloud

2020-02-11 Thread Erick Erickson
You really have to re-index your content in this case. This is enforced in Lucene/Solr 8. Upgrading from one version to another isn’t sufficient. The mail server pretty aggressively strips attachments, so your picture (?) didn’t come through. The log you posted isn’t very helpful, we’d need

Re: Possible performance issue in my environment setup

2020-02-11 Thread Erick Erickson
My first bit of advice would be to fix your autocommit intervals. There’s not much point in having openSearcher set to true _and_ having your soft commit times also set, all soft commit does is open a searcher and your autocommit does that. I’d also reduce the time for autoCommit. You’re
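One common way to act on this advice is to let the hard commit only flush to disk (openSearcher=false) and leave visibility to the soft commit. The sketch below builds a Config API "set-property" payload for that. The interval values are placeholders, not recommendations from the thread, and the property names are as I understand the Config API, so they should be checked against the reference guide for the Solr version in use.

```python
import json


def autocommit_payload(hard_commit_ms, soft_commit_ms):
    """Build a Config API body: hard commits flush, soft commits open searchers."""
    return json.dumps(
        {
            "set-property": {
                "updateHandler.autoCommit.maxTime": hard_commit_ms,
                "updateHandler.autoCommit.openSearcher": False,
                "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
            }
        },
        indent=2,
    )


# Placeholder intervals: hard commit every 60 s, soft commit every 5 min.
print(autocommit_payload(60000, 300000))
```

The resulting JSON would be POSTed to the collection's config endpoint; the same settings can instead be edited directly in solrconfig.xml.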

Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Erick Erickson
Wow, that’s pretty horrible performance. Yeah, I was conflating a couple of things here. Now it’s clear. If you specify rows=1, what do you get in response time? I’m wondering if your time is spent just assembling the response rather than searching. You’d have to have massive docs for that to

Antw: Re: SolrJ 8.2: Too many Connection evictor threads

2020-02-11 Thread Andreas Kahl
Erick, Thanks, that's why we want to upgrade our clients to the same Solr(J) version as the server has. But I am still fighting the uncontrolled creation of those Connection evictor threads in my tomcat. Best Regards Andreas >>> Erick Erickson 11.02.20 15.06 Uhr >>> Are you running a

Re: SolrJ 8.2: Too many Connection evictor threads

2020-02-11 Thread Erick Erickson
Are you running a 5x SolrJ client against an 8x server? There’s no guarantee at all that that would work (or vice-versa for that matter). Most generally, SolrJ clients should be able to work with version X-1, but X-3 is unsupported. Best, Erick > On Feb 11, 2020, at 6:36 AM, Andreas Kahl

RE: Possible performance issue in my environment setup

2020-02-11 Thread Rudenko, Artur
Thanks for helping, I will keep investigating. Just a note: we did stop indexing and did not see any significant changes. Artur Rudenko Analytics Developer Customer Engagement Solutions, VERINT T +972.74.747.2536 | M +972.52.425.4686 -Original Message- From: Erick Erickson Sent:

GC_TUNE setting from solr.in.sh is not applied

2020-02-11 Thread Steffen Moldenhauer
Hi all, I installed Solr 8.4.1 (first time on a Linux sub-system, for testing purposes only) and for whatever reason the default GC settings prevented the server from running. So I tried to change the setting with a GC_TUNE in solr.in.sh. But it was not applied at startup. So I looked at

Re: Storage/Volume type for Kubernetes Solr POD?

2020-02-11 Thread Susheel Kumar
Thanks, Karl, for sharing. With local SSDs you would be able to auto scale. Is that correct? On Fri, Feb 7, 2020 at 5:22 AM Nicolas PARIS wrote: > hi all > > what about cephfs or lustre distributed filesystem for such purpose ? > > > Karl Stoney writes: > > > we personally run solr on google cloud

Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-02-11 Thread Wolf, Chris (ELS-CON)
(I found this stuck in my outbox, sorry for the delayed response) Hi, Thank you, I finally was able to configure maven to exclude that logging implementation. But now I'm having an issue building a Spring-Boot executable WAR with embedded Tomcat, for some reason, when I "spring-boot:run" it,

Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-02-11 Thread Wolf, Chris (ELS-CON)
(sorry for bad formatting Outlook-for-Mac doesn't support Internet quoting) Thanks Mark, I did that until I finally was able to exclude it altogether. -Chris On 1/17/20, 10:20 AM, "Mark H. Wood" wrote: For the version problem, I would try adding something like:

Support Tesseract in Apache Solr

2020-02-11 Thread Karan Jain
Hi All, The Solr version 7.6.0 is running on my local machine. I have installed Tesseract through following steps:- yum install tesseract echo export PATH=$PATH:/usr/share/tesseract >>~/.bash_profile echo export TESSDATA_PREFIX=/usr/share/tesseract >>~/.bash_profile Now the deployed Solr is

Re: Support Tesseract in Apache Solr

2020-02-11 Thread Jörn Franke
Honestly I would not run tesseract on the same server as Solr. It takes a lot of resources and may negatively impact Solr. Just write a small program using Tika+Tesseract that runs on a different server / container and posts the results to Solr. About your question: Probably Tika (a dependency

Re: Solr 8.2 replicas use only 1 CPU at 100% every solr.autoCommit.maxTime minutes

2020-02-11 Thread Vangelis Katsikaros
Hi On Mon, Feb 10, 2020 at 5:05 PM Vangelis Katsikaros wrote: > Hi all > > We run Solr 8.2.0 > * with Amazon Corretto 11.0.5.10.1 SDK (java arguments shown in [1]), > * on Ubuntu 18.04 > * on AWS EC2 m5.2xlarge with 8 CPUs and 32GB of RAM > * with -Xmx16g [1]. > > We have migrated from Solr 3.5

Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Walter Underwood
Good questions. Here is the QTime for rows=1000. Looks pretty reasonable. I’d blame the slowness on the VPN connection, but the median response time of 10,000 msec is measured at the server. The client is in Python, using wt=json. Average document size in JSON is 5132 bytes. The system should

Re: Support Tesseract in Apache Solr

2020-02-11 Thread Edward Ribeiro
I second Jörn: don't deploy Tesseract + Tika on the same server as Solr. Tesseract, especially with OCR enabled, will drain machine resources that could be used for indexing/searching. In addition to that, any malformed PDF could potentially shut down the Solr server. Best bet would be to use

per-field count of documents matched?

2020-02-11 Thread Fischer, Stephen
Hi wise Solr experts, For our scientific use-case we want to show users a per-field count of documents that match that field. We would like to do this efficiently because we might return up to a million documents. For example, if we had documents describing People, and a query of, say, "Stone" we

Re: Solr 8.2 replicas use only 1 CPU at 100% every solr.autoCommit.maxTime minutes

2020-02-11 Thread Edward Ribeiro
Is your autoCommit configured to open new searchers? Did you try to set openSearcher to false? Edward On Tue, Feb 11, 2020 at 3:40 PM Vangelis Katsikaros wrote: > Hi > > On Mon, Feb 10, 2020 at 5:05 PM Vangelis Katsikaros > > wrote: > > > Hi all > > > > We run Solr 8.2.0 > > * with Amazon

Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Erick Erickson
Curiouser and curiouser. So two possibilities are just the time it takes to assemble the packet and the time it takes to send it back. Three more experiments then. 1> change the returned doc to return a single docValues=true field. My claim: The response will be very close to the 400-600 ms

Re: Storage/Volume type for Kubernetes Solr POD?

2020-02-11 Thread Karl Stoney
Yes, we scale with pd-ssd or local-ssd just fine. From: Susheel Kumar Sent: 11 February 2020 17:15 To: solr-user@lucene.apache.org Subject: Re: Storage/Volume type for Kubernetes Solr POD? Thanks, Karl, for sharing. With local SSDs you would be able to auto scale. Is