Re: Slower queries with 7.3.1?

2018-05-26 Thread Deepak Goel
Is it possible to profile the code to find the exact points which are taking more time comparatively?
On Sun, 27 May 2018, 06:02 Will Currie wrote:
> I raised https://issues.apache.org/jira/browse/SOLR-12407. In case anybody
> else sees a similar slowdown with boosts.
>
> On

Re: Slower queries with 7.3.1?

2018-05-26 Thread Will Currie
I raised https://issues.apache.org/jira/browse/SOLR-12407. In case anybody else sees a similar slowdown with boosts.
On Sat, May 26, 2018 at 4:10 PM, Will Currie wrote:
> I did some more (micro)benchmarking with a single query. Setting the query
> cache size to zero I see

Re: Index protected zip

2018-05-26 Thread Erick Erickson
Thanks! Now I can just record the URL and then paste it in ;) Who knows, maybe people will see it first too!
On Sat, May 26, 2018 at 9:48 AM, Tim Allison wrote:
> W00t! Thank you, Shawn!
>
> The "don't use ERH in production" response comes up frequently enough
>> that I

Re: How to specify what pages to add to index?

2018-05-26 Thread Jonathan Candelaria
Thanks. It's actually more like localhost/app2, where the app2 in question is Omeka (a digital publishing platform). When Omeka is installed on a server, it's usually all alone on the server. So you *tell* it to index something and what core corresponds to that index and it indexes it? If so, I think I'll

Re: How to specify what pages to add to index?

2018-05-26 Thread Alexandre Rafalovitch
I think you may have other pieces of software in that equation. Solr does not normally pull data from websites; it gets data pushed. Well, the data import handler can do it. Then you normally start indexing by a command to Solr. That command corresponds to a request handler in solrconfig.xml that
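
To illustrate the "data gets pushed" point above, here is a minimal sketch of a client building a push request for Solr's JSON update handler. The base URL, the core name `app2`, and the document fields are assumptions made up for illustration; the function only constructs the request and does not send it.

```python
import json
import urllib.request


def build_update_request(solr_base, core, docs, commit=True):
    """Build (but do not send) a POST that pushes docs to a core's /update handler.

    solr_base and core are hypothetical values supplied by the caller.
    """
    url = f"{solr_base.rstrip('/')}/{core}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (assumed local install and core name):
# req = build_update_request("http://localhost:8983/solr", "app2",
#                            [{"id": "page-1", "title": "Example page"}])
# urllib.request.urlopen(req)  # actually sends the documents
```

The crawling or export step that produces `docs` lives outside Solr in this picture, which is why another piece of software (here, possibly Omeka or a plugin for it) is usually involved.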

Re: How to specify what pages to add to index?

2018-05-26 Thread Jonathan Candelaria
Hello. I have a page that consists of a domain name and several folders in it corresponding to different web applications, e.g. website.university.edu/app1, website.university.edu/app2, website.university.edu/app3. All the pages are stored in separate folders in an html directory. There is

Re: Index protected zip

2018-05-26 Thread Tim Allison
W00t! Thank you, Shawn!
> The "don't use ERH in production" response comes up frequently enough
> that I have created a wiki page we can use for responses:
>
> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>
> Tim, you are extremely well-qualified to expand and correct this page.
>

Re: Index protected zip

2018-05-26 Thread Shawn Heisey
On 5/26/2018 4:52 AM, Tim Allison wrote: Please see Erick Erickson’s evergreen advice and linked blog post: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201805.mbox/%3ccan4yxve_0gn0a1y7wjpr27inuddo6+jzwwfgvzkfs40gh3r...@mail.gmail.com%3e The "don't use ERH in production"

Re: simple enrich uploaded binary documents with sha256 hashes

2018-05-26 Thread Tim Allison
+1 as always to Erick’s advice. DIH is only a PoC. We do have a DigestingParser in Tika, and when you combine that w the RecursiveParserWrapper, you can get digests not only of the main file but also on all embedded files/attachments...which can be pretty neat for some use cases. Operators are
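
As a rough client-side stand-in for the digest idea above, the sketch below computes a SHA-256 hash of a file's bytes before the document is sent for indexing. It is not Tika's DigestingParser and does not look inside embedded files or attachments; the field name `sha256_s` is a hypothetical choice for illustration.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_doc(doc_id, stream):
    """Build a document dict carrying the digest in a hypothetical field."""
    return {"id": doc_id, "sha256_s": sha256_of_stream(stream)}


# Example with in-memory bytes; a real caller would pass open(path, "rb").
# doc = build_doc("report.pdf", io.BytesIO(b"example bytes"))
```

Reading in chunks keeps memory use flat for large binaries, which matters if the same code path handles multi-gigabyte uploads.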

Re: Index protected zip

2018-05-26 Thread Tim Allison
On third thought, I can’t think of how you’d easily inject a PasswordProvider into Solr’s integration. Please see Erick Erickson’s evergreen advice and linked blog post:

Re: Index protected zip

2018-05-26 Thread Tim Allison
You’ll need to provide a PasswordProvider in the ParseContext. I don’t think that is currently possible in the Solr integration. Please open a ticket if SolrJ doesn’t meet your needs.
On Thu, May 24, 2018 at 1:03 PM Alexandre Rafalovitch wrote:
> Hmm. If it works, then it

Re: Slower queries with 7.3.1?

2018-05-26 Thread Will Currie
I did some more (micro)benchmarking with a single query. Setting the query cache size to zero, I see 400ms response time on 7.2 and 600ms on 7.3. Running curl in a loop on my laptop. ~4M docs. ~3G index. 1M total hits for the query. Yup. I'm reluctant to post the query. It has multiple 300+
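
The kind of measurement described above (one query, repeated in a loop, with timings compared across versions) could be sketched roughly as follows. This is an illustration rather than the setup actually used in the thread, which ran curl; the URL in the usage comment is a hypothetical placeholder.

```python
import statistics
import time
import urllib.request


def time_calls(fetch, runs=20, warmup=3):
    """Call fetch() repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        fetch()  # warm-up calls are not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return min, median, and max of a list of latencies."""
    ordered = sorted(latencies)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def make_http_fetch(url, timeout=30):
    """Return a zero-argument function that GETs url and reads the body."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    return fetch


# Example (hypothetical core and query):
# fetch = make_http_fetch("http://localhost:8983/solr/mycore/select?q=*:*&rows=0")
# print(summarize(time_calls(fetch)))
```

Running the same script against the 7.2 and 7.3 instances and comparing medians gives a less noisy picture than a single request, though a laptop benchmark remains only indicative.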