RE: Tesseract language

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi again, I have now moved the OCR part to Tika, but I still can't make it work with Danish. It works when using the default language settings, and it seems like Tika is missing the Danish dictionary. My Java code looks like this: { File file = new File(pathfilename); Metadata meta

Re: A different result with filters

2018-10-26 Thread Kydryavtsev Andrey
These two queries are not similar. If you have a parent with two children, "{condition_s:0, price_i: 100}" and "{condition_s:1, price_i: 10}", it will be matched by the first query but not by the second. 26.10.2018, 09:50, "Владислав Властовский" : > Hi, I use 7.5.0 Solr > > Why
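
The parent/children example in the reply above can be checked with a small simulation. This is only a sketch in plain Python, not Solr code: it assumes the first request applies two separate {!parent} filters (as in the original question) and assumes the second request puts both conditions into a single child clause, which is not shown in full in this listing. The names Child, matches_separate_filters and matches_combined_filter are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    condition_s: str
    price_i: int


def matches_separate_filters(children: List[Child]) -> bool:
    # Two independent {!parent} filters: each one only needs SOME child
    # to match, and it may be a different child for each filter.
    has_condition = any(c.condition_s == "0" for c in children)
    has_price = any(c.price_i <= 75 for c in children)
    return has_condition and has_price


def matches_combined_filter(children: List[Child]) -> bool:
    # Assumed shape of the second request: one child clause, so a SINGLE
    # child has to satisfy both conditions at once.
    return any(c.condition_s == "0" and c.price_i <= 75 for c in children)


# The parent from the reply: two children, neither of which meets both conditions.
children = [Child("0", 100), Child("1", 10)]

print(matches_separate_filters(children))
print(matches_combined_filter(children))
```

Under these assumptions the parent is kept by the two separate filters and dropped by the combined clause, which is the difference the reply describes.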

Solr IndexSearcher lifecycle

2018-10-26 Thread Xiaolong Zheng
Hi, I would like to better understand the lifecycle of IndexSearcher in Solr. I understand that Lucene recommends, for IndexSearcher, that “For performance reasons, if your index is unchanging, you should share a single IndexSearcher instance across multiple searches instead of

SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Sofiya Strochyk
Hi everyone, We have a SolrCloud setup with the following configuration: * 4 nodes (3x128GB RAM Intel Xeon E5-1650v2, 1x64GB RAM Intel Xeon E5-1650v2, 12 cores, with SSDs) * One collection, 4 shards, each has only a single replica (so 4 replicas in total), using compositeId router *

Re: A different result with filters

2018-10-26 Thread Kydryavtsev Andrey
There is supposed to be support for the children "filters" attribute in the latest releases. Try it out. Link https://lucene.apache.org/solr/guide/7_3/other-parsers.html#filtering-and-tagging-2 26.10.2018, 16:34, "Владислав Властовский" : > Andrey, ok > > How can I tag the filter then? > > I send: > { >  

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi Tim, Thanks again, I will update Tika and try it again. -Original Message- From: Tim Allison Sent: 26. oktober 2018 12:53 To: solr-user@lucene.apache.org Subject: Re: Reading data using Tika to Solr Ha...emails passed in the ether. As you saw, we added the RecursiveParserWrapper a

Re: A different result with filters

2018-10-26 Thread Владислав Властовский
Andrey, ok How can I tag the filter then? I send: { "query": "*:*", "limit": 1000, "filter": [ "{!parent which=kind_s:edition}condition_s:0 AND {!tag=price}price_i:[* TO 75]" ] } I got: { "error": { "metadata": [ "error-class",

Re: Edismax query returning the same number of results using AND as it does with OR

2018-10-26 Thread Shawn Heisey
Followup: I had a theory that Nicky tested, and I think what was observed confirms the theory. TL;DR: In previous versions, I think there was a bug where the presence of boolean operators caused edismax to ignore the mm parameter, and only rely on the boolean operator(s). After that bug got

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Sofiya Strochyk
Thanks Erick, 1. We already use Solr 7.5, upgraded some of our nodes only recently to see if this eliminates the difference in performance (it doesn't, but I'll test and see if the situation with replicas syncing/recovery has improved since then) 2. Yes, we only open searcher once every 30

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Walter Underwood
The G1 collector should improve 95th percentile performance, because it limits the length of pauses. With the CMS/ParNew collector, I ran very large Eden spaces, 2 GB out of an 8 GB heap. Nearly all of the allocations in Solr have the lifetime of one request, so you don’t want any of those

Re: Tesseract language

2018-10-26 Thread Tim Allison
Tika relies on you to install tesseract and all the language libraries you'll need. If you can successfully call `tesseract testing/eurotext.png testing/eurotext-dan -l dan`, Tika _should_ be able to specify "dan" with your code above. On Fri, Oct 26, 2018 at 10:49 AM Martin Frank Hansen (MHQ)
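
Before involving Tika, it can help to confirm that Tesseract itself sees the "dan" language data. The sketch below assumes the tesseract binary is on the PATH and uses its --list-langs option; the helper names parse_langs and installed_langs are made up for illustration, and the parsing assumes the usual output of one header line followed by one language code per line.

```python
import subprocess
from typing import Set


def parse_langs(output: str) -> Set[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def installed_langs(tesseract_cmd: str = "tesseract") -> Set[str]:
    # Some Tesseract versions print the list on stderr, so merge both streams.
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    langs = installed_langs()
    print("dan installed:", "dan" in langs)
```

If "dan" is missing from the list, the command-line check quoted above would be expected to fail as well, and the language data has to be installed before Tika can use it.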

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Toke Eskildsen
Sofiya Strochyk wrote: > 5. Yes, docValues are enabled for the fields we sort on > (except score which is an internal field); [...] I am currently working on https://issues.apache.org/jira/browse/LUCENE-8374 which speeds up DocValues-operations for indexes with many documents. What "many" means

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread David Hastings
Would adding the docValues in the schema, but not reindexing, cause errors? I.e., only apply the docValues after the next reindex, but in the meantime keep functioning as if there were none until then? On Fri, Oct 26, 2018 at 2:15 PM Toke Eskildsen wrote: > Sofiya Strochyk wrote: > > 5. Yes,

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Erick Erickson
Some ideas: 1> What version of Solr? Solr 7.3 completely rewrote Leader Initiated Recovery and 7.5 has other improvements for recovery; we're hoping that the recovery situation is much improved. 2> In the 7x code line, there are TLOG and PULL replicas. As of 7.5, you can set up so the queries

Re: A different result with filters

2018-10-26 Thread Владислав Властовский
Andrey, thx. You are an expert! Fri, 26 Oct 2018 at 18:40, Kydryavtsev Andrey : > There is supposed to be support for the children "filters" attribute in the latest > releases. Try it out. > > Link > https://lucene.apache.org/solr/guide/7_3/other-parsers.html#filtering-and-tagging-2 > > 26.10.2018,

Re: Tesseract language

2018-10-26 Thread Rohan Kasat
Hi Martin, Are you using it for image formats? I think you can try tess4j and give TESSDATA_PREFIX as the home for the Tesseract configs. I have tried it and it works pretty well on my local machine. I used Java 8 and Tesseract 3 for this. Regards, Rohan Kasat On Fri, Oct 26, 2018 at

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Erick Erickson
Sofiya: I haven't said so before, but it's a great pleasure to work with someone who's done a lot of homework before pinging the list. The only unfortunate bit is that it usually means the simple "Oh, I can fix that without thinking about it much" doesn't work ;) 2. I'll clarify a bit here. Any

Re: LTR features on solr

2018-10-26 Thread Kamuela Lau
I have never done such a thing myself, but I think that dynamic field would probably be the way to go. I've not used it myself, but you might also be able to do what you want with payloads: https://lucene.apache.org/solr/guide/7_5/function-queries.html#payload-function

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Toke Eskildsen
David Hastings wrote: > Would adding the docValues in the schema, but not reindexing, cause > errors? I.e., only apply the docValues after the next reindex, but in the > meantime keep functioning as if there were none until then? As soon as you specify in the schema that a field has docValues=true,

RE: Tesseract language

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi Tim, You were right. When I called `tesseract testing/eurotext.png testing/eurotext-dan -l dan`, I got an error message so I downloaded "dan.traineddata" and added it to the Tesseract-OCR/tessdata folder. Furthermore I added the 'TESSDATA_PREFIX' variable to the path-variables pointing to

Re: A different result with filters

2018-10-26 Thread Владислав Властовский
Emir, no. Fri, 26 Oct 2018 at 10:17, Emir Arnautović : > Hi, > The second query is equivalent to: > > { > > "query": "*:*", > > "limit": 0, > > "filter": [ > > "{!parent which=kind_s:edition}condition_s:0", > > "price_i:[* TO 75]" > > ] > > } > > > HTH, > Emir > -- > Monitoring -

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi Tim, They are msg files, and I added tika-app-1.14.jar to the build path, and now it works. But how do I get it to read the attachments as well? -Original Message- From: Tim Allison Sent: 25. oktober 2018 21:57 To: solr-user@lucene.apache.org Subject: Re: Reading data using Tika to

Re: LTR features on solr

2018-10-26 Thread Kamuela Lau
Hi, Just to confirm, are you asking about the following? For a particular query, you have a list of documents, and for each document, you have data on the number of times the document was clicked on, added to a cart, and ordered, and you would like to use this data for features. Is this correct?

LTR features on solr

2018-10-26 Thread Midas A
Hi All, I am new to implementing Solr LTR, so I am facing a few challenges. Broadly, we have 3 kinds of features: a) based on query, b) based on document, *c) based on query-document, from click, cart and order tracker data.* So my question here is how to store c) type of features - Old

A different result with filters

2018-10-26 Thread Владислав Властовский
Hi, I use 7.5.0 Solr Why do I get two different results for similar requests? First req/res: { "query": "*:*", "limit": 0, "filter": [ "{!parent which=kind_s:edition}condition_s:0", "{!parent which=kind_s:edition}price_i:[* TO 75]" ] } { "response": { "numFound": 453,

Re: A different result with filters

2018-10-26 Thread Emir Arnautović
Hi, The second query is equivalent to: > { > "query": "*:*", > "limit": 0, > "filter": [ >"{!parent which=kind_s:edition}condition_s:0", >"price_i:[* TO 75]" > ] > } HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi again, Never mind, I managed to get the content of the msg files as well, using the following link as inspiration: https://wiki.apache.org/tika/RecursiveMetadata But thanks again for all your help! -Original Message- From: Martin Frank Hansen (MHQ) Sent: 26. oktober 2018 10:14

Re: LTR features on solr

2018-10-26 Thread Midas A
*Thanks for the reply. Please find my answers below inline.* On Fri, Oct 26, 2018 at 2:41 PM Kamuela Lau wrote: > Hi, > > Just to confirm, are you asking about the following? > > For a particular query, you have a list of documents, and for each > document, you have data > on the number of times

Re: Reading data using Tika to Solr

2018-10-26 Thread Tim Allison
IIRC, somewhere btwn 1.14 and now (1.19.1), we changed the default behavior for the AutoDetectParser from skip attachments to include attachments. So, two options: 1) upgrade to 1.19.1 and use the AutoDetectParser or 2) pass an AutoDetectParser via the ParseContext to be used for attachments. If

Re: Reading data using Tika to Solr

2018-10-26 Thread Tim Allison
Ha...emails passed in the ether. As you saw, we added the RecursiveParserWrapper a while back into Tika so no need to re-invent that wheel. That’s my preferred method/format because it maintains metadata from attachments and lets you know about exceptions in embedded files. The legacy method