Re: regarding exposing merge metrics

2018-01-08 Thread Shalin Shekhar Mangar
The merge metrics were enabled by default in 6.4 but they were turned off in 6.4.2 because of large performance degradations. For more details, see https://issues.apache.org/jira/browse/SOLR-10130 On Tue, Jan 9, 2018 at 9:11 AM, S G wrote: > Yes, this is actually

Indexing of MD5 and SHA256 fields

2018-01-08 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 7.2.0. Would like to check, is there a way that we can index the MD5 and SHA256 fields that are being extracted by Tika for EML files? Example: X-TIKA:digest:MD5: 0035b9e8a14bb8ca5dd3c6b63a74a31d X-TIKA:digest:SHA256:

Re: In-place update vs Atomic updates

2018-01-08 Thread kshitij tyagi
Hi Shawn, Thanks for the information. 1. Do in-place updates open a new searcher by themselves or not? 2. As the entire segment is rewritten, does it mean that frequent in-place updates are expensive, as each in-place update will rewrite the entire segment again? Correct me here if my understanding is

Re: Profanity

2018-01-08 Thread John Blythe
Gladly. Good luck! On Mon, Jan 8, 2018 at 8:27 PM Sadiki Latty wrote: > Thanks for the feedback John, > > This is a genius idea if I don’t want to create my own processor. I could > simply check that field for data for my reports. Either the field will have > data or it

Re: regarding exposing merge metrics

2018-01-08 Thread S G
Yes, this is actually confusing and the documentation ( https://lucene.apache.org/solr/guide/7_2/metrics-reporting.html) does not help either: *Index Merge Metrics* : These metrics are collected in respective registries for each core (e.g., solr.core.collection1…​.), under the INDEX category.

Re: Profanity

2018-01-08 Thread Sadiki Latty
Thanks for the feedback John, This is a genius idea if I don’t want to create my own processor. I could simply check that field for data for my reports. Either the field will have data or it won’t. Thanks Sid Sent from my iPhone > On Jan 8, 2018, at 4:38 PM, John Blythe

Re: Profanity

2018-01-08 Thread Sadiki Latty
Thanks a lot guys. Multilingual will also be a hurdle tbh. The data will only be coming from 2 languages but it will prove to be potentially challenging nonetheless. French and English so “merde” will be making that list. This requirement is in itself an edge case for my project so ML may be

Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
OK just restarting all the solr nodes did fix it, since they are in production I was hesitant to do that From: Petersen, Robert (Contr) Sent: Monday, January 8, 2018 12:34:28 PM To: solr-user@lucene.apache.org Subject: solr 5.4.1 leader

Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
Perhaps I didn't explain well: three nodes are live. Two are in recovering mode, the exception being that they can't get to the Leader because the Leader replies that he is not the leader. On the dashboard it shows him as the leader but he thinks he isn't. The exceptions are below... Do I have to just restart

RE: Profanity

2018-01-08 Thread Markus Jelsma
Indeed, hence the small suggestion to use ML for this instead of a dumb set of terms, which is useless in almost any real solution. We have had very good results with SVM's for text processing, although in the end it depends on your input data, and the care for selecting edge cases. Regards,

RE: Profanity

2018-01-08 Thread Davis, Daniel (NIH/NLM) [C]
Fun topic. Same complicated issues as normal search: Multilingual support? Is "Merde" profanity too, or just in French? Multi-word synonyms? Does "God Damn" become "goddamn", or do you treat "Damn" and "God damn" the same because you drop "God"

regarding exposing merge metrics

2018-01-08 Thread suresh pendap
Hi, I am following the instructions from https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html in order to expose the Index merge related metrics. The document says that we have to add the below snippet in order to expose the merge metrics ... 524288 true

RE: Profanity

2018-01-08 Thread Markus Jelsma
Yes, an UpdateRequestProcessor is the API to implement for these sorts of requirements. In the URP you have access to a SolrDocument object that carries the input data. You can inspect the fields, and add, remove or modify fields if you want, or discard the input altogether. So, check your

Re: Profanity

2018-01-08 Thread John Blythe
you could use the keepwords functionality. have a field that only keeps profanity and then you can query against that field having its default value vs. profane text -- John Blythe On Mon, Jan 8, 2018 at 3:12 PM, Sadiki Latty wrote: > Hey > > I would like to find a solution

Profanity

2018-01-08 Thread Sadiki Latty
Hey I would like to find a solution to flag profanity at index time. Ideally, it would function similarly to stopwords, in the sense that I can have a predefined list that is read, and if a token is on the list, that document is 'flagged' in a different field. Does anyone know of
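
The list-based flagging described in this message can be sketched in plain Java. This is only an illustration of the idea, not Solr code: the class `ProfanityFlagger`, the field names `body` and `has_profanity`, and the use of a `Map` as a stand-in for a document are all assumptions made for the example. In Solr itself the same check would live in an UpdateRequestProcessor or an analysis chain, as the replies in this thread suggest.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: flags a document when any token of a source field
// appears on a predefined word list. A Map stands in for the document.
class ProfanityFlagger {
    private final Set<String> blocked = new HashSet<>();

    ProfanityFlagger(Collection<String> terms) {
        // Normalise the list once so lookups are case-insensitive.
        for (String t : terms) {
            blocked.add(t.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the listed terms found in the text, in order of appearance.
    List<String> matches(String text) {
        List<String> hits = new ArrayList<>();
        // Split on anything that is not a letter; this is a crude tokenizer.
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!tok.isEmpty() && blocked.contains(tok)) {
                hits.add(tok);
            }
        }
        return hits;
    }

    // Returns a copy of the document with a boolean flag field added.
    Map<String, Object> flag(Map<String, Object> doc, String sourceField, String flagField) {
        Map<String, Object> out = new HashMap<>(doc);
        Object value = doc.get(sourceField);
        List<String> hits = value == null
                ? Collections.<String>emptyList()
                : matches(value.toString());
        out.put(flagField, !hits.isEmpty());
        return out;
    }
}
```

A single-token list like this cannot handle multi-word phrases or context, which is the limitation the later replies raise when they suggest a classifier instead.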

Re: solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
I'm on zookeeper 3.4.8 From: Petersen, Robert (Contr) Sent: Monday, January 8, 2018 12:34:28 PM To: solr-user@lucene.apache.org Subject: solr 5.4.1 leader issue Hi got two out of my three servers think they are replicas on one shard

solr 5.4.1 leader issue

2018-01-08 Thread Petersen, Robert (Contr)
Hi, two out of my three servers think they are replicas on one shard and are getting exceptions. What is the easiest way to fix this? Can I just restart zookeeper across the servers? Here are the exceptions: TY Robi ERROR null RecoveryStrategy Error while trying to recover.

SSL configuration with Master/Slave

2018-01-08 Thread Sundaram, Dinesh
Team, I'm facing an SSL issue while configuring Master/Slave. Master runs fine alone with SSL and Slave runs fine alone with SSL, but I'm getting an SSL exception during the sync. It gives the below error. I believe we need to trust the target server at the source. Can you give me the steps to allow

Re: Limit search queries only to pull replicas

2018-01-08 Thread Tomas Fernandez Lobbe
This feature is not currently supported. I was thinking of implementing it by extending the work done in SOLR-10880. I haven't had time to work on it yet, though. There is a patch for SOLR-10880 that doesn’t implement support for replica types, but could be used as a base. Tomás > On Jan 8,

Re: Newbie Question

2018-01-08 Thread Deepak Goel
Got it . Thank You for your help Deepak "Please stop cruelty to Animals, help by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in/deicool "Plant a Tree, Go Green" On Mon, Jan 8, 2018 at 11:48 PM, Deepak Goel

Re: Newbie Question

2018-01-08 Thread Deepak Goel
*Is this right?* SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/shakespeare/select").build(); SolrQuery query = new SolrQuery(); query.setQuery("henry"); query.setFields("text_entry"); query.setStart(0); queryResponse =

Re: Newbie Question

2018-01-08 Thread Alexandre Rafalovitch
I think you are missing /query handler endpoint in the URL. Plus actual search parameters. You may try using the admin UI to build your queries first. Regards, Alex On Jan 8, 2018 12:23 PM, "Deepak Goel" wrote: > Hello > > *I am trying to search for documents in my

Newbie Question

2018-01-08 Thread Deepak Goel
Hello *I am trying to search for documents in my collection (Shakespeare). The code is as follows:* SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/shakespeare").build(); SolrDocument doc = client.getById("2"); *However this does not return any document. What mistake

Re: docValues with stored and useDocValuesAsStored

2018-01-08 Thread Shalin Shekhar Mangar
Hi Bernd, If Solr can fetch a field from both stored and docValues then it chooses docValues only if such field is single-valued and that allows Solr to avoid accessing the stored document altogether for *all* fields to be returned. Otherwise stored values are preferred. This is the behavior

Re: In-place update vs Atomic updates

2018-01-08 Thread Shawn Heisey
On 1/8/2018 4:05 AM, kshitij tyagi wrote: What are the major differences between atomic and in-place updates, I have gone through the documentation but it does not give detail internal information. Atomic updates are nearly identical to simple indexing, except that the existing document is

Re: SolrJ with Async Http Client

2018-01-08 Thread Emir Arnautović
Not sure if it aligns with your expectations, but here is something that is declared as “async solr client”: https://github.com/inoio/solrs HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Re: Personalized search parameters

2018-01-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I'm assuming that you are writing the cosine similarity and you have two vectors containing the pairs . The two vectors could have different sizes because they only contain the terms that have tfidf != 0. if you want to compute cosine similarity between the two lists you just have
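
A minimal sketch of the computation described here, assuming each vector is held as a map from term to tf-idf weight with zero-weight terms simply absent. The class and method names are illustrative and do not come from the original message or from Solr. Because the maps are sparse, only terms present in both contribute to the dot product, so the two vectors may have different sizes.

```java
import java.util.Map;

// Illustrative helper: cosine similarity between two sparse tf-idf vectors.
class SparseCosine {

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look terms up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;

        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // term appears in both vectors
            }
        }

        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0; // convention chosen here for empty vectors
        }
        return dot / (na * nb);
    }
}
```

Terms missing from one map are treated as weight 0, which is equivalent to padding both vectors to the full vocabulary before taking the dot product.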

In-place update vs Atomic updates

2018-01-08 Thread kshitij tyagi
Hi, What are the major differences between atomic and in-place updates? I have gone through the documentation but it does not give detailed internal information. 1. Does doing an in-place update prevent a Solr cache burst or not? What are the benefits of using in-place updates? I want to update one

docValues with stored and useDocValuesAsStored

2018-01-08 Thread Bernd Fehling
What is the precedence when docValues with stored=true is used? e.g. My guess, because of useDocValuesAsStored=true is default, that stored=true is ignored and the values are pulled from docValues. And only if useDocValuesAsStored=false is explicitly used then stored=true comes into play. Or

Re: Limit search queries only to pull replicas

2018-01-08 Thread Ere Maijala
Server load alone doesn't always indicate the server's ability to serve queries. Memory and cache state are important too, and they're not as easy to monitor. Additionally, server load at any single point in time or a short term average is not indicative of the server's ability to handle