Re: Database logins and active sessions

2018-02-07 Thread Shawn Heisey
On 2/7/2018 11:40 PM, Srinivas Kashyap wrote: We have configured Solr index server on tomcat and fetch the data from database to index the data. We have implemented delta query indexing based on modify_ts. What version of Solr? Just as an FYI: Since version 5.0, running in user-provided

Re: Design Question

2018-02-07 Thread Emir Arnautović
Hi Deepthi, Is dictionary static? Can value for some id change? If static, and if query performance matters to you, the best and also the simplest solution is to denormalise data and store dictionary values with docs. Alternative is to use join query parser:

Design Question

2018-02-07 Thread Deepthi P
I have a dictionary of 2 IDs and their descriptions, which is stored in a collection. There is another Solr collection in which each document has 10 or more IDs (multi-valued field). I would like to text search in the dictionary and bring back the matched IDs and search these IDs in solr

Database logins and active sessions

2018-02-07 Thread Srinivas Kashyap
Hello, We have configured Solr index server on tomcat and fetch the data from database to index the data. We have implemented delta query indexing based on modify_ts. In our data-config.xml we have a parent entity and 17 child entities. We have 18 such Solr cores. When we call delta-import on a

Fwd: Design Question

2018-02-07 Thread Deepthi P
I have a dictionary of 2 IDs and their descriptions, which is stored in a collection. There is another Solr collection in which each document has 10 or more IDs (multi-valued field). I would like to text search in the dictionary and bring back the matched IDs and search these IDs in solr

Normalizing payload values

2018-02-07 Thread Shreya Kampli
Hi, I am using the Payload Score parser as below: {!payload_score f=field v=$q func=max includeSpanScore=true}. The issue is that the payload value in this field is around the range 1-1. Due to this, the boosts added to other fields are never effective as maximum of the

Hard commits blocked | non-solrcloud v6.6.2

2018-02-07 Thread mmb1234
I am seeing that after some time hard commits in all my Solr cores stop, and each one's searcher has an "opened at" date hours in the past, even though the cores continue to ingest data successfully (index size increasing continuously).

Re: Bi Gram token generation with fuzzy searches

2018-02-07 Thread Sravan Kumar
@Emir: The 'sow' parameter in edismax along with the nested query '_query_' works. Tuning has to be done for desired relevancy. @Walter: It would be nice to have SOLR-629 integrated into the project. As Emir suggested, _query_ caters to my need by applying fuzzy parameter to the query.

Re: Best Practice about solr cloud schema

2018-02-07 Thread Erick Erickson
It can pretty much be used as-is, _except_ you'll find one or more entries in your request handlers like: _text_ Change "_text_" to something in your schema, that's the default search field if you don't field-qualify your search terms. Note that if you take out, for instance, all of your

Re: Best Practice about solr cloud schema

2018-02-07 Thread Pratik Patel
Hey Erick, thanks for the clarification! What about the solrconfig.xml file? Sure, it should be customized to suit one's needs, but can it be used as a base or is it best to create one from scratch? Thanks, Pratik On Wed, Feb 7, 2018 at 5:29 PM, Erick Erickson wrote: >

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Erick Erickson
Agree with Walter, this is seeming like an XY problem. Also, Solr does _not_ implement strict boolean logic, see: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/ Best, Erick On Wed, Feb 7, 2018 at 1:49 PM, Walter Underwood wrote: > I understand what you are

Re: Best Practice about solr cloud schema

2018-02-07 Thread Erick Erickson
That's really the point of the default managed-schema, to be a base you use for your customizations. In fact, I often _remove_ most of the fields (and especially fieldTypes) that I don't need. This includes dynamic fields, copyFields and the like. Sometimes it's actually easier, though, to just

Best Practice about solr cloud schema

2018-02-07 Thread Pratik Patel
Hello all, I have added some fields to default managed-schema file. I was wondering if it is safe to take default managed-schema file as is and add your own fields to it in production. What is the best practice for this? As I understand, it should be safe to use default schema as base if

Re: can you migrate solr index files from osx to linux

2018-02-07 Thread Jeff Dyke
I forgot to report back on this. For anyone that runs into it, you need the entire data directory not just the index directory, at least that's what made it work for me. On Thu, Feb 1, 2018 at 9:52 PM, Erick Erickson wrote: > I think SCP will be fine. Shawn's comment

Judging the MoreLikeThis results for relevancy

2018-02-07 Thread Arnold Bronley
Hi, I am using MoreLikeThis handler to get related documents for a given document. To determine if I am getting good results or not, here is what I do: The same original document should be returned as a top match. If it is not, then there is some problem with the relevancy. Then, as same input

Re: Spellcheck collations results

2018-02-07 Thread Arnold Bronley
Thanks for replying Alessandro. I am passing these parameters: q=polt=polt=json=true=true=7=true=true=true=3=3=true=0.72 On Thu, Jan 25, 2018 at 4:28 AM, alessandro.benedetti wrote: > Can you tell us the request parameters used for the spellcheck ? > > In particular

Relevancy Tuning For Solr With Apache Nutch 2.3

2018-02-07 Thread Mukhopadhyay, Aratrika
Hello, I am attempting to tune the results I retrieve from Solr to boost the importance of certain fields. The syntax of the query I am using is as follows:

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Walter Underwood
I understand what you are asking for. Solr doesn’t work like that. Solr is not a programming language. Short-circuit evaluation isn’t especially useful for a search engine. Most of the work is fetching and uncompressing the posting lists. Calculating the score for each document is pretty fast.

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread bbarani
Walter, It's just that I have a use case (to evaluate one field over other) for which I am trying out multiple solutions in order to avoid making multiple calls to SOLR. I am trying to do a Short-circuit evaluation. Short-circuit evaluation, minimal evaluation, or McCarthy evaluation (after

Solr Autoscaling multi-AZ rules

2018-02-07 Thread Jeff Wartes
I’ve been messing around with the Solr 7.2 autoscaling framework this week. Some things seem trivial, but I’m also running into questions and issues. If anyone else has experience with this stuff, I’d be glad to hear it. Specifically: Context: -One collection, consisting of 42 shards, where

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Walter Underwood
You don’t get to control the order of execution, other than specifying a filter query. I think you have the wrong mental model of how Solr does search. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 7, 2018, at 1:28 PM, bbarani

Payload fields

2018-02-07 Thread Brian Yee
Hello, I am trying to use Payload fields to store per-zone delivery dates for products. I have an index where my documents are products and for each product we want to store a date by when we can deliver that product for 1-100 different zones. Since the payload() function only supports int and

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread bbarani
Thanks Erick. I will check this out. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread bbarani
You are right. I don't care about the score; rather, I want a document containing a specific term in a specific field to be evaluated first before checking the next field. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: MODIFYCOLLECTION via Solrj

2018-02-07 Thread Erick Erickson
Yeah, sometimes the sugar methods/classes in SolrJ lag a bit behind the Collections API, but at root about all these classes do is create a ModifiableSolrParams with all the params you'd specify and make an HTTP call via the AsyncCollectionAdminRequest.process command, last I knew. Best, Erick
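
As a rough illustration of the point that the helper classes only assemble parameters for an HTTP call, the sketch below builds a MODIFYCOLLECTION request URL directly. It is written in Python rather than SolrJ, and the base URL and the collection name "mycollection" are assumptions made up for the example, not values from the thread.

```python
from urllib.parse import urlencode


def modify_collection_url(base_url, collection, **attributes):
    """Build a Collections API MODIFYCOLLECTION URL.

    base_url and collection are placeholders; attributes are the
    collection properties to change (e.g. replicationFactor).
    """
    params = {"action": "MODIFYCOLLECTION", "collection": collection}
    params.update(attributes)
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = modify_collection_url(
    "http://localhost:8983/solr", "mycollection", replicationFactor=2
)
print(url)
```

The resulting URL can be sent with any HTTP client; in SolrJ the same parameters would go into a ModifiableSolrParams instance.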

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Erick Erickson
If you don't care about its contribution to scoring, one option is to move the clause you want evaluated to an fq clause with {!cache=false cost=101}. See: http://yonik.com/advanced-filter-caching-in-solr/ Best, Erick On Wed, Feb 7, 2018 at 12:05 PM, Emir Arnautović
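
A minimal sketch of what such a request could look like, assuming the field names searchTerms and matchStemming from the original question in this thread. It only shows how the local params are attached to the fq parameter. It does not make Solr short-circuit anything; the filter clause simply stays out of scoring and out of the filter cache.

```python
from urllib.parse import urlencode, parse_qs


def build_query(main_query, filter_clause, cost=101):
    """Return a URL-encoded query string with a non-cached, costed fq."""
    local_params = f"{{!cache=false cost={cost}}}"
    return urlencode({"q": main_query, "fq": local_params + filter_clause})


# Field names are taken from the question; the values are illustrative.
qs = build_query('matchStemming:"stemming"', 'searchTerms:"testing"')
print(qs)

# Decoding again shows the raw fq value that Solr would receive.
decoded_fq = parse_qs(qs)["fq"][0]
print(decoded_fq)
```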

Re: Solr Swap space

2018-02-07 Thread Shawn Heisey
On 2/7/2018 12:01 PM, Susheel Kumar wrote: Just trying to find where we set the swap space available to the Solr process. I see in our 6.0 instances it was set to 2GB, and on 6.6 instances it's set to 16GB. Solr has absolutely no involvement or control over swap space. Neither does Java. This

Re: Solr Swap space

2018-02-07 Thread Emir Arnautović
Hi Susheel, Swap space is an OS thing, not a Solr thing. You should see how to disable swap space or at least set swappiness to some low number on your OS. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/
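
To check the current setting, a small sketch like the following may help. It assumes a Linux host, where the value is exposed at /proc/sys/vm/swappiness; other operating systems configure swap differently, and the function simply returns None there.

```python
from pathlib import Path


def parse_swappiness(text):
    """Parse the contents of the swappiness file into an int."""
    return int(text.strip())


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value, or None if unavailable."""
    p = Path(path)
    if not p.is_file():
        return None  # not Linux, or /proc is not mounted
    return parse_swappiness(p.read_text())


print(read_swappiness())
```

Changing the value is an OS administration task (for example through sysctl on Linux) and is independent of Solr and Java.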

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Emir Arnautović
Hi, Also note that the score is different if only one term matches and if both terms are matched. Your case would make sense if you do not plan to order by score, but as Walter explained, Solr does not go document by document and evaluate query conditions, but it gets list of documents matching each

DataWorks Summit San Jose -Call For Abstract closes this Friday

2018-02-07 Thread Ana Castro
Hi Folks, This Friday is the last day to submit abstracts and talks in and around Solr and Big Data Search. Could you please help reach out to other people in the Solr community to get the word out? Regards, Ana Castro Hi Folks, DataWorks Summit San

Solr Swap space

2018-02-07 Thread Susheel Kumar
Hello, Just trying to find where we set the swap space available to the Solr process. I see in our 6.0 instances it was set to 2GB, and on 6.6 instances it's set to 16GB. Thanks, Susheel

Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Walter Underwood
That doesn’t really make sense for Solr query evaluation. It fetches the posting lists for each term, then walks through them evaluating the query against all the documents. It can skip a document as soon as it fails the query, but it still has to fetch the posting lists. So, that feature

MODIFYCOLLECTION via Solrj

2018-02-07 Thread Hendrik Haddorp
Hi, I'm unable to find how I can do a MODIFYCOLLECTION via Solrj. I would like to change the replication factor of a collection but can't find it in the Solrj API. Is that not supported? regards, Hendrik

How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread bbarani
I am trying to figure out a way to form boolean (||) query in SOLR. Ideally my expectation is that with boolean operator ||, if first term is true second term shouldn't be evaluated. =searchTerms:"testing" || matchStemming:"stemming" works same as =searchTerms:"testing" OR

Highlighting over date fields

2018-02-07 Thread LOPEZ-CORTES Mariano-ext
Is it possible to use highlighting over date fields? We've tried, but we got no highlighting response for the field.

Re: Long GC Pauses

2018-02-07 Thread Shawn Heisey
On 2/7/2018 8:08 AM, Shawn Heisey wrote: If your queries are producing the correct results, then I will tell you that the "summary" part of your query example is quite possibly completely unnecessary After further thought, I have concluded that this part of what I said is probably completely

Re: Bi Gram token generation with fuzzy searches

2018-02-07 Thread Walter Underwood
I think you need the feature in SOLR-629 that adds fuzzy to edismax. https://issues.apache.org/jira/browse/SOLR-629 The patch on that issue is for Solr 4.x, but I believe someone is working on a new patch. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: Long GC Pauses

2018-02-07 Thread Shawn Heisey
On 2/7/2018 5:20 AM, Maulin Rathod wrote: Further analyzing issue we found that asking for too many rows (e.g. rows=1000) can cause full GC problem as mentioned in below link. This is because when you ask for 10 million rows, Solr allocates a memory structure capable of storing

Re: SynonymGraphFilterFactory with WordDelimiterGraphFilterFactory usage

2018-02-07 Thread Steve Rowe
Thanks Webster, I created https://issues.apache.org/jira/browse/SOLR-11955 to work on this. -- Steve www.lucidworks.com > On Feb 6, 2018, at 2:47 PM, Webster Homer wrote: > > I noticed that in some of the current example schemas that are shipped with > Solr, there is a

Re: Multiple consecutive wildcards (**) causes Out-of-memory

2018-02-07 Thread Bjarke Buur Mortensen
Just to clarify: I can only cause this to happen when using the complexphrase query parser. Lucene/dismax/edismax parsers are not affected. 2018-02-07 13:09 GMT+01:00 Bjarke Buur Mortensen : > Hello list, > > Whenever I make a query for ** (two consecutive wildcards) it

Re: Long GC Pauses

2018-02-07 Thread Ere Maijala
Hi Maulin, I'll chime in by referring to my own findings when analyzing Solr performance: https://www.mail-archive.com/solr-user@lucene.apache.org/msg135857.html Yonik has a good article about paging: http://yonik.com/solr/paging-and-deep-paging/. While it's about deep paging, the same
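
For the large-rows problem discussed in this thread, one commonly used alternative is to fetch results in small pages with Solr's cursorMark parameter. The sketch below is only an outline under stated assumptions: the sort field "id" is assumed to be the collection's uniqueKey, and fetch_page is a hypothetical callable that sends the parameters to Solr and returns the parsed JSON response. An in-memory stand-in is used here so the example runs without a server.

```python
def iter_docs(fetch_page, query, sort="id asc", rows=100):
    """Yield all matching docs, one small page at a time, via cursorMark."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor


# In-memory stand-in for a Solr request (hypothetical, for illustration).
DATA = [{"id": i} for i in range(5)]


def fake_fetch(params):
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = DATA[start:start + params["rows"]]
    next_mark = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": next_mark}


docs = list(iter_docs(fake_fetch, "*:*", rows=2))
print(len(docs))
```

Each request asks for only a small number of rows, so no single request needs a result structure sized for millions of documents.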

RE: Long GC Pauses

2018-02-07 Thread Maulin Rathod
Hi Erick, Thanks for your response. It shows GC pauses in Solr GC logs (refer below solr gc log where it shows 138.4138211 sec pause). Seems like some bad query causes high memory allocation. Further analyzing issue we found that asking for too many rows (e.g. rows=1000) can cause

Multiple consecutive wildcards (**) causes Out-of-memory

2018-02-07 Thread Bjarke Buur Mortensen
Hello list, Whenever I make a query for ** (two consecutive wildcards) it causes my Solr to run out of memory. http://localhost:8983/solr/select?q=** Why is that? I realize that this is not a reasonable query to make, but the system supports input from users, and they might by accident input

Re: Bi Gram token generation with fuzzy searches

2018-02-07 Thread Emir Arnautović
Hi Sravan, Edismax has a ’sow’ parameter that results in edismax passing the query to field analysis, but I'm not sure how it will work with fuzzy search. What you might do is use _query_ syntax to separate shingle and non-shingle queries, e.g. q=_query({!edismax sow=false qf=title_bigrams}$v) OR

Bi Gram token generation with fuzzy searches

2018-02-07 Thread Sravan Kumar
We have the following two fields for our movie title search - title without symbols a custom analyser with WordDelimiterFilterFactory, SynonymFilterFactory and other filters to retain only alpha numeric characters. - title with word bi grams a custom analyser with solr.ShingleFilterFactory to

Re: 9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-07 Thread mmb1234
> Maybe this is the issue: https://github.com/eclipse/jetty.project/issues/2169 Looks like it is the issue. (I've redacted IP addresses below for security reasons) solr [ /opt/solr ]$ netstat -ptan | awk '{print $6 " " $7 }' | sort | uniq -c 8425 CLOSE_WAIT - 92 ESTABLISHED - 1