Reading data from Oracle

2018-02-14 Thread LOPEZ-CORTES Mariano-ext
Hello, we have to delete our Solr collection and feed it periodically from an Oracle database (up to 40M rows). We've done the following test: from a Java program, we read chunks of data from Oracle and inject them into Solr (via SolrJ). The problem: it is really, really slow (1.5 nights). Is there
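The chunked read-and-inject loop described in this message can be sketched roughly as follows. This is only an illustration in Python, not the poster's Java/SolrJ program: `chunked`, `feed`, and the `send_batch` callback are hypothetical names, and the batch size of 1000 is an arbitrary assumption.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def feed(rows, send_batch, size=1000):
    """Send rows in batches via the hypothetical `send_batch` callback.

    Returns the total number of rows sent.
    """
    total = 0
    for batch in chunked(rows, size):
        send_batch(batch)  # stand-in for one bulk add request to the index
        total += len(batch)
    return total
```

In a real feeder, `rows` would be a database cursor and `send_batch` would wrap the indexing client; both are left abstract here.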

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David
I have re-run both scenarios and captured the total size of each type of index file. The MB (1) column is for the baseline scenario which has the smaller index and acceptable performance. The MB(2) column is after I have added the extra field to the index. Ext MB (1) MB (2)

Re: facet.method=uif not working in solr cloud?

2018-02-14 Thread Wei
Thanks Yonik. If uif has a big upfront cost when a request hits Solr the first time, then in SolrCloud the same faceting request could hit different replicas in the same shard, so that cost will be incurred at least once per replica? If we are doing frequent auto commits, fieldvaluecache will be invalidated

Using dynamic synonyms file

2018-02-14 Thread Roopa Rao
Hi, Is it possible to specify the synonyms file as a variable, set a default synonym file and passing the file name from the request? If so, is there an example of this? Such as, Thanks, Roopa

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao
I see okay, thank you. On Wed, Feb 14, 2018 at 10:34 AM, Alessandro Benedetti wrote: > I see, > According to what I know it is not possible to run for the same field > different query time analysis. > > Not sure if anyone was working on that. > > Regards

Re: facet.method=uif not working in solr cloud?

2018-02-14 Thread Yonik Seeley
On Wed, Feb 14, 2018 at 2:28 PM, Wei wrote: > Thanks all! It's really great learning. A bit off the topic, after I > enabled facet.method = uif in solr cloud, the faceting performance is > actually much worse than the original fc( ~1000 ms with uif vs ~200 ms > with fc).

Re: facet.method=uif not working in solr cloud?

2018-02-14 Thread Wei
Thanks all! It's really great learning. A bit off the topic, after I enabled facet.method = uif in solr cloud, the faceting performance is actually much worse than the original fc( ~1000 ms with uif vs ~200 ms with fc). My cloud has 8 shards with 6 replicas in each shard. I do see that

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
You are right, in my case this field type was applied to many text fields. These include many copy fields and dynamic fields as well. In my case, only specifying omitNorms=true for field type "text_general" fixed the issue. I didn't do anything else and didn't have any other bug. On Wed, Feb 14, 2018 at

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Alessandro Benedetti
Hi Pratik, how is it possible that just the norms for a single field were causing such a massive index size increase in your case? In your case I think it was for a field type used by multiple fields, but it's still suspicious in my opinion; norms shouldn't be that big. If I remember correctly in

Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-14 Thread Yonik Seeley
Could you provide the full stack trace containing "Invalid Date String" and the full request that causes it? Are you using any custom code/plugins in Solr? -Yonik On Mon, Feb 12, 2018 at 4:55 PM, Antelmo Aguilar wrote: > Hi, > > I was using the following part of a query to get

Re: Issue Using JSON Facet API Buckets in Solr 6.6

2018-02-14 Thread Antelmo Aguilar
Hello, I just wanted to follow up on this issue I am having in case it got lost. I have been trying to figure this out and so far the only solution I can find is using the older version. If you need more details from me, please let me know. I would really appreciate any help. Best, Antelmo On

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Erick Erickson
Pratik may have jumped right to the difference. We'd have gotten there eventually by looking at file extensions, but just checking his recommendation would be the first thing to do! bq: what would be the right scenarios to use docvalues='true'? Whenever you want to facet, group or sort on the

Solr Recommended setup

2018-02-14 Thread Wael Kader
Hi, I would like to get a recommendation for the SOLR setup I have. I have an index getting around 2 Million records per day. The index used is in Cloudera Search (Solr). I am running everything on one node. I run SOLR commits for whatever data that comes to the index every 5 minutes. The whole

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Alessandro Benedetti
I see, According to what I know it is not possible to run for the same field different query time analysis. Not sure if anyone was working on that. Regards - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from:

Re: Replicas: sending query to leader and replica simultaneously

2018-02-14 Thread SOLR4189
Thank you, Emir, for your answer. *But it will not send request to multiple replicas - that would be a waste of resources.* What if a server is overloaded but still responsive? Then it would not be a waste of resources, because the second replica would respond faster than the overloaded replica. *and flag

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao
So, I would end up with ~6 copy fields with ~8 synonym files, so that would be about 48 field/synonym combinations. Would that be significant in terms of index size? What would be the best way to measure this? Custom parser: This would take the file name, field to run the analysis on. This field

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao
So, I would end up with ~6 copy fields with ~8 synonym files, so that would be about 48 field/synonym combinations. Would that be significant in terms of index size? I guess that depends on the thesaurus size; what would be the best way to measure this? Custom parser: This would take the file

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
I had a similar issue with index size after upgrading to version 6.4.1 from 5.x. The issue for me was that the field which caused the index size to increase disproportionately had a field type ("text_general") for which the default value of omitNorms was not true. Turning it on explicitly on field

Solr CDCR doesn't work if the authentication is enabled

2018-02-14 Thread dimaf
I set up CDCR in my test environment and it worked perfectly until I uploaded security.json files to the Zookeeper clusters of the Target and Source SolrClouds. The security.json files are identical for both Clouds, as are the collection names. The Source has the following errors:

Re: Not getting appropriate spell suggestions

2018-02-14 Thread Alessandro Benedetti
Given your schema, the stemmer seems a very likely culprit. You need to disable it and re-index. Just commenting it out is not going to work if you don't re-index. Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io --

Re: Replicas: sending query to leader and replica simultaneously

2018-02-14 Thread Emir Arnautović
Hi, Solr will load-balance across replicas and, if one is unresponsive, send the request to another and flag the unresponsive one. But it will not send a request to multiple replicas - that would be a waste of resources. If you want something like that, you would probably have to set up two separate clusters and send
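The behaviour described in this reply (one replica per request, failover on an unresponsive replica, and flagging the failed one) can be illustrated with a toy model. This is a sketch only and is not Solr's actual implementation: `ToyLoadBalancer`, the round-robin choice, and the `send` callback are assumptions made for the example.

```python
import itertools


class ToyLoadBalancer:
    """Toy model: each request goes to one replica at a time, never to several at once."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.flagged = set()  # replicas marked unresponsive
        self._counter = itertools.count()

    def query(self, send):
        """Call `send(replica)`, which returns a response or raises ConnectionError."""
        healthy = [r for r in self.replicas if r not in self.flagged]
        if not healthy:
            raise RuntimeError("no responsive replica")
        # Round-robin starting point among the replicas not yet flagged.
        start = next(self._counter) % len(healthy)
        for replica in healthy[start:] + healthy[:start]:
            try:
                return send(replica)
            except ConnectionError:
                # Flag the unresponsive replica and fail over to the next one.
                self.flagged.add(replica)
        raise RuntimeError("no responsive replica")
```

Note that an overloaded but still responsive replica is never flagged in this model, which is exactly the case raised in the follow-up question in this thread.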

Re: Solr - Managed Resources REST API to get stopwords

2018-02-14 Thread Alessandro Hoss
So are you saying this REST api can't give me access to stopwords defined in this file? Is there a query which will give me stopwords defined in server\solr\collection\conf\lang\stopwords_en.txt file ? No, the managed resources are managed via API, and stored in a "

RE: Solr search word NOT followed by another word

2018-02-14 Thread Allison, Timothy B.
In process, should finish by end of this week. I had to put SlowFuzzyQuery back in, and I discovered SOLR-11976 while trying to upgrade. I'll have to do a workaround until that is fixed. -Original Message- From: simon [mailto:mtnes...@gmail.com] Sent: Monday, February 12, 2018 1:21

Re: Solr - Managed Resources REST API to get stopwords

2018-02-14 Thread ruby
I was hoping to get back the list of stopwords which are defined in server\solr\collection\conf\lang\stopwords_en.txt file. So are you saying this REST api can't give me access to stopwords defined in this file? Is there a query which will give me stopwords defined in

Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread Emir Arnautović
Hi Ganesh, I cannot confirm for sure, but I would assume that the document will not get reindexed; only the segment's doc values file will be rewritten. It is best if you test this and see for yourself. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting

Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread mganeshs
Hi Emir, Thanks for confirming that strField is not considered / available for in place updates. As per documentation, it says... *An atomic update operation is performed using this approach only when the fields to be updated meet these three conditions: are non-indexed (indexed="false"),

Not getting appropriate spell suggestions

2018-02-14 Thread Jaimin Patel
Here is a link to solrconfig.xml and managed-schema. I have a document with the following value in suggestions (the field used for spell check), same information as

RE: Solr search word NOT followed by another word

2018-02-14 Thread ivan
Hi Timothy, I'm trying to use your Parser, but I'm having some trouble with the versions of solr\lucene. I'm trying to use version 6.4.1 but I'm facing a lot of incompatibilities with version 5. Is there any updated version of the plugin? -- Sent from:

Re: Limit search queries only to pull replicas

2018-02-14 Thread Ere Maijala
I've now posted https://issues.apache.org/jira/browse/SOLR-11982 with a patch. It works just like preferLocalShards. SOLR-10880 is awesome, but my idea is not to filter out anything, so this just adjusts the order of nodes. --Ere Tomas Fernandez Lobbe wrote on 8.1.2018 at 21.42: This
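The distinction drawn here between filtering nodes out and merely adjusting their order can be sketched as a stable sort. This is an illustration of the idea only, not the SOLR-11982 patch: the function name `prefer_replica_types` and the dictionary layout of a node are hypothetical.

```python
def prefer_replica_types(nodes, preferred_types):
    """Return every node, with preferred replica types moved to the front.

    Nothing is filtered out: non-preferred nodes stay available as fallbacks.
    `sorted` is stable, so the original order is kept within each group.
    """
    rank = {t: i for i, t in enumerate(preferred_types)}
    return sorted(nodes, key=lambda n: rank.get(n["type"], len(rank)))
```

For example, with nodes of types NRT, PULL, TLOG, PULL and a preference for PULL, both PULL nodes come first and the NRT and TLOG nodes follow in their original order.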

Re: docvalues set to true, and indexed is false and stored is set to false

2018-02-14 Thread Emir Arnautović
Hi Ganesh, Doc values are enabled for strField and UUID, but in-place updates are not. It is not free: according to some discussions on the mailing list (I did not check the code), an in-place update does not update a single value in the doc values file but rewrites the doc values file for the segment that it

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Alessandro Benedetti
"I can go with the "title" field and have that include the synonyms in analysis. Only problem is that the number of fields and number of synonyms files are quite a lot (~ 8 synonyms files) due to different weightage and type of expansion (exact vs partial) based on these. Hence going with this

Re: Judging the MoreLikeThis results for relevancy

2018-02-14 Thread Alessandro Benedetti
So let me answer point by point: 1) Similarity is misleading here if you interpret it as a probabilistic measure. Given a query, there is no "Ideal Document". Both with TF-IDF and BM25 (which solves the problem better) you are scoring the document. The higher the score, the higher the

Re: Request routing / load-balancing TLOG & PULL replica types

2018-02-14 Thread Ere Maijala
A patch is now available: https://issues.apache.org/jira/browse/SOLR-11982 --Ere Greg Roodt wrote on 12.2.2018 at 22.06: Thanks Ere. I've taken a look at the discussion here: http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html This is how I was

Re: Solr search word NOT followed by another word

2018-02-14 Thread ivan
I'm working on 6.4.1 (but I tried on 7.2.1 too) and I'm not getting results for the case I've shown before. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html