Re: add field requires collection reload

2016-04-18 Thread Hendrik Haddorp
Thanks, I knew I had seen a bug like this somewhere but could not find it yesterday. In yesterday's test run I actually had only one node and still got this problem. So I'll keep the collection reload until switching to 6.1 then. On 19/04/16 01:51, Erick Erickson wrote: > The key here is you say

Re: Overall large size in Solr across collections

2016-04-18 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for your explanation. I have set my segment size to 20GB under the TieredMergePolicy 10 10 20480 Does it mean that segment merging will occur more often, as it will need to keep merging during indexing until it reaches 20GB? I do have 192GB of RAM on my server which

Re: Not seeing the tokenized values when using solr.PathHierarchyTokenizerFactory

2016-04-18 Thread Mark Robinson
Thanks much Eric! Got it. Best, Mark. On Mon, Apr 18, 2016 at 7:53 PM, Erick Erickson wrote: > Assuming that you're talking about the docs returned in the result > sets, these are the _stored_ fields, not the analyzed field. Stored > fields are a verbatim copy of the

Re: Querying of multiple string value

2016-04-18 Thread Zheng Lin Edwin Yeo
Hi Shawn, Regarding the terms query parser, is it possible to search for values that are not in the list? In the normal OR parameters, I can do something like http://localhost:8983/solr/collection1/highlight?q=!id:collection1_0001

Re: block join rollups

2016-04-18 Thread Nick Vasilyev
Hi Yonik, Well, no one replied to this yet, so I thought I'd chime in with some of the use cases that I am working with. Please note that I am lagging a bit behind the last few releases, so I haven't had time to experiment with Solr 5.3+, I am sure that some of this is included in there already

Re: Cannot use Phrase Queries in eDisMax and filtering

2016-04-18 Thread Doug Turnbull
Also you mentioned your field was a string? This means the field must match *exactly* to be considered a phrase match. Have you considered changing the field to a text field type with a tokenizer and doing phrase matching -- it might work more like you'd expect. Thanks -Doug On Mon, Apr 18, 2016

Re: what is opening realtime Searcher

2016-04-18 Thread Doug Turnbull
Erick can correct me. I think "searcher" here might just sound a bit misleading. Real time get is really about fetching by id, not issuing searches per-se. Only after a soft or hard commit does a document truly become searchable. On Mon, Apr 18, 2016 at 8:02 PM Erick Erickson

Re: Solr Support for BM25F

2016-04-18 Thread Doug Turnbull
It's worth adding that Lucene's BlendedTermQuery, (used in Elasticsearch's cross_field search), attempts to blend field's document frequency together. So I wonder what BlendedTermQuery plus BM25 similarity per-field would do? It might be close to true BM25F aside for the length issue. (You'd have

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Thanks Eric, for the confirmation. On Apr 18, 2016 5:48 PM, "Erick Erickson" wrote: > In short, I'm afraid I have to agree with your IT guy. > > I like SolrCloud, it's way cool. But in your situation I really > can't say it's compelling. > > The places SolrCloud

Re: what is opening realtime Searcher

2016-04-18 Thread Erick Erickson
This is about real-time get. The idea is this. Suppose you have a doc doc1 already in your index at time T1 and update it at time T2 and your soft commit happens at time T3. If a search happens between time T1 and T2 but the fetch happens between T2 and T3, you get back the updated
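
The T1/T2/T3 timeline above can be mimicked with a toy model. This is only a plain-Python sketch, not Solr code: the class `ToyCore` and its methods are hypothetical names, and it assumes that uncommitted updates sit in a pending log that fetch-by-id consults before the searchable snapshot.

```python
class ToyCore:
    """Toy stand-in for a core: a searchable snapshot plus pending updates."""

    def __init__(self):
        self.searchable = {}  # what searches see (state at the last commit)
        self.pending = {}     # updates not yet visible to searches

    def add(self, doc_id, doc):
        # An update lands in the pending log first.
        self.pending[doc_id] = doc

    def soft_commit(self):
        # A commit makes all pending updates searchable.
        self.searchable.update(self.pending)
        self.pending.clear()

    def realtime_get(self, doc_id):
        # Fetch by id: the pending log wins over the searchable snapshot.
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.searchable.get(doc_id)

    def search_ids(self, predicate):
        # Searches only ever look at the committed snapshot.
        return sorted(i for i, d in self.searchable.items() if predicate(d))


core = ToyCore()
core.add("doc1", {"title": "old"})
core.soft_commit()                                  # T1: doc1 is in the index
core.add("doc1", {"title": "new"})                  # T2: update, no commit yet

fetched_before_commit = core.realtime_get("doc1")   # sees the update
found_before_commit = core.search_ids(lambda d: d["title"] == "new")

core.soft_commit()                                  # T3: soft commit
found_after_commit = core.search_ids(lambda d: d["title"] == "new")
```

Between T2 and T3 the fetch by id returns the updated document while a search for the new value finds nothing; after the commit at T3 the search finds it as well.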

Re: Cannot use Phrase Queries in eDisMax and filtering

2016-04-18 Thread Erick Erickson
bq: I cannot find either the condition on the field analyzer to be able to use pf, pf2 and pf3. These don't apply to field analysis at all. What they translate into is a series of phrase queries against different sets of fields. So, you may have pf=fieldA^5 fieldB pf2=fieldA^3 fieldC Now a query
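
A rough illustration of how pf and pf2 turn one user query into phrase queries against their field lists. This is a plain-Python sketch, not the eDisMax implementation: `phrase_clauses` and `shingles` are hypothetical helpers, the sample query is made up, and the flat clause list ignores how the parser actually nests and scores the clauses.

```python
def shingles(words, size):
    """Every run of `size` adjacent words, in order."""
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def phrase_clauses(user_query, field_spec, size=None):
    """Build field:"phrase"^boost strings for a pf-style field list.

    size=None models pf (the whole query as one phrase);
    size=2 models pf2 (word pairs); size=3 models pf3 (word triples).
    """
    words = user_query.split()
    phrases = [" ".join(words)] if size is None else shingles(words, size)
    clauses = []
    for spec in field_spec.split():
        field, _, boost = spec.partition("^")   # "fieldA^5" -> "fieldA", "5"
        suffix = "^" + boost if boost else ""
        for phrase in phrases:
            clauses.append(f'{field}:"{phrase}"{suffix}')
    return clauses


pf_clauses = phrase_clauses("red leather sofa", "fieldA^5 fieldB")
pf2_clauses = phrase_clauses("red leather sofa", "fieldA^3 fieldC", size=2)
print(pf_clauses)
print(pf2_clauses)
```

With the field lists from the message, pf yields one whole-query phrase per field, while pf2 yields one phrase per adjacent word pair per field. None of this depends on how the fields are analyzed.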

Re: Not seeing the tokenized values when using solr.PathHierarchyTokenizerFactory

2016-04-18 Thread Erick Erickson
Assuming that you're talking about the docs returned in the result sets, these are the _stored_ fields, not the analyzed field. Stored fields are a verbatim copy of the original input. Best, Erick On Mon, Apr 18, 2016 at 12:51 PM, Mark Robinson wrote: > Hi, > > I was

Re: add field requires collection reload

2016-04-18 Thread Erick Erickson
The key here is you say "sometimes". It takes a while for the reload operation to propagate to _all_ the replicas that make up your collection. My bet is that by immediately indexing after changing the data, your updates are getting to a core that hasn't reloaded yet. That said,

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Erick Erickson
In short, I'm afraid I have to agree with your IT guy. I like SolrCloud, it's way cool. But in your situation I really can't say it's compelling. The places SolrCloud shines: automatically routing docs to shards. You're not sharding. Automatically electing a new leader (analogous to master)

Re: Solr Support for BM25F

2016-04-18 Thread Tom Burton-West
Hi David, It may not matter for your use case but just in case you really are interested in the "real BM25F" there is a difference between configuring K1 and B for different fields in Solr and a "real" BM25F implementation. This has to do with Solr's model of fields being mini-documents (i.e.

add field requires collection reload

2016-04-18 Thread Hendrik Haddorp
Hi, I'm using SolrCloud 6.0 with a managed schema. When I add fields using SolrJ and immediately afterwards try to index data I sometimes get an error telling me that a field that I just added does not exist. If I do an explicit collection reload after the schema modification things seem to work.

Not seeing the tokenized values when using solr.PathHierarchyTokenizerFactory

2016-04-18 Thread Mark Robinson
Hi, I was using the solr.PathHierarchyTokenizerFactory for a field say fieldB. An input data like A/B/C when I check using the ANALYSIS facility in the admin UI, is tokenized as A, A/B, A/B/C in fieldB. A/B/C in my system is a "string" value in a fieldA which is both indexed=stored=true. I
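
The tokenization described above (A/B/C becoming A, A/B, A/B/C) can be imitated in a few lines. This is a plain-Python sketch of the behaviour, not the Lucene tokenizer itself: `path_hierarchy_tokens` is a hypothetical helper, and it makes no attempt to reproduce edge cases such as leading delimiters.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return every ancestor prefix of `path`, shortest first."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i + 1]) for i in range(len(parts))]


original_value = "A/B/C"
indexed_tokens = path_hierarchy_tokens(original_value)

print(indexed_tokens)   # the terms the analysis screen would list
print(original_value)   # the input itself is left untouched
```

The token list exists only on the indexing side; the input string itself is never rewritten.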

Re: Cannot use Phrase Queries in eDisMax and filtering

2016-04-18 Thread Antoine LE FLOC'H
Hello, I don't have Solr source code handy but is pf3=1& pf2=1& valid ? What would that do ? use the df or qf fields ? This https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser says that the value of pf2 is a multivalued list of fields ? There are not many example

Re: Getting duplicate output while doing auto suggestion based on multiple filed using copy filed in solr 5.5

2016-04-18 Thread Tejas Bhanushali
Hi Team, I tried to run the same example as suggested by Chris Hostetter and found that it works fine for a single field, but my requirement is that it should suggest based on multiple fields, i.e. not only on the "cat" field but also on a few other fields like 'name', 'manu', etc. and

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Nice - thanks Daniel. On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] < daniel.da...@nih.gov> wrote: > One thing I like about SolrCloud is that I don't have to configure > Master/Slave replication in each "core" the same way to get them to > replicate. > > The other thing I like

RE: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Davis, Daniel (NIH/NLM) [C]
One thing I like about SolrCloud is that I don't have to configure Master/Slave replication in each "core" the same way to get them to replicate. The other thing I like about SolrCloud, which is largely theoretical at this point, is that I don't need to test changes to a collection's

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
So - my IT guy makes the case that we don't really need Zookeeper / Solr Cloud... He may be right - we're serving static data (changes to the collection occur only 2 or 3 times a year and are minor) We probably could have 3 or 4 Solr nodes running in non-Cloud mode -- each configured the same

what is opening realtime Searcher

2016-04-18 Thread Jaroslaw Rozanski
Hi, What exactly triggers opening new "realtime" searcher? 2016-04-18_16:28:02.33289 INFO  (qtp1038620625-13) [c:col1 s:shard1 r:core_node3 x:col1_shard1_replica3] o.a.s.s.SolrIndexSearcher Opening Searcher@752e986f[col1_shard1_replica3] realtime I am seeing above being triggered when adding

Re: Index BackUp using JDK 8 & Restore using JDK 7. Does this work?

2016-04-18 Thread Manohar Sripada
Thanks Shawn! :-) On Mon, Apr 18, 2016 at 6:42 PM, Shawn Heisey wrote: > On 4/18/2016 12:49 AM, Manohar Sripada wrote: > > We are using Solr 5.2.1 and JDK 7. We do create a static index in one > > cluster (solr cluster 1) and ship that index to another cluster (Solr > >

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Tom Evans
On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff wrote: > Thanks all - very helpful. > > @Shawn - your reply implies that even if I'm hitting the URL for a single > endpoint via HTTP - the "balancing" will still occur across the Solr Cloud > (I understand the caveat

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Excellent - thanks! On Mon, Apr 18, 2016 at 9:16 AM, Erick Erickson wrote: > Your summary pretty much nails it. > > For (b) note that CloudSolrClient uses an internal software load > balancer to distribute queries, FWIW. > > > > On Mon, Apr 18, 2016 at 7:52 AM, John

Re: Wildcard query behavior.

2016-04-18 Thread Erick Erickson
Here's a blog on the subject: https://lucidworks.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ bq: When validator is changed to validate, both at query time and index time, then should not validator*/validator return the same results at-least? This is one of

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Erick Erickson
Your summary pretty much nails it. For (b) note that CloudSolrClient uses an internal software load balancer to distribute queries, FWIW. On Mon, Apr 18, 2016 at 7:52 AM, John Bickerstaff wrote: > Thanks all - very helpful. > > @Shawn - your reply implies that even

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Thanks all - very helpful. @Shawn - your reply implies that even if I'm hitting the URL for a single endpoint via HTTP - the "balancing" will still occur across the Solr Cloud (I understand the caveat about that single endpoint being a potential point of failure). I just want to verify that I'm

want to subscribe

2016-04-18 Thread SRINI SOLR

Re: Adding a new shard

2016-04-18 Thread Jay Potharaju
Thanks for the explanation, Erick! I will try out your recommendation. On Sun, Apr 17, 2016 at 3:34 PM, Erick Erickson wrote: > bq: So in order for me to move the shards to its own instances, I will have to > take a down time and move the newly created shards & replicas

Re: normal solr query vs facet query performance

2016-04-18 Thread Shawn Heisey
On 4/18/2016 5:06 AM, Mugeesh Husain wrote: > 1.)solr normal query(q=*:*) vs facet query(facet.query="abc") ? > 2.)solr normal query(q=*:*) vs facet > search(facet=tru=coullumn_name) ? > 3.)solr filter query(q=Column:some value) vs facet query(facet.query="abc") > ? > 4.)solr normal query(q=*:*)

Re: Overall large size in Solr across collections

2016-04-18 Thread Shawn Heisey
On 4/18/2016 4:22 AM, Zheng Lin Edwin Yeo wrote: > I have many collections in Solr, but with only 1 shard. I found that the > index size across all the collections has passed the 1TB mark. Currently > the query speed is still normal, but the indexing speed seems to have become > slower. > > Will it

Re: Wildcard query behavior.

2016-04-18 Thread Shawn Heisey
On 4/18/2016 1:18 AM, Modassar Ather wrote: > When I search for f:validator I get 80K+ documents whereas if I search for > f:validator* I get only around 150 results. > > When I checked on analysis page I see that validator is changed to > validate. Per my understanding in both the above cases it

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Shawn Heisey
On 4/17/2016 10:35 PM, John Bickerstaff wrote: > My prior use of SOLR in production was pre SOLR cloud. We put a > round-robin load balancer in front of replicas for searching. > > Do I understand correctly that a load balancer is unnecessary with SOLR > Cloud? I. E. -- SOLR and Zookeeper will

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Jack Krupansky
SolrJ does indeed provide load balancing via CloudSolrClient which uses LBHttpSolrClient: https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/LBHttpSolrClient.html

Re: Index BackUp using JDK 8 & Restore using JDK 7. Does this work?

2016-04-18 Thread Shawn Heisey
On 4/18/2016 12:49 AM, Manohar Sripada wrote: > We are using Solr 5.2.1 and JDK 7. We do create a static index in one > cluster (solr cluster 1) and ship that index to another cluster (Solr > cluster 2). Solr Cluster 2 is the one where queries will be fired. > > Due to some unavoidable reasons,

[ANNOUNCEMENT] Luke 6.0.0 released

2016-04-18 Thread Dmitry Kan
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-6.0.0 Major upgrade to new Lucene 6.0.0 API. #55 Enjoy! -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter:

normal solr query vs facet query performance

2016-04-18 Thread Mugeesh Husain
Hello, I am looking for which query will be fast in term of performance, 1.)solr normal query(q=*:*) vs facet query(facet.query="abc") ? 2.)solr normal query(q=*:*) vs facet search(facet=tru=coullumn_name) ? 3.)solr filter query(q=Column:some value) vs facet query(facet.query="abc") ? 4.)solr

Overall large size in Solr across collections

2016-04-18 Thread Zheng Lin Edwin Yeo
Hi, I have many collections in Solr, but with only 1 shard. I found that the index size across all the collections has passed the 1TB mark. Currently the query speed is still normal, but the indexing speed seems to have become slower. Will it affect the performance if I continue to increase the

Re: Wildcard query behavior.

2016-04-18 Thread Modassar Ather
Thanks Reth for your response. When validator is changed to validate, both at query time and index time, then should not validator*/validator return the same results at least? E.g. 5 documents contain validator. At index time validator got changed to validate. Now when validator* is searched it

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Jaroslaw Rozanski
Hi, How are you executing searches? I am asking because if you search using a Solr client, for example SolrJ, i.e. create an instance of CloudSolrClient, and not directly via an HTTP endpoint, it will provide load balancing (last time I checked it picks a random non-stale node). Thanks, Jarek On Mon,

Re: Wildcard query behavior.

2016-04-18 Thread Reth RM
If you search for f:validat*, then I believe you will get the same number of results. Please check. f:validator* searches for records that have the prefix "validator", whereas a field with a stemmer that stems "validator" to "validate" (if this stemming was applied at index time as well as query time)
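
The mismatch can be reproduced with a toy index. This is a plain-Python sketch under stated assumptions, not Solr or KStem code: `TOY_STEMS` is a made-up one-entry stemmer, the documents are invented, and the model assumes that a plain term query is stemmed while a prefix query is only lowercased and then compared against the terms already in the index.

```python
# Hypothetical one-entry stemmer standing in for the real stem filter.
TOY_STEMS = {"validator": "validate"}


def analyze(word):
    """Lowercase, then apply the toy stemmer."""
    word = word.lower()
    return TOY_STEMS.get(word, word)


def build_index(docs):
    """Map each analyzed term to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(analyze(word), set()).add(doc_id)
    return index


def term_query(index, term):
    # A plain term goes through the same analysis as the indexed text.
    return index.get(analyze(term), set())


def prefix_query(index, prefix):
    # A prefix is not stemmed; it is matched against the indexed terms as-is.
    prefix = prefix.lower()
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


docs = {1: "validator config", 2: "schema validator", 3: "schema design"}
index = build_index(docs)

print(term_query(index, "validator"))    # docs indexed under "validate"
print(prefix_query(index, "validator"))  # no indexed term starts with "validator"
print(prefix_query(index, "validat"))    # the prefix "validat" reaches "validate"
```

In this toy index the term "validator" never exists, because it was stored as "validate". The prefix query for validator therefore finds nothing, while validat matches the same documents as the plain term query.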

Wildcard query behavior.

2016-04-18 Thread Modassar Ather
Hi, Please help me understand the following. I have an analysis chain which uses KStemFilterFactory for a field. Solr version is 5.4.0. When I search for f:validator I get 80K+ documents whereas if I search for f:validator* I get only around 150 results. When I checked on analysis page I see that

Index BackUp using JDK 8 & Restore using JDK 7. Does this work?

2016-04-18 Thread Manohar Sripada
We are using Solr 5.2.1 and JDK 7. We do create a static index in one cluster (solr cluster 1) and ship that index to another cluster (Solr cluster 2). Solr Cluster 2 is the one where queries will be fired. Due to some unavoidable reasons, we want to upgrade Solr Cluster 1 to JDK 8. But, we

Re: Solr best practices for many to many relations...

2016-04-18 Thread Bastien Latard - MDPI AG
Thanks everybody. Your answers are very interesting, however I'm not sure I'm getting them properly (sorry I'm not an expert... it might be evident for you)... *When you're speaking about denormalization, does it mean: 1. something like that?* */-> I think that the answer