Re: Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Antony Alphonse
Hi Shawn,

I will try that solution. I should also mention that the queries that fail
with this error have "group.field":"lowercase". Should I change the
field type?

Thanks,
Antony


Solr as a windows service

2019-11-08 Thread thusharanv631
Hi, 

I am using Solr 8.3 and I need to run it as a background Windows service.
Since the 5.x release Solr no longer supports an external container such as
Tomcat, so what are the steps to run it as a background Windows service that
starts automatically? I used NSSM to achieve this, but is there any other
option, or can it still be deployed with Tomcat?
Regards,

Thushara 

Re: Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Shawn Heisey

On 11/8/2019 5:31 PM, Antony Alphonse wrote:

> I shared the collection and re-indexed the data with the same schema. But
> one of the field is throwing the below error. Any suggestions?
>
> ERROR (qtp672320506-32) [c: s:shard3 r:core_node01 x:_shard3_replica_n69]
> o.a.s.h.RequestHandlerBase java.lang.IllegalStateException: unexpected
> docvalues type SORTED_SET for field 'lowercase' (expected=SORTED). Re-index
> with correct docvalues type.


This error means that part of the index was created with one definition 
for the field in question, then the schema was changed in an 
incompatible way, and additional indexing was attempted.


The solution to this particular error is to completely delete the index 
directories that make up the collection, reload it, and then build it 
from scratch again.  The error happens at the Lucene level and the only 
way to fix it is to completely delete the index.  You could do it by 
creating an entirely new collection.
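
[Editorial sketch, not part of the original reply.] The "entirely new
collection" route can be outlined with the Collections API. The Python below
only builds the request URLs; the host, collection names, configset name,
shard count and replication factor are all placeholder assumptions and must
be replaced with your own values before anything is sent.

```python
from urllib.parse import urlencode

def collections_api_url(base_url, params):
    """Build a Collections API URL from a dict of request parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"

# Assumption: a local SolrCloud node on the default port.
BASE = "http://localhost:8983/solr"

# Create a fresh collection so no old segments with the other
# docvalues type are carried over. All values here are hypothetical.
create_url = collections_api_url(BASE, {
    "action": "CREATE",
    "name": "mycollection_v2",
    "numShards": 3,
    "replicationFactor": 1,
    "collection.configName": "myconfig",
})

# After re-indexing into the new collection, the old one can be dropped.
delete_url = collections_api_url(BASE, {
    "action": "DELETE",
    "name": "mycollection",
})

print(create_url)
print(delete_url)
```

Re-indexing from the original data source into the new collection is still
required; the URLs above only cover creating and removing collections.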


Thanks,
Shawn


Solr 7.2.1 - unexpected docvalues type

2019-11-08 Thread Antony Alphonse
Hi,

I shared the collection and re-indexed the data with the same schema. But
one of the fields is throwing the error below. Any suggestions?





ERROR (qtp672320506-32) [c: s:shard3 r:core_node01 x:_shard3_replica_n69]
o.a.s.h.RequestHandlerBase java.lang.IllegalStateException: unexpected
docvalues type SORTED_SET for field 'lowercase' (expected=SORTED). Re-index
with correct docvalues type.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:340)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:392)
at org.apache.lucene.search.grouping.TermGroupSelector.setNextReader(TermGroupSelector.java:56)
at org.apache.lucene.search.grouping.FirstPassGroupingCollector.doSetNextReader(FirstPassGroupingCollector.java:350)
at org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:651)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
at org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:239)
at org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:162)
at org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchFirstPhase(QueryComponent.java:1279)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:360)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)

Thanks!!


Re: Need some guidance to understand differences (infra, security etc) between LTS version and latest stable version.

2019-11-08 Thread suyog joshi
Sure, Thanks Erick for quick reply as always :) 

Regards,
Suyog Joshi





Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
I use 3-word shingles with stopwords for my MLT ML trainer. That worked
pretty well for such a solution, but for a full index the size became
prohibitive.
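
[Editorial sketch, not part of the original message.] To illustrate what
3-word shingles with stopwords kept look like, here is a small Python
function applied to the query from this thread. It is a plain-Python
illustration of the idea only, not Solr's shingle filter, and the function
name is made up for this example.

```python
def word_shingles(text, size=3):
    """Return every run of `size` consecutive words, stopwords included."""
    words = text.lower().split()
    # A text shorter than `size` words yields an empty list.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

shingles = word_shingles("Lymphoid and a non-Lymphoid cell")
for shingle in shingles:
    print(shingle)
```

Each shingle becomes its own indexed term, so every word position adds one
more term per document. That is the kind of growth that can make a full
index prohibitively large.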

On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
wrote:

> If we had IDF for phrases, they would be super effective. The 2X weight is
> a hack that mostly works.
>
> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

RE: [EXTERNAL] Re: XLSX Response Writer

2019-11-08 Thread Lewin Joy (TMNA)
Hi Jorn,

I am using Solr version 7.1

Correction on the change that I did. I just added the jars in my solrconfig.xml 
file as below:




The cp command was the steps I saw in the reference guide. 
I figured it should be the same thing.

-Lewin

-Original Message-
From: Jörn Franke  
Sent: Friday, November 8, 2019 11:51 AM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: XLSX Response Writer

Which Solr version are you using?
The below command suggest you are using 6.3 - is this correct?

Have you restarted the Solr server after copying?



Re: XLSX Response Writer

2019-11-08 Thread Jörn Franke
Which Solr version are you using?
The below command suggest you are using 6.3 - is this correct?

Have you restarted the Solr server after copying?

> Am 08.11.2019 um 18:42 schrieb Lewin Joy (TMNA) :
> 
> Hi,
> 
> How do I use the xlsx response writer to extract my results to an excel file?
> 
> I made the changes as per documentation to include the jars and gave wt=xlsx. 
> It did not work.
> Would this only work with solrJ? Can't we use this in the query parameter 
> wt=xlsx?
> 
> cp contrib/extraction/lib/*.jar server/solr-webapp/webapp/WEB-INF/lib/
> cp dist/solr-cell-6.3.0.jar server/solr-webapp/webapp/WEB-INF/lib/
> 
> 
> Thanks,
> Lewin


XLSX Response Writer

2019-11-08 Thread Lewin Joy (TMNA)
Hi,

How do I use the xlsx response writer to extract my results to an excel file?

I made the changes as per documentation to include the jars and gave wt=xlsx. 
It did not work.
Would this only work with SolrJ? Can't we use this in the query parameter 
wt=xlsx?

cp contrib/extraction/lib/*.jar server/solr-webapp/webapp/WEB-INF/lib/
cp dist/solr-cell-6.3.0.jar server/solr-webapp/webapp/WEB-INF/lib/
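
[Editorial sketch, not part of the original message.] For reference, a
request that selects the writer is an ordinary /select query with wt=xlsx
added. The Python below builds such a URL and includes a helper that saves
the binary response to a file. The host, collection name and output path are
placeholder assumptions, and the request only succeeds if the writer is
actually available on the server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def xlsx_export_url(base_url, collection, query="*:*", rows=10):
    """Build a /select URL that asks Solr for an .xlsx response."""
    params = urlencode({"q": query, "rows": rows, "wt": "xlsx"})
    return f"{base_url}/{collection}/select?{params}"

def save_xlsx(url, path):
    """Download the binary response and write it to disk unchanged."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())

# Assumption: local Solr on the default port, hypothetical collection name.
url = xlsx_export_url("http://localhost:8983/solr", "mycollection")
print(url)
# save_xlsx(url, "results.xlsx")  # uncomment to fetch from a running server
```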


Thanks,
Lewin


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
If we had IDF for phrases, they would be super effective. The 2X weight is a 
hack that mostly works.

Infoseek had phrase IDF and it was a killer algorithm for relevance.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 11:08 AM, David Hastings  
> wrote:
> 
> the pf and qf fields are REALLY nice for this

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
the pf and qf fields are REALLY nice for this

On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood 
wrote:

> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
>title^16 keywords^8 text^2
>
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
> term can be queried against the field matching that vocabulary.
>
> This is how LinkedIn separates people, companies, and places, for example.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
I always enable phrase searching in edismax for exactly this reason.

Something like:

   title^16 keywords^8 text^2

To deal with concepts in queries, a classifier and/or named entity extractor 
can be helpful. If you have a list of concepts (“controlled vocabulary”) that 
includes “Lamin A”, and that shows up in a query, that term can be queried 
against the field matching that vocabulary.

This is how LinkedIn separates people, companies, and places, for example.
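
[Editorial sketch, not part of the original message.] The phrase-boost setup
can be written out as request parameters. In the Python below the pf weights
match the line above (title^16 keywords^8 text^2); the qf weights are an
assumption obtained by halving them, following the "pf with the weights
doubled" advice quoted further down. The function names are made up for this
example.

```python
from urllib.parse import urlencode

def boost_spec(weights):
    """Format {'title': 8} as the 'title^8' syntax used by qf and pf."""
    return " ".join(f"{field}^{weight}" for field, weight in weights.items())

def edismax_params(query, qf_weights, phrase_factor=2):
    """Return edismax parameters with pf set to qf scaled by phrase_factor."""
    pf_weights = {field: weight * phrase_factor
                  for field, weight in qf_weights.items()}
    return {
        "defType": "edismax",
        "q": query,
        "qf": boost_spec(qf_weights),   # per-term field weights
        "pf": boost_spec(pf_weights),   # extra boost when the whole phrase matches
    }

# Assumption: qf weights chosen so that doubling gives the pf line above.
params = edismax_params("Lymphoid and a non-Lymphoid cell",
                        {"title": 8, "keywords": 4, "text": 1})
print(params["qf"])
print(params["pf"])
print(urlencode(params))
```

The pf boost only affects ranking: documents still match through qf, and
those where the query terms appear together as a phrase score higher.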

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 10:48 AM, Erick Erickson  wrote:
> 
> Look at the “mm” parameter, try setting it to 100%. Although that’s not 
> entirely likely to do what you want either since virtually every doc will 
> have “a” in it. But at least you’d get docs that have both terms.
> 
> you may also be able to search for things like “Lamin A” _only as a phrase_ 
> and have some luck. But this is a gnarly problem in general. Some people have 
> been able to substitute synonyms and/or shingles to make this work at the 
> expense of a larger index.
> 
> This is a generic problem with context. “Lamin A” is really a “concept”, not 
> just two words that happen to be near each other. Searching as a phrase is an 
> OOB-but-naive way to try to make it more likely that the ranked results refer 
> to the _concept_ of “Lamin A”. The assumption here is “if these two words 
> appear next to each other, they’re more likely to be what I want”. I say 
> “naive” because “Lamins: A new approach to...” would _also_ be found for a 
> naive phrase search. (I have no idea whether such a title makes sense or not, 
> but you figured that out already)...
> 
> To do this well you’d have to dive in to NLP/Machine learning.
> 
> I truly wish we could have the DWIM search algorithm (Do What I Mean)….
> 
>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri  wrote:
>> 
>> HI Walter and Paras
>> 
>> I indexed it removing all the references to StopWordFilter and I went from 
>> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid 
>> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think 
>> removing it completely is the way to go from the scenario we have, but I 
>> appreciate the suggestion…
>> 
>> Yes the response is using fl=*
>> I am trying some combinations at the moment, but yet no success.
>> 
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, even though reasonable meaningful results. 
>> 
>> I am sorry but I didn't understand what do you want me to do exactly with 
>> the lst (??) and qf and bf.
>> 
>> Thanks everyone with their inputs
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana  wrote:
>>> 
>>> Hi Guilherme
>>> 
>>> By accident, I ended up querying using the default handler (/select) 
>>> and it worked. 
>>> 
>>> You've just found the culprit. Thanks for giving the material I requested. 
>>> Your analysis chain is working as expected. I don't see any issue in either 
>>> StopWordFilter or your boosts. I also use a boost of 50 when boosting 
>>> contextual suggestions (boosting "gold iphone" on a page of iphone) but I 
>>> take Walter's suggestion and would try to optimize my weights. I agree that 
>>> this 50 thing was not researched much about by us as well (we never faced 
>>> performance or relevance issues).  
>>> 
>>> See the major difference in both the handlers - edismax. I'm pretty sure 
>>> that your problem lies in the parsing of queries (you can confirm that from 
>>> parsedquery key in debug of both JSON responses). I hope you have provided 
>>> the response with fl=*. Replace q with q.alt in your /search handler query 
>>> and I think you should start getting responses. That's because q.alt uses 
>>> standard parser. If you want to keep using edisMax, I suggest you to test 
>>> the responses removing some combination of lst (qf, bf) and find what's 
>>> restricting the documents to come up. I'm out of office today - would have 
>>> certainly tried analyzing the field values of the document in /select 
>>> request and compare it with qf/bq in solrconfig.xml /search. Do this for me 
>>> and you'd certainly find something.  
>>> 
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood wrote:
>>> I normally use a weight of 8 for the most important field, like title. 
>>> Other fields might get a 4 or 2.
>>> 
>>> I add a “pf” field with the weights doubled, so that phrase matches have a 
>>> higher weight.
>>> 
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early 
>>> web search engines. With different relevance algorithms and totally 
>>> different evaluation and tuning systems, they settled on weights of 8 and 
>>> 7.5 for HTML titles. With the two radically different systems getting 
>>> the same number, I decided that was a property of the documents, not of the 
>>> search engines.
>>> 
>>> 

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
But when you change it to AND, a single misspelling means zero results. That is 
usually not helpful.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 10:43 AM, David Hastings  
> wrote:
> 
> is your default operator OR?
> change it to AND
> 
> 
> On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri  wrote:
> 
>> HI Walter and Paras
>> 
>> I indexed it removing all the references to StopWordFilter and I went from
>> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid
>> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think
>> removing it completely is the way to go from the scenario we have, but I
>> appreciate the suggestion...
>> 
>> Yes the response is using fl=*
>> I am trying some combinations at the moment, but yet no success.
>> 
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, even though reasonable meaningful results.
>> 
>> I am sorry but I didn't understand what do you want me to do exactly with
>> the lst (??) and qf and bf.
>> 
>> Thanks everyone with their inputs
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana 
>> wrote:
>>> 
>>> Hi Guilherme
>>> 
>>> By accident, I ended up querying the using the default handler (/select)
>> and it worked.
>>> 
>>> You've just found the culprit. Thanks for giving the material I
>> requested. Your analysis chain is working as expected. I don't see any
>> issue in either StopWordFilter or your boosts. I also use a boost of 50
>> when boosting contextual suggestions (boosting "gold iphone" on a page of
>> iphone) but I take Walter's suggestion and would try to optimize my
>> weights. I agree that this 50 thing was not researched much about by us as
>> well (we never faced performance or relevance issues).
>>> 
>>> See the major difference in both the handlers - edismax. I'm pretty sure
>> that your problem lies in the parsing of queries (you can confirm that from
>> parsedquery key in debug of both JSON responses). I hope you have provided
>> the response with fl=*. Replace q with q.alt in your /search handler query
>> and I think you should start getting responses. That's because q.alt uses
>> standard parser. If you want to keep using edisMax, I suggest you to test
>> the responses removing some combination of lst (qf, bf) and find what's
>> restricting the documents to come up. I'm out of office today - would have
>> certainly tried analyzing the field values of the document in /select
>> request and compare it with qf/bq in solrconfig.xml /search. Do this for me
>> and you'd certainly find something.
>>> 
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood wrote:
>>> I normally use a weight of 8 for the most important field, like title.
>> Other fields might get a 4 or 2.
>>> 
>>> I add a “pf” field with the weights doubled, so that phrase matches have
>> a higher weight.
>>> 
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early
>> web search engines. With different relevance algorithms and totally
>> different evaluation and tuning systems, they settled on weights of 8 and
>> 7.5 for HTML titles. With the the two radically different system getting
>> the same number, I decided that was a property of the documents, not of the
>> search engines.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> http://observer.wunderwood.org/   (my
>> blog)
>>> 
 On Nov 7, 2019, at 9:03 AM, Guilherme Viteri > > wrote:
 
 Hi Wunder,
 
 My indexer takes quite a few hours to be executed I am shortening it to
>> run faster, but I also need to make sure it gives what we are expecting.
>> This implementation's been there for >4y, and massively used.
 
> In your edismax handlers, weights of 20, 50, and 100 are extremely
>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
>> of configuring Solr.
 I've inherited that implementation and I am really keen to adequate it,
>> what would you recommend ?
 
 Cheers
 Guilherme
 
> On 7 Nov 2019, at 14:43, Walter Underwood > > wrote:
> 
> Thanks for posting the files. Looking at schema.xml, I see that you
>> still are using StopFilterFactory. The first advice we gave you was to
>> remove that.
> 
> Remove StopFilterFactory everywhere and reindex.
> 
> You will continue to have problems matching stopwords until you do
>> that.
> 
> In your edismax handlers, weights of 20, 50, and 100 are extremely
>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
>> of configuring Solr.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/ 
>> 

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Erick Erickson
Look at the “mm” parameter, try setting it to 100%. Although that’s not 
entirely likely to do what you want either since virtually every doc will have 
“a” in it. But at least you’d get docs that have both terms.
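
For illustration, a minimal sketch of passing "mm" on an edismax request. The
host, collection name and /search handler path are assumptions, not taken from
the thread; "q", "defType" and "mm" are the standard request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, collection and handler name are assumptions.
SOLR_SEARCH_URL = "http://localhost:8983/solr/mycollection/search"


def build_mm_query(user_query, mm="100%"):
    """Return a request URL that asks edismax to require `mm` of the clauses."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "mm": mm,  # minimum-should-match; "100%" means every clause must match
    }
    # urlencode percent-escapes the query text and the "%" in the mm value.
    return SOLR_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_mm_query("Lymphoid and a non-Lymphoid cell"))
```

Comparing the "parsedquery" debug output with and without "mm" should show
whether the parameter is actually being applied by the handler.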

You may also be able to search for things like “Lamin A” _only as a phrase_ and 
have some luck. But this is a gnarly problem in general. Some people have been 
able to substitute synonyms and/or shingles to make this work at the expense of 
a larger index.

This is a generic problem with context. “Lamin A” is really a “concept”, not 
just two words that happen to be near each other. Searching as a phrase is an 
OOB-but-naive way to try to make it more likely that the ranked results refer 
to the _concept_ of “Lamin A”. The assumption here is “if these two words 
appear next to each other, they’re more likely to be what I want”. I say 
“naive” because “Lamins: A new approach to...” would _also_ be found for a 
naive phrase search. (I have no idea whether such a title makes sense or not, 
but you figured that out already)...
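
To make the "naive" part concrete, here is a toy sketch in plain Python. It is
not Solr's analysis chain: the tokenizer and the crude plural stripping are
assumptions chosen only to imitate a typical lowercase-plus-stemming setup.

```python
import re


def analyze(text):
    """Toy analysis: lowercase, split on non-alphanumerics, strip a trailing 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def phrase_match(phrase, document):
    """True if the analyzed phrase tokens occur adjacently in the document."""
    p, d = analyze(phrase), analyze(document)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


if __name__ == "__main__":
    # The concept we want.
    print(phrase_match("Lamin A", "Mutations in Lamin A cause disease"))
    # The accidental hit: punctuation is dropped and "Lamins" is stemmed.
    print(phrase_match("Lamin A", "Lamins: A new approach to..."))
    # Both words present but not adjacent, so no phrase match.
    print(phrase_match("Lamin A", "A study of lamin B"))
```

The second call matches even though the title is not about the "Lamin A"
concept, which is exactly the limitation described above.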

To do this well you’d have to dive in to NLP/Machine learning.

I truly wish we could have the DWIM search algorithm (Do What I Mean)….

> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri  wrote:
> 
> HI Walter and Paras
> 
> I indexed it removing all the references to StopWordFilter and I went from 
> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid 
> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think 
> removing it completely is the way to go from the scenario we have, but I 
> appreciate the suggestion…
> 
> Yes the response is using fl=*
> I am trying some combinations at the moment, but yet no success.
> 
> defType=edismax
> q.alt=Lymphoid and a non-Lymphoid cell
> Number of results=1599
> Quite a considerable increase, even though reasonable meaningful results. 
> 
> I am sorry but I didn't understand what do you want me to do exactly with the 
> lst (??) and qf and bf.
> 
> Thanks everyone with their inputs
> 
> 
>> On 8 Nov 2019, at 06:45, Paras Lehana  wrote:
>> 
>> Hi Guilherme
>> 
>> By accident, I ended up querying the using the default handler (/select) and 
>> it worked. 
>> 
>> You've just found the culprit. Thanks for giving the material I requested. 
>> Your analysis chain is working as expected. I don't see any issue in either 
>> StopWordFilter or your boosts. I also use a boost of 50 when boosting 
>> contextual suggestions (boosting "gold iphone" on a page of iphone) but I 
>> take Walter's suggestion and would try to optimize my weights. I agree that 
>> this 50 thing was not researched much about by us as well (we never faced 
>> performance or relevance issues).  
>> 
>> See the major difference in both the handlers - edismax. I'm pretty sure 
>> that your problem lies in the parsing of queries (you can confirm that from 
>> parsedquery key in debug of both JSON responses). I hope you have provided 
>> the response with fl=*. Replace q with q.alt in your /search handler query 
>> and I think you should start getting responses. That's because q.alt uses 
>> standard parser. If you want to keep using edisMax, I suggest you to test 
>> the responses removing some combination of lst (qf, bf) and find what's 
>> restricting the documents to come up. I'm out of office today - would have 
>> certainly tried analyzing the field values of the document in /select 
>> request and compare it with qf/bq in solrconfig.xml /search. Do this for me 
>> and you'd certainly find something.  
>> 
>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood > > wrote:
>> I normally use a weight of 8 for the most important field, like title. Other 
>> fields might get a 4 or 2.
>> 
>> I add a “pf” field with the weights doubled, so that phrase matches have a 
>> higher weight.
>> 
>> The weight of 8 comes from experience at Infoseek and Inktomi, two early web 
>> search engines. With different relevance algorithms and totally different 
>> evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML 
>> titles. With the the two radically different system getting the same number, 
>> I decided that was a property of the documents, not of the search engines.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org 
>> http://observer.wunderwood.org/   (my blog)
>> 
>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri >> > wrote:
>>> 
>>> Hi Wunder,
>>> 
>>> My indexer takes quite a few hours to be executed I am shortening it to run 
>>> faster, but I also need to make sure it gives what we are expecting. This 
>>> implementation's been there for >4y, and massively used.
>>> 
>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I 
>>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>>> configuring Solr.
>>> I've inherited that implementation and I am really 

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
OR

OR
explicit
edismax
*:*
name
...


> On 8 Nov 2019, at 16:43, David Hastings  wrote:
> 
> is your default operator OR?
> change it to AND
> 
> 
> On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri  wrote:
> 
>> HI Walter and Paras
>> 
>> I indexed it removing all the references to StopWordFilter and I went from
>> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid
>> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think
>> removing it completely is the way to go from the scenario we have, but I
>> appreciate the suggestion...
>> 
>> Yes the response is using fl=*
>> I am trying some combinations at the moment, but yet no success.
>> 
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, even though reasonable meaningful results.
>> 
>> I am sorry but I didn't understand what do you want me to do exactly with
>> the lst (??) and qf and bf.
>> 
>> Thanks everyone with their inputs
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana 
>> wrote:
>>> 
>>> Hi Guilherme
>>> 
>>> By accident, I ended up querying the using the default handler (/select)
>> and it worked.
>>> 
>>> You've just found the culprit. Thanks for giving the material I
>> requested. Your analysis chain is working as expected. I don't see any
>> issue in either StopWordFilter or your boosts. I also use a boost of 50
>> when boosting contextual suggestions (boosting "gold iphone" on a page of
>> iphone) but I take Walter's suggestion and would try to optimize my
>> weights. I agree that this 50 thing was not researched much about by us as
>> well (we never faced performance or relevance issues).
>>> 
>>> See the major difference in both the handlers - edismax. I'm pretty sure
>> that your problem lies in the parsing of queries (you can confirm that from
>> parsedquery key in debug of both JSON responses). I hope you have provided
>> the response with fl=*. Replace q with q.alt in your /search handler query
>> and I think you should start getting responses. That's because q.alt uses
>> standard parser. If you want to keep using edisMax, I suggest you to test
>> the responses removing some combination of lst (qf, bf) and find what's
>> restricting the documents to come up. I'm out of office today - would have
>> certainly tried analyzing the field values of the document in /select
>> request and compare it with qf/bq in solrconfig.xml /search. Do this for me
>> and you'd certainly find something.
>>> 
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood > > wrote:
>>> I normally use a weight of 8 for the most important field, like title.
>> Other fields might get a 4 or 2.
>>> 
>>> I add a “pf” field with the weights doubled, so that phrase matches have
>> a higher weight.
>>> 
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early
>> web search engines. With different relevance algorithms and totally
>> different evaluation and tuning systems, they settled on weights of 8 and
>> 7.5 for HTML titles. With the the two radically different system getting
>> the same number, I decided that was a property of the documents, not of the
>> search engines.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> http://observer.wunderwood.org/   (my
>> blog)
>>> 
 On Nov 7, 2019, at 9:03 AM, Guilherme Viteri > > wrote:
 
 Hi Wunder,
 
 My indexer takes quite a few hours to be executed I am shortening it to
>> run faster, but I also need to make sure it gives what we are expecting.
>> This implementation's been there for >4y, and massively used.
 
> In your edismax handlers, weights of 20, 50, and 100 are extremely
>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
>> of configuring Solr.
 I've inherited that implementation and I am really keen to adequate it,
>> what would you recommend ?
 
 Cheers
 Guilherme
 
> On 7 Nov 2019, at 14:43, Walter Underwood > > wrote:
> 
> Thanks for posting the files. Looking at schema.xml, I see that you
>> still are using StopFilterFactory. The first advice we gave you was to
>> remove that.
> 
> Remove StopFilterFactory everywhere and reindex.
> 
> You will continue to have problems matching stopwords until you do
>> that.
> 
> In your edismax handlers, weights of 20, 50, and 100 are extremely
>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
>> of configuring Solr.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/ 
>> (my blog)
> 
>> On Nov 7, 

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
is your default operator OR?
change it to AND
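
A minimal sketch of trying this per request rather than in the config, using
the "q.op" request parameter. The host, collection and handler path are
assumptions. As far as I know, edismax derives "mm" from "q.op" when "mm" is
not set explicitly, so check the parsed query in the debug output.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, collection and handler name are assumptions.
SOLR_SEARCH_URL = "http://localhost:8983/solr/mycollection/search"


def search_url(user_query, default_operator="AND"):
    """Return a request URL that overrides the default operator for one query."""
    if default_operator not in ("AND", "OR"):
        raise ValueError("default_operator must be 'AND' or 'OR'")
    params = {
        "q": user_query,
        "defType": "edismax",
        "q.op": default_operator,  # per-request override of the default operator
        "debugQuery": "true",      # include the parsed query in the response
    }
    return SOLR_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("Lymphoid and a non-Lymphoid cell"))
```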


On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri  wrote:

> HI Walter and Paras
>
> I indexed it removing all the references to StopWordFilter and I went from
> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid
> cell" is matching entities such as "IFT A" or  "Lamin A". So I don't think
> removing it completely is the way to go from the scenario we have, but I
> appreciate the suggestion...
>
> Yes the response is using fl=*
> I am trying some combinations at the moment, but yet no success.
>
> defType=edismax
> q.alt=Lymphoid and a non-Lymphoid cell
> Number of results=1599
> Quite a considerable increase, even though reasonable meaningful results.
>
> I am sorry but I didn't understand what do you want me to do exactly with
> the lst (??) and qf and bf.
>
> Thanks everyone with their inputs
>
>
> > On 8 Nov 2019, at 06:45, Paras Lehana 
> wrote:
> >
> > Hi Guilherme
> >
> > By accident, I ended up querying the using the default handler (/select)
> and it worked.
> >
> > You've just found the culprit. Thanks for giving the material I
> requested. Your analysis chain is working as expected. I don't see any
> issue in either StopWordFilter or your boosts. I also use a boost of 50
> when boosting contextual suggestions (boosting "gold iphone" on a page of
> iphone) but I take Walter's suggestion and would try to optimize my
> weights. I agree that this 50 thing was not researched much about by us as
> well (we never faced performance or relevance issues).
> >
> > See the major difference in both the handlers - edismax. I'm pretty sure
> that your problem lies in the parsing of queries (you can confirm that from
> parsedquery key in debug of both JSON responses). I hope you have provided
> the response with fl=*. Replace q with q.alt in your /search handler query
> and I think you should start getting responses. That's because q.alt uses
> standard parser. If you want to keep using edisMax, I suggest you to test
> the responses removing some combination of lst (qf, bf) and find what's
> restricting the documents to come up. I'm out of office today - would have
> certainly tried analyzing the field values of the document in /select
> request and compare it with qf/bq in solrconfig.xml /search. Do this for me
> and you'd certainly find something.
> >
> > On Thu, 7 Nov 2019 at 21:00, Walter Underwood  > wrote:
> > I normally use a weight of 8 for the most important field, like title.
> Other fields might get a 4 or 2.
> >
> > I add a “pf” field with the weights doubled, so that phrase matches have
> a higher weight.
> >
> > The weight of 8 comes from experience at Infoseek and Inktomi, two early
> web search engines. With different relevance algorithms and totally
> different evaluation and tuning systems, they settled on weights of 8 and
> 7.5 for HTML titles. With the the two radically different system getting
> the same number, I decided that was a property of the documents, not of the
> search engines.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org 
> > http://observer.wunderwood.org/   (my
> blog)
> >
> >> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri  > wrote:
> >>
> >> Hi Wunder,
> >>
> >> My indexer takes quite a few hours to be executed I am shortening it to
> run faster, but I also need to make sure it gives what we are expecting.
> This implementation's been there for >4y, and massively used.
> >>
> >>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> of configuring Solr.
> >> I've inherited that implementation and I am really keen to adequate it,
> what would you recommend ?
> >>
> >> Cheers
> >> Guilherme
> >>
> >>> On 7 Nov 2019, at 14:43, Walter Underwood  > wrote:
> >>>
> >>> Thanks for posting the files. Looking at schema.xml, I see that you
> still are using StopFilterFactory. The first advice we gave you was to
> remove that.
> >>>
> >>> Remove StopFilterFactory everywhere and reindex.
> >>>
> >>> You will continue to have problems matching stopwords until you do
> that.
> >>>
> >>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> of configuring Solr.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org 
> >>> http://observer.wunderwood.org/ 
> (my blog)
> >>>
>  On Nov 7, 2019, at 6:56 AM, Guilherme Viteri  > wrote:
> 
>  Hi Paras, everyone
> 
>  Thank you again for your inputs and suggestions. I sorry to hear you
> had trouble with the attachments I will host it somewhere and share the
> links.
>  I don't tweak my index, I get the data from 

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
Hi Walter and Paras

I reindexed after removing all references to StopWordFilter, and I went from 
121 results to nearly 20K, because the search term q="Lymphoid and a 
non-Lymphoid cell" now matches entities such as "IFT A" or "Lamin A". So I 
don't think removing it completely is the way to go for our scenario, but I 
appreciate the suggestion.

Yes, the response is using fl=*
I am trying some combinations at the moment, but no success yet.

defType=edismax
q.alt=Lymphoid and a non-Lymphoid cell
Number of results=1599
Quite a considerable increase, though the results are reasonably meaningful. 

I am sorry, but I didn't understand what exactly you want me to do with the 
lst (??) and qf and bf.

Thanks, everyone, for your input.


> On 8 Nov 2019, at 06:45, Paras Lehana  wrote:
> 
> Hi Guilherme
> 
> By accident, I ended up querying the using the default handler (/select) and 
> it worked. 
> 
> You've just found the culprit. Thanks for giving the material I requested. 
> Your analysis chain is working as expected. I don't see any issue in either 
> StopWordFilter or your boosts. I also use a boost of 50 when boosting 
> contextual suggestions (boosting "gold iphone" on a page of iphone) but I 
> take Walter's suggestion and would try to optimize my weights. I agree that 
> this 50 thing was not researched much about by us as well (we never faced 
> performance or relevance issues).  
> 
> See the major difference in both the handlers - edismax. I'm pretty sure that 
> your problem lies in the parsing of queries (you can confirm that from 
> parsedquery key in debug of both JSON responses). I hope you have provided 
> the response with fl=*. Replace q with q.alt in your /search handler query 
> and I think you should start getting responses. That's because q.alt uses 
> standard parser. If you want to keep using edisMax, I suggest you to test the 
> responses removing some combination of lst (qf, bf) and find what's 
> restricting the documents to come up. I'm out of office today - would have 
> certainly tried analyzing the field values of the document in /select request 
> and compare it with qf/bq in solrconfig.xml /search. Do this for me and you'd 
> certainly find something.  
> 
> On Thu, 7 Nov 2019 at 21:00, Walter Underwood  > wrote:
> I normally use a weight of 8 for the most important field, like title. Other 
> fields might get a 4 or 2.
> 
> I add a “pf” field with the weights doubled, so that phrase matches have a 
> higher weight.
> 
> The weight of 8 comes from experience at Infoseek and Inktomi, two early web 
> search engines. With different relevance algorithms and totally different 
> evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML 
> titles. With the the two radically different system getting the same number, 
> I decided that was a property of the documents, not of the search engines.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)
> 
>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri > > wrote:
>> 
>> Hi Wunder,
>> 
>> My indexer takes quite a few hours to be executed I am shortening it to run 
>> faster, but I also need to make sure it gives what we are expecting. This 
>> implementation's been there for >4y, and massively used.
>> 
>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I 
>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>> configuring Solr.
>> I've inherited that implementation and I am really keen to adequate it, what 
>> would you recommend ?
>> 
>> Cheers
>> Guilherme
>> 
>>> On 7 Nov 2019, at 14:43, Walter Underwood >> > wrote:
>>> 
>>> Thanks for posting the files. Looking at schema.xml, I see that you still 
>>> are using StopFilterFactory. The first advice we gave you was to remove 
>>> that.
>>> 
>>> Remove StopFilterFactory everywhere and reindex.
>>> 
>>> You will continue to have problems matching stopwords until you do that.
>>> 
>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I 
>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>> configuring Solr.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> http://observer.wunderwood.org/   (my blog)
>>> 
 On Nov 7, 2019, at 6:56 AM, Guilherme Viteri >>> > wrote:
 
 Hi Paras, everyone
 
 Thank you again for your inputs and suggestions. I sorry to hear you had 
 trouble with the attachments I will host it somewhere and share the links. 
 I don't tweak my index, I get the data from the graph database, create a 
 document as they are and save to solr.
 
 So, I am sending the new analysis screen querying the way you suggested. 
 Also the 

Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread Alexandre Rafalovitch
Something does not make sense, because your schema defines "title" as
the uniqueKey field, but your message talks about "id". Are you
absolutely sure that the Solr/collection you get an error for is the
same Solr where you are checking the schema?

Also, do you have a bit more of the error and stack trace? I find
"...or Unknown field" to be very puzzling. What are you trying to do
when you get this error?

Regards,
  Alex.

On Sat, 9 Nov 2019 at 01:05, Sthitaprajna  wrote:
>
> Thanks,
>
> I did reload after solr configuration upload to zk
> Yes i push the config set to zk and i can see all my changes are on cloud
> I turned off the managed schema
> Yes it has, ypu could have seen it if the attachment are available. I have 
> attached again may be it will be available.
>
> On Fri, 8 Nov 2019, 21:13 Erick Erickson,  wrote:
>>
>> Attachments are aggressively stripped by the mail server, so I can’t see 
>> them.
>>
>> Possibilities
>> - you didn’t reload your core/collection
>> - you didn’t push the configset to Zookeeper if using SolrCloud
>> - you are using the managed schema, which uses a file called 
>> “managed-schema” rather than classic, which uses schema.xml
>> - your input doesn’t really have a field “title”.
>> - the doc just doesn’t have a field called “title” in it when it’s sent to 
>> Solr.
>>
>>
>> Best,
>> Erick
>>
>> > On Nov 8, 2019, at 4:41 AM, Sthitaprajna  
>> > wrote:
>> >
>> > title
>>


Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread Sthitaprajna
Thanks,

I did reload after uploading the Solr configuration to ZK.
Yes, I pushed the configset to ZK and I can see all my changes are on the cloud.
I turned off the managed schema.
Yes, it has; you could have seen it if the attachments were available. I have
attached them again; maybe they will be available this time.

On Fri, 8 Nov 2019, 21:13 Erick Erickson,  wrote:

> Attachments are aggressively stripped by the mail server, so I can’t see
> them.
>
> Possibilities
> - you didn’t reload your core/collection
> - you didn’t push the configset to Zookeeper if using SolrCloud
> - you are using the managed schema, which uses a file called
> “managed-schema” rather than classic, which uses schema.xml
> - your input doesn’t really have a field “title”.
> - the doc just doesn’t have a field called “title” in it when it’s sent to
> Solr.
>
>
> Best,
> Erick
>
> > On Nov 8, 2019, at 4:41 AM, Sthitaprajna 
> wrote:
> >
> > title
>
>


Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread Erick Erickson
Attachments are aggressively stripped by the mail server, so I can’t see them.

Possibilities
- you didn’t reload your core/collection
- you didn’t push the configset to Zookeeper if using SolrCloud
- you are using the managed schema, which uses a file called “managed-schema” 
rather than classic, which uses schema.xml
- your input doesn’t really have a field “title”.
- the doc just doesn’t have a field called “title” in it when it’s sent to Solr.
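
The last two possibilities can be checked offline before sending anything to
Solr. A minimal sketch, assuming a classic schema.xml with a top-level
<uniqueKey> element; the sample schema and documents below are made up for
illustration, since the real attachments never reached the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment, standing in for the stripped attachment.
SAMPLE_SCHEMA = """<schema name="example" version="1.6">
  <field name="title" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>title</uniqueKey>
</schema>"""


def unique_key(schema_xml):
    """Return the uniqueKey field name declared in the schema, or None."""
    node = ET.fromstring(schema_xml).find("uniqueKey")
    return node.text.strip() if node is not None and node.text else None


def missing_unique_key(schema_xml, doc):
    """True if the schema declares a uniqueKey that the document does not carry."""
    key = unique_key(schema_xml)
    return key is not None and key not in doc


if __name__ == "__main__":
    print(unique_key(SAMPLE_SCHEMA))
    print(missing_unique_key(SAMPLE_SCHEMA, {"id": "1"}))
    print(missing_unique_key(SAMPLE_SCHEMA, {"title": "A document"}))
```

If the schema Solr is actually using names a different uniqueKey than the file
you are looking at, that points back to the first three possibilities.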


Best,
Erick

> On Nov 8, 2019, at 4:41 AM, Sthitaprajna  
> wrote:
> 
> title



Re: Need some guidance to understand differences (infra, security etc) between LTS version and latest stable version.

2019-11-08 Thread Erick Erickson
Please read through the release notes and the Solr and Lucene CHANGES.txt files 
then ask specific questions.

Best,
Erick

> On Nov 8, 2019, at 4:10 AM, suyog joshi  wrote:
> 
> Hi Erik/Team,
> 
> Thanks for your help in previous query. Just have one other doubt, can you
> please assist on it ?
> 
> Q - Are there any major differences between current LTS version(7.7.x) and
> latest stable releases (8.x.x) in terms of security, stability, logging,
> monitoring, authentication etc ?
> 
> Any inputs of links pointing out major differences will be helpful.
> 
> Note - As per your recommendation, mostly we will be going with 8.x.x only,
> but would be really helpful to have any idea about such differences, which
> directly impacts infra setup.
> 
> Kindly guide. Thanks !!
> 
> Regards,
> Suyog Joshi 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Commit disabled

2019-11-08 Thread Erick Erickson
Please explain the use case more fully, as what you’re asking makes little 
sense. 

You say “manually indexed the item with changes”. How does that change get to 
Solr? The autocommit settings are all about how long it takes a doc _after_ 
it’s indexed in Solr to be searchable. How it gets to Solr has nothing to do 
with these settings.
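
As a rough mental model only (a simplification, not Solr's actual code), the
settings quoted below can be read like this: a soft commit always opens a new
searcher, while a hard commit does so only when openSearcher is true. The
function name is made up for illustration.

```python
def visible_after_ms(auto_commit_ms, open_searcher, auto_soft_commit_ms):
    """Return the longest wait (ms) before an indexed doc becomes searchable
    without an explicit commit, or None if it never becomes visible that way.

    A non-positive interval means that kind of automatic commit is disabled.
    """
    candidates = []
    if auto_soft_commit_ms > 0:
        candidates.append(auto_soft_commit_ms)   # soft commit opens a searcher
    if auto_commit_ms > 0 and open_searcher:
        candidates.append(auto_commit_ms)        # hard commit that opens a searcher
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # autoCommit 15000 ms with openSearcher=false, autoSoftCommit disabled (-1).
    print(visible_after_ms(15000, False, -1))
    # Same, but with a 1 s soft commit.
    print(visible_after_ms(15000, False, 1000))
```

With the values from the original question, the first call returns None: docs
are flushed to disk every 15 seconds but only become searchable after an
explicit commit or a core reload.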

Here’s more than you want to know about commits: 
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Also, you mentioned Sitecore, which uses Solr. Perhaps this is more a question 
for Sitecore?

Best,
Erick

> On Nov 8, 2019, at 7:53 AM, Villacorta, David (Arlington) 
>  wrote:
> 
> Thanks for the feedback
> 
> Is there a config setting that can be used for explicit commit? I was 
> thinking the  should be handling this already?
> In our issue, the changes will only be reflected back to sitecore once we 
> manually indexed  the item with changes
> 
> Regards
> David Villacorta
> 
> -Original Message-
> From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
> Sent: Friday, November 08, 2019 7:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Commit disabled
> 
> Hi David,
> Index will get updated (hard commit is happening every 15s) but changes will 
> not be visible until you explicitly commit or you reload core. Note that Solr 
> restart reloads cores.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__sematext.com_=DwIFAg=3NBXXUKukgVIjVXwt0Rin6h0GAxIKZespWWvcJx4w9c=hHHYgXsMRB8bPM5zNhKSH56W7zaV_SQcrmlwXd5ocLI0qfMw_ySz2DWVBjaVtE7v=Elg8qsST_TFKjg7Ti53TOSeAEzjrdqn_9X5gqbLJezw=5I_RPlXE6z0MGcaCMeNTekm90bN2m81prJ5pJUQFxEo=
> 
> 
> 
>> On 8 Nov 2019, at 12:19, Villacorta, David (Arlington) 
>>  wrote:
>> 
>> Just want to confirm, given the following config settings at solrconfig.xml:
>> 
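>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> 
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>> </autoSoftCommit>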
>> 
>> Solr index will not be updated unless created item in Sitecore is manually 
>> indexed, right?
>> 
>> Regards
>> David Villacorta
>> 
>> Notice of Confidentiality
>> This email contains confidential material prepared for the intended 
>> addressees only and it may contain intellectual property of Willis Towers 
>> Watson, its affiliates or a third party. This material may not be suitable 
>> for, and we accept no responsibility for, use in any context or for any 
>> purpose other than for the intended context and purpose. If you are not the 
>> intended recipient or if we did not authorize your receipt of this material, 
>> any use, distribution or copying of this material is strictly prohibited and 
>> may be unlawful. If you have received this communication in error, please 
>> return it to the original sender with the subject heading "Received in 
>> error," then delete any copies.
>> 
>> You may receive direct marketing communications from Willis Towers Watson. 
>> If so, you have the right to opt out of these communications. You can opt 
>> out of these communications or request a copy of Willis Towers Watson's 
>> privacy notice by emailing 
>> unsubscr...@willistowerswatson.com.
>> 
>> 
>> This e-mail has come to you from Willis Towers Watson US LLC
> 



RE: Commit disabled

2019-11-08 Thread Villacorta, David (Arlington)
Thanks for the feedback

Is there a config setting that can be used for explicit commit? I was thinking 
the  should be handling this already?
In our case, the changes are only reflected back in Sitecore once we manually 
index the changed item.

Regards
David Villacorta

-Original Message-
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com]
Sent: Friday, November 08, 2019 7:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Commit disabled

Hi David,
Index will get updated (hard commit is happening every 15s) but changes will 
not be visible until you explicitly commit or you reload core. Note that Solr 
restart reloads cores.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch 
Consulting Support Training - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__sematext.com_=DwIFAg=3NBXXUKukgVIjVXwt0Rin6h0GAxIKZespWWvcJx4w9c=hHHYgXsMRB8bPM5zNhKSH56W7zaV_SQcrmlwXd5ocLI0qfMw_ySz2DWVBjaVtE7v=Elg8qsST_TFKjg7Ti53TOSeAEzjrdqn_9X5gqbLJezw=5I_RPlXE6z0MGcaCMeNTekm90bN2m81prJ5pJUQFxEo=



> On 8 Nov 2019, at 12:19, Villacorta, David (Arlington) 
>  wrote:
>
> Just want to confirm, given the following config settings at solrconfig.xml:
>
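> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>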
>
> Solr index will not be updated unless created item in Sitecore is manually 
> indexed, right?
>
> Regards
> David Villacorta
>



Re: Commit disabled

2019-11-08 Thread Emir Arnautović
Hi David,
The index will get updated (a hard commit happens every 15s), but the changes 
will not be visible until you explicitly commit or reload the core. Note that a 
Solr restart reloads cores.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 8 Nov 2019, at 12:19, Villacorta, David (Arlington) 
>  wrote:
> 
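> Just want to confirm, given the following config settings in solrconfig.xml:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
> The Solr index will not be updated unless the item created in Sitecore is 
> manually indexed, right?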
> 
> Regards
> David Villacorta
> 



Commit disabled

2019-11-08 Thread Villacorta, David (Arlington)
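Just want to confirm, given the following config settings in solrconfig.xml:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

The Solr index will not be updated unless the item created in Sitecore is 
manually indexed, right?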
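
As a reading aid for the `${name:default}` placeholders in the settings above, the following Python sketch mimics how such a placeholder resolves: the named property wins if it is set, otherwise the value after the colon is used. The `resolve` and `describe` helpers are hypothetical and written for this illustration; they are not part of Solr. The sketch assumes that a non-positive `maxTime` means the time-based trigger is off.

```python
import re

# Matches ${name} and ${name:default} as written in solrconfig.xml.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value, props):
    """Replace each placeholder with props[name], else with its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is None:
            raise KeyError(f"no value or default for {name}")
        return default
    return PLACEHOLDER.sub(substitute, value)


def describe(max_time_ms):
    """Assumption of this sketch: non-positive values disable the trigger."""
    ms = int(max_time_ms)
    return "disabled" if ms <= 0 else f"every {ms} ms"


if __name__ == "__main__":
    no_props = {}
    print("hard commit:", describe(resolve("${solr.autoCommit.maxTime:15000}", no_props)))
    print("soft commit:", describe(resolve("${solr.autoSoftCommit.maxTime:-1}", no_props)))
```

With no properties set, the hard commit resolves to 15000 ms and the soft commit to -1. Supplying a value for `solr.autoSoftCommit.maxTime`, for example as a Java system property at startup, would change the second result without editing the file.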

Regards
David Villacorta



Need some guidance to understand differences (infra, security etc) between LTS version and latest stable version.

2019-11-08 Thread suyog joshi
Hi Erik/Team,

Thanks for your help with my previous query. I have one more question; could
you please assist with it?

Q - Are there any major differences between the current LTS version (7.7.x) and
the latest stable releases (8.x.x) in terms of security, stability, logging,
monitoring, authentication, etc.?

Any input or links pointing out the major differences would be helpful.

Note - As per your recommendation, we will most likely go with 8.x.x,
but it would be really helpful to understand such differences, since they
directly impact the infra setup.

Kindly guide. Thanks !!

Regards,
Suyog Joshi 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread Sthitaprajna
I am using Solr 8.1.1.

I created a core/collection. After updating the schema and solrconfig, I am
getting these errors.

Before adding id field to schema.xml







[image: sol1.PNG]

After adding id field to schema.xml


[image: sol2.PNG]

Here are my schema.xml & solrconfig.xml. What am I doing wrong?

schema.xml :







title





solrconfig:




  
${tests.luceneMatchVersion:LUCENE_CURRENT}

  


  ${solr.ulog.dir:}
  

 
${solr.autoCommit.maxTime:15000}
   false


 
${solr.autoSoftCommit.maxTime:-1}
 
   

  

  

  
  /select
  title:*
  
  server-enabled.txt
  

  

  

  

true
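
One way to narrow down the two errors named in the subject line is to compare the documents being indexed with the fields the schema declares. The Python sketch below does that for a small sample schema. The schema text is hypothetical and is not the poster's file (whose markup did not survive in this archive), and the `check_document` helper is written for this illustration only. It ignores dynamic fields and copy fields, so it is a rough check rather than a replica of Solr's own validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal schema for illustration; not the poster's actual file.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def check_document(schema_xml, doc):
    """Return a list of problems for one document (dict of field -> value)."""
    root = ET.fromstring(schema_xml.strip())
    declared = {field.get("name") for field in root.iter("field")}
    unique_key = root.findtext("uniqueKey")
    problems = []
    if unique_key and unique_key not in declared:
        problems.append(f"uniqueKey '{unique_key}' is not a declared field")
    if unique_key and unique_key not in doc:
        problems.append(f"document is missing uniqueKey field '{unique_key}'")
    for name in doc:
        if name not in declared:
            problems.append(f"unknown field '{name}'")
    return problems


if __name__ == "__main__":
    print(check_document(SAMPLE_SCHEMA, {"title": "no id here"}))
    print(check_document(SAMPLE_SCHEMA, {"id": "1", "body": "undeclared"}))
```

If the first kind of problem shows up, either the documents need a value for the uniqueKey field or the `uniqueKey` element should name a field the documents actually carry; if the second shows up, the field needs to be declared in the schema.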
 
   


Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-08 Thread sthita



I am using Solr 8.1.1.

I created a core/collection. After updating the schema and solrconfig, I am
getting these errors.
Before adding id field to schema.xml
 
After adding id field to schema.xml
 

Here are my schema.xml & solrconfig.xml. What am I doing wrong?

schema.xml : 






title





solrconfig: 




 
${tests.luceneMatchVersion:LUCENE_CURRENT}
  

  


  ${solr.ulog.dir:}
  

  
${solr.autoCommit.maxTime:15000} 
   false 


  
${solr.autoSoftCommit.maxTime:-1} 
 
   

  

  

  
  /select
  title:*
  
  server-enabled.txt
  

  

  

  

true
 
   



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html