Re: 7.2.1 Solr collection sluggish

2018-08-15 Thread Shawn Heisey
On 8/8/2018 7:26 AM, Markus Jelsma wrote:
> We also took a good look at our monitoring, JVM heap was normal, IO was 
> normal, CPU was normal until the first restart. CPU usage is since the first 
> restart erratic but not worryingly off the charts, just not 'normal' as usual.

I've seen systems with severe performance issues where the user did not
see anything out of the ordinary for these metrics.  Sometimes this is
because they do not know what to look for.  What exactly does "normal"
mean to you?

> No changes were made to the collection for days before it became sluggish.
>
> CPU sampling with VisualVM is not helpful either, nothing really stands out, 
> especially when i compare it to another cluster that is still healthy. GC is 
> also normal.
>
> So, any ideas out here?

Here are the initial questions for a performance issue, to see whether
it's related to available memory or not:

* What OS is it running on?
* How much memory does the server have?
* How much index data is being handled by all Solr instances on that
machine?
* What is the total size of all Solr heaps on that machine?
* Is there any other software besides Solr on the machine?
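
As a rough illustration of how those answers fit together, here is a
minimal sketch.  It assumes that memory not claimed by the Solr heaps or
by other software is what the OS has left for caching index data.  The
function names and the figures are invented for the example and do not
come from any real system.

```python
def cache_headroom_gib(total_ram_gib, solr_heaps_gib, other_software_gib=0.0):
    """Memory (GiB) left for the OS disk cache after heaps and other software."""
    headroom = total_ram_gib - sum(solr_heaps_gib) - other_software_gib
    return max(headroom, 0.0)


def cacheable_fraction(index_size_gib, headroom_gib):
    """Fraction of the on-disk index that could fit in the leftover memory."""
    if index_size_gib <= 0:
        return 1.0
    return min(headroom_gib / index_size_gib, 1.0)


# Hypothetical machine: 64 GiB RAM, two Solr instances with 8 GiB heaps,
# 4 GiB used by other software, 200 GiB of index data on disk.
headroom = cache_headroom_gib(64, [8, 8], 4)
fraction = cacheable_fraction(200, headroom)
print(f"{headroom} GiB left for caching, about {fraction:.0%} of the index")
```

A low fraction does not prove that memory is the problem, but it is a
hint that the answers to the questions above deserve a closer look.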

If the OS is Linux or another POSIX operating system that has the gnu
version of "top" installed, then the following information is
*extremely* helpful, and can answer most of the questions asked above:

Run the "top" program.  Don't use htop or some other variant, it must be
the actual program named "top" and it should be the version of that
program from the Gnu project so that Gnu keyboard shortcuts work.

Press shift-M to sort the listing by resident memory size.  If your
version of top is not from the Gnu project, this might not work ... but
this is an extremely important step in these instructions, so if you
don't have gnu top, you should see if you can get your version to sort
by the resident memory column, descending.

Grab a screenshot of the top listing and share it with a file-sharing
website.  Dropbox is usually a good choice.
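
If a screenshot is not possible, roughly the same listing can be
produced as text.  The following is only a sketch, assuming a Linux
machine with a /proc filesystem.  The helper names are made up for this
example, and it shows far less than the top output requested above.

```python
import os
import re


def parse_status(status_text):
    """Return (process name, resident KiB) from the text of /proc/<pid>/status."""
    name = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    rss = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    # Kernel threads have no VmRSS line, so they are reported as 0.
    return (name.group(1) if name else "?", int(rss.group(1)) if rss else 0)


def sort_by_resident(entries):
    """Sort (pid, name, rss_kib) tuples by resident size, largest first."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


def list_processes(proc_root="/proc"):
    """Collect (pid, name, rss_kib) for every readable process."""
    entries = []
    if not os.path.isdir(proc_root):
        return entries
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "status")) as handle:
                name, rss_kib = parse_status(handle.read())
        except OSError:
            continue  # the process exited or is not readable
        entries.append((int(pid), name, rss_kib))
    return entries


if __name__ == "__main__":
    for pid, name, rss_kib in sort_by_resident(list_processes())[:15]:
        print(f"{pid:>8} {rss_kib:>12} KiB  {name}")
```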

If you're running Solr on Windows, you can use the program named
"Resource Monitor" to get something very similar.  In that program,
click on the Memory tab, click the "Working Set" column until it's
sorted descending, and grab a screenshot.  If necessary, expand the
columns so all the numbers can be seen clearly.

Thanks,
Shawn



Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I am not using edismax (eventually I would like to get there) but I'm just
testing with standard query right now.  Original posting:

I'm trying to figure out why the multi-word synonym expansion is not
working correctly (or, at least what I'm misunderstanding).  Specifically,
when I test a standard query with Solr Admin it appears to still split on
whitespace.

Here is my setup:
- Solr 7.2.1
- synonym example: LCD => liquid crystal display
- q=myfield:LCD
- added parameter: sow=false
- myfield schema looks like (analyzer both applicable to index and query
time):


  


...


When debugging the query, Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display


(default operator being OR), as you can see it would incorrectly match on
any of those words, but not all, which is what I would expect...
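
For anyone who wants to reproduce this outside the Admin UI, here is a
minimal sketch of how such a request could be built.  The host, port and
collection name are placeholders I made up, and the function is
hypothetical; q, sow, debugQuery and rows are ordinary Solr request
parameters.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, collection, field, term, sow=False):
    """Build a /select URL that asks Solr to include debug output."""
    params = {
        "q": f"{field}:{term}",
        "sow": "true" if sow else "false",  # split-on-whitespace switch
        "debugQuery": "true",               # adds a debug section to the response
        "rows": "0",                        # only the debug output is of interest
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder host and collection; adjust to the actual installation.
url = build_debug_query_url("http://localhost:8983/solr", "mycollection",
                            "myfield", "LCD", sow=False)
print(url)
```

Fetching that URL and looking at the parsed query in the debug section
of the response should show the same thing as the Admin UI.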

Should it not do a phrase query search for the exact translated synonym,
"liquid crystal display"?



On Wed, Aug 15, 2018 at 5:01 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Also share your fieldType settings for myfield as well from your schema
> On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > Aside from the screenshot issue, one  thing to check: are you searching
> > with defType=edismax ?
> >
> > As in
> > q=lcd=myfield=false=edismax
> >
> > ?
> >
> > Also sow=false should be the default on Solr 7 and above
> >
> > Doug
> >
> > On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:
> >
> >> I'm trying to figure out why the multi-word synonym expansion is not
> >> working
> >> correctly.  Specifically, when I test a standard query with Solr Admin
> it
> >> is
> >> still splitting on whitespace.
> >>
> >> Here is my setup:
> >> - Solr 7.2.1
> >> - synonym LCD => liquid crystal display
> >> - q=myfield:LCD
> >> - added: sow=false
> >> - myfield looks like:
> >>
> >>
> >> Solr Admin shows the parsed query looks like:
> >>
> >> myfield:liquid myfield:crystal myfield:display
> >>
> >> (default operator being OR), which would incorrectly match documents
> with
> >> any of those words, but not all, which is what I would expect...
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
> >
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Also share your fieldType settings for myfield as well from your schema
On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Aside from the screenshot issue, one  thing to check: are you searching
> with defType=edismax ?
>
> As in
> q=lcd=myfield=false=edismax
>
> ?
>
> Also sow=false should be the default on Solr 7 and above
>
> Doug
>
> On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:
>
>> I'm trying to figure out why the multi-word synonym expansion is not
>> working
>> correctly.  Specifically, when I test a standard query with Solr Admin it
>> is
>> still splitting on whitespace.
>>
>> Here is my setup:
>> - Solr 7.2.1
>> - synonym LCD => liquid crystal display
>> - q=myfield:LCD
>> - added: sow=false
>> - myfield looks like:
>>
>>
>> Solr Admin shows the parsed query looks like:
>>
>> myfield:liquid myfield:crystal myfield:display
>>
>> (default operator being OR), which would incorrectly match documents with
>> any of those words, but not all, which is what I would expect...
>>
>>
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Aside from the screenshot issue, one  thing to check: are you searching
with defType=edismax ?

As in
q=lcd=myfield=false=edismax

?

Also sow=false should be the default on Solr 7 and above

Doug

On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:

> I'm trying to figure out why the multi-word synonym expansion is not
> working
> correctly.  Specifically, when I test a standard query with Solr Admin it
> is
> still splitting on whitespace.
>
> Here is my setup:
> - Solr 7.2.1
> - synonym LCD => liquid crystal display
> - q=myfield:LCD
> - added: sow=false
> - myfield looks like:
>
>
> Solr Admin shows the parsed query looks like:
>
> myfield:liquid myfield:crystal myfield:display
>
> (default operator being OR), which would incorrectly match documents with
> any of those words, but not all, which is what I would expect...
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe
Yes please.  That way we’ll see the whole thing.

--
Steve
www.lucidworks.com

> On Aug 15, 2018, at 7:20 PM, Roy Lim  wrote:
> 
> I've subscribed, shall I re-post it then via email?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I've subscribed, shall I re-post it then via email?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Type ahead functionality using complex phrase query parser

2018-08-15 Thread Hanjan, Harinder
Keeping the field as string so that no analysis is done on it has yielded 
promising results.  



I will test more tomorrow and report back.
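
My current understanding of why this helps, as a toy model only: I am
assuming the analyzed field splits each value on whitespace and
lowercases it, so a prefix can match any word, whereas a string field is
matched as one whole value.  The function names are hypothetical and
this is not how Solr implements the complexphrase parser.

```python
SUGGESTIONS = [
    "garages", "garburator", "gardening", "gardens", "garage",
    "garden", "garbage", "century gardens", "community gardens",
]


def matches_analyzed(value, prefix):
    """Analyzed field (assumed): the prefix may match any single token."""
    return any(token.startswith(prefix) for token in value.lower().split())


def matches_string(value, prefix):
    """String field: the prefix must match the start of the whole value."""
    return value.startswith(prefix)


analyzed_hits = [v for v in SUGGESTIONS if matches_analyzed(v, "gar")]
string_hits = [v for v in SUGGESTIONS if matches_string(v, "gar")]

print("analyzed:", analyzed_hits)
print("string:  ", string_hits)
```

In this model the two "... gardens" entries only show up for the
analyzed field, which is the behaviour I was seeing.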

-Original Message-
From: Hanjan, Harinder [mailto:harinder.han...@calgary.ca] 
Sent: Wednesday, August 15, 2018 5:01 PM
To: solr-user@lucene.apache.org
Subject: [EXT] Type ahead functionality using complex phrase query parser

Hello!

I can't get Solr to give the results I would expect and would appreciate it if 
someone could point me in the right direction here.

/select?q={!complexphrase}"gar*"
shows me the following terms

- garages
- garburator
- gardening
- gardens
- garage
- garden
- garbage
- century gardens
- community gardens

I was not expecting to see the bottom two.

--- schema.xml ---
  
  
  
   


--- query ---
/select?q={!complexphrase}"gar*"

--- solrconfig.xml ---

   
  explicit
  10
  suggestion
   


Thanks!
Harinder


NOTICE -
This communication is intended ONLY for the use of the person or entity named 
above and may contain information that is confidential or legally privileged. 
If you are not the intended recipient named above or a person responsible for 
delivering messages or communications to the intended recipient, YOU ARE HEREBY 
NOTIFIED that any use, distribution, or copying of this communication or any of 
the information contained in it is strictly prohibited. If you have received 
this communication in error, please notify us immediately by telephone and then 
destroy or delete this communication, or return it to us by mail if requested 
by us. The City of Calgary thanks you for your attention and co-operation.


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe
Roy,

Not sure of the point of Nabble when it strips content before passing messages 
on to the mailing list.  I’ve emailed them about this problem in the past but 
they have done nothing about it.

Updating a post on Nabble will never make it to the mailing list.  If you want 
us to be able to read your post in full, you should subscribe to the mailing 
list instead of using Nabble.  Instructions here: 
http://lucene.apache.org/solr/community.html#solr-user-list-solr-userluceneapacheorg

--
Steve
www.lucidworks.com

> On Aug 15, 2018, at 7:00 PM, Roy Lim  wrote:
> 
> Thanks, updated original post.  It just removed what I surrounded with the
> raw text markup, I've added it back without markup.  Not sure of the point
> of raw text if it's always removed 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Type ahead functionality using complex phrase query parser

2018-08-15 Thread Hanjan, Harinder
Hello!

I can't get Solr to give the results I would expect and would appreciate it if 
someone could point me in the right direction here.

/select?q={!complexphrase}"gar*"
shows me the following terms

- garages
- garburator
- gardening
- gardens
- garage
- garden
- garbage
- century gardens
- community gardens

I was not expecting to see the bottom two.

--- schema.xml ---



  
  
   


--- query ---
/select?q={!complexphrase}"gar*"

--- solrconfig.xml ---

   
  explicit
  10
  suggestion
   


Thanks!
Harinder




Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
Thanks, updated original post.  It just removed what I surrounded with the
raw text markup, I've added it back without markup.  Not sure of the point
of raw text if it's always removed 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Erick Erickson
The mail server strips pretty much all screenshots and attachments, so
I think some of the data you're trying to provide is missing from the
e-mail.

Best,
Erick

On Wed, Aug 15, 2018 at 3:27 PM, Roy Lim  wrote:
> I'm trying to figure out why the multi-word synonym expansion is not working
> correctly.  Specifically, when I test a standard query with Solr Admin it is
> still splitting on whitespace.
>
> Here is my setup:
> - Solr 7.2.1
> - synonym LCD => liquid crystal display
> - q=myfield:LCD
> - added: sow=false
> - myfield looks like:
>
>
> Solr Admin shows the parsed query looks like:
>
> myfield:liquid myfield:crystal myfield:display
>
> (default operator being OR), which would incorrectly match documents with
> any of those words, but not all, which is what I would expect...
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I'm trying to figure out why the multi-word synonym expansion is not working
correctly.  Specifically, when I test a standard query with Solr Admin it is
still splitting on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym LCD => liquid crystal display
- q=myfield:LCD
- added: sow=false
- myfield looks like:


Solr Admin shows the parsed query looks like:

myfield:liquid myfield:crystal myfield:display

(default operator being OR), which would incorrectly match documents with
any of those words, but not all, which is what I would expect...





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Authentication between solr-exporter and solrcloud

2018-08-15 Thread Dwane Hall
Hi Sushant,

I had the same issue and unfortunately the exporter does not appear to support 
a secure cluster.  I raised a JIRA feature request so please upvote it as this 
will increase the chances of it being included in a future release.

https://issues.apache.org/jira/browse/SOLR-12584

Thanks

From: Sushant Vengurlekar 
Sent: Wednesday, 15 August 2018 10:39 PM
To: solr-user@lucene.apache.org
Subject: Authentication between solr-exporter and solrcloud

I have followed this guide for monitoring the solrcloud
https://lucene.apache.org/solr/guide/7_3/monitoring-solr-with-prometheus-and-grafana.html

I have basic authentication enabled for the solrcloud. How do I configure
the solr-exporter to authenticate with the set username and password?

Thank you


Re: [OT] Lucene/Solr bug list caused by JVM's implementations

2018-08-15 Thread Erick Erickson
Christopher

The Lucene devs in particular already are, thanks.

On Wed, Aug 15, 2018 at 11:29 AM, Christopher Schultz
 wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Erick,
>
> On 8/15/18 12:56 PM, Erick Erickson wrote:
>> Also note that the OpenJDK devs regularly get to test very early
>> (unreleased) Java versions, which flushes out a lot of issues long
>> before a general release of Java
>
> We (dev@tomcat) get emails from Oracle about pre-release versions of
> Java releases as well. I'm sure you guys could get on that list so
> solr-dev@lucene can get notifications of pre-release versions to test
> to make sure Solr is good-to-go on each forthcoming version.
>
> - -chris
>
>> On Wed, Aug 15, 2018 at 5:25 AM, Shawn Heisey 
>> wrote:
>>> On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:

>>>> I am looking for Lucene/Solr's bug list caused by JVM's
>>>> implementations. And I found the following, but it seems not to
>>>> be updated. https://wiki.apache.org/lucene-java/JavaBugs
>>>>
>>>> Where can I check the latest one?
>>>
>>>
>>> That is the only such list that I'm aware of.  There are not very
>>> many JVM bugs that affect Solr, and most of them have either been
>>> fixed or have a workaround.  I don't know the state of the IBM
>>> bugs ... except to say we strongly recommend that you don't run
>>> IBM Java.
>>>
>>> Best course of action:  Run the latest release of whatever Java
>>> version you have chosen, and only use Oracle or OpenJDK.  For
>>> Java 8, the current Oracle release is 8u181.  At this time, I
>>> wouldn't use Java 10 except in a development environment.  It's
>>> still early days for that -- newest Oracle version is 10.0.2.
>>>
>>> If you use the latest Oracle/OpenJDK release of Java 8, Solr
>>> ought to work quite well.
>>>
>>> Thanks, Shawn
>>>
>>
> -BEGIN PGP SIGNATURE-
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlt0cQgACgkQHPApP6U8
> pFg3FRAAi8BZICgv57H5zb6qxUh9Ic5scn0BBoT+lVRjpOkemoMQ8Fki3MViE69o
> jCABKQ70HItM1SKFu7a8xys8M/qs81gDHHdO/atW8Q9VzfrWlBJadrnIdVrvqW4y
> ZWBgUfx0fsucgKBGy9U7uFlvpTqj+H/gpRLm+1CEzi3Eb3V43YRkxy1FY5qPH2yA
> YApJMqyLWdSw9p2axwCSRswILfnCTI6VV0YXNAbaJIxJNqbmDat4yLGN/e6mMcP+
> +y8jndPMxQHTgB4OH2B1DLAIlop4p3/eF2Oy+PiKEziALlvd+TYSlc07XWYv+TuO
> NELBS0eEty/ay8wLSoCx+er9N18eiPa3eaMr7LQcRFs2wtBmYe8OGQpb9NC0/Bpm
> WNUuDGxc2rIGAnHYI7CTi/Y8ncCX1XBGstMGuYpnguoEWMSOUSrdWDVYjJJEB+cP
> qFCGRHdsK3qeXze2UQ/9FHNXYGjv9TwKYsTAX06ZZZPGC0VD8l0mxXeHfsx3aTQA
> u7/cmj+i86LFnjQ/gvsc4vUzXEk163Pgd/dutqpaMFmENTdN6cvBHHnj9T7TV/PJ
> WpJemYvje4xFZrFvbkdQ1XMij/s3+8gNqHYmaaTjZ7JHvnlbCDofqtwLbFH9hKDt
> n87iJUmTe6zGtn6/RUTrRA8ONH/5j2Yok+2reHqzgo2XSosqpc8=
> =9CZ5
> -END PGP SIGNATURE-


Re: Is Running the Same Filters on Index and Query Redundant?

2018-08-15 Thread Erick Erickson
Thomas:

If you go to the admin UI, pick a collection (or core) and go to  the
"analysis" page. Put different values in the "index" and "query" entry
boxes. Sometimes a picture is worth a thousand words ;).

And, indeed, synonyms are one of the prime filters that are often
different between the two phases. And do be a little careful about
WordDelimiterGraphFilterFactory, it's subtly different in the
examples, particularly catenateWords="1" catenateNumbers="1" and
catenateWords="0" catenateNumbers="0" in index and query,
respectively. For catenateWords, the result for wi-fi at index time
would be to index
wi
fi
wifi

but at query time you'd only get
wi
fi

but that's OK since those tokens are already in the index, as is
"wifi" if the search was "wifi".

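Do note that when the filters for query and index _are_ identical, say
something like whitespaceTokenizer + lowercaseFilter, you can indeed
define only one, just leave off the "phase", i.e. use a plain <analyzer>
rather than <analyzer type="index"> ... </analyzer> plus a separate
<analyzer type="query"> ... </analyzer>.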
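
To make the wi-fi case above concrete, here is a toy model of the
catenateWords behaviour.  It is a sketch only: it assumes the filter
simply splits on non-alphanumeric characters, and the function name is
made up.  The real WordDelimiterGraphFilterFactory does a good deal
more than this.

```python
import re


def word_delimiter(token, catenate_words):
    """Split a token on non-alphanumerics, optionally adding the joined form."""
    parts = [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]
    tokens = list(parts)
    if catenate_words and len(parts) > 1:
        tokens.append("".join(parts))  # e.g. "wi" + "fi" -> "wifi"
    return tokens


index_tokens = word_delimiter("wi-fi", catenate_words=True)
query_tokens = word_delimiter("wi-fi", catenate_words=False)

print("index:", index_tokens)
print("query:", query_tokens)

# Every query-side token also exists on the index side, so the search matches,
# and a query for the single word "wifi" finds the catenated token as well.
print(all(token in index_tokens for token in query_tokens))
print(word_delimiter("wifi", catenate_words=False)[0] in index_tokens)
```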

Meanwhile Andrea has taken care of you I see...

Best,
Erick

On Wed, Aug 15, 2018 at 12:17 PM, Andrea Gazzarini  wrote:
> Hi Thomas,
> as you know, the two analyzers play in a different moment, with a different
> input and a different goal for the corresponding output:
>
>  * index analyzer: input is a field value, output is used for building
>the index
>  * query analyzer: input is a (user) query string, output is used for
>building a (Solr) query
>
> At index time a term dictionary is built, and at retrieval time the output
> query tries to find a match in that dictionary. I wouldn't call it
> "redundancy" because even if the filter is the same, it is applied to a
> different input and it has a different goal.
>
> Some filters must be present both at index and at query time because otherwise
> you won't find any match: if you put a lowercase filter only on the index
> side, queries with uppercase chars won't find any match. Some others don't
> (one example is the SynonymGraphFilter you've used only at query time). In
> general, everything depends on your needs and it's perfectly valid to have
> symmetric (index analyzer = query analyzer) and asymmetric text analysis
> (index analyzer != query analyzer).
>
> Without knowing your context it is very hard to guess if there's something
> wrong in the configuration. What is the part of the analyzers you think is
> redundant?
>
> On top of that: in your chain the HTMLStripCharFilterFactory applied at
> query time is something unusual, because while it makes perfect sense at
> index time (where I guess you index some HTML source), at query time I can't
> imagine a scenario where the user inputs queries containing HTML tags.
>
> Best,
> Andrea
>
>
> On 15/08/18 20:43, Zimmermann, Thomas wrote:
>>
>> Hi,
>>
>> We have the text field below configured on fields that are both stored and
>> indexed. It seems to me that applying the same filters on both index and
>> query would be redundant, and perhaps a waste of processing on the retrieval
>> side if the filter work was already done on the index side. Is this a fair
>> statement to make? Should I only be applying filters on one end of the
>> transaction?
>>
>> Thanks,
>> TZ
>>
>>
>> > positionIncrementGap="100">
>>
>>
>>
>>  
>>
>>  
>>
>>  > words="stopwords.txt" />
>>
>>  
>>
>>  > language="English" protected="protwords.txt"/>
>>
>>  
>>
>>
>>
>>
>>
>>  
>>
>>  
>>
>>  > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>
>>  > words="stopwords.txt" />
>>
>>  > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>  
>>
>>  > language="English" protected="protwords.txt"/>
>>
>>  
>>
>>
>>
>>  
>>
>>
>>
>


Re: Is Running the Same Filters on Index and Query Redundant?

2018-08-15 Thread Andrea Gazzarini

You're welcome, great to hear you have fewer doubts.
I see you're using the SynonymGraphFilter followed by a StopFilter at 
query time: have a look at this post [1], you might find some useful info.


Best,
Andrea

[1] https://sease.io/2018/07/combining-synonyms-and-stopwords.html

On 15/08/18 21:47, Zimmermann, Thomas wrote:

Hi Andrea,

Thanks so much. I wasn't thinking in the correct perspective on the query
portion of the analyzer, but your explanation makes perfect sense. In my
head I imagine the result set of the query being transformed by the
filters, but in actuality the filter is being applied to the query itself
before processing. This makes sense on my end and I think it answers my
questions.

Excellent point on the html strip factory. I'll evaluate our use cases.

This was all brought about by switching from the deprecated synonym and
word delimiter factories to the new graph based factories, where we
stopped filtering on insert for those and switched to filtering on query
based on recommendations from the Solr Doc.

Thanks,
TZ

On 8/15/18, 3:17 PM, "Andrea Gazzarini"  wrote:


Hi Thomas,
as you know, the two analyzers play in a different moment, with a
different input and a different goal for the corresponding output:

  * index analyzer: input is a field value, output is used for building
the index
  * query analyzer: input is a (user) query string, output is used for
building a (Solr) query

At index time a term dictionary is built, and at retrieval time the
output query tries to find a match in that dictionary. I wouldn't call
it "redundancy" because even if the filter is the same, it is applied to
a different input and it has a different goal.

Some filters must be present both at index and at query time because
otherwise you won't find any match: if you put a lowercase filter only
on the index side, queries with uppercase chars won't find any match.
Some others don't (one example is the SynonymGraphFilter you've used
only at query time). In general, everything depends on your needs and
it's perfectly valid to have symmetric (index analyzer = query analyzer)
and asymmetric text analysis (index analyzer != query analyzer).

Without knowing your context it is very hard to guess if there's something
wrong in the configuration. What is the part of the analyzers you think
is redundant?

On top of that: in your chain the HTMLStripCharFilterFactory applied at
query time is something unusual, because while it makes perfect sense
at index time (where I guess you index some HTML source), at query time
I can't imagine a scenario where the user inputs queries containing HTML
tags.

Best,
Andrea

On 15/08/18 20:43, Zimmermann, Thomas wrote:

Hi,

We have the text field below configured on fields that are both stored
and indexed. It seems to me that applying the same filters on both index
and query would be redundant, and perhaps a waste of processing on the
retrieval side if the filter work was already done on the index side. Is
this a fair statement to make? Should I only be applying filters on one
end of the transaction?

Thanks,
TZ


 



  

  

  

  

  

  





  

  

  

  

  

  

  

  



  







Re: Is Running the Same Filters on Index and Query Redundant?

2018-08-15 Thread Zimmermann, Thomas
Hi Andrea,

Thanks so much. I wasn't thinking in the correct perspective on the query
portion of the analyzer, but your explanation makes perfect sense. In my
head I imagine the result set of the query being transformed by the
filters, but in actuality the filter is being applied to the query itself
before processing. This makes sense on my end and I think it answers my
questions.

Excellent point on the html strip factory. I'll evaluate our use cases.

This was all brought about by switching from the deprecated synonym and
word delimiter factories to the new graph based factories, where we
stopped filtering on insert for those and switched to filtering on query
based on recommendations from the Solr Doc.

Thanks,
TZ

On 8/15/18, 3:17 PM, "Andrea Gazzarini"  wrote:

>Hi Thomas,
>as you know, the two analyzers play in a different moment, with a
>different input and a different goal for the corresponding output:
>
>  * index analyzer: input is a field value, output is used for building
>the index
>  * query analyzer: input is a (user) query string, output is used for
>building a (Solr) query
>
>At index time a term dictionary is built, and at retrieval time the
>output query tries to find a match in that dictionary. I wouldn't call
>it "redundancy" because even if the filter is the same, it is applied to
>a different input and it has a different goal.
>
>Some filters must be present both at index and at query time because
>otherwise you won't find any match: if you put a lowercase filter only
>on the index side, queries with uppercase chars won't find any match.
>Some others don't (one example is the SynonymGraphFilter you've used
>only at query time). In general, everything depends on your needs and
>it's perfectly valid to have symmetric (index analyzer = query analyzer)
>and asymmetric text analysis (index analyzer != query analyzer).
>
>Without knowing your context it is very hard to guess if there's something
>wrong in the configuration. What is the part of the analyzers you think
>is redundant?
>
>On top of that: in your chain the HTMLStripCharFilterFactory applied at
>query time is something unusual, because while it makes perfect sense
>at index time (where I guess you index some HTML source), at query time
>I can't imagine a scenario where the user inputs queries containing HTML
>tags.
>
>Best,
>Andrea
>
>On 15/08/18 20:43, Zimmermann, Thomas wrote:
>> Hi,
>>
>> We have the text field below configured on fields that are both stored
>>and indexed. It seems to me that applying the same filters on both index
>>and query would be redundant, and perhaps a waste of processing on the
>>retrieval side if the filter work was already done on the index side. Is
>>this a fair statement to make? Should I only be applying filters on one
>>end of the transaction?
>>
>> Thanks,
>> TZ
>>
>>
>> >positionIncrementGap="100">
>>
>>
>>
>>  
>>
>>  
>>
>>  >words="stopwords.txt" />
>>
>>  
>>
>>  >language="English" protected="protwords.txt"/>
>>
>>  
>>
>>
>>
>>
>>
>>  
>>
>>  
>>
>>  >synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>
>>  >words="stopwords.txt" />
>>
>>  >generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>  
>>
>>  >language="English" protected="protwords.txt"/>
>>
>>  
>>
>>
>>
>>  
>>
>>
>>
>



Re: Is Running the Same Filters on Index and Query Redundant?

2018-08-15 Thread Andrea Gazzarini

Hi Thomas,
as you know, the two analyzers play in a different moment, with a 
different input and a different goal for the corresponding output:


 * index analyzer: input is a field value, output is used for building
   the index
 * query analyzer: input is a (user) query string, output is used for
   building a (Solr) query

At index time a term dictionary is built, and at retrieval time the 
output query tries to find a match in that dictionary. I wouldn't call 
it "redundancy" because even if the filter is the same, it is applied to 
a different input and it has a different goal.


Some filters must be present both at index and at query time because 
otherwise you won't find any match: if you put a lowercase filter only 
on the index side, queries with uppercase chars won't find any match. 
Some others don't (one example is the SynonymGraphFilter you've used 
only at query time). In general, everything depends on your needs and 
it's perfectly valid to have symmetric (index analyzer = query analyzer) 
and asymmetric text analysis (index analyzer != query analyzer).
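
The lowercase case can be sketched with a toy term dictionary.  This is
a simplified model rather than Solr code: the analyzers here only split
on whitespace and optionally lowercase, and all names are invented for
the illustration.

```python
def index_analyzer(text):
    """Index side: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def query_analyzer_symmetric(text):
    """Query side with the same lowercase filter as the index side."""
    return [token.lower() for token in text.split()]


def query_analyzer_no_lowercase(text):
    """Query side where the lowercase filter was left out."""
    return text.split()


def search(term_dictionary, query_terms):
    """Return the query terms that are found in the term dictionary."""
    return [term for term in query_terms if term in term_dictionary]


term_dictionary = set(index_analyzer("Solr Rocks"))

print(search(term_dictionary, query_analyzer_symmetric("Solr")))
print(search(term_dictionary, query_analyzer_no_lowercase("Solr")))
```

With the lowercase filter on both sides the query term is found in the
dictionary; with the filter only on the index side, the uppercase "S"
in the query has nothing to match against.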


Without knowing your context it is very hard to guess if there's something 
wrong in the configuration. What is the part of the analyzers you think 
is redundant?


On top of that: in your chain the HTMLStripCharFilterFactory applied at 
query time is something unusual, because while it makes perfect sense 
at index time (where I guess you index some HTML source), at query time 
I can't imagine a scenario where the user inputs queries containing HTML 
tags.


Best,
Andrea

On 15/08/18 20:43, Zimmermann, Thomas wrote:

Hi,

We have the text field below configured on fields that are both stored and 
indexed. It seems to me that applying the same filters on both index and query 
would be redundant, and perhaps a waste of processing on the retrieval side if 
the filter work was already done on the index side. Is this a fair statement to 
make? Should I only be applying filters on one end of the transaction?

Thanks,
TZ




   

 

 

 

 

 

 

   

   

 

 

 

 

 

 

 

 

   

 







Is Running the Same Filters on Index and Query Redundant?

2018-08-15 Thread Zimmermann, Thomas
Hi,

We have the text field below configured on fields that are both stored and 
indexed. It seems to me that applying the same filters on both index and query 
would be redundant, and perhaps a waste of processing on the retrieval side if 
the filter work was already done on the index side. Is this a fair statement to 
make? Should I only be applying filters on one end of the transaction?

Thanks,
TZ


   

  













  

  

















  






Re: [OT] Lucene/Solr bug list caused by JVM's implementations

2018-08-15 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 8/15/18 12:56 PM, Erick Erickson wrote:
> Also note that the OpenJDK devs regularly get to test very early 
> (unreleased) Java versions, which flushes out a lot of issues long 
> before a general release of Java

We (dev@tomcat) get emails from Oracle about pre-release versions of
Java releases as well. I'm sure you guys could get on that list so
solr-dev@lucene can get notifications of pre-release versions to test
to make sure Solr is good-to-go on each forthcoming version.

- -chris

> On Wed, Aug 15, 2018 at 5:25 AM, Shawn Heisey 
> wrote:
>> On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:
>>> 
>>> I am looking for Lucene/Solr's bug list caused by JVM's
>>> implementations. And I found the following, but it seems not to
>>> be updated. https://wiki.apache.org/lucene-java/JavaBugs
>>> 
>>> Where can I check the latest one?
>> 
>> 
>> That is the only such list that I'm aware of.  There are not very
>> many JVM bugs that affect Solr, and most of them have either been
>> fixed or have a workaround.  I don't know the state of the IBM
>> bugs ... except to say we strongly recommend that you don't run
>> IBM Java.
>> 
>> Best course of action:  Run the latest release of whatever Java
>> version you have chosen, and only use Oracle or OpenJDK.  For
>> Java 8, the current Oracle release is 8u181.  At this time, I
>> wouldn't use Java 10 except in a development environment.  It's
>> still early days for that -- newest Oracle version is 10.0.2.
>> 
>> If you use the latest Oracle/OpenJDK release of Java 8, Solr
>> ought to work quite well.
>> 
>> Thanks, Shawn
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlt0cQgACgkQHPApP6U8
pFg3FRAAi8BZICgv57H5zb6qxUh9Ic5scn0BBoT+lVRjpOkemoMQ8Fki3MViE69o
jCABKQ70HItM1SKFu7a8xys8M/qs81gDHHdO/atW8Q9VzfrWlBJadrnIdVrvqW4y
ZWBgUfx0fsucgKBGy9U7uFlvpTqj+H/gpRLm+1CEzi3Eb3V43YRkxy1FY5qPH2yA
YApJMqyLWdSw9p2axwCSRswILfnCTI6VV0YXNAbaJIxJNqbmDat4yLGN/e6mMcP+
+y8jndPMxQHTgB4OH2B1DLAIlop4p3/eF2Oy+PiKEziALlvd+TYSlc07XWYv+TuO
NELBS0eEty/ay8wLSoCx+er9N18eiPa3eaMr7LQcRFs2wtBmYe8OGQpb9NC0/Bpm
WNUuDGxc2rIGAnHYI7CTi/Y8ncCX1XBGstMGuYpnguoEWMSOUSrdWDVYjJJEB+cP
qFCGRHdsK3qeXze2UQ/9FHNXYGjv9TwKYsTAX06ZZZPGC0VD8l0mxXeHfsx3aTQA
u7/cmj+i86LFnjQ/gvsc4vUzXEk163Pgd/dutqpaMFmENTdN6cvBHHnj9T7TV/PJ
WpJemYvje4xFZrFvbkdQ1XMij/s3+8gNqHYmaaTjZ7JHvnlbCDofqtwLbFH9hKDt
n87iJUmTe6zGtn6/RUTrRA8ONH/5j2Yok+2reHqzgo2XSosqpc8=
=9CZ5
-END PGP SIGNATURE-


Re: Lucene/Solr bug list caused by JVM's implementations

2018-08-15 Thread Erick Erickson
Also note that the OpenJDK devs regularly get to test very early
(unreleased) Java versions, which flushes out a lot of issues long
before a general release of Java.

On Wed, Aug 15, 2018 at 5:25 AM, Shawn Heisey  wrote:
> On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:
>>
>> I am looking for Lucene/Solr's bug list caused by JVM's implementations.
>> And I found the following, but it seems not to be updated.
>> https://wiki.apache.org/lucene-java/JavaBugs
>>
>> Where can I check the latest one?
>
>
> That is the only such list that I'm aware of.  There are not very many JVM
> bugs that affect Solr, and most of them have either been fixed or have a
> workaround.  I don't know the state of the IBM bugs ... except to say we
> strongly recommend that you don't run IBM Java.
>
> Best course of action:  Run the latest release of whatever Java version you
> have chosen, and only use Oracle or OpenJDK.  For Java 8, the current Oracle
> release is 8u181.  At this time, I wouldn't use Java 10 except in a
> development environment.  It's still early days for that -- newest Oracle
> version is 10.0.2.
>
> If you use the latest Oracle/OpenJDK release of Java 8, Solr ought to work
> quite well.
>
> Thanks,
> Shawn
>


Solr changing the search when given many qf fields?

2018-08-15 Thread Aaron Gibbons
I found a tipping point where the search being built changes with the
number of qf fields being passed in.

Example search: "foo bar"
solr 7.2.1
select?q.op=AND&defType=edismax&q=foo bar

Debugging the query you can see it results in:
"parsedquery_toString":"+(+(text:foo) +(text:bar))"

Adding more qf values you get:
"text name_text"
"parsedquery_toString":"+(+(name_text:foo | text:foo) +(name_text:bar |
text:bar))"
"text name_text city_text"
"parsedquery_toString":"+(+(city_text:foo | name_text:foo | text:foo)
+(city_text:bar | name_text:bar | text:bar))"

The search continues to build this way until I reach a certain number of
qf values.

Large number of qf values:
"34 values.."
"parsedquery_toString":"+(+((+comments_text:foo +comments_text:bar) |
(+zip_text:foo +zip_text:bar) | (+city_text:foo +city_text:bar) |
(+street_address_text:foo +street_address_text:bar) |
(+street_address_two_text:foo +street_address_two_text:bar) |
(+state_text:foo +state_text:bar)..."
Now the search is requiring both foo and bar to be in each qf field in the
search, not foo to be in any qf field and bar to be in any qf field. I had
to cut the number of qf values down to 15 to get it back to the correct
search.
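
To make the difference between the two parsed query shapes concrete, here is a small sketch in plain Python. It is a toy model of the boolean structure shown in the debug output, not Solr's query parser, and it does not explain why the parser switches shape. The function names `term_centric` and `field_centric` and the sample documents are invented for illustration.

```python
def term_centric(doc, terms, fields):
    """Shape seen with few qf fields: each term must match in at least one field."""
    return all(
        any(term in doc.get(field, "").split() for field in fields)
        for term in terms
    )


def field_centric(doc, terms, fields):
    """Shape seen with many qf fields: some single field must contain all terms."""
    return any(
        all(term in doc.get(field, "").split() for term in terms)
        for field in fields
    )


fields = ["text", "name_text", "city_text", "comments_text"]
terms = ["foo", "bar"]

# "foo" and "bar" live in different fields of the same document.
split_doc = {"name_text": "foo", "city_text": "bar"}

# Both terms live in one field.
same_field_doc = {"comments_text": "foo bar"}

print(term_centric(split_doc, terms, fields))       # matches
print(field_centric(split_doc, terms, fields))      # does not match
print(term_centric(same_field_doc, terms, fields))  # matches
print(field_centric(same_field_doc, terms, fields)) # matches
```

A document whose terms are spread across fields satisfies the first shape but not the second, which is the behavior change described above.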

Why is the search changing? Is there any way around this or a better way we
should be doing the search?
I realize we could copy all of the fields to the default text field. However,
most of the fields are searchable individually as well as keyword
searchable so specifying the fields vs using the default text field makes
sense in that respect.

Thank you,
Aaron


Authentication between solr-exporter and solrcloud

2018-08-15 Thread Sushant Vengurlekar
I have followed this guide for monitoring the solrcloud
https://lucene.apache.org/solr/guide/7_3/monitoring-solr-with-prometheus-and-grafana.html

I have basic authentication enabled for the solrcloud. How do I configure
the solr-exporter to authenticate with the set username and password?

Thank you


Re: collections replicas still in Recovery Mode after restarting Solr

2018-08-15 Thread Shawn Heisey

On 8/15/2018 1:26 AM, Derek Poh wrote:

> We have a setup of 2 servers, running Solr 6.6.2, on production.
> There are 5 collections.
> All collections are created as 1 shard x 2 replicas.
>
> 4 of the collections have this issue.
> A replica of each of these 4 collections is in Recovery Mode. The
> affected replicas are on the same server or node.
> I noticed there is no Leader node indicated for these 4 collections in
> the Solr Admin. This is the screenshot of the Solr Admin:
> http://imagebucket.net/pmndqkijla5c/solr_admin.PNG
>
> These are the commands I used to stop and start the solr process:
> bin/solr stop -p 8983
> bin/solr start -cloud -p 8983 -s "/apps/search/solr-6.6.2/home" -z hktszk1:2181,hktszk2:2181,hktszk3:2181
>
> May I know how I can bring up these replicas?
> Derek


We see reports of this happening occasionally on different versions.

From what I've seen, you may need to restart ALL of your Solr servers 
at least once to clear the problem, even the ones that seem to be 
working.  It may take more than one restart.


If there continue to be problems, share the solr.log file from each 
server so we can look at what the errors are.
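
Before and after each restart it can help to list exactly which shards have no leader and which replicas are not active. The sketch below is a rough illustration, assuming the JSON returned by the Collections API CLUSTERSTATUS action (/solr/admin/collections?action=CLUSTERSTATUS&wt=json) has already been loaded into a Python dict. The `find_problems` function and the sample data, including the collection, replica and node names, are hypothetical and not taken from the cluster described here.

```python
def find_problems(cluster_status):
    """List shards without a leader and replicas whose state is not 'active'."""
    problems = []
    collections = cluster_status["cluster"]["collections"]
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll["shards"].items()):
            replicas = shard["replicas"]
            # The leader replica carries "leader": "true" in the response.
            if not any(r.get("leader") == "true" for r in replicas.values()):
                problems.append(f"{coll_name}/{shard_name}: no leader")
            for replica_name, replica in sorted(replicas.items()):
                if replica.get("state") != "active":
                    problems.append(
                        f"{coll_name}/{shard_name}/{replica_name}: "
                        f"{replica.get('state')} on {replica.get('node_name')}"
                    )
    return problems


# Hypothetical sample shaped like a CLUSTERSTATUS response.
sample = {
    "cluster": {
        "collections": {
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {
                                "state": "active",
                                "node_name": "host1:8983_solr",
                            },
                            "core_node2": {
                                "state": "recovering",
                                "node_name": "host2:8983_solr",
                            },
                        }
                    }
                }
            }
        }
    }
}

report = find_problems(sample)
for line in report:
    print(line)
```

Running this against the real response after each restart shows whether the set of leaderless shards is shrinking, which is easier to compare than screenshots of the Admin UI.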


Thanks,
Shawn



Re: Lucene/Solr bug list caused by JVM's implementations

2018-08-15 Thread Shawn Heisey

On 8/14/2018 8:07 PM, Yasufumi Mizoguchi wrote:

> I am looking for Lucene/Solr's bug list caused by JVM's implementations.
> And I found the following, but it seems not to be updated.
> https://wiki.apache.org/lucene-java/JavaBugs
>
> Where can I check the latest one?


That is the only such list that I'm aware of.  There are not very many 
JVM bugs that affect Solr, and most of them have either been fixed or 
have a workaround.  I don't know the state of the IBM bugs ... except to 
say we strongly recommend that you don't run IBM Java.


Best course of action:  Run the latest release of whatever Java version 
you have chosen, and only use Oracle or OpenJDK.  For Java 8, the 
current Oracle release is 8u181.  At this time, I wouldn't use Java 10 
except in a development environment.  It's still early days for that -- 
newest Oracle version is 10.0.2.


If you use the latest Oracle/OpenJDK release of Java 8, Solr ought to 
work quite well.


Thanks,
Shawn



collections replicas still in Recovery Mode after restarting Solr

2018-08-15 Thread Derek Poh

Hi
We have a setup of 2 servers, running Solr 6.6.2, on production.
There are 5 collections.
All collections are created as 1 shard x 2 replicas.

4 of the collections have this issue.
A replica of each of these 4 collections is in Recovery Mode. The
affected replicas are on the same server or node.
I noticed there is no Leader node indicated for these 4 collections in
the Solr Admin. This is the screenshot of the Solr Admin:
http://imagebucket.net/pmndqkijla5c/solr_admin.PNG

These are the commands I used to stop and start the solr process:
bin/solr stop -p 8983
bin/solr start -cloud -p 8983 -s "/apps/search/solr-6.6.2/home" -z hktszk1:2181,hktszk2:2181,hktszk3:2181

May I know how I can bring up these replicas?
Derek

