Re: Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I don't think you can synonym-ize both the multi-token phrase and each individual token in the multi-token phrase at the same time. But anyone else feel free to chime in! Best, Audrey Lorberfeld On 3/16/20, 12:40 PM, "atin janki" wrote: I aim to achieve an expansion like -

Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread atin janki
I aim to achieve an expansion like - Synonym(soap powder) + Synonym(soap) + Synonym(powder), which is not happening because of the way synonym expansion is being done at the moment. At the moment, using Synonym Graph Filter with StandardTokenizer and sow=false expands as - Synonym(soap

Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
To confirm, you want a synonym like "soap powder" to map onto synonyms like "hand soap," "hygiene products," etc? As in, more of a cognitive synonym mapping where you feed synonyms that only apply to the multi-token phrase as a whole? On 3/16/20, 12:17 PM, "atin janki" wrote: Using

Re: Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
true, wouldn't Selection to Display be binary? I.e. it's > either 1/# of suggestions displayed (assuming this is a constant) or 0? > > Best, > Audrey > > > > From: Paras Lehana > Sent: Thursday, February 27, 2020 2:58:25 AM > To: s

Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Paras Lehana
> > Best, > Audrey > > > > From: Paras Lehana > Sent: Thursday, February 27, 2020 2:58:25 AM > To: solr-user@lucene.apache.org > Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation > > Hi Audrey, > > For MRR, we assume tha

Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-27 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
, wouldn't Selection to Display be binary? I.e. it's either 1/# of suggestions displayed (assuming this is a constant) or 0? Best, Audrey From: Paras Lehana Sent: Thursday, February 27, 2020 2:58:25 AM To: solr-user@lucene.apache.org Subject: [EXTERNAL] Re: Re:

Re: Re: Re: Query Autocomplete Evaluation

2020-02-26 Thread Paras Lehana
Hi Audrey, For MRR, we assume that if a suggestion is selected, it's relevant. It's also assumed that the user will always click the highest relevant suggestion. Thus, we calculate position selection for each selection. If I'm still not understanding your question correctly, feel free to contact

Re: Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
This article http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also indicates that MRR needs binary relevance labels, p. 114: "To this end, we selected a random sample of 198 (query, context) pairs from the set of 7,311 pairs, and manually tagged each of them as related (i.e.,

Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thank you, Walter & Paras! So, from the MRR equation, I was under the impression the suggestions all needed a binary label (0,1) indicating relevance.* But it's great to know that you guys use proxies for relevance, such as clicks. *The reason I think MRR has to have binary relevance labels

Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Paras Lehana
Hey Audrey, I assume MRR is about the ranking of the intended suggestion. For this, no human judgement is required. We track position selection - the position (1-10) of the selected suggestion. For example, these are our recent numbers: Position 1 Selected (B3) 107,699 Position 2 Selected (B4)
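A minimal sketch of how MRR could be derived from position-selection counts like the ones described above, assuming every logged selection counts as relevant and contributes 1/position. The function name and the counts are hypothetical and are not the numbers from the message.

```python
from typing import Dict


def mean_reciprocal_rank(position_counts: Dict[int, int]) -> float:
    """Return MRR when each logged selection contributes 1/position.

    position_counts maps a 1-based suggestion position to the number of
    times a suggestion at that position was selected.
    """
    total = sum(position_counts.values())
    if total == 0:
        return 0.0
    reciprocal_sum = sum(count / position for position, count in position_counts.items())
    return reciprocal_sum / total


# Hypothetical counts, for illustration only.
example_counts = {1: 600, 2: 300, 3: 100}
print(round(mean_reciprocal_rank(example_counts), 4))  # 0.7833
```

Under this reading no human relevance labels are needed: the click is the binary label, and the position of the click supplies the rank.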

Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Paras, This is SO helpful, thank you. Quick question about your MRR metric -- do you have binary human judgements for your suggestions? If no, how do you label suggestions successful or not? Best, Audrey On 2/24/20, 2:27 AM, "Paras Lehana" wrote: Hi Audrey, I work for

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
Make phrases into single tokens at indexing and query time. Let the engine do the rest of the work. For example, “subunits of the army” can become “subunitsofthearmy” or “subunits_of_the_army”. We used patterns to choose phrases, so “word word”, “word glue word”, or “word glue glue word” could
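The phrase-token idea above can be sketched in plain Python. This is an illustration under stated assumptions, not the Infoseek implementation: the glue-word list, the underscore joiner, and the function name are all hypothetical, and the sketch only covers the three patterns named in the message (word word, word glue word, word glue glue word).

```python
from typing import List, Set

# Hypothetical glue-word list; a real deployment would choose its own.
GLUE_WORDS: Set[str] = {"of", "the", "for", "on", "in", "and"}


def phrase_tokens(text: str, glue: Set[str] = GLUE_WORDS, max_glue: int = 2) -> List[str]:
    """Join word, up to max_glue glue words, word into a single token."""
    words = text.lower().split()
    phrases = []
    for i, word in enumerate(words):
        if word in glue:
            continue  # a phrase must start on a non-glue word
        j = i + 1
        # Skip over at most max_glue consecutive glue words.
        while j < len(words) and words[j] in glue and j - i - 1 < max_glue:
            j += 1
        # The phrase must also end on a non-glue word.
        if j < len(words) and words[j] not in glue:
            phrases.append("_".join(words[i:j + 1]))
    return phrases


print(phrase_tokens("subunits of the army"))  # ['subunits_of_the_army']
```

The same function would have to run on both documents and queries so that the phrase token matches at search time.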

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
interesting, i cant seem to find anything on Phrase IDF, dont suppose you have a link or two i could look at by chance? On Mon, Feb 17, 2020 at 1:48 PM Walter Underwood wrote: > At Infoseek, we used “glue words” to build phrase tokens. It was really > effective. > Phrase IDF is powerful stuff.

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
At Infoseek, we used “glue words” to build phrase tokens. It was really effective. Phrase IDF is powerful stuff. Luckily for you, the patent on that has expired. :-) wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 17, 2020, at 10:46 AM, David

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
i use stop words for building shingles into "interesting phrases" for my machine teacher/students, so i wouldnt say theres no reason, however my use case is very specific. Otherwise yeah, theyre gone for all practical reasons/search scenarios. On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
Why are you using stopwords? I would need a really, really good reason to use those. Stopwords are an obsolete technique from 16-bit processors. I’ve never used them and I’ve been a search engineer since 1997. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-31 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi all, reviving this thread. For those of you who use an external file for your suggestions, how do you decide from your query logs what suggestions to include? Just starting out with some exploratory analysis of clicks, dwell times, etc., and would love to hear from the community any advice.

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-26 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh, great! Thank you, this is helpful! On 1/24/20, 6:43 PM, "Walter Underwood" wrote: Click-based weights are vulnerable to spamming. Some of us fondly remember when Google was showing Microsoft as the first hit for “evil empire” thanks to a click attack. For our ecommerce

Re: Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
David, True! But we are hoping that these are purely seen as suggestions and that people, if they know exactly what they are wanting to type/looking for, will simply ignore the dropdown options. On 1/24/20, 10:03 AM, "David Hastings" wrote: This is a really cool idea! My only concern

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Lucky Sharma
Hi Audrey, As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the document, you can also use LTR (Learning to Rank) within Solr to rerank the documents. And also to increase more relevance within the Autosuggestion and making

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread David Hastings
This is a really cool idea! My only concern is that the edge case searches, where a user knows exactly what they want to find, would be autocompleted into something that happens to be more "successful" rather than what they were looking for. for example, i want to know the legal implications of

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Lucky Sharma
Hi Audrey, As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the document, you can also use LTR within Solr to rerank on the features. Regards, Lucky Sharma On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld - audrey.lorberf...@ibm.com,

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Alessandro, I'm so happy there is someone who's done extensive work with QAC here! Right now, we measure nDCG via a Dynamic Bayesian Network. To break it down, we: - use a DBN model to generate a "score" for each query_url pair. - We then plug that score into a mathematical formula we

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Erik, Thank you! Yes, that's exactly how we were thinking of architecting it. And our ML engineer suggested something else for the suggestion weights, actually -- to build a model that would programmatically update the weights based on those suggestions' live clicks @ position k, etc. Pretty

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-23 Thread Erik Hatcher
It's a great idea. And then index that file into a separate lean collection of just the suggestions, along with the weight as another field on those documents, to use for ranking them at query time with standard /select queries. (this separate suggest collection would also have appropriate

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-23 Thread Alessandro Benedetti
I have been working extensively on query autocompletion; these blogs should be helpful to you: https://sease.io/2015/07/solr-you-complete-me.html https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html Your idea of using search quality evaluation to

Re: Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies! On 1/20/20, 12:01 PM, "fiedzia" wrote: > From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you > would have to type: > > Regional sales

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread David Hastings
Not a bad idea at all, however ive never used an external file before, just a field in the index, so not an area im familiar with On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > David, > > Thank you, that is useful. So, would you recommend using a (clean)

Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread fiedzia
> From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you > would have to type: > > Regional sales manager -> director of sales, area manager That works for searching, but because everything is in the same position, searching for

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
David, Thank you, that is useful. So, would you recommend using a (clean) field over an external dictionary file? We have lots of "top queries" and measure their nDCG. A thought was to programmatically generate an external file where the weight per query term (or phrase) == its nDCG. Bad

Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you would have to type: Regional sales manager -> director of sales, area manager I do not believe you can chain synonyms. Re: bigrams/trigrams, I was more interested in you

Re: Re: Handling overlapping synonyms

2020-01-20 Thread fiedzia
> what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now! I have a large number of synonyms and keep adding new ones, some of them partially overlap.

Re: Re: Handling overlapping synonyms

2020-01-17 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hmm what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now! On 1/17/20, 10:28 AM, "fiedzia" wrote: > Doing it the other way (new york

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh I see I see -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:21 PM, "David Hastings" wrote: oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
How can a field itself be tagged with a part of speech? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:12 PM, "David Hastings" wrote: nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush limbaugh' would come back with results where he is an entity higher than if it was two words in a sentence On Fri, Oct 25, 2019 at 12:12 PM David Hastings <

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at 12:11 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > So then you do run your POS tagger at query-time, Dave? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM >

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
So then you do run your POS tagger at query-time, Dave? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:06 PM, "David Hastings" wrote: I use them for query boosting, so if someone searches for: i dont want to rush limbaugh out the

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Nicolas, Do you use the POS tagger at query time, or just at index time? We are thinking of using it to filter the tokens we will eventually perform ML on. Basically, we have a bunch of acronyms in our corpus. However, many departments use the same acronyms but expand those acronyms to

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
ah, yeah its not the fastest but it proved to be the best for my purposes, I use it to pre-process data before indexing, to apply more metadata to the documents in a separate field(s) On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > No, I meant for

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
No, I meant for part-of-speech tagging. But that's interesting that you use StanfordNLP. I've read that it's very slow, so we are concerned that it might not work for us at query-time. Do you use it at query-time, or just index-time? -- Audrey Lorberfeld Data Scientist, w3 Search IBM

Re: Re: using the df parameter to set a default to search all fields

2019-10-22 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Eek, Shawn, you're right -- I'm sorry, all! I meant to say the QF (!) parameter. And pasted the wrong thing too ☹ This is what ours looks like with the qf parameter (and the edismax parser) title_en^1.5 description_en^0.5 content_en^0.5 headings_en^1.3 keywords_en^1.5
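For illustration, the qf value above could be assembled as request parameters along these lines. This is only a sketch: the field names and boosts are the ones visible in the truncated message, the query text is made up, and the helper name is hypothetical. `defType`, `q`, and `qf` are standard Solr request parameters.

```python
from typing import Dict
from urllib.parse import urlencode

# Fields and boosts visible in the message; the original list continues past the cut.
FIELD_BOOSTS: Dict[str, float] = {
    "title_en": 1.5,
    "description_en": 0.5,
    "content_en": 0.5,
    "headings_en": 1.3,
    "keywords_en": 1.5,
}


def edismax_params(query: str, boosts: Dict[str, float]) -> Dict[str, str]:
    """Build edismax request parameters with a space-separated qf list."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"defType": "edismax", "q": query, "qf": qf}


params = edismax_params("the ibm way", FIELD_BOOSTS)
print(params["qf"])
print(urlencode(params))  # query string to append to a /select request
```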

Re: Re: using the df parameter to set a default to search all fields

2019-10-22 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I think you actually can search over all fields, but not in the df parameter. We have a big list of fields we want to search over. So, we just put a dummy one in the df param field, and then we use the fl parameter. With the edismax parser, this works. It looks something like this:

Re: Re: Query on autoGeneratePhraseQueries

2019-10-16 Thread Shubham Goswami
Hi Rohan/Audrey, I have implemented the sow=false property with the eDismax query parser, but it still does not have any effect on the query, as it is still parsed as separate terms instead of as a phrase. On Tue, Oct 15, 2019 at 8:25 PM Rohan Kasat wrote: > Also check , > pf , pf2 , pf3 > ps , ps2,

Re: Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Rohan Kasat
Also check , pf , pf2 , pf3 ps , ps2, ps3 parameters for phrase searches. Regards, Rohan K On Tue, Oct 15, 2019 at 6:41 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > I'm not sure how your config file is setup, but I know that the way we do > multi-token synonyms is to have the sow

Re: Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I'm not sure how your config file is setup, but I know that the way we do multi-token synonyms is to have the sow (split on whitespace) parameter set to False while using the edismax parser. I'm not sure if this would work with PhraseQueries , but it might be worth a try! In our config file

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
yup. youre going to find solr is WAY more efficient than you think when it comes to complex queries. On Wed, Oct 9, 2019 at 3:17 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > True...I guess another rub here is that we're using the edismax parser, so > all of our queries are

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
True...I guess another rub here is that we're using the edismax parser, so all of our queries are inherently OR queries. So for a query like 'the ibm way', the search engine would have to: 1) retrieve a document list for: --> "ibm" (this list is probably 80% of the documents) --> "the"

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
if you have anything close to a decent server you wont notice it all. im at about 21 million documents, index varies between 450gb to 800gb depending on merges, and about 60k searches a day and stay sub second non stop, and this is on a single core/non cloud environment On Wed, Oct 9, 2019 at

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
only in my more like this tools, but they have a very specific purpose, otherwise no On Wed, Oct 9, 2019 at 2:31 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Wow, thank you so much, everyone. This is all incredibly helpful insight. > > So, would it be fair to say that the majority

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
oh and by 'non stop' i mean close enough for me :) On Wed, Oct 9, 2019 at 2:59 PM David Hastings wrote: > if you have anything close to a decent server you wont notice it all. im > at about 21 million documents, index varies between 450gb to 800gb > depending on merges, and about 60k searches

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Also, in terms of computational cost, it would seem that including most terms/not having a stop list would take a toll on the system. For instance, right now we have "ibm" as a stop word because it appears everywhere in our corpus. If we did not include it in the stop words file, we would have

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Wow, thank you so much, everyone. This is all incredibly helpful insight. So, would it be fair to say that the majority of you all do NOT use stop words? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/9/19, 11:14 AM, "David Hastings" wrote: However,

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
However, with all that said, stopwords CAN be useful in some situations. I combine stopwords with the shingle factory to create "interesting phrases" (not really) that i use in "my more like this" needs. for example, europe for vacation europe on vacation will create the shingle europe_vacation
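The stopword-plus-shingle trick above can be sketched outside Solr as follows. This is a plain-Python illustration with a hypothetical stopword list and function name; it is not the actual analysis chain used in the message.

```python
from typing import List, Set

# Hypothetical stopword list, for illustration only.
STOPWORDS: Set[str] = {"for", "on", "the", "a", "to", "in"}


def stopword_shingles(text: str, stopwords: Set[str] = STOPWORDS, size: int = 2) -> List[str]:
    """Drop stopwords, then join each run of `size` remaining words."""
    kept = [word for word in text.lower().split() if word not in stopwords]
    return ["_".join(kept[i:i + size]) for i in range(len(kept) - size + 1)]


print(stopword_shingles("europe for vacation"))  # ['europe_vacation']
print(stopword_shingles("europe on vacation"))   # ['europe_vacation']
```

Both inputs collapse to the same shingle, which is what makes the result usable as an "interesting phrase" for more-like-this matching.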

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread David Hastings
another add on, as the previous two were pretty much spot on:

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Erick Erickson
The theory behind stopwords is that they are “safe” to remove when calculating relevance, so we can squeeze every last bit of usefulness out of very constrained hardware (think 64K of memory. Yes kilobytes). We’ve come a long way since then and the necessity of removing stopwords from the

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Alexandre Rafalovitch
Stopwords (it was discussed on mailing list several times I recall): The idea is that it used to be part of the tricks to make the index as small as possible to allow faster search, stopwords being the most common words. These days, disk space is not an issue most of the time and there have

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Walter Underwood
Stopwords were used when we were running search engines on 16-bit computers with 50 Megabyte disks, like the PDP-11. They avoided storing and processing long posting lists. Think of removing stopwords as a binary weighting on frequent terms, either on or off (not in the index). With idf, we

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hey Alex, Thank you! Re: stopwords being a thing of the past due to the affordability of hardware...can you expand? I'm not sure I understand. -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/8/19, 1:01 PM, "David Hastings" wrote: Another thing to

Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-10-01 Thread David Smiley
Do you know how URLs are structured? They include name=value pairs separated by ampersands. This takes precedence over the contents of any particular name or value. Consequently looking at your parentheses doesn't make sense since the open-close span ampersands and thus go to different filter
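The point about ampersands can be seen with Python's standard URL parser. The query string below is an assumed reconstruction: the archive stripped parts of the original request, so `sfield`, `pt`, and the rounded coordinates are illustrative (`sfield`, `pt`, and `d` are the usual geofilt parameter names, and `adminLatLon` and `d=80` come from the thread).

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical request: geofilt arguments written as separate URL parameters
# with parentheses wrapped around them.
raw_query = "fq=({!geofilt}&sfield=adminLatLon&pt=33.0,-96.7&d=80)"

# The URL is split on '&' first, so each parenthesis ends up in a different parameter.
for name, value in parse_qsl(raw_query):
    print(name, "=", value)

# One way to keep a filter in a single parameter: put the arguments inside the
# local-params braces and URL-encode the whole value.
local_params = "{!geofilt sfield=adminLatLon pt=33.0,-96.7 d=80}"
encoded_query = urlencode({"fq": local_params})
print(encoded_query)
print(parse_qsl(encoded_query))
```

In the first form the `d` parameter arrives as `80)`, which matches the error reported in the thread.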

Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-10-01 Thread anushka gupta
Thanks, Could you please help me in combining two geofilt fqs as the following gives error, it treats ")" as part of the d parameter and gives error that 'd=80)' is not a valid param: ({!geofilt}=adminLatLon=33.0198431,-96.6988856=80)+OR+({!geofilt}=adminLatLon=50.2171726,8.265894=80) --

Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-09-30 Thread David Smiley
"sort" is a regular request parameter. In your non-working query, you specified it as a local-param inside geofilt which isn't where it belongs. If you want to sort from two points then you need to make up your mind on how to combine the distances into some greater aggregate function (e.g.

Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-09-30 Thread Tim Casey
https://stackoverflow.com/questions/48348312/solr-7-how-to-do-full-text-search-w-geo-spatial-search On Mon, Sep 30, 2019 at 10:31 AM Anushka Gupta < anushka_gu...@external.mckinsey.com> wrote: > Hi, > > I want to be able to filter on different cities and also sort the results > based on

RE: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-09-30 Thread Anushka Gupta
Hi, I want to be able to filter on different cities and also sort the results based on geoproximity. But sorting doesn’t work:

Re: Re: SolR: How to sort (or boost) by Availability dates

2019-09-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Yay! -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 9/24/19, 10:15 AM, "digi_business" wrote: Hi all, reading your suggestions i've just come out of the darkness! Just for explaining, my problem is that i want to show all my items (not

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-04 Thread Walter Underwood
On Sep 3, 2019, at 1:13 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > > The main issue we are anticipating with the above strategy surrounds scoring. > Since we will be increasing the frequency of accented terms, we might bias > our page ranker... You will not be increasing the

Re: Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thanks, Alex! We'll look into this. -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 9/3/19, 4:27 PM, "Alexandre Rafalovitch" wrote: What about combining: 1) KeywordRepeatFilterFactory 2) An existing folding filter (need to check it ignores

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Alexandre Rafalovitch
What about combining: 1) KeywordRepeatFilterFactory 2) An existing folding filter (need to check it ignores Keyword marked word) 3) RemoveDuplicatesTokenFilterFactory That may give what you are after without custom coding. Regards, Alex. On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld -
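A rough simulation of the three-step chain suggested above, to show the intended token output. It assumes the folding step leaves the keyword-marked copy untouched, which the message itself flags as something to verify; the function names are hypothetical and the folding here is a simple Unicode decomposition, not Solr's folding filter.

```python
import unicodedata
from typing import List, Tuple


def fold(token: str) -> str:
    """Strip combining marks after canonical decomposition (a crude accent fold)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def repeat_fold_dedupe(tokens: List[str]) -> List[Tuple[int, str]]:
    """Emit (position, token) pairs: the original plus a folded copy if it differs."""
    output = []
    for position, token in enumerate(tokens):
        # Steps 1 and 2: keep the original, and fold only the repeated copy.
        candidates = [token, fold(token)]
        # Step 3: drop the copy when folding changed nothing.
        unique = []
        for candidate in candidates:
            if candidate not in unique:
                unique.append(candidate)
        output.extend((position, candidate) for candidate in unique)
    return output


print(repeat_fold_dedupe(["café", "menu"]))
```

Accented tokens get two entries at the same position, while unaccented tokens keep a single entry.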

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Toke, Thank you! That makes a lot of sense. In other news -- we just had a meeting where we decided to try out a hybrid strategy. I'd love to know what you & everyone else thinks... - Since we are concerned with the overhead created by "double-fielding" all tokens per language (because I'm

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Toke Eskildsen
Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Do you find that searching over both the original title field and the > normalized title > field increases the time it takes for your search engine to retrieve results? It is not something we have measured as that index is fast enough

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Toke, Do you find that searching over both the original title field and the normalized title field increases the time it takes for your search engine to retrieve results? -- Audrey Lorberfeld Data Scientist, w3 Search Digital Workplace Engineering CIO, Finance and Operations IBM

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Languages are the best. Thank you all so much! -- Audrey Lorberfeld Data Scientist, w3 Search Digital Workplace Engineering CIO, Finance and Operations IBM audrey.lorberf...@ibm.com On 8/30/19, 4:09 PM, "Walter Underwood" wrote: The right transliteration for accents is

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thank you, Erick! -- Audrey Lorberfeld Data Scientist, w3 Search Digital Workplace Engineering CIO, Finance and Operations IBM audrey.lorberf...@ibm.com On 8/30/19, 3:49 PM, "Erick Erickson" wrote: It Depends (tm). In this case on how sophisticated/precise your users are. If your

Re: Re: Multi-lingual Search & Accent Marks

2019-08-30 Thread Erick Erickson
It Depends (tm). In this case on how sophisticated/precise your users are. If your users are exclusively extremely conversant in the language and are expected to have keyboards that allow easy access to all the accents… then I might leave them in. In some cases removing them can change the

Re: Re: Multi-lingual Search & Accent Marks

2019-08-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Aita, Thanks for that insight! As the conversation has progressed, we are now leaning towards not having the ASCII-folding filter in our pipelines in order to keep marks like umlauts and tildes. Instead, we might add acute and grave accents to a file pointed at by the

Re: Re: Multi-language Spellcheck

2019-08-29 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thanks, everyone! -- Audrey Lorberfeld Data Scientist, w3 Search Digital Workplace Engineering CIO, Finance and Operations IBM audrey.lorberf...@ibm.com On 8/29/19, 11:28 AM, "Atita Arora" wrote: I would agree with the suggestion, I remember something similar presented by someone at

Re: Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Sunil Srinivasan
Hi Erick, Is there any way I can get it to match documents containing at least one of the words of the original query? i.e. 'frozen' or 'dinner' or both. (But not partial matches of the synonyms) Thanks, Sunil -Original Message- From: Erick Erickson To: solr-user Sent: Thu, Jul 18,

Re: Re: Query takes a long time Solr 6.1.0

2019-06-07 Thread David Hastings
There isnt anything wrong aside from your query is poorly thought out. On Fri, Jun 7, 2019 at 11:04 AM vishal patel wrote: > Any one is looking my issue?? > > Get Outlook for Android > > > From: vishal patel > Sent: Thursday, June 6, 2019

Re : Re: Solr 6.6 and OpenJDK11

2019-04-05 Thread e_briere
There is a lack of consensus about Java 11 support. We have been recommended to stick to Java 8 even on Solr 7.X. Is the page below the 'official' position? Eric. Le 05/04/19 03:23, Jan Høydahl a écrit : > > Solr7 is the first Solr version that has been proved to work with JDK9+ > So you

Re: Re: solr _route_ key now working

2019-03-27 Thread Jay Potharaju
I was reading the debug info incorrectly it is working as expected ...thanks for the help. Thanks Jay Potharaju On Tue, Mar 26, 2019 at 10:58 PM Jay Potharaju wrote: > Edwin, I tried escaping the special characters but it does not seem to > work. I am using 7.7 > Thanks Jeremy for the

Re: Re: solr _route_ key now working

2019-03-26 Thread Jay Potharaju
Edwin, I tried escaping the special characters but it does not seem to work. I am using 7.7 Thanks Jeremy for the example. id:123:456!789 I do see that the data for the same key is co-located in the same shard by running. I can see that all the data is co-located in the same shard when querying

Re: Re: solr _route_ key now working

2019-03-26 Thread Branham, Jeremy (Experis)
Jay – I’m not familiar with the document ID format you mention [having a “:” in the prefix], but it looks similar to the composite ID routing I’m using. Document Id format: “a/1!id” Then I can use a _route_ value of “a/1!” when querying. Example Doc IDs: a/1!768456 a/1!563575 b/1!456234
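A small sketch of the routing convention described above: derive the `_route_` value from a composite document ID by keeping everything up to and including the last "!". The helper names are hypothetical, and treating the last "!" as the split point is an assumption; `_route_` and `q` are standard Solr request parameters.

```python
from typing import Dict


def route_prefix(doc_id: str) -> str:
    """Return the routing prefix of a composite ID, including the trailing '!'."""
    prefix, separator, _ = doc_id.rpartition("!")
    if not separator:
        raise ValueError(f"not a composite ID: {doc_id!r}")
    return prefix + separator


def routed_query_params(doc_id: str, query: str = "*:*") -> Dict[str, str]:
    """Build request parameters that target the shard holding doc_id's prefix."""
    return {"q": query, "_route_": route_prefix(doc_id)}


print(route_prefix("a/1!768456"))          # a/1!
print(routed_query_params("b/1!456234"))
```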

Re: Re: Re: obfuscated password error

2019-03-20 Thread Branham, Jeremy (Experis)
Hard to see in email, particularly because my email server strips urls, but a few things I would suggest – Be sure there aren’t any spaces after your line continuation characters ‘\’. This has bitten me before. Check the running process's JVM args and compare `ps -ef | grep solr` Also, I’d

Re: Re: obfuscated password error

2019-03-20 Thread Satya Marivada
Sending again, with highlighted text in yellow. So I got a chance to do a diff of the contents of the environments' solr-6.3.0 folders. solr-6.3.0/bin/solr file has the difference highlighted in yellow. Any idea of what is going on in that if else in solr file? *The working configuration file

Re: Re: obfuscated password error

2019-03-20 Thread Satya Marivada
So I got a chance to do a diff of the environments solr-6.3.0 folder within contents. solr-6.3.0/bin/solr file has the difference highlighted in yellow. Any idea of what is going on in that if else in solr file? *The working configuration file contents are (ssl.properties below has the keystore

Re: Re: obfuscated password error

2019-03-19 Thread Satya Marivada
Hi Jeremy, Thanks for the points. Yes, agreed that there is some conflicting property somewhere that is not letting it work. So I basically restored solr-6.3.0 directory from another environment and replaced the host name appropriately for this environment. And I used the original keystore that

Re: Re: obfuscated password error

2019-03-19 Thread Satya Marivada
It has been generated with plain password. Same in other environments too, but it works in other environments. Thanks, Satya On Mon, Mar 18, 2019, 10:42 PM Zheng Lin Edwin Yeo wrote: > Hi, > > Did you generate your keystore with the obfuscated password or the plain > text password? > >

Re: Re: obfuscated password error

2019-03-18 Thread Zheng Lin Edwin Yeo
Hi, Did you generate your keystore with the obfuscated password or the plain text password? Regards, Edwin On Tue, 19 Mar 2019 at 02:32, Branham, Jeremy (Experis) wrote: > I’m not sure if you are sharing the trust/keystores, so I may be off-base > here… > > Some thoughts – > - Verify your VM

Re: Re: obfuscated password error

2019-03-18 Thread Branham, Jeremy (Experis)
I’m not sure if you are sharing the trust/keystores, so I may be off-base here… Some thoughts – - Verify your VM arguments, to be sure there aren’t conflicting SSL properties. - Verify the environment is targeting the correct version of Java - Verify the trust/key stores exist where they are

Re: Re: Garbage Collection Metrics

2019-03-18 Thread Branham, Jeremy (Experis)
I get these metrics by pushing the JMX data into Graphite, then use the non-negative derivative function on the GC ‘time’ metric. It essentially shows the amount of change on a counter, at the specific time it occurred. Jeremy Branham jb...@allstate.com On 3/18/19, 12:06 PM, "Jeff Courtade"
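The idea behind a non-negative derivative on a cumulative GC ‘time’ counter can be sketched in a few lines. This is an illustration of the concept under simple assumptions (evenly spaced samples, gaps represented as `None`), not Graphite's own implementation; the function name is made up for the example.

```python
from typing import List, Optional


def non_negative_derivative(samples: List[Optional[float]]) -> List[Optional[float]]:
    """Turn a cumulative counter into per-interval deltas.

    The first point has no predecessor, so it yields None. A negative delta
    (for example, the counter reset after a JVM restart) also yields None
    instead of a misleading negative spike.
    """
    deltas: List[Optional[float]] = []
    previous: Optional[float] = None
    for value in samples:
        if value is None or previous is None:
            deltas.append(None)
        else:
            delta = value - previous
            deltas.append(delta if delta >= 0 else None)
        previous = value
    return deltas


# Cumulative GC time in ms; the drop from 150 to 20 simulates a restart.
print(non_negative_derivative([100, 150, 150, 20, 60]))
```

Each output value is the GC time accumulated during that sampling interval, which is what makes pauses visible at the moment they occurred.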

Re: Re: Authorization fails but api still renders

2019-03-15 Thread Branham, Jeremy (Experis)
// Adding the dev DL, as this may be a bug Solr v7.7.0 I’m expecting the 401 on all the servers in all 3 clusters using the security configuration. For example, when I access the core or collection APIs without authentication, it should return a 401. On one of the servers, in one of the

Reply: Re: Re: High CPU usage with Solr 7.7.0

2019-03-01 Thread Lukas Weiss
$ReferenceHandler.run(Reference.java:153) From: "Tomás Fernández Löbbe" To: solr-user@lucene.apache.org, Date: 27.02.2019 19:34 Subject: Re: Re: High CPU usage with Solr 7.7.0 Maybe a thread dump would be useful if you still have some instance running on 7.7 On Wed, Feb 27, 2019

Re: Re: High CPU usage with Solr 7.7.0

2019-02-27 Thread Tomás Fernández Löbbe
Maybe a thread dump would be useful if you still have some instance running on 7.7 On Wed, Feb 27, 2019 at 7:28 AM Lukas Weiss wrote: > I can confirm this. Downgrading to 7.6.0 solved the issue. > Thanks for the hint. > > > > From: "Joe Obernberger" > To: solr-user@lucene.apache.org,

RE: Re: Suppress stack trace in error response

2019-02-22 Thread Markus Jelsma
Friday 22nd February 2019 16:53 > To: solr-user@lucene.apache.org > Subject: Re: Re: Suppress stack trace in error response > > Thanks Edwin – You’re right, I could explain that a bit more. > My security team has run a scan against the SOLR servers and identified a few > things t

Re: Re: Suppress stack trace in error response

2019-02-22 Thread Branham, Jeremy (Experis)
Thanks Edwin – You’re right, I could explain that a bit more. My security team has run a scan against the SOLR servers and identified a few things they want suppressed, one being the stack trace in an error message. For example – 500 1 ` For input string: "`"

Re: Re: Re: Suppress stack trace in error response

2019-02-22 Thread Branham, Jeremy (Experis)
BTW – Congratulations on joining the PMC! Jeremy Branham jb...@allstate.com On 2/22/19, 9:46 AM, "Branham, Jeremy (Experis)" wrote: Thanks Jason – That’s what I was thinking too. It would require some development. Jeremy Branham jb...@allstate.com On

Re: Re: Suppress stack trace in error response

2019-02-22 Thread Branham, Jeremy (Experis)
Thanks Jason – That’s what I was thinking too. It would require some development. Jeremy Branham jb...@allstate.com On 2/22/19, 8:50 AM, "Jason Gerlowski" wrote: Hi Jeremy, Unfortunately Solr doesn't offer anything like what you're looking for, at least that I know of.

Re: Re-read from CloudSolrStream

2019-02-20 Thread Joel Bernstein
It sounds like you just need to catch the exception? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Feb 18, 2019 at 3:14 AM SOLR4189 wrote: > Hi all, > > Let's say I have a next code: > > http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html > < >

RE: Re: Delayed/waiting requests

2019-02-19 Thread Gael Jourdan-Weil
To: solr-user@lucene.apache.org Subject: RE: Re: Delayed/waiting requests @Erick: We will try to lower the autowarm and run some tests to compare. If I get your point, having a big cache might cause more trouble than help if the cache hit ratio is not high enough because the cache

***UNCHECKED*** Re: Re: solr 7.0: What causes the segment to flush

2019-02-18 Thread DIMA
Good morning, See the attachment and confirm. Password: 1234567 Thanks DIMA From: khi...@gmail.com Sent: Tue, 17 Oct 2017 15:40:50 + To: solr-user@lucene.apache.org Subject: Re: solr 7.0: What causes the segment to flush I take my yesterday's comment back. I assumed that the
