How do I get error messages?
Hi. If I send a grammatically incorrect query, the error message is shown in the web browser, but not in my Python request code, which just reports "400 Bad Request". How do I get the error message in my Python code? Below is sample code:

    import ssl
    import urllib.parse
    import urllib.request

    def request(url, params, headers={}, pass_verification=True):
        if pass_verification:
            context = ssl._create_unverified_context()
        else:
            context = None
        req = urllib.request.Request(url, urllib.parse.urlencode(params).encode('utf-8'), headers)
        response = urllib.request.urlopen(req, context=context)
        return response.read()

    request('http://localhost:8983/solr/select', params={'q': '(test AND AND query)'})

The browser shows a 400 response with:

    org.apache.solr.search.SyntaxError: Cannot parse '(test AND AND query)':
    Encountered " "AND "" at line 1, column 10.
    Was expecting one of: ... "+" ... "-" ... "(" ... "*" ... "[" ... "{" ...
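One way to surface Solr's message instead of just "400 Bad Request": catch urllib.error.HTTPError and read its body, since the exception object is itself a file-like response carrying the error details. A minimal sketch, assuming the same POST pattern as the question; the FakeSolr stand-in server is purely illustrative, not a real Solr endpoint:

```python
import ssl
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def request(url, params, headers={}, pass_verification=True):
    """POST params and return the response body; on an HTTP error,
    return the error body instead of losing it."""
    context = ssl._create_unverified_context() if pass_verification else None
    data = urllib.parse.urlencode(params).encode("utf-8")
    req = urllib.request.Request(url, data, headers)
    try:
        return urllib.request.urlopen(req, context=context).read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a file-like response: its body carries
        # the server's error message (for Solr, the SyntaxError details).
        return e.read()

# Stand-in for Solr: a local server that answers 400 with an error body.
class FakeSolr(BaseHTTPRequestHandler):
    def do_POST(self):
        body = b"org.apache.solr.search.SyntaxError: Cannot parse query"
        self.send_response(400)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FakeSolr)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/solr/select" % server.server_address[1]
body = request(url, {"q": "(test AND AND query)"})
server.shutdown()
```

With a real Solr, `body` would contain the full parse error that the browser shows.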
How to filter auto suggestion in Solr
Guys, I am trying to implement Solr context filtering to filter auto-suggestion results based on the category value. We have implemented autosuggestion based on SpellCheckComponent. Here is my detailed question: https://stackoverflow.com/questions/53707224/filter-the-solr-autosuggestion-in-hybris Any help would be appreciated!
Query kills Solr
Is there a way to get an approximate measure of the memory used by an indexed field (or fields)? I'm looking into a problem with one of our Solr indexes: I have a Japanese query that causes the replicas to run out of memory while processing it.

Also, is there a way to change or disable the timeout in the Solr Console? When I run this query there, it always times out, which is a real pain; I know it will complete eventually.

I have this field type: I have a number of fields of this type. The CJKBigramFilterFactory can generate a lot of tokens, and I'm concerned that this combination is what is killing our Solr instances.

This is the query that is causing my problems:

モノクローナル抗ニコチン性アセチルコリンレセプター(??7サブユニット)抗体 マウス宿主抗体

We are using Solr 7.2 in a SolrCloud.
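For a sense of why CJK bigram output grows quickly: a CJK bigram filter emits roughly one token per adjacent pair of CJK characters, so a run of n characters yields n-1 bigram terms, multiplied across every field of this type. A rough illustration only, not the actual Lucene analysis (which also splits on script and token boundaries):

```python
def cjk_bigrams(text):
    """Roughly what a CJK bigram filter emits for one unbroken CJK run:
    every adjacent character pair, i.e. len(text) - 1 tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

query = "モノクローナル抗ニコチン性アセチルコリンレセプター"
tokens = cjk_bigrams(query)  # one token per adjacent character pair
```

A long query like the one above therefore expands into dozens of terms, each matched against every bigrammed field it targets.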
Re: URL Case Sensitive/Insensitive
Lowercasing might work, it might not. Hostnames were originally case-insensitive, but that might have changed with I18N hostnames. Paths are interpreted by the web server: on Windows, paths are case-insensitive; on Unix, they are case-sensitive; and web servers might be configured to use case-insensitive paths either way.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 11, 2018, at 10:33 AM, Moyer, Brett wrote:
>
> https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund
> https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund
>
> Is there any issue if we just lowercase all URLs? I can't think of an issue
> that would be caused, but that's why I'm asking the gurus!
>
> Brett Moyer
RE: URL Case Sensitive/Insensitive
https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund
https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund

Is there any issue if we just lowercase all URLs? I can't think of an issue that would be caused, but that's why I'm asking the gurus!

Brett Moyer

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, December 11, 2018 12:41 PM
To: solr-user
Subject: Re: URL Case Sensitive/Insensitive

> What do you mean by "url case"? No, I'm not being snarky.
Re: URL Case Sensitive/Insensitive
Moyer, Brett wrote:
> What is the best practice on URL case?

I work with web archiving, and URL normalisation is quite a tricky thing. The software we use is https://github.com/ukwa/webarchive-discovery, and in there a lot of energy has been spent on the subject. Long story short, we index two forms: the unmodified raw one and a heavily normalised one.

Question: is https://www.example.com/FOO/ the same as http://example.com/foo ? Technically it is not, as:

* There might be different content served for different protocols (highly unlikely)
* www might mean something (unlikely)
* FOO might be another resource than foo (unlikely)
* The trailing slash might be significant (seen on some Apache proxy setups)

There are other rules, such as trying to remove session IDs, everything after #, and so on. None of the individual steps results in many false positives by itself, but they do add up. For most practical purposes (URL lookup & grouping, following links between archived pages, resolving embedded resources from pages) we use the heavily normalised URL.

- Toke Eskildsen
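The heavy normalisation described above can be sketched in a few lines; this is only an illustration of the idea, not webarchive-discovery's actual rule set (which also handles session IDs, ports, and more):

```python
from urllib.parse import urlsplit, urlunsplit

def normalise(url):
    """Heavily-normalised URL: fold the scheme to http, lowercase the
    host and strip a leading www., lowercase the path, drop the
    fragment and any trailing slash."""
    parts = urlsplit(url)
    host = parts.hostname or ""   # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/").lower()
    return urlunsplit(("http", host, path, parts.query, ""))
```

Under these rules, https://www.example.com/FOO/ and http://example.com/foo normalise to the same key, which is exactly the trade-off discussed above: good for grouping and lookup, occasionally wrong for servers where those really are different resources.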
Re: Keyword field with tabs in Solr 7.4
You are probably in "url-encoding hell". Add debug=query to your search and check the parsed query returned to see what Solr actually sees. Try url-encoding the backslash (%5C), maybe?

Best,
Erick

On Tue, Dec 11, 2018 at 1:40 AM Michael Aleythe, Sternwald wrote:
>
> Hey everybody,
>
> I have a Solr keyword field defined as:
>
> stored="true" termVectors="false" multiValued="false" />
>
> Some documents have tabs (\t) indexed in this field, e.g.
> IPTC_2_080_KY:"\tbus\tbahn"
>
> How can I query this content? I tried "\tbus\tbahn", \\tbus\\tbahn and
> " bus bahn" but nothing matches. Does anybody know what to do?
>
> Regards
> Michael
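Concretely, the tab has to survive all the way to Solr as %09 in the request URL; if the client library or a shell swallows it first, Solr never sees a tab at all. A quick sketch of what the encoded request should contain (the field name is from the thread; the Solr URL is an illustrative assumption):

```python
from urllib.parse import urlencode

# A phrase query containing two literal tab characters.
q = 'IPTC_2_080_KY:"\tbus\tbahn"'

# urlencode percent-encodes the tabs as %09, the quotes as %22, etc.
params = urlencode({"q": q})

# Hypothetical request URL; appending debug=query lets you inspect
# the parsed query Solr actually built from this input.
url = "http://localhost:8983/solr/collection1/select?" + params + "&debug=query"
```

If the parsed query in the debug output shows no tab between "bus" and "bahn", the encoding was lost before reaching Solr.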
Re: URL Case Sensitive/Insensitive
What do you mean by "url case"? No, I'm not being snarky.

The value returned in a doc is very different from the value searched. The stored data is the original input, without going through any filters.

If you mean the value _returned_ by Solr from a stored field, then the case is exactly whatever was input originally. To get it into a consistent case, I'd change it on the client side before sending to Solr, or use, say, a ScriptUpdateProcessor to change it on the way in to Solr.

If you're talking about _searching_ the URL, you need to put the appropriate filters in your analysis chain. Most distributions have a "lowercase" type that is a KeywordTokenizer plus LowerCaseFilter. That still treats the searchable text as a single token, so for instance you couldn't search for url:com except with pre-and-post wildcards, which is not a good pattern. If you want to search sub-parts of a URL, you'll use one of the text-based types to break it up into tokens. Even in this case, though, the returned data is still the original case, since it's the stored data that's returned.

Best,
Erick

On Tue, Dec 11, 2018 at 8:38 AM Moyer, Brett wrote:
>
> What is the best practice on URL case? Is there a negative to making all
> lowercase?
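The "lowercase" type Erick mentions looks roughly like this in schema.xml. This is a sketch: the type name matches what ships in common Solr configsets, but the field name "url" is an illustrative assumption:

```xml
<!-- Whole value stays one token, but matching is case-insensitive.
     The stored (returned) value keeps its original case. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="url" type="lowercase" indexed="true" stored="true"/>
```

With this in place, url:HTTPS://EXAMPLE.COM and url:https://example.com match the same single-token value, while the response still returns the URL as originally indexed.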
URL Case Sensitive/Insensitive
Hello, I'm new to Solr; I've been using it for a few months. A recent question came up from our business partners about URL casing. Previously their URLs were upper case; they made a change and now all are lower case. Both pages/URLs are still accessible, so there are duplicates in Solr. They are requesting that all URLs be evaluated as lowercase. What is the best practice on URL case? Is there a negative to making everything lowercase? I know I can drop the index and re-crawl to fix it, but long term, how should URL case be treated? Thanks!

Brett Moyer

*
This e-mail may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and then delete it.

TIAA
*
NGramFilterFactory and Similarity
Hello,

We are trying to use NGramFilterFactory for approximate search with Solr 7. We usually use a similarity with no tf and no idf (our similarity extends ClassicSimilarity, with the tf and idf functions always returning 1). For ngram search, though, that seems inappropriate, since it scores a word matching one ngram the same as a word matching, say, seven ngrams.

We would like a similarity that gives a higher score to a document matching more ngrams, but without using term frequency (we have multivalued fields, and a word might be repeated in more than one entry of a multivalued field, but we don't want that document to get a higher score because of that).

Has anyone experienced the same issue?

Best regards,
Elisabeth
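The desired behaviour — more *distinct* matching ngrams score higher, while repetition of the same ngram adds nothing — can be illustrated outside Lucene. This is a sketch of the scoring idea only, not a drop-in Similarity implementation:

```python
def ngrams(text, n=3):
    """Distinct character ngrams of the text (a set, so repeats collapse)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def overlap_score(query, doc, n=3):
    """Score = number of distinct query ngrams found in the document.
    Grows with coverage of the query, ignores how often each ngram
    repeats in the document (i.e. no term frequency)."""
    return len(ngrams(query, n) & ngrams(doc, n))
```

For example, "solr" vs "solar" shares only one trigram, while an exact match shares all of them; and because ngrams are collected into a set, a word repeated across multivalued entries contributes no extra score.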
Re: Filter the Solr autosuggestion in Hybris
Please note: here we have autosuggestion with `SpellCheckComponent`, which we want to filter. Here is the question: https://stackoverflow.com/questions/53707224/filter-the-solr-autosuggestion-in-hybris

On Tue, 11 Dec 2018 at 17:02 Ankit Patel wrote:
> I am trying to implement Solr context filtering to filter auto-suggestion
> results based on the category value.
Filter the Solr autosuggestion in Hybris
I am trying to implement Solr context filtering to filter auto-suggestion results based on the category value.

*schema.xml*

*solrConfig.xml*

categorydic
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
org.apache.solr.spelling.suggest.DocumentDictionaryFactory
autosuggest_en
allCategories_string_mv
false
false
text_spell_en
${solr.core.dataDir}/suggesttest

Fields look like:

"spellcheck_en": [
  "ANKITHI LIMIT",
  "ROU7000272"
]

"allCategories_string_mv": [
  "3m",
  "harddiskcategory"
]

http://localhost:8983/solr/master_Product/suggest?spellcheck=true=true=categorydic=json=mytest=harddiskcategory

When I hit this URL with spellcheck.dictionary=categorydic, spellcheck.cfq=harddiskcategory and spellcheck.q=mytest, it won't filter the result: I get all the matches for *mytest*.

Solr Version: 5.3.0
Hybris Version: 6.0

Any clue?

Regards,
Ankit Patel
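For what it's worth, context filtering is documented for Solr's SuggestComponent, not the SpellCheckComponent, and it needs a Solr version that supports contextField and the suggest.cfq parameter (newer than the 5.3.0 mentioned above). A sketch along the lines of the reference guide, reusing the field names from this thread; treat every detail as an assumption to verify against your Solr version:

```xml
<!-- Sketch: suggester with context filtering (SuggestComponent). -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">categorydic</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">autosuggest_en</str>
    <!-- the category field that suggest.cfq filters against -->
    <str name="contextField">allCategories_string_mv</str>
    <str name="suggestAnalyzerFieldType">text_spell_en</str>
  </lst>
</searchComponent>
```

Queried as /suggest?suggest=true&suggest.dictionary=categorydic&suggest.q=mytest&suggest.cfq=harddiskcategory, suggestions would then be restricted to documents whose context field matches harddiskcategory.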
Re: Case insensitive query for fetching facets
Yes, all three options (copy fields, dynamic fields, and SortableTextField) are feasible. Since I am on Solr 7.5.0, I will go ahead with the SortableTextField option. Thank you, team!

On Fri, Dec 7, 2018 at 8:46 PM Alexandre Rafalovitch wrote:
> If you are on a recent Solr (7.3+), try switching from TextField to
> SortableTextField in your string_ci definition above.
>
> That type implicitly uses docValues and should return the original text
> for faceting purposes, while still allowing analyzers.
>
> Regards,
>    Alex.
> On Thu, 6 Dec 2018 at 08:26, Ritesh Kumar wrote:
> >
> > Hello team,
> >
> > I am trying to prepare a facet on a field of type string. The facet
> > data will be shown according to the user's query on this very field.
> >
> > required="false" multiValued="false"/>
> >
> > As this field is of type string, it works fine with a case-sensitive
> > query. I want to be able to query on this field irrespective of case.
> >
> > I tried changing the field type to string_ci as defined below
> >
> > omitNorms="true">
> >
> > required="false" multiValued="false"/>
> >
> > Now, in this case, I am able to perform a case-insensitive query, but
> > the facet values are shown in lowercase.
> >
> > I want to be able to perform a case-insensitive query on this field
> > but show the original data. Is there anything I can do to achieve this?
> >
> > Best,
> >
> > Ritesh Kumar
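A sketch of what the string_ci type could look like with SortableTextField. Names follow this thread; the field name "category" and the exact attributes are illustrative assumptions to check against the 7.3+ reference guide:

```xml
<!-- Queries are analyzed (lowercased), so matching is case-insensitive,
     while faceting/sorting use the docValues copy of the original text. -->
<fieldType name="string_ci" class="solr.SortableTextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="category" type="string_ci" indexed="true" stored="true"
       required="false" multiValued="false"/>
```

This gets both behaviours the thread asks for: case-insensitive querying through the analyzer chain, and facet values returned in their original case from docValues.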
Keyword field with tabs in Solr 7.4
Hey everybody,

I have a Solr keyword field defined as:

Some documents have tabs (\t) indexed in this field, e.g. IPTC_2_080_KY:"\tbus\tbahn"

How can I query this content? I tried "\tbus\tbahn", \\tbus\\tbahn and " bus bahn" (with literal tabs) but nothing matches. Does anybody know what to do?

Regards
Michael