How do I get error messages?

2018-12-11 Thread Jason
Hi,

If I send a query that is syntactically incorrect, the error message is
shown in the web browser but not in my Python request code.
The code just reports "400 Bad Request".
How do I get the error message in my Python code?
Sample code is below.

-
import ssl
import urllib.parse
import urllib.request

def request(url, params, headers=None, pass_verification=True):
    # "pass_verification" here means: skip TLS certificate verification.
    if pass_verification:
        context = ssl._create_unverified_context()
    else:
        context = None
    try:
        # Passing a data payload makes this a POST request.
        req = urllib.request.Request(
            url, urllib.parse.urlencode(params).encode('utf-8'), headers or {})
        response = urllib.request.urlopen(req, context=context)
    except Exception:
        # A 400 from Solr arrives here as urllib.error.HTTPError.
        raise
    return response.read()

request('http://localhost:8983/solr/select',
        params={'q': '(test AND AND query)'})

-



400
72

(test AND AND query)
lucene




org.apache.solr.search.SyntaxError: Cannot parse '(test AND AND query)':
Encountered "  "AND "" at line 1, column 10. Was expecting one of:
 ... "+" ... "-" ...  ... "(" ... "*" ...  ... 
...  ...  ...  ... "[" ... "{" ...
 ...  ...  ... "*" ...

400
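
One way to see the same message from Python, sketched here rather than taken from the thread: urllib raises urllib.error.HTTPError for a 400, and that exception object can be read like a response, so the body Solr sent is still available. The helper name error_body and the choice to re-raise as RuntimeError are assumptions made for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def error_body(exc):
    """Return the response body carried by an HTTPError as text."""
    return exc.read().decode("utf-8", errors="replace")


def request(url, params, headers=None):
    """POST params to url; on an HTTP error, include the server's body in the exception."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    req = urllib.request.Request(url, data, headers or {})
    try:
        with urllib.request.urlopen(req) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the error document
        # sent by the server can be read from it like a normal response.
        raise RuntimeError("HTTP %d: %s" % (e.code, error_body(e))) from e
```

Calling request('http://localhost:8983/solr/select', {'q': '(test AND AND query)'}) against a running Solr should then raise an error whose text contains the response body, including the parse error, instead of only "400 Bad Request".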







How to filter auto suggestion in Solr

2018-12-11 Thread Ankit Patel
Guys, I am trying to implement Solr context filtering to filter
auto-suggestion results based on the category value. We have implemented
autosuggestion based on SpellCheckComponent.

Here is my detailed question:
https://stackoverflow.com/questions/53707224/filter-the-solr-autosuggestion-in-hybris

Any help would be appreciated!!


Query kills Solr

2018-12-11 Thread Webster Homer
Is there a way to get an approximate measure of the memory used by one or
more indexed fields? I’m looking into a problem with one of our Solr indexes.
I have a Japanese query that causes the replicas to run out of memory while
processing it.
Also, is there a way to change or disable the timeout in the Solr Console? When
I run this query there it always times out, and that is a real pain. I know
that it will complete eventually.

I have this field type:

[fieldType XML definition was stripped when the message was archived]

I have a number of fields of this type. The CJKBigramFilterFactory can generate
a lot of tokens. I’m concerned that this combination is what is killing our
Solr instances.
This is the query that is causing my problems:
モノクローナル抗ニコチン性アセチルコリンレセプター(??7サブユニット)抗体 マウス宿主抗体

We are using Solr 7.2 in SolrCloud mode.



Re: URL Case Sensitive/Insensitive

2018-12-11 Thread Walter Underwood
Lowercasing might work, it might not.

Hostnames originally were case-insensitive, but that might have changed with 
I18N hostnames.

Paths are interpreted by the web server. On Windows, paths are 
case-insensitive. On Unix, they are case-sensitive. Web servers might be 
configured to use case-insensitive paths.
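
A minimal sketch of the conservative option implied above: lowercase only the scheme and host and leave the path untouched. The function name lowercase_host is made up for illustration, and the sketch assumes plain ASCII hostnames without user:password@ credentials in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(url):
    """Lowercase the scheme and host; keep path, query and fragment as given."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # assumes no case-sensitive userinfo in netloc
        parts.path,            # path case is left to the web server
        parts.query,
        parts.fragment,
    ))
```

With this, https://WWW.Nuveen.com/Mutual-Funds and https://www.nuveen.com/Mutual-Funds map to the same string, while two paths that differ only in case stay distinct.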

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 11, 2018, at 10:33 AM, Moyer, Brett  wrote:
> 
> https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund
> https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund
> 
> Is there any issue if we just lowercase all URLs? I can't think of an issue 
> that would be caused, but that's why I'm asking the Guru's!
> 
> Brett Moyer



RE: URL Case Sensitive/Insensitive

2018-12-11 Thread Moyer, Brett
https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund
https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund

Is there any issue if we just lowercase all URLs? I can't think of an issue
that would be caused, but that's why I'm asking the gurus!

Brett Moyer
   



Re: URL Case Sensitive/Insensitive

2018-12-11 Thread Toke Eskildsen
Moyer, Brett  wrote:
> What is the best practice on URL case?

I work with web archiving, and URL normalisation is quite a tricky thing. The
software we use is https://github.com/ukwa/webarchive-discovery and a lot of
energy has been spent on the subject there. Long story short, we index 2
forms: the unmodified raw one and a heavily normalised one.

Question: Is
  https://www.example.com/FOO/
the same as
  http://example.com/foo
?

Technically it is not, as:
* There might be different content served for different protocols (highly 
unlikely)
* www might mean something (unlikely)
* FOO might be another resource than foo (unlikely)
* The trailing slash might be significant (seen on some Apache proxy-setups)

There are other rules, such as trying to remove session-ids, everything after # 
and so on. None of the individual steps results in many false positives in 
themselves, but they do add up.

For most practical purposes (URL-lookup & grouping, following links between 
archived pages, resolving embedded resources from pages) we use the heavily 
normalised URL.
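
A rough sketch of what such a heavy normalisation could look like. This is not the webarchive-discovery implementation; the function name and the SESSION_PARAMS list are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical session-id parameter names; a real list would be longer.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}


def normalise_url(url):
    """Reduce a URL to a lookup key: no scheme, no www, lowercase, no fragment."""
    parts = urlsplit(url)          # splits the fragment off from the rest
    host = parts.hostname or ""    # hostname is returned lowercased
    if host.startswith("www."):
        host = host[4:]
    # Lowercase the path and drop any trailing slash.
    path = parts.path.lower().rstrip("/")
    # Keep query parameters except the ones that look like session ids.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    key = host + path
    if query:
        key += "?" + urlencode(query)
    # Scheme, port, userinfo and fragment are deliberately dropped here.
    return key
```

Both https://www.example.com/FOO/ and http://example.com/foo come out as example.com/foo, which is the kind of collapse described above, false positives included.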

- Toke Eskildsen


Re: Keyword field with tabs in Solr 7.4

2018-12-11 Thread Erick Erickson
You are probably in "url-encoding hell". Add &debug=query to your
search and check the parsed query returned to see what Solr actually
sees. Maybe try url-encoding the backslash as %5C?
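
For illustration only, here is how the encoding could be done from Python so nothing is mangled on the way to Solr. The helper name build_debug_query is an assumption; the field name comes from the question below, and which of the two forms (literal backslash-t or a real tab) actually matches the indexed value is left open.

```python
from urllib.parse import urlencode


def build_debug_query(field, raw_value):
    """Build an encoded query string for field:"raw_value" with debug=query added."""
    q = '%s:"%s"' % (field, raw_value)
    # urlencode percent-encodes the quotes, the colon, backslashes and tabs.
    return urlencode({"q": q, "debug": "query"})


# A literal backslash followed by "t" is encoded as %5Ct.
with_backslash = build_debug_query("IPTC_2_080_KY", "\\tbus\\tbahn")
# A real tab character is encoded as %09.
with_tab = build_debug_query("IPTC_2_080_KY", "\tbus\tbahn")
```

Appending either string to the select URL and comparing the parsed query in the debug output shows what Solr received in each case.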

Best,
Erick
On Tue, Dec 11, 2018 at 1:40 AM Michael Aleythe, Sternwald
 wrote:
>
> Hey everybody,
>
> i have a Solr field keyword field defined as:
>
> 
>  
>
>  
> 
>
>  stored="true" termVectors="false" multiValued="false" />
>
> Some documents have tabs (\t) indexed in this field, e.g. 
> IPTC_2_080_KY:"\tbus\tbahn"
>
> How can i query this content? I tried  "\tbus\tbahn", 
> \\tbus\\tbahn and " bus bahn" but nothing matches. Does 
> anybody know what to do?
>
> Regards
> Michael


Re: URL Case Sensitive/Insensitive

2018-12-11 Thread Erick Erickson
What do you mean by "url case"? No, I'm not being snarky.

The value returned in a doc is very different from the value searched.
The stored data is the original input without going through any
filters.

If you mean the value _returned_ by Solr from a stored field, then the
case is exactly whatever was input originally. To get it in a consistent
case, I'd change it on the client side before sending it to Solr, or
use, say, a ScriptUpdateProcessor to change it on the way in to Solr.

If you're talking about _searching_ the URL, you need to put the
appropriate filters in your analysis chain. Most distributions have a
"lowercase" type that is a KeywordTokenizer plus a LowerCaseFilter. That
still treats the searchable text as a single token, so for instance
you wouldn't be able to search for url:com except with leading and
trailing wildcards, which is not a good pattern. If you want to search sub-parts of a url,
you'll use one of the text-based types to break it up into tokens.
Even in this case, though, the returned data is still the original
case since it's the stored data that's returned.
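
To make the single-token point concrete, here is a small Python emulation of the two analysis styles. It is not Solr code, and the splitting rule in simple_text_analyze is a simplification assumed for illustration.

```python
import re


def lowercase_keyword_analyze(value):
    """Emulate keyword tokenizer + lowercase filter: one lowercased token."""
    return [value.lower()]


def simple_text_analyze(value):
    """Emulate a text-style chain: split on non-alphanumerics, then lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


url = "https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund"
single = lowercase_keyword_analyze(url)   # the whole URL as one term
tokens = simple_text_analyze(url)         # "https", "www", "nuveen", "com", ...
```

A term query for com can only hit the second form, because in the first the only indexed term is the entire lowercased URL.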

Best,
Erick


URL Case Sensitive/Insensitive

2018-12-11 Thread Moyer, Brett
Hello, I'm new to Solr and have been using it for a few months. A recent
question came up from our business partners about URL casing. Previously their
URLs were upper case; they made a change and now they are all lower case. Both
pages/URLs are still accessible, so there are duplicates in Solr. They are
requesting that all URLs be evaluated as lowercase. What is the best practice
on URL case? Is there a downside to making everything lowercase? I know I can
drop the index and re-crawl to fix it, but long term how should URL case be
treated? Thanks!

Brett Moyer

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA
*


NGramFilterFactory and Similarity

2018-12-11 Thread elisabeth benoit
Hello,

We are trying to use NGramFilterFactory for approximate search with Solr 7.

We usually use a similarity with no tf, no idf (our similarity extends
ClassicSimilarity, with tf and idf functions always returning 1).

For ngram search though, it seems inappropriate since it scores a word
matching with one ngram the same as a word matching with, let's say, seven
ngrams.

We would like a similarity that gives a higher score to a document matching
more ngrams, but without using term frequency (we have multivalued fields, and
a word might be repeated in more than one entry of a multivalued field,
but we don't want that document to get a higher score because of that).
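
To pin down the behaviour we are after, here is a small Python sketch of the scoring idea, outside of Solr. It is not a Similarity implementation; the function names and the gram sizes 2 to 3 are assumptions for illustration.

```python
def ngrams(word, min_n=2, max_n=3):
    """Return the set of character n-grams of word for sizes min_n..max_n."""
    return {
        word[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(word) - n + 1)
    }


def ngram_overlap_score(query, field_values, min_n=2, max_n=3):
    """Count distinct query n-grams found anywhere in a multivalued field."""
    query_grams = ngrams(query.lower(), min_n, max_n)
    doc_grams = set()
    for value in field_values:
        for word in value.lower().split():
            # Using a set means a repeated word adds nothing to the score.
            doc_grams |= ngrams(word, min_n, max_n)
    return len(query_grams & doc_grams)
```

Here a document containing "paris" twice scores the same as one containing it once, while a document matching only a few of the query's ngrams scores lower.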

Has anyone experienced the same issue?

Best regards,
Elisabeth


Re: Filter the Solr autosuggestion in Hybris

2018-12-11 Thread Ankit Patel
Please note: here we have autosuggestion with `SpellCheckComponent`, which
we want to filter.

Here is the question
https://stackoverflow.com/questions/53707224/filter-the-solr-autosuggestion-in-hybris



On Tue, 11 Dec 2018 at 17:02 Ankit Patel  wrote:

> I am trying to implement Solr context filtering to filter auto-suggestion
> result based on the category value.
>
> *schema.xml*
>
>  multiValued="true" />
>  stored="true" multiValued="true" />
>  multiValued="true" />
> 
>  positionIncrementGap="100">
> 
> 
> 
> 
> 
>  stored="true" multiValued="true" />
>  positionIncrementGap="100">
> 
> 
>  pattern="(['’])" replacement=" " />
> 
>  words="lang/stopwords_en.txt" ignoreCase="true" />
> 
>  synonyms="synonyms.txt"/>
>  />
> 
> 
> 
> 
> 
>  multiValued="true" />
>
>
> *solrConfig.xml*
>
> 
> categorydic
> org.apache.solr.spelling.suggest.Suggester
>  name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
>  name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory
> autosuggest_en
> allCategories_string_mv
> false
> false
> text_spell_en
> ${solr.core.dataDir}/suggesttest
> 
>
> fields look like
>
> "spellcheck_en": [
>   "ANKITHI LIMIT",
>   "ROU7000272",
> ]
>
> "allCategories_string_mv": [
>   "3m",
>   "harddiskcategory",
>   ]
>
>
>
> http://localhost:8983/solr/master_Product/suggest?spellcheck=true=true=categorydic=json=mytest=harddiskcategory
>
> When I hit this URL with spellcheck.dictionary=categorydic,
> spellcheck.cfq=harddiskcategory, spellcheck.q=mytest, it doesn't filter the
> results. I am getting all the matches for *mytest*.
>
> Solr Version: 5.3.0
> Hybris Version: 6.0
>
> Any clue?
>
> Regards,
> Ankit Patel
>


Filter the Solr autosuggestion in Hybris

2018-12-11 Thread Ankit Patel
I am trying to implement Solr context filtering to filter auto-suggestion
result based on the category value.

*schema.xml*

[field and fieldType XML definitions were stripped when the message was archived]

*solrConfig.xml*


categorydic
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory
org.apache.solr.spelling.suggest.DocumentDictionaryFactory
autosuggest_en
allCategories_string_mv
false
false
text_spell_en
${solr.core.dataDir}/suggesttest


fields look like

"spellcheck_en": [
  "ANKITHI LIMIT",
  "ROU7000272",
]

"allCategories_string_mv": [
  "3m",
  "harddiskcategory",
  ]


http://localhost:8983/solr/master_Product/suggest?spellcheck=true=true=categorydic=json=mytest=harddiskcategory

When I hit this URL with spellcheck.dictionary=categorydic,
spellcheck.cfq=harddiskcategory, spellcheck.q=mytest, it doesn't filter the
results. I am getting all the matches for *mytest*.
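
For clarity, the request with the three parameters named above can be built like this in Python. This is only a sketch: the function name is made up, only the parameters named in this message are included, and whether SpellCheckComponent honours spellcheck.cfq is exactly what is in question.

```python
from urllib.parse import urlencode


def build_suggest_url(base, dictionary, query, context_filter):
    """Build the suggest request URL from the parameters named in the message."""
    params = {
        "spellcheck": "true",
        "spellcheck.dictionary": dictionary,
        "spellcheck.q": query,
        "spellcheck.cfq": context_filter,
    }
    return base + "?" + urlencode(params)


url = build_suggest_url(
    "http://localhost:8983/solr/master_Product/suggest",
    "categorydic", "mytest", "harddiskcategory")
```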

Solr Version: 5.3.0
Hybris Version: 6.0

Any clue?

Regards,
Ankit Patel


Re: Case insensitive query for fetching facets

2018-12-11 Thread Ritesh Kumar
Yes, all three options (copy fields, dynamic fields and
SortableTextField) are feasible. Since I am on version 7.5.0 of Solr, I
will go ahead with the SortableTextField option.

Thank you, team!

On Fri, Dec 7, 2018 at 8:46 PM Alexandre Rafalovitch 
wrote:

> If you are on the latest Solr (7.3+), try switching from TextField to
> SortableTextField in your string_ci definition above.
>
> That type implicitly uses docValues and should return original text
> for faceting purposes, while still allowing analyzers.
>
> Regards,
>Alex.
> On Thu, 6 Dec 2018 at 08:26, Ritesh Kumar
>  wrote:
> >
> > Hello team,
> >
> > I am trying to prepare a facet on a field of type string. The facet data
> > will be shown according to the user's query on this very field.
> >
> >  > required="false" multiValued="false"/>
> >
> >
> > As this field is of type string, it works fine with a case-sensitive
> > query. I want to be able to query on this field irrespective of the case.
> >
> > I tried changing the field type to string_ci as defined below
> >
> >  > omitNorms="true">
> > 
> > 
> > 
> > 
> > 
> >
> >  > required="false" multiValued="false"/>
> >
> > Now, in this case, I am able to perform a case-insensitive query but the
> > facet values are being shown in lowercase.
> >
> > I want to be able to perform a case-insensitive query on this field but
> > show the original data.
> > Is there anything I can do to achieve this?
> >
> > Best,
> >
> > --
> > Ritesh Kumar
>


Keyword field with tabs in Solr 7.4

2018-12-11 Thread Michael Aleythe, Sternwald
Hey everybody,

I have a Solr keyword field defined as:

[fieldType and field XML definitions were stripped when the message was archived]

Some documents have tabs (\t) indexed in this field, e.g. 
IPTC_2_080_KY:"\tbus\tbahn"

How can I query this content? I tried "\tbus\tbahn",
\\tbus\\tbahn and " bus bahn", but nothing matches. Does
anybody know what to do?

Regards
Michael