Re: Solr Auto-Complete
For suffix matches, copy the text to a second field whose field type adds string reversal to both the index and query analyzers. You are then running the same prefix-matching algorithm, but on reversed strings. I can dig up an example if that is not clear.

On 6 Dec 2015 8:06 am, "Salman Ansari" wrote:
> That is right. I am actually looking for phrase prefixes, not each term
> prefix within the phrase. That satisfies my requirements. However, my
> additional question was how do I manipulate the fieldType to later allow
> for suffix matches as well? Or will that be a completely different
> fieldType definition?
>
> Regards,
> Salman
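The reversed-string approach described in this reply could look roughly like the schema fragment below. This is only a sketch, not the example the poster offered to dig up: the field and type names (`name`, `name_suffix`, `text_suggestion_suffix`) are hypothetical, and the keyword tokenizer, lowercasing and gram sizes are assumptions carried over from the prefix discussion in this thread.

```xml
<!-- Sketch only: all names are hypothetical; place inside <schema> in schema.xml -->
<fieldType name="text_suggestion_suffix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keep the whole value as one token (assumed, as for the phrase-prefix type) -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reverse the token so that suffixes become prefixes -->
    <filter class="solr.ReverseStringFilterFactory"/>
    <!-- Index the leading n-grams of the reversed token -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reverse the user input as well; no n-gramming at query time -->
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Copy the source text into the suffix field -->
<field name="name_suffix" type="text_suggestion_suffix" indexed="true" stored="false"/>
<copyField source="name" dest="name_suffix"/>
```

With a chain like this, a value such as "Andorra" would be indexed as the leading n-grams of "arrodna", and a query for "orra" would be reversed to "arro" and match one of them.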
Re: Solr Auto-Complete
Do you mean "phrase" or "term" prefixes? If you put a field value (two or more terms) into the analysis page, you will see what the index analyzer chain of my example field type is doing. The whole value is handled as a single n-grammed token, so you get only a phrase-prefix search, as in your request.

If you also want term prefixes, I would index another field (similar to the example you posted); the search handler, using (e)dismax, would then have something like this:

text_suggestion_phrase_prefix_search^b1
text_suggestion_terms_prefix_search^b2

The b1 and b2 values depend strictly on your search logic.

Is that close that what you were looking for?

Best,
Andrea

2015-12-06 11:53 GMT+01:00 Salman Ansari:
> Thanks a lot Andrea. It did work.
>
> However, just for my understanding, can you please explain more how did you
> make it work for prefixes. I know you mentioned using another Tokenizer but
> for example, if I want to tweak it later on to work on suffixes or within
> phrases how should I go about that?
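The two-field setup mentioned in this message might be wired into solrconfig.xml along the following lines. The XML around the two field names was lost in the archive, so this is a guess at its shape rather than the poster's original: the handler name `/autocomplete` is hypothetical, the use of the `qf` parameter is an assumption, and the boosts 10 and 1 merely stand in for b1 and b2.

```xml
<!-- Sketch only: handler name and boost values are hypothetical -->
<requestHandler name="/autocomplete" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Phrase-prefix matches are boosted above term-prefix matches (b1 > b2 assumed) -->
    <str name="qf">
      text_suggestion_phrase_prefix_search^10
      text_suggestion_terms_prefix_search^1
    </str>
  </lst>
</requestHandler>
```

Whether phrase-prefix hits should outrank term-prefix hits, and by how much, is exactly the "search logic" decision the message leaves open.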
Re: Solr Auto-Complete
Hi,

I have updated my schema.xml as mentioned in the previous posts using

(The fieldType XML did not survive archiving; only the attributes positionIncrementGap="100", minGramSize="1" and maxGramSize="20" remain.)

This does the auto-complete, but it matches at every position in the text, not just at the beginning (prefix). So searching for "And" in my field for locations returns both of the following documents.

1
AD
*And*orra
أندورا
1519794717684924416

5
AG
Antigua *and* Barbuda
أنتيجوا وبربودا
1519794717701701633

I have read about this and at first I thought I need to add side="front" but after adding that, Solr returned an error (when creating a collection) indicating "Unknown parameters
Re: Solr Auto-Complete
Hi Salman,
that's because you're using a StandardTokenizer. Try something like this (copied, pasted and edited on my phone, so probably with a lot of mistakes ;) but you should be able to get what I mean). BTW, I don't know if it applies in your case, but I would also add a MappingCharFilterFactory.

(The fieldType XML did not survive archiving. The fragments that remain are positionIncrementGap="100"; mapping="mapping-FoldToASCII.txt", highlighted, in both analyzers; generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0" in both analyzers; and maxGramSize="20" in the first analyzer only.)

2015-12-06 9:36 GMT+01:00 Salman Ansari:
> This does the auto-complete, but it does it at every portion of the text
> (not just at the beginning) (prefix). So searching for "And" in my field
> for locations returns both of the following documents.
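For orientation, a field type of the kind this reply describes might look like the sketch below. It is not a recovery of the lost original. The attribute values come from the fragments that survive in the quoted copies of this message; everything else is an assumption: the type name, the choice of KeywordTokenizerFactory (suggested by the later remark that the whole value is handled as a single n-grammed token), the lowercase filter, the filter class names and their order, and minGramSize="1" (which appears only in the original poster's own schema).

```xml
<!-- Sketch only: reconstructed under the assumptions stated above -->
<fieldType name="text_suggestion_phrase_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Fold accented characters to ASCII before tokenizing -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <!-- Assumed: one token per field value instead of one token per word -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Collapse the value into a single concatenated token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateAll="1"
            splitOnCaseChange="0"/>
    <!-- Index only the leading n-grams, so matches are anchored at the start -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateAll="1"
            splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

The query analyzer deliberately omits the n-gram filter, so the user's input is compared as a whole against the indexed leading n-grams. Pasting a multi-word value into the analysis page should show whether the chain really produces a single token per value.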
Re: Solr Auto-Complete
Thanks a lot Andrea. It did work.

However, just for my understanding, can you please explain in more detail how you made it work for prefixes? I know you mentioned using another tokenizer, but if, for example, I want to tweak it later to work on suffixes or within phrases, how should I go about that?

Thanks again for your help.

Regards,
Salman

On Sun, Dec 6, 2015 at 1:24 PM, Andrea Gazzarini wrote:
> Hi Salman,
> that's because you're using a StandardTokenizer. Try with something like
> this (copied, pasted and changed using my phone so probably with a lot of
> mistakes ;) but you should be able to get what I mean).
Re: Solr Auto-Complete
Sorry, my damned mobile: "Is that close to what you were looking for?"

2015-12-06 12:07 GMT+01:00 Andrea Gazzarini:
> Is that close that what you were looking for?
>
> Best,
> Andrea
Re: Solr Auto-Complete
That is right. I am actually looking for phrase prefixes, not each term prefix within the phrase. That satisfies my requirements. However, my additional question was: how do I change the fieldType to later allow for suffix matches as well? Or will that be a completely different fieldType definition?

Regards,
Salman

On Sun, Dec 6, 2015 at 2:12 PM, Andrea Gazzarini wrote:
> Sorry, my damned mobile: "Is that close to what you were looking for?"
Re: Authorization API versus zkcli.sh
There's nothing cluster specific in security.json if you're using those plugins. It is totally safe to just take the file from one cluster and upload it for another for things to work.

On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:
> Looking through
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> one notices that security.json is initially created by zkcli.sh, and then
> modified by means of the Authentication API and the Authorization API. By
> and large, this sounds like a good way to accomplish such tasks, assuming
> that these APIs do some error checking to prevent corruption of
> security.json
>
> I was wondering about cases where one is cloning an existing Solr
> instance, such as when creating an instance in Amazon Cloud. If one has a
> security.json that has been thoroughly tried and successfully tested on
> another Solr instance, is it possible / safe / not-un-recommended to use
> zkcli.sh to load the full security.json (as extracted via zkcli.sh from the
> Zookeeper of the thoroughly tested existing instance)? Or would the
> official verdict be that the only acceptable way to create security.json is
> to load a minimal version with zkcli.sh and then to build the remaining
> components with the Authentication API and the Authorization API (in a
> script, if one wants to automate the process: although such a script would
> have to include plain-text passwords)?
>
> I figured there is no harm in asking.

--
Anshum Gupta
import file to solr
Hi,

I am trying to import XML files using the data import request handler.

When I import an XML file of 1.4 kB, it works correctly. However, I cannot import an XML file of 4 GB into Solr. It does not report any error, but I receive the following response:

*Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. (Duration: 39s)*
Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0

Both files have the same structure (same elements/attributes).

I would like to ask whether there is any limit on the size of XML files that can be imported into Solr.

Thank you!

Best,
Kate
Re: import file to solr
Still, 4GB is going to take a lot of resources to
1> hold the whole thing in memory and parse
2> process.

You may simply be hitting a timeout.

But I would ask what practical use indexing a 4GB file is. Likely it'll be found by virtually every search (assuming there's a huge text field or two in there) and also appear near the bottom of the list relevance-wise. Then there's the problem of anyone ever actually being able to view the file in their browser (although I don't know what your app is, so maybe that's not a concern).

This sounds somewhat like an XY problem. _Why_ do you want to index a 4G file? What's the use-case you're supporting?

Best,
Erick

On Sun, Dec 6, 2015 at 3:26 PM, Alexandre Rafalovitch wrote:
> There should be no limit. Try 100K, 50K sizes. Maybe you have an error
> somewhere. Also check Solr logs, not just DIH messages.
Re: import file to solr
There should be no limit. Try 100K, 50K sizes. Maybe you have an error somewhere. Also check Solr logs, not just DIH messages.

On 6 Dec 2015 3:56 pm, "Kate Kas" wrote:
> I would like to ask you, if there are any limits regarding the size of xml
> files, which we can import to solr.
Match All terms in indexed field value
Scenario: a document should be matched/returned ONLY IF the search text entered by the user contains ALL the terms of a single indexed field, in any order.

Example: the document has only 2 fields, id and title. The document below is indexed.

{"id":"1", "title": "refrigerator water filter"}

The search texts below should NOT return the document, because each is only a subset of the indexed field value.
1) water filter
2) refrigerator

The search texts below should return the document, because all indexed terms of the title field are present in the search text.
1) ABC refrigerator water filter
2) water filter ABC refrigerator
3) water filter refrigerator

Please advise how to model this scenario in Solr.

--
View this message in context: http://lucene.472066.n3.nabble.com/Match-All-terms-in-indexed-field-value-tp4243895.html