Solr Autosuggest help
Hi, I am using the Solr (1.4.1) autosuggest feature via the TermsComponent. Currently, if I type 'goo', Solr suggests single words like 'google', but I would like to receive suggestions like 'google, google alerts, ...', i.e. suggestions with both single and multiple terms. I am not sure whether I need to use edge n-grams for that, e.g. indexing 'google' as 'go', 'oo', 'og', ..., but I think I don't need this, since I don't want partial search. Please let me know if there is any way to do multiple-word suggestions. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2580944.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to handle special character in filter query
Hello, Regarding HTTP-specific characters (like spaces and '&'), you'll need to URL-encode those if you are firing queries directly at Solr, but you don't need to do so if you are using a Solr client such as SolrJ. Regards, - Savvas On 26 February 2011 03:11, cyang2010 ysxsu...@hotmail.com wrote: How to handle special characters when constructing a filter query? For example, I want to do something like: http://...&fq=genre:ACTION & ADVENTURE How do I handle the space and the '&' in the filter query part? Thanks.
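As an illustration of the URL-encoding Savvas describes, here is a short Python sketch (the genre value comes from the question; the exact parameter layout is up to you):

```python
from urllib.parse import quote_plus

# Build a filter-query parameter for a value containing a space and an
# ampersand, e.g. the genre "ACTION & ADVENTURE" from the thread.
value = 'genre:"ACTION & ADVENTURE"'
param = "fq=" + quote_plus(value)
print(param)  # -> fq=genre%3A%22ACTION+%26+ADVENTURE%22
```

A Solr client such as SolrJ performs this encoding for you, which is why it is only needed when constructing request URLs by hand.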
Text field not defined in Solr Schema?
Hello list, I have recently been working on some JS (ajax solr) and when using Firebug I am alerted to an error within the JS file as below. It immediately breaks on line 12, stating that 'doc.text' is undefined! Here is the code snippet.

10 AjaxSolr.theme.prototype.snippet = function (doc) {
11   var output = '';
12   if (doc.text.length > 300) {
13     output += doc.dateline + ' ' + doc.text.substring(0, 300);
14     output += '<span style="display:none;">' + doc.text.substring(300);
15     output += '</span> <a href="#" class="more">more</a>';
16   }
17   else {
18     output += doc.dateline + ' ' + doc.text;
19   }
20   return output;
21 };

I have been advised that the problem might stem from my schema not defining a text field; however, as my implementation of Solr is currently geared to index docs from a Nutch web crawl, I am using the Nutch schema. A snippet of the schema is below:

<schema name="nutch" version="1.1">
  <types>
    ...
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        ...
      </analyzer>
    </fieldType>
  </types>
  <fields>
    ...
    <field name="content" type="text" stored="true" indexed="true"/>
  </fields>
</schema>

Can someone confirm whether I need to add something similar to the following:

<fields>
  ...
  <field name="text" type="text" stored="true" indexed="true"/>
</fields>

and then perform a fresh crawl and reindex so that the schema field is recognised by the JS snippet? Also (sorry, I apologise), from my reading on the Solr schema I became intrigued by two options for TextField, namely compressed and compressThreshold. I understand that they are used hand in glove; however, can anyone please explain what benefits compression adds and what integer value would be appropriate for the latter option? Any help would be great. Thank you Lewis Glasgow Caledonian University is a registered Scottish charity, number SC021474 Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009. 
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: Make syntax highlighter caseinsensitive
On 02/25/2011 03:02 PM, Koji Sekiguchi wrote: (11/02/25 18:30), Tarjei Huse wrote: Hi, On 02/25/2011 02:06 AM, Koji Sekiguchi wrote: (11/02/24 20:18), Tarjei Huse wrote: Hi, I got an index where I have two fields, body and caseInsensitiveBody. body is indexed and stored while caseInsensitiveBody is just indexed. The idea is that by not storing caseInsensitiveBody I save some space and gain some performance. So I query against caseInsensitiveBody and generate highlighting from the case-sensitive one. The problem is that, as a result, I am missing highlighting terms. For example, when I search for solr and get a match in caseInsensitiveBody for solr, but it is Solr in the original document, no highlighting is done. Is there a way around this? Currently I am using the following highlighting params: 'hl' => 'on', 'hl.fl' => 'header,body', 'hl.usePhraseHighlighter' => 'true', 'hl.highlightMultiTerm' => 'true', 'hl.fragsize' => 200, 'hl.regex.pattern' => '[-\w ,/\n\\']{20,200}', Tarjei, Maybe a silly question, but why don't you make the body field case insensitive and eliminate the caseInsensitiveBody field, and then query and highlight on just the body field? Not silly. I need to support usage scenarios where case matters as well as scenarios where case doesn't matter. The best part would be if I could use one field for this, store it and handle case sensitivity in the query phase, but as I understand it, that is not possible. Hi Tarjei, If I understand it correctly, you want to highlight in a case-insensitive way. If so, it is easy. You have: body: indexed but not stored caseInsensitiveBody: indexed and stored and request hl.fl=caseInsensitiveBody ? But I also want to be able to do it the other way around - i.e. I need to keep both options open so that I at runtime can select whether I want to do a query that is or is not case insensitive. That is why I'm storing the non-lowercased version of the field - with that I do not lose information. 
Regards, Tarjei Koji -- Regards / Med vennlig hilsen Tarjei Huse Mobil: 920 63 413
Studying all files of Solr SRC
Is there any place where a detailed tutorial about all the Java files of Apache Solr (under the src folder) is available? I want to study them, as my purpose is to either write code for my implementation or modify the existing files to fulfil my purpose. Actually, I want to add Advanced Search to my Solr-based search engine. This advanced search will include options like 'at least half', 'as many as possible', 'most', etc., which are linguistic operators. We can say that these options will help the user in finding fuzziness in their search results. The user wants: show me all the documents which contain at least half of terms like t1, t2, t3, or show me all the documents which contain most of the terms like t1, t5, t7, etc. These 'at least half' and 'most' options have been given some weight. This advanced search is different from normal boolean search. Thanks - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/Studying-all-files-of-Solr-SRC-tp2581715p2581715.html
Re: Studying all files of Solr SRC
DismaxQParser's mm parameter might help you out: http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 Is there any place where a detailed tutorial about all the Java files of Apache Solr (under the src folder) is available? I want to study them, as my purpose is to either write code for my implementation or modify the existing files to fulfil my purpose. Actually, I want to add Advanced Search to my Solr-based search engine. This advanced search will include options like 'at least half', 'as many as possible', 'most', etc., which are linguistic operators. We can say that these options will help the user in finding fuzziness in their search results. The user wants: show me all the documents which contain at least half of terms like t1, t2, t3, or show me all the documents which contain most of the terms like t1, t5, t7, etc. These 'at least half' and 'most' options have been given some weight. This advanced search is different from normal boolean search. Thanks - Kumar Anurag
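As a sketch of how mm could express "at least half of the terms", here is a hypothetical dismax handler configuration (the handler name and the qf field are illustrative, not from the thread):

```xml
<requestHandler name="/atleasthalf" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- qf is a placeholder; use whichever field(s) you query -->
    <str name="qf">all_text</str>
    <!-- require at least 50% of the optional query clauses to match -->
    <str name="mm">50%</str>
  </lst>
</requestHandler>
```

A looser "most of the terms" could be approximated with a higher percentage such as mm=75%, or with the conditional forms the wiki page describes (e.g. "2&lt;75%").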
Re: Text field not defined in Solr Schema?
Yes, you need to add the field text of type Text, or use content instead of text.

Hello list, I have recently been working on some JS (ajax solr) and when using Firebug I am alerted to an error within the JS file as below. It immediately breaks on line 12, stating that 'doc.text' is undefined! Here is the code snippet.

10 AjaxSolr.theme.prototype.snippet = function (doc) {
11   var output = '';
12   if (doc.text.length > 300) {
13     output += doc.dateline + ' ' + doc.text.substring(0, 300);
14     output += '<span style="display:none;">' + doc.text.substring(300);
15     output += '</span> <a href="#" class="more">more</a>';
16   }
17   else {
18     output += doc.dateline + ' ' + doc.text;
19   }
20   return output;
21 };

I have been advised that the problem might stem from my schema not defining a text field; however, as my implementation of Solr is currently geared to index docs from a Nutch web crawl, I am using the Nutch schema. A snippet of the schema is below:

<schema name="nutch" version="1.1">
  <types>
    ...
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        ...
      </analyzer>
    </fieldType>
  </types>
  <fields>
    ...
    <field name="content" type="text" stored="true" indexed="true"/>
  </fields>
</schema>

Can someone confirm whether I need to add something similar to the following:

<fields>
  ...
  <field name="text" type="text" stored="true" indexed="true"/>
</fields>

and then perform a fresh crawl and reindex so that the schema field is recognised by the JS snippet? Also (sorry, I apologise), from my reading on the Solr schema I became intrigued by two options for TextField, namely compressed and compressThreshold. I understand that they are used hand in glove; however, can anyone please explain what benefits compression adds and what integer value would be appropriate for the latter option? Any help would be great. Thank you Lewis
Re: Make syntax highlighter caseinsensitive
That is why I'm storing the non-lowercased version of the field - with that I do not lose information. You do not lose information when you store the lowercased version of the field. Koji -- http://www.rondhuit.com/en/
loading XML docbook files into solr
I've been working on this for a while and seem to have hit a wall. The error messages aren't complete enough to give guidance as to why importing a sample docbook document into Solr is not working. I'm using the curl tool to post the XML file and receive a non-error message, but the document count doesn't increase and *:* still returns no results. The docbook document has an id attribute, and this is mapped to the uniqueKey in the schema.xml file. But it seems this may still be the issue. It's not clear how the field names map to the XML. Do they only map to attributes? Or do they map to elements? How do you differentiate? Can field names in the schema.xml file have XPath statements? Are there other important sections of the solrconfig that could be keeping this from working? We want to maintain much of the document structure so we have more control over the searching. Here is what the docbook XML looks like: (tried setting the uniqueKey to id and docid but no go either way)

<book label="issuebriefs" id="proi">
  <docid>245</docid>
  <titleabbrev>Advancing Return on Investment Analysis for Government IT: A Public Value Framework</titleabbrev>
  <chapter>
    <title>Advancing Return on Investment Analysis for Government IT: A Public Value Framework</title>
    <para>
      <mediaobject>
        <imageobject>
          <imagedata fileref="/publications/annualreports/ar2006/images/public-value.jpg" format="jpg" contentdepth="157" contentwidth="216" align="left"/>
        </imageobject>
        <textobject>
          <phrase>Public Value Illustration</phrase>
        </textobject>
      </mediaobject>
.. 
Here is the section of the schema.xml:

  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="titleabbrev" type="text" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="para" type="text" indexed="true" stored="true"/>
  <field name="ulink" type="string" indexed="true" stored="true"/>
  <field name="listitem" type="text" indexed="true" stored="true"/>
  <field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="all_text"/>
  <copyField source="para" dest="all_text"/>
  <copyField source="listitem" dest="all_text"/>
  <copyField source="titleabbrev" dest="all_text"/>
</fields>

<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required=false, it will be a required field -->
<uniqueKey>id</uniqueKey>

<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>all_text</defaultSearchField>

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>

Load command results.

$ ./postfile.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">56</int></lst>
</response>
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
</response>

Thanks Derek
RE: Text field not defined in Solr Schema?
Thank you Markus, I am wondering if anyone can comment on the latter question I posted regarding supporting TextField or StrField with compression options. I understand the methodology behind adding compressThreshold to the field type definition (1st part of my schema) and adding individual options to the individual field definitions (2nd part of my schema); my question regards any real benefits which can be gained when implemented in a 'small/medium' Solr use case. Thank you Lewis From: Markus Jelsma [markus.jel...@openindex.io] Sent: 26 February 2011 13:42 To: solr-user@lucene.apache.org Cc: McGibbney, Lewis John Subject: Re: Text field not defined in Solr Schema? Yes, you need to add the field text of type Text, or use content instead of text.
Re: loading XML docbook files into solr
On Sat, Feb 26, 2011 at 9:10 PM, Derek Werthmuller dwert...@ctg.albany.edu wrote: I've been working on this for a while and seem to have hit a wall. The error messages aren't complete enough to give guidance as to why importing a sample docbook document into Solr is not working. I'm using the curl tool to post the XML file and receive a non-error message, but the document count doesn't increase and *:* still returns no results. [...] Which curl tool? The post.sh included with Solr? You refer to a postfile.sh below. Unless I am missing something, it seems like you are trying to post a standard XML file to Solr. You cannot do that. There are two ways to proceed: * Reformat the XML into Solr's format. See the .xml documents in the example/exampledocs directory of your Solr distribution, or see, e.g., http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html * Write a DataImportHandler script with an XPathEntityProcessor. Please see http://wiki.apache.org/solr/DataImportHandler Load command results. $ ./postfile.sh [...] This is not the problem here, but the standard Solr post.sh takes filenames to be posted as command-line arguments. Regards, Gora
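For illustration, a Solr-format version of one of the docbook records might look like the following (field names follow the schema quoted in the thread; the field values are abbreviated and the para content is a placeholder):

```xml
<add>
  <doc>
    <field name="id">proi</field>
    <field name="titleabbrev">Advancing Return on Investment Analysis for Government IT: A Public Value Framework</field>
    <field name="title">Advancing Return on Investment Analysis for Government IT: A Public Value Framework</field>
    <field name="para">Public Value Illustration</field>
  </doc>
</add>
```

After posting documents in this format, a `<commit/>` must also be sent before they become visible to searches, which may be why the document count appeared not to increase.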
Re: loading XML docbook files into solr
Hi Derek, The XML files you post to Solr need to be in the correct Solr-specific XML format. One way to preserve the original structure would be to flatten the document into field names indicating the position of the text, for example: book_titleabbrev: Advancing Return on Investment Analysis for Government IT: A Public Value Framework ... etc. But you will still have to parse your docbook XML into the appropriate schema that you want to use for Solr. I believe DIH also allows XSLT-based preprocessors so you don't have to write parsing code, but I haven't used them. -sujit On Sat, 2011-02-26 at 10:40 -0500, Derek Werthmuller wrote: I've been working on this for a while and seem to have hit a wall. The error messages aren't complete enough to give guidance as to why importing a sample docbook document into Solr is not working. I'm using the curl tool to post the XML file and receive a non-error message, but the document count doesn't increase and *:* still returns no results. The docbook document has an id attribute, and this is mapped to the uniqueKey in the schema.xml file. But it seems this may still be the issue. It's not clear how the field names map to the XML. Do they only map to attributes? Or do they map to elements? How do you differentiate? Can field names in the schema.xml file have XPath statements? Are there other important sections of the solrconfig that could be keeping this from working? We want to maintain much of the document structure so we have more control over the searching. 
Here is what the docbook XML looks like: (tried setting the uniqueKey to id and docid but no go either way) [...] Here is the section of the schema.xml [...] Load command results. $ ./postfile.sh [...] Thanks Derek
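As a rough, untested sketch of the DIH route mentioned above, a data-config using XPathEntityProcessor for this docbook layout might look like this (the file path and the exact XPath expressions are illustrative assumptions):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="docbook"
            processor="XPathEntityProcessor"
            url="/path/to/issuebrief.xml"
            forEach="/book">
      <!-- map the book's id attribute to the schema's uniqueKey field -->
      <field column="id" xpath="/book/@id"/>
      <field column="titleabbrev" xpath="/book/titleabbrev"/>
      <field column="title" xpath="/book/chapter/title"/>
      <field column="para" xpath="/book/chapter/para"/>
    </entity>
  </document>
</dataConfig>
```

Note that XPathEntityProcessor supports only a limited subset of XPath, so deeply nested or conditional expressions may need an XSLT preprocessing step instead.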
Re: How to handle special character in filter query
Try this: fq={!field f=category}your value, URL-encoded of course, goes here, or use double quotes. Regards On 26/02/2011 04:11, cyang2010 wrote: How to handle special characters when constructing a filter query? For example, I want to do something like: http://...&fq=genre:ACTION & ADVENTURE How do I handle the space and the '&' in the filter query part? Thanks.
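Spelled out with the genre value from the original question, the two suggested options would look something like the following before URL encoding (field name as in the question):

```
fq={!field f=genre}ACTION & ADVENTURE
fq=genre:"ACTION & ADVENTURE"
```

In both cases the space and the '&' still need to be URL-encoded when the parameter is sent over HTTP, unless a client library does it for you.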
Blacklist keyword list on dataimporter
Hi, Is there a way to drop a document at indexing time based on a blacklist keyword list? Something like stopwords.txt, but in this case, when one keyword is detected in a specific field at indexing time, the whole doc would be skipped. Regards
Re: Solr Autosuggest help
I am using the Solr (1.4.1) autosuggest feature via the TermsComponent. Currently, if I type 'goo', Solr suggests single words like 'google', but I would like to receive suggestions like 'google, google alerts, ...', i.e. suggestions with both single and multiple terms. I am not sure whether I need to use edge n-grams for that, e.g. indexing 'google' as 'go', 'oo', 'og', ..., but I think I don't need this, since I don't want partial search. Please let me know if there is any way to do multiple-word suggestions. If you stick with the TermsComponent, you need to add ShingleFilterFactory to your index analyzer chain for that. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
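A minimal sketch of such an index analyzer chain (the field type name and shingle size are illustrative choices, not from the thread):

```xml
<fieldType name="autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit both single terms and two-word phrases,
         e.g. "google" and "google alerts" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a terms.prefix of 'goo' can match both the unigram 'google' and the shingle 'google alerts', since both are indexed as terms.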
RE: loading XML docbook files into solr
Thank you, this clarifies a lot. -Original Message- From: Gora Mohanty [mailto:g...@mimirtech.com] Sent: Saturday, February 26, 2011 11:49 AM To: solr-user@lucene.apache.org Subject: Re: loading XML docbook files into solr On Sat, Feb 26, 2011 at 9:10 PM, Derek Werthmuller dwert...@ctg.albany.edu wrote: I've been working on this for a while and seem to have hit a wall. The error messages aren't complete enough to give guidance as to why importing a sample docbook document into Solr is not working. I'm using the curl tool to post the XML file and receive a non-error message, but the document count doesn't increase and *:* still returns no results. [...] Which curl tool? The post.sh included with Solr? You refer to a postfile.sh below. Unless I am missing something, it seems like you are trying to post a standard XML file to Solr. You cannot do that. There are two ways to proceed: * Reformat the XML into Solr's format. See the .xml documents in the example/exampledocs directory of your Solr distribution, or see, e.g., http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html * Write a DataImportHandler script with an XPathEntityProcessor. Please see http://wiki.apache.org/solr/DataImportHandler Load command results. $ ./postfile.sh [...] This is not the problem here, but the standard Solr post.sh takes filenames to be posted as command-line arguments. Regards, Gora
Re: query results filter
Just stumbled on field collapsing ( http://wiki.apache.org/solr/FieldCollapsing ), which is apparently slated for inclusion in the next release. Looks like I should be able to achieve my unique field requirement w/ group.limit=1&group.main=true in the query string. With regard to the known limitation Distributed search support for result grouping has not yet been implemented, does it work imperfectly with dist search, or does it fail? -Babak On Thu, Feb 24, 2011 at 10:20 PM, Babak Farhang farh...@gmail.com wrote: In my case, I want to filter out duplicate docs so that returned docs are unique w/ respect to a certain field (not the schema's unique field, of course): a duplicate doc here is one that has the same value for a checksum field as one of the docs already in the results. It would be great if I could somehow express that w/ a query, but I don't think that would be possible. On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hmm, depending on what you are actually needing to do, can you do it with a simple fq param to filter out what you want filtered out, instead of needing to write custom Java as you are suggesting? It would be a lot easier to just use an fq. How would you describe the documents you want to filter from the query results page? Can that description be represented by a Solr query you can already represent using the lucene, dismax, or any other existing query? If so, why not just use a negated fq describing what to omit from the results? From: Babak Farhang [farh...@gmail.com] Sent: Thursday, February 24, 2011 6:58 PM To: solr-user Subject: query results filter Hi everyone, I have some existing solr cores that for one reason or another have documents that I need to filter from the query results page. I would like to do this inside Solr instead of doing it on the receiving end, in the client. 
After searching the mailing list archives and Solr wiki, it appears you do this by registering a custom SearchHandler / SearchComponent with Solr. Still, I don't quite understand how this machinery fits together. Any suggestions / ideas / pointers much appreciated! Cheers, -Babak ~~ Ideally, I'd like to find / code a solution that does the following: 1. A request handler that works like the StandardRequestHandler but which allows an optional DocFilter (say, modeled like the java.io.FileFilter interface) 2. Allows current pagination to work transparently. 3. Works transparently with distributed/sharded queries.
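Put together, such a deduplicating request could look something like the following query string (checksum is the example field from this thread; other parameters are illustrative):

```
q=*:*&group=true&group.field=checksum&group.limit=1&group.main=true
```

With group.main=true the grouped result is flattened back into an ordinary result list, so existing pagination on start/rows should keep working, which addresses requirement 2 above; requirement 3 depends on the distributed-grouping limitation noted earlier in the thread.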