most stable way to get facet pivoting
Hi,

I want to evaluate (and probably use in production) facet pivoting. What is the best approach to get an as-stable-as-can-be version of Solr that is able to do facet pivoting? I was hoping to see this in Solr 3.1, but apparently it is only in the dev versions/nightlies. Is it possible to patch this feature into Solr 3.1 stable?

Best regards,
Nik

--
Nikolas Tautenhahn
nikolas.tautenh...@livinglogic.de
http://www.livinglogic.de
LivingLogic AG, Markgrafenallee 44, 95448 Bayreuth
District court (Amtsgericht) Bayreuth ++ HRB 3274
Chairman of the supervisory board: Achim Lindner
Executive board: Philipp Ambrosch, Alois Kastner-Maresch (chair)
Re: Proper Escaping of Ampersands
Hi Chris,

On 23.08.2010 21:37, Chris Hostetter wrote:
> : The document is indexed correctly, a search for "at s" found it and all
> : fields looked great ("at&s" and not for example, "at&amp;s").
> :
> : As my stopword list does not contain "at" or "&" or "&amp;", I don't
> : quite understand, why my result is found, when I disable the
> : stopword-list. My stopwordlist can be found here
> :
> : http://pastebin.com/RfLuBHqd
> :
> : Do you happen to see bad things for a string like "at&s" here?
>
> "s" is in your stopwords file, which may be part of the problem (but i didn't look hard at your query string to verify that)
>
> : The analysis page in the admin panel tells me, these steps for the Index
> : Analyzer:
>  ...
> : (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats
> :
> : So, according to this, it should be found even with my stopwords enabled...
>
> Strange, based on the stopwords file you posted the "s" should definitely be getting removed at index time -- it would also get removed at query time, but because you have it *before* WDF at query time that wouldn't affect this query (even though it did affect the index)
>
> There was a bug with analysis.jsp and stopwords recently, but that shouldn't have affected 1.4 (you are definitely using 1.4, correct?)
>
> https://issues.apache.org/jira/browse/SOLR-2051

I am using Solr 1.4 (actually LucidWorks Solr) in production and tried 1.4.1 for testing. Unfortunately, I can't tell for sure whether I tried the analysis.jsp in both.

I moved the stopword filter before the WordDelimiterFilter. Thanks for your hints, Chris and Yonik!

Best regards,
Nikolas Tautenhahn
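The filter-order effect described in this message can be illustrated with a small toy model. This is only a sketch, not Solr code: the token positions are hard-coded from the analysis output quoted in the thread, the stopword set contains only the one relevant entry ("s"), and the helper names stop_filter and phrase_matches are made up for this illustration.

```python
# Toy model of the two analyzer chains; token positions are copied from the
# analysis output quoted in the thread, nothing here calls Solr or Lucene.
STOPWORDS = {"s"}  # assumption: only the relevant entry of the posted list


def stop_filter(positions, stopwords=STOPWORDS):
    """Drop stopwords from every position but keep the positions themselves."""
    return [{tok for tok in pos if tok not in stopwords} for pos in positions]


def phrase_matches(query_positions, index_positions):
    """True if each query position shares a token with the same index position."""
    if len(query_positions) > len(index_positions):
        return False
    return all(q & i for q, i in zip(query_positions, index_positions))


# Index side: WordDelimiterFilter output for "at&s"
# (catenateWords=1, preserveOriginal=1), already lowercased.
index_after_wdf = [{"at&s", "at"}, {"s", "ats"}]

# Query side: the stop filter runs before WDF, sees the single token "at&s"
# and removes nothing; WDF (catenateWords=0) then splits it like this.
query_positions = [{"at&s", "at"}, {"s"}]

# Original index chain: stop filter after WDF removes the "s" part.
index_stop_after_wdf = stop_filter(index_after_wdf)

# Reordered index chain: the stop filter runs first and only sees "at&s",
# so the WDF output reaches the index unchanged.
index_stop_before_wdf = index_after_wdf

print(phrase_matches(query_positions, index_stop_after_wdf))   # False
print(phrase_matches(query_positions, index_stop_before_wdf))  # True
```

In this model the query needs "s" at the second position, which the original index chain has removed; moving the stop filter in front of the WordDelimiterFilter on the index side keeps it.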
Re: Proper Escaping of Ampersands
Hi Yonik,

I got it working, but I think the stopword filter is not behaving as expected (the document could be found when I disabled the stopword filter; details later in this mail).

On 20.08.2010 16:57, Yonik Seeley wrote:
> On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn <nik_s...@livinglogic.de> wrote:
>> But when I search for q=at%26s (=at&s), I get nothing.
>
> That's the correct encoding if you're typing it directly into a browser address box.
>
> http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true
>
> But you should be able to verify that solr is getting the correct query string by checking out "params" in the response (in the example server, by default they are echoed back). And adding debugQuery=true to the request should show you exactly what query is being generated.
>
> But the real issue likely lies with your fieldType definition. Can you show that?

As I (normally) query multiple fields, I changed my request URL to

http://127.0.0.1:8983/solr/select?q=at%26s&fl=titel&qt=dismax&qf=titel&debugQuery=true&fl=*&qt=dismax&qf=titel&debugQuery=true

in order to narrow it down and got this response (cut to what I think is the relevant part):

  <str name="rawquerystring">at&s</str>
  <str name="querystring">at&s</str>
  <str name="parsedquery">+DisjunctionMaxQuery((titel:"(at&s at) s")~0.1) ()</str>
  <str name="parsedquery_toString">+(titel:"(at&s at) s")~0.1 ()</str>
  <lst name="explain"/>
  <str name="QParser">DisMaxQParser</str>

on my local debugging instance, using the standard dismax config (from the examples directory of Solr).
The titel field is configured like this:

  <field name="titel" type="textgen" indexed="true" stored="true"/>

and textgen is configured like this:

  <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The document is indexed correctly; a search for "at s" found it and all fields looked great ("at&s" and not, for example, "at&amp;s").

As my stopword list does not contain "at" or "&" or "&amp;", I don't quite understand why my result is found when I disable the stopword list. My stopword list can be found here:

http://pastebin.com/RfLuBHqd

Do you happen to see bad things for a string like "at&s" here?
The analysis page in the admin panel shows me these steps for the Index Analyzer:

  (HTMLStripStandardTokenizer) at&s => at&s
  (SynonymFilter) at&s => at&s
  (WordDelimiterFilter) at&s => term position 1: at&s, at; term pos 2: s, ats
  (LowerCaseFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: s, ats
  (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats

So, according to this, it should be found even with my stopwords enabled...

Best regards and thanks for your response,
Nikolas Tautenhahn
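The encoding question from this message (does Solr receive "at&s" as the value of q?) can be checked without a running server. The following is a minimal sketch using only Python's standard library; the host, port and field name are copied from the URLs above, and the parameter set is reduced for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build the request URL; urlencode percent-encodes the "&" inside the value.
params = {"q": "at&s", "qt": "dismax", "qf": "titel", "debugQuery": "true"}
url = "http://127.0.0.1:8983/solr/select?" + urlencode(params)
print(url)
# http://127.0.0.1:8983/solr/select?q=at%26s&qt=dismax&qf=titel&debugQuery=true

# Decoding the query string again shows the value a server would see for q.
print(parse_qs(urlsplit(url).query)["q"])  # ['at&s']

# Without encoding, the "&" acts as a parameter separator and q is cut short.
broken = "http://127.0.0.1:8983/solr/select?q=at&s&qt=dismax"
print(parse_qs(urlsplit(broken).query, keep_blank_values=True))
# {'q': ['at'], 's': [''], 'qt': ['dismax']}
```

This matches the echoed rawquerystring in the debug output: with q=at%26s the query string arrives intact, so the mismatch has to come from analysis rather than from URL escaping.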
Re: Tokenising on Each Letter
Hi Scottie,

> Could you elaborate about N gram for me, based on my schema?

Just a quick reply:

  <fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" side="front" minGramSize="2" maxGramSize="30"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

This will produce front-edge n-grams (prefixes) of 2 up to 30 characters at index time. For more information, see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that maxGramSize is big enough to keep the whole original serial number/model number and minGramSize is not so small that you fill your index with useless information.

Best regards,
Nikolas Tautenhahn
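To make the effect of minGramSize and maxGramSize concrete, here is a small sketch of what front-edge n-grams look like. It is a plain Python illustration, not the Lucene implementation; the function name edge_ngrams and the sample tokens are invented for this example.

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("ab1234"))
# ['ab', 'ab1', 'ab12', 'ab123', 'ab1234']

# With a too-small max_gram the full token is never emitted, which is why
# maxGramSize should cover the longest serial or model number.
print(edge_ngrams("abcdef", max_gram=4))
# ['ab', 'abc', 'abcd']

# Tokens shorter than min_gram produce no grams in this sketch.
print(edge_ngrams("x"))
# []
```

Since the query analyzer above has no n-gram filter, a query term is compared as a whole against these indexed prefixes, which is what gives prefix matching on serial numbers.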
Re: Proper Escaping of Ampersands
Hi all,

just some further information:

https://issues.apache.org/jira/browse/SOLR-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

seems to be the same problem, but searching the archives yielded nothing I could use.

Any hints on this?

Best regards,
Nikolas Tautenhahn

On 19.08.2010 17:33, Nikolas Tautenhahn wrote:
> Hi,
>
> I have a problem with, for example, company names like "AT&S". A job is sending data to the Solr 1.4 index (also tested with 1.4.1) via Python in XML; everything is escaped properly ("&" becomes "&amp;").
>
> When I search for "at s" (q=%22at%20s%22), using the dismax handler, I find the dataset for this company and I get all names back (the company is still called "at&s" and not something like "at&amp;s").
>
> But when I search for q=at%26s (=at&s), I get nothing. I also tried q=at%5C%26s (=at\&s) and q=at%5C%5C%26s, blindly following any clues for escaping with backslashes...
>
> So, my question is: How do I search (correctly) for "at&s"?
>
> When I use the Analysis Page in the admin panel, select my field name, and enter "AT&S" as both Field Value (Index) and Field Value (Query), it shows me that the query matches, so I assume Solr doesn't get the correct query string...
>
> If necessary, I can supply information from schema.xml for the fields in use, but as the Analysis Page showed the match, I don't think this is very useful...
>
> Best regards,
> Nikolas Tautenhahn
Proper Escaping of Ampersands
Hi,

I have a problem with, for example, company names like "AT&S". A job is sending data to the Solr 1.4 index (also tested with 1.4.1) via Python in XML; everything is escaped properly ("&" becomes "&amp;").

When I search for "at s" (q=%22at%20s%22), using the dismax handler, I find the dataset for this company and I get all names back (the company is still called "at&s" and not something like "at&amp;s").

But when I search for q=at%26s (=at&s), I get nothing. I also tried q=at%5C%26s (=at\&s) and q=at%5C%5C%26s, blindly following any clues for escaping with backslashes...

So, my question is: How do I search (correctly) for "at&s"?

When I use the Analysis Page in the admin panel, select my field name, and enter "AT&S" as both Field Value (Index) and Field Value (Query), it shows me that the query matches, so I assume Solr doesn't get the correct query string...

If necessary, I can supply information from schema.xml for the fields in use, but as the Analysis Page showed the match, I don't think this is very useful...

Best regards,
Nikolas Tautenhahn
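The indexing side of the escaping described in this message can be sketched as follows. This is a hedged illustration using Python's standard library only; the actual indexing job is not shown in the thread, so the document layout and the field name "titel" (taken from the schema posted elsewhere in the thread) are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "AT&S"

# Escape the value before embedding it in the XML update message.
doc = '<add><doc><field name="titel">' + escape(name) + "</field></doc></add>"
print(doc)
# <add><doc><field name="titel">AT&amp;S</field></doc></add>

# An XML parser turns the entity back into the plain character, which is why
# the stored value reads AT&S rather than AT&amp;S.
print(ET.fromstring(doc).find("doc/field").text)
# AT&S
```

The XML entity is only a transport encoding: after parsing, the analyzer chain sees the plain "AT&S".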