Different behavior (bug?) for RegExTransformer in Solr5
I'm experimenting with Solr 5 (5.1.0 1672403 - timpotter - 2015-04-09 10:37:54). In my custom DIH, I use a RegexTransformer to load several columns, which may or may not be present. If a column is present, the regexp matches and the data loads correctly in both Solr 4 and Solr 5. If it is not present and the regexp fails, the column is empty in Solr 4, but in Solr 5 it contains the original string that was to be matched. In other words, in Solr 5.1.0, if the 'replaceWith' value would be empty, the column appears to revert to the original string.

Example: column 'data' contains: column1:xxx,column3:yyy

DIH regexp:
<field column="column1" regex="^.*column1:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column2" regex="^.*column2:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column3" regex="^.*column3:(.*?),.*$" replaceWith="$1" sourceColName="data" />

Solr 4:
column1: xxx
column2:
column3: yyy

Solr 5:
column1: xxx
column2: column1:xxx,column3:yyy
column3: yyy
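A plain-Java sketch of the two behaviors described above (this is illustrative code, not Solr's actual RegexTransformer; the class and method names are invented for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFallback {
    // Apply a capturing regex to a source value.
    // keepOnNoMatch=false mimics the Solr 4 behavior described above
    // (the column is left empty when the pattern fails);
    // keepOnNoMatch=true mimics the observed Solr 5 behavior
    // (the original source string is kept).
    static String extract(String input, String regex, boolean keepOnNoMatch) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            return m.group(1);          // pattern matched: use the capture
        }
        return keepOnNoMatch ? input : null;
    }
}
```

With the post's data, `extract("column1:xxx,column3:yyy", "^.*column2:(.*?),.*$", true)` returns the whole original string, which is exactly what the Solr 5 index ends up holding for column2.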
Re: lukeall.jar for Solr4r?
Thank you very much for taking the time to do this. This version is able to read the index files, but there is at least one issue: The home screen reports ERROR: can't count terms per field and this exception is thrown: java.util.NoSuchElementException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64) at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109) at org.getopt.luke.Luke$3.run(Luke.java:1165) On 11/05/2012 05:08 PM, Shawn Heisey wrote: On 11/5/2012 2:52 PM, Shawn Heisey wrote: No idea whether I did it right, or even whether it works. All my indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it. You can get to the resulting jar and my patch against the luke-4.0.0-ALPHA source: https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar If you have an immediate need for 4.0.0 support in Luke, please try it out and let me know whether it works. If it doesn't work, or when the official luke 4.0.0 is released, I will remove those files from my dropbox. I just realized that the version I uploaded there was compiled with java 1.7.0_09. I don't know if this is actually a problem, but just in case, I re-did the compile on a machine with 1.6.0_29. The filename referenced above now points to this version and I have included a file that indicates its java7 origins: https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar Thanks, Shawn
Re: lukeall.jar for Solr4r?
I checked out luke-src-4.0.0-ALPHA.tgz, the most recent I could find, and compiled, but I still get the error Format version not supported (resource MMapIndexInput(path=/var/lib/tomcat6/solr/apache-solr-4.0.0-/core1/data/index/_7.tvx)): 1 needs to be between 0 and 0 Can anyone post a luke.jar capable of reading 4.0 indexes? On 10/27/2012 09:17 PM, Lance Norskog wrote: Aha! Andrzej has not built a 4.0 release version. You need to check out the source and compile your own. http://code.google.com/p/luke/downloads/list - Original Message - | From: Carrie Coyc...@ssww.com | To: solr-user@lucene.apache.org | Sent: Friday, October 26, 2012 7:33:45 AM | Subject: lukeall.jar for Solr4r? | | Where can I get a copy of Luke capable of reading Solr4 indexes? My | lukeall-4.0.0-ALPHA.jar no longer works. | | Thx, | Carrie Coy |
lukeall.jar for Solr4r?
Where can I get a copy of Luke capable of reading Solr4 indexes? My lukeall-4.0.0-ALPHA.jar no longer works. Thx, Carrie Coy
Re: UnsupportedOperationException: ExternalFileField (SOLVED)
The problem seems to have been caused by my failure to completely remove the existing index files when I switched the inStock field from an indexed boolean field to externally maintained. After I removed everything and re-indexed from scratch, the error went away.

On 10/24/2012 08:57 PM, Carrie Coy wrote: (Solr4) I'm getting the following error trying to use ExternalFileField to maintain an inStock flag. Any idea what I'm doing wrong?

schema.xml:
<field name="inStock" type="file" />
<fieldtype name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/>

-rw-r--r-- 1 tomcat tomcat 100434 Oct 24 20:07 external_inStock:
YM0600=1
YM0544=1
YM0505=1

solrconfig.xml:
<str name="boost">if(inStock,10,1)</str>

SEVERE: null:java.lang.UnsupportedOperationException
    at org.apache.solr.schema.ExternalFileField.write(ExternalFileField.java:85)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:130)
    at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:355)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:275)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:154)
    at org.apache.solr.response.PHPWriter.writeNamedList(PHPResponseWriter.java:54)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:91)
    at org.apache.solr.response.PHPResponseWriter.write(PHPResponseWriter.java:36)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:411)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
UnsupportedOperationException: ExternalFileField
(Solr4) I'm getting the following error trying to use ExternalFileField to maintain an inStock flag. Any idea what I'm doing wrong?

schema.xml:
<field name="inStock" type="file" />
<fieldtype name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/>

-rw-r--r-- 1 tomcat tomcat 100434 Oct 24 20:07 external_inStock:
YM0600=1
YM0544=1
YM0505=1

solrconfig.xml:
<str name="boost">if(inStock,10,1)</str>

SEVERE: null:java.lang.UnsupportedOperationException
    at org.apache.solr.schema.ExternalFileField.write(ExternalFileField.java:85)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:130)
    at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:355)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:275)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:154)
    at org.apache.solr.response.PHPWriter.writeNamedList(PHPResponseWriter.java:54)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:91)
    at org.apache.solr.response.PHPResponseWriter.write(PHPResponseWriter.java:36)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:411)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
WordBreak spell correction makes split terms optional?
The user query "design your ownbinoculars" is corrected by the 'wordbreak' dictionary to:

<str name="querystring">design your (own binoculars)</str>

Where are the parentheses coming from? Can I strip them with a post-processing filter? The parentheses make the split terms optional, so, while the first match is excellent, the rest are irrelevant.

Thx, Carrie Coy
omit tf using per-field CustomSimilarity?
I'm trying to configure per-field similarity to disregard term frequency (omit tf) in a 'title' field. I'm trying to follow the example docs without success: my custom similarity doesn't seem to have any effect on 'tf'. Is the NoTfSimilarity class below written correctly? Any advice is much appreciated.

my schema.xml:

<field name="title" type="text_custom_sim" indexed="true" stored="true" omitNorms="true" termVectors="true" />
<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text_custom_sim" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    .
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    .
  </analyzer>
</fieldType>

NoTfSimilarityFactory.java:

package com.ssww;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class NoTfSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NoTfSimilarity();
    }
}

NoTfSimilarity.java:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    public float tf(int i) {
        return 1;
    }
}

These two files are in a jar in the lib directory of this core. 
Here's the results of a search for paint with custom and default similarity: Indexed with per-field NoTfSimilarity: 284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 280.5598 = (MATCH) sum of: 280.5598 = (MATCH) max of: 280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of: 280.5598 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 39.83825 = queryWeight, product of: 8.0 = boost 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = queryNorm 7.042474 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = fieldNorm(doc=48) 18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of: 18.217428 = score(doc=48,freq=1.0 = termFreq=1.0 ), product of: 4.268188 = queryWeight, product of: 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = queryNorm 4.268188 = fieldWeight in 48, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = fieldNorm(doc=48) 7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of: 7.725952 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 1.6527361 = queryWeight, product of: 0.5 = boost 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = queryNorm 4.6746435 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = fieldNorm(doc=48) 106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of: 106.50396 = score(doc=48,freq=4.0 = termFreq=4.0 ), product of: 16.317472 = queryWeight, product of: 5.0 = boost 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = queryNorm 6.526989 = fieldWeight in 48, product of: 2.0 = tf(freq=4.0), with freq of: 4.0 = termFreq=4.0 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = fieldNorm(doc=48) 
1.0142012 = scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0) Indexed with DefaultSimilarity: 7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 7.524058 = (MATCH) sum of: 7.524058 = (MATCH) max of: 7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], result of: 7.524058 = fieldWeight in 3504, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 5.3203125 = idf(docFreq=197, maxDocs=14892) 1.0 = fieldNorm(doc=3504) 0.5091842 = (MATCH) weight(search_keywords:paint in 3504) [DefaultSimilarity], result of: 0.5091842 = score(doc=3504,freq=1.0 = termFreq=1.0 ), product of:
Solved: Re: omit tf using per-field CustomSimilarity?
My problem was that I specified the per-field similarity class INSIDE the analyzer instead of outside it:

<fieldType>
  <analyzer> ... </analyzer>
  <similarity ... />
</fieldType>

On 09/24/2012 02:56 PM, Carrie Coy wrote: I'm trying to configure per-field similarity to disregard term frequency (omit tf) in a 'title' field. I'm trying to follow the example docs without success: my custom similarity doesn't seem to have any effect on 'tf'. Is the NoTfSimilarity class below written correctly? Any advice is much appreciated.

my schema.xml:

<field name="title" type="text_custom_sim" indexed="true" stored="true" omitNorms="true" termVectors="true" />
<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text_custom_sim" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    .
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory" />
    .
  </analyzer>
</fieldType>

NoTfSimilarityFactory.java:

package com.ssww;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class NoTfSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NoTfSimilarity();
    }
}

NoTfSimilarity.java:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    public float tf(int i) {
        return 1;
    }
}

These two files are in a jar in the lib directory of this core. 
Here's the results of a search for paint with custom and default similarity: Indexed with per-field NoTfSimilarity: 284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 280.5598 = (MATCH) sum of: 280.5598 = (MATCH) max of: 280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of: 280.5598 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 39.83825 = queryWeight, product of: 8.0 = boost 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = queryNorm 7.042474 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = fieldNorm(doc=48) 18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of: 18.217428 = score(doc=48,freq=1.0 = termFreq=1.0 ), product of: 4.268188 = queryWeight, product of: 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = queryNorm 4.268188 = fieldWeight in 48, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = fieldNorm(doc=48) 7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of: 7.725952 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 1.6527361 = queryWeight, product of: 0.5 = boost 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = queryNorm 4.6746435 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = fieldNorm(doc=48) 106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of: 106.50396 = score(doc=48,freq=4.0 = termFreq=4.0 ), product of: 16.317472 = queryWeight, product of: 5.0 = boost 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = queryNorm 6.526989 = fieldWeight in 48, product of: 2.0 = tf(freq=4.0), with freq of: 4.0 = termFreq=4.0 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = fieldNorm(doc=48) 
1.0142012 = scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0) Indexed with DefaultSimilarity: 7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 7.524058 = (MATCH) sum of: 7.524058 = (MATCH) max of: 7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], result of: 7.524058 = fieldWeight in 3504, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 5.3203125 = idf(docFreq=197, maxDocs=14892) 1.0
Conditionally apply synonyms?
Is there an existing TokenFilterFactory that can conditionally insert synonyms based on a given document attribute, say category? Some synonyms only make sense in context: "bats" in Sports is different from "bats" in Party and Novelty. It seems the synonyms.txt file would need an additional column that could be checked against the document attribute prior to appending synonyms:

#synonyms          category
post,pole          sports
wheel,caster       furniture
pat,paddy,patrick  holiday

Is anything like this possible without writing a custom TokenFilterFactory?
Re: Conditionally apply synonyms?
The latter: the document (e.g. a product) has a category, and the synonyms would be applied at index time: sports-related bat synonyms to baseball bats, and Halloween-themed bat synonyms to scary bats, for example.

On 09/19/2012 05:08 PM, Erick Erickson wrote: Not that I know of; synonyms are all-or-nothing on a field. But how would you indicate the context at index time as opposed to query time? Especially at query time, there's very little in the way of context to figure out what the category was. Or were you thinking that the document had a category and applying this only at index time? Best, Erick

On Wed, Sep 19, 2012 at 3:23 PM, Carrie Coy <c...@ssww.com> wrote: Is there an existing TokenFilterFactory that can conditionally insert synonyms based on a given document attribute, say category? Some synonyms only make sense in context: "bats" in Sports is different from "bats" in Party and Novelty. It seems the synonyms.txt file would need an additional column that could be checked against the document attribute prior to appending synonyms:

#synonyms          category
post,pole          sports
wheel,caster       furniture
pat,paddy,patrick  holiday

Is anything like this possible without writing a custom TokenFilterFactory?
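Not Solr code, but a plain-Java sketch of the lookup that the extra synonyms.txt column implies: expand a term only when the document's category matches the row's category. The class and method names are invented for illustration; a real solution would wrap this in a custom TokenFilterFactory.

```java
import java.util.*;

public class CategorySynonyms {
    // (term -> (category -> synonyms)); rows mirror the table in the post
    private final Map<String, Map<String, List<String>>> table = new HashMap<>();

    void add(String term, String category, String... syns) {
        table.computeIfAbsent(term, t -> new HashMap<>())
             .put(category, Arrays.asList(syns));
    }

    // Emit the original term, plus synonyms only if a row exists
    // for this document's category.
    List<String> expand(String term, String docCategory) {
        List<String> out = new ArrayList<>();
        out.add(term);
        Map<String, List<String>> byCat = table.get(term);
        if (byCat != null && byCat.containsKey(docCategory)) {
            out.addAll(byCat.get(docCategory));
        }
        return out;
    }
}
```

So a "sports" document indexing "post" would also get "pole", while a "furniture" document indexing "post" would get no synonyms.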
local param invariant for q is ignored?
(solr4-beta) I'm trying to follow the instructions in this article: http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/ to apply a custom sort order to search results. Essentially, it involves creating a new qq parameter and substituting it into the original q parameter as a local param via an invariant, as below. My RequestHandler looks like this:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <!-- VelocityResponseWriter settings -->
    <lst name="invariants">
      <str name="q">{!boost b=page_views v=$qq}</str>
    </lst>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="qq">*:*</str>
  </lst>
</requestHandler>

I triple-checked the small modifications to these files:

velocity/VM_global_library.vm:
#macro(q)qq=$!{esc.url($params.get('qq'))}#end

velocity/query.vm:
<span>#annTitle("Add the query using the qq= parameter")Find: <input type="text" id="q" name="qq" value="$!esc.html($params.get('qq'))"/> <input type="submit" id="querySubmit"/> <input type="reset"/></span>

But instead of creating a query like this:
http://localhost:8080/solr/ssww/browse?q={!boost b=page_views v=$qq}&qq=glue (which returns great results)
it creates this:
http://localhost:8080/solr/ssww/browse?qq=glue (no results)

From the query debugging:

<lst name="params">
  <str name="qq">glue</str>
  <str name="v.template">browse</str>
  <str name="qf">title^2 description^0.5 cat^0.5 bullets^0.5 search_keywords^0.3</str>
  <str name="wt">xml</str>
  <str name="rows">10</str>
  <str name="defType">edismax</str>
  <str name="invariants">{q={!boost b=page_views v=$qq}}</str>
  <str name="debug">true</str>
  <str name="wt">xml</str>
  <str name="qq">glue</str>
</lst>
<lst name="debug">
  <null name="rawquerystring"/>
  <null name="querystring"/>
  <str name="parsedquery"></str>
  <str name="parsedquery_toString"></str>
  <lst name="explain"/>
  <str name="QParser">ExtendedDismaxQParser</str>
  <null name="altquerystring"/>
  <null name="boostfuncs"/>
</lst>

Does anybody know what I might have done wrong?
Re: local param invariant for q is ignored? (solved)
Of course I saw my error within seconds of pressing "send": the invariants block should appear outside the defaults block in the RequestHandler.

On 08/31/2012 04:25 PM, Carrie Coy wrote: (solr4-beta) I'm trying to follow the instructions in this article: http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/ to apply a custom sort order to search results. Essentially, it involves creating a new qq parameter and substituting it into the original q parameter as a local param via an invariant, as below. My RequestHandler looks like this:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <!-- VelocityResponseWriter settings -->
    <lst name="invariants">
      <str name="q">{!boost b=page_views v=$qq}</str>
    </lst>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="qq">*:*</str>
  </lst>
</requestHandler>

I triple-checked the small modifications to these files:

velocity/VM_global_library.vm:
#macro(q)qq=$!{esc.url($params.get('qq'))}#end

velocity/query.vm:
<span>#annTitle("Add the query using the qq= parameter")Find: <input type="text" id="q" name="qq" value="$!esc.html($params.get('qq'))"/> <input type="submit" id="querySubmit"/> <input type="reset"/></span>

But instead of creating a query like this:
http://localhost:8080/solr/ssww/browse?q={!boost b=page_views v=$qq}&qq=glue (which returns great results)
it creates this:
http://localhost:8080/solr/ssww/browse?qq=glue (no results)

From the query debugging:

<lst name="params">
  <str name="qq">glue</str>
  <str name="v.template">browse</str>
  <str name="qf">title^2 description^0.5 cat^0.5 bullets^0.5 search_keywords^0.3</str>
  <str name="wt">xml</str>
  <str name="rows">10</str>
  <str name="defType">edismax</str>
  <str name="invariants">{q={!boost b=page_views v=$qq}}</str>
  <str name="debug">true</str>
  <str name="wt">xml</str>
  <str name="qq">glue</str>
</lst>
<lst name="debug">
  <null name="rawquerystring"/>
  <null name="querystring"/>
  <str name="parsedquery"></str>
  <str name="parsedquery_toString"></str>
  <lst name="explain"/>
  <str name="QParser">ExtendedDismaxQParser</str>
  <null name="altquerystring"/>
  <null name="boostfuncs"/>
</lst>

Does anybody know what I might have done wrong?
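For reference, a sketch of the corrected handler layout described in the fix above (parameter values abbreviated; only the nesting matters here):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <!-- invariants must be a sibling of defaults, not nested inside it -->
  <lst name="invariants">
    <str name="q">{!boost b=page_views v=$qq}</str>
  </lst>
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="qq">*:*</str>
  </lst>
</requestHandler>
```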
Re: More debugging DIH - URLDataSource (solved)
Thank you for these suggestions. The real problem was incorrect syntax for the primary key column in data-config.xml. Once I corrected that, the data loaded fine.

Wrong:
<field column="part_code" name="id" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>

Right:
<field column="id" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>

On 08/25/2012 08:52 PM, Lance Norskog wrote: About XPaths: the XPath engine supports a limited range of XPaths. The doc says that your paths are covered. About logs: you only have the RegexTransformer listed. You need to add LogTransformer to the transformer list: http://wiki.apache.org/solr/DataImportHandler#LogTransformer Having XML entity codes in the URL string seems right. Can you verify the URL that goes to the remote site? Can you read the logs at the remote site? Can you run this code through a proxy and watch the data?

On Fri, Aug 24, 2012 at 1:34 PM, Carrie Coy <c...@ssww.com> wrote: I'm trying to write a DIH to incorporate page view metrics from an XML feed into our index. The DIH makes a single request, and updates 0 documents. I set the log level to finest for the entire dataimport section, but I still can't tell what's wrong. I suspect the XPath. http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport returns 404. Any suggestions on how I can debug this?

* solr-spec 4.0.0.2012.08.06.22.50.47

The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
  <Data>
    <Rows>
      <Row rowKey="P#PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
      </Row>
      <Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
      </Row>
    </Rows>
  </Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/>
  <document>
    <entity name="coremetrics" dataSource="coremetrics" pk="id"
        url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
        processor="XPathEntityProcessor" stream="true" forEach="/ReportDataResponse/Data/Rows/Row"
        logLevel="fine" transformer="RegexTransformer">
      <field column="part_code" name="id" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
      <field column="page_views" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']" />
    </entity>
  </document>
</dataConfig>

This little test perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}

From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig INFO: Data Configuration loaded successfully Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2 Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties INFO: Read dataimport.properties Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource getData FINE: Accessing URL: https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*username=***format=XMLuserAuthKey=**language=en_USviewID=9475540period_a=M20110930 Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0 Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport
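The Perl check above can also be done with the JDK's built-in XPath engine; a minimal sketch with the sample XML inlined (class name and the inlined document are illustrative, trimmed from the feed shown in the post):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathCheck {
    // Extract the part code from a PAGE_NAME value, mirroring the
    // s/^PRODUCT:.*\((.*?)\)$/$1/ regex from the post.
    static String partCode(String pageName) {
        return pageName.replaceAll("^PRODUCT:.*\\((.*?)\\)$", "$1");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ReportDataResponse><Data><Rows>"
            + "<Row><Value columnId='PAGE_NAME'>PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>"
            + "<Value columnId='PAGE_VIEWS'>2388</Value></Row>"
            + "</Rows></Data></ReportDataResponse>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        // Same xpath as the DIH field definition
        NodeList names = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']",
            doc, XPathConstants.NODESET);
        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(partCode(names.item(i).getTextContent()));
        }
    }
}
```

Running it against the real feed file instead of the inline string is a quick way to confirm the xpath and regex independently of the DIH.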
More debugging DIH - URLDataSource
I'm trying to write a DIH to incorporate page view metrics from an XML feed into our index. The DIH makes a single request, and updates 0 documents. I set the log level to finest for the entire dataimport section, but I still can't tell what's wrong. I suspect the XPath. http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport returns 404. Any suggestions on how I can debug this?

* solr-spec 4.0.0.2012.08.06.22.50.47

The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
  <Data>
    <Rows>
      <Row rowKey="P#PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
      </Row>
      <Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
        <Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM (BAG OF 850) (BE9000)</Value>
        <Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
      </Row>
    </Rows>
  </Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/>
  <document>
    <entity name="coremetrics" dataSource="coremetrics" pk="id"
        url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
        processor="XPathEntityProcessor" stream="true" forEach="/ReportDataResponse/Data/Rows/Row"
        logLevel="fine" transformer="RegexTransformer">
      <field column="part_code" name="id" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']" regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
      <field column="page_views" xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']" />
    </entity>
  </document>
</dataConfig>

This little test perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}

From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource getData
FINE: Accessing URL: https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*username=***format=XMLuserAuthKey=**language=en_USviewID=9475540period_a=M20110930
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Aug 24, 2012 3:53:18 PM 
org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0 Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0 Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0 Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0 Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute INFO: [ssww] webapp=/solr path=/dataimport
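One thing worth checking in the DIH config above: RegexTransformer compiles the regex attribute directly with java.util.regex.Pattern, which has no Perl-style /.../ delimiter syntax, so the leading and trailing slashes in regex="/^PRODUCT:...$/" would be matched literally and the pattern would likely never match. A quick sanity check of both forms (a sketch in Python, not Solr code; Python's re engine behaves the same as Java's for this pattern):

```python
import re

# The pattern from the DIH config, WITHOUT the surrounding /.../ delimiters:
pattern = r'^PRODUCT:.*\((.*?)\)$'
page_name = "PRODUCT: BURLAP POTATO SACKS (PACK OF 12) (W4537)"

# Greedy .* pushes the capture to the last parenthesized group.
part_code = re.sub(pattern, r'\1', page_name)
print(part_code)  # W4537

# With the delimiters left in, '^' can never match after a literal '/',
# so the pattern never matches and the input comes back unchanged:
no_match = re.sub(r'/^PRODUCT:.*\((.*?)\)$/', r'\1', page_name)
print(no_match == page_name)  # True
```

If that is indeed the problem, dropping the slashes so the attribute reads regex="^PRODUCT:.*\((.*?)\)$" should let the transformer extract the part code.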
Shingle and PositionFilterFactory question
I am trying to use shingles and a position filter so that a query for "foot print", for example, matches either "foot print" or "footprint". From the docs: "using the PositionFilter (http://wiki.apache.org/solr/PositionFilter) in combination makes it possible to make all shingles synonyms of each other." I've configured my analyzer like this:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=""/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>

user query: foot print

Without PositionFilterFactory, parsed query: +(((title:foot) (title:print))~2) (title:(foot footprint) print)

With PositionFilterFactory, parsed query: +(((title:foot) (title:print))~2) ()

Why, when I add PositionFilterFactory into the mix, is the "footprint" shingle omitted? Output of analysis:

  WT  text       raw_bytes                     start  end  position  type
      foot       [66 6f 6f 74]                 0      4    1         word
      print      [70 72 69 6e 74]              5      10   2         word

  SF  text       raw_bytes                     start  end  positionLength  type     position
      foot       [66 6f 6f 74]                 0      4    1               word     1
      footprint  [66 6f 6f 74 70 72 69 6e 74]  0      10   2               shingle  1
      print      [70 72 69 6e 74]              5      10   1               word     2

Thanks,
Carrie Coy
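One way to reason about what PositionFilter changes is to model the token stream as (text, positionIncrement) pairs. The sketch below is a toy model in Python, not Lucene code: it only illustrates the increment semantics, namely that PositionFilter zeroes the increment of every token after the first, stacking the whole stream at a single position so the stacked tokens behave like synonyms at query time.

```python
def position_filter(tokens):
    """Toy model of Lucene's PositionFilter: set the position increment
    of every token after the first to 0, stacking the entire stream at
    one position (synonym-like)."""
    return [(text, inc if i == 0 else 0)
            for i, (text, inc) in enumerate(tokens)]

# ShingleFilter output for "foot print" with unigrams, matching the
# analysis table above: "footprint" starts at the same offset as "foot".
shingled = [("foot", 1), ("footprint", 0), ("print", 1)]

print(position_filter(shingled))
# [('foot', 1), ('footprint', 0), ('print', 0)]
```

If the stream really looks like this after the filter, the shingle is still present in analysis, which suggests the empty () clause in the parsed query comes from the query parser's handling of the stacked tokens rather than from the filter dropping "footprint".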
WordBreakSolrSpellChecker ignores MinBreakWordLength?
I set MinBreakWordLength = 3 thinking it would prevent WordBreakSolrSpellChecker from suggesting corrections made up of subwords shorter than 3 characters, but I still get suggestions like this:

query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why? Here is the relevant portion of solrconfig.xml:

  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">15</str>
  <str name="spellcheck.maxCollationTries">100</str>
  <str name="spellcheck.alternativeTermCount">4</str>
  <str name="spellcheck.collateParam.mm">100%</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.maxResultsForSuggest">5</str>
  <str name="spellcheck.MinBreakWordLength">3</str>
  <str name="spellcheck.maxChanges">3</str>
Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?
Thanks! The combination of these two suggestions (relocating the wordbreak parameters to the spellchecker configuration and correcting the spelling of the parameter to minBreakLength) fixed the problem I was having.

On 06/28/2012 10:22 AM, Dyer, James wrote:

Carrie,

Try taking the wordbreak parameters out of the request handler configuration and instead put them in the spellchecker configuration. You also need to remove the "spellcheck." prefix. Also, the correct spelling for this parameter is "minBreakLength". Here's an example:

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">{your field name here}</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">3</int>
    <int name="minBreakLength">3</int>
  </lst>

All of the parameters in the following source file can go in the spellchecker configuration like this:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java

Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java

Let me know if this works out for you. Any more feedback you can provide on the newer spellcheck features you're using is appreciated. Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Carrie Coy [mailto:c...@ssww.com]
Sent: Thursday, June 28, 2012 8:20 AM
To: solr-user@lucene.apache.org
Subject: WordBreakSolrSpellChecker ignores MinBreakWordLength?

I set MinBreakWordLength = 3 thinking it would prevent WordBreakSolrSpellChecker from suggesting corrections made up of subwords shorter than 3 characters, but I still get suggestions like this:

query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?
Here is the relevant portion of solrconfig.xml:

  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">15</str>
  <str name="spellcheck.maxCollationTries">100</str>
  <str name="spellcheck.alternativeTermCount">4</str>
  <str name="spellcheck.collateParam.mm">100%</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.maxResultsForSuggest">5</str>
  <str name="spellcheck.MinBreakWordLength">3</str>
  <str name="spellcheck.maxChanges">3</str>
Re: WordBreak and default dictionary crash Solr
On 06/15/2012 05:16 PM, Dyer, James wrote:
> I'm pretty sure you've found a bug here. Could you tell me whether you're using a build from Trunk or Solr_4x? Also, do you know the svn revision or the Jenkins build # (or timestamp) you're working from?

I continued to see the problem after updating to the version below (previously I was running a version built on 06-09):

* solr-spec 4.0.0.2012.06.16.10.22.10
* solr-impl 4.0-2012-06-16_10-02-16 1350899 - hudson - 2012-06-16 10:22:10

> Could you try using DirectSolrSpellChecker instead of IndexBasedSpellChecker for your default dictionary?

Switching to DirectSolrSpellChecker appears to fix the problem: a query with two misspellings, one from each dictionary, does not crash Solr and is correctly spell-checked. Thanks!

Carrie Coy
WordBreak and default dictionary crash Solr
Is this a configuration problem or a bug? We use two dictionaries: "default" (spellcheckerFreq) and solr.WordBreakSolrSpellChecker. When a query contains 2 misspellings, one corrected by the default dictionary and the other corrected by the wordbreak dictionary ("strawberryn shortcake"), Solr crashes with the error below. It doesn't matter which dictionary is checked first.

  java.lang.NullPointerException
      at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:566)
      at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1555)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
      at java.lang.Thread.run(Thread.java:662)

Multiple errors corrected by the SAME dictionary (either wordbreak or default) do not crash Solr. Here is an excerpt from our solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">1</int>
    </lst>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">spellcheckerFreq</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      ...
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.count">3</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.onlyMorePopular">false</str>
    </lst>
  </requestHandler>
PorterStemmerTokenizerFactory ?
I've read different suggestions on how to handle cases where synonyms are used and there are multiple versions of the original word that need to point to the same set of synonyms ("responsibility, responsibilities, obligation, duty"). The approach that seems most logical is to configure a SynonymFilterFactory to use a custom TokenizerFactory that stems synonyms by calling out to the PorterStemmer. Does anyone know if a PorterStemmerTokenizerFactory already exists somewhere? Thank you.

Carrie Coy
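The idea the post describes can be sketched outside Solr: run both the synonym-dictionary entries and the incoming term through the same stemmer, so inflected variants land on the same synonym group. A minimal sketch in Python, using a crude suffix-stripper as a toy stand-in for a real Porter stemmer; everything here (crude_stem, synonym_index) is hypothetical illustration, not an existing Solr/Lucene API:

```python
def crude_stem(word):
    # Toy stand-in for a Porter stemmer: strip the first matching suffix.
    for suffix in ("ilities", "ility", "ies", "y", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

raw_synonyms = [["responsibility", "responsibilities", "obligation", "duty"]]

# Key the lookup on stems so any inflected form finds its synonym group,
# the same normalization being applied on both the index and lookup sides.
synonym_index = {}
for group in raw_synonyms:
    for term in group:
        synonym_index.setdefault(crude_stem(term), set()).update(group)

print(sorted(synonym_index[crude_stem("responsibilities")]))
# ['duty', 'obligation', 'responsibilities', 'responsibility']
```

The design point is that the stemming has to happen in exactly two places with the same stemmer — when the synonym list is loaded and when a token is looked up — which is what delegating to a stemming TokenizerFactory inside SynonymFilterFactory would accomplish.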