loading solr from Pig?
Hello All, Is anyone loading Solr from a Pig script / process? I was talking to another group in our company and they have standardized on MongoDB instead of Solr - apparently there is very good support between MongoDB and Pig, allowing users to stream data directly from a Pig process into MongoDB. Does solr have anything like this as well? thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/loading-solr-from-Pig-tp4085933.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: translating a character code to an ordinal?
i will try it. i guess i made a poor assumption that you would not get predictable results when copying a code like mycode to an int field where the desired end result in the int field is, say, 1. i was worried that some sort of ascii conversion or wraparound would happen in the int field. thx for the insight. mark
Re: translating a character code to an ordinal?
i will try it out and let you know -
translating a character code to an ordinal?
hello all,
environment: solr 3.5, centos
problem statement: i have several character codes that i want to translate to ordinal (integer) values (for sorting), while retaining the original code field in the document. i was thinking that i could use a copyField from my code field to my ord field - then employ a pattern replace filter factory during indexing. but won't the copyField fail because the two field types are different?
ps: i also read the wiki about the script transformer and regex transformer (http://wiki.apache.org/solr/DataImportHandler#Transformer) - but was hoping to avoid this if i could. thx mark
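fwiw, copyField copies the raw value before analysis, so the int field would try to parse "ABC" itself and fail. one workaround (a sketch only - the code table and ordinals below are made up) is to translate each code to its ordinal outside solr, before the document is indexed:

```shell
# hypothetical code -> ordinal table; replace with your real codes
code_to_ord() {
  case "$1" in
    ABC) echo 1 ;;
    DEF) echo 2 ;;
    GHI) echo 3 ;;
    *)   echo 0 ;;  # unknown codes sort first
  esac
}

ORD=$(code_to_ord "ABC")
echo "$ORD"   # 1
```

the script would then write the ordinal into the ord field of the document being posted, leaving the original code field untouched.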
Re: translating a character code to an ordinal?
hello jack, thank you for the code ;) what book are you referring to? AFAICT - all of the 4.0 books are pre-order only, and we won't be moving to 4.0 (soon enough). so i take it copyField will not work, eg - i cannot take a code like ABC and copy it to an int field and then use the regex to turn it into an ordinal? thx mark
Re: translating a character code to an ordinal?
thx, please send me a link to the book so i can get/purchase it. thx mark
custom field tutorial
can someone point me to a custom field tutorial? i checked the wiki and this list - but am still a little hazy on how i would do this. essentially - when the user issues a query, i want my class to interrogate a string field (containing several codes - example: boo, baz, bar) and return a single integer field that maps to the code in the string field. example: boo=1, baz=2, bar=3. thx mark
Re: seeing lots of autowarming messages in log during DIH indexing
the DIH is launched via a script - called by a cron-like scheduler. clean, commit and optimize are all true. thx mark

#!/bin/bash
SERVER=$1
PORT=$2
CLEAN=$3
COMMIT=$4
OPTIMIZE=$5
COREPATH=$6
echo SERVER: $SERVER
echo PORT: $PORT
echo CLEAN: $CLEAN
echo COMMIT: $COMMIT
echo OPTIMIZE: $OPTIMIZE
echo COREPATH: $COREPATH
if [ $# != 6 ]; then
  echo USAGE: $0 [SERVER] [PORT] [CLEAN: true/false] [COMMIT: true/false] [OPTIMIZE: true/false] [COREPATH]
  exit 1;
fi
...
seeing lots of autowarming messages in log during DIH indexing
hello, we are tracking down some performance issues with our DIH process. not sure if this is related - but i am seeing tons of the messages below in the logs during re-indexing of the core. what do these messages mean?

2013-05-18 19:37:30,623 INFO [org.apache.solr.update.UpdateHandler] (pool-11-thread-1) end_commit_flush
2013-05-18 19:37:30,623 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=3,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

thx mark
Re: seeing lots of autowarming messages in log during DIH indexing
you mean i would add this switch to my script that kicks off the dataimport? example:

OUTPUT=$(curl -v http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport -F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F optimize=${OPTIMIZE} -F openSearcher=false)

what needs to be done _AFTER_ the DIH finishes (if anything)? eg, does this need to be turned back on after the DIH has finished?
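a sketch of the two-step flow (hostname and core below are placeholders, not the real servers from this thread): commits made with openSearcher=false keep the new documents invisible, so nothing has to be "turned back on", but some later commit that does open a searcher is needed before the imported data becomes searchable.

```shell
SERVER=localhost; PORT=8983; CORE=core1   # placeholders
# during the import: commit without opening a searcher (no autowarm churn)
IMPORT_URL="http://${SERVER}:${PORT}/solrpartscat/${CORE}/dataimport?command=full-import&clean=true&commit=true&openSearcher=false"
# after the import finishes: an explicit commit that opens a searcher
# (openSearcher defaults to true) makes the new documents visible
COMMIT_URL="http://${SERVER}:${PORT}/solrpartscat/${CORE}/update?commit=true&openSearcher=true"
echo "$IMPORT_URL"
echo "$COMMIT_URL"
# curl "$IMPORT_URL"   # ...then poll /dataimport until idle, then:
# curl "$COMMIT_URL"
```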
Re: having trouble storing large text blob fields - returns binary address in search results
hello, your comment made me think - so i decided to double check myself.
- i opened up the schema in squirrel and made sure that the two columns in question were actually of type TEXT in the schema - check
- i went in to the db-config.xml and removed all references to ClobTransformer, removed the cast directives from the fields as well as the clob=true on the two fields - i pasted the db-config.xml below for reference - check
- i restarted jboss - thus restarting solr - check
- i went in to the solr dataimport admin screen and did a clean import - check
after the import was complete - i queried a part that i knew would have one of the clob fields - results are pasted below as well - you can see the binary address in the attributes field.

<?xml version="1.0"?>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="accessoryIndicator">N</str>
    <str name="attributes">[B@5b372219</str>
    <str name="availabilityStatus">PIA</str>
    <arr name="divProductTypeDesc"><str>Refrigerators and Freezers</str></arr>
    <str name="divProductTypeId">0046</str>
    <str name="id">12001892,0046,464</str>
    <str name="itemModelDesc">VALVE, WATER</str>
    <str name="itemModelNo">12001892</str>
    <str name="itemModelNoExactMatchStr">12001892</str>
    <int name="itemType">1</int>
    <str name="otcStockIndicator">Y</str>
    <int name="partCnt">1</int>
    <str name="partCondition">N</str>
    <arr name="plsBrandDesc"><str/></arr>
    <str name="plsBrandId">464</str>
    <str name="productIndicator">N</str>
    <int name="rankNo">13</int>
    <float name="sellingPrice">53.54</float>
    <str name="sourceOrderNo">464 </str>
    <str name="subbedFlag">Y</str>
  </doc>
</result>

<document>
  <entity transformer="TemplateTransformer" name="core1-parts" query="select summ.*, 1 as item_type, 1 as part_cnt, '' as brand, mst.acy_prt_fl, mst.dil_tx, mst.hzd_mtl_typ_cd, mst.otc_cre_stk_fl, mst.prd_fl, mst.prt_cmt_tx, mst.prt_cnd_cd, mst.prt_inc_qt, mst.prt_made_by, mst.sug_qt, att.attr_val, rsr.rsr_val, case when sub.orb_itm_id is null then 'N' else 'Y' end as subbed_flag from prtxtps_prt_summ as summ left outer join prtxtpm_prt_mast as mst on mst.orb_itm_id = summ.orb_itm_id and mst.prd_gro_id = summ.prd_gro_id and mst.spp_id = summ.spp_id left outer join tmpxtpa_prt_attr as att on att.orb_itm_id = summ.orb_itm_id and att.prd_gro_id = summ.prd_gro_id and att.spp_id = summ.spp_id left outer join tmpxtpr_prt_rsr as rsr on rsr.orb_itm_id = summ.orb_itm_id and rsr.prd_gro_id = summ.prd_gro_id and rsr.spp_id = summ.spp_id left outer join tmpxtps_prt_sub as sub on sub.orb_itm_id = summ.orb_itm_id and sub.prd_gro_id = summ.prd_gro_id and sub.spp_id = summ.spp_id where summ.spp_id = '464'">
    <field column="id" name="id" template="${core1-parts.orb_itm_id},${core1-parts.prd_gro_id},${core1-parts.spp_id}"/>
    <field column="orb_itm_id" name="itemModelNo"/>
    <field column="prd_gro_id" name="divProductTypeId"/>
    <field column="ds_tx" name="itemModelDesc"/>
    <field column="spp_id" name="plsBrandId"/>
    <field column="rnk_no" name="rankNo"/>
    <field column="item_type" name="itemType"/>
    <field column="brand" name="plsBrandDesc"/>
    <field column="prd_gro_ds" name="divProductTypeDesc"/>
    <field column="part_cnt" name="partCnt"/>
    <field column="avail" name="availabilityStatus"/>
    <field column="price" name="sellingPrice"/>
    <field column="prt_son" name="sourceOrderNo"/>
    <field column="prt_src_cd" name="sourceIdCode"/>
    <field column="rte_cd" name="sourceRouteCode"/>
    <field column="acy_prt_fl" name="accessoryIndicator"/>
    <field column="dil_tx" name="disclosure"/>
    <field column="hzd_mtl_typ_cd" name="hazardousMaterialCode"/>
    <field column="otc_cre_stk_fl" name="otcStockIndicator"/>
    <field column="prd_fl" name="productIndicator"/>
    <field column="prt_cmt_tx" name="comment"/>
    <field column="prt_cnd_cd" name="partCondition"/>
    <field column="prt_inc_qt" name="qtyIncluded"/>
    <field column="prt_made_by" name="madeBy"/>
    <field column="sug_qt" name="suggestedQty"/>
    <field column="attr_val" name="attributes"/>
    <field column="rsr_val" name="restrictions"/>
    <field column="subbed_flag"
Re: having trouble storing large text blob fields - returns binary address in search results
Hello Gora, thank you for the reply - i did finally get this to work. i had to cast the columns in the DIH query to clobs, like this:

cast(att.attr_val AS clob) as attr_val,
cast(rsr.rsr_val AS clob) as rsr_val,

once this was done, the ClobTransformer worked. to my knowledge - this particular use case and the need for the cast is not documented anywhere. i checked the solr wiki and searched the threads on this forum for things like clobtransformer, informix and blob without luck. i also did quite a few google searches as well but no luck (but maybe i missed something ;) maybe this is just some edge case. i also realize that informix is not that common. i have a question in to the solr developers list - just so i can better understand what actually is happening, why the cast was necessary, and the limitations / parameters of the ClobTransformer. the thread on the developers list is located here: http://lucene.472066.n3.nabble.com/have-developer-question-about-ClobTransformer-and-DIH-td4064256.html thx mark
having trouble storing large text blob fields - returns binary address in search results
hello
environment: solr 3.5
can someone help me with the correct configuration for some large text blob fields? we have two fields in informix tables that are of type text. when we do a search, the results for these fields come back looking like this:

<str name="attributes">[B@17c232ee</str>

i have tried setting them up as clob fields - but this is not working (see details below). i have also tried treating them as plain string fields (removing the references to clob in the DIH) - but this does not work either.

DIH configuration:
<entity transformer="TemplateTransformer,ClobTransformer" name="core1-parts" query="select summ.*, 1 as item_type, 1 as part_cnt, '' as brand, ...">
  <field column="attr_val" name="attributes" clob="true"/>
  <field column="rsr_val" name="restrictions" clob="true"/>

Schema.xml:
<field name="attributes" type="string" indexed="false" stored="true"/>
<field name="restrictions" type="string" indexed="false" stored="true"/>

thx mark
Re: why does * affect case sensitivity of query results
hello erik, thank you for the info - yes - i did notice ;) one more reason for us to upgrade from 3.5. thx mark
why does * affect case sensitivity of query results
hello,
environment: solr 3.5
problem statement: when a query has * appended, it becomes case sensitive.
assumption: the query should NOT be case sensitive
actual value in database at time of index: 4387828BULK
here is a snapshot of what works and does not work.
what works:
itemModelNoExactMatchStr:4387828bULk (and any variation of upper and lower case letters for *bulk*)
itemModelNoExactMatchStr:4387828bu*
itemModelNoExactMatchStr:4387828bul*
itemModelNoExactMatchStr:4387828bulk*
what does NOT work:
itemModelNoExactMatchStr:4387828BU*
itemModelNoExactMatchStr:4387828BUL*
itemModelNoExactMatchStr:4387828BULK*
below are the specifics of my field and fieldType:

<field name="itemModelNoExactMatchStr" type="text_exact" indexed="true" stored="true"/>
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

thx mark
Re: why does * affect case sensitivity of query results
was looking in Smiley's book on pages 129 and 130. from the book: "No text analysis is performed on the search word containing the wildcard, not even lowercasing. So if you want to find a word starting with Sma, then sma* is required instead of Sma*, assuming the index side of the field's type includes lowercasing. This shortcoming is tracked on SOLR-219. Moreover, if the field that you want to use the wildcard query on is stemmed in the analysis, then smashing* would not find the original text Smashing because the stemming process transforms this to smash. Consequently, don't stem." thx mark
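so the practical workaround on 3.5 is to lowercase the term client-side before appending the wildcard - a quick sketch:

```shell
# wildcard terms skip analysis (SOLR-219), so lowercase them client-side
# to match what LowerCaseFilterFactory produced at index time
RAW="4387828BULK"
TERM=$(printf '%s' "$RAW" | tr '[:upper:]' '[:lower:]')
QUERY="itemModelNoExactMatchStr:${TERM}*"
echo "$QUERY"   # itemModelNoExactMatchStr:4387828bulk*
```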
Re: why does * affect case sensitivity of query results
here is the jira link: https://issues.apache.org/jira/browse/SOLR-219
having trouble searching on EdgeNGramFilterFactory field with a length minGramSize
hello, i am trying to debug the following query in the analyzer:

+itemModelNoExactMatchStr:JVM1640CJ01 +plsBrandId:0432 +plsBrandDesc:ge

the query is going against a field (plsBrandDesc) that is being indexed with solr.EdgeNGramFilterFactory and a minGramSize of 3. i have included the complete field definition below. after doing some experimenting in the analyzer, i believe the query may be failing because the queried value of ge is only two (2) characters long - and the minimum gram size is three (3) characters. for example - this query does work in the analyzer. it has a plsBrandDesc of three characters and does return exactly one document:

+itemModelNoExactMatchStr:404 +plsBrandId:0431 +plsBrandDesc:general

i have tried overriding this behavior by using mm=2, but this does not seem to work:

+itemModelNoExactMatchStr:JVM1640CJ01 +plsBrandId:0432 +plsBrandDesc:ge mm=2

am i misunderstanding how mm works - or am i getting the syntax for mm incorrect? thx mark

<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
need general advice on how others version and manage core deployments over time
hello everyone, i know this is a general topic - but would really appreciate info from others that are doing this now.
- how are others managing this so that users are impacted the least?
- how are others handling the scenario where users don't want to migrate forward?
thx mark
Re: question about syntax for multiple terms in filter query
hello jack, yes - i will always be using the two constraints at the same time. thank you again for the info. thx mark
Re: question about syntax for multiple terms in filter query
jack, did you mean function query or filter query? i was going to do this in my request handler for parts:

<str name="fq">+itemType:1 +sellingPrice:[1 TO *]</str>
having trouble escaping a character string
hello all, i am searching on this field type:

<field name="itemModelNoExactMatchStr" type="text_exact" indexed="true" stored="true"/>
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

for this string: 30326R-26 TILLER
when i use the analyzer and issue the query - it indicates success (please see attached screen shot), but when i issue the search url - it does not return a document:

http://bogus/solrpartscat/core2/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%2230326R-26%22%20TILLER%22

can someone tell me what i am missing? thx mark
http://lucene.472066.n3.nabble.com/file/n4046796/temp1.bmp
Re: having trouble escaping a character string
attempting to upload the screenshot bmp file. the embedded image is difficult to make out. temp1.bmp http://lucene.472066.n3.nabble.com/file/n4046798/temp1.bmp
Re: having trouble escaping a character string
oh - now i see what i was doing wrong. i kept trying to use the hex code of %22 as a replacement for the double quote - but that was not working - thank you jack, mark
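for anyone finding this thread later: jack's exact fix is not quoted in the archive, but note the original url had three %22 quotes, so the phrase was unbalanced. a sketch of a balanced, url-encoded phrase query (host and core names taken from the earlier post):

```shell
# one opening and one closing %22 around the whole phrase, %20 for the space
PHRASE='30326R-26 TILLER'
ENCODED=$(printf '%s' "$PHRASE" | sed 's/ /%20/g')
Q="itemModelNoExactMatchStr:%22${ENCODED}%22"
echo "$Q"   # itemModelNoExactMatchStr:%2230326R-26%20TILLER%22
# curl "http://bogus/solrpartscat/core2/select?qt=modelItemNoSearch&q=${Q}"
```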
question about syntax for multiple terms in filter query
hello everyone, i have a question on the filter query syntax for multiple terms, after reading this: http://wiki.apache.org/solr/CommonQueryParameters#fq
i see from the above that two (2) syntax constructs are supported:

fq=term1:foo&fq=term2:bar

and

fq=+term1:foo +term2:bar

is there a reason why i would want to use one syntax over the other? does the first syntax support the AND operand as well as the &&? thx mark
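one practical difference worth knowing (verify against your version): solr caches each fq parameter as its own filterCache entry, so two separate fq params can be reused independently across queries, while a single fq with two clauses is cached as one combined entry. both forms, sketched with a placeholder host/core:

```shell
BASE="http://localhost:8983/solr/core1/select"   # placeholder host/core
# two fq params: each filter is cached and reused independently
URL1="${BASE}?q=*:*&fq=term1:foo&fq=term2:bar"
# one fq with two mandatory clauses (+ is %2B, space is %20 in a url):
# cached as a single combined filter
URL2="${BASE}?q=*:*&fq=%2Bterm1:foo%20%2Bterm2:bar"
echo "$URL1"
echo "$URL2"
```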
Re: question about syntax for multiple terms in filter query
otis and jack - thank you VERY much for the feedback. jack - "use a single fq containing two mandatory clauses if those clauses appear together often" - this is the use case i have to account for. eg, right now i have this in my request handler:

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
...
<str name="fq">itemType:1</str>
...
</requestHandler>

which says - i only want parts. but i need to augment the filter so only parts that have a price >= 1.0 are returned from the request handler. so i believe i need to have this in the RH:

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
...
<str name="fq">+itemType:1 +sellingPrice:[1 TO *]</str>
...
</requestHandler>

thx mark
searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses
hello
environment: solr 3.5
problem statement: i have a requirement to search for part numbers that start with a dash / hyphen. example q term: -0004A-0436
example query:

http://some_url:some_port/some_core/select?facet=false&sort=score+desc%2C+rankNo+asc%2C+partCnt+desc&start=0&q=-0004A-0436+itemType%3A1&wt=xml&qt=itemModelNoProductTypeBrandSearch&rows=4

what is happening: the query is returning a huge result set. in reality there is one (1) and only one record in the database with this part number. i believe this is happening because the dash is being interpreted by the query parser as a prohibited clause, and the effective result is: give me everything that does NOT have this part number. how is this handled so that the search is conducted for the actual part: -0004A-0436
thx mark

more information:
request handler in solrconfig.xml:

<requestHandler name="itemModelNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9 divProductTypeDesc^.8 plsBrandDesc^.5</str>
    <str name="q.alt">*:*</str>
    <str name="sort">score desc, rankNo desc, partCnt desc</str>
    <str name="facet">true</str>
    <str name="facet.field">itemModelDescFacet</str>
    <str name="facet.field">plsBrandDescFacet</str>
    <str name="facet.field">divProductTypeIdFacet</str>
  </lst>
  <lst name="appends"/>
  <lst name="invariants"/>
</requestHandler>

field information from schema.xml (if helpful):

<field name="itemModelNoExactMatchStr" type="text_general_trim" indexed="true" stored="true"/>
<field name="itemModelNo" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>
<field name="divProductTypeDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>
<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_general_trim" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
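one way to handle this (a sketch - worth testing against your parser, since edismax handles operators slightly differently by version) is to escape the leading hyphen with a backslash so the query parser treats it as part of the term rather than as a prohibited-clause operator:

```shell
# %5C is the url-encoded backslash; the escaped term reads \-0004A-0436
PART="-0004A-0436"
ESCAPED="%5C${PART}"
Q="itemModelNoExactMatchStr:${ESCAPED}"
echo "$Q"   # itemModelNoExactMatchStr:%5C-0004A-0436
```

quoting the whole term ("-0004A-0436") is another option with the same intent.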
performing a boolean query (OR) with a large number of terms
hello,
environment: solr 3.5
i have a requirement to perform a boolean query (like the example below) with a large number of terms. the number of terms could be 15 or possibly larger. after looking over several threads and the smiley book - i think i just have to include the parens and string all of the terms together with OR's. i just want to make sure that i am not missing anything. is there a better or more efficient way of doing this?

http://server:port/dir/core1/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%285-100-NGRT7%20OR%205-10-10MS7%20OR%20404%29&rows=30&debugQuery=on&rows=40

thx mark
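that is essentially it - for completeness, a sketch that builds the parenthesized OR string from a term list (terms taken from the url above):

```shell
# join a list of part numbers into field:(a OR b OR c)
TERMS="5-100-NGRT7 5-10-10MS7 404"
Q=""
for T in $TERMS; do
  Q="${Q:+$Q OR }$T"
done
Q="itemModelNoExactMatchStr:($Q)"
echo "$Q"   # itemModelNoExactMatchStr:(5-100-NGRT7 OR 5-10-10MS7 OR 404)
```

remember to url-encode the result before putting it on the query string (parens become %28/%29, spaces %20, as in the url above).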
Re: is it possible to save the search query?
Hello, i think you are asking two questions here - i'll see if i can give you some simple examples for both.

1) how can i pull data from a solr search result set and compare it to another for analysis? one way might be to drive the results into files and then use xslt to extract relevant information. here is an example xslt file that pulls specific fields from a result:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="response/result">
    <xsl:for-each select="doc">
      <xsl:text>[</xsl:text>
      <xsl:value-of select="str[@name='itemNo']"/>
      <xsl:text>]</xsl:text>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="float[@name='score']"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="int[@name='rankNo']"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="int[@name='partCnt']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

2) how can i embed data into a solr query, making it easier to do analysis in the log files? here is a simple example that bookmarks or brackets transactions in the logs - used only during stress testing:

#!/bin/bash
TYPE=$1
TAG=$2
if [ $TYPE == 1 ]
then
  # beginning
  curl -v "http://something:1234/boo/core1/select/?q=partImageURL%3A${TAG}-test-begin&version=2.2&start=0&rows=777&indent=on"
else
  # end
  curl -v "http://something:1234/boo/core1/select/?q=partImageURL%3A${TAG}-test-end&version=2.2&start=0&rows=777&indent=on"
fi

hopefully this will give you something to start with. thx mark
Re: How do I best detect when my DIH load is done?
Hello Andy, i had a similar question on this some time ago.
http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html#a3987123
http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-td3801327.html#a3803658
i ended up writing my own shell-based polling application that runs from our *nx batch server that handles all of our Control-M work. +1 on the idea of making this a more formal part of the API. let me know if you want concrete example code.
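for the record, the core of that polling approach is just scraping the status element out of the /dataimport response - a trimmed sketch (the xml sample below is made up, and DIH response layout varies a bit by version):

```shell
# pull the <str name="status"> value out of a dataimport response
dih_status() {
  printf '%s' "$1" | sed -n 's/.*<str name="status">\([^<]*\)<\/str>.*/\1/p'
}

SAMPLE='<response><str name="status">busy</str></response>'
dih_status "$SAMPLE"   # busy
# the loop itself would be:
# while [ "$(dih_status "$(curl -s "$URL")")" = "busy" ]; do sleep 30; done
```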
RE: How do I best detect when my DIH load is done?
James, was it you (i cannot remember) who replied to one of my queries on this subject and mentioned that consideration was being given to cleaning up the response codes to remove ambiguity? -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: large text blobs in string field
Gora, currently our core does use multi-valued fields. however, the existing multi-valued fields in the schema will only ever hold 3 - 10 values. we are thinking of using the text blob approach primarily because of the large number of possible values in this field. if we were to use a multi-valued field, it is likely that the MV field would have 200+ values and in some edge cases 400+ values. are you saying that the MV field approach to represent the data (given the scale previously indicated) is still the best design solution? -- View this message in context: http://lucene.472066.n3.nabble.com/large-text-blobs-in-string-field-tp4017882p4018315.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: large text blobs in string field
Erick, thanks for the insight. FWIW and to add to the context of this discussion, if we do decide to add the previously mentioned content as a multivalued field, we would likely use a DIH hooked to our database schema (this is currently how we add ALL content to our core) and within the DIH, use a sub-entity to pull the many rows for each parent row. thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/large-text-blobs-in-string-field-tp4017882p4018355.html Sent from the Solr - User mailing list archive at Nabble.com.
large text blobs in string field
hello, environment: solr 3.5. i would like to know if anyone is using the technique of placing large text blobs into a non-indexed string field, and if so - are there any good/bad aspects to consider? we are thinking of doing this to represent a 1:M relationship, with the Many side being represented as a string in the schema (probably comprised of either xml or json objects). we are looking at the classic part:model scenario, where the client would look up a part and the document would contain a string field with potentially 200+ model numbers. edge cases for this could be 400+ model numbers. thx -- View this message in context: http://lucene.472066.n3.nabble.com/large-text-blobs-in-string-field-tp4017882.html Sent from the Solr - User mailing list archive at Nabble.com.
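a minimal sketch of the serialization step being considered here - the field name modelsBlob and the sample model numbers are made up purely for illustration, not taken from our schema:

```shell
# Sketch: collapse the Many side of the part:model relation into a JSON
# array held in one stored, non-indexed string field ("modelsBlob" is a
# placeholder name). The model list would come from the child-table rows.
models='9030 90302 9030P'
blob='['
for m in $models; do
    blob="${blob}\"${m}\","      # append each model as a quoted element
done
blob="${blob%,}]"                # strip trailing comma, close the array
echo "$blob"
```

the client would deserialize this string after retrieving the document; since the field is stored but not indexed, the blob is never tokenized or searched.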
Re: need help with exact match search
hello jack, that was it! thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-exact-match-search-tp4014832p4015103.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help with exact match search
hello jack, thank you very much for the reply - i will re-test and let you know. really appreciate it ;) thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-exact-match-search-tp4014832p4014848.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help understanding an issue with scoring
Chris, Jack, thank you for the detailed replies and help ;) -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4003782.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help understanding an issue with scoring
update: as an experiment - i changed the query to a wildcard (9030*) instead of an explicit value (9030) example:

QUERY="http://$SERVER.intra.searshc.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030*&rows=2000&debugQuery=on&fl=*,score"

this resulted in a results list that appears much more rational from a sort-order perspective - however - the wildcard query is not acceptable from a performance standpoint. any input or illumination would be appreciated ;) thank you

itemNo, score, rankNo, partCnt
[9030],1.0,10353,1
[90302 ],1.0,6849,1
[9030P ],1.0,444,1
[903093 ],1.0,51,1
[9030430 ],1.0,47,1
[9030],1.0,37,1
[903057-9010 ],1.0,26,1
[903061-9010 ],1.0,20,1
[903046-9010 ],1.0,18,1
[903056-9010 ],1.0,14,1
[903095 ],1.0,14,1
[90303-MR1-000 ],1.0,14,1
[903097-9050 ],1.0,12,1
[903046-9011 ],1.0,12,1
[903097-9010 ],1.0,11,1
[903097-9040 ],1.0,11,1
[903063-9100 ],1.0,6,1
[903066-9011 ],1.0,6,1
[903098 ],1.0,3,1
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help understanding an issue with scoring
looks like the original complete list of the results did not get attached to this thread here is a snippet of the list. what i am trying to demonstrate, is the difference in scoring and ultimately, sorting - and the breadth of documents (a few hundred) between the two documents of interest (9030 and 90302) thank you, itemNo, score, rankNo, partCnt [9030],12.014701,10353,1 [9030],12.014701,37,1 [9030],12.014701,1,1 [9030 ],12.014701,0,167 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [9030],12.014701,0,1 [PC-9030],7.509188,0,169 [58-9030 ],7.509188,0,1 [9030-1R ],7.509188,0,1 [903028-9030 ],7.509188,0,1 [903139-9030 ],7.509188,0,1 [903091-9030 ],7.509188,0,1 [903099-9030 ],7.509188,0,1 [903153-9030 ],7.509188,0,1 [031-9030],7.509188,0,1 [308-9030],7.509188,0,1 [9030-6010 ],7.509188,0,1 [9030-6010 ],7.509188,0,1 [9030-6006 ],7.509188,0,1 [9030-6008 ],7.509188,0,1 [9030-6008 ],7.509188,0,1 [9030-6001 ],7.509188,0,1 [9030-6003 ],7.509188,0,1 [9030-6006 ],7.509188,0,1 [208568-9030 ],7.509188,0,1 [79-9030 ],7.509188,0,1 [33-9030 ],7.509188,0,1 [M-9030 ],7.509188,0,1 ... a few hundred more ... [LGQ9030PQ1 ],0.41475832,0,150 [LEQ9030PQ0 ],0.41475832,0,124 [LEQ9030PQ1 ],0.41475832,0,123 [CWE9030BCE ],0.41475832,0,115 [PJDS9030Z ],0.29327843,0,1 [8A-CT9-030-010 ],0.29327843,0,1 [RDT9030A],0.29327843,0,1 [PJDG9030Z ],0.29327843,0,1 [90302 ],0.20737916,6849,1 ~ -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002922.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Holy cow do I love 4.0's admin screen
Andy, we are not running solr 4.0 here in production. can you elaborate on your comment related to your polling script written in ruby and how the new data import status screen makes your polling app obsolete? i wrote my own polling app (in shell) to work around the very same issues: http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html thx for the post -- View this message in context: http://lucene.472066.n3.nabble.com/Holy-cow-do-I-love-4-0-s-admin-screen-tp4002912p4002936.html Sent from the Solr - User mailing list archive at Nabble.com.
using tie parameter of edismax to raise a score (disjunction max query)?
Hello all, this more specific question is related to my earlier post at: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-td4002897.html i am reading here about the tie parameter: http://wiki.apache.org/solr/ExtendedDisMax?highlight=%28edismax%29#tie_.28Tie_breaker.29 *can i use the edismax, tie= parameter, to raise the following score?* my goal is to raise the total score of this document (see score snippet below) to 9.11329. to do this - would i use tie=0.0 to make a pure disjunction max query -- only the maximum scoring sub query contributes to the final score? str name=90302 ,0046,046 *0.20737723* = (MATCH) max of: 0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of: 0.022755474 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 9.11329 = idf(docFreq=2565, maxDocs=8566704) 0.0027743944 = queryNorm *9.11329* = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 9.11329 = idf(docFreq=2565, maxDocs=8566704) 1.0 = fieldNorm(field=itemNo, doc=1796597) /str thank you -- View this message in context: http://lucene.472066.n3.nabble.com/using-tie-parameter-of-edismax-to-raise-a-score-disjunction-max-query-tp4002935.html Sent from the Solr - User mailing list archive at Nabble.com.
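as i read the wiki page linked above, the tie breaker combines the per-field subscores as final = max(subscores) + tie * (sum of the other subscores), so tie=0.0 keeps only the max subscore and never raises it. a quick numeric sketch of that formula (illustrative numbers, not taken from the explain output above):

```shell
# Sketch of the dismax tie-breaker formula from the wiki:
#   final = max(subscores) + tie * sum(the other subscores)
# MAX and OTHER are made-up subscores for two fields.
MAX=2.0; OTHER=0.5
for TIE in 0.0 0.5 1.0; do
    awk -v m="$MAX" -v o="$OTHER" -v t="$TIE" \
        'BEGIN { printf "tie=%s -> %.2f\n", t, m + t * o }'
done
```

with tie=0.0 the result is just the max (2.00); raising tie only adds the non-max subscores on top of it.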
need help understanding an issue with scoring
hello, i am trying to understand the debug output from a query - specifically, how the scores for two (2) documents are derived and why they are so far apart. the user is entering 9030 for the search. the search is rightfully returning the top document - however, the question is why the document with id 90302 is so far down on the list. i have attached a text file i generated with xslt, pulling the document information. the text file has the itemNo, the rankNo and the partCnt. the sort order of the response handler is: <str name="sort">score desc, rankNo desc, partCnt desc</str> if you look at the text file - you will see that 90302 is 174th on the list! 90302 has a rankNo of 6849 - and i would think that would drive it much higher on the list and therefore much closer to 9030. what is happening from a business perspective is this: 9030 is one of our top selling parts, as is 90302. they need to be closer together in the results instead of separated by 170+ documents that have a rankNo of 0. i have also copied (CnP) the response handler that is being used - below. can someone help me understand the scoring so i can correct this? 
this is the scoring for the two documents: str name=9030,0046,046 12.014634 = (MATCH) max of: 0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 2308681), product of: 0.022755474 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 9.11329 = idf(docFreq=2565, maxDocs=8566704) 0.0027743944 = queryNorm 9.11329 = (MATCH) fieldWeight(itemNo:9030 in 2308681), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 9.11329 = idf(docFreq=2565, maxDocs=8566704) 1.0 = fieldNorm(field=itemNo, doc=2308681) 12.014634 = (MATCH) fieldWeight(itemNoExactMatchStr:9030 in 2308681), product of: 1.0 = tf(termFreq(itemNoExactMatchStr:9030)=1) 12.014634 = idf(docFreq=140, maxDocs=8566704) 1.0 = fieldNorm(field=itemNoExactMatchStr, doc=2308681) /str str name=90302 ,0046,046 0.20737723 = (MATCH) max of: 0.20737723 = (MATCH) weight(itemNo:9030^0.9 in 1796597), product of: 0.022755474 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 9.11329 = idf(docFreq=2565, maxDocs=8566704) 0.0027743944 = queryNorm 9.11329 = (MATCH) fieldWeight(itemNo:9030 in 1796597), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 9.11329 = idf(docFreq=2565, maxDocs=8566704) 1.0 = fieldNorm(field=itemNo, doc=1796597) /str ~ requestHandler name=itemNoProductTypeBrandSearch class=solr.SearchHandler default=false lst name=defaults str name=defTypeedismax/str str name=echoParamsall/str int name=rows10/int str name=qfitemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5/str str name=q.alt*:*/str str name=sortscore desc, rankNo desc, partCnt desc/str str name=facettrue/str str name=facet.fielditemDescFacet/str str name=facet.fieldbrandFacet/str str name=facet.fielddivProductTypeIdFacet/str /lst lst name=appends /lst lst name=invariants /lst /requestHandler thank you for any help -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help understanding an issue with scoring
hello, this is the query i am using:

cat goquery.sh
#!/bin/bash
SERVER=$1
PORT=$2
QUERY="http://$SERVER.blah.blah.com:${PORT}/solrpartscat/core1/select?qt=itemNoProductTypeBrandSearch&q=9030&rows=2000&debugQuery=on&fl=*,score"
curl -v "$QUERY"

-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-an-issue-with-scoring-tp4002897p4002969.html Sent from the Solr - User mailing list archive at Nabble.com.
need help understanding times used in dataimport?command=status
hello all, i noticed something in the log from one of our scripts that periodically polls the status of a data import. can someone help me understand where / how the times for "Full Dump Started" are derived? here it shows the dataimport dump starting at 1:32:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:0:8.182</str>
    <str name="Total Requests made to DataSource">2</str>
    <str name="Total Rows Fetched">18834</str>
    <str name="Total Documents Processed">18818</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-07-11 01:32:18</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

however - here it shows the dump starting at 2:17:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:45:8.373</str>
    <str name="Total Requests made to DataSource">3</str>
    <str name="Total Rows Fetched">8138060</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-07-11 02:17:11</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">3</str>
    <str name="Total Rows Fetched">8528239</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-07-11 02:17:11</str>
    <str name="">Indexing completed. Added/Updated: 8464051 documents. Deleted 0 documents.</str>
    <str name="Committed">2012-07-11 02:21:17</str>
    <str name="Optimized">2012-07-11 02:21:17</str>
    <str name="Total Documents Processed">8464051</str>
    <str name="Time taken ">0:48:58.712</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-understanding-times-used-in-dataimport-command-status-tp3994437.html Sent from the Solr - User mailing list archive at Nabble.com.
maxNumberOfBackups does not cleanup - jira 3361
environment: solr 3.5. hello all, i have a question on this jira - https://issues.apache.org/jira/browse/SOLR-3361 the jira states that, with backupAfter=commit, the backups do not get cleaned up. however - we are noticing this same issue in our environment when using optimize. can someone confirm that this bug applies to optimize as well? thank you example: <str name="backupAfter">optimize</str> -- View this message in context: http://lucene.472066.n3.nabble.com/maxNumberOfBackups-does-not-cleanup-jira-3361-tp3994156.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: maxNumberOfBackups does not cleanup - jira 3361
thank you James - that is good to know. for the short-term we'll just use cron and kill backup directories that are older than x. for the long-term, we'll just migrate to 4.0 thanks again -- View this message in context: http://lucene.472066.n3.nabble.com/maxNumberOfBackups-does-not-cleanup-jira-3361-tp3994156p3994191.html Sent from the Solr - User mailing list archive at Nabble.com.
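for reference, a sketch of that cron-style cleanup - demonstrated against a temp directory here; the real backup path, directory naming and age threshold are placeholders you would adjust:

```shell
# Sketch: find (and in cron, delete) backup directories older than 7 days.
# A temp dir with an artificially old "snapshot.*" directory stands in for
# the real replication backup path.
tmp=$(mktemp -d)
mkdir "$tmp/snapshot.20120101000000"
touch -t 200901010000 "$tmp/snapshot.20120101000000"   # make it look old
old=$(find "$tmp" -maxdepth 1 -type d -name 'snapshot.*' -mtime +7)
echo "$old"
# the real cron entry would append: -exec rm -rf {} +
rm -rf "$tmp"
```

running the list-only form first (without -exec rm) is a cheap way to verify the match pattern before letting cron delete anything.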
avgTimePerRequest JMX M-Bean displays with NaN instead of 0 - when no activity
hello all, environment: solr 3.5, jboss, wily. we have been setting up jmx monitoring for our solr installation. while running tests - i noticed that of the 6 JMX M-Beans (avgRequestsPerSecond, avgTimePerRequest, errors, requests, timeouts, totalTime) ... the avgTimePerRequest M-Bean was producing NaN when there was no search activity. all of the other M-Beans displayed a 0 (zero) when there was no search activity. we were able to compensate for this issue with custom scripting in wily on our side. can someone help me understand this inconsistency? is this just WAD (works as designed)? thanks for any help or insight -- View this message in context: http://lucene.472066.n3.nabble.com/avgTimePerRequest-JMX-M-Bean-displays-with-NaN-instead-of-0-when-no-activity-tp3991962.html Sent from the Solr - User mailing list archive at Nabble.com.
question about jmx value (avgRequestsPerSecond) output from solr
hello all, environment: centOS, solr 3.5, jboss 5.1 i have been using wily (a monitoring tool) to instrument our solr instances in stress. can someone help me to understand something about the jmx values being output from solr? please note - i am new to JMX. problem / issue statement: for a given request handler (partItemDescSearch), i see output from the jmx MBean for the metric avgRequestsPerSecond - AFTER my test harness has completed and there is NO request activity to this request handler - taking place (verified in solr log files). example scenario during testing: during a test run - the test harness will fire requests at request handler (partItemDescSearch) and all numbers look fine. then after the test harness is done - the metric avgRequestsPerSecond does not immediately drop to 0. instead - it appears as if JMX is somehow averaging this metric and gradually trending it downward toward 0. continual checking of this metric (in the JMX tree - see screen shot) shows the number trending downward instead of a hard stop at 0. is this behavior - just the way jmx works? thanks mark http://lucene.472066.n3.nabble.com/file/n3991616/test1.bmp -- View this message in context: http://lucene.472066.n3.nabble.com/question-about-jmx-value-avgRequestsPerSecond-output-from-solr-tp3991616.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: seeing errors during replication process on slave boxes - read past EOF
hello, i have shell scripts that handle all of the operational tasks. example: curl -v http://${SERVER}.bogus.com:${PORT}/somecore/dataimport -F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F optimize=${OPTIMIZE} -- View this message in context: http://lucene.472066.n3.nabble.com/seeing-errors-during-replication-process-on-slave-boxes-read-past-EOF-tp3987489p3987617.html Sent from the Solr - User mailing list archive at Nabble.com.
seeing errors during replication process on slave boxes - read past EOF
hello all, environment: solr 3.5 1 - master 2 - slave slaves are set to poll master every 10 minutes. i have had replication running on one master and two slaves - for a few weeks now. these boxes are not production boxes - just QA/test boxes. right after i started a re-index on the master - i started to see the following errors on both of the slave boxes. in previous test runs - i have not noticed any errors. can someone help me understand what is causing these errors? thank you, 2012-06-03 19:30:23,104 INFO [org.apache.solr.update.UpdateHandler] (pool-16-thread-1) start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false) 2012-06-03 19:30:23,164 SEVERE [org.apache.solr.handler.ReplicationHandler] (pool-16-thread-1) SnapPull failed org.apache.solr.common.SolrException: Index fetch failed : at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:268) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF: 
MMapIndexInput(path=/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418) at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:470) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:321) ... 11 more Caused by: java.io.IOException: read past EOF: MMapIndexInput(path=/appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdx) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:279) at org.apache.lucene.store.DataInput.readInt(DataInput.java:84) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readInt(MMapDirectory.java:315) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:138) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:212) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:235) at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:34) at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:506) at org.apache.lucene.index.DirectoryReader.access$000(DirectoryReader.java:45) at org.apache.lucene.index.DirectoryReader$2.doBody(DirectoryReader.java:498) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754) at org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:493) at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450) at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396) at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520) at org.apache.lucene.index.IndexReader.reopen(IndexReader.java:697) at 
org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:414) at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:425) at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:35) at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:501) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1083) ... 14 more 2012-06-03 19:30:23,197 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kiq.tis 2012-06-03 19:30:23,198 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kit.tis 2012-06-03 19:30:23,198 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Skipping download for /appl/solr/stress/partcatalog/index/core1/index.20120514101522/_5kgm.fdt 2012-06-03 19:30:23,198 INFO
eliminate adminPath tag from solr.xml file?
hello all, referring to: http://wiki.apache.org/solr/CoreAdmin#Core_Administration if you wanted to eliminate administration of the cores from the web interface, could you either eliminate solr.xml entirely, or remove the adminPath="/admin/cores" attribute from the <cores> element in the solr.xml file? thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/eliminate-adminPath-tag-from-solr-xml-file-tp3987262.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible status codes from solr during a (DIH) data import process
thank you ALL for the great feedback - very much appreciated! -- View this message in context: http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110p3987263.html Sent from the Solr - User mailing list archive at Nabble.com.
possible status codes from solr during a (DIH) data import process
hello all, i have been asked to write a small polling script (bash) to periodically check the status of an import on our Master. our import times are small, but there are business reasons why we want to know the status of an import after a specified amount of time. i need to perform certain actions based on the status of the import, and therefore need to quantify which tags to check and their appropriate states. i am using the status command from the DataImportHandler HTTP API to get the status of the import:

OUTPUT=$(curl -v http://${SERVER}:${PORT}/somecore/dataimport?command=status)

can someone tell me if i have these rules correct?

1) during an import - the status tag will have a busy state, example: <str name="status">busy</str>

2) at the completion of an import (regardless of failure or success) the status tag will have an idle state, example: <str name="status">idle</str>

3) to determine if an import failed or succeeded - you must interrogate the tags under <lst name="statusMessages"> and specifically look for:

success: <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str>
failure: <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str>

thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html Sent from the Solr - User mailing list archive at Nabble.com.
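fwiw, a minimal sketch of those three rules in script form - a canned status response stands in for the live curl output here, and the completion-message text is the one quoted in this thread:

```shell
# Sketch of the DIH polling rules: (1/2) status is busy while running and
# idle when finished; (3) on idle, inspect statusMessages for the
# completion message. SAMPLE stands in for the live
# "curl .../dataimport?command=status" output.
SAMPLE='<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str>
  </lst>
</response>'

# extract the status value (busy or idle)
STATE=$(echo "$SAMPLE" | grep -o '<str name="status">[^<]*' | sed 's/.*>//')

# on idle, look for the completion message under statusMessages
if [ "$STATE" = "idle" ] && echo "$SAMPLE" | grep -q 'Indexing completed'; then
    RESULT=success
else
    RESULT=unknown
fi
echo "$STATE $RESULT"
```

in a real poller the SAMPLE assignment becomes the curl call, wrapped in a loop with a sleep between iterations.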
need to verify my understanding of default value of mm (minimum match) for edismax
environment: solr 3.5, default operator is OR. i want to make sure i understand how the mm param (minimum match) works for the edismax parser: http://wiki.apache.org/solr/ExtendedDisMax?highlight=%28dismax%29#mm_.28Minimum_.27Should.27_Match.29 it looks like the rule is that 100% of the terms must match across the fields, unless i override this with the mm=x param - do i have this right? what i am seeing is that a query that matches on: q=singer sewing 9010 will fail if it is changed to: q=singer sewing machine 9010 for the second query - if i add mm=3 - then it comes back with results thank you -- View this message in context: http://lucene.472066.n3.nabble.com/need-to-verify-my-understanding-of-default-value-of-mm-minimum-match-for-edismax-tp3985936.html Sent from the Solr - User mailing list archive at Nabble.com.
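a sketch of the two requests side by side - the server, core and handler names are placeholders, and the URLs are only built and echoed here rather than sent:

```shell
# Sketch: the same 4-term edismax query with and without an explicit mm.
# Without mm, all terms must match; mm=3 relaxes this to any 3 of the 4.
BASE="http://localhost:8983/solr/core1/select"
Q="singer+sewing+machine+9010"
STRICT="${BASE}?defType=edismax&q=${Q}&debugQuery=on"
RELAXED="${BASE}?defType=edismax&q=${Q}&mm=3&debugQuery=on"
echo "$STRICT"
echo "$RELAXED"
```

adding debugQuery=on to both and diffing the parsedquery sections is a quick way to see how mm changes the generated clause counts.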
index-time boosting using DIH
hello all, can i use the technique described on the wiki at: http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts if i am populating my core using a DIH? looking at the posts on this subject and the wiki docs leads me to believe that you can only use this when you are importing data via the xml interface - is that right? thank you -- View this message in context: http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: index-time boosting using DIH
thanks for the reply. so to use the $docBoost pseudo-field name, would you do something like below - and would this technique likely increase my total index time?

<dataConfig>
  <dataSource ... />
  <document name="mydoc">
    <entity name="myentity" transformer="script:BoostDoc" query="select ...">
      <field column="SOME_COLUMN" name="someField" />
      ...

-- View this message in context: http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985527.html Sent from the Solr - User mailing list archive at Nabble.com.
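for what it's worth, here is a sketch of what the BoostDoc function that config references might look like, following the DIH ScriptTransformer pattern from the wiki - the boost value, entity name and column are placeholders, not taken from our config. putting a value into the special $docBoost pseudo-field boosts the whole document at index time:

```xml
<dataConfig>
  <script><![CDATA[
    function BoostDoc(row) {
      /* placeholder logic: the boost value would normally be
         derived from the row's own data */
      row.put('$docBoost', 2.5);
      return row;
    }
  ]]></script>
  <document name="mydoc">
    <entity name="myentity" transformer="script:BoostDoc" query="select ...">
      <field column="SOME_COLUMN" name="someField"/>
    </entity>
  </document>
</dataConfig>
```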
RE: index-time boosting using DIH
thank you james for the feedback - i appreciate it. ultimately - i was trying to decide if i was missing the boat by ONLY using query-time boosting, and whether i should really be using index-time boosting. but after your reply, reading the solr book, and looking at the lucene docs - it looks like index-time boosting is not what i need. i can probably do better by using query-time boosting and the proper sort params. thanks again -- View this message in context: http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985539.html Sent from the Solr - User mailing list archive at Nabble.com.
need help with getting exact matches to score higher
Hello all, i am trying to tune our core for exact matches on a single field (itemNo) and am having issues getting it to work. in addition - i need help understanding the output from debugQuery=on where it presents the scoring. my goal is to get exact matches to arrive at the top of the results. however - what i am seeing is non-exact matches arriving at the top of the results with MUCH higher scores.

// from schema.xml - i am copying itemNo in to the string field for use in boosting
<field name="itemNoExactMatchStr" type="string" indexed="true" stored="false"/>
<copyField source="itemNo" dest="itemNoExactMatchStr"/>

// from solrconfig.xml - i have the boost set for my special exact match field and the sorting on score desc
<requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str>
    <str name="q.alt">*:*</str>
    <str name="sort">score desc</str>
    <str name="facet">true</str>
    <str name="facet.field">itemDescFacet</str>
    <str name="facet.field">brandFacet</str>
    <str name="facet.field">divProductTypeIdFacet</str>
  </lst>
  <lst name="appends"/>
  <lst name="invariants"/>
</requestHandler>

// analysis output from debugQuery=on - here you can see that the top score for itemNo:9030 is a part that does not start with 9030. the entries below (there are 4) all have exact matches - but they rank below this part - ??? 
<str name="0904000,1354 ,<b>2TTZ9030C1000A*">
0.585678 = (MATCH) max of:
  0.585678 = (MATCH) weight(itemNo:9030^0.9 in 582979), product of:
    0.021552926 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      10.270785 = idf(docFreq=55, maxDocs=594893)
      0.0023316324 = queryNorm
    27.173943 = (MATCH) fieldWeight(itemNo:9030 in 582979), product of:
      2.6457512 = tf(termFreq(itemNo:9030)=7)
      10.270785 = idf(docFreq=55, maxDocs=594893)
      1.0 = fieldNorm(field=itemNo, doc=582979)
</str>
<str name="122,1232 ,<b>9030*">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 499864), product of:
    0.021552926 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      10.270785 = idf(docFreq=55, maxDocs=594893)
      0.0023316324 = queryNorm
    10.270785 = (MATCH) fieldWeight(itemNo:9030 in 499864), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      10.270785 = idf(docFreq=55, maxDocs=594893)
      1.0 = fieldNorm(field=itemNo, doc=499864)
</str>
<str name="0537220,1882 ,<b>9030 *">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 538826), product of:
    0.021552926 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      10.270785 = idf(docFreq=55, maxDocs=594893)
      0.0023316324 = queryNorm
    10.270785 = (MATCH) fieldWeight(itemNo:9030 in 538826), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      10.270785 = idf(docFreq=55, maxDocs=594893)
      1.0 = fieldNorm(field=itemNo, doc=538826)
</str>
<str name="0537220,2123 ,<b>9030 *">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544313), product of:
    0.021552926 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      10.270785 = idf(docFreq=55, maxDocs=594893)
      0.0023316324 = queryNorm
    10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544313), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      10.270785 = idf(docFreq=55, maxDocs=594893)
      1.0 = fieldNorm(field=itemNo, doc=544313)
</str>
<str name="0537220,2087 ,<b>9030 *">
0.22136548 = (MATCH) max of:
  0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544657), product of:
    0.021552926 = queryWeight(itemNo:9030^0.9), product of:
      0.9 = boost
      10.270785 = idf(docFreq=55, maxDocs=594893)
      0.0023316324 = queryNorm
    10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544657), product of:
      1.0 = tf(termFreq(itemNo:9030)=1)
      10.270785 = idf(docFreq=55, maxDocs=594893)
      1.0 = fieldNorm(field=itemNo, doc=544657)
</str>

--
View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-getting-exact-matches-to-score-higher-tp3983882.html
Sent from the Solr - User mailing list archive at Nabble.com.
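[editor's note] working the explain output above by hand shows where the ranking comes from: the top document contains the term "9030" seven times after tokenization, so its tf factor is sqrt(7), which outweighs the exact-match docs (tf = 1). note also that the explain only shows an itemNo clause - the itemNoExactMatchStr^30 boost never fires for these documents. a quick sketch of the Lucene 3.x TF-IDF arithmetic, with all values copied from the explain output (nothing here is new data):

```python
import math

# Factors copied verbatim from the debugQuery explain output above.
query_weight = 0.021552926   # queryWeight(itemNo:9030^0.9)
idf = 10.270785              # idf(docFreq=55, maxDocs=594893)
field_norm = 1.0             # fieldNorm(field=itemNo)

def lucene_score(term_freq):
    # classic Lucene/Solr 3.x scoring: tf = sqrt(termFreq)
    tf = math.sqrt(term_freq)
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

top_doc = lucene_score(7)    # "9030" occurs 7 times in the top doc -> ~0.585678
exact_doc = lucene_score(1)  # exact-match docs contain it once     -> ~0.221365
print(top_doc, exact_doc)
```

so the non-exact part wins purely on term frequency; the intended fix would be making the exact-match field actually participate in the query.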
doing a full-import after deleting records in the database - maxDocs
hello, After doing a DIH full-import (with clean=true) after deleting records in the database, i noticed that the number of documents processed did change.

example: Indexing completed. Added/Updated: 595908 documents. Deleted 0 documents.

however, i noticed the numbers on the statistics page did not change, nor do they match the number of indexed records - can someone help me understand the difference in these numbers and the meaning of maxDoc / numDocs?

numDocs : 594893
maxDoc : 594893

--
View this message in context: http://lucene.472066.n3.nabble.com/doing-a-full-import-after-deleting-records-in-the-database-maxDocs-tp3983948.html
Sent from the Solr - User mailing list archive at Nabble.com.
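[editor's note] the usual reading of these two statistics: numDocs is the number of live (searchable) documents, while maxDoc also counts documents that were deleted but not yet merged away; after an optimize the two converge. so the difference between them is the pending-delete count:

```python
def deleted_doc_count(max_doc, num_docs):
    """maxDoc includes deleted-but-not-yet-merged docs; numDocs counts
    only live documents, so the difference is what deletion left behind."""
    return max_doc - num_docs

# stats page values from the message above: nothing pending, so they match
print(deleted_doc_count(594893, 594893))  # 0
```

the gap between DIH's "595908 processed" and numDocs 594893 would instead point at rows collapsing onto the same uniqueKey (or rows skipped), which is a separate question from maxDoc.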
Re: doing a full-import after deleting records in the database - maxDocs
hello thanks for the reply this is the output - docsPending = 0

commits : 1786
autocommit maxDocs : 1000
autocommit maxTime : 6ms
autocommits : 1786
optimizes : 3
rollbacks : 0
expungeDeletes : 0
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 1787752
cumulative_deletesById : 0
cumulative_deletesByQuery : 3
cumulative_errors : 0

--
View this message in context: http://lucene.472066.n3.nabble.com/doing-a-full-import-after-deleting-records-in-the-database-maxDocs-tp3983948p3983995.html
Sent from the Solr - User mailing list archive at Nabble.com.
not getting expected results when doing a delta import via full import
hello all, i am not getting the expected results when trying to set up delta imports according to the wiki documentation here:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport?highlight=%28delta%29|%28import%29

i have the following set up in my DIH:

query="select [complicated sql goes here]
       and ('${dataimporter.request.clean}' != 'false'
            OR some_table.upd_by_ts > '${dataimporter.last_index_time}')"

i have the following set up in the shell script to invoke my import process (either a full w/clean or delta):

# change clean=true for full, clean=false for delta
SERVER="http://some_server:port/some_core/dataimport -F command=full-import -F clean=false"
curl $SERVER

when i do a full import (clean=true) i see all of the documents (via the stats page) show up in the core. when i do a delta import (clean=false) i see ~900 fewer records in the import, but i should see much fewer (~84,000) records, based on the fact that i am updating the upd_by_ts field to the current timestamp on 84,000 records! can someone tell me what i am missing? thank you,

--
View this message in context: http://lucene.472066.n3.nabble.com/not-getting-expected-results-when-doing-a-delta-import-via-full-import-tp3983711.html
Sent from the Solr - User mailing list archive at Nabble.com.
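[editor's note] one way to sanity-check the delta predicate outside of Solr: DIH writes `last_index_time` in `yyyy-MM-dd HH:mm:ss` format, and the SQL comparison only selects rows whose timestamp column is strictly newer than it. a small illustration of the comparison (column and variable names are hypothetical, and the database side would of course do this in SQL):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def is_updated_since(upd_by_ts: str, last_index_time: str) -> bool:
    """Mimics the delta predicate:
    some_table.upd_by_ts > '${dataimporter.last_index_time}'."""
    return datetime.strptime(upd_by_ts, FMT) > datetime.strptime(last_index_time, FMT)

# last_index_time as written to dataimport.properties (escaping backslashes removed)
last = "2012-05-15 14:38:55"
print(is_updated_since("2012-05-16 09:00:00", last))  # True  -> row is selected
print(is_updated_since("2012-05-14 12:42:49", last))  # False -> row is skipped
```

if the database column is a different type (or time zone) than what this string parses to, the predicate can silently match far more or fewer rows than expected.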
Re: not getting expected results when doing a delta import via full import
update on this: i also tried manipulating the timestamps in the dataimport.properties file to advance the date so that no records could be newer than last_index_time

example:
#Mon May 14 12:42:49 CDT 2012
core1-model.last_index_time=2012-05-15 14\:38\:55
last_index_time=2012-05-15 14\:38\:55

this leads me to believe that date comparisons are not being done correctly or have not been configured correctly. so does something need to be configured for the date comparison to work?

example from wiki: OR last_modified > '${dataimporter.last_index_time}'

--
View this message in context: http://lucene.472066.n3.nabble.com/not-getting-expected-results-when-doing-a-delta-import-via-full-import-tp3983711p3983715.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: should slave replication be turned off / on during master clean and re-index?
thanks for all of the advice / help. i appreciate it ;) -- View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3959088.html Sent from the Solr - User mailing list archive at Nabble.com.
solr snapshots - old school and replication - new school ?
hello all, environment: centOS and solr 3.5. i want to make sure i understand the difference between snapshots and solr replication. snapshots are old school and have been deprecated, with solr replication being the new school approach. do i have this correct? btw: i have replication working (now) between my master and two slaves - i just want to make sure i am not missing a larger picture ;) i have been reading the Smiley Pugh book (pg 349) as well as material on the wiki at:
http://wiki.apache.org/solr/SolrCollectionDistributionScripts
http://wiki.apache.org/solr/SolrReplication
thank you,

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-snapshots-old-school-and-replication-new-school-tp3959152.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: should slave replication be turned off / on during master clean and re-index?
hello shawn, thanks for the reply. ok - i did some testing and yes you are correct. autocommit is doing the commit work in chunks. yes - the slaves are also going from having everything to nothing, then slowly building back up again, lagging behind the master. ... and yes - this is probably not what we need, as far as a replication strategy for the slaves.

you said you don't use autocommit. if so - then why don't you use / like autocommit? since we have not done this here - there is no established reference point, from an operations perspective. i am looking to formulate some sort of operational strategy, so ANY ideas or input are really welcome.

it seems to me that we have to account for two operational strategies - the first operational mode is a daily append to the solr core after the database tables have been updated. this can probably be done with a simple delta import. i would think that autocommit could remain on for the master and replication could also be left on so the slaves pick up the changes ASAP. this seems like the mode that we would / should be in most of the time.

the second operational mode would be a build from scratch mode, where changes in the schema necessitate a full re-index of the data. given that our site (powered by solr) must be up all of the time, and that our full index time on the master (for the moment) is hovering somewhere around 16 hours - it makes sense that some sort of parallel path - with a cut-over - must be used. in this situation is it possible to have the indexing process going on in the background - then have one commit at the end - then turn replication on for the slaves? are there disadvantages to this approach? also - i really like your suggestion of a build core and a live core. is this the approach you use?

thank you for all of the great input

--
View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3952904.html
Sent from the Solr - User mailing list archive at Nabble.com.
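[editor's note] the "build core / live core" idea discussed above is typically done with the CoreAdmin SWAP command: index the 16-hour rebuild into a spare core, then atomically swap it with the serving core so the site never sees a partial index. a sketch of the request (host and core names are hypothetical):

```python
from urllib.parse import urlencode

def core_swap_url(solr_base: str, build_core: str, live_core: str) -> str:
    """Build a CoreAdmin SWAP request: the freshly indexed 'build' core
    and the serving 'live' core exchange names in one atomic operation."""
    params = urlencode({"action": "SWAP", "core": build_core, "other": live_core})
    return f"{solr_base}/admin/cores?{params}"

url = core_swap_url("http://localhost:8983/solr", "core_build", "core_live")
print(url)
```

after the swap, the slaves replicate the new index from what is now the live core.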
dataimport handler (DIH) - notify when it has finished?
Hello all, is there a notification / trigger / callback mechanism people use that allows them to know when a dataimport process has finished? we will be doing daily delta-imports and i need some way for an operations group to know when the DIH has finished. thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-handler-DIH-notify-when-it-has-finished-tp3953339.html Sent from the Solr - User mailing list archive at Nabble.com.
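[editor's note] besides DIH's `onImportEnd` event-listener hook in the DIH config, a common low-tech answer for an operations group is to poll the /dataimport handler and check the status field, which reads "idle" once an import has finished. a sketch of the parsing side, run here against a canned status response (the real thing would fetch the XML from your core's /dataimport URL):

```python
import xml.etree.ElementTree as ET

def dih_is_idle(status_xml: str) -> bool:
    """Parse a /dataimport status response and report whether the
    import has finished (DIH reports status 'idle' when not running)."""
    root = ET.fromstring(status_xml)
    for node in root.iter("str"):
        if node.get("name") == "status":
            return node.text == "idle"
    return False

# canned example of a DIH status response (trimmed)
sample = """<response>
  <str name="status">idle</str>
  <str name="importResponse"/>
</response>"""
print(dih_is_idle(sample))  # True
```

a cron job could poll this every minute and fire a notification email when the status flips from busy back to idle.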
should slave replication be turned off / on during master clean and re-index?
hello all, i am just getting replication going on our master and two (2) slaves. from time to time, i may need to do a complete re-index and clean on the master. should replication on the slave - remain On or Off during a full clean and re-index on the Master? thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3945531.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: should slave replication be turned off / on during master clean and re-index?
hello, thank you for the reply,

> Does a clean mean issuing a deletion query (e.g. <delete><id>*:*</id></delete>) prior to
> re-indexing all of your content? I don't think the slaves will download any changes until
> you've committed at some point on the master.

well, in this case when i say clean (on the Master), i mean selecting the Full Import with Cleaning button from the DataImportHandler Development Console page in solr. at the top of the page, i have the check boxes selected for verbose and clean (*but i don't have the commit checkbox selected*). by doing the above process - doesn't this issue a deletion query - then start the import? and as a follow-up - when actually is the commit being done?

here is the relevant piece from my solrconfig.xml file on the master:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>6</maxTime>
    <maxDocs>1000</maxDocs>
  </autoCommit>
  <maxPendingDeletes>10</maxPendingDeletes>
</updateHandler>

--
View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3945954.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication failing with error: Master at: is not available. Index fetch failed
hello, sorry - i overlooked this message - thanks for checking back and thanks for the info. yes - replication seems to be working now. tailed from the logs just now:

2012-04-26 09:21:33,284 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-26 09:21:53,279 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:13,279 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:33,279 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3941447.html
Sent from the Solr - User mailing list archive at Nabble.com.
impact of EdgeNGramFilterFactory on indexing process?
Hello all, i am experimenting with EdgeNGramFilterFactory on two of the fieldTypes in my schema:

<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>

i believe i understand this - but want to verify: 1) will this increase my index time? 2) will it increase the number of documents in my index? thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/impact-of-EdgeNGramFilterFactory-on-indexing-process-tp3941743p3941743.html
Sent from the Solr - User mailing list archive at Nabble.com.
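[editor's note] for intuition about what this filter emits - and why it grows the number of indexed terms, not the number of documents - here is a sketch of front ("side=front") edge n-gram generation with the settings above:

```python
def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 15):
    """Front edge n-grams, as EdgeNGramFilterFactory would emit for one
    token with minGramSize=3 / maxGramSize=15: every prefix of the token
    between the two sizes."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(edge_ngrams("dishwasher"))
# ['dis', 'dish', 'dishw', 'dishwa', 'dishwas', 'dishwash', 'dishwashe', 'dishwasher']
```

so each token can become up to maxGramSize - minGramSize + 1 terms: index time and index size go up, but the document count stays the same.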
Re: faceted searches - design question - facet field not part of qf search fields
thank you BOTH, Erick and Hos for the insight. -- View this message in context: http://lucene.472066.n3.nabble.com/faceted-searches-design-question-facet-field-not-part-of-qf-search-fields-tp3936509p3938080.html Sent from the Solr - User mailing list archive at Nabble.com.
correct location in chain for EdgeNGramFilterFactory ?
hello all, i want to experiment with the EdgeNGramFilterFactory at index time. i believe this needs to go in post tokenization - but i am doing a pattern replace as well as other things. should the EdgeNGramFilterFactory go in right after the pattern replace?

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    <!-- put EdgeNGramFilterFactory here === ? -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

thanks for any help,

--
View this message in context: http://lucene.472066.n3.nabble.com/correct-location-in-chain-for-EdgeNGramFilterFactory-tp3935589p3935589.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication failing with error: Master at: is not available. Index fetch failed
hello, thank you for the reply, yes - master has been indexed. ok - makes sense - the polling interval needs to change. i did check the solr war file on both boxes (master and slave). they are identical. actually - if they were not identical - this would point to a different issue altogether - since our deployment infrastructure rolls the war file to the slaves when you do a deployment on the master. this has me stumped - not sure what to check next.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication failing with error: Master at: is not available. Index fetch failed
that was it! thank you. i did notice something else in the logs now ... what is the meaning or implication of the "Connection reset" message?

2012-04-24 12:59:19,996 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 12:59:39,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
*2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://bogus:bogusport/somepath/somecore/replication/ is not available. Index fetch failed. Exception: Connection reset*
2012-04-24 13:00:19,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:40,004 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:59,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:19,993 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:39,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:59,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:19,990 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:39,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:59,991 INFO [org.a

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html
Sent from the Solr - User mailing list archive at Nabble.com.
faceted searches - design question - facet field not part of qf search fields
hello all, this is more of a design / newbie question on how others combine faceted search fields in to their requestHandlers. say you have a request handler set up like below. does it make sense (from a design perspective) to add a faceted search field that is NOT part of the main search fields (itemNo, productType, brand) in the qf param? for example, augment the requestHandler below to include a faceted search on itemDesc? would this be confusing? - to be searching across three fields - but offering faceted suggestions on itemDesc? just trying to understand how others approach this. thanks

<requestHandler name="generalSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemNo^1.0 productType^.8 brand^.5</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>

--
View this message in context: http://lucene.472066.n3.nabble.com/faceted-searches-design-question-facet-field-not-part-of-qf-search-fields-tp3936509p3936509.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr replication failing with error: Master at: is not available. Index fetch failed
hello all, environment: centOS and solr 3.5. i am attempting to set up replication between two solr boxes (master and slave). i am getting the following in the logs on the slave box:

2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://someip:someport/somepath/somecore/admin/replication/ is not available. Index fetch failed. Exception: Invalid version (expected 2, but 10) or the data in not in 'javabin' format

master jvm (jboss host) is being started like this: -Denable.master=true
slave jvm (jboss host) is being started like this: -Denable.slave=true

does anyone have any ideas? i have done the following:
- used curl http://someip:someport/somepath/somecore/admin/replication/ from the slave to successfully see the master
- used ping from slave to master
- switched out the dns name for master to a hard coded ip address
- made sure i can see http://someip:someport/somepath/somecore/admin/replication/ in a browser

this is my request handler - i am using the same config file on both the master and slave - but sending in the appropriate switch on start up (per the solr wiki page on replication):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
  <str name="maxNumberOfBackups">1</str>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://someip:someport/somecore/admin/replication/</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

any suggestions would be great. thank you, mark

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3932921.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: searching across multiple fields using edismax - am i setting this up right?
thank you for the response. it seems to be working well ;)

1) i tried your suggestion about removing the qt parameter - somecore/partItemNoSearch?q=dishwasher&debugQuery=on&rows=10 - but this results in a 404 error message - is there some configuration i am missing to support this short-hand syntax for specifying the requestHandler in the url?

2) ok - good suggestion.

3) yes it looks like it IS searching across all three (3) fields. i noticed that for the itemNo field, it reduced the search string from dishwasher to dishwash - is this because of stemming on the field type used for the itemNo field?

<lst name="debug">
  <str name="rawquerystring">dishwasher</str>
  <str name="querystring">dishwasher</str>
  <str name="parsedquery">+DisjunctionMaxQuery((brand:dishwasher^0.5 | itemNo:dishwash | productType:dishwasher^0.8))</str>
  <str name="parsedquery_toString">+(brand:dishwasher^0.5 | itemNo:dishwash | productType:dishwasher^0.8)</str>
</lst>

--
View this message in context: http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3907875.html
Sent from the Solr - User mailing list archive at Nabble.com.
is there a downside to combining search fields with copyfield?
hello everyone, can people give me their thoughts on this. currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them in to a single search field? would this affect search times, term tokenization or possibly other things?

example of individual fields: brand, category, partno
example of a single combined search field: part_info (would combine brand, category and partno)

thank you for any feedback, mark

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-downside-to-combining-search-fields-with-copyfield-tp3905349p3905349.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: is there a downside to combining search fields with copyfield?
> You end up with one multivalued field, which means that you can only have one analyzer chain.

actually two of the three fields being considered for combination in to a single field ARE multivalued fields. would this be an issue?

> With separate fields, each field can be analyzed differently. Also, if you are indexing and/or
> storing the individual fields, you may have data duplication in your index, making it larger
> and increasing your disk/RAM requirements.

this makes sense

> That field will have a higher termcount than the individual fields, which means that searches
> against it will naturally be just a little bit slower.

ok

> Your application will not have to do as much work to construct a query, though.

actually this is the primary reason this came up.

> If you are already planning to use dismax/edismax, then you don't need the overhead of a
> copyField. You can simply provide access to (e)dismax search with the qf (and possibly pf)
> parameters predefined, or your application can provide these parameters.
> http://wiki.apache.org/solr/ExtendedDisMax

can you elaborate on this and how EDisMax would preclude the need for copyfield? i am using extended dismax now in my response handlers. here is an example of one of my requestHandlers:

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">5</int>
    <str name="qf">itemNo^1.0</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
    <str name="fq">itemType:1</str>
    <str name="sort">rankNo asc, score desc</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>

> Thanks, Shawn

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-downside-to-combining-search-fields-with-copyfield-tp3905349p3906265.html
Sent from the Solr - User mailing list archive at Nabble.com.
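[editor's note] the point in the reply above - that edismax removes the need for a combined copyField - can be sketched as follows: instead of querying one merged field, the individual fields (with per-field boosts) are listed in qf, either in the handler defaults or by the client. a hypothetical request built in Python, using the field names from this thread:

```python
from urllib.parse import urlencode

def edismax_query(base_url: str, user_query: str) -> str:
    """Search brand, category and partno directly via an edismax qf list,
    with no combined copyField needed (field names/boosts are examples)."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": "partno^1.0 category^0.8 brand^0.5",  # per-field boosts
    }
    return f"{base_url}/select?{urlencode(params)}"

url = edismax_query("http://localhost:8983/solr/somecore", "dishwasher")
print(url)
```

the user still types one search box's worth of text; edismax fans the query out across the fields, which is exactly what the copyField would have flattened away (along with the ability to boost and analyze each field separately).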
searching across multiple fields using edismax - am i setting this up right?
hello all, i just want to check to make sure i have this right. i was reading on this page: http://wiki.apache.org/solr/ExtendedDisMax - thanks to shawn for educating me.

*i want the user to be able to fire a requestHandler but search across multiple fields (itemNo, productType and brand) WITHOUT them having to specify in the query url what fields they want / need to search on*

this is what i have in my request handler:

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">5</int>
    <str name="qf">itemNo^1.0 productType^.8 brand^.5</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
    <str name="sort">rankNo asc, score desc</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>

this would be an example of a single term search going against all three of the fields:
http://bogus:bogus/somecore/select?qt=partItemNoSearch&q=dishwasher&debugQuery=on&rows=100

this would be an example of a multiple term search across all three of the fields:
http://bogus:bogus/somecore/select?qt=partItemNoSearch&q=dishwasher 123-xyz&debugQuery=on&rows=100

do i understand this correctly? thank you, mark

--
View this message in context: http://lucene.472066.n3.nabble.com/searching-across-multiple-fields-using-edismax-am-i-setting-this-up-right-tp3906334p3906334.html
Sent from the Solr - User mailing list archive at Nabble.com.
why does building war from source produce a different size file?
hello all, i have been pulling down the 3.5 solr war file from the mirror site. the size of this file is: 6403279 Nov 22 14:54 apache-solr-3.5.0.war when i build the war file from source - i get a different sized file: ./dist/apache-solr-3.5-SNAPSHOT.war 6404098 Mar 29 11:41 ./dist/apache-solr-3.5-SNAPSHOT.war am i building from the wrong source? -- View this message in context: http://lucene.472066.n3.nabble.com/why-does-building-war-from-source-produce-a-different-size-file-tp3868307p3868307.html Sent from the Solr - User mailing list archive at Nabble.com.
authentication for solr admin page?
hello, environment: running solr 3.5 under jboss 5.1. i have been searching the user list along with the locations below - to find out how you require a user to authenticate in to the solr /admin page. i thought this would be a common issue - but maybe not ;) any help would be appreciated. thank you, mark
http://drupal.org/node/658466
http://wiki.apache.org/solr/SolrSecurity#Write_Your_Own_RequestHandler_or_SearchComponent

--
View this message in context: http://lucene.472066.n3.nabble.com/authentication-for-solr-admin-page-tp3865665p3865665.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: preventing words from being indexed in spellcheck dictionary?
thank you, James. -- View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3865670.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: authentication for solr admin page?
update - ok - i was reading about replication here: http://wiki.apache.org/solr/SolrReplication and noticed comments in the solrconfig.xml file related to HTTP Basic Authentication and the usage of the following tags:

<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str>

*Can i place these tags in the request handler to achieve an authentication scheme for the /admin page?*

// snipped from the solrconfig.xml file
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>

thanks for any help, mark

--
View this message in context: http://lucene.472066.n3.nabble.com/authentication-for-solr-admin-page-tp3865665p3865747.html
Sent from the Solr - User mailing list archive at Nabble.com.
preventing words from being indexed in spellcheck dictionary?
hello all, i am creating a spellcheck dictionary from the itemDescSpell field in my schema. is there a way to prevent certain words from entering the dictionary - as the dictionary is being built? thanks for any help, mark

// snipped from solrconfig.xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
</lst>

--
View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861472.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: preventing words from being indexed in spellcheck dictionary?
thank you very much for the info ;) -- View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861987.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: preventing words from being indexed in spellcheck dictionary?
hello, should i apply the StopFilterFactory at index time or query time? right now - per the schema below - i am applying it at BOTH index time and query time. is this correct? thank you, mark

// snipped from schema.xml
<field name="itemDescSpell" type="textSpell"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

--
View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3862722.html
Sent from the Solr - User mailing list archive at Nabble.com.
spellcheck file format - multiple words on a line?
hello all, for business reasons, we are sourcing the spellcheck file from another business group. the file we receive looks like the example data below. can solr support this type of format - or do i need to process this file in to a format that has a single word on a single line? thanks for any help, mark

// snipped from spellcheck file sourced from business group
14-INCH CHAIN
14-INCH RIGHT TINE
1/4 open end ignition wrench
150 DEGREES CELSIUS
15 foot I wire
15 INCH
15 WATT
16 HORSEPOWER ENGINE
16 HORSEPOWER GASOLINE ENGINE
16-INCH BAR
16-INCH CHAIN
16l Cross
16p SIXTEEN PIECE FLAT FLEXIBLE CABLE

--
View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-file-format-multiple-words-on-a-line-tp3853096p3853096.html
Sent from the Solr - User mailing list archive at Nabble.com.
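[editor's note] if the file does need to be flattened to one term per line (the usual shape for a file-based spellcheck dictionary), a small preprocessing sketch - the lowercasing, de-duplication, and sorting here are assumptions about what the dictionary should look like, not requirements:

```python
def flatten_spellcheck_lines(lines):
    """Turn multi-word phrase lines into a sorted, de-duplicated,
    one-word-per-line spellcheck dictionary (lowercased)."""
    words = set()
    for line in lines:
        for word in line.split():
            words.add(word.lower())
    return sorted(words)

sample = ["14-INCH CHAIN", "16 HORSEPOWER ENGINE", "16-INCH CHAIN"]
print(flatten_spellcheck_lines(sample))
# ['14-inch', '16', '16-inch', 'chain', 'engine', 'horsepower']
```

writing the returned list out with one word per line would give a file the file-based spellchecker can consume directly.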
suggestions on automated testing for solr output
hello all, i know this is never a fun topic for people, but our SDLC mandates that we have unit test cases that attempt to validate the output from specific solr queries. i have some ideas on how to do this, but would really appreciate feedback from anyone that has done this or is doing it now. the ideal situation (for this environment) would be something script based and automated. thanks for any input, mark -- View this message in context: http://lucene.472066.n3.nabble.com/suggestions-on-automated-testing-for-solr-output-tp3833049p3833049.html Sent from the Solr - User mailing list archive at Nabble.com.
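[editor's note] one script-based approach to the request above: run known queries through each request handler and assert on the JSON response (wt=json). below is a sketch using the standard unittest module against a canned response dict - the URL in the docstring and the fixture data are hypothetical; in real use fetch_results would do an HTTP GET against your core:

```python
import unittest

def fetch_results(query: str) -> dict:
    """Stand-in for an HTTP call such as
    http://host:port/somecore/select?q=<query>&wt=json (URL hypothetical).
    Returns a canned fixture here so the sketch is self-contained."""
    return {
        "response": {
            "numFound": 2,
            "docs": [{"itemNo": "9030"}, {"itemNo": "9030A"}],
        }
    }

class SolrOutputTest(unittest.TestCase):
    def test_exact_match_first(self):
        docs = fetch_results("itemNo:9030")["response"]["docs"]
        self.assertGreater(len(docs), 0)
        # business rule under test: the exact item number ranks first
        self.assertEqual(docs[0]["itemNo"], "9030")

# run with: python -m unittest <this_file>
```

a suite of such cases (one per request handler, with frozen expected top results) can run from cron or a CI job, which satisfies the "script based and automated" requirement.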
does solr have a mechanism for intercepting requests - before they are handed off to a request handler
hello all, does solr have a mechanism that could intercept a request (before it is handed off to a request handler). the intent (from the business) is to send in a generic request - then pre-parse the url and send it off to a specific request handler. thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-have-a-mechanism-for-intercepting-requests-before-they-are-handed-off-to-a-request-handler-tp3813255p3813255.html Sent from the Solr - User mailing list archive at Nabble.com.
need input - lessons learned or best practices for data imports
hello all, we are approaching the time when we will move our first solr core in to a more production like environment. as a precursor to this, i am attempting to write some documents on impact assessment and batch load / data import strategies. does anyone have processes or lessons learned - that they can share? maybe a good place to start - but not limited to - would be how do people monitor data imports (we are using a very simple DIH hooked to an informix schema) and send out appropriate notifications? thank you for any help or suggestions, mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-tp3801327p3801327.html Sent from the Solr - User mailing list archive at Nabble.com.
does the location of a match (within a field) affect the score?
hello all,

example: i have a field named itemNo. the user does a search: itemNo:665

there are three documents in the core that look like this:

doc1 - itemNo = 1237899*665*
doc2 - itemNo = *665*1237899
doc3 - itemNo = 123*665*7899

does the location or placement of the search string (beginning, middle, end) affect the scoring of the document? -- View this message in context: http://lucene.472066.n3.nabble.com/does-the-location-of-a-match-within-a-field-affect-the-score-tp3793634p3793634.html Sent from the Solr - User mailing list archive at Nabble.com.
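For what it's worth, a toy sketch of why the classic TF-IDF portion of Lucene's score is position-blind (field norms and phrase queries are omitted here, and those can still make scores differ in practice):

```python
import math

def tf_idf(term, doc_tokens, num_docs, doc_freq):
    """Toy Lucene-style TF-IDF: sqrt(term frequency) * idf.
    Nothing here depends on WHERE in the field the term occurs."""
    tf = math.sqrt(doc_tokens.count(term))
    idf = 1.0 + math.log(num_docs / (doc_freq + 1.0))
    return tf * idf

# "665" at the end, the beginning, and the middle of the field
docs = [["1237899", "665"], ["665", "1237899"], ["123", "665", "7899"]]
scores = [tf_idf("665", d, num_docs=3, doc_freq=3) for d in docs]
```

All three scores come out identical in this simplified model, since only the term count feeds the tf component.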
need to support bi-directional synonyms
hello all, i need to support the following:

if the user enters sprayer in the desc field - then they get results for BOTH sprayer and washer.
and in the other direction: if the user enters washer in the desc field - then they get results for BOTH washer and sprayer.

would i set up my synonym file like this, assuming expand=true?

sprayer = washer
washer = sprayer

thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-to-support-bi-directional-synonyms-tp3767990p3767990.html Sent from the Solr - User mailing list archive at Nabble.com.
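For reference, a single comma-separated group in synonyms.txt is the usual way to get a bidirectional mapping when the SynonymFilterFactory is configured with expand="true" (a sketch — verify against your own analyzer chain):

```
# synonyms.txt - one comma-separated group; with expand="true" each
# term in the group expands to every other term, in both directions
sprayer, washer
```

The `a => b` arrow form, by contrast, maps in one direction only.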
proper syntax for using sort query parameter in responseHandler
what is the proper syntax for including a sort directive in my requestHandler? i tried this but got an error:

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemNo^1.0</str>
    <str name="q.alt">*:*</str>
    <str name="sort">rankNo desc</str>
  </lst>
  <lst name="appends">
    <str name="fq">itemType:1</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>

thank you mark -- View this message in context: http://lucene.472066.n3.nabble.com/proper-syntax-for-using-sort-query-parameter-in-responseHandler-tp3755077p3755077.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: spellcheck configuration not providing suggestions or corrections
hello, thank you for the suggestion - however this did not work. i went in to solrconfig and changed the count to 20 - then restarted the server and did a reimport. is it possible that i am not firing the request handler that i think i am firing?

<requestHandler name="/search" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">20</str>
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

query sent to server:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemDescSpell%3Agusket%0D%0A&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true

results:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="spellcheck">true</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">itemDescSpell:gusket</str>
      <str name="spellcheck.build">true</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-configuration-not-providing-suggestions-or-corrections-tp3740877p3741521.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: spellcheck configuration not providing suggestions or corrections
thank you sooo much - that was it. also - thank you for the tip on which field to hit, eg itemDesc instead of itemDescSpell. thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-configuration-not-providing-suggestions-or-corrections-tp3740877p3741783.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots
hello,

> Or does your field in schema.xml have anything like autoGeneratePhraseQueries="true" in it?

there is no reference to this in our production schema. this is extremely confusing - i am not completely clear on the issue. reviewing our previous messages:

- it looks like the data is being tokenized correctly, according to the analysis page and output from Luke.
- it also looks like the definition of the field and field type is correct in the schema.xml.
- it also looks like there is no errant data (quotes) being introduced in to the query string submitted to solr.

example:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

so - does the real issue reside in HOW the query is being constructed / parsed? and if so - what drives this query to become a MultiPhraseQuery with embedded quotes?

<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>
</lst>

please note - i also mocked up a simple test on my personal linux box, just using the solr 3.5 distro (we are using 3.3.0 on our production box under centOS). i was able to get a simple test to work - and yes, my query does look different.

output from my simple mock up on my personal box:

http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

<lst name="debug">
  <str name="rawquerystring">manu:BP21UAA</str>
  <str name="querystring">manu:BP21UAA</str>
  <str name="parsedquery">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
  <str name="parsedquery_toString">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
</lst>

schema.xml:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="manu" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>

any suggestions would be greatly appreciated. mark -- View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3733486.html Sent from the Solr - User mailing list archive at Nabble.com.
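As a rough illustration only (Python, not Solr code — a simplified approximation of WordDelimiterFilterFactory with the flags above), splitting on punctuation and letter/digit transitions plus a catenateAll term shows why bp21uaa ends up as an index term for BP2.1UAA; the real filter also emits catenated number parts such as 21, which this sketch omits:

```python
import re

def word_delimiter_parts(token, catenate_all=True):
    """Rough approximation of WordDelimiterFilterFactory with
    generateWordParts=1 / generateNumberParts=1: split on non-alphanumeric
    characters and on letter<->digit transitions, lowercase the parts,
    then optionally append the catenateAll term."""
    parts = re.split(
        r"[^A-Za-z0-9]+|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])",
        token,
    )
    terms = [p.lower() for p in parts if p]
    if catenate_all:
        # catenateAll=1: all parts glued back together as one extra term
        terms.append(re.sub(r"[^A-Za-z0-9]+", "", token).lower())
    return terms
```

So both the dotted and the undotted form produce the bp21uaa term; the mismatch seen in the thread comes from how the stacked tokens are assembled into a MultiPhraseQuery at query time, not from the splitting itself.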
Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots
> OK, first question is why are you searching on two different values? Is that intentional?

yes - our users have to be able to locate a part or model number (that may or may not have periods in that number) even if they do NOT enter the number with the embedded periods. example: the actual part number in our database is BP2.1UAA - however the user needs to be able to search on BP21UAA and find that part. there are business reasons why a user may see something different in the field than is actually in the database. does this make sense?

> If I'm reading your problem right, you should be able to get/not get any response just by toggling whether the period is in the search URL, right?

yes - simply put - the user MUST get a hit on the above mentioned part if they enter BP21UAA or BP2.1UAA.

> But assuming that's not the problem, there's something you're not telling us. In particular, why is this parsing as MultiPhraseQuery?

sorry - i did not know i was doing this or how it happened - it was not intentional, and i did not notice it until your posting. i am not sure of the implications of this, or what it means to have something parse as a MultiPhraseQuery.

> Are you putting quotes in somehow, either through the URL or by something in your solrconfig.xml?

i did not use quotes in the url - i cut and pasted the urls for my tests in the message thread, and i do not see quotes as part of the url in my previous post. what would i be looking for in the solrconfig.xml file that would force the MultiPhraseQuery? it seems that this is the crux of the issue - but i am not sure how to determine what is manifesting the quotes. as previously stated, the quotes are not being entered via the url - the urls are pasted (in this message thread) exactly as i pulled them from the browser.
thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3730070.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots
hello, thank you for the reply. yes - i did re-index after the changes to the schema. also - thank you for the direction on using the analyzer - but i am not sure if i am interpreting the feedback from the analyzer correctly. here is what i did:

in the Field value (Index) box i placed: BP2.1UAA
in the Field value (Query) box i placed: BP21UAA

then after hitting the Analyze button i see the following.

under Index Analyzer for org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33, generateWordParts=1, catenateAll=1, catenateNumbers=1} i see:

position   1    2    3        4
term text  BP   2    1, 21    UAA, BP21UAA

under Query Analyzer for org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33, generateWordParts=1, catenateAll=1, catenateNumbers=1} i see:

position   1    2    3
term text  BP   21   UAA, BP21UAA

the above information leads me to believe that i should have BP21UAA as an indexed term generated from the BP2.1UAA value coming from the database. also, the query analysis leads me to believe that i should find a document when i search on BP21UAA in the itemNo field. do i have this correct - am i missing something here? i am still unable to get a hit when i search on BP21UAA in the itemNo field. thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3726021.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots
hello, thanks for sticking with me on this ... very frustrating. ok - i did perform the query with the debug params using two scenarios:

1) a successful search (where i insert the period / dot in to the itemNo field) and the search returns a document.

itemNo:BP2.1UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

results from debug:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="start">0</str>
      <str name="q">itemNo:BP2.1UAA</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="brand"><str>PHILIPS</str></arr>
      <str name="groupId">0333500</str>
      <str name="id">0333500,1549 ,BP2.1UAA </str>
      <str name="itemDesc">PLASMA TELEVISION</str>
      <str name="itemNo">BP2.1UAA </str>
      <int name="itemType">2</int>
      <arr name="model"><str>BP2.1UAA </str></arr>
      <arr name="productType"><str>Plasma Television^</str></arr>
      <int name="rankNo">0</int>
      <str name="supplierId">1549 </str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">itemNo:BP2.1UAA</str>
    <str name="querystring">itemNo:BP2.1UAA</str>
    <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa bp21uaa)")</str>
    <str name="parsedquery_toString">itemNo:"bp 2 (1 21) (uaa bp21uaa)"</str>
    <lst name="explain">
      <str name="0333500,1549 ,BP2.1UAA ">
22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
  0.9994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.02218287 = queryNorm
  22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
    1.0 = tf(phraseFreq=1.0)
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.5 = fieldNorm(field=itemNo, doc=134993)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">1.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

2) a NON-successful search (where i do NOT insert a period / dot in to the itemNo field) and the search does NOT return a document.

itemNo:BP21UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="start">0</str>
      <str name="q">itemNo:BP21UAA</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">itemNo:BP21UAA</str>
    <str name="querystring">itemNo:BP21UAA</str>
    <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
    <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">1.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>