Re: FunctionQuery score=0
Definitely worked for me, with a classic full-text search on ipod and such. Changing the lower bound changed the number of results. Follow Chris's advice and give more details. John wrote: Doesn't seem to work. I thought that filter queries work before the search is performed and not after... no? Debug output doesn't include the filter query, only the below (changed a bit): BoostedQuery(boost(+fieldName:,boostedFunction(ord(fieldName),query))) On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: John wrote: Some of the results are receiving score=0 in my function and I would like them not to appear in the search results. You can use frange and filter by score: q=ipod&fq={!frange l=0 incl=false}query($q) -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
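A small sketch of how André's frange filter can be assembled into request parameters with Python's urllib (host and core path are illustrative; urlencode also takes care of escaping the local-params syntax):

```python
# Sketch: build the q / fq pair for the frange score filter.
# The base URL is an assumption; only the parameters matter here.
from urllib.parse import urlencode

params = urlencode({
    "q": "ipod",
    # keep only documents whose score is strictly greater than 0
    "fq": "{!frange l=0 incl=false}query($q)",
})
print("http://localhost:8983/solr/select?" + params)
```

The `incl=false` local param makes the lower bound exclusive, which is what drops the score=0 documents.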
Re: fieldCache problem OOM exception
Dear erolagnab, is your code in the Solr server? Which class should I put it in? -- View this message in context: http://lucene.472066.n3.nabble.com/fieldCache-problem-OOM-exception-tp3067057p3517780.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: delta-import of rich documents like word and pdf files!
When I set my fileSize field to type string, it shows the error I posted above. Then I changed it to slong and the results were worse; here is the log:

18 Nov, 2011 3:00:54 PM org.apache.solr.response.BinaryResponseWriter$Resolver getDoc
WARNING: Error reading a field from document : SolrDocument[{}]
java.lang.StringIndexOutOfBoundsException: String index out of range: 3
        at java.lang.String.charAt(String.java:694)
        at org.apache.solr.util.NumberUtils.SortableStr2long(NumberUtils.java:152)
        at org.apache.solr.schema.SortableLongField.toObject(SortableLongField.java:70)
        at org.apache.solr.schema.SortableLongField.toObject(SortableLongField.java:37)
        at org.apache.solr.response.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:148)
        at org.apache.solr.response.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:122)
        at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:86)
        at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:144)
        at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:134)
        at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:222)
        at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139)
        at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87)
        at org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:191)
        at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:57)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:964)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:304)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

What does this mean? -- View this message in context: http://lucene.472066.n3.nabble.com/delta-import-of-rich-documents-like-word-and-pdf-files-tp3502039p3518171.html Sent from the Solr - User mailing list archive at Nabble.com.
write-lock issue
Environment: Solr 1.4 on Windows / MS SQL Server. A write lock is getting created whenever I try to do a full-import of documents using DIH. The logs say 'Creating a connection with the database' and the process does not go any further (it never gets a database connection), so the indexes are not getting created. Note that no other process is accessing the index, and I even restarted my MS SQL Server service. However, I still see a write.lock file in my index directory. What could be the reason for this? Even with the unlockOnStartup flag set to true in solrconfig, the indexing is still not happening. FAFLD
Multivalued Boolean Search
Hi, I have a multivalued field, say MulField, in my index that has values in a document like: <str name="DocID">1</str> <arr name="MulField"><str>Auto Mobiles</str><str>Toyota Corolla</str></arr> Now let's say I specify the search criteria +MulField:Mobiles +MulField:Toyota. My question: is it possible to make this document not appear in the search results? Regards Ahsan
wild card search and lower-casing
Hello, Here is one puzzle I couldn't yet find a key for: for the wild-card query *ocvd, SOLR 3.4 returns hits, but for *OCVD it doesn't. On the indexing side the two following tokenizers/filters are defined: <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/> On the query side: <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> The SOLR analysis tool shows that OCVD gets lower-cased to ocvd. Does SOLR skip the lower-casing step when doing the actual wild-card search? BTW, the same issue occurs for a trailing wild-card: mocv* produces hits, while MOCV* doesn't. Appreciate any help or pointers. -- Regards, Dmitry Kan
Only a subset of edismax pf fields are used for the phrase part DisjunctionMaxQuery
Hello, The parsedQuery is displayed as follows: parsedquery=+(DisjunctionMaxQuery((title:responsable^4.0 | keywords:responsable^3.0 | organizationName:responsable | location:responsable | formattedDescription:responsable^2.0 | nafCodeText:responsable^2.0 | jobCodeText:responsable^3.0 | categoryPayloads:responsable | labelLocation:responsable)~0.1) DisjunctionMaxQuery((title:boutique^4.0 | keywords:boutique^3.0 | organizationName:boutique | location:boutique | formattedDescription:boutique^2.0 | nafCodeText:boutique^2.0 | jobCodeText:boutique^3.0 | categoryPayloads:boutique | labelLocation:boutique)~0.1) DisjunctionMaxQuery((title:lingerie^4.0 | keywords:lingerie^3.0 | organizationName:lingerie | location:lingerie | formattedDescription:lingerie^2.0 | nafCodeText:lingerie^2.0 | jobCodeText:lingerie^3.0 | categoryPayloads:lingerie | labelLocation:lingerie)~0.1)) *DisjunctionMaxQuery*((title:"responsable boutique lingerie"~10^4.0 | formattedDescription:"responsable boutique lingerie"~10^2.0 | categoryPayloads:"responsable boutique lingerie"~10)~0.1) The search query is 'responsable boutique lingerie'. The qf and pf fields are the same: qf=title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0 categoryPayloads^1.0, pf=title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0 organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0 categoryPayloads^1.0. I would have expected to see the whole set of pf fields in the phrase part of the parsed query! Is it coming from the field definitions in schema.xml? Best, Jean-Claude Dauphin -- Jean-Claude Dauphin jc.daup...@gmail.com jc.daup...@afus.unesco.org http://kenai.com/projects/j-isis/ http://www.unesco.org/isis/ http://www.unesco.org/idams/ http://www.greenstone.org
handling query errors
I'm new to Solr and just got things working. I can query my index and retrieve JSON results via HTTP GET using the wt=json and q=num_cpu parameters, e.g.: http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=num_cpu%3A16&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&debugQuery=on When the query is syntactically correct, Solr responds with HTTP status 200 and my JSON data. When the query is incorrect (e.g. specifying an undefined field), Solr responds with HTTP status 400 and HTML that explains the error. For example, when I submit number_cpu instead of num_cpu I get: <h1>HTTP Status 400 - undefined field number_cpu</h1> <hr size="1" noshade="noshade"> <p><b>type</b> Status report</p> <p><b>message</b> <u>undefined field number_cpu</u></p> <p><b>description</b> <u>The request sent by the client was syntactically incorrect (undefined field number_cpu).</u></p> What's the best way to programmatically access the error message 'undefined field number_cpu'? Is it possible to configure Solr to always return error messages in a different way? Thanks Alan
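One hedged option is to scrape the message out of the HTML error page. The sample body below is a reconstruction of the markup Alan quoted, and the approach is fragile because the page layout belongs to the servlet container, not Solr:

```python
# Sketch: extract the error message from a 400 response body.
# error_html is a reconstructed sample, not a live response.
import re

error_html = """<h1>HTTP Status 400 - undefined field number_cpu</h1>
<p><b>message</b> <u>undefined field number_cpu</u></p>
<p><b>description</b> <u>The request sent by the client was syntactically
incorrect (undefined field number_cpu).</u></p>"""

# the first <u>...</u> element holds the short message
m = re.search(r"<u>(.*?)</u>", error_html, re.S)
print(m.group(1))  # undefined field number_cpu
```

In practice you would first check the HTTP status code and only fall back to parsing when it is not 200.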
Re: wild card search and lower-casing
Here is one puzzle I couldn't yet find a key for: for the wild-card query: *ocvd SOLR 3.4 returns hits. But for *OCVD it doesn't This is a FAQ. Please see http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
Re: Multivalued Boolean Search
I have a multivalued field say MulField in my index that has values in a document like <str name="DocID">1</str> <arr name="MulField"><str>Auto Mobiles</str><str>Toyota Corolla</str></arr> Now let's say I specified a search criteria as +MulField:Mobiles +MulField:Toyota; my question is, is it possible that this document should not appear in the search results? No. You should index two separate documents in your example, e.g. normalize your data. However, there is one trick that you can use: issuing a phrase query may satisfy your needs, e.g. q=MulField:"Mobiles Toyota"~99 Use a big positionIncrementGap value in your field definition. http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
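A tiny model of why the trick works, assuming a positionIncrementGap of 1000 (the exact value is illustrative): the gap pushes the second value's term positions far enough apart that a sloppy phrase query cannot span the value boundary.

```python
# Sketch: term positions the analyzer would assign for the two values
# "Auto Mobiles" / "Toyota Corolla" with positionIncrementGap=1000.
gap = 1000
positions = {"auto": 0, "mobiles": 1, "toyota": 1 + gap, "corolla": 2 + gap}

def phrase_matches(p1, p2, slop):
    # slop needed for term2 to follow term1 as a phrase
    return (p2 - p1 - 1) <= slop

# "Auto Mobiles" matches as an exact phrase inside one value:
print(phrase_matches(positions["auto"], positions["mobiles"], 0))     # True
# "Mobiles Toyota"~99 cannot cross the value boundary:
print(phrase_matches(positions["mobiles"], positions["toyota"], 99))  # False
```

As long as the gap is larger than the slop you query with, matches are confined to a single value.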
Re: Can files be faceted based on their size ?
*Can fileSize be faceted?* I tried to facet it, but fileSize is of type string and cannot be range-faceted. I want to facet my doc and pdf files according to their size. I can calculate the file size, but it is of type string. What should I do in order to achieve that? Thanks in advance. Since you want to apply range faceting, you need to use trie-based types. http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
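A hedged sketch of what that could look like in a 3.x-era schema.xml (type and field names here are illustrative, not from the original thread):

```xml
<!-- Sketch: a trie-based long type that supports range faceting -->
<fieldType name="tlong" class="solr.TrieLongField"
           precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="fileSize" type="tlong" indexed="true" stored="true"/>
```

After reindexing, a range facet request could then look like facet=true&facet.range=fileSize&facet.range.start=0&facet.range.end=10485760&facet.range.gap=1048576 (1 MB buckets up to 10 MB; the bounds are examples).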
Re: wild card search and lower-casing
Hi Ahmet, Thanks for the link. I'm a bit puzzled by the explanation found there regarding lower-casing: These queries are case-insensitive anyway because QueryParser makes them lowercase. That's exactly what I want to achieve, but somehow the queries *are* case-sensitive. Probably I should play around with the code of the query parser. On Fri, Nov 18, 2011 at 2:50 PM, Ahmet Arslan iori...@yahoo.com wrote: Here is one puzzle I couldn't yet find a key for: for the wild-card query *ocvd, SOLR 3.4 returns hits, but for *OCVD it doesn't. This is a FAQ. Please see http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F -- Regards, Dmitry Kan
NewSolrCloud/ES Comparison
Hi, I know it's too early for this given that NewSolrCloud for Solr 4 is still in development, but what is interesting for those of us anxiously awaiting NewSolrCloud 4 is to understand how it compares to existing cloud-like search engines such as ElasticSearch (which I only recently learned about). I use Solr in my project and am hoping to gain some of the distributed features mentioned for ES with Solr. Does anyone know what the similarities/differences might be? Many thanks, Darren
Re: Implications of setting catenateAll=1
The main one is that you can get an explosion in the number of terms, depending on your input, especially if you have things that aren't regular text. Imagine partone-1 partone-2 partone-3 parttwo-1 parttwo-2 parttwo-3: if catenateAll is set to 0, you'd get 5 unique tokens here. If it were set to 1, you'd get 11 tokens. Which doesn't seem like a lot until you have hundreds of thousands of patterns like this. So give it a whirl and see what pops out with your particular corpus, but keep an eye on the number of unique terms that end up in the field. Best Erick On Thu, Nov 17, 2011 at 12:18 PM, Brendan Grainger brendan.grain...@gmail.com wrote: Hi, The default for catenateAll is 0, which we've been using on the WordDelimiterFilter. What would be the possibly negative implications of setting this to 1? So that wi-fi-800 would produce the tokens wi, fi, wifi, 800, wifi800, for example? Thanks
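Erick's arithmetic can be checked with a small simulation (a simplified model of what the filter emits, not the real Lucene WordDelimiterFilter):

```python
# Sketch: count unique terms produced from Erick's example inputs,
# with and without catenating all sub-parts.
import re

inputs = ["partone-1", "partone-2", "partone-3",
          "parttwo-1", "parttwo-2", "parttwo-3"]

def tokens(word, catenate_all):
    parts = re.split(r"[^A-Za-z0-9]+", word)  # split on delimiters
    out = list(parts)
    if catenate_all:
        out.append("".join(parts))            # e.g. "partone1"
    return out

def unique_terms(words, catenate_all):
    terms = set()
    for w in words:
        terms.update(tokens(w, catenate_all))
    return terms

print(len(unique_terms(inputs, False)))  # 5  (partone, parttwo, 1, 2, 3)
print(len(unique_terms(inputs, True)))   # 11 (plus 6 catenated forms)
```

The split parts are shared across inputs, but every catenated form is distinct, which is exactly why the term dictionary grows so fast on serial-number-like data.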
Re: wild card search and lower-casing
Hi Ahmet, Thanks for the link. I'm a bit puzzled by the explanation found there regarding lower-casing: These queries are case-insensitive anyway because QueryParser makes them lowercase. That's exactly what I want to achieve, but somehow the queries *are* case-sensitive. Probably I should play around with the code of the query parser. There is an effort for this: https://issues.apache.org/jira/browse/SOLR-218 You can vote for this issue. For the time being you can lowercase them on the client side.
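A minimal client-side sketch, assuming field names are already lowercase: lowercase only the tokens that contain a wildcard, so boolean operators like OR stay intact.

```python
# Sketch: lowercase wildcard terms before sending the query to Solr,
# since multiterm (wildcard) queries bypass the analysis chain here.
def lowercase_wildcards(q):
    # Only touch tokens containing a wildcard; leave operators and
    # plain terms alone. Assumes lowercase field names.
    return " ".join(t.lower() if ("*" in t or "?" in t) else t
                    for t in q.split())

print(lowercase_wildcards("title:*OCVD OR title:MOCV*"))
# title:*ocvd OR title:mocv*
```

This is deliberately naive (no handling of quoted phrases or mixed-case field names); it just shows the shape of the workaround.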
Re: wild card search and lower-casing
OK. Actually I have just checked the source code of Lucene's QueryParser, and lowercaseExpandedTerms there is set to true by default (version 3.4). The code there does lower-casing by default, so in that sense I shouldn't need to do anything in the client code. Is something wrong here? On Fri, Nov 18, 2011 at 3:49 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Ahmet, Thanks for the link. I'm a bit puzzled by the explanation found there regarding lower-casing: These queries are case-insensitive anyway because QueryParser makes them lowercase. That's exactly what I want to achieve, but somehow the queries *are* case-sensitive. Probably I should play around with the code of the query parser. There is an effort for this: https://issues.apache.org/jira/browse/SOLR-218 You can vote for this issue. For the time being you can lowercase them on the client side. -- Regards, Dmitry Kan
Re: wild card search and lower-casing
Actually I have just checked the source code of Lucene's QueryParser and lowercaseExpandedTerms there is set to true by default (version 3.4). The code there does lower-casing by default. So in that sense I don't need to do anything in the client code. Is something wrong here? But SolrQueryParser extends that, and the default behavior may differ. For clarification, see the source code of SolrQueryParser.
Re: wild card search and lower-casing
You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } OK, thanks for pointing that out. On Fri, Nov 18, 2011 at 4:12 PM, Ahmet Arslan iori...@yahoo.com wrote: Actually I have just checked the source code of Lucene's QueryParser and lowercaseExpandedTerms there is set to true by default (version 3.4). The code there does lower-casing by default. So in that sense I don't need to do anything in the client code. Is something wrong here? But SolrQueryParser extends that, and the default behavior may differ. For clarification, see the source code of SolrQueryParser. -- Regards, Dmitry Kan
Re: wild card search and lower-casing
You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowerCase() (with the default Locale), which is a Locale-sensitive operation. In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure if it has been ported to Solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
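To illustrate the locale pitfall: in a Turkish locale, Java's String.toLowerCase() maps 'I' to dotless 'ı' (U+0131), which would then no longer match terms indexed as ASCII 'i'. Python's str.lower() is not locale-sensitive, so the Turkish mapping is emulated explicitly here.

```python
# Sketch of the Turkish-I problem: a partial, explicit emulation of
# what Java's String.toLowerCase() does under a Turkish locale.
TURKISH_LOWER = {"I": "\u0131",   # dotless lowercase i
                 "\u0130": "i"}   # dotted uppercase İ

def turkish_lower(s):
    return "".join(TURKISH_LOWER.get(c, c.lower()) for c in s)

print("IPOD".lower())          # ipod  (root-locale behaviour)
print(turkish_lower("IPOD"))   # ıpod  (Turkish-locale behaviour)
```

The second result would match nothing if the index contains the term "ipod", which is why locale-blind lowercasing of expanded terms can silently kill wildcard queries.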
Re: Boosting is slow
Any ideas on this one? On Thu, Nov 17, 2011 at 3:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Sorry, the query is actually: http://localhost:8983/solr/mycore/search/?q=test{!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem, but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
Re: Boosting is slow
On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? I just tried something similar to this (a non-boosted query vs a simple boosted query) on a 10M document test index. #Non boosted q=myfield:[* TO *] dummy_i:1 #Boosted q={!boost b=2 v=$qq} qq=myfield:[* TO *] dummy_i:1 Notes: - the dummy term was just used to change the query so there would be no cache hit (I set it to something different for each try) - myfield is a single-valued field that only has 10 unique terms (so the range query should be fast), and it does select all 10M docs My results: normal=386ms boosted=481ms -Yonik http://www.lucidimagination.com
Solr filterCache size settings...
I am new to Solr in general and trying to get a handle on the memory requirements for caching. Specifically I am looking at the filterCache right now. The documentation on the size setting seems to indicate that it is the number of values to be cached. Did I read that correctly, or is it really the amount of memory that will be set aside for the cache? How do you determine how much cache each fq will consume? Thank you! -- Andrew Lundgren lundg...@familysearch.org
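As I understand it, the size setting is indeed a number of entries, not a byte budget. A back-of-the-envelope sketch of the memory side, assuming the common case where each cached filter is stored as a bitset with one bit per document in the index (small result sets may be stored more compactly, so treat this as an upper bound per entry; the numbers are illustrative):

```python
# Sketch: upper-bound memory estimate for filterCache entries,
# assuming one bit per document per cached filter.
def filter_entry_bytes(max_doc):
    return max_doc // 8  # bits -> bytes

max_doc = 20_000_000   # documents in the index (illustrative)
size = 512             # filterCache 'size' setting = entry count

per_entry = filter_entry_bytes(max_doc)
total_mb = per_entry * size / 2**20
print(per_entry)        # 2500000 bytes, about 2.4 MB per cached fq
print(round(total_mb))  # roughly 1221 MB if the cache fills up
```

So on a large index the entry count, not the entries themselves, is what you tune: a few hundred distinct fq values can already add up to a gigabyte-scale cache.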
RE: How to read values from dataimport.properties in a production environment
You can add it yourself in admin-extra.html. Ephraim Ofir -Original Message- From: Nico Luna [mailto:nicolaslun...@gmail.com] Sent: Friday, November 11, 2011 7:57 PM To: solr-user@lucene.apache.org Subject: How to read values from dataimport.properties in a production environment I'm trying to see the values stored in the dataimport.properties file in a production environment using the Solr admin, so I copied the same behaviour as [Schema] and [Config] but changed the contentType property (from contentType=text/xml to contentType=text): http://localhost:8080/solr/admin/file/?contentType=text;charset=utf-8&file=dataimport.properties I want to know if there is another way to see the dataimport.properties values from the Solr admin, and whether this could be a feature worth adding to the Solr admin. For example, adding to the index.jsp file: [ a href=file/?contentType=text;charset=utf-8&file=dataimport.properties dataimport.properties ] Thanks, Nicolás -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-values-from-dataimport-properties-in-a-production-environment-tp3500453p3500453.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: keeping master server indexes in sync after failover recovery
I am having a similar scenario: we have one primary master and one backup master. Load switching happens via a Big-IP load balancer. When the primary master goes down, the backup master becomes the active primary master. We added a health-check API in Solr; when the primary master comes back up, it compares the last delta/full import completion time with the server startup time, and the health API only returns Active status if the delta/full import completion time is greater than the server startup time. Hope this makes sense. Thanks, Umesh -- View this message in context: http://lucene.472066.n3.nabble.com/keeping-master-server-indexes-in-sync-after-failover-recovery-tp3497417p3520451.html Sent from the Solr - User mailing list archive at Nabble.com.
Index Update Strategy
What are the general schools of thought on how to update an index? I have a medium-volume OLTP SaaS system. I think my options are: 1) Run the DIH delta-query every minute to pull in changes 2) On update events in the app, asynchronously create a bean that represents my Solr doc, then use SolrJ to add/update the doc. Are there any other options? Any advice from the veterans who have been down this road before? Thank you. -- Sincerely, David Webb