Problem with AND clause in multi core search query
Hi, I have 2 cores configured in my Solr instance. Both cores use the same schema. I have indexed column1 in core0 and column2 in core1. My search query is

http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:B

No result found.

http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A OR column2:B

Is AND supported in multi-core search? Thanks, ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with date searching.
In fact I am able to see the scanneddate field when I add a query like this:

responseHeader: {
  q: "ibrahim.hamid 2012-02-02T04:00:52Z",
  qf: "userid scanneddate",
  wt: "json",
  defType: "dismax",
  version: "2.2",
  rows: 50
},
response: {
  numFound: 20, start: 0,
  docs: [
    { scanneddate: ["2012-02-02T04:00:52Z"], ... },
    ...
  ]
}

-- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3983801.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with date searching.
select/?defType=dismax&q=+ibrahim.hamid+2012-02-02T04:00:52Z&qf=+userid+scanneddate&version=2.2&start=0&rows=50&indent=on&wt=json&debugQuery=on -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3983802.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with AND clause in multi core search query
The latter is supposed to work:

http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A OR column2:B

The first query cannot work, because there is no document in either core0 or core1 that has A in field column1 AND B in field column2; there are only documents that have B in column2 (in core1) OR A in column1 (in core0).

Regards, Tommaso

2012/5/15 ravicv ravichandra...@gmail.com Hi, I have 2 cores configured in my Solr instance. Both cores use the same schema. I have indexed column1 in core0 and column2 in core1. My search query is http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:B No result found. http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A OR column2:B Is AND supported in multi-core search? Thanks, ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800.html Sent from the Solr - User mailing list archive at Nabble.com.
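Tommaso's point can be illustrated with a toy model of the two cores (hypothetical documents, assuming each document carries only the field indexed in its own core, as described in the thread):

```python
# Toy model: each shard holds documents with only one of the two fields
# populated, mirroring the setup described in the thread (hypothetical data).
core0 = [{"id": "1", "column1": "A"}]   # column1 is only indexed here
core1 = [{"id": "2", "column2": "B"}]   # column2 is only indexed here

def matches_and(doc):
    # q=column1:A AND column2:B requires BOTH fields on the SAME document
    return doc.get("column1") == "A" and doc.get("column2") == "B"

def matches_or(doc):
    # q=column1:A OR column2:B matches if EITHER field matches
    return doc.get("column1") == "A" or doc.get("column2") == "B"

all_docs = core0 + core1   # a sharded query unions per-shard results
and_hits = [d["id"] for d in all_docs if matches_and(d)]
or_hits = [d["id"] for d in all_docs if matches_or(d)]
print(and_hits)  # [] - no single document satisfies both clauses
print(or_hits)   # ['1', '2']
```

Sharding evaluates the whole query against each document individually, so AND across fields that never co-occur on one document can never match.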
Re: Show a portion of searchable text in Solr
Can somebody tell me where I should place the highlighting parameters? When I add them to the query, it is not working:

hl=true&hl.requireFieldMatch=true&hl.fl=*

FYI: I am new to Solr. My aim is to have emphasis tags on the queried words, and I need to display only the query-relevant snippet of the content. Thanks Shameema

On Mon, May 14, 2012 at 1:18 PM, Ahmet Arslan iori...@yahoo.com wrote: I have indexed very large documents; in some cases these documents have 100,000 characters. Is there a way to return a portion of the documents (say the first 300 characters) when I am querying Solr? Is there any attribute to set in schema.xml or solrconfig.xml to achieve this? I have a set-up with very large documents too. Here are two different solutions that I have used in the past:

1) Use highlighting with hl.alternateField and hl.maxAlternateFieldLength: http://wiki.apache.org/solr/HighlightingParameters

2) Create an extra field (indexed="false" and stored="true") using copyField, just for display purposes (fl=shortField):

<copyField source="largeField" dest="shortField" maxChars="300"/>

http://wiki.apache.org/solr/SchemaXml#Copy_Fields

Also, I didn't use it myself yet, but I *think* this can be accomplished by using a custom Transformer too: http://wiki.apache.org/solr/DocTransformers
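The copyField/maxChars approach keeps roughly the first N characters at index time. The same idea can be sketched client-side (a simple illustration only; Solr's copyField cuts at exactly maxChars, while this version also avoids cutting mid-word):

```python
def snippet(text, max_chars=300):
    """Return roughly the first max_chars characters, cut at a word
    boundary so the snippet does not end mid-word (client-side sketch;
    Solr's copyField maxChars performs a hard cut at max_chars)."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)
    if cut == -1:          # no space found: fall back to a hard cut
        cut = max_chars
    return text[:cut] + "..."

print(snippet("word " * 100, 30))
```

Doing this at index time (option 2 in Ahmet's reply) is usually preferable, since the stored large field never needs to travel over the wire.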
Re: Problem with AND clause in multi core search query
Thanks Tommaso. Could you please tell me, is there any way to get this scenario to work?

http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:B

Is there any way we can achieve the scenario below? query: column1:A should be searched in core0 and column2:B should be searched in core1, and later the results from both queries should be combined with AND to give the final response, since both return a common field in the response.

For reference my schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="value" type="string" indexed="true" stored="true"/>
<field name="column1" type="string" indexed="true" stored="true"/>
<field name="column2" type="string" indexed="true" stored="true"/>

Thanks, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800p3983806.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with date searching.
If I use q=scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z] it works fine, but when I try it with a dismax query it does not work. EX:

select/?defType=dismax&q=[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]&qf=scanneddate&version=2.2&start=0&rows=50&indent=on&wt=json&debugQuery=on

Please comment on the same. -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3983807.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR Security
Thanks for the suggestions. I tried to use SolrJ within my Servlet, but the SolrJ QueryResponse does not return a well-formed JSON object. I need the JSON string with quotes, as below; QueryResponse.toString() doesn't return JSON with quotes at all:

jsonp1337064466204({responseHeader:{status:0,QTime:0,params:{json.wrf:jsonp1337064466204,facet:true,facet.mincount:1,q:*:*,facet.limit:-1,json.nl:map,facet.field:[title,abstract],wt:json,rows:0}},response:{numFound:0,start:0,docs:[]},facet_counts:{facet_queries:{},facet_fields:{title:{},abstract:{}},facet_dates:{},facet_ranges:{}}})

Regards Anupam

On Fri, May 11, 2012 at 7:56 PM, Welty, Richard rwe...@ltionline.com wrote: In fact, there's a sample proxy.php on the ajax-solr web page which can easily be modified into a security layer. My Solr servers only listen to requests issued by a narrow list of systems, and everything gets routed through a modified copy of the proxy.php file, which checks whether the user is logged in and adds terms to the query to limit returned results to those the user is permitted to see.

-Original Message- From: Jan Høydahl [mailto:j...@hoydahl.no] Sent: Fri 5/11/2012 9:45 AM To: solr-user@lucene.apache.org Subject: Re: SOLR Security

Hi, there is nothing stopping you from pointing Ajax-Solr to a URL on your app server, which acts as a security insulation layer between the Solr backend and the world. In this (thin) layer you can analyze the input and choose carefully what to let through and what not.

-- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com

On 11. mai 2012, at 06:37, Anupam Bhattacharya wrote: Yes, I agree with you. But the Ajax-Solr framework doesn't fit in that manner. Any alternative solution?
Anupam

On Fri, May 11, 2012 at 9:41 AM, Klostermeyer, Michael mklosterme...@riskexchange.com wrote: Instead of hitting the Solr server directly from the client, I think I would go through your application server, which has access to all the user data and can forward it to the Solr server, thereby hiding it from the client. Mike

-Original Message- From: Anupam Bhattacharya [mailto:anupam...@gmail.com] Sent: Thursday, May 10, 2012 9:53 PM To: solr-user@lucene.apache.org Subject: SOLR Security

I am using the Ajax-Solr framework for creating a search interface, and the search interface works well. In my case the results have document-level security, so indexing records with their authorized users lets me filter results per user based on the user's authentication. The problem is that I always have to pass a parameter userid={xyz} to the Solr server, which anyone can discover from the Solr URL (the Ajax call URL) using the Firebug Net console in Firefox, and can then change this parameter value to see other users' records which he/she is not authorized to see. Basically it is a cross-site scripting issue. I have read about some approaches to Solr security, like Nginx, or Jetty with .htaccess-based security. Overall, what I understand from this is that we can restrict users from doing update/delete operations on Solr, and we can restrict the Solr admin interface to certain IPs. But how can I restrict the {solr-server}/solr/select results from access by different user ids?
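The proxy-layer idea Richard and Jan describe can be sketched as a tiny request filter: the client-supplied userid (and any client-supplied fq) is discarded, and a server-enforced filter derived from the session is appended before the query is forwarded to Solr. Names and the parameter layout here are hypothetical; a real layer would live in your app server:

```python
from urllib.parse import urlencode

def build_solr_params(client_params, session_userid):
    """Rebuild the outgoing Solr query string: drop any client-supplied
    userid or fq and append a server-enforced filter instead (sketch)."""
    params = {k: v for k, v in client_params.items()
              if k not in ("userid", "fq")}      # never trust these from the client
    params["fq"] = "userid:%s" % session_userid  # enforced server-side
    return urlencode(params)

# A tampered request claiming someone else's userid is overridden:
qs = build_solr_params({"q": "report", "userid": "victim"}, "alice")
print(qs)  # q=report&fq=userid%3Aalice
```

The key design point: the security-relevant parameter never originates from the browser, so Firebug-style tampering has nothing to tamper with.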
Re: Boosting on field empty or not
Basically I want documents that have a given field populated to have a higher score than the documents that don't. So if you search for foo, I want documents that contain foo, but I want the documents that have field a populated to have a higher score...

Hi Donald, since you are using edismax, it is better to use bq (boost query) for this:

bq=reqularprice:[* TO *]^50

http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29

defType=edismax&qf=nameSuggest^10 name^10 codeTXT^2 description^1 brand_search^0 cat_search^10&q=chairs&bq=reqularprice:[* TO *]^50
Query regarding multi core search
Hi, I want to configure 2 cores in my Solr instance. Now I want to query core0 with one query and core1 with a different query, and finally merge the results. Please suggest the best way to do this. Thanks, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Query-regarding-multi-core-search-tp3983813.html Sent from the Solr - User mailing list archive at Nabble.com.
simple query help
Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter
Re: Multi-words synonyms matching
Without reading the whole thread, let me say that you should not trust the Solr admin analysis page. It takes the whole multiword search and runs it all together at once through each analyzer step (factory). But this is not how the real system works.

First pitfall: the query parser also splits at whitespace (if it is not a phrase query). Because of this, a multiword query is sent chunk by chunk through the analyzer, and, second pitfall, each chunk runs through the whole analyzer on its own. So if you are dealing with multiword synonyms you have the following problems: either you turn your query into a phrase, so that the whole phrase is analyzed at once and therefore looked up as a multiword synonym (but phrase queries are not analyzed!!!), or you send your query chunk by chunk through the analyzer, but then the chunks are not multiwords anymore and are not found in your synonyms.txt.

From my experience I can say that it requires some deep work to get it done, but it is possible. I have connected a thesaurus to Solr which does query-time expansion (no need to reindex if the thesaurus changes). The thesaurus holds synonyms and used-for terms in 24 languages, so it is also some kind of language translation. And naturally the thesaurus translates from single-term to multi-term synonyms and vice versa. Regards, Bernd

Am 14.05.2012 13:54, schrieb elisabeth benoit: Just for the record, I'd like to conclude this thread. First, you were right, there was no behaviour difference between the fq and q parameters. I realized that: 1) my synonym (hotel de ville) has a stopword in it (de), and since I used tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms declaration, there was no stopword removal in the indexed expression; so when requesting hotel de ville, after stopword removal in the query, Solr was comparing hotel de ville with hotel ville. But my queries never even got to that point, since 2) I made a mistake using mairie alone in the admin interface when testing my schema.
The real field was something like collectivités territoriales mairie, so the synonym hotel de ville was not even applied, because the tokenizerFactory=solr.KeywordTokenizerFactory in my synonym definition does not split the field into words when parsing. So my problem is not solved, and I'm considering solving it outside of Solr's scope, unless someone else has a clue. Thanks again, Elisabeth

2012/4/25 Erick Erickson erickerick...@gmail.com A little farther down the debug info output you'll find something like this (I specified fq=name:features):

<arr name="parsed_filter_queries"><str>name:features</str></arr>

so it may well give you some clue. But unless I'm reading things wrong, your q goes against a field that has much more information than the CATEGORY_ANALYZED field; is it possible that the data from your test cases simply isn't _in_ CATEGORY_ANALYZED? Best Erick

On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit elisaelisael...@gmail.com wrote: I'm not at the office until next Wednesday and I don't have my Solr at hand, but isn't debugQuery=on giving information only about q parameter matching and nothing about the fq parameter? Or do you mean parsed_filter_queries gives information about fq? CATEGORY_ANALYZED is populated by a copyField instruction in schema.xml and has the same field type as my catchall field, the search field for my searchHandler (the one used by the q parameter). CATEGORY (a string) is copied into CATEGORY_ANALYZED (field type is text). CATEGORY (a string) is also copied into the catchall field (field type is text), along with a lot of other fields. So as far as I can see, the same analysis should be done in both cases, but obviously I'm missing something, and the only thing I can think of is a different behavior between the q and fq parameters. I'll check that parsed_filter_queries first thing in the morning next Wednesday. Thanks a lot for your help.
Elisabeth

2012/4/24 Erick Erickson erickerick...@gmail.com Elisabeth: What shows up in the debug section of the response when you add debugQuery=on? There should be some part of that section like: parsed_filter_queries. My other question is: are you absolutely sure that your CATEGORY_ANALYZED field has the correct content? How does it get populated? Nothing jumps out at me here. Best Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit elisaelisael...@gmail.com wrote: yes, thanks, but this is NOT my question. I was wondering why I have multiple matches with q=hotel de ville and no match with fq=CATEGORY_ANALYZED:hotel de ville, since in both cases I'm searching in the same Solr fieldType. Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other? Does anyone know? Thanks, Elisabeth

2012/4/24 Jeevanandam je...@myjeeva.com usage of q and fq: q is typically the main query for the search
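The query-time expansion Bernd describes, matching multiword synonym entries before the parser splits on whitespace, can be sketched as a longest-match scan over the raw query string. The synonym entries here are toy data, and a real implementation would handle quoting and overlaps more carefully:

```python
# Toy query-time expansion: find multiword synonym keys in the raw query
# BEFORE whitespace splitting, and expand them into an OR group (sketch).
SYNONYMS = {
    "hotel de ville": ["mairie"],   # hypothetical thesaurus entries
    "laptop": ["notebook"],
}

def expand(query):
    out = query
    # Match longer keys first so "hotel de ville" wins over any sub-phrase.
    for key in sorted(SYNONYMS, key=len, reverse=True):
        if key in out:
            alternatives = ['"%s"' % key] + ['"%s"' % s for s in SYNONYMS[key]]
            out = out.replace(key, "(" + " OR ".join(alternatives) + ")")
    return out

print(expand("hotel de ville paris"))
# ("hotel de ville" OR "mairie") paris
```

Quoting each alternative keeps multiword synonyms as phrases, sidestepping the whitespace-splitting pitfall described above; the expanded string is then handed to the normal query parser.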
query with DATE FIELD AND RANGE query using dismax
Hi, my queries are working with the standard query handler but not in dismax.

*It is working fine:* EX: q=scanneddate:[2012-02-02T01:30:52Z TO 2011-09-22T22:40:30Z]

*Not working:* EX: defType=dismax&q=[2012-02-02T01:30:52Z TO 2011-09-22T22:40:30Z]&qf=scanneddate

How can I check for date ranges using Solr's dismax query handler? -- View this message in context: http://lucene.472066.n3.nabble.com/query-with-DATE-FIELD-AND-RANGE-query-using-dismax-tp3983819.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: authentication for solr admin page?
I have written an article on this, covering the various steps to restrict / authenticate the Solr admin interface: http://www.findbestopensource.com/article-detail/restrict-solr-admin-access Regards Aditya www.findbestopensource.com

On Thu, Mar 29, 2012 at 1:06 AM, geeky2 gee...@hotmail.com wrote: update - ok - I was reading about replication here: http://wiki.apache.org/solr/SolrReplication and noticed comments in the solrconfig.xml file related to HTTP Basic Authentication and the usage of the following tags:

<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str>

*Can I place these tags in the request handler to achieve an authentication scheme for the /admin page?*

// snipped from the solrconfig.xml file
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>

thanks for any help, mark -- View this message in context: http://lucene.472066.n3.nabble.com/authentication-for-solr-admin-page-tp3865665p3865747.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query with DATE FIELD AND RANGE query using dismax
Hi, you can't. Try eDisMax instead: http://wiki.apache.org/solr/ExtendedDisMax -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com

On 15. mai 2012, at 11:05, ayyappan wrote: Hi, my queries are working with the standard query handler but not in dismax. *It is working fine:* EX: q=scanneddate:[2012-02-02T01:30:52Z TO 2011-09-22T22:40:30Z] *Not working:* EX: defType=dismax&q=[2012-02-02T01:30:52Z TO 2011-09-22T22:40:30Z]&qf=scanneddate How can I check for date ranges using Solr's dismax query handler? -- View this message in context: http://lucene.472066.n3.nabble.com/query-with-DATE-FIELD-AND-RANGE-query-using-dismax-tp3983819.html Sent from the Solr - User mailing list archive at Nabble.com.
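Besides switching to eDisMax, a common workaround is to keep dismax for the free-text part and move the range into an fq, which is parsed by the standard lucene parser. A sketch of building such a request URL (host, field names, and the free-text term are taken from the thread; the exact parameter layout here is an assumption):

```python
from urllib.parse import urlencode

# dismax cannot parse range syntax in q, so the range goes into fq,
# which always uses the standard lucene query parser.
params = {
    "defType": "dismax",
    "q": "ibrahim.hamid",                 # free-text part stays in q
    "qf": "userid",
    "fq": "scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]",
    "wt": "json",
    "rows": 50,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

As a bonus, putting the range in fq makes it cacheable in the filter cache, which is usually what you want for date windows anyway.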
Re: adding an OR to a fq makes some doc that matched not match anymore
That does not change the results for me:

suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B))&debugQuery=true
found 1

suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B)+OR+name:aa)&debugQuery=true
found 0

Looks like a bug? xab -- View this message in context: http://lucene.472066.n3.nabble.com/adding-an-OR-to-a-fq-makes-some-doc-that-matched-not-match-anymore-tp3983775p3983828.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: adding an OR to a fq makes some doc that matched not match anymore
That does not change the results for me:

suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B))&debugQuery=true
found 1

suggest?q=suggest_terms:lap*&fq=type:P&fq=((-type:B)+OR+name:aa)&debugQuery=true
found 0

A negative clause combined with an OR clause does not work like this. fq=+*:* -type:B name:aa should work.
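The behavior can be modeled with set semantics: a purely negative query has nothing to subtract from unless it is top-level, where Solr implicitly rewrites it as *:* minus the negated clause; inside an OR group that rewrite does not happen, so the negative clause contributes no documents. A toy simulation with hypothetical documents:

```python
# Hypothetical docs standing in for the suggest index in the thread.
docs = [
    {"id": 1, "type": "P", "name": "laptop"},
    {"id": 2, "type": "B", "name": "aa"},
]
all_ids = {d["id"] for d in docs}

def match(field, value):
    return {d["id"] for d in docs if d.get(field) == value}

fq_type_P = match("type", "P")                     # fq=type:P -> {1}

# fq=((-type:B)): a TOP-LEVEL pure-negative query is rewritten to *:* -type:B
fq_neg = all_ids - match("type", "B")
print(fq_type_P & fq_neg)                          # found 1

# fq=((-type:B) OR name:aa): inside the OR the negative clause is no longer
# top-level and contributes NO documents; only name:aa does
fq_neg_or = set() | match("name", "aa")
print(fq_type_P & fq_neg_or)                       # found 0

# Ahmet's fix: fq=+*:* -type:B name:aa (required match-all, then exclude B;
# name:aa is merely an optional SHOULD clause)
fq_fixed = all_ids - match("type", "B")
print(fq_type_P & fq_fixed)                        # found 1 again
```

Making *:* an explicit required clause gives the negative clause something to subtract from, which is exactly what the suggested fq does.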
Re: simple query help
Hi, you should use parentheses, have you tried that? q=(skcode:2021051 and flength:368.0) or (skcode:2021049 and ent_no:1040970907) http://robotlibrarian.billdueber.com/solr-and-boolean-operators/ Bye, Andras

2012/5/15 Peter Kirk p...@alpha-solutions.dk Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter
RE: simple query help
Hi - thanks for the response. Yes I have tried with parentheses, to group as you suggest. It doesn't make a difference. But now I'm thinking there's something completely odd - and I wonder if it's necessary to use a special search-handler to achieve what I want. For example, if I execute q=(skcode:2021051 AND flength:368.0) I get no results. If I omit the parentheses, I get 1 result. (Let alone trying to combine several Boolean clauses). /Peter -Original Message- From: András Bártházi [mailto:and...@barthazi.hu] Sent: 15. maj 2012 12:51 To: solr-user@lucene.apache.org Subject: Re: simple query help Hi, You should use parantheses, have you tried that? q=(skcode:2021051 and flength:368.0) or (skcode:2021049 and ent_no:1040970907) http://robotlibrarian.billdueber.com/solr-and-boolean-operators/ Bye, Andras 2012/5/15 Peter Kirk p...@alpha-solutions.dk Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter
RE: simple query help
It doesn't make a difference. But now I'm thinking there's something completely odd - and I wonder if it's necessary to use a special search-handler to achieve what I want. For example, if I execute q=(skcode:2021051 AND flength:368.0) I get no results. If I omit the parentheses, I get 1 result. (Let alone trying to combine several Boolean clauses). Which query parser are you using?
Re: - Solr 4.0 - How do I enable JSP support ? ...
What do you mean by JSP support? What is it you're trying to do with JSP? What servlet container are you using? Details matter. Best Erick On Mon, May 14, 2012 at 5:34 PM, Naga Vijayapuram nvija...@tibco.com wrote: Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
Re: document cache
Yes. In fact, all the caches get flushed on every commit/replication cycle. Some of the caches get autowarmed when a new searcher is opened, which happens... you guessed it... every time a commit/replication happens. Best Erick

On Tue, May 15, 2012 at 1:32 AM, shinkanze rajatrastogi...@gmail.com wrote: hi, I want to know the internal mechanism of how the document cache works, specifically its flushing cycle, i.e. does it get flushed on every commit/replication? regards Rajat Rastogi -- View this message in context: http://lucene.472066.n3.nabble.com/document-cache-tp3983796.html Sent from the Solr - User mailing list archive at Nabble.com.
Issue in Applying patch file
Hi, we have checked out the latest version of the Solr source code from svn and are trying to apply the following patch to it: https://issues.apache.org/jira/browse/SOLR-3430

While applying the patch using Eclipse (i.e. Team > Apply Patch), we get cross marks for certain Java files; only the following file gets updated, and we can see the patch changes for it alone: solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded.java

Why is the patch not applied for the other Java files present in the patch file? Sometimes we also get a "file does not exist" error even though the corresponding files are present. And when I try to build with ant after applying the patch, I get the following error: common-build.xml:949: Error starting modern compiler

Can you tell me if I'm missing anything? Can you please guide me on this? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-in-Applying-patch-file-tp3983842.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: simple query help
Hi, it is AND (uppercase), not and (lowercase), and OR instead of or. Regards, Peter

2012/5/15 András Bártházi and...@barthazi.hu: Hi, you should use parentheses, have you tried that? q=(skcode:2021051 and flength:368.0) or (skcode:2021049 and ent_no:1040970907) http://robotlibrarian.billdueber.com/solr-and-boolean-operators/ Bye, Andras 2012/5/15 Peter Kirk p...@alpha-solutions.dk Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter -- Péter Király eXtensible Catalog http://eXtensibleCatalog.org http://drupal.org/project/xc
Re: Problem with AND clause in multi core search query
I really don't understand what you're trying to achieve.

query: column1:A should be searched in core0 and column2:B should be searched in core1, and later the results from both queries should be combined with AND to give the final response?

core1 and core0 are completely separate cores, with separate documents. The only relationship between documents in the two cores is that they should conform to the same schema, since you're using shards. So saying that your query should search just one column in each core and then AND the results really doesn't make any sense to me. I suspect there are some assumptions you're not explicitly stating about the relationship between documents in separate cores that would help here... Best Erick

On Tue, May 15, 2012 at 3:07 AM, ravicv ravichandra...@gmail.com wrote: Thanks Tommaso. Could you please tell me, is there any way to get this scenario to work? http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:B Is there any way we can achieve the scenario below? query: column1:A should be searched in core0 and column2:B should be searched in core1, and later the results from both queries should be combined with AND to give the final response, since both return a common field in the response. For reference my schema is: <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="value" type="string" indexed="true" stored="true"/> <field name="column1" type="string" indexed="true" stored="true"/> <field name="column2" type="string" indexed="true" stored="true"/> Thanks, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800p3983806.html Sent from the Solr - User mailing list archive at Nabble.com.
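If the two columns really must be ANDed across cores, one workable pattern outside Solr's sharding is to run each query against its own core and intersect the results on the shared field in the application. A sketch using set intersection over the common `value` field (the response data here is hypothetical and stands in for two real /select calls made via HTTP or SolrJ):

```python
# Hypothetical per-core responses; in practice these would come from
# two separate /select requests, one against each core.
core0_hits = [{"id": "10", "value": "x"}, {"id": "11", "value": "y"}]  # q=column1:A
core1_hits = [{"id": "20", "value": "y"}, {"id": "21", "value": "z"}]  # q=column2:B

def and_merge(hits_a, hits_b, key="value"):
    """Client-side AND: keep only values of the shared key that occur
    in both result lists (sketch)."""
    common = {d[key] for d in hits_a} & {d[key] for d in hits_b}
    return sorted(common)

print(and_merge(core0_hits, core1_hits))  # ['y']
```

This is effectively a client-side join, so it only scales while both result sets are small enough to fetch; for large sets the data usually belongs in one core.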
Re: - Solr 4.0 - How do I enable JSP support ? ...
Afaik we disabled JSP functionality in SOLR-3159 while upgrading Jetty. On Tuesday, May 15, 2012 at 1:44 PM, Erick Erickson wrote: What do you mean by JSP support? What is it you're trying to do with JSP? What servlet container are you using? Details matter. Best Erick On Mon, May 14, 2012 at 5:34 PM, Naga Vijayapuram nvija...@tibco.com (mailto:nvija...@tibco.com) wrote: Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
RE: simple query help
Hi, if I understand the terms correctly, the search handler was configured to use edismax. The start of the configuration in solrconfig.xml looks like this:

<requestHandler name="/search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>

In any case, when I commented out the defType entry and restarted the Solr webapp, things began to function as I expected. But whether or not it was simply the act of restarting, I'm not sure. (I had also found out that AND and OR should be written in uppercase, but this made no difference until after I had restarted.) Thanks for your time, Peter

-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 15. maj 2012 13:25 To: solr-user@lucene.apache.org Subject: RE: simple query help

It doesn't make a difference. But now I'm thinking there's something completely odd, and I wonder if it's necessary to use a special search handler to achieve what I want. For example, if I execute q=(skcode:2021051 AND flength:368.0) I get no results. If I omit the parentheses, I get 1 result. (Let alone trying to combine several Boolean clauses.) Which query parser are you using?
Re: simple query help
Are you using the edismax query parser (which permits lower case and and or operators)? If so, there is a bug with parenthesized sub-queries. If you have a left parenthesis immediately before a field name (which you do in this case) the query fails. The short-term workaround is to place a space between the left parenthesis and the field name. See: https://issues.apache.org/jira/browse/SOLR-3377 -- Jack Krupansky -Original Message- From: Peter Kirk Sent: Tuesday, May 15, 2012 7:04 AM To: solr-user@lucene.apache.org Subject: RE: simple query help Hi - thanks for the response. Yes I have tried with parentheses, to group as you suggest. It doesn't make a difference. But now I'm thinking there's something completely odd - and I wonder if it's necessary to use a special search-handler to achieve what I want. For example, if I execute q=(skcode:2021051 AND flength:368.0) I get no results. If I omit the parentheses, I get 1 result. (Let alone trying to combine several Boolean clauses). /Peter -Original Message- From: András Bártházi [mailto:and...@barthazi.hu] Sent: 15. maj 2012 12:51 To: solr-user@lucene.apache.org Subject: Re: simple query help Hi, You should use parantheses, have you tried that? q=(skcode:2021051 and flength:368.0) or (skcode:2021049 and ent_no:1040970907) http://robotlibrarian.billdueber.com/solr-and-boolean-operators/ Bye, Andras 2012/5/15 Peter Kirk p...@alpha-solutions.dk Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter
RE: simple query help
But whether or not it was simply the act of restarting - I'm not sure. (I had also found out that AND and OR should be written in uppercase, but this made no difference until after I had restarted). By the way, there is a control parameter for this. lowercaseOperators A Boolean parameter indicating if lowercase and and or should be treated the same as operators AND and OR. http://lucidworks.lucidimagination.com/display/solr/The+Extended+DisMax+Query+Parser
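Where lowercaseOperators is not available (or when targeting the classic parser, which only recognizes uppercase operators), a small client-side normalization can uppercase bare and/or tokens before the query is sent. A naive sketch that splits on whitespace and deliberately does not handle quoted phrases:

```python
def uppercase_operators(query):
    """Uppercase bare 'and'/'or' tokens so the classic query parser
    treats them as boolean operators (naive sketch: quoted phrases
    containing the words 'and'/'or' would be mangled)."""
    return " ".join(tok.upper() if tok.lower() in ("and", "or") else tok
                    for tok in query.split())

print(uppercase_operators("skcode:2021051 and flength:368.0"))
# skcode:2021051 AND flength:368.0
```

With edismax, setting lowercaseOperators in the handler defaults is the cleaner option, since it keeps the rewriting inside Solr.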
Re: simple query help
By removing the defType you reverted to using the traditional Solr/Lucene query parser, which supports the particular query syntax you used (as long as AND is in uppercase) and does not have the parenthesis bug of edismax. -- Jack Krupansky

-Original Message- From: Peter Kirk Sent: Tuesday, May 15, 2012 8:23 AM To: solr-user@lucene.apache.org Subject: RE: simple query help

Hi, if I understand the terms correctly, the search handler was configured to use edismax. The start of the configuration in solrconfig.xml looks like this:

<requestHandler name="/search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>

In any case, when I commented out the defType entry and restarted the Solr webapp, things began to function as I expected. But whether or not it was simply the act of restarting, I'm not sure. (I had also found out that AND and OR should be written in uppercase, but this made no difference until after I had restarted.) Thanks for your time, Peter

-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 15. maj 2012 13:25 To: solr-user@lucene.apache.org Subject: RE: simple query help

It doesn't make a difference. But now I'm thinking there's something completely odd, and I wonder if it's necessary to use a special search handler to achieve what I want. For example, if I execute q=(skcode:2021051 AND flength:368.0) I get no results. If I omit the parentheses, I get 1 result. (Let alone trying to combine several Boolean clauses.) Which query parser are you using?
Re: simple query help
Yes, the parentheses are needed to prioritize the operator precedence (do the ANDs and then OR those results.) And, add a space after both left parentheses to account for the edismax bug. (https://issues.apache.org/jira/browse/SOLR-3377) -- Jack Krupansky -Original Message- From: András Bártházi Sent: Tuesday, May 15, 2012 6:50 AM To: solr-user@lucene.apache.org Subject: Re: simple query help Hi, You should use parantheses, have you tried that? q=(skcode:2021051 and flength:368.0) or (skcode:2021049 and ent_no:1040970907) http://robotlibrarian.billdueber.com/solr-and-boolean-operators/ Bye, Andras 2012/5/15 Peter Kirk p...@alpha-solutions.dk Hi Can someone please give me some help with a simple query. If I search q=skcode:2021051 and flength:368.0 I get 1 document returned (doc A) If I search q=skcode:2021049 and ent_no:1040970907 I get 1 document returned (doc B) But if I search q=skcode:2021051 and flength:368.0 or skcode:2021049 and ent_no:1040970907 I get no documents returned. Shouldn't I get both docA and docB? Thanks, Peter
Solr tmp working directory
Hi :) I'm using SolrJ to index documents. I noticed that during the indexing process, .tmp files are created in my /tmp folder. These files contain the XML add commands for the documents I add to the index. Can I change this folder in the Solr config, and where is it? Thanks, Gary
Re: Show a portion of searchable text in Solr
See the /browse request handler in the example config. Only stored fields will be highlighted. -- Jack Krupansky -Original Message- From: Shameema Umer Sent: Tuesday, May 15, 2012 2:59 AM To: solr-user@lucene.apache.org Subject: Re: Show a portion of searchable text in Solr Can somebody tell me where I should place the highlighting parameters? When I add them to the query, it is not working: hl=true&hl.requireFieldMatch=true&hl.fl=* FYI: I am new to Solr. My aim is to have emphasis tags on the queried words and to display only the query-relevant snippet of the content. Thanks Shameema On Mon, May 14, 2012 at 1:18 PM, Ahmet Arslan iori...@yahoo.com wrote: I have indexed very large documents. In some cases these documents have 100,000 characters. Is there a way to return a portion of the documents (let's say the first 300 characters) when I am querying Solr? Is there any attribute to set in the schema.xml or solrconfig.xml to achieve this? I have a set-up with very large documents too. Here are two different solutions that I have used in the past: 1) Use highlighting with hl.alternateField and hl.maxAlternateFieldLength http://wiki.apache.org/solr/HighlightingParameters 2) Create an extra field (indexed=false and stored=true) using copyField just for display purposes. (fl=shortField) <copyField source="largeField" dest="shortField" maxChars="300"/> http://wiki.apache.org/solr/SchemaXml#Copy_Fields Also, I haven't used it myself yet, but I *think* this can be accomplished by using a custom Transformer too. http://wiki.apache.org/solr/DocTransformers
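A sketch of option 1 from Ahmet's reply as a single request (the field name content and the host/port are assumptions, not taken from Shameema's setup):

```
http://localhost:8983/solr/select?q=foo
  &hl=true&hl.fl=content
  &hl.alternateField=content&hl.maxAlternateFieldLength=300
```

If the query terms match, the highlighter returns snippets with emphasis tags; if not, hl.alternateField falls back to the first 300 characters of the stored field.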
Re: Boosting on field empty or not
The problem with what you provided is it is boosting ALL documents whether the field is empty or not On Tue, May 15, 2012 at 3:52 AM, Ahmet Arslan iori...@yahoo.com wrote: Basically I want documents that have a given field populated to have a higher score than the documents that don't. So if you search for foo I want documents that contain foo, but I want the documents that have field a populated to have a higher score... Hi Donald, Since you are using edismax, it is better to use bq (boost query) for this. bq=regularprice:[* TO *]^50 http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 defType=edismax&qf=nameSuggest^10 name^10 codeTXT^2 description^1 brand_search^0 cat_search^10&q=chairs&bq=regularprice:[* TO *]^50
Re: Boosting on field empty or not
The problem with what you provided is it is boosting ALL documents whether the field is empty or not Then all of your fields are non-empty? What is the type of your field?
Re: Solr tmp working directory
Solr is probably simply using the Java JVM default. Set the java.io.tmpdir system property. Something equivalent to the following: java -Djava.io.tmpdir=/mytempdir ... On Windows you can set the TMP environment variable. -- Jack Krupansky -Original Message- From: G.Long Sent: Tuesday, May 15, 2012 9:04 AM To: solr-user@lucene.apache.org Subject: Solr tmp working directory Hi :) I'm using SolrJ to index documents. I noticed that during the indexing process, .tmp files are created in my /tmp folder. These files contain the xml commands add for the documents I add to the index. Can I change this folder in Solr config and where is it? Thanks, Gary
Re: Solr tmp working directory
Thank you :) Gary Le 15/05/2012 15:27, Jack Krupansky a écrit : Solr is probably simply using the Java JVM default. Set the java.io.tmpdir system property. Something equivalent to the following: java -Djava.io.tmpdir=/mytempdir ... On Windows you can set the TMP environment variable. -- Jack Krupansky -Original Message- From: G.Long Sent: Tuesday, May 15, 2012 9:04 AM To: solr-user@lucene.apache.org Subject: Solr tmp working directory Hi :) I'm using SolrJ to index documents. I noticed that during the indexing process, .tmp files are created in my /tmp folder. These files contain the xml commands add for the documents I add to the index. Can I change this folder in Solr config and where is it? Thanks, Gary
Re: adding an OR to a fq makes some doc that matched not match anymore
oh yeah, forgot about negatives and *:*... thanks -- View this message in context: http://lucene.472066.n3.nabble.com/adding-an-OR-to-a-fq-makes-some-doc-that-matched-not-match-anymore-tp3983775p3983863.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting on field empty or not
The problem with what you provided is it is boosting ALL documents whether the field is empty or not Then all of your fields are non-empty? What is the type of your field? How do you feed your documents to Solr? Maybe you are indexing an empty string? Is your field indexed=true? http://wiki.apache.org/solr/SolrQuerySyntax#Differences_From_Lucene_Query_Parser -field:[* TO *] finds all documents without a value for field Another approach is to use default=SOMETHING in your field definition (schema.xml): <field name="id" type="int" indexed="true" stored="true" default="0"/> Then you can use field:SOMETHING to retrieve empty fields. +*:* -field:SOMETHING retrieves non-empty documents.
Re: Editing long Solr URLs - Chrome Extension
Jan Thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases. Thanks Amit On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote: I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straight forward. A few wishes: * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements * If you edit a value and click TAB, it is not updated :( * It should not be necessary to URLencode all non-ascii chars - why not leave colon, caret (^) etc as is, for better readability? * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 11. mai 2012, at 07:32, Amit Nithian wrote: Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough/don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a Chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string, but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Re: Boosting on field empty or not
Let's go back to this step where things look correct, but we ran into the edismax bug which requires that you put a space between each left parenthesis and field name. First, verify whether you are using edismax or not. Then, change: q=chairs AND (regularprice:*^5 OR (*:* -regularprice:*)^0.5)sort=score desc to q=chairs AND ( regularprice:*^5 OR ( *:* -regularprice:*)^0.5)sort=score desc (Note the space after each (.) And make sure to URL-encode your spaces as + or %20. Also, try this to verify whether you really have chairs without prices: q=chairs AND ( *:* -regularprice:*)sort=score desc (Note that space after (.) And for sanity, try this as well: q=chairs AND ( -regularprice:*)sort=score desc (Again, note that space after (.) Those two queries should give identical results. Finally, technically you should be able to use * or [* TO *] to match all values or negate them to match all documents without a value in a field, but try both to see that they do return the identical set of documents. -- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 4:19 PM To: solr-user@lucene.apache.org Subject: Re: Boosting on field empty or not q=chairs AND (regularprice:*^5 OR (*:* -regularprice:*)^0.5)sort=score desc Same effect. On Mon, May 14, 2012 at 4:12 PM, Jack Krupansky j...@basetechnology.comwrote: Change the second boost to 0.5 to de-boost docs that are missing the field value. You had them the same. 
-- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 4:01 PM To: solr-user@lucene.apache.org Subject: Re: Boosting on field empty or not OK it looks like the query change is working, but it looks like it is boosting everything, even documents that have that field empty On Mon, May 14, 2012 at 3:41 PM, Donald Organ dor...@donaldorgan.com wrote: OK I must be missing something: defType=edismax&start=0&rows=24&facet=true&qf=nameSuggest^10 name^10 codeTXT^2 description^1 brand_search^0 cat_search^10&spellcheck=true&spellcheck.collate=true&spellcheck.q=chairs&facet.mincount=1&fl=code,score&q=chairs AND (regularprice:*^5 OR (*:* -regularprice:*)^5)&sort=score desc On Mon, May 14, 2012 at 3:36 PM, Jack Krupansky j...@basetechnology.com wrote: (*:* -regularprice:*)5 should be (*:* -regularprice:*)^0.5 - the missing boost operator. -- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 3:31 PM To: solr-user@lucene.apache.org Subject: Re: Boosting on field empty or not Still doesn't appear to be working. Here is the full query string: defType=edismax&start=0&rows=24&facet=true&qf=nameSuggest^10 name^10 codeTXT^2 description^1 brand_search^0 cat_search^10&spellcheck=true&spellcheck.collate=true&spellcheck.q=chairs&facet.mincount=1&fl=code,score&q=chairs AND (regularprice:*^5 OR (*:* -regularprice:*)5) On Mon, May 14, 2012 at 3:28 PM, Jack Krupansky j...@basetechnology.com wrote: Sorry, make that: q=chairs AND (regularprice:*^5 OR (*:* -regularprice:*)^0.5) I forgot that pure negative queries are broken again, so you need the *:* in there. I noticed that your second boost operator was missing as well. 
-- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 3:24 PM To: solr-user@lucene.apache.org Subject: Re: Boosting on field empty or not OK I just tried: q=chairs AND (regularprice:*^5 OR (-regularprice:*)5) And that gives me 0 results On Mon, May 14, 2012 at 2:51 PM, Jack Krupansky j...@basetechnology.com wrote: foo AND (field:*^2.0 OR (-field:*)^0.5) So, if a doc has anything in the field, it gets boosted, and if the doc does not have anything in the field, de-boost it. Choose the boost factors to suit your desired boosting effect. -- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 2:38 PM To: solr-user@lucene.apache.org Subject: Re: Boosting on field empty or not OK maybe I need to describe this a little more. Basically I want documents that have a given field populated to have a higher score than the documents that don't. So if you search for foo I want documents that contain foo, but I want the documents that have field a populated to have a higher score... Is there a way to do this? On Mon, May 14, 2012 at 2:22 PM, Jack Krupansky j...@basetechnology.com wrote: In a query or filter query you can write +field:* to require that a field be populated or +(-field:*) to require that it not be populated -- Jack Krupansky -Original Message- From: Donald Organ Sent: Monday, May 14, 2012 2:10 PM To: solr-user Subject: Boosting on field empty or not Is there a way to boost a document based on whether the field is empty or not. I am looking to boost documents that have a specific field populated.
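Jack's note about encoding the spaces can be checked with any URL library. A minimal Python sketch (the query string is the corrected one from his mail; the variable names are mine):

```python
from urllib.parse import quote_plus

# The corrected edismax query, with the SOLR-3377 space after each "("
q = "chairs AND ( regularprice:*^5 OR ( *:* -regularprice:*)^0.5)"

# quote_plus turns spaces into "+" (they could equally be "%20")
# and percent-escapes reserved characters like "(", "*", "^" and ":"
encoded = quote_plus(q)
print(encoded)
```

Pasting the encoded value after `q=` avoids the browser or HTTP client mangling the raw spaces and parentheses.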
RE: Issue in Applying patch file
SOLR-3430 is already applied to the latest 3.6 and 4.x (trunk) source code. Be sure you have sources from May 7, 2012 or later (for 3.6 this is SVN r1335205 + ; for trunk it is SVN r1335196 + ) No patches are needed. About the modern compiler error, make sure you're running a 1.6 or 1.7 JDK (the default JDK on some Linux distributions is often inadequate). Issue javac -version from the command line as a sanity check. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: mechravi25 [mailto:mechrav...@yahoo.co.in] Sent: Tuesday, May 15, 2012 6:54 AM To: solr-user@lucene.apache.org Subject: Issue in Applying patch file Hi, We have checked out the latest version of the Solr source code from svn. We are trying to apply the following patch file to it: https://issues.apache.org/jira/browse/SOLR-3430 While applying the patch file using Eclipse (i.e. using the Team -> Apply Patch options), we are getting cross marks for certain java files, and it is getting updated for the following java file alone, and we are able to see the patch file changes for this alone: solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded.java Why is it that it's not getting applied for the other set of java files which is present in the patch file? And sometimes we are getting a 'file does not exist' error even if the corresponding files are present. And also, when I try to ant build it after applying the patch, I'm getting the following error: common-build.xml:949: Error starting modern compiler Can you tell me if I'm missing anything? Can you please guide me on this? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-in-Applying-patch-file-tp3983842.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Editing long Solr URLs - Chrome Extension
I think I put one up already, but in case I messed up github: complex params, like the fq here, aren't properly handled: http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=52.67,7.30 d=5} But I'm already using it occasionally Erick On Tue, May 15, 2012 at 10:02 AM, Amit Nithian anith...@gmail.com wrote: Jan Thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases. Thanks Amit On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote: I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straight forward. A few wishes: * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements * If you edit a value and click TAB, it is not updated :( * It should not be necessary to URLencode all non-ascii chars - why not leave colon, caret (^) etc as is, for better readability? * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 11. mai 2012, at 07:32, Amit Nithian wrote: Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough/don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a Chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string, but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! 
Amit
Highlight feature
Hello friends I have noticed that the highlighted terms of a query are returned in a second xml struct (named highlighting). Is it possible to return the highlighted terms in the doc field itself? I don't need the solr generated ids of the highlighted field. Thanks, Tom -- View this message in context: http://lucene.472066.n3.nabble.com/Highlight-feature-tp3983875.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with AND clause in multi core search query
Hi Erick, My schema is as follows: <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="value" type="string" indexed="true" stored="true"/> <field name="column1" type="string" indexed="true" stored="true"/> <field name="column2" type="string" indexed="true" stored="true"/> The data which I am indexing in core0 is id:1, value:'123456', column1:'A', column2:'null' id:2, value:'1234567895252', column1:'B', column2:'null' The data which I am indexing in core1 is id:3, value:'123456', column1:'null', column2:'C' Now my query is http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:C Response: No data In a database we can achieve this by querying separately, as follows: select value from core0 where column1='A' intersect select value from core1 where column2='C' The same scenario I am trying to implement in my multi-core SOLR setup, but I am unable to do so. Please let me know what I should do to implement this type of scenario in SOLR. I am using SOLR 1.4 version. Thanks Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800p3983881.html Sent from the Solr - User mailing list archive at Nabble.com.
need help with getting exact matches to score higher
Hello all, I am trying to tune our core for exact matches on a single field (itemNo) and having issues getting it to work. In addition, I need help understanding the output from debugQuery=on where it presents the scoring. My goal is to get exact matches to arrive at the top of the results. However, what I am seeing is non-exact matches arriving at the top of the results with MUCH higher scores. // from schema.xml - I am copying itemNo into the string field for use in boosting <field name="itemNoExactMatchStr" type="string" indexed="true" stored="false"/> <copyField source="itemNo" dest="itemNoExactMatchStr"/> // from solrconfig.xml - I have the boost set for my special exact match field and the sorting on score desc <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <str name="defType">edismax</str> <str name="echoParams">all</str> <int name="rows">10</int> <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str> <str name="q.alt">*:*</str> <str name="sort">score desc</str> <str name="facet">true</str> <str name="facet.field">itemDescFacet</str> <str name="facet.field">brandFacet</str> <str name="facet.field">divProductTypeIdFacet</str> </lst> <lst name="appends"/> <lst name="invariants"/> </requestHandler> // analysis output from debugQuery=on Here you can see that the top score for itemNo:9030 is a part that does not start with 9030. The entries below (there are 4) all have exact matches - but they rank below this part - ??? 
str name="0904000,1354 ,<b>2TTZ9030C1000A* 0.585678 = (MATCH) max of: 0.585678 = (MATCH) weight(itemNo:9030^0.9 in 582979), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 27.173943 = (MATCH) fieldWeight(itemNo:9030 in 582979), product of: 2.6457512 = tf(termFreq(itemNo:9030)=7) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=582979) /str str name="122,1232 ,<b>9030* 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 499864), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 499864), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=499864) /str str name="0537220,1882 ,<b>9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 538826), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 538826), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=538826) /str str name="0537220,2123 ,<b>9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544313), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544313), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=544313) /str str name="0537220,2087 ,<b>9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544657), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544657), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=544657) /str -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-getting-exact-matches-to-score-higher-tp3983882.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Urgent! Highlighting not working as expected
Hi Jack, Thanks for your reply. I did not specify dismax when querying with highlighting enabled: q=text:G-Money&hl=true&hl.fl=*, that was the whole query string I sent. What puzzled me is that the string field cr_firstname was copied to text, but it was not highlighted. But if I use q=cr_fristname:G-Money&hl=true&hl.fl=*, it will be highlighted. I attached my solrconfig.xml here; could you please take a look? Thanks again! http://lucene.472066.n3.nabble.com/file/n3983883/solrconfig.xml solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Urgent-Highlighting-not-working-as-expected-tp3983755p3983883.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with AND clause in multi core search query
Right, but for that to work, there's an implicit connection between the docs in core1 and core0, I assume provided by 123456 as a foreign key or something. There's nothing automatically built in like this in Solr 1.4 (joins come close, but those are trunk). Whenever you try to make Solr act just like a database, you're probably doing something you shouldn't. Solr is a very good search engine, but it's not an RDBMS and shouldn't be asked to behave like one. In your case, consider de-normalizing the data and indexing all the related data in a single document, even if it means repeating the data. Sometimes this requires some judicious creativity, but it's the first thing I'd look at. Best Erick On Tue, May 15, 2012 at 10:54 AM, ravicv ravichandra...@gmail.com wrote: Hi Erick, My schema is as follows: <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="value" type="string" indexed="true" stored="true"/> <field name="column1" type="string" indexed="true" stored="true"/> <field name="column2" type="string" indexed="true" stored="true"/> The data which I am indexing in core0 is id:1, value:'123456', column1:'A', column2:'null' id:2, value:'1234567895252', column1:'B', column2:'null' The data which I am indexing in core1 is id:3, value:'123456', column1:'null', column2:'C' Now my query is http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:A AND column2:C Response: No data In a database we can achieve this by querying separately, as follows: select value from core0 where column1='A' intersect select value from core1 where column2='C' The same scenario I am trying to implement in my multi-core SOLR setup, but I am unable to do so. Please let me know what I should do to implement this type of scenario in SOLR. I am using SOLR 1.4 version. 
Thanks Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-AND-clause-in-multi-core-search-query-tp3983800p3983881.html Sent from the Solr - User mailing list archive at Nabble.com.
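Erick's de-normalization advice can be illustrated with a toy model in Python (plain dicts standing in for indexed documents - this is not Solr code, just the boolean logic):

```python
# Toy model of the two cores, using Ravi's data from the thread above
core0 = [{"id": "1", "value": "123456", "column1": "A"},
         {"id": "2", "value": "1234567895252", "column1": "B"}]
core1 = [{"id": "3", "value": "123456", "column2": "C"}]

# A sharded query applies the whole boolean expression to each document
# individually, so "column1:A AND column2:C" matches nothing:
hits = [d for d in core0 + core1
        if d.get("column1") == "A" and d.get("column2") == "C"]
assert hits == []

# De-normalized: join the related rows on "value" into single documents
# BEFORE indexing; now the same AND matches.
def denormalize(a, b, key="value"):
    by_key = {d[key]: dict(d) for d in a}
    merged = []
    for d in b:
        doc = dict(by_key.get(d[key], {key: d[key]}))
        doc.update({k: v for k, v in d.items() if k != "id"})
        merged.append(doc)
    return merged

docs = denormalize(core0, core1)
hits = [d for d in docs if d.get("column1") == "A" and d.get("column2") == "C"]
print(len(hits))  # one merged document matches
```

The `denormalize` helper is hypothetical; in practice the join would be done in whatever ETL feeds Solr, but the point stands: the AND can only match fields that live in the same document.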
Re: need help with getting exact matches to score higher
Hello, From the response you pasted here, it looks like the field itemNoExactMatchStr never matched. Can you try matching in that field only and ensure you have matches? Given the ^30 boost, you should have high scores on this field... Hope this helps, -- Tanguy 2012/5/15 geeky2 gee...@hotmail.com Hello all, I am trying to tune our core for exact matches on a single field (itemNo) and having issues getting it to work. In addition, I need help understanding the output from debugQuery=on where it presents the scoring. My goal is to get exact matches to arrive at the top of the results. However, what I am seeing is non-exact matches arriving at the top of the results with MUCH higher scores. // from schema.xml - I am copying itemNo into the string field for use in boosting <field name="itemNoExactMatchStr" type="string" indexed="true" stored="false"/> <copyField source="itemNo" dest="itemNoExactMatchStr"/> // from solrconfig.xml - I have the boost set for my special exact match field and the sorting on score desc <requestHandler name="itemNoProductTypeBrandSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <str name="defType">edismax</str> <str name="echoParams">all</str> <int name="rows">10</int> <str name="qf">itemNoExactMatchStr^30 itemNo^.9 divProductTypeDesc^.8 brand^.5</str> <str name="q.alt">*:*</str> <str name="sort">score desc</str> <str name="facet">true</str> <str name="facet.field">itemDescFacet</str> <str name="facet.field">brandFacet</str> <str name="facet.field">divProductTypeIdFacet</str> </lst> <lst name="appends"/> <lst name="invariants"/> </requestHandler> // analysis output from debugQuery=on Here you can see that the top score for itemNo:9030 is a part that does not start with 9030. The entries below (there are 4) all have exact matches - but they rank below this part - ??? 
str name=0904000,1354 ,b2TTZ9030C1000A* 0.585678 = (MATCH) max of: 0.585678 = (MATCH) weight(itemNo:9030^0.9 in 582979), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 27.173943 = (MATCH) fieldWeight(itemNo:9030 in 582979), product of: 2.6457512 = tf(termFreq(itemNo:9030)=7) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=582979) /str str name=122,1232 ,b9030* 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 499864), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 499864), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=499864) /str str name=0537220,1882 ,b9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 538826), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 538826), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=538826) /str str name=0537220,2123 ,b9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544313), product of: 0.021552926 = queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544313), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=544313) /str str name=0537220,2087 ,b9030 * 0.22136548 = (MATCH) max of: 0.22136548 = (MATCH) weight(itemNo:9030^0.9 in 544657), product of: 0.021552926 = 
queryWeight(itemNo:9030^0.9), product of: 0.9 = boost 10.270785 = idf(docFreq=55, maxDocs=594893) 0.0023316324 = queryNorm 10.270785 = (MATCH) fieldWeight(itemNo:9030 in 544657), product of: 1.0 = tf(termFreq(itemNo:9030)=1) 10.270785 = idf(docFreq=55, maxDocs=594893) 1.0 = fieldNorm(field=itemNo, doc=544657) /str -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-with-getting-exact-matches-to-score-higher-tp3983882.html Sent from the Solr - User mailing list archive at Nabble.com.
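Tanguy's diagnosis fits the arithmetic in the explain output: with itemNoExactMatchStr never matching, only the itemNo clause scores, and Lucene's default similarity uses tf = sqrt(termFreq), which rewards the document where 9030 occurs seven times. A quick check of the numbers taken from the debug output above:

```python
import math

idf = 10.270785          # idf(docFreq=55, maxDocs=594893) from the output
field_norm = 1.0         # fieldNorm(field=itemNo) from the output

# "2TTZ9030C1000A..." doc: the token 9030 occurs 7 times in itemNo
tf_multi = math.sqrt(7)  # Lucene DefaultSimilarity: tf = sqrt(termFreq)
# exact-match docs: 9030 occurs once
tf_exact = math.sqrt(1)

print(round(tf_multi * idf * field_norm, 6))  # 27.173943, as in the output
print(round(tf_exact * idf * field_norm, 6))  # 10.270785, as in the output
```

So the ranking is behaving exactly as the formula says; the fix is to make the itemNoExactMatchStr^30 clause actually match, which is what Tanguy suggests checking.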
Re: Highlight feature
I am also working on highlighting. I don't think so. And the ids in the highlighting part are the ids of the docs retrieved. -- View this message in context: http://lucene.472066.n3.nabble.com/Highlight-feature-tp3983875p3983887.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Urgent! Highlighting not working as expected
In the case of text:G-Money, the term is analyzed by Solr into the phrase g money, which matches in the text field, but will not match for a string field containing the literal text G-Money. But when you query cr_fristname:G-Money, the term is not tokenized by the Solr analyzer because it is a value for a string field, and a literal match occurs in the string field cr_fristname. I think that fully accounts for the behavior you see. You might consider having a cr_fristname_text field which is tokenized text with a copyField from cr_fristname that fully supports highlighting of text terms. BTW, I presume that should be first name, not frist name. -- Jack Krupansky -Original Message- From: TJ Tong Sent: Tuesday, May 15, 2012 11:15 AM To: solr-user@lucene.apache.org Subject: Re: Urgent! Highlighting not working as expected Hi Jack, Thanks for your reply. I did not specify dismax when querying with highlighting enabled: q=text:G-Money&hl=true&hl.fl=*, that was the whole query string I sent. What puzzled me is that the string field cr_firstname was copied to text, but it was not highlighted. But if I use q=cr_fristname:G-Money&hl=true&hl.fl=*, it will be highlighted. I attached my solrconfig.xml here; could you please take a look? Thanks again! http://lucene.472066.n3.nabble.com/file/n3983883/solrconfig.xml solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Urgent-Highlighting-not-working-as-expected-tp3983755p3983883.html Sent from the Solr - User mailing list archive at Nabble.com.
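A sketch of the tokenized-copy idea Jack describes (the type name text_general is an assumption - substitute whatever tokenized text type the schema already defines):

```xml
<!-- schema.xml sketch: tokenized copy of the string field, for highlighting -->
<field name="cr_firstname_text" type="text_general" indexed="true" stored="true"/>
<copyField source="cr_firstname" dest="cr_firstname_text"/>
```

Querying and highlighting against cr_firstname_text then goes through the same analysis chain as the text field, so G-Money and g money match and highlight consistently.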
Index an URL
Hi, I have a few questions, please bear with me: 1- I have a theory: nutch may be used to index to Solr when we don't have access to the URL's file system, while we can use curl when we do have access. Am I correct? 2- A tutorial I have been reading talks about different levels of id. Is there such a thing (exid6, exid7 etc)? 3- When I use curl "http://localhost:8983/solr/update/extract?literal.id=exid7&commit=true" -F myfile=@serialized-form.html, I get ERROR: [doc=exid7] unknown field 'ignored_link'. Is this something exid7 gives me? Where does this field ignored_link come from? Do I need to add all these fields to schema.xml in order not to get such an error? What is the safest way? Regards,
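On question 3: the name ignored_link suggests the extracting handler's uprefix mapping is in play - unknown extracted fields get prefixed and then need somewhere to land. The stock example configs handle this with an "ignored" dynamic field; a hedged sketch (check your own solrconfig.xml for the actual uprefix value):

```xml
<!-- schema.xml: swallow extracted metadata fields the schema doesn't model -->
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true"
           class="solr.StrField"/>
<dynamicField name="ignored_*" type="ignored"/>

<!-- solrconfig.xml, inside the /update/extract handler defaults:
     map any field the schema doesn't know to ignored_* -->
<str name="uprefix">ignored_</str>
```

With the dynamicField present, the extracted "link" field becomes ignored_link and is silently dropped instead of raising the unknown-field error, so you do not have to enumerate every possible extracted field in schema.xml.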
Re: Boosting on field empty or not
I have figured it out using your recommendation...I just had to give it a high enough boost. BTW its a float field On Tue, May 15, 2012 at 9:21 AM, Ahmet Arslan iori...@yahoo.com wrote: The problem with what you provided is it is boosting ALL documents whether the field is empty or not Then all of your fields are non-empty? What is the type of your field?
Re: - Solr 4.0 - How do I enable JSP support ? ...
In 4.0, solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method they provide. For Jetty, using start.jar, you need to add the command line: java -jar start.jar -OPTIONS=jsp ryan On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote: Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
Re: Highlight feature
That is the default response format. If you would like to change it, you could extend the search handler or post-process the XML data. Another option would be to use javabin (if your app is Java based) and build the XML the way your app needs it. Best Regards, Ramesh
Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!
Hello, Unfortunately it seems like I spoke too early. Today morning I received the same error again even after disabling the iptables. The weird thing is only one out of 6 or 7 queries fails as evidenced in the stack traces below. The query below the stack trace gave a 'status=500' subsequent queries look fine [#|2012-05-15T08:12:38.703-0400|SEVERE|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-8;_RequestID=9f54ea89-357a-4c1b-87a1-fbaacc9fd0ee;|org.apache.solr.common.SolrException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:275) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313) at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at 
org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093) at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:601) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:875) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:365) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:285) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:221) at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:269) at com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:111) Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:129) at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:103) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
Re: Urgent! Highlighting not working as expected
Thanks, Jack! I think you are right. But I also copied cr_firstname to text, and I assumed Solr would highlight cr_firstname if there were a match. I guess the only solution is to copy all fields to another field which is not tokenized. Yes, it is firstname, good catch! Thanks again! TJ -- View this message in context: http://lucene.472066.n3.nabble.com/Urgent-Highlighting-not-working-as-expected-tp3983755p3983907.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!
I have seen similar errors before when the solr version and solrj version in the client don't match. Best Regards, Ramesh
apostrophe / ayn / alif
We are using the ICUFoldingFilterFactory with great success to fold diacritics so searches with and without the diacritics get the same results. We recently discovered we have some Korean records that use an alif diacritic instead of an apostrophe, and this diacritic is NOT getting folded. Has anyone experienced this for alif or ayn characters? Do you have a solution? - Naomi
Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!
I have already triple cross-checked that all my clients are using the same version as the server, which is 3.6. Thanks Ravi Kiran On Tue, May 15, 2012 at 2:09 PM, Ramesh K Balasubramanian beeyar...@yahoo.com wrote: I have seen similar errors before when the solr version and solrj version in the client don't match. Best Regards, Ramesh
Solr Caches
Hello, I am trying to understand how I can size the caches for my solr powered application. Some details on the index and application : Solr Version : 1.3 JDK : 1.5.0_14 32 bit OS : Solaris 10 App Server : Weblogic 10 MP1 Number of documents : 1 million Total number of fields : 1000 (750 strings, 225 int/float/double/long, 25 boolean) Number of fields on which faceting and filtering can be done : 400 Physical size of index : 600MB Number of unique values for a field : Ranges from 5 - 1000. Average of 150 -Xms and -Xmx vals for jvm : 3G Expected number of concurrent users : 15 No sorting planned for now Now I want to set appropriate values for the caches. I have put below some of my understanding and questions about the caches. Please correct and answer accordingly. FilterCache: As per the solr wiki, this is used to store an unordered list of Ids of matching documents for an fq param. So if a query contains two fq params, it will create two separate entries, one for each of these fq params. The value of each entry is the list of ids of all documents across the index that match the corresponding fq param. Each entry is independent of any other entry. A minimum size for filterCache could be (total number of fields * avg number of unique values per field) ? Is this correct ? I have not enabled useFilterForSortedQuery. Max physical size of the filter cache would be (size * avg byte size of a document id * avg number of docs returned per fq param) ? QueryResultsCache: Used to store an ordered list of ids of the documents that match the most commonly used searches. So if my query is something like q=Status:Active&fq=Org:Apache&fq=Version:13, it will create one entry that contains the list of ids of documents that match this full query. Is this correct ? How can I size my queryResultsCache ? 
Some entries from solrconfig.xml : <queryResultWindowSize>50</queryResultWindowSize> <queryResultMaxDocsCached>200</queryResultMaxDocsCached> Max physical size of the queryResultsCache would be (size * avg byte size of a document id * avg number of docs per query). Is this correct ? documentCache: Stores the documents that are stored in the index. So I do two searches that return three documents each, with 1 document being common between both result sets. This will result in 5 entries in the documentCache for the 5 unique documents that have been returned for the two queries ? Is this correct ? For sizing, SolrWiki states that *The size for the documentCache should always be greater than max_results * max_concurrent_queries*. Why do we need the max_concurrent_queries parameter here ? Is it when max_results is much lesser than numDocs ? In my case, a q=*:* search is done the first time the index is loaded. So, will setting documentCache size to numDocs be correct ? Can this be like the max that I need to allocate ? Max physical size of document cache would be (size * avg byte size of a document in the index). Is this correct ? Thank you -Rahul
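For reference, the three caches discussed above are configured in solrconfig.xml like the sketch below (the sizes are illustrative starting points, not recommendations for this particular index):

```xml
<!-- filterCache: one entry per distinct fq; value is the doc-id set matching that filter. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<!-- queryResultCache: one entry per full (q, fq, sort) combination; value is an ordered id list. -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<!-- documentCache: one entry per unique stored document fetched; no autowarming possible. -->
<documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
```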
Re: Boosting on field empty or not
Scratch that...it still seems to be boosting documents where the value of the field is empty. bq=regularprice:[0.01 TO *]^50 Results with bq set: <doc><float name="score">2.2172112</float><str name="code">bhl-ltab-30</str></doc> Results without bq set: <doc><float name="score">2.4847748</float><str name="code">bhl-ltab-30</str></doc> On Tue, May 15, 2012 at 12:40 PM, Donald Organ dor...@donaldorgan.com wrote: I have figured it out using your recommendation...I just had to give it a high enough boost. BTW its a float field On Tue, May 15, 2012 at 9:21 AM, Ahmet Arslan iori...@yahoo.com wrote: The problem with what you provided is it is boosting ALL documents whether the field is empty or not Then all of your fields are non-empty? What is the type of your field?
Re: - Solr 4.0 - How do I enable JSP support ? ...
Alright; thanks. Tried with -OPTIONS=jsp and am still seeing this on console … 2012-05-15 12:47:08.837:INFO:solr:No JSP support. Check that JSP jars are in lib/jsp and that the JSP option has been specified to start.jar I am trying to go after http://localhost:8983/solr/collection1/admin/zookeeper.jsp (or its equivalent in 4.0) after going through http://wiki.apache.org/solr/SolrCloud May I know the right zookeeper url in 4.0 please? Thanks Naga On 5/15/12 10:56 AM, Ryan McKinley ryan...@gmail.com wrote: In 4.0, solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method they provide. For Jetty, using start.jar, you need to add the command line: java -jar start.jar -OPTIONS=jsp ryan On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote: Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
Replacing payloads for per-document-per-keyword scores
Hello Hoss and the list, We are currently using Lucene payloads to store per-document-per-keyword scores for our dataset. Our dataset consists of photos with keywords assigned (only once each) to them. The index is about 90 GB, running on 24-core machines with dedicated 10k SAS drives, and 16/32 GB allocated to the JVM. When searching the payloads field, our 98 percentile query time is at 2 seconds even with trivially low queries per second. I have asked several Lucene committers about this and it's believed that the implementation of payloads being so general is the cause of the slowness. Hoss guessed that we could override Term Frequency with PreAnalyzedField[1] for the per-keyword scores, since keywords (tags) always have a Term Frequency of 1 and the TF calculation is very fast. However it turns out that you can't[2] specify TF in the PreAnalyzedField. Is there any other way to override Term Frequency during index time? If not, where in the code could this be implemented? An obvious option is to repeat the keyword as many times as its payload score, but that would drastically increase the amount of data per document sent during index time. I'd welcome any other per-document-per-keyword score solutions, or some way to speed up searching a payload field. Thanks, - Neil [1] https://issues.apache.org/jira/browse/SOLR-1535 [2] https://issues.apache.org/jira/browse/SOLR-1535?focusedCommentId=13273501#comment-13273501
Re: apostrophe / ayn / alif
On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay ndus...@stanford.edu wrote: We are using the ICUFoldingFilterFactory with great success to fold diacritics so searches with and without the diacritics get the same results. We recently discovered we have some Korean records that use an alif diacritic instead of an apostrophe, and this diacritic is NOT getting folded. Has anyone experienced this for alif or ayn characters? Do you have a solution? What do you mean alif diacritic in Korean? Alif (ا) isn't a diacritic and isn't used in Korean. Or did you mean arabic dagger alif ( ٰ ) ? This is not a diacritic in unicode (though its a combining mark). -- lucidimagination.com
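If the characters in question are the ones library romanization schemes write as ayn and alif (this is an assumption; the records would need to be inspected), a quick plain-JDK check shows why a diacritic-folding filter can leave them alone: Unicode classifies them as letters, not as combining diacritical marks.

```java
public class CharClasses {
    public static void main(String[] args) {
        // U+02BB MODIFIER LETTER TURNED COMMA (the "ayn" of ALA-LC romanization)
        // and U+02BC MODIFIER LETTER APOSTROPHE (the "alif") have category Lm,
        // i.e. they are modifier LETTERS, not diacritics.
        System.out.println(Character.getType('\u02BB') == Character.MODIFIER_LETTER); // true
        System.out.println(Character.getType('\u02BC') == Character.MODIFIER_LETTER); // true
        // A genuine combining diacritic, e.g. U+0301 COMBINING ACUTE ACCENT, is Mn:
        System.out.println(Character.getType('\u0301') == Character.NON_SPACING_MARK); // true
    }
}
```

Since folding targets marks rather than letters, mapping U+02BB/U+02BC to an apostrophe would need an explicit char mapping (e.g. a charFilter) rather than relying on folding alone.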
Re: Show a portion of searchable text in Solr
Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Show-a-portion-of-searchable-text-in-Solr-tp3983613p3983942.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replacing payloads for per-document-per-keyword scores
Hello Neil, if manipulating tf is a possible approach, why not extend KeywordTokenizer to make it work in the following manner: 3|wheel -> {wheel, wheel, wheel}? That would let you supply your per-term-per-doc boosts as prefixes on the field values and multiply them during indexing internally. The second consideration is: have you considered Click Scoring Tools from LucidWorks as a relevant approach? Regards On Wed, May 16, 2012 at 12:02 AM, Neil Hooey nho...@gmail.com wrote: Hello Hoss and the list, We are currently using Lucene payloads to store per-document-per-keyword scores for our dataset. Our dataset consists of photos with keywords assigned (only once each) to them. The index is about 90 GB, running on 24-core machines with dedicated 10k SAS drives, and 16/32 GB allocated to the JVM. When searching the payloads field, our 98th percentile query time is at 2 seconds even with trivially low queries per second. I have asked several Lucene committers about this and it's believed that the implementation of payloads being so general is the cause of the slowness. Hoss guessed that we could override Term Frequency with PreAnalyzedField[1] for the per-keyword scores, since keywords (tags) always have a Term Frequency of 1 and the TF calculation is very fast. However it turns out that you can't[2] specify TF in the PreAnalyzedField. Is there any other way to override Term Frequency during index time? If not, where in the code could this be implemented? An obvious option is to repeat the keyword as many times as its payload score, but that would drastically increase the amount of data per document sent during index time. I'd welcome any other per-document-per-keyword score solutions, or some way to speed up searching a payload field. 
Thanks, - Neil [1] https://issues.apache.org/jira/browse/SOLR-1535 [2] https://issues.apache.org/jira/browse/SOLR-1535?focusedCommentId=13273501#comment-13273501 -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
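A self-contained sketch of the prefix-expansion idea Mikhail describes (plain Java, not an actual Lucene TokenFilter; the "3|wheel" value format is the one from his example and purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class BoostPrefixExpander {
    /**
     * Expand "3|wheel" into [wheel, wheel, wheel] so that term frequency
     * carries the per-document-per-keyword score. A value without a prefix
     * defaults to a single occurrence.
     */
    static List<String> expand(String fieldValue) {
        int bar = fieldValue.indexOf('|');
        int repeat = 1;
        String term = fieldValue;
        if (bar > 0) {
            repeat = Integer.parseInt(fieldValue.substring(0, bar));
            term = fieldValue.substring(bar + 1);
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < repeat; i++) {
            tokens.add(term);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(expand("3|wheel")); // [wheel, wheel, wheel]
    }
}
```

Inside Solr this logic would live in a custom Tokenizer or TokenFilter that emits the repeated term (ideally at the same position, so phrase queries are unaffected); the class above only demonstrates the expansion itself. Unlike repeating the keyword in the source document, the boost prefix keeps the data sent at index time small.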
doing a full-import after deleting records in the database - maxDocs
hello, After doing a DIH full-import (with clean=true) after deleting records in the database, i noticed that the number of documents processed did change. example: Indexing completed. Added/Updated: 595908 documents. Deleted 0 documents. however, i noticed the numbers on the statistics page did not change, nor do they match the number of indexed records - can someone help me understand the difference in these numbers and the meaning of maxDoc / numDoc? numDocs : 594893 maxDoc : 594893 -- View this message in context: http://lucene.472066.n3.nabble.com/doing-a-full-import-after-deleting-records-in-the-database-maxDocs-tp3983948.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boosting on field empty or not
Scratch that...it still seems to be boosting documents where the value of the field is empty. bq=regularprice:[0.01 TO *]^50 Results with bq set: <doc><float name="score">2.2172112</float><str name="code">bhl-ltab-30</str></doc> Results without bq set: <doc><float name="score">2.4847748</float><str name="code">bhl-ltab-30</str></doc> The important thing is the order. Does the order of results change in the way that you want (when you add bq)? It is not a good idea to compare scores of two different queries. I *think* queryNorm is causing this difference. You can add debugQuery=on and see what the difference is.
Re: Boosting on field empty or not
If the bq is only supposed to apply the boost when the field value is greater than 0.01, why would trying another query make sure this is working? It's applying the boost to all the documents: yes, when the boost is high enough, most documents with a value GT 0.01 show up first; however, since it is applying the boost to all the documents, sometimes documents without a value in this field appear before those that do. On Tue, May 15, 2012 at 4:51 PM, Ahmet Arslan iori...@yahoo.com wrote: Scratch that...it still seems to be boosting documents where the value of the field is empty. bq=regularprice:[0.01 TO *]^50 Results with bq set: <doc><float name="score">2.2172112</float><str name="code">bhl-ltab-30</str></doc> Results without bq set: <doc><float name="score">2.4847748</float><str name="code">bhl-ltab-30</str></doc> Important thing is the order. Does the order of results change in a way that you want? (When you add bq) It is not a good idea to compare scores of two different queries. I *think* queryNorm is causing this difference. You can add debugQuery=on and see what is the difference.
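Following Ahmet's debugQuery suggestion, a query of this shape shows exactly what bq contributes per document (the host, handler, and fl list here are illustrative; substitute your own query):

```text
http://localhost:8983/solr/select?defType=dismax&q=<your query>&bq=regularprice:[0.01 TO *]^50&debugQuery=on&fl=code,regularprice,score
```

In the debug explain output, documents with an empty regularprice should show no score contribution from the bq clause; if their rank still shifts relative to the same query without bq, the shift comes from score normalization across the two different queries rather than from the boost being applied to them.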
Re: Exception in DataImportHandler (stack overflow)
Hi, Jon: Well, you don't see that every day! Is it possible that you have something weird going on in your DDL and/or queries, like a tree schema that now suddenly has a cyclical reference? Michael On Tue, May 15, 2012 at 4:33 PM, Jon Drukman jdruk...@gmail.com wrote: I have a machine which does a full update using DataImportHandler every hour. It worked up until a little while ago. I did not change the dataconfig.xml or version of Solr. Here is the beginning of the error in the log (the real thing runs for thousands of lines) 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669) 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268) 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187) 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359) 2012-05-15 12:44:30.724221500 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) 2012-05-15 12:44:30.724223500 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError 2012-05-15 12:44:30.724225500 at java.lang.String.checkBounds(String.java:404) 2012-05-15 12:44:30.724234500 at java.lang.String.init(String.java:450) 2012-05-15 12:44:30.724235500 at java.lang.String.init(String.java:523) 2012-05-15 12:44:30.724236500 at java.net.SocketOutputStream.socketWrite0(Native Method) 2012-05-15 12:44:30.724238500 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) 2012-05-15 12:44:30.724239500 at java.net.SocketOutputStream.write(SocketOutputStream.java:153) 2012-05-15 12:44:30.724253500 at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 2012-05-15 12:44:30.724254500 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 2012-05-15 12:44:30.724256500 at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345) 2012-05-15 12:44:30.724257500 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983) 2012-05-15 12:44:30.724259500 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163) 2012-05-15 12:44:30.724267500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618) 2012-05-15 12:44:30.724268500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644) 2012-05-15 12:44:30.724270500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198) 2012-05-15 12:44:30.724271500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617) 2012-05-15 12:44:30.724273500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907) 2012-05-15 12:44:30.724280500 at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478) 2012-05-15 12:44:30.724282500 at com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584) 2012-05-15 12:44:30.724283500 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4364) 2012-05-15 12:44:30.724285500 at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1360) 2012-05-15 12:44:30.724286500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2652) 2012-05-15 12:44:30.724321500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644) 2012-05-15 12:44:30.724322500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198) 2012-05-15 12:44:30.724324500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617) 2012-05-15 12:44:30.724325500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907) 2012-05-15 12:44:30.724327500 at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478) 2012-05-15 12:44:30.724334500 at 
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584) 2012-05-15 12:44:30.724335500 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4364) 2012-05-15 12:44:30.724336500 at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1360) 2012-05-15 12:44:30.724338500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2652) 2012-05-15 12:44:30.724339500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644) 2012-05-15 12:44:30.724345500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198) 2012-05-15 12:44:30.724347500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617) 2012-05-15 12:44:30.724348500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907) 2012-05-15 12:44:30.724350500 at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478) 2012-05-15 12:44:30.724351500 at com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584) 2012-05-15
Re: doing a full-import after deleting records in the database - maxDocs
Hello, geeky2: In statistics in the update section, do you see a non-zero value for docsPending? Thanks, Michael On Tue, May 15, 2012 at 4:49 PM, geeky2 gee...@hotmail.com wrote: hello, After doing a DIH full-import (with clean=true) after deleting records in the database, i noticed that the number of documents processed, did change. example: Indexing completed. Added/Updated: 595908 documents. Deleted 0 documents. however, i noticed the numbers on the statistics page did not change nor do they match the number of indexed records - can someone help me understand the difference in these numbers and the meaning of maxDoc / numDoc? numDocs : 594893 maxDoc : 594893 -- View this message in context: http://lucene.472066.n3.nabble.com/doing-a-full-import-after-deleting-records-in-the-database-maxDocs-tp3983948.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exception in DataImportHandler (stack overflow)
i don't think so, my config is straightforward: <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://x/xx" user="x" password="x" batchSize="-1" /> <document> <entity name="content" query="select content_id, description, title, add_date from content_solr where active = '1'"> <entity name="tag" query="select tag_id from tags_assoc where content_id = '${content.content_id}'" /> <entity name="likes" query="select count(1) as likes from votes where content_id = '${content.content_id}'" /> <entity name="views" query="select sum(views) as views from media_views mv join content_media cm USING (media_id) WHERE cm.content_id = '${content.content_id}'" /> </entity> </document> </dataConfig> i'm triggering the import with: http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi, Jon: Well, you don't see that every day! Is it possible that you have something weird going on in your DDL and/or queries, like a tree schema that now suddenly has a cyclical reference? Michael On Tue, May 15, 2012 at 4:33 PM, Jon Drukman jdruk...@gmail.com wrote: I have a machine which does a full update using DataImportHandler every hour. It worked up until a little while ago. I did not change the dataconfig.xml or version of Solr. 
Here is the beginning of the error in the log (the real thing runs for thousands of lines) 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669) 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268) 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187) 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359) 2012-05-15 12:44:30.724221500 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) 2012-05-15 12:44:30.724223500 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError 2012-05-15 12:44:30.724225500 at java.lang.String.checkBounds(String.java:404) 2012-05-15 12:44:30.724234500 at java.lang.String.init(String.java:450) 2012-05-15 12:44:30.724235500 at java.lang.String.init(String.java:523) 2012-05-15 12:44:30.724236500 at java.net.SocketOutputStream.socketWrite0(Native Method) 2012-05-15 12:44:30.724238500 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) 2012-05-15 12:44:30.724239500 at java.net.SocketOutputStream.write(SocketOutputStream.java:153) 2012-05-15 12:44:30.724253500 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 2012-05-15 12:44:30.724254500 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 2012-05-15 12:44:30.724256500 at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345) 2012-05-15 12:44:30.724257500 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983) 2012-05-15 12:44:30.724259500 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163) 2012-05-15 12:44:30.724267500 at 
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618) 2012-05-15 12:44:30.724268500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644) 2012-05-15 12:44:30.724270500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198) 2012-05-15 12:44:30.724271500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617) 2012-05-15 12:44:30.724273500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907) 2012-05-15 12:44:30.724280500 at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478) 2012-05-15 12:44:30.724282500 at com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584) 2012-05-15 12:44:30.724283500 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4364) 2012-05-15 12:44:30.724285500 at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1360) 2012-05-15 12:44:30.724286500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2652) 2012-05-15 12:44:30.724321500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644) 2012-05-15 12:44:30.724322500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198) 2012-05-15 12:44:30.724324500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617) 2012-05-15 12:44:30.724325500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907) 2012-05-15 12:44:30.724327500 at
Re: - Solr 4.0 - How do I enable JSP support ? ...
Finally got a handle on this by looking into the New Admin UI - http://localhost:8983/solr/#/~cloud Thanks Naga On 5/15/12 12:53 PM, Naga Vijayapuram nvija...@tibco.com wrote: Alright; thanks. Tried with -OPTIONS=jsp and am still seeing this on console … 2012-05-15 12:47:08.837:INFO:solr:No JSP support. Check that JSP jars are in lib/jsp and that the JSP option has been specified to start.jar I am trying to go after http://localhost:8983/solr/collection1/admin/zookeeper.jsp (or its equivalent in 4.0) after going through http://wiki.apache.org/solr/SolrCloud May I know the right zookeeper url in 4.0 please? Thanks Naga On 5/15/12 10:56 AM, Ryan McKinley ryan...@gmail.com wrote: In 4.0, solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method they provide. For Jetty, using start.jar, you need to add the command line: java -jar start.jar -OPTIONS=jsp ryan On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote: Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
- When is Solr 4.0 due for Release? ...
… Any idea, anyone? Thanks Naga
RE: Exception in DataImportHandler (stack overflow)
Shot in the dark here, but try adding readOnly="true" to your dataSource tag. <dataSource readOnly="true" type="JdbcDataSource" ... /> This sets autocommit to true and sets the Holdability to ResultSet.CLOSE_CURSORS_AT_COMMIT. DIH does not explicitly close resultsets, and maybe if your JDBC driver also manages this poorly you could end up with strange conditions like the one you're getting? It could be a case where your data has grown just over the limit your setup can handle under such an unfortunate circumstance. Let me know if this solves it. If so, we probably should open a bug report and get this fixed in DIH. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jon Drukman [mailto:jdruk...@gmail.com] Sent: Tuesday, May 15, 2012 4:12 PM To: solr-user@lucene.apache.org Subject: Re: Exception in DataImportHandler (stack overflow) i don't think so, my config is straightforward: <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://x/xx" user="x" password="x" batchSize="-1" /> <document> <entity name="content" query="select content_id, description, title, add_date from content_solr where active = '1'"> <entity name="tag" query="select tag_id from tags_assoc where content_id = '${content.content_id}'" /> <entity name="likes" query="select count(1) as likes from votes where content_id = '${content.content_id}'" /> <entity name="views" query="select sum(views) as views from media_views mv join content_media cm USING (media_id) WHERE cm.content_id = '${content.content_id}'" /> </entity> </document> </dataConfig> i'm triggering the import with: http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi, Jon: Well, you don't see that every day! Is it possible that you have something weird going on in your DDL and/or queries, like a tree schema that now suddenly has a cyclical reference? 
Michael

On Tue, May 15, 2012 at 4:33 PM, Jon Drukman jdruk...@gmail.com wrote:

 I have a machine which does a full update using DataImportHandler every hour. It worked up until a little while ago. I did not change the dataconfig.xml or version of Solr. Here is the beginning of the error in the log (the real thing runs for thousands of lines):

 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError
 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
 2012-05-15 12:44:30.724221500 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
 2012-05-15 12:44:30.724223500 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError
 2012-05-15 12:44:30.724225500 at java.lang.String.checkBounds(String.java:404)
 2012-05-15 12:44:30.724234500 at java.lang.String.<init>(String.java:450)
 2012-05-15 12:44:30.724235500 at java.lang.String.<init>(String.java:523)
 2012-05-15 12:44:30.724236500 at java.net.SocketOutputStream.socketWrite0(Native Method)
 2012-05-15 12:44:30.724238500 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
 2012-05-15 12:44:30.724239500 at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
 2012-05-15 12:44:30.724253500 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
 2012-05-15 12:44:30.724254500 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
 2012-05-15 12:44:30.724256500 at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345)
 2012-05-15 12:44:30.724257500 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983)
 2012-05-15 12:44:30.724259500 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
 2012-05-15 12:44:30.724267500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
 2012-05-15 12:44:30.724268500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644)
 2012-05-15 12:44:30.724270500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198)
 2012-05-15 12:44:30.724271500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617)
 2012-05-15 12:44:30.724273500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907)
 2012-05-15 12:44:30.724280500 at
Re: Boosting on field empty or not
 If the bq is only supposed to apply the boost when the field value is greater than 0.01, why would trying another query make sure this is working? It's applying the boost to all the fields. Yes, when the boost is high enough, most documents with a value GT 0.01 show up first; however, since it is applying the boost to all the documents, sometimes documents without a value in this field appear before those that do.

If boosting is applied to all documents, then why is the result order changing? Sometimes documents without a value can show up before, because there are other factors that contribute to the score calculation:

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

If you add debugQuery=on, you can see a detailed explanation of how the calculation is done.
Re: Exception in DataImportHandler (stack overflow)
I fixed it for now by upping the wait_timeout on the mysql server. Apparently Solr doesn't like having its connection yanked out from under it, and/or isn't smart enough to reconnect if the server goes away. I'll set it back the way it was and try your readOnly option.

Is there an option with DataImportHandler to have it transmit one or more arbitrary SQL statements after connecting? If there was, I could just send SET wait_timeout=86400; after connecting. That would probably prevent this issue.

-jsd-

On Tue, May 15, 2012 at 2:35 PM, Dyer, James james.d...@ingrambook.com wrote:

 Shot in the dark here, but try adding readOnly=true to your dataSource tag:

 <dataSource readOnly="true" type="JdbcDataSource" ... />

 This sets autocommit to true and sets the holdability to ResultSet.CLOSE_CURSORS_AT_COMMIT. DIH does not explicitly close resultsets, and if your JDBC driver also manages this poorly you could end up with strange conditions like the one you're getting. It could be a case where your data has grown just over the limit your setup can handle under such an unfortunate circumstance. Let me know if this solves it. If so, we probably should open a bug report and get this fixed in DIH.
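If DIH has no hook for running arbitrary SQL on connect, MySQL Connector/J can set session variables through the JDBC URL itself, which might achieve the same thing. A possible workaround sketch, not a tested DIH configuration (the sessionVariables URL parameter is a Connector/J feature; the x/xx placeholders are from the config above):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://x/xx?sessionVariables=wait_timeout=86400"
            readOnly="true"
            user="x" password="x" batchSize="-1"/>
```

This would set wait_timeout for the DIH connection only, without changing the server-wide default.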
should i upgrade
We're running Solr v1.4.1 with approx 30M - 40M records at any given time. Often, socket timeout exceptions occur for a search query. Is there a compelling reason to upgrade? I.e., can you set a socket timeout in solrconfig.xml in the latest version and not in v1.4.1?
Re: First query to find meta data, second to search. How to group into one?
Hi Samarendra,

This does look like a candidate for a custom query component if you want to do this inside Solr. You can of course continue to do this at the client.

-sujit

On May 15, 2012, at 12:26 PM, Samarendra Pratap wrote:

 Hi, I need a suggestion for improving relevance of search results. Any help/pointers are appreciated.

 We have the following fields (plus a lot more) in our schema: title, description, category_id (multivalued). We are using mm=70% in solrconfig.xml and qf=title description, and we are not doing phrase queries in q.

 In case of a multi-word search text, the end results are mostly junk, because the words in the search text appear in different fields and in different contexts. For example, searching for water proof (without double quotes) brings a record where title = "rose water" and description = "... no proof of contamination ...".

 Our priority is to remove irrelevant results as much as possible. Increasing mm will not solve this completely because user input may not always be precise enough to benefit from a high mm. To remove irrelevant records we worked on the following solution (or work-around):

 - We fire a first query to get the top n results. We assume the first n results are mostly good results. n is dynamic within a predefined minimum and maximum value.
 - We calculate the frequency of category ids in these top results. We are not using facets because facets count all results, relevant or irrelevant.
 - Based on category frequencies within the top matching results, we find the few most frequent categories by a simple calculation. Now we are quite confident that these categories are the ones which best suit our query.
 - Finally we fire a second query with the top categories, calculated above, in a filter query (fq).

 The quality of results increased very much, so I thought to try it the standard way. Does it require writing a plugin if I want to move the above logic into Solr? Which component do I need to modify - QueryComponent? Or is there any better or equivalent method in Solr of doing this or a similar thing?

 Thanks

 --
 Regards,
 Samar
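Until this moves into a custom SearchComponent, the two-pass approach described above can also stay at the client. A minimal sketch of the category-frequency step (the field name category_id is from the schema above; the sample docs and the water proof query are illustrative assumptions, with the HTTP calls left as comments):

```python
from collections import Counter

def top_categories(docs, max_cats=3):
    # Count category_id frequencies across the top-n docs and return the
    # most frequent ones -- the candidates for the second query's fq.
    counts = Counter(cat for doc in docs for cat in doc.get("category_id", []))
    return [cat for cat, _ in counts.most_common(max_cats)]

# First pass (hypothetical): fetch the top n results, e.g. via
#   /select?q=water+proof&defType=dismax&qf=title+description&fl=category_id&rows=n
docs = [
    {"category_id": [12, 7]},
    {"category_id": [12]},
    {"category_id": [7, 12]},
    {"category_id": [99]},
]

cats = top_categories(docs, max_cats=2)

# Second pass: re-issue the same q with the dominant categories as a filter
fq = "category_id:(" + " OR ".join(str(c) for c in cats) + ")"
print(fq)  # category_id:(12 OR 7)
```

Doing this inside Solr would mean subclassing QueryComponent (or adding a second component after it) that runs the same counting over the first pass's DocList before rewriting the filter list.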
Re: Boosting on field empty or not
Just tested to make sure. queryNorm changes after you add the bq parameter. For example, 0.00317763 = queryNorm becomes 0.0028020076 = queryNorm. Since all scores are multiplied by this queryNorm factor, the score of a document (even one that is not affected/boosted by bq) changes.

Before bq=SOURCE:Haberler^100:

<doc>
  <float name="score">5.246903</float>
  <str name="ID">4529806</str>
  <str name="SOURCE">EnSonHaber</str>
</doc>

After bq=SOURCE:Haberler^100:

<doc>
  <float name="score">4.626675</float>
  <str name="ID">4529806</str>
  <str name="SOURCE">EnSonHaber</str>
</doc>

Does that make sense?

 If the bq is only supposed to apply the boost when the field value is greater than 0.01, why would trying another query make sure this is working? It's applying the boost to all the fields. Yes, when the boost is high enough, most documents with a value GT 0.01 show up first; however, since it is applying the boost to all the documents, sometimes documents without a value in this field appear before those that do.
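The key point is that queryNorm is a per-query constant: changing it shifts every absolute score but cannot, by itself, reorder documents. A small sketch (the raw scores are hypothetical values chosen so that the "before" number matches the 5.246903 above; the queryNorm values are the ones quoted):

```python
# queryNorm is a per-query constant multiplied into every document's score.
# Adding bq changes queryNorm, so absolute scores shift even for documents
# the bq never matches -- but a constant factor alone cannot reorder them.
raw = {"4529806": 1651.20, "4529900": 1020.00}  # hypothetical raw scores

before = {d: s * 0.00317763 for d, s in raw.items()}    # queryNorm without bq
after = {d: s * 0.0028020076 for d, s in raw.items()}   # queryNorm with bq

def ranking(scores):
    # Document ids sorted by descending score
    return sorted(scores, key=scores.get, reverse=True)

same_order = ranking(before) == ranking(after)
print(same_order)  # True: order preserved, absolute scores changed
```

Any actual reordering therefore comes from the bq term itself adding score to the documents it matches, not from the queryNorm change.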
Distributed search between solrclouds?
Hi, Would distributed search (the old way where you provide the solr host IP's etc.) still work between different solrclouds? thanks, Darren
Re: Exception in DataImportHandler (stack overflow)
OK, setting the wait_timeout back to its previous value and adding readOnly didn't help; I got the stack overflow again. I re-upped the mysql timeout value again.

-jsd-

On Tue, May 15, 2012 at 2:42 PM, Jon Drukman jdruk...@gmail.com wrote:

 I fixed it for now by upping the wait_timeout on the mysql server. I'll set it back the way it was and try your readOnly option.

 On Tue, May 15, 2012 at 2:35 PM, Dyer, James james.d...@ingrambook.com wrote:

  Shot in the dark here, but try adding readOnly=true to your dataSource tag.
Re: doing a full-import after deleting records in the database - maxDocs
hello, thanks for the reply. this is the output - docsPending = 0

commits : 1786
autocommit maxDocs : 1000
autocommit maxTime : 6ms
autocommits : 1786
optimizes : 3
rollbacks : 0
expungeDeletes : 0
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 1787752
cumulative_deletesById : 0
cumulative_deletesByQuery : 3
cumulative_errors : 0

--
View this message in context: http://lucene.472066.n3.nabble.com/doing-a-full-import-after-deleting-records-in-the-database-maxDocs-tp3983948p3983995.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Editing long Solr URLs - Chrome Extension
Erick, yes thanks, I did see that and am working on a solution to that already. Hope to post a new revision shortly and eventually migrate to the extension store.

Cheers
Amit

On May 15, 2012 9:20 AM, Erick Erickson erickerick...@gmail.com wrote:

 I think I put one up already, but in case I messed up github, complex params like the fq here:

 http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=52.67,7.30 d=5}

 aren't properly handled. But I'm already using it occasionally.

 Erick

 On Tue, May 15, 2012 at 10:02 AM, Amit Nithian anith...@gmail.com wrote:

  Jan, thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases.

  Thanks
  Amit

  On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote:

   I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straightforward. A few wishes:

   * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements
   * If you edit a value and click TAB, it is not updated :(
   * It should not be necessary to URL-encode all non-ASCII chars - why not leave colon, caret (^) etc. as is, for better readability?
   * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content

   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.facebook.com/Cominvent
   Solr Training - www.solrtraining.com

   On 11. mai 2012, at 07:32, Amit Nithian wrote:

    Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough / don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a Chrome extension to help.
Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Re: should i upgrade
Hi,

I don't think you can set that, but you may still want to upgrade. Solr 3.6 has a lower memory footprint, is faster, and has more features.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Jon Kirton jkir...@3taps.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2012 5:47 PM
Subject: should i upgrade

We're running Solr v1.4.1 with approx 30M - 40M records at any given time. Often, socket timeout exceptions occur for a search query. Is there a compelling reason to upgrade? I.e., can you set a socket timeout in solrconfig.xml in the latest version and not in v1.4.1?
Re: - When is Solr 4.0 due for Release? ...
Hi Naga,

I'll guess ... Fall 2012.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Naga Vijayapuram nvija...@tibco.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2012 5:17 PM
Subject: - When is Solr 4.0 due for Release? ...

Any idea, anyone?

Thanks
Naga
Re: Solr Caches
Rahul,

Get SPM for Solr from http://sematext.com/spm and you'll get all the insight into your cache utilization you need and more. Through it you will get (faster) answers to all your questions if you play with your Solr config settings and observe the cache metrics in SPM.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Rahul R rahul.s...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2012 3:20 PM
Subject: Solr Caches

Hello,

I am trying to understand how I can size the caches for my Solr-powered application. Some details on the index and application:

Solr Version : 1.3
JDK : 1.5.0_14 32 bit
OS : Solaris 10
App Server : Weblogic 10 MP1
Number of documents : 1 million
Total number of fields : 1000 (750 strings, 225 int/float/double/long, 25 boolean)
Number of fields on which faceting and filtering can be done : 400
Physical size of index : 600MB
Number of unique values for a field : Ranges from 5 - 1000. Average of 150
-Xms and -Xmx vals for jvm : 3G
Expected number of concurrent users : 15
No sorting planned for now

Now I want to set appropriate values for the caches. I have put below some of my understanding and questions about the caches. Please correct and answer accordingly.

filterCache: As per the Solr wiki, this is used to store an unordered list of ids of matching documents for an fq param. So if a query contains two fq params, it will create two separate entries, one for each of these fq params. The value of each entry is the list of ids of all documents across the index that match the corresponding fq param. Each entry is independent of any other entry. A minimum size for filterCache could be (total number of fields * avg number of unique values per field)? Is this correct? I have not enabled useFilterForSortedQuery. Max physical size of the filter cache would be (size * avg byte size of a document id * avg number of docs returned per fq param)?

queryResultCache: Used to store an ordered list of ids of the documents that match the most commonly used searches. So if my query is something like q=Status:Active&fq=Org:Apache&fq=Version:13, it will create one entry that contains the list of ids of documents that match this full query. Is this correct? How can I size my queryResultCache? Some entries from solrconfig.xml:

<queryResultWindowSize>50</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

Max physical size of the queryResultCache would be (size * avg byte size of a document id * avg number of docs per query). Is this correct?

documentCache: Stores the documents that are stored in the index. Say I do two searches that return three documents each, with 1 document common between both result sets. This will result in 5 entries in the documentCache for the 5 unique documents that have been returned for the two queries? Is this correct? For sizing, the Solr wiki states that "The size for the documentCache should always be greater than max_results * max_concurrent_queries". Why do we need the max_concurrent_queries parameter here? Is it when max_results is much less than numDocs? In my case, a q=*:* search is done the first time the index is loaded. So, will setting documentCache size to numDocs be correct? Can this be like the max that I need to allocate? Max physical size of the document cache would be (size * avg byte size of a document in the index). Is this correct?

Thank you
-Rahul
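To get a feel for why the "total fields * avg unique values" upper bound matters, here is some back-of-the-envelope arithmetic using only the figures stated in the question. The per-entry byte costs are rough assumptions (a dense filter stored as a bitset over the index), not authoritative Solr internals:

```python
# Figures from the question above
faceted_fields = 400        # fields usable for faceting/filtering
avg_unique_values = 150     # average unique values per field
num_docs = 1_000_000

# Worst case: one filterCache entry per distinct field:value fq
filter_entries = faceted_fields * avg_unique_values
print(filter_entries)  # 60000

# Assumption: a dense cached filter costs one bit per document in the index
bytes_per_dense_filter = num_docs // 8
print(bytes_per_dense_filter)  # 125000

# Caching every possible single-field filter at that cost would need far
# more than the 3G heap, so the configured cache size must stay much smaller
print(round(filter_entries * bytes_per_dense_filter / 2**30, 1))  # 7.0
```

So sizing filterCache to hold every possible fq is not realistic here; sizing it to the working set of filters actually repeated across queries (and watching the hit ratio) is the practical approach.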
Re: - Solr 4.0 - How do I enable JSP support ? ...
just use the admin UI -- look at the 'cloud' tab

On Tue, May 15, 2012 at 12:53 PM, Naga Vijayapuram nvija...@tibco.com wrote:

 Alright; thanks. Tried with -OPTIONS=jsp and am still seeing this on the console:

 2012-05-15 12:47:08.837:INFO:solr:No JSP support. Check that JSP jars are in lib/jsp and that the JSP option has been specified to start.jar

 I am trying to go after http://localhost:8983/solr/collection1/admin/zookeeper.jsp (or its equivalent in 4.0) after going through http://wiki.apache.org/solr/SolrCloud

 May I know the right zookeeper url in 4.0 please?

 Thanks
 Naga

 On 5/15/12 10:56 AM, Ryan McKinley ryan...@gmail.com wrote:

  In 4.0, Solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method they provide. For Jetty, using start.jar, you need to add on the command line: java -jar start.jar -OPTIONS=jsp

  ryan

  On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote:

   Hello, how do I enable JSP support in Solr 4.0?

   Thanks
   Naga