Re: Sorting in solr
Hi Naveen,

I am not too sure what you're after, but the sorting mechanism is applied after search results are fetched.

From the Solr Ref Guide:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

The sort parameter *arranges search results* in either ascending (asc) or descending (desc) order.

Thanks,
Sandeep

On 11 July 2016 at 11:13, Naveen Pajjuri wrote:
> Hi,
> If I apply some sorting order on Solr, when are the documents sorted?
>
> 1. Are documents sorted after fetching the results?
> 2. Or do we get sorted documents?
>
> Regards,
> Naveen
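
For illustration, here is a minimal sketch of how a client might pass the sort parameter along with the query, so that the response comes back already ordered. The core name (collection1) and the field name (price) are hypothetical and not taken from this thread; the sketch only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_sorted_query(base_url, query, sort_field, descending=False, rows=10):
    """Return a /select URL that asks Solr to order the results by sort_field."""
    direction = "desc" if descending else "asc"
    params = {
        "q": query,
        # The sort parameter takes "<field> <direction>", e.g. "price desc".
        "sort": f"{sort_field} {direction}",
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_sorted_query(
    "http://localhost:8983/solr/collection1", "*:*", "price", descending=True
)
print(url)
```

The client does no sorting of its own here; it only states the desired order in the request.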
Re: Many to Many Mapping with Solr
Thanks Alexandre,

I also agree that Solr should not be used the RDBMS way, but I am concerned about the updates to the indexes. We're expecting around 500 writes per second to the database, which will generate more than 500 updates to the index per second. If the entities are denormalised this will have an impact on performance, hence I was inclined to design it like a database.

Joel, I will explain my use cases in a bit more detail. All of these should be driven by the search engine:

1) The user logs in and the system should display all recordings for that user.
2) The user adds a recording, and the system is updated with the additional recording.
3) The user removes a recording, and the system is updated with the recording removed.
4) When the user searches for a recording, the system should only display matches in his recordings.

Every user-recording mapping has additional properties which are also searchable attributes.

Here, we are talking about 2M users and 500M recordings, and this is currently driven by a database of ~60-80GB.

I am going to do a small PoC for these use cases, and I will go with denormalised entities with search requirements as my main focus. However, if you have anything more to add, do let me know. I will be grateful.

Many Thanks,
Sandeep

On 29 April 2016 at 14:54, Joel Bernstein <joels...@gmail.com> wrote:
> We really still need to know more about your use case. In particular what
> types of questions will you be asking of the data? It's useful to do this
> in plain English without mapping to any specific implementation.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 29, 2016 at 9:43 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> > You do not structure Solr to represent your database. You structure it
> > to represent what you will search.
> >
> > In your case, it sounds like you want to return 'user-records', in
> > which case you will index the related information all together. Yes,
> > you will possibly need to recreate the multiple documents when you
> > update one record (or one user). And yes, you will have the same
> > information multiple times. But you can use index-only values or
> > docvalues to reduce storage and duplication.
> >
> > You may also want to have Solr return only the relevant IDs from the
> > search and you recreate the m-to-m object structure from the database.
> > Then, you don't need to store much at all, just index.
> >
> > Basically, don't think about your database as much when deciding Solr
> > structure. It does not map one-to-one.
> >
> > Regards,
> >    Alex.
> >
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
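
One way to read the denormalised 'user-records' idea from this thread is one flat Solr document per user-recording pair, with searches restricted to a single user by a filter query. The sketch below assumes that reading; the field names (user_id, recording_id, title) and the composite id scheme are hypothetical, while type and number are the mapping attributes mentioned in the original question.

```python
def user_recording_docs(user_id, recordings):
    """Build one flat document per (user, recording) pair.

    Each document repeats the recording details, which is the duplication
    that denormalising accepts in exchange for simple per-user searches.
    """
    docs = []
    for rec in recordings:
        docs.append({
            # Hypothetical composite key: unique per user-recording mapping.
            "id": f"{user_id}_{rec['recording_id']}",
            "user_id": user_id,
            "recording_id": rec["recording_id"],
            "title": rec["title"],
            # Attributes that live on the mapping table in the database.
            "type": rec["type"],
            "number": rec["number"],
        })
    return docs


def user_search_params(user_id, text):
    """Query parameters that only match documents belonging to one user."""
    return {"q": text, "fq": f"user_id:{user_id}"}


docs = user_recording_docs(
    "u1", [{"recording_id": "r9", "title": "News", "type": "series", "number": 3}]
)
print(docs[0]["id"], user_search_params("u1", "news"))
```

With this layout, removing a recording for a user is a delete of a single document by id, and a change to a recording's own details means re-sending every document that repeats them.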
Many to Many Mapping with Solr
Hi All,

Hope the day is going well for you.

This question has been asked before, but I couldn't find an answer to my specific request. I have a many-to-many relationship and the mapping table has additional columns. What's the best way to model this as a Solr entity?

For example: a user has many recordings and a recording belongs to many users. But each user-recording has additional features like type, number etc. I'd like to fetch recordings for the user. If the user adds/updates/deletes a recording then that should be reflected in the search.

I have 2 options:
1) Create a user entity, a recording entity and a user_recording entity
- this is good, but it's treating Solr like an RDBMS, which I mostly avoid.

2) A user entity containing all the recordings information, and each recording containing user information
- this has an impact on index size, but the fetch and manipulation will be faster.

Any guidance will be good.

Thanks,
Sandeep
Re: Newbie SolR - Need advice
+1

On 3 July 2013 14:58, Jack Krupansky j...@basetechnology.com wrote:
> Design your own application layer for both indexing and query that knows about both SQL and Solr. Give it a REST API and then your client applications can talk to your REST API and not have to care about the details of Solr or SQL. That's the best starting point.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: fabio1605
> Sent: Wednesday, July 03, 2013 4:55 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Newbie SolR - Need advice
>
> Hi Sandeep
>
> Thank you for your reply.
>
> I'll have a read through the tutorials now that I understand the principle of all this. I would ideally like to keep MSSQL and bolt Solr on top of it, so that we can keep MSSQL as we have a 200GB database.
>
> Cheers
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4075026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie SolR - Need advice
Hi Fabio,

No, Solr isn't a database replacement for MS SQL. Solr is built on top of Lucene, which is a search engine library for text searches. Solr in itself is not a replacement for any database as it does not support any relational DB features; however, as Jack and David mentioned, it's a fully optimised search engine platform that can provide all search-related features like faceting, highlighting etc.

Solr does not have a *database*. It stores the data in binary files called indexes (http://lucene.apache.org/core/3_0_3/fileformats.html). These indexes are populated with the data from the database. Solr provides built-in functionality through the DataImportHandler component to get the data and generate the indexes.

When you say your web servers are mainly doing search functions, do you mean it is a text search and you use queries with clauses such as 'like', 'in' etc. (in addition to multiple joins) to get the results? Does the web application need faceting? If yes, then Solr can be your friend to get it through.

Do remember that it always takes some time to get new concepts from understanding through to implementation. As David mentioned already, it *is* going to be a bumpy ride at the start but *definitely* a sensational one.

Good Luck,
Sandeep

On 2 July 2013 17:09, fabio1605 fabio.to...@btinternet.com wrote:
> Thanks guys.
>
> So SolR is actually a database replacement for MSSQL... Am I right?
>
> We have a lot of Perl scripts that contain lots of SQL insert queries etc. How do we query the SolR database from scripts?
>
> I know I have a lot to learn still, so excuse my ignorance.
>
> Also... what is Mongo and how does it compare?
>
> I just don't understand how in 10 years of web development I have never heard of SolR till last week.
>
> Sent from Samsung Mobile
>
> -------- Original message --------
> From: David Quarterman [via Lucene] ml-node+s472066n4074772...@n3.nabble.com
> Date: 02/07/2013 16:57 (GMT+00:00)
> To: fabio1605 fabio.to...@btinternet.com
> Subject: RE: Newbie SolR - Need advice
>
> > Hi Fabio,
> >
> > Like Jack says, try the tutorial. But to answer your question, SOLR isn't a bolt-on to SQLServer or any other DB. It's a fantastically fast indexing/searching tool. You'll need to use the DataImportHandler (see the tutorial) to import your data from the DB into the indices that SOLR uses. Once in there, you'll have more power & flexibility than SQLServer would ever give you!
> >
> > Haven't tried SOLR on Windows (I guess your environment) but I'm sure it'll work using Jetty or Tomcat as web container.
> >
> > Stick with it. The ride can be bumpy but the experience is sensational!
> >
> > DQ
> >
> > -----Original Message-----
> > From: fabio1605 [mailto:[hidden email]]
> > Sent: 02 July 2013 16:16
> > To: [hidden email]
> > Subject: Newbie SolR - Need advice
> >
> > Hi
> >
> > We have an MSSQL Server which is just getting far too large now and performance is dying! The majority of our webservers are mainly doing search functions, so I thought it may be best to move to SolR. But I know very little about it!
> >
> > My questions are: does SolR run as a bolt-on to MSSQL, as in the data is still in MSSQL and SolR is just the search bit in between?
> >
> > I'm really struggling to understand the point of SOLR etc., so if someone could point me to a Dummies website I'd appreciate it! Google is throwing too much confusion at me!
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074782.html
> Sent from the Solr - User mailing list archive at Nabble.com.
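
On the question of how scripts would query Solr: Solr is reached over HTTP, so any scripting language that can fetch a URL and parse the response can use it. The sketch below is in Python rather than Perl and is only an illustration. It builds a /select URL and pulls document ids out of a JSON response, which a script could then use to look up the full rows in MSSQL. The core name, the id field name and the sample response are hypothetical, and no network call is made.

```python
import json
from urllib.parse import urlencode


def select_url(base_url, query, rows=10):
    """Build a /select URL that asks Solr for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


def extract_ids(response_text, id_field="id"):
    """Return the ids of the matching documents from a Solr JSON response."""
    body = json.loads(response_text)
    return [doc[id_field] for doc in body["response"]["docs"]]


# A hand-written sample in the shape of a Solr JSON response.
sample = '{"response": {"numFound": 2, "start": 0, "docs": [{"id": "17"}, {"id": "42"}]}}'
print(select_url("http://localhost:8983/solr/collection1", "title:herbal"))
print(extract_ids(sample))
```

In this arrangement the database stays the system of record and Solr only answers the search part of each request.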
Re: Newbie SolR - Need advice
Hi Fabio,

Yes, you're on the right track. I'd now like to direct you to the first reply from Jack, which suggests going through the Solr tutorial. Even with Solr, it will take some time to learn the various bits and pieces about designing fields, their field types, server configuration etc., and then to tune the results to match the results that you're currently getting from the database.

There is lots of info available for Solr on the web, and do check Lucidworks' Solr Reference Guide:
http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide;jsessionid=16ED0DB3B6F6BE8CEC6E6CDB207DBC49

Best of Solr Luck!
Sandeep

On 2 July 2013 20:47, fabio1605 fabio.to...@btinternet.com wrote:
> > So, you keep your mssql database, you just don't use it for searches - that'll relieve some of the load. Searches then all go through SOLR & its Lucene indexes. If your various tables need SQL joins, you specify those in the DataImportHandler (DIH) config. That way, when SOLR indexes everything, it indexes the data the way you want to see it.
>
> So by this you mean we keep MSSQL as we do, but we use the website to run through SOLR. SOLR will then handle the indexing and retrieval of data from its own indexes, and will make its own calls to our MSSQL server when required (i.e. updating/adding to indexes).
>
> Am I on the right track there now? So MSSQL becomes the datastore and SOLR becomes the search engine...
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074889.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dot operater issue.
Hi Sri,

This depends on how the fields (that hold the value) are defined and how the query is generated. Try running the query in the Solr console with debug=true to see how the query string is getting parsed. If that doesn't help, could you share the following three things relating to your question?

1) The field definition in schema.xml
2) The Solr query URL
3) The parser config from solrconfig.xml

Thanks,
Sandeep

On 27 June 2013 10:41, Srinivasa Chegu cheg...@hcl.com wrote:
> Hi team,
>
> When the user enters the search term h.e.r.b.a.l in the search textbox and clicks the search button, the SOLR search engine does not return any results. As far as I can see, SOLR is accepting the request parameter as h.e.r.b.a.l. However, we have many records with the string h.e.r.b.a.l as part of the product name.
>
> It looks like there is an issue with the dot operator in the search term. If we enter the search term herbal, it returns search results.
>
> Our requirement is that when the search term is h.e.r.b.a.l, results should be displayed based on the dot operator.
>
> Please help us on this issue.
>
> Regards
> Srinivas
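
As a rough sketch of the debugging step suggested above, a script can request debug output and read back how Solr parsed the query string. The sketch uses the debugQuery=true form of the switch; the core name, the field name in the sample and the sample response itself are hypothetical, and nothing is sent to a server.

```python
import json
from urllib.parse import urlencode


def debug_url(base_url, term):
    """Build a /select URL that asks Solr to include debug information."""
    params = {"q": term, "debugQuery": "true", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def parsed_query(response_text):
    """Return the parsed form of the query from the debug section, if present."""
    return json.loads(response_text).get("debug", {}).get("parsedquery")


# Hand-written sample; the real parsed query depends on the field's analyzer.
sample = '{"debug": {"parsedquery": "product_name:h.e.r.b.a.l"}}'
print(debug_url("http://localhost:8983/solr/collection1", "h.e.r.b.a.l"))
print(parsed_query(sample))
```

Comparing the parsed query with the terms actually stored for the field usually shows whether the dots are being dropped or split at index time, at query time, or both.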
Re: Solr 4.2.1 + Distribution scripts (rsync) Issue
Hi Hoss,

Thanks for your reply. Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts?*
- There was an attempt to implement Java-based replication but it was very slow, so that option was discarded and rsync was used instead. This was done a couple of years ago, and until February of this year we were using Solr 1.4. I upgraded Solr to 4.0 with rsync; however, due to time and resource constraints the rsync alternative was not evaluated, and it can't be done even today. Only in the next release will we be doing SolrCloud.

My setup looks like below. This was working correctly with Solr 1.4 and Solr 4.0.
1) Index feeder applications feed indexes to the indexer boxes.
2) A cron job that runs every minute on the indexer boxes (commiter) commits the indexes (commit) and invokes snapshooter to create a snapshot. An rsync daemon runs on the indexer boxes.
3) Another cron job runs every minute on the search boxes. It pulls the snapshot (using snappuller) and installs it on the search boxes (snapinstaller), which also notifies search to open a new searcher (commit).
Additionally, there is a cron job that runs every morning at 4 am on the indexer boxes, which optimises the index (optimize) and cleans up snapshots more than a day old (snapcleaner).
This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts

*Which config is this, your indexer or your searcher? (i'm assuming it's the searcher since i don't see any postCommit commands to exec snapshooter but i wanted to sanity check that wasn't a simple explanation for your problem)*
- Because of this setup, I do not have any postCommit setup in solrconfig.xml.
- This solrconfig.xml is used for both indexer and searcher boxes.

I can see that after my upgrade to Solr 4.2.1 all these scripts behave normally; it is just that I do not see the updates getting refreshed on the search boxes unless I restart.

*What exactly does your manual commit command look like?*
- This is by using the commit script under the bin directory (commit -h localhost -p 8983).
- I have also tried the URL-based commit as you had mentioned, but no luck.

*Are you doing this on the indexer box or the searcher boxes?*
- I executed the manual commit on the searcher boxes; the indexer boxes do show the commit and updates correctly.

*what is the HTTP response from this comment? what do the logs show when you do this?*
- I have attached the logs. Please note that I have enabled openSearcher for testing.

Thanks, please let me know if I'm missing something. I remembered people not getting their deletes, and the workaround was to add the _version_ field in the schema, which I had done but no luck. I know it might be unrelated but I am just trying all my options.

Thanks again,
Sandeep

On 5 June 2013 00:41, Chris Hostetter hossman_luc...@fucit.org wrote:
> : However, we haven't yet implemented SolrCloud and still relying on
> : distribution scripts - rsync, indexpuller mechanism.
>
> Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts? Script based replication has not been actively maintained since java replication was added back in Solr 1.4!
>
> : I see that the indexes are getting created on indexer boxes, snapshots
> : being created and then pulled across to search boxes. The snapshots are
> : getting installed on search boxes as well. There are no errors in the
> : scripts logs and this process works well.
> : However, when I check the update in solr console (on search boxes), I do
> : not see the updated result. The updates do not appear in search boxes even
> : after manual commit. Only after a *restart* of the search application
> : (deployed in tomcat) I can see the updated results.
>
> What exactly does your manual commit command look like? Are you doing this on the indexer box or the searcher boxes? what is the HTTP response from this comment? what do the logs show when you do this?
>
> It's possible that some internal changes in Solr relating to NRT improvements may have optimized away re-opening on commit if solr doesn't think the index has changed -- but i doubt it. because I just tried a simple test using the 4.3.0 example where i manually simulated snapinstaller replacing the index files with a newer index and issued http://localhost:8983/solr/update?commit=true and solr loaded up that new index and started searching it -- so i suspect the devil is in the details of your setup.
>
> you're sure each of the snapshooter, snappuller, snapinstaller scripts are executing properly?
>
> : I have done minimal changes for the upgrade in solrconfig.xml and is pasted
> : below. Please can someone take a look and let me know what the issue is.
> : The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).
>
> which config is this, your indexer or your searcher? (i'm assuming it's the searcher since i don't see any postCommit commands
Re: Solr Faceting doesn't return values.
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse '*mm_state_code:(**TX)*': Encountered : : at line 1, column 14. Was expecting one of:

This suggests to me that you kept the df parameter in the query, hence it was forming mm_state_code:mm_state_code:(TX). Can you try it exactly the way I gave you, i.e. without the df parameter?

Also, can you post schema.xml and the /select handler config from solrconfig.xml?

On 22 May 2013 18:36, samabhiK qed...@gmail.com wrote:
> When I use your query, I get:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">12</int>
>     <lst name="params">
>       <str name="facet">true</str>
>       <str name="df">mm_state_code</str>
>       <str name="indent">true</str>
>       <str name="q">*mm_state_code:(**TX)*</str>
>       <str name="_">1369244078714</str>
>       <str name="debug">all</str>
>       <str name="facet.field">sa_site_city</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <lst name="error">
>     <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse '*mm_state_code:(**TX)*': Encountered : : at line 1, column 14. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... LPARAMS ... NUMBER ...</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
> Not sure why the data won't show up. Almost all the records have data in the field sa_site_city, and it is also indexed. :(
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
> Sent from the Solr - User mailing list archive at Nabble.com.
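
Once the query parses, the facet values come back in the facet_counts section of the response. As a small sketch, assuming the JSON response writer with its default flat layout (value, count, value, count, ...), the counts for sa_site_city could be read like this. The city names and counts in the sample are made up for illustration.

```python
import json


def facet_counts(response_text, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape of a Solr JSON facet response.
sample = json.dumps({
    "facet_counts": {
        "facet_fields": {"sa_site_city": ["Austin", 12, "Dallas", 7]}
    }
})
print(facet_counts(sample, "sa_site_city"))
```

An empty list for the field would mean the query matched no documents carrying indexed values in that field, which is a different problem from the parse error above.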
Re: Boosting Documents
Hi Oussama,

This is explained very nicely on the Solr Wiki:
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

  <add>
    <doc boost="2.5">
      <field name="employeeId">05991</field>
      <field name="office" boost="2.0">Bridgewater</field>
    </doc>
  </add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score.
http://wiki.apache.org/solr/CommonQueryParameters#sort

HTH,
Sandeep

On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote:
> Thank you for your reply bbarani,
>
> I can't do that because I want to boost some documents over others, independently of the query.
>
> On 05/21/2013 05:41 PM, bbarani wrote:
> > Why don't you boost during query time?
> >
> > Something like q=superman&qf=title^2 subject
> >
> > You can refer to: http://wiki.apache.org/solr/SolrRelevancyFAQ
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
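
For anyone generating such update messages from a script, here is a minimal sketch that builds the same kind of add message with an optional document boost and per-field boosts. It only produces the XML string; posting it to the update handler is left out. The helper name add_message is made up for this example, and the boost attributes apply to the Solr 4.x index-time boosting discussed in this thread.

```python
import xml.etree.ElementTree as ET


def add_message(fields, doc_boost=None, field_boosts=None):
    """Build an <add><doc>...</doc></add> update message as a string."""
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        # Document-level boost goes on the <doc> element.
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in field_boosts:
            # Field-level boost goes on the individual <field> element.
            field.set("boost", str(field_boosts[name]))
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_message = add_message(
    {"employeeId": "05991", "office": "Bridgewater"},
    doc_boost=2.5,
    field_boosts={"office": 2.0},
)
print(xml_message)
```

Building the message with an XML library rather than string concatenation also takes care of escaping characters such as & and < in field values.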
Re: Boosting Documents
I think that is applicable only for field-level boosting and not for document-level boosting. Can you post your query, field definition and the results you're expecting?

I am using index-time and query-time boosting without any issues so far. Also, which version of Solr are you using?

On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote:
> I don't know if this is the issue or not but, considering this note from the wiki:
>
> NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml) for any fields where the index-time boost should be stored.
>
> In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms="false" for all the fields in the schema?
>
> On 05/22/2013 10:41 AM, Oussama Jilal wrote:
> > Thank you Sandeep,
> >
> > I did post the document like that (a minor difference is that I did not add the boost to the field since I don't want to boost on a specific field; I boosted the whole document '<doc boost="2.0"> </doc>'), but the issue is that everything in the query results has the same score even if they had been indexed with different boosts, and I can't sort on another field since this is independent of any field value.
> >
> > Any ideas?
Re: Boosting Documents
Did you use debugQuery=true in the Solr console to see how the query is being interpreted and how the result is calculated?

Also, I'm not sure, but this copyField directive seems a bit confusing to me:

  <copyField source="Id" dest="Suggestion" />

Because multiValued is false for the Suggestion field, does that schema mean Suggestion only gets its value from Id and not from any other input?

You haven't mentioned the version of Solr. Can you also post the query params?

On 22 May 2013 11:04, Oussama Jilal jilal.ouss...@gmail.com wrote:
> I don't know if this can help (since the document boost should be independent of any schema) but here is my schema:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <schema name="" version="1.5">
>   <types>
>     <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>     <fieldType name="long" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
>     <fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>       <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory" />
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.EdgeNGramFilterFactory" maxGramSize="255" />
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory" />
>         <filter class="solr.LowerCaseFilterFactory" />
>       </analyzer>
>     </fieldType>
>   </types>
>   <fields>
>     <field name="Id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
>     <field name="Suggestion" type="text" indexed="true" stored="true" multiValued="false" required="false" />
>     <field name="Type" type="string" indexed="true" stored="true" multiValued="false" required="true" />
>     <field name="Sections" type="string" indexed="true" stored="true" multiValued="true" required="false" />
>     <field name="_version_" type="long" indexed="true" stored="true"/>
>   </fields>
>   <copyField source="Id" dest="Suggestion" />
>   <uniqueKey>Id</uniqueKey>
>   <defaultSearchField>Suggestion</defaultSearchField>
> </schema>
>
> My query is something like: Suggestion:Olive Oil.
>
> The result is 9 documents, which all have the same score 11.287682, even if they had been indexed with different boosts (I am sure of this).
Re: Boosting Documents
I'm running out of options now; I can't really see the issue you're facing unless the debug analysis is posted. I think thorough debugging is required at both the application and the Solr level.

If you want customised scoring from Solr, you can also consider overriding the DefaultSimilarity implementation, but that'll be a separate issue.

On 22 May 2013 11:32, Oussama Jilal jilal.ouss...@gmail.com wrote:
> Yes, I did debug it and there is nothing special about it; everything is treated the same.
>
> My Solr version is 4.2.
>
> The copy field is used because the 2 fields are of different types but only one value is indexed in them (so no multiValue is required and it works perfectly).
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Thanks Erick for your suggestion. Turns out I won't be going that route after all, as the highlighter component is quite complicated - to follow and to override - and with not much time in hand I did it the manual (dirty) way. Best Regards, Sandeep On 22 May 2013 12:21, Erick Erickson erickerick...@gmail.com wrote: Sandeep: You need to be a little careful here, I second Shawn's comment that you are mixing versions. You say you are using solr 4.0. But the jar that ships with that is apache-solr-core-4.0.0.jar. Then you talk about using solr-core, which is called solr-core-4.1.jar. Maven is not officially supported, so grabbing some solr-core.jar (with no apache) and doing _anything_ with it from a 4.0 code base is not a good idea. You can check out the 4.0 code branch and just compile the whole thing. Or you can get a new 4.0 distro and use the jars there. But I'd be _really_ cautious about using a 4.1 or later jar with 4.0. FWIW, Erick On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com wrote: Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions.
The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Re: Solr Faceting doesn't return values.
Hi There, I'm not sure I understand your problem correctly, but is 'mm_state_code' a real value or is it a field name? Also, as Erick pointed out above, facets are not calculated if there are no results. Hence you get no facets. You have mentioned which facets you want, but you haven't mentioned which field you want to search against. That field should be defined in the df parameter instead of sa_property_id. Can you post an example of the Solr document you're indexing? -Sandeep On 22 May 2013 14:28, samabhiK qed...@gmail.com wrote: Ok my bad. I do have a default field defined in the /select handler in the config file.
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">sa_property_id</str>
  </lst>
But then how do I change my query now? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filter query by string length or word count?
I doubt there is an out-of-the-box feature that supports this requirement; you will probably need to handle it at index time. You can also experiment with Function Queries (http://wiki.apache.org/solr/FunctionQuery) for this kind of feature. On 22 May 2013 16:37, Sam Lee skyn...@gmail.com wrote: I have this in schema.xml:
  <field name="body" type="text_en_html" indexed="true" stored="true" omitNorms="true"/>
  ...
  <fieldType name="text_en_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
How can I query docs whose body has more than 80 words (or 80 characters)?
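To make the "handle it at index time" suggestion concrete, here is a minimal sketch in Python of a client-side step that counts words and characters before a document is sent to Solr. The field names body_word_count and body_char_count are hypothetical and would have to be declared as integer fields in the schema; the word-splitting rule is a simple assumption and will not exactly match what StandardTokenizerFactory produces.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string with whitespace normalised."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def add_length_fields(doc, source="body"):
    """Return a copy of doc with hypothetical length fields added.

    body_word_count and body_char_count are illustrative names only;
    they are not part of the schema posted in the thread.
    """
    text = strip_html(doc.get(source, ""))
    enriched = dict(doc)
    enriched["body_word_count"] = len(re.findall(r"\w+", text))
    enriched["body_char_count"] = len(text)
    return enriched
```

With such fields indexed, the original question could then be answered with an ordinary range filter, for example fq=body_word_count:[81 TO *], assuming the field is an indexed integer type.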
Re: Solr Faceting doesn't return values.
From the response you've posted, it appears to me that the query term TX is being searched against sa_site_city instead of mm_state_code. Can you try your query like below: http://xx.xx.xx.xx/solr/collection1/select?q=mm_state_code:(TX)&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all and post your output? On 22 May 2013 17:13, samabhiK qed...@gmail.com wrote: <str name="df">sa_site_city</str>
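The suggested URL is easy to mangle when typed by hand, so here is a small Python sketch that builds it with the standard library and percent-encodes the query. The function name build_select_url is made up for illustration; the parameter names and values are the ones from the message above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, field, value, facet_field):
    """Build a /select URL that searches one explicit field and facets on another.

    Naming the field in q avoids relying on whatever df is set to in the handler.
    """
    params = [
        ("q", f"{field}:({value})"),
        ("wt", "xml"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("debug", "all"),
    ]
    # urlencode percent-encodes ':' '(' and ')' so the query survives transport.
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://xx.xx.xx.xx/solr/collection1", "mm_state_code", "TX", "sa_site_city"
)
print(url)
```

Because the field is named explicitly in q, the result no longer depends on the df default, which is what the quoted configuration was silently changing.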
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Hi Shawn, Thanks for your reply. I'm not mixing versions. The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions. The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions. The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. 
Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Highlight only when all keywords match
Dear All, I have a requirement to highlight a field only when all of the keywords entered match in it. This also needs to support phrase, operator and wildcard queries. I'm using Solr 4.0 with edismax because the search needs to be carried out on multiple fields. I know that with the highlighting feature I can configure a field to indicate a match; however, I cannot find a setting to highlight only if all keywords match. That makes me wonder whether highlighting is the right approach to take. Can you please guide me in the right direction? The edismax config looks like below:
  <requestHandler name="assdismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
      <str name="pf">title</str>
      <int name="ps">0</int>
      <str name="q.alt">*:*</str>
      <str name="fl">*,score</str>
      <str name="mm">100%</str>
      <str name="q.op">AND</str>
      <str name="sort">score desc</str>
      <str name="facet">true</str>
      <str name="facet.limit">-1</str>
      <str name="facet.mincount">1</str>
      <str name="facet.field">uniq_subtype_id</str>
      <str name="facet.field">component_type</str>
      <str name="facet.field">genre_type</str>
    </lst>
    <lst name="appends">
      <str name="fq">collection:assets</str>
    </lst>
  </requestHandler>
If I search for 'countryside number 10', then 'annotations' should be highlighted only if it contains all of these search terms. A field containing just one or two of the terms does not count as a match. Thanks, Sandeep (p.s: I haven't enabled the highlighting feature on this config yet and will do so only if it can fulfil the requirement I have mentioned above.)
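As an illustration of the requirement (not of any Solr feature), here is a minimal Python sketch of a client-side check that reports which stored fields contain every query term. It assumes plain keyword queries and a simple lowercase word split; phrases, operators and wildcards, which the requirement also mentions, are deliberately not handled, and the function names are invented for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (simplified analysis)."""
    return set(re.findall(r"\w+", text.lower()))


def fields_matching_all_terms(doc, query, fields):
    """Return the names of the fields in which every query term occurs.

    doc maps field names to a string or a list of strings (multiValued).
    """
    terms = tokenize(query)
    if not terms:
        return []
    matched = []
    for name in fields:
        value = doc.get(name, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        # Subset test: every query term must be present in this one field.
        if terms <= tokenize(value):
            matched.append(name)
    return matched


doc = {
    "title": "Countryside walks",
    "annotations": "Filmed in the countryside near Number 10",
}
print(fields_matching_all_terms(doc, "countryside number 10", ["title", "annotations"]))
```

A check like this leaves the search results untouched and only decides, per field, whether to show a highlight, which is the separation the requirement asks for.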
Re: Highlight only when all keywords match
Hi Jaideep, The edismax config I have posted mentioned that the default operator is AND. I am sorry if I was not clear in my previous mail, what I need really is highlight a field when all search query terms present. The current highlighter works for *any* of the terms match and not for *all* terms match. Thanks, Sandeep On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote: Sandeep, If you AND all keywords, that should be OK? Thanks Jaideep On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry sanmes...@gmail.com wrote: Dear All, I have a requirement to highlight a field only when all keywords entered match. This also needs to support phrase, operator or wildcard queries. I'm using Solr 4.0 with edismax because the search needs to be carried out on multiple fields. I know with highlighting feature I can configure a field to indicate a match, however I do not find a setting to highlight only if all keywords match. That makes me think is that the right approach to take? Can you please guide me in right direction? The edsimax config looks like below: requestHandler name=assdismax class=solr.SearchHandler lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qftitle^10 description^5 annotations^3 notes^2 categories/str str name=pftitle/str int name=ps0/int str name=q.alt*:*/str str name=fl*,score/str str name=mm100%/str str name=q.opAND/str str name=sortscore desc/str str name=facettrue/str str name=facet.limit-1/str str name=facet.mincount1/str str name=facet.fielduniq_subtype_id/str str name=facet.fieldcomponent_type/str str name=facet.fieldgenre_type/str /lst lst name=appends str name=fqcollection:assets/str /lst /requestHandler If I search for 'countryside number 10' as the keyword then highlight only if the 'annotations' contain all these entered search terms. Any document containing just one or two terms is not a match. 
Thanks, Sandeep (p.s: I haven't enabled the highlighting feature yet on this config and will be doing so only if that will fulfil the requirement I have mentioned above.) -- _ The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify us immediately by responding to this email and then delete it from your system. The firm is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.
Re: Highlight only when all keywords match
I doubt if that will be the correct approach as it will be hard to generate the query grammar considering we have support for phrase, operator, wildcard and group queries. That's why I have kept it simple and only passing the query text with minimal parsing (escaping lucene special characters) to configured edismax. The number of fields I have mentioned above are a lot lesser than the actual number of fields - around 50 in number :-). So forming such a long query will both be time and resource consuming. Further, it's not going to fulfill my requirement anyway because I do not want to change my search results, the requirement is only to provide a highlight if a field is matched for all the query terms. Thanks, Sandeep On 20 May 2013 12:02, Jaideep Dhok jaideep.d...@inmobi.com wrote: If you know all fields that need to be queried, you can rewrite it as - (assuming, f1, f2 are the fields that you have to search) (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn) - Jaideep On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Jaideep, The edismax config I have posted mentioned that the default operator is AND. I am sorry if I was not clear in my previous mail, what I need really is highlight a field when all search query terms present. The current highlighter works for *any* of the terms match and not for *all* terms match. Thanks, Sandeep On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote: Sandeep, If you AND all keywords, that should be OK? Thanks Jaideep On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry sanmes...@gmail.com wrote: Dear All, I have a requirement to highlight a field only when all keywords entered match. This also needs to support phrase, operator or wildcard queries. I'm using Solr 4.0 with edismax because the search needs to be carried out on multiple fields. 
I know with highlighting feature I can configure a field to indicate a match, however I do not find a setting to highlight only if all keywords match. That makes me think is that the right approach to take? Can you please guide me in right direction? The edsimax config looks like below: requestHandler name=assdismax class=solr.SearchHandler lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qftitle^10 description^5 annotations^3 notes^2 categories/str str name=pftitle/str int name=ps0/int str name=q.alt*:*/str str name=fl*,score/str str name=mm100%/str str name=q.opAND/str str name=sortscore desc/str str name=facettrue/str str name=facet.limit-1/str str name=facet.mincount1/str str name=facet.fielduniq_subtype_id/str str name=facet.fieldcomponent_type/str str name=facet.fieldgenre_type/str /lst lst name=appends str name=fqcollection:assets/str /lst /requestHandler If I search for 'countryside number 10' as the keyword then highlight only if the 'annotations' contain all these entered search terms. Any document containing just one or two terms is not a match. Thanks, Sandeep (p.s: I haven't enabled the highlighting feature yet on this config and will be doing so only if that will fulfil the requirement I have mentioned above.) -- _ The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify us immediately by responding to this email and then delete it from your system. 
Re: Highlight only when all keywords match
Thanks Upayavira for that valuable suggestion. I believe overriding highlight component should be the way forward. Could you tell me if there is any existing example or which methods I should particularly override? Thanks, Sandeep On 20 May 2013 12:47, Upayavira u...@odoko.co.uk wrote: If you are saying that you want to change highlighting behaviour, not query behaviour, then I suspect you are going to have to interact with the java HighlightComponent. If you can work out how to update that component to behave as you wish, you could either subclass it, or create your own implementation that you can include in your Solr setup. Or, if you make it generic enough, offer it back as a contribution that can be included in future Solr releases. Upayavira On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote: I doubt if that will be the correct approach as it will be hard to generate the query grammar considering we have support for phrase, operator, wildcard and group queries. That's why I have kept it simple and only passing the query text with minimal parsing (escaping lucene special characters) to configured edismax. The number of fields I have mentioned above are a lot lesser than the actual number of fields - around 50 in number :-). So forming such a long query will both be time and resource consuming. Further, it's not going to fulfill my requirement anyway because I do not want to change my search results, the requirement is only to provide a highlight if a field is matched for all the query terms. Thanks, Sandeep On 20 May 2013 12:02, Jaideep Dhok jaideep.d...@inmobi.com wrote: If you know all fields that need to be queried, you can rewrite it as - (assuming, f1, f2 are the fields that you have to search) (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn) - Jaideep On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Jaideep, The edismax config I have posted mentioned that the default operator is AND. 
I am sorry if I was not clear in my previous mail, what I need really is highlight a field when all search query terms present. The current highlighter works for *any* of the terms match and not for *all* terms match. Thanks, Sandeep On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote: Sandeep, If you AND all keywords, that should be OK? Thanks Jaideep On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry sanmes...@gmail.com wrote: Dear All, I have a requirement to highlight a field only when all keywords entered match. This also needs to support phrase, operator or wildcard queries. I'm using Solr 4.0 with edismax because the search needs to be carried out on multiple fields. I know with highlighting feature I can configure a field to indicate a match, however I do not find a setting to highlight only if all keywords match. That makes me think is that the right approach to take? Can you please guide me in right direction? The edsimax config looks like below: requestHandler name=assdismax class=solr.SearchHandler lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qftitle^10 description^5 annotations^3 notes^2 categories/str str name=pftitle/str int name=ps0/int str name=q.alt*:*/str str name=fl*,score/str str name=mm100%/str str name=q.opAND/str str name=sortscore desc/str str name=facettrue/str str name=facet.limit-1/str str name=facet.mincount1/str str name=facet.fielduniq_subtype_id/str str name=facet.fieldcomponent_type/str str name=facet.fieldgenre_type/str /lst lst name=appends str name=fqcollection:assets/str /lst /requestHandler If I search for 'countryside number 10' as the keyword then highlight only if the 'annotations' contain all these entered search terms. Any document containing just one or two terms is not a match. Thanks, Sandeep (p.s: I haven't enabled the highlighting feature yet on this config and will be doing so only if that will fulfil the requirement I have mentioned above.) 
Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Hi All, I want to override a component from solr-core, and for that I need the solr-core jar. I am using the solr.war that comes from an Apache mirror, and if I open the war I see that the solr-core jar is actually named apache-solr-core.jar. The same is true of the solrj jar. If I declare a dependency on apache-solr-core.jar in my module, it cannot be found in the mirror. And if I use solr-core.jar, I get a strange ClassCastException for MorfologikFilterFactory during Solr startup. (I'm not using this factory at all in my project.)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
  Caused by: java.lang.ClassCastException: class org.apache.lucene.analysis.morfologik.MorfologikFilterFactory
  at java.lang.Class.asSubclass(Unknown Source)
  at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:126)
  at org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:73)
  at org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:55)
I tried manually removing apache-solr-core.jar from the Solr distribution war and then providing the dependency, and everything worked fine. I do remember the discussion on the forum about dropping the name *apache* from the Solr jars. If that's what caused this issue, can you tell me whether the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? Many Thanks, Sandeep
Re: Question about Edismax - Solr 4.0
Hello Jack, Thanks for pointing the issues out and for your valuable suggestion. My preliminary tests were okay on search but I will be doing more testing to see if this has impacted any other searches. Thanks once again and have a nice sunny weekend, Sandeep On 17 May 2013 05:35, Jack Krupansky j...@basetechnology.com wrote: Ah... I think your issue is the preserveOriginal=1 on the query analyzer as well as the fact that you have all of these catenatexx=1 options on the query analyzer - I indicated that you should remove them all. The problem is that the whitespace analyzer leaves the leading comma in place, and the preserveOriginal=1 also generates an extra token for the term, with the comma in place . But, with the space, the comma and 10 are separate terms and get analyzed independently. The query results probably indicate that you don't have that exact combination of the term and leading punctuation - or that there is no standalone comma in your input data. Try the following replacement for the query-time WDF: filter class=solr.**WordDelimiterFilterFactory stemEnglishPossessive=0 generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 splitOnNumerics=0 preserveOriginal=0 / -- Jack Krupansky -Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 5:50 PM To: solr-user@lucene.apache.org Subject: Re: Question about Edismax - Solr 4.0 Hi Jack, Thanks for your response again and for helping me out to get through this. The URL is definitely encoded for spaces and it looks like below. As I mentioned in my previous mail, I can't add it to query parameter as that searches on multiple fields. 
The title field is defined as below: field name=title type=text_wc indexed=true stored=false multiValued=true/ q=countrysiderows=20qt=**assdismaxfq=%28title%3A%28,** 10%29%29fq=collection:assets requestHandler name=assdismax class=solr.SearchHandler lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/**str float name=tie0.01/float str name=qftitle^10 description^5 annotations^3 notes^2 categories/str str name=pftitle/str int name=ps0/int str name=q.alt*:*/str str name=fl*,score/str str name=mm100%/str str name=q.opAND/str str name=sortscore desc/str str name=facettrue/str str name=facet.limit-1/str str name=facet.mincount1/str str name=facet.fielduniq_**subtype_id/str str name=facet.fieldcomponent_**type/str str name=facet.fieldgenre_type**/str /lst lst name=appends str name=fqcollection:assets/**str /lst /requestHandler The term 'countryside' needs to be searched against multiple fields including titles, descriptions, annotations, categories, notes but the UI also has a feature to limit results by providing a title field. I can see that the filter queries are always parsed by LuceneQueryParser however I'd expect it to generate the parsed_filter_queries debug output in every situation. I have tried it as the main query with both edismax and lucene defType and it gives me correct output and correct results. But, there is some problem when this is used as a filter query as the the parser is not able to parse a comma with a space. Thanks again Jack, please let me know in case you need more inputs from my side. Best Regards, Sandeep On 16 May 2013 18:03, Jack Krupansky j...@basetechnology.com wrote: Could you show us the full query URL - spaces must be encoded in URL query parameters. Also show the actual field XML - you omitted that. Try the same query as a main query, using both defType=edismax and defType=lucene. Note that the filter query is parsed using the Lucene query parser, not edismax, independent of the defType parameter. 
But you don't have any edismax features in your fq anyway. But you can stick {!edismax} in front of the query to force edismax to be used for the fq, although it really shouldn't change anything: Also, catenate is fine for indexing, but will mess up your queries at query time, so set them to 0 in the query analyzer Also, make sure you have autoGeneratePhraseQueries=true on the field type, but that's not the issue here. -- Jack Krupansky -Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Question about Edismax - Solr 4.0 Thanks Jack for your reply.. The problem is, I'm finding results for fq=title:(,10) but not for fq=title:(, 10) - apologies if that was not clear from my first mail. I have already mentioned the debug analysis in my previous mail. Additionally, the title field is defined as below: fieldType name=text_wc class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.WordDelimiterFilterFactory stemEnglishPossessive=0 generateWordParts=1
Question about Edismax - Solr 4.0
-- *Edismax and Filter Queries with Commas and Spaces* -- Dear Experts, This appears to be a bug; please correct me if I'm wrong. If I search with the following filter query: 1) fq=title:(, 10) - I get no results. - The debug output does NOT show the section containing parsed_filter_queries. If I carry out a search with the filter query: 2) fq=title:(,10) (no space between , and 10) - I get results, and the debug output shows the parsed filter queries section as:
  <arr name="filter_queries">
    <str>(titles:(,10))</str>
    <str>(collection:assets)</str>
  </arr>
As you can see above, I'm also passing in another filter query (collection:assets), which appears correctly here but does not appear in case 1 above. I can't make this part of the q parameter, as that needs to be searched against multiple fields. Can someone suggest a fix for this case, please? I'm using Solr 4.0. Many Thanks, Sandeep
Re: Question about Edismax - Solr 4.0
Thanks Jack for your reply.

The problem is that I find results for fq=title:(,10) but not for fq=title:(, 10). Apologies if that was not clear from my first mail. I have already included the debug analysis in my previous mail. Additionally, the field type of the title field is defined as below:

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

I have set the catenate options to 1 for all types. I can understand ',' being ignored when it is on its own (title:(, 10)), but:

- Why is Solr not searching for 10 in that case, just as it did when the query was (title:(,10))?
- And why did the other filter queries (collection:assets) not show up in the debug section?

Thanks,
Sandeep

On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote:

> You haven't indicated any problem here! What is the symptom that you actually think is a problem?
>
> There is no comma operator in any of the Solr query parsers. Comma is just another character that may or may not be included or discarded depending on the specific field type and analyzer. For example, a white space analyzer will keep commas, but the standard analyzer or the word delimiter filter will discard them.
>
> If title were a string type, all punctuation would be preserved, including commas and spaces (but spaces would need to be escaped or the term text enclosed in parentheses.)
>
> Let us know what your symptom is though, first. I mean, the filter query looks perfectly reasonable from an abstract perspective.
>
> -- Jack Krupansky
Re: Question about Edismax - Solr 4.0
Hi Jack,

Thanks for your response again, and for helping me get through this.

The URL is definitely encoded for spaces, and it looks like the one below. As I mentioned in my previous mail, I can't add the title restriction to the query parameter, as that searches on multiple fields. The title field is defined as below:

    <field name="title" type="text_wc" indexed="true" stored="false" multiValued="true"/>

    q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets

    <requestHandler name="assdismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
        <str name="pf">title</str>
        <int name="ps">0</int>
        <str name="q.alt">*:*</str>
        <str name="fl">*,score</str>
        <str name="mm">100%</str>
        <str name="q.op">AND</str>
        <str name="sort">score desc</str>
        <str name="facet">true</str>
        <str name="facet.limit">-1</str>
        <str name="facet.mincount">1</str>
        <str name="facet.field">uniq_subtype_id</str>
        <str name="facet.field">component_type</str>
        <str name="facet.field">genre_type</str>
      </lst>
      <lst name="appends">
        <str name="fq">collection:assets</str>
      </lst>
    </requestHandler>

The term 'countryside' needs to be searched against multiple fields (titles, descriptions, annotations, categories, notes), but the UI also has a feature to limit results by providing a title field.

I can see that the filter queries are always parsed by LuceneQueryParser; however, I'd expect it to generate the parsed_filter_queries debug output in every situation. I have tried the same query as the main query with both the edismax and lucene defType, and it gives me correct output and correct results. But there is some problem when it is used as a filter query, as the parser is not able to parse a comma followed by a space.

Thanks again Jack, please let me know in case you need more input from my side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky j...@basetechnology.com wrote:

> Could you show us the full query URL - spaces must be encoded in URL query parameters. Also show the actual field XML - you omitted that.
>
> Try the same query as a main query, using both defType=edismax and defType=lucene.
>
> Note that the filter query is parsed using the Lucene query parser, not edismax, independent of the defType parameter. But you don't have any edismax features in your fq anyway. But you can stick {!edismax} in front of the query to force edismax to be used for the fq, although it really shouldn't change anything:
>
> Also, catenate is fine for indexing, but will mess up your queries at query time, so set them to 0 in the query analyzer.
>
> Also, make sure you have autoGeneratePhraseQueries=true on the field type, but that's not the issue here.
>
> -- Jack Krupansky
Solr Sorting Algorithm
Good Morning All,

The alphabetical sorting is causing slight issues, as below.

I have 3 documents with title values as below:

1) Acer Palmatum (Tree)
2) Aceraceae (Tree Family)
3) Acer Pseudoplatanus (Tree)

I have created a title_sort field, which is defined with the field type alphaNumericalSort (that comes with the Solr example schema).

When I apply the sort order (sort=title_sort asc), I get the results as:

Aceraceae (Tree Family)
Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)

But the expected order is (spaces first):

Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)
Aceraceae (Tree Family)

My unit test uses the Collections.sort method and I get the expected results, but I'm not sure why Solr is doing it in a different way. From the Collections.sort API, I can see that it uses a modified merge sort. Could you tell me which algorithm Solr follows for its sorting logic, and also whether there is any other approach I can take?

Many Thanks,
Sandeep
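A difference like this usually comes from the sort key rather than from the sorting algorithm. The sketch below is a Python illustration under one assumption: that the sort field type behaves like the alphaOnlySort type in the Solr example schema, which lowercases, trims, and removes every character that is not a-z before sorting. The function names are made up for the example.

```python
import re

titles = [
    "Acer Palmatum (Tree)",
    "Aceraceae (Tree Family)",
    "Acer Pseudoplatanus (Tree)",
]

def alpha_only_key(title):
    # Assumed analysis: lowercase, trim, then drop everything except a-z.
    return re.sub(r"[^a-z]", "", title.strip().lower())

def spaces_kept_key(title):
    # Alternative: lowercase and trim only, so spaces still take part in ordering.
    return title.strip().lower()

by_alpha_only = sorted(titles, key=alpha_only_key)
by_spaces_kept = sorted(titles, key=spaces_kept_key)

print(by_alpha_only)   # Aceraceae first: "acera..." sorts before "acerp..."
print(by_spaces_kept)  # Acer Palmatum first: a space sorts before any letter
```

If that assumption holds for the actual field type, a sort field whose analysis keeps spaces (for example a keyword tokenizer with only a lowercase filter) would give the "spaces first" order.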
Re: commit in solr4 takes a longer time
That's not ideal. Can you post solrconfig.xml? On 3 May 2013 07:41, vicky desai vicky.de...@germinait.com wrote: Hi sandeep, I made the changes u mentioned and tested again for the same set of docs but unfortunately the commit time increased. -- View this message in context: http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060622.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: commit in solr4 takes a longer time
Hi Vicky,

I faced this issue as well, and after some playing around I found the autowarm count in the cache settings to be the problem. I changed it from a fixed count (3072) to a percentage (10%), and all commit times were stable from then onwards.

    <filterCache class="solr.FastLRUCache" size="8192" initialSize="3072" autowarmCount="10%"/>
    <queryResultCache class="solr.LRUCache" size="16384" initialSize="3072" autowarmCount="10%"/>
    <documentCache class="solr.LRUCache" size="8192" initialSize="4096" autowarmCount="10%"/>

HTH,
Sandeep

On 2 May 2013 16:31, Alexandre Rafalovitch arafa...@gmail.com wrote:

> If you don't re-open the searcher, you will not see new changes. So, if you only have hard commit, you never see those changes (until restart). But if you also have soft commit enabled, that will re-open your searcher for you.
>
> Regards, Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
> On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI furkankam...@gmail.com wrote:
>
>> What happens exactly when you don't open searcher at commit?
>>
>> 2013/5/2 Gopal Patwa gopalpa...@gmail.com
>>
>>> you might want to add openSearcher=false for hard commit, so hard commit also acts like soft commit
>>>
>>>     <autoCommit>
>>>       <maxDocs>5</maxDocs>
>>>       <maxTime>30</maxTime>
>>>       <openSearcher>false</openSearcher>
>>>     </autoCommit>
>>>
>>> On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com wrote:
>>>
>>>> Hi, I am using 1 shard and two replicas. Document size is around 6 lakhs. My solrconfig.xml is as follows:
>>>>
>>>>     <?xml version="1.0" encoding="UTF-8" ?>
>>>>     <config>
>>>>       <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
>>>>       <indexConfig>
>>>>         <maxFieldLength>2147483647</maxFieldLength>
>>>>         <lockType>simple</lockType>
>>>>         <unlockOnStartup>true</unlockOnStartup>
>>>>       </indexConfig>
>>>>       <updateHandler class="solr.DirectUpdateHandler2">
>>>>         <autoSoftCommit>
>>>>           <maxDocs>500</maxDocs>
>>>>           <maxTime>1000</maxTime>
>>>>         </autoSoftCommit>
>>>>         <autoCommit>
>>>>           <maxDocs>5</maxDocs>
>>>>           <maxTime>30</maxTime>
>>>>         </autoCommit>
>>>>       </updateHandler>
>>>>       <requestDispatcher handleSelect="true">
>>>>         <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="204800" />
>>>>       </requestDispatcher>
>>>>       <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
>>>>       <requestHandler name="/update" class="solr.UpdateRequestHandler" />
>>>>       <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
>>>>       <requestHandler name="/replication" class="solr.ReplicationHandler" />
>>>>       <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
>>>>       <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>>       <admin>
>>>>         <defaultQuery>*:*</defaultQuery>
>>>>       </admin>
>>>>     </config>
>>>>
>>>> -- View this message in context: http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060402.html
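To get a feel for why switching autowarmCount from 3072 to 10% could shorten commits, the following Python sketch does the arithmetic. It rests on an assumption about Solr's behaviour that should be checked against the documentation for your version: that a percentage is taken relative to the number of entries in the old cache, and a fixed count is capped by that number. The function name is hypothetical.

```python
def entries_to_autowarm(old_cache_entries, autowarm_count):
    """Estimate how many cache entries are regenerated for a new searcher.

    autowarm_count is either an int (fixed count) or a string such as "10%".
    Assumption: a percentage refers to the old cache's current entry count.
    """
    if isinstance(autowarm_count, str) and autowarm_count.endswith("%"):
        percent = int(autowarm_count[:-1])
        return old_cache_entries * percent // 100
    # A fixed count cannot warm more entries than the old cache holds.
    return min(old_cache_entries, int(autowarm_count))

# A full filterCache of 8192 entries, as in the configuration above.
print(entries_to_autowarm(8192, 3072))   # fixed count
print(entries_to_autowarm(8192, "10%"))  # percentage
```

Under that assumption, every commit that opens a new searcher re-executes a few thousand cached queries with the fixed setting, and a few hundred with the percentage setting, which would be consistent with the more stable commit times reported above.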
Custom sorting of Solr Results
Dear Experts,

I have a requirement to rank exact matches first and to apply alphabetical sorting thereafter. To illustrate: the exact matches should come first, and all later results should be in alphabetical order.

So, if there are 5 documents as below:

Doc 1 title: trees
Doc 2 title: plum trees
Doc 3 title: Money Trees (Legendary Trees)
Doc 4 title: Cork Trees
Doc 5 title: Old Trees

then, if the user searches with the query term 'trees', the results should be in the following order:

Doc 1 trees - Highest Rank
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 3 Money Trees (Legendary Trees)
Doc 5 Old Trees
Doc 2 plum trees

I can achieve the alphabetical sorting by adding the title sort parameter. However, Solr's relevancy score is higher for Doc 3 (because the term matches twice), so it arranges Doc 3 above Docs 4, 5 and 2. The results then look like:

Doc 1 trees - Highest Rank
Doc 3 Money Trees (Legendary Trees)
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 5 Old Trees
Doc 2 plum trees

Can you tell me an easy way to achieve this requirement, please? I'm using Solr 4.0 and the title field is defined as follows:

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Many Thanks in advance,
Sandeep
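The ordering asked for here can be stated as a two-part sort key: "is this an exact match?" first, the lowercased title second. The Python sketch below only pins down that target ordering on the five example titles; it runs on the client side and says nothing about how Solr itself would compute it. The helper name is hypothetical.

```python
docs = [
    {"id": 1, "title": "trees"},
    {"id": 2, "title": "plum trees"},
    {"id": 3, "title": "Money Trees (Legendary Trees)"},
    {"id": 4, "title": "Cork Trees"},
    {"id": 5, "title": "Old Trees"},
]

def exact_then_alpha(query):
    """Build a sort key: exact matches first, then case-insensitive A-Z."""
    wanted = query.strip().lower()

    def key(doc):
        title = doc["title"].strip().lower()
        # False sorts before True, so exact matches come first.
        return (title != wanted, title)

    return key

ordered = sorted(docs, key=exact_then_alpha("trees"))
print([d["id"] for d in ordered])  # [1, 4, 3, 5, 2]
```

In Solr terms, the first part of the key corresponds to a match on a lowercased, untokenized copy of the title, and the second part to a sort on a title sort field.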
Re: Exact and Partial Matches
Thanks Erick,

I tried grouping and it appears to work okay. However, I will need to change the client to parse the output.

    fq=title:(tree)&group=true&group.query=title:(trees) NOT title_ci:trees&group.query=title_ci:blair&group.sort=title_sort desc&sort=score desc,title_sort asc

I used the actual query as the filter query, so my scores will be 1, and then used 2 group queries: one that gives me the exact matches and another that gives me the partial matches minus the exact matches. I have tried this with operators too and it seems to be doing the job I want. Do you see any issue in this?

Thanks again for your reply, and by the way, thanks for SOLR-4662.

-S

On 30 April 2013 15:06, Erick Erickson erickerick...@gmail.com wrote:

> I don't think you can do that. You're essentially trying to mix ordering of the result set. You _might_ be able to kludge some of this with grouping, but I doubt it.
>
> You'll need two queries I'd guess.
>
> Best
> Erick
>
> On Mon, Apr 29, 2013 at 9:44 AM, Sandeep Mestry sanmes...@gmail.com wrote:
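If the two-query route suggested in this thread is taken, the client has to stitch the two result lists together. A minimal Python sketch of that merge step follows; it assumes each hit is a dict with an "id" key and that both lists arrive already sorted by the title sort field. The function name and the sample data are illustrative only.

```python
def merge_exact_then_partial(exact_hits, partial_hits):
    """Concatenate two result lists, dropping partial hits already seen as exact."""
    seen = set()
    merged = []
    for doc in list(exact_hits) + list(partial_hits):
        if doc["id"] not in seen:
            seen.add(doc["id"])
            merged.append(doc)
    return merged

# Hypothetical responses: one from the exact-match query, one from the
# tokenized query, both already sorted alphabetically by title.
exact_hits = [{"id": 1, "title": "trees"}]
partial_hits = [
    {"id": 4, "title": "Cork Trees"},
    {"id": 3, "title": "Money Trees (Legendary Trees)"},
    {"id": 5, "title": "Old Trees"},
    {"id": 2, "title": "plum trees"},
    {"id": 1, "title": "trees"},
]

merged = merge_exact_then_partial(exact_hits, partial_hits)
print([d["id"] for d in merged])  # [1, 4, 3, 5, 2]
```

Excluding the exact matches inside the second query, as the group.query with NOT does above, makes the de-duplication unnecessary, but paging across the two lists still has to be handled by the client.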
Exact and Partial Matches
Dear Experts,

I have a requirement to rank exact matches first and to apply alphabetical sorting thereafter. To illustrate: the exact matches should come first, and all later results should be in alphabetical order.

So, if there are 5 documents as below:

Doc 1 title: trees
Doc 2 title: plum trees
Doc 3 title: Money Trees (Legendary Trees)
Doc 4 title: Cork Trees
Doc 5 title: Old Trees

then, if the user searches with the query term 'trees', the results should be in the following order:

Doc 1 trees - Highest Rank
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 3 Money Trees (Legendary Trees)
Doc 5 Old Trees
Doc 2 plum trees

I can achieve the alphabetical sorting by adding the title sort parameter. However, Solr's relevancy score is higher for Doc 3 (because the term matches twice), so it arranges Doc 3 above Docs 4, 5 and 2. The results then look like:

Doc 1 trees - Highest Rank
Doc 3 Money Trees (Legendary Trees)
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 5 Old Trees
Doc 2 plum trees

Can you tell me an easy way to achieve this requirement, please? I'm using Solr 4.0 and the title field is defined as follows:

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Many Thanks in advance,
Sandeep
Re: Exact matching in Solr 3.6.1
Hi Pawel,

I'm not sure which parser you are using. I am using edismax, and I tried the bq parameter to boost the results with exact matches to the top. You may try something like:

    q=cats AND London NOT Leeds&bq=cats^50

In edismax, the pf and pf2 parameters also need some tuning to get the results at the top.

HTH,
Sandeep

On 25 April 2013 10:33, vsl ociepa.pa...@gmail.com wrote:

> Hi, is it possible to get exact matched results if the search term is combined, e.g. cats AND London NOT Leeds?
>
> In the previous threads I have read that it is possible to create a new field of String type and perform a phrase search on it, but nowhere has the above-mentioned combined search term been taken into consideration.
>
> BR
> Pawel
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
Re: Exact matching in Solr 3.6.1
I think in that case making the field a String type is your option; however, remember that it will be case sensitive. Another approach is to create a case-insensitive field type and do searches on those fields only.

    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Can you provide your fields and dismax config and, if possible, the records you would like and the records you do not want?

-S

On 25 April 2013 11:50, vsl ociepa.pa...@gmail.com wrote:

> Thanks for your reply. I am using edismax as well. What I want to get is the exact match without other results that could be close to the given term.
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html
Re: Exact matching in Solr 3.6.1
Agree with Jack. The current field type, text_general, is designed to match query tokens rather than exact values, so it cannot fulfil your requirements.

Can you use a flat file (http://wiki.apache.org/solr/FileBasedSpellChecker) as the spell check dictionary instead? That way you can search on the exact-match field while generating spell check suggestions from the file instead of from the index.

-S

On 25 April 2013 16:25, Jack Krupansky j...@basetechnology.com wrote:

> Well then just do an exact match ONLY!
>
> It sounds like you haven't worked out the inconsistencies in your requirements.
>
> To be clear: We're not offering you solutions - that's your job. We're only pointing out tools that you can use. It is up to you to utilize the tools wisely to implement your solution.
>
> I suspect that you simply haven't experimented enough with various boosts to assure that the unstemmed result is consistently higher.
>
> Maybe you need a custom stemmer or stemmer override so that passengers does get stemmed to passenger, but cats does not (but dogs does.) That can be a choice that you can make, but I would urge caution. Still, it is a decision that you can make - it's not a matter of Solr forcing or preventing you. I still think boosting of an unstemmed field should be sufficient.
>
> But until you clarify the inconsistencies in your requirements, we won't be able to make much progress.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: vsl
> Sent: Thursday, April 25, 2013 10:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Exact matching in Solr 3.6.1
>
>> Thanks for your reply, but this solution does not fulfil my requirement because other documents (not exact matched) will be returned as well.
>>
>> -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
Re: Question on Exact Matches - edismax
Hi Jan,

Thanks for your reply. I have defined string_ci like below:

    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

When I analysed the query in Solr, I saw that a document containing pg_series_title_ci:funny matches when I search for pg_series_title_ci:funny games, and it is ranked higher than the document containing the exact match.

I can use the default string data type, but then the match will be on exact casing.

Thanks,
Sandeep

On 3 April 2013 22:20, Jan Høydahl jan@cominvent.com wrote:

> Can you show us your *_ci field type? Solr does not really have a way to tell whether a match is exact or only partial, but you could hack around it with the fieldType. See https://github.com/cominvent/exactmatch for a possible solution.
>
> -- Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
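The "funny" versus "funny games" observation in this thread is consistent with the query parser splitting unquoted input on whitespace before the field's analyzer runs, so a keyword-tokenized field receives "funny" and "games" as separate terms. The Python sketch below simulates that with deliberately simplified stand-ins for the analyzer and parser; none of these functions are Solr APIs, and the matching rule (any clause matches) is a simplification.

```python
def keyword_lowercase(value):
    # Stand-in for KeywordTokenizer + LowerCaseFilter: one token per value.
    return [value.lower()]

def parse_unquoted(field, text):
    # Simplified parser: split on whitespace first, then analyze each word.
    return [(field, tok) for word in text.split() for tok in keyword_lowercase(word)]

def parse_quoted(field, text):
    # A quoted phrase reaches the analyzer as a single string.
    return [(field, tok) for tok in keyword_lowercase(text)]

# Two hypothetical documents and their indexed tokens in pg_series_title_ci.
indexed = {
    "doc_a": keyword_lowercase("Funny"),
    "doc_b": keyword_lowercase("Funny Games"),
}

def matches(clauses):
    # A document matches if any clause term equals one of its indexed tokens.
    return sorted(
        doc for doc, tokens in indexed.items()
        if any(term in tokens for _, term in clauses)
    )

print(matches(parse_unquoted("pg_series_title_ci", "funny games")))  # ['doc_a']
print(matches(parse_quoted("pg_series_title_ci", "funny games")))    # ['doc_b']
```

If this is what is happening, sending the title as a quoted phrase to the _ci field (which is what pf does in edismax) is the part of the configuration that can actually reward whole-value matches.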
Re: Question on Exact Matches - edismax
Another problem that I see in the Solr analysis is that a query term that matches the tokenized field does not match the case-insensitive field. So, if I'm searching for 'coast to coast', I see that the tokenized series title (pg_series_title) is matched, but not the ci field, pg_series_title_ci. The definitions of both fields are as below:

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="pg_series_title" type="text_wc" indexed="true" stored="true" multiValued="false"/>
    <field name="pg_series_title_ci" type="string_ci" indexed="true" stored="true" multiValued="false"/>

    <copyField source="pg_series_title" dest="pg_series_title_ci"/>

Can this copyField directive be an issue? Should it be the other way round, or does it not matter?

Thanks,
Sandeep
Question on Exact Matches - edismax
Hi All,

I have a requirement wherein exact matches for 2 fields (Series Title, Title) should be ranked higher than partial matches. The configuration looks like below:

    <requestHandler name="assetdismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">pg_series_title_ci^500 title_ci^300 pg_series_title^200 title^25 classifications^15 classifications_texts^15 parent_classifications^10 synonym_classifications^5 pg_brand_title^5 pg_series_working_title^5 p_programme_title^5 p_item_title^5 p_interstitial_title^5 description^15 pg_series_description annotations^0.1 classification_notes^0.05 pv_program_version_number^2 pv_program_version_number_ci^2 pv_program_number^2 pv_program_number_ci^2 p_program_number^2 ma_version_number^2 ma_recording_location ma_contributions^0.001 rel_pg_series_title rel_programme_title rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5 pv_uuid^0.5 ma_uuid^0.5</str>
        <str name="pf">pg_series_title_ci^500 title_ci^500</str>
        <int name="ps">0</int>
        <str name="q.alt">*:*</str>
        <str name="mm">100%</str>
        <str name="q.op">AND</str>
        <str name="facet">true</str>
        <str name="facet.limit">-1</str>
        <str name="facet.mincount">1</str>
      </lst>
    </requestHandler>

As you can see above, the search is against many fields. What I want is for documents that have exact matches in the series title and title fields to rank higher than the rest.

I have added 2 case-insensitive fields (pg_series_title_ci, title_ci) for series title and title, and have boosted them higher than the tokenized fields and the rest of the fields. I have also implemented a similarity class to override idf; however, I still get documents with partial matches in title and other fields ranking higher than an exact match in pg_series_title_ci.

Many Thanks,
Sandeep
Re: How to give more more importance to a document if term match is more
Hi Pragyanshis,

I faced a similar problem a few days ago and I was advised on this forum to override Solr's DefaultSimilarity calculation so that it always returns a constant value for idf. I think in your case you'd also want to suppress the length norm, which will require re-indexing as the length norm is calculated during indexing. The link to my issue is below:
http://lucene.472066.n3.nabble.com/Possible-issue-in-edismax-td4037397.html

Cheers,
Sandeep

On 14 February 2013 19:20, Pragyanshis Pattanaik pragyans...@outlook.com wrote:

Hi,

My schema is like below.

    <fields>
      <dynamicField name="Subject-Name-*" type="string" indexed="true" stored="true"/>
      <dynamicField name="Subject-Mark-*" type="int" indexed="true" stored="true"/>
    </fields>

My need is to search only three subject fields and boost those subjects which have a higher Mark (Mark can be between 1 and 10). Also, top subjects should get a higher boost than the ones after them. For example, if a search term is present in Subject-Name-1, it should get a higher boost than in Subject-Name-2 and Subject-Name-3. Similarly, Subject-Mark-1 should get a higher boost than Subject-Mark-2 and Subject-Mark-3.

To achieve this I am querying over the subject fields and my query looks like below.
q=+Economics+Geography&wt=xml&deftype=edismax&qf=Subject-Name-1+Subject-Name-2+Subject-Name-3&bq=Subject-Name-1%3AEconomics%3BGeography^50.0+Subject-Mark-1%3A20^90.0+Subject-Mark-1%3A9^80.0+Subject-Mark-1%3A8^70.0+Subject-Mark-1%3A7^60.0+Subject-Name-2%3AEconomics%3BGeography^45.0+Subject-Mark-2%3A20^90.0+Subject-Mark-2%3A9^80.0+Subject-Mark-2%3A8^70.0+Subject-Mark-2%3A7^60.0+Subject-Name-3%3AEconomics%3BGeography^40.0+Subject-Mark-3%3A20^90.0+Subject-Mark-3%3A9^80.0+Subject-Mark-3%3A8^70.0+Subject-Mark-3%3A7^60.0

If I have four documents like below

    <doc>
      <str name="Subject-Name-1">Economics</str> <str name="Subject-Name-1">Geography</str> <str name="Subject-Name-1">History</str>
      <int name="Subject-Name-1">7</int> <int name="Subject-Name-1">7</int> <int name="Subject-Name-1">6</int>
    </doc>
    <doc>
      <str name="Subject-Name-1">Economics</str> <str name="Subject-Name-1">History</str> <str name="Subject-Name-1">Geography</str>
      <int name="Subject-Name-1">8</int> <int name="Subject-Name-1">8</int> <int name="Subject-Name-1">5</int>
    </doc>
    <doc>
      <str name="Subject-Name-1">Economics</str> <str name="Subject-Name-1">History</str> <str name="Subject-Name-1">Geography</str>
      <int name="Subject-Name-1">9</int> <int name="Subject-Name-1">6</int> <int name="Subject-Name-1">7</int>
    </doc>
    <doc>
      <str name="Subject-Name-1">Economics</str> <str name="Subject-Name-1">Mathematics</str> <str name="Subject-Name-1">History</str>
      <int name="Subject-Name-1">7</int> <int name="Subject-Name-1">7</int> <int name="Subject-Name-1">6</int>
    </doc>

then I am getting a higher score for the last document, which has only one of the search terms! That is not acceptable in my situation. My requirement is that a document containing only one term should get a lower score than the documents containing both terms.

Is it happening because of idf (rarer terms give a higher contribution to the total score)? Or is there something wrong with my query? Can anybody help me to achieve the desired output?

Thanks in advance
Re: Problem when I search something that contains a forward slash?
Hi Bruno,

Solr 4.0 added regular expression support, which means that '/' is now a special character and must be escaped if searching for a literal forward slash.
http://wiki.apache.org/solr/SolrQuerySyntax

So, you can either escape it or use quotes like "A01H2/001".

Cheers,
Sandeep

On 19 February 2013 11:40, Bruno Mannina bmann...@free.fr wrote:

Dear Solr Users,

I use Solr 3.6. I have a field named IC which contains IPC codes with a forward slash inside, like:
A01H2/001
G06F1/023
C01C3/147
G06F3/023
etc...

My definition for this field is:

    <field name="ic" type="text_general" indexed="true" stored="true" multiValued="true"/>

If I try to search ic:G06F3/023
http://:/solr/select/?q=ic%3AG06F3%2F023&version=2.2&start=0&rows=10&indent=on
the result is wrong. When I use debugQuery, I see that the forward slash splits the request as:

    <str name="parsedquery_toString">ic:g06f3 ic:023</str>

How can I search a term that contains a / (forward slash)?

Thanks a lot for your help,
Bruno
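For anyone building such queries from code, the two options mentioned in the reply (escaping the slash, or quoting the value) can be sketched in plain Java roughly as below. The class and method names (`SolrQueryEscaper`, `escapeSlashes`, `quote`) are made up for illustration and are not part of Solr; SolrJ ships `ClientUtils.escapeQueryChars` for the general case, which is probably the better choice in real code. This sketch only deals with '/' and '\'.

```java
public class SolrQueryEscaper {

    // Hypothetical helper: put a backslash in front of '/' and '\' so that a
    // query parser which treats '/' as special sees them as literal characters.
    public static String escapeSlashes(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '/' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper for the other option: wrap the value in double
    // quotes, escaping any backslash or double quote already inside it.
    public static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Field name "ic" and the sample codes are taken from the question.
        System.out.println("ic:" + escapeSlashes("G06F3/023"));
        System.out.println("ic:" + quote("A01H2/001"));
    }
}
```

Note that escaping only changes how the query string is parsed. If the field's analyzer itself splits on '/', as the debug output in the question suggests for text_general, the term may still end up as two tokens.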
Re: Problem when I search something that contains a forward slash?
Hi Bruno,

I have never used 3.6 so I am sorry I might not be of much help. But I have a similar requirement for 2 fields: I use a string field and a case-insensitive string field, and by escaping the forward slash I get the correct result. The field definitions are as below:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" compressThreshold="10"/>
    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The debug output for both fields is as below:

Case insensitive string field:

    <str name="rawquerystring">pv_program_version_number_ci:HNAD002D\/01</str>
    <str name="querystring">pv_program_version_number_ci:HNAD002D\/01</str>
    <str name="parsedquery">pv_program_version_number_ci:hnad002d/01</str>
    <str name="parsedquery_toString">pv_program_version_number_ci:hnad002d/01</str>

String field:

    <str name="rawquerystring">pv_program_version_number:HNAD002D\/01</str>
    <str name="querystring">pv_program_version_number:HNAD002D\/01</str>
    <str name="parsedquery">pv_program_version_number:HNAD002D/01</str>
    <str name="parsedquery_toString">pv_program_version_number:HNAD002D/01</str>

HTH,
Sandeep

On 19 February 2013 12:24, Bruno Mannina bmann...@free.fr wrote:

Hi,

Even if I use a backslash, the problem is the same: ic:A01H2\/023 gives the same problem. Maybe I must disable an option, or something?

On 19/02/2013 13:11, Bruno Mannina wrote:

Hi Sandeep,

First, thanks for your answer, but I use Solr 3.6 and not 4.0. I can't currently update my Solr to version 4.0. And using the "" is not the solution because Solr 3.6 has an issue when I use truncation like * inside the request: "A01H2/0*" doesn't work. Do you have another solution for Solr 3.6?

Thanks a lot,
Bruno

On 19/02/2013 13:05, Sandeep Mestry wrote:

Hi Bruno,

Solr 4.0 added regular expression support, which means that '/' is now a special character and must be escaped if searching for a literal forward slash.
http://wiki.apache.org/solr/SolrQuerySyntax

So, you can either escape it or use quotes like "A01H2/001".

Cheers,
Sandeep

On 19 February 2013 11:40, Bruno Mannina bmann...@free.fr wrote:

Dear Solr Users,

I use Solr 3.6. I have a field named IC which contains IPC codes with a forward slash inside, like:
A01H2/001
G06F1/023
C01C3/147
G06F3/023
etc...

My definition for this field is:

    <field name="ic" type="text_general" indexed="true" stored="true" multiValued="true"/>

If I try to search ic:G06F3/023
http://:/solr/select/?q=ic%3AG06F3%2F023&version=2.2&start=0&rows=10&indent=on
the result is wrong. When I use debugQuery, I see that the forward slash splits the request as:

    <str name="parsedquery_toString">ic:g06f3 ic:023</str>

How can I search a term that contains a / (forward slash)?

Thanks a lot for your help,
Bruno
Re: Possible issue in edismax?
Hi Felipe,

Just a short note to say thanks for your valuable suggestion. I have implemented it and can see the expected results. The length norm still spoils it for a few fields, but I balanced that with the boost factors accordingly.

Once again, many thanks!
Sandeep

On 1 February 2013 22:53, Sandeep Mestry sanmes...@gmail.com wrote:

Brilliant! Thanks very much for your response.

On 1 Feb 2013 20:37, Felipe Lahti fla...@thoughtworks.com wrote:

It's not necessary. It's only query time.

On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry sanmes...@gmail.com wrote:

Hi.. Could you tell me if changing the default similarity to a custom implementation will require me to rebuild the index? Or will it be used only at query time?

thanks,
Sandeep

On 31 Jan 2013 13:55, Felipe Lahti fla...@thoughtworks.com wrote:

So, it depends on your business requirement, right? If a document has matches in more searchable fields, at least for me, this document is more important than another document that has fewer matches.

Example: put this in your schema:

    <similarity class="com.your.namespace.NoIDFSimilarity" />

And create a class in the classpath of your Solr:

    package com.your.namespace;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class NoIDFSimilarity extends DefaultSimilarity {
        @Override
        public float idf(long docFreq, long numDocs) {
            return 1;
        }
    }

It will neutralize the idf (which is the rarity of the term).

On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com wrote:

Thanks Felipe.. Can you point me to an example please? Also, forgive me, but if a document has matches in more searchable fields then should it not rank higher?

Thanks,
Sandeep

On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com wrote:

If you compare the first and last document scores you will see that the last one matches more fields than the first one. So, you may be thinking, why?
The first doc only matches contributions field and the last matches a bunch of fields so if you want to have behave more like (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you have to override the method of DefaultSimilarity.
Re: Possible issue in edismax?
Hi.. Could you tell me if changing the default similarity to a custom implementation will require me to rebuild the index? Or will it be used only at query time?

thanks,
Sandeep

On 31 Jan 2013 13:55, Felipe Lahti fla...@thoughtworks.com wrote:

So, it depends on your business requirement, right? If a document has matches in more searchable fields, at least for me, this document is more important than another document that has fewer matches.

Example: put this in your schema:

    <similarity class="com.your.namespace.NoIDFSimilarity" />

And create a class in the classpath of your Solr:

    package com.your.namespace;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class NoIDFSimilarity extends DefaultSimilarity {
        @Override
        public float idf(long docFreq, long numDocs) {
            return 1;
        }
    }

It will neutralize the idf (which is the rarity of the term).

On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com wrote:

Thanks Felipe.. Can you point me to an example please? Also, forgive me, but if a document has matches in more searchable fields then should it not rank higher?

Thanks,
Sandeep

On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com wrote:

If you compare the first and last document scores you will see that the last one matches more fields than the first one. So, you may be thinking, why? The first doc only matches the contributions field and the last matches a bunch of fields, so if you want it to behave more like (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you have to override the method of DefaultSimilarity.
Re: Possible issue in edismax?
Brilliant! Thanks very much for your response.

On 1 Feb 2013 20:37, Felipe Lahti fla...@thoughtworks.com wrote:

It's not necessary. It's only query time.

On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry sanmes...@gmail.com wrote:

Hi.. Could you tell me if changing the default similarity to a custom implementation will require me to rebuild the index? Or will it be used only at query time?

thanks,
Sandeep

On 31 Jan 2013 13:55, Felipe Lahti fla...@thoughtworks.com wrote:

So, it depends on your business requirement, right? If a document has matches in more searchable fields, at least for me, this document is more important than another document that has fewer matches.

Example: put this in your schema:

    <similarity class="com.your.namespace.NoIDFSimilarity" />

And create a class in the classpath of your Solr:

    package com.your.namespace;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class NoIDFSimilarity extends DefaultSimilarity {
        @Override
        public float idf(long docFreq, long numDocs) {
            return 1;
        }
    }

It will neutralize the idf (which is the rarity of the term).

On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com wrote:

Thanks Felipe.. Can you point me to an example please? Also, forgive me, but if a document has matches in more searchable fields then should it not rank higher?

Thanks,
Sandeep

On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com wrote:

If you compare the first and last document scores you will see that the last one matches more fields than the first one. So, you may be thinking, why? The first doc only matches the contributions field and the last matches a bunch of fields, so if you want it to behave more like (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you have to override the method of DefaultSimilarity.
Re: Possible issue in edismax?
Fantastic! Thanks very much.. I will do so accordingly and will let you know the results.

Thanks again,
Sandeep

On 31 January 2013 13:54, Felipe Lahti fla...@thoughtworks.com wrote:

So, it depends on your business requirement, right? If a document has matches in more searchable fields, at least for me, this document is more important than another document that has fewer matches.

Example: put this in your schema:

    <similarity class="com.your.namespace.NoIDFSimilarity" />

And create a class in the classpath of your Solr:

    package com.your.namespace;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class NoIDFSimilarity extends DefaultSimilarity {
        @Override
        public float idf(long docFreq, long numDocs) {
            return 1;
        }
    }

It will neutralize the idf (which is the rarity of the term).

On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com wrote:

Thanks Felipe.. Can you point me to an example please? Also, forgive me, but if a document has matches in more searchable fields then should it not rank higher?

Thanks,
Sandeep

On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com wrote:

If you compare the first and last document scores you will see that the last one matches more fields than the first one. So, you may be thinking, why? The first doc only matches the contributions field and the last matches a bunch of fields, so if you want it to behave more like (<str name="qf">series_title^500 title^100 description^15 contribution</str>) you have to override the method of DefaultSimilarity.
On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry sanmes...@gmail.com wrote:

I have pasted it below. It varies slightly from the dismax configuration I mentioned above, as I was playing with all sorts of boost values; however, it looks more like below:

    <str name="c208c2ca-4270-27b8-e040-a8c00409063a">
    2675.7844 = (MATCH) sum of:
      2675.7844 = (MATCH) max plus 0.01 times others of:
        2675.7844 = (MATCH) weight(contributions:news in 63298) [DefaultSimilarity], result of:
          2675.7844 = score(doc=63298,freq=1.0 = termFreq=1.0 ), product of:
            0.004495774 = queryWeight, product of:
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              3.093982E-4 = queryNorm
            595177.7 = fieldWeight in 63298, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              40960.0 = fieldNorm(doc=63298)
    </str>
    <str name="c208c2a9-66bc-27b8-e040-a8c00409063a">
    2317.297 = (MATCH) sum of:
      2317.297 = (MATCH) max plus 0.01 times others of:
        2317.297 = (MATCH) weight(contributions:news in 9826415) [DefaultSimilarity], result of:
          2317.297 = score(doc=9826415,freq=3.0 = termFreq=3.0 ), product of:
            0.004495774 = queryWeight, product of:
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              3.093982E-4 = queryNorm
            515439.0 = fieldWeight in 9826415, product of:
              1.7320508 = tf(freq=3.0), with freq of:
                3.0 = termFreq=3.0
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              20480.0 = fieldNorm(doc=9826415)
    </str>
    <str name="c208c2aa-1806-27b8-e040-a8c00409063a">
    2140.6274 = (MATCH) sum of:
      2140.6274 = (MATCH) max plus 0.01 times others of:
        2140.6274 = (MATCH) weight(contributions:news in 9882325) [DefaultSimilarity], result of:
          2140.6274 = score(doc=9882325,freq=1.0 = termFreq=1.0 ), product of:
            0.004495774 = queryWeight, product of:
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              3.093982E-4 = queryNorm
            476142.16 = fieldWeight in 9882325, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              32768.0 = fieldNorm(doc=9882325)
    </str>
    <str name="c208c2b0-5165-27b8-e040-a8c00409063a">
    1605.4707 = (MATCH) sum of:
      1605.4707 = (MATCH) max plus 0.01 times others of:
        1605.4707 = (MATCH) weight(contributions:news in 220007) [DefaultSimilarity], result of:
          1605.4707 = score(doc=220007,freq=1.0 = termFreq=1.0 ), product of:
            0.004495774 = queryWeight, product of:
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              3.093982E-4 = queryNorm
            357106.62 = fieldWeight in 220007, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              24576.0 = fieldNorm(doc=220007)
    </str>
    <str name="c208c2cc-d01b-27b8-e040-a8c00409063a">
    1605.4707 = (MATCH) sum of:
      1605.4707 = (MATCH) max plus 0.01 times others of:
        1605.4707 = (MATCH) weight(contributions:news in 241151) [DefaultSimilarity], result of:
          1605.4707 = score(doc=241151,freq=1.0 = termFreq=1.0 ), product of:
            0.004495774 = queryWeight, product of:
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              3.093982E-4 = queryNorm
            357106.62 = fieldWeight in 241151, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              14.530705 = idf(docFreq=14, maxDocs=11282414)
              24576.0 = fieldNorm(doc
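The numbers in the explain output quoted in this thread can be reproduced by hand, which helps to see which factor dominates. Below is a rough Java sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and a query boost of 1 on the contributions field; the class and method names are invented for illustration. The inputs are copied from the explain entry for doc 63298.

```java
public class ExplainArithmetic {

    // Term-frequency factor under the assumed classic formula: sqrt(freq).
    public static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency under the assumed classic formula.
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Single-term, single-field score: queryWeight * fieldWeight.
    public static double score(double freq, long docFreq, long numDocs,
                               double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, numDocs);
        double queryWeight = idf * queryNorm;            // boost assumed to be 1
        double fieldWeight = tf(freq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // docFreq, maxDocs, queryNorm and fieldNorm as printed for doc 63298.
        double idf = idf(14, 11282414L);
        double s = score(1.0, 14, 11282414L, 3.093982E-4, 40960.0);
        System.out.printf("idf=%.6f score=%.4f%n", idf, s);
    }
}
```

With these inputs idf enters the product twice (once in queryWeight, once in fieldWeight) and fieldNorm contributes a factor of 40960, so a rare term in a field with a large norm can easily outweigh a qf boost of 500 on another field. A fieldNorm that large may point to an index-time boost on that field, which would be worth checking.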
Possible issue in edismax?
Hi All,

I'm facing an issue in relevancy calculation by dismax query parser. The boost factor applied does not work as expected in certain cases when the keyword is generic and by generic I mean, if the keyword is appearing many times in the document as well as in the index.

I have parser configuration as below:

    <requestHandler name="querydismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">series_title^500 title^100 description^15 contribution</str>
        <str name="pf">series_title^200</str>
        <int name="ps">0</int>
        <str name="q.alt">*:*</str>
      </lst>
    </requestHandler>

As you can see above, I'd expect the documents containing the matches for series title should rank higher than the ones in contribution. This works well, if I type in a query like 'wonderworld' which is a less occurring term and the series titles rank higher. But, if I type in a keyword like 'news' which is the most common term in the index, I get hits in contributions even though I have lots of documents having word news in series title.

The field definition is as below:

    <field name="series_title" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="description" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="contribution" type="text" indexed="true" stored="true" multiValued="true" />

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100" compressThreshold="10">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

I have tried debugging and when I use query term news, I see that matches for contributions are ranked higher than series title.

The parsed queries look like below:
(Note that I have edited the query as in reality I have lot of fields that are searchable and I have only mentioned the fields containing text data - rest all contain uuids)

    <str name="parsedquery">
    (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01) () () () () () () () () () () () () () () () () () () () () () () () () () () () ())/no_coord
    </str>
    <str name="parsedquery_toString">
    +(description:news^15 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01 () () () () () () () () () () () () () () () () () () () () () () () () () () () ()
    </str>

Could you guide me in right direction please?

Many Thanks,
Sandeep
Re: Possible issue in edismax?
Thanks Felipe, yes I have seen that and my requirement somewhere falls for

On 30 January 2013 15:53, Felipe Lahti fla...@thoughtworks.com wrote:

Hi Sandeep,

The quick answer is that the boost you define in your requestHandler is not the only thing used to calculate the score of each document. There are other factors that contribute to the score calculation. You can take a look at http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, using debugQuery=true you can see the score calculation for each document returned.

Let me know if you need something else.

On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry sanmes...@gmail.com wrote:

Hi All,

I'm facing an issue in relevancy calculation by dismax query parser. The boost factor applied does not work as expected in certain cases when the keyword is generic and by generic I mean, if the keyword is appearing many times in the document as well as in the index.

I have parser configuration as below:

    <requestHandler name="querydismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">series_title^500 title^100 description^15 contribution</str>
        <str name="pf">series_title^200</str>
        <int name="ps">0</int>
        <str name="q.alt">*:*</str>
      </lst>
    </requestHandler>

As you can see above, I'd expect the documents containing the matches for series title should rank higher than the ones in contribution. This works well, if I type in a query like 'wonderworld' which is a less occurring term and the series titles rank higher. But, if I type in a keyword like 'news' which is the most common term in the index, I get hits in contributions even though I have lots of documents having word news in series title.

The field definition is as below:

    <field name="series_title" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="description" type="text_wc" indexed="true" stored="true" multiValued="false" />
    <field name="contribution" type="text" indexed="true" stored="true" multiValued="true" />

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100" compressThreshold="10">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

I have tried debugging and when I use query term news, I see that matches for contributions are ranked higher than series title.

The parsed queries look like below:
(Note that I have edited the query as in reality I have lot of fields that are searchable and I have only mentioned the fields containing text data - rest all contain uuids)

    <str name="parsedquery">
    (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01) () () () () () () () () () () () () () () () () () () () () () () () () () () () ())/no_coord
    </str>
    <str name="parsedquery_toString">
    +(description:news^15 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01 () () () () () () () () () () () () () () () () () () () () () () () () () () () ()
    </str>

Could you guide me in right direction please?

Many Thanks,
Sandeep

--
Felipe Lahti
Consultant Developer
Re: Possible issue in edismax?
(Sorry for the incomplete reply in my previous mail, I didn't know Ctrl F sends an email in Gmail.. ;-))

Thanks Felipe, yes I have seen that, and my requirement falls under this FAQ entry:

> How can I make exact-case matches score higher
>
> Example: a query of Penguin should score documents containing Penguin higher than docs containing penguin.
>
> The general strategy is to index the content twice, using different fields with different fieldTypes (and different analyzers associated with those fieldTypes). One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
>
> Use copyField (http://wiki.apache.org/solr/SchemaXml#copyField) commands in the schema to index a single input field multiple times.
>
> Once the content is indexed into multiple fields that are analyzed differently, query across both fields (http://wiki.apache.org/solr/SolrRelevancyFAQ#multiFieldQuery).

I have also added a case-insensitive field so that exact matches rank higher; however, the results do not even reflect the matches in the field, never mind the exact-matching part.

I have tried the debugQuery option as mentioned in my previous mail, and I have also posted the parsed queries. From the debug output, I see that the field with the lower boost factor (contribution) still scores higher than the one with the higher boost factor (series_title).

Thanks,
Sandeep

On 30 January 2013 16:02, Sandeep Mestry sanmes...@gmail.com wrote:

> Thanks Felipe, yes I have seen that and my requirement somewhere falls for
>
> On 30 January 2013 15:53, Felipe Lahti fla...@thoughtworks.com wrote:
>
> > Hi Sandeep,
> >
> > The quick answer is that the boost you define in your requestHandler is not the only thing used to calculate the score of each document. Other factors contribute to the score calculation as well. You can read about them at http://wiki.apache.org/solr/SolrRelevancyFAQ. You can also use debugQuery=true to see the score calculation for each document returned.
> >
> > Let me know if you need anything else.
Re: Possible issue in edismax?
= fieldWeight in 967895, product of:
  1.0 = tf(freq=1.0), with freq of:
    1.0 = termFreq=1.0
  6.4641423 = idf(docFreq=47791, maxDocs=11282414)
  1.0 = fieldNorm(doc=967895)
1.6107484 = (MATCH) weight(title_ci:news^100.0 in 967895) [DefaultSimilarity], result of:
  1.6107484 = score(doc=967895,freq=1.0 = termFreq=1.0), product of:
    0.22324038 = queryWeight, product of:
      100.0 = boost
      7.2153096 = idf(docFreq=22548, maxDocs=11282414)
      3.093982E-4 = queryNorm
    7.2153096 = fieldWeight in 967895, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      7.2153096 = idf(docFreq=22548, maxDocs=11282414)
      1.0 = fieldNorm(doc=967895)
</str>

On 30 January 2013 17:55, Felipe Lahti fla...@thoughtworks.com wrote:

> Let me see if I understood your problem: from your first e-mail I think you are worried about the order in which Solr returns documents. Is that correct?
>
> If yes, as I said before, it's not only the boosting that influences the order of returned documents. There's also term frequency, IDF (inverse document frequency)...
>
> If I understood your first e-mail correctly, you are interested in getting rid of IDF. For that, you can create a NoIDFSimilarity class to override the default similarity.
>
> Can you paste here the score calculation for one document?
>
> On Wed, Jan 30, 2013 at 2:06 PM, Sandeep Mestry sanmes...@gmail.com wrote:
>
> > (Sorry for in complete reply in my previous mail, didn't know Ctrl F sends an email in Gmail.. ;-))
> >
> > Thanks Felipe, yes I have seen that and my requirement falls for
> >
> > How can I make exact-case matches score higher
> >
> > Example: a query of Penguin should score documents containing Penguin higher than docs containing penguin. The general strategy is to index the content twice, using different fields with different fieldTypes (and different analyzers associated with those fieldTypes). One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
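For anyone following the explain output in this thread, the arithmetic behind the title_ci:news clause can be reproduced in a few lines of Python. This is only a rough sketch: it assumes the classic Lucene DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), score = queryWeight * fieldWeight), the function names are made up for illustration, and the numbers are copied from the explain output. Under those assumptions the idf comes out at about 7.2153 and the score at about 1.6107, in line with what Solr reported.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed classic DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, query_norm, tf, idf_value, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    # The idf appears in both factors, so it enters the score squared.
    query_weight = boost * idf_value * query_norm
    field_weight = tf * idf_value * field_norm
    return query_weight * field_weight


# Values taken from the explain output for title_ci:news in doc 967895.
MAX_DOCS = 11282414
QUERY_NORM = 3.093982e-4

idf_title_ci = idf(22548, MAX_DOCS)
score_title_ci = term_score(100.0, QUERY_NORM, 1.0, idf_title_ci, 1.0)

# Hypothetical comparison: the same clause with idf forced to 1.0, which is
# roughly what a "no IDF" similarity would do. In a real index queryNorm
# would change too, so this only shows the direction of the effect.
score_without_idf = term_score(100.0, QUERY_NORM, 1.0, 1.0, 1.0)

print(idf_title_ci, score_title_ci, score_without_idf)
```

Because the idf is multiplied in twice, a field where 'news' is rare can outscore a heavily boosted field where 'news' is very common, which may be part of what is happening with contribution versus series_title here.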
Re: ConcurrentModificationException in Solr 3.6.1
Hi there,

I think André has already guided you in your earlier mail:

> This should be fixed in 3.6.2, which has been available since Dec 25. From the release notes:
>
> Fixed ConcurrentModificationException during highlighting, if all fields were requested.
>
> André
>
> From: mechravi25 [mechrav...@yahoo.co.in]
> Sent: Friday, 18 January 2013 11:10
> To: solr-user@lucene.apache.org
> Subject: ConcurrentModificationException in Solr 3.6.1

On 18 January 2013 12:01, mechravi25 mechrav...@yahoo.co.in wrote:

> Hi all,
>
> I am using Solr 3.6.1. I am sending a set of requests to Solr simultaneously. When I checked the log file, I noticed the exception stack trace below:
>
> SEVERE: java.util.ConcurrentModificationException
>     at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
>     at java.util.LinkedList$ListItr.next(LinkedList.java:696)
>     at org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
>     at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
>     at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> When I searched through the Solr issues, I found the following two URLs:
>
> https://issues.apache.org/jira/browse/SOLR-2684
> https://issues.apache.org/jira/browse/SOLR-3790
>
> The stack trace given in the second URL coincides with the one above, so I have applied the code change given in the link below:
>
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h
>
> The first URL's stack trace seems to be different. I have two questions here:
>
> 1.) Please tell me why this exception occurs.
> 2.) Is there any other patch/solution available to overcome this exception?
>
> Please guide me.
>
> Thanks
Re: Solr 4 : Optimize very slow
Hi All,

I followed Michael's advice and the timings have now come down to a couple of hours from 6-8 hours :-)

I have attached the solrconfig.xml we're using; can you let me know if I'm missing something?

Thanks,
Sandeep

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<!--
 For more details about configurations options that may appear in this
 file, see http://wiki.apache.org/solr/SolrConfigXml.

 Specifically, the Solr Config can support XInclude, which may make it
 easier to manage the configuration. See
 https://issues.apache.org/jira/browse/SOLR-1167
-->
<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

  <!-- Set this to 'false' if you want solr to continue working after it
       has encountered an severe configuration error. In a production
       environment, you may want solr to keep working even if one handler
       is mis-configured. You may also set this to false using by setting
       the system property: -Dsolr.abortOnConfigurationError=false
  -->
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <!-- lib directives can be used to instruct Solr to load an Jars
       identified and use them to resolve any plugins specified in your
       solrconfig.xml or schema.xml (ie: Analyzers, Request Handlers,
       etc...). All directories and paths are resolved relative the
       instanceDir. If a ./lib directory exists in your instanceDir, all
       files found in it are included as if you had used the following
       syntax...
       <lib dir="./lib" />
  -->
  <!-- A dir option by itself adds any files found in the directory to
       the classpath, this is useful for including all jars in a
       directory.
  -->
  <lib dir="../../contrib/extraction/lib" />
  <!-- When a regex is specified in addition to a directory, only the
       files in that directory which completely match the regex (anchored
       on both ends) will be included.
  -->
  <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />
  <!-- If a dir option (with or without a regex) is used and nothing is
       found that matches, it will be ignored
  -->
  <lib dir="../../contrib/clustering/lib/downloads/" />
  <lib dir="../../contrib/clustering/lib/" />
  <lib dir="/total/crap/dir/ignored" />
  <!-- an exact path can be used to specify a specific file. This will
       cause a serious error to be logged if it can't be loaded.
       <lib path="../a-jar-that-does-not-exist.jar" />
  -->

  <!-- Used to specify an alternate directory to hold all index data
       other than the default ./data under the Solr home. If replication
       is in use, this should match the replication configuration.
  -->
  <dataDir>${solr.data.dir:./solr/data}</dataDir>

  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectory}"/>

  <!-- WARNING: this indexDefaults section only provides defaults for
       index writers in general. See also the mainIndex section after
       that when changing parameters for Solr's main Lucene index.
  -->
  <indexConfig>
    <!-- Values here affect all index writers and act as a default unless
         overridden.
    -->
    <mergeFactor>30</mergeFactor>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">15</int>
      <int name="segmentsPerTier">15</int>
    </mergePolicy>

    <!-- options specific to the main on-disk lucene index -->
    <ramBufferSizeMB>32</ramBufferSizeMB>

    <!-- Custom deletion policies can specified here. The class must
         implement org.apache.lucene.index.IndexDeletionPolicy.
         http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html
         The standard Solr IndexDeletionPolicy implementation supports
         deleting index commit points on number of commits, age of commit
         point and optimized status. The latest commit point should
         always be preserved regardless of the criteria.
    -->
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <!-- The number of commit points to be kept -->
      <str name="maxCommitsToKeep">1</str>
      <!-- The number of optimized commit
Re: Solr 4 : Optimize very slow
@Walter, the daily optimization was introduced because we saw search performance drop during peak hours, when lots of updates take place on the index. Load testing was slightly more successful on optimized indexes. As a matter of fact, the merge factor was increased from 10 to 30 to make it acceptable.

@Upayavira, thanks for the inputs. I will try to avoid the daily optimizations; however, it is sort of workplace policy not to alter anything except the essential configs for this release of the project. I take your point that the daily optimizations are unnecessary, but even then it's hard to imagine why they take 6-8 hours a day when previously they finished within half an hour.

@Michael, thanks for pointing that out. I will try using solr.NIOFSDirectoryFactory, as currently I'm using the default one. Regarding your questions:

- Nothing has changed between Solr 1.4 and Solr 4 except the Solr config. I have built 2 separate environments using Solr 1.4 and Solr 4 with the same application code, db config etc. and can see the difference in the optimization timings.
- I will check the Solr stats for GC, including during optimization. I see that the index size grows from 8.5G to 17G during optimization, and CPU utilization is at its highest then.

And I meant WAS only as in WebSphere Application Server.

@Otis, a quick google for "optimize wunder Erick Otis" results in this mail chain (ha ha!), but I will dig through the mail archives. Thank you for your suggestion.

Have a good day all, I will come back with my findings.

Best,
Sandeep

On 5 December 2012 06:07, Walter Underwood wun...@wunderwood.org wrote:

> It was not necessary under 1.4. It has never been necessary. It was not necessary in Ultraseek Server in 1996, using the same merging model.
>
> In some cases, it can be a good idea. Since you are continuously updating, this is not one of those cases.
>
> wunder
>
> On Dec 4, 2012, at 9:29 PM, Upayavira wrote:
>
> > I tried that search, without success :-(
> >
> > I suspect what Otis was trying to say was to question why you are optimising. Optimise was necessary under 1.4, but with newer Solr, the new TieredMergePolicy does a much better job of handling background merging, reducing the need for optimize. Try just not doing it at all and see if your index actually reaches a point where it is needed.
> >
> > Upayavira
> >
> > On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote:
> >
> > > Hi,
> > >
> > > You should search the ML archives for: optimize wunder Erick Otis :)
> > >
> > > Is WAS really AWS? If so, if these are new EC2 instances you are unfortunately unable to do a fair apples to apples comparison. Have you tried a different set of instances?
> > >
> > > Otis
> > > --
> > > Performance Monitoring - http://sematext.com/spm
> > >
> > > On Dec 4, 2012 6:29 PM, Sandeep Mestry sanmes...@gmail.com wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have recently migrated from solr 1.4 to solr 4 and have done the basic changes required for solr 4 in solrconfig.xml and schema.xml. I have also rebuilt the index set for solr 4.
> > > >
> > > > We run optimize every morning at 4 am and we keep the index updates off during this process. Previously, with 1.4, the optimization used to take around 20-30 mins per shard, but now with solr 4 it's taking 6-8 hours or even more. I have also tested the optimize from the solr UI and that takes 6-8 hours too.
> > > >
> > > > The hardware is the same, and we have deployed solr under WAS. There are 4 shards, every shard contains around 8-9 Gig of data, and we are using a master-slave configuration with rsync. I have not enabled soft commit. Also, the committer process is scheduled to run every minute.
> > > >
> > > > I am not sure which part I'm missing, do let me know your inputs please.
> > > >
> > > > Many Thanks in advance,
> > > > Sandeep
>
> --
> Walter Underwood
> wun...@wunderwood.org
Re: Incremental Update of index
Hi Amit/Shanu,

You can create a Solr document for only the updated record and index just that document, so that only the updated record gets reindexed. You do not need to rebuild the index from scratch for every record update.

Thanks,
Sandeep
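As a rough illustration of the advice above, the sketch below builds a Solr XML add message for a single changed record. It is a minimal example under stated assumptions: the field names (id, title), the record values and the helper name build_add_message are hypothetical, and it assumes the schema has a uniqueKey field so that re-sending a document with the same key replaces the old version. Sending the payload to the core's update handler and committing is left out, since the URL depends on the installation.

```python
import xml.etree.ElementTree as ET


def build_add_message(record):
    # Build <add><doc><field name="...">...</field></doc></add> for one record.
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical record that changed in the database; only this document is
# sent to Solr, the rest of the index is left untouched.
updated_record = {"id": "rec-42", "title": "Evening news"}
payload = build_add_message(updated_record)
print(payload)
```

The same message shape works for a batch: several doc elements can go inside one add element, so a periodic job can send only the records changed since the last run.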
Solr 4 : Optimize very slow
Hi All,

I have recently migrated from Solr 1.4 to Solr 4 and have made the basic changes required for Solr 4 in solrconfig.xml and schema.xml. I have also rebuilt the index set for Solr 4.

We run optimize every morning at 4 am and we keep index updates off during this process. Previously, with 1.4, the optimization used to take around 20-30 mins per shard, but now with Solr 4 it's taking 6-8 hours or even more. I have also tested the optimize from the Solr UI and that takes 6-8 hours too.

The hardware is the same, and we have deployed Solr under WAS. There are 4 shards, every shard contains around 8-9 Gig of data, and we are using a master-slave configuration with rsync. I have not enabled soft commit. Also, the committer process is scheduled to run every minute.

I am not sure which part I'm missing, do let me know your inputs please.

Many Thanks in advance,
Sandeep
Re: Does SolrCloud support distributed IDFs?
Dear All,

Can anyone suggest how long it will take to get the SOLR-1632 patch into Solr 4? Also, it would be good to hear whether anyone has used an alternative method, such as the Ultraseek XPA Java library, to calculate distributed ranking.

Many Thanks,
Sandeep

On 22 October 2012 13:23, Sascha SZOTT sz...@gmx.de wrote:

> Hi Mark,
>
> Mark Miller wrote:
>
> > Still waiting on that issue. I think Andrzej should just update it to trunk and commit - it's optional and defaults to off. Go vote :)
>
> Sounds like the problem is already solved and the remaining work consists of code integration? Can somebody estimate how much work that would be?
>
> -Sascha
Forming Solr Query for multiple operators against multiple fields
Dear All,

I have a requirement to search against multiple fields (title, description, annotations, comments, text), and the query can contain multiple boolean operators. Could someone point me in the right direction?

If the user enters a query like

(day AND world) NOT night

I want to form the query:

(title:day AND title:world NOT title:night) OR
(description:day AND description:world NOT description:night) OR
(annotations:day AND annotations:world NOT annotations:night) OR
(comments:day AND comments:world NOT comments:night) OR
(text:day AND text:world NOT text:night)

I've tried the Lucene MultiFieldQueryParser to form the query and, after some string manipulation, produced the query below; however, it does not give me the correct relevancy.

(title:day OR description:day OR annotations:day OR comments:day OR text:day) AND
(title:world OR description:world OR annotations:world OR comments:world OR text:world) NOT
(title:night OR description:night OR annotations:night OR comments:night OR text:night)

For the record, the project is still on Solr 1.4 and hence I'm using the Standard Query Parser (the upgrade is due in the coming months). But for now, I need to make it work for the above requirement.

Please suggest whether there is a straightforward approach, or should I take the route of writing the query grammar myself?

Many Thanks,
Sandeep
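One way to get the per-field form described above without writing a full grammar is plain string rewriting on the client side: tokenize the user's query, prefix every bare term with a field name, and OR the per-field copies together. The sketch below is only an illustration of that idea, not a tested Solr 1.4 recipe; the function names qualify and expand_over_fields are hypothetical, it keeps the user's own parentheses (so the output is grouped slightly differently from the hand-written query in the message), and it does not handle wildcards, ranges or queries that already contain field prefixes.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# A token is a parenthesis, a double-quoted phrase, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def qualify(query, field):
    # Prefix every term or phrase with "field:", leaving operators and
    # parentheses untouched.
    parts = []
    for token in TOKEN.findall(query):
        if token in OPERATORS or token in ("(", ")"):
            parts.append(token)
        else:
            parts.append(f"{field}:{token}")
    text = " ".join(parts)
    # Tidy the spaces that joining puts inside parentheses.
    return text.replace("( ", "(").replace(" )", ")")


def expand_over_fields(query, fields):
    # One fully qualified copy of the query per field, OR-ed together.
    return " OR ".join(f"({qualify(query, field)})" for field in fields)


fields = ["title", "description", "annotations", "comments", "text"]
expanded = expand_over_fields("(day AND world) NOT night", fields)
print(expanded)
```

Whether the expanded query ranks documents the way you want would still need checking with debugQuery=true, since each per-field group is scored independently.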
Re: Forming Solr Query for multiple operators against multiple fields
Thanks Ahmet. However, as I mentioned in my e-mail, we're using Solr 1.4 here, and edismax is supported from Solr 3.1. :-)

On 23 October 2012 13:42, Ahmet Arslan iori...@yahoo.com wrote:

> --- On Tue, 10/23/12, Sandeep Mestry sanmes...@gmail.com wrote:
>
> > From: Sandeep Mestry sanmes...@gmail.com
> > Subject: Forming Solr Query for multiple operators against multiple fields
> > To: solr-user@lucene.apache.org
> > Date: Tuesday, October 23, 2012, 2:51 PM
> >
> > Dear All,
> >
> > I have a requirement to search against multiple fields like title, description, annotations, comments, text and the query can contain multiple boolean operators. So, can someone point me out in right direction.
> >
> > If the user enters a query like , - (day AND world) NOT night
>
> Probably you can make use of (e)dismax query parser.
>
> http://wiki.apache.org/solr/DisMax
> http://wiki.apache.org/solr/ExtendedDisMax