Re: autocomplete: case-insensitive and middle word
This thread might help - http://www.lucidimagination.com/search/document/9edc01a90a195336/enhancing_auto_complete

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 17, 2010 at 8:30 PM, Paul p...@nines.org wrote:

I have a couple of questions about implementing an autocomplete function in Solr. Here's my scenario: I have a name field that usually contains two or three names. For instance, suppose it contains:

John Alfred Smith
Alfred Johnson
John Quincy Adams
Fred Jones

I'd like the autocomplete to be case-insensitive and to match any of the names, preferably just at their beginnings. In other words, if the user types "alf", I want:

John Alfred Smith
Alfred Johnson

If the user types "fre", I want:

Fred Jones

but not:

John Alfred Smith
Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that are returned are lower-cased and contain only one name. If I use the string type, I get the entire name like I want, but the user must match the case (that is, must type "Alf"), and it only matches the first name, not the middle name. How can I get the matches of the text_lu analyzer, but get the hints like the string type? Thanks, Paul
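For the archives: one common way to get what Paul asks for is to index an analyzed copy of the field for matching and return the hint from the stored original. The schema.xml sketch below is an illustration, not from the thread; the type name (name_prefix), field name (name_ac) and gram sizes are assumptions:

```xml
<!-- Analyzed copy used only for matching: split on whitespace,
     lower-case, and emit edge n-grams so a prefix of any word matches. -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Stored original for display; analyzed copy for matching only. -->
<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_ac" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_ac"/>
```

A suggest query then searches name_ac (e.g. q=name_ac:alf) but displays the stored name field, so "alf" matches "John Alfred Smith" and the hint keeps its original casing.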
Re: enhancing auto complete
I preferred to answer this question privately earlier, but I have received innumerable requests to unveil the architecture. For the benefit of all, I am posting it here (after hiding as much info as I should, in my company's interest). The context: the auto-suggest feature on http://askme.in

*Solr setup*: Some of the salient features -

1. TermsComponent is NOT used.
2. The index is made up of 4 fields of the following types - autocomplete_full, autocomplete_token, string and text.
3. autocomplete_full uses KeywordTokenizerFactory and EdgeNGramFilterFactory; autocomplete_token uses WhitespaceTokenizerFactory and EdgeNGramFilterFactory. Both of these are Solr text fields with standard filters like LowerCaseFilterFactory applied during both indexing and querying.
4. A standard DataImportHandler and a bunch of SQL procedures are used to derive all suggestable phrases from the system and index them in the above-mentioned fields.

*Controller setup*: The controller (which handles suggest queries) is a typical Java servlet using Solr as its backend (connecting via SolrJ). Based on the incoming query string, a Lucene query is created: a BooleanQuery comprising a TermQuery against each of the above-mentioned fields. The boost factor on each of these term queries determines (to an extent) which kinds of matches show up first. JSON is used as the data-exchange format.

*Frontend setup*: It is home-grown JS built to address some specific use cases of the project in question. One simple exercise with Firebug will spill all the beans. However, I strongly recommend using jQuery to build (and extend) the UI component. Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Whoops!
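The two autocomplete field types described in point 3 might look roughly like this in schema.xml. This is a sketch reconstructed from the description above; the exact gram sizes and filter order are assumptions:

```xml
<!-- Whole-phrase prefix matching: the entire suggestion is one token,
     so "business c" matches "Business Centre". -->
<fieldType name="autocomplete_full" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Per-word prefix matching: each word contributes its own edge n-grams,
     so "c" also matches "Business Centre" via "centre". -->
<fieldType name="autocomplete_token" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Boosting the autocomplete_full clause of the BooleanQuery higher than the autocomplete_token clause is one way to make full-phrase prefix matches rank first, as the controller description suggests.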
table still does not look OK :( trying to send it once again. First column: user-entered value; second column: expected results:

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote: Avlesh, thanks for responding. The table mentioned below is the one re-sent above. Yes, [http://askme.in] looks good! I would like to know its design/Solr configuration etc. Can you please provide me detailed views of it? In [http://askme.in] there is one thing to be noted: search text like [business c] populates [Business Centre], which looks OK, but [Consultant Business] looks a bit odd. But in general, the pointer you suggested is great to start with.

On 8/2/2010 8:39 PM, Avlesh Singh wrote: From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved
Re: enhancing auto complete
From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved with Solr? I have already tried various tokenizers and filter factories (WhitespaceTokenizer, KeywordTokenizer, EdgeNGramFilterFactory, ShingleFilterFactory etc.) but no luck so far. Note that it would be excellent if the terms populated from Solr could be highlighted using the Highlighting component or any other Solr component/mechanism. *Note:* Standard autocomplete (like facet.field=AutoComplete&f.AutoComplete.facet.prefix=<user entered term>&f.AutoComplete.facet.limit=10&facet.sort=true&rows=0) is already working fine with the application, but I am now looking to enhance the existing auto-complete with the above requirement. Any thoughts?
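The matching rule implied by the table — every word the user has typed must be the prefix of some word in the suggestion — is what WhitespaceTokenizerFactory + LowerCaseFilterFactory + EdgeNGramFilterFactory produce at index time. The following is a tiny plain-Java simulation of that idea (an illustration only, not Solr code; Solr does the equivalent inside its analysis chain and scores the clauses):

```java
import java.util.HashSet;
import java.util.Set;

public class EdgeNgramDemo {

    // Index-time simulation: split on whitespace, lower-case, and emit every
    // front edge n-gram of every token (minGramSize=1), as the
    // WhitespaceTokenizer + LowerCase + EdgeNGram chain would.
    static Set<String> edgeNgrams(String text) {
        Set<String> grams = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            for (int i = 1; i <= token.length(); i++) {
                grams.add(token.substring(0, i));
            }
        }
        return grams;
    }

    // Query-time simulation: every query word must equal some indexed gram,
    // i.e. be the prefix of some word in the suggestion.
    static boolean matches(String suggestion, String query) {
        Set<String> grams = edgeNgrams(suggestion);
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!grams.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili";
        System.out.println(matches(doc, "lorem ip"));                             // true
        System.out.println(matches("Lorem ipsum dolor sit amet", "lorem ipsl"));  // false
        System.out.println(matches(doc, "lorem ipsl"));                           // true
    }
}
```

This reproduces the expected results in the table: "lorem ipsl" matches only the "ipslili" text, while "lorem ip" matches both. Highlighting the matched prefixes is then a separate display concern.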
Thanks in advance The contents of this eMail including the contents of attachment(s) are privileged and confidential material of Gateway NINtec Pvt. Ltd. (GNPL) and should not be disclosed to, used by or copied in any manner by anyone other than the intended addressee(s). If this eMail has been received by error, please advise the sender immediately and delete it from your system. The views expressed in this eMail message are those of the individual sender, except where the sender expressly, and with authority, states them to be the views of GNPL. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this eMail or any action taken in reliance on this eMail is strictly prohibited and may be unlawful. This eMail may contain viruses. GNPL has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this eMail. You should carry out your own virus checks before opening the eMail or attachment(s). GNPL is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt. GNPL reserves the right to monitor and review the content of all messages sent to or from this eMail address and may be stored on the GNPL eMail system. In case this eMail has reached you in error, and you would no longer like to receive eMails from us, then please send an eMail to d...@gatewaynintec.com
Re: enhancing auto complete
Hahaha ... sorry, it's not. And there is no ready-made code that I can give you either. But yes, if you liked it, I can share the design of this feature (Solr, backend and frontend). Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:47 PM, scr...@asia.com wrote: Hi, I'm also interested in this feature... is it open source?

-Original Message- From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, Aug 2, 2010 5:09 pm Subject: Re: enhancing auto complete

From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved with Solr? I have already tried various tokenizers and filter factories (WhitespaceTokenizer, KeywordTokenizer, EdgeNGramFilterFactory, ShingleFilterFactory etc.) but no luck so far. Note that it would be excellent if the terms populated from Solr could be highlighted using the Highlighting component or any other Solr component/mechanism. *Note:* Standard autocomplete (like facet.field=AutoComplete&f.AutoComplete.facet.prefix=<user entered term>&f.AutoComplete.facet.limit=10&facet.sort=true&rows=0) is already working fine with the application, but I am now looking to enhance the existing auto-complete with the above requirement. Any thoughts? Thanks in advance
Re: solr configuration for local search
They is me! Yes, multiple queries are fired (though concurrently) for fetching suggestions. You would probably want to take this off the list with me for questions, if any. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 5:04 PM, Frank A fsa...@gmail.com wrote: Thanks. Do you have any idea what features they use, specifically the types of tokenizers and analyzers? Also, do you think they use two separate queries for the business name versus the "you may be looking for" suggestions? Thanks again.

On Sun, Jun 6, 2010 at 9:36 PM, Avlesh Singh avl...@gmail.com wrote: Frank, w.r.t. features you may draw a lot of inspiration from these two sites -

1. http://mumbai.burrp.com/
2. http://askme.in/

Both of these products are Indian local-search applications; #1 primarily focuses on the eating-out domain. All the search/suggest related features on these sites are powered by Solr. You can take a lot of cues for building the auto-complete feature, using facets, custom highlighting etc. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 6:08 AM, Frank A fsa...@gmail.com wrote: Hi, I'm playing with Solr as the search engine for my local-search site. I'm primarily focused on restaurants right now, and I'm working with the following data attributes:

Name - restaurant name
Cuisine - a list of one or more cuisines, e.g. Italian, Pizza
Features - a list of one or more features, e.g. Open Late, Take-Out
Tags - a list of one or more freeform, open-entry tags

I want the site to allow searches by name, e.g. "Jake's Pizza", as well as more general "pizza" and even something like "take-out pizza". I'd also like to handle variations (takeout, carryout) and spelling issues. I've started with the out-of-the-box text definition and cloned it for cuisines, features and tags. For name I've left it as a string and then created a copyField for the phonetic value of Name. My text catch-all has all the fields copied to it. Finally, I implemented spell check as well. The search seems to work pretty well based on some initial testing, but I feel like I'm missing something. I'm curious about any advice on missing features I should be utilizing, steps that I've missed, etc. The steps I was planning:

- Update the stopwords to contain certain adjectives (good, best, etc.)
- Create synonyms for features and cuisines

All thoughts/comments/advice is really appreciated. Thanks.
Re: solr configuration for local search
Frank, w.r.t. features you may draw a lot of inspiration from these two sites -

1. http://mumbai.burrp.com/
2. http://askme.in/

Both of these products are Indian local-search applications; #1 primarily focuses on the eating-out domain. All the search/suggest related features on these sites are powered by Solr. You can take a lot of cues for building the auto-complete feature, using facets, custom highlighting etc. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 6:08 AM, Frank A fsa...@gmail.com wrote: Hi, I'm playing with Solr as the search engine for my local-search site. I'm primarily focused on restaurants right now, and I'm working with the following data attributes:

Name - restaurant name
Cuisine - a list of one or more cuisines, e.g. Italian, Pizza
Features - a list of one or more features, e.g. Open Late, Take-Out
Tags - a list of one or more freeform, open-entry tags

I want the site to allow searches by name, e.g. "Jake's Pizza", as well as more general "pizza" and even something like "take-out pizza". I'd also like to handle variations (takeout, carryout) and spelling issues. I've started with the out-of-the-box text definition and cloned it for cuisines, features and tags. For name I've left it as a string and then created a copyField for the phonetic value of Name. My text catch-all has all the fields copied to it. Finally, I implemented spell check as well. The search seems to work pretty well based on some initial testing, but I feel like I'm missing something. I'm curious about any advice on missing features I should be utilizing, steps that I've missed, etc. The steps I was planning:

- Update the stopwords to contain certain adjectives (good, best, etc.)
- Create synonyms for features and cuisines

All thoughts/comments/advice is really appreciated. Thanks.
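Frank's setup — a string Name field, a phonetic copy of it, and a catch-all text field — might be wired up like this in schema.xml. This is a sketch of the configuration he describes; the type and field names are assumptions:

```xml
<!-- Phonetic matching for misspelled restaurant names. -->
<fieldType name="phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_phonetic" type="phonetic" indexed="true" stored="false"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="name" dest="name_phonetic"/>
<copyField source="name" dest="text"/>
<copyField source="cuisine" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="tags" dest="text"/>
```

With inject="true" the phonetic filter keeps the original tokens alongside the encoded ones, so exact spellings still match better than near-misses.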
Facet pagination
Is there a way to get the *total count of facet values* per field? Meaning, if my facets are -

<lst name="facet_fields">
  <lst name="first_char">
    <int name="s">305807</int>
    <int name="d">264748</int>
    <int name="p">181084</int>
    <int name="m">130546</int>
    <int name="r">98544</int>
    <int name="b">82741</int>
    <int name="k">77157</int>
  </lst>
</lst>

then, is the following possible?

<lst name="first_char" totalFacetCount="7">

where 7 is the count of all facet values available - in this example s, d, p, m, r, b and k. I need this to fetch paginated facets of a field for a given query, rather than doing next/previous. Cheers Avlesh
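For context: Solr of that era does not return such a totalFacetCount, but facet paging itself is done with the facet.offset and facet.limit parameters, and a common (if heavy) workaround for the total is to fetch all values once with facet.limit=-1 and count them client-side. A sketch of the request parameters, using the first_char field from the example:

```
# page 1 (first 10 facet values)
...&facet=true&facet.field=first_char&facet.limit=10&facet.offset=0&rows=0

# page 2
...&facet=true&facet.field=first_char&facet.limit=10&facet.offset=10&rows=0

# workaround for the total: fetch all values and count them in the client
...&facet=true&facet.field=first_char&facet.limit=-1&rows=0
```

rows=0 keeps the document list out of the response when only facet counts are needed.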
Re: Knowledge about contents of a page
Classification? - http://en.wikipedia.org/wiki/Document_classification Cheers Avlesh

On Fri, Jan 29, 2010 at 1:18 AM, ram_sj rpachaiyap...@gmail.com wrote: Hi, my question is about crawling. I know this is not quite relevant here, but I asked the Nutch people and didn't get any response, so I thought of posting here. I'm trying to crawl reviews for businesses.

a. Is there any way to tell whether the content in a web page is a review or not? Is it possible to do this in an automated fashion?
b. How could we map a block of text to a particular business? e.g. like Google reviews.

Thanks Ram -- View this message in context: http://old.nabble.com/Knowledge-about-contents-of-a-page-tp27358779p27358779.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Understanding the query parser
It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery.

That's exactly the question. Would be nice to hear from someone as to why it is that way. Cheers Avlesh

On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan iori...@yahoo.com wrote:

I am running into the same issue. I have tried to replace my WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern (\s+|-) but I still seem to get a phrase query. Why is that?

It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery. Modifications in the analysis phase (CharFilterFactory, TokenizerFactory, TokenFilterFactory) won't change this behavior; something must be done before the analysis phase. But I think in your case you can obtain a match by modifying the parameters of WordDelimiterFilterFactory, even with a PhraseQuery.
Re: Tokenizer question
If the analyzer produces multiple Tokens, but they all have the same position, then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work.

You rock, Hoss!!! This is exactly the explanation I was looking for .. it is as simple as it sounds. Thanks! Cheers Avlesh

On Tue, Jan 12, 2010 at 6:37 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
:
: the resulting parsed query contains a phrase query:
:
: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43)

This stems from some fairly fundamental behavior in the QueryParser ... each chunk of input that isn't deemed markup (ie: not field names, or special characters) is sent to the analyzer. If the analyzer produces multiple tokens at different positions, then a PhraseQuery is constructed. -- Things like simple phrase searches and N-Gram based partial matching require this behavior. If the analyzer produces multiple Tokens, but they all have the same position, then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work.

If you write a simple TokenFilter to flatten all of the positions to be the same, and use it after WordDelimiterFilter, then it should give you the OR-style query you want. This isn't the default behavior because the phrase behavior of WDF fits its intended case better -- someone searching for a product SKU like X3QZ-D5 expects it to match X-3QZD5, but not just X or 3QZ -Hoss
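Hoss's rule — consecutive positions become a phrase, stacked (same-position) tokens become an OR — can be sketched in a few lines of self-contained Java. This only simulates the decision the QueryParser makes from position increments; it is not Lucene code, and the Token class here is a stand-in for Lucene's analysis token:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class PositionDemo {

    // Stand-in for an analysis token: its text and its position increment.
    // A positionIncrement of 0 means "stacked on the previous token",
    // which is how synonym-style tokens are emitted.
    static class Token {
        final String text;
        final int posIncrement;
        Token(String text, int posIncrement) {
            this.text = text;
            this.posIncrement = posIncrement;
        }
    }

    // Mimic the QueryParser decision: all tokens after the first at the
    // same position -> OR of term clauses; otherwise -> phrase query.
    static String toQuery(String field, List<Token> tokens) {
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0).text;
        }
        boolean samePosition = true;
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).posIncrement != 0) {
                samePosition = false;
                break;
            }
        }
        if (samePosition) {
            StringJoiner or = new StringJoiner(" OR ", "(", ")");
            for (Token t : tokens) {
                or.add(field + ":" + t.text);
            }
            return or.toString();
        }
        StringJoiner phrase = new StringJoiner(" ", field + ":\"", "\"");
        for (Token t : tokens) {
            phrase.add(t.text);
        }
        return phrase.toString();
    }

    public static void main(String[] args) {
        // "foo-bar" split into two tokens at consecutive positions -> phrase
        System.out.println(toQuery("name",
                Arrays.asList(new Token("foo", 1), new Token("bar", 1))));
        // synonym-style stacked tokens at the same position -> OR
        System.out.println(toQuery("name",
                Arrays.asList(new Token("foo", 1), new Token("bar", 0))));
    }
}
```

A position-flattening TokenFilter, as Hoss suggests, works precisely by forcing every token after the first onto the 0-increment path, so the parser takes the OR branch instead of the phrase branch.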
Re: Understanding the query parser
Thanks Erik for responding. Hoss explained the behavior with nice corollaries here - http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question Cheers Avlesh

On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:

It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery. That's exactly the question. Would be nice to hear from someone as to why it is that way.

Suppose you indexed "Foo Bar". It'd get indexed as two tokens, [foo] followed by [bar]. Then someone searches for foo-bar, which would get analyzed into two tokens also. A PhraseQuery is the most logical thing for it to turn into, no? What's the alternative? Of course it's tricky business, though; it's impossible to do the right thing for all cases within SolrQueryParser. Thankfully it is pleasantly subclassable, and this method is overridable. Erik
Understanding the query parser
I am using Solr 1.3. I have an index with a field called name. It is of type text (unmodified, stock text field from Solr). My query field:foo-bar is parsed as a phrase query, field:"foo bar". I was rather expecting it to be parsed as field:(foo bar), i.e. field:foo field:bar. Is there an expectation mismatch? Can I make it work the way I expect it to? Cheers Avlesh
Re: Rules engine and Solr
Thanks for the revert, Ravi.

I guess this is the usual usage of a Solr server. In my case this is no different. Search queries have a personalized experience, which means behaviors for facets, highlighting etc. are customizable. We pull it off using databases and Java data structures. As for the kind of rules I am talking about: http://en.wikipedia.org/wiki/Business_rules_engine

Cheers Avlesh

On Wed, Jan 6, 2010 at 12:12 PM, Ravi Gidwani ravi.gidw...@gmail.com wrote: Avlesh: I am currently working on some kind of rules in front (application side) of our Solr instance. These rules are more application-specific and are not general - like deciding which fields to facet, which fields to return in the response, which fields to highlight, and the boost value for each field (both at query time and at index time). The approach I have taken is to define a database table which holds these field parameters, which are then interpreted by my application to decide the query to be sent to Solr. This allows tweaking the Solr fields on the fly and hence influencing the search results. I will be interested to hear from you about the kind of rules you talk about and your approach towards them. Are these rules like a regular expression that, when matched with the user query, executes a specific Solr query? ~Ravi

On Tue, Jan 5, 2010 at 8:25 PM, Avlesh Singh avl...@gmail.com wrote:

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

Hahaha, that's classic Hoss! Thanks for introducing me to the XY problem. Had I known the two completely, I wouldn't have posted it on the mailing list. And I wasn't looking for a solution either. Anyways, as I replied back earlier, I'll get back with questions once I get more clarity. Cheers Avlesh

On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I am planning to build a rules engine on top of search. The rules are database
: driven and can't be stored inside solr indexes. These rules would ultimately
: do two things -
:
: 1. Change the order of Lucene hits.
: 2. Add/remove some results to/from the Lucene hits.
:
: What should be my starting point? Custom search handler?

This smells like an XY problem ... can you elaborate on the types of rules/conditions/situations when you want #1 and #2 listed above to happen? http://people.apache.org/~hossman/#xyproblem

XY Problem: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Rules engine and Solr
Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

Hahaha, that's classic Hoss! Thanks for introducing me to the XY problem. Had I known the two completely, I wouldn't have posted it on the mailing list. And I wasn't looking for a solution either. Anyways, as I replied back earlier, I'll get back with questions once I get more clarity. Cheers Avlesh

On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I am planning to build a rules engine on top of search. The rules are database
: driven and can't be stored inside solr indexes. These rules would ultimately
: do two things -
:
: 1. Change the order of Lucene hits.
: 2. Add/remove some results to/from the Lucene hits.
:
: What should be my starting point? Custom search handler?

This smells like an XY problem ... can you elaborate on the types of rules/conditions/situations when you want #1 and #2 listed above to happen? http://people.apache.org/~hossman/#xyproblem

XY Problem: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Rules engine and Solr
Thanks for the response, Shalin. I am still in two minds over doing it inside Solr versus outside. I'll get back with more questions, if any. Cheers Avlesh

On Mon, Jan 4, 2010 at 5:11 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Jan 4, 2010 at 10:24 AM, Avlesh Singh avl...@gmail.com wrote:

I have a Solr (version 1.3) powered search server running in production. Search is keyword-driven and is supported using custom fields and tokenizers. I am planning to build a rules engine on top of search. The rules are database-driven and can't be stored inside the Solr indexes. These rules would ultimately do two things -

1. Change the order of Lucene hits.

A Lucene FieldComparator is what you'd need. The QueryElevationComponent uses this technique.

2. Add/remove some results to/from the Lucene hits.

This is a bit more tricky. If you will always have a very limited number of docs to add or remove, it may be best to change the query itself to include or exclude them (i.e. add an fq). Otherwise you'd need to write a custom Collector (see DocSetCollector) and change SolrIndexSearcher to use it. We are planning to modify SolrIndexSearcher to allow custom collectors soon for field collapsing, but for now you will have to modify it.

What should be my starting point? Custom search handler?

A custom SearchComponent which extends/overrides QueryComponent will do the job. -- Regards, Shalin Shekhar Mangar.
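Shalin's last suggestion — a custom SearchComponent extending QueryComponent — gets plugged in through solrconfig.xml. A sketch of the wiring (the component class and handler name here are invented for illustration; only the registration mechanism is standard):

```xml
<!-- Custom component that applies the database-driven rules
     after the normal query phase. -->
<searchComponent name="rulesQuery" class="com.example.RulesQueryComponent"/>

<!-- A handler whose component chain substitutes the custom component
     for the stock query component. -->
<requestHandler name="/rulesearch" class="solr.SearchHandler">
  <arr name="components">
    <str>rulesQuery</str>
    <str>facet</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>
```

Requests to /rulesearch then run the rules-aware component in place of the default QueryComponent while keeping faceting and highlighting intact.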
Rules engine and Solr
I have a Solr (version 1.3) powered search server running in production. Search is keyword-driven and is supported using custom fields and tokenizers. I am planning to build a rules engine on top of search. The rules are database-driven and can't be stored inside the Solr indexes. These rules would ultimately do two things -

1. Change the order of Lucene hits.
2. Add/remove some results to/from the Lucene hits.

What should be my starting point? Custom search handler? Cheers Avlesh
Re: Newbie Solr questions
a) Since Solr is built on top of Lucene, using SolrJ can I still directly create custom documents, specify the field specifics (indexed, stored etc.) and then map POJOs to those documents, similar to using the straight Lucene API?

b) I took a quick look at the SolrJ javadocs but did not see anything that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs?

c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this?

The answer to all your questions above is the magical file called schema.xml. For more, read here - http://wiki.apache.org/solr/SchemaXml. SolrJ is simply a Java client used to access (read from and update) the Solr server.

d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server?

As long as the Lucene index matches the definitions in your schema, you can use the same index. The data, however, needs to be copied into a predictable location inside SOLR_HOME.

Cheers Avlesh

On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 woods5242-outdo...@yahoo.com wrote: Hi, I am new to Solr but fairly advanced with Lucene. In the past I have created custom Lucene search engines that indexed objects in a Java application, so my background is coming from this requirement.

a) Since Solr is built on top of Lucene, using SolrJ can I still directly create custom documents, specify the field specifics (indexed, stored etc.) and then map POJOs to those documents, similar to using the straight Lucene API?

b) I took a quick look at the SolrJ javadocs but did not see anything that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs?

c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this?

d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server?

thanks!
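To make the schema.xml answer concrete: whether a field is indexed and/or stored is declared per field in the schema, and SolrJ (including an @Field-annotated bean) merely sends values for those field names. A sketch with illustrative field names:

```xml
<!-- indexed and stored: searchable, and returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- indexed only: searchable, but never returned in a response -->
<field name="body" type="text" indexed="true" stored="false"/>

<!-- stored only: returned with results, but not searchable -->
<field name="thumbnailUrl" type="string" indexed="false" stored="true"/>
```

So a POJO property annotated with @Field("title") is treated however the schema defines "title"; there is no per-document or per-client override of indexed/stored from the SolrJ side.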
Re: how to search against multiple attributes in the index
Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build an AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery, Occur.MUST); highlighter = new Highlighter(new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQuery = new TermQuery(new Term("techGroup", searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST); TermQuery searchProgramQuery = new TermQuery(new Term("techProgram", searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQuery, Occur.MUST); What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote: I already did dive in before. I am using the solrj API and the SolrQuery object to build queries, but it's not clear/documented how to build a BooleanQuery ANDing a bunch of different attributes in the index. Any samples please? -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Obtaining list of dynamic fields available in the index
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler /admin/luke?numTerms=0 Cheers Avlesh On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.comwrote: Hi there! How can we retrieve the complete list of dynamic fields, which are currently available in index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter - # AND between two fields solrQuery.setQuery("+field1:foo +field2:bar"); # OR between two fields solrQuery.setQuery("field1:foo field2:bar"); Cheers Avlesh On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com wrote: I think I found the answer. Needed to read more API documentation :-) you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html Sent from the Solr - User mailing list archive at Nabble.com.
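A dependency-free sketch of the translation discussed in this thread: the Lucene BooleanQuery with three MUST clauses collapses into a single Solr query string with `+` prefixes. Field names are taken from the poster's code; the helper itself is hypothetical:

```java
public class MustQueryBuilder {
    // Joins per-field terms into one Solr query string in which every clause
    // is mandatory, mirroring BooleanClause.Occur.MUST in the Lucene original.
    static String mustQuery(String[][] fieldTermPairs) {
        StringBuilder q = new StringBuilder();
        for (String[] pair : fieldTermPairs) {
            if (q.length() > 0) q.append(' ');
            q.append('+').append(pair[0]).append(':').append(pair[1]);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        String q = mustQuery(new String[][] {
            {"titleDesc", "widget"},
            {"techGroup", "hardware"},
            {"techProgram", "alpha"}
        });
        // Prints: +titleDesc:widget +techGroup:hardware +techProgram:alpha
        System.out.println(q);
    }
}
```

The resulting string is what you would pass to SolrQuery.setQuery() in SolrJ.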
Re: Reseting doc boosts
AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, I'm trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem, just from knowing how Lucene works, that I would really need to reindex since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: [DIH] concurrent requests to DIH
1. Is it considered as good practice to set up several DIH request handlers, one for each possible parameter value? Nothing wrong with this. My assumption is that you want to do this to speed up indexing. Each DIH instance would block all others, once a Lucene commit for the former is performed. 2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): It is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. Nope. I had done a similar exercise in my quest to write a ParallelDataImportHandler. This thread might be of interest to you - http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler. Though there is a ticket in JIRA, I haven't been able to contribute this back. If you think this is what you need, lemme know. Cheers Avlesh On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott sz...@zib.de wrote: Hi all, I'm using the DIH in a parameterized way by passing request parameters that are used inside of my data-config. All imports end up in the same index. 1. Is it considered as good practice to set up several DIH request handlers, one for each possible parameter value? 2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): It is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. However, in case several request handlers were used (as in 1.), concurrent requests (to the different handlers) would be possible. So, how to overcome this limitation? Best, Sascha
Re: Question about the message Indexing failed. Rolled back all changes.
But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, do a solr search which returns meaningful results I am not sure what "meaningful" means. The full-import command starts an asynchronous process to re-index. The response that you get in return to the above mentioned URL (always) indicates that a full-import has been started. It does NOT know about anything that might go wrong with the process itself. and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result ... The status URL is the one which tells you what is going on with the process. The message "Indexing failed. Rolled back all changes" can come because of multiple reasons - missing database drivers, incorrect sql queries, runtime errors in custom transformers etc. Start the full-import once more. Keep a watch on the Solr server log. If you can figure out what's going wrong, great; otherwise, copy-paste the exception stack-trace from the log file for specific answers. Cheers Avlesh On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen bertie.s...@gmail.com wrote: No. I did not check the logs. But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, do a solr search which returns meaningful results, and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:2:11.426</str>
    <str name="Total Requests made to DataSource">584</str>
    <str name="Total Rows Fetched">1538</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2009-11-09 23:54:41</str>
    <str name="">Indexing failed. Rolled back all changes.</str>
    <str name="Committed">2009-11-09 23:54:42</str>
    <str name="Optimized">2009-11-09 23:54:42</str>
    <str name="Rolledback">2009-11-09 23:54:42</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen bertie.s...@gmail.com wrote: When I use http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to debug the indexing config file, I always see the status message on the right part <str name="">Indexing failed. Rolled back all changes.</str>, even when the indexing process looks to be successful. I am not sure whether you guys have seen the same phenomenon or not. BTW, I usually check the "Clean" checkbox and sometimes the "Commit" box, and then click the "Debug Now" button. Do you see any exceptions in the logs? -- Regards, Shalin Shekhar Mangar.
Re: How to make a TEXT field sortable?
Can someone help me with how to sort on a text field? You CANNOT sort on a text field. Sorting can only be done on an untokenized field (e.g. string, sint, sfloat etc fields) Cheers Avlesh On Tue, Nov 10, 2009 at 11:44 AM, deepak agrawal dk.a...@gmail.com wrote: Can someone help me with how we can sort the text field. <field name="TITLE" type="text" indexed="true" stored="true"/> -- DEEPAK AGRAWAL +91-9379433455 GOOD LUCK.
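The usual workaround is to keep the tokenized field for searching and copy it into an untokenized string field for sorting. A schema.xml sketch against the thread's TITLE field (the sort-field name is illustrative):

```xml
<!-- TITLE stays searchable as text; TITLE_sort is an untokenized copy
     that Solr can sort on -->
<field name="TITLE" type="text" indexed="true" stored="true"/>
<field name="TITLE_sort" type="string" indexed="true" stored="false"/>
<copyField source="TITLE" dest="TITLE_sort"/>
```

Queries would then use sort=TITLE_sort asc while still searching against TITLE.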
Re: solr query help alpha numeric and not
Didn't the queries in my reply work? Cheers Avlesh On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi yes it's a string; in the case of a title, it can be anything, a letter, a number, a symbol, a multibyte char etc. Any ideas if I wanted a query that was not a letter a-z or a number 0-9, given that it's a string? thanks Joel On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: Hi Joel, The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause? - Jonathan On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: Hi, I have a field called firstLetterTitle, this field has 1 char, it can be anything, I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9. I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:[0%20TO%209]%20AND%20NOT%20firstLetterTitle:[A%20TO%20Z] But I get back numeric results: <doc> <str name="firstLetterTitle">9</str> <str name="id">23946447</str> </doc> 2.) I want only numerics: http://localhost:8983/solr/select?q=firstLetterTitle:[0%20TO%209] This seems to work but just checking if it's the right way. 3.) I want only English letters: http://localhost:8983/solr/select?q=firstLetterTitle:[A%20TO%20Z] This seems to work but just checking if it's the right way. thanks Joel
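One detail worth noting for these range queries: the square brackets in field:[lower TO upper] are part of Lucene syntax and must be percent-encoded when placed in a URL. A small stdlib sketch (the field name is from the thread):

```java
import java.net.URLEncoder;

public class RangeQueryUrl {
    public static void main(String[] args) throws Exception {
        // Lucene/Solr inclusive range syntax: field:[lower TO upper]
        String onlyDigits = "firstLetterTitle:[0 TO 9]";
        // ':' -> %3A, '[' -> %5B, ' ' -> '+', ']' -> %5D
        String encoded = URLEncoder.encode(onlyDigits, "UTF-8");
        System.out.println(encoded); // firstLetterTitle%3A%5B0+TO+9%5D
    }
}
```

Browsers are often lenient about literal brackets, but encoding them keeps the request unambiguous for any HTTP client.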
Re: Bug with DIH and MySQL CONCAT()?
Try cast(concat(...) as char) ... Cheers Avlesh On Wed, Nov 4, 2009 at 7:36 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: Hi All, I have an SQL query that begins with SELECT CONCAT('ID', Subject.id, ':', Subject.name, ':L', Subject.level) as subject_name and the query runs great against MySQL from the command line. Since this is a nested entity, the schema.xml contains <field name="subject_name" type="string" indexed="true" stored="true" multiValued="true"/> After a full-import, a select output of the xml looks like <arr name="subject_name"> <str>[B@1db4c43</str> <str>[B@6bcef1</str> <str>[B@1df503b</str> <str>[B@c5dbb</str> <str>[B@1ddc3ea</str> <str>[B@6963b0</str> <str>[B@10fe215</str> ... Without a CONCAT - it works fine. Is this a bug? Meanwhile - should I go about concatenating somewhere else in the DIH config? Thanks. - Jonathan
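The garbage values in the thread are almost certainly the default toString() of a Java byte[]: MySQL's JDBC driver can hand CONCAT results back as byte arrays, and the cast(... as char) makes it return a string instead. A tiny sketch of the symptom (the sample bytes are illustrative):

```java
public class ByteArrayToString {
    public static void main(String[] args) {
        // Stand-in for a CONCAT result delivered as byte[] by the driver.
        byte[] concatResult = "ID3:Physics:L2".getBytes();
        // Arrays inherit Object.toString(): "[B@" + identity hash in hex,
        // which is exactly the shape of the values indexed in the thread.
        String rendered = concatResult.toString();
        System.out.println(rendered.startsWith("[B@")); // true
    }
}
```

So DIH was faithfully indexing the toString() of each byte array; casting to char in SQL (or converting in a transformer) avoids it.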
Re: How to integrate Solr into my project
Take a look at this - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan caroline@gmail.com wrote: Hi, I wish to intergrate Solr into my current working project. I've played around the Solr example and get it started in my tomcat. But the next step is HOW do i integrate that into my working project? You see, Lucence provides API and tutorial on what class i need to instanstiate in order to index and search. But Solr seems to be pretty vague on this..as it is a working solr search server. Can anybody help me by stating the steps by steps, what classes that i should look into in order to assimiliate Solr into my project? Thanks. regards ~caroLine
Re: SolrJ looping until I get all the results
This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. Okay. I am sure you are aware of the fl parameter, which restricts which fields are returned with a response. If you need limited info, it might be a good idea to use this parameter. Cheers Avlesh On Tue, Nov 3, 2009 at 7:23 AM, Paul Tomblin ptomb...@xcski.com wrote: On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh avl...@gmail.com wrote: I was doing it that way, but what I'm doing with the documents is some manipulation, putting the new classes into a different list. Because I basically have two times the number of documents in lists, I'm running out of memory. So I figured if I do it 1000 documents at a time, the SolrDocumentList will get garbage collected at least. You are right w.r.t. all that, but I am surprised that you would need ALL the documents from the index for a search requirement. This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. -- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin
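The 1000-at-a-time loop the poster describes can be sketched without SolrJ: page through results with start/rows until a page comes back short. The fetchPage function below is a stand-in for a SolrJ query call; everything else is plain Java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class PagedFetch {
    // Collects all ids page by page, mirroring SolrJ's start/rows parameters,
    // so only one page is held per request instead of the whole result set.
    static List<Integer> fetchAll(BiFunction<Integer, Integer, List<Integer>> fetchPage,
                                  int rows) {
        List<Integer> all = new ArrayList<>();
        int start = 0;
        while (true) {
            List<Integer> page = fetchPage.apply(start, rows); // one "request"
            all.addAll(page);
            if (page.size() < rows) break; // a short page means we are done
            start += rows;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake 2500-document index standing in for the Solr server.
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < 2500; i++) index.add(i);
        List<Integer> got = fetchAll(
            (start, rows) -> index.subList(start, Math.min(start + rows, index.size())),
            1000);
        System.out.println(got.size()); // 2500
    }
}
```

Combined with the fl parameter (requesting only the file-name field), each page stays small enough to be garbage collected before the next request.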
Re: solrj query size limit?
Did you hit the limit for maximum number of characters in a GET request? Cheers Avlesh On Tue, Nov 3, 2009 at 9:36 AM, Gregg Horan greggho...@gmail.com wrote: I'm constructing a query using solrj that has a fairly large number of 'OR' clauses. I'm just adding it as a big string to setQuery(), in the format accountId:(this OR that OR yada). This works all day long with 300 values. When I push it up to 350-400 values, I get a Bad Request SolrServerException. It appears to just be a client error - nothing reaching the server logs. Very repeatable... dial it back down and it goes through again fine. The total string length of the query (including a handful of other faceting entries) is about 9500chars. I do have the maxBooleanClauses jacked up to 2048. Using javabin. 1.4-dev. Are there any other options or settings I might be overlooking? -Gregg
Re: problems with PhraseHighlighter
Copy-paste your field definition for the field you are trying to highlight/search on. Cheers Avlesh On Sun, Nov 1, 2009 at 8:24 PM, AHMET ARSLAN iori...@yahoo.com wrote: Hello everyone, I am having problems with highlighting the complete text of a field. I have an xml field. I am querying proximity searches on this field. xml: ( proximity1 AND/OR proximity2 AND/OR ...) Results are returned successfully satisfying the proximity query. However, when I request highlighting, sometimes it returns nothing and sometimes it returns missing proximity terms. I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml: <maxFieldLength>2147483647</maxFieldLength> I am using these highlighting parameters: hl.maxAnalyzedChars=2147483647 hl.fragsize=2147483647 hl.usePhraseHighlighter=true hl.requireFieldMatch=true hl.fl=xml hl=true I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false but it didn't help. When I set hl.usePhraseHighlighter=false highlighting returns but all query terms are highlighted. What value of hl.fragsize should I use to highlight the complete text of a field? 0 or 2147483647? What is the highest value that I can set for hl.maxAnalyzedChars and hl.fragsize? I am querying the same field and requesting the same field in highlighting. Although a document matches a query, no highlighting comes back. What could be the reason? If a document matches a query, there should be highlighting returned, right? Any help or pointers are really appreciated.
Re: best way to model 1-N
what am I missing? Change your entity name=category query=select cfcr.feedId ... to entity name=category *transformer=RegexTransformer* query=select cfcr.feedId .. The splitBy directive is understood by this transformer and in your case the attribute was simply ignored. Don't forget to re-index once you have changed. Cheers Avlesh On Fri, Oct 30, 2009 at 9:33 PM, Joel Nylund jnyl...@yahoo.com wrote: Thanks Chantal, I will keep that in mind for tuning, for sql I figured way to combine them into one row using concat, but I still seem to be having an issue splitting them: Db now returns as one column categoryType: TOPIC,LANGUAGE but my solr result, if you note the item in categoryType all seem to be within one str, I would expect it to be in multiple strings within the array, is this assumption wrong? doc - arr name=categoryType strTOPIC,LANGUAGE/str /arr str name=id40/str str name=titlefeed title/str /doc Here is my import: document name=doc entity name=item query=SELECT f.id, f.title FROM Feed f field column=id name=id / field column=title name=title / entity name=category query=select cfcr.feedId, group_concat(cfcr.categoryType) as categoryType from CFR cfcr where cfcr.feedId = '${item.id}' AND group by cfcr.feedId field column=categoryType name=categoryType splityBy=, / /entity /entity In schema: field name=categoryType type=text indexed=true stored=true required=false multiValued=true/ field name=categoryName type=text indexed=true stored=true required=false multiValued=true/ what am I missing? thanks Joel On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote: That depends a bit on your database, but it is tricky and might not be performant. If you are more of a Java developer, you might prefer retrieving mutliple rows per SOLR document from your dataSource (join on your category and main table), and aggregate them in your custom EntityProcessor. I got a far(!) better performance retrieving everything in one query and doing the aggregation in Java. 
But this is, of course, depending on your table structure and data. Noble Paul helped me with the custom EntityProcessor, and it turned out quite easy. Have a look at the thread with the heading from this mailing list (SOLR-USER): DataImportHandler / Import from DB : one data set comes in multiple rows Cheers, Chantal Joel Nylund schrieb: thanks, but im confused how I can aggregate across rows, I dont know of any easy way to get my db to return one row for all the categories (given the hint from your other email), I have split the category query into a separate entity, but its returning multiple rows, how do I combine multiple rows into 1 index entity? thanks Joel On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote: In the database this is modeled a a 1-N where category table has the mapping of feed to category I need to be able to query , give me all the feeds in any given category. How can I best model this in solr? Seems like multiValued field might help, but how would I populate it, and would the query above work?. Yes you are right. A multivalued field for categories is the answer. For populating in the index - 1. If you use DIH to populate your indexes and your datasource is a database then you can use DIH's RegexTransformer on an aggregated list of categories. e.g. if your database query retruns a,b,c,d in a column called db_categories, this is how you would put it in DIH's data-config file - field column=db_categories name=categories splityBy=, /. 2. If you add documents to Solr yourself multiple values for the field can be specified as an array or list of values in the SolrInputDocument. A multivalued field provides the same faceting and searching capabilites like regular fields. There is no special syntax. Cheers Avlesh On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. 
There is one tricky problem that I need help on. Feeds have 1 - many categories. So Lets say we have Category1, Category2 and Category3 Feed 1 - is in Category 1 Feed 2 is in category2 and category3 Feed 3 is in category2 Feed 4 has no category In the database this is modeled a a 1-N where category table has the mapping of feed to category I need to be able to query , give me all the feeds in any given category. How can I best model this in solr? Seems like multiValued field might help, but how would I populate it, and would
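Putting the fix together, a corrected version of the poster's category entity. Note two changes: the added transformer attribute (the reason splitBy was being ignored), and the attribute spelling splitBy rather than splityBy; the stray AND before "group by" in the original SQL is also dropped. Table and column names are the poster's:

```xml
<entity name="category" transformer="RegexTransformer"
        query="select cfcr.feedId,
                      group_concat(cfcr.categoryType) as categoryType
               from CFR cfcr
               where cfcr.feedId = '${item.id}'
               group by cfcr.feedId">
  <field column="categoryType" name="categoryType" splitBy=","/>
</entity>
```

With this in place, a value like TOPIC,LANGUAGE is split into separate entries of the multivalued categoryType field at index time.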
Re: autocomplete
q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=? Why is the json.wrf not specified? Without the callback function, the string that is returned is illegal javascript for the browser. You need to specify this parameter, which names a wrapper or callback function. If you specify json.wrf=foo, then as soon as the browser gets a response it will call a function named foo (which needs to be defined already). Inside foo you can have your own implementation to interpret and render this data. Cheers Avlesh On Sat, Oct 31, 2009 at 12:13 AM, Ankit Bhatnagar abhatna...@vantage.comwrote: Hi guys, Enterprise 1.4 Solr Book (AutoComplete) says this works - My query looks like - q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=? And it returns three results {"responseHeader":{"status":0,"QTime":38,"params":{"indent":"on","start":"0","q":"*:*","wt":"json","fq":"ac:*all*","rows":"15"}},"response":{"numFound":3,"start":0,"docs":[{"id":"1","ac":"Can you show me all the results"},{"id":"2","ac":"Can you show all companies"},{"id":"3","ac":"Can you list all companies"}]}} But browser says syntax error -- Ankit
Re: Iso accents and wildcards
When I request with title:econ* I can have the correct answers, but if I request with title:écon* I have no answers. If I request with title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard. As far as I can understand the analyser should be used exactly the same at both index and query time. Wildcard queries are not analyzed, hence the inconsistent behaviour. The easiest way out is to define one more field, title_original, as an untokenized field. While querying, you can use both fields at the same time, e.g. q=(title:écon* title_original:écon*). In any case, you would get the desired matches. Cheers Avlesh On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte nicolas.ai...@aidel.comwrote: Hi all, I have a field that contains accented chars in it; what I want is to be able to search ignoring accents. I have set up that field with: <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.SnowballPorterFilterFactory" language="French"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> In the index the word économie is translated to econom; the accent is removed thanks to the ISOLatin1AccentFilterFactory and the end of the word removed thanks to the SnowballPorterFilterFactory. When I request with title:econ* I can have the correct answers, but if I request with title:écon* I have no answers. If I request with title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard.
As far as I can understand, the analyzer should be used exactly the same at both index and query time. I have tested with changing the order of the filters (putting the ISOLatin1AccentFilterFactory on top) without any result. Could anybody help me with that and point out what may be wrong with my schema?
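A sketch of the suggested second field: untokenized but lowercased and accent-folded, so an unanalyzed wildcard term can still match. Field and type names here are illustrative:

```xml
<fieldType name="string_folded" class="solr.TextField">
  <analyzer>
    <!-- one token per value, then fold case and accents -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_original" type="string_folded" indexed="true" stored="false"/>
<copyField source="title" dest="title_original"/>
```

One caveat: since wildcard terms skip analysis, the client should lowercase and strip accents from the typed prefix itself (écon* becomes econ*) before querying title_original; the query in the reply then becomes q=(title:écon* title_original:econ*).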
Re: Indexing multiple entities
The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do: setField("id", obj.getClass().getName() + obj.getId()) or something like that. Unless I am missing something here, can't you write the getter of the id field in your solr bean as underneath? @Field private String id; public String getId() { return this.getClass().getName() + this.id; } Cheers Avlesh On Fri, Oct 30, 2009 at 1:33 PM, Christian López Espínola penyask...@gmail.com wrote: On Fri, Oct 30, 2009 at 2:04 AM, Avlesh Singh avl...@gmail.com wrote: One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Anyone knows if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? More details on the use-case please. If I index a Book with ID=3, and then a Magazine with ID=3, I'll really be removing my Book 3 and indexing Magazine 3. I want both entities to be in the index. The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do: setField("id", obj.getClass().getName() + obj.getId()) or something like that. The goal is avoiding creating all the XMLs to be sent to Solr but having the possibility of modifying them in some way. Do you know how I can do that, or a better way of achieving the same results? Cheers Avlesh On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola penyask...@gmail.com wrote: Hi Israel, Thanks for your suggestion, On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com wrote: On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote: Hi, my name is Christian and I'm a newbie introducing to solr (and solrj). I'm working on a website where I want to index multiple entities, like Book or Magazine.
The issue I'm facing is both of them have an attribute ID, which I want to use as the uniqueKey on my schema, so I cannot identify uniquely a document (because ID is saved in a database too, and it's autonumeric). I'm sure that this is a common pattern, but I don't find the way of solving it. How do you usually solve this? Thanks in advance. -- Cheers, Christian López Espínola penyaskito Hi Christian, It looks like you are bringing in data to Solr from a database where there are two separate tables. One for *Books* and another one for *Magazines*. If this is the case, you could define your uniqueKey element in the Solr schema to be a string instead of an integer; then you can still load documents from both the books and magazines database tables, but you would prefix the uniqueKey field with B for books and M for magazines. Like so: <field name="id" type="string" indexed="true" stored="true" required="true"/> <uniqueKey>id</uniqueKey> Then when loading the books or magazines into Solr you can create the documents with id fields like this: <add> <doc><field name="id">B14000</field></doc> <doc><field name="id">M14000</field></doc> <doc><field name="id">B14001</field></doc> <doc><field name="id">M14001</field></doc> </add> I hope this helps This was my first thought, but in practice there isn't Book and Magazine, but about 50 different entities, so I'm using the Field annotation of solrj for simplifying my code (it manages the XML creation for me, etc). One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Anyone knows if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? Thanks in advance. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Cheers, Christian López Espínola penyaskito -- Cheers, Christian López Espínola penyaskito
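Both suggestions in the thread reduce to the same idea: make the uniqueKey a string composed of an entity marker plus the database id, so ids from different tables can never collide. A dependency-free sketch (the helper is hypothetical):

```java
public class UniqueKeys {
    // Composes a collision-free Solr uniqueKey from an entity marker and a
    // database id, as in the thread ("B14000" for a Book, "M14000" for a
    // Magazine). With ~50 entities, the class name works as the marker.
    static String solrId(String entityMarker, long dbId) {
        return entityMarker + dbId;
    }

    public static void main(String[] args) {
        System.out.println(solrId("B", 14000)); // B14000
        System.out.println(solrId("M", 14000)); // M14000
    }
}
```

With solrId(obj.getClass().getSimpleName(), obj.getId()) in the bean's id getter, no custom DocumentObjectBinder is needed.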
Re: begins with searches
G'day Avlesh, converting the all field to type edgytext doesn't work as expected as the various text analysers etc don't get to work on that field, so I get fewer results than expected. And adding the edgy filter into the text field also yields fewer results. I can work around the issue by setting up a new beginswith edgytext field and using copyField to copy the relevant fields into it. You are absolutely right. What you think of as a work-around is actually a solution! But this approach doesn't really suit our parent application's main search screen, which is a single box labelled quick search. Users will be puzzled as to why a search for beginswith:Houghton, b yields 20 results, while a search for Houghton, b yields 10. And also puzzled as to why Houghton, b* won't work as they expect - people are already familiar with using wildcards. A way to get around this user perception problem is to get rid of the single search box and set up a series of drop down boxes for type of search (begins with, etc), along with field names. We might have to go there, but the ideal solution from our perspective would be for users to be able to enter terms in the quick search box without any field prefix, and have solr go off and search all field names/types. As I said earlier, a field can be analyzed in only ONE way. In your kind of requirements, multiple searching capabilities are needed for a single query. Unfortunately, not all of these can be addressed by a single field. The solution is to create multiple fields set up with different analyzers (tokenizers and filters) for indexing and searching. At query time an OR query can be done across all such fields (with a corresponding boost for a particular field, if desired). Lucene would automatically rank the results in the correct order based on hits across the multiple fields. Hope this helps. And sorry for the delayed response.
Cheers Avlesh On Fri, Oct 30, 2009 at 3:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:
By the way, our text field type config is currently set as -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Thursday, 29 October 2009 12:35 PM To: solr-user@lucene.apache.org Subject: Re: begins
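On Avlesh's multi-field suggestion above: the OR query across differently analyzed fields might look like the following, assuming hypothetical field names all (the standard text catch-all) and beginswith (an edgytext copy of the same data); the boost floats prefix matches to the top while plain matches still appear:

```
q=all:"houghton, b" OR beginswith:"houghton, b"^10
```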
Re: best way to model 1-N
In the database this is modeled as a 1-N where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? Yes, you are right. A multivalued field for categories is the answer. For populating the index - 1. If you use DIH to populate your indexes and your datasource is a database, then you can use DIH's RegexTransformer on an aggregated list of categories. e.g. if your database query returns a,b,c,d in a column called db_categories, this is how you would put it in DIH's data-config file - <field column="db_categories" name="categories" splitBy=","/>. 2. If you add documents to Solr yourself, multiple values for the field can be specified as an array or list of values in the SolrInputDocument. A multivalued field provides the same faceting and searching capabilities as regular fields. There is no special syntax. Cheers Avlesh On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. There is one tricky problem that I need help on. Feeds have 1 - many categories. So let's say we have Category1, Category2 and Category3: Feed 1 is in Category1, Feed 2 is in Category2 and Category3, Feed 3 is in Category2, and Feed 4 has no category. In the database this is modeled as a 1-N where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? thanks Joel
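Option 1 above, as a minimal data-config sketch; the table/column names (feed, category, db_categories) and the MySQL GROUP_CONCAT aggregation are hypothetical, made up to match Joel's scenario:

```
<document>
  <entity name="feed" transformer="RegexTransformer"
          query="SELECT f.id, f.title, GROUP_CONCAT(c.name) AS db_categories
                 FROM feed f LEFT JOIN category c ON c.feed_id = f.id GROUP BY f.id">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <!-- splitBy turns the aggregated 'a,b,c,d' into one value per category -->
    <field column="db_categories" name="categories" splitBy=","/>
  </entity>
</document>
```

The matching schema.xml field would be declared multiValued (e.g. <field name="categories" type="string" indexed="true" stored="true" multiValued="true"/>), after which q=categories:category2 returns all the feeds in that category.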
Re: multiple sql queries for one index?
Read this example fully - http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example Nested entities are the answer to your question. The example has a sample. Cheers Avlesh On Fri, Oct 30, 2009 at 2:58 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, It's been hurting my brain all day to try to build 1 query for my index (joins upon joins upon joins). Is there a way I can do multiple queries to populate the same index? I have one main table that I can join everything back via ID, so it should be theoretically possible. If this can be done, can someone point me to an example? thanks Joel
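A sketch of what the nested-entity approach from that wiki example looks like; the entity, table, and column names here are hypothetical:

```
<document>
  <entity name="feed" query="SELECT id, title FROM feed">
    <!-- runs once per parent row; DIH substitutes ${feed.id} from the outer query -->
    <entity name="feed_author" query="SELECT name FROM author WHERE feed_id = ${feed.id}">
      <field column="name" name="author_name"/>
    </entity>
  </entity>
</document>
```

All columns returned by both queries land on the same Solr document, so each nested entity effectively replaces one join in the single big query.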
Re: Indexing multiple entities
One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Does anyone know if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? More details on the use-case please. Cheers Avlesh On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola penyask...@gmail.com wrote: Hi Israel, Thanks for your suggestion, On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com wrote: On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote: Hi, my name is Christian and I'm a newbie getting started with solr (and solrj). I'm working on a website where I want to index multiple entities, like Book or Magazine. The issue I'm facing is that both of them have an attribute ID, which I want to use as the uniqueKey on my schema, so I cannot identify a document uniquely (because ID is saved in a database too, and it's autonumeric). I'm sure that this is a common pattern, but I can't find the way to solve it. How do you usually solve this? Thanks in advance. -- Cheers, Christian López Espínola penyaskito Hi Christian, It looks like you are bringing in data to Solr from a database where there are two separate tables. One for *Books* and another one for *Magazines*.
If this is the case, you could define your uniqueKey element in the Solr schema to be a string instead of an integer. Then you can still load documents from both the books and magazines database tables, but you could prefix the uniqueKey field with B for books and M for magazines. Like so:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

Then when loading the books or magazines into Solr you can create the documents with id fields like this:

<add>
  <doc><field name="id">B14000</field></doc>
  <doc><field name="id">M14000</field></doc>
  <doc><field name="id">B14001</field></doc>
  <doc><field name="id">M14001</field></doc>
</add>

I hope this helps. This was my first thought, but in practice there isn't Book and Magazine, but about 50 different entities, so I'm using the Field annotation of solrj to simplify my code (it manages the XML creation for me, etc). One thing I thought about is whether I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Does anyone know if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? Thanks in advance. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Cheers, Christian López Espínola penyaskito
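Short of writing a custom DocumentObjectBinder, one workable sketch is to compute the prefixed key yourself before handing documents to SolrJ. The helper below is hypothetical (the entity prefixes and the buildKey/parseEntity names are made up for illustration, not part of SolrJ):

```java
public final class EntityKeys {
    private EntityKeys() {}

    /** Prefix a database id with a per-entity marker, e.g. "Book" + 14000 -> "Book:14000". */
    static String buildKey(String entity, long id) {
        return entity + ":" + id;
    }

    /** Recover the entity name from a combined key. */
    static String parseEntity(String key) {
        return key.substring(0, key.indexOf(':'));
    }

    public static void main(String[] args) {
        String key = buildKey("Book", 14000);
        System.out.println(key);               // Book:14000
        System.out.println(parseEntity(key));  // Book
    }
}
```

The combined string would be set on the id field of each SolrInputDocument (or on the annotated bean property) before calling add, so 50 entity types can share one uniqueKey without colliding.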
Re: MLT cross core
My thought now is I cannot use MLT and instead must do a query to B using the fields from core A's ID as query params. Is there a big difference in what will be returned as results using a query instead of MLT? Yes, there is definitely a difference between the results from a MLT handler and any other search handler. MLT components/handlers are supposed to return similar results based on stored TermVectors for a field. Search handlers are supposed to return exact matching results based on indexed values for a field. In your multi-core set-up, I don't think you are anywhere close to using MLT. The arrangement looks more like a search query. Cheers Avlesh On Wed, Oct 28, 2009 at 5:37 PM, Adamsky, Robert radam...@techtarget.com wrote: Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have the same type of id. Having the same type of id in two different cores is of no use to a MLT handler (which in fact operates on one core) Right; the cores share the same data type and name for their ID. Was hoping that would allow me to do the same thing I am doing for cross core queries on common schema fields - I can query one core and get aggregate results from both based on common fields. How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results. It depends on your requirement. If it is about simply aggregating the results, then you can run the MLT handler for both the cores independently and merge the response thereafter based on your understanding of the underlying data in the responses. I am trying to do the following given: - Two cores with common named and typed IDs (no dupes between them) - Some number of common fields for other data like title and body.
Make a MLT query to core A (or event core B) passing in ID of core A and getting MLT results that have data from core B only. My thought now is I cannot use MLT and instead must do a query to B using the fields from core A ID as query params. Is there big difference in what will be returned as results using query instead of MLT?
Re: MLT cross core
Does that mean that you cannot do a 'MLT' query from one core result to get MLT from another (even if there is some common schema between)? You can always run MLT handlers on a core. Each MLT handler takes certain parameters based on which similar results are fetched. You would need to pass such parameters for the MLT handler to work properly. It is immaterial where you get these values from. In your case it happens to be an outcome of results from another core. As I said, it hardly matters. Cheers Avlesh On Wed, Oct 28, 2009 at 10:10 PM, Adamsky, Robert radam...@techtarget.com wrote: Thanks for the reply -- In your multi-core set-up, I don't think you are anywhere close to using MLT. The arrangement looks more like a search query. Does that mean that you cannot do a 'MLT' query from one core result to get MLT from another (even if there is some common schema between)?
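As an illustration of feeding one core's output into another core's MLT handler: MoreLikeThisHandler can take the source text as a content stream, so a client could read the interesting fields from the core A document and post their text to core B. The URL below is a hedged sketch only (core names and fields are hypothetical):

```
http://localhost:8983/solr/coreB/mlt?mlt.fl=title,body&mlt.mintf=1&mlt.mindf=1&stream.body=text+taken+from+the+core+A+document
```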
Re: Simple problem with a nested entity and it's SQL
Shouldn't this work too? SELECT * FROM table2 WHERE IS NOT NULL ${table1.somethin_like_a_foreign_key} AND ${table1.somethin_like_a_foreign_key} <> 0 AND id = ${table1.somethin_like_a_foreign_key} Cheers Avlesh On Wed, Oct 28, 2009 at 11:03 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: I have a nested entity on a jdbc data import handler that is causing an SQL error because the second key is either NULL (blank when generating the sql) or a non-zero INT. The query is in the following form:

<document name="content">
  <entity name="bl_lessonfiles" transformer="TemplateTransformer" query="SELECT * FROM table1 ...">
    <entity name="user_index" query="SELECT * FROM table2 WHERE id = ${table1.somethin_like_a_foreign_key}">
    </entity>
  </entity>
</document>

Is the only way to avoid this to modify the source DB schema to be NOT NULL so it always returns at least a 0? - Jonathan
Re: Simple problem with a nested entity and it's SQL
Assuming this to be MySQL, will this work - SELECT * FROM table2 WHERE id = IF(ISNULL(${table1.somethin_like_a_foreign_key}), 0, ${table1.somethin_like_a_foreign_key}); Cheers Avlesh On Wed, Oct 28, 2009 at 11:12 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: No - the SQL will fail to validate because at runtime it will look like SELECT * FROM table2 WHERE IS NOT NULL table1.somethin_like_a_foreign_key AND table1.somethin_like_a_foreign_key <> 0 AND id = Note the trailing id = On Oct 28, 2009, at 1:38 PM, Avlesh Singh wrote: Shouldn't this work too? SELECT * FROM table2 WHERE IS NOT NULL ${table1.somethin_like_a_foreign_key} AND ${table1.somethin_like_a_foreign_key} <> 0 AND id = ${table1.somethin_like_a_foreign_key} Cheers Avlesh On Wed, Oct 28, 2009 at 11:03 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: I have a nested entity on a jdbc data import handler that is causing an SQL error because the second key is either NULL (blank when generating the sql) or a non-zero INT. The query is in the following form:

<document name="content">
  <entity name="bl_lessonfiles" transformer="TemplateTransformer" query="SELECT * FROM table1 ...">
    <entity name="user_index" query="SELECT * FROM table2 WHERE id = ${table1.somethin_like_a_foreign_key}">
    </entity>
  </entity>
</document>

Is the only way to avoid this to modify the source DB schema to be NOT NULL so it always returns at least a 0? - Jonathan
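A third option, which sidesteps the blank substitution entirely (our suggestion, with a hypothetical alias name fk): default the key in the parent entity's query, so DIH always substitutes a real number into the child query:

```sql
-- parent entity query: NULL foreign keys become 0 before DIH substitutes them
SELECT t1.*, COALESCE(t1.somethin_like_a_foreign_key, 0) AS fk FROM table1 t1;
-- the child entity query would then reference the cleaned alias:
-- SELECT * FROM table2 WHERE id = ${bl_lessonfiles.fk}
```

This keeps the child SQL valid for every row without touching the source DB schema.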
Re: faceting ordering
curious... is it possible to have faceted results ordered by score? First, I am not sure what that means. Score of what? Documents? If yes, how do you think the same should influence faceting? Second, there are only two ways you can sort facet values on a field. More here - http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort If you can further elaborate on your use case, you might get better solutions for the problem at hand. Cheers Avlesh On Wed, Oct 28, 2009 at 11:31 PM, Joe Calderon calderon@gmail.com wrote: curious... is it possible to have faceted results ordered by score? I'm having a problem where I'm faceting on a field while searching for the same word twice. For example: I'm searching for "the the" on a tokenized field and faceting by the untokenized version. Faceting returns records with "the the", but way at the bottom, since everything with a single "the" happens to be way more frequent. I tried restricting my search to phrases with small slop: myfield:"token1 token2 token3"~3 but that affects other searches negatively; ideally I want to be as loose as possible as these searches power an auto suggest feature. I figured if faceted results could be sorted by score, I could simply boost phrases instead of restricting by them. Thoughts? --joe
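For reference, the two facet orderings Avlesh mentions are selected per-request. A sketch of such a request against a hypothetical untokenized copy field (myfield_untok); facet.sort=count orders values by how many matching documents contain them, not by relevance score:

```
http://localhost:8983/solr/select?q=myfield:"the the"&facet=true&facet.field=myfield_untok&facet.sort=count&facet.limit=10
```

(The query value would need URL-encoding in a real request; it is left readable here.)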
Re: Faceting within one document
For facets - http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount For terms - http://wiki.apache.org/solr/TermsComponent Helps? Cheers Avlesh On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Hi, If I give a query that matches a single document, and facet on a particular field, I get a list of all the terms in that field which appear in that document. (I also get some with a count of zero, I don't really understand where they come from... ?) Is it possible with faceting, or a similar mechanism, to get a count of how many times each term appears within that document? This would be really useful for building a list of top keywords within a long document, for summarization purposes. I can do it on the client side but it'd be nice to know if there's a quicker way. Thanks! Andrew. -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: weird problem with letters S and T
Any ideas, are S and T special chars in query for solr? Nope, they are NOT. My guess is that - You are using a text type field for firstLetterTitle which has the stopword filter applied to it. - Your stopwords.txt file contains the characters s and t, because of which the above mentioned filter eats them up while indexing and searching. If the above assumptions are correct, then there are two ways to fix it - - Remove the characters s and t from your stopwords.txt file and do a re-index. Searches should work fine after that. - For this particular use-case, you can keep your firstLetterTitle field as a string type untokenized field. You will not have to worry about stopwords in that case. Cheers Avlesh On Thu, Oct 29, 2009 at 3:47 AM, Joel Nylund jnyl...@yahoo.com wrote: (I am super new to solr, sorry if this is an easy one) Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql (my query is complex, but the part that matters is: SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f I can create this index with no issues. I can query the title with no problem: http://localhost:8983/solr/select?q=title:super I can query the first letters mostly with no problem: http://localhost:8983/solr/select?q=firstLetterTitle:a Returns all the foo's with the first letter a. This actually works with every letter except S and T. If I query those, I get no results. The weird thing is, if I do the title query above with Super I get lots of results, and the xml shows the firstLetterTitles for those to be S:

<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">84861348</str>
  <str name="title">Super Cool</str>
</doc>
<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">108692</str>
  <str name="title">Super 45</str>
</doc>

etc. Any ideas, are S and T special chars in query for solr?
here is the response from the s query with debug = true:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

thanks Joel
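The second fix suggested above (an untokenized string field) might look like this in schema.xml; the field name is from the thread, while lowercasing the letter on the SQL side (e.g. LOWER(LEFT(f.title,1))) is our suggestion, since string fields match case-sensitively:

```
<field name="firstLetterTitle" type="string" indexed="true" stored="true"/>
```

With no analyzer in the chain, stopwords.txt never sees the value, and q=firstLetterTitle:s matches exactly what was indexed.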
Re: begins with searches
It sounds from what you say that I'm going to need to change the field type to edgytext. Which won't achieve the result I want, viz. the current all plus the edgytext. Any way to achieve this? I guess there is a mismatch of expectations here. A field can be analyzed in only ONE way. If your field all is of type text, indexing and searching would go through the analyzers (tokenizers and filters) specified ONLY for the text field. It does not matter if data from an edgytext or any other field type is being copied into the field. Having said that, converting the all field to type edgytext should still work fine. All your regular searches on a text field should also work with the edgytext field. Ain't it like that? Cheers Avlesh On Thu, Oct 29, 2009 at 2:52 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Here are the all field code snippets -

<!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
...
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>all</defaultSearchField>
...
<!-- Copy for ALL search -->
<copyField source="*_t" dest="*_t_ft"/>
<copyField source="*_mt" dest="*_mft"/>
<copyField source="content" dest="all"/>
<copyField source="*_t" dest="all"/>
<copyField source="*_mt" dest="all"/>

It sounds from what you say that I'm going to need to change the field type to edgytext. Which won't achieve the result I want, viz. the current all plus the edgytext. Any way to achieve this? Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Wednesday, 28 October 2009 3:30 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches My next issue relates to how to get the results of the author field to come up in a search across all fields.
For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Do you have a field called all? How is it set up? Can you post the schema.xml snippet relating to this field here? copyField is supported for a dynamic field source. <copyField source="*author_mt" dest="all"/> should work for you as long as you have a field called all defined in your schema. Moreover, for your specific use case, the all field needs to be of type edgytext. Cheers Avlesh On Wed, Oct 28, 2009 at 9:35 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks Avlesh. The issue with not doing a phrase query on my edgytext field was that my parent application was adding an escape character to the quotation marks, and I was hoping to fix (or rather, work around) it at the solr end to save maintenance overhead. But I've done a hack in the parent application to remove those escape chars, and all is working well in that respect. My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Tuesday, 27 October 2009 5:54 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1. Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field?
If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters . If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes). I've set up an edgytext fieldType in schema.xml thus - fieldType name=edgytext class
Re: begins with searches
You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1. Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field? If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters. If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes).
I've set up an edgytext fieldType in schema.xml thus -

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And defined a field name thus -

<dynamicField name="*author_mt" type="edgytext" indexed="true" stored="true" multiValued="true"/>

The results are mixed - * searches such as surname, f and surname, fre (with quotations and commas) work well, retrieving surname, f, surname, Fred, surname, Frederick etc etc * searches such as the above but without quotations don't work too well as they get parsed as author_mt:surname + author_mt:firstname, with solr reading the query as author beginning with surname AND author beginning with firstname, which yields nil results. Is there an analyser that will strip the whitespace out altogether? Or another alternative? bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Monday, 26 October 2009 6:32 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches Read up on setting up these kind of searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins with phrases; or, how to best configure solr to support such searches?
TIA Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.aumailto: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au http://www.deakin.edu.au/Deakin University CRICOS Provider Code 00113B (Vic) Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
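On the escaping point raised in this thread: SolrJ's helper is ClientUtils.escapeQueryChars. For clients not on SolrJ, a standalone sketch of the same idea follows; the character set is taken from the Lucene query syntax page linked above plus whitespace, and the class and method names are ours, purely illustrative:

```java
public final class QueryEscape {
    // Characters with meaning in the Lucene query syntax, plus the space character.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&; ";

    /** Backslash-escape every special character so the whole term is treated literally. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("surname, fre")); // surname,\ fre
        System.out.println(escape("a+b"));          // a\+b
    }
}
```

The escaped string can then be sent as the q parameter without wrapping it in double quotes.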
Re: begins with searches
My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Do you have a field called all? How is it set up? Can you post the schema.xml snippet relating to this field here? copyField is supported for a dynamic field source. <copyField source="*author_mt" dest="all"/> should work for you as long as you have a field called all defined in your schema. Moreover, for your specific use case, the all field needs to be of type edgytext. Cheers Avlesh On Wed, Oct 28, 2009 at 9:35 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks Avlesh. The issue with not doing a phrase query on my edgytext field was that my parent application was adding an escape character to the quotation marks, and I was hoping to fix (or rather, work around) it at the solr end to save maintenance overhead. But I've done a hack in the parent application to remove those escape chars, and all is working well in that respect. My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Tuesday, 27 October 2009 5:54 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1.
Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field? If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters . If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes). I've set up an edgytext fieldType in schema.xml thus -

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And defined a field name thus -

<dynamicField name="*author_mt" type="edgytext" indexed="true" stored="true" multiValued="true"/>

The results are mixed - * searches such as surname, f and surname, fre (with quotations and commas) work well, retrieving surname, f, surname, Fred, surname, Frederick etc etc * searches such as the above but without quotations don't work too well as they get parsed as author_mt:surname + author_mt:firstname, with solr reading the query as author beginning with surname AND author beginning with firstname, which yields nil results.
Is there an analyser that will strip the whitespace out altogether? Or another alternative? bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Monday, 26 October 2009 6:32 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches Read up of setting-up these kind searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins
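On the whitespace question: one candidate is Solr's PatternReplaceFilterFactory, slotted in before the EdgeNGram filter so the grams are built from a whitespace-free value. The regex and placement below are our suggestion, not something from the thread:

```
<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- remove all whitespace so "surname, fre" and "surname,fre" produce the same grams -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
</analyzer>
```

The query-time analyzer would need the same PatternReplace filter so that user input is normalized identically.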
Re: MLT cross core
Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have same type of id. Having the same type of id in two different cores is of no use to an MLT handler (which in fact operates on a single core). How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results. It depends on your requirement. If it is about simply aggregating the results, then you can run the MLT handler for both cores independently and merge the responses thereafter, based on your understanding of the underlying data in the responses. Cheers Avlesh On Wed, Oct 28, 2009 at 5:52 AM, Adamsky, Robert radam...@techtarget.com wrote: Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have same type of id. I saw this thread: http://www.nabble.com/Does-MoreLikeThis-support-sharding--td25378654.html This is not quite what I am doing as this is for shards against same schema. How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results.
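The merge step Avlesh describes can be sketched client-side. The pair layout and the assumption that raw scores from the two cores are comparable are illustrative, as he notes that depends on the underlying data:

```python
def merge_mlt_results(results_a, results_b, rows=10):
    """Merge MLT responses from two cores run independently.

    results_a / results_b: lists of (doc_id, score) pairs as the
    application collected them from each core's MLT handler.  Treating
    raw scores from different cores as comparable is an assumption.
    """
    merged = sorted(results_a + results_b, key=lambda doc: doc[1], reverse=True)
    return merged[:rows]
```

A usage example: `merge_mlt_results([("a", 0.9), ("b", 0.4)], [("x", 0.7)])` interleaves the two result sets by descending score.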
Re: begins with searches
Read up of setting-up these kind searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins with phrases; or, how to best configure solr to support such searches? TIA Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.aumailto: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au http://www.deakin.edu.au/Deakin University CRICOS Provider Code 00113B (Vic) Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
Re: Problem searching for phrases with the word to
My guess is that Solr is treating this as a range query. I've tried escaping the word To with backslashes, but it doesn't seem to make a difference. Is there a way to tell Solr that to is not a special word in this instance? Nope. Any occurrence of to in search term(s) does NOT cause the query to be parsed as a RangeQuery. You are probably doing a phrase search on a text field which is analyzed for stopwords. These stopwords are typically stored in a file called stopwords.txt. Make sure that the stopword filter is applied both at index time and query time. Cheers Avlesh On Mon, Oct 26, 2009 at 12:55 PM, mike mulvaney mike.mulva...@gmail.com wrote: I'm having trouble searching for phrases that have the word to in them. I have a bunch of articles indexed, and I need to be able to search the headlines like this: headline:House Committee Leaders Ask FCC To Consider Spectrum in Broadband Plan When I search like that, I get no hits. When I take out the word To, it finds the document: headline:House Committee Leaders Ask FCC My guess is that Solr is treating this as a range query. I've tried escaping the word To with backslashes, but it doesn't seem to make a difference. Is there a way to tell Solr that to is not a special word in this instance? -Mike
Re: copyField from multiple fields into one
It should have worked as expected. See if your name field is getting populated. Cheers Avlesh 2009/10/26 Steinar Asbjørnsen steinar...@gmail.com Hi all. I'm currently working on setting up spelling suggestion functionality. What I'd like is to put the values of two fields (keyword and name) into the spell field. Something like (from schema.xml):

<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
...
<copyField source="keyword" dest="spell"/>
<copyField source="name" dest="spell"/>

As far as I can see I only get suggestions from the keyword field, and not from the name field. So my question is: Is it possible to copy both keyword and name into the spell field? Thanks, Steinar
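What the two copyField directives do can be simulated in a few lines of Python (field names keyword, name, spell taken from the schema snippet above; the function is a hypothetical illustration, not Solr code):

```python
def build_spell_field(doc):
    # Simulate two copyField directives: values of both source fields
    # are appended to the multiValued destination field "spell".
    spell = []
    for source in ("keyword", "name"):
        value = doc.get(source)
        if value is not None:
            spell.extend(value if isinstance(value, list) else [value])
    return spell
```

So both sources should land in spell; if only keyword values show up as suggestions, the name source field is the first thing to check, as above.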
Re: Retrieve Matching Term
If your query looks like this - q=(myField:aaa myField:bbb myField:ccc) - you would get the desired results for any tokenized field (e.g. text) called myField. Cheers Avlesh On Tue, Oct 20, 2009 at 6:28 AM, angry127 angry...@gmail.com wrote: Hi, Is it possible to get the matching terms from your query for each document returned, without using highlighting? For example, if you have the query aaa bbb ccc and one of the documents has the term aaa and another document has the terms bbb and ccc, to have Solr return: Document 1: aaa Document 2: bbb ccc I was told this is possible using Term Vectors. I have not been able to find a way to do this using Term Vectors. The only reason I am against using highlighting is for performance reasons. Thanks. -- View this message in context: http://www.nabble.com/Retrieve-Matching-Term-tp25967886p25967886.html Sent from the Solr - User mailing list archive at Nabble.com.
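The per-document matched-terms report the question asks for can also be computed client-side once the documents' tokens are known, as a post-processing stand-in for inspecting term vectors (a sketch, not a Solr feature):

```python
def matching_terms(query_terms, doc_terms):
    # Which of the query's terms occur in this document's tokenized
    # field -- the client-side equivalent of reading term vectors.
    return [t for t in query_terms if t in doc_terms]

# Toy documents matching the example in the question:
docs = {"Document 1": {"aaa"}, "Document 2": {"bbb", "ccc"}}
```

For the query aaa bbb ccc this yields aaa for Document 1 and bbb, ccc for Document 2, which is the desired output without highlighting.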
Re: Adding callback url to data import handler...Is this possible?
But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback. I would say the latter is more specific than the former. People who are comfortable writing JAVA wouldn't need any of these but the second best thing for others would be a capability to handle it in their own applications. A url can be the simplest way to invoke things in respective application. Doing it via javascript sounds like a round-about way of doing it. Cheers Avlesh 2009/10/15 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com I can understand the concern that you do not wish to write Java code . But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback . Will it help? On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm ... I think this is a valid use case and it might be a good idea to support it in someway. I will post this thread on the dev-mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.com wrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! 
The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
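Since DIH offers no callback today, the polling Bill describes has to live in the application. A small helper (hypothetical, not part of Solr or DIH) can decide from a /dataimport?command=status response whether the application should fire its own callback URL; the "busy"/"idle" status strings follow the DIH wiki's documented responses:

```python
import xml.etree.ElementTree as ET

def import_finished(status_xml: str) -> bool:
    # DIH's status endpoint reports "busy" while an import runs and
    # "idle" once it is done; anything other than "busy" counts as
    # finished here.
    root = ET.fromstring(status_xml)
    status = root.findtext(".//str[@name='status']", default="")
    return status != "busy"

# Trimmed stand-ins for real DIH status responses (illustrative):
BUSY = '<response><str name="status">busy</str></response>'
IDLE = '<response><str name="status">idle</str></response>'
```

The application would loop: fetch url/dataimport?command=status, pass the body to import_finished, and once it returns True, invoke its own callback URL and update its structures.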
One more happy Solr user ...
I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: One more happy Solr user ...
Ah! I knew that was coming :) We are planning a spell-checker integration pretty soon. Thanks for trying out the site Andrew. Cheers Avlesh On Wed, Oct 14, 2009 at 2:53 PM, Andrew McCombe eupe...@gmail.com wrote: Hi Nice site. First search I tried was for 'italien' in 'Mumbai' which returned zero results. Are you using spellcheck suggestions? Apart from that it's nice and fast. Regards Andrew McCombe iWebsolutions.co.uk 2009/10/14 Avlesh Singh avl...@gmail.com I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: One more happy Solr user ...
If burrp! can keep pace with Solr enhancements, we are not too far from a munich.burrp.com ;) Thanks for checking out the site, Chantal. Cheers Avlesh On Wed, Oct 14, 2009 at 4:47 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi Avlesh, that is mean, to send something like that http://mumbai.burrp.com/pack/list/kolkata-on-a-roll around at lunch time - in Germany(!). Very very sadly, there are many places in Mumbai that have mastered the art of making authentic Kolkata rolls but I don't know of any here in Munich. Congratulations for launching successfully! Chantal Avlesh Singh schrieb: I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: Sorting on Multiple fields
Do we attempt to raise some sort of functional query to find the least amount of the requested price id's? This would seem to imply some playing around in the query handler to allow a function of this sort. Unless I am missing something, this information can always be obtained by post-processing the data obtained from search results. Isn't it? Do we look at this rather than some internal method to handle the query and sort actions as a matter of relevancy on a calculated field? If so the methods of determining the fields included in the calculated field are eluding me at the moment. So pointers are welcome. I really did not understand the question. Is it related to sorting of results? Does this ultimately involve the implementation of some sort of custom type and handler to do this sort of task. If the answer to my previous question is affirmative, then yes, you would need to implement custom sorting behavior. It can be achieved in multiple ways depending upon your requirement. From something as simple as function-queries to using the power of dynamic fields to writing a custom field-type to writing a custom implementation of Lucene's Similarity ... any of these can be a potential answer to custom sorting. Cheers Avlesh On Wed, Oct 14, 2009 at 5:53 PM, Neil Lunn neil.l...@trixan.com wrote: We have come up against a situation we are trying to resolve in our Solr implementation project. This revolves mostly around how to sort results from index data we are likely to store in multiple fields but at runtime we are likely to query on the result of which one is most relevant. A brief example: We have product catalog information in the index which will have multiple prices dependent on the user logged in and other scenarios.
For simplification this will look something like this:

price_id101 = 100.00
price_id102 = 105.00
price_id103 = 110.00
price_id104 = 95.00
(etc)

What we are looking at is, at runtime, we want to know which one of several selected prices is the minimum (or maximum) - not all prices, just a select set of say 3 or 2 id's. The purpose we are looking at is to determine a sort order for the results. Approaching a SQL repository, as we would be aware, we would feed it some query logic to say find me the least amount of these set of id's; the search approach here therefore raises some questions. - Do we attempt to raise some sort of functional query to find the least amount of the requested price id's? This would seem to imply some playing around in the query handler to allow a function of this sort. - Do we look at this rather than some internal method to handle the query and sort actions as a matter of relevancy on a calculated field? If so, the methods of determining the fields included in the calculated field are eluding me at the moment, so pointers are welcome. - Does this ultimately involve the implementation of some sort of custom type and handler to do this sort of task? I am open to any response, as if someone has not come across a similar problem before and can suggest an approach we are willing to open up a patch branch or similar to do some work on the issue. Though if there are no suggestions this will likely move out of our current stream and into future development. Neil
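Avlesh's point that this can be obtained by post-processing can be sketched directly: compute, per document, the minimum over just the runtime-selected subset of price_id* fields and use it as the sort key (field naming follows Neil's example; the function is an illustration, not a Solr function query):

```python
def best_price(doc, selected_ids):
    # Minimum over a runtime-selected subset of the price_id* dynamic
    # fields; documents missing every selected price sort last.
    prices = [doc[f"price_id{i}"] for i in selected_ids if f"price_id{i}" in doc]
    return min(prices) if prices else float("inf")

doc = {"price_id101": 100.00, "price_id102": 105.00,
       "price_id103": 110.00, "price_id104": 95.00}
```

Sorting a result page by `lambda d: best_price(d, [102, 104])` then orders documents by the cheapest of just those two prices; doing the same inside Solr is where the function-query or custom-sort approaches Avlesh lists come in.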
Re: Adding callback url to data import handler...Is this possible?
Have you had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: Adding callback url to data import handler...Is this possible?
Hmmm ... I think this is a valid use case and it might be a good idea to support it in someway. I will post this thread on the dev-mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.comwrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: Dynamically compute document scores...
Options - 1. Can you pre-compute your business logic score at index time? If yes, then this value can be stored in some field and you can use function queries to combine this data with the score to return a value which you can sort upon. 2. Take a look at - http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html. Custom similarity implementations can be hooked up into Solr easily. Cheers Avlesh On Tue, Oct 13, 2009 at 9:05 PM, William Pierce evalsi...@hotmail.com wrote: Folks: During query time, I want to dynamically compute a document score as follows: a) Take the SOLR score for the document -- call it S. b) Lookup the business logic score for this document. Call it L. c) Compute a new score T = func(S, L) d) Return the documents sorted by T. I have looked at function queries but in my limited/quick review of it, I could not see a quick way of doing this. Is this possible? Thanks, - Bill
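One possible func(S, L) is a weighted linear blend, sketched below in Python. The blend and the 0.3 weight are illustrative choices, not anything prescribed by Solr; option 1 above would store L in a field at index time and express the same arithmetic as a function query rather than in application code:

```python
def combined_score(solr_score, business_score, weight=0.3):
    # T = func(S, L) as a weighted linear blend; weight is illustrative.
    return (1 - weight) * solr_score + weight * business_score

# (id, S, L): doc_b has the better business score, doc_a the better
# relevancy score.
docs = [("doc_a", 0.9, 0.1), ("doc_b", 0.5, 0.9)]
ranked = sorted(docs, key=lambda d: combined_score(d[1], d[2]), reverse=True)
```

With this weight, relevancy still dominates and doc_a ranks first; raising the weight shifts the ordering toward the business-logic score.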
Re: Scoring for specific field queries
Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. 
Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, fieldType name=autoComplete class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=autoComplete2 class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType My query is this, q=*:*fq=autoCompleteHelper:cha+autoCompleteHelper2:chaqf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost the my startswith query? Is it because of the n-gram filter? 
On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes
Re: Scoring for specific field queries
Can you just do q=autoCompleteHelper2:caf to see you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position 
considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, fieldType name=autoComplete class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=autoComplete2 class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType My query is this, q=*:*fq=autoCompleteHelper:cha+autoCompleteHelper2:chaqf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. 
Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost the my
Re: Scoring for specific field queries
I have a very similar set-up for my auto-suggest (I am sorry that it can't be viewed from an external network). I am sending you my field definitions, please use them and see if it works out correctly.

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="tokenized_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestion" type="autocomplete" indexed="true" stored="false"/>
<field name="tokenized_suggestion" type="tokenized_autocomplete" indexed="true" stored="true"/>

q=(suggestion:formula^2 tokenized_suggestion:formula)

Hope this helps. Cheers Avlesh On Fri, Oct 9, 2009 at 1:03 PM, R. Tan tanrihae...@gmail.com wrote: Yeah, I do get results.
Anything else I missed out? I want it to work like this site's auto suggest feature. http://www.sematext.com/demo/ac/index.html Try the keyword 'formula'. Thanks, Rih On Fri, Oct 9, 2009 at 3:24 PM, Avlesh Singh avl...@gmail.com wrote: Can you just do q=autoCompleteHelper2:caf to see you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. 
KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use
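The two-field boosting idea in this thread can be approximated in Python. This is a rough sketch, not Lucene scoring: 0/1 hits times a boost, with the PatternReplace stripping of non-alphanumerics approximated by removing spaces; the boost of 2.0 mirrors the q=(suggestion:formula^2 tokenized_suggestion:formula) query:

```python
def edge_ngrams(term, min_gram=1, max_gram=100):
    # Index-time EdgeNGramFilter, roughly: every prefix of the token
    # whose length lies between min_gram and max_gram.
    return {term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)}

def suggest_score(query, name, exact_boost=2.0):
    # Whole-phrase prefix match (KeywordTokenizer field) outranks a
    # per-word prefix match (WhitespaceTokenizer field).
    q = query.lower().replace(" ", "")
    whole = q in edge_ngrams(name.lower().replace(" ", ""))
    token = any(q in edge_ngrams(w) for w in name.lower().split())
    return exact_boost * whole + 1.0 * token
```

For the query formula, a name like "Formula 1" matches both fields and scores 3.0, while "One Formula" matches only the tokenized field and scores 1.0, so suggestions that begin with the typed text come first, which is the behavior being debugged in this thread.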
Re: Scoring for specific field queries
What are the replacements for the special character and the 20-char pattern? I had no time to diff between your definitions and mine. Copy-pasting mine was easier :) Also, do you get results such as formula? The autocomplete field would definitely not match this query, but the tokenized autocomplete would. Give it a shot, it should work as you expect it to. Cheers Avlesh On Fri, Oct 9, 2009 at 1:25 PM, R. Tan tanrihae...@gmail.com wrote: Thanks, I'll give this a go. What are the replacements for the special character and the 20 char? Also, do you get results such as formula? On Fri, Oct 9, 2009 at 3:45 PM, Avlesh Singh avl...@gmail.com wrote: I have a very similar set-up for my auto-suggest (I am sorry that it can't be viewed from an external network). I am sending you my field definitions, please use them and see if it works out correctly.

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="tokenized_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestion" type="autocomplete" indexed="true" stored="false"/>
<field name="tokenized_suggestion" type="tokenized_autocomplete" indexed="true" stored="true"/>

q=(suggestion:formula^2 tokenized_suggestion:formula)

Hope this helps. Cheers Avlesh On Fri, Oct 9, 2009 at 1:03 PM, R. Tan tanrihae...@gmail.com wrote: Yeah, I do get results. Anything else I missed out? I want it to work like this site's auto suggest feature. http://www.sematext.com/demo/ac/index.html Try the keyword 'formula'. Thanks, Rih On Fri, Oct 9, 2009 at 3:24 PM, Avlesh Singh avl...@gmail.com wrote: Can you just do q=autoCompleteHelper2:caf to see if you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R.
Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result:

<doc>
  <float name="score">0.7821733</float>
  <str name="autoCompleteHelper">Bikes Café</str>
</doc>
<doc>
  <float name="score">0.7821733</float>
  <str name="autoCompleteHelper">Cafe Feliy</str>
</doc>

I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. [KeywordTokenized and WhitespaceTokenized field analysis output elided; it is the same output quoted at the top of this thread]
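The difference between the two suggestion fields in this thread can be sketched outside Solr. The following Python snippet is a simplified stand-in for the two analysis chains (KeywordTokenizer vs. WhitespaceTokenizer, each followed by lowercasing and EdgeNGramFilterFactory); it is illustrative only, not Solr's actual code, and the sample title is made up:

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=100):
    """Leading-edge n-grams, roughly what EdgeNGramFilterFactory produces."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def analyze_keyword(text):
    # KeywordTokenizer: the whole input is one token;
    # lowercase and strip non-alphanumerics, then n-gram it.
    token = re.sub(r"[^a-z0-9]", "", text.lower())
    return set(edge_ngrams(token))

def analyze_whitespace(text):
    # WhitespaceTokenizer: one token per word; lowercase and n-gram each.
    grams = set()
    for token in text.lower().split():
        grams.update(edge_ngrams(token))
    return grams

title = "Formula One Racing"
print("formula" in analyze_keyword(title))   # True  - prefix of the whole phrase
print("one" in analyze_keyword(title))       # False - not a prefix of the phrase
print("one" in analyze_whitespace(title))    # True  - prefix of a word
```

This is why a query on the keyword-tokenized field only matches suggestions from the very beginning of the phrase, while the whitespace-tokenized field matches from the start of any word; boosting the former ranks exact-prefix suggestions first.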
Re: Scoring for specific field queries
Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting,

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this, q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R.
Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct syntax for boolean search
q=+fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3)) +fieldname2:... Cheers Avlesh On Thu, Oct 8, 2009 at 7:40 PM, Elaine Li elaine.bing...@gmail.com wrote: Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
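A small helper can mechanize translating (word_a OR word_b) groups into the +fieldname:(+(…) +(…)) form above. This is an illustrative Python sketch, not part of Solr or Lucene; the field and word names are placeholders from the question:

```python
def boolean_field_query(field, groups):
    """Build a Lucene-style clause: every group is required (+),
    and within a group any word may match (OR is the default)."""
    clauses = " ".join("+(%s)" % " ".join(words) for words in groups)
    return "+%s:(%s)" % (field, clauses)

q = boolean_field_query("fieldname1",
                        [["word_a1", "word_b1"],
                         ["word_a2", "word_b2"],
                         ["word_a3", "word_b3"]])
print(q)  # +fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3))
```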
Re: Scoring for specific field queries
Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, [autoComplete and autoComplete2 field definitions elided; they are quoted in full earlier in this thread] My query is this, q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query?
Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : Questions about synonyms and highlighting
4 - the same question for highlighting with lemmatisation? Settings for manage (all highlighted) == the two words <em>manage</em> and <em>management</em> are highlighted Settings for manage == the first word <em>manage</em> is highlighted but not the second: management There is no lemmatisation support in Solr as of now. The only support you get is stemming. Let me understand this correctly - you basically want the searches to happen with the stemmed base but want to selectively highlight the original and/or stemmed words. Right? If yes, then AFAIK, this is not possible. Search passes through your fields' analyzers (tokenizers and filters). Highlighters, typically, use the same set of analyzers and the behavior will be the same as in search; this essentially means that the keywords manage, managing, management and manager are REDUCED to manage for searches and highlighters. If this can be done at all, then the only place to enable your feature could be the Lucene highlighter APIs. Someone more knowledgeable can tell you, if that is possible. I have no idea about your #3, though my idea of handling accentuation is to apply an ISOLatin1AccentFilterFactory and get rid of them altogether :) I am curious to know the answer though. Cheers Avlesh On Wed, Oct 7, 2009 at 3:17 PM, Nourredine K. nourredin...@yahoo.com wrote: I'm not an expert on hit highlighting but please find some answers inline: Thanks Shalin for your answers. It helps a lot. I post again questions #3 and #4 for the others :) 3 - Is it possible, and if so how, can I configure Solr to set or not highlighting for tokens with diacritics? Settings for vélo (all highlighted) == the two words <em>vélo</em> and <em>velo</em> are highlighted Settings for vélo == the first word <em>vélo</em> is highlighted but not the second: velo 4 - the same question for highlighting with lemmatisation?
Settings for manage (all highlighted) == the two words <em>manage</em> and <em>management</em> are highlighted Settings for manage == the first word <em>manage</em> is highlighted but not the second: management Regards, Nourredine.
Re: Facet query pb
I have no idea what pb means, but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello I have a pb trying to retrieve a tree with facet use. I've got a field location_field. Each doc in my index has a location_field. The location field can be continent/country/city. I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieves docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character. Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com.
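Besides grouping the phrase as suggested above, the space itself must survive the trip through the URL, i.e. it has to be percent-encoded rather than left raw. A quick Python illustration (the server name is a placeholder, and this only shows the encoding step, not a real Solr request):

```python
from urllib.parse import urlencode

# Encode the fq parameter so the space in "NORTH AMERICA" survives the URL.
params = urlencode({"fq": "location_field:(NORTH AMERICA*)"})
url = "http://server/solr/select?" + params
print(url)

# The raw space never appears in the encoded query string.
assert " " not in params
```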
Re: Scoring for specific field queries
You would need to boost your startswith matches artificially for the desired behavior. I would do it this way - 1. Create a KeywordTokenized field with an n-gram filter. 2. Create a Whitespace tokenized field with an n-gram filter. 3. Search on both the fields, boosting matches for #1 over #2. Hope this helps. Cheers Avlesh On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. For example, I have multiple documents with titles containing the word champion. Some of the document titles start with the word champion and some are entitled we are the champions. The ones that start with the keyword need to rank first or score higher. Is there a way to do this? I'm using this query for an auto-suggest term feature where the keyword doesn't necessarily need to be the first word. Rihaed
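The effect of the three steps above can be mimicked in a few lines of Python: a document whose whole title starts with the prefix matches the keyword-tokenized field and collects the boost, while a document that merely contains the word only matches the whitespace-tokenized field. This is a toy scoring model, not Lucene's actual scoring; the titles and weights are invented:

```python
def score(title, prefix, keyword_boost=10.0, token_weight=1.0):
    """Toy score: keyword-field match = whole title starts with the prefix;
    token-field match = any single word starts with the prefix."""
    t = title.lower()
    s = 0.0
    if t.startswith(prefix):                    # keyword-tokenized field hit
        s += keyword_boost
    if any(w.startswith(prefix) for w in t.split()):  # whitespace field hit
        s += token_weight
    return s

titles = ["We Are the Champions", "Champion of the World"]
ranked = sorted(titles, key=lambda t: score(t, "cha"), reverse=True)
print(ranked)  # ['Champion of the World', 'We Are the Champions']
```

Both documents match, but the one that starts with the keyword outranks the one that merely contains it, which is exactly the ordering Rihaed asked for.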
Re: Scoring for specific field queries
I guess we don't need to depend on scores all the time. You can use custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, sort the results by the field in ascending order. Then the records which contain the keyword at an earlier position will come first. Warning: This is a bad idea for multiple reasons: 1. If the word computer occurs multiple times in a document, what would you do in that case? Is this dynamic field supposed to be multivalued? I can't even imagine what you would do if the word computer occurs in multiple documents multiple times. 2. Multivalued fields cannot be sorted upon. 3. One needs to know the unique number of such keywords before implementing, because you'll potentially end up creating that many fields. Cheers Avlesh On Thu, Oct 8, 2009 at 11:10 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Rihaed, I guess we don't need to depend on scores all the time. You can use custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, sort the results by the field in ascending order. Then the records which contain the keyword at an earlier position will come first. Regards, Sandeep R. Tan wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798657.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : wildcard searches
You are processing your tokens in the filter that you wrote. I am assuming it is the first filter being applied and removes the character 'h' from tokens. When you are doing that, you can preserve the original token in the same field as well. Because as of now, you are simply removing the character. Subsequent filters don't even know that there was an 'h' character in the original token. Since wildcard queries are not analyzed, the 'h' character in the query hésita* does NOT get removed at query time. This means that unless the original token was preserved in the field, it wouldn't find any matches. Does this help? Cheers Avlesh On Tue, Oct 6, 2009 at 2:02 PM, Angel Ice lbil...@yahoo.fr wrote: Hi. Thanks for your answers Christian and Avlesh. But I don't understand what you mean by: If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Could you explain this point please? Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, 5 October 2009, 20:30:54 Subject: Re: wildcard searches Zambrano is right, Laurent. The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
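Laurent's situation can be reproduced with a toy model: index-time analysis reduces Hésitation to esit, but a wildcard query's prefix is compared verbatim against the indexed terms, so only a prefix of the already-analyzed form matches. A Python sketch of this - the mute-h stripping, accent folding, and "drop a trailing ation" stemming are crude stand-ins for the real filters, not their actual implementations:

```python
import unicodedata

def index_analyze(word):
    """Crude stand-in for the index-time chain: lowercase, drop a mute
    leading 'h', strip accents, strip a trailing 'ation' (stemming)."""
    w = word.lower()
    if w.startswith("h"):
        w = w[1:]
    w = "".join(c for c in unicodedata.normalize("NFD", w)
                if unicodedata.category(c) != "Mn")  # remove combining accents
    if w.endswith("ation"):
        w = w[:-len("ation")]
    return w

indexed = {index_analyze("Hésitation")}  # {'esit'}

def wildcard_match(query):
    # Wildcard queries are NOT analyzed: the prefix is compared verbatim.
    prefix = query.rstrip("*")
    return any(term.startswith(prefix) for term in indexed)

print(wildcard_match("hésita*"))  # False - raw 'hésita' vs indexed 'esit'
print(wildcard_match("esi*"))     # True  - matches the indexed term
```

This is exactly why preserving the original (unanalyzed) token alongside the analyzed one, as suggested in the thread, makes hésita* match again.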
Re: Re : Re : wildcard searches
You are right, Angel. The problem would still persist. Why don't you consider putting the original data in some field? While querying, you can query on both the fields - the analyzed and the original one. Wildcard queries will not give you any results from the analyzed field but would match the data in your original field. Works? Cheers Avlesh On Tue, Oct 6, 2009 at 2:27 PM, Angel Ice lbil...@yahoo.fr wrote: Ah yes, got it. But I'm not sure this will solve my problem. Because I'm also using the IsoLatin1 filter, which removes the accented characters. So I will have the same problem with accented characters, because the original token is not stored with this filter. Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 6 October 2009, 10:41:56 Subject: Re: Re : wildcard searches You are processing your tokens in the filter that you wrote. I am assuming it is the first filter being applied and removes the character 'h' from tokens. When you are doing that, you can preserve the original token in the same field as well. Because as of now, you are simply removing the character. Subsequent filters don't even know that there was an 'h' character in the original token. Since wildcard queries are not analyzed, the 'h' character in the query hésita* does NOT get removed at query time. This means that unless the original token was preserved in the field, it wouldn't find any matches. Does this help? Cheers Avlesh On Tue, Oct 6, 2009 at 2:02 PM, Angel Ice lbil...@yahoo.fr wrote: Hi. Thanks for your answers Christian and Avlesh. But I don't understand what you mean by: If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Could you explain this point please? Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, 5 October 2009, 20:30:54 Subject: Re: wildcard searches Zambrano is right, Laurent.
The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: A little help with indexing joined words
We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombe eupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
Re: A little help with indexing joined words
Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. Well, I don't see a reason as to why someone would need length-based normalization on such matches. I have always used omitNorms on fields with this filter. Yes, synonyms might be an answer when you have a limited number of such words (phrases) and their possible combinations. Cheers Avlesh On Mon, Oct 5, 2009 at 10:32 PM, Christian Zambrano czamb...@gmail.com wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. A query for borderland should have returned results though. It is difficult to troubleshoot why it didn't without knowing what query you used, and what kind of analysis is taking place. Have you tried using the analysis page on the admin section to see what tokens get generated for 'Borderlands'? Christian On 10/05/2009 11:01 AM, Avlesh Singh wrote: We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombe eupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned.
There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
Re: wildcard searches
No filters are applied to wildcard/fuzzy searches. Ah! Not like that ... I guess it is just that phrase searches using wildcards are not analyzed. Cheers Avlesh On Mon, Oct 5, 2009 at 10:42 PM, Christian Zambrano czamb...@gmail.com wrote: No filters are applied to wildcard/fuzzy searches. I couldn't find a reference to this in either the Solr or Lucene documentation but I read it in the Solr book from PACKT On 10/05/2009 12:09 PM, Angel Ice wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: wildcard searches
First of all, I know of no way of doing wildcard phrase queries. http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F When I said not filters, I meant TokenFilters which is what I believe you mean by 'not analyzed' Analysis is a Lucene way of configuring tokenizers and filters for a field (index time and query time). I guess, both of us mean the same thing. Cheers Avlesh On Mon, Oct 5, 2009 at 11:04 PM, Christian Zambrano czamb...@gmail.comwrote: Avlesh, I don't understand your answer. First of all, I know of no way of doing wildcard phrase queries. When I said not filters, I meant TokenFilters which is what I believe you mean by 'not analyzed' On 10/05/2009 12:27 PM, Avlesh Singh wrote: No filters are applied to wildcard/fuzzy searches. Ah! Not like that .. I guess, it is just that the phrase searches using wildcards are not analyzed. Cheers Avlesh On Mon, Oct 5, 2009 at 10:42 PM, Christian Zambranoczamb...@gmail.com wrote: No filters are applied to wildcard/fuzzy searches. I couldn't find a reference to this on either the solr or lucene documentation but I read it on the Solr book from PACKT On 10/05/2009 12:09 PM, Angel Ice wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example : - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word, result in the indexation of esit (the mute H is suppressed (home made filter), the accent too (IsoLatin1Filter), and the SnowballPorterFilter suppress the ation. When i search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. 
In fact, I have to put the wildcard in a manner that match the indexed term exactly (example esi*) Does the search engine applies the filters to the word that prefix the wildcard ? Or does it use this prefix verbatim ? Thanks for you help. Laurent
Re: wildcard searches
Zambrano is right, Laurent. The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: A little help with indexing joined words
Zambrano, I was too quick to respond to your idf explanation. I definitely did not mean that idf and length-norms are the same thing. Andrew, this is how I would have done it - First, I would create a field type called prefix_text as underneath in my schema.xml

<fieldType name="prefix_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Second, I would declare a field of this type and populate it (using copyField) while indexing. Third, while querying I would query on both the fields. I would boost the matches for the original field to a large extent over the n-grammed field. In scenarios where Dragon Fly is expected to match against Dragonfly in the index, a query on the original field would not give you any matches, thereby bringing results from the prefix_text field right there on top. Hope this helps. Cheers Avlesh On Mon, Oct 5, 2009 at 11:10 PM, Christian Zambrano czamb...@gmail.com wrote: Would you mind explaining how omitNorms has any effect on the IDF problem I described earlier? I agree with your second sentence. I had to use the NGramTokenFilter to accommodate partial matches.
On 10/05/2009 12:11 PM, Avlesh Singh wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of token which will artificially increase the number of tokens in the index which in turn will affect the IDF score. Well, I don't see a reason as to why someone would need a length based normalization on such matches. I always have done omitNorms while using fields with this filter. Yes, synonyms might an answer when you have limited number of such words (phrases) and their possible combinations. Cheers Avlesh On Mon, Oct 5, 2009 at 10:32 PM, Christian Zambranoczamb...@gmail.com wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of token which will artificially increase the number of tokens in the index which in turn will affect the IDF score. A query for borderland should have returned results though. It is difficult to troubleshoot why it didn't without knowing what query you used, and what kind of analysis is taking place. Have you tried using the analysis page on the admin section to see what tokens gets generated for 'Borderlands'? Christian On 10/05/2009 11:01 AM, Avlesh Singh wrote: We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombeeupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. 
We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
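To see why the n-grammed prefix_text approach in this thread catches the Dragon Fly vs. Dragonfly case, note that both the indexed title and the query pass through a lowercase + strip-non-alphanumerics step before the prefix comparison, so the space in the query disappears. A Python approximation (the product titles are the made-up ones from the question; this is not Solr code, and the query-side 20-character truncation is omitted for brevity):

```python
import re

def normalize(text):
    """Lowercase and remove everything but a-z0-9, roughly what the
    KeywordTokenizer + PatternReplaceFilter chain would do."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def edge_ngram_match(title, query):
    # The index side keeps all leading n-grams of the normalized title,
    # so the normalized query matches iff it is a prefix of the title.
    return normalize(title).startswith(normalize(query))

print(edge_ngram_match("Dragonfly xx xxx", "Dragon Fly"))   # True
print(edge_ngram_match("Dragonfly xx xxx", "Border Land"))  # False
```

"Dragon Fly" normalizes to dragonfly, which is a leading n-gram of dragonflyxxxxx, so the n-grammed field matches even though a plain text field would not.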
Highlighting bean properties using DocumentObjectBinder - New feature?
Like most others, I use SolrJ and bind my beans with @Field annotations to read responses from Solr. For highlighting these properties in my bean, I always write a separate piece - get the list of highlights from the response and then use the Map<fieldName, List<highlights>> to put them back in my original bean. This evening, I tried creating an @Highlight annotation and modified the DocumentObjectBinder to understand this attribute (with a bunch of other properties). This is how it works: You can annotate your beans with @Highlight as underneath.

class MyBean {
  @Field @Highlight String name;

  @Field("solr_category_field_name") List<String> categories;
  @Highlight("solr_category_field_name") List<String> highlightedCategories;

  @Field float score;
  ...
}

and use QueryResponse#getBeans(MyBean.class) to achieve both - object binding as well as highlighting. I was wondering if this can be of help to most users or not. Can this be a possible enhancement in DocumentObjectBinder? If yes, I can write a patch. Cheers Avlesh
Re: Usage of Sort and fq
/?q=*:*&fq=category:animal&sort=child_count%20asc Search for all documents, filter the ones that belong to the category animal, and sort ascending by a field called child_count that contains the number of children for each animal. You can pass multiple fq's with more fq=... parameters. Secondary, tertiary sorts can be specified using comma (,) as the separator, i.e. sort=fieldA asc,fieldB desc,fieldC asc, ... Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can someone let me know how to use the sort and fq parameters in Solr. Any examples would be appreciated. Regards Bhaskar
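Putting the above together, a request with two filters and a compound sort might look like this (the field names are made up for illustration):

```text
/select?q=*:*&fq=category:animal&fq=region:asia&sort=child_count%20asc,score%20desc
```

Each fq is applied as an independent filter, and the sort falls back to score for documents with equal child_count.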
Re: Questions on RandomSortField
Thanks Hoss! The approach that I explained in my subsequent email works like a charm. Cheers Avlesh On Wed, Sep 30, 2009 at 3:45 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : The question was either non-trivial or heavily uninteresting! No replies yet it's pretty non-trivial, and pretty interesting, but i'm also pretty behind on my solr-user email. I don't think there's any way to do what you wanted without a custom plugin, so your efforts weren't in vain ... if we add the ability to sort by a ValueSource (aka function ... there's a Jira issue for this somewhere) then you could also do it with a combination of functions so that anything in your category gets flattened to an extremely high constant, and everything else has a real score -- then a secondary sort on a random field would effectively only randomize the things in your category ... but we're not there yet. : Hoss, I have a small question (RandomSortField bears your signature) - Any : reason as to why RandomSortField#hash() and RandomSortField#getSeed() : methods are private? Having them public would have saved me from : owning a copy in my class as well. just a general principle of API future-proofing: keep internals private unless you explicitly think through how subclasses will use them. I haven't thought it through all the way, but do you really need to copy everything? couldn't you get the SortField/Comparator from super and only delegate to it if the categories both match your specific categoryId? -Hoss
Re: Regular expression not working
Such questions are better answered on the user mailing list; you don't need to post them on the dev list. What matches an incoming query is largely a function of your field type definition and the way you analyze your field data at query time and index time. Copy-paste your field and its type definition from schema.xml. Cheers Avlesh On Mon, Sep 28, 2009 at 8:56 PM, Siddhartha Pahade pahade@gmail.com wrote: Hi guys, My search result is Gilmore Girls. If I search on Gilmore, it gives me the result Gilmore Girls in the output, as desired. However, if I search on the string gilmore* or gilm, it does not work, whereas we want it to. Any help highly appreciated. Thanks!
Re: Unsubscribe from this mailing-list
You seem to be desperate to get out of the Solr mailing list :) Send an email to solr-user-unsubscr...@lucene.apache.org Cheers Avlesh On Fri, Sep 25, 2009 at 11:54 AM, Rafeek Raja rafeek.r...@gmail.com wrote: Unsubscribe from this mailing-list
Highlighting on text fields
I am new to the whole highlighting API and have a few basic questions: I have a text type field defined as underneath:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

And the schema field is associated as follows:

<field name="text_entity_name" type="text" indexed="true" stored="false"/>

My query, q=text_entity_name:(foo bar)&hl=true&hl.fl=text_entity_name, works fine for the search part but not for highlighting. The highlight named list is empty for each document returned back. I have a unique key defined. What am I missing? Do I need to store term vectors for highlighting to work properly? Cheers Avlesh
Highlighting not working on a prefix_token field
I have a prefix_token field defined as underneath in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searches on the field work fine and as expected. However, attempts to highlight on this field do not yield any results. Highlighting on other fields works fine. Any clues? I am using Solr 1.3. Cheers Avlesh
Re: Highlighting not working on a prefix_token field
Hmmm .. but ngrams with KeywordTokenizerFactory instead of the WhitespaceTokenizerFactory work just fine. Related issues? Cheers Avlesh On Wed, Sep 23, 2009 at 12:27 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh avl...@gmail.com wrote: I have a prefix_token field defined as underneath in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searches on the field work fine and as expected. However, attempts to highlight on this field do not yield any results. Highlighting on other fields works fine. Won't work until SOLR-1268 comes along. http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible -- Regards, Shalin Shekhar Mangar.
Re: Highlighting not working on a prefix_token field
I'm sorry I don't understand the question. Do you mean to say that highlighting works with one but not with another? Yes. Cheers Avlesh On Wed, Sep 23, 2009 at 12:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 12:31 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm .. But ngrams with KeywordTokenizerFactory instead of the WhitespaceTokenizerFactory work just as fine. Related issues? I'm sorry I don't understand the question. Do you mean to say that highlighting works with one but not with another? -- Regards, Shalin Shekhar Mangar.
Re: Overlapping zipcodes
Range queries? Cheers Avlesh On Mon, Sep 21, 2009 at 2:57 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote: We are in a situation where we are trying to match up documents based on a number of zipcodes. In our case, zipcodes are just integers, so that hopefully simplifies things. So, we might have a document listing a number of zipcodes: 1200-1450,2000,5000-5999 and we want to do a search of 1100-1300,8000 and have it match the document. How can this be done using Solr? Thanks, Anders.
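One hedged way to make Anders' case work with plain range queries is to expand each document's zipcode ranges into individual integer values in a multiValued field at index time (the field name is illustrative); the search ranges then become ordinary range clauses:

```text
Indexed (multiValued int field "zip"): 1200, 1201, ..., 1450, 2000, 5000, ..., 5999
Query: q=zip:([1100 TO 1300] OR 8000)
```

Expansion makes the index larger but the queries trivial; for very wide ranges a dedicated range-overlap encoding would be needed instead.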
Re: Questions on RandomSortField
The question was either non-trivial or heavily uninteresting! No replies yet :) Thankfully, I figured out a solution for the problem at hand. For people who might be looking for a solution, here it goes -
1. Extend the RandomSortField to create your own YourCustomRandomField.
2. Override the RandomSortField#getSortField method to return YourSortField.
3. Return YourSortComparatorSource from YourSortField#getFactory().
4. Most of the rules related to the problem statement would be handled in YourSortComparatorSource#newComparator().
5. In your schema, create a dynamic field of YourFieldType. Pass in the id (look at the problem statement in the trailing post) as a part of the dynamic field name in your sort query.
6. Inside YourSortComparatorSource#newComparator(), get the above mentioned id from the fieldName parameter and then fetch the values indexed in this field using Lucene's FieldCache.
7. In your ScoreDocComparator#compare(), first check for the values in the id field and return -1, 1, 0 or hash(i.doc + seed) - hash(j.doc + seed) based on the values in this field. The idea is to only randomize results for a particular id value.
Hoss, I have a small question (RandomSortField bears your signature) - any reason why the RandomSortField#hash() and RandomSortField#getSeed() methods are private? Having them public would have saved me from owning a copy in my class as well. My solution applies to Solr 1.3. It might not hold true for higher versions as the underlying Lucene APIs might have changed. Cheers Avlesh On Sun, Sep 20, 2009 at 4:28 PM, Avlesh Singh avl...@gmail.com wrote: I am using Solr 1.3. I have a solr.RandomSortField type dynamic field which I use to randomize my results. I am in a tricky situation. I need to randomize only certain results in my Hits. To elaborate, I have an integer field called category_id. When performing a query, I need to get results from all categories and place the ones from SOME_CAT_ID at the top.
I achieved this by populating a separate dynamic field while indexing data, i.e. when a doc is added to the index, a field called dynamic_cat_id_SOME_CAT_ID is populated with its category id. While querying, I know the value of SOME_CAT_ID, so adding sort=dynamic_cat_id_SOME_CAT_ID asc, score desc to my query works absolutely fine. So far so good. I am now supposed to randomize the results for category_id=SOME_CAT_ID, i.e. the results at the top. My understanding is that adding sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED asc*, score desc to the query would randomize all the results. This is not desired. I only want to randomize the ones at the top (category_id=SOME_CAT_ID); the rest should be ordered based on relevance score. Two simple questions: 1. Is there a way to achieve this without writing any custom code? 2. If the answer to #1 is no, then where should I start? I glanced at the RandomSortField class but could not figure out how to proceed. Do I need to create a custom FieldType? Can I extend the RandomSortField and override the sorting behaviour? Any help would be appreciated. Cheers Avlesh
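The heart of step 7 above - shuffle only the documents in the target category by a seeded hash, and order everything else by score - can be sketched in plain Java, detached from the Lucene comparator machinery. All class and field names here are hypothetical simplifications, not the actual plugin code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CategoryRandomSort {
    static class Doc {
        final int id;
        final Integer categoryId; // value from the dynamic_cat_id_* field, or null
        final float score;
        Doc(int id, Integer categoryId, float score) {
            this.id = id; this.categoryId = categoryId; this.score = score;
        }
    }

    // Mirrors RandomSortField's idea: a cheap deterministic hash of doc id + seed.
    static int hash(int x) {
        x = x * 0x27d4eb2d;
        return x ^ (x >>> 15);
    }

    /**
     * Docs in targetCategory sort first, shuffled deterministically by seed;
     * all other docs follow, ordered by descending score.
     */
    static Comparator<Doc> comparator(int targetCategory, int seed) {
        return (a, b) -> {
            boolean aIn = a.categoryId != null && a.categoryId == targetCategory;
            boolean bIn = b.categoryId != null && b.categoryId == targetCategory;
            if (aIn != bIn) return aIn ? -1 : 1;  // target category always first
            if (aIn) return Integer.compare(hash(a.id + seed), hash(b.id + seed)); // randomized block
            return Float.compare(b.score, a.score); // everyone else: score desc
        };
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(Arrays.asList(
            new Doc(1, 7, 0.2f), new Doc(2, null, 0.9f),
            new Doc(3, 7, 0.8f), new Doc(4, null, 0.5f)));
        docs.sort(comparator(7, 42));
        // Docs 1 and 3 (category 7) come first in a seed-dependent order,
        // then docs 2 and 4 by descending score.
        for (Doc d : docs) System.out.println(d.id);
    }
}
```

Changing the seed reshuffles only the category-7 block; the tail ordering stays stable, which is exactly the behavior the sort query above could not express.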
Questions on RandomSortField
I am using Solr 1.3. I have a solr.RandomSortField type dynamic field which I use to randomize my results. I am in a tricky situation. I need to randomize only certain results in my Hits. To elaborate, I have an integer field called category_id. When performing a query, I need to get results from all categories and place the ones from SOME_CAT_ID at the top. I achieved this by populating a separate dynamic field while indexing data, i.e. when a doc is added to the index, a field called dynamic_cat_id_SOME_CAT_ID is populated with its category id. While querying, I know the value of SOME_CAT_ID, so adding sort=dynamic_cat_id_SOME_CAT_ID asc, score desc to my query works absolutely fine. So far so good. I am now supposed to randomize the results for category_id=SOME_CAT_ID, i.e. the results at the top. My understanding is that adding sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED asc*, score desc to the query would randomize all the results. This is not desired. I only want to randomize the ones at the top (category_id=SOME_CAT_ID); the rest should be ordered based on relevance score. Two simple questions: 1. Is there a way to achieve this without writing any custom code? 2. If the answer to #1 is no, then where should I start? I glanced at the RandomSortField class but could not figure out how to proceed. Do I need to create a custom FieldType? Can I extend the RandomSortField and override the sorting behaviour? Any help would be appreciated. Cheers Avlesh
Re: Need help to finalize my autocomplete
Instead of <tokenizer class="solr.WhitespaceTokenizerFactory"/> use <tokenizer class="solr.KeywordTokenizerFactory"/> Cheers Avlesh 2009/9/16 Vincent Pérès vincent.pe...@gmail.com Hello, I'm using the following code for my autocomplete feature. The field type:

<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

The field:

<dynamicField name="*_ac" type="autoComplete" indexed="true" stored="true"/>

The query: ?q=*:*&fq=query_ac:harry*&wt=json&rows=15&start=0&fl=*&indent=on&fq=model:SearchQuery It gives me a list of results I can parse and use with the jQuery autocomplete plugin, and all that works very well. Example of results: harry / harry potter the last fighting harry / harry potter 5 / comic relief harry potter. What I would like to do now is to only have results starting with the query, so it should be: harry / harry potter / harry potter 5. Can anybody tell me if this is possible and, if so, how to do it? Thank you! Vincent -- View this message in context: http://www.nabble.com/Need-help-to-finalize-my-autocomplete-tp25468885p25468885.html Sent from the Solr - User mailing list archive at Nabble.com.
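With the suggested swap applied to both analyzer chains, the field type treats the whole phrase as a single token, so edge n-grams are only generated from the start of the phrase and "harry" no longer matches mid-phrase suggestions. A sketch keeping Vincent's original filters (gram sizes and the pattern filter unchanged from his mail):

```xml
<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizerFactory keeps the whole phrase as one token, so
         EdgeNGramFilterFactory only emits prefixes of the full phrase. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>
```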
Re: Creating facet query using SolrJ
When constructing a query, I create a Lucene query and use query.toString to create the SolrQuery. Go through this thread - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query I am facing difficulty while creating a facet query for an individual field, as I could not find an easy and clean way of constructing a facet query with parameters specified at field level. Per-field overrides for facet params using SolrJ are not supported yet. However, you can always use solrQuery.set("f.myField.facet.limit", 10) ... to pass field-specific facet params to the SolrServer. Cheers Avlesh On Wed, Sep 9, 2009 at 2:42 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: hello, I am using SolrJ to access Solr indexes. When constructing a query, I create a Lucene query and use query.toString to create the SolrQuery. I am facing difficulty while creating a facet query for an individual field, as I could not find an easy and clean way of constructing a facet query with parameters specified at field level. As I understand, the faceting parameters like limit, sort order etc. can be set on the SolrQuery object, but they are used for all the facets in a query. I would like to provide these parameters separately for each field. I am currently building such a query in Java code using string appends, but it looks really bad and would be prone to breaking if the query syntax changes in future. Is there any better way of constructing such detailed facet queries, the way we build the main Solr search query? regards, aakash
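The same per-field overrides can be written as raw request parameters; SolrJ's set() simply passes them through unchanged. An illustrative request fragment (field names assumed for the example):

```text
&facet=true&facet.field=category&facet.field=brand
&f.category.facet.limit=10
&f.brand.facet.limit=5&f.brand.facet.mincount=1
```

Any facet parameter prefixed with f.<fieldName>. overrides the global value for that field only.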
Re: Facet search field returning results on split words
Your field needs to be untokenized for the expected results. Faceting on the text field that you use to search will give you facets like these. You can index the same data in another string field and facet on that field. PS: You can use copyField to copy data at index time from one field to another. Cheers Avlesh On Fri, Sep 4, 2009 at 6:21 PM, EwanH drldgt...@sneakemail.com wrote: Hi, I have a Solr search where a particular field named location is a place name. I have the field indexed and stored. It is quite likely that a field value could comprise more than one term, or at least 2 words split by a space, such as Burnham Market. Now if I search on location:burnham I get the appropriate docs returned ok, but the facet results return <lst name="location"><int name="burnham">2</int><int name="thorp">2</int></lst> i.e. values for both words, which I don't want. What can I do about this? Can I somehow escape the space when adding the data for indexing? -- Ewan
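A minimal sketch of that setup (field and type names are illustrative): search against the tokenized field, facet on an untokenized string copy.

```xml
<field name="location" type="text" indexed="true" stored="true"/>
<!-- Untokenized copy used only for faceting, so "Burnham Market" stays whole -->
<field name="location_facet" type="string" indexed="true" stored="false"/>
<copyField source="location" dest="location_facet"/>
```

Queries keep using location:burnham; facet.field=location_facet then returns whole place names instead of individual words.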
Re: how to scan dynamic field without specifying each field in query
I don't have that answer, as I was asking a general question, not one for a specific situation I am encountering. I can understand :) What I am essentially asking for is: is there a short, simple and generic method/technique to deal with large numbers of dynamic fields (rather than having to specify each and every test on each and every dynamic field) in a query? Not as of now. There are a lot of open issues in Solr aiming to handle dynamic fields in an intuitive way. SolrJ has already been made capable of binding dynamic field content into Java beans (https://issues.apache.org/jira/browse/SOLR-1129). Faceting on myField_* (https://issues.apache.org/jira/browse/SOLR-1387) and adding SolrDocuments with Map<String, String> myField_* (https://issues.apache.org/jira/browse/SOLR-1357) are just some of the enhancements on the way. What originally prompted this question is I was looking at FunctionQueries (http://wiki.apache.org/solr/FunctionQuery) and started to wonder if there was some way to create my own functions to handle dynamic fields. I don't think you need function queries here. Function queries are supposed to return a score for a document based on their ValueSource. What you probably need is a custom QueryParser. Cheers Avlesh On Fri, Sep 4, 2009 at 9:48 PM, gdeconto gerald.deco...@topproducer.com wrote: I don't have that answer, as I was asking a general question, not one for a specific situation I am encountering. What I am essentially asking for is: is there a short, simple and generic method/technique to deal with large numbers of dynamic fields (rather than having to specify each and every test on each and every dynamic field) in a query? What originally prompted this question is I was looking at FunctionQueries (http://wiki.apache.org/solr/FunctionQuery) and started to wonder if there was some way to create my own functions to handle dynamic fields. Aakash Dharmadhikari wrote: what all other searches would you like to perform on these fields? ...
Re: Schema for group/child entity setup
Well, you are talking about a very relational behavior, Tan. You can declare a locations and a location_* field in your schema. While indexing a document, put all the locations inside the field locations. Populate location_state, location_city etc. with their corresponding location values. That way, when no filter is applied, you can facet on the locations field to get all the locations. In all other scenarios, when a filter on field foo is applied, faceting on location_foo will give you the desired results. Cheers Avlesh On Fri, Sep 4, 2009 at 10:16 PM, R. Tan tanrihae...@gmail.com wrote: I can't because there are facet values for each location, such as state/city/neighborhood and facilities. Example result is 7 Eleven, 100 locations when no location filters are applied; where there is a filter for state, it should show 7 Eleven, 20 locations. On Fri, Sep 4, 2009 at 11:57 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: Can't you store the locations as part of the parent listing while storing? This way there would be only one document per parent listing, and all the location related information can be multi-valued attributes per property, or any other way depending on the attributes. 2009/9/3 R. Tan tanrihae...@gmail.com Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have. The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once, but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward, but the locations within a result are quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
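A rough schema sketch of that suggestion (all names illustrative, multiValued string fields assumed):

```xml
<!-- All locations of a business, for the unfiltered facet -->
<field name="locations" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- Per-filter variants: location_state, location_city, ... -->
<dynamicField name="location_*" type="string" indexed="true" stored="false" multiValued="true"/>
```

A 7-Eleven document would then carry every branch in locations, while location_state, location_city etc. hold the values to facet on once the corresponding filter is active.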
Re: Schema for group/child entity setup
But, as I've discovered the field collapsing feature recently (although I haven't tested it), can't it solve this requirement? Off the top of my head, no. The answer might change on deeper thinking. It is one of the most popular features which is yet to be incorporated into Solr. Cheers Avlesh On Fri, Sep 4, 2009 at 10:58 PM, R. Tan tanrihae...@gmail.com wrote: Hmmm, interesting solution. But, as I've discovered the field collapsing feature recently (although I haven't tested it), can't it solve this requirement? On Sat, Sep 5, 2009 at 1:14 AM, Avlesh Singh avl...@gmail.com wrote: Well, you are talking about a very relational behavior, Tan. You can declare a locations and a location_* field in your schema. While indexing a document, put all the locations inside the field locations. Populate location_state, location_city etc. with their corresponding location values. That way, when no filter is applied, you can facet on the locations field to get all the locations. In all other scenarios, when a filter on field foo is applied, faceting on location_foo will give you the desired results. Cheers Avlesh On Fri, Sep 4, 2009 at 10:16 PM, R. Tan tanrihae...@gmail.com wrote: I can't because there are facet values for each location, such as state/city/neighborhood and facilities. Example result is 7 Eleven, 100 locations when no location filters are applied; where there is a filter for state, it should show 7 Eleven, 20 locations. On Fri, Sep 4, 2009 at 11:57 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: Can't you store the locations as part of the parent listing while storing? This way there would be only one document per parent listing, and all the location related information can be multi-valued attributes per property, or any other way depending on the attributes. 2009/9/3 R. Tan tanrihae...@gmail.com Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have.
The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once, but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward, but the locations within a result are quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
Re: how to create a custom queryparse to handle new functions
You do not need to create a custom query parser for this. You just need to create a custom function query. Look at one of the existing function queries in Solr as an example. This is where the need originates from - http://www.lucidimagination.com/search/document/a4bb0dfee53f7493/how_to_scan_dynamic_field_without_specifying_each_field_in_query Within the function, the intent is to rewrite incoming parameter into a different query. Can this be done? AFAIK, not. Cheers Avlesh On Sat, Sep 5, 2009 at 3:21 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Sep 5, 2009 at 2:15 AM, gdeconto gerald.deco...@topproducer.com wrote: Can someone point me in the general direction of how to create a custom queryparser that would allow me to create custom query commands like this: http://localhost:8994/solr/select?q=myfunction(http://localhost:8994/solr/select?q=myfunction%28 http://localhost:8994/solr/select?q=myfunction%28‘Foo’, 3) or point me towards an example? note that the actual functionality of myfunction is not defined. I am just wondering if this sort of extensibility is possible. You do not need to create a custom query parser for this. You just need to create a custom function query. Look at one of the existing function queries in Solr as an example. -- Regards, Shalin Shekhar Mangar.