Re: Add me to the Solr ContributorsGroup
Actually, I just found my username in the list of names ( https://wiki.apache.org/solr/ContributorsGroup), however, when I wanted to create my own page or change an existing one, I got the message: "You are not allowed to edit this page". Thank you in advance for your collaboration, -SR 2016-03-03 14:12 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>: > Hello, > > Could you please add me to the Contributor Group. Here are my account > info : > > - Name: Saïd Radhouani > - User name: radhouani > - email: said.radhou...@gmail.com > > For more info about myself, please visit my linked page: > https://www.linkedin.com/in/radhouani > > Thanks, > -Saïd > > 2015-12-30 20:36 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>: > >> Hi - I'd appreciate if you could add me to the Contributor Group. Here >> are my account info : >> >> - Name: Saïd Radhouani >> - User name: radhouani >> - email: said.radhou...@gmail.com >> >> Thanks, >> -Saïd >> > >
Re: Add me to the Solr ContributorsGroup
Hello, Could you please add me to the Contributor Group. Here are my account info : - Name: Saïd Radhouani - User name: radhouani - email: said.radhou...@gmail.com For more info about myself, please visit my linked page: https://www.linkedin.com/in/radhouani Thanks, -Saïd 2015-12-30 20:36 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>: > Hi - I'd appreciate if you could add me to the Contributor Group. Here are > my account info : > > - Name: Saïd Radhouani > - User name: radhouani > - email: said.radhou...@gmail.com > > Thanks, > -Saïd >
Add me to the Solr ContributorsGroup
Hi - I'd appreciate if you could add me to the Contributor Group. Here are my account info : - Name: Saïd Radhouani - User name: radhouani - email: said.radhou...@gmail.com Thanks, -Saïd
Re: LocalSolr distance in km?
Hi,

What resource are you using for LocalSolr? Using the SpatialTierQParser, you can choose between km and miles: http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/

Or, if you are using the LocalSolrQueryComponent (http://www.gissearch.com/localsolr) and you can't choose between the two units, you can use the radius parameter and convert from km to miles (1 kilometer = 0.621371192 mile), e.g., http://...select?qt=geo&lat=xx.xx&long=yy.yy&q=*:*&radius=0.621371192

HTH
-S

On Jul 21, 2010, at 6:14 AM, Chamnap Chhorn wrote:

Hi, I want to do a geo query with LocalSolr. However, it seems it supports only miles when calculating distances. Is there a quick way to use this search component with Solr using km instead? The other thing: I want it to calculate distance starting from 500 meters and up. How could I do this?

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
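The conversion suggested in this reply can be sketched in a few lines of Python. This is only an illustration: the parameter names (qt, lat, long, q, radius) are the ones shown in the example URL, while the helper names and the rounding to six decimals are assumptions made for the sketch.

```python
from urllib.parse import urlencode

# Conversion factor quoted in the message: 1 kilometer = 0.621371192 mile.
MILES_PER_KM = 0.621371192


def km_to_miles(km: float) -> float:
    """Convert a distance in kilometers to miles."""
    return km * MILES_PER_KM


def localsolr_query(lat: float, lon: float, radius_km: float, q: str = "*:*") -> str:
    """Build the query string for a LocalSolr request whose radius is in miles.

    The caller thinks in kilometers; the radius is converted before it is sent.
    """
    params = [
        ("qt", "geo"),
        ("lat", lat),
        ("long", lon),
        ("q", q),
        ("radius", round(km_to_miles(radius_km), 6)),
    ]
    return urlencode(params)
```

For the 500 meter case from the question, `km_to_miles(0.5)` gives roughly 0.31 miles, which is the value that would go into radius.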
Spatial Search - Best choice (if any)?
Hi, Using Solr 1.4, I'm now working on adding spatial search options, such as distance-based sorting, Bounding-box filter, etc. To the best of my knowledge, there are three possible points we can start from: 1. The http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/ 2. The gissearch.com 3. The http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources I saw that these three options have been used but didn't see any comparison between them. Is there any one out there who can recommend one option over another? Thanks, -S
Re: Spatial Search - Best choice ?
Thanks for the links, but this makes things even harder :) Do you have any recommendations for one pointer over another?

Thanks,
-S

On Jul 15, 2010, at 1:08 PM, findbestopensource wrote:

Some more pointers to spatial search:
http://www.jteam.nl/products/spatialsolrplugin.html
http://code.google.com/p/spatial-search-lucene/
http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html

Regards
Aditya
www.findbestopensource.com

On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani <r.steve@gmail.com> wrote:

Hi, Using Solr 1.4, I'm now working on adding spatial search options, such as distance-based sorting, bounding-box filter, etc. To the best of my knowledge, there are three possible points we can start from:
1. The http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
2. The gissearch.com
3. The http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources
I saw that these three options have been used but didn't see any comparison between them. Is there anyone out there who can recommend one option over another? Thanks, -S
Function Query Sorting vs 'Sort' parameter?
Hi, I'm making some basic sorting (date, price, etc.) using the sort parameter (sort=field+asc), and it's working fine. I'm wondering whether there's a significant argument to use function query sorting instead of the sort parameter? Thanks, -S
Re: Function Query Sorting vs 'Sort' parameter?
Yes, indeed, you understood my question. Looking forward to the next version then. To your reply, I'd add that _val_ is used with the standard request handler, and bf is used with dismax, right?

-S

On Jul 10, 2010, at 12:05 AM, Koji Sekiguchi wrote:

(10/07/10 0:54), Saïd Radhouani wrote:

Hi, I'm making some basic sorting (date, price, etc.) using the sort parameter (sort=field+asc), and it's working fine. I'm wondering whether there's a significant argument to use function query sorting instead of the sort parameter? Thanks, -S

I'm not sure if I understand your question correctly, but sort by function will be available in the next version of Solr: https://issues.apache.org/jira/browse/SOLR-1297

q=ipod&sort=func(price) asc

Or you can sort by function via _val_ in Solr 1.4:

q=ipod^0 _val_:func(price)&sort=score asc

Koji -- http://www.rondhuit.com/en/
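The two request shapes discussed in this thread can be assembled as follows. This is a hedged sketch, not a reference: func(price) is the placeholder function from the message, the helper names are made up, and wrapping the function in double quotes inside _val_ is an assumption based on the usual Solr function-query syntax (the quotes do not appear in the archived text).

```python
from urllib.parse import urlencode


def sort_param_query(term: str, field: str, order: str = "asc") -> str:
    """Plain field sort, as in sort=field+asc."""
    return urlencode({"q": term, "sort": f"{field} {order}"})


def val_hook_query(term: str, function: str, order: str = "asc") -> str:
    """Solr 1.4 workaround from the thread.

    The term is boosted by ^0 so it contributes nothing to the score, the
    function value is added through _val_, and results are sorted by score.
    """
    q = f'{term}^0 _val_:"{function}"'
    return urlencode({"q": q, "sort": f"score {order}"})
```

The first helper covers simple date or price sorting; the second only makes sense when the ordering must come from a computed value.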
Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
Hi,

I'm using Solr 1.4 and I need to use a Latin accent filter. In the Solr wiki (http://wiki.apache.org/solr/SchemaDesign), it's recommended to use MappingCharFilterFactory instead of ISOLatin1AccentFilterFactory. Could someone tell me the reason for choosing the first filter over the second one?

In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But when I use this tokenizer and filter together, I get a severe error saying that the field type containing this filter and tokenizer is unknown. However, when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory, it works!

I saw on the Web that others have faced this problem, but I didn't see any solution. Does someone have any idea how to fix this issue?

Thanks,
-Saïd
Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
Thanks Koji for the reply and for updating the wiki. As it's written now in the wiki, it sounds (at least to me) like MappingCharFilterFactory works only with WhitespaceTokenizerFactory. Did you really mean that? Because this filter also works with other tokenizers. For instance, in my text type, I'm using StandardTokenizerFactory for document processing, and WhitespaceTokenizerFactory for query processing.

I also noticed that, in whatever order you put this filter in the definition of a field type, it's always applied (during text processing) before the tokenizer and all the other filters. Is there a reason for that? Is it possible to force the filter to be applied at a certain position among the other filters?

Thanks,
-S

On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote:

In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But, when I use these tokenizer and filter together, I get a severe error saying that the field type containing these filter and tokenizer is unknown. However, when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory!

The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4), Tokenizers could take a Reader argument in the constructor. But after that, because they can take a CharStream argument in the constructor, *CharStreamAware* Tokenizers are no longer needed (all Tokenizers are aware of CharStream). I'll update the wiki.

Koji -- http://www.rondhuit.com/en/
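The ordering observed above follows from what a char filter operates on: it rewrites the raw character stream, so it has to run before the tokenizer produces tokens, whereas token filters run after. The Python sketch below only mimics that pipeline shape. It is not Solr code, and the accent map and function names are illustrative assumptions.

```python
# Illustrative accent mapping; a real mapping file would be far larger.
ACCENT_MAP = {"é": "e", "è": "e", "à": "a", "ï": "i", "ç": "c"}


def mapping_char_filter(text: str) -> str:
    """Stage 1: rewrite characters in the raw input, before any tokens exist."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in text)


def whitespace_tokenizer(text: str) -> list:
    """Stage 2: split the (already filtered) character stream into tokens."""
    return text.split()


def lowercase_filter(tokens: list) -> list:
    """Stage 3: a token filter, which can only see tokens, not raw text."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list:
    """Run the stages in the only order their input types allow."""
    return lowercase_filter(whitespace_tokenizer(mapping_char_filter(text)))
```

Because stage 1 consumes text and stage 3 consumes tokens, swapping their declaration order cannot change when each one runs.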
Re: Use free text to search against boolean fields?
Hi Jan,

The vocabulary of my domain is very small and pretty controlled. Users will ask queries about features of our products, and we have fewer than one hundred features. So the idea is to have a text field, features, storing all the features. And, re: multilingualism, I can have features_en, features_fr, etc. What do you think?

-Saïd

On Jul 3, 2010, at 5:09 PM, Jan Høydahl / Cominvent wrote:

Hi, It would help to know more about the actual application, and see some use cases in order to answer that question. I thought that this would be free-text queries from users, and as soon as you have free text then you WILL get all kinds of stuff in the queries. However, if your users are well educated on how to query your system and behave, then what you suggest makes more sense. It's quick to test and see how it works.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 3. juli 2010, at 01.11, Saïd Radhouani wrote:

Hi Jan, Thanks for this suggestion. If we choose parsing, then why don't we do it at the indexing side instead of the querying side, which might slow down the search process? I.e., if a document has is_man=true and is_single=true, then we populate a text field with the words "man" and "single". Then, during the search, we compare the user query with the text field. There's no intelligent query in my application, i.e., users would not ask for "not smoking". If they mention a word, it means that the boolean value is true. I don't have many fields, so populating a text field will not dramatically increase the size of my index. What do you think? -Saïd

On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:

Hi, I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start with. I'm not saying it is easy to write such a parser, but you know the domain and the users... Another reason for doing it this way is that if you have a field does_smoke=true, you still want to match if someone writes "not smoking". Your parser would have to understand negations, e.g. through a set of regex ((not|non|no) (smoker|smoking|smoke))... You could always do a mix also: keep a free-text field as well, and any words that your parser does not understand can be passed through to the free text as a "should" term with a boost.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 2. juli 2010, at 18.36, Saïd Radhouani wrote:

Hi, I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc. Logically, the underlying fields have a value of yes or no. That's why the boolean type would be appropriate. But my problem is, in addition to being able to filter on these fields, I would like to give my users the possibility to search against these fields using free text, i.e., a query might be "single man having job". Therefore, I think that the boolean type is not appropriate anymore. Instead, I'm thinking of using the string type, and each field will be either empty (the "no" case), or populated by its own tag, e.g., if we are dealing with a man, the field is_man will contain the string "man". Then, I copy all these fields into a text field that I can use for free text search. Does that make sense? Does that make sense in a multilingual context, i.e., field tags can be different in each language (EN = man, single, job; FR = homme, célibataire, emploi, etc.)? Thanks! -Saïd
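A query-side parser of the kind Jan describes could start out like the Python sketch below. It is a minimal illustration under stated assumptions: the vocabulary table and the function name are hypothetical, the field names come from the thread, and the negation pattern is a loosened version of the regex quoted above.

```python
import re

# Hypothetical vocabulary: surface word -> boolean field from the thread.
FEATURE_WORDS = {
    "man": "is_man",
    "single": "is_single",
    "job": "has_job",
    "smoker": "does_smoke",
    "smoking": "does_smoke",
    "smoke": "does_smoke",
}

# "not smoking", "non-smoker", "no smoke", ...
NEGATION = re.compile(r"\b(not|non|no)[\s-]+(\w+)", re.IGNORECASE)


def parse_query(text: str):
    """Split free text into boolean filters and leftover free-text words."""
    filters = {}

    def negate(match):
        field = FEATURE_WORDS.get(match.group(2).lower())
        if field is None:
            return match.group(0)  # not a known feature: leave it untouched
        filters[field] = False
        return " "

    remaining = NEGATION.sub(negate, text)
    leftovers = []
    for word in re.findall(r"\w+", remaining.lower()):
        field = FEATURE_WORDS.get(word)
        if field is not None:
            filters.setdefault(field, True)  # never override a negation
        else:
            leftovers.append(word)
    return filters, leftovers
```

The leftover words are the ones that, in the mixed approach, would be passed through to the free-text field as optional terms.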
Use free text to search against boolean fields?
Hi,

I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc. Logically, the underlying fields have a value of yes or no. That's why the boolean type would be appropriate. But my problem is, in addition to being able to filter on these fields, I would like to give my users the possibility to search against these fields using free text, i.e., a query might be "single man having job". Therefore, I think that the boolean type is not appropriate anymore.

Instead, I'm thinking of using the string type, and each field will be either empty (the "no" case), or populated by its own tag, e.g., if we are dealing with a man, the field is_man will contain the string "man". Then, I copy all these fields into a text field that I can use for free text search.

Does that make sense? Does that make sense in a multilingual context, i.e., field tags can be different in each language (EN = man, single, job; FR = homme, célibataire, emploi, etc.)?

Thanks!
-Saïd
Re: Use free text to search against boolean fields?
Hi Jan,

Thanks for this suggestion. If we choose parsing, then why don't we do it at the indexing side instead of the querying side, which might slow down the search process? I.e., if a document has is_man=true and is_single=true, then we populate a text field with the words "man" and "single". Then, during the search, we compare the user query with the text field.

There's no intelligent query in my application, i.e., users would not ask for "not smoking". If they mention a word, it means that the boolean value is true. I don't have many fields, so populating a text field will not dramatically increase the size of my index. What do you think?

-Saïd

On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:

Hi, I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start with. I'm not saying it is easy to write such a parser, but you know the domain and the users... Another reason for doing it this way is that if you have a field does_smoke=true, you still want to match if someone writes "not smoking". Your parser would have to understand negations, e.g. through a set of regex ((not|non|no) (smoker|smoking|smoke))... You could always do a mix also: keep a free-text field as well, and any words that your parser does not understand can be passed through to the free text as a "should" term with a boost.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 2. juli 2010, at 18.36, Saïd Radhouani wrote:

Hi, I have the following kind of data to index in a multilingual context: is_man, is_single, has_job, etc. Logically, the underlying fields have a value of yes or no. That's why the boolean type would be appropriate. But my problem is, in addition to being able to filter on these fields, I would like to give my users the possibility to search against these fields using free text, i.e., a query might be "single man having job". Therefore, I think that the boolean type is not appropriate anymore. Instead, I'm thinking of using the string type, and each field will be either empty (the "no" case), or populated by its own tag, e.g., if we are dealing with a man, the field is_man will contain the string "man". Then, I copy all these fields into a text field that I can use for free text search. Does that make sense? Does that make sense in a multilingual context, i.e., field tags can be different in each language (EN = man, single, job; FR = homme, célibataire, emploi, etc.)? Thanks! -Saïd
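The index-side alternative proposed in this message (keep the booleans for filtering and derive a searchable text field from them) might look like the following before documents are sent to Solr. It is a sketch under assumptions: the tag table reuses the EN/FR examples from the original post, the features_<lang> field naming follows the features_en / features_fr idea mentioned elsewhere in this thread, and the function name is hypothetical.

```python
# Per-language tags for each boolean field (examples from the original post).
TAGS = {
    "en": {"is_man": "man", "is_single": "single", "has_job": "job"},
    "fr": {"is_man": "homme", "is_single": "célibataire", "has_job": "emploi"},
}


def add_feature_text(doc: dict, lang: str) -> dict:
    """Return a copy of doc with a features_<lang> text field.

    The field holds one tag per boolean that is true; false or missing
    booleans contribute nothing, matching the "empty means no" idea.
    """
    words = [tag for field, tag in TAGS[lang].items() if doc.get(field) is True]
    enriched = dict(doc)
    enriched[f"features_{lang}"] = " ".join(words)
    return enriched
```

The boolean fields stay in the document, so filtering and navigation keep working exactly as before.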
Multilingual - Search against the appropriate field
Hi,

I know this topic has been treated many times in the (distant) past, but I wonder whether there are new, better practices/tendencies.

In my application, I'm dealing with documents in different languages. Each document is monolingual; it has some fields containing free text and a set of fields that do not require any text analysis. For the free text, we need a specific analysis based on the language of the document. I'm for the use of a single index for all the documents instead of one index per language (any objection?). Thus, in schema.xml, I need to declare a separate field for each language (text_fr, text_en, etc.), each with its own appropriate analysis. Then, during indexing, I need to assign the free text content of each document to the appropriate field. Thus, for each document, only one of the free-text fields would be populated.

My question is, at search time, what is the best solution to search against the appropriate field? I know that using dismax, we can define in qf the set of fields we want to search against, e.g., <str name="qf">text_fr text_en</str>. With this solution, does Solr choose the appropriate analysis for the query? I.e., if a query is compared to a document having English free text (text_en is populated), does Solr analyze the query as if it were in English?

One problem with this approach is that each query will be compared to all the available documents, i.e., a query in English would be compared to a document in French. As far as I know, if we know the query language, this problem can be avoided, either by searching against the appropriate field (e.g., text_fr:query), or by using a filter to select only those documents having English text. Am I correct? Or is there a better solution?

Thanks,
-Saïd
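The two search-time options in this message (search every language field, or restrict to one field when the query language is known) can be sketched as request parameters. This is illustrative only: the field names text_en and text_fr come from the message, while the language field used for the optional filter and the function name are hypothetical.

```python
# Language code -> free-text field, as declared in schema.xml in the message.
LANG_FIELDS = {"en": "text_en", "fr": "text_fr"}

# Hypothetical field holding each document's language code.
LANGUAGE_FIELD = "language"


def search_params(query: str, lang: str = None, use_filter: bool = False) -> dict:
    """Build dismax parameters for a known or unknown query language."""
    params = {"defType": "dismax", "q": query}
    if lang in LANG_FIELDS:
        # Known language: search only the matching field.
        params["qf"] = LANG_FIELDS[lang]
        if use_filter:
            # Alternative from the message: filter on the document language.
            params["fq"] = f"{LANGUAGE_FIELD}:{lang}"
    else:
        # Unknown language: search all language fields at once.
        params["qf"] = " ".join(LANG_FIELDS.values())
    return params
```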
Re: Multilingual - Search against the appropriate field
Hi Jan,

I totally agree with what you said. In a), you talked about boosting. I guess you meant to boost at the client side, right? I still have a question: does Solr choose the appropriate analysis for the query? I.e., if a query is compared to a document having English free text (text_en is populated), does Solr analyze it as if it were in English?

Thanks,
-Saïd

On Jul 1, 2010, at 1:26 PM, Jan Høydahl / Cominvent wrote:

Hi, I have chosen the same approach as you, indexing content into text_language fields with custom analysis, and it works great. Solr does not have any overhead with this even if there are hundreds of languages, due to the schema-less nature of Lucene. And if you know which language is being searched, you can select only those fields in question, and you'd still be as fast as the mono-language case. But you'd only get documents in that language returned.

Say you want to match across languages; it could be that you search for "obama", which would be written the same in all languages. How to achieve this? I see two approaches:
a) Search across all languages with proper analysis, as you suggest: qf=text_fr text_en^10 (you can even boost the preferred languages).
b) Index all content in a text_all field with no stemming involved and search qf=text_all (you will match "obama" in all languages but lose stemming).

My feeling is that a) would work if you have a limited set of languages, but b) might be necessary if you have dozens of languages to search across, due to reduced query performance with such a large disMax query. Of course with a) there may be ambiguities where an English word gets stemmed and hits the same stem as a totally different French word. I don't have any hands-on examples, but I'm sure the issue exists. Then it is probably better to search the other languages un-stemmed, as a hybrid approach:
c) Search the query language stemmed and all others unstemmed (qf=text_en^10 text_all, giving increased recall).

The downside of a text_all field is that you almost double the size of your index, worst case.

Then you have the issue of displaying the results in the front end. Which title do you pick? title_en or title_fr? Here, I also see two solutions and I have tried both:
1) Store a title_display which is stored, while the title_language fields are only indexed, not stored. Use title_display in the front end.
2) Make a wrapper around the QueryResult class so that when the front end asks for title, you intelligently try to pull out title_XY, where XY is pulled from the document's language metadata.
I think which you choose depends on taste; each has its + and -.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 1. juli 2010, at 12.26, Saïd Radhouani wrote:

Hi, I know this topic has been treated many times in the (distant) past, but I wonder whether there are new, better practices/tendencies. In my application, I'm dealing with documents in different languages. Each document is monolingual; it has some fields containing free text and a set of fields that do not require any text analysis. For the free text, we need a specific analysis based on the language of the document. I'm for the use of a single index for all the documents instead of one index per language (any objection?). Thus, in schema.xml, I need to declare a separate field for each language (text_fr, text_en, etc.), each with its own appropriate analysis. Then, during indexing, I need to assign the free text content of each document to the appropriate field. Thus, for each document, only one of the free-text fields would be populated. My question is, at search time, what is the best solution to search against the appropriate field? I know that using dismax, we can define in qf the set of fields we want to search against, e.g., <str name="qf">text_fr text_en</str>. With this solution, does Solr choose the appropriate analysis for the query? I.e., if a query is compared to a document having English free text (text_en is populated), does Solr analyze the query as if it were in English? One problem with this approach is that each query will be compared to all the available documents, i.e., a query in English would be compared to a document in French. As far as I know, if we know the query language, this problem can be avoided, either by searching against the appropriate field (e.g., text_fr:query), or by using a filter to select only those documents having English text. Am I correct? Or is there a better solution? Thanks, -Saïd
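Jan's three strategies differ only in the qf string sent with the dismax request, which the small Python sketch below makes explicit. The field names and the boost of 10 are taken from his message; the function names are hypothetical.

```python
def qf_all_languages(fields, preferred=None, boost=10):
    """Strategy a): search every language field, boosting the preferred one."""
    return " ".join(f"{f}^{boost}" if f == preferred else f for f in fields)


def qf_unstemmed(all_field="text_all"):
    """Strategy b): search only the unstemmed catch-all field."""
    return all_field


def qf_hybrid(query_lang_field, all_field="text_all", boost=10):
    """Strategy c): stemmed field of the query language plus the catch-all."""
    return f"{query_lang_field}^{boost} {all_field}"
```

With two languages, strategy a) yields the same string as in the message, "text_fr text_en^10"; the length of that string grows with the number of languages, which is the performance concern raised for a).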
Re: Setting many properties for a multivalued field. Schema.xml ? External file?
Thanks so much Otis. This is working great. Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic.

To the best of my knowledge, everyone is saying that faceting cannot be done on dynamic fields (only on definitive field names). Thus, I tried the following and it's working: I assume that the stored pictures have a sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it means that the underlying doc has at least one picture:

...facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*

While this is working fine, I'm wondering whether there's a cleaner way to do the same thing without assuming that pictures have a sequential number.

Also, do you have any documentation about handling dynamic fields using SolrJ? So far, I have found only issues about that on JIRA, but no documentation.

Thanks a lot.
-Saïd

On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:

Saïd, Dynamic fields could help here, for example imagine a doc with: id pic_url_* pic_caption_* pic_description_*

See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields

So, for you:

<dynamicField name="pic_url_*" type="string" indexed="true" stored="true"/>
<dynamicField name="pic_caption_*" type="text" indexed="true" stored="true"/>
<dynamicField name="pic_description_*" type="text" indexed="true" stored="true"/>

Then you can add docs with an unlimited number of pic_(url|caption|description)_* fields, e.g.

id pic_url_1 pic_caption_1 pic_description_1
id pic_url_2 pic_caption_2 pic_description_2

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
From: Saïd Radhouani r.steve@gmail.com
To: solr-user@lucene.apache.org
Sent: Fri, June 25, 2010 6:01:13 PM
Subject: Setting many properties for a multivalued field. Schema.xml ? External file?

Hi, I'm trying to index data containing a multivalued field picture that has three properties: url, caption and description: <picture/> <url/> <caption/> <description/>. Thus, each indexed document might have many pictures, each of which has a url, a caption, and a description. I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet, waiting for your suggestions... Thanks, -Saïd
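Otis's dynamic-field layout amounts to flattening each document's list of pictures into numbered fields before indexing. A minimal Python sketch of that flattening is shown below; the field name patterns match the dynamicField declarations above, while the input shape (a list of dicts) and the function name are assumptions for illustration.

```python
def flatten_pictures(doc_id: str, pictures: list) -> dict:
    """Turn a list of pictures into numbered pic_*_N fields on one document.

    Each picture is expected to be a dict with "url", "caption" and
    "description" keys; numbering starts at 1, as in the thread.
    """
    doc = {"id": doc_id}
    for n, pic in enumerate(pictures, start=1):
        doc[f"pic_url_{n}"] = pic["url"]
        doc[f"pic_caption_{n}"] = pic["caption"]
        doc[f"pic_description_{n}"] = pic["description"]
    return doc
```

Because the numbering is generated here, the sequential-number assumption used for the pic_url_1 facet holds by construction.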
Re: Setting many properties for a multivalued field. Schema.xml ? External file?
Thanks Geert-Jan for the detailed answer. Actually, I don't search on these fields at all. I'm only filtering (w/ vs. w/o pic) and sorting (based on the number of pictures). Thus, your suggestion of adding an extra field NrOfPics [0,N] would be the best solution.

Regarding the other suggestion:

> If you don't need search at all on these fields, the best thing imo is to store all pic-related info of all pics together by concatenating them with some delimiter which you know how to separate at the client-side. That or just store it in an external RDB since solr is just sitting on the data and not doing anything intelligent with it.

If I understand your suggestion correctly, you said that there's NO need to have many Dynamic Fields; instead, we can have one definitive field name, which can store a long string (concatenation of information about tens of pictures), e.g., using - and % delimiters:

pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...

I don't clearly see the reason for doing this. Is there a gain in terms of performance? Or does this make programming on the client-side easier? Or something else?

My other question was: in case we use Dynamic Fields, is there any documentation about using SolrJ for this purpose?

Thanks
-Saïd

On Jun 26, 2010, at 12:29 PM, Geert-Jan Brits wrote:

> You can treat dynamic fields like any other field, so you can facet, sort, filter, etc. on these fields (afaik).
>
> I believe the confusion arises because the use case for dynamic fields is sometimes misunderstood, i.e., that they can be used to do some kind of wildcard search, e.g., searching for a value in any of the dynamic fields at once, like pic_url_*. This however is NOT possible.
>
> As far as your question goes:
>
>> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic. To the best of my knowledge, everyone is saying that faceting cannot be done on dynamic fields (only on definitive field names). Thus, I tried the following and it's working: I assume that the stored pictures have a sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it means that the underlying doc has at least one picture:
>>
>> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>>
>> While this is working fine, I'm wondering whether there's a cleaner way to do the same thing without assuming that pictures have a sequential number.
>
> If I understand your question correctly: faceting on docs with and without pics could of course be done like you mention; however, it would be more efficient to have an extra field defined: hasAtLeastOnePic with values (0 | 1), and use that to facet / filter on. You can extend this to NrOfPics [0,N) if you need to filter / facet on docs with a certain nr of pics.
>
> Also, I wondered what else you wanted to do with this pic-related info. Do you want to search on pic-description / pic-caption for instance? In that case the dynamic-fields approach may not be what you want: how would you know in which dynamic field to search for a particular term? Would it be pic_desc_1, or pic_desc_x? Of course you could OR over all dynamic fields, but you need to know an upper bound for the nr of pics, and it really doesn't feel right, to me at least.
>
> If you need search on pic_description for instance, but don't mind which pic matches, you could create a single field pic_description and put in the concatenation of all pic-descriptions and search on that, or just make it a multi-valued field.
>
> If you don't need search at all on these fields, the best thing imo is to store all pic-related info of all pics together by concatenating them with some delimiter which you know how to separate at the client-side. That or just store it in an external RDB since solr is just sitting on the data and not doing anything intelligent with it.
>
> I assume btw that you don't want to sort / facet on pic-desc / pic_caption / pic_url either (I have a hard time thinking of a useful use case for that).
>
> HTH,
> Geert-Jan
>
> 2010/6/26 Saïd Radhouani r.steve@gmail.com
>
>> Thanks so much Otis. This is working great. Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic. To the best of my knowledge, everyone is saying that faceting cannot be done on dynamic fields (only on definitive field names). Thus, I tried the following and it's working: I assume that the stored pictures have a sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it means that the underlying doc has at least one picture:
>>
>> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>>
>> While this is working fine, I'm wondering whether there's a cleaner way to do the same thing without assuming that pictures have a sequential number.
>>
>> Also, do you have any documentation about handling Dynamic Fields using SolrJ? So far, I found only issues about that on JIRA.
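The picture-count idea from this exchange could look roughly like the sketch below, which only builds the request parameters. The field name nr_of_pics is a hypothetical stand-in for the NrOfPics field mentioned above, and the sketch assumes that field is indexed, single-valued and of a numeric type that supports range queries and sorting.

```python
from urllib.parse import urlencode

# Hypothetical field name; the thread only calls the idea "NrOfPics".
COUNT_FIELD = "nr_of_pics"


def pic_query_params(min_pics=1, descending=True):
    """Return Solr request parameters that filter and sort on the picture count."""
    direction = "desc" if descending else "asc"
    return [
        ("q", "*:*"),
        # Keep only documents with at least `min_pics` pictures.
        ("fq", f"{COUNT_FIELD}:[{min_pics} TO *]"),
        # Sort by the number of pictures.
        ("sort", f"{COUNT_FIELD} {direction}"),
        # Facet on the count itself instead of on pic_url_1.
        ("facet", "on"),
        ("facet.field", COUNT_FIELD),
        ("facet.mincount", "1"),
    ]


if __name__ == "__main__":
    print(urlencode(pic_query_params()))
```

With this layout nothing depends on the pictures being numbered sequentially; the count has to be computed by whatever code builds the document at index time.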
Re: Setting many properties for a multivalued field. Schema.xml ? External file?
Thanks Geert-Jan, this is indeed very helpful. The delimiters I gave were just for the sake of the example. I will use infrequent delimiters.

Cheers,
-Saïd

On Jun 26, 2010, at 1:53 PM, Geert-Jan Brits wrote:

>> If I understand your suggestion correctly, you said that there's NO need to have many Dynamic Fields; instead, we can have one definitive field name, which can store a long string (concatenation of information about tens of pictures), e.g., using - and % delimiters:
>>
>> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
>>
>> I don't clearly see the reason for doing this. Is there a gain in terms of performance? Or does this make programming on the client-side easier? Or something else?
>
> I think you should ask the exact opposite question. If you don't do anything with these fields which Solr is particularly good at (searching / filtering / faceting / sorting), why go through the trouble of creating dynamic fields? (More fields means more overhead cost / tracking cost no matter how you look at it.)
>
> Moreover, from a client view it is indeed easier the way I suggested, since otherwise:
>
> - you would have to ask (through SolrJ) for all dynamic fields to be returned in the fl parameter (http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult, because a priori you don't know how many dynamic fields to query. In other words, you can't just ask Solr (through SolrJ, like you asked) to return all dynamic fields beginning with pic_*. (afaik)
> - your client iteration code (looping over the pics) is a bit more involved.
>
> HTH,
> Cheers,
> Geert-Jan
>
> 2010/6/26 Saïd Radhouani r.steve@gmail.com
>
>> Thanks Geert-Jan for the detailed answer. Actually, I don't search on these fields at all. I'm only filtering (w/ vs. w/o pic) and sorting (based on the number of pictures). Thus, your suggestion of adding an extra field NrOfPics [0,N] would be the best solution.
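To make the single-field suggestion concrete, here is a minimal client-side sketch of packing and unpacking the picture info. The choice of tab and newline as delimiters is an assumption made for this example (they are legal in XML update messages and unlikely in URLs or captions); the function names are hypothetical and not part of SolrJ or Solr.

```python
FIELD_SEP = "\t"   # separates url / caption / description (assumed delimiter)
RECORD_SEP = "\n"  # separates one picture from the next (assumed delimiter)


def pack_pictures(pictures):
    """Concatenate all picture info into one string for a single stored field."""
    records = []
    for pic in pictures:
        values = (pic["url"], pic["caption"], pic["description"])
        for value in values:
            # Refuse values that would make the packed string ambiguous.
            if FIELD_SEP in value or RECORD_SEP in value:
                raise ValueError("value contains a delimiter character")
        records.append(FIELD_SEP.join(values))
    return RECORD_SEP.join(records)


def unpack_pictures(packed):
    """Split the stored string back into a list of picture dicts."""
    if not packed:
        return []
    pictures = []
    for record in packed.split(RECORD_SEP):
        url, caption, description = record.split(FIELD_SEP)
        pictures.append({"url": url, "caption": caption, "description": description})
    return pictures
```

The client then requests one known field name and loops over the unpacked list, which is the simplification Geert-Jan describes; values containing '-' or '%' are no longer a problem.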
Setting many properties for a multivalued field. Schema.xml ? External file?
Hi,

I'm trying to index data containing a multivalued field picture, which has three properties: url, caption and description:

<picture/>
    <url/>
    <caption/>
    <description/>

Thus, each indexed document might have many pictures, each of which has a url, a caption, and a description. I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet, waiting for your suggestions...

Thanks,
-Saïd
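One way to model this, which comes up elsewhere in this thread, is to flatten each picture into numbered dynamic fields. The sketch below only shows the flattening on the client side; it assumes the schema declares matching dynamicField patterns (pic_url_*, pic_caption_*, pic_desc_*), and the nr_of_pics field name is hypothetical.

```python
def picture_fields(pictures):
    """Flatten a list of pictures into numbered dynamic-field names."""
    # Hypothetical count field, handy for filtering and sorting.
    doc = {"nr_of_pics": len(pictures)}
    for i, pic in enumerate(pictures, start=1):
        doc[f"pic_url_{i}"] = pic["url"]
        doc[f"pic_caption_{i}"] = pic["caption"]
        doc[f"pic_desc_{i}"] = pic["description"]
    return doc


if __name__ == "__main__":
    example = [{"url": "http://example.com/1.jpg", "caption": "Front", "description": "Front view"}]
    print(picture_fields(example))
```

The resulting dict can be merged into the rest of the document before it is sent to Solr with whatever client is in use.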
TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?
Hi,

I'm using the Terms Component to set up the autocomplete feature based on a String field. Here are the params I'm using:

terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false

With the above params, I've been able to get suggestions for terms that start with the specified prefix. I'm wondering whether it's possible to:

- have inclusive search, i.e., by typing cat, we get category, subcategory, etc.?
- start suggestions from any word in the field, i.e., by typing cat, we get The best category...?

Thanks!
-Saïd
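As a rough sketch of the two behaviours asked about, the helpers below build a prefix request and an "anywhere in the term" request. The second relies on the terms.regex and terms.regex.flag parameters, which to my knowledge exist only in Solr releases newer than 1.4, so treat that part as an assumption to verify against your version. The helper names are hypothetical.

```python
import re
from urllib.parse import urlencode


def prefix_params(field, typed):
    """Suggest terms that start with what the user typed."""
    return {"terms": "true", "terms.fl": field, "terms.prefix": typed}


def infix_params(field, typed):
    """Suggest terms that contain what the user typed anywhere (needs terms.regex support)."""
    return {
        "terms": "true",
        "terms.fl": field,
        # Escape the input so it is matched literally inside the pattern.
        "terms.regex": ".*" + re.escape(typed) + ".*",
        "terms.regex.flag": "case_insensitive",
    }


if __name__ == "__main__":
    print(urlencode(prefix_params("type", "cat")))
    print(urlencode(infix_params("type", "cat")))
```

Because the Terms Component works on indexed terms, on a String field each term is the whole field value, so an infix pattern would cover both subcategory and The best category.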
Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ
Hello,

I'm developing a Web application that communicates with Solr using SolrJ. I have three search interfaces, and I'm facing two options:

1- Configuring one SearchHandler per search interface in solrconfig.xml
Or
2- Writing the configuration in the Java servlet code that is using SolrJ

Is there any significant difference between these two options? If yes, what's the best choice?

Thanks,
-Saïd
Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ
I completely agree. Thanks a lot!

-S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:

> Why would someone port the solr config into servlet code? IMO the first option would be the best choice; one obvious reason is that when you alter the solr config you only need to restart the server, whereas a change in the source forces you to redeploy your app and restart the server.
>
> On 6/21/10, Saïd Radhouani r.steve@gmail.com wrote:
>
>> Hello,
>>
>> I'm developing a Web application that communicates with Solr using SolrJ. I have three search interfaces, and I'm facing two options:
>>
>> 1- Configuring one SearchHandler per search interface in solrconfig.xml
>> Or
>> 2- Writing the configuration in the Java servlet code that is using SolrJ
>>
>> Is there any significant difference between these two options? If yes, what's the best choice?
>>
>> Thanks,
>> -Saïd
>
> --
> Abdelhamid ABID
> Software Engineer- J2EE / WEB
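With the first option, the client code only has to pick a handler and pass the user's query; the defaults live in solrconfig.xml. A minimal sketch of that idea follows. The handler paths are hypothetical and assume each request handler is registered in solrconfig.xml under a name starting with "/".

```python
from urllib.parse import urlencode

# Hypothetical handler paths, one per search interface.
HANDLERS = {
    "simple": "/search_simple",
    "advanced": "/search_advanced",
    "browse": "/search_browse",
}


def search_url(base_url, interface, user_query):
    """Build the request URL; everything else is configured server-side."""
    return f"{base_url}{HANDLERS[interface]}?{urlencode({'q': user_query})}"


if __name__ == "__main__":
    print(search_url("http://localhost:8983/solr", "simple", "Terrain sehloul"))
```

Tuning boosts or default fields then means editing solrconfig.xml and restarting, without touching or redeploying the web application.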
Re: Issue w/ highlighting a String field
There's a match between the query and the content of the field I want to highlight on. Solr is giving me the id of the document matching my query, but it's not displaying the field I want to highlight on.

Here's the definition of the field I want to highlight on:

<field name="title" type="string" indexed="false" stored="true" />

And here's part of my URL:

/?q=Terrain&debugQuery=on&hl=true&hl.fl=title

If I change the type to text instead of string, the highlighting works well!

Thanks for your help.
-S.

2010/3/23 Ahmet Arslan iori...@yahoo.com

>> Thanks Erick. Actually, I restarted and reindexed a number of times, but it's still not working.
>
> Highlighting on string typed fields works perfectly. See the output of:
>
> http://localhost:8983/solr/select/?q=id%3ASOLR1000&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=id
>
> But there must be a match/hit to get highlighting. What is your query and candidate field content that you want to highlight?
Re: Issue w/ highlighting a String field
2010/3/24 Ahmet Arslan iori...@yahoo.com

>> There's a match between the query and the content of the field I want to highlight on. Solr is giving me the id of the document matching my query, but it's not displaying the field I want to highlight on.
>>
>> Here's the definition of the field I want to highlight on:
>>
>> <field name="title" type="string" indexed="false" stored="true" />
>>
>> And here's part of my URL:
>>
>> /?q=Terrain&debugQuery=on&hl=true&hl.fl=title
>
> With q=Terrain you are querying your defaultSearchField and requesting highlighting from the title field.

I don't have a defaultSearchField; instead, I have the following qf clause, where title_tokenized is a tokenized version of title:

<str name="qf">title_tokenized^3 text_description_tokenized phonetic_text^0.5</str>

> What is numFound when you hit this url? Does highlighting come?

The numFound is not zero, I get results, and also, in the highlighting section, I get the id of the docs that matched my query.

> /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title
>
> If it is zero, then it means that your match comes from your defaultSearchField (not from the title field). If it is not zero, highlighting should work. Can you confirm this?

This URL gives zero results. Again, I don't have a defaultSearchField; the result is coming from the qf clause. What do you think?

Thanks.
Re: Issue w/ highlighting a String field
> I didn't know that you are using dismax. In your query fields list there is no title field. Probably the match is coming from title_tokenized, and when you request highlighting from title (hl.fl=title) it returns empty snippets. If that's the case it is pretty much expected, because string typed fields are not analyzed. I mean there are no partial matches on string fields. If your title contains "Terrain something", q=Terrain won't match this document. What are the title fields of the returned documents?

You are right, the match is coming from title_tokenized, but I also added the field title to the qf clause, and it's still not working.

> We should re-write this url (just to query on the title field) according to dismax:
>
> /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title

/?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title is not giving any result, perhaps because title is not tokenized. I even tried phrases with "", but it's still not working.

On the other hand, I got highlighting *working* by adding the following to the above URL: qf=title_tokenized. With this configuration, the title field is highlighted only when there's a perfect match, i.e., the quoted query equals the title content (f.i., q="Terrain sehloul" allows highlighting the entire title containing Terrain sehloul, but q=Terrain sehloul doesn't highlight this title). Is there a solution to this problem?

Thanks a lot.
Re: Issue w/ highlighting a String field
2010/3/24 Ahmet Arslan iori...@yahoo.com

>> With this configuration, the title field is highlighted only when there's a perfect match, i.e., the quoted query equals the title content (f.i., q="Terrain sehloul" allows highlighting the entire title containing Terrain sehloul,
>
> Exactly. There should be a *perfect* match for string typed fields to return snippets.
>
>> but q=Terrain sehloul doesn't highlight this title). Is there a solution to this problem?
>
> Escaping whitespace (using a backslash) can solve this problem: q=Terrain\ sehloul
>
> Now I clearly understand you. You have a title field containing 'Terrain sehloul' and you want to get highlighting with the query Terrain. You cannot do that with type=string. You need a tokenized field type in your case.

Thanks a lot Ahmet. In addition, I want to highlight phrases containing stop words. I guess that the best way is to use a tokenized type without a stopword filter. Do you agree with defining a new type for this purpose?

By the way, I wanted to highlight a phrase using a tokenized field type, but I got the wrong result; I tried 2 cases (q=Terrain\ sehloul and q="Terrain sehloul"), and I got the following:

<em>Terrain</em> <em>sehloul</em>

Any ideas?

Thanks
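The difference Ahmet describes can be illustrated with a toy model of what each field type indexes. This is only a sketch for intuition, not Solr's analysis code: it assumes a string field indexes the whole value as one term, and that a tokenized field splits on whitespace and lowercases.

```python
def string_field_terms(value):
    """A string field is not analyzed: the entire value is a single term."""
    return {value}


def text_field_terms(value):
    """Toy tokenized field: whitespace-split, lowercased terms."""
    return set(value.lower().split())


def matches(indexed_terms, query_term):
    """A term query matches only if the exact term was indexed."""
    return query_term in indexed_terms


if __name__ == "__main__":
    title = "Terrain sehloul"
    print(matches(string_field_terms(title), "Terrain"))          # partial value
    print(matches(string_field_terms(title), "Terrain sehloul"))  # whole value
    print(matches(text_field_terms(title), "terrain"))            # single token
```

In this model the escaped query Terrain\ sehloul corresponds to looking up the whole value as one term, which is why it is the only form that hits the string field.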
Re: Issue w/ highlighting a String field
2010/3/24 Ahmet Arslan iori...@yahoo.com

>> Thanks a lot Ahmet. In addition, I want to highlight phrases containing stop words. I guess that the best way is to use a tokenized type without a stopword filter. Do you agree with defining a new type for this purpose?
>
> I am not sure about that. Maybe solr.CommonGramsFilterFactory can do the job. I personally do not perform stop-word removal.
>
>> By the way, I wanted to highlight a phrase using a tokenized field type, but I got the wrong result; I tried 2 cases (q=Terrain\ sehloul and q="Terrain sehloul"), and I got the following:
>>
>> <em>Terrain</em> <em>sehloul</em>
>
> This is okay. Were you expecting this?
>
> <em>Terrain sehloul</em>

Yes, that's what I was expecting. Actually, I'd like to highlight phrases containing stopwords, like <em>Terrain à sehloul</em>
Re: Issue w/ highlighting a String field
Thanks a lot Ahmet. Now I'm gonna learn a new thing: how to apply a patch :)

Cheers.

2010/3/24 Ahmet Arslan iori...@yahoo.com

>> Yes, that's what I was expecting. Actually, I'd like to highlight phrases containing stopwords, like <em>Terrain à sehloul</em>
>
> Lucene's FastVectorHighlighter [1] can do that kind of phrase highlighting. It seems that the Solr integration [2] has finished. You need to apply the SOLR-1268 patch.
>
> [1] http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html
> [2] http://issues.apache.org/jira/browse/SOLR-1268
Re: Issue w/ highlighting a String field
Thanks Markus. It says that a tokenizer must be defined for the field. Here is the fieldType I'm using and the field I want to highlight on. As you can see, I defined a tokenizer, but it's still not working. Any idea?

In the schema:

<fieldType name="text_Sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
    </analyzer>
</fieldType>

<field name="title_sort" type="text_Sort" indexed="true" stored="true" multiValued="false" />

In solrconfig.xml:

<str name="hl.fl">title_sort text_description</str>

At the same time, I wanted to highlight phrases (including stop words), but it's not working. I use "" and, as you can see in my fieldType, I don't have a stopword filter. Any idea?

Thanks a lot,
-S.

2010/3/23 Markus Jelsma mar...@buyways.nl

> Hello,
>
> Check out the wiki [1] on what options to use for highlighting and other components.
>
> [1]: http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> Cheers,
>
> On Tuesday 23 March 2010 17:11:42 Saïd Radhouani wrote:
>
>> I have trouble with highlighting a field of type string. It looks like highlighting is only working with tokenized fields, f.i., it worked with text and another type I defined. Is this true, or am I making a mistake that is preventing the highlighting option from working on string?
>>
>> Thanks for your help.
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
Re: Issue w/ highlighting a String field
Thanks Erick. Actually, I restarted and reindexed a number of times, but it's still not working. RE: your question, I intend to use this field for automatic PHRASED boosting; is that ok?:

<str name="pf">title_sort</str>

Thanks.

2010/3/23 Erick Erickson erickerick...@gmail.com

> Did you restart solr and reindex? Just changing the field definition won't help you without reindexing...
>
> One thing worries me about your fragment: you call it text_Sort. If you really intend to sort by this field, it may NOT be tokenized; you'll probably have to use copyField.
>
> HTH
> Erick
Solr 1.4 - Stemmer expansion
I'm using the SnowballPorterFilterFactory for stemming French words. Some words are not recognized by this stemmer; I wonder whether, like synonym processing, the stemmers have an expansion option. Thanks.
Re: Solr 1.4 - Stemmer expansion
The configuration is correct and it works perfectly for French. So far, all the French words I tried got stemmed correctly, except the word studios. This is why I thought about expansion; I might need it for other words.

Thanks,
-Saïd

2010/3/17 Erick Erickson erickerick...@gmail.com

> Did you specify language=French? Did you re-index after specifying this? Can you give some examples of unrecognized words? Did you look in your index to see what was actually indexed via the admin pages and/or Luke? Did you use debugQuery=on to see how your search was parsed? Could you post your schema definitions for the field in question so folks can look at it?
>
> We need some details in order to actually be helpful <G>...
>
> Best
> Erick
>
> On Wed, Mar 17, 2010 at 5:05 AM, Saïd Radhouani r.steve@gmail.com wrote:
>
>> I'm using the SnowballPorterFilterFactory for stemming French words. Some words are not recognized by this stemmer; I wonder whether, like synonym processing, the stemmers have an expansion option. Thanks.
Re: mincount doesn't work with FacetQuery
Chris - Shall I open a JIRA request to add this feature?

Thnx

2010/3/11 Chris Hostetter hossman_luc...@fucit.org

> : I'm faceting with a query range (with addFacetQuery) and setting mincount to
> : 10 (with setFacetMinCount(10)), but Solr is not respecting this mincount;
> : it's still giving me all responses, even those having less than 10 retrieved
> : documents.
>
> if by "all responses" you mean "all facet queries" then that is the correct behavior -- facet.mincount is a param that affects facet.field, not facet.query. The documentation notes this, in that all of the params are divided by section...
>
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> ...if you'd like to open a feature request, it would be fairly easy to make facet.query (and facet.date) consider facet.mincount as well.
>
> -Hoss
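Until such a feature exists, the threshold can be applied on the client after the response arrives. The sketch below assumes the facet query counts are available as a mapping from query string to count (SolrJ exposes them that way through getFacetQuery()); the function name is hypothetical.

```python
def apply_mincount(facet_query_counts, mincount):
    """Drop facet queries whose count is below the threshold."""
    return {
        query: count
        for query, count in facet_query_counts.items()
        if count >= mincount
    }


if __name__ == "__main__":
    counts = {"price:[* TO 150]": 42, "price:[151 TO 300]": 3}
    print(apply_mincount(counts, 10))
```

This keeps facet.mincount semantics for facet.field untouched and only filters what the application displays for facet queries.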
SolrJ - how separate different results from the same facet query?
I'm faceting with two different query ranges while using addFacetQuery. I wonder whether it's possible using SolrJ to extract the result of each query range separately. Here is an example:

addFacetQuery("price:[* TO 150]");
addFacetQuery("price:[151 TO 300]");
etc.

addFacetQuery("length:[* TO 5]");
addFacetQuery("length:[5 TO 10]");
etc.

When I use getFacetQuery, SolrJ gives me the responses of both query ranges (prices and lengths) mixed in the same list. I wonder whether it's possible to tell SolrJ to extract the response of a specific query range, i.e., to extract the price-based response in one list and the length-based response in another list. It would be helpful to have something like getFacetQuery(field=price), getFacetQuery(field=length), etc.

Any ideas?

Thanks.
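One workaround is to split the mixed result on the client by the field name in front of the colon. The sketch below works on a plain mapping from facet query string to count, which is the shape getFacetQuery() returns in SolrJ; the grouping function itself is hypothetical and assumes every facet query has the simple form field:range.

```python
from collections import defaultdict


def group_by_field(facet_query_counts):
    """Group facet query counts by the field name before the first colon."""
    grouped = defaultdict(dict)
    for query, count in facet_query_counts.items():
        field, _, range_part = query.partition(":")
        grouped[field][range_part] = count
    return dict(grouped)


if __name__ == "__main__":
    mixed = {
        "price:[* TO 150]": 12,
        "price:[151 TO 300]": 7,
        "length:[* TO 5]": 3,
        "length:[5 TO 10]": 9,
    }
    for field, ranges in group_by_field(mixed).items():
        print(field, ranges)
```

The same loop translates directly to Java over the Map returned by SolrJ; more complex facet queries (boolean expressions, local params) would need a less naive way of finding the field.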