Re: Add me to the Solr ContributorsGroup

2016-03-03 Thread Saïd Radhouani
Actually, I just found my username in the list of names (
https://wiki.apache.org/solr/ContributorsGroup). However, when I tried to
create my own page or change an existing one, I got the message: "You are
not allowed to edit this page".

Thank you in advance for your collaboration,
-SR

2016-03-03 14:12 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>:

> Hello,
>
> Could you please add me to the Contributor Group. Here are my account
> info :
>
> - Name: Saïd Radhouani
> - User name: radhouani
> - email: said.radhou...@gmail.com
>
> For more info about myself, please visit my linked page:
> https://www.linkedin.com/in/radhouani
>
> Thanks,
> -Saïd
>
> 2015-12-30 20:36 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>:
>
>> Hi - I'd appreciate if you could add me to the Contributor Group. Here
>> are my account info :
>>
>> - Name: Saïd Radhouani
>> - User name: radhouani
>> - email: said.radhou...@gmail.com
>>
>> Thanks,
>> -Saïd
>>
>
>


Re: Add me to the Solr ContributorsGroup

2016-03-03 Thread Saïd Radhouani
Hello,

Could you please add me to the Contributor Group? Here is my account info:

- Name: Saïd Radhouani
- User name: radhouani
- email: said.radhou...@gmail.com

For more info about myself, please visit my LinkedIn page:
https://www.linkedin.com/in/radhouani

Thanks,
-Saïd

2015-12-30 20:36 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>:

> Hi - I'd appreciate if you could add me to the Contributor Group. Here are
> my account info :
>
> - Name: Saïd Radhouani
> - User name: radhouani
> - email: said.radhou...@gmail.com
>
> Thanks,
> -Saïd
>


Re: Add me to the Solr ContributorsGroup

2016-01-27 Thread Saïd Radhouani
Hello,

Could you please add me to the Contributor Group? Here is my account info:

- Name: Saïd Radhouani
- User name: radhouani
- email: said.radhou...@gmail.com

For more info about myself, please visit my LinkedIn page:
https://www.linkedin.com/in/radhouani

Thanks,
-Saïd

2015-12-30 20:36 GMT-05:00 Saïd Radhouani <said.radhou...@gmail.com>:

> Hi - I'd appreciate if you could add me to the Contributor Group. Here are
> my account info :
>
> - Name: Saïd Radhouani
> - User name: radhouani
> - email: said.radhou...@gmail.com
>
> Thanks,
> -Saïd
>


Add me to the Solr ContributorsGroup

2015-12-30 Thread Saïd Radhouani
Hi - I'd appreciate it if you could add me to the Contributor Group. Here is
my account info:

- Name: Saïd Radhouani
- User name: radhouani
- email: said.radhou...@gmail.com

Thanks,
-Saïd


Re: LocalSolr distance in km?

2010-07-21 Thread Saïd Radhouani
Hi,

What resource are you using for LocalSolr?
With the SpatialTierQParser, you can choose between km and miles:
http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
Or, if you are using the LocalSolrQueryComponent
(http://www.gissearch.com/localsolr) and can't choose between the two
units, you can use the radius parameter and convert from km to miles (1
kilometer = 0.621371192 mile), e.g.,
http://...select?qt=geo&lat=xx.xx&long=yy.yy&q=*:*&radius=0.621371192
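The conversion itself is a one-liner; here is a minimal sketch, assuming the component expects the radius in miles. The helper name is made up for illustration, and the 500-meter value corresponds to the lower bound asked about in the original question.

```python
# Miles per kilometer, as quoted above.
KM_TO_MILES = 0.621371192


def km_to_miles(km: float) -> float:
    """Convert a distance in kilometers to miles."""
    return km * KM_TO_MILES


# Radius values to send when the component only understands miles.
radius_1km = km_to_miles(1.0)
radius_500m = km_to_miles(0.5)

print(f"radius={radius_1km:.9f}")
print(f"radius={radius_500m:.9f}")
```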

HTH
-S

On Jul 21, 2010, at 6:14 AM, Chamnap Chhorn wrote:

 Hi,
 
 I want to do a geo query with LocalSolr. However, it seems it supports only
 miles when calculating distances. Is there a quick way to use this search
 component with Solr using km instead?
 The other thing is that I want it to calculate distances starting from 500
 meters up. How could I do this?
 
 -- 
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



Spatial Search - Best choice (if any)?

2010-07-16 Thread Saïd Radhouani
Hi,

Using Solr 1.4, I'm now working on adding spatial search options, such as 
distance-based sorting, Bounding-box filter, etc.

To the best of my knowledge, there are three possible points we can start from: 

1. The http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
2. The gissearch.com
3. The 
http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources
 

I saw that these three options have been used but didn't see any comparison
between them. Is there anyone out there who can recommend one option over
another?

Thanks,
-S

Re: Spatial Search - Best choice ?

2010-07-15 Thread Saïd Radhouani
Thanks for the links, but this makes things even harder :) Do you have any 
recommendations for one pointer over another?

Thanks,
-S


On Jul 15, 2010, at 1:08 PM, findbestopensource wrote:

 Some more pointers to spatial search,
 
 http://www.jteam.nl/products/spatialsolrplugin.html
 http://code.google.com/p/spatial-search-lucene/
 http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html
 
 Regards
 Aditya
 www.findbestopensource.com
 
 
 
 On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani r.steve@gmail.com wrote:
 
 Hi,
 
 Using Solr 1.4, I'm now working on adding spatial search options, such as
 distance-based sorting, Bounding-box filter, etc.
 
 To the best of my knowledge, there are three possible points we can start
 from:
 
 1. The
 http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
 2. The gissearch.com
 3. The
 http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources
 
 I saw that these three options have been used but didn't see any comparison
 between them. Is there any one out there who can recommend one option over
 another?
 
 Thanks,
 -S



Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Saïd Radhouani
Hi,

I'm doing some basic sorting (date, price, etc.) using the sort parameter
(sort=field+asc), and it's working fine. I'm wondering whether there's a
significant reason to use function query sorting instead of the sort
parameter?

Thanks,
-S

Re: Function Query Sorting vs 'Sort' parameter?

2010-07-09 Thread Saïd Radhouani
Yes, indeed, you understood my question. Looking forward to the next version 
then.

To your reply, I'd add that _val_ is used with the standard request handler,
and bf is used with dismax, right?

-S 


On Jul 10, 2010, at 12:05 AM, Koji Sekiguchi wrote:

 (10/07/10 0:54), Saïd Radhouani wrote:
 Hi,
 
 I'm making some basic sorting (date, price, etc.) using the sort parameter 
 (sort=field+asc), and it's working fine. I'm wondering whether there's a 
 significant argument to use function query sorting instead of the sort 
 parameter?
 
 Thanks,
 -S
   
 I'm not sure if I understand your question correctly,
 but sort by function will be available in next version of Solr:
 
 https://issues.apache.org/jira/browse/SOLR-1297
 
 q=ipod&sort=func(price) asc
 
 Or you can sort by function via _val_ in Solr 1.4:
 
 q=ipod^0 _val_:func(price)&sort=score asc
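A minimal sketch of how the two request shapes above might be assembled with proper URL encoding, using only Python's standard library. func(price) is the placeholder from the examples above, and the helper names are hypothetical, not part of Solr or any client library.

```python
from urllib.parse import urlencode


def sort_by_function_params(query, func):
    """Parameters for sorting directly by a function (the SOLR-1297 style)."""
    return {"q": query, "sort": f"{func} asc"}


def sort_by_val_params(query, func):
    """Parameters for the _val_ workaround: the score comes from the function."""
    return {"q": f"{query}^0 _val_:{func}", "sort": "score asc"}


# urlencode joins the parameters with '&' and escapes spaces and parentheses.
print(urlencode(sort_by_function_params("ipod", "func(price)")))
print(urlencode(sort_by_val_params("ipod", "func(price)")))
```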
 
 Koji
 
 -- 
 http://www.rondhuit.com/en/
 



Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-05 Thread Saïd Radhouani
Hi,

I'm using Solr 1.4 and I need to use a Latin Accent Filter. In the Solr wiki 
(http://wiki.apache.org/solr/SchemaDesign), it's recommended to use 
MappingCharFilterFactory instead of ISOLatin1AccentFilterFactory.

Could someone tell me the reason for choosing the first filter over the
second one?

In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must
be used with MappingCharFilterFactory. But when I use this tokenizer and
filter together, I get a severe error saying that the field type containing
them is unknown. However, it works when I use this filter with
StandardTokenizerFactory or WhitespaceTokenizerFactory!

I saw on the Web that others have faced this problem, but I didn't see any
solution. Does anyone have an idea how to fix this issue?

Thanks,
-Saïd

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-05 Thread Saïd Radhouani
Thanks Koji for the reply and for updating wiki. As it's written now in wiki, 
it sounds (at least to me) like MappingCharFilterFactory works only with 
WhitespaceTokenizerFactory.

Did you really mean that? Because this filter also works with other tokenizers.
For instance, in my text type, I'm using StandardTokenizerFactory for document
processing, and WhitespaceTokenizerFactory for query processing.

I also noticed that, in whatever order you put this filter in the definition of
a field type, it's always applied (during text processing) before the tokenizer
and all the other filters. Is there a reason for that? Is it possible to force
the filter to be applied in a certain position among the other filters?
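The ordering described above can be pictured as a three-stage pipeline: a character-level mapping on the raw text, then tokenization, then token-level filters. The sketch below is only an illustration in plain Python, not Solr's implementation; Unicode decomposition stands in for the accent mapping file, and all function names are made up.

```python
import unicodedata


def fold_accents(text):
    """Character-level stage: strip combining accents from the raw text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Run the three stages in the order observed above."""
    mapped = fold_accents(text)          # 1. char filter sees the whole string
    tokens = mapped.split()              # 2. tokenizer splits the mapped string
    return [t.lower() for t in tokens]   # 3. token filters see individual tokens


print(analyze("Café Crème"))
```

Because the first stage works on characters rather than tokens, it has no position among the token filters to be moved to in this picture.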

Thanks,
-S

On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote:

 
 In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory 
 must be used with MappingCharFilterFactory. But, when I use these tokenizer 
 and filter together, I get a sever error saying that the filed type 
 containing these filter and tokenizer is unknown. However, when I use this 
 filter with StandardTokenizerFactory  or WhitespaceTokenizerFactory!
 
   
 The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4),
 Tokenizers can take Reader argument in constructor. But after that,
 because they can take CharStream argument in constructor,
 *CharStreamAware* Tokenizers are no longer needed (all Tokenizers
 are aware of CharStream). I'll update the wiki.
 
 Koji
 
 -- 
 http://www.rondhuit.com/en/
 



Re: Use free text to search against boolean fields?

2010-07-03 Thread Saïd Radhouani
Hi Jan,

The vocabulary of my domain is very small and pretty controlled. Users will ask
queries about features of our products, and we have fewer than one hundred
features. So the idea is to have a text field, features, storing all the
features. And, regarding multilingualism, I can have features_en,
features_fr, etc.

What do you think?
-Saïd


On Jul 3, 2010, at 5:09 PM, Jan Høydahl / Cominvent wrote:

 Hi,
 
 It would help to know more about the actual application, and see some use 
 cases in order to answer that question. I thought that this would be 
 free-text queries from users, and as soon as you have free-text then you WILL 
 get all kinds of stuff in the queries. However, if your users are well 
 educated on how to query your system and behave, then what you suggest makes 
 more sense. It's quick to test and see how it works.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Training in Europe - www.solrtraining.com
 
 On 3. juli 2010, at 01.11, Saïd Radhouani wrote:
 
 Hi Jan,
 
 Thanks for this suggestion. If we choose parsing, then why don't we do it at 
 the indexing side, instead of the querying side, which might slows down the 
 search process? i.e., if a document has is_man=true and is_single=true, 
 the we populate a text field by the words man and single. Then, during 
 the search, we compare the user query with the text field. There's no 
 intelligent query in my application, i.e., users would not ask for not 
 smoking. If they mention a word, it means that the boolean value is true.
 
 I don't have many fields, so populating a text field will not dramatically 
 increase the size of my index.
 
 What do you think?
 
 -Saïd
 
 On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:
 
 Hi,
 
 I would rather go for the boolean variant and spend some time writing a 
 query parser which tries to understand all kinds of input people may make, 
 mapping it into boolean filters. In this way you can support both 
 navigation and search and keep both in sync whatever people prefer to 
 start with. I'm not saying it is easy to write such a parser, but you know 
 the domain and the users...
 
 Another reason for doing it this way is that if you have a field 
 does_smoke=true, you still want to match if someone writes not smoking. 
 Your parser would have to understand negations, e.g. through a set of regex 
 ((not|non|no) (smoker|smoking|smoke))...
 
 You could always do a mix also - to keep a free-text field as well, and any 
 words that your parser does not understand can be passed through to the 
 free-text as a should term with a boost.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Training in Europe - www.solrtraining.com
 
 On 2. juli 2010, at 18.36, Saïd Radhouani wrote:
 
 Hi,
 
 I have the following kind of data to index in a multilingual context: 
 is_man, is_single, has_job, etc.
 
 Logically, the underlying fields have a value of yes or no. That's why 
 the boolean type would be appropriate. But my problem is, in addition to 
 be able to filter on these fields, I would like to give my users the 
 possibility to search against these fields using free text. i.e., a query 
 might be single man having job. Therefore, I think that the boolean type 
 is not appropriate anymore. Instead, I'm thinking of using the string 
 type, and each field will be either empty (the no case), or populated by 
 its own tag. e.g., if we deal about a man, the field is_man will contain 
 the string man. Then, I copy all these fields into a text field that I 
 ca user for free text search.
 
 Does that make sense?
 
 Does that make sense in a multilingual context, i.e., field tags can be 
 different in each language (EN = man, single, jog, FR = homme, 
 célibataire, emploi, etc.)
 
 Thanks!
 
 -Saïd
 
 
 



Use free text to search against boolean fields?

2010-07-02 Thread Saïd Radhouani
Hi,

I have the following kind of data to index in a multilingual context: is_man, 
is_single, has_job, etc.

Logically, the underlying fields have a value of yes or no. That's why the
boolean type would be appropriate. But my problem is that, in addition to being
able to filter on these fields, I would like to give my users the possibility to
search against these fields using free text, i.e., a query might be single man
having job. Therefore, I think that the boolean type is not appropriate anymore.
Instead, I'm thinking of using the string type, and each field will be either
empty (the no case) or populated by its own tag, e.g., if we are dealing with a
man, the field is_man will contain the string man. Then, I copy all these
fields into a text field that I can use for free text search.

Does that make sense?

Does that make sense in a multilingual context, i.e., field tags can be
different in each language (EN = man, single, job; FR = homme, célibataire,
emploi; etc.)?

Thanks!

-Saïd

Re: Use free text to search against boolean fields?

2010-07-02 Thread Saïd Radhouani
Hi Jan,

Thanks for this suggestion. If we choose parsing, then why don't we do it on
the indexing side instead of the querying side, where it might slow down the
search process? I.e., if a document has is_man=true and is_single=true, then
we populate a text field with the words man and single. Then, during the
search, we compare the user query with the text field. There's no intelligent
query in my application, i.e., users would not ask for not smoking. If they
mention a word, it means that the boolean value is true.

I don't have many fields, so populating a text field will not dramatically 
increase the size of my index.
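A rough sketch of that index-time step, assuming documents are plain dictionaries before they are sent to Solr. The label table reuses the tags from my first message; the field names features_en and features_fr and the helper names are assumptions for illustration only.

```python
# Per-language labels for each boolean field (taken from the examples in this thread).
LABELS = {
    "en": {"is_man": "man", "is_single": "single", "has_job": "job"},
    "fr": {"is_man": "homme", "is_single": "célibataire", "has_job": "emploi"},
}


def features_text(doc, lang):
    """Join the labels of every boolean field that is true in the document."""
    labels = LABELS[lang]
    return " ".join(labels[name] for name in labels if doc.get(name) is True)


def enrich(doc):
    """Return a copy of the document with one features_<lang> text field per language."""
    enriched = dict(doc)
    for lang in LABELS:
        enriched[f"features_{lang}"] = features_text(doc, lang)
    return enriched


print(enrich({"is_man": True, "is_single": True, "has_job": False}))
```

The boolean fields stay in the document, so filtering on them keeps working while free-text queries go against the generated field.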

What do you think?

-Saïd

On Jul 3, 2010, at 12:36 AM, Jan Høydahl / Cominvent wrote:

 Hi,
 
 I would rather go for the boolean variant and spend some time writing a query 
 parser which tries to understand all kinds of input people may make, mapping 
 it into boolean filters. In this way you can support both navigation and 
 search and keep both in sync whatever people prefer to start with. I'm not 
 saying it is easy to write such a parser, but you know the domain and the 
 users...
 
 Another reason for doing it this way is that if you have a field 
 does_smoke=true, you still want to match if someone writes not smoking. 
 Your parser would have to understand negations, e.g. through a set of regex 
 ((not|non|no) (smoker|smoking|smoke))...
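A minimal sketch of the negation handling suggested above, built around the same regex. It covers only the does_smoke field; the function name, the returned structure, and the decision to strip the matched words from the remaining text are assumptions, not an existing Solr query parser.

```python
import re

# Negated form first, e.g. "not smoking", "non-smoker", "no smoke".
NEGATED_SMOKE = re.compile(r"\b(not|non|no)[\s-]+(smoker|smoking|smoke)\b", re.IGNORECASE)
POSITIVE_SMOKE = re.compile(r"\b(smoker|smoking|smoke)\b", re.IGNORECASE)


def parse_smoking(query):
    """Return (boolean filters, remaining free text) for a user query."""
    filters = {}
    if NEGATED_SMOKE.search(query):
        filters["does_smoke"] = False
        query = NEGATED_SMOKE.sub(" ", query)
    elif POSITIVE_SMOKE.search(query):
        filters["does_smoke"] = True
        query = POSITIVE_SMOKE.sub(" ", query)
    # Collapse whitespace left behind by the removed words.
    return filters, " ".join(query.split())


print(parse_smoking("single man not smoking"))
```

Whatever text is left over could then be passed through to the free-text field, as described in the next paragraph.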
 
 You could always do a mix also - to keep a free-text field as well, and any 
 words that your parser does not understand can be passed through to the 
 free-text as a should term with a boost.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Training in Europe - www.solrtraining.com
 
 On 2. juli 2010, at 18.36, Saïd Radhouani wrote:
 
 Hi,
 
 I have the following kind of data to index in a multilingual context: 
 is_man, is_single, has_job, etc.
 
 Logically, the underlying fields have a value of yes or no. That's why 
 the boolean type would be appropriate. But my problem is, in addition to be 
 able to filter on these fields, I would like to give my users the 
 possibility to search against these fields using free text. i.e., a query 
 might be single man having job. Therefore, I think that the boolean type 
 is not appropriate anymore. Instead, I'm thinking of using the string type, 
 and each field will be either empty (the no case), or populated by its own 
 tag. e.g., if we deal about a man, the field is_man will contain the string 
 man. Then, I copy all these fields into a text field that I ca user for 
 free text search.
 
 Does that make sense?
 
 Does that make sense in a multilingual context, i.e., field tags can be 
 different in each language (EN = man, single, jog, FR = homme, 
 célibataire, emploi, etc.)
 
 Thanks!
 
 -Saïd
 



Multilingual - Search against the appropriate field

2010-07-01 Thread Saïd Radhouani
Hi,

I know this topic has been treated many times in the (distant) past, but I 
wonder whether there are new better practices/tendencies.

In my application, I'm dealing with documents in different languages. Each 
document is monolingual; it has some fields containing free text and a set of 
fields that do not require any text analysis. For the free text, we need to
apply a specific analysis based on the language of the document.

I'm for the use of a single index for all the documents instead of one index 
per language (any objection?). Thus, in schema.xml, I need to declare a 
separate field for each language (text_fr, text_en, etc.), each with its own 
appropriate analysis. Then, during the indexing, I need to assign the free text 
content of each document to the appropriate field. Thus, for each document, 
only one of the freetext fields would be populated.

My question is, at search time, what is the best solution to search against the 
appropriate field?

I know that using dismax, we can define in qf the set of fields we want to
search against, e.g., <str name="qf">text_fr text_en</str>

With this solution, does Solr choose the appropriate analysis for the query?
I.e., if a query is compared to a document having English free text (text_en is
populated), does Solr analyze the query as if it were in English?

One problem with this approach is that each query will be compared to all the
available documents, i.e., a query in English would be compared to a document
in French. As far as I know, if we know the query language, this problem can be
avoided, either by searching against the appropriate field (e.g.,
text_fr:query), or by using a filter to select only those documents having
English text. Am I correct? Or is there a better solution?
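To make the two options concrete, here is a minimal sketch of the request parameters I have in mind. It assumes a dismax request selected with defType=dismax and a hypothetical language field on each document for the filter; the helper name is made up.

```python
from urllib.parse import urlencode


def dismax_params(query, languages, filter_language=None):
    """Build dismax parameters that search the given per-language text fields."""
    params = {
        "defType": "dismax",
        "q": query,
        "qf": " ".join(f"text_{lang}" for lang in languages),
    }
    if filter_language is not None:
        # Assumption: each document carries a 'language' field to filter on.
        params["fq"] = f"language:{filter_language}"
    return params


# Option 1: the query language is known, so search only the matching field.
print(urlencode(dismax_params("some query", ["fr"])))
# Option 2: search all text fields but keep only English documents.
print(urlencode(dismax_params("some query", ["fr", "en"], filter_language="en")))
```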

Thanks,
-Saïd



Re: Multilingual - Search against the appropriate field

2010-07-01 Thread Saïd Radhouani
Hi Jan,

I totally agree with what you said.

In a), you talked about boosting. I guess you meant to boost at the client 
side, right?

I still have a question: 

 does Solr choose the appropriate analysis for the query. i.e., if a query is 
 compared to a document having English free text (text_en is populated), does 
 Solr analyze it as it was in English ?


Thanks,
-Saïd

On Jul 1, 2010, at 1:26 PM, Jan Høydahl / Cominvent wrote:

 Hi,
 
 I have chosen the same approach as you, indexing content into text_language 
 fields with custom analysis, and it works great. Solr does not have any 
 overhead with this even if there are hundreds of languages, due to the 
 schema-less nature of Lucene.
 
 And if you know which language is being searched, you can select only those 
 fields in question, and you'd still be as fast as the mono language case. But 
 you'd only get documents in that language returned.
 
 Say you want to match across languages, it could be you search for obama 
 which would be written the same in all languages. How to achieve this? I see 
 two approaches:
 a) Search across all languages with proper analysis, as you suggest qf=text_fr 
 text_en^10 (you can even boost the preferred languages).
 b) Index all content in a text_all field with no stemming involved and 
 search qf=text_all (you will match obama in all languages but lose stemming)
 
 My feeling is that a) would work if you have a limited set of languages, but 
 b) might be necessary if you have dozens of languages to search across, due 
 to reduced query performance with such a large disMax query.
 
 Of course with a) there may be ambiguities that an english word gets stemmed 
 and hits the same stem as a totally different french word - I don't have any 
 hands on examples, but I'm sure the issue exists. Then it is probably better 
 to search the other languages un-stemmed, like a hybrid approach:
 
 c) Search the query language stemmed and all other unstemmed (qf=text_en^10 
 text_all - giving increased recall)
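The qf values for approaches a) and c) above can be generated from a language list; a small sketch follows. The field names text_<lang> and text_all and the boost of 10 come from the examples above, while the function names are hypothetical.

```python
def qf_all_languages(languages, preferred=None, boost=10):
    """Approach a): search every language field, boosting the preferred one."""
    parts = []
    for lang in languages:
        field = f"text_{lang}"
        parts.append(f"{field}^{boost}" if lang == preferred else field)
    return " ".join(parts)


def qf_hybrid(query_language, boost=10):
    """Approach c): stemmed field of the query language plus the unstemmed text_all."""
    return f"text_{query_language}^{boost} text_all"


print(qf_all_languages(["fr", "en"], preferred="en"))
print(qf_hybrid("en"))
```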
 
 The downside of a text_all field is you almost double the size of your index 
 worst-case.
 
 Then you have the issue of displaying the results in front end.
 Which title do you pick? title_en or title_fr? Here, I also see two solutions 
 and I have tried both:
 1) Store a title_display which is stored, while the title_language fields 
 are only indexed, not stored. Use the title_display in frontend
 2) Make a wrapper around QueryResult class so when frontend asks for title, 
 you intelligently try to pull out title_XY where XY is pulled from documents 
 language metadata.
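A minimal sketch of option 2), assuming each result is available as a dictionary with a language entry; the key names, the fallback language, and the function name are assumptions, not a SolrJ or QueryResult API.

```python
def display_title(doc, fallback_language="en"):
    """Pick title_<lang> using the document's language metadata, with a fallback."""
    lang = doc.get("language", fallback_language)
    for key in (f"title_{lang}", f"title_{fallback_language}"):
        if doc.get(key):
            return doc[key]
    return None


print(display_title({"language": "fr", "title_fr": "Bonjour"}))
```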
 
 I think which you choose depends on taste, each has its + and -
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Training in Europe - www.solrtraining.com
 
 On 1. juli 2010, at 12.26, Saïd Radhouani wrote:
 
 Hi,
 
 I know this topic has been treated many times in the (distant) past, but I 
 wonder whether there are new better practices/tendencies.
 
 In my application, I'm dealing with documents in different languages. Each 
 document is monolingual; it has some fields containing free text and a set 
 of fields that do not require any text analysis. For the free text, we need 
 to make a specific analysis based of the language of the document.
 
 I'm for the use of a single index for all the documents instead of one index 
 per language (any objection?). Thus, in schema.xml, I need to declare a 
 separate field for each language (text_fr, text_en, etc.), each with its own 
 appropriate analysis. Then, during the indexing, I need to assign the free 
 text content of each document to the appropriate field. Thus, for each 
 document, only one of the freetext fields would be populated.
 
 My question is, at search time, what is the best solution to search against 
 the appropriate field?
 
 I know that using dismax, we can define in qf the set of fields we want 
 to search against, e.g., <str name="qf">text_fr text_en</str>
 
 With this solution, does Solr choose the appropriate analysis for the query. 
 i.e., if a query is compared to a document having English free text (text_en 
 is populated), does Solr analyze the query as it was in English ?
 
 One problem with this approach is that, each query will be compared to all 
 the available documents. i.e., a query in English would be compared to a 
 document in French. As I know, if we know the query language, this problem 
 can be avoided, either by searching against the appropriate field (e.g., 
 text_fr:query), or by using a filter to select only those documents having 
 English text. Am I correct? Or is there a better solution?
 
 Thanks,
 -Saïd
 
 
 



Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks so much Otis. This is working great.

Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic

To the best of my knowledge, everyone is saying that faceting cannot be done on 
dynamic fields (only on definitive field names). Thus, I tried the following 
and it's working: I assume that the stored pictures have a sequential number 
(_1, _2, etc.), i.e., if pic_url_1 exists in the index, it means that the 
underlying doc has at least one picture: 

...facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*

While this is working fine, I'm wondering whether there's a cleaner way to do 
the same thing without assuming that pictures have a sequential number.

Also, do you have any documentation about handling Dynamic Fields using SolrJ?
So far, I have found only JIRA issues about that, but no documentation.

Thanks a lot.

-Saïd

On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:

 Saïd,
 
 Dynamic fields could help here, for example imagine a doc with:
 id
 pic_url_*
 pic_caption_*
 pic_description_*
 
 See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
 
 So, for you:
 
 <dynamicField name="pic_url_*" type="string" indexed="true" stored="true"/>
 <dynamicField name="pic_caption_*" type="text" indexed="true" stored="true"/>
 <dynamicField name="pic_description_*" type="text" indexed="true" stored="true"/>
 
 Then you can add docs with unlimited number of 
 pic_(url|caption|description)_* fields, e.g.
 
 id
 pic_url_1
 pic_caption_1
 pic_description_1
 
 id
 pic_url_2
 pic_caption_2
 pic_description_2
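On the client side, the numbered field names can be generated from a list of pictures. This is a minimal sketch with plain dictionaries; the input keys url, caption, and description and the function name are assumptions, and sending the document to Solr is left out.

```python
def picture_fields(pictures):
    """Flatten a list of pictures into numbered pic_* dynamic field names."""
    fields = {}
    for n, pic in enumerate(pictures, start=1):
        fields[f"pic_url_{n}"] = pic["url"]
        fields[f"pic_caption_{n}"] = pic["caption"]
        fields[f"pic_description_{n}"] = pic["description"]
    return fields


doc = {"id": "doc-1"}
doc.update(picture_fields([
    {"url": "http://example.com/a.jpg", "caption": "A", "description": "First"},
    {"url": "http://example.com/b.jpg", "caption": "B", "description": "Second"},
]))
print(sorted(doc))
```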
 
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message 
 From: Saïd Radhouani r.steve@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, June 25, 2010 6:01:13 PM
 Subject: Setting many properties for a multivalued field. Schema.xml ? 
 External file?
 
 Hi,
 
 I'm trying to index data containing a multivalued field picture
 that has three properties: url, caption, and description:
 
 <picture/>
   <url/>
   <caption/>
   <description/>
 
 Thus, each indexed document might have many pictures, each of which has a
 url, a caption, and a description.
 
 I wonder whether it's possible to store this data using only schema.xml. I
 couldn't figure it out so far. Instead, I'm thinking of using an external
 file to store the properties of each picture, but I haven't tried this
 solution yet; I'm waiting for your suggestions...
 
 Thanks,
 -Saïd



Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks Geert-Jan for the detailed answer. Actually, I don't search at all on
these fields. I'm only filtering (w/ vs. w/o pic) and sorting (based on the
number of pictures). Thus, your suggestion of adding an extra field NrOfPics
[0,N] would be the best solution.

Regarding the other suggestion:

 If you don't need search at all on these fields, the best thing imo is to
 store all pic-related info of all pics together by concatenating them with
 some delimiter which you know how to separate at the client-side.
 That or just store it in an external RDB since solr is just sitting on the
 data and not doing anything intelligent with it.

If I understand your suggestion correctly, you said that there's NO need to 
have many Dynamic Fields; instead, we can have one definitive field name, which 
can store a long string (concatenation of information about tens of pictures), 
e.g., using - and % delimiters: 
pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...

I don't clearly see the reason for doing this. Is there a gain in terms of 
performance? Or does this make programming on the client-side easier? Or 
something else?


My other question was: in case we use Dynamic Fields, is there any 
documentation about using SolrJ for this purpose? 

Thanks
-Saïd
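
The single-field concatenation discussed above could be sketched on the client
side roughly as follows. The delimiters (ASCII unit and record separators) and
the function names are assumptions chosen for illustration; any delimiter that
cannot occur in the data would do.

```python
FIELD_SEP = "\x1f"   # assumed delimiter between url, caption, description
RECORD_SEP = "\x1e"  # assumed delimiter between pictures


def encode_pictures(pictures):
    """Join all picture properties into one string for a single stored field."""
    return RECORD_SEP.join(
        FIELD_SEP.join((p["url"], p["caption"], p["description"]))
        for p in pictures
    )


def decode_pictures(stored):
    """Split the stored string back into a list of picture dicts."""
    if not stored:
        return []
    pictures = []
    for record in stored.split(RECORD_SEP):
        url, caption, description = record.split(FIELD_SEP)
        pictures.append({"url": url, "caption": caption, "description": description})
    return pictures


pics = [
    {"url": "u1", "caption": "c1", "description": "d1"},
    {"url": "u2", "caption": "c2", "description": "d2"},
]
stored_value = encode_pictures(pics)
```

The client only has to request one field and call decode_pictures on its
value, instead of enumerating an unknown number of dynamic fields.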

On Jun 26, 2010, at 12:29 PM, Geert-Jan Brits wrote:

 You can treat dynamic fields like any other field, so you can facet, sort,
 filter, etc on these fields (afaik)
 
 I believe the confusion arises because the use case for dynamic fields is
 sometimes misunderstood, i.e., as a way to do some kind of
 wildcard search, e.g., searching for a value in any of the dynamic fields at
 once, like pic_url_*. This however is NOT possible.
 
 As far as your question goes:
 
 Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o
 pic
 To the best of my knowledge, everyone is saying that faceting cannot be
 done on dynamic fields (only on definitive field names). Thus, I tried the
 following and it's working: I assume that the stored  pictures have a
 sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
 means that the underlying doc has at least one picture:
 ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
 While this is working fine, I'm wondering whether there's a cleaner way to
 do the same thing without assuming that pictures have a sequential number.
 
 If I understand your question correctly: faceting on docs with and without
 pics could of course be done like you mention; however, it would be more
 efficient to have an extra field defined: hasAtLeastOnePic with values (0 |
 1)
 use that to facet / filter on.
 
 you can extend this to NrOfPics [0,N)  if you need to filter / facet on docs
 with a certain nr of pics.
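
 As a rough sketch of that idea, the two helper fields could be computed at
 index time from the list of pictures. The field names follow the ones
 suggested here; the function itself is hypothetical and not a Solr or SolrJ
 API.

```python
def add_picture_counts(doc, pictures):
    """Return a copy of doc with NrOfPics and hasAtLeastOnePic filled in."""
    enriched = dict(doc)
    enriched["NrOfPics"] = len(pictures)
    enriched["hasAtLeastOnePic"] = 1 if pictures else 0
    return enriched


with_pics = add_picture_counts({"id": "1"}, [{"url": "u1"}, {"url": "u2"}])
without_pics = add_picture_counts({"id": "2"}, [])
```

 A filter such as fq=hasAtLeastOnePic:1 or a sort on NrOfPics then works
 without assuming that pic_url_1 always exists.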
 
 Also, I wondered what else you wanted to do with this pic-related info. Do
 you want to search on pic-description / pic-caption for instance? In that
 case the dynamic-fields approach may not be what you want: how would you
 know in which dynamic field to search for a particular term? Would it be
 pic_desc_1, or pic_desc_x? Of course you could OR over all dynamic fields,
 but you would need to know an upper bound for the number of pics, and it
 really doesn't feel right, to me at least.
 
 If you need search on pic_description for instance, but don't mind which pic
 matches, you could create a single field pic_description and put in the
 concatenation of all pic-descriptions and search on that, or just make it a
 multi-valued field.
 
 If you don't need search at all on these fields, the best thing imo is to
 store all pic-related info of all pics together by concatenating them with
 some delimiter which you know how to separate at the client-side.
 That or just store it in an external RDB since solr is just sitting on the
 data and not doing anything intelligent with it.
 
 I assume btw that you don't want to sort/ facet on pic-desc / pic_caption/
 pic_url either ( I have a hard time thinking of a useful usecase for that)
 
 HTH,
 
 Geert-Jan
 
 
 
 2010/6/26 Saïd Radhouani r.steve@gmail.com
 
 Thanks so much Otis. This is working great.
 
 Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o
 pic
 
 To the best of my knowledge, everyone is saying that faceting cannot be
 done on dynamic fields (only on definitive field names). Thus, I tried the
 following and it's working: I assume that the stored pictures have a
 sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
 means that the underlying doc has at least one picture:
 
 ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
 
 While this is working fine, I'm wondering whether there's a cleaner way to
 do the same thing without assuming that pictures have a sequential number.
 
 Also, do you have any documentation about handling Dynamic Fields using
 SolrJ. So far, I found only issues about that on JIRA

Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks Geert-Jan, this is indeed very helpful.

The delimiters I gave were just for the sake of the example. I will use less 
common delimiters.

Cheers,
-Saïd

On Jun 26, 2010, at 1:53 PM, Geert-Jan Brits wrote:

 If I understand your suggestion correctly, you said that there's NO need to
 have many Dynamic Fields; instead, we can have one definitive field name,
 which can store a long string (concatenation of information about tens of
 pictures), e.g., using - and % delimiters:
 pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
 I don't clearly see the reason of doing this. Is there a gain in terms of
 performance? Or does this make programming on the client-side easier? Or
 something else?
 
 I think you should ask the exact opposite question. If you don't do anything
 with these fields which Solr is particularly good at (searching / filtering
 / faceting/ sorting) why go through the trouble of creating dynamic fields?
 (more fields is more overhead cost/ tracking cost no matter how you look at
 it)
 
 Moreover, indeed from a client-view it's easier the way I suggested, since
 otherwise you:
 - would have to ask (through SolrJ) to include all dynamic fields to be
 returned in the fl parameter (
 http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult,
 because a priori you don't know how many dynamic fields to query. So in
 other words you can't just ask Solr (through SolrJ, like you asked) to just
 return all dynamic fields beginning with pic_*. (afaik)
 - your client iteration code (looping over the pics) is a bit more involved.
 
 HTH, Cheers,
 
 Geert-Jan
 

Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-25 Thread Saïd Radhouani
Hi,

I'm trying to index data containing a multivalued field picture, that has 
three properties: url, caption and description:

<picture/>
  <url/>
  <caption/>
  <description/>

Thus, each indexed document might have many pictures, each of which has a url, 
a caption, and a description.

I wonder whether it's possible to store this data using only schema.xml. I 
couldn't figure it out so far. Instead, I'm thinking of using an external file 
to store the properties of each picture, but I haven't tried this solution yet; 
I'm waiting for your suggestions...

Thanks,
-Saïd



TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?

2010-06-23 Thread Saïd Radhouani
Hi,

I'm using the Terms Component to set up the autocomplete feature based on a 
String field. Here are the params I'm using:

terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false

With the above params, I've been able to get suggestions for terms that start 
with the specified prefix. I'm wondering whether it's possible to:

- have inclusive search, i.e., by typing cat, we get category, 
subcategory, etc.?

- start suggestion from any word in the field. i.e., by typing cat, we get 
The best category...?

Thanks!

 -Saïd
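
A minimal sketch of how the two kinds of request could be built on the client.
The prefix form mirrors the params above. The infix form relies on the
TermsComponent terms.regex parameter, which may not be available in the Solr
version used in this thread; treat that part as an assumption. The function
name is hypothetical.

```python
import re
from urllib.parse import urlencode


def terms_query(field, typed, infix=False):
    """Build TermsComponent query parameters for an autocomplete request."""
    params = [("terms", "true"), ("terms.fl", field)]
    if infix:
        # Match the typed text anywhere inside a term (assumes terms.regex exists).
        params.append(("terms.regex", ".*" + re.escape(typed) + ".*"))
    else:
        # Match only terms that start with the typed text.
        params.append(("terms.prefix", typed))
    return urlencode(params)


prefix_query = terms_query("type", "cat")
infix_query = terms_query("type", "cat", infix=True)
```

Because the field is a String field, each indexed term is the whole field
value, so an infix pattern matches the typed text anywhere in that value.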




Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-21 Thread Saïd Radhouani
Hello,

I'm developing a Web application that communicates with Solr using SolrJ. I have 
three search interfaces, and I'm facing two options:

1- Configuring one SearchHandler per search interface in solrconfig.xml

Or

2- Writing the configuration in the Java servlet code that is using SolrJ

Is there any significant difference between these two options? If yes, what's 
the best choice?

Thanks,
-Saïd 

Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-21 Thread Saïd Radhouani
I completely agree. Thanks a lot!

-S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:

 Why would someone port the Solr config into servlet code?
 IMO the first option would be the best choice. One obvious reason is that
 when you alter the Solr config you only need to restart the server, whereas
 changing the source forces you to redeploy your app and restart the
 server.
 
 
 
 On 6/21/10, Saïd Radhouani r.steve@gmail.com wrote:
 
 Hello,
 
 I'm developing a Web application that communicate with Solr using SolrJ. I
 have three search interfaces, and I'm facing two options:
 
 1- Configuring one SearchHandler per search interface in solrconfig.xml
 
 Or
 
 2- Write the configuration in the java servlet code that is using SolrJ
 
 It there any significant difference between these two options ? If yes,
 what's the best choice?
 
 Thanks,
 
 -Saïd
 
 
 
 
 -- 
 Abdelhamid ABID
 Software Engineer- J2EE / WEB



Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
There's a match between the query and the content of field I want to
highlight on. Solr is giving me the id of the document matching my query,
but it's not displaying the field I want to highlight on.

Here's the definition of the field I want to highlight on:
<field name="title" type="string" indexed="false" stored="true" />

And here's part of my URL: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

If I change the type to text instead of string, the highlighting works
well!

Thanks for your help.
-S.



2010/3/23 Ahmet Arslan iori...@yahoo.com

  Thanks Erick. Actually, I restarted
  and reindexed a number of times, but it's still
  not working.

 Highlighting on string typed fields works perfectly. See the output of:


 http://localhost:8983/solr/select/?q=id%3ASOLR1000&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=id

 But there must be a match/hit to get highlighting. What is your query and
 candidate field content that you want to highlight?






Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan iori...@yahoo.com

  There's a match between the query and
  the content of field I want to
  highlight on. Solr is giving me the id of the document
  matching my query,
  but it's not displaying the field I want to highlight on.
 
  Here's the definition of the field I want to highlight
  on:
  <field name="title" type="string" indexed="false"
  stored="true" />
 
  And here's part of my URL:
  /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

 With q=Terrain you are querying your defaultSearchField and requesting
 highlighting from title field.


I don't have a defaultSearchField; instead, I have the following qf clause,
where title_tokenized is a tokenized version of title:
<str name="qf">title_tokenized^3 text_description_tokenized
phonetic_text^0.5</str>



 What is numFound when you hit this url? Highlighting comes?


the numFound is not zero, I get results, and also, in the highlighting
section, I get the id of the docs that matched my query


 /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title


 if it is zero, then it means that your match comes from your
 defaultSearchField (not from title field).

 if it is not zero, highlighting should work. can you confirm this?


This URL returns zero results. Again, I don't have a defaultSearchField; the
result is coming from the qf clause.

What do you think?

Thanks.


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
 I didn't know that you are using dismax. In your query fields list there is
 no title field. Probably match is coming from title_tokenized, and when you
 request highlighting from title (hl.fl=title) it returns empty snippets. If
 thats the case it is pretty expected because string typed fields are not
 analyzed. I mean there is no partial matches on string fields. If your title
 contains Terrain something q=Terrain won't match this document.
 What are the title fields of returned documents?


You are right, the match is coming from the title_tokenized, but I also
added the field title to the qf clause, but still not working.


 We should rewrite this URL (just to query on the title field) according to
 dismax: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title


/?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title is not giving any
result, perhaps because title is not tokenized. I even tried quoted phrases
(""), but it's still not working. On the other hand, I got highlighting
*working* by adding the following to the above URL:
qf=title_tokenized.

With this configuration, the title field is highlighted only when there's a
perfect match, i.e., the quoted query equals the title content (f.i.,
q="Terrain sehloul" allows highlighting the entire title containing Terrain
sehloul, but q=Terrain sehloul doesn't highlight this title). Is
there a solution to this problem?

Thanks a lot.
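
The behaviour described above can be illustrated with a toy model of how a
string field and a tokenized field expose their content to matching. This is
only a sketch under simplifying assumptions (whitespace tokenization, no
lowercasing, no filters); it does not reproduce Solr's real analysis chain,
and the function names are hypothetical.

```python
def index_terms(value, tokenized):
    """Return the terms a field value would contribute to the index."""
    # A string field keeps the whole value as a single term;
    # a tokenized field is split into separate words.
    return value.split() if tokenized else [value]


def matches(query_term, value, tokenized):
    """True if the query term equals one of the indexed terms."""
    return query_term in index_terms(value, tokenized)


title = "Terrain sehloul"
string_partial = matches("Terrain", title, tokenized=False)
string_exact = matches("Terrain sehloul", title, tokenized=False)
text_partial = matches("Terrain", title, tokenized=True)
```

In this model the string field matches only when the query equals the entire
title, which is consistent with highlighting appearing only on a perfect match.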


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan iori...@yahoo.com

  With this configuration, the title field is highlighted
  only when there's a
  perfect match, i.e., the quoted query equals the title
  content (f.i.,
  q=Terrain sehloul allows highlighting the entire title
  containing Terrain
  sehloul,

 Exactly. There should be a *perfect* match for string typed fields to
 return snippets.

  but q=Terrain sehloul doesn't enable to highlight
  this title. Is
  there a solution to this problem?

 Escaping (using backslash) whitespace can solve this problem.
 q=Terrain\ sehloul

 Now i clearly understand you. You have a title field containing 'Terrain
 sehloul' and you want to get highlighting with the query Terrain. You cannot
 do that with type=string. You need a tokenized field type in your case.



Thanks a lot Ahmet. In addition, I want to highlight phrases containing stop
words. I guess that the best way is to use a tokenized type without a
stopword filter. Do you agree with me defining a new type for this purpose?

By the way, I wanted to highlight a phrase using a tokenized field type, but
I got the wrong result; I tried 2 cases (q=Terrain\ sehloul and q=Terrain
sehloul), and I got the following: <em>Terrain</em> <em>sehloul</em>

Any ideas?
Thanks


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan iori...@yahoo.com


  Thank a lot Ahmet. In addition, I want to highlight phrases
  containing stop
  words. I guess that the best way is to use a tokenized type
  without
  stopwordFilter. Do you agree with me defining a new type
  for this purpose ?

 I am not sure about that. May be solr.CommonGramsFilterFactory can do the
 job. I personally do not perform stop-word removal.

  By he way, I wanted to highlight a phrase using a tokenized
  field type, but
  I got wrong result; I tried 2 cases (q=Terrain\
  sehloul  and q=Terrain
  sehloul), and I got the following:
  <em>Terrain</em> <em>sehloul</em>

 This is okay. Were you expecting this?: <em>Terrain sehloul</em>

Yes, that's what I was expecting. Actually, I'd like to highlight phrases
containing stopwords, like <em>Terrain à sehloul</em>
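
One possible client-side workaround, which is not suggested anywhere in this
thread and is offered only as an assumption, is to merge adjacent highlighted
tokens after the response comes back. The function name is hypothetical, and
the sketch only merges tags separated by whitespace, so a stopword that is not
itself highlighted would still break the phrase.

```python
import re


def merge_adjacent_highlights(snippet):
    """Collapse '</em> <em>' pairs so consecutive highlighted words share one tag."""
    return re.sub(r"</em>(\s+)<em>", r"\1", snippet)


merged = merge_adjacent_highlights("<em>Terrain</em> <em>sehloul</em>")
unmerged = merge_adjacent_highlights("<em>Terrain</em> à <em>sehloul</em>")
```

The second call leaves its input unchanged, which shows why this post-processing
does not cover phrases containing stopwords.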


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
Thanks a lot Ahmet. Now I'm gonna learn new thing: how to apply a new patch
:)

Cheers.

2010/3/24 Ahmet Arslan iori...@yahoo.com

  Yes, that's what I was expecting. Actually, I'd like
  to highlight phrases
  containing stopwords, like <em>Terrain à sehloul</em>

 Lucene's FastVectorHighlighter[1] can do that kind of phrase highlighting.
 It seems that solr integration [2] has finished. You need to apply
 SOLR-1268 patch.

 [1]
 http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html

 [2]http://issues.apache.org/jira/browse/SOLR-1268






Re: Issue w/ highlighting a String field

2010-03-23 Thread Saïd Radhouani
Thanks Markus. It says that a tokenizer must be defined for the field. Here
is the fieldType I'm using and the field I want to highlight on. As you can
see, I defined a tokenizer, but it's still not working. Any idea?

In the schema:

<fieldType name="text_Sort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

<field name="title_sort" type="text_Sort" indexed="true"
       stored="true" multiValued="false" />

In solrconfig.xml:
 <str name="hl.fl">title_sort text_description </str>

At the same time, I wanted to highlight phrases (including stop words), but
it's not working. I use "" and, as you can see in my fieldType, I don't have
a stopword filter. Any idea?

Thanks a lot,
-S.
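
A toy model of the analyzer chain above may help explain the result.
KeywordTokenizer emits the entire input as a single token, so lowercasing and
trimming still leave one term per value. The function below is a hypothetical
sketch of that chain, not Solr code.

```python
def keyword_lowercase_trim(value):
    """Mimic KeywordTokenizer + LowerCaseFilter + TrimFilter on one value."""
    # The whole value stays a single token; it is only lowercased and trimmed.
    return [value.lower().strip()]


tokens = keyword_lowercase_trim("  Terrain sehloul ")
```

Since the only indexed term is the full lowercased title, a query for a single
word has nothing to match in this field.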


Thanks


2010/3/23 Markus Jelsma mar...@buyways.nl

 Hello,


 Check out the wiki [1] on what options to use for highlighting and other
 components.


 [1]: http://wiki.apache.org/solr/FieldOptionsByUseCase


 Cheers,



 On Tuesday 23 March 2010 17:11:42 Saïd Radhouani wrote:
  I have trouble with highlighting field of type string. It looks like
  highlighting is only working with tokenized fields, f.i., it worked with
  text and another type I defined. Is this true, or I'm making a mistake
 that
  is preventing me to have the highlighting option working on string?
 
  Thanks for your help.
 

 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350




Re: Issue w/ highlighting a String field

2010-03-23 Thread Saïd Radhouani
Thanks Erick. Actually, I restarted and reindexed a number of times, but it's
still not working.

RE: your question, I intend to use this field for automatic phrase
boosting; is that ok?

<str name="pf"> title_sort </str>

Thanks.

2010/3/23 Erick Erickson erickerick...@gmail.com

 Did you restart solr and reindex? just changing the field definition
 won't help you without reindexing...

 One thing worries me about your fragment, you call it text_Sort.
 If you really intend to sort by this field, it may NOT be tokenized,
 you'll probably have to use copyfield

 HTH
 Erick

  
 



Solr 1.4 - Stemmer expansion

2010-03-17 Thread Saïd Radhouani
I'm using the SnowballPorterFilterFactory for stemming French words. Some
words are not recognized by this stemmer; I wonder whether, as with synonym
processing, stemmers have an expansion option.

Thanks.


Re: Solr 1.4 - Stemmer expansion

2010-03-17 Thread Saïd Radhouani
The configuration is correct and it works perfectly for French. So far, all
the French words I tried got stemmed correctly; except the word studios.
This is why I thought about expansion,  perhaps I might need it for other
words.

Thanks,
-Saïd


2010/3/17 Erick Erickson erickerick...@gmail.com

 Did you specify language=French? Did you re-index
 after specifying this? Can you give some examples of
 unrecognized words? Did you look in your index to see what
 was actually indexed via the admin pages and/or Luke?
 Did you use debugQuery=on to see how your search
 was parsed? Could you post your schema definitions for
 the field in question so folks can look at it?

 We need some details in order to actually be helpful G...

 Best
 Erick

 On Wed, Mar 17, 2010 at 5:05 AM, Saïd Radhouani r.steve@gmail.com
 wrote:

  I'm using the SnowballPorterFilterFactory for stemming French words. Some
  words are not recognized by this stemmer; I wonder whether, as with
  synonym processing, stemmers have an expansion option.
 
  Thanks.
 



Re: mincount doesn't work with FacetQuery

2010-03-15 Thread Saïd Radhouani
Chris -

Shall I open a JIRA request to add this feature?

Thnx

2010/3/11 Chris Hostetter hossman_luc...@fucit.org


 : I'm faceting with a query range (with addFacetQuery) and setting mincount
 to
 : 10 (with setFacetMinCount(10)), but Solr is not respecting this mincount;
 : it's still giving me all responses, even those having less than 10
 retrieved
 : documents.

 if by all responses you mean all facet queries then that is the
 correct behavior -- facet.mincount is a param that affects facet.field,
 not facet.query.

 The documentation notes this, in that all of the params are divided by
 section...

   http://wiki.apache.org/solr/SimpleFacetParameters

 ...if you'd like to open a feature request, it would be fairly easy to
 make facet.query (and facet.date) consider facet.mincount as well.


 -Hoss
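
Until such a feature exists, the threshold can be applied on the client after
the response is read. A minimal sketch, assuming the facet query counts are
available as a mapping from query string to count; the function name is
hypothetical.

```python
def apply_mincount(facet_query_counts, mincount):
    """Keep only the facet queries whose count reaches mincount."""
    return {
        query: count
        for query, count in facet_query_counts.items()
        if count >= mincount
    }


counts = {"price:[* TO 150]": 42, "price:[151 TO 300]": 3}
filtered = apply_mincount(counts, 10)
```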




SolrJ - how to separate different results from the same facet query?

2010-03-15 Thread Saïd Radhouani
I'm faceting with two different query ranges while using addFacetQuery. I
wonder whether it's possible using SolrJ to extract the result of each query
range separately. Here is an example:

addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
addFacetQuery("length:[* TO 5]"); addFacetQuery("length:[5 TO 10]"); etc.

When I use getFacetQuery, SolrJ gives me the responses of both query ranges
(prices and lengths) mixed in the same list. I wonder whether it's possible
to tell SolrJ to extract the response of a specific query range, i.e., tell
it to extract the price-based response in a list and the length-based
response in another list. It would be helpful to have something like
getFacetQuery(field=price), getFacetQuery(field=length), etc.

Any ideas?

Thanks.
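
One way to get the separation described above is to group the mixed results by
the field name that precedes the first colon in each facet query string. This
is a sketch of that grouping logic only; it assumes the results are available
as a mapping from query string to count, and the function name is
hypothetical rather than a SolrJ method.

```python
from collections import defaultdict


def group_by_field(facet_query_counts):
    """Split mixed facet query counts into one mapping per field name."""
    grouped = defaultdict(dict)
    for query, count in facet_query_counts.items():
        # "price:[* TO 150]" -> field "price", range "[* TO 150]"
        field, _, range_part = query.partition(":")
        grouped[field][range_part] = count
    return dict(grouped)


mixed = {
    "price:[* TO 150]": 12,
    "price:[151 TO 300]": 7,
    "length:[* TO 5]": 4,
    "length:[5 TO 10]": 9,
}
by_field = group_by_field(mixed)
```

The same loop translates directly to Java over the map returned by
getFacetQuery.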