Re: Tokenizer question

2010-01-11 Thread rswart


Crystal clear. Thanks for your response and your time!
-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27123281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tokenizer question

2010-01-11 Thread rswart

We are using the standard query parser (so no dismax).

The field type is solr.TextField, with the following query analyzer:

[analyzer XML not preserved in the archive]

Grant Ingersoll-6 wrote:
> 
> And also, what query parser are you using? 
> On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:
> 
>> What do your FieldTypes look like for the fields in question?
>> 
>> On Jan 10, 2010, at 10:05 AM, rswart wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> This is probably an easy question. 
>>> 
>>> I am doing a simple query on postcode and house number. If the
>>> house number contains a minus sign like:
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
>>> 
>>> the resulting parsed query contains a phrase query:
>>> 
>>> +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
>>> 
>>> This never matches.
>>> 
>>> What I want Solr to do is generate the following parsed query
>>> (essentially an OR for both house numbers):
>>> 
>>> +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
>>> 
>>> Solr generates this based on the following query (so a space instead
>>> of a minus sign):
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
>>> 
>>> 
>>> I tried two things to have Solr generate the desired parsed query:
>>> 
>>> 1. WordDelimiterFilterFactory with generateNumberParts=1, but this
>>>    still results in a phrase query
>>> 2. PatternTokenizerFactory that splits on (\s+|-).
>>> 
>>> Neither option works. 
>>> 
>>> Any suggestions on how to get rid of the phrase query?
>>> 
>>> Thanks,
>>> 
>>> Richard
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27117036.html



Re: Understanding the query parser

2010-01-11 Thread rswart

I am running into the same issue. I have tried to replace my
WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern
(\s+|-) but I still seem to get a phrase query. Why is that?
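
Going by the explanation in the quoted reply below, the likely reason is that the query parser splits the query text on whitespace before the analyzer runs, so "foo-bar" reaches the analyzer as one chunk no matter which tokenizer is configured. A toy model of that behaviour in Python (this is not Lucene code; `analyze` and `parse_field_query` are made-up names for illustration):

```python
import re


def analyze(chunk):
    """Stand-in for a query analyzer whose tokenizer splits on (\\s+|-)."""
    return [token for token in re.split(r"\s+|-", chunk) if token]


def parse_field_query(field, text):
    """Toy model: split on whitespace first, then analyze each chunk."""
    clauses = []
    for chunk in text.split():
        tokens = analyze(chunk)
        if len(tokens) > 1:
            # Several tokens from a single chunk are turned into a phrase query.
            clauses.append('%s:"%s"' % (field, " ".join(tokens)))
        elif tokens:
            clauses.append("%s:%s" % (field, tokens[0]))
    return " ".join(clauses)


print(parse_field_query("HouseNumber", "39-43"))  # HouseNumber:"39 43"
print(parse_field_query("HouseNumber", "39 43"))  # HouseNumber:39 HouseNumber:43
```

In this model the tokenizer pattern only changes how a chunk is cut up, not whether the pieces are combined into a phrase, which would match what I am seeing.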




Ahmet Arslan wrote:
> 
> 
>> I am using Solr 1.3.
>> I have an index with a field called "name". It is of type
>> "text"
>> (unmodified, stock text field from solr).
>> 
>> My query
>> field:foo-bar
>> is parsed as a phrase query
>> field:"foo bar"
>> 
>> I was rather expecting it to be parsed as
>> field:(foo bar)
>> or
>> field:foo field:bar
>> 
>> Is there an expectation mismatch? Can I make it work as I
>> expect it to?
> 
> If the query analyzer produces two or more tokens from a single token,
> QueryParser constructs PhraseQuery. Therefore it is expected. 
> 
> Without writing custom code it seems impossible to alter this behavior.
> 
> Modifying QueryParser to change this behavior will be troublesome. 
> I think the easiest way is to replace '-' with whitespace before the
> analysis phase, probably on the client side or in a custom RequestHandler.
> 
> Maybe you can set qp.setPhraseSlop(Integer.MAX_VALUE); so that 
> field:foo-bar and field:(foo AND bar) will be virtually equal.
> 
> hope this helps.
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Understanding-the-query-parser-tp27071483p27107523.html



Tokenizer question

2010-01-10 Thread rswart

Hi,

This is probably an easy question. 

I am doing a simple query on postcode and house number. If the house number
contains a minus sign like:

q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)

the resulting parsed query contains a phrase query:

+(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")

This never matches.

What I want Solr to do is generate the following parsed query (essentially
an OR for both house numbers):

+(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)

Solr generates this based on the following query (so a space instead of a
minus sign):

q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
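
Since the space-separated form parses the way I want, one workaround would be to rewrite the house number on the client before the query is sent. A minimal sketch in Python, assuming the client builds the query string itself (`split_ranges` and `build_query` are hypothetical helper names; only the field names come from the query above):

```python
import re


def split_ranges(value):
    """Replace a minus sign that sits between two digits with a space."""
    return re.sub(r"(?<=\d)-(?=\d)", " ", value)


def build_query(postcode, house_number):
    # Produces the raw q parameter; URL encoding is left to the HTTP client.
    return "PostCode:(%s) AND HouseNumber:(%s)" % (
        postcode,
        split_ranges(house_number),
    )


print(build_query("1078 pw", "39-43"))  # PostCode:(1078 pw) AND HouseNumber:(39 43)
```

The lookarounds restrict the rewrite to a minus between digits, so a value without a range passes through unchanged.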


I tried two things to have Solr generate the desired parsed query:

1. WordDelimiterFilterFactory with generateNumberParts=1, but this still
   results in a phrase query
2. PatternTokenizerFactory that splits on (\s+|-).

Neither option works. 

Any suggestions on how to get rid of the phrase query?

Thanks,

Richard
-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html



Re: Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread rswart

What is stopping you from defining different field types for faqs and
attorneys? One with index-time synonyms and one without.
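
To illustrate the idea, here is a toy model in Python (not Solr configuration) in which the analysis chain is chosen per field. The field names `faq_text` and `attorney_text` and all function names are invented for the example; the synonym pairs are the ones from the quoted message below.

```python
SYNONYM_GROUPS = [("california", "san diego"), ("florida", "miami")]


def analyze_plain(text):
    """Field type without synonyms: lowercase and split on whitespace."""
    return text.lower().split()


def analyze_with_synonyms(text):
    """Field type with index-time synonyms: when one member of a group
    occurs in the text, the words of every member are indexed as well."""
    tokens = analyze_plain(text)
    padded = " " + " ".join(tokens) + " "
    for group in SYNONYM_GROUPS:
        if any(" " + member + " " in padded for member in group):
            for member in group:
                for word in member.split():
                    if word not in tokens:
                        tokens.append(word)
    return tokens


# Each field picks its own analysis chain, as separate field types would.
FIELD_ANALYZERS = {
    "faq_text": analyze_with_synonyms,
    "attorney_text": analyze_plain,
}


def index_terms(field, text):
    return FIELD_ANALYZERS[field](text)


print(index_terms("faq_text", "Real estate forms California"))
print(index_terms("attorney_text", "Real estate attorney Burbank California"))
```

Only the faq field picks up the extra city terms, so a nightly indexing run can handle both kinds of documents in one pass.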



anuvenk wrote:
> 
> I've posted a few questions on synonyms before and finally understood how
> it worked and settled with index-time synonyms. Seems to work much better
> than query time synonyms. But now @ my work, they have a special request.
> They want certain synonyms to be applied only to certain sections of the
> index.
> For example, we have legal faqs, forms etc and we have attorneys in our
> index.
> The following synonyms for example,
> california,san diego
> florida,miami
> So for a search 'real estate san diego', it makes sense to return all
> faqs, forms for 'california' in the index but doesn't make sense to return
> a real estate attorney elsewhere in california (like burbank) besides just
> restricting to san diego attorneys.
> To be more clear I want to be able to return all california faqs & forms
> for 'real estate san diego' but not all california attorneys for the same.
> That means, i should index the faqs, forms with the state => city mappings
> as above but not for attorneys.
> Well I could index all other resources like faqs, forms first with these
> synonyms, then remove them and index attorneys. But that wouldn't work
> well in my case because we have a scheduler set up that runs every night
> to index any new resources from our database.
> Can someone suggest a good solution for this?
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24210694.html



Re: User search in Facebook like

2009-05-31 Thread rswart

Hi Vincent,


If I recall correctly, a wildcard query does not use any of the filters
defined in your fieldtype (search the newsgroup for this). So using a
LowerCaseFilterFactory does not work, and you'll need to lowercase the
input yourself on the client side (JavaScript?).
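
A minimal sketch of that client-side step, written in Python for brevity (the same few lines would work in JavaScript). `wildcard_query` is a made-up helper name, and it does not escape query-syntax characters in the input:

```python
def wildcard_query(user_input):
    """Lowercase each term and append '*' so it acts as a prefix query."""
    terms = user_input.lower().split()
    return " ".join(term + "*" for term in terms)


print(wildcard_query("Vince P"))  # vince* p*
```

This assumes the indexed terms are lowercased at index time, so that the lowercased prefix can match them.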

However, if I understand correctly you are trying to build autocomplete
functionality. The wildcard query may be too slow for this. We are using an
ngram filter for autocompleting over millions of names. This seems to work
fine in test (we are not live yet). Another advantage is that your lower
case issue is solved.
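
Roughly, the idea is that every prefix of a name is indexed as its own lowercased term, so a partial input becomes a plain term lookup. A toy illustration in Python (not Solr code; it assumes the edge, i.e. prefix-only, variant of n-grams, and the function names and gram sizes are made up):

```python
def edge_ngrams(name, min_gram=1, max_gram=10):
    """Lowercased prefixes of each word, as an edge n-gram filter might emit."""
    grams = []
    for word in name.lower().split():
        for size in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:size])
    return grams


# Build a tiny inverted index: gram -> set of user names.
index = {}
for user in ["Vincent Pérès", "Vince Price"]:
    for gram in edge_ngrams(user):
        index.setdefault(gram, set()).add(user)


def autocomplete(prefix):
    # Lowercasing the input mirrors the lowercasing done at index time.
    return sorted(index.get(prefix.lower(), set()))


print(autocomplete("Vinc"))  # ['Vince Price', 'Vincent Pérès']
```

The cost moves from query time to index time: the index grows, but each keystroke is a single term lookup.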

Example config:

[config XML not preserved in the archive]

Cheers,

Richard


Vincent Pérès wrote:
> 
> Thanks very much, that solves my problem!
> 
> Now I have another question: how can I handle lower/upper case in my
> search?
> 
> Thanks !
> 
> 
> Dietrich Featherston-2 wrote:
>> 
>> try searching for matches where the name starts with whatever the user
>> has entered so far with a wildcard
>> 
>> ?q=vinc*
>> 
>> Are you always going to be searching for names?  If so you could see if
>> the user has entered two terms and suffix each with a wildcard to get
>> potentially more relevant searches.
>> 
>> For example, if a user enters "vince p", you might substitute that with
>> the query "vince* p*" to get the following hits
>> Vincent Pérès
>> Vincent Price
>> Vince Price
>> Vince Pérès
>> etc...
>> 
>> D
>> 
>> 
>> 
>> 2009/5/31 Vincent Pérès 
>> 
>>>
>>> Hello,
>>>
>>> I built a feature which allows users to search for other users through a
>>> dynamic text box.
>>> Like Facebook, when you search for your friends, the name is displayed
>>> in a JavaScript dropdown list with a small picture.
>>> But I'm not completely happy with the search... I'm using a standard
>>> search like "?q=vincent" and I get back the results list. If I type
>>> 'vinc' I will not get any results (but I would like to display all the
>>> users whose name starts with 'vinc'). Maybe I need an extra param?
>>> I also tried the autosuggest, but I get a list of terms and not direct
>>> results...
>>>
>>> Could you suggest some Solr feature which could help me get better
>>> results?
>>>
>>> Thanks a lot !
>>> Vincent
>>> --
>>> View this message in context:
>>> http://www.nabble.com/User-search-in-Facebook-like-tp23804854p23804854.html
>>>
>>>
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/User-search-in-Facebook-like-tp23804854p23807385.html



RE: Solr statistics of top searches and results returned

2009-05-26 Thread rswart

If this is not done in an async way, wouldn't this have a serious
performance impact? 
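
By "async" I mean something like the sketch below: the search path only puts the query on a queue and a background worker does the bookkeeping. This is a language-neutral illustration in Python, not code from the component; all names are made up and the in-memory counter stands in for whatever store is actually used.

```python
import queue
import threading
from collections import Counter

query_log = queue.Queue()
top_queries = Counter()


def worker():
    """Drain the queue in the background and update the running counts."""
    while True:
        q = query_log.get()
        top_queries[q.lower()] += 1
        query_log.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_search(q):
    query_log.put(q)  # cheap hand-off; the search itself is not delayed
    return "results for " + q  # placeholder for the real search call


for q in ["solr", "Solr", "lucene"]:
    handle_search(q)

query_log.join()  # wait for the worker, only needed for this demo
print(top_queries.most_common(1))  # [('solr', 2)]
```

With a synchronous version, every search would also pay for the write to the statistics store before it returns.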

 

Plaatje, Patrick wrote:
> 
> Hi all,
> 
> I created a script that uses a Solr Search Component, which hooks into the
> main Solr core and catches the searches being done. It then tokenizes the
> search and sends both the tokenized and the original query to another
> Solr core. I have not written a factory for this, but if required, it
> shouldn't be hard to modify the script and code database support into it.
> 
> You can find the source here:
> 
> http://www.ipros.nl/uploads/Stats-component.zip
> 
> It includes a README, and a schema.xml that should be used.
> 
> Please let me know your thoughts.
> 
> Best,
> 
> Patrick
> 
> 
> 
>  
> 
> -Original Message-
> From: Umar Shah [mailto:u...@wisdomtap.com] 
> Sent: Friday, 22 May 2009 10:03
> To: solr-user@lucene.apache.org
> Subject: Re: Solr statistics of top searches and results returned
> 
> Hi,
> 
> good feature to have,
> maintaining top N would also require storing all the search queries done
> so far and keeping them updated (or at least those in some time window).
> 
> having pluggable persistent storage for all time search queries would be
> great.
> 
> tell me how can I help?
> 
> -umar
> 
> On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar wrote:
>> On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote:
>>
>>>
>>> I think you will want some type of persistence mechanism otherwise 
>>> you will end up consuming a lot of resources keeping track of all the 
>>> query strings, unless I'm missing something.  Either a Lucene index 
>>> (Solr core) or the option of embedding a DB.  Ideally, it would be 
>>> pluggable such that people could choose their storage mechanism.  
>>> Most people do this kind of thing offline via log analysis as logs can
>>> grow quite large quite quickly.
>>>
>>
>> For a general case, yes. But I was thinking more of a top 'n' queries 
>> as a running statistic.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-statistics-of-top-searches-and-results-returned-tp23621779p23724277.html