Re: autocomplete: case-insensitive and middle word
This thread might help - http://www.lucidimagination.com/search/document/9edc01a90a195336/enhancing_auto_complete

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 17, 2010 at 8:30 PM, Paul p...@nines.org wrote:

I have a couple of questions about implementing an autocomplete function in Solr. Here's my scenario: I have a name field that usually contains two or three names. For instance, suppose it contains:

John Alfred Smith
Alfred Johnson
John Quincy Adams
Fred Jones

I'd like the autocomplete to be case-insensitive and to match any of the names, preferably just at their beginnings. In other words, if the user types "alf", I want:

John Alfred Smith
Alfred Johnson

If the user types "fre", I want:

Fred Jones

but not:

John Alfred Smith
Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that are returned are lower-cased and contain only one name. If I use the string type, I get the entire name like I want, but the user must match the case (that is, must type "Alf"), and it only matches the first name, not the middle name. How can I get the matches of the text_lu analyzer, but get the hints like the string type? Thanks, Paul
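For the archives: one common way to get what Paul asks for is to index an analyzed copy of the field for matching and return the hint from the stored original. The schema.xml sketch below is an illustration, not from the thread; the type name (name_prefix), field name (name_ac) and gram sizes are assumptions:

```xml
<!-- Analyzed copy used only for matching: split on whitespace,
     lower-case, and emit edge n-grams so a prefix of any word matches. -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Stored original for display; analyzed copy for matching only. -->
<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_ac" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_ac"/>
```

A suggest query then searches name_ac (e.g. q=name_ac:alf) but displays the stored name field, so "alf" matches "John Alfred Smith" and the hint keeps its original casing.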
Re: enhancing auto complete
I preferred to answer this question privately earlier, but I have received innumerable requests to unveil the architecture. For the benefit of all, I am posting it here (after hiding as much info as I should, in my company's interest). The context: the auto-suggest feature on http://askme.in

*Solr setup*: Some of the salient features -

1. TermsComponent is NOT used.
2. The index is made up of 4 fields of the following types - autocomplete_full, autocomplete_token, string and text.
3. autocomplete_full uses KeywordTokenizerFactory and EdgeNGramFilterFactory; autocomplete_token uses WhitespaceTokenizerFactory and EdgeNGramFilterFactory. Both of these are Solr text fields with standard filters like LowerCaseFilterFactory applied during both indexing and querying.
4. A standard DataImportHandler and a bunch of SQL procedures are used to derive all suggestable phrases from the system and index them in the above-mentioned fields.

*Controller setup*: The controller (which handles suggest queries) is a typical Java servlet using Solr as its backend (connecting via SolrJ). Based on the incoming query string, a Lucene query is created: a BooleanQuery comprising a TermQuery against each of the above-mentioned fields. The boost factor on each of these term queries determines (to an extent) which kinds of matches show up first. JSON is used as the data-exchange format.

*Frontend setup*: It is home-grown JS built to address some specific use cases of the project in question. One simple exercise with Firebug will spill all the beans. However, I strongly recommend using jQuery to build (and extend) the UI component. Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Whoops!
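The two autocomplete field types described in point 3 might look roughly like this in schema.xml. This is a sketch reconstructed from the description above; the exact gram sizes and filter order are assumptions:

```xml
<!-- Whole-phrase prefix matching: the entire suggestion is one token,
     so "business c" matches "Business Centre". -->
<fieldType name="autocomplete_full" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Per-word prefix matching: each word contributes its own edge n-grams,
     so "c" also matches "Business Centre" via "centre". -->
<fieldType name="autocomplete_token" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Boosting the autocomplete_full clause of the BooleanQuery higher than the autocomplete_token clause is one way to make full-phrase prefix matches rank first, as the controller description suggests.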
table still does not look OK :( trying to send it once again. First column: user-entered value; second column: expected results:

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote: Avlesh, thanks for responding. The table mentioned below is the one re-sent above. Yes, [http://askme.in] looks good! I would like to know its design/Solr configuration etc. Can you please provide me detailed views of it? In [http://askme.in] there is one thing to be noted: search text like [business c] populates [Business Centre], which looks OK, but [Consultant Business] looks a bit odd. But in general, the pointer you suggested is great to start with.

On 8/2/2010 8:39 PM, Avlesh Singh wrote: From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved
Re: enhancing auto complete
From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved with Solr? I have already tried various tokenizers and filter factories (WhitespaceTokenizer, KeywordTokenizer, EdgeNGramFilterFactory, ShingleFilterFactory etc.) but no luck so far. Note that it would be excellent if the terms populated from Solr could be highlighted using the Highlighting component or any other Solr component/mechanism. *Note:* Standard autocomplete (like facet.field=AutoComplete&f.AutoComplete.facet.prefix=<user entered term>&f.AutoComplete.facet.limit=10&facet.sort=true&rows=0) is already working fine with the application, but I am now looking to enhance the existing auto-complete with the above requirement. Any thoughts?
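The matching rule implied by the table — every word the user has typed must be the prefix of some word in the suggestion — is what WhitespaceTokenizerFactory + LowerCaseFilterFactory + EdgeNGramFilterFactory produce at index time. The following is a tiny plain-Java simulation of that idea (an illustration only, not Solr code; Solr does the equivalent inside its analysis chain and scores the clauses):

```java
import java.util.HashSet;
import java.util.Set;

public class EdgeNgramDemo {

    // Index-time simulation: split on whitespace, lower-case, and emit every
    // front edge n-gram of every token (minGramSize=1), as the
    // WhitespaceTokenizer + LowerCase + EdgeNGram chain would.
    static Set<String> edgeNgrams(String text) {
        Set<String> grams = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            for (int i = 1; i <= token.length(); i++) {
                grams.add(token.substring(0, i));
            }
        }
        return grams;
    }

    // Query-time simulation: every query word must equal some indexed gram,
    // i.e. be the prefix of some word in the suggestion.
    static boolean matches(String suggestion, String query) {
        Set<String> grams = edgeNgrams(suggestion);
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!grams.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili";
        System.out.println(matches(doc, "lorem ip"));                             // true
        System.out.println(matches("Lorem ipsum dolor sit amet", "lorem ipsl"));  // false
        System.out.println(matches(doc, "lorem ipsl"));                           // true
    }
}
```

This reproduces the expected results in the table: "lorem ipsl" matches only the "ipslili" text, while "lorem ip" matches both. Highlighting the matched prefixes is then a separate display concern.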
Thanks in advance The contents of this eMail including the contents of attachment(s) are privileged and confidential material of Gateway NINtec Pvt. Ltd. (GNPL) and should not be disclosed to, used by or copied in any manner by anyone other than the intended addressee(s). If this eMail has been received by error, please advise the sender immediately and delete it from your system. The views expressed in this eMail message are those of the individual sender, except where the sender expressly, and with authority, states them to be the views of GNPL. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this eMail or any action taken in reliance on this eMail is strictly prohibited and may be unlawful. This eMail may contain viruses. GNPL has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this eMail. You should carry out your own virus checks before opening the eMail or attachment(s). GNPL is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt. GNPL reserves the right to monitor and review the content of all messages sent to or from this eMail address and may be stored on the GNPL eMail system. In case this eMail has reached you in error, and you would no longer like to receive eMails from us, then please send an eMail to d...@gatewaynintec.com
Re: enhancing auto complete
Hahaha ... sorry, it's not. And there is no ready-made code that I can give you either. But yes, if you liked it, I can share the design of this feature (Solr, backend and frontend). Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:47 PM, scr...@asia.com wrote: Hi, I'm also interested in this feature... is it open source?

-Original Message- From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, Aug 2, 2010 5:09 pm Subject: Re: enhancing auto complete

From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here - http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution for an auto-complete feature in one application. Below is the list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column describes the user-entered value and the second column the expected result (the list of auto-complete suggestions that should be populated from Solr):

lorem ->
    *Lorem* ipsum dolor sit amet
    Hieyed ddi *lorem* ipsum dolor test *lorem* ipsume test xyz *lorem* ipslili
lorem ip ->
    *Lorem ip*sum dolor sit amet
    Hieyed ddi *lorem ip*sum dolor test *lorem ip*sume test xyz *lorem ip*slili
lorem ipsl ->
    test xyz *lorem ipsl*ili

Can anyone share ideas of how this can be achieved with Solr? I have already tried various tokenizers and filter factories (WhitespaceTokenizer, KeywordTokenizer, EdgeNGramFilterFactory, ShingleFilterFactory etc.) but no luck so far. Note that it would be excellent if the terms populated from Solr could be highlighted using the Highlighting component or any other Solr component/mechanism. *Note:* Standard autocomplete (like facet.field=AutoComplete&f.AutoComplete.facet.prefix=<user entered term>&f.AutoComplete.facet.limit=10&facet.sort=true&rows=0) is already working fine with the application, but I am now looking to enhance the existing auto-complete with the above requirement. Any thoughts? Thanks in advance
Re: solr configuration for local search
They is me! Yes, multiple queries are fired (though concurrently) for fetching suggestions. You would probably want to take this off the list with me for questions, if any. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 5:04 PM, Frank A fsa...@gmail.com wrote: Thanks. Do you have any idea what features they use, specifically the types of tokenizers and analyzers? Also, do you think they use two separate queries for the business name versus the "you may be looking for" suggestions? Thanks again.

On Sun, Jun 6, 2010 at 9:36 PM, Avlesh Singh avl...@gmail.com wrote: Frank, w.r.t. features you may draw a lot of inspiration from these two sites -

1. http://mumbai.burrp.com/
2. http://askme.in/

Both of these products are Indian local-search applications; #1 primarily focuses on the eating-out domain. All the search/suggest related features on these sites are powered by Solr. You can take a lot of cues for building the auto-complete feature, using facets, custom highlighting etc. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 6:08 AM, Frank A fsa...@gmail.com wrote: Hi, I'm playing with Solr as the search engine for my local-search site. I'm primarily focused on restaurants right now, and I'm working with the following data attributes:

Name - restaurant name
Cuisine - a list of one or more cuisines, e.g. Italian, Pizza
Features - a list of one or more features, e.g. Open Late, Take-Out
Tags - a list of one or more freeform, open-entry tags

I want the site to allow searches by name, e.g. "Jake's Pizza", as well as more general "pizza" and even something like "take-out pizza". I'd also like to handle variations (takeout, carryout) and spelling issues. I've started with the out-of-the-box text definition and cloned it for cuisines, features and tags. For name I've left it as a string and then created a copyField for the phonetic value of Name. My text catch-all has all the fields copied to it. Finally, I implemented spell check as well. The search seems to work pretty well based on some initial testing, but I feel like I'm missing something. I'm curious about any advice on missing features I should be utilizing, steps that I've missed, etc. The steps I was planning:

- Update the stopwords to contain certain adjectives (good, best, etc.)
- Create synonyms for features and cuisines

All thoughts/comments/advice is really appreciated. Thanks.
Re: solr configuration for local search
Frank, w.r.t. features you may draw a lot of inspiration from these two sites -

1. http://mumbai.burrp.com/
2. http://askme.in/

Both of these products are Indian local-search applications; #1 primarily focuses on the eating-out domain. All the search/suggest related features on these sites are powered by Solr. You can take a lot of cues for building the auto-complete feature, using facets, custom highlighting etc. Cheers Avlesh http://webklipper.com

On Mon, Jun 7, 2010 at 6:08 AM, Frank A fsa...@gmail.com wrote: Hi, I'm playing with Solr as the search engine for my local-search site. I'm primarily focused on restaurants right now, and I'm working with the following data attributes:

Name - restaurant name
Cuisine - a list of one or more cuisines, e.g. Italian, Pizza
Features - a list of one or more features, e.g. Open Late, Take-Out
Tags - a list of one or more freeform, open-entry tags

I want the site to allow searches by name, e.g. "Jake's Pizza", as well as more general "pizza" and even something like "take-out pizza". I'd also like to handle variations (takeout, carryout) and spelling issues. I've started with the out-of-the-box text definition and cloned it for cuisines, features and tags. For name I've left it as a string and then created a copyField for the phonetic value of Name. My text catch-all has all the fields copied to it. Finally, I implemented spell check as well. The search seems to work pretty well based on some initial testing, but I feel like I'm missing something. I'm curious about any advice on missing features I should be utilizing, steps that I've missed, etc. The steps I was planning:

- Update the stopwords to contain certain adjectives (good, best, etc.)
- Create synonyms for features and cuisines

All thoughts/comments/advice is really appreciated. Thanks.
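Frank's setup — a string Name field, a phonetic copy of it, and a catch-all text field — might be wired up like this in schema.xml. This is a sketch of the configuration he describes; the type and field names are assumptions:

```xml
<!-- Phonetic matching for misspelled restaurant names. -->
<fieldType name="phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_phonetic" type="phonetic" indexed="true" stored="false"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="name" dest="name_phonetic"/>
<copyField source="name" dest="text"/>
<copyField source="cuisine" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="tags" dest="text"/>
```

With inject="true" the phonetic filter keeps the original tokens alongside the encoded ones, so exact spellings still match better than near-misses.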
Facet pagination
Is there a way to get the *total count of facet values* per field? Meaning, if my facets are -

<lst name="facet_fields">
  <lst name="first_char">
    <int name="s">305807</int>
    <int name="d">264748</int>
    <int name="p">181084</int>
    <int name="m">130546</int>
    <int name="r">98544</int>
    <int name="b">82741</int>
    <int name="k">77157</int>
  </lst>
</lst>

then, is the following possible?

<lst name="first_char" totalFacetCount="7">

where 7 is the count of all facet values available - in this example s, d, p, m, r, b and k. I need this to fetch paginated facets of a field for a given query, rather than doing next/previous. Cheers Avlesh
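For context: Solr of that era does not return such a totalFacetCount, but facet paging itself is done with the facet.offset and facet.limit parameters, and a common (if heavy) workaround for the total is to fetch all values once with facet.limit=-1 and count them client-side. A sketch of the request parameters, using the first_char field from the example:

```
# page 1 (first 10 facet values)
...&facet=true&facet.field=first_char&facet.limit=10&facet.offset=0&rows=0

# page 2
...&facet=true&facet.field=first_char&facet.limit=10&facet.offset=10&rows=0

# workaround for the total: fetch all values and count them in the client
...&facet=true&facet.field=first_char&facet.limit=-1&rows=0
```

rows=0 keeps the document list out of the response when only facet counts are needed.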
Re: Knowledge about contents of a page
Classification? - http://en.wikipedia.org/wiki/Document_classification Cheers Avlesh

On Fri, Jan 29, 2010 at 1:18 AM, ram_sj rpachaiyap...@gmail.com wrote: Hi, my question is about crawling. I know this is not quite relevant here, but I asked the Nutch people and didn't get any response, so I thought of posting here. I'm trying to crawl reviews for businesses.

a. Is there any way to tell whether the content in a web page is a review or not? Is it possible to do this in an automated fashion?
b. How could we map a block of text to a particular business? e.g. like Google reviews.

Thanks Ram -- View this message in context: http://old.nabble.com/Knowledge-about-contents-of-a-page-tp27358779p27358779.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Understanding the query parser
It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery.

That's exactly the question. Would be nice to hear from someone as to why it is that way. Cheers Avlesh

On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan iori...@yahoo.com wrote:

I am running into the same issue. I have tried to replace my WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern (\s+|-) but I still seem to get a phrase query. Why is that?

It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery. Modifications in the analysis phase (CharFilterFactory, TokenizerFactory, TokenFilterFactory) won't change this behavior; something must be done before the analysis phase. But I think in your case you can obtain a match by modifying the parameters of WordDelimiterFilterFactory, even with a PhraseQuery.
Re: Tokenizer question
If the analyzer produces multiple Tokens, but they all have the same position, then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work.

You rock, Hoss!!! This is exactly the explanation I was looking for .. it is as simple as it sounds. Thanks! Cheers Avlesh

On Tue, Jan 12, 2010 at 6:37 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
:
: the resulting parsed query contains a phrase query:
:
: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:39 43)

This stems from some fairly fundamental behavior in the QueryParser ... each chunk of input that isn't deemed markup (ie: not field names, or special characters) is sent to the analyzer. If the analyzer produces multiple tokens at different positions, then a PhraseQuery is constructed. -- Things like simple phrase searches and N-Gram based partial matching require this behavior. If the analyzer produces multiple Tokens, but they all have the same position, then the QueryParser produces a BooleanQuery with all SHOULD clauses. -- This is what allows simple synonyms to work.

If you write a simple TokenFilter to flatten all of the positions to be the same, and use it after WordDelimiterFilter, then it should give you the OR-style query you want. This isn't the default behavior because the phrase behavior of WDF fits its intended case better -- someone searching for a product SKU like X3QZ-D5 expects it to match X-3QZD5, but not just X or 3QZ -Hoss
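Hoss's rule — consecutive positions become a phrase, stacked (same-position) tokens become an OR — can be sketched in a few lines of self-contained Java. This only simulates the decision the QueryParser makes from position increments; it is not Lucene code, and the Token class here is a stand-in for Lucene's analysis token:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class PositionDemo {

    // Stand-in for an analysis token: its text and its position increment.
    // A positionIncrement of 0 means "stacked on the previous token",
    // which is how synonym-style tokens are emitted.
    static class Token {
        final String text;
        final int posIncrement;
        Token(String text, int posIncrement) {
            this.text = text;
            this.posIncrement = posIncrement;
        }
    }

    // Mimic the QueryParser decision: all tokens after the first at the
    // same position -> OR of term clauses; otherwise -> phrase query.
    static String toQuery(String field, List<Token> tokens) {
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0).text;
        }
        boolean samePosition = true;
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).posIncrement != 0) {
                samePosition = false;
                break;
            }
        }
        if (samePosition) {
            StringJoiner or = new StringJoiner(" OR ", "(", ")");
            for (Token t : tokens) {
                or.add(field + ":" + t.text);
            }
            return or.toString();
        }
        StringJoiner phrase = new StringJoiner(" ", field + ":\"", "\"");
        for (Token t : tokens) {
            phrase.add(t.text);
        }
        return phrase.toString();
    }

    public static void main(String[] args) {
        // "foo-bar" split into two tokens at consecutive positions -> phrase
        System.out.println(toQuery("name",
                Arrays.asList(new Token("foo", 1), new Token("bar", 1))));
        // synonym-style stacked tokens at the same position -> OR
        System.out.println(toQuery("name",
                Arrays.asList(new Token("foo", 1), new Token("bar", 0))));
    }
}
```

A position-flattening TokenFilter, as Hoss suggests, works precisely by forcing every token after the first onto the 0-increment path, so the parser takes the OR branch instead of the phrase branch.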
Re: Understanding the query parser
Thanks Erik for responding. Hoss explained the behavior with nice corollaries here - http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question Cheers Avlesh

On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote: On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:

It is in the source code of QueryParser's getFieldQuery(String field, String queryText) method, line #660: if numTokens > 1 it returns a PhraseQuery. That's exactly the question. Would be nice to hear from someone as to why it is that way.

Suppose you indexed "Foo Bar". It'd get indexed as two tokens, [foo] followed by [bar]. Then someone searches for foo-bar, which would get analyzed into two tokens also. A PhraseQuery is the most logical thing for it to turn into, no? What's the alternative? Of course it's tricky business, though; it's impossible to do the right thing for all cases within SolrQueryParser. Thankfully it is pleasantly subclassable, and this method is overridable. Erik
Understanding the query parser
I am using Solr 1.3. I have an index with a field called name. It is of type text (unmodified, stock text field from Solr). My query field:foo-bar is parsed as a phrase query, field:"foo bar". I was rather expecting it to be parsed as field:(foo bar), i.e. field:foo field:bar. Is there an expectation mismatch? Can I make it work the way I expect it to? Cheers Avlesh
Re: Rules engine and Solr
Thanks for the revert, Ravi.

I guess this is the usual usage of a Solr server. In my case this is no different. Search queries have a personalized experience, which means behaviors for facets, highlighting etc. are customizable. We pull it off using databases and Java data structures. As for the kind of rules I am talking about: http://en.wikipedia.org/wiki/Business_rules_engine

Cheers Avlesh

On Wed, Jan 6, 2010 at 12:12 PM, Ravi Gidwani ravi.gidw...@gmail.com wrote: Avlesh: I am currently working on some kind of rules in front (application side) of our Solr instance. These rules are more application-specific and are not general - like deciding which fields to facet, which fields to return in the response, which fields to highlight, and the boost value for each field (both at query time and at index time). The approach I have taken is to define a database table which holds these field parameters, which are then interpreted by my application to decide the query to be sent to Solr. This allows tweaking the Solr fields on the fly and hence influencing the search results. I will be interested to hear from you about the kind of rules you talk about and your approach towards them. Are these rules like a regular expression that, when matched with the user query, executes a specific Solr query? ~Ravi

On Tue, Jan 5, 2010 at 8:25 PM, Avlesh Singh avl...@gmail.com wrote:

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

Hahaha, that's classic Hoss! Thanks for introducing me to the XY problem. Had I known the two completely, I wouldn't have posted it on the mailing list. And I wasn't looking for a solution either. Anyways, as I replied back earlier, I'll get back with questions once I get more clarity. Cheers Avlesh

On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I am planning to build a rules engine on top of search. The rules are database
: driven and can't be stored inside solr indexes. These rules would ultimately
: do two things -
:
: 1. Change the order of Lucene hits.
: 2. Add/remove some results to/from the Lucene hits.
:
: What should be my starting point? Custom search handler?

This smells like an XY problem ... can you elaborate on the types of rules/conditions/situations when you want #1 and #2 listed above to happen? http://people.apache.org/~hossman/#xyproblem

XY Problem: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Rules engine and Solr
Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

Hahaha, that's classic Hoss! Thanks for introducing me to the XY problem. Had I known the two completely, I wouldn't have posted it on the mailing list. And I wasn't looking for a solution either. Anyways, as I replied back earlier, I'll get back with questions once I get more clarity. Cheers Avlesh

On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I am planning to build a rules engine on top of search. The rules are database
: driven and can't be stored inside solr indexes. These rules would ultimately
: do two things -
:
: 1. Change the order of Lucene hits.
: 2. Add/remove some results to/from the Lucene hits.
:
: What should be my starting point? Custom search handler?

This smells like an XY problem ... can you elaborate on the types of rules/conditions/situations when you want #1 and #2 listed above to happen? http://people.apache.org/~hossman/#xyproblem

XY Problem: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Rules engine and Solr
Thanks for the response, Shalin. I am still in two minds over doing it inside Solr versus outside. I'll get back with more questions, if any. Cheers Avlesh

On Mon, Jan 4, 2010 at 5:11 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Jan 4, 2010 at 10:24 AM, Avlesh Singh avl...@gmail.com wrote:

I have a Solr (version 1.3) powered search server running in production. Search is keyword-driven and is supported using custom fields and tokenizers. I am planning to build a rules engine on top of search. The rules are database-driven and can't be stored inside the Solr indexes. These rules would ultimately do two things -

1. Change the order of Lucene hits.

A Lucene FieldComparator is what you'd need. The QueryElevationComponent uses this technique.

2. Add/remove some results to/from the Lucene hits.

This is a bit more tricky. If you will always have a very limited number of docs to add or remove, it may be best to change the query itself to include or exclude them (i.e. add an fq). Otherwise you'd need to write a custom Collector (see DocSetCollector) and change SolrIndexSearcher to use it. We are planning to modify SolrIndexSearcher to allow custom collectors soon for field collapsing, but for now you will have to modify it.

What should be my starting point? Custom search handler?

A custom SearchComponent which extends/overrides QueryComponent will do the job. -- Regards, Shalin Shekhar Mangar.
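Shalin's last suggestion — a custom SearchComponent extending QueryComponent — gets plugged in through solrconfig.xml. A sketch of the wiring (the component class and handler name here are invented for illustration; only the registration mechanism is standard):

```xml
<!-- Custom component that applies the database-driven rules
     after the normal query phase. -->
<searchComponent name="rulesQuery" class="com.example.RulesQueryComponent"/>

<!-- A handler whose component chain substitutes the custom component
     for the stock query component. -->
<requestHandler name="/rulesearch" class="solr.SearchHandler">
  <arr name="components">
    <str>rulesQuery</str>
    <str>facet</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>
```

Requests to /rulesearch then run the rules-aware component in place of the default QueryComponent while keeping faceting and highlighting intact.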
Rules engine and Solr
I have a Solr (version 1.3) powered search server running in production. Search is keyword-driven and is supported using custom fields and tokenizers. I am planning to build a rules engine on top of search. The rules are database-driven and can't be stored inside the Solr indexes. These rules would ultimately do two things -

1. Change the order of Lucene hits.
2. Add/remove some results to/from the Lucene hits.

What should be my starting point? Custom search handler? Cheers Avlesh
Re: Newbie Solr questions
a) Since Solr is built on top of Lucene, using SolrJ can I still directly create custom documents, specify the field specifics (indexed, stored etc.) and then map POJOs to those documents, similar to using the straight Lucene API?

b) I took a quick look at the SolrJ javadocs but did not see anything that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs?

c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this?

The answer to all your questions above is the magical file called schema.xml. For more, read here - http://wiki.apache.org/solr/SchemaXml. SolrJ is simply a Java client used to access (read from and update) the Solr server.

d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server?

As long as the Lucene index matches the definitions in your schema, you can use the same index. The data, however, needs to be copied into a predictable location inside SOLR_HOME.

Cheers Avlesh

On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 woods5242-outdo...@yahoo.com wrote: Hi, I am new to Solr but fairly advanced with Lucene. In the past I have created custom Lucene search engines that indexed objects in a Java application, so my background is coming from this requirement.

a) Since Solr is built on top of Lucene, using SolrJ can I still directly create custom documents, specify the field specifics (indexed, stored etc.) and then map POJOs to those documents, similar to using the straight Lucene API?

b) I took a quick look at the SolrJ javadocs but did not see anything that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs?

c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this?

d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server?

thanks!
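To make the schema.xml answer concrete: whether a field is indexed and/or stored is declared per field in the schema, and SolrJ (including an @Field-annotated bean) merely sends values for those field names. A sketch with illustrative field names:

```xml
<!-- indexed and stored: searchable, and returned in results -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- indexed only: searchable, but never returned in a response -->
<field name="body" type="text" indexed="true" stored="false"/>

<!-- stored only: returned with results, but not searchable -->
<field name="thumbnailUrl" type="string" indexed="false" stored="true"/>
```

So a POJO property annotated with @Field("title") is treated however the schema defines "title"; there is no per-document or per-client override of indexed/stored from the SolrJ side.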
Re: how to search against multiple attributes in the index
Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build an AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery, Occur.MUST); highlighter = new Highlighter(new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQuery = new TermQuery(new Term("techGroup", searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST); TermQuery searchProgramQuery = new TermQuery(new Term("techProgram", searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQuery, Occur.MUST); What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote: I already did dive in before. I am using the solrj API and the SolrQuery object to build queries, but it's not clear/documented how to build a BooleanQuery ANDing a bunch of different attributes in the index. Any samples please? -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Obtaining list of dynamic fields available in the index
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler /admin/luke?numTerms=0 Cheers Avlesh On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.comwrote: Hi there! How can we retrieve the complete list of dynamic fields, which are currently available in index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter - # AND between two fields solrQuery.setQuery("+field1:foo +field2:bar"); # OR between two fields solrQuery.setQuery("field1:foo field2:bar"); Cheers Avlesh On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com wrote: I think I found the answer. Needed to read more API documentation :-) you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html Sent from the Solr - User mailing list archive at Nabble.com.
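A dependency-free sketch of the translation discussed in this thread: the Lucene BooleanQuery with three MUST clauses collapses into a single Solr query string with `+` prefixes. Field names are taken from the poster's code; the helper itself is hypothetical:

```java
public class MustQueryBuilder {
    // Joins per-field terms into one Solr query string in which every clause
    // is mandatory, mirroring BooleanClause.Occur.MUST in the Lucene original.
    static String mustQuery(String[][] fieldTermPairs) {
        StringBuilder q = new StringBuilder();
        for (String[] pair : fieldTermPairs) {
            if (q.length() > 0) q.append(' ');
            q.append('+').append(pair[0]).append(':').append(pair[1]);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        String q = mustQuery(new String[][] {
            {"titleDesc", "widget"},
            {"techGroup", "hardware"},
            {"techProgram", "alpha"}
        });
        // Prints: +titleDesc:widget +techGroup:hardware +techProgram:alpha
        System.out.println(q);
    }
}
```

The resulting string is what you would pass to SolrQuery.setQuery() in SolrJ.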
Re: Reseting doc boosts
AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, I'm trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem, just from knowing how Lucene works, that I would really need to reindex since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: [DIH] concurrent requests to DIH
1. Is it considered as good practice to set up several DIH request handlers, one for each possible parameter value? Nothing wrong with this. My assumption is that you want to do this to speed up indexing. Each DIH instance would block all others, once a Lucene commit for the former is performed. 2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): It is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. Nope. I had done a similar exercise in my quest to write a ParallelDataImportHandler. This thread might be of interest to you - http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler. Though there is a ticket in JIRA, I haven't been able to contribute this back. If you think this is what you need, lemme know. Cheers Avlesh On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott sz...@zib.de wrote: Hi all, I'm using the DIH in a parameterized way by passing request parameters that are used inside of my data-config. All imports end up in the same index. 1. Is it considered as good practice to set up several DIH request handlers, one for each possible parameter value? 2. In case the range of parameter values is broad, it's not convenient to define separate request handlers for each value. But this entails a limitation (as far as I see): It is not possible to fire several requests to the same DIH handler (with different parameter values) at the same time. However, in case several request handlers were used (as in 1.), concurrent requests (to the different handlers) would be possible. So, how to overcome this limitation? Best, Sascha
Re: Question about the message Indexing failed. Rolled back all changes.
But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, do a solr search which returns meaningful results I am not sure what "meaningful" means. The full-import command starts an asynchronous process to re-index. The response that you get in return to the above mentioned URL (always) indicates that a full-import has been started. It does NOT know about anything that might go wrong with the process itself. and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result ... The status URL is the one which tells you what is going on with the process. The message "Indexing failed. Rolled back all changes" can come because of multiple reasons - missing database drivers, incorrect sql queries, runtime errors in custom transformers etc. Start the full-import once more. Keep a watch on the Solr server log. If you can figure out what's going wrong, great; otherwise, copy-paste the exception stack-trace from the log file for specific answers. Cheers Avlesh On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen bertie.s...@gmail.com wrote: No. I did not check the logs. But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true, do a solr search which returns meaningful results, and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">status</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:2:11.426</str>
    <str name="Total Requests made to DataSource">584</str>
    <str name="Total Rows Fetched">1538</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2009-11-09 23:54:41</str>
    <str name="">Indexing failed. Rolled back all changes.</str>
    <str name="Committed">2009-11-09 23:54:42</str>
    <str name="Optimized">2009-11-09 23:54:42</str>
    <str name="Rolledback">2009-11-09 23:54:42</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen bertie.s...@gmail.com wrote: When I use http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to debug the indexing config file, I always see the status message on the right part <str name="">Indexing failed. Rolled back all changes.</str>, even when the indexing process looks to be successful. I am not sure whether you guys have seen the same phenomenon or not. BTW, I usually check the "Clean" checkbox and sometimes the "Commit" box, and then click the "Debug Now" button. Do you see any exceptions in the logs? -- Regards, Shalin Shekhar Mangar.
Re: How to make a TEXT field sortable?
Can someone help me with how to sort on a text field? You CANNOT sort on a text field. Sorting can only be done on an untokenized field (e.g. string, sint, sfloat etc fields) Cheers Avlesh On Tue, Nov 10, 2009 at 11:44 AM, deepak agrawal dk.a...@gmail.com wrote: Can someone help me with how we can sort the text field. <field name="TITLE" type="text" indexed="true" stored="true"/> -- DEEPAK AGRAWAL +91-9379433455 GOOD LUCK.
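The usual workaround is to keep the tokenized field for searching and copy it into an untokenized string field for sorting. A schema.xml sketch against the thread's TITLE field (the sort-field name is illustrative):

```xml
<!-- TITLE stays searchable as text; TITLE_sort is an untokenized copy
     that Solr can sort on -->
<field name="TITLE" type="text" indexed="true" stored="true"/>
<field name="TITLE_sort" type="string" indexed="true" stored="false"/>
<copyField source="TITLE" dest="TITLE_sort"/>
```

Queries would then use sort=TITLE_sort asc while still searching against TITLE.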
Re: solr query help alpha numeric and not
Didn't the queries in my reply work? Cheers Avlesh On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi yes it's a string; in the case of a title, it can be anything, a letter, a number, a symbol, a multibyte char etc. Any ideas if I wanted a query that was not a letter a-z or a number 0-9, given that it's a string? thanks Joel On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: Hi Joel, The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause? - Jonathan On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: Hi, I have a field called firstLetterTitle, this field has 1 char, it can be anything, I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9. I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:[0%20TO%209]%20AND%20NOT%20firstLetterTitle:[A%20TO%20Z] But I get back numeric results: <doc> <str name="firstLetterTitle">9</str> <str name="id">23946447</str> </doc> 2.) I want only numerics: http://localhost:8983/solr/select?q=firstLetterTitle:[0%20TO%209] This seems to work but just checking if it's the right way. 3.) I want only English letters: http://localhost:8983/solr/select?q=firstLetterTitle:[A%20TO%20Z] This seems to work but just checking if it's the right way. thanks Joel
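One detail worth noting for these range queries: the square brackets in field:[lower TO upper] are part of Lucene syntax and must be percent-encoded when placed in a URL. A small stdlib sketch (the field name is from the thread):

```java
import java.net.URLEncoder;

public class RangeQueryUrl {
    public static void main(String[] args) throws Exception {
        // Lucene/Solr inclusive range syntax: field:[lower TO upper]
        String onlyDigits = "firstLetterTitle:[0 TO 9]";
        // ':' -> %3A, '[' -> %5B, ' ' -> '+', ']' -> %5D
        String encoded = URLEncoder.encode(onlyDigits, "UTF-8");
        System.out.println(encoded); // firstLetterTitle%3A%5B0+TO+9%5D
    }
}
```

Browsers are often lenient about literal brackets, but encoding them keeps the request unambiguous for any HTTP client.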
Re: Bug with DIH and MySQL CONCAT()?
Try cast(concat(...) as char) ... Cheers Avlesh On Wed, Nov 4, 2009 at 7:36 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: Hi All, I have an SQL query that begins with SELECT CONCAT('ID', Subject.id, ':', Subject.name, ':L', Subject.level) as subject_name and the query runs great against MySQL from the command line. Since this is a nested entity, the schema.xml contains <field name="subject_name" type="string" indexed="true" stored="true" multiValued="true"/> After a full-import, a select output of the xml looks like <arr name="subject_name"> <str>[B@1db4c43</str> <str>[B@6bcef1</str> <str>[B@1df503b</str> <str>[B@c5dbb</str> <str>[B@1ddc3ea</str> <str>[B@6963b0</str> <str>[B@10fe215</str> ... Without a CONCAT - it works fine. Is this a bug? Meanwhile - should I go about concatenating somewhere else in the DIH config? Thanks. - Jonathan
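The garbage values in the thread are almost certainly the default toString() of a Java byte[]: MySQL's JDBC driver can hand CONCAT results back as byte arrays, and the cast(... as char) makes it return a string instead. A tiny sketch of the symptom (the sample bytes are illustrative):

```java
public class ByteArrayToString {
    public static void main(String[] args) {
        // Stand-in for a CONCAT result delivered as byte[] by the driver.
        byte[] concatResult = "ID3:Physics:L2".getBytes();
        // Arrays inherit Object.toString(): "[B@" + identity hash in hex,
        // which is exactly the shape of the values indexed in the thread.
        String rendered = concatResult.toString();
        System.out.println(rendered.startsWith("[B@")); // true
    }
}
```

So DIH was faithfully indexing the toString() of each byte array; casting to char in SQL (or converting in a transformer) avoids it.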
Re: How to integrate Solr into my project
Take a look at this - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan caroline@gmail.com wrote: Hi, I wish to intergrate Solr into my current working project. I've played around the Solr example and get it started in my tomcat. But the next step is HOW do i integrate that into my working project? You see, Lucence provides API and tutorial on what class i need to instanstiate in order to index and search. But Solr seems to be pretty vague on this..as it is a working solr search server. Can anybody help me by stating the steps by steps, what classes that i should look into in order to assimiliate Solr into my project? Thanks. regards ~caroLine
Re: SolrJ looping until I get all the results
This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. Okay. I am sure you are aware of the fl parameter, which restricts which fields are returned with a response. If you need limited info, it might be a good idea to use this parameter. Cheers Avlesh On Tue, Nov 3, 2009 at 7:23 AM, Paul Tomblin ptomb...@xcski.com wrote: On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh avl...@gmail.com wrote: I was doing it that way, but what I'm doing with the documents is some manipulation, putting the new classes into a different list. Because I basically have two times the number of documents in lists, I'm running out of memory. So I figured if I do it 1000 documents at a time, the SolrDocumentList will get garbage collected at least. You are right w.r.t. all that, but I am surprised that you would need ALL the documents from the index for a search requirement. This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. -- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin
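The 1000-at-a-time loop the poster describes can be sketched without SolrJ: page through results with start/rows until a page comes back short. The fetchPage function below is a stand-in for a SolrJ query call; everything else is plain Java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class PagedFetch {
    // Collects all ids page by page, mirroring SolrJ's start/rows parameters,
    // so only one page is held per request instead of the whole result set.
    static List<Integer> fetchAll(BiFunction<Integer, Integer, List<Integer>> fetchPage,
                                  int rows) {
        List<Integer> all = new ArrayList<>();
        int start = 0;
        while (true) {
            List<Integer> page = fetchPage.apply(start, rows); // one "request"
            all.addAll(page);
            if (page.size() < rows) break; // a short page means we are done
            start += rows;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake 2500-document index standing in for the Solr server.
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < 2500; i++) index.add(i);
        List<Integer> got = fetchAll(
            (start, rows) -> index.subList(start, Math.min(start + rows, index.size())),
            1000);
        System.out.println(got.size()); // 2500
    }
}
```

Combined with the fl parameter (requesting only the file-name field), each page stays small enough to be garbage collected before the next request.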
Re: solrj query size limit?
Did you hit the limit for maximum number of characters in a GET request? Cheers Avlesh On Tue, Nov 3, 2009 at 9:36 AM, Gregg Horan greggho...@gmail.com wrote: I'm constructing a query using solrj that has a fairly large number of 'OR' clauses. I'm just adding it as a big string to setQuery(), in the format accountId:(this OR that OR yada). This works all day long with 300 values. When I push it up to 350-400 values, I get a Bad Request SolrServerException. It appears to just be a client error - nothing reaching the server logs. Very repeatable... dial it back down and it goes through again fine. The total string length of the query (including a handful of other faceting entries) is about 9500chars. I do have the maxBooleanClauses jacked up to 2048. Using javabin. 1.4-dev. Are there any other options or settings I might be overlooking? -Gregg
Re: problems with PhraseHighlighter
Copy-paste your field definition for the field you are trying to highlight/search on. Cheers Avlesh On Sun, Nov 1, 2009 at 8:24 PM, AHMET ARSLAN iori...@yahoo.com wrote: Hello everyone, I am having problems with highlighting the complete text of a field. I have an xml field. I am querying proximity searches on this field. xml: ( proximity1 AND/OR proximity2 AND/OR ...) Results are returned successfully satisfying the proximity query. However, when I request highlighting, sometimes it returns nothing and sometimes it returns missing proximity terms. I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml: <maxFieldLength>2147483647</maxFieldLength> I am using these highlighting parameters: hl.maxAnalyzedChars=2147483647 hl.fragsize=2147483647 hl.usePhraseHighlighter=true hl.requireFieldMatch=true hl.fl=xml hl=true I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false but it didn't help. When I set hl.usePhraseHighlighter=false highlighting returns but all query terms are highlighted. What value of hl.fragsize should I use to highlight the complete text of a field? 0 or 2147483647? What is the highest value that I can set for hl.maxAnalyzedChars and hl.fragsize? I am querying the same field and requesting the same field in highlighting. Although a document matches a query, no highlighting comes back. What could be the reason? If a document matches a query, there should be highlighting returned, right? Any help or pointers are really appreciated.
Re: best way to model 1-N
what am I missing? Change your entity name=category query=select cfcr.feedId ... to entity name=category *transformer=RegexTransformer* query=select cfcr.feedId .. The splitBy directive is understood by this transformer and in your case the attribute was simply ignored. Don't forget to re-index once you have changed. Cheers Avlesh On Fri, Oct 30, 2009 at 9:33 PM, Joel Nylund jnyl...@yahoo.com wrote: Thanks Chantal, I will keep that in mind for tuning, for sql I figured way to combine them into one row using concat, but I still seem to be having an issue splitting them: Db now returns as one column categoryType: TOPIC,LANGUAGE but my solr result, if you note the item in categoryType all seem to be within one str, I would expect it to be in multiple strings within the array, is this assumption wrong? doc - arr name=categoryType strTOPIC,LANGUAGE/str /arr str name=id40/str str name=titlefeed title/str /doc Here is my import: document name=doc entity name=item query=SELECT f.id, f.title FROM Feed f field column=id name=id / field column=title name=title / entity name=category query=select cfcr.feedId, group_concat(cfcr.categoryType) as categoryType from CFR cfcr where cfcr.feedId = '${item.id}' AND group by cfcr.feedId field column=categoryType name=categoryType splityBy=, / /entity /entity In schema: field name=categoryType type=text indexed=true stored=true required=false multiValued=true/ field name=categoryName type=text indexed=true stored=true required=false multiValued=true/ what am I missing? thanks Joel On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote: That depends a bit on your database, but it is tricky and might not be performant. If you are more of a Java developer, you might prefer retrieving mutliple rows per SOLR document from your dataSource (join on your category and main table), and aggregate them in your custom EntityProcessor. I got a far(!) better performance retrieving everything in one query and doing the aggregation in Java. 
But this is, of course, depending on your table structure and data. Noble Paul helped me with the custom EntityProcessor, and it turned out quite easy. Have a look at the thread with the heading from this mailing list (SOLR-USER): DataImportHandler / Import from DB : one data set comes in multiple rows Cheers, Chantal Joel Nylund schrieb: thanks, but im confused how I can aggregate across rows, I dont know of any easy way to get my db to return one row for all the categories (given the hint from your other email), I have split the category query into a separate entity, but its returning multiple rows, how do I combine multiple rows into 1 index entity? thanks Joel On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote: In the database this is modeled a a 1-N where category table has the mapping of feed to category I need to be able to query , give me all the feeds in any given category. How can I best model this in solr? Seems like multiValued field might help, but how would I populate it, and would the query above work?. Yes you are right. A multivalued field for categories is the answer. For populating in the index - 1. If you use DIH to populate your indexes and your datasource is a database then you can use DIH's RegexTransformer on an aggregated list of categories. e.g. if your database query retruns a,b,c,d in a column called db_categories, this is how you would put it in DIH's data-config file - field column=db_categories name=categories splityBy=, /. 2. If you add documents to Solr yourself multiple values for the field can be specified as an array or list of values in the SolrInputDocument. A multivalued field provides the same faceting and searching capabilites like regular fields. There is no special syntax. Cheers Avlesh On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. 
There is one tricky problem that I need help on. Feeds have 1 - many categories. So Lets say we have Category1, Category2 and Category3 Feed 1 - is in Category 1 Feed 2 is in category2 and category3 Feed 3 is in category2 Feed 4 has no category In the database this is modeled a a 1-N where category table has the mapping of feed to category I need to be able to query , give me all the feeds in any given category. How can I best model this in solr? Seems like multiValued field might help, but how would I populate it, and would
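Putting the fix together, a corrected version of the poster's category entity. Note two changes: the added transformer attribute (the reason splitBy was being ignored), and the attribute spelling splitBy rather than splityBy; the stray AND before "group by" in the original SQL is also dropped. Table and column names are the poster's:

```xml
<entity name="category" transformer="RegexTransformer"
        query="select cfcr.feedId,
                      group_concat(cfcr.categoryType) as categoryType
               from CFR cfcr
               where cfcr.feedId = '${item.id}'
               group by cfcr.feedId">
  <field column="categoryType" name="categoryType" splitBy=","/>
</entity>
```

With this in place, a value like TOPIC,LANGUAGE is split into separate entries of the multivalued categoryType field at index time.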
Re: autocomplete
q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=? Why is the json.wrf not specified? Without the callback function, the string that is returned is illegal javascript for the browser. You need to specify this parameter, which names a wrapper or callback function. If you specify json.wrf=foo, then as soon as the browser gets a response it will call a function named foo (which needs to be defined already). Inside foo you can have your own implementation to interpret and render this data. Cheers Avlesh On Sat, Oct 31, 2009 at 12:13 AM, Ankit Bhatnagar abhatna...@vantage.comwrote: Hi guys, Enterprise 1.4 Solr Book (AutoComplete) says this works - My query looks like - q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=? And it returns three results {"responseHeader":{"status":0,"QTime":38,"params":{"indent":"on","start":"0","q":"*:*","wt":"json","fq":"ac:*all*","rows":"15"}},"response":{"numFound":3,"start":0,"docs":[{"id":"1","ac":"Can you show me all the results"},{"id":"2","ac":"Can you show all companies"},{"id":"3","ac":"Can you list all companies"}]}} But browser says syntax error -- Ankit
Re: Iso accents and wildcards
When I request with title:econ* I can have the correct answers, but if I request with title:écon* I have no answers. If I request with title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard. As far as I can understand the analyser should be used exactly the same at both index and query time. Wildcard queries are not analyzed, hence the inconsistent behaviour. The easiest way out is to define one more field, title_original, as an untokenized field. While querying, you can use both fields at the same time, e.g. q=(title:écon* title_original:écon*). In any case, you would get the desired matches. Cheers Avlesh On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte nicolas.ai...@aidel.comwrote: Hi all, I have a field that contains accented chars in it; what I want is to be able to search ignoring accents. I have set up that field with: <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.SnowballPorterFilterFactory" language="French"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> In the index the word économie is translated to econom; the accent is removed thanks to the ISOLatin1AccentFilterFactory and the end of the word removed thanks to the SnowballPorterFilterFactory. When I request with title:econ* I can have the correct answers, but if I request with title:écon* I have no answers. If I request with title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard.
As far as I can understand, the analyzer should be used exactly the same at both index and query time. I have tested with changing the order of the filters (putting the ISOLatin1AccentFilterFactory on top) without any result. Could anybody help me with that and point out what may be wrong with my schema?
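A sketch of the suggested second field: untokenized but lowercased and accent-folded, so an unanalyzed wildcard term can still match. Field and type names here are illustrative:

```xml
<fieldType name="string_folded" class="solr.TextField">
  <analyzer>
    <!-- one token per value, then fold case and accents -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_original" type="string_folded" indexed="true" stored="false"/>
<copyField source="title" dest="title_original"/>
```

One caveat: since wildcard terms skip analysis, the client should lowercase and strip accents from the typed prefix itself (écon* becomes econ*) before querying title_original; the query in the reply then becomes q=(title:écon* title_original:econ*).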
Re: Indexing multiple entities
The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do: setField("id", obj.getClass().getName() + obj.getId()) or something like that. Unless I am missing something here, can't you write the getter of the id field in your solr bean as underneath? @Field private String id; public String getId() { return this.getClass().getName() + this.id; } Cheers Avlesh On Fri, Oct 30, 2009 at 1:33 PM, Christian López Espínola penyask...@gmail.com wrote: On Fri, Oct 30, 2009 at 2:04 AM, Avlesh Singh avl...@gmail.com wrote: One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Anyone knows if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? More details on the use-case please. If I index a Book with ID=3, and then a Magazine with ID=3, I'll really be removing my Book 3 and indexing Magazine 3. I want both entities to be in the index. The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do: setField("id", obj.getClass().getName() + obj.getId()) or something like that. The goal is avoiding creating all the XMLs to be sent to Solr but having the possibility of modifying them in some way. Do you know how I can do that, or a better way of achieving the same results? Cheers Avlesh On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola penyask...@gmail.com wrote: Hi Israel, Thanks for your suggestion, On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com wrote: On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote: Hi, my name is Christian and I'm a newbie introducing to solr (and solrj). I'm working on a website where I want to index multiple entities, like Book or Magazine.
The issue I'm facing is both of them have an attribute ID, which I want to use as the uniqueKey on my schema, so I cannot identify uniquely a document (because ID is saved in a database too, and it's autonumeric). I'm sure that this is a common pattern, but I don't find the way of solving it. How do you usually solve this? Thanks in advance. -- Cheers, Christian López Espínola penyaskito Hi Christian, It looks like you are bringing in data to Solr from a database where there are two separate tables. One for *Books* and another one for *Magazines*. If this is the case, you could define your uniqueKey element in the Solr schema to be a string instead of an integer; then you can still load documents from both the books and magazines database tables, but you would prefix the uniqueKey field with B for books and M for magazines. Like so: <field name="id" type="string" indexed="true" stored="true" required="true"/> <uniqueKey>id</uniqueKey> Then when loading the books or magazines into Solr you can create the documents with id fields like this: <add> <doc><field name="id">B14000</field></doc> <doc><field name="id">M14000</field></doc> <doc><field name="id">B14001</field></doc> <doc><field name="id">M14001</field></doc> </add> I hope this helps This was my first thought, but in practice there isn't Book and Magazine, but about 50 different entities, so I'm using the Field annotation of solrj for simplifying my code (it manages the XML creation for me, etc). One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Anyone knows if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? Thanks in advance. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Cheers, Christian López Espínola penyaskito -- Cheers, Christian López Espínola penyaskito
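Both suggestions in the thread reduce to the same idea: make the uniqueKey a string composed of an entity marker plus the database id, so ids from different tables can never collide. A dependency-free sketch (the helper is hypothetical):

```java
public class UniqueKeys {
    // Composes a collision-free Solr uniqueKey from an entity marker and a
    // database id, as in the thread ("B14000" for a Book, "M14000" for a
    // Magazine). With ~50 entities, the class name works as the marker.
    static String solrId(String entityMarker, long dbId) {
        return entityMarker + dbId;
    }

    public static void main(String[] args) {
        System.out.println(solrId("B", 14000)); // B14000
        System.out.println(solrId("M", 14000)); // M14000
    }
}
```

With solrId(obj.getClass().getSimpleName(), obj.getId()) in the bean's id getter, no custom DocumentObjectBinder is needed.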
Re: begins with searches
G'day Avlesh, converting the all field to type edgytext doesn't work as expected as the various text analysers etc don't get to work on that field, so I get fewer results than expected. And adding the edgy filter into the text field also yields fewer results. I can work around the issue by setting up a new beginswith edgytext field and using copyField to copy the relevant fields into it. You are absolutely right. What you think of as a work-around is actually a solution! But this approach doesn't really suit our parent application's main search screen, which is a single box labelled quick search. Users will be puzzled as to why a search for beginswith:Houghton, b yields 20 results, while a search for Houghton, b yields 10. And also puzzled as to why Houghton, b* won't work as they expect - people are already familiar with using wildcards. A way to get around this user perception problem is to get rid of the single search box and set up a series of drop down boxes for type of search (begins with, etc), along with field names. We might have to go there, but the ideal solution from our perspective would be for users to be able to enter terms in the quick search box without any field prefix, and have solr go off and search all field names/types. As I said earlier, a field can be analyzed in only ONE way. In your kind of requirements, multiple searching capabilities are needed for a single query. Unfortunately, not all of these can be addressed by a single field. The solution is to create multiple fields set up with different analyzers (tokenizers and filters) for indexing and searching. At query time an OR query can be done across all such fields (with a corresponding boost for a particular field, if desired). Lucene would automatically rank the results in the correct order based on hits across the multiple fields. Hope this helps. And sorry for the delayed response.
Cheers Avlesh On Fri, Oct 30, 2009 at 3:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:
By the way, our text field type config is currently set as -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Thursday, 29 October 2009 12:35 PM To: solr-user@lucene.apache.org Subject: Re: begins
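On Avlesh's multi-field suggestion above: the OR query across differently analyzed fields might look like the following, assuming hypothetical field names all (the standard text catch-all) and beginswith (an edgytext copy of the same data); the boost floats prefix matches to the top while plain matches still appear:

```
q=all:"houghton, b" OR beginswith:"houghton, b"^10
```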
Re: best way to model 1-N
In the database this is modeled as a 1-N where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? Yes, you are right. A multivalued field for categories is the answer. For populating the index - 1. If you use DIH to populate your indexes and your datasource is a database, then you can use DIH's RegexTransformer on an aggregated list of categories. e.g. if your database query returns a,b,c,d in a column called db_categories, this is how you would put it in DIH's data-config file - <field column="db_categories" name="categories" splitBy=","/>. 2. If you add documents to Solr yourself, multiple values for the field can be specified as an array or list of values in the SolrInputDocument. A multivalued field provides the same faceting and searching capabilities as regular fields. There is no special syntax. Cheers Avlesh On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. There is one tricky problem that I need help on. Feeds have 1 - many categories. So let's say we have Category1, Category2 and Category3: Feed 1 is in Category1, Feed 2 is in Category2 and Category3, Feed 3 is in Category2, and Feed 4 has no category. In the database this is modeled as a 1-N where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? thanks Joel
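Option 1 above, as a minimal data-config sketch; the table/column names (feed, category, db_categories) and the MySQL GROUP_CONCAT aggregation are hypothetical, made up to match Joel's scenario:

```
<document>
  <entity name="feed" transformer="RegexTransformer"
          query="SELECT f.id, f.title, GROUP_CONCAT(c.name) AS db_categories
                 FROM feed f LEFT JOIN category c ON c.feed_id = f.id GROUP BY f.id">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <!-- splitBy turns the aggregated 'a,b,c,d' into one value per category -->
    <field column="db_categories" name="categories" splitBy=","/>
  </entity>
</document>
```

The matching schema.xml field would be declared multiValued (e.g. <field name="categories" type="string" indexed="true" stored="true" multiValued="true"/>), after which q=categories:category2 returns all the feeds in that category.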
Re: multiple sql queries for one index?
Read this example fully - http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example Nested entities are the answer to your question. The example has a sample. Cheers Avlesh On Fri, Oct 30, 2009 at 2:58 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, It's been hurting my brain all day to try to build 1 query for my index (joins upon joins upon joins). Is there a way I can do multiple queries to populate the same index? I have one main table that I can join everything back via ID, so it should be theoretically possible. If this can be done, can someone point me to an example? thanks Joel
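A sketch of what the nested-entity approach from that wiki example looks like; the entity, table, and column names here are hypothetical:

```
<document>
  <entity name="feed" query="SELECT id, title FROM feed">
    <!-- runs once per parent row; DIH substitutes ${feed.id} from the outer query -->
    <entity name="feed_author" query="SELECT name FROM author WHERE feed_id = ${feed.id}">
      <field column="name" name="author_name"/>
    </entity>
  </entity>
</document>
```

All columns returned by both queries land on the same Solr document, so each nested entity effectively replaces one join in the single big query.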
Re: Indexing multiple entities
One thing I thought about is if I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Does anyone know if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? More details on the use-case please. Cheers Avlesh On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola penyask...@gmail.com wrote: Hi Israel, Thanks for your suggestion, On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com wrote: On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote: Hi, my name is Christian and I'm a newbie getting started with solr (and solrj). I'm working on a website where I want to index multiple entities, like Book or Magazine. The issue I'm facing is that both of them have an attribute ID, which I want to use as the uniqueKey on my schema, so I cannot identify a document uniquely (because ID is saved in a database too, and it's autonumeric). I'm sure that this is a common pattern, but I can't find the way to solve it. How do you usually solve this? Thanks in advance. -- Cheers, Christian López Espínola penyaskito Hi Christian, It looks like you are bringing in data to Solr from a database where there are two separate tables. One for *Books* and another one for *Magazines*.
If this is the case, you could define your uniqueKey element in the Solr schema to be a string instead of an integer. Then you can still load documents from both the books and magazines database tables, but you could prefix the uniqueKey field with B for books and M for magazines. Like so:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

Then when loading the books or magazines into Solr you can create the documents with id fields like this:

<add>
  <doc><field name="id">B14000</field></doc>
  <doc><field name="id">M14000</field></doc>
  <doc><field name="id">B14001</field></doc>
  <doc><field name="id">M14001</field></doc>
</add>

I hope this helps. This was my first thought, but in practice there isn't Book and Magazine, but about 50 different entities, so I'm using the Field annotation of solrj to simplify my code (it manages the XML creation for me, etc). One thing I thought about is whether I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs in the XML creation. Does anyone know if something like this can be done without modifying Solrj sources? Is there any injection or plugin mechanism for this? Thanks in advance. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Cheers, Christian López Espínola penyaskito
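Short of writing a custom DocumentObjectBinder, one workable sketch is to compute the prefixed key yourself before handing documents to SolrJ. The helper below is hypothetical (the entity prefixes and the buildKey/parseEntity names are made up for illustration, not part of SolrJ):

```java
public final class EntityKeys {
    private EntityKeys() {}

    /** Prefix a database id with a per-entity marker, e.g. "Book" + 14000 -> "Book:14000". */
    static String buildKey(String entity, long id) {
        return entity + ":" + id;
    }

    /** Recover the entity name from a combined key. */
    static String parseEntity(String key) {
        return key.substring(0, key.indexOf(':'));
    }

    public static void main(String[] args) {
        String key = buildKey("Book", 14000);
        System.out.println(key);               // Book:14000
        System.out.println(parseEntity(key));  // Book
    }
}
```

The combined string would be set on the id field of each SolrInputDocument (or on the annotated bean property) before calling add, so 50 entity types can share one uniqueKey without colliding.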
Re: MLT cross core
My thought now is I cannot use MLT and instead must do a query to B using the fields from core A's ID as query params. Is there a big difference in what will be returned as results using a query instead of MLT? Yes, there is definitely a difference between the results from a MLT handler and any other search handler. MLT components/handlers are supposed to return similar results based on stored TermVectors for a field. Search handlers are supposed to return exact matching results based on indexed values for a field. In your multi-core set-up, I don't think you are anywhere close to using MLT. The arrangement looks more like a search query. Cheers Avlesh On Wed, Oct 28, 2009 at 5:37 PM, Adamsky, Robert radam...@techtarget.com wrote: Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have the same type of id. Having the same type of id in two different cores is of no use to a MLT handler (which in fact operates on one core) Right; the cores share the same data type and name for their ID. Was hoping that would allow me to do the same thing I am doing for cross core queries on common schema fields - I can query one core and get aggregate results from both based on common fields. How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results. It depends on your requirement. If it is about simply aggregating the results, then you can run the MLT handler for both the cores independently and merge the response thereafter based on your understanding of the underlying data in the responses. I am trying to do the following given: - Two cores with common named and typed IDs (no dupes between them) - Some number of common fields for other data like title and body.
Make a MLT query to core A (or event core B) passing in ID of core A and getting MLT results that have data from core B only. My thought now is I cannot use MLT and instead must do a query to B using the fields from core A ID as query params. Is there big difference in what will be returned as results using query instead of MLT?
Re: MLT cross core
Does that mean that you cannot do a 'MLT' query from one core result to get MLT from another (even if there is some common schema between)? You can always run MLT handlers on a core. Each MLT handler takes certain parameters based on which similar results are fetched. You would need to pass such parameters for the MLT handler to work properly. It is immaterial where you get these values from. In your case it happens to be an outcome of results from another core. As I said, it hardly matters. Cheers Avlesh On Wed, Oct 28, 2009 at 10:10 PM, Adamsky, Robert radam...@techtarget.com wrote: Thanks for the reply -- In your multi-core set-up, I don't think you are anywhere close to using MLT. The arrangement looks more like a search query. Does that mean that you cannot do a 'MLT' query from one core result to get MLT from another (even if there is some common schema between)?
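As an illustration of feeding one core's output into another core's MLT handler: MoreLikeThisHandler can take the source text as a content stream, so a client could read the interesting fields from the core A document and post their text to core B. The URL below is a hedged sketch only (core names and fields are hypothetical):

```
http://localhost:8983/solr/coreB/mlt?mlt.fl=title,body&mlt.mintf=1&mlt.mindf=1&stream.body=text+taken+from+the+core+A+document
```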
Re: Simple problem with a nested entity and it's SQL
Shouldn't this work too? SELECT * FROM table2 WHERE IS NOT NULL ${table1.somethin_like_a_foreign_key} AND ${table1.somethin_like_a_foreign_key} <> 0 AND id = ${table1.somethin_like_a_foreign_key} Cheers Avlesh On Wed, Oct 28, 2009 at 11:03 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: I have a nested entity on a jdbc data import handler that is causing an SQL error because the second key is either NULL (blank when generating the sql) or a non-zero INT. The query is in the following form:

<document name="content">
  <entity name="bl_lessonfiles" transformer="TemplateTransformer" query="SELECT * FROM table1 ...">
    <entity name="user_index" query="SELECT * FROM table2 WHERE id = ${table1.somethin_like_a_foreign_key}">
    </entity>
  </entity>
</document>

Is the only way to avoid this to modify the source DB schema to be NOT NULL so it always returns at least a 0? - Jonathan
Re: Simple problem with a nested entity and it's SQL
Assuming this to be MySQL, will this work - SELECT * FROM table2 WHERE id = IF(ISNULL(${table1.somethin_like_a_foreign_key}), 0, ${table1.somethin_like_a_foreign_key}); Cheers Avlesh On Wed, Oct 28, 2009 at 11:12 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: No - the SQL will fail to validate because at runtime it will look like SELECT * FROM table2 WHERE IS NOT NULL table1.somethin_like_a_foreign_key AND table1.somethin_like_a_foreign_key <> 0 AND id = Note the trailing id = On Oct 28, 2009, at 1:38 PM, Avlesh Singh wrote: Shouldn't this work too? SELECT * FROM table2 WHERE IS NOT NULL ${table1.somethin_like_a_foreign_key} AND ${table1.somethin_like_a_foreign_key} <> 0 AND id = ${table1.somethin_like_a_foreign_key} Cheers Avlesh On Wed, Oct 28, 2009 at 11:03 PM, Jonathan Hendler jonathan.hend...@gmail.com wrote: I have a nested entity on a jdbc data import handler that is causing an SQL error because the second key is either NULL (blank when generating the sql) or a non-zero INT. The query is in the following form:

<document name="content">
  <entity name="bl_lessonfiles" transformer="TemplateTransformer" query="SELECT * FROM table1 ...">
    <entity name="user_index" query="SELECT * FROM table2 WHERE id = ${table1.somethin_like_a_foreign_key}">
    </entity>
  </entity>
</document>

Is the only way to avoid this to modify the source DB schema to be NOT NULL so it always returns at least a 0? - Jonathan
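A third option, which sidesteps the blank substitution entirely (our suggestion, with a hypothetical alias name fk): default the key in the parent entity's query, so DIH always substitutes a real number into the child query:

```sql
-- parent entity query: NULL foreign keys become 0 before DIH substitutes them
SELECT t1.*, COALESCE(t1.somethin_like_a_foreign_key, 0) AS fk FROM table1 t1;
-- the child entity query would then reference the cleaned alias:
-- SELECT * FROM table2 WHERE id = ${bl_lessonfiles.fk}
```

This keeps the child SQL valid for every row without touching the source DB schema.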
Re: faceting ordering
curious... is it possible to have faceted results ordered by score? First, I am not sure what that means. Score of what? Documents? If yes, how do you think the same should influence faceting? Second, there are only two ways you can sort facet values on a field. More here - http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort If you can further elaborate on your use case, you might get better solutions for the problem at hand. Cheers Avlesh On Wed, Oct 28, 2009 at 11:31 PM, Joe Calderon calderon@gmail.com wrote: curious... is it possible to have faceted results ordered by score? I'm having a problem where I'm faceting on a field while searching for the same word twice. For example: I'm searching for "the the" on a tokenized field and faceting by the untokenized version. Faceting returns records with "the the", but way at the bottom, since everything with a single "the" happens to be way more frequent. I tried restricting my search to phrases with small slop: myfield:"token1 token2 token3"~3 but that affects other searches negatively; ideally I want to be as loose as possible as these searches power an auto suggest feature. I figured if faceted results could be sorted by score, I could simply boost phrases instead of restricting by them. Thoughts? --joe
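For reference, the two facet orderings Avlesh mentions are selected per-request. A sketch of such a request against a hypothetical untokenized copy field (myfield_untok); facet.sort=count orders values by how many matching documents contain them, not by relevance score:

```
http://localhost:8983/solr/select?q=myfield:"the the"&facet=true&facet.field=myfield_untok&facet.sort=count&facet.limit=10
```

(The query value would need URL-encoding in a real request; it is left readable here.)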
Re: Faceting within one document
For facets - http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount For terms - http://wiki.apache.org/solr/TermsComponent Helps? Cheers Avlesh On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Hi, If I give a query that matches a single document, and facet on a particular field, I get a list of all the terms in that field which appear in that document. (I also get some with a count of zero, I don't really understand where they come from... ?) Is it possible with faceting, or a similar mechanism, to get a count of how many times each term appears within that document? This would be really useful for building a list of top keywords within a long document, for summarization purposes. I can do it on the client side but it'd be nice to know if there's a quicker way. Thanks! Andrew. -- View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: weird problem with letters S and T
Any ideas, are S and T special chars in query for solr? Nope, they are NOT. My guess is that - You are using a text type field for firstLetterTitle which has the stopword filter applied to it. - Your stopwords.txt file contains the characters s and t, because of which the above mentioned filter eats them up while indexing and searching. If the above assumptions are correct, then there are two ways to fix it - - Remove the characters s and t from your stopwords.txt file and do a re-index. Searches should work fine after that. - For this particular use-case, you can keep your firstLetterTitle field as a string type untokenized field. You will not have to worry about stopwords in that case. Cheers Avlesh On Thu, Oct 29, 2009 at 3:47 AM, Joel Nylund jnyl...@yahoo.com wrote: (I am super new to solr, sorry if this is an easy one) Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql (my query is complex, but the part that matters is: SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f I can create this index with no issues. I can query the title with no problem: http://localhost:8983/solr/select?q=title:super I can query the first letters mostly with no problem: http://localhost:8983/solr/select?q=firstLetterTitle:a Returns all the foo's with the first letter a. This actually works with every letter except S and T. If I query those, I get no results. The weird thing is, if I do the title query above with Super I get lots of results, and the xml shows the firstLetterTitles for those to be S:

<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">84861348</str>
  <str name="title">Super Cool</str>
</doc>
<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">108692</str>
  <str name="title">Super 45</str>
</doc>

etc. Any ideas, are S and T special chars in query for solr?
here is the response from the s query with debug = true:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

thanks Joel
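The second fix suggested above (an untokenized string field) might look like this in schema.xml; the field name is from the thread, while lowercasing the letter on the SQL side (e.g. LOWER(LEFT(f.title,1))) is our suggestion, since string fields match case-sensitively:

```
<field name="firstLetterTitle" type="string" indexed="true" stored="true"/>
```

With no analyzer in the chain, stopwords.txt never sees the value, and q=firstLetterTitle:s matches exactly what was indexed.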
Re: begins with searches
It sounds from what you say that I'm going to need to change the field type to edgytext. Which won't achieve the result I want, viz. the current all plus the edgytext. Any way to achieve this? I guess there is a mismatch of expectations here. A field can be analyzed in only ONE way. If your field all is of type text, indexing and searching would go through the analyzers (tokenizers and filters) specified ONLY for the text field. It does not matter if data from an edgytext or any other field type is being copied into the field. Having said that, converting the all field to type edgytext should still work fine. All your regular searches on a text field should also work with the edgytext field. Ain't it like that? Cheers Avlesh On Thu, Oct 29, 2009 at 2:52 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Here are the all field code snippets -

<!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
...
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>all</defaultSearchField>
...
<!-- Copy for ALL search -->
<copyField source="*_t" dest="*_t_ft"/>
<copyField source="*_mt" dest="*_mft"/>
<copyField source="content" dest="all"/>
<copyField source="*_t" dest="all"/>
<copyField source="*_mt" dest="all"/>

It sounds from what you say that I'm going to need to change the field type to edgytext. Which won't achieve the result I want, viz. the current all plus the edgytext. Any way to achieve this? Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Wednesday, 28 October 2009 3:30 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches My next issue relates to how to get the results of the author field to come up in a search across all fields.
For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Do you have a field called all? How is it set up? Can you post the schema.xml snippet relating to this field here? copyField is supported for a dynamic field source. <copyField source="*author_mt" dest="all"/> should work for you as long as you have a field called all defined in your schema. Moreover, for your specific use case, the all field needs to be of type edgytext. Cheers Avlesh On Wed, Oct 28, 2009 at 9:35 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks Avlesh. The issue with not doing a phrase query on my edgytext field was that my parent application was adding an escape character to the quotation marks, and I was hoping to fix (or rather, work around) it at the solr end to save maintenance overhead. But I've done a hack in the parent application to remove those escape chars, and all is working well in that respect. My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Tuesday, 27 October 2009 5:54 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1. Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field?
If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters . If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes). I've set up an edgytext fieldType in schema.xml thus - fieldType name=edgytext class
Re: begins with searches
You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1. Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field? If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters. If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes).
I've set up an edgytext fieldType in schema.xml thus -

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And defined a field name thus -

<dynamicField name="*author_mt" type="edgytext" indexed="true" stored="true" multiValued="true"/>

The results are mixed - * searches such as surname, f and surname, fre (with quotations and commas) work well, retrieving surname, f, surname, Fred, surname, Frederick etc etc * searches such as the above but without quotations don't work too well as they get parsed as author_mt:surname + author_mt:firstname, with solr reading the query as author beginning with surname AND author beginning with firstname, which yields nil results. Is there an analyser that will strip the whitespace out altogether? Or another alternative? bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Monday, 26 October 2009 6:32 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches Read up on setting up these kind of searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins with phrases; or, how to best configure solr to support such searches?
TIA Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.aumailto: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au http://www.deakin.edu.au/Deakin University CRICOS Provider Code 00113B (Vic) Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
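On the escaping point raised in this thread: SolrJ's helper is ClientUtils.escapeQueryChars. For clients not on SolrJ, a standalone sketch of the same idea follows; the character set is taken from the Lucene query syntax page linked above plus whitespace, and the class and method names are ours, purely illustrative:

```java
public final class QueryEscape {
    // Characters with meaning in the Lucene query syntax, plus the space character.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&; ";

    /** Backslash-escape every special character so the whole term is treated literally. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("surname, fre")); // surname,\ fre
        System.out.println(escape("a+b"));          // a\+b
    }
}
```

The escaped string can then be sent as the q parameter without wrapping it in double quotes.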
Re: begins with searches
My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Do you have a field called all? How is it set up? Can you post the schema.xml snippet relating to this field here? copyField is supported for a dynamic field source. <copyField source="*author_mt" dest="all"/> should work for you as long as you have a field called all defined in your schema. Moreover, for your specific use case, the all field needs to be of type edgytext. Cheers Avlesh On Wed, Oct 28, 2009 at 9:35 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks Avlesh. The issue with not doing a phrase query on my edgytext field was that my parent application was adding an escape character to the quotation marks, and I was hoping to fix (or rather, work around) it at the solr end to save maintenance overhead. But I've done a hack in the parent application to remove those escape chars, and all is working well in that respect. My next issue relates to how to get the results of the author field to come up in a search across all fields. For example, a search on author:Houghton, B (which uses the edgytext) yields 16 documents, but a search on all:Houghton, B (which doesn't) yields only 9. I thought the solution should be <copyField source="*author_mt" dest="all"/> but that doesn't do the trick. Thanks! bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Tuesday, 27 October 2009 5:54 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches You are right about the parsing of query terms without a double quote (solrQueryParser's defaultOperator has to be AND in your case). For the problem at hand, two things - 1.
Do you have any reason for not doing a PhraseQuery (query terms enclosed in double quotes) on your edgytext field? If not then you can always enclose your query in double quotes to get expected begins with matches. 2. You can always escape your query string before passing it to Solr; and you wouldn't need to pass your query term in double quotes. For example, the query string - surname, fre when escaped would be converted into surname,\+fre thereby asking Solr to treat this as a single query term. For more details - http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters . If you use SolrJ, there is a ClientUtils class somewhere in the package which has helper functions to achieve query escaping. Cheers Avlesh On Tue, Oct 27, 2009 at 9:22 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for this suggestion (thanks Gerald also: no, we're not using BlackLight-type prefixes). I've set up an edgytext fieldType in schema.xml thus -

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And defined a field name thus -

<dynamicField name="*author_mt" type="edgytext" indexed="true" stored="true" multiValued="true"/>

The results are mixed - * searches such as surname, f and surname, fre (with quotations and commas) work well, retrieving surname, f, surname, Fred, surname, Frederick etc etc * searches such as the above but without quotations don't work too well as they get parsed as author_mt:surname + author_mt:firstname, with solr reading the query as author beginning with surname AND author beginning with firstname, which yields nil results.
Is there an analyser that will strip the whitespace out altogether? Or another alternative? bern -Original Message- From: Avlesh Singh [mailto:avl...@gmail.com] Sent: Monday, 26 October 2009 6:32 PM To: solr-user@lucene.apache.org Subject: Re: begins with searches Read up of setting-up these kind searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins
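On the whitespace question: one candidate is Solr's PatternReplaceFilterFactory, slotted in before the EdgeNGram filter so the grams are built from a whitespace-free value. The regex and placement below are our suggestion, not something from the thread:

```
<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- remove all whitespace so "surname, fre" and "surname,fre" produce the same grams -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
</analyzer>
```

The query-time analyzer would need the same PatternReplace filter so that user input is normalized identically.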
Re: MLT cross core
Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have same type of id. Having the same type of id in two different cores is of no use to an MLT handler (which in fact operates on a single core). How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results. It depends on your requirement. If it is about simply aggregating the results, then you can run the MLT handler for both cores independently and merge the responses thereafter, based on your understanding of the underlying data in the responses. Cheers Avlesh On Wed, Oct 28, 2009 at 5:52 AM, Adamsky, Robert radam...@techtarget.com wrote: Have two cores with some common fields in their schemas. I want to perform a MLT query on one core and get results from the other schema. Both cores have same type of id. I saw this thread: http://www.nabble.com/Does-MoreLikeThis-support-sharding--td25378654.html This is not quite what I am doing as this is for shards against same schema. How is it suggested to perform a MLT query cross core / schema? Currently, I only see getting the result from one core and performing a query with the common fields in the second core and treating those results as MLT results.
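The merge step Avlesh describes can be sketched client-side. The pair layout and the assumption that raw scores from the two cores are comparable are illustrative, as he notes that depends on the underlying data:

```python
def merge_mlt_results(results_a, results_b, rows=10):
    """Merge MLT responses from two cores run independently.

    results_a / results_b: lists of (doc_id, score) pairs as the
    application collected them from each core's MLT handler.  Treating
    raw scores from different cores as comparable is an assumption.
    """
    merged = sorted(results_a + results_b, key=lambda doc: doc[1], reverse=True)
    return merged[:rows]
```

A usage example: `merge_mlt_results([("a", 0.9), ("b", 0.4)], [("x", 0.7)])` interleaves the two result sets by descending score.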
Re: begins with searches
Read up of setting-up these kind searches here - http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers Avlesh On Mon, Oct 26, 2009 at 7:43 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We need to offer begins with type searches, e.g. a search for surname, f will retrieve surname, firstname, surname, f, surname fm etc. Ideally, the user would be able to enter something like surname f*. However, wildcards don't work on phrase searches, nor do range searches. Any suggestions as to how best to search for begins with phrases; or, how to best configure solr to support such searches? TIA Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.aumailto: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au http://www.deakin.edu.au/Deakin University CRICOS Provider Code 00113B (Vic) Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
Re: Problem searching for phrases with the word to
My guess is that Solr is treating this as a range query. I've tried escaping the word To with backslashes, but it doesn't seem to make a difference. Is there a way to tell Solr that to is not a special word in this instance? Nope. Any occurrence of to in search term(s) does NOT cause the query to be parsed as a RangeQuery. You are probably doing a phrase search on a text field which is analyzed for stopwords. These stopwords are typically stored in a file called stopwords.txt. Make sure that the stopword filter is applied both at index time and query time. Cheers Avlesh On Mon, Oct 26, 2009 at 12:55 PM, mike mulvaney mike.mulva...@gmail.com wrote: I'm having trouble searching for phrases that have the word to in them. I have a bunch of articles indexed, and I need to be able to search the headlines like this: headline:House Committee Leaders Ask FCC To Consider Spectrum in Broadband Plan When I search like that, I get no hits. When I take out the word To, it finds the document: headline:House Committee Leaders Ask FCC My guess is that Solr is treating this as a range query. I've tried escaping the word To with backslashes, but it doesn't seem to make a difference. Is there a way to tell Solr that to is not a special word in this instance? -Mike
Re: copyField from multiple fields into one
It should have worked as expected. See if your name field is getting populated. Cheers Avlesh 2009/10/26 Steinar Asbjørnsen steinar...@gmail.com Hi all. I'm currently working on setting up spelling suggestion functionality. What I'd like is to put the values of two fields (keyword and name) into the spell field. Something like (from schema.xml):

<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
...
<copyField source="keyword" dest="spell"/>
<copyField source="name" dest="spell"/>

As far as I can see I only get suggestions from the keyword field, and not from the name field. So my question is: Is it possible to copy both keyword and name into the spell field? Thanks, Steinar
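What the two copyField directives do can be simulated in a few lines of Python (field names keyword, name, spell taken from the schema snippet above; the function is a hypothetical illustration, not Solr code):

```python
def build_spell_field(doc):
    # Simulate two copyField directives: values of both source fields
    # are appended to the multiValued destination field "spell".
    spell = []
    for source in ("keyword", "name"):
        value = doc.get(source)
        if value is not None:
            spell.extend(value if isinstance(value, list) else [value])
    return spell
```

So both sources should land in spell; if only keyword values show up as suggestions, the name source field is the first thing to check, as above.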
Re: Retrieve Matching Term
If your query looks like this - q=(myField:aaa myField:bbb myField:ccc) - you would get the desired results for any tokenized field (e.g. text) called myField. Cheers Avlesh On Tue, Oct 20, 2009 at 6:28 AM, angry127 angry...@gmail.com wrote: Hi, Is it possible to get the matching terms from your query for each document returned, without using highlighting? For example, if you have the query aaa bbb ccc and one of the documents has the term aaa and another document has the terms bbb and ccc, to have Solr return: Document 1: aaa Document 2: bbb ccc I was told this is possible using Term Vectors. I have not been able to find a way to do this using Term Vectors. The only reason I am against using highlighting is for performance reasons. Thanks. -- View this message in context: http://www.nabble.com/Retrieve-Matching-Term-tp25967886p25967886.html Sent from the Solr - User mailing list archive at Nabble.com.
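The per-document matched-terms report the question asks for can also be computed client-side once the documents' tokens are known, as a post-processing stand-in for inspecting term vectors (a sketch, not a Solr feature):

```python
def matching_terms(query_terms, doc_terms):
    # Which of the query's terms occur in this document's tokenized
    # field -- the client-side equivalent of reading term vectors.
    return [t for t in query_terms if t in doc_terms]

# Toy documents matching the example in the question:
docs = {"Document 1": {"aaa"}, "Document 2": {"bbb", "ccc"}}
```

For the query aaa bbb ccc this yields aaa for Document 1 and bbb, ccc for Document 2, which is the desired output without highlighting.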
Re: Adding callback url to data import handler...Is this possible?
But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback. I would say the latter is more specific than the former. People who are comfortable writing JAVA wouldn't need any of these but the second best thing for others would be a capability to handle it in their own applications. A url can be the simplest way to invoke things in respective application. Doing it via javascript sounds like a round-about way of doing it. Cheers Avlesh 2009/10/15 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com I can understand the concern that you do not wish to write Java code . But a callback url is a very specific requirement. We plan to extend javascript support to the EventListener callback . Will it help? On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm ... I think this is a valid use case and it might be a good idea to support it in someway. I will post this thread on the dev-mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.com wrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! 
The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
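Since DIH offers no callback today, the polling Bill describes has to live in the application. A small helper (hypothetical, not part of Solr or DIH) can decide from a /dataimport?command=status response whether the application should fire its own callback URL; the "busy"/"idle" status strings follow the DIH wiki's documented responses:

```python
import xml.etree.ElementTree as ET

def import_finished(status_xml: str) -> bool:
    # DIH's status endpoint reports "busy" while an import runs and
    # "idle" once it is done; anything other than "busy" counts as
    # finished here.
    root = ET.fromstring(status_xml)
    status = root.findtext(".//str[@name='status']", default="")
    return status != "busy"

# Trimmed stand-ins for real DIH status responses (illustrative):
BUSY = '<response><str name="status">busy</str></response>'
IDLE = '<response><str name="status">idle</str></response>'
```

The application would loop: fetch url/dataimport?command=status, pass the body to import_finished, and once it returns True, invoke its own callback URL and update its structures.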
One more happy Solr user ...
I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: One more happy Solr user ...
Ah! I knew that was coming :) We are planning a spell-checker integration pretty soon. Thanks for trying out the site Andrew. Cheers Avlesh On Wed, Oct 14, 2009 at 2:53 PM, Andrew McCombe eupe...@gmail.com wrote: Hi Nice site. First search I tried was for 'italien' in 'Mumbai' which returned zero results. Are you using spellcheck suggestions? Apart from that it's nice and fast. Regards Andrew McCombe iWebsolutions.co.uk 2009/10/14 Avlesh Singh avl...@gmail.com I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: One more happy Solr user ...
If burrp! can keep pace with Solr enhancements, we are not too far from a munich.burrp.com ;) Thanks for checking out the site, Chantal. Cheers Avlesh On Wed, Oct 14, 2009 at 4:47 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi Avlesh, that is mean, to send something like that http://mumbai.burrp.com/pack/list/kolkata-on-a-roll around at lunch time - in Germany(!). Very very sadly, there are many places in Mumbai that have mastered the art of making authentic Kolkata rolls but I don't know of any here in Munich. Congratulations for launching successfully! Chantal Avlesh Singh schrieb: I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: Sorting on Multiple fields
Do we attempt to raise some sort of functional query to find the least amount of the requested price id's? This would seem to imply some playing around in the query handler to allow a function of this sort. Unless I am missing something, this information can always be obtained by post-processing the data obtained from search results. Isn't it? Do we look at this rather than some internal method to handle the query and sort actions as a matter of relevancy on a calculated field? If so the methods of determining the fields included in the calculated field are eluding me at the moment. So pointers are welcome. I really did not understand the question. Is it related to sorting of results? Does this ultimately involve the implementation of some sort of custom type and handler to do this sort of task. If the answer to my previous question is affirmative, then yes, you would need to implement custom sorting behavior. It can be achieved in multiple ways depending upon your requirement. From something as simple as function-queries to using the power of dynamic fields to writing a custom field-type to writing a custom implementation of Lucene's Similarity ... any of these can be a potential answer to custom sorting. Cheers Avlesh On Wed, Oct 14, 2009 at 5:53 PM, Neil Lunn neil.l...@trixan.com wrote: We have come up against a situation we are trying to resolve in our Solr implementation project. This revolves mostly around how to sort results from index data we are likely to store in multiple fields but at runtime we are likely to query on the result of which one is most relevant. A brief example: We have product catalog information in the index which will have multiple prices dependent on the user logged in and other scenarios.
For simplification this will look something like this:

price_id101 = 100.00
price_id102 = 105.00
price_id103 = 110.00
price_id104 = 95.00
(etc)

What we are looking at is, at runtime, we want to know which one of several selected prices is the minimum (or maximum) - not all prices, just a select set of say 3 or 2 id's. The purpose we are looking at is to determine a sort order for the results. Approaching a SQL repository, as we would be aware, we would feed it some query logic to say find me the least amount of these set of id's; the search approach here therefore raises some questions. - Do we attempt to raise some sort of functional query to find the least amount of the requested price id's? This would seem to imply some playing around in the query handler to allow a function of this sort. - Do we look at this rather than some internal method to handle the query and sort actions as a matter of relevancy on a calculated field? If so, the methods of determining the fields included in the calculated field are eluding me at the moment, so pointers are welcome. - Does this ultimately involve the implementation of some sort of custom type and handler to do this sort of task? I am open to any response, as if someone has not come across a similar problem before and can suggest an approach we are willing to open up a patch branch or similar to do some work on the issue. Though if there are no suggestions this will likely move out of our current stream and into future development. Neil
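Avlesh's point that this can be obtained by post-processing can be sketched directly: compute, per document, the minimum over just the runtime-selected subset of price_id* fields and use it as the sort key (field naming follows Neil's example; the function is an illustration, not a Solr function query):

```python
def best_price(doc, selected_ids):
    # Minimum over a runtime-selected subset of the price_id* dynamic
    # fields; documents missing every selected price sort last.
    prices = [doc[f"price_id{i}"] for i in selected_ids if f"price_id{i}" in doc]
    return min(prices) if prices else float("inf")

doc = {"price_id101": 100.00, "price_id102": 105.00,
       "price_id103": 110.00, "price_id104": 95.00}
```

Sorting a result page by `lambda d: best_price(d, [102, 104])` then orders documents by the cheapest of just those two prices; doing the same inside Solr is where the function-query or custom-sort approaches Avlesh lists come in.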
Re: Adding callback url to data import handler...Is this possible?
Have you had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: Adding callback url to data import handler...Is this possible?
Hmmm ... I think this is a valid use case and it might be a good idea to support it in someway. I will post this thread on the dev-mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.comwrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: Dynamically compute document scores...
Options - 1. Can you pre-compute your business logic score at index time? If yes, then this value can be stored in some field and you can use function queries to combine this data with the score to return a value which you can sort upon. 2. Take a look at - http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html. Custom similarity implementations can be hooked up into Solr easily. Cheers Avlesh On Tue, Oct 13, 2009 at 9:05 PM, William Pierce evalsi...@hotmail.com wrote: Folks: During query time, I want to dynamically compute a document score as follows: a) Take the SOLR score for the document -- call it S. b) Lookup the business logic score for this document. Call it L. c) Compute a new score T = func(S, L) d) Return the documents sorted by T. I have looked at function queries but in my limited/quick review of it, I could not see a quick way of doing this. Is this possible? Thanks, - Bill
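One possible func(S, L) is a weighted linear blend, sketched below in Python. The blend and the 0.3 weight are illustrative choices, not anything prescribed by Solr; option 1 above would store L in a field at index time and express the same arithmetic as a function query rather than in application code:

```python
def combined_score(solr_score, business_score, weight=0.3):
    # T = func(S, L) as a weighted linear blend; weight is illustrative.
    return (1 - weight) * solr_score + weight * business_score

# (id, S, L): doc_b has the better business score, doc_a the better
# relevancy score.
docs = [("doc_a", 0.9, 0.1), ("doc_b", 0.5, 0.9)]
ranked = sorted(docs, key=lambda d: combined_score(d[1], d[2]), reverse=True)
```

With this weight, relevancy still dominates and doc_a ranks first; raising the weight shifts the ordering toward the business-logic score.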
Re: Scoring for specific field queries
Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. 
Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, fieldType name=autoComplete class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=autoComplete2 class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType My query is this, q=*:*fq=autoCompleteHelper:cha+autoCompleteHelper2:chaqf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost the my startswith query? Is it because of the n-gram filter? 
On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes
Re: Scoring for specific field queries
Can you just do q=autoCompleteHelper2:caf to see you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position 
considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, fieldType name=autoComplete class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType fieldType name=autoComplete2 class=solr.TextField positionIncrementGap=1 analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType My query is this, q=*:*fq=autoCompleteHelper:cha+autoCompleteHelper2:chaqf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. 
Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost the my
Re: Scoring for specific field queries
I have a very similar set-up for my auto-suggest (I am sorry that it can't be viewed from an external network). I am sending you my field definitions, please use them and see if it works out correctly.

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="tokenized_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestion" type="autocomplete" indexed="true" stored="false"/>
<field name="tokenized_suggestion" type="tokenized_autocomplete" indexed="true" stored="true"/>

q=(suggestion:formula^2 tokenized_suggestion:formula)

Hope this helps. Cheers Avlesh On Fri, Oct 9, 2009 at 1:03 PM, R. Tan tanrihae...@gmail.com wrote: Yeah, I do get results.
Anything else I missed out? I want it to work like this site's auto suggest feature. http://www.sematext.com/demo/ac/index.html Try the keyword 'formula'. Thanks, Rih On Fri, Oct 9, 2009 at 3:24 PM, Avlesh Singh avl...@gmail.com wrote: Can you just do q=autoCompleteHelper2:caf to see you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R. Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result: doc float name=score0.7821733/float str name=autoCompleteHelperBikes Café/str /doc doc float name=score0.7821733/float str name=autoCompleteHelperCafe Feliy/str /doc I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. 
KeywordTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **12** **13** **14** **15** **16** **...* *term text ** **th** **he** **e ** **c** **ch** **ha** **am** **mp** **pi* * **io** **on** **the** **he ** **e c** **ch** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **word** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **2,4** **3,5** **4,6** **5,7** **6,8 ** **7,9** **8,10** **9,11** **10,12** **0,3** **1,4** **2,5** **3,6** ** ...* WhitespaceTokenized: *term position ** **1** **2** **3** **4** **5** **6** **7** **8** **9** ** 10** **11** **...* *term text ** **th** **he** **the** **ch** **ha** **am** **mp** **pi** ** io** **on** **cha** **...* *term type ** **word** **word** **word** **word** **word** **word** **word ** **word** **word** **word** **word** **...* *source start,end ** **0,2** **1,3** **0,3** **0,2** **1,3** **2,4** **3,5 ** **4,6** **5,7** **6,8** **...* Is term position considered during scoring? Thanks, Rih On Fri, Oct 9, 2009 at 9:40 AM, Avlesh Singh avl...@gmail.com wrote: Use
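The two-field boosting idea in this thread can be approximated in Python. This is a rough sketch, not Lucene scoring: 0/1 hits times a boost, with the PatternReplace stripping of non-alphanumerics approximated by removing spaces; the boost of 2.0 mirrors the q=(suggestion:formula^2 tokenized_suggestion:formula) query:

```python
def edge_ngrams(term, min_gram=1, max_gram=100):
    # Index-time EdgeNGramFilter, roughly: every prefix of the token
    # whose length lies between min_gram and max_gram.
    return {term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)}

def suggest_score(query, name, exact_boost=2.0):
    # Whole-phrase prefix match (KeywordTokenizer field) outranks a
    # per-word prefix match (WhitespaceTokenizer field).
    q = query.lower().replace(" ", "")
    whole = q in edge_ngrams(name.lower().replace(" ", ""))
    token = any(q in edge_ngrams(w) for w in name.lower().split())
    return exact_boost * whole + 1.0 * token
```

For the query formula, a name like "Formula 1" matches both fields and scores 3.0, while "One Formula" matches only the tokenized field and scores 1.0, so suggestions that begin with the typed text come first, which is the behavior being debugged in this thread.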
Re: Scoring for specific field queries
What are the replacements for the special character and the 20-char pattern? I had no time to diff between your definitions and mine. Copy-pasting mine was easier :) Also, do you get results such as formula? The autocomplete field would definitely not match this query, but the tokenized autocomplete would. Give it a shot, it should work as you expect it to. Cheers Avlesh On Fri, Oct 9, 2009 at 1:25 PM, R. Tan tanrihae...@gmail.com wrote: Thanks, I'll give this a go. What are the replacements for the special character and the 20 char? Also, do you get results such as formula? On Fri, Oct 9, 2009 at 3:45 PM, Avlesh Singh avl...@gmail.com wrote: I have a very similar set-up for my auto-suggest (I am sorry that it can't be viewed from an external network). I am sending you my field definitions, please use them and see if it works out correctly.

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="tokenized_autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestion" type="autocomplete" indexed="true" stored="false"/>
<field name="tokenized_suggestion" type="tokenized_autocomplete" indexed="true" stored="true"/>

q=(suggestion:formula^2 tokenized_suggestion:formula)

Hope this helps. Cheers Avlesh On Fri, Oct 9, 2009 at 1:03 PM, R. Tan tanrihae...@gmail.com wrote: Yeah, I do get results. Anything else I missed out? I want it to work like this site's auto suggest feature. http://www.sematext.com/demo/ac/index.html Try the keyword 'formula'. Thanks, Rih On Fri, Oct 9, 2009 at 3:24 PM, Avlesh Singh avl...@gmail.com wrote: Can you just do q=autoCompleteHelper2:caf to see if you get results? Cheers Avlesh On Fri, Oct 9, 2009 at 12:53 PM, R. Tan tanrihae...@gmail.com wrote: Yup, it is. Both are copied from another field called name. On Fri, Oct 9, 2009 at 3:15 PM, Avlesh Singh avl...@gmail.com wrote: Lame question, but are you populating data in the autoCompleteHelper2 field? Cheers Avlesh On Fri, Oct 9, 2009 at 12:36 PM, R.
Tan tanrihae...@gmail.com wrote: The problem is, I'm getting equal scores for this: Query: q=(autoCompleteHelper2:caf^10.0 autoCompleteHelper:caf) Partial Result:

<doc>
  <float name="score">0.7821733</float>
  <str name="autoCompleteHelper">Bikes Café</str>
</doc>
<doc>
  <float name="score">0.7821733</float>
  <str name="autoCompleteHelper">Cafe Feliy</str>
</doc>

I'm using the standard request handler with this. Thanks, Rih On Fri, Oct 9, 2009 at 3:02 PM, R. Tan tanrihae...@gmail.com wrote: Avlesh, I don't see anything wrong with the data from analysis. [KeywordTokenized and WhitespaceTokenized field analysis output elided; it is the same output quoted at the top of this thread]
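The difference between the two suggestion fields in this thread can be sketched outside Solr. The following Python snippet is a simplified stand-in for the two analysis chains (KeywordTokenizer vs. WhitespaceTokenizer, each followed by lowercasing and EdgeNGramFilterFactory); it is illustrative only, not Solr's actual code, and the sample title is made up:

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=100):
    """Leading-edge n-grams, roughly what EdgeNGramFilterFactory produces."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def analyze_keyword(text):
    # KeywordTokenizer: the whole input is one token;
    # lowercase and strip non-alphanumerics, then n-gram it.
    token = re.sub(r"[^a-z0-9]", "", text.lower())
    return set(edge_ngrams(token))

def analyze_whitespace(text):
    # WhitespaceTokenizer: one token per word; lowercase and n-gram each.
    grams = set()
    for token in text.lower().split():
        grams.update(edge_ngrams(token))
    return grams

title = "Formula One Racing"
print("formula" in analyze_keyword(title))   # True  - prefix of the whole phrase
print("one" in analyze_keyword(title))       # False - not a prefix of the phrase
print("one" in analyze_whitespace(title))    # True  - prefix of a word
```

This is why a query on the keyword-tokenized field only matches suggestions from the very beginning of the phrase, while the whitespace-tokenized field matches from the start of any word; boosting the former ranks exact-prefix suggestions first.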
Re: Scoring for specific field queries
Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting,

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this, q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R.
Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct syntax for boolean search
q=+fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3)) +fieldname2:... Cheers Avlesh On Thu, Oct 8, 2009 at 7:40 PM, Elaine Li elaine.bing...@gmail.com wrote: Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
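A small helper can mechanize translating (word_a OR word_b) groups into the +fieldname:(+(…) +(…)) form above. This is an illustrative Python sketch, not part of Solr or Lucene; the field and word names are placeholders from the question:

```python
def boolean_field_query(field, groups):
    """Build a Lucene-style clause: every group is required (+),
    and within a group any word may match (OR is the default)."""
    clauses = " ".join("+(%s)" % " ".join(words) for words in groups)
    return "+%s:(%s)" % (field, clauses)

q = boolean_field_query("fieldname1",
                        [["word_a1", "word_b1"],
                         ["word_a2", "word_b2"],
                         ["word_a3", "word_b3"]])
print(q)  # +fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3))
```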
Re: Scoring for specific field queries
Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting, [autoComplete and autoComplete2 field definitions elided; they are quoted in full earlier in this thread] My query is this, q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query?
Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : Questions about synonyms and highlighting
4 - the same question for highlighting with lemmatisation? Settings for manage (all highlighted) == the two words <em>manage</em> and <em>management</em> are highlighted Settings for manage == the first word <em>manage</em> is highlighted but not the second: management There is no lemmatisation support in Solr as of now. The only support you get is stemming. Let me understand this correctly - you basically want the searches to happen with the stemmed base but want to selectively highlight the original and/or stemmed words. Right? If yes, then AFAIK, this is not possible. Search passes through your fields' analyzers (tokenizers and filters). Highlighters, typically, use the same set of analyzers and the behavior will be the same as in search; this essentially means that the keywords manage, managing, management and manager are REDUCED to manage for searches and highlighters. If this can be done at all, then the only place to enable your feature could be the Lucene highlighter APIs. Someone more knowledgeable can tell you, if that is possible. I have no idea about your #3, though my idea of handling accentuation is to apply an ISOLatin1AccentFilterFactory and get rid of them altogether :) I am curious to know the answer though. Cheers Avlesh On Wed, Oct 7, 2009 at 3:17 PM, Nourredine K. nourredin...@yahoo.com wrote: I'm not an expert on hit highlighting but please find some answers inline: Thanks Shalin for your answers. It helps a lot. I post again questions #3 and #4 for the others :) 3 - Is it possible, and if so how, can I configure Solr to set or not highlighting for tokens with diacritics? Settings for vélo (all highlighted) == the two words <em>vélo</em> and <em>velo</em> are highlighted Settings for vélo == the first word <em>vélo</em> is highlighted but not the second: velo 4 - the same question for highlighting with lemmatisation?
Settings for manage (all highlighted) == the two words <em>manage</em> and <em>management</em> are highlighted Settings for manage == the first word <em>manage</em> is highlighted but not the second: management Regards, Nourredine.
Re: Facet query pb
I have no idea what pb means, but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello I have a pb trying to retrieve a tree with facet use. I've got a field location_field. Each doc in my index has a location_field. The location field can be continent/country/city. I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieves docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character. Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com.
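Besides grouping the phrase as suggested above, the space itself must survive the trip through the URL, i.e. it has to be percent-encoded rather than left raw. A quick Python illustration (the server name is a placeholder, and this only shows the encoding step, not a real Solr request):

```python
from urllib.parse import urlencode

# Encode the fq parameter so the space in "NORTH AMERICA" survives the URL.
params = urlencode({"fq": "location_field:(NORTH AMERICA*)"})
url = "http://server/solr/select?" + params
print(url)

# The raw space never appears in the encoded query string.
assert " " not in params
```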
Re: Scoring for specific field queries
You would need to boost your startswith matches artificially for the desired behavior. I would do it this way - 1. Create a KeywordTokenized field with an n-gram filter. 2. Create a Whitespace tokenized field with an n-gram filter. 3. Search on both the fields, boosting matches for #1 over #2. Hope this helps. Cheers Avlesh On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. For example, I have multiple documents with titles containing the word champion. Some of the document titles start with the word champion and some are entitled we are the champions. The ones that start with the keyword need to rank first or score higher. Is there a way to do this? I'm using this query for an auto-suggest term feature where the keyword doesn't necessarily need to be the first word. Rihaed
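The effect of the three steps above can be mimicked in a few lines of Python: a document whose whole title starts with the prefix matches the keyword-tokenized field and collects the boost, while a document that merely contains the word only matches the whitespace-tokenized field. This is a toy scoring model, not Lucene's actual scoring; the titles and weights are invented:

```python
def score(title, prefix, keyword_boost=10.0, token_weight=1.0):
    """Toy score: keyword-field match = whole title starts with the prefix;
    token-field match = any single word starts with the prefix."""
    t = title.lower()
    s = 0.0
    if t.startswith(prefix):                    # keyword-tokenized field hit
        s += keyword_boost
    if any(w.startswith(prefix) for w in t.split()):  # whitespace field hit
        s += token_weight
    return s

titles = ["We Are the Champions", "Champion of the World"]
ranked = sorted(titles, key=lambda t: score(t, "cha"), reverse=True)
print(ranked)  # ['Champion of the World', 'We Are the Champions']
```

Both documents match, but the one that starts with the keyword outranks the one that merely contains it, which is exactly the ordering Rihaed asked for.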
Re: Scoring for specific field queries
I guess we don't need to depend on scores all the time. You can use custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, sort the results by the field in ascending order. Then the records which contain the keyword at an earlier position will come first. Warning: This is a bad idea for multiple reasons: 1. If the word computer occurs multiple times in a document, what would you do in that case? Is this dynamic field supposed to be multivalued? I can't even imagine what you would do if the word computer occurs in multiple documents multiple times. 2. Multivalued fields cannot be sorted upon. 3. One needs to know the unique number of such keywords before implementing, because you'll potentially end up creating that many fields. Cheers Avlesh On Thu, Oct 8, 2009 at 11:10 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Rihaed, I guess we don't need to depend on scores all the time. You can use custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, sort the results by the field in ascending order. Then the records which contain the keyword at an earlier position will come first. Regards, Sandeep R. Tan wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798657.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : wildcard searches
You are processing your tokens in the filter that you wrote. I am assuming it is the first filter being applied and removes the character 'h' from tokens. When you are doing that, you can preserve the original token in the same field as well. Because as of now, you are simply removing the character. Subsequent filters don't even know that there was an 'h' character in the original token. Since wildcard queries are not analyzed, the 'h' character in the query hésita* does NOT get removed at query time. This means that unless the original token was preserved in the field, it wouldn't find any matches. Does this help? Cheers Avlesh On Tue, Oct 6, 2009 at 2:02 PM, Angel Ice lbil...@yahoo.fr wrote: Hi. Thanks for your answers Christian and Avlesh. But I don't understand what you mean by: If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Could you explain this point please? Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, 5 October 2009, 20:30:54 Subject: Re: wildcard searches Zambrano is right, Laurent. The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
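Laurent's situation can be reproduced with a toy model: index-time analysis reduces Hésitation to esit, but a wildcard query's prefix is compared verbatim against the indexed terms, so only a prefix of the already-analyzed form matches. A Python sketch of this - the mute-h stripping, accent folding, and "drop a trailing ation" stemming are crude stand-ins for the real filters, not their actual implementations:

```python
import unicodedata

def index_analyze(word):
    """Crude stand-in for the index-time chain: lowercase, drop a mute
    leading 'h', strip accents, strip a trailing 'ation' (stemming)."""
    w = word.lower()
    if w.startswith("h"):
        w = w[1:]
    w = "".join(c for c in unicodedata.normalize("NFD", w)
                if unicodedata.category(c) != "Mn")  # remove combining accents
    if w.endswith("ation"):
        w = w[:-len("ation")]
    return w

indexed = {index_analyze("Hésitation")}  # {'esit'}

def wildcard_match(query):
    # Wildcard queries are NOT analyzed: the prefix is compared verbatim.
    prefix = query.rstrip("*")
    return any(term.startswith(prefix) for term in indexed)

print(wildcard_match("hésita*"))  # False - raw 'hésita' vs indexed 'esit'
print(wildcard_match("esi*"))     # True  - matches the indexed term
```

This is exactly why preserving the original (unanalyzed) token alongside the analyzed one, as suggested in the thread, makes hésita* match again.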
Re: Re : Re : wildcard searches
You are right, Angel. The problem would still persist. Why don't you consider putting the original data in some field? While querying, you can query on both the fields - the analyzed and the original one. Wildcard queries will not give you any results from the analyzed field but would match the data in your original field. Works? Cheers Avlesh On Tue, Oct 6, 2009 at 2:27 PM, Angel Ice lbil...@yahoo.fr wrote: Ah yes, got it. But I'm not sure this will solve my problem. Because I'm also using the IsoLatin1 filter, which removes the accented characters. So I will have the same problem with accented characters, because the original token is not stored with this filter. Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 6 October 2009, 10:41:56 Subject: Re: Re : wildcard searches You are processing your tokens in the filter that you wrote. I am assuming it is the first filter being applied and removes the character 'h' from tokens. When you are doing that, you can preserve the original token in the same field as well. Because as of now, you are simply removing the character. Subsequent filters don't even know that there was an 'h' character in the original token. Since wildcard queries are not analyzed, the 'h' character in the query hésita* does NOT get removed at query time. This means that unless the original token was preserved in the field, it wouldn't find any matches. Does this help? Cheers Avlesh On Tue, Oct 6, 2009 at 2:02 PM, Angel Ice lbil...@yahoo.fr wrote: Hi. Thanks for your answers Christian and Avlesh. But I don't understand what you mean by: If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Could you explain this point please? Laurent From: Avlesh Singh avl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, 5 October 2009, 20:30:54 Subject: Re: wildcard searches Zambrano is right, Laurent.
The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: A little help with indexing joined words
We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombe eupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
Re: A little help with indexing joined words
Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. Well, I don't see a reason as to why someone would need length-based normalization on such matches. I have always used omitNorms on fields with this filter. Yes, synonyms might be an answer when you have a limited number of such words (phrases) and their possible combinations. Cheers Avlesh On Mon, Oct 5, 2009 at 10:32 PM, Christian Zambrano czamb...@gmail.com wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. A query for borderland should have returned results though. It is difficult to troubleshoot why it didn't without knowing what query you used, and what kind of analysis is taking place. Have you tried using the analysis page on the admin section to see what tokens get generated for 'Borderlands'? Christian On 10/05/2009 11:01 AM, Avlesh Singh wrote: We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombe eupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned.
There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
Re: wildcard searches
No filters are applied to wildcard/fuzzy searches. Ah! Not like that ... I guess it is just that phrase searches using wildcards are not analyzed. Cheers Avlesh On Mon, Oct 5, 2009 at 10:42 PM, Christian Zambrano czamb...@gmail.com wrote: No filters are applied to wildcard/fuzzy searches. I couldn't find a reference to this in either the Solr or Lucene documentation but I read it in the Solr book from PACKT On 10/05/2009 12:09 PM, Angel Ice wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: wildcard searches
First of all, I know of no way of doing wildcard phrase queries. http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F When I said not filters, I meant TokenFilters which is what I believe you mean by 'not analyzed' Analysis is a Lucene way of configuring tokenizers and filters for a field (index time and query time). I guess, both of us mean the same thing. Cheers Avlesh On Mon, Oct 5, 2009 at 11:04 PM, Christian Zambrano czamb...@gmail.comwrote: Avlesh, I don't understand your answer. First of all, I know of no way of doing wildcard phrase queries. When I said not filters, I meant TokenFilters which is what I believe you mean by 'not analyzed' On 10/05/2009 12:27 PM, Avlesh Singh wrote: No filters are applied to wildcard/fuzzy searches. Ah! Not like that .. I guess, it is just that the phrase searches using wildcards are not analyzed. Cheers Avlesh On Mon, Oct 5, 2009 at 10:42 PM, Christian Zambranoczamb...@gmail.com wrote: No filters are applied to wildcard/fuzzy searches. I couldn't find a reference to this on either the solr or lucene documentation but I read it on the Solr book from PACKT On 10/05/2009 12:09 PM, Angel Ice wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example : - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word, result in the indexation of esit (the mute H is suppressed (home made filter), the accent too (IsoLatin1Filter), and the SnowballPorterFilter suppress the ation. When i search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. 
In fact, I have to put the wildcard in a manner that match the indexed term exactly (example esi*) Does the search engine applies the filters to the word that prefix the wildcard ? Or does it use this prefix verbatim ? Thanks for you help. Laurent
Re: wildcard searches
Zambrano is right, Laurent. The analyzers for a field are not invoked for wildcard queries. Your custom filter is not even getting executed at query time. If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work. Cheers Avlesh On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everyone, I have a little question regarding the search engine when a wildcard character is used in the query. Let's take the following example: - I have sent in indexation the word Hésitation (with an accent on the e) - The filters applied to the field that will handle this word result in the indexation of esit (the mute H is suppressed by a home-made filter, the accent by the IsoLatin1Filter, and the SnowballPorterFilter suppresses the ation). When I search for hesitation, esitation, ésitation etc ... all is OK, the document is returned. But as soon as I use a wildcard, like hésita*, the document is not returned. In fact, I have to put the wildcard in a manner that matches the indexed term exactly (example esi*) Does the search engine apply the filters to the word that prefixes the wildcard? Or does it use this prefix verbatim? Thanks for your help. Laurent
Re: A little help with indexing joined words
Zambrano, I was too quick to respond to your idf explanation. I definitely did not mean that idf and length-norms are the same thing. Andrew, this is how I would have done it - First, I would create a field type called prefix_text as underneath in my schema.xml

<fieldType name="prefix_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Second, I would declare a field of this type and populate it (using copyField) while indexing. Third, while querying I would query on both the fields. I would boost the matches for the original field to a large extent over the n-grammed field. In scenarios where Dragon Fly is expected to match against Dragonfly in the index, a query on the original field would not give you any matches, thereby bringing results from the prefix_text field right there on top. Hope this helps. Cheers Avlesh On Mon, Oct 5, 2009 at 11:10 PM, Christian Zambrano czamb...@gmail.com wrote: Would you mind explaining how omitNorms has any effect on the IDF problem I described earlier? I agree with your second sentence. I had to use the NGramTokenFilter to accommodate partial matches.
On 10/05/2009 12:11 PM, Avlesh Singh wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of token which will artificially increase the number of tokens in the index which in turn will affect the IDF score. Well, I don't see a reason as to why someone would need a length based normalization on such matches. I always have done omitNorms while using fields with this filter. Yes, synonyms might an answer when you have limited number of such words (phrases) and their possible combinations. Cheers Avlesh On Mon, Oct 5, 2009 at 10:32 PM, Christian Zambranoczamb...@gmail.com wrote: Using synonyms might be a better solution because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of token which will artificially increase the number of tokens in the index which in turn will affect the IDF score. A query for borderland should have returned results though. It is difficult to troubleshoot why it didn't without knowing what query you used, and what kind of analysis is taking place. Have you tried using the analysis page on the admin section to see what tokens gets generated for 'Borderlands'? Christian On 10/05/2009 11:01 AM, Avlesh Singh wrote: We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Borderland should have worked for a regular text field. For all other desired matches you can use EdgeNGramTokenizerFactory. Cheers Avlesh On Mon, Oct 5, 2009 at 7:51 PM, Andrew McCombeeupe...@gmail.com wrote: Hi I am hoping someone can point me in the right direction with regards to indexing words that are concatenated together to make other words or product names. 
We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results respectively. Where do I look to resolve this? The product name field is indexed using a text field type. Thanks in advance Andrew
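To see why the n-grammed prefix_text approach in this thread catches the Dragon Fly vs. Dragonfly case, note that both the indexed title and the query pass through a lowercase + strip-non-alphanumerics step before the prefix comparison, so the space in the query disappears. A Python approximation (the product titles are the made-up ones from the question; this is not Solr code, and the query-side 20-character truncation is omitted for brevity):

```python
import re

def normalize(text):
    """Lowercase and remove everything but a-z0-9, roughly what the
    KeywordTokenizer + PatternReplaceFilter chain would do."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def edge_ngram_match(title, query):
    # The index side keeps all leading n-grams of the normalized title,
    # so the normalized query matches iff it is a prefix of the title.
    return normalize(title).startswith(normalize(query))

print(edge_ngram_match("Dragonfly xx xxx", "Dragon Fly"))   # True
print(edge_ngram_match("Dragonfly xx xxx", "Border Land"))  # False
```

"Dragon Fly" normalizes to dragonfly, which is a leading n-gram of dragonflyxxxxx, so the n-grammed field matches even though a plain text field would not.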
Highlighting bean properties using DocumentObjectBinder - New feature?
Like most others, I use SolrJ and bind my beans with @Field annotations to read responses from Solr. For highlighting these properties in my bean, I always write a separate piece - get the list of highlights from the response and then use the Map<fieldName, List<highlights>> to put them back in my original bean. This evening, I tried creating an @Highlight annotation and modified the DocumentObjectBinder to understand this attribute (with a bunch of other properties). This is how it works: You can annotate your beans with @Highlight as underneath.

class MyBean {
  @Field @Highlight String name;

  @Field("solr_category_field_name") List<String> categories;
  @Highlight("solr_category_field_name") List<String> highlightedCategories;

  @Field float score;
  ...
}

and use QueryResponse#getBeans(MyBean.class) to achieve both - object binding as well as highlighting. I was wondering if this can be of help to most users or not. Can this be a possible enhancement in DocumentObjectBinder? If yes, I can write a patch. Cheers Avlesh
Re: Usage of Sort and fq
/?q=*:*&fq=category:animal&sort=child_count%20asc Search for all documents, filter the ones that belong to the category animal, and sort ascending by a field called child_count that contains the number of children for each animal. You can pass multiple fq's with more fq=... parameters. Secondary, tertiary sorts can be specified using comma (,) as the separator, i.e. sort=fieldA asc,fieldB desc,fieldC asc, ... Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can someone let me know how to use the sort and fq parameters in Solr. Any examples would be appreciated. Regards Bhaskar
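Putting the above together, a request with two filters and a compound sort might look like this (the field names are made up for illustration):

```text
/select?q=*:*&fq=category:animal&fq=region:asia&sort=child_count%20asc,score%20desc
```

Each fq is applied as an independent filter, and the sort falls back to score for documents with equal child_count.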
Re: Questions on RandomSortField
Thanks Hoss! The approach that I explained in my subsequent email works like a charm. Cheers Avlesh On Wed, Sep 30, 2009 at 3:45 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : The question was either non-trivial or heavily uninteresting! No replies yet it's pretty non-trivial, and pretty interesting, but i'm also pretty behind on my solr-user email. I don't think there's any way to do what you wanted without a custom plugin, so your efforts weren't in vain ... if we add the ability to sort by a ValueSource (aka function ... there's a Jira issue for this somewhere) then you could also do it with a combination of functions so that anything in your category gets flattened to an extremely high constant, and everything else has a real score -- then a secondary sort on a random field would effectively only randomize the things in your category ... but we're not there yet. : Hoss, I have a small question (RandomSortField bears your signature) - Any : reason as to why RandomSortField#hash() and RandomSortField#getSeed() : methods are private? Having them public would have saved me from : owning a copy in my class as well. just a general principle of API future-proofing: keep internals private unless you explicitly think through how subclasses will use them. I haven't thought it through all the way, but do you really need to copy everything? couldn't you get the SortField/Comparator from super and only delegate to it if the categories both match your specific categoryId? -Hoss
Re: Regular expression not working
Such questions are better answered on the user mailing list; you don't need to post them on the dev list. What matches an incoming query is largely a function of your field type definition and the way you analyze your field data at query time and index time. Copy-paste your field and its type definition from schema.xml. Cheers Avlesh On Mon, Sep 28, 2009 at 8:56 PM, Siddhartha Pahade pahade@gmail.com wrote: Hi guys, My search result is Gilmore Girls. If I search on Gilmore, it gives me the result Gilmore Girls in the output, as desired. However, if I search on the string gilmore* or gilm, it does not work, whereas we want it to. Any help highly appreciated. Thanks!
Re: Unsubscribe from this mailing-list
You seem to be desperate to get out of the Solr mailing list :) Send an email to solr-user-unsubscr...@lucene.apache.org Cheers Avlesh On Fri, Sep 25, 2009 at 11:54 AM, Rafeek Raja rafeek.r...@gmail.com wrote: Unsubscribe from this mailing-list
Highlighting on text fields
I am new to the whole highlighting API and have a few basic questions: I have a text type field defined as underneath:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

And the schema field is associated as follows:

<field name="text_entity_name" type="text" indexed="true" stored="false"/>

My query, q=text_entity_name:(foo bar)&hl=true&hl.fl=text_entity_name, works fine for the search part but not for highlighting. The highlight named list is empty for each document returned back. I have a unique key defined. What am I missing? Do I need to store term vectors for highlighting to work properly? Cheers Avlesh
Highlighting not working on a prefix_token field
I have a prefix_token field defined as underneath in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searches on the field work fine and as expected. However, attempts to highlight on this field do not yield any results. Highlighting on other fields works fine. Any clues? I am using Solr 1.3. Cheers Avlesh
Re: Highlighting not working on a prefix_token field
Hmmm .. but ngrams with KeywordTokenizerFactory instead of the WhitespaceTokenizerFactory work just fine. Related issues? Cheers Avlesh On Wed, Sep 23, 2009 at 12:27 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh avl...@gmail.com wrote: I have a prefix_token field defined as underneath in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searches on the field work fine and as expected. However, attempts to highlight on this field do not yield any results. Highlighting on other fields works fine. Won't work until SOLR-1268 comes along. http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible -- Regards, Shalin Shekhar Mangar.
Re: Highlighting not working on a prefix_token field
I'm sorry I don't understand the question. Do you mean to say that highlighting works with one but not with another? Yes. Cheers Avlesh On Wed, Sep 23, 2009 at 12:59 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 12:31 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm .. But ngrams with KeywordTokenizerFactory instead of the WhitespaceTokenizerFactory work just as fine. Related issues? I'm sorry I don't understand the question. Do you mean to say that highlighting works with one but not with another? -- Regards, Shalin Shekhar Mangar.
Re: Overlapping zipcodes
Range queries? Cheers Avlesh On Mon, Sep 21, 2009 at 2:57 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote: We are in a situation where we are trying to match up documents based on a number of zipcodes. In our case, zipcodes are just integers, so that hopefully simplifies things. So, we might have a document listing a number of zipcodes: 1200-1450,2000,5000-5999 and we want to do a search of 1100-1300,8000 and have it match the document. How can this be done using Solr? Thanks, Anders.
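One hedged way to make Anders' case work with plain range queries is to expand each document's zipcode ranges into individual integer values in a multiValued field at index time (the field name is illustrative); the search ranges then become ordinary range clauses:

```text
Indexed (multiValued int field "zip"): 1200, 1201, ..., 1450, 2000, 5000, ..., 5999
Query: q=zip:([1100 TO 1300] OR 8000)
```

Expansion makes the index larger but the queries trivial; for very wide ranges a dedicated range-overlap encoding would be needed instead.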
Re: Questions on RandomSortField
The question was either non-trivial or heavily uninteresting! No replies yet :) Thankfully, I figured out a solution for the problem at hand. For people who might be looking for a solution, here it goes -
1. Extend the RandomSortField to create your own YourCustomRandomField.
2. Override the RandomSortField#getSortField method to return YourSortField.
3. Return YourSortComparatorSource from YourSortField#getFactory().
4. Most of the rules related to the problem statement would be handled in YourSortComparatorSource#newComparator().
5. In your schema, create a dynamic field of YourFieldType. Pass in the id (look at the problem statement in the trailing post) as a part of the dynamic field name in your sort query.
6. Inside YourSortComparatorSource#newComparator(), get the above mentioned id from the fieldName parameter and then fetch the values indexed in this field using Lucene's FieldCache.
7. In your ScoreDocComparator#compare(), first check for the values in the id field and return -1, 1, 0 or hash(i.doc + seed) - hash(j.doc + seed) based on the values in this field. The idea is to only randomize results for a particular id value.
Hoss, I have a small question (RandomSortField bears your signature) - any reason why the RandomSortField#hash() and RandomSortField#getSeed() methods are private? Having them public would have saved me from owning a copy in my class as well. My solution applies to Solr 1.3. It might not hold true for higher versions as the underlying Lucene APIs might have changed. Cheers Avlesh On Sun, Sep 20, 2009 at 4:28 PM, Avlesh Singh avl...@gmail.com wrote: I am using Solr 1.3. I have a solr.RandomSortField type dynamic field which I use to randomize my results. I am in a tricky situation. I need to randomize only certain results in my Hits. To elaborate, I have an integer field called category_id. When performing a query, I need to get results from all categories and place the ones from SOME_CAT_ID at the top.
I achieved this by populating a separate dynamic field while indexing data, i.e. when a doc is added to the index, a field called dynamic_cat_id_SOME_CAT_ID is populated with its category id. While querying, I know the value of SOME_CAT_ID, so adding sort=dynamic_cat_id_SOME_CAT_ID asc, score desc to my query works absolutely fine. So far so good. I am now supposed to randomize the results for category_id=SOME_CAT_ID, i.e. the results at the top. My understanding is that adding sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED asc*, score desc to the query would randomize all the results. This is not desired. I only want to randomize the ones at the top (category_id=SOME_CAT_ID); the rest should be ordered based on relevance score. Two simple questions: 1. Is there a way to achieve this without writing any custom code? 2. If the answer to #1 is no, then where should I start? I glanced at the RandomSortField class but could not figure out how to proceed. Do I need to create a custom FieldType? Can I extend the RandomSortField and override the sorting behaviour? Any help would be appreciated. Cheers Avlesh
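The heart of step 7 above - shuffle only the documents in the target category by a seeded hash, and order everything else by score - can be sketched in plain Java, detached from the Lucene comparator machinery. All class and field names here are hypothetical simplifications, not the actual plugin code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CategoryRandomSort {
    static class Doc {
        final int id;
        final Integer categoryId; // value from the dynamic_cat_id_* field, or null
        final float score;
        Doc(int id, Integer categoryId, float score) {
            this.id = id; this.categoryId = categoryId; this.score = score;
        }
    }

    // Mirrors RandomSortField's idea: a cheap deterministic hash of doc id + seed.
    static int hash(int x) {
        x = x * 0x27d4eb2d;
        return x ^ (x >>> 15);
    }

    /**
     * Docs in targetCategory sort first, shuffled deterministically by seed;
     * all other docs follow, ordered by descending score.
     */
    static Comparator<Doc> comparator(int targetCategory, int seed) {
        return (a, b) -> {
            boolean aIn = a.categoryId != null && a.categoryId == targetCategory;
            boolean bIn = b.categoryId != null && b.categoryId == targetCategory;
            if (aIn != bIn) return aIn ? -1 : 1;  // target category always first
            if (aIn) return Integer.compare(hash(a.id + seed), hash(b.id + seed)); // randomized block
            return Float.compare(b.score, a.score); // everyone else: score desc
        };
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(Arrays.asList(
            new Doc(1, 7, 0.2f), new Doc(2, null, 0.9f),
            new Doc(3, 7, 0.8f), new Doc(4, null, 0.5f)));
        docs.sort(comparator(7, 42));
        // Docs 1 and 3 (category 7) come first in a seed-dependent order,
        // then docs 2 and 4 by descending score.
        for (Doc d : docs) System.out.println(d.id);
    }
}
```

Changing the seed reshuffles only the category-7 block; the tail ordering stays stable, which is exactly the behavior the sort query above could not express.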
Questions on RandomSortField
I am using Solr 1.3. I have a solr.RandomSortField type dynamic field which I use to randomize my results. I am in a tricky situation. I need to randomize only certain results in my Hits. To elaborate, I have an integer field called category_id. When performing a query, I need to get results from all categories and place the ones from SOME_CAT_ID at the top. I achieved this by populating a separate dynamic field while indexing data, i.e. when a doc is added to the index, a field called dynamic_cat_id_SOME_CAT_ID is populated with its category id. While querying, I know the value of SOME_CAT_ID, so adding sort=dynamic_cat_id_SOME_CAT_ID asc, score desc to my query works absolutely fine. So far so good. I am now supposed to randomize the results for category_id=SOME_CAT_ID, i.e. the results at the top. My understanding is that adding sort=dynamic_cat_id_SOME_CAT_ID asc, *my_dynamic_random_field_SOME_SEED asc*, score desc to the query would randomize all the results. This is not desired. I only want to randomize the ones at the top (category_id=SOME_CAT_ID); the rest should be ordered based on relevance score. Two simple questions: 1. Is there a way to achieve this without writing any custom code? 2. If the answer to #1 is no, then where should I start? I glanced at the RandomSortField class but could not figure out how to proceed. Do I need to create a custom FieldType? Can I extend the RandomSortField and override the sorting behaviour? Any help would be appreciated. Cheers Avlesh
Re: Need help to finalize my autocomplete
Instead of <tokenizer class="solr.WhitespaceTokenizerFactory"/> use <tokenizer class="solr.KeywordTokenizerFactory"/> Cheers Avlesh 2009/9/16 Vincent Pérès vincent.pe...@gmail.com Hello, I'm using the following code for my autocomplete feature. The field type:

<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

The field:

<dynamicField name="*_ac" type="autoComplete" indexed="true" stored="true"/>

The query: ?q=*:*&fq=query_ac:harry*&wt=json&rows=15&start=0&fl=*&indent=on&fq=model:SearchQuery It gives me a list of results I can parse and use with the jQuery autocomplete plugin, and all that works very well. Example of results: harry / harry potter the last fighting harry / harry potter 5 / comic relief harry potter. What I would like to do now is to only have results starting with the query, so it should be: harry / harry potter / harry potter 5. Can anybody tell me if this is possible and, if so, how to do it? Thank you! Vincent -- View this message in context: http://www.nabble.com/Need-help-to-finalize-my-autocomplete-tp25468885p25468885.html Sent from the Solr - User mailing list archive at Nabble.com.
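With the suggested swap applied to both analyzer chains, the field type treats the whole phrase as a single token, so edge n-grams are only generated from the start of the phrase and "harry" no longer matches mid-phrase suggestions. A sketch keeping Vincent's original filters (gram sizes and the pattern filter unchanged from his mail):

```xml
<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizerFactory keeps the whole phrase as one token, so
         EdgeNGramFilterFactory only emits prefixes of the full phrase. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>
```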
Re: Creating facet query using SolrJ
When constructing a query, I create a Lucene query and use query.toString to create the SolrQuery. Go through this thread - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query I am facing difficulty while creating a facet query for an individual field, as I could not find an easy and clean way of constructing a facet query with parameters specified at field level. Per-field overrides for facet params using SolrJ are not supported yet. However, you can always use solrQuery.set("f.myField.facet.limit", 10) ... to pass field-specific facet params to the SolrServer. Cheers Avlesh On Wed, Sep 9, 2009 at 2:42 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: hello, I am using SolrJ to access Solr indexes. When constructing a query, I create a Lucene query and use query.toString to create the SolrQuery. I am facing difficulty while creating a facet query for an individual field, as I could not find an easy and clean way of constructing a facet query with parameters specified at field level. As I understand, the faceting parameters like limit, sort order etc. can be set on the SolrQuery object, but they are used for all the facets in a query. I would like to provide these parameters separately for each field. I am currently building such a query in Java code using string appends, but it looks really bad and would be prone to breaking if the query syntax changes in future. Is there any better way of constructing such detailed facet queries, the way we build the main Solr search query? regards, aakash
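The same per-field overrides can be written as raw request parameters; SolrJ's set() simply passes them through unchanged. An illustrative request fragment (field names assumed for the example):

```text
&facet=true&facet.field=category&facet.field=brand
&f.category.facet.limit=10
&f.brand.facet.limit=5&f.brand.facet.mincount=1
```

Any facet parameter prefixed with f.<fieldName>. overrides the global value for that field only.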
Re: Facet search field returning results on split words
Your field needs to be untokenized for the expected results. Faceting on the text field that you use to search will give you facets like these. You can index the same data in another string field and facet on that field. PS: You can use copyField to copy data at index time from one field to another. Cheers Avlesh On Fri, Sep 4, 2009 at 6:21 PM, EwanH drldgt...@sneakemail.com wrote: Hi, I have a Solr search where a particular field named location is a place name. I have the field indexed and stored. It is quite likely that a field value could comprise more than one term, or at least 2 words split by a space, such as Burnham Market. Now if I search on location:burnham I get the appropriate docs returned ok, but the facet results return <lst name="location"><int name="burnham">2</int><int name="thorp">2</int></lst> i.e. values for both words, which I don't want. What can I do about this? Can I somehow escape the space when adding the data for indexing? -- Ewan
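A minimal sketch of that setup (field and type names are illustrative): search against the tokenized field, facet on an untokenized string copy.

```xml
<field name="location" type="text" indexed="true" stored="true"/>
<!-- Untokenized copy used only for faceting, so "Burnham Market" stays whole -->
<field name="location_facet" type="string" indexed="true" stored="false"/>
<copyField source="location" dest="location_facet"/>
```

Queries keep using location:burnham; facet.field=location_facet then returns whole place names instead of individual words.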
Re: how to scan dynamic field without specifying each field in query
I don't have that answer, as I was asking a general question, not one for a specific situation I am encountering. I can understand :) What I am essentially asking for is: is there a short, simple and generic method/technique to deal with large numbers of dynamic fields (rather than having to specify each and every test on each and every dynamic field) in a query? Not as of now. There are a lot of open issues in Solr aiming to handle dynamic fields in an intuitive way. SolrJ has already been made capable of binding dynamic field content into Java beans (https://issues.apache.org/jira/browse/SOLR-1129). Faceting on myField_* (https://issues.apache.org/jira/browse/SOLR-1387) and adding SolrDocuments with Map<String, String> myField_* (https://issues.apache.org/jira/browse/SOLR-1357) are just some of the enhancements on the way. What originally prompted this question is I was looking at FunctionQueries (http://wiki.apache.org/solr/FunctionQuery) and started to wonder if there was some way to create my own functions to handle dynamic fields. I don't think you need function queries here. Function queries are supposed to return a score for a document based on their ValueSource. What you probably need is a custom QueryParser. Cheers Avlesh On Fri, Sep 4, 2009 at 9:48 PM, gdeconto gerald.deco...@topproducer.com wrote: I don't have that answer, as I was asking a general question, not one for a specific situation I am encountering. What I am essentially asking for is: is there a short, simple and generic method/technique to deal with large numbers of dynamic fields (rather than having to specify each and every test on each and every dynamic field) in a query? What originally prompted this question is I was looking at FunctionQueries (http://wiki.apache.org/solr/FunctionQuery) and started to wonder if there was some way to create my own functions to handle dynamic fields. Aakash Dharmadhikari wrote: what all other searches would you like to perform on these fields? ...
Re: Schema for group/child entity setup
Well, you are talking about a very relational behavior, Tan. You can declare a locations and a location_* field in your schema. While indexing a document, put all the locations inside the field locations. Populate location_state, location_city etc. with their corresponding location values. That way, when no filter is applied, you can facet on the locations field to get all the locations. In all other scenarios, when a filter on field foo is applied, faceting on location_foo will give you the desired results. Cheers Avlesh On Fri, Sep 4, 2009 at 10:16 PM, R. Tan tanrihae...@gmail.com wrote: I can't because there are facet values for each location, such as state/city/neighborhood and facilities. Example result is 7 Eleven, 100 locations when no location filters are applied; where there is a filter for state, it should show 7 Eleven, 20 locations. On Fri, Sep 4, 2009 at 11:57 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: Can't you store the locations as part of the parent listing while storing? This way there would be only one document per parent listing, and all the location related information can be multi-valued attributes per property, or any other way depending on the attributes. 2009/9/3 R. Tan tanrihae...@gmail.com Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have. The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once, but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward, but the locations within a result are quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
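A rough schema sketch of that suggestion (all names illustrative, multiValued string fields assumed):

```xml
<!-- All locations of a business, for the unfiltered facet -->
<field name="locations" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- Per-filter variants: location_state, location_city, ... -->
<dynamicField name="location_*" type="string" indexed="true" stored="false" multiValued="true"/>
```

A 7-Eleven document would then carry every branch in locations, while location_state, location_city etc. hold the values to facet on once the corresponding filter is active.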
Re: Schema for group/child entity setup
But, as I've discovered the field collapsing feature recently (although I haven't tested it), can't it solve this requirement? Off the top of my head, no. The answer might change on deeper thinking. It is one of the most popular features which is yet to be incorporated into Solr. Cheers Avlesh On Fri, Sep 4, 2009 at 10:58 PM, R. Tan tanrihae...@gmail.com wrote: Hmmm, interesting solution. But, as I've discovered the field collapsing feature recently (although I haven't tested it), can't it solve this requirement? On Sat, Sep 5, 2009 at 1:14 AM, Avlesh Singh avl...@gmail.com wrote: Well, you are talking about a very relational behavior, Tan. You can declare a locations and a location_* field in your schema. While indexing a document, put all the locations inside the field locations. Populate location_state, location_city etc. with their corresponding location values. That way, when no filter is applied, you can facet on the locations field to get all the locations. In all other scenarios, when a filter on field foo is applied, faceting on location_foo will give you the desired results. Cheers Avlesh On Fri, Sep 4, 2009 at 10:16 PM, R. Tan tanrihae...@gmail.com wrote: I can't because there are facet values for each location, such as state/city/neighborhood and facilities. Example result is 7 Eleven, 100 locations when no location filters are applied; where there is a filter for state, it should show 7 Eleven, 20 locations. On Fri, Sep 4, 2009 at 11:57 PM, Aakash Dharmadhikari aaka...@gmail.com wrote: Can't you store the locations as part of the parent listing while storing? This way there would be only one document per parent listing, and all the location related information can be multi-valued attributes per property, or any other way depending on the attributes. 2009/9/3 R. Tan tanrihae...@gmail.com Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have.
The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once, but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward, but the locations within a result are quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
Re: how to create a custom queryparse to handle new functions
You do not need to create a custom query parser for this. You just need to create a custom function query. Look at one of the existing function queries in Solr as an example. This is where the need originates from - http://www.lucidimagination.com/search/document/a4bb0dfee53f7493/how_to_scan_dynamic_field_without_specifying_each_field_in_query Within the function, the intent is to rewrite incoming parameter into a different query. Can this be done? AFAIK, not. Cheers Avlesh On Sat, Sep 5, 2009 at 3:21 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Sep 5, 2009 at 2:15 AM, gdeconto gerald.deco...@topproducer.com wrote: Can someone point me in the general direction of how to create a custom queryparser that would allow me to create custom query commands like this: http://localhost:8994/solr/select?q=myfunction(http://localhost:8994/solr/select?q=myfunction%28 http://localhost:8994/solr/select?q=myfunction%28‘Foo’, 3) or point me towards an example? note that the actual functionality of myfunction is not defined. I am just wondering if this sort of extensibility is possible. You do not need to create a custom query parser for this. You just need to create a custom function query. Look at one of the existing function queries in Solr as an example. -- Regards, Shalin Shekhar Mangar.