from:"Sandeep Mestry"

Re: Sorting in solr

2016-07-11 Thread Sandeep Mestry

Hi Naveen,

I am not too sure what you're after but the sorting mechanism is applied
after search results are fetched.

From the Solr Ref Guide:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

The sort parameter *arranges search results* in either ascending (asc) or
descending (desc) order.
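
To make that concrete, here is a minimal sketch of passing sort on a request. The core name (collection1) and the field names (price, id) are made up for illustration; q, sort and wt are the actual Solr parameters, and the helper itself is hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, sort_clauses):
    """Return a /select URL whose sort parameter lists (field, direction) pairs."""
    for _, direction in sort_clauses:
        if direction not in ("asc", "desc"):
            raise ValueError(f"sort direction must be 'asc' or 'desc', got {direction!r}")
    params = {"q": query, "wt": "json"}
    if sort_clauses:
        # Solr expects comma-separated "field direction" clauses, e.g. "price desc,id asc".
        params["sort"] = ",".join(f"{field} {direction}" for field, direction in sort_clauses)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and fields: sort matching documents by price, then by id as a tie-breaker.
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    "*:*",
    [("price", "desc"), ("id", "asc")],
)
print(url)
```

The sort only orders the documents that match q; it does not change which documents match.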

Thanks,
Sandeep

On 11 July 2016 at 11:13, Naveen Pajjuri wrote:

> Hi,
> If I apply a sort order in Solr, when are the documents sorted?
>
> 1. Are documents sorted after the results are fetched?
> 2. Or do we get documents that are already sorted?
>
> Regards,
> Naveen
>

Re: Many to Many Mapping with Solr

2016-05-01 Thread Sandeep Mestry

Thanks Alexandre. I am also of the opinion that Solr should not be used the
RDBMS way, but I am concerned about updates to the indexes. We're expecting
around 500 writes per second to the database, which will generate more than
500 updates to the index per second. If the entities are denormalised this
will have an impact on performance, hence I was inclined to design it like
the database.

Joel,
I will explain my use cases in a bit more detail; all of these should be
driven by the search engine:

1) user logs in and the system should display all recordings for that user
2) user adds a recording, the system is updated with the additional
recording
3) user removes a recording, the system is updated with the recording
removed.
4) when the user searches for a recording, the system should only display
matches in his recordings. Every user-recording mapping has additional
properties which are also searchable attributes.

Here we are talking about 2M users and 500M recordings, and this is
currently driven by a database of ~60-80GB.

I am going to do a small poc for these use cases and I will go with
denormalised entities with search requirements as my main focus. However,
if you have anything more to add, do let me know. I will be grateful.
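
For the PoC, this is roughly the shape I have in mind: one Solr document per user-recording pair, with the mapping-table columns carried on the same document. This is only a sketch; every field name below (user_id, recording_id, recording_title, the ur_ prefix) and both helper functions are hypothetical, nothing here comes from an existing schema.

```python
def user_recording_doc(user_id, recording, mapping):
    """Build one denormalised document for a single user-recording pair."""
    doc = {
        # Composite id, so adding or removing a recording for a user touches one document.
        "id": f"{user_id}_{recording['id']}",
        "user_id": user_id,
        "recording_id": recording["id"],
        "recording_title": recording["title"],
    }
    # Columns from the mapping table (e.g. type, number) become searchable fields.
    doc.update({f"ur_{name}": value for name, value in mapping.items()})
    return doc


def search_params(user_id, text):
    """Query parameters that restrict a text search to one user's recordings."""
    return {"q": text, "fq": f"user_id:{user_id}"}


doc = user_recording_doc(42, {"id": "r7", "title": "News at Ten"}, {"type": "series", "number": 3})
print(doc)
print(search_params(42, "news"))
```

The trade-off is the one discussed above: a change to a recording's own data means rewriting every user-recording document that carries it.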

Many Thanks,
Sandeep

On 29 April 2016 at 14:54, Joel Bernstein <joels...@gmail.com> wrote:

> We really still need to know more about your use case. In particular what
> types of questions will you be asking of the data? It's useful to do this
> in plain english without mapping to any specific implementation.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 29, 2016 at 9:43 AM, Alexandre Rafalovitch <arafa...@gmail.com
> >
> wrote:
>
> > You do not structure Solr to represent your database. You structure it
> > to represent what you will search.
> >
> > In your case, it sounds like you want to return 'user-records', in
> > which case you will index the related information all together. Yes,
> > you will possibly need to recreate the multiple documents when you
> > update one record (or one user). And yes, you will have the same
> > information multiple times. But you can use index-only values or
> > docvalues to reduce storage and duplication.
> >
> > You may also want to have Solr return only the relevant IDs from the
> > search and you recreate the m-to-m object structure from the database.
> > Then, you don't need to store much at all, just index.
> >
> > Basically, don't think about your database as much when deciding Solr
> > structure. It does not map one-to-one.
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 29 April 2016 at 20:48, Sandeep Mestry <sanmes...@gmail.com> wrote:
> > > Hi All,
> > >
> > > Hope the day is going on well for you.
> > >
> > > This question has been asked before, but I couldn't find an answer to my
> > > specific request. I have a many-to-many relationship and the mapping table
> > > has additional columns. What's the best way I can model this into a Solr
> > > entity?
> > >
> > > For example: a user has many recordings and a recording belongs to many
> > > users. But each user-recording has additional features like type, number etc.
> > > I'd like to fetch recordings for the user. If the user adds/updates/
> > > deletes a recording then that should be reflected in the search.
> > >
> > > I have 2 options:
> > > 1) to create user entity, recording entity and user_recording entity
> > > - this is good but it's like treating Solr like an RDBMS, which I mostly avoid.
> > >
> > > 2) user entity containing all the recordings information and each recording
> > > containing user information
> > > - this has an impact on index size but the fetch and manipulation will be
> > > faster.
> > >
> > > Any guidance will be good..
> > >
> > > Thanks,
> > > Sandeep
> >
>

Many to Many Mapping with Solr

2016-04-29 Thread Sandeep Mestry

Hi All,

Hope the day is going on well for you.

This question has been asked before, but I couldn't find an answer to my
specific request. I have a many-to-many relationship and the mapping table
has additional columns. What's the best way I can model this into a Solr
entity?

For example: a user has many recordings and a recording belongs to many
users. But each user-recording has additional features like type, number etc.
I'd like to fetch recordings for the user. If the user adds/updates/
deletes a recording then that should be reflected in the search.

I have 2 options:
1) to create user entity, recording entity and user_recording entity
- this is good but it's like treating Solr like an RDBMS, which I mostly avoid.

2) user entity containing all the recordings information and each recording
containing user information
- this has an impact on index size but the fetch and manipulation will be
faster.

Any guidance will be good..

Thanks,
Sandeep

Re: Newbie SolR - Need advice

2013-07-03 Thread Sandeep Mestry

+1


On 3 July 2013 14:58, Jack Krupansky j...@basetechnology.com wrote:

 Design your own application layer for both indexing and query that knows
 about both SQL and Solr. Give it a REST API and then your client
 applications can talk to your REST API and not have to care about the
 details of Solr or SQL. That's the best starting point.


 -- Jack Krupansky

 -Original Message- From: fabio1605
 Sent: Wednesday, July 03, 2013 4:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Newbie SolR - Need advice


 Hi Sandeep

 Thank you for your reply

 I'll have a read through the tutorials now that I understand the principle of
 all this.

 I would ideally like to keep MSSQL and bolt Solr on top of it, so that we
 can keep MSSQL as we have a 200GB database.

 Cheers



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4075026.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Newbie SolR - Need advice

2013-07-02 Thread Sandeep Mestry

Hi Fabio,

No, Solr isn't a database replacement for MS SQL.
Solr is built on top of Lucene, which is a search engine library for text
searches.

Solr in itself is not a replacement for any database as it does not support
any relational DB features. However, as Jack and David mentioned, it's a fully
optimised search engine platform that can provide all search-related
features like faceting, highlighting etc.
Solr does not have a *database*. It stores the data in binary files called
indexes (http://lucene.apache.org/core/3_0_3/fileformats.html). These
indexes are populated with the data from the database. Solr provides
built-in functionality, through the DataImportHandler component, to fetch
the data and generate the indexes.

When you say your web servers are mainly doing a search function, do you
mean it is a text search and you use queries with clauses such as 'like', 'in'
etc. (in addition to multiple joins) to get the results? Does the web
application need faceting? If yes, then Solr can be your friend to get you
through.

Do remember that it always takes some time to get the new concepts from
understanding through to implementation. As David mentioned already, it
*is* going to be a bumpy ride at the start but *definitely* a sensational
one.

Good Luck,
Sandeep



On 2 July 2013 17:09, fabio1605 fabio.to...@btinternet.com wrote:

 Thanks guys

 So SolR is actually a database replacement for mssql...  Am I right


 We have a lot of perl scripts that contains lots of sql insert
 queries. Etc


 How do we query the SolR database from scripts  I know I have a lot to
 learn still so excuse my ignorance.

 Also...  What is mongo and how does it compare

 I just don't understand how in 10years of Web development I have never
 heard of SolR till last week




 Sent from Samsung Mobile

  Original message 
 From: David Quarterman [via Lucene] 
 ml-node+s472066n4074772...@n3.nabble.com
 Date: 02/07/2013  16:57  (GMT+00:00)
 To: fabio1605 fabio.to...@btinternet.com
 Subject: RE: Newbie SolR - Need advice

 Hi Fabio,

 Like Jack says, try the tutorial. But to answer your question, SOLR isn't
 a bolt on to SQLServer or any other DB. It's a fantastically fast
 indexing/searching tool. You'll need to use the DataImportHandler (see the
 tutorial) to import your data from the DB into the indices that SOLR uses.
 Once in there, you'll have more power & flexibility than SQLServer would
 ever give you!

 Haven't tried SOLR on Windows (I guess your environment) but I'm sure
 it'll work using Jetty or Tomcat as web container.

 Stick with it. The ride can be bumpy but the experience is sensational!

 DQ

 -Original Message-
 From: fabio1605 [mailto:[hidden email]]
 Sent: 02 July 2013 16:16
 To: [hidden email]
 Subject: Newbie SolR - Need advice

 Hi

 we have a MSSQL Server which is just getting far too large now and
 performance is dying! The majority of our webservers are mainly doing
 search functions so I thought it may be best to move to SolR. But I know very
 little about it!

 My questions are!

 Does SolR run as a bolt-on to MSSQL - as in the data is still in MSSQL and
 SolR is just the search bit between?

 I'm really struggling to understand the point of SOLR etc so if someone
 could point me to a Dummies website I'd appreciate it! Google is throwing too
 much confusion at me!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074782.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Newbie SolR - Need advice

2013-07-02 Thread Sandeep Mestry

Hi Fabio,

Yes, you're on the right track.

I'd now like to direct you to Jack's first reply, which suggests going through
the Solr tutorial.
Even with Solr, it will take some time to learn the various bits and pieces
about designing fields, their field types, server configuration, etc. and
then tune the results to match the results that you're currently getting
from the database. There is lots of info available for Solr on the web, and do
check Lucidworks' Solr Reference Guide.
http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide;jsessionid=16ED0DB3B6F6BE8CEC6E6CDB207DBC49

Best of Solr Luck!

Sandeep

On 2 July 2013 20:47, fabio1605 fabio.to...@btinternet.com wrote:

So, you keep your mssql database, you just don't use it for searches -
that'll relieve some of the load. Searches then all go through SOLR & its
Lucene indexes. If your various tables need SQL joins, you specify those in
the DataImportHandler (DIH) config. That way, when SOLR indexes everything,
it indexes the data the way you want to see it.

-- So by this you mean we keep MSSQL as we do!!

But we use the website to run through SOLR. SOLR will then handle the
indexing and retrieval of data from its own indexes, and will make its own
calls to our MSSQL server when required (i.e. updating/adding to
indexes..)

Am I on the right track there now?

So MSSQL becomes the datastore
SOLR becomes the search engine...

--
View this message in context:
http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074889.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Dot operater issue.

2013-06-27 Thread Sandeep Mestry

Hi Sri,

This depends on how the fields (that hold the value) are defined and how
the query is generated.
Try running the query in the Solr console and use debug=true to see how the
query string is getting parsed.
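
If it is easier to do this from a script than from the console, something along these lines should work. It is only a sketch: the core name (collection1) and the field name (name) are assumptions, and both helpers are hypothetical. The debug parameter and the parsedquery entry in the debug section of the response are what Solr provides.

```python
from urllib.parse import urlencode


def debug_url(base_url, query):
    """Return a /select URL that asks Solr to include debug output for the query."""
    return f"{base_url}/select?{urlencode({'q': query, 'debug': 'true', 'wt': 'json'})}"


def parsed_query(response):
    """Pull the parsed query string out of a decoded JSON response, if present."""
    return response.get("debug", {}).get("parsedquery")


# Hypothetical core and field; fetch this URL and pass the decoded JSON to parsed_query().
print(debug_url("http://localhost:8983/solr/collection1", "name:h.e.r.b.a.l"))
```

Comparing the parsed query with the terms that the field's analyzer actually indexes usually shows where the two diverge.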

If that doesn't help, then could you post the following three things relating
to your question:

1) field definition in schema.xml
2) solr query url
3) parser config from solrconfig.xml


Thanks,
Sandeep


On 27 June 2013 10:41, Srinivasa Chegu cheg...@hcl.com wrote:

 Hi team,

 When the user enters the search term h.e.r.b.a.l in the search textbox
 and clicks the search button, the SOLR search engine does not return any
 results. As far as I can see, SOLR is accepting the request parameter as
 h.e.r.b.a.l. However, we have many records with the string h.e.r.b.a.l as
 part of the product name.

 It looks like there is an issue with the dot operator in the search term. If we
 enter the search term herbal then it returns search results.

 Our requirement is that when the search term is h.e.r.b.a.l, it needs to
 display results based on the dot operator.

 Please help us on this issue.

 Regards
 Srinivas



Re: Solr 4.2.1 + Distribution scripts (rsync) Issue

2013-06-05 Thread Sandeep Mestry

Hi Hoss,

Thanks for your reply, Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java
based Replicationhandler instead of the rsync scripts?*
- There was an attempt to implement Java-based replication but it was
very slow, so that option was discarded and rsync was used instead. This
was done a couple of years ago and until February of this year we were using
Solr 1.4. I upgraded Solr to 4.0 with rsync; however, due to time and resource
constraints an alternative to rsync was not evaluated, and it can't be done even
today - only in the next release, when we'll be moving to SolrCloud.

My setup looks like the below - this was working correctly with Solr 1.4 and
Solr 4.0.

1) Index feeder applications feed indexes to the indexer boxes.
2) A cron job runs every minute on the indexer boxes (committer); it commits
the indexes (commit) and invokes snapshooter to create a snapshot. An rsync
daemon runs on the indexer boxes.
3) Another cron job runs on the search boxes every minute; it pulls the
snapshot (using snappuller) and installs it on the search boxes (snapinstaller),
which also notifies search to open a new searcher (commit).

Additionally, there is a cron job that runs every morning at 4 am on the
indexer boxes, which optimises the index (optimize) and cleans up snapshots
older than a day (snapcleaner).
This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts

*Which config is this, your indexer or your searcher? (i'm assuming it's the
searcher since i don't see any postCommit commands to exec snapshooter but
i wanted to sanity check that wasn't a simple explanation for your problem)*
- Because of this set up, I do not have any post commit setup in
solrconfig.xml.
- This solrconfig.xml is used for both indexer and searcher boxes.

I can see that after my upgrade to Solr 4.2.1 all these scripts behave
normally; it's just that I do not see the updates getting refreshed on the
search boxes unless I restart.

*What exactly does your manual commit command look like?  *
- This is by using commit script under bin directory (commit -h localhost
-p 8983)
- I have also tried URL based commit as you had mentioned but no luck

*Are you doing this on the indexer box or the searcher boxes? *
- I executed manual commit on searcher boxes, the indexer boxes do show the
commit and updates correctly.

*What is the HTTP response from this command? What do the logs show when
you do this?*
- I have attached the logs, please note that I have enabled the
openSearcher for testing.

Thanks, please let me know if I'm missing something. I remembered people
not getting their deletes and the workaround was to add _version_ field in
schema, which I had done but no luck. I know it might be unrelated but I am
just trying all my options.

Thanks again,
Sandeep


On 5 June 2013 00:41, Chris Hostetter hossman_luc...@fucit.org wrote:


 : However, we haven't yet implemented SolrCloud and still relying on
 : distribution scripts - rsync, indexpuller mechanism.

 Well, for starters -- have you considered at least looking into using the
 java based Replicationhandler instead of the rsync scripts?

 Script based replication has not been actively maintained since java
 replication was added back in Solr 1.4!

 : I see that the indexes are getting created on indexer boxes, snapshots
 : being created and then pulled across to search boxes. The snapshots are
 : getting installed on search boxes as well. There are no errors in the
 : scripts logs and this process works well.
 : However, when I check the update in solr console (on search boxes), I do
 : not see the updated result. The updates do not appear in search boxes even
 : after manual commit. Only after a *restart* of the search application
 : (deployed in tomcat) I can see the updated results.

 What exactly does your manual commit command look like?  Are you
 doing this on the indexer box or the searcher boxes?  What is the HTTP
 response from this command? What do the logs show when you do this?

 It's possible that some internal changes in Solr relating to NRT
 improvements may have optimized away re-opening on commit if solr doesn't
 think the index has changed -- but i doubt it.  because I just tried a
 simple test using the 4.3.0 example where i manually simulated
 snapinstaller replacing the index files with a newer index and issued
 "http://localhost:8983/solr/update?commit=true" and solr loaded up that
 new index and started searching it -- so i suspect the devil is in the
 details of your setup.

 you're sure each of the snapshooter, snappuller, snapinstaller scripts are
 executing properly?

 : I have done minimal changes for the upgrade in solrconfig.xml and is pasted
 : below. Please can someone take a look and let me know what the issue is.
 : The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).

 which config is this, your indexer or your searcher? (i'm assuming it's
 the searcher since i don't see any postCommit commands

Re: Solr Faceting doesn't return values.

2013-05-23 Thread Sandeep Mestry

*<str name="msg">org.apache.solr.search.SyntaxError: Cannot parse
'*mm_state_code:(**TX)*': Encountered " ":" ": "" at line 1, column 14.
Was expecting one of:*

This suggests to me that you kept the df parameter in the query, hence it
was forming mm_state_code:mm_state_code:(TX). Can you try exactly the way
I gave you, i.e. without the df parameter?
Also, can you post schema.xml and the /select handler config from
solrconfig.xml?


On 22 May 2013 18:36, samabhiK qed...@gmail.com wrote:

 When I use your query, I get :

 <?xml version="1.0" encoding="UTF-8"?>
 <response>

 <lst name="responseHeader">
   <int name="status">400</int>
   <int name="QTime">12</int>
   <lst name="params">
     <str name="facet">true</str>
     <str name="df">mm_state_code</str>
     <str name="indent">true</str>
     <str name="q">*mm_state_code:(**TX)*</str>
     <str name="_">1369244078714</str>
     <str name="debug">all</str>
     <str name="facet.field">sa_site_city</str>
     <str name="wt">xml</str>
   </lst>
 </lst>
 <lst name="error">
   <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse
 '*mm_state_code:(**TX)*': Encountered " ":" ": "" at line 1, column 14.
 Was expecting one of:
 <EOF>
 <AND> ...
 <OR> ...
 <NOT> ...
 "+" ...
 "-" ...
 <BAREOPER> ...
 "(" ...
 "*" ...
 "^" ...
 <QUOTED> ...
 <TERM> ...
 <FUZZY_SLOP> ...
 <PREFIXTERM> ...
 <WILDTERM> ...
 <REGEXPTERM> ...
 "[" ...
 "{" ...
 <LPARAMS> ...
 <NUMBER> ...
 </str>
   <int name="code">400</int>
 </lst>
 </response>

 Not sure why the data won't show up. Almost all the records have data in the
 field sa_site_city, and it is also indexed. :(



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry

Hi Oussama,

This is explained very nicely on Solr Wiki..
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

   <add>
     <doc boost="2.5">
       <field name="employeeId">05991</field>
       <field name="office" boost="2.0">Bridgewater</field>
     </doc>
   </add>
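
If the update messages are generated from code rather than written by hand, a sketch like the following builds the same kind of message. The helper name (build_add_message) is hypothetical; the element and attribute names follow the wiki example above, and this covers only the boost attributes, not the rest of the update format.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, doc_boost=None, field_boosts=None):
    """Build an <add><doc>...</doc></add> update message with optional boosts."""
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        # Document-level boost goes on the <doc> element.
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in field_boosts:
            # Field-level boost goes on the individual <field> element.
            field.set("boost", str(field_boosts[name]))
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message(
    {"employeeId": "05991", "office": "Bridgewater"},
    doc_boost=2.5,
    field_boosts={"office": 2.0},
)
print(message)
```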


What is not clear from your message is whether you need better scoring or
better sorting. So, additionally, you can consider adding a secondary sort
parameter for the docs having the same score.
http://wiki.apache.org/solr/CommonQueryParameters#sort


HTH,
Sandeep


On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote:

 Thank you for your reply bbarani,

 I can't do that because I want to boost some documents over others,
 independently of the query.


 On 05/21/2013 05:41 PM, bbarani wrote:

  Why don't you boost during query time?

 Something like q=superman&qf=title^2 subject

 You can refer to:
 http://wiki.apache.org/solr/SolrRelevancyFAQ



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry

I think that is applicable only to field-level boosting and not to
document-level boosting.

Can you post your query, field definitions and the results you're expecting?

I am using index-time and query-time boosting without any issues so far. Also,
which version of Solr are you using?


On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote:

 I don't know if this is the issue or not but, considering this note from
 the wiki:

 NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml)
 for any fields where the index-time boost should be stored.

 In my case, where I only need to boost the whole document (not a specific
 field), do I have to set omitNorms="false" for all the
 fields in the schema?




 On 05/22/2013 10:41 AM, Oussama Jilal wrote:

 Thank you Sandeep,

 I did post the document like that (a minor difference is that I did not
 add the boost to the field since I don't want to boost a specific field; I
 boosted the whole document: '<doc boost="2.0">  </doc>'), but the issue
 is that everything in the query results has the same score even if the
 documents were indexed with different boosts, and I can't sort on another
 field since this is independent of any field value.

 Any ideas ?


Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry

Did you use debugQuery=true in the Solr console to see how the query is
being interpreted and how the results are calculated?

Also, I'm not sure, but this copyField directive seems a bit confusing to
me:
<copyField source="Id" dest="Suggestion" />
Since multiValued is false for the Suggestion field, does that schema mean
Suggestion gets its value only from Id and not from any other input?

You haven't mentioned the version of Solr. Can you also post the query
params?



On 22 May 2013 11:04, Oussama Jilal jilal.ouss...@gmail.com wrote:

 I don't know if this can help (since the document boost should be
 independent of any schema) but here is my schema :

 <?xml version="1.0" encoding="UTF-8"?>
 <schema name="" version="1.5">
   <types>
     <fieldType name="string" class="solr.StrField"
                sortMissingLast="true" />
     <fieldType name="long" class="solr.TrieLongField"
                sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
     <fieldType name="text" class="solr.TextField"
                sortMissingLast="true" omitNorms="true">
       <analyzer type="index">
         <tokenizer class="solr.KeywordTokenizerFactory" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.EdgeNGramFilterFactory" maxGramSize="255" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.KeywordTokenizerFactory" />
         <filter class="solr.LowerCaseFilterFactory" />
       </analyzer>
     </fieldType>
   </types>
   <fields>
     <field name="Id" type="string" indexed="true"
            stored="true" multiValued="false" required="true" />
     <field name="Suggestion" type="text" indexed="true"
            stored="true" multiValued="false" required="false" />
     <field name="Type" type="string" indexed="true"
            stored="true" multiValued="false" required="true" />
     <field name="Sections" type="string" indexed="true"
            stored="true" multiValued="true" required="false" />
     <field name="_version_" type="long" indexed="true"
            stored="true" />
   </fields>
   <copyField source="Id" dest="Suggestion" />
   <uniqueKey>Id</uniqueKey>
   <defaultSearchField>Suggestion</defaultSearchField>
 </schema>

 My query is something like: Suggestion:Olive Oil.

 The result is 9 documents, which all have the same score 11.287682, even
 if they had been indexed with different boosts (I am sure of this).





Re: Boosting Documents

2013-05-22 Thread Sandeep Mestry

I'm running out of options now; I can't really see the issue you're facing
unless the debug analysis is posted.
I think thorough debugging is required at both the application and Solr
level.

If you want customised scoring from Solr, you can also consider overriding the
DefaultSimilarity implementation - but that'll be a separate issue.


On 22 May 2013 11:32, Oussama Jilal jilal.ouss...@gmail.com wrote:

 Yes, I did debug it and there is nothing special about it; everything is
 treated the same.

 My Solr version is 4.2.

 The copy field is used because the 2 fields are of different types but only
 one value is indexed in them (so no multiValued is required and it works
 perfectly).




 On 05/22/2013 11:18 AM, Sandeep Mestry wrote:

 Did you use the debugQuery=true in solr console to see how the query is
 being interpreted and the result calculation?

 Also, I'm not sure but this copyfield directive seems a bit confusing to
 me..
 copyField  source=Id  dest=Suggestion  /
 Because multiValued is false for Suggestion field so does that schema mean
 Suggestion has value only from Id and not from any other input?

 You haven't mentioned the version of Solr, can you also post the query
 params?



 On 22 May 2013 11:04, Oussama Jilal jilal.ouss...@gmail.com wrote:

  I don't know if this can help (since the document boost should be
 independent of any schema) but here is my schema :

 <?xml version="1.0" encoding="UTF-8"?>
 <schema name="" version="1.5">
   <types>
     <fieldType name="string" class="solr.StrField"
       sortMissingLast="true" />
     <fieldType name="long" class="solr.TrieLongField"
       sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
     <fieldType name="text" class="solr.TextField"
       sortMissingLast="true" omitNorms="true">
       <analyzer type="index">
         <tokenizer class="solr.KeywordTokenizerFactory" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.EdgeNGramFilterFactory"
           maxGramSize="255" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.KeywordTokenizerFactory" />
         <filter class="solr.LowerCaseFilterFactory" />
       </analyzer>
     </fieldType>
   </types>
   <fields>
     <field name="Id" type="string" indexed="true"
       stored="true" multiValued="false" required="true" />
     <field name="Suggestion" type="text" indexed="true"
       stored="true" multiValued="false" required="false" />
     <field name="Type" type="string" indexed="true"
       stored="true" multiValued="false" required="true" />
     <field name="Sections" type="string" indexed="true"
       stored="true" multiValued="true" required="false" />
     <field name="_version_" type="long" indexed="true"
       stored="true" />
   </fields>
   <copyField source="Id" dest="Suggestion" />
   <uniqueKey>Id</uniqueKey>
   <defaultSearchField>Suggestion</defaultSearchField>
 </schema>

 My query is something like: Suggestion:Olive Oil.

 The result is 9 documents, which all have the same score 11.287682, even
 if they had been indexed with different boosts (I am sure of this).




 On 05/22/2013 10:54 AM, Sandeep Mestry wrote:

  I think that is applicable only for the field level boosting and not at
 document level boosting.

 Can you post your query, field definition and results you're expecting.

 I am using index and query time boosting without any issues so far. Also,
 which version of Solr are you using?


 On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote:

 I don't know if this is the issue or not but, considering this note from
 the wiki:

 NOTE: make sure norms are enabled (omitNorms="false" in the schema.xml)
 for any fields where the index-time boost should be stored.

 In my case, where I only need to boost the whole document (not a specific
 field), do I have to set omitNorms="false" for all the
 fields in the schema?




 On 05/22/2013 10:41 AM, Oussama Jilal wrote:

   Thank you Sandeep,

 I did post the document like that (a minor difference is that I did not
 add the boost to the field since I don't want to boost on a specific field; I
 boosted the whole document: '<doc boost="2.0"> </doc>'), but the issue
 is that everything in the query results has the same score even if the
 documents had been indexed with different boosts, and I can't sort on another
 field since this is independent of any field value.

 Any ideas ?

 On 05/22/2013 10:30 AM, Sandeep Mestry wrote:

   Hi Oussama,

 This is explained very nicely on Solr Wiki..
 http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
 http://wiki.apache.org/solr

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-22 Thread Sandeep Mestry

Thanks Erick for your suggestion.

Turns out I won't be going that route after all, as the highlighter
component is quite complicated - to follow and to override - and there is not
much time left in hand, so I did it the manual (dirty) way.

Best Regards,
Sandeep


On 22 May 2013 12:21, Erick Erickson erickerick...@gmail.com wrote:

 Sandeep:

 You need to be a little careful here, I second Shawn's comment that
 you are mixing versions. You say you are using solr 4.0. But the jar
 that ships with that is apache-solr-core-4.0.0.jar. Then you talk
 about using solr-core, which is called solr-core-4.1.jar.

 Maven is not officially supported, so grabbing some solr-core.jar
 (with no apache) and doing _anything_ with it from a 4.0 code base is
 not a good idea.

 You can check out the 4.0 code branch and just compile the whole
 thing. Or you can get a new 4.0 distro and use the jars there. But I'd
 be _really_ cautious about using a 4.1 or later jar with 4.0.

 FWIW,
 Erick

 On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com
 wrote:
  Thanks Steve,
 
  I could find solr-core.jar in the repo but could not find
  apache-solr-core.jar.
  I think my issue got misunderstood - which is totally my fault.
 
  Anyway, I took into account Shawn's comment and will use solr-core.jar
 only
  for compiling the project - not for deploying.
 
  Thanks,
  Sandeep
 
 
  On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote:
 
  The 4.0 solr-core jar is available in Maven Central: 
 
 http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
  
 
  Steve
 
  On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com
 wrote:
 
   Hi Steve,
  
   Solr 4.0 - mentioned in the subject.. :-)
  
   Thanks,
   Sandeep
  
  
   On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote:
  
   Sandeep,
  
   What version of Solr are you using?
  
   Steve
  
   On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com
  wrote:
  
   Hi Shawn,
  
   Thanks for your reply.
  
   I'm not mixing versions.
   The problem I faced is I want to override Highlighter from solr-core
  jar
   and if I add that as a dependency in my project then there was a
 clash
   between solr-core.jar and the apache-solr-core.jar that comes
 bundled
   within the solr distribution. It was complaining about
   MorfologikFilterFactory
   classcastexception.
   I can't use apache-solr-core.jar as a dependency as no such jar
 exists
  in
   any maven repo.
  
   The only thing I could do is to remove apache-solr-core.jar from
  solr.war
   and then use solr-core.jar as a dependency - however I do not think
  this
   is
   the ideal solution.
  
   Thanks,
   Sandeep
  
  
   On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote:
  
   On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
   And I do remember the discussion on the forum about dropping the
 name
   *apache* from solr jars. If that's what caused this issue, then
 can
  you
   tell me if the mirrors need updating with solr-core.jar instead of
   apache-solr-core.jar?
  
   If it's named apache-solr-core, then it's from 4.0 or earlier.  If
  it's
   named solr-core, then it's from 4.1 or later.  That might mean that
  you
   are mixing versions - don't do that.  Make sure that you have jars
  from
   the exact same version as your server.
  
   Thanks,
   Shawn

Re: Solr Faceting doesn't return values.

2013-05-22 Thread Sandeep Mestry

Hi There,

Not sure I understand your problem correctly, but is 'mm_state_code' a real
value or is it a field name?
Also, as Erick pointed out above, facets are not calculated if there
are no results. Hence you get no facets.

You have mentioned which facets you want but you haven't mentioned which
field you want to search against. That field should be defined in the df
parameter instead of sa_property_id.

Can you post example solr document you're indexing?

-Sandeep


On 22 May 2013 14:28, samabhiK qed...@gmail.com wrote:

 Ok my bad.

 I do have a default field defined in the /select handler in the config
 file.

 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">sa_property_id</str>
 </lst>

 But then how do I change my query now?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: filter query by string length or word count?

2013-05-22 Thread Sandeep Mestry

I doubt there is any out-of-the-box feature that supports this
requirement; you will probably need to handle this at index time.
You can play around with Function Queries
(http://wiki.apache.org/solr/FunctionQuery) for any such feature.
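
One way to handle it at index time, sketched under assumptions: the indexing client computes a word count before sending each document and stores it in an extra integer field, which can then be used in a range filter. The field name body_word_count is hypothetical and would need to be declared in schema.xml; the whitespace-based count is only an approximation of what the Solr analyzer would produce.

```python
import re


def add_word_count(doc, source_field="body", count_field="body_word_count"):
    """Return a copy of doc with a crude whitespace-based word count added."""
    # Drop HTML tags first, since the body field holds HTML in this schema.
    text = re.sub(r"<[^>]+>", " ", doc.get(source_field, ""))
    enriched = dict(doc)
    enriched[count_field] = len(text.split())
    return enriched


def min_words_filter(min_words, count_field="body_word_count"):
    """Build a range filter matching documents with more than min_words words."""
    return f"{count_field}:[{min_words + 1} TO *]"


doc = add_word_count({"id": "1", "body": "<p>hello big world</p>"})
fq = min_words_filter(80)
```

The resulting string would be passed as an fq parameter alongside the normal query. A character count could be stored the same way in a second field.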



On 22 May 2013 16:37, Sam Lee skyn...@gmail.com wrote:

 I have schema.xml
 <field name="body" type="text_en_html" indexed="true" stored="true"
   omitNorms="true"/>
 ...
 <fieldType name="text_en_html" class="solr.TextField"
   positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
       ignoreCase="true"
       words="stopwords_en.txt"
       enablePositionIncrements="true"
     />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPossessiveFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
       protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
       ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
       ignoreCase="true"
       words="stopwords_en.txt"
       enablePositionIncrements="true"
     />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPossessiveFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
       protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>


 how can I query docs whose body has more than 80 words (or 80 characters) ?

Re: Solr Faceting doesn't return values.

2013-05-22 Thread Sandeep Mestry

From the response you've mentioned it appears to me that the query term TX
is searched against sa_site_city instead of mm_state_code.
Can you try your query like below:

http://xx.xx.xx.xx/solr/collection1/select?q=mm_state_code:(TX)&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all

and post your output?
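
If the request is built in code rather than typed by hand, a small helper keeps the parameters separated and encoded properly. This is only a sketch: the host and core name are placeholders, and the helper name is made up for illustration.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, search_field, term, facet_field):
    """Build a select URL that searches one field and facets on another."""
    params = [
        ("q", f"{search_field}:({term})"),  # search explicitly against this field
        ("wt", "xml"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("debug", "all"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder host and core, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/collection1", "mm_state_code", "TX", "sa_site_city"
)
```

Naming the field in q means the df default no longer decides where TX is searched.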

On 22 May 2013 17:13, samabhiK qed...@gmail.com wrote:

 <str name="df">sa_site_city</str>

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry

Hi Shawn,

Thanks for your reply.

I'm not mixing versions.
The problem I faced is I want to override Highlighter from solr-core jar
and if I add that as a dependency in my project then there was a clash
between solr-core.jar and the apache-solr-core.jar that comes bundled
within the solr distribution. It was complaining about MorfologikFilterFactory
classcastexception.
I can't use apache-solr-core.jar as a dependency as no such jar exists in
any maven repo.

The only thing I could do is to remove apache-solr-core.jar from solr.war
and then use solr-core.jar as a dependency - however I do not think this is
the ideal solution.

Thanks,
Sandeep


On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote:

 On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
  And I do remember the discussion on the forum about dropping the name
  *apache* from solr jars. If that's what caused this issue, then can you
  tell me if the mirrors need updating with solr-core.jar instead of
  apache-solr-core.jar?

 If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
 named solr-core, then it's from 4.1 or later.  That might mean that you
 are mixing versions - don't do that.  Make sure that you have jars from
 the exact same version as your server.
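
 The naming rule above can be turned into a quick sanity check over the jars in a webapp's lib directory. This is a rough sketch with made-up helper names, and it only applies to jars taken from the binary distribution: as noted elsewhere in this thread, the Maven Central artifact for 4.0 is already named solr-core.

```python
def jar_naming_family(jar_name):
    """Classify a Solr jar by its naming convention, or None if not a Solr jar."""
    if jar_name.startswith("apache-solr-"):
        return "4.0 or earlier"
    if jar_name.startswith("solr-"):
        return "4.1 or later"
    return None


def has_mixed_solr_jars(jar_names):
    """True if the list contains Solr jars from both naming conventions."""
    families = {jar_naming_family(name) for name in jar_names} - {None}
    return len(families) > 1


mixed = has_mixed_solr_jars(["apache-solr-core-4.0.0.jar", "solr-core-4.1.0.jar"])
```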

 Thanks,
 Shawn

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry

Hi Steve,

Solr 4.0 - mentioned in the subject.. :-)

Thanks,
Sandeep


On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote:

 Sandeep,

 What version of Solr are you using?

 Steve

 On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote:

  Hi Shawn,
 
  Thanks for your reply.
 
  I'm not mixing versions.
  The problem I faced is I want to override Highlighter from solr-core jar
  and if I add that as a dependency in my project then there was a clash
  between solr-core.jar and the apache-solr-core.jar that comes bundled
  within the solr distribution. It was complaining about
 MorfologikFilterFactory
  classcastexception.
  I can't use apache-solr-core.jar as a dependency as no such jar exists in
  any maven repo.
 
  The only thing I could do is to remove apache-solr-core.jar from solr.war
  and then use solr-core.jar as a dependency - however I do not think this
 is
  the ideal solution.
 
  Thanks,
  Sandeep
 
 
  On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote:
 
  On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
  And I do remember the discussion on the forum about dropping the name
  *apache* from solr jars. If that's what caused this issue, then can you
  tell me if the mirrors need updating with solr-core.jar instead of
  apache-solr-core.jar?
 
  If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
  named solr-core, then it's from 4.1 or later.  That might mean that you
  are mixing versions - don't do that.  Make sure that you have jars from
  the exact same version as your server.
 
  Thanks,
  Shawn

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry

Thanks Steve,

I could find solr-core.jar in the repo but could not find
apache-solr-core.jar.
I think my issue got misunderstood - which is totally my fault.

Anyway, I took into account Shawn's comment and will use solr-core.jar only
for compiling the project - not for deploying.

Thanks,
Sandeep


On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote:

 The 4.0 solr-core jar is available in Maven Central: 
 http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
 

 Steve

 On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote:

  Hi Steve,
 
  Solr 4.0 - mentioned in the subject.. :-)
 
  Thanks,
  Sandeep
 
 
  On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote:
 
  Sandeep,
 
  What version of Solr are you using?
 
  Steve
 
  On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com
 wrote:
 
  Hi Shawn,
 
  Thanks for your reply.
 
  I'm not mixing versions.
  The problem I faced is I want to override Highlighter from solr-core
 jar
  and if I add that as a dependency in my project then there was a clash
  between solr-core.jar and the apache-solr-core.jar that comes bundled
  within the solr distribution. It was complaining about
  MorfologikFilterFactory
  classcastexception.
  I can't use apache-solr-core.jar as a dependency as no such jar exists
 in
  any maven repo.
 
  The only thing I could do is to remove apache-solr-core.jar from
 solr.war
  and then use solr-core.jar as a dependency - however I do not think
 this
  is
  the ideal solution.
 
  Thanks,
  Sandeep
 
 
  On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote:
 
  On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
  And I do remember the discussion on the forum about dropping the name
  *apache* from solr jars. If that's what caused this issue, then can
 you
  tell me if the mirrors need updating with solr-core.jar instead of
  apache-solr-core.jar?
 
  If it's named apache-solr-core, then it's from 4.0 or earlier.  If
 it's
  named solr-core, then it's from 4.1 or later.  That might mean that
 you
  are mixing versions - don't do that.  Make sure that you have jars
 from
  the exact same version as your server.
 
  Thanks,
  Shawn

Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry

Dear All,

I have a requirement to highlight a field only when all keywords entered
match. This also needs to support phrase, operator or wildcard queries.
I'm using Solr 4.0 with edismax because the search needs to be carried out
on multiple fields.
I know that with the highlighting feature I can configure a field to indicate a
match, however I cannot find a setting to highlight only if all keywords
match. That makes me wonder whether this is the right approach to take. Can you
please guide me in the right direction?

The edismax config looks like below:

<requestHandler name="assdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
    <str name="pf">title</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
    <str name="q.op">AND</str>
    <str name="sort">score desc</str>
    <str name="facet">true</str>
    <str name="facet.limit">-1</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">uniq_subtype_id</str>
    <str name="facet.field">component_type</str>
    <str name="facet.field">genre_type</str>
  </lst>
  <lst name="appends">
    <str name="fq">collection:assets</str>
  </lst>
</requestHandler>

If I search for 'countryside number 10' as the keyword then highlight only
if the 'annotations' contain all these entered search terms. Any document
containing just one or two terms is not a match.

Thanks,
Sandeep
(p.s: I haven't enabled the highlighting feature yet on this config and
will be doing so only if that will fulfil the requirement I have mentioned
above.)

Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry

Hi Jaideep,

The edismax config I have posted specifies that the default operator is
AND. I am sorry if I was not clear in my previous mail; what I really need
is to highlight a field only when all search query terms are present in it.
The current highlighter works when *any* of the terms match and not only
when *all* terms match.

Thanks,
Sandeep


On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote:

 Sandeep,
 If you AND all keywords, that should be OK?

 Thanks
 Jaideep


 On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry sanmes...@gmail.com
 wrote:

  Dear All,
 
  I have a requirement to highlight a field only when all keywords entered
  match. This also needs to support phrase, operator or wildcard queries.
  I'm using Solr 4.0 with edismax because the search needs to be carried
 out
  on multiple fields.
  I know with highlighting feature I can configure a field to indicate a
  match, however I do not find a setting to highlight only if all keywords
  match. That makes me think is that the right approach to take? Can you
  please guide me in right direction?
 
  The edsimax config looks like below:
 
  requestHandler name=assdismax class=solr.SearchHandler
  lst name=defaults
  str name=defTypeedismax/str
  str name=echoParamsexplicit/str
  float name=tie0.01/float
  str name=qftitle^10 description^5 annotations^3 notes^2
  categories/str
  str name=pftitle/str
  int name=ps0/int
  str name=q.alt*:*/str
  str name=fl*,score/str
  str name=mm100%/str
  str name=q.opAND/str
  str name=sortscore desc/str
  str name=facettrue/str
  str name=facet.limit-1/str
  str name=facet.mincount1/str
  str name=facet.fielduniq_subtype_id/str
  str name=facet.fieldcomponent_type/str
  str name=facet.fieldgenre_type/str
  /lst
  lst name=appends
  str name=fqcollection:assets/str
  /lst
  /requestHandler
 
  If I search for 'countryside number 10' as the keyword then highlight
 only
  if the 'annotations' contain all these entered search terms. Any document
  containing just one or two terms is not a match.
 
  Thanks,
  Sandeep
  (p.s: I haven't enabled the highlighting feature yet on this config and
  will be doing so only if that will fulfil the requirement I have
 mentioned
  above.)
 

 --
 _
 The information contained in this communication is intended solely for the
 use of the individual or entity to whom it is addressed and others
 authorized to receive it. It may contain confidential or legally privileged
 information. If you are not the intended recipient you are hereby notified
 that any disclosure, copying, distribution or taking any action in reliance
 on the contents of this information is strictly prohibited and may be
 unlawful. If you have received this communication in error, please notify
 us immediately by responding to this email and then delete it from your
 system. The firm is neither liable for the proper and complete transmission
 of the information contained in this communication nor for any delay in its
 receipt.

Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry

I doubt that will be the correct approach, as it will be hard to generate
the query grammar considering we have support for phrase, operator,
wildcard and group queries.
That's why I have kept it simple and am only passing the query text with
minimal parsing (escaping Lucene special characters) to the configured
edismax handler.
The fields I have mentioned above are far fewer than the actual number of
fields, which is around 50 :-). So forming such a long query will be both
time and resource consuming. Further, it's not going to fulfill my
requirement anyway because I do not want to change my search results; the
requirement is only to provide a highlight if a field matches all the
query terms.
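
To illustrate the requirement (this is not an existing Solr feature), here is a naive client-side check that could decide whether a field in a returned document deserves a highlight. It is only a sketch under strong assumptions: plain keyword queries, simple lowercase word matching, and no awareness of phrases, operators, wildcards or Solr's own analysis chain. The function name is made up.

```python
import re


def field_matches_all_terms(field_text, query_text):
    """True only if every query term occurs as a word in field_text."""
    field_tokens = set(re.findall(r"\w+", field_text.lower()))
    query_terms = re.findall(r"\w+", query_text.lower())
    # An empty query never counts as a match.
    return bool(query_terms) and all(term in field_tokens for term in query_terms)


annotation = "A view of the countryside from Number 10"
highlight = field_matches_all_terms(annotation, "countryside number 10")
```

The search results themselves stay untouched; only the decision to mark a field as matched is made after the response comes back.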

Thanks,
Sandeep


On 20 May 2013 12:02, Jaideep Dhok jaideep.d...@inmobi.com wrote:

 If you know all fields that need to be queried, you can rewrite it as -
 (assuming, f1, f2 are the fields that you have to search)
 (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
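
 A small sketch of how such a query string could be generated, assuming plain keywords that have already been escaped for Lucene syntax (the function name is illustrative only):

```python
def all_terms_in_any_field(fields, keywords):
    """Require all keywords within a single field, for any of the given fields."""
    clauses = []
    for field in fields:
        per_field = " AND ".join(f"{field}:{kw}" for kw in keywords)
        clauses.append(f"({per_field})")
    return " OR ".join(clauses)


query = all_terms_in_any_field(["f1", "f2"], ["kw1", "kw2"])
```

 The query grows with the number of fields times the number of keywords, so it gets long quickly when many fields are searched.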

 -
 Jaideep


 On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry sanmes...@gmail.com
 wrote:

  Hi Jaideep,
 
  The edismax config I have posted mentioned that the default operator is
  AND. I am sorry if I was not clear in my previous mail, what I need
 really
  is highlight a field when all search query terms present. The current
  highlighter works for *any* of the terms match and not for *all* terms
  match.
 
  Thanks,
  Sandeep
 
 
  On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote:
 
   Sandeep,
   If you AND all keywords, that should be OK?
  
   Thanks
   Jaideep
  
  
   On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry sanmes...@gmail.com
   wrote:
  
Dear All,
   
I have a requirement to highlight a field only when all keywords
  entered
match. This also needs to support phrase, operator or wildcard
 queries.
I'm using Solr 4.0 with edismax because the search needs to be
 carried
   out
on multiple fields.
I know with highlighting feature I can configure a field to indicate
 a
match, however I do not find a setting to highlight only if all
  keywords
match. That makes me think is that the right approach to take? Can
 you
please guide me in right direction?
   
The edsimax config looks like below:
   
requestHandler name=assdismax class=solr.SearchHandler
lst name=defaults
str name=defTypeedismax/str
str name=echoParamsexplicit/str
float name=tie0.01/float
str name=qftitle^10 description^5 annotations^3 notes^2
categories/str
str name=pftitle/str
int name=ps0/int
str name=q.alt*:*/str
str name=fl*,score/str
str name=mm100%/str
str name=q.opAND/str
str name=sortscore desc/str
str name=facettrue/str
str name=facet.limit-1/str
str name=facet.mincount1/str
str name=facet.fielduniq_subtype_id/str
str name=facet.fieldcomponent_type/str
str name=facet.fieldgenre_type/str
/lst
lst name=appends
str name=fqcollection:assets/str
/lst
/requestHandler
   
If I search for 'countryside number 10' as the keyword then highlight
   only
if the 'annotations' contain all these entered search terms. Any
  document
containing just one or two terms is not a match.
   
Thanks,
Sandeep
(p.s: I haven't enabled the highlighting feature yet on this config
 and
will be doing so only if that will fulfil the requirement I have
   mentioned
above.)
   
  
  
 


Re: Highlight only when all keywords match

2013-05-20 Thread Sandeep Mestry

Thanks Upayavira for that valuable suggestion.

I believe overriding highlight component should be the way forward.
Could you tell me if there is any existing example or which methods I
should particularly override?

Thanks,
Sandeep


On 20 May 2013 12:47, Upayavira u...@odoko.co.uk wrote:

 If you are saying that you want to change highlighting behaviour, not
 query behaviour, then I suspect you are going to have to interact with
 the java HighlightComponent. If you can work out how to update that
 component to behave as you wish, you could either subclass it, or create
 your own implementation that you can include in your Solr setup. Or, if
 you make it generic enough, offer it back as a contribution that can be
 included in future Solr releases.

 Upayavira

 On Mon, May 20, 2013, at 12:14 PM, Sandeep Mestry wrote:
  I doubt if that will be the correct approach as it will be hard to
  generate
  the query grammar considering we have support for phrase, operator,
  wildcard and group queries.
  That's why I have kept it simple and only passing the query text with
  minimal parsing (escaping lucene special characters) to configured
  edismax.
  The number of fields I have mentioned above are a lot lesser than the
  actual number of fields - around 50 in number :-). So forming such a long
  query will both be time and resource consuming. Further, it's not going
  to
  fulfill my requirement anyway because I do not want to change my search
  results, the requirement is only to provide a highlight if a field is
  matched for all the query terms.
 
  Thanks,
  Sandeep
 
 
  On 20 May 2013 12:02, Jaideep Dhok jaideep.d...@inmobi.com wrote:
 
   If you know all fields that need to be queried, you can rewrite it as -
   (assuming, f1, f2 are the fields that you have to search)
   (f1:kw1 AND f1:kw2 ... f1:kwn) OR (f2:kw1 AND f2:kw2 ... f2:kwn)
  
   -
   Jaideep
  
  
   On Mon, May 20, 2013 at 4:22 PM, Sandeep Mestry sanmes...@gmail.com
   wrote:
  
Hi Jaideep,
   
The edismax config I have posted mentioned that the default operator
 is
AND. I am sorry if I was not clear in my previous mail, what I need
   really
is highlight a field when all search query terms present. The current
highlighter works for *any* of the terms match and not for *all*
 terms
match.
   
Thanks,
Sandeep
   
   
On 20 May 2013 11:40, Jaideep Dhok jaideep.d...@inmobi.com wrote:
   
 Sandeep,
 If you AND all keywords, that should be OK?

 Thanks
 Jaideep


 On Mon, May 20, 2013 at 3:44 PM, Sandeep Mestry 
 sanmes...@gmail.com
 wrote:

  Dear All,
 
  I have a requirement to highlight a field only when all keywords
entered
  match. This also needs to support phrase, operator or wildcard
   queries.
  I'm using Solr 4.0 with edismax because the search needs to be
   carried
 out
  on multiple fields.
  I know with highlighting feature I can configure a field to
 indicate
   a
  match, however I do not find a setting to highlight only if all
keywords
  match. That makes me think is that the right approach to take?
 Can
   you
  please guide me in right direction?
 
  The edsimax config looks like below:
 
  requestHandler name=assdismax class=solr.SearchHandler
  lst name=defaults
  str name=defTypeedismax/str
  str name=echoParamsexplicit/str
  float name=tie0.01/float
  str name=qftitle^10 description^5 annotations^3 notes^2
  categories/str
  str name=pftitle/str
  int name=ps0/int
  str name=q.alt*:*/str
  str name=fl*,score/str
  str name=mm100%/str
  str name=q.opAND/str
  str name=sortscore desc/str
  str name=facettrue/str
  str name=facet.limit-1/str
  str name=facet.mincount1/str
  str name=facet.fielduniq_subtype_id/str
  str name=facet.fieldcomponent_type/str
  str name=facet.fieldgenre_type/str
  /lst
  lst name=appends
  str name=fqcollection:assets/str
  /lst
  /requestHandler
 
  If I search for 'countryside number 10' as the keyword then
 highlight
 only
  if the 'annotations' contain all these entered search terms. Any
document
  containing just one or two terms is not a match.
 
  Thanks,
  Sandeep
  (p.s: I haven't enabled the highlighting feature yet on this
 config
   and
  will be doing so only if that will fulfil the requirement I have
 mentioned
  above.)
 


Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Sandeep Mestry

Hi All,

I want to override a component from solr-core and for that I need solr-core
jar.

I am using the solr.war that comes from Apache mirror and if I open the
war, I see the solr-core jar is actually named as apache-solr-core.jar.
This is also true about solrj jar.

If I now provide a dependency in my module for apache-solr-core.jar, it's
not being found in the mirror. And if I use solr-core.jar, I get strange
class cast exception during Solr startup for MorfologikFilterFactory.

(I'm not using this factory at all in my project.)

at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.ClassCastException: class org.apache.lucene.analysis.morfologik.MorfologikFilterFactory
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:126)
at org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:73)
at org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:55)

I tried manually removing the apache-solr-core.jar from the solr
distribution war and then providing the dependency and everything worked
fine.

And I do remember the discussion on the forum about dropping the name
*apache* from solr jars. If that's what caused this issue, then can you
tell me if the mirrors need updating with solr-core.jar instead of
apache-solr-core.jar?

Many Thanks,
Sandeep

Re: Question about Edismax - Solr 4.0

2013-05-17 Thread Sandeep Mestry

Hello Jack,

Thanks for pointing the issues out and for your valuable suggestion. My
preliminary tests were okay on search but I will be doing more testing to
see if this has impacted any other searches.

Thanks once again and have a nice sunny weekend,
Sandeep


On 17 May 2013 05:35, Jack Krupansky j...@basetechnology.com wrote:

 Ah... I think your issue is the preserveOriginal=1 on the query analyzer
 as well as the fact that you have all of these catenatexx=1 options on
 the query analyzer - I indicated that you should remove them all.

 The problem is that the whitespace analyzer leaves the leading comma in
 place, and the preserveOriginal=1 also generates an extra token for the
 term, with the comma in place. But, with the space, the comma and 10 are
 separate terms and get analyzed independently.

 The query results probably indicate that you don't have that exact
 combination of the term and leading punctuation - or that there is no
 standalone comma in your input data.

 Try the following replacement for the query-time WDF:


 <filter class="solr.WordDelimiterFilterFactory"
   stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
   catenateWords="0" catenateNumbers="0" catenateAll="0"
   splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="0" />


 -- Jack Krupansky
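
 The effect described above can be sketched in a few lines of Python. This is
 only a rough model, not Solr's implementation: whitespace_tokenize and
 word_delimiter are hypothetical helpers, and the word-delimiter step is
 assumed to keep alphanumeric runs and, optionally, the untouched original
 token. Case-change splitting and catenation are ignored.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, roughly as a whitespace tokenizer would."""
    return text.split()


def word_delimiter(token, preserve_original):
    """Rough model of a word-delimiter filter (an assumption, not Solr code).

    Keeps alphanumeric runs and, if requested, also the original token.
    """
    parts = re.findall(r"[A-Za-z0-9]+", token)
    out = []
    if preserve_original and token not in parts:
        out.append(token)
    out.extend(parts)
    return out


def analyze(text, preserve_original):
    """Return the tokens produced for each whitespace-separated term."""
    return [word_delimiter(t, preserve_original) for t in whitespace_tokenize(text)]


for query in (",10", ", 10"):
    for flag in (True, False):
        print(repr(query), "preserveOriginal =", int(flag), "->", analyze(query, flag))
```

 In this model ", 10" with preserveOriginal=1 yields a standalone "," term
 next to "10", while with preserveOriginal=0 the comma term produces no
 tokens at all and only "10" remains to be matched.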


Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry

-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

If I run the search with this filter query instead,

2) fq=title:(,10) - (no space between , and 10)

- I get results, and the debug output shows the parsed filter queries
section as:
<arr name="filter_queries">
<str>(titles:(,10))</str>
<str>(collection:assets)</str>

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this part of the main query parameter (q), as that needs to be
searched against multiple fields.

Can someone suggest a fix for this case, please? I'm using Solr 4.0.

Many Thanks,
Sandeep

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry

Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:
<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I have set the catenate options to 1 for all types.
I can understand ',' being ignored when it is on its own (title:(, 10)), but
- Why does Solr not search for 10 in that case, just as it did when the
query was (title:(,10))?
- And why did the other filter query (collection:assets) not show up in the
debug section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote:

 You haven't indicated any problem here! What is the symptom that you
 actually think is a problem?

 There is no comma operator in any of the Solr query parsers. Comma is just
 another character that may or may not be included or discarded depending on
 the specific field type and analyzer. For example, a white space analyzer
 will keep commas, but the standard analyzer or the word delimiter filter
 will discard them. If title were a string type, all punctuation would
 be preserved, including commas and spaces (but spaces would need to be
 escaped or the term text enclosed in parentheses.)

 Let us know what your symptom is though, first.

 I mean, the filter query looks perfectly reasonable from an abstract
 perspective.

 -- Jack Krupansky


Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry

Hi Jack,

Thanks for your response again and for helping me out to get through this.

The URL is definitely encoded for spaces and it looks like below. As I
mentioned in my previous mail, I can't add it to query parameter as that
searches on multiple fields.

The title field is defined as below:
<field name="title" type="text_wc" indexed="true" stored="false"
       multiValued="true"/>

q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets
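
For comparison, a query string for the variant with the space can be built
with Python's standard library. This is a sketch only: build_query_string is
a hypothetical helper, and the parameter values are simply the ones quoted in
this thread. urlencode encodes the space as '+' and also percent-encodes the
comma, colon and parentheses.

```python
from urllib.parse import urlencode


def build_query_string(title_filter):
    """Encode the request parameters from this thread, one fq per filter."""
    params = [
        ("q", "countryside"),
        ("rows", "20"),
        ("qt", "assdismax"),
        ("fq", title_filter),          # e.g. "title:(, 10)"
        ("fq", "collection:assets"),
    ]
    return urlencode(params)


print(build_query_string("title:(, 10)"))
```

Passing the parameters as a list of pairs keeps both fq values, which a plain
dict would not allow.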

<requestHandler name="assdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
    <str name="pf">title</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
    <str name="q.op">AND</str>
    <str name="sort">score desc</str>
    <str name="facet">true</str>
    <str name="facet.limit">-1</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">uniq_subtype_id</str>
    <str name="facet.field">component_type</str>
    <str name="facet.field">genre_type</str>
  </lst>
  <lst name="appends">
    <str name="fq">collection:assets</str>
  </lst>
</requestHandler>

The term 'countryside' needs to be searched against multiple fields
including titles, descriptions, annotations, categories, notes but the UI
also has a feature to limit results by providing a title field.


I can see that the filter queries are always parsed by LuceneQueryParser;
however, I'd expect the parsed_filter_queries debug output to be generated in
every situation.

I have tried it as the main query with both the edismax and lucene defType, and
it gives me correct output and correct results.
But there is some problem when it is used as a filter query, as the
parser is not able to parse a comma followed by a space.

Thanks again Jack, please let me know in case you need more inputs from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky j...@basetechnology.com wrote:

 Could you show us the full query URL - spaces must be encoded in URL query
 parameters.

 Also show the actual field XML - you omitted that.

 Try the same query as a main query, using both defType=edismax and
 defType=lucene.

 Note that the filter query is parsed using the Lucene query parser, not
 edismax, independent of the defType parameter. But you don't have any
 edismax features in your fq anyway.

 But you can stick {!edismax} in front of the query to force edismax to be
 used for the fq, although it really shouldn't change anything:

 Also, catenate is fine for indexing, but will mess up your queries at
 query time, so set the catenate options to 0 in the query analyzer.

 Also, make sure you have autoGeneratePhraseQueries=true on the field
 type, but that's not the issue here.


 -- Jack Krupansky


Solr Sorting Algorithm

2013-05-13 Thread Sandeep Mestry

Good Morning All,

The alphabetical sorting is causing slight issues as below:

I have 3 documents with title value as below:

1) Acer Palmatum (Tree)
2) Aceraceae (Tree Family)
3) Acer Pseudoplatanus (Tree)

I have created title_sort field which is defined with field type as
alphaNumericalSort (that comes with solr example schema)

When I apply the sort order (sort=title_sort asc), I get the results as:

Aceraceae (Tree Family)
Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)

But, the expected order is (spaces first),

Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)
Aceraceae (Tree Family)

My unit test uses the Collections.sort method and I get the expected
results, but I'm not sure why Solr sorts differently.

From the Collections.sort API, I can see that it uses a modified merge sort.
Could you tell me which algorithm Solr follows for its sorting logic, and also
whether there is any other approach I can take?

Many Thanks,
Sandeep
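
The difference is more likely to come from the sort key than from the sort
algorithm. The sketch below assumes that the sort field type lowercases the
value and strips every non-letter character, including spaces, before
sorting. That is an assumption about the field type in use, not something
confirmed in this thread. plain_key and letters_only_key are hypothetical
helper names.

```python
import re

titles = [
    "Acer Palmatum (Tree)",
    "Aceraceae (Tree Family)",
    "Acer Pseudoplatanus (Tree)",
]


def plain_key(title):
    """Lowercase only: a space sorts before any letter."""
    return title.lower()


def letters_only_key(title):
    """Lowercase, then drop everything that is not a-z (assumed analysis)."""
    return re.sub(r"[^a-z]", "", title.lower())


print(sorted(titles, key=plain_key))
print(sorted(titles, key=letters_only_key))
```

With the plain key the two "Acer ..." titles come first, which is the
expected order. With the letters-only key "Aceraceae" comes first, which is
the order reported from Solr.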

Re: commit in solr4 takes a longer time

2013-05-03 Thread Sandeep Mestry

That's not ideal.
Can you post solrconfig.xml?
On 3 May 2013 07:41, vicky desai vicky.de...@germinait.com wrote:

 Hi sandeep,

 I made the changes you mentioned and tested again with the same set of docs,
 but unfortunately the commit time increased.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060622.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: commit in solr4 takes a longer time

2013-05-02 Thread Sandeep Mestry

Hi Vicky,

I faced this issue as well, and after some playing around I found the
autowarm count in the cache settings to be the problem.
I changed it from a fixed count (3072) to a percentage (10%), and all commit
times were stable from then on.

<filterCache class="solr.FastLRUCache" size="8192" initialSize="3072"
             autowarmCount="10%" />
<queryResultCache class="solr.LRUCache" size="16384" initialSize="3072"
                  autowarmCount="10%" />
<documentCache class="solr.LRUCache" size="8192" initialSize="4096"
               autowarmCount="10%" />

HTH,
Sandeep
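
As a rough illustration of why this helps, the sketch below estimates how
many entries would be re-populated when a new searcher is warmed. It assumes
that a percentage autowarmCount is taken relative to the number of entries
in the old cache and that a fixed count is capped by that number.
autowarm_entries is a hypothetical helper, not a Solr API.

```python
def autowarm_entries(spec, previous_cache_size):
    """Estimate the number of entries warmed for a given autowarmCount spec."""
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        return previous_cache_size * percent // 100
    return min(int(spec), previous_cache_size)


# A full filterCache of 8192 entries, fixed count versus percentage.
print(autowarm_entries("3072", 8192))
print(autowarm_entries("10%", 8192))
```

Under these assumptions the percentage setting warms 819 entries instead of
3072 for a full cache of 8192 entries, so each commit has far fewer queries
to replay.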


On 2 May 2013 16:31, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If you don't re-open the searcher, you will not see new changes. So,
 if you only have hard commit, you never see those changes (until
 restart). But if you also have soft commit enabled, that will re-open
 your searcher for you.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, May 2, 2013 at 11:21 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  What happens exactly when you don't open searcher at commit?
 
  2013/5/2 Gopal Patwa gopalpa...@gmail.com
 
  You might want to add openSearcher=false to the hard commit, so that the
  hard commit also acts like a soft commit:

  <autoCommit>
    <maxDocs>5</maxDocs>
    <maxTime>30</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
 
 
 
  On Thu, May 2, 2013 at 12:16 AM, vicky desai vicky.de...@germinait.com
  wrote:
 
   Hi,
  
   I am using 1 shard and two replicas. Document size is around 6 lakhs
  
  
   My solrconfig.xml is as follows
   <?xml version="1.0" encoding="UTF-8" ?>
   <config>
     <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
     <indexConfig>
       <maxFieldLength>2147483647</maxFieldLength>
       <lockType>simple</lockType>
       <unlockOnStartup>true</unlockOnStartup>
     </indexConfig>
     <updateHandler class="solr.DirectUpdateHandler2">
       <autoSoftCommit>
         <maxDocs>500</maxDocs>
         <maxTime>1000</maxTime>
       </autoSoftCommit>
       <autoCommit>
         <maxDocs>5</maxDocs>
         <maxTime>30</maxTime>
       </autoCommit>
     </updateHandler>

     <requestDispatcher handleSelect="true">
       <requestParsers enableRemoteStreaming="false"
                       multipartUploadLimitInKB="204800" />
     </requestDispatcher>

     <requestHandler name="standard" class="solr.StandardRequestHandler"
                     default="true" />
     <requestHandler name="/update" class="solr.UpdateRequestHandler" />
     <requestHandler name="/admin/"
                     class="org.apache.solr.handler.admin.AdminHandlers" />
     <requestHandler name="/replication" class="solr.ReplicationHandler" />
     <directoryFactory name="DirectoryFactory"
                       class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
     <enableLazyFieldLoading>true</enableLazyFieldLoading>
     <admin>
       <defaultQuery>*:*</defaultQuery>
     </admin>
   </config>
  
  
  
  
   --
   View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060402.html
   Sent from the Solr - User mailing list archive at Nabble.com.

Custom sorting of Solr Results

2013-04-30 Thread Sandeep Mestry

Dear Experts,


 I have a requirement to rank exact matches first and to sort the remaining
 results alphabetically.

 To illustrate: exact matches should come first, and everything after them
 should be in alphabetical order.

 So, if there are 5 documents as below

 Doc1
 title: trees

 Doc 2
 title: plum trees

 Doc 3
 title: Money Trees (Legendary Trees)

 Doc 4
 title: Cork Trees

 Doc 5
 title: Old Trees

 Then, if user searches with query term as 'trees', the results should be
 in following order:

 Doc 1 trees - Highest Rank
 Doc 4 Cork Trees - Alphabetical afterwards..
 Doc 3 Money Trees (Legendary Trees)
 Doc 5 Old Trees
 Doc 2 plum trees

 I can achieve the alphabetical sorting by adding the title sort parameter.
 However, Solr relevancy is higher for Doc 3 (due to matches in 2 terms), so
 it arranges Doc 3 above Doc 4, 5 and 2.
 So, it looks like:

 Doc 1 trees - Highest Rank
 Doc 3 Money Trees (Legendary Trees)
 Doc 4 Cork Trees - Alphabetical afterwards..
 Doc 5 Old Trees
 Doc 2 plum trees

 Can you tell me an easy way to achieve this requirement please?

 I'm using Solr 4.0 and the *title *field is defined as follows:

 <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
             catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
             splitOnNumerics="0" preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
             catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
             splitOnNumerics="0" preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>



 Many Thanks in advance,
 Sandeep

Re: Exact and Partial Matches

2013-04-30 Thread Sandeep Mestry

Thanks Erick,

I tried grouping and it appears to work okay. However, I will need to
change the client to parse the output..

fq=title:(tree)
&group=true
&group.query=title:(trees) NOT title_ci:trees
&group.query=title_ci:blair
&group.sort=title_sort desc
&sort=score desc,title_sort asc

I used the actual query as the filter query so my scores will be 1 and then
used 2 group queries - one which will give me exact matches and other that
will give me partial minus exact matches.
I have tried this with operators too and it seems to be doing the job I
want, do you see any issue in this?

Thanks again for your reply and by the way thanks for SOLR-4662.

-S


On 30 April 2013 15:06, Erick Erickson erickerick...@gmail.com wrote:

 I don't think you can do that. You're essentially
 trying to mix ordering of the result set. You
 _might_ be able to kludge some of this with
 grouping, but I doubt it.

 You'll need two queries I'd guess.

 Best
 Erick
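
 If the ordering ends up being done on the client, for instance after running
 two queries and merging them, the required order can be expressed as a
 two-part sort key. This is a sketch under that assumption only:
 exact_then_alphabetical is a hypothetical helper, and "exact" here means the
 whole title equals the query, ignoring case.

```python
def exact_then_alphabetical(titles, query):
    """Exact (case-insensitive) matches first, the rest alphabetically."""
    wanted = query.strip().lower()
    return sorted(
        titles,
        # False sorts before True, so exact matches lead the list.
        key=lambda title: (title.strip().lower() != wanted, title.lower()),
    )


titles = [
    "trees",
    "plum trees",
    "Money Trees (Legendary Trees)",
    "Cork Trees",
    "Old Trees",
]
print(exact_then_alphabetical(titles, "trees"))
```

 For the five example titles this yields trees, Cork Trees, Money Trees
 (Legendary Trees), Old Trees, plum trees.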


Exact and Partial Matches

2013-04-29 Thread Sandeep Mestry

Dear Experts,

I have a requirement to rank exact matches first and to sort the remaining
results alphabetically.

To illustrate: exact matches should come first, and everything after them
should be in alphabetical order.

So, if there are 5 documents as below

Doc1
title: trees

Doc 2
title: plum trees

Doc 3
title: Money Trees (Legendary Trees)

Doc 4
title: Cork Trees

Doc 5
title: Old Trees

Then, if user searches with query term as 'trees', the results should be in
following order:

Doc 1 trees - Highest Rank
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 3 Money Trees (Legendary Trees)
Doc 5 Old Trees
Doc 2 plum trees

I can achieve the alphabetical sorting by adding the title sort parameter.
However, Solr relevancy is higher for Doc 3 (due to matches in 2 terms), so
it arranges Doc 3 above Doc 4, 5 and 2.
So, it looks like:

Doc 1 trees - Highest Rank
Doc 3 Money Trees (Legendary Trees)
Doc 4 Cork Trees - Alphabetical afterwards..
Doc 5 Old Trees
Doc 2 plum trees

Can you tell me an easy way to achieve this requirement please?

I'm using Solr 4.0 and the *title *field is defined as follows:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>



Many Thanks in advance,
Sandeep

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry

Hi Pawel,

Not sure which parser you are using. I am using edismax and used the
bq parameter to boost documents with exact matches to the top of the results.
You may try something like:
q=cats AND London NOT Leeds&bq=cats^50

In edismax, the pf and pf2 parameters also need some tuning to get those
results to the top.

HTH,
Sandeep


On 25 April 2013 10:33, vsl ociepa.pa...@gmail.com wrote:

 Hi,
  is it possible to get exact matched result if the search term is combined
 e.g. cats AND London NOT Leeds


 In previous threads I have read that it is possible to create a new field
 of String type and perform a phrase search on it, but nowhere was the
 combined search term mentioned above taken into consideration.

 BR
 Pawel



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry

I think in that case making the field a String type is your option; however,
remember that it would be case sensitive.
Another approach is to create a case-insensitive field type and to search
on those fields only.

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Can you provide your fields and dismax config and if possible records you
would like and records you do not want?

-S


On 25 April 2013 11:50, vsl ociepa.pa...@gmail.com wrote:

 Thanks for your reply. I am using edismax as well. What I want to get is
 the
 exact match without other results that could be close to the given term.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058876.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact matching in Solr 3.6.1

2013-04-25 Thread Sandeep Mestry

Agree with Jack.

The current field type text_general is designed to match the query tokens
instead of exact matches - so it's not able to fulfill your requirements.

Can you use a flat file (http://wiki.apache.org/solr/FileBasedSpellChecker)
as the spell check dictionary instead? That way you can search on the
exact-matched field while generating spell check suggestions from the file
instead of from the index.

-S

On 25 April 2013 16:25, Jack Krupansky j...@basetechnology.com wrote:

Well then just do an exact match ONLY!

It sounds like you haven't worked out the inconsistencies in your
requirements.

To be clear: We're not offering you solutions - that's your job. We're
only pointing out tools that you can use. It is up to you to utilize the
tools wisely to implement your solution.

I suspect that you simply haven't experimented enough with various boosts
to assure that the unstemmed result is consistently higher.

Maybe you need a custom stemmer or stemmer override so that "passengers"
does get stemmed to "passenger", but "cats" does not (but "dogs" does).
That can be a choice that you can make, but I would urge caution. Still, it
is a decision that you can make - it's not a matter of Solr forcing or
preventing you. I still think boosting of an unstemmed field should be
sufficient.

But until you clarify the inconsistencies in your requirements, we won't
be able to make much progress.

-- Jack Krupansky

-Original Message- From: vsl
Sent: Thursday, April 25, 2013 10:45 AM

To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

Thanks for your reply, but this solution does not fulfil my requirement
because other documents (not exact matches) will be returned as well.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865p4058929.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry

Hi Jan,

Thanks for your reply. I have defined string_ci like below:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When I analysed the query in Solr, I saw that a document containing
pg_series_title_ci:funny matches when I search for
pg_series_title_ci:funny games, and it is ranked higher than the document
containing the exact match. I could use the default string data type, but
then the match would be case sensitive.

Thanks,
Sandeep
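
One possible explanation, sketched below, is that the query text is split on
whitespace before it reaches the field's analyzer, so each word is compared
with the whole lowercased field value on its own. That is an assumption about
the parser's behaviour, not something verified in this thread.
keyword_lowercase and matching_docs are hypothetical helpers.

```python
def keyword_lowercase(value):
    """Model of a keyword tokenizer plus lowercasing: one token per value."""
    return value.lower()


def matching_docs(docs, query, split_on_whitespace):
    """Return ids of docs whose single token equals any analyzed query term."""
    terms = query.split() if split_on_whitespace else [query]
    analyzed = {keyword_lowercase(term) for term in terms}
    return {
        doc_id
        for doc_id, title in docs.items()
        if keyword_lowercase(title) in analyzed
    }


docs = {1: "Funny", 2: "Funny Games"}
print(matching_docs(docs, "funny games", split_on_whitespace=True))
print(matching_docs(docs, "funny games", split_on_whitespace=False))
```

In this model the split query matches only the document titled "Funny",
while the unsplit query (as a quoted phrase would be) matches only
"Funny Games".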


On 3 April 2013 22:20, Jan Høydahl jan@cominvent.com wrote:

 Can you show us your *_ci field type? Solr does not really have a way to
 tell whether a match is exact or only partial, but you could hack around
 it with the fieldType. See https://github.com/cominvent/exactmatch for a
 possible solution.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 3. apr. 2013 kl. 15:55 skrev Sandeep Mestry sanmes...@gmail.com:

  Hi All,
 
  I have a requirement wherein exact matches for 2 fields (Series Title,
  Title) should be ranked higher than the partial matches. The configuration
  looks like below:
 
  <requestHandler name="assetdismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">pg_series_title_ci^500 title_ci^300
        pg_series_title^200 title^25 classifications^15 classifications_texts^15
        parent_classifications^10 synonym_classifications^5 pg_brand_title^5
        pg_series_working_title^5 p_programme_title^5 p_item_title^5
        p_interstitial_title^5 description^15 pg_series_description annotations^0.1
        classification_notes^0.05 pv_program_version_number^2
        pv_program_version_number_ci^2 pv_program_number^2 pv_program_number_ci^2
        p_program_number^2 ma_version_number^2 ma_recording_location
        ma_contributions^0.001 rel_pg_series_title rel_programme_title
        rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5
        pv_uuid^0.5 ma_uuid^0.5</str>
      <str name="pf">pg_series_title_ci^500 title_ci^500</str>
      <int name="ps">0</int>
      <str name="q.alt">*:*</str>
      <str name="mm">100%</str>
      <str name="q.op">AND</str>
      <str name="facet">true</str>
      <str name="facet.limit">-1</str>
      <str name="facet.mincount">1</str>
    </lst>
  </requestHandler>
 
  As you can see above, the search is against many fields. What I'd want is
  for documents that have exact matches on the series title and title fields
  to rank higher than the rest.
 
  I have added 2 case-insensitive fields (pg_series_title_ci, title_ci) for
  series title and title, and have boosted them above the tokenized fields
  and the rest. I have also implemented a similarity class to override
  idf; however, I still get documents with partial matches in title and other
  fields ranking higher than an exact match in pg_series_title_ci.
 
  Many Thanks,
  Sandeep

Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry

Another problem I see on the Solr analysis page is that a query term that
matches the tokenized field does not match the case-insensitive field.
For example, if I search for 'coast to coast', the tokenized series title
(pg_series_title) matches but the case-insensitive field,
pg_series_title_ci, does not.

The definitions of both fields are as below:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="pg_series_title" type="text_wc" indexed="true" stored="true"
       multiValued="false"/>
<field name="pg_series_title_ci" type="string_ci" indexed="true"
       stored="true" multiValued="false"/>

<copyField source="pg_series_title" dest="pg_series_title_ci"/>

Can this copyField directive be an issue? Should it be the other way round, or
does it matter?
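
A rough way to reason about the question, offered as a sketch under assumptions rather than a diagnosis: copyField copies the raw source text before any analysis runs, so its direction should not change how the destination is tokenized. A more likely cause is that the query parser in this Solr generation splits the query on whitespace before analysing each piece per field, so the string_ci field is asked for 'coast', 'to' and 'coast' separately, while its index holds the single token 'coast to coast'. The class and method names below are hypothetical and only simulate the two analyzers in plain Java; they do not call Lucene.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class KeywordFieldMatchSketch {

    // Simulates KeywordTokenizer + LowerCaseFilter: the whole value is one lowercased token.
    static String keywordLowercase(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Simulates a parser that splits the user query on whitespace before analysis.
    static List<String> splitOnWhitespace(String query) {
        return Arrays.stream(query.trim().split("\\s+"))
                .map(KeywordFieldMatchSketch::keywordLowercase)
                .collect(Collectors.toList());
    }

    // Per-term lookup: matches only if one query piece equals the whole indexed token.
    static boolean anyTermMatches(String indexedValue, String query) {
        return splitOnWhitespace(query).contains(keywordLowercase(indexedValue));
    }

    // Whole-string lookup, as a phrase or quoted query would be analysed.
    static boolean phraseMatches(String indexedValue, String query) {
        return keywordLowercase(indexedValue).equals(keywordLowercase(query));
    }

    public static void main(String[] args) {
        String indexed = "Coast to Coast";
        System.out.println(anyTermMatches(indexed, "coast to coast")); // prints false
        System.out.println(phraseMatches(indexed, "coast to coast"));  // prints true
    }
}
```

If this assumption holds, the pg_series_title_ci entry in qf can only match single-word titles, and it is the pf entry (which sees the whole phrase) that can reward an exact title match.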

Thanks,
Sandeep





On 4 April 2013 10:38, Sandeep Mestry sanmes...@gmail.com wrote:

 Hi Jan,

 Thanks for your reply. I have defined string_ci like below:

 <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
            omitNorms="true" compressThreshold="10">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 When I analysed the query in Solr, I saw that a document containing
 pg_series_title_ci:funny matches when I search for
 pg_series_title_ci:funny games, and it is ranked higher than the documents
 containing the exact match. I could use the default string data type, but
 then the match would be case-sensitive.

 Thanks,
 Sandeep


 On 3 April 2013 22:20, Jan Høydahl jan@cominvent.com wrote:

 Can you show us your *_ci field type? Solr does not really have a way to
 tell whether a match is exact or only partial, but you could hack around
 it with the fieldType. See https://github.com/cominvent/exactmatch for a
 possible solution.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com


Question on Exact Matches - edismax

2013-04-03 Thread Sandeep Mestry

Hi All,

I have a requirement wherein exact matches for 2 fields (Series Title,
Title) should be ranked higher than partial matches. The configuration
looks like below:

<requestHandler name="assetdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">pg_series_title_ci^500 title_ci^300
      pg_series_title^200 title^25 classifications^15 classifications_texts^15
      parent_classifications^10 synonym_classifications^5 pg_brand_title^5
      pg_series_working_title^5 p_programme_title^5 p_item_title^5
      p_interstitial_title^5 description^15 pg_series_description annotations^0.1
      classification_notes^0.05 pv_program_version_number^2
      pv_program_version_number_ci^2 pv_program_number^2 pv_program_number_ci^2
      p_program_number^2 ma_version_number^2 ma_recording_location
      ma_contributions^0.001 rel_pg_series_title rel_programme_title
      rel_programme_number rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5
      pv_uuid^0.5 ma_uuid^0.5</str>
    <str name="pf">pg_series_title_ci^500 title_ci^500</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
    <str name="mm">100%</str>
    <str name="q.op">AND</str>
    <str name="facet">true</str>
    <str name="facet.limit">-1</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>

As you can see above, the search is against many fields. What I want is for
documents that have exact matches in the series title and title fields to
rank higher than the rest.

I have added 2 case-insensitive fields (pg_series_title_ci, title_ci) for
series title and title and have boosted them above the tokenized fields and
the rest of the fields. I have also implemented a similarity class to override
idf; however, I still get documents with partial matches in title and other
fields ranking higher than an exact match in pg_series_title_ci.
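
For readers unsure what the tie value in the handler does, here is a minimal sketch of how a disjunction-max query combines the per-field scores of one query term: the best field score, plus tie times the sum of the other field scores. The class name and all the field scores are hypothetical, invented for illustration; they are not taken from this index, and scores are assumed to be non-negative.

```java
final class DisMaxTieSketch {

    // Best field score plus tie times the sum of the remaining field scores.
    static double disMaxScore(double tie, double... fieldScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        double tie = 0.01; // same value as the tie parameter in the handler above
        // Hypothetical per-field scores for a single query term.
        double oneStrongField = disMaxScore(tie, 500.0);
        double severalWeakerFields = disMaxScore(tie, 200.0, 25.0, 15.0, 15.0);
        System.out.println(oneStrongField);
        System.out.println(severalWeakerFields);
    }
}
```

With a tie this small, matching in many fields adds very little, so the ranking is driven almost entirely by whichever single field scores highest after boost, tf, idf and norms are multiplied in.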

Many Thanks,
Sandeep

Re: How to give more more importance to a document if term match is more

2013-02-19 Thread Sandeep Mestry

Hi Pragyanshis,

I faced a similar problem a few days ago and was advised on this forum to
override Solr's DefaultSimilarity calculation so that it always returns a
constant value for idf. I think in your case you'd also want to suppress the
length norm, which will require re-indexing because the length norm is
calculated during indexing.

The link of my issue is as below:
http://lucene.472066.n3.nabble.com/Possible-issue-in-edismax-td4037397.html
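
To make the two factors concrete, here is a small sketch of the classic Lucene TF-IDF formulas as I understand them: idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(number of terms in the field) before it is compressed for storage. The class name and the index sizes in main are hypothetical; this is plain arithmetic, not a call into Lucene.

```java
final class ClassicTfIdfFactors {

    // Classic idf: rarer terms (small docFreq) get a larger value.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Classic length norm: shorter fields get a larger value.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L; // hypothetical index size
        System.out.println(idf(10, numDocs));        // rare term
        System.out.println(idf(500_000, numDocs));   // common term
        System.out.println(lengthNorm(1));           // one-term field
        System.out.println(lengthNorm(4));           // four-term field
    }
}
```

A document matching one rare term can therefore outscore a document matching two common terms, which is why returning a constant idf evens things out; the length norm is baked into the index, which is why changing it needs a re-index.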

Cheers,
Sandeep


On 14 February 2013 19:20, Pragyanshis Pattanaik pragyans...@outlook.com wrote:

 Hi,
 My schema is like below.

 <fields>
   <dynamicField name="Subject-Name-*" type="string" indexed="true" stored="true"/>
   <dynamicField name="Subject-Mark-*" type="int" indexed="true" stored="true"/>
 </fields>

 My need is to search only three subject fields and boost those subjects
 which have a higher Mark (Mark can be between 1 and 10).
 Again, the top subjects will get a higher boost than the ones after them.
 For example, if a search term is present in Subject-Name-1, then it will get
 a higher boost than in Subject-Name-2 and Subject-Name-3.
 Similarly, Subject-Mark-1 will get a higher boost than Subject-Mark-2 and
 Subject-Mark-3.
 To achieve this I am querying over the subject fields and my query looks
 like below.

 q=+Economics+Geography&wt=xml&deftype=edismax&qf=Subject-Name-1+Subject-Name-2+Subject-Name-3&bq=Subject-Name-1%3AEconomics%3BGeography^50.0+Subject-Mark-1%3A20^90.0+Subject-Mark-1%3A9^80.0+Subject-Mark-1%3A8^70.0+Subject-Mark-1%3A7^60.0+Subject-Name-2%3AEconomics%3BGeography^45.0+Subject-Mark-2%3A20^90.0+Subject-Mark-2%3A9^80.0+Subject-Mark-2%3A8^70.0+Subject-Mark-2%3A7^60.0+Subject-Name-3%3AEconomics%3BGeography^40.0+Subject-Mark-3%3A20^90.0+Subject-Mark-3%3A9^80.0+Subject-Mark-3%3A8^70.0+Subject-Mark-3%3A7^60.0

 If I have four documents like below

 <doc>
   <str name="Subject-Name-1">Economics</str>
   <str name="Subject-Name-1">Geography</str>
   <str name="Subject-Name-1">History</str>
   <int name="Subject-Name-1">7</int>
   <int name="Subject-Name-1">7</int>
   <int name="Subject-Name-1">6</int>
 </doc>
 <doc>
   <str name="Subject-Name-1">Economics</str>
   <str name="Subject-Name-1">History</str>
   <str name="Subject-Name-1">Geography</str>
   <int name="Subject-Name-1">8</int>
   <int name="Subject-Name-1">8</int>
   <int name="Subject-Name-1">5</int>
 </doc>
 <doc>
   <str name="Subject-Name-1">Economics</str>
   <str name="Subject-Name-1">History</str>
   <str name="Subject-Name-1">Geography</str>
   <int name="Subject-Name-1">9</int>
   <int name="Subject-Name-1">6</int>
   <int name="Subject-Name-1">7</int>
 </doc>
 <doc>
   <str name="Subject-Name-1">Economics</str>
   <str name="Subject-Name-1">Mathematics</str>
   <str name="Subject-Name-1">History</str>
   <int name="Subject-Name-1">7</int>
   <int name="Subject-Name-1">7</int>
   <int name="Subject-Name-1">6</int>
 </doc>

 then I am getting a higher score for the last document, which has only one
 of the search terms!
 But in my situation that is not acceptable. My requirement is that a document
 with only one term should get a lower score than the documents which have
 both terms.
 Is this happening because of idf (rarer terms give a higher contribution to
 the total score)?
 Or is there something wrong with my query?
 Can anybody help me achieve the desired output?
 Thanks in advance

Re: Problem when I search something that contains a forward slash?

2013-02-19 Thread Sandeep Mestry

Hi Bruno,

Solr 4.0 added regular expression support, which means that
'/' is now a special character and must be escaped when searching for a
literal forward slash.

http://wiki.apache.org/solr/SolrQuerySyntax

So, you can either escape it or use quotes, like "A01H2/001"
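
A small sketch of both options when the query string is built in client code. The class and method names are hypothetical helpers written for this note, not a Solr API (SolrJ users may prefer the broader ClientUtils.escapeQueryChars helper); the ic field and the A01H2/001 value come from the question.

```java
final class SlashEscapeSketch {

    // Backslash-escapes '/' (and any existing backslash) so it is read as a literal.
    static String escapeSlashes(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '/' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wraps the term in double quotes, escaping embedded backslashes and quotes.
    static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println("ic:" + escapeSlashes("A01H2/001")); // prints ic:A01H2\/001
        System.out.println("ic:" + quote("A01H2/001"));         // prints ic:"A01H2/001"
    }
}
```

Note that escaping only affects how the query parser reads the string; the field's own analyzer still decides whether the slash survives tokenization.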

Cheers,
Sandeep



On 19 February 2013 11:40, Bruno Mannina bmann...@free.fr wrote:

 Dear Solr Users,

 I use Solr 3.6

 I have a field name IC which contains IPC codes with a forward slash
 inside like:
 A01H2/001
 G06F1/023
 C01C3/147
 G06F3/023
 etc...

 My definition for this field is:
 <field name="ic" type="text_general" indexed="true" stored="true"
 multiValued="true"/>

 If I try to search:
 ic:G06F3/023
 http://:/solr/select/?q=ic%3AG06F3%2F023&version=2.2&start=0&rows=10&indent=on

 the result is wrong.

 When I use debugQuery, I see that the forward slash splits the request as:
 <str name="parsedquery_toString">ic:g06f3 ic:023</str>

 How can I search a term that contains a / (forward slash)?

 Thanks a lot for your help,
 Bruno

Re: Problem when I search something that contains a forward slash?

2013-02-19 Thread Sandeep Mestry

Hi Bruno,

I have never used 3.6, so I am sorry I might not be of much help.
But I have a similar requirement for 2 fields; I use string and
case-insensitive string fields, and by escaping the forward slash I get the
correct result.

The field definitions are as below:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10"/>

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The debug output for both fields is as below:

Case-insensitive string field:

<str name="rawquerystring">pv_program_version_number_ci:HNAD002D\/01</str>
<str name="querystring">pv_program_version_number_ci:HNAD002D\/01</str>
<str name="parsedquery">pv_program_version_number_ci:hnad002d/01</str>
<str name="parsedquery_toString">pv_program_version_number_ci:hnad002d/01</str>

String field:

<str name="rawquerystring">pv_program_version_number:HNAD002D\/01</str>
<str name="querystring">pv_program_version_number:HNAD002D\/01</str>
<str name="parsedquery">pv_program_version_number:HNAD002D/01</str>
<str name="parsedquery_toString">pv_program_version_number:HNAD002D/01</str>


HTH,
Sandeep


On 19 February 2013 12:24, Bruno Mannina bmann...@free.fr wrote:

 Hi,

 Even when I use a backslash, the problem is the same:
 ic:A01H2\/023 returns the same problem.

 Maybe I must disable an option, or something?

 On 19/02/2013 13:11, Bruno Mannina wrote:

  Hi Sandeep,

 First, thanks for your answer, but I use Solr 3.6 and not 4.0.
 I can't currently upgrade my Solr to version 4.0.

 And using " " is not the solution because Solr 3.6 has an issue when
 I use truncation like * inside the request:
 "A01H2/0*" doesn't work.

 Do you have another solution for Solr 3.6?

 thanks a lot,
 Bruno


Re: Possible issue in edismax?

2013-02-12 Thread Sandeep Mestry

Hi Felipe, just a short note to say thanks for your valuable suggestion. I
have implemented it and can see the expected results. The length norm still
spoils it for a few fields, but I balanced that with the boost factors
accordingly.

Once again, Many Thanks!
Sandeep



Re: Possible issue in edismax?

2013-02-01 Thread Sandeep Mestry

Hi..

Could you tell me if changing the default similarity to a custom
implementation will require me to rebuild the index? Or will it be used only
at query time?

thanks,
Sandeep

Re: Possible issue in edismax?

2013-02-01 Thread Sandeep Mestry

Brilliant! Thanks very much for your response.
On 1 Feb 2013 20:37, Felipe Lahti fla...@thoughtworks.com wrote:

 It's not necessary. It's only query time.



Re: Possible issue in edismax?

2013-01-31 Thread Sandeep Mestry

Fantastic! Thanks very much. I will do so accordingly and will let you
know the results.

Thanks again,
Sandeep


On 31 January 2013 13:54, Felipe Lahti fla...@thoughtworks.com wrote:

 So, it depends on your business requirement, right? If a document has
 matches in more searchable fields then, at least for me, that document is
 more important than another document that has fewer matches.

 Example:
 Put this in your schema:
 <similarity class="com.your.namespace.NoIDFSimilarity"/>

 And create a class on the classpath of your Solr:

 package com.your.namespace;

 import org.apache.lucene.search.similarities.DefaultSimilarity;

 // Similarity that ignores how rare a term is across the index.
 public class NoIDFSimilarity extends DefaultSimilarity {

     @Override
     public float idf(long docFreq, long numDocs) {
         // Constant idf: every term contributes equally, however rare it is.
         return 1;
     }
 }

 It will neutralize the idf (which is the rarity of the term).

 On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com
 wrote:

  Thanks Felipe.
  Can you point me to an example, please?

  Also, forgive me, but if a document has matches in more searchable fields
  then should it not rank higher?
 
  Thanks,
  Sandeep
  On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com wrote:
 
   If you compare the first and last document scores you will see that the
   last one matches more fields than the first one. So you may be thinking,
   why? The first doc only matches the contributions field and the last
   matches a bunch of fields, so if you want it to behave more like
   (<str name="qf">series_title^500 title^100 description^15 contribution</str>)
   you have to override the method of DefaultSimilarity.
  
  
   On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry sanmes...@gmail.com
   wrote:
  
I have pasted it below. It varies slightly from the dismax
configuration I mentioned above, as I was playing with all sorts of
boost values; however, it looks more like below:

<str name="c208c2ca-4270-27b8-e040-a8c00409063a">
2675.7844 = (MATCH) sum of:
  2675.7844 = (MATCH) max plus 0.01 times others of:
    2675.7844 = (MATCH) weight(contributions:news in 63298) [DefaultSimilarity], result of:
      2675.7844 = score(doc=63298,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        595177.7 = fieldWeight in 63298, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          40960.0 = fieldNorm(doc=63298)
</str>
<str name="c208c2a9-66bc-27b8-e040-a8c00409063a">
2317.297 = (MATCH) sum of:
  2317.297 = (MATCH) max plus 0.01 times others of:
    2317.297 = (MATCH) weight(contributions:news in 9826415) [DefaultSimilarity], result of:
      2317.297 = score(doc=9826415,freq=3.0 = termFreq=3.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        515439.0 = fieldWeight in 9826415, product of:
          1.7320508 = tf(freq=3.0), with freq of:
            3.0 = termFreq=3.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          20480.0 = fieldNorm(doc=9826415)
</str>
<str name="c208c2aa-1806-27b8-e040-a8c00409063a">
2140.6274 = (MATCH) sum of:
  2140.6274 = (MATCH) max plus 0.01 times others of:
    2140.6274 = (MATCH) weight(contributions:news in 9882325) [DefaultSimilarity], result of:
      2140.6274 = score(doc=9882325,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        476142.16 = fieldWeight in 9882325, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          32768.0 = fieldNorm(doc=9882325)
</str>
<str name="c208c2b0-5165-27b8-e040-a8c00409063a">
1605.4707 = (MATCH) sum of:
  1605.4707 = (MATCH) max plus 0.01 times others of:
    1605.4707 = (MATCH) weight(contributions:news in 220007) [DefaultSimilarity], result of:
      1605.4707 = score(doc=220007,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        357106.62 = fieldWeight in 220007, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          24576.0 = fieldNorm(doc=220007)
</str>
<str name="c208c2cc-d01b-27b8-e040-a8c00409063a">
1605.4707 = (MATCH) sum of:
  1605.4707 = (MATCH) max plus 0.01 times others of:
    1605.4707 = (MATCH) weight(contributions:news in 241151) [DefaultSimilarity], result of:
      1605.4707 = score(doc=241151,freq=1.0 = termFreq=1.0), product of:
        0.004495774 = queryWeight, product of:
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          3.093982E-4 = queryNorm
        357106.62 = fieldWeight in 241151, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          14.530705 = idf(docFreq=14, maxDocs=11282414)
          24576.0 = fieldNorm(doc

Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry

Hi All,

I'm facing an issue with relevancy calculation in the dismax query parser.
The boost factor does not work as expected in certain cases when the keyword
is generic, by which I mean a keyword that appears many times in the
document as well as in the index.

I have parser configuration as below:

<requestHandler name="querydismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">series_title^500 title^100 description^15 contribution</str>
    <str name="pf">series_title^200</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

As you can see above, I'd expect documents containing matches in the series
title to rank higher than the ones matching in contribution.

This works well if I type in a query like 'wonderworld', which is a rarely
occurring term: the series titles rank higher. But if I type in a keyword
like 'news', which is the most common term in the index, I get hits in
contributions even though I have lots of documents with the word news in the
series title.

The field definition is as below:

<field name="series_title" type="text_wc" indexed="true" stored="true"
       multiValued="false" />
<field name="title" type="text_wc" indexed="true" stored="true"
       multiValued="false" />
<field name="description" type="text_wc" indexed="true" stored="true"
       multiValued="false" />
<field name="contribution" type="text" indexed="true" stored="true"
       multiValued="true" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           compressThreshold="10">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I have tried debugging, and when I use the query term news, I see that
matches in contributions are ranked higher than matches in series title. The
parsed queries look like below.
(Note that I have edited the query: in reality I have a lot of searchable
fields, and I have only mentioned the ones containing text data. The rest
all contain uuids.)

<str name="parsedquery">
(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord
</str>
<str name="parsedquery_toString">
+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()
</str>
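
For readers following the parsed query: the ~0.01 after the DisjunctionMaxQuery is the tie value from the handler defaults. A disjunction-max query scores a document as the best per-field score plus tie times the sum of the other per-field scores. The Python sketch below only illustrates that combination rule; the function name and the per-field scores in the usage line are made-up placeholders, not values from this index.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    the best field score, plus `tie` times the sum of all the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical per-field scores for one document (illustrative only).
example = dismax_score([10.0, 2.0, 1.0], tie=0.01)
print(example)
```

With a tie as small as 0.01, the ranking is driven almost entirely by whichever single field scores highest for a document, so one field with an unusually large raw score can outrank boosted matches in the other fields.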


Could you guide me in right direction please?

Many Thanks,
Sandeep

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry

Thanks Felipe, yes I have seen that and my requirement somewhere falls for


On 30 January 2013 15:53, Felipe Lahti fla...@thoughtworks.com wrote:

 Hi Sandeep,

 The quick answer is that the boost you define in your requestHandler is
 not the only thing used to calculate the score of each document. Other
 factors contribute to the score calculation; take a look at
 http://wiki.apache.org/solr/SolrRelevancyFAQ. You can also use
 debugQuery=true to see the score calculation for each document returned.

 Let me know if you need anything else.



 --
 Felipe Lahti
 Consultant Developer

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry

(Sorry for the incomplete reply in my previous mail; I didn't know Ctrl F
sends an email in Gmail.. ;-))

Thanks Felipe, yes I have seen that, and my requirement falls under this FAQ
entry:

How can I make exact-case matches score higher

Example: a query of Penguin should score documents containing Penguin
higher than docs containing penguin.

The general strategy is to index the content twice, using different fields
with different fieldTypes (and different analyzers associated with those
fieldTypes). One analyzer will contain a lowercase filter for
case-insensitive matches, and one will preserve case for exact-case matches.

Use copyField (http://wiki.apache.org/solr/SchemaXml#copyField) commands in
the schema to index a single input field multiple times.

Once the content is indexed into multiple fields that are analyzed
differently, query across both fields
(http://wiki.apache.org/solr/SolrRelevancyFAQ#multiFieldQuery).

I have also added a case-insensitive field so that exact matches score
higher; however, the result does not even consider the matches in that
field, let alone the exact-matching part.

And I have tried the debugQuery option, as mentioned in my previous mail,
and I have also posted the parsed queries. From the debug output, I see that
the field with the lower boost factor (contribution) still scores higher
than the one with the higher boost factor (series_title).


Thanks,

Sandeep





Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry

 = fieldWeight in 967895, product of:
  1.0 = tf(freq=1.0), with freq of:
    1.0 = termFreq=1.0
  6.4641423 = idf(docFreq=47791, maxDocs=11282414)
  1.0 = fieldNorm(doc=967895)
1.6107484 = (MATCH) weight(title_ci:news^100.0 in 967895) [DefaultSimilarity], result of:
  1.6107484 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
    0.22324038 = queryWeight, product of:
      100.0 = boost
      7.2153096 = idf(docFreq=22548, maxDocs=11282414)
      3.093982E-4 = queryNorm
    7.2153096 = fieldWeight in 967895, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      7.2153096 = idf(docFreq=22548, maxDocs=11282414)
      1.0 = fieldNorm(doc=967895)
</str>
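
The explain output can be checked by hand: each term score is queryWeight (boost x idf x queryNorm) multiplied by fieldWeight (tf x idf x fieldNorm). The Python sketch below redoes that arithmetic for the title_ci hit above and for the contributions hit on doc 63298 from the explain output posted in this thread. The helper names are mine; the numbers are copied from the two explain outputs.

```python
QUERY_NORM = 3.093982e-4  # queryNorm, identical in both explain outputs


def query_weight(idf, query_norm, boost=1.0):
    """queryWeight = boost * idf * queryNorm"""
    return boost * idf * query_norm


def field_weight(tf, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm"""
    return tf * idf * field_norm


# title_ci:news^100.0 in doc 967895 (boost 100, fieldNorm 1.0)
title_ci = query_weight(7.2153096, QUERY_NORM, boost=100.0) * field_weight(1.0, 7.2153096, 1.0)

# contributions:news in doc 63298 (no boost, fieldNorm 40960.0)
contributions = query_weight(14.530705, QUERY_NORM) * field_weight(1.0, 14.530705, 40960.0)

print(title_ci, contributions, contributions / title_ci)
```

Read this way, the 100x query boost on title_ci is outweighed by the fieldNorm of 40960.0 on the contributions field (against 1.0 for title_ci), with the higher idf adding to the gap. The thread does not say where such a large norm comes from, so treat that as something to investigate rather than a diagnosis.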


On 30 January 2013 17:55, Felipe Lahti fla...@thoughtworks.com wrote:

 Let me see if I understood your problem:

 From your first e-mail, I think you are worried about the order in which
 Solr returns documents. Is that correct? If so, as I said before, boosting
 is not the only thing that influences the order of returned documents.
 There's also term frequency, IDF (inverse document frequency)... If I
 understood your first e-mail correctly, you are interested in getting rid
 of IDF. For that, you can create a NoIDFSimilarity class to override the
 default similarity.

 Can you paste the score calculation for one document here?



Re: ConcurrentModificationException in Solr 3.6.1

2013-01-18 Thread Sandeep Mestry

Hi There, I think Andre has already guided you in your earlier mail..


This should be fixed in 3.6.2 which is available since Dec 25.

From the release notes:

Fixed ConcurrentModificationException during highlighting, if all fields
were requested.

André




From: mechravi25 [mechrav...@yahoo.co.in]
Sent: Friday, 18 January 2013 11:10
To: solr-user@lucene.apache.org
Subject: ConcurrentModificationException in Solr 3.6.1


On 18 January 2013 12:01, mechravi25 mechrav...@yahoo.co.in wrote:

 Hi all,


 I am using Solr 3.6.1 and sending a set of requests to Solr
 simultaneously. When I checked the log file, I noticed the exception
 stack trace below:


 SEVERE: java.util.ConcurrentModificationException
  at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
  at java.util.LinkedList$ListItr.next(LinkedList.java:696)
  at org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

 When I searched through the Solr issues, I found the following two URLs:

 https://issues.apache.org/jira/browse/SOLR-2684
 https://issues.apache.org/jira/browse/SOLR-3790

 The stack trace given in the second URL coincides with the one above, so
 I have applied the code change given in the link below:

 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h

 The first URL's stack trace seems to be different.
 I have two questions here:
 1.) Why does this exception stack trace occur?
 2.) Is there any other patch/solution available to overcome this
 exception?
 Please guide me.

 Thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ConcurrentModificationException-in-Solr-3-6-1-tp4034520.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4 : Optimize very slow

2012-12-06 Thread Sandeep Mestry

Hi All,

I followed Michael's advice and the timings are now down to a couple of
hours from 6-8 hours :-)
I have attached the solrconfig.xml we're using; can you let me know if I'm
missing something?

Thanks,
Sandeep
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--
For more details about configurations options that may appear in this
file, see http://wiki.apache.org/solr/SolrConfigXml.

Specifically, the Solr Config can support XInclude, which may make it easier to manage
the configuration. See https://issues.apache.org/jira/browse/SOLR-1167
-->
<config>
<luceneMatchVersion>LUCENE_40</luceneMatchVersion>
<!-- Set this to 'false' if you want solr to continue working after it has
encountered an severe configuration error. In a production environment,
you may want solr to keep working even if one handler is mis-configured.

You may also set this to false using by setting the system property:
-Dsolr.abortOnConfigurationError=false
-->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

<!-- lib directives can be used to instruct Solr to load an Jars identified
and use them to resolve any plugins specified in your solrconfig.xml or
schema.xml (ie: Analyzers, Request Handlers, etc...).

All directories and paths are resolved relative the instanceDir.

If a ./lib directory exists in your instanceDir, all files found in it
are included as if you had used the following syntax...

<lib dir="./lib" />
-->
<!-- A dir option by itself adds any files found in the directory to the
classpath, this is useful for including all jars in a directory.
-->
<lib dir="../../contrib/extraction/lib" />
<!-- When a regex is specified in addition to a directory, only the files in that
directory which completely match the regex (anchored on both ends)
will be included.
-->
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />
<!-- If a dir option (with or without a regex) is used and nothing is found
that matches, it will be ignored
-->
<lib dir="../../contrib/clustering/lib/downloads/" />
<lib dir="../../contrib/clustering/lib/" />
<lib dir="/total/crap/dir/ignored" />
<!-- an exact path can be used to specify a specific file. This will cause
a serious error to be logged if it can't be loaded.
<lib path="../a-jar-that-does-not-exist.jar" />
-->

<!-- Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectory}"/>

<!-- WARNING: this indexDefaults section only provides defaults for index writers
in general. See also the mainIndex section after that when changing parameters
for Solr's main Lucene index. -->
<indexConfig>
<!-- Values here affect all index writers and act as a default unless overridden. -->
<mergeFactor>30</mergeFactor>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
<int name="maxMergeAtOnce">15</int>
<int name="segmentsPerTier">15</int>
</mergePolicy>

<!-- options specific to the main on-disk lucene index -->
<ramBufferSizeMB>32</ramBufferSizeMB>

<!--
Custom deletion policies can specified here. The class must
implement org.apache.lucene.index.IndexDeletionPolicy.

http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html

The standard Solr IndexDeletionPolicy implementation supports deleting
index commit points on number of commits, age of commit point and
optimized status.

The latest commit point should always be preserved regardless
of the criteria.
-->
<deletionPolicy class="solr.SolrDeletionPolicy">
<!-- The number of commit points to be kept -->
<str name="maxCommitsToKeep">1</str>
<!-- The number of optimized commit

Re: Solr 4 : Optimize very slow

2012-12-05 Thread Sandeep Mestry

@Walter, the daily optimization was introduced because we saw search
performance degrade during peak hours, when loads of updates take place on
the index. Load testing proved slightly successful on optimized indexes. As
a matter of fact, the merge factor was increased from 10 to 30 to make it
acceptable.

@Upayavira, thanks for the inputs. I will try to avoid the daily
optimizations; however, it's sort of a workplace policy not to alter
anything except the essential configs for this release of the project. I
take your point that the daily optimizations are unnecessary, but even then
it's hard to imagine why they take 6-8 hours a day when previously they
finished within half an hour.

@Michael, thanks for pointing that out. I will try using
solr.NIOFSDirectoryFactory, as currently I'm using the default one.
Regarding your questions:
- Nothing has changed between Solr 1.4 and Solr 4 except the Solr config. I
have built 2 separate environments using Solr 1.4 and Solr 4 with the same
application code, db config etc., and can see the difference in the
optimization timings.
- I will check the Solr stats for GC, including during optimization. I see
that the index size grows from 8.5 GB to 17 GB, and CPU utilization is at
its highest then.
And by WAS I did mean WebSphere Application Server.

@Otis, a quick google for "optimize wunder Erick Otis" results in this mail
chain (ha ha!), but I will dig through the mail archives. Thank you for your
suggestion.

Have a good day all, I will come back with my findings..

Best,
Sandeep


On 5 December 2012 06:07, Walter Underwood wun...@wunderwood.org wrote:

 It was not necessary under 1.4. It has never been necessary.

 It was not necessary in Ultraseek Server in 1996, using the same merging
 model.

 In some cases, it can be a good idea. Since you are continuously updating,
 this is not one of those cases.

 wunder

 On Dec 4, 2012, at 9:29 PM, Upayavira wrote:

  I tried that search, without success :-(
 
  I suspect what Otis was trying to say was to question why you are
  optimising. Optimise was necessary under 1.4, but with newer Solr, the
  new TieredMergePolicy does a much better job of handling background
  merging, reducing the need for optimize. Try just not doing it at all
  and see if your index actually reaches a point where it is needed.
 
  Upayavira
 
  On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote:
  Hi,
 
  You should search the ML archives for : optimize wunder Erick Otis :)
 
  Is WAS really AWS? If so, if these are new EC2 instances you are
  unfortunately unable to do a fair apples to apples comparison. Have you
  tried a different set of instances?
 
  Otis
  --
  Performance Monitoring - http://sematext.com/spm
 

 --
 Walter Underwood
 wun...@wunderwood.org

Re: Incremental Update of index

2012-12-05 Thread Sandeep Mestry

Hi Amit/Shanu,

You can create the Solr document for just the updated record and index it,
so that only the updated record gets re-indexed.
You need not rebuild the index from scratch for every record update.
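
As a rough illustration of indexing a single updated record, the Python sketch below builds an HTTP request that posts one document as JSON to Solr's update handler. The core URL, the field names and the helper name are assumptions made up for the example, and the JSON-on-/update form should be checked against your Solr version (older releases used a separate /update/json path). Posting a document whose uniqueKey already exists replaces the previously indexed version of that document.

```python
import json
import urllib.request


def build_update_request(solr_core_url, doc, commit=True):
    """Build (but do not send) a POST that indexes one document.

    solr_core_url is a hypothetical base URL such as
    http://localhost:8983/solr/collection1; doc is a dict of field values
    that must include the schema's uniqueKey field.
    """
    url = solr_core_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical record that changed in the database.
request = build_update_request(
    "http://localhost:8983/solr/collection1",
    {"id": "rec-42", "title": "Updated title"},
)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

Committing on every single update is convenient for a demo but is usually too expensive under a steady write load, so batching updates and committing less often is the more common arrangement.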

Thanks,
Sandeep

Solr 4 : Optimize very slow

2012-12-04 Thread Sandeep Mestry

Hi All,

I have recently migrated from Solr 1.4 to Solr 4 and have made the basic
changes required for Solr 4 in solrconfig.xml and schema.xml. I have also
rebuilt the index set for Solr 4.
We run optimize every morning at 4 am and we keep the index updates off
during this process.
Previously, with 1.4, the optimization used to take around 20-30 mins per
shard, but now with Solr 4 it's taking 6-8 hours or even more..
I have also tested the optimize from the Solr UI and that takes 6-8 hours
too..
The hardware is the same, and we have deployed Solr under WAS.
There are 4 shards and every shard contains around 8-9 GB of data, and we
are using a master-slave configuration with rsync. I have not enabled soft
commit. Also, the committer process is scheduled to run every minute.

I am not sure which part I'm missing, so please let me know your inputs.

Many Thanks in advance,
Sandeep

Re: Does SolrCloud support distributed IDFs?

2012-11-28 Thread Sandeep Mestry

Dear All, can anyone estimate how long it will take to get the SOLR-1632
patch into Solr 4?

Also, has anyone used an alternative method, such as the Ultraseek XPA Java
library, to calculate distributed ranking?

Many Thanks,
Sandeep


On 22 October 2012 13:23, Sascha SZOTT sz...@gmx.de wrote:

 Hi Mark,


 Mark Miller wrote:

 Still waiting on that issue. I think Andrzej should just update it to
 trunk and commit - it's optional and defaults to off. Go vote :)

 Sounds like the problem is already solved and the remaining work consists
 of code integration? Can somebody estimate how much work that would be?

 -Sascha

Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry

Dear All,

I have a requirement to search against multiple fields such as title,
description, annotations, comments and text, and the query can contain
multiple boolean operators.
Could someone point me in the right direction?

If the user enters a query like:

- (day AND world) NOT night

I want to form a query:

*(title:day AND title:world NOT title:night) OR (description:day
AND description:world NOT description:night) OR (annotations:day
AND annotations:world NOT annotations:night) OR (comments:day
AND comments:world NOT comments:night) OR (text:day AND text:world
NOT text:night) *

I've tried the Lucene MultiFieldQueryParser to form the query and, after some
string manipulation, produced the query below. However, it does not give me
the correct relevancy.

*(title:day OR description:day OR annotations:day OR comments:day OR
text:day) AND (title:world OR description:world OR annotations:world OR
comments:world OR text:world) NOT (title:night OR description:night
OR annotations:night OR comments:night OR text:night)*

For the record, the project is still on Solr 1.4 and hence I'm using the
Standard Query Parser (the upgrade is due in the coming months). But for now,
I need to make it work for the above requirement.

Is there a straightforward approach, or should I take the route of writing
the QueryGrammar myself?

Many Thanks,
Sandeep
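
The per-field expansion asked for in this message can be sketched with
plain string handling, assuming the user query only contains bare terms,
quoted phrases, parentheses and the upper-case operators AND, OR and NOT.
The helper names qualify and expand are hypothetical; this is not the
Lucene MultiFieldQueryParser and it does no escaping of special
characters.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# A token is a parenthesis, a quoted phrase, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def qualify(query, field):
    """Prefix every term in the query with field, leaving operators alone."""
    out = []
    for tok in TOKEN.findall(query):
        if tok in OPERATORS or tok in ("(", ")"):
            out.append(tok)
        else:
            out.append(field + ":" + tok)
    # Tidy the spacing around parentheses. This is a simplification that
    # would also touch "( " or " )" inside a quoted phrase.
    return " ".join(out).replace("( ", "(").replace(" )", ")")


def expand(query, fields):
    """OR together one fully qualified copy of the query per field."""
    return " OR ".join("(" + qualify(query, f) + ")" for f in fields)


fields = ["title", "description", "annotations", "comments", "text"]
expanded = expand("(day AND world) NOT night", fields)
```

The result keeps the user's own grouping inside each per-field clause,
so the title clause reads ((title:day AND title:world) NOT title:night).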

Re: Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry

Thanks Ahmet. However, as I mentioned in my e-mail, we're using Solr 1.4
here, and edismax is only supported from Solr 3.1 onwards.

:-)

On 23 October 2012 13:42, Ahmet Arslan iori...@yahoo.com wrote:



 --- On Tue, 10/23/12, Sandeep Mestry sanmes...@gmail.com wrote:

  From: Sandeep Mestry sanmes...@gmail.com
  Subject: Forming Solr Query for multiple operators against multiple fields
  To: solr-user@lucene.apache.org
  Date: Tuesday, October 23, 2012, 2:51 PM
  Dear All,
 
  I have a requirement to search against multiple fields such as title,
  description, annotations, comments and text, and the query can contain
  multiple boolean operators.
  Could someone point me in the right direction?
 
  If the user enters a query like:
 
  - (day AND world) NOT night

 You can probably make use of the (e)dismax query parser.
 http://wiki.apache.org/solr/DisMax
 http://wiki.apache.org/solr/ExtendedDisMax
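
For readers on Solr 3.1 or later, the edismax suggestion could look
roughly like the sketch below: the user's query is passed through
unchanged in q and the field list goes in qf. The helper name
edismax_params is hypothetical, and this does not apply to the Solr 1.4
setup discussed in the thread.

```python
from urllib.parse import urlencode


def edismax_params(user_query, fields):
    """Build a query string that lets edismax search every field in qf."""
    return urlencode({
        "defType": "edismax",    # select the extended dismax parser
        "q": user_query,         # boolean operators stay as the user typed them
        "qf": " ".join(fields),  # fields to search, separated by spaces
    })


params = edismax_params(
    "(day AND world) NOT night",
    ["title", "description", "annotations", "comments", "text"],
)
```

The resulting string can be appended to a select URL after a question
mark.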
