I did try the raw query against the *simi* field, and it returns
results in the expected order.
For instance, Acura MDX has (Large, SUV, 4WD, Luxury) in the simi field.
Running a query with those words against the simi field returns the
expected models (X5, Audi Q5, etc.), and the subsequent documents have
decreasing relevance. So the basic query mechanism seems to be fine.
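
For reference, the raw query described above could be built roughly like this. It is only a sketch: the host, port and core name are copied from the URLs in the original post, the field name simi is used as written in this message, and the helper name simi_query_url and the fl/rows/wt values are illustrative choices, not part of the actual setup.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port and core name as they appear in the quoted URLs.
SOLR_SELECT = "http://localhost:8983/solr/cars/select"


def simi_query_url(terms, rows=10):
    """Build a plain /select query that ORs the given terms against the simi field."""
    q = "simi:(" + " OR ".join(terms) + ")"
    # fl, rows and wt are illustrative; "score" is requested to check that relevance decreases.
    params = {"q": q, "fl": "id,make,model,score", "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


# The four attribute values listed for Acura MDX.
mdx_url = simi_query_url(["Luxury", "SUV", "4WD", "Large"])
print(mdx_url)
```

Pasting the printed URL into a browser (or fetching it with urllib.request.urlopen) should list the matching cars with their scores, which makes it easy to confirm that the scores fall off as fewer attributes match.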

The issue seems to be limited to the MoreLikeThis component and handler.
I can post the index on a public Solr instance. Any suggestions, either on
the problem itself or on where to host it?
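
Until the index is public, the MLT requests from the original post can be re-run with debugQuery=true, as suggested in the reply quoted below. The sketch that follows only builds the URLs. The helper name mlt_debug_url is made up for illustration. mlt.mintf and mlt.mindf are standard MoreLikeThis parameters (their defaults are, as far as I know, 2 and 5), but relaxing them to 1 is purely an assumption worth testing for the 0-result case, not a confirmed fix.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port and core name as they appear in the quoted URLs.
SOLR_SELECT = "http://localhost:8983/solr/cars/select"


def mlt_debug_url(doc_id, mlt_fl, mlt_qf, mintf=None, mindf=None):
    """Build a MoreLikeThis request for one document with debug output enabled."""
    params = [
        ("q", f"id:{doc_id}"),
        ("mlt", "true"),
        ("mlt.fl", mlt_fl),
        ("mlt.qf", mlt_qf),
        ("debugQuery", "true"),
        ("wt", "json"),
    ]
    # Only send the thresholds when explicitly asked, so the default behaviour stays testable.
    if mintf is not None:
        params.append(("mlt.mintf", str(mintf)))
    if mindf is not None:
        params.append(("mlt.mindf", str(mindf)))
    return SOLR_SELECT + "?" + urlencode(params)


url_b = mlt_debug_url(2, "simi", "simi")                            # case b as posted
url_b_relaxed = mlt_debug_url(2, "simi", "simi", mintf=1, mindf=1)  # assumption to test
url_c = mlt_debug_url(2, "simi,text", "simi^10 text^.1")            # case c as described

for url in (url_b, url_b_relaxed, url_c):
    print(url)
```

Comparing the output of url_b and url_b_relaxed would show whether the frequency thresholds have anything to do with the empty result. How much of the debug output covers the MLT matches, as opposed to the main query, may depend on the Solr version.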


On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:

> If you can bring up your Solr setup on a public machine, then I'm sure a lot
> of debugging can be done. Without that, I think what you should look at is
> the tf-idf scores of terms like "camry". Usually idf is the
> deciding factor in which results show at the top (tf should be 1 for your
> data).
> Enable &debugQuery=true and look at the explain section to see how the score
> is getting calculated.
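
To make the idf point concrete, here is a small worked sketch. It assumes the classic Lucene TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)) and tf = sqrt(freq). The 181 documents come from the original post. The document frequencies are hypothetical, chosen only to show how a rare term such as a model name can outweigh a common attribute.

```python
import math

NUM_DOCS = 181  # number of rows in the sample data set, per the original post


def classic_idf(doc_freq, num_docs=NUM_DOCS):
    """idf in the classic Lucene TF-IDF similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def classic_tf(freq):
    """tf in the classic Lucene TF-IDF similarity: sqrt(freq)."""
    return math.sqrt(freq)


# Hypothetical document frequencies, for illustration only (not taken from the real index).
doc_freqs = {"camry": 2, "hybrid": 10, "sedan": 60}

# Rarest term first; tf is 1 because each value occurs once per document.
for term, df in sorted(doc_freqs.items(), key=lambda kv: kv[1]):
    idf = classic_idf(df)
    print(f"{term:8s} df={df:3d} idf={idf:.2f} tf*idf={classic_tf(1) * idf:.2f}")
```

With tf fixed at 1, the ranking of terms is driven entirely by idf, so a term that appears in only a couple of documents carries much more weight than one shared by a third of the index. The real numbers are in the explain output.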
>
> You should try giving different boosts to class, type, drive, size to
> control the results.
>
>
> On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>
>> I am running some experiments on MoreLikeThis and the results seem
>> rather odd. I am probably doing something wrong but just cannot figure
>> out what. Basically, the similarity results are decent but not great.
>>
>> *Issue 1  = Quality*
>> Toyota Camry : finds Altima (good) but then next one is Camry Hybrid
>> whereas it should have found Accord.
>> I have normalized the data into a simi field which has only the
>> attributes that I care about.
>> Without the simi field, I could not get mlt.qf boosts to work well enough
>> to return results
>>
>> *Issue 2*
>> Some fields do not work at all. For instance, text+simi (in mlt.fl) works
>> whereas simi on its own does not.
>> So there is some weirdness that I am just not understanding.
>>
>> Would be grateful for your guidance !
>>
>>
>> Here is the setup:
>> *1. SOLR Version*
>> solr-spec 4.2.0.2013.03.06.22.32.13
>> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
>> lucene-spec 4.2.0
>> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>>
>> *2. Machine Information*
>> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>> 19.0-b09)
>> Windows 7 Home 64 Bit with 4 GB RAM
>>
>> *3. Sample Data*
>> I created this 'dummy' data of cars  - the idea being that these would be
>> sufficient and simple to generate similarity and understand how it would
>> work.
>> There are 181 rows in the data set (I have attached it for reference in
>> CSV format)
>>
>> [image: Inline image 1]
>>
>> *4. SCHEMA*
>> *Field Definitions*
>>    <field name="id" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="make" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="model" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="class" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="type" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="drive" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>    <field name="comment" type="text_general" indexed="true" stored="true"
>> termVectors="true" multiValued="true"/>
>>    <field name="size" type="string" indexed="true" stored="true"
>> termVectors="true" multiValued="false"/>
>>
>> *Copy Fields*
>> <copyField   source="make"     dest="make_en"   />  <!-- Search  -->
>> <copyField   source="model"     dest="model_en"   />  <!-- Search  -->
>> <copyField   source="class"     dest="class_en"   />  <!-- Search  -->
>> <copyField   source="type"     dest="type_en"   />  <!-- Search  -->
>> <copyField   source="drive"     dest="drive_en"   />  <!-- Search  -->
>> <copyField   source="comment"     dest="comment_en"   />  <!-- Search  -->
>> <copyField   source="size"     dest="size_en"   />  <!-- Search  -->
>> <copyField   source="id"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="make"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="model"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="class"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="type"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="drive"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="comment"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>> <copyField   source="class"     dest="simi_en"   />  <!-- similarity  -->
>> <copyField   source="type"     dest="simi_en"   />  <!-- similarity  -->
>> <copyField   source="drive"     dest="simi_en"   />  <!-- similarity  -->
>> <copyField   source="size"     dest="simi_en"   />  <!-- similarity  -->
>>
>> Note that the "simi" field ends up with values built from class, type,
>> drive and size:
>> - Luxury SUV 4WD Large
>> - Standard Sedan Front Family
>>
>>
>> *5. MLT Setup*
>> a. mlt.fl = *text*, mlt.qf = *text*: works, but the results are obviously
>> not good (make is not a good similarity indicator)
>>
>> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>>
>> b. mlt.fl = *simi*, mlt.qf = *simi*: does not work at all (0 results)
>>
>> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>>
>> c. mlt.fl = *simi,text*, mlt.qf = *simi^10 text^.1*: works, with decent
>> results in most cases
>>
>> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>> This works for getting similarity for Acura MDX (Luxury SUV 4WD Large),
>> but for Toyota Camry it finds hybrid family cars (Prius) ahead of Honda.
>>
>>