Isn't this an AWS security groups question? You should probably post this 
question on the AWS forums, but for the moment, here's the basic reading 
material - go set up your EC2 security groups and lock down your systems.

        
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

If you just want to password protect Solr here are the instructions:

        http://wiki.apache.org/solr/SolrSecurity

But I most certainly would not leave it open to the world, even with a 
password. Basic password authentication sends the password in clear text 
unless you're using HTTPS, so it's best to lock the thing down behind a 
firewall.
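
To make the clear-text point concrete, here is a small Python sketch of what HTTP Basic authentication puts on the wire. The user name and password are made up for illustration. The encoding is plain base64, so anyone who can see the traffic can reverse it without a key.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(header: str) -> Tuple[str, str]:
    """Recover the credentials from a captured header (no key needed)."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Hypothetical credentials, for illustration only.
header = basic_auth_header("solradmin", "s3cret")
print(header)
print(decode_basic_auth_header(header))
```

Over plain HTTP that header travels as-is on every request, which is why the password alone is not much protection.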

Dave


-----Original Message-----
From: DC tech [mailto:dctech1...@gmail.com] 
Sent: Tuesday, April 02, 2013 1:02 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?

OK - so I have my SOLR instance running on AWS. 
Any suggestions on how to safely share the link?  Right now, the whole SOLR 
instance is totally open. 



Gagandeep singh <gagan.g...@gmail.com> wrote:

>Add &debugQuery=true&mlt=true and look at the scores for the MLT query, 
>not a sample query. You can use Amazon EC2 to bring up your Solr; you 
>should be able to get a micro instance on the free trial.
>
>
>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:
>
>> I did try the raw query against the *simi* field and those seem to 
>> return results in the order expected.
>> For instance, Acura MDX has  ( large, SUV, 4WD   Luxury) in the simi field.
>> Running a query with those words against the simi field returns the 
>> expected models (X5, Audi Q5, etc) and then the subsequent documents 
>> have decreasing relevance. So the basic query mechanism seems to be fine.
>>
>> The issue just seems to be with MoreLikeThis component and handler.
>> I can post the index on a public SOLR instance - any suggestions? (or 
>> for hosting)
>>
>>
>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>>
>> > If you can bring up your solr setup on a public machine then I'm sure a
>> > lot of debugging can be done. Without that, I think what you should look
>> > at is the tf-idf scores of terms like "camry" etc. Usually idf is the
>> > deciding factor in which results show at the top (tf should be 1 for
>> > your data).
>> > Enable &debugQuery=true and look at the explain section to see how the
>> > score is getting calculated.
>> >
>> > You should try giving different boosts to class, type, drive, size 
>> > to control the results.
>> >
>> >
>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>> >
>> >> I am running some experiments on more like this and the results 
>> >> seem rather odd - I am doing something wrong but just cannot figure out 
>> >> what.
>> >> Basically, the similarity results are decent - but not great.
>> >>
>> >> *Issue 1  = Quality*
>> >> Toyota Camry : finds Altima (good) but then next one is Camry 
>> >> Hybrid whereas it should have found Accord.
>> >> I have normalized the data into a simi field which has only the 
>> >> attributes that I care about.
>> >> Without the simi field, I could not get mlt.qf boosts to work well
>> >> enough to return results.
>> >>
>> >> *Issue 2*
>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl)
>> >> works whereas just simi does not.
>> >> So there is some weirdness that I am just not understanding.
>> >>
>> >> Would be grateful for your guidance !
>> >>
>> >>
>> >> Here is the setup:
>> >> *1. SOLR Version*
>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>> >> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
>> >> lucene-spec 4.2.0
>> >> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>> >>
>> >> *2. Machine Information*
>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>> >> 19.0-b09)
>> >> Windows 7 Home 64 Bit with 4 GB RAM
>> >>
>> >> *3. Sample Data *
>> >> I created this 'dummy' data of cars - the idea being that these would be
>> >> sufficient and simple to generate similarity and understand how it
>> >> would work.
>> >> There are 181 rows in the data set (I have attached it for reference in
>> >> CSV format)
>> >>
>> >> [image: Inline image 1]
>> >>
>> >> *4. SCHEMA*
>> >> *Field Definitions*
>> >>    <field name="id" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="make" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="model" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="class" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="type" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="drive" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>    <field name="comment" type="text_general" indexed="true" stored="true"
>> >> termVectors="true" multiValued="true"/>
>> >>    <field name="size" type="string" indexed="true" stored="true"
>> >> termVectors="true" multiValued="false"/>
>> >>
>> >> *Copy Fields*
>> >> <copyField   source="make"     dest="make_en"   />  <!-- Search  -->
>> >> <copyField   source="model"     dest="model_en"   />  <!-- Search  -->
>> >> <copyField   source="class"     dest="class_en"   />  <!-- Search  -->
>> >> <copyField   source="type"     dest="type_en"   />  <!-- Search  -->
>> >> <copyField   source="drive"     dest="drive_en"   />  <!-- Search  -->
>> >> <copyField   source="comment"     dest="comment_en"   />  <!-- Search  -->
>> >> <copyField   source="size"     dest="size_en"   />  <!-- Search  -->
>> >> <copyField   source="id"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="make"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="model"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="class"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="type"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="drive"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="comment"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="size"     dest="text"   />  <!-- Glob  -->
>> >> <copyField   source="class"     dest="simi_en"   />  <!-- similarity  -->
>> >> <copyField   source="type"     dest="simi_en"   />  <!-- similarity  -->
>> >> <copyField   source="drive"     dest="simi_en"   />  <!-- similarity  -->
>> >> <copyField   source="size"     dest="simi_en"   />  <!-- similarity  -->
>> >>
>> >> Note that the "simi" field ends up with values for class, type, 
>> >> drive and size:
>> >> - Luxury SUV 4WD Large
>> >> - Standard Sedan Front Family
>> >>
>> >>
>> >> *5. MLT Setup*
>> >> a. mlt.FL  = *text* QF=*text*  Works but results are obviously not 
>> >> good (make is not a good similarity indicator)
>> >>
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>> >>
>> >> b. mlt.FL  = *simi* QF=*simi*  Does not work at all (0 results)
>> >>
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>> >>
>> >> c.  mlt.FL  = *simi,text * QF=*simi^10 text^.1*   Works with decent
>> >> results in most cases
>> >>
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large).
>> >> But for Toyota Camry - it finds hybrid family cars (Prius) ahead of
>> >> Honda.
>> >>
>> >>
>> >
>>
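
For anyone reproducing the three MoreLikeThis requests quoted above, here is a rough Python sketch that builds them with proper URL encoding and adds debugQuery=true as suggested earlier in the thread. The host, core name and parameter values are copied from the quoted URLs. The helper name mlt_url is made up for this example, and nothing here has been run against the poster's index.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base select URL as it appears in the quoted requests.
BASE = "http://localhost:8983/solr/cars/select/"


def mlt_url(doc_query, mlt_fl, mlt_qf, debug=True):
    """Build a select URL with the MoreLikeThis component switched on.

    mlt_url is a hypothetical helper; the parameter names (q, mlt, fl,
    mlt.fl, mlt.qf, debugQuery) are the ones used in the thread.
    """
    params = [
        ("q", doc_query),
        ("mlt", "true"),
        ("fl", "text"),
        ("mlt.fl", mlt_fl),
        ("mlt.qf", mlt_qf),
    ]
    if debug:
        params.append(("debugQuery", "true"))
    # urlencode escapes ':', '^' and spaces so the boosts survive transport.
    return BASE + "?" + urlencode(params)


# Cases a, b and c from the original post (boosts as in the quoted URL).
urls = {
    "a": mlt_url("id:2", "text", "text"),
    "b": mlt_url("id:2", "simi", "simi"),
    "c": mlt_url("id:2", "simi,text", "simi^10 text^.01"),
}

for label, url in urls.items():
    print(label, url)
```

Encoding the parameters this way avoids the line-wrapped, partly escaped URLs in the original message, and the explain output from debugQuery can then be compared across the three cases.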
