Finally got to running these tests.

Here are the basics...

Core i7 - 960
24GB RAM
Solr index on its own drive

Solr 3.3.0 running under Tomcat 7.0.19 with jdk1.6.0_26; the Java options are:

        JAVA_OPTS="-Xmx4096M -XX:-UseGCOverheadLimit" 
 
Raw data is 80GB in Solr XML markup for adding; sample below:

<field name="key">5</field>
<field name="language">en</field>
<field name="rank">202</field>
<field name="date">2008-07-31T23:29:40Z</field>
<field name="url">http://tomfoolery4.wordpress.com/2008/07/31/finally-a-buffalo-webmedia-site-that-doesnt-sit-on-the-fence/</field>
<field name="title">Finally! A Buffalo Web/Media Site That Doesn’t Sit On The Fence!</field>
<field name="text">The Buffalo News has got my back on this one. A lot of area 
musicians, artists, writers and photographers have got my back on this one. And 
now, I&apos;m pleased to say, so does WNYMedia.net, another new voice in a small 
sea of journalistic endeavors afoot in Buffalo. What I like about this site 
[...]</field>


icwsm: only the key is stored, no content - 52GB

        <!-- Fields -->
        <field name="key" type="string" indexed="true" stored="true"/>
        <field name="language" type="string" indexed="true" stored="false"/>
        <field name="rank" type="int" indexed="true" stored="false"/>
        <field name="date" type="date" indexed="true" stored="false"/>
        <field name="url" type="string" indexed="true" stored="false"/>
        <field name="title" type="text_general" indexed="true" stored="false"/>
        <field name="text" type="text_general" indexed="true" stored="false"/>

        <!-- Default search field -->
        <field name="default" type="text_general" indexed="true" stored="false" multiValued="true"/>


icwsm2: all fields are stored, including content - 117GB

        <!-- Fields -->
        <field name="key" type="string" indexed="true" stored="true"/>
        <field name="language" type="string" indexed="true" stored="true"/>
        <field name="rank" type="int" indexed="true" stored="true"/>
        <field name="date" type="date" indexed="true" stored="true"/>
        <field name="url" type="string" indexed="true" stored="true"/>
        <field name="title" type="text_general" indexed="true" stored="true"/>
        <field name="text" type="text_general" indexed="true" stored="true"/>

        <!-- Default search field -->
        <field name="default" type="text_general" indexed="true" stored="false" multiValued="true"/>


I used 1,000 searches from a set of 162,000 that I saved from my Feedster days. 
Here are some sample searches:

belize
st louis cardinals
offshoring
2010 olympic games
nanotubes
"beamed power"
"space elevator"
"power beaming"
world news
dogster
vancouver-centre
news


I ran six tests, two on icwsm getting the key and the score (10 rows and 100 
rows), two on icwsm2 getting the key and the score (10 rows and 100 rows), and 
two on icwsm2 getting all the fields and the scores (10 rows and 100 rows). 
Each test was run 10 times consecutively, with nothing else running on the machine.
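
For anyone who wants to repeat this, here is a minimal sketch of how such a test 
could be scripted. It is not the harness I used; it assumes each figure is the 
wall-clock time for one pass over the search set, and the base URL, the function 
names and the injectable fetch argument are all made up for illustration.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, rows, fields):
    """Build a Solr /select URL for one search (URL layout is an assumption)."""
    params = {"q": query, "rows": rows, "fl": fields}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def run_pass(base_url, queries, rows, fields, fetch=http_fetch):
    """Run every query once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for query in queries:
        fetch(build_select_url(base_url, query, rows, fields))
    return time.perf_counter() - start


def run_test(base_url, queries, rows, fields, repeats=10, fetch=http_fetch):
    """Repeat the same pass back to back and return one timing per pass."""
    return [run_pass(base_url, queries, rows, fields, fetch)
            for _ in range(repeats)]


# Hypothetical usage against a local core named icwsm2:
# queries = ["belize", "st louis cardinals", '"space elevator"']
# print(run_test("http://localhost:8080/solr/icwsm2", queries, 100, "*,score"))
```

Passing a different fetch function makes it possible to dry-run the script 
without a Solr instance.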

This table shows the time elapsed, the index name, the rows requested and the 
fields requested:

 182  icwsm  10  key,score
 184  icwsm  10  key,score
 182  icwsm  10  key,score
 182  icwsm  10  key,score
 184  icwsm  10  key,score
 183  icwsm  10  key,score
 183  icwsm  10  key,score
 183  icwsm  10  key,score
 184  icwsm  10  key,score
 183  icwsm  10  key,score

 190  icwsm  100  key,score
 183  icwsm  100  key,score
 184  icwsm  100  key,score
 184  icwsm  100  key,score
 183  icwsm  100  key,score
 183  icwsm  100  key,score
 182  icwsm  100  key,score
 183  icwsm  100  key,score
 185  icwsm  100  key,score
 184  icwsm  100  key,score

 204  icwsm2  10  key,score
 183  icwsm2  10  key,score
 184  icwsm2  10  key,score
 184  icwsm2  10  key,score
 185  icwsm2  10  key,score
 184  icwsm2  10  key,score
 183  icwsm2  10  key,score
 185  icwsm2  10  key,score
 184  icwsm2  10  key,score
 184  icwsm2  10  key,score

 288  icwsm2  100  key,score
 184  icwsm2  100  key,score
 186  icwsm2  100  key,score
 184  icwsm2  100  key,score
 186  icwsm2  100  key,score
 186  icwsm2  100  key,score
 186  icwsm2  100  key,score
 186  icwsm2  100  key,score
 189  icwsm2  100  key,score
 188  icwsm2  100  key,score

 185  icwsm2  10  *,score
 184  icwsm2  10  *,score
 183  icwsm2  10  *,score
 184  icwsm2  10  *,score
 184  icwsm2  10  *,score
 184  icwsm2  10  *,score
 185  icwsm2  10  *,score
 184  icwsm2  10  *,score
 184  icwsm2  10  *,score
 184  icwsm2  10  *,score

 206  icwsm2  100  *,score
 185  icwsm2  100  *,score
 186  icwsm2  100  *,score
 190  icwsm2  100  *,score
 195  icwsm2  100  *,score
 191  icwsm2  100  *,score
 193  icwsm2  100  *,score
 190  icwsm2  100  *,score
 186  icwsm2  100  *,score
 186  icwsm2  100  *,score

Basically, from what I can see, storing the data in the index has virtually no 
impact on search speed, which is what I would expect.
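
To make the comparison easier to read, the short script below condenses the 
table into the first run, the median and the mean for each test. The timings 
are copied from the table as-is (same unit as the table); the RUNS and 
summarize names are just for illustration.

```python
from statistics import mean, median

# Timings copied from the table above: (index, rows, fields) -> ten runs.
RUNS = {
    ("icwsm", 10, "key,score"): [182, 184, 182, 182, 184, 183, 183, 183, 184, 183],
    ("icwsm", 100, "key,score"): [190, 183, 184, 184, 183, 183, 182, 183, 185, 184],
    ("icwsm2", 10, "key,score"): [204, 183, 184, 184, 185, 184, 183, 185, 184, 184],
    ("icwsm2", 100, "key,score"): [288, 184, 186, 184, 186, 186, 186, 186, 189, 188],
    ("icwsm2", 10, "*,score"): [185, 184, 183, 184, 184, 184, 185, 184, 184, 184],
    ("icwsm2", 100, "*,score"): [206, 185, 186, 190, 195, 191, 193, 190, 186, 186],
}


def summarize(runs):
    """Return {test: (first_run, median, mean)} for each test."""
    return {
        test: (times[0], median(times), round(mean(times), 1))
        for test, times in runs.items()
    }


for (index, rows, fields), (first, med, avg) in summarize(RUNS).items():
    print(f"{index:7} {rows:4} {fields:10} first={first} median={med} mean={avg}")
```

The median is less sensitive than the mean to a slow first run such as the 288 
in the icwsm2 100-row test. The medians of all six tests fall between 183 and 
190, and the highest one belongs to the test returning all fields for 100 rows.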


Cheers

François






On Jul 8, 2011, at 12:18 PM, Erick Erickson wrote:

> Well, it depends (tm). Raw search time should be unaffected (or very
> close to that). The stored data is in a completely separate file in
> the index directory and is not referenced during searches.
> 
> That said, assembling the response may take longer since you're
> potentially reading more data from the disk to create each document.
> 
> Ensure that lazy field loading is turned on, and when you're comparing
> times it would probably be best to return the same fields (perhaps just ID).
> 
> Note that the QTime in the response packet is the search time, exclusive of
> assembling the response, so that's probably a good number to measure.
> 
> Best
> Erick
> 
> On Fri, Jul 8, 2011 at 8:01 AM, jame vaalet <jamevaa...@gmail.com> wrote:
>> I would prefer every setting to be left at its default, and to compare the
>> results with stored=true and stored=false.
>> 
>> 2011/7/8 François Schiettecatte <fschietteca...@gmail.com>
>> 
>>> Hi
>>> 
>>> I don't think that anyone has run such benchmarks, in fact this topic came
>>> up two weeks ago and I volunteered some time to do that because I have some
>>> spare time this week, so I am going to run some benchmarks this weekend and
>>> report back.
>>> 
>>> The machine I have for this is a Core i7 960, 24GB, 4TB of disk. I am going
>>> to run SOLR 3.3 under Tomcat 7.0.16. I have three databases I can use for
>>> this, icwsm-2009 (38.5GB compressed), cdip (24GB compressed), trec vlc2
>>> (31GB compressed). I could also use a copy of wikipedia. I have lots of user
>>> searches I can use (saved from Feedster days).
>>> 
>>> I would like some input on a couple of things to make this test as
>>> real-world as possible. One is any optimizations I should set in
>>> solrconfig.xml, and the other is the heap/GC settings I should set for
>>> tomcat. Anything else?
>>> 
>>> Cheers
>>> 
>>> François
>>> 
>>> On Jul 8, 2011, at 4:08 AM, jame vaalet wrote:
>>> 
>>>> hi,
>>>> 
>>>> is there any performance degradation (response time etc.) if the index has
>>>> document content text stored in it (stored=true)?
>>>> 
>>>> -JAME
>>> 
>>> 
>> 
>> 
>> --
>> 
>> -JAME
>> 
