Finally got to running these tests. Here are the basics...
Core i7-960
24GB RAM
Solr index on its own drive
Solr 3.3.0 running under Tomcat 7.0.19, jdk1.6.0_26
Java options:

    JAVA_OPTS="-Xmx4096M -XX:-UseGCOverheadLimit"

The raw data is 80GB of Solr add markup; sample below:

    <field name="key">5</field>
    <field name="language">en</field>
    <field name="rank">202</field>
    <field name="date">2008-07-31T23:29:40Z</field>
    <field name="url">http://tomfoolery4.wordpress.com/2008/07/31/finally-a-buffalo-webmedia-site-that-doesnt-sit-on-the-fence/</field>
    <field name="title">Finally! A Buffalo Web/Media Site That Doesn’t Sit On The Fence!</field>
    <field name="text">The Buffalo News has got my back on this one. A lot of area musicians, artists, writers and photographers have got my back on this one. And now, I'm pleased to say, so does WNYMedia.net, another new voice in a small sea of journalistic endeavors afoot in Buffalo. What I like about this site [...]</field>

icwsm, which does not store the content (52GB):

    <!-- Fields -->
    <field name="key" type="string" indexed="true" stored="true"/>
    <field name="language" type="string" indexed="true" stored="false"/>
    <field name="rank" type="int" indexed="true" stored="false"/>
    <field name="date" type="date" indexed="true" stored="false"/>
    <field name="url" type="string" indexed="true" stored="false"/>
    <field name="title" type="text_general" indexed="true" stored="false"/>
    <field name="text" type="text_general" indexed="true" stored="false"/>
    <!-- Default search field -->
    <field name="default" type="text_general" indexed="true" stored="false" multiValued="true"/>

icwsm2, which stores the content (117GB):

    <!-- Fields -->
    <field name="key" type="string" indexed="true" stored="true"/>
    <field name="language" type="string" indexed="true" stored="true"/>
    <field name="rank" type="int" indexed="true" stored="true"/>
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="true"/>
    <!-- Default search field -->
    <field name="default" type="text_general" indexed="true" stored="false" multiValued="true"/>

I used 1,000 searches from a set of 162,000 searches I saved from Feedster days. Here are some sample searches:

    belize
    st louis cardinals
    offshoring
    2010 olympic games
    nanotubes
    "beamed power" "space elevator" "power beaming"
    world news
    dogster
    vancouver-centre news

I ran six tests:

  - two on icwsm, fetching the key and the score (10 rows and 100 rows)
  - two on icwsm2, fetching the key and the score (10 rows and 100 rows)
  - two on icwsm2, fetching all the fields and the score (10 rows and 100 rows)

Each test was run 10 times consecutively, with nothing else running on the machine. This table shows the elapsed time of each of the 10 runs, by index name, rows requested and fields requested:

    index   rows  fields     elapsed time, runs 1 to 10
    icwsm     10  key,score  182 184 182 182 184 183 183 183 184 183
    icwsm    100  key,score  190 183 184 184 183 183 182 183 185 184
    icwsm2    10  key,score  204 183 184 184 185 184 183 185 184 184
    icwsm2   100  key,score  288 184 186 184 186 186 186 186 189 188
    icwsm2    10  *,score    185 184 183 184 184 184 185 184 184 184
    icwsm2   100  *,score    206 185 186 190 195 191 193 190 186 186

Basically, from what I can see, storing the data in the index has virtually no impact on search speed, which is what I would expect. Apart from a slower first run in some tests (190 to 288), every run falls between 182 and 195.

Cheers

François

On Jul 8, 2011, at 12:18 PM, Erick Erickson wrote:

> Well, it depends (tm). Raw search time should be unaffected (or very
> close to that). The stored data is in a completely separate file in
> the index directory and is not referenced during searches.
>
> That said, assembling the response may take longer since you're
> potentially reading more data from the disk to create each document.
>
> Ensure that lazy field loading is turned on, and when you're comparing
> times it would probably be best to return the same fields (perhaps just ID).
>
> Note that the QTime in the response packet is the search time, exclusive of
> assembling the response, so that's probably a good number to measure.
>
> Best
> Erick
>
> On Fri, Jul 8, 2011 at 8:01 AM, jame vaalet <jamevaa...@gmail.com> wrote:
>> I would prefer every setting to be in its default state and compare the
>> results with stored=true and stored=false.
>>
>> 2011/7/8 François Schiettecatte <fschietteca...@gmail.com>
>>
>>> Hi
>>>
>>> I don't think that anyone has run such benchmarks. In fact, this topic came
>>> up two weeks ago and I volunteered some time to do that. I have some
>>> spare time this week, so I am going to run some benchmarks this weekend and
>>> report back.
>>>
>>> The machine I have to do this is a Core i7 960 with 24GB of RAM and 4TB of
>>> disk. I am going to run Solr 3.3 under Tomcat 7.0.16. I have three databases
>>> I can use for this: icwsm-2009 (38.5GB compressed), cdip (24GB compressed)
>>> and trec vlc2 (31GB compressed). I could also use a copy of Wikipedia.
>>> I have lots of user searches I can use (saved from Feedster days).
>>>
>>> I would like some input on a couple of things to make this test as
>>> real-world as possible. One is any optimizations I should set in
>>> solrconfig.xml, and the other is the heap/GC settings I should set for
>>> Tomcat. Anything else?
>>>
>>> Cheers
>>>
>>> François
>>>
>>> On Jul 8, 2011, at 4:08 AM, jame vaalet wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any performance degradation (response time, etc.) if the index
>>>> has document content text stored in it (stored=true)?
>>>>
>>>> -JAME
>>>
>>
>> --
>> -JAME
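To summarize the timing table from the six tests, here is a small Python sketch (not part of the original benchmark) that computes, for each test, the median and mean of the 10 runs plus the mean with the first run dropped. Treating the first run as cache warm-up is an assumption, not something measured here; the numbers are copied from the table and keep whatever units the original timings used.

```python
from statistics import mean, median

# Elapsed times for the 10 consecutive runs of each test, copied from the
# results table: (index, rows, fields) -> [run 1, ..., run 10].
RUNS = {
    ("icwsm", 10, "key,score"): [182, 184, 182, 182, 184, 183, 183, 183, 184, 183],
    ("icwsm", 100, "key,score"): [190, 183, 184, 184, 183, 183, 182, 183, 185, 184],
    ("icwsm2", 10, "key,score"): [204, 183, 184, 184, 185, 184, 183, 185, 184, 184],
    ("icwsm2", 100, "key,score"): [288, 184, 186, 184, 186, 186, 186, 186, 189, 188],
    ("icwsm2", 10, "*,score"): [185, 184, 183, 184, 184, 184, 185, 184, 184, 184],
    ("icwsm2", 100, "*,score"): [206, 185, 186, 190, 195, 191, 193, 190, 186, 186],
}


def summarize(runs):
    """Return (median, mean, mean of runs 2-10) for one test.

    Dropping run 1 assumes it is a warm-up run; that is an assumption,
    not something the benchmark itself established.
    """
    return median(runs), mean(runs), mean(runs[1:])


if __name__ == "__main__":
    for (index, rows, fields), runs in RUNS.items():
        med, avg, warm = summarize(runs)
        print(f"{index:7} {rows:4} {fields:10} median={med:6.1f} "
              f"mean={avg:6.1f} mean(runs 2-10)={warm:6.1f}")
```

The median is reported alongside the mean because a single slow first run (288 for icwsm2, 100 rows, key,score) pulls the mean up noticeably while barely moving the median; with these numbers the medians range from 183 to 190.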
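Erick's suggestion to measure QTime (the search time reported by Solr, excluding response assembly) alongside the client's elapsed time can be sketched as follows. This is a minimal Python illustration, assuming the default XML response format where QTime sits in `<lst name="responseHeader">`; the `fetch` callable and the canned response are hypothetical stand-ins so the sketch runs without a server.

```python
import time
import xml.etree.ElementTree as ET


def qtime_ms(response_xml):
    """Extract QTime (milliseconds) from a Solr XML response body (str or bytes)."""
    root = ET.fromstring(response_xml)
    # Assumes Solr's XML response writer layout:
    # <response><lst name="responseHeader"><int name="QTime">...</int>...
    node = root.find("./lst[@name='responseHeader']/int[@name='QTime']")
    if node is None:
        raise ValueError("no QTime found in responseHeader")
    return int(node.text)


def time_query(fetch):
    """Call fetch() (hypothetical: any function returning the raw XML body)
    and return (client wall-clock ms, QTime ms reported by Solr)."""
    start = time.perf_counter()
    body = fetch()
    elapsed = (time.perf_counter() - start) * 1000.0
    return elapsed, qtime_ms(body)


if __name__ == "__main__":
    # Canned response so the sketch runs without a Solr server;
    # the QTime value here is made up for illustration.
    sample = (
        '<response><lst name="responseHeader">'
        '<int name="status">0</int><int name="QTime">3</int>'
        '</lst><result name="response" numFound="0" start="0"/></response>'
    )
    elapsed, qtime = time_query(lambda: sample)
    print(f"client elapsed: {elapsed:.3f} ms, Solr QTime: {qtime} ms")
```

Against a live server, `fetch` could be something like `lambda: urllib.request.urlopen(url).read()`, where `url` is a select URL such as `http://localhost:8080/solr/select?q=belize&fl=key,score&rows=10` (host, port and path are assumptions about a default Tomcat deployment). The gap between the two numbers roughly covers response assembly, including reading stored fields, plus serialization and transfer, which is where storing content would be expected to show up.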