Thanks for the quick response, Otis.

We have been able to achieve a ratio of 2 with different settings. However, considering the huge volume of data we need to deal with (600 GB per day, kept in the index for 3 days), we're looking at all possible ways to reduce the index size further. We will definitely keep exploring the straightforward things and see if we can find a better setting.
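
For a rough sense of scale, the arithmetic behind these figures can be sketched as below. This assumes "ratio" means index size divided by raw data size, as in Otis's message quoted further down; the function name is illustrative only, and the ratio of 9 is the figure from his index, shown purely for comparison.

```python
def index_size_gb(daily_gb: float, retention_days: int, ratio: float) -> float:
    """Estimated index size, assuming index size = ratio * raw data size."""
    return daily_gb * retention_days * ratio


# 600 GB of logs per day, retained for 3 days.
raw_gb = 600 * 3

for ratio in (9, 2, 1):
    size = index_size_gb(600, 3, ratio)
    print(f"ratio {ratio}: about {size:,.0f} GB of index for {raw_gb:,} GB of raw logs")
```

Even at a ratio of 2, this estimate is several terabytes of index, which is why further reductions matter here.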


On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote:

I'm not sure if there is a lot of benefit from storing the literal values in that external file vs. directly in the index. There are a number of things one should look at first, as far as performance is concerned - JVM settings, cache sizes, analysis, etc.

For example, I have one index here that is 9 times the size of the original data because of how its fields are analyzed. I can change one analysis-level setting and make that ratio go down to 2. So I'd look at other, more straightforward things first. There is a page on either the Solr or the Lucene Wiki dedicated to various search performance tricks.

----- Original Message ----
From: Jibo John
Sent: Thursday, July 23, 2009 12:08:26 PM
Subject: Re: Storing string field in solr.ExternalFieldFile type

Thanks for the response, Eric.

We have seen that the size of the index has a direct impact on search speed, especially when the index size is in the GBs, so we are trying all possible ways to keep
the index size as low as we can.

We thought the solr.ExternalFileField type would help to keep the index size low by
storing a text field outside of the index.

Here's what we were planning: initially, all the fields except the
solr.ExternalFileField type field will be queried and displayed to the
end user. Subsequent calls from the UI will then pull the
solr.ExternalFileField field, which will be loaded lazily.
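
The two-step fetch described above could look roughly like the sketch below. It only builds the request URLs, using Solr's standard `q`, `fl`, `rows`, and `wt` parameters. The base URL and the field names (`id`, `timestamp`, `host`, `raw_line`) are assumptions for illustration, and the sketch treats the display text as an ordinary stored field rather than an ExternalFileField.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust for the real deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def search_url(query: str, rows: int = 10) -> str:
    """First call: return only the lightweight fields for the result list."""
    params = {"q": query, "fl": "id,timestamp,host", "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def detail_url(doc_id: str) -> str:
    """Later call: fetch the full log line for one document, on demand."""
    params = {"q": f'id:"{doc_id}"', "fl": "id,raw_line", "rows": 1, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


print(search_url("level:ERROR"))
print(detail_url("abc-123"))
```

Restricting `fl` on the first call keeps the large text out of the initial response, whatever mechanism ends up holding that text.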

However, we realized that solr.ExternalFileField only supports the float type, whereas the data that we're planning to keep as an external field is a string.


On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote:

Hoping the experts chime in if I'm wrong, but....
As far as I know, while storing a field increases the size of an index,
it doesn't have much impact on the search speed. You could
pretty easily test this by creating the index both ways, firing off some timing queries, and comparing the results, although it would be time-consuming.
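
A minimal sketch of such a timing comparison follows. The `run_query` callable is a placeholder for whatever actually sends a query to one of the two index variants (for example, an HTTP request to Solr); nothing here is specific to Solr, and the function name is illustrative.

```python
import time
from statistics import median
from typing import Callable, Iterable


def time_queries(run_query: Callable[[str], object],
                 queries: Iterable[str],
                 repeats: int = 5) -> float:
    """Return the median wall-clock seconds per query.

    `queries` must be non-empty. Each query is run `repeats` times so that
    one slow outlier (cold cache, GC pause) does not dominate the result.
    """
    queries = list(queries)  # allow generators to be replayed
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
    return median(samples)
```

Calling `time_queries` once per index variant with the same query list gives two medians to compare. Running a few warm-up queries first is worth considering, since caches can skew the first results.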

I believe there's some info on the Lucene Wiki about this, but my memory
isn't what it used to be.


On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote:

We're in the process of building a log searcher application.

In order to reduce the index size to improve the query performance, we're
exploring the possibility of having:

1. One field for each log line with 'indexed=true & stored=false' that
will be used for searching
2. Another field for each log line of type solr.ExternalFileField that
will be used only for display purposes.

We realized that currently solr.ExternalFileField supports only float type.

Is there a way we can override this to support string type? Any issues with
this approach?

Any ideas are welcome.

