Thanks for the quick response, Otis.
We have been able to achieve a ratio of 2 with different settings.
However, considering the huge volume of data that we need to deal
with - 600 GB per day, kept in the index for 3 days - we're looking
at all possible ways to reduce the index size further.
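
For a rough sense of scale, the figures above can be worked through as
below. This is only a back-of-the-envelope sketch: it assumes "ratio"
means index size divided by raw data size, and the function name is
made up for illustration.

```python
def index_size_gb(daily_raw_gb, retention_days, index_to_raw_ratio):
    """Estimate steady-state index size from raw volume, retention and ratio."""
    return daily_raw_gb * retention_days * index_to_raw_ratio

# 600 GB/day, kept for 3 days, at an index-to-raw ratio of 2
print(index_size_gb(600, 3, 2))  # 3600 (GB)
```

Every reduction in the ratio is multiplied by the 1800 GB of raw data
held at any time, which is why the ratio matters so much here.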
Will definitely keep exploring the straightforward things and see if
we can find a better setting.
Thanks,
-Jibo
On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote:
I'm not sure if there is a lot of benefit from storing the literal
values in that external file vs. directly in the index. There are a
number of things one should look at first, as far as performance is
concerned - JVM settings, cache sizes, analysis, etc.
For example, I have one index here that is 9 times the size of the
original data because of how its fields are analyzed. I can change
one analysis-level setting and make that ratio go down to 2. So I'd
look at other, more straightforward things first. There is a page on
either the Solr or the Lucene Wiki dedicated to various search
performance tricks.
Otis
----- Original Message ----
From: Jibo John <jiboj...@mac.com>
To: solr-user@lucene.apache.org
Sent: Thursday, July 23, 2009 12:08:26 PM
Subject: Re: Storing string field in solr.ExternalFieldFile type
Thanks for the response, Eric.
We have seen that the size of the index has a direct impact on
search speed, especially when the index size is in GBs, so we are
trying all possible ways to keep the index size as low as we can.
We thought the solr.ExternalFileField type would help keep the index
size low by storing a text field outside of the index.
Here's what we were planning: initially, all the fields except the
solr.ExternalFileField type field will be queried and displayed to
the end user. There will be subsequent calls from the UI to pull the
solr.ExternalFileField field, which will be loaded lazily.
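
The two-step request pattern described above could look roughly like
the sketch below. Note that it does not use solr.ExternalFileField: it
assumes the log text sits in an ordinary stored field and is simply
left out of the first request via the fl parameter. The Solr URL and
the field names (id, timestamp, host, raw_line) are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: this URL and the field names
# (id, timestamp, host, raw_line) are hypothetical.
SOLR_SELECT = "http://localhost:8983/solr/select"

def listing_url(user_query, rows=20):
    """First call: run the search but return only the small fields."""
    params = {"q": user_query, "fl": "id,timestamp,host",
              "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

def detail_url(doc_id):
    """Follow-up call: fetch the large display field for one document."""
    params = {"q": 'id:"%s"' % doc_id, "fl": "id,raw_line",
              "rows": 1, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

print(listing_url("error AND timeout"))
print(detail_url("log-42"))
```

The UI would issue the first URL for the result list and the second
only when the user opens a particular log line.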
However, we realized that solr.ExternalFileField only supports the
float type, whereas the data we're planning to keep as an external
field is a string.
Thanks,
-Jibo
On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote:
Hoping the experts chime in if I'm wrong, but....
As far as I know, while storing a field increases the size of an
index, it doesn't have much impact on the search speed. You could
test this pretty easily by creating the index both ways, firing off
some timing queries and comparing the results, although it would be
time consuming.
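
A comparison like that could be sketched roughly as below. The helper
and the two stand-in search functions are hypothetical; in a real test
each stand-in would send the query to one of the two index variants.

```python
import statistics
import time

def median_latency(run_query, queries, repeats=3):
    """Median wall-clock seconds per call of run_query over the queries."""
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical stand-ins: replace the bodies with real search calls
# against the index built with stored=true and the one with stored=false.
def search_stored_index(q):
    pass

def search_unstored_index(q):
    pass

queries = ["error", "timeout", "host:web01"]
print("stored:  ", median_latency(search_stored_index, queries))
print("unstored:", median_latency(search_unstored_index, queries))
```

Repeated queries are likely to be served from caches, so it is
probably worth using a varied query set and looking at both the first
and the later runs.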
I believe there's some info on the Lucene Wiki about this, but my
memory isn't what it used to be.
Erick
On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote:
We're in the process of building a log searcher application.
In order to reduce the index size to improve the query performance,
we're exploring the possibility of having:
1. One field for each log line with 'indexed=true & stored=false'
that will be used for searching
2. Another field for each log line of type solr.ExternalFileField
that will be used only for display purposes.
We realized that solr.ExternalFileField currently supports only the
float type. Is there a way we can override this to support a string
type? Any issues with this approach?
Any ideas are welcome.
Thanks,
-Jibo