RE: How to reduce the Solr index size..

2009-08-27 Thread Fuad Efendi
stored=true means that the original value of the field is kept in the index
files on disk. So your index will contain 1 MB of raw log PLUS the data
produced by indexing itself: terms, etc.

Search speed is more important than index size...

And note this: the message field contains the actual log text and has
stored=true, so this field alone will take about 1 MB even if it is not indexed.
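
A minimal sketch of that trade-off, reusing the message field from the schema quoted below. The second definition is an assumption, not something the original poster has: it only helps if the raw log text can be fetched from somewhere else (for example the original log files) when a hit has to be displayed. The two lines are alternatives; a real schema would contain only one of them.

```xml
<!-- Current definition: the stored copy alone is roughly the size of the raw log. -->
<field name="message" type="text" indexed="true" stored="true"/>

<!-- Assumed alternative: still searchable, but no stored copy inside the index. -->
<field name="message" type="text" indexed="true" stored="false"/>
```

With stored=false the field stays fully searchable, but Solr can no longer return its text in search results.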


-----Original Message-----
From: Silent Surfer [mailto:silentsurfe...@yahoo.com] 
Sent: August-20-09 11:01 AM
To: Solr User
Subject: How to reduce the Solr index size..

Hi,

I am a newbie to Solr. We recently started using it.

We are using Solr to process server logs. We create an index entry for each
line of the logs, so that users can do a fine-grained search down to the
second/ms.

What we are observing is that the index being created is almost double the
size of the actual logs, i.e. if the logs are, say, 1 MB, the index is
around 2 MB.

Could anyone let us know what can be done to reduce the index size? Do we
need to change any configuration, or delete any files that are created during
indexing but are not required for searching?

Our schema is as follows:

   <field name="pkey" type="string" indexed="true" stored="true" required="false"/>
   <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
   <field name="level" type="string" indexed="true" stored="true"/>
   <field name="app" type="string" indexed="true" stored="true"/>
   <field name="server" type="string" indexed="true" stored="true"/>
   <field name="port" type="string" indexed="true" stored="true"/>
   <field name="class" type="string" indexed="true" stored="true"/>
   <field name="method" type="string" indexed="true" stored="true"/>
   <field name="filename" type="string" indexed="true" stored="true"/>
   <field name="linenumber" type="string" indexed="true" stored="true"/>
   <field name="message" type="text" indexed="true" stored="true"/>

The message field holds the actual log text.

Thanks,
sS


  





Re: How to reduce the Solr index size..

2009-08-27 Thread Glen Newton
2009/8/27 Fuad Efendi <f...@efendi.ca>:
> stored=true means that this piece of info will be stored in a filesystem.
> So that your index will contain 1Mb of pure log PLUS some info related to
> indexing itself: terms, etc.
>
> Search speed is more important than index size...

Not if you run out of space for the index. :-)

> And note this: message field contains actual log, stored=true, so that
> only this field will make 1Mb if not indexed




Re: How to reduce the Solr index size..

2009-08-20 Thread Grant Ingersoll



There are a couple of things you can do:
1. stored=true only needs to be on if you are going to use that
value later in your application (i.e. for display). Storage is not
needed for search.
2. You can set omitNorms=true and omitTermFreqAndPositions=true for any
fields that you aren't searching (but just displaying).
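
As a rough illustration of the two points above, here is how some of the field definitions might look. Which fields are only searched and which are also displayed is an assumption made for the example, and omitTermFreqAndPositions is only recognized by Solr versions that support it (1.4 onwards, as far as I know).

```xml
<!-- Sketch only: the split between searched and displayed fields is assumed. -->

<!-- Assumed to be searched but never shown in results: no stored copy needed. -->
<field name="class" type="string" indexed="true" stored="false"/>
<field name="method" type="string" indexed="true" stored="false"/>

<!-- Assumed to need no length normalization, term frequencies or positions. -->
<field name="level" type="string" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="server" type="string" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
```

Omitting positions disables phrase queries on a field, so it is not a good fit for a full-text field such as message. In the example schema shipped with Solr the string field type may already set omitNorms=true, in which case repeating it per field changes nothing.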


A doubling in size seems a bit much. However, 1 MB is likely not
enough to show whether this holds true for a larger index. Oftentimes,
the growth of the index is sublinear, since the same terms
appear over and over again and Lucene can obtain pretty high levels of
compression.


Also, are you adding any other content to what comes in (synonyms,  
etc.)?
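
For instance, an index-time analyzer along these lines would add extra terms for every word that has synonyms, and so inflate the index. The field type name text_log and the file synonyms.txt are hypothetical; the point is only to show what to look for in the index analyzer of the type used by the message field.

```xml
<!-- Hypothetical field type: expand="true" at index time writes every synonym
     of a matching word into the index as an additional term. -->
<fieldType name="text_log" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying synonyms only in the query analyzer instead keeps the extra terms out of the index.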


I would open up the index in Luke, too, and make sure everything looks
right.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



How to reduce the Solr index size..

2009-08-20 Thread Silent Surfer
Hi,

I am a newbie to Solr. We recently started using it.

We are using Solr to process server logs. We create an index entry for each
line of the logs, so that users can do a fine-grained search down to the
second/ms.

What we are observing is that the index being created is almost double the
size of the actual logs, i.e. if the logs are, say, 1 MB, the index is
around 2 MB.

Could anyone let us know what can be done to reduce the index size? Do we
need to change any configuration, or delete any files that are created during
indexing but are not required for searching?

Our schema is as follows:

   <field name="pkey" type="string" indexed="true" stored="true" required="false"/>
   <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
   <field name="level" type="string" indexed="true" stored="true"/>
   <field name="app" type="string" indexed="true" stored="true"/>
   <field name="server" type="string" indexed="true" stored="true"/>
   <field name="port" type="string" indexed="true" stored="true"/>
   <field name="class" type="string" indexed="true" stored="true"/>
   <field name="method" type="string" indexed="true" stored="true"/>
   <field name="filename" type="string" indexed="true" stored="true"/>
   <field name="linenumber" type="string" indexed="true" stored="true"/>
   <field name="message" type="text" indexed="true" stored="true"/>

The message field holds the actual log text.

Thanks,
sS