Stored fields are kept as plain text. It is possible to compress the
fields if there is a lot of data, but you could also look into not
storing certain fields (of course, you then won't be able to retrieve
that data from the document after a search). Depending on your
requirements, this may be worth considering.

Another thing I suggest is looking at the index with a tool called
'luke' (http://www.getopt.org/luke/). You can analyse what's going
on, see how much data there is, perhaps run the check index tool,
check whether there are any extra segments that aren't used, etc.
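
If luke isn't at hand, a rough first look at where the space goes is to
sum the file sizes per extension in the index directory. The sketch
below is plain Python and not part of CLucene; the function names are
made up for illustration. Going by the file formats page linked further
down in this thread, .fdt/.fdx hold the stored fields, .frq and .prx the
frequencies and positions, and .tis/.tii the term dictionary. If the
index uses compound files, nearly everything sits in .cfs and this
breakdown won't tell you much.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for the files directly inside index_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # "_0.fdt" -> ".fdt"; files with no extension (e.g. "segments")
            # are keyed by their full name instead.
            ext = os.path.splitext(entry.name)[1] or entry.name
            totals[ext] += entry.stat().st_size
    return dict(totals)


def report(index_dir):
    """Print one line per extension, largest total first."""
    totals = size_by_extension(index_dir)
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:12} {size / 2**20:10.1f} MiB")
```

Calling report() with the path to your index directory prints the
totals. If .fdt dominates, the stored fields are the place to look; if
.frq and .prx dominate, it is the postings.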

Hope that helps,
Ben

On Fri, Feb 4, 2011 at 7:00 AM, Ahmed Saidi <ci7nu...@gmail.com> wrote:
> I'm using an Arabic analyzer; it analyzes only Arabic characters. Please see
> the attached file.
> There is no duplicate document, and no IndexReader is open.
>
> Ahmed
>
>> 2011/2/3 Veit Jahns <nuncupa...@googlemail.com>
>>>
>>> 2011/2/2 Ahmed Saidi <ci7nu...@gmail.com>:
>>> > Even after optimizing the index, the size is 20 GB. The size of the
>>> > data which I want to index is about 8 GB.
>>>
>>> Strange indeed. Just some further questions which came into my mind:
>>>
>>> - What kind of analyzer do you use for tokenizing?
>>> - Is the correct number of documents in the index, with no document
>>> indexed twice?
>>>
>>> And this discussion [1] may be useful to you.
>>>
>>> > If I add a set of fields that have the same values to the index, will
>>> > CLucene do any kind of compression?
>>>
>>> Not directly. But as far as I understand the index format [2], the
>>> terms are stored only in the term dictionary and are referenced
>>> implicitly in the frequency files.
>>>
>>> Veit
>>>
>>> [1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/8622
>>> [2] http://lucene.apache.org/java/2_3_2/fileformats.html



-- 
-------------------------------------
Ben van Klinken

Mob: 0401 921847
Em: b...@villagechief.com

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
