The problem is solved, and it was my mistake: I had accidentally stored
the file text, untokenized, in the "categorie" field!
Thanks for your help.
Ahmed
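Here is a rough way to see why this one field mattered so much. A term dictionary keeps each distinct term once, so a tokenized field contributes short words that many files share. An untokenized field contributes its entire value as a single term. The Python sketch below is a toy model built on those assumptions. It is not CLucene's file format or API. `tokenize` and `dictionary_bytes` are hypothetical helpers, and whitespace splitting stands in for a real analyzer.

```python
# Toy model, NOT CLucene's on-disk format: a term dictionary keeps each
# distinct term once, so we sum the lengths of the distinct terms only.


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for an analyzer: split on whitespace only."""
    return text.split()


def dictionary_bytes(values: list[str], tokenized: bool) -> int:
    """Bytes needed to hold every distinct term of one field exactly once."""
    terms: set[str] = set()
    for value in values:
        if tokenized:
            terms.update(tokenize(value))  # many short, shared terms
        else:
            terms.add(value)  # untokenized: the whole value is one term
    return sum(len(term) for term in terms)


# Hypothetical file texts that share most of their vocabulary.
docs = [
    "the index size grows with stored text",
    "the index size grows with unique terms",
    "the stored text is kept per document",
]
print("tokenized:  ", dictionary_bytes(docs, tokenized=True))   # 59
print("untokenized:", dictionary_bytes(docs, tokenized=False))  # 111
```

With real files the gap widens. The untokenized total grows with the full text of every file, while the tokenized total grows only with the vocabulary. In this model, a stored copy of the same text would add its full length per document on top of this.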
2011/2/3, Ben van Klinken :
Stored fields are kept as plain text. It is possible to compress the
fields if they contain a lot of data, but you could also consider not
storing certain fields (of course, you then won't be able to retrieve
that data from the document after a search). Depending on your
requirements, this may be worth looking into.
2011/2/3 Ahmed Saidi
I'm using an Arabic analyzer; it analyzes only Arabic characters (please
see the attached file).
There are no duplicate documents, and no IndexReader is open.
Ahmed
2011/2/3 Veit Jahns
2011/2/2 Ahmed Saidi :
> Even after optimizing the index, the size is 20 GB. The size of the
> data which I want to index is about 8 GB.
Strange indeed. Just a few further questions that came to mind:
- What kind of analyzer do you use for tokenizing?
- Is the correct number of documents in
Even after optimizing the index, the size is 20 GB. The size of the
data which I want to index is about 8 GB.
If I add fields whose values are the same across many documents, will
CLucene do any kind of compression?
Ahmed
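Here is a rough illustration of the compression question. In a Lucene-style inverted index, a value that repeats across documents appears once in the term dictionary, and each document only adds an entry to that term's postings. A stored field, per Ben's reply, keeps its value as plain text for every document. The sketch below is a toy model that counts those pieces. It is not CLucene's on-disk format. `field_cost` and the author ID values are hypothetical.

```python
from collections import defaultdict

# Toy model, NOT CLucene's file format: compare what one untokenized field
# costs in the term dictionary, in postings, and in per-document stored copies.


def field_cost(values: list[str], stored: bool) -> dict[str, int]:
    """Count dictionary bytes, posting entries and stored bytes for one field."""
    postings: dict[str, list[int]] = defaultdict(list)
    for doc_id, value in enumerate(values):
        postings[value].append(doc_id)  # one posting entry per document
    return {
        "distinct_terms": len(postings),
        # each distinct value is kept once in the dictionary
        "dictionary_bytes": sum(len(term) for term in postings),
        "postings": sum(len(ids) for ids in postings.values()),
        # a stored field keeps its value verbatim for every document
        "stored_bytes": sum(len(v) for v in values) if stored else 0,
    }


# Hypothetical "author id" values: 1000 documents, only 3 distinct authors.
authors = ["1042", "2077", "3150"] * 333 + ["1042"]
print(field_cost(authors, stored=True))
```

In this model, the repeated values cost 12 dictionary bytes in total, but the stored copies cost 4 bytes for each of the 1000 documents. Setting `stored=False` drops that per-document copy, which mirrors Ben's suggestion of not storing fields you never need to read back. The postings still grow with the number of documents.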
2011/2/1, Veit Jahns :
Hi Ahmed!
2011/2/1 Ahmed Saidi :
> I'm using CLucene to index a large set of files. The index size was
> about 2 GB. Then I added three fields that contain a number, such as
> category and author ID. Those fields are not tokenized but are stored
> in the index, and a large set of documents have the same