RE: HW requirements

2015-05-28 Thread Allison, Timothy B.
A classic on the importance of prototyping with your data and on the 
intractability of sizing in the abstract:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
 


This might be of use:

https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls

but note this thread:
https://mail-archives.apache.org/mod_mbox/lucene-java-user/201503.mbox/%3cCALZAj3KMiStgiFZb=RTAEqDg8dpPYcmaj25T26Hi+=c7cal...@mail.gmail.com%3e
 

perhaps: 
http://docs.alfresco.com/4.1/concepts/solrnodes-memory.html 

-----Original Message-----
From: Sznajder ForMailingList [mailto:bs4mailingl...@gmail.com] 
Sent: Wednesday, May 27, 2015 12:34 PM
To: solr-user@lucene.apache.org
Subject: HW requirements

Hi,

Could you give me some hints wrt HW requirements for Solr if I need to
index about 400 GB of text?

Thanks

Benjamin


Re: HW requirements

2015-05-28 Thread Jack Krupansky
You need to translate your source data size into number of documents and
document size. Document size will depend on number of fields, the type of
data in each field, and the size of the data in each field. You need to
think about numeric and date fields, raw string fields, and keyword text
fields.

Solr and Lucene do not merely index a bulk blob of bytes, but
semi-structured data, in the form of documents and fields.

In some cases the indexed data can be smaller than the source data, but it
can sometimes be larger as well.
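
As a rough illustration of the translation described above, here is a minimal back-of-envelope sketch. The average document size and the index-to-source ratios are placeholder assumptions, not measured Solr figures; the names estimate_corpus and CorpusEstimate are hypothetical. Real numbers have to come from indexing a sample of your own data with your own schema.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabyte, used only for readable inputs


@dataclass
class CorpusEstimate:
    doc_count: int           # estimated number of documents
    index_bytes_low: float   # optimistic index size
    index_bytes_high: float  # pessimistic index size


def estimate_corpus(source_bytes, avg_doc_bytes, ratio_low, ratio_high):
    """Translate a raw source size into a document count and an index-size range.

    ratio_low and ratio_high are ASSUMED index-size/source-size ratios. They
    bracket the point that the index can be smaller or larger than the source.
    """
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    if not 0 < ratio_low <= ratio_high:
        raise ValueError("need 0 < ratio_low <= ratio_high")
    # Ceiling division: a partial trailing document still counts as one.
    doc_count = -(-source_bytes // avg_doc_bytes)
    return CorpusEstimate(
        doc_count=doc_count,
        index_bytes_low=source_bytes * ratio_low,
        index_bytes_high=source_bytes * ratio_high,
    )


if __name__ == "__main__":
    # Hypothetical inputs: 400 GB of text, 20 kB per document on average,
    # and an index somewhere between 0.5x and 1.5x the source size.
    est = estimate_corpus(400 * GB, 20_000, 0.5, 1.5)
    print(f"documents: {est.doc_count:,}")
    print(f"index size: {est.index_bytes_low / GB:.0f} to "
          f"{est.index_bytes_high / GB:.0f} GB")
```

The range is deliberately wide: the number and type of fields (numeric, date, raw string, text) move the ratio, so the sketch is only a way to organize the question, not an answer to it.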


-- Jack Krupansky

On Wed, May 27, 2015 at 12:33 PM, Sznajder ForMailingList 
bs4mailingl...@gmail.com wrote:

 Hi,

 Could you give me some hints wrt HW requirements for Solr if I need to
 index about 400 GB of text?

 Thanks

 Benjamin



Re: HW requirements

2015-05-27 Thread Toke Eskildsen
Sznajder ForMailingList bs4mailingl...@gmail.com wrote:
 Could you give me some hints wrt HW requirements for Solr if I need to
 index about 400 GB of text?

No. You are providing far too little information for us to guess.

400 GB can be handled on a laptop or may require 3 strong servers, depending on
what you intend to do with the data, how heavily the machine(s) will be used
while indexing, and your speed requirements.

See 
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
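
The prototyping approach can be sketched as simple arithmetic: index a representative sample, measure the resulting index, and scale up. The function name extrapolate_index_size and the sample figures below are hypothetical, and linear scaling is itself an assumption. The sketch covers disk size only and says nothing about RAM, CPU, or query latency, which also need to be measured on the prototype.

```python
GB = 10**9  # decimal gigabyte


def extrapolate_index_size(sample_source_bytes, sample_index_bytes,
                           full_source_bytes):
    """Scale a measured sample index up to the full corpus.

    ASSUMPTION: index size grows linearly with source size. Treat the
    result as a first estimate and re-check it with a larger sample.
    """
    if sample_source_bytes <= 0:
        raise ValueError("sample_source_bytes must be positive")
    ratio = sample_index_bytes / sample_source_bytes
    return ratio * full_source_bytes


if __name__ == "__main__":
    # Hypothetical measurement: a 4 GB sample produced a 3 GB index.
    projected = extrapolate_index_size(4 * GB, 3 * GB, 400 * GB)
    print(f"projected index size: {projected / GB:.0f} GB")
```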

- Toke Eskildsen