RE: HW requirements
A classic on the importance of prototyping with your data, and on the intractability of sizing in the abstract:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This might be of use:
https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
but note this thread:
https://mail-archives.apache.org/mod_mbox/lucene-java-user/201503.mbox/%3cCALZAj3KMiStgiFZb=RTAEqDg8dpPYcmaj25T26Hi+=c7cal...@mail.gmail.com%3e

Perhaps:
http://docs.alfresco.com/4.1/concepts/solrnodes-memory.html

-----Original Message-----
From: Sznajder ForMailingList [mailto:bs4mailingl...@gmail.com]
Sent: Wednesday, May 27, 2015 12:34 PM
To: solr-user@lucene.apache.org
Subject: HW requirements

Hi,

Could you give me some hints wrt HW requirements for Solr if I need to index about 400 GB of text?

Thanks,
Benjamin
Re: HW requirements
You need to translate your source data size into number of documents and document size. Document size will depend on the number of fields, the type of data in each field, and the size of the data in each field. You need to think about numeric and date fields, raw string fields, and keyword text fields.

Solr and Lucene do not merely index a bulk blob of bytes, but semi-structured data, in the form of documents and fields. In some cases the indexed data can be smaller than the source data, but it can sometimes be larger as well.

-- Jack Krupansky

On Wed, May 27, 2015 at 12:33 PM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote:
Hi,
Could you give me some hints wrt HW requirements for Solr if I need to index about 400 GB of text?
Thanks,
Benjamin
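Jack's first step, turning a raw volume into a document count and a per-document size, is plain arithmetic once per-field sizes have been measured on a sample. The Python below is a minimal sketch of that arithmetic only. The field names, field kinds, and byte sizes are all hypothetical, and "400 Gigas" is assumed to mean decimal gigabytes. It deliberately stops short of estimating index size, since, as noted above, the index can end up smaller or larger than the source data.

```python
from collections import defaultdict
from dataclasses import dataclass

GB = 10**9  # assumption: "400 Gigas" read as decimal gigabytes


@dataclass(frozen=True)
class FieldProfile:
    """Hypothetical per-field profile, ideally measured on a sample of real documents."""
    name: str
    kind: str       # e.g. "text", "string", "date", "numeric"
    avg_bytes: int  # average raw size of this field in one source document


def avg_doc_bytes(fields):
    """Average raw size of one document: the sum of its fields' average sizes."""
    return sum(f.avg_bytes for f in fields)


def estimated_doc_count(total_source_bytes, fields):
    """Approximate number of documents the source volume turns into."""
    return total_source_bytes // avg_doc_bytes(fields)


def bytes_by_kind(fields):
    """Per-document bytes grouped by field kind (text vs. string vs. date, ...)."""
    totals = defaultdict(int)
    for f in fields:
        totals[f.kind] += f.avg_bytes
    return dict(totals)


# Hypothetical document layout; replace these figures with measurements.
sample_fields = [
    FieldProfile("id", "string", 36),
    FieldProfile("category", "string", 20),
    FieldProfile("published", "date", 8),
    FieldProfile("body", "text", 10_176),
]

if __name__ == "__main__":
    n = estimated_doc_count(400 * GB, sample_fields)
    print(f"average document: {avg_doc_bytes(sample_fields):,} bytes")  # 10,240 bytes
    print(f"estimated documents: {n:,}")                                # 39,062,500
    print(bytes_by_kind(sample_fields))  # {'string': 56, 'date': 8, 'text': 10176}
```

With these made-up numbers, 400 GB becomes roughly 39 million documents of about 10 KB each, almost all of it in one text field. That document count, together with the mix of field kinds, is what the sizing discussion actually needs as input.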
Re: HW requirements
Sznajder ForMailingList bs4mailingl...@gmail.com wrote:
Could you give me some hints wrt HW requirements for Solr if I need to index about 400 GB of text?

No. You are providing far too little data for us to guess. 400 GB can be handled on a laptop or require 3 strong servers, depending on what you intend to do with the data, how much the machine(s) will be used while indexing, and your requirements for speed.

See https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

- Toke Eskildsen