Fri, 04 Mar 2011 14:53:50 +0100, Krinkle <[email protected]> wrote:
> On March 4 2011, Seb35 wrote:
>> Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride <[email protected]> wrote:
>>> Seb35 wrote:
>>>> I'm from the French chapter and we need sometimes a lot of CPU power
>>>> and/or a lot of memory for some projects. For now it happened two
>>>> times:
>>>
>>> It's difficult to know what "a lot" of CPU power or memory is from
>>> your post. Toolserver accounts have account limits
>>> (<https://wiki.toolserver.org/view/Account_limits>), so if you're
>>> staying within those limits, there's generally no problem. If you
>>> want to exceed those limits, you should talk to the Toolserver roots
>>> first (<https://wiki.toolserver.org/view/System_administrators>).
>>> There are places like /mnt/user-store that can be used for large
>>> media storage as well.
>>>
>>> As always, the Toolserver resources that you use need to relate to
>>> Wikimedia in some way, but it sounds like both of your projects do. :-)
>>>
>>> MZMcBride
>> Ok, thank you, I didn't find this page.
>>
>> For the BnF project we in fact needed about one day of computation
>> (most of the time was spent on disk accesses), but we thought it would
>> be more. We also optimized by using SAX instead of DOM to read big XML
>> files, since DOM used too much memory.
>> For the video encoding to OGV (I am not the one who did it), it took
>> 4-5 hours for a single video, though some of that time was spent
>> swapping (and there are 100 videos corresponding to the conferences).
>>
>> Thank you for the response.
>> Seb35
>
> Hi Seb35,
>
> "One day" or "4-5 hours" still don't mean a lot in terms of technical
> requirements.
> One day of computing with what equipment? With 24 hours of runtime a
> small difference can make a big difference. What kind of server/setup
> did this run on?
>
> How much is "too much memory" ?

We needed to transform and crop TIFF images, read the XML files
associated with each book, which contain the OCR text of the digitized
book, and create a DjVu with the images and the text layer.

For that we rented a server. I cannot remember exactly which hardware we
chose, but it was probably a 4-core (or 8-core) machine with 4GB (or 8GB)
of RAM and 200-300GB of disk. Server-grade bandwidth was useful for
downloading the files from the BnF's FTP: about 500 files per book (1 XML
per page + a multipage TIFF + some others) x 1416 books, which took 2-3
days to download to the server because of the many small files.

From what I remember, "too much memory" means that my laptop (2-core
2.8GHz, 3GB of RAM), on which I developed the (Python) program, had
difficulties loading the whole XML file with DOM. Then I tried SAX and
the work was done in a few seconds without much memory. I hadn't used SAX
before, but I ♥ SAX now :-)
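
To illustrate the difference, here is a minimal sketch of the SAX approach
using Python's standard xml.sax module. It is not the actual BnF_import
code: the element names (<page>, <line>, <word>) and the helper names are
hypothetical, chosen only for the example. The point is that the handler
sees one event at a time, so the whole tree is never held in memory as it
is with DOM.

```python
import io
import xml.sax


class WordCollector(xml.sax.ContentHandler):
    """Collect the text of <word> elements (hypothetical element name)."""

    def __init__(self):
        super().__init__()
        self.in_word = False
        self.words = []
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when an opening tag is read from the stream.
        if name == "word":
            self.in_word = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for a single text node.
        if self.in_word:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when a closing tag is read from the stream.
        if name == "word":
            self.words.append("".join(self._buffer))
            self.in_word = False


def extract_words(source):
    """Parse a file name or file object and return the collected words."""
    handler = WordCollector()
    xml.sax.parse(source, handler)
    return handler.words


# Invented sample data, standing in for one OCR page.
sample = b"<page><line><word>Bonjour</word><word>monde</word></line></page>"
words = extract_words(io.BytesIO(sample))
print(words)
```

In a real run, extract_words() would be given the path of each XML file
instead of an in-memory sample.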

We wrote a technical report about that but haven't published it yet
(perhaps one day, I hope). In the meantime you can see
<http://commons.wikimedia.org/wiki/Commons:Bibliothèque_nationale_de_France>
for an "outreach" document and
<https://fisheye.toolserver.org/browse/Seb35/BnF_import> for the Python
program.

Seb35

_______________________________________________
Toolserver-l mailing list ([email protected])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
