Fri, 04 Mar 2011 14:53:50 +0100, Krinkle <[email protected]> wrote:
> On March 4 2011, Seb35 wrote:
>> Fri, 04 Mar 2011 03:45:53 +0100, MZMcBride <[email protected]> wrote:
>>> Seb35 wrote:
>>>> I'm from the French chapter and we need sometimes a lot of CPU power
>>>> and/or a lot of memory for some projects. For now it happened two
>>>> times:
>>>
>>> It's difficult to know what "a lot" of CPU power or memory is from your
>>> post. Toolserver accounts have account limits
>>> (<https://wiki.toolserver.org/view/Account_limits>), so if you're staying
>>> within those limits, there's generally no problem. If you want to exceed
>>> those limits, you should talk to the Toolserver roots first
>>> (<https://wiki.toolserver.org/view/System_administrators>). There are places
>>> like /mnt/user-store that can be used for large media storage as well.
>>>
>>> As always, the Toolserver resources that you use need to relate to Wikimedia
>>> in some way, but it sounds like both of your projects do. :-)
>>>
>>> MZMcBride
>> Ok, thank you, I didn't find this page.
>>
>> For the BnF project we needed in fact about one day of computation (most
>> of the time was used by the disk accesses), but we thought it would be
>> more (we optimized too by using SAX instead of DOM to read big XML files,
>> it used too much memory with DOM too).
>> For the video encoding to OGV (it's not me who done that), it was 4-5
>> hours for a single video but some time was used to swap (and there are 100
>> videos corresponding to the conferences).
>>
>> Thank you for the response.
>> Seb35
>
> Hi Seb35,
>
> "One day" or "4-5 hours" still don't mean a lot in terms of technical
> requirements.
> One day of computing with what equipment ? With 24 hours of runtime a small
> difference can make a big difference. What kind of server server/setup
> did this run on ?
>
> How much is "too much memory" ?
We needed to transform and crop TIFF images, read an XML file associated with a book and containing the OCR text of the digitized book, and create a DjVu file with the images and the text layer.

For that we rented a server. I cannot remember exactly which hardware we chose, but it was probably a 4-core (or 8-core) machine with 4 GB (or 8 GB) of RAM and 200-300 GB of disk. It also had server-class bandwidth, which was useful for downloading the files from the BnF's FTP server: about 500 files per book (1 XML per page + a multipage TIFF + some others) x 1416 books = 2-3 days of download on the server, because of the many small files.

From what I remember, "too much memory" means that my laptop (2-core 2.8 GHz, 3 GB of RAM), on which I developed the (Python) program, had difficulty loading the whole XML file with DOM. Then I tried SAX, and the work was done in a few seconds without much memory (I hadn't used SAX before, but I ♥ SAX now :-)

We wrote a technical report about this but haven't published it yet (someday, I hope). You can see <http://commons.wikimedia.org/wiki/Commons:Bibliothèque_nationale_de_France> for an "outreach" document and <https://fisheye.toolserver.org/browse/Seb35/BnF_import> for the Python program.

Seb35

_______________________________________________
Toolserver-l mailing list ([email protected])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
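
The DOM versus SAX difference described in this message can be sketched roughly as follows. This is only an illustration, not the actual BnF import code: the element names `page` and `word`, the `PageTextHandler` class and the `extract_pages` helper are all assumptions made up for the example. A DOM parser builds the whole tree in memory before any code can look at it, whereas a SAX handler receives one event per element and keeps only what it decides to keep.

```python
import io
import xml.sax


class PageTextHandler(xml.sax.ContentHandler):
    """Collect the text of each <page> element as the document streams past.

    The element names "page" and "word" are hypothetical; a real OCR
    schema will use different names.
    """

    def __init__(self):
        super().__init__()
        self.pages = []        # one string per finished page
        self._words = []       # words of the page currently being read
        self._buffer = []      # character chunks of the current <word>
        self._in_word = False

    def startElement(self, name, attrs):
        if name == "page":
            self._words = []
        elif name == "word":
            self._in_word = True
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._in_word:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "word":
            self._words.append("".join(self._buffer))
            self._in_word = False
        elif name == "page":
            self.pages.append(" ".join(self._words))


def extract_pages(source):
    """Parse a file name or file-like object and return the text of each page."""
    handler = PageTextHandler()
    xml.sax.parse(source, handler)
    return handler.pages


# Tiny in-memory document standing in for a large OCR file.
SAMPLE = (
    b"<book>"
    b"<page><word>Bonjour</word><word>monde</word></page>"
    b"<page><word>Fin</word></page>"
    b"</book>"
)
pages = extract_pages(io.BytesIO(SAMPLE))
print(pages)
```

With a real file, `extract_pages("book.xml")` would read the document incrementally, so memory use depends on what the handler stores rather than on the size of the XML tree. The equivalent `xml.dom.minidom.parse` call would have to hold every node of the document at once.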
