Hi! Sure, the 5.6M titles in a HashMap take about 1.3-1.5 GB of RAM, so I run the whole of Stanbol with -Xmx2500M without issues.
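
As a rough cross-check of those figures, here is a minimal sketch, not the BookSpotter code. It divides the reported footprint by the number of titles, which comes to roughly 230 to 270 bytes per title if the gigabytes are decimal, and then measures a HashMap filled with synthetic entries. The class name, the synthetic titles, the `urn:example:` values and the sample size are all assumptions made for illustration. Real titles and URIs differ in length, so the measured number will not match the real data exactly.

```java
import java.util.HashMap;
import java.util.Map;

public class TitleMapFootprint {

    /** Average bytes per map entry for a given total footprint (integer division). */
    static long bytesPerEntry(long totalBytes, long entries) {
        return totalBytes / entries;
    }

    /** Heap currently in use. The gc() call is only a hint to the JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Fills a map with synthetic titles and returns the approximate heap growth in bytes. */
    static long measure(int entries) {
        long before = usedHeap();
        Map<String, String> titles = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            // Hypothetical data: the real map holds BNB and Open Library titles.
            titles.put("Synthetic book title number " + i, "urn:example:book:" + i);
        }
        long after = usedHeap();
        // Touch the map after the second reading so it stays reachable until then.
        if (titles.size() != entries) {
            throw new IllegalStateException("unexpected map size");
        }
        return after - before;
    }

    public static void main(String[] args) {
        // Figures from the message: 1.3-1.5 GB for 5.6M titles (decimal GB assumed).
        long titleCount = 5_600_000L;
        System.out.println("low:  " + bytesPerEntry(1_300_000_000L, titleCount) + " bytes/title");
        System.out.println("high: " + bytesPerEntry(1_500_000_000L, titleCount) + " bytes/title");

        int sample = 100_000;
        long grown = measure(sample);
        System.out.println("measured: " + bytesPerEntry(grown, sample) + " bytes/entry (synthetic data)");
    }
}
```

The heap reading depends on the JVM and the garbage collector, so treat the measured value as an order-of-magnitude check only.
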
In earlier iterations I used Ehcache plus sophisticated custom hit and miss handlers to save memory, but I came to realize that this creates more performance issues than it solves in everyday setups, so I gave up on that.

Cheers
Mihály

On 3 September 2012 15:58, Anuj Kumar <anujs...@gmail.com> wrote:
> Hi Mihály,
>
> Thanks a lot for sharing this. Looks good.
>
> I was curious to know the memory requirements to load the 5.6 million titles
> and the whole system to run. If you have any stats, can you please share
> that?
>
> Regards,
> Anuj
>
> On Mon, Sep 3, 2012 at 7:14 PM, Mihály Héder <hederm...@gmail.com> wrote:
>
>> Hi!
>>
>> let me introduce BookSpotter Enhancement Engine by Sztaki:
>>
>> http://blog.iks-project.eu/introducing-bookspotter-enhancement-engine-by-sztaki/
>>
>> Bookspotter uses a selection of 5.6M titles from the British National
>> Bibliography and the Open Library.
>> It scans the incoming text, looking for titles, and in case the author
>> is also mentioned, it produces the corresponding entity annotations
>> that refer to the proper resource URIs of either BNB or OL.
>>
>> You can check the system out here:
>> http://pedia2.sztaki.hu:9090/enhancer/chain/bookspotter
>>
>> Thanks to the Early Adopter Program, I was able to buy some student
>> work hours for data cleaning and for some basic testing.
>> You might want to read the report on our test set of 25 tests:
>> http://pedia2.sztaki.hu/stanbol/bookspotter/Bookspotter_tests.pdf
>>
>> For details, see the blog post!
>>
>> Any comments are much appreciated!
>> Cheers,
>> Mihály
>>