Re: DBPedia+TDB on the latest AWS service: 64GB RAM + 1TB SSD.

Andy Seaborne Fri, 20 Jul 2012 11:04:17 -0700

Hi Olivier,

Which loader are you using? You're right: if the data starts spilling to disk, things slow down. Do you have the log file?

On a smaller dataset, using tdbloader2, on an m2.2xlarge (34G RAM) [I happen to have access to the machine rather than allocating for a special test] I recently got:


Finance (COINS): 417,908,490 triples in 12,406s (3 hours, 26 min)
 => 33KTPS
with the initial stage proceeding at 150KTPS.

It does not even use all of RAM.
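
As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes the average rate from the two reported figures. The function names are illustrative only and are not part of TDB or its loaders.

```python
def throughput(triples, seconds):
    """Average load rate in triples per second."""
    return triples / seconds


def hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


# Figures reported for the Finance (COINS) load.
triples = 417_908_490
seconds = 12_406

tps = throughput(triples, seconds)
h, m, s = hms(seconds)
print(f"{tps / 1000:.1f} KTPS over {h}h {m}min {s}s")
```

The overall average is well below the 150KTPS of the initial stage, so most of the elapsed time is spent in the later stages of the load.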

ulimit -d and -m must both be "unlimited", and some kernels seem to have limits on the amount of mapped memory a process is allowed.
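
One way to inspect those limits from inside a process is sketched below, using Python's standard `resource` module on a Unix-like system. RLIMIT_DATA and RLIMIT_RSS correspond to `ulimit -d` and `ulimit -m`. Including RLIMIT_AS (`ulimit -v`) is an assumption about which limit might restrict mapped memory; the message does not say which kernel limit is meant.

```python
import resource


def describe(limit):
    """Render a limit value the way the ulimit shell builtin does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def check_limits():
    """Return {label: (soft, hard)} for the limits relevant to a large load."""
    limits = (
        ("data segment (ulimit -d)", resource.RLIMIT_DATA),
        ("resident set (ulimit -m)", resource.RLIMIT_RSS),
        # Assumption: address space is one limit that can cap mapped memory.
        ("address space (ulimit -v)", resource.RLIMIT_AS),
    )
    report = {}
    for label, flag in limits:
        soft, hard = resource.getrlimit(flag)
        report[label] = (describe(soft), describe(hard))
    return report


for label, (soft, hard) in check_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```

Any soft value other than "unlimited" is worth raising in the shell that launches the loader, since child processes inherit these limits.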

We also have tdbloader3 - that needs bedding down but does do parallel sorting. It requires tuning to use it at scale; the defaults are too small.

It would be interesting to put more concurrent operation in the index creation stage for an SSD in tdbloader2. For a single plain HDD, parallel writes can cause disk head thrashing as two or more processes attempt to write to the disk (more spindles would help).

Paolo has in the past looked at MapReduce jobs for very large scale loading. Paolo?


        Andy

On 20/07/12 17:26, Olivier Rossel wrote:

Hi all.

Amazon cloud used to provide a high-end solution : 8 core/64GB RAM/1 TB HDD.
I tried to load DBPedia in TDB with this solution, but performance
is "bad" as soon as the 64GB RAM
is not enough to store the indexes. Swap on disk is then used and HDD
performance is "bad".
So it takes several hours (days?) to load DBPedia.
(Honestly, I gave up.)

Now Amazon cloud has upgraded its high-end solution: 8 core/64GB RAM/1TB SSD.
The SSD option seems to be EXTREMELY fast w.r.t. the previous HDD option.

I am wondering if this SSD option can bring the loading of DBPedia
below (let's say) 4h?
Did anyone try it?

Or maybe you know of a pay-per-hour cloud solution with a LOT of RAM
(let's say 256GB) so
TDB never has to swap on disk?

Any opinion or idea about all that?
