xloader on large dataset : Data Task with poor load average

Steven Blanchard Mon, 22 May 2023 02:49:45 -0700

Hello,

I am currently trying to load a very large dataset (54 billion triples) with the tdb2.xloader command.

The first two steps (Nodes and Terms) completed with an average load speed of ~120,000. The third stage (Data) has an average load speed of only 800. This average load speed is incompatible with the amount of data to be loaded.
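To give a sense of scale, here is a back-of-the-envelope estimate. It assumes that the reported speeds are in triples per second and that the rate stays constant for the whole stage; both are assumptions on my side, not something the loader output confirms.

```python
# Rough duration of one loading stage at a constant rate.
# Assumption: the reported average speed is in triples per second.
TRIPLES = 54_000_000_000
SECONDS_PER_DAY = 86_400


def days_to_load(triples: int, rate_per_second: float) -> float:
    """Return the number of days needed to process `triples` at a fixed rate."""
    return triples / rate_per_second / SECONDS_PER_DAY


fast = days_to_load(TRIPLES, 120_000)  # rate seen in the Nodes and Terms steps
slow = days_to_load(TRIPLES, 800)      # rate seen in the Data step

print(f"at 120,000/s: {fast:.1f} days")
print(f"at 800/s: {slow:.0f} days")
```

Under these assumptions the Data stage alone would take more than two years at 800 per second, against roughly five days at 120,000 per second.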

Looking at the status of the job, it is possible that an excessive demand on memory is slowing the process down severely.


Running top shows that java is using a lot of memory:
```
top

# PID     USER      PR  NI  VIRT    RES    SHR    S  %CPU  %MEM  TIME+    COMMAND
# 867362  sblanch+  20  0   289,0g  90,2g  88,4g  S  3,3   72,1  1102:32  java

```

But free -g shows that very little memory is actually in use:
```
free -g
#       total  used  free  shared  buff/cache  available
# Mem:  125    3     0     0       121         120

```
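One possible way to reconcile the two outputs, as a sketch: in top, RES includes SHR (resident pages backed by files or shared memory), so RES minus SHR approximates the private memory of the process. I am assuming that the large SHR value comes from memory-mapped files, which I have not verified.

```python
def parse_gib(value: str) -> float:
    """Parse a size as printed by top above, e.g. '90,2g' (comma decimal, GiB)."""
    if not value.endswith("g"):
        raise ValueError(f"expected a value in GiB such as '90,2g', got {value!r}")
    return float(value[:-1].replace(",", "."))


res = parse_gib("90,2g")  # RES column for the java process
shr = parse_gib("88,4g")  # SHR column for the java process

# RES counts shared pages too, so the difference approximates private memory.
private = res - shr
print(f"private resident memory: about {private:.1f} GiB")
```

That gives about 1.8 GiB, which is in line with the 3 GB reported as used by free, while the remainder of RES would be counted under buff/cache.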

Is there any way to speed up this step (for example, passing -Xms to java)? Can this significant drop in loading speed be due to memory usage? Do you know of any other limiting factors in this loading stage?

For previous loads of smaller datasets, this Data step was not a bottleneck, and its average speed was even slightly higher than that of the Nodes and Terms steps.


For information, the machine used has 32 CPUs and 128 GB of RAM.

Thanks for your help,
Regards,

Steven
