Hello,

I would like to upload a very large dataset (UniRef) to a Fuseki database. I tried uploading file by file, but the upload time seemed to grow exponentially with each file added.

Code used:
```python
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

url: str = f"{jena_url}/{db_name}/data"
with open(path_file, "rb") as fh:
    multipart_data: MultipartEncoder = MultipartEncoder(
        fields={
            "file": (file_name, fh, "text/turtle"),
        }
    )
    response: requests.Response = requests.post(
        url,
        data=multipart_data,
        headers={"Content-Type": multipart_data.content_type},
        cookies=cookies,
    )
```

Then I tried loading with the tdb2.tdbloader command.
Loading all the files in a single command was much faster; tdb2.tdbloader also has an option to parallelize the load.

Code used:
```bash
bin/tdb2.tdbloader --loader=parallel --loc fuseki/base/databases/uniref/ data/uniref_*
```
The problem with tdb2.tdbloader is that it cannot be used over HTTP; it works directly on the database files.

I would like to know whether it is possible to get the same performance as tdb2.tdbloader (loading all files at once, parallelization, ...) over HTTP.
I'm also open to other suggestions for optimizing this file loading.
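For what it's worth, one thing I considered (a sketch only, not tested at scale): Fuseki's Graph Store Protocol endpoint at `/data` also accepts a plain POST of the Turtle body with `Content-Type: text/turtle`, so the multipart wrapper can be dropped and the file handle streamed directly. The helper name `build_upload_request` and the endpoint values below are my own placeholders:

```python
import requests


def build_upload_request(
    jena_url: str, db_name: str, path_file: str
) -> requests.PreparedRequest:
    """Build a plain (non-multipart) POST of a Turtle file to Fuseki's
    Graph Store Protocol endpoint. Passing a file object as `data` lets
    requests stream it instead of loading it into memory."""
    url = f"{jena_url}/{db_name}/data"
    req = requests.Request(
        "POST",
        url,
        data=open(path_file, "rb"),
        headers={"Content-Type": "text/turtle"},
    )
    return req.prepare()


# Usage (hypothetical endpoint and file):
# with requests.Session() as session:
#     prepared = build_upload_request(
#         "http://localhost:3030", "uniref", "data/uniref_00.ttl"
#     )
#     response = session.send(prepared)
```

Reusing one `Session` across files at least avoids reconnecting per upload, though I don't know whether it approaches tdb2.tdbloader speeds.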

What explains this exponential growth of the upload time when data is added in several batches?

Thank you for your help,

Steven
