Beebs.systap added a comment.
In https://phabricator.wikimedia.org/T88717#1037712, @Jdouglas wrote:
> Is there a standard way to load a huge amount of RDF data into Bigdata? I
> tried the following (with a 3GB gzipped .nt file), but it very quickly blew
> the heap:
>
> Repository repo = BigdataSailFactory.connect("localhost", 9999);
> RepositoryConnection con = repo.getConnection();
> File file = new File("/home/james/dumps/wikidata-statements.nt.gz");
> FileInputStream fileInputStream = new FileInputStream(file);
> GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
> con.add(gzipInputStream, null, RDFFormat.N3);
>
>
> EDIT: I cranked up the heap, and ran into the max array length limitation:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> at java.util.Arrays.copyOf(Arrays.java:2271)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at info.aduna.io.IOUtil.transfer(IOUtil.java:494)
> at info.aduna.io.IOUtil.readBytes(IOUtil.java:210)
> at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.prepareForWire(RemoteRepository.java:1492)
> at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.access$000(RemoteRepository.java:1436)
> at com.bigdata.rdf.sail.webapp.client.RemoteRepository.add(RemoteRepository.java:890)
> at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:663)
> at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:648)
> at example.bigdata_client.App.update(App.java:33)
> at example.bigdata_client.App.main(App.java:23)
>
>
> These errors are all client-side -- the servlet appears to be humming along.
> Is there a preferred streaming API to use?
The way to do it is to use SPARQL LOAD and provide the URI of the RDF data
file.
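
As a rough illustration of that suggestion, here is a minimal sketch that sends a SPARQL 1.1 Update LOAD request as a form-encoded POST, using only the JDK. With LOAD the server fetches the document at the given URI itself, so the client never buffers the dump in memory. The class and method names (SparqlLoad, buildLoadUpdate, formBody, postUpdate) are hypothetical, and the endpoint URL in the usage note is an assumption about a local install, not something confirmed in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Bigdata client API.
final class SparqlLoad {

    // Builds a SPARQL 1.1 Update LOAD operation for the given data URI.
    // The URI is resolved by the server, so a file: URI must name a path
    // that the server process can read.
    static String buildLoadUpdate(String dataUri) {
        return "LOAD <" + dataUri + ">";
    }

    // Encodes the update as an application/x-www-form-urlencoded body,
    // using the "update" parameter defined by the SPARQL 1.1 Protocol.
    static String formBody(String update) throws UnsupportedEncodingException {
        return "update=" + URLEncoder.encode(update, "UTF-8");
    }

    // POSTs the update to the endpoint and returns the HTTP status code.
    // Only the short request string is held in client memory.
    static int postUpdate(String endpoint, String update) throws IOException {
        byte[] body = formBody(update).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A call might look like SparqlLoad.postUpdate("http://localhost:9999/bigdata/sparql", SparqlLoad.buildLoadUpdate("file:///home/james/dumps/wikidata-statements.nt.gz")), where the endpoint path is assumed. Whether the server decompresses a gzipped file on LOAD is not stated in this thread and should be checked against the server documentation.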
TASK DETAIL
https://phabricator.wikimedia.org/T88717
To: Smalyshev, Beebs.systap
Cc: Jdouglas, Beebs.systap, Aklapper, Manybubbles, jkroll, Smalyshev,
Wikidata-bugs, aude, GWicke, daniel, JanZerebecki