Beebs.systap added a comment.

In https://phabricator.wikimedia.org/T88717#1037712, @Jdouglas wrote:

> Is there a standard way to load a huge amount of RDF data into Bigdata?  I 
> tried the following (with a 3GB gzipped .nt file), but it very quickly blew 
> the heap:
>
>   Repository repo = BigdataSailFactory.connect("localhost", 9999);
>   RepositoryConnection con = repo.getConnection();
>   File file = new File("/home/james/dumps/wikidata-statements.nt.gz");
>   FileInputStream fileInputStream = new FileInputStream(file);
>   GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
>   con.add(gzipInputStream, null, RDFFormat.NTRIPLES); // .nt is N-Triples, not N3
>
>
> EDIT: I cranked up the heap, and ran into the max array length limitation:
>
>   Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>       at java.util.Arrays.copyOf(Arrays.java:2271)
>       at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>       at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>       at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>       at info.aduna.io.IOUtil.transfer(IOUtil.java:494)
>       at info.aduna.io.IOUtil.readBytes(IOUtil.java:210)
>       at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.prepareForWire(RemoteRepository.java:1492)
>       at com.bigdata.rdf.sail.webapp.client.RemoteRepository$AddOp.access$000(RemoteRepository.java:1436)
>       at com.bigdata.rdf.sail.webapp.client.RemoteRepository.add(RemoteRepository.java:890)
>       at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:663)
>       at com.bigdata.rdf.sail.remote.BigdataSailRemoteRepositoryConnection.add(BigdataSailRemoteRepositoryConnection.java:648)
>       at example.bigdata_client.App.update(App.java:33)
>       at example.bigdata_client.App.main(App.java:23)
>
>
> These errors are all client-side -- the servlet appears to be humming along.  
> Is there a preferred streaming API to use?


The way to do this is to use the SPARQL UPDATE LOAD operation and give it the URI of the RDF data file. The stack trace shows the client buffering the entire file into a ByteArrayOutputStream before sending it over the wire; with LOAD, the server fetches and streams the data itself, so nothing is held in client memory.
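A minimal sketch of the update, assuming the dump sits on the server's filesystem at the path from your example and has been decompressed first (the SPARQL spec does not guarantee gzip handling for LOAD):

```sparql
# Run as a SPARQL UPDATE against the server's SPARQL endpoint;
# the server resolves the URI and streams the data itself.
LOAD <file:///home/james/dumps/wikidata-statements.nt>
```

From Java this can be submitted on the same connection, e.g. `con.prepareUpdate(QueryLanguage.SPARQL, "LOAD <...>").execute()` (assuming the remote connection forwards SPARQL updates to the servlet), or POSTed directly to the NanoSparqlServer endpoint as an `update=` parameter.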


TASK DETAIL
  https://phabricator.wikimedia.org/T88717


To: Smalyshev, Beebs.systap
Cc: Jdouglas, Beebs.systap, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel, JanZerebecki



_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs