How to parse huge RDF data in a tar.gz file.

Yasunori Yamamoto Tue, 06 Aug 2019 07:06:23 -0700

Hello, I'm trying to learn how to parse RDF data archived in a tar.gz
file (e.g., rdfdatasets.tar.gz, which contains a set of RDF data files)
from within my Java program.
The following code works, but it is inefficient: for each entry of the
given tar.gz file, it reads the entire RDF content of that entry into
main memory as a String before parsing.
Could you please let me know a better way that uses less memory?


TarArchiveInputStream tarInput = new TarArchiveInputStream(
    new GzipCompressorInputStream(new FileInputStream(filename)));
TarArchiveEntry currentEntry;
PipedRDFIterator<Triple> iter =
    new PipedRDFIterator<Triple>(buffersize, false, pollTimeout, maxPolls);
final PipedRDFStream<Triple> inputStream = new PipedTriplesStream(iter);

while ((currentEntry = tarInput.getNextTarEntry()) != null) {
  String currentFile = currentEntry.getName();
  // Guess the RDF syntax from the entry's file name.
  Lang lang = RDFLanguages.filenameToLang(currentFile);
  RDFParser parser_object = RDFParserBuilder
    .create()
    .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
    // This is the costly step: the whole entry becomes one String.
    .source(new StringReader(
        CharStreams.toString(new InputStreamReader(tarInput))))
    .checking(checking)
    .lang(lang)
    .build();
  parser_object.parse(inputStream);
}
tarInput.close();
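
One direction that might avoid the buffering is to hand tarInput itself
to the parser, since TarArchiveInputStream reports end-of-stream at the
end of the current entry. The sketch below is only an illustration under
assumptions: NonClosingInputStream and consumeAndClose are hypothetical
names, not part of Jena or Commons Compress, and consumeAndClose merely
stands in for a consumer that closes whatever stream it is given. Whether
the Jena parser actually closes its source is not verified here; the
wrapper simply guards against that case so the next tar entry can still
be read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: forwards all reads but ignores close(). */
public class NonClosingInputStream extends FilterInputStream {

    public NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Deliberately a no-op so the wrapped archive stream stays open.
    }

    /** Hypothetical stand-in for a parser: drains the stream, then closes it. */
    public static int consumeAndClose(InputStream in) throws IOException {
        int total = 0;
        byte[] buf = new byte[8192];
        try (InputStream s = in) {
            int n;
            while ((n = s.read(buf)) != -1) {
                total += n; // data is handled chunk by chunk, never as one String
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<s> <p> <o> .\n".getBytes(StandardCharsets.UTF_8);
        final boolean[] closed = {false};
        // In-memory stream that records whether close() reached it.
        InputStream base = new ByteArrayInputStream(data) {
            @Override
            public void close() throws IOException {
                closed[0] = true;
                super.close();
            }
        };
        int read = consumeAndClose(new NonClosingInputStream(base));
        System.out.println("bytes read: " + read
            + ", underlying closed: " + closed[0]);
    }
}
```

In the loop above, this would mean replacing the StringReader source with
.source(new NonClosingInputStream(tarInput)) and keeping .lang(lang),
because a bare stream carries no file name to guess the syntax from.
Commons IO's CloseShieldInputStream serves the same purpose if that
library is already on the classpath.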

Sincerely yours,
Yasunori Yamamoto
