Yes, getting rid of the union model and using a single model to collect all the RDF is better.

Have you determined where the time is going: reading or writing?
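
One rough way to answer that is to time the two phases separately. The sketch below uses only the Java standard library. `PhaseTimer` and `timeMillis` are hypothetical names made up for illustration, and the two lambdas are empty placeholders for the real read loop and the real write call.

```java
public class PhaseTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder phases (assumptions): put the loop that reads every
        // file into the first lambda and the call that writes the merged
        // output into the second.
        long readMs = timeMillis(() -> { /* read all input files here */ });
        long writeMs = timeMillis(() -> { /* write the merged model here */ });

        System.out.println("read:  " + readMs + " ms");
        System.out.println("write: " + writeMs + " ms");
    }
}
```

If one phase dominates, that tells you whether to look at the parser and input format or at the output format first.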

You have a lot of files, and starting up the XML parser is expensive, let alone the RDF/XML parser on top of that. If you have the option of a different format such as Turtle, that is worth trying.

If this is a fixed set of files to use many times, consider using a database, loading it once and then using that persistent database.

    tdbloader --loc DB *rdf

You are writing with pretty printing turned on. Writing RDF/XML in pretty format is costly; try RDFFormat.RDFXML_PLAIN.

If you are trying to read all the files and print them, you can do that from the command line:

    # Read all RDF/XML files and print a single graph in Turtle:
    riot --stream Turtle *.rdf

    Andy

On 12/02/2021 13:43, Alexis Armin Huf wrote:
Hi, emri.

In my experience with Jena I have observed that Graphs are more efficient
than Models when there is too much data being iterated. Also, at every
createUnion() call, your code is creating a new Union graph which in the
end will yield a tree of 700 models that will potentially be traversed when
doing searches (which the RDF/XML serializer will eventually have to do in
order to layout resources properly inside the XML).
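
To make the nesting concrete, here is a toy model in plain Java. It is not Jena code: `ToyGraph`, `FlatGraph` and `LazyUnion` are hypothetical stand-ins that only mimic the idea of a union that stores nothing and delegates to its two operands. Under that assumption, chaining one union per file gives a structure whose depth grows with the number of files, while adding everything to one graph stays flat.

```java
import java.util.HashSet;
import java.util.Set;

public class UnionChainSketch {

    /** Toy stand-in for a graph (hypothetical, not a Jena type). */
    interface ToyGraph {
        boolean contains(String triple);
        int depth(); // layers a lookup may have to pass through
    }

    /** Flat graph backed by a single hash set. */
    static final class FlatGraph implements ToyGraph {
        private final Set<String> triples = new HashSet<>();
        void add(String triple) { triples.add(triple); }
        public boolean contains(String triple) { return triples.contains(triple); }
        public int depth() { return 1; }
    }

    /** Lazy union: holds no triples itself, delegates to both operands. */
    static final class LazyUnion implements ToyGraph {
        private final ToyGraph left;
        private final ToyGraph right;
        LazyUnion(ToyGraph left, ToyGraph right) {
            this.left = left;
            this.right = right;
        }
        public boolean contains(String triple) {
            return left.contains(triple) || right.contains(triple);
        }
        public int depth() { return 1 + Math.max(left.depth(), right.depth()); }
    }

    /** Wraps the accumulator in a new union for every "file". */
    static ToyGraph chainUnions(int files) {
        ToyGraph acc = new FlatGraph();
        for (int i = 0; i < files; i++) {
            FlatGraph next = new FlatGraph();
            next.add("triple-" + i);
            acc = new LazyUnion(acc, next);
        }
        return acc;
    }

    /** Adds every "file" into one flat graph. */
    static FlatGraph accumulate(int files) {
        FlatGraph acc = new FlatGraph();
        for (int i = 0; i < files; i++) {
            acc.add("triple-" + i);
        }
        return acc;
    }

    public static void main(String[] args) {
        int files = 700;
        System.out.println("chained union depth: " + chainUnions(files).depth()); // 701
        System.out.println("single graph depth:  " + accumulate(files).depth());  // 1
    }
}
```

In the toy model, a lookup for the oldest triple in the chained version has to descend through every union layer, which is the kind of repeated traversal described above.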

Maybe this will be faster:

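import java.io.File;
import java.io.FileOutputStream;
import org.apache.jena.graph.Graph;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.sparql.graph.GraphFactory;

public static void main(String[] args) throws Exception {
     File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
     // One graph accumulates the triples from every file.
     Graph acc = GraphFactory.createDefaultGraph();
     RDFDataMgr.read(acc, "file:C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
     for (File file : folder.listFiles()) {
         if (file.isFile())
             RDFDataMgr.read(acc, file.toURI().toString());
     }
     try (FileOutputStream out = new FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
         RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
     }
}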


Note that I have not tested this; you may want to run a few iterations under
the debugger.

On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri <emrimbiemri8...@gmail.com>
wrote:

Hi, the code is:
----------------------------------


public class RDFMerge {

    private static File folder;
    private static Model kg;
    // private static OutputStream out;

    public static void iterate() {
        folder = new File("C:\\Users\\Admin\\Desktop\\KG");
        kg = ModelFactory.createDefaultModel().read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");

        File[] listOfFiles = folder.listFiles(); // Iterating through the directory
        for (File file : listOfFiles) {
            if (file.isFile()) {
                kg = merge(kg, file);
            }
        }

        OutputStream out1;
        try {
            out1 = new FileOutputStream(new File("C:\\Users\\Admin\\Desktop\\merged.rdf"));
            kg.write(out1, "RDF/XML", null);
            System.out.println("RDFs merged successfully!");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static Model merge(Model k, File i) {
        // Model kg_in = ModelFactory.createDefaultModel().read(k.getAbsolutePath());
        Model other_in = ModelFactory.createDefaultModel().read(i.getAbsolutePath());

        Model union = ModelFactory.createUnion(k, other_in);
        return union;
    }
}

On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne <a...@apache.org> wrote:

Hi there - the attachment didn't make it through.

Could you include the code in the body of the message please? Or put it
somewhere like a gist on GitHub / pastebin / ... and send a link.

      Andy

On 12/02/2021 11:45, emri mbiemri wrote:
Dear all,

Do you know how I can merge several thousand RDF models into a single
one?
I have tried iterating through all the files in a folder and using
Jena's union function to merge them one by one. The problem is that
the program has been running for more than 13 hours and still has not
finished (with only 50 models as a test).

So far I have close to 700 models, 68 MB in total.

Attached is the code I am using.

Do you have any idea what I can do to merge all these files into a
single knowledge-graph?

Thanks for your help.



