Thanks for clarifying, Andy. I hadn't followed the code in RDFDataMgr.read/write().
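
To picture the stacking you describe for ModelFactory.createUnion, here is a toy sketch in plain Java. It does not use Jena and is not how Jena implements unions: ToyGraph, Base, Union and the probe counter are made-up names for illustration only. It just counts how many underlying graphs one lookup has to visit when unions are chained file by file, compared with accumulating everything into a single graph.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration only: none of these types are Jena classes.
final class UnionStackSketch {

    /** Hypothetical minimal graph interface: just a membership test. */
    interface ToyGraph {
        boolean contains(String triple);
    }

    /** Counts how many base graphs were consulted by lookups. */
    static int probes = 0;

    /** A graph that actually stores triples (plain strings here). */
    static final class Base implements ToyGraph {
        private final Set<String> triples = new HashSet<>();

        void add(String triple) {
            triples.add(triple);
        }

        @Override
        public boolean contains(String triple) {
            probes++;
            return triples.contains(triple);
        }
    }

    /** A union view: stores nothing and delegates to both operands. */
    static final class Union implements ToyGraph {
        private final ToyGraph left;
        private final ToyGraph right;

        Union(ToyGraph left, ToyGraph right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public boolean contains(String triple) {
            return left.contains(triple) || right.contains(triple);
        }
    }

    /** Mimics kg = union(kg, next) once per file, then does one lookup. */
    static int probesForChainedUnions(int files) {
        ToyGraph acc = new Base();
        for (int i = 0; i < files; i++) {
            Base next = new Base();
            next.add("triple-" + i);
            acc = new Union(acc, next); // one more layer per file
        }
        probes = 0;
        acc.contains("missing"); // an absent triple forces a full traversal
        return probes;
    }

    /** Adds every file's triples to one graph, then does one lookup. */
    static int probesForSingleGraph(int files) {
        Base acc = new Base();
        for (int i = 0; i < files; i++) {
            acc.add("triple-" + i);
        }
        probes = 0;
        acc.contains("missing");
        return probes;
    }

    public static void main(String[] args) {
        System.out.println("chained unions: " + probesForChainedUnions(700));
        System.out.println("single graph:   " + probesForSingleGraph(700));
    }
}
```

In this toy, with 700 files the chained version consults 701 base graphs to answer one lookup of an absent triple, while the single accumulated graph consults one. A serializer that issues many lookups would pay that cost on each of them, which would be consistent with the slowdown emri reported.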

The "In my experience" bit comes from two cases:
- iterating a Model
- iterating the QuerySolutions of a ResultSet

In both cases the culprit turned out to be just the GC overhead. The
Model instances in that application were short-lived, so despite the
Model internally caching ResourceImpl/LiteralImpl instances, there was
simply too much object churn.

On Sat, Feb 13, 2021 at 10:43 AM Andy Seaborne <a...@apache.org> wrote:

>
>
> On 12/02/2021 13:43, Alexis Armin Huf wrote:
> > Hi, emri.
> >
> > In my experience with Jena I have observed that Graphs are more efficient
> > than Models when there is too much data being iterated.
>
> The actual parsing should go straight into the graph, having picked it
> out of the model.
>
> What can happen with large numbers of small files, called one by one, is
> that the model creation overhead shows up. That used to be quite a lot
> but still today it's not zero because of initialization of internal
> objects.
>
> If data is added via the model API, so not the parsing case, there are
> additional layers that have a cost and also cause the JIT to take longer
> to fully optimize the code paths.
>
> > Also, at every
> > createUnion() call, your code is creating a new Union graph which in the
> > end will yield a tree of 700 models that will potentially be traversed when
>
> Not even "potentially" :-)
>
> The ModelFactory.createUnion will add layers and layers of union graphs
> so that "add triple" becomes a call to N separate graphs in a stack
> before getting to graph that actually stores the triple.
>
>      Andy
>
> > doing searches (which the RDF/XML serializer will eventually have to do in
> > order to layout resources properly inside the XML).
> >
> > Maybe this will be faster:
> >
> > public static void main(String[] args) throws Exception{
> >     File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
> >     Graph acc = GraphFactory.createDefaultGraph();
> >     RDFDataMgr.read(acc, "file:C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
> >     for (File file : folder.listFiles()) {
> >         if (file.isFile())
> >             RDFDataMgr.read(acc, file.toURI().toString());
> >     }
> >     try (FileOutputStream out = new FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
> >         RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
> >     }
> > }
> >
> >
> > Note that I have not tested this, you may want to run some iterations under
> > the debugger.
> >
> > On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri <emrimbiemri8...@gmail.com>
> > wrote:
> >
> >> Hi, the code is:
> >> ----------------------------------
> >>
> >>
> >> public class RDFMerge {
> >>
> >>     private static File folder;
> >>     private static Model kg;
> >>     // private static OutputStream out;
> >>
> >>     public static void iterate() {
> >>
> >>         folder = new File("C:\\Users\\Admin\\Desktop\\KG");
> >>         kg = ModelFactory.createDefaultModel().read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
> >>
> >>         File[] listOfFiles = folder.listFiles(); // Interating through the directory
> >>         for (File file : listOfFiles) {
> >>
> >>             if (file.isFile()) {
> >>
> >>                 kg = merge(kg, file);
> >>
> >>             }
> >>
> >>         }
> >>
> >>
> >>         OutputStream out1;
> >>         try {
> >>             out1 = new FileOutputStream( new File("C:\\Users\\Admin\\Desktop\\merged.rdf"));
> >>             kg.write( out1, "RDF/XML", null );
> >>             System.out.println("RDFs merged successfully!");
> >>         }
> >>         catch(FileNotFoundException e) { e.printStackTrace(); }
> >>
> >>
> >>     }
> >>
> >>
> >>     public static Model merge(Model k, File i) {
> >>
> >>         // Model kg_in = ModelFactory.createDefaultModel().read(k.getAbsolutePath());
> >>         Model other_in = ModelFactory.createDefaultModel().read(i.getAbsolutePath());
> >>
> >>         Model union = ModelFactory.createUnion(k, other_in);
> >>         return union;
> >>     }
> >> }
> >>
> >> On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne <a...@apache.org> wrote:
> >>
> >>> Hi there - the attachment didn't make it through.
> >>>
> >>> Could you include the code in the body of the message please? Or put in
> >>> somewhere like a gist on github / pastebin /... and send a link.
> >>>
> >>>      Andy
> >>>
> >>> On 12/02/2021 11:45, emri mbiemri wrote:
> >>>> Dear all,
> >>>>
> >>>> Do you know how I can merge some thousands RDF models into a single one?
> >>>> I have tried it by iteration through all files within a folder and then
> >>>> using Jena's union function to merge them one by one! The problem is
> >>>> that the program is running for more than 13 hours and is still not
> >>>> stopping (with only 50 models as test).
> >>>>
> >>>> So far I have close to 700 models, in total 68MB.
> >>>>
> >>>> Attached you seen the code I am using for.
> >>>>
> >>>> Do you have any idea what I can do to merge all these files into a
> >>>> single knowledge-graph?
> >>>>
> >>>> Thanks for your help.
> >>>
> >>
> >
>

--
Alexis Armin Huf <alexis...@gmail.com>