Thanks for clarifying, Andy. I hadn't followed the code in
RDFDataMgr.read/write().

The "In my experience" bit comes from two cases:
- iterating a Model
- iterating the QuerySolutions of a ResultSet

In both cases the culprit turned out to be simply GC overhead. The Model
instances in that application were short-lived, so despite the Model
internally caching ResourceImpl/LiteralImpl objects, there was simply too
much object churn.
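A minimal sketch of the difference, assuming Apache Jena 3.x/4.x on the classpath (the class and method names `IterateGraphVsModel`, `countViaModel` and `countViaGraph` are illustrative, not from the thread): iterating through the Model API yields `Statement` objects that wrap each stored `Triple` in Resource/Property/RDFNode objects, while `Graph.find` hands back the stored triples themselves.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.util.iterator.ExtendedIterator;

public class IterateGraphVsModel {

    // Model API: each step builds a Statement (plus wrapper nodes) around a Triple.
    static long countViaModel(Model model) {
        long n = 0;
        StmtIterator it = model.listStatements();
        try {
            while (it.hasNext()) {
                it.nextStatement();
                n++;
            }
        } finally {
            it.close();
        }
        return n;
    }

    // Graph API: iterate the stored Triples directly, no wrapper objects.
    static long countViaGraph(Graph graph) {
        long n = 0;
        ExtendedIterator<Triple> it = graph.find(Node.ANY, Node.ANY, Node.ANY);
        try {
            while (it.hasNext()) {
                it.next();
                n++;
            }
        } finally {
            it.close();
        }
        return n;
    }

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://example.org/s")
             .addProperty(model.createProperty("http://example.org/p"), "o");
        System.out.println(countViaModel(model));            // prints 1
        System.out.println(countViaGraph(model.getGraph())); // prints 1
    }
}
```

Both loops visit the same triples; the Graph version just avoids allocating a wrapper per step, which is where the churn comes from when Models are short-lived.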

On Sat, Feb 13, 2021 at 10:43 AM Andy Seaborne <a...@apache.org> wrote:

>
>
> On 12/02/2021 13:43, Alexis Armin Huf wrote:
> > Hi, emri.
> >
> > In my experience with Jena, Graphs are more efficient than Models when
> > a lot of data is being iterated.
>
> The actual parsing should go straight into the graph, having picked it
> out of the model.
>
> What can happen with large numbers of small files, read one by one, is
> that the model creation overhead shows up. That used to be quite large;
> it is smaller today but still not zero because of the initialization of
> internal objects.
>
> If data is added via the Model API (i.e., not by parsing), there are
> additional layers that have a cost and also make the JIT take longer
> to fully optimize the code paths.
>
> > Also, at every createUnion() call, your code is creating a new Union
> > graph, which in the end will yield a tree of 700 models that will
> > potentially be traversed when
>
> Not even "potentially" :-)
>
> Repeated ModelFactory.createUnion calls add layer upon layer of union
> graphs, so that "add triple" becomes a call through N separate graphs in
> a stack before it reaches the graph that actually stores the triple.
>
>      Andy
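To make the cost of that stack concrete, here is a toy model in plain Java. It deliberately does not use Jena's classes: `Store`, `BaseStore`, `UnionLayer` and `UnionStackDemo` are hypothetical stand-ins, and the assumption (taken from the description above) is that a write to a union goes to its left operand while a lookup checks both operands. Building the stack the same way `merge()` does and counting calls shows that a single add passes through every layer. A real Jena union lookup may do more work than this toy; the point is only the depth.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not Jena's classes) of stacking one union layer per merged file. */
public class UnionStackDemo {

    interface Store {
        void add(String triple);
        boolean contains(String triple);
    }

    // Counts every call that reaches any layer or base store.
    static long calls = 0;

    // The store that actually holds triples.
    static final class BaseStore implements Store {
        private final Set<String> triples = new HashSet<>();
        public void add(String t) { calls++; triples.add(t); }
        public boolean contains(String t) { calls++; return triples.contains(t); }
    }

    // Hypothetical union layer: writes go left, lookups check left then right.
    static final class UnionLayer implements Store {
        private final Store left;
        private final Store right;
        UnionLayer(Store left, Store right) { this.left = left; this.right = right; }
        public void add(String t) { calls++; left.add(t); }
        public boolean contains(String t) {
            calls++;
            return left.contains(t) || right.contains(t);
        }
    }

    /** Same shape as merge(): union(union(...union(base, f1)...), fN). */
    static Store stack(int files) {
        Store kg = new BaseStore();
        for (int i = 0; i < files; i++) {
            kg = new UnionLayer(kg, new BaseStore());
        }
        return kg;
    }

    static long callsForAdd(int files) {
        Store kg = stack(files);
        calls = 0;
        kg.add("<s> <p> <o>");
        return calls;
    }

    static long callsForMissingContains(int files) {
        Store kg = stack(files);
        calls = 0;
        kg.contains("<x> <y> <z>");
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsForAdd(700));             // prints 701
        System.out.println(callsForMissingContains(700)); // prints 1401
    }
}
```

With one plain graph instead of the stack, each of those operations is a single call, whatever the number of files.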
>
> > doing searches (which the RDF/XML serializer will eventually have to do
> > in order to lay out resources properly inside the XML).
> >
> > Maybe this will be faster:
> >
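> > import java.io.File;
> > import java.io.FileOutputStream;
> > import org.apache.jena.graph.Graph;
> > import org.apache.jena.riot.RDFDataMgr;
> > import org.apache.jena.riot.RDFFormat;
> > import org.apache.jena.sparql.graph.GraphFactory;
> >
> > public class RDFMerge {
> >     public static void main(String[] args) throws Exception {
> >         File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
> >         // One plain graph receives every file; no union layers.
> >         Graph acc = GraphFactory.createDefaultGraph();
> >         // AccessComponent.bo.rdf is in this folder, so the loop reads it;
> >         // reading it separately first would parse it twice.
> >         for (File file : folder.listFiles()) {
> >             if (file.isFile())
> >                 RDFDataMgr.read(acc, file.toURI().toString());
> >         }
> >         try (FileOutputStream out =
> >                  new FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
> >             RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
> >         }
> >     }
> > }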
> >
> >
> > Note that I have not tested this; you may want to step through a few
> > iterations in the debugger.
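On the serializer point: as far as I know, `RDFFormat.RDFXML` in Jena's RIOT is the pretty (abbreviated) RDF/XML writer, which analyses the graph to lay out nested resources. If the layout of the merged file does not matter, `RDFFormat.RDFXML_PLAIN` should be cheaper. A sketch, assuming Jena on the classpath; `PlainRdfXmlWrite`, `toPlainRdfXml` and `parseRdfXml` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.sparql.graph.GraphFactory;

public class PlainRdfXmlWrite {

    /** Serializes the graph with the plain RDF/XML writer (no pretty layout pass). */
    static byte[] toPlainRdfXml(Graph graph) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        RDFDataMgr.write(out, graph, RDFFormat.RDFXML_PLAIN);
        return out.toByteArray();
    }

    /** Parses RDF/XML bytes back into a fresh in-memory graph. */
    static Graph parseRdfXml(byte[] bytes) {
        Graph graph = GraphFactory.createDefaultGraph();
        RDFDataMgr.read(graph, new ByteArrayInputStream(bytes), Lang.RDFXML);
        return graph;
    }

    public static void main(String[] args) {
        Node s = NodeFactory.createURI("http://example.org/s");
        Node p = NodeFactory.createURI("http://example.org/p");
        Node o = NodeFactory.createURI("http://example.org/o");
        Graph g = GraphFactory.createDefaultGraph();
        g.add(Triple.create(s, p, o));

        byte[] xml = toPlainRdfXml(g);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        // Sanity check: the plain output parses back to the same triples.
        System.out.println(g.isIsomorphicWith(parseRdfXml(xml))); // prints true
    }
}
```

The plain output is still valid RDF/XML with the same triples; it is just less nested and less readable.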
> >
> > On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri <emrimbiemri8...@gmail.com>
> > wrote:
> >
> >> Hi, the code is:
> >> ----------------------------------
> >>
> >>
> >> import java.io.File;
> >> import java.io.FileOutputStream;
> >> import java.io.IOException;
> >> import java.io.OutputStream;
> >> import org.apache.jena.rdf.model.Model;
> >> import org.apache.jena.rdf.model.ModelFactory;
> >>
> >> public class RDFMerge {
> >>
> >>     private static File folder;
> >>     private static Model kg;
> >>
> >>     public static void iterate() {
> >>         folder = new File("C:\\Users\\Admin\\Desktop\\KG");
> >>         kg = ModelFactory.createDefaultModel()
> >>                 .read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
> >>
> >>         File[] listOfFiles = folder.listFiles(); // Iterating through the directory
> >>         for (File file : listOfFiles) {
> >>             if (file.isFile()) {
> >>                 kg = merge(kg, file);
> >>             }
> >>         }
> >>
> >>         // try-with-resources closes the output file when writing is done.
> >>         try (OutputStream out1 = new FileOutputStream(
> >>                 new File("C:\\Users\\Admin\\Desktop\\merged.rdf"))) {
> >>             kg.write(out1, "RDF/XML", null);
> >>             System.out.println("RDFs merged successfully!");
> >>         } catch (IOException e) {
> >>             e.printStackTrace();
> >>         }
> >>     }
> >>
> >>     public static Model merge(Model k, File i) {
> >>         Model other_in = ModelFactory.createDefaultModel().read(i.getAbsolutePath());
> >>         // Each call wraps the previous result in another union layer.
> >>         Model union = ModelFactory.createUnion(k, other_in);
> >>         return union;
> >>     }
> >> }
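Following Andy's point that parsing goes straight into the underlying graph, the smallest change to the code above is to keep one Model and parse every file into it, instead of wrapping the previous result in a new union each time. A sketch, assuming Jena on the classpath and that every regular file in the folder is RDF whose syntax Jena can guess from its extension (`.rdf` as RDF/XML, `.ttl` as Turtle); `RDFMergeInPlace` and `mergeFolder` are illustrative names. Each file is read exactly once; parsing the same file twice would, as far as I understand, produce a second copy of any blank nodes it contains.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class RDFMergeInPlace {

    /** Parses every regular file in the folder into a single model. */
    static Model mergeFolder(File folder) {
        File[] files = folder.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + folder);
        }
        Model kg = ModelFactory.createDefaultModel();
        for (File file : files) {
            if (file.isFile()) {
                // Adds the parsed triples to kg's own graph; no union layer is created.
                RDFDataMgr.read(kg, file.toURI().toString());
            }
        }
        return kg;
    }

    public static void main(String[] args) throws IOException {
        Model kg = mergeFolder(new File(args[0]));
        try (OutputStream out = new FileOutputStream(args[1])) {
            // Plain RDF/XML writer; use RDFFormat.RDFXML for the pretty layout.
            RDFDataMgr.write(out, kg, RDFFormat.RDFXML_PLAIN);
        }
        System.out.println("Merged " + kg.size() + " triples");
    }
}
```

Usage would be along the lines of `java RDFMergeInPlace C:\Users\Admin\Desktop\KG merged.rdf`.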
> >>
> >> On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne <a...@apache.org> wrote:
> >>
> >>> Hi there - the attachment didn't make it through.
> >>>
> >>> Could you include the code in the body of the message, please? Or put it
> >>> somewhere like a gist on GitHub or pastebin and send a link.
> >>>
> >>>       Andy
> >>>
> >>> On 12/02/2021 11:45, emri mbiemri wrote:
> >>>> Dear all,
> >>>>
> >>>> Do you know how I can merge several thousand RDF models into a single
> >>>> one? I have tried iterating through all files within a folder and then
> >>>> using Jena's union function to merge them one by one. The problem is
> >>>> that the program has been running for more than 13 hours and still has
> >>>> not finished (with only 50 models as a test).
> >>>>
> >>>> So far I have close to 700 models, 68 MB in total.
> >>>>
> >>>> Attached is the code I am using.
> >>>>
> >>>> Do you have any idea what I can do to merge all these files into a
> >>>> single knowledge graph?
> >>>>
> >>>> Thanks for your help.
> >>>
> >>
> >
> >
>


-- 
Alexis Armin Huf <alexis...@gmail.com>
