Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
BTW - the very long time may be the garbage collector on a nearly exhausted heap. Actually, this couples with the way the data is being written. The RDFFormat.RDFXML_PLAIN does not have expensive corner cases. Andy On 12/02/2021 11:45, emri mbiemri wrote: Dear all, Do you know how I
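To check whether GC on a nearly exhausted heap is a plausible cause, the merge program can report how full the heap is. This is a minimal sketch using only the standard `Runtime` API. The class name `HeapCheck` and the 90% threshold are illustrative assumptions, not from the thread. To confirm, rerun with a larger `-Xmx` or with `-verbose:gc` to log each collection.

```java
// Hypothetical helper (not from the thread): reports how full the JVM heap is.
public class HeapCheck {

    // Fraction of the maximum heap (the -Xmx limit) currently in use, in (0, 1].
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes allocated and not free
        return (double) used / rt.maxMemory();           // maxMemory() reflects -Xmx
    }

    public static void main(String[] args) {
        double f = usedFraction();
        System.out.printf("heap used: %.1f%% of max%n", f * 100);
        // Illustrative threshold: close to the limit, the collector runs almost constantly.
        if (f > 0.9) {
            System.out.println("heap nearly exhausted; consider a larger -Xmx");
        }
    }
}
```

Calling `HeapCheck.usedFraction()` after every few hundred files shows whether usage keeps climbing towards the limit.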

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
On 13/02/2021 13:53, Alexis Armin Huf wrote: Thanks for clarifying, Andy. I hadn't followed the code in RDFDataMgr.read/write(). The "In my experience" bit comes from two cases: - iterating a Model - Iterating the QuerySolutions of a ResultSet Right - that's when the query work is

Re: Merging a massive amount of RDFs

2021-02-13 Thread Alexis Armin Huf
Thanks for clarifying, Andy. I hadn't followed the code in RDFDataMgr.read/write(). The "In my experience" bit comes from two cases: - iterating a Model - Iterating the QuerySolutions of a ResultSet In both cases the culprit turned out to be just the GC overhead. The Model instances in that

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne
On 12/02/2021 13:43, Alexis Armin Huf wrote: Hi, emri. In my experience with Jena I have observed that Graphs are more efficient than Models when there is too much data being iterated. The actual parsing should go straight into the graph, having picked it out of the model. What can

Re: Merging a massive amount of RDFs

2021-02-13 Thread Lorenz Buehmann
aborne > Sent: Friday, February 12, 2021 6:59:55 PM > To: users@jena.apache.org > Subject: Re: Merging a massive amount of RDFs > > Yes, getting rid of the union model and using a single model to collect > all the RDF is better. > > Have you determined where the time is going

Re: Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri
Hi Mr. Huf, I think your advice worked fine for me. In the end I have one RDF/XML file which I think contains all the data from the other files. Thanks a lot. On Fri, Feb 12, 2021 at 3:43 PM Alexis Armin Huf wrote: > Hi, emri. > > In my experience with Jena I have observed that Graphs are more

Re: Merging a massive amount of RDFs

2021-02-12 Thread Samita Bai / PhD CS Scholar @ City Campus
Hello all, Can anyone suggest some code for cleaning RDF data? Regards, Samita From: Andy Seaborne Sent: Friday, February 12, 2021 6:59:55 PM To: users@jena.apache.org Subject: Re: Merging a massive

Re: Merging a massive amount of RDFs

2021-02-12 Thread Andy Seaborne
Yes, getting rid of the union model and using a single model to collect all the RDF is better. Have you determined where the time is going? On reading or writing? You have a lot of files and starting up the XML parser is expensive, let alone the RDF/XML parser on top of that. If you have the
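Collecting everything into one model, and timing reading separately from writing, might look roughly like the sketch below. It assumes Apache Jena (3.x or 4.x) on the classpath. `ModelFactory.createDefaultModel`, `RDFDataMgr.read`, `RDFDataMgr.write` and `RDFFormat.RDFXML_PLAIN` are real Jena API. The class name `RDFMergeSketch`, the `merge` method and the `.rdf` extension filter are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

// Hypothetical class: one model collects all input instead of a chain of unions.
public class RDFMergeSketch {

    // Parses every input file into one model, writes it once, returns the triple count.
    static long merge(File[] inputs, File output) throws IOException {
        Model kg = ModelFactory.createDefaultModel();

        long t0 = System.nanoTime();
        for (File f : inputs) {
            // Syntax is chosen from the file extension (.rdf -> RDF/XML).
            RDFDataMgr.read(kg, f.getPath());
        }
        long t1 = System.nanoTime();

        try (OutputStream out = new FileOutputStream(output)) {
            // Per the thread, RDFXML_PLAIN has no expensive corner cases.
            RDFDataMgr.write(out, kg, RDFFormat.RDFXML_PLAIN);
        }
        long t2 = System.nanoTime();

        System.out.printf("read: %d ms, write: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        return kg.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: args[0] is the input folder, args[1] the output file.
        File[] inputs = new File(args[0]).listFiles((dir, name) -> name.endsWith(".rdf"));
        if (inputs == null) throw new IOException("not a directory: " + args[0]);
        System.out.println("triples: " + merge(inputs, new File(args[1])));
    }
}
```

The two timings answer the "reading or writing?" question directly. If reading dominates, the per-file parser start-up cost is the likely culprit.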

Re: Merging a massive amount of RDFs

2021-02-12 Thread Alexis Armin Huf
Hi, emri. In my experience with Jena I have observed that Graphs are more efficient than Models when there is too much data being iterated. Also, at every createUnion() call, your code is creating a new Union graph which in the end will yield a tree of 700 models that will potentially be
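The cost of unioning every file onto the previous result can be illustrated without Jena. The sketch below uses hypothetical stand-in classes (`TripleSource`, `Leaf`, `UnionView`), not Jena's own union graph. Each union is a view that keeps both children and copies nothing. With 700 inputs the view ends up 699 levels deep, and looking up a missing triple visits all 700 leaves. Copying everything into one set once turns that into a single hash lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionTreeSketch {

    // Hypothetical stand-in for "something that can say whether a triple is present".
    interface TripleSource {
        boolean contains(String triple, int[] visits); // visits[0] counts leaves checked
        int depth();
    }

    // A leaf holds the triples parsed from one file.
    static final class Leaf implements TripleSource {
        private final Set<String> triples;
        Leaf(Set<String> triples) { this.triples = triples; }
        @Override public boolean contains(String triple, int[] visits) {
            visits[0]++;
            return triples.contains(triple);
        }
        @Override public int depth() { return 0; }
    }

    // A union view delegates to both children; nothing is merged or copied.
    static final class UnionView implements TripleSource {
        private final TripleSource left;
        private final TripleSource right;
        UnionView(TripleSource left, TripleSource right) { this.left = left; this.right = right; }
        @Override public boolean contains(String triple, int[] visits) {
            return left.contains(triple, visits) || right.contains(triple, visits);
        }
        @Override public int depth() { return 1 + Math.max(left.depth(), right.depth()); }
    }

    // Pattern from the thread: union(union(union(m1, m2), m3), ...).
    static TripleSource foldUnions(List<Set<String>> files) {
        TripleSource acc = new Leaf(files.get(0));
        for (int i = 1; i < files.size(); i++) {
            acc = new UnionView(acc, new Leaf(files.get(i)));
        }
        return acc;
    }

    // Suggested alternative: copy everything into one collection once.
    static Set<String> accumulate(List<Set<String>> files) {
        Set<String> all = new HashSet<>();
        for (Set<String> f : files) all.addAll(f);
        return all;
    }

    // Fake input: n "files", each holding one distinct N-Triples-style line.
    static List<Set<String>> fakeFiles(int n) {
        List<Set<String>> files = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Set<String> s = new HashSet<>();
            s.add("<s" + i + "> <p> <o" + i + "> .");
            files.add(s);
        }
        return files;
    }

    public static void main(String[] args) {
        List<Set<String>> files = fakeFiles(700);
        TripleSource tree = foldUnions(files);
        int[] visits = {0};
        boolean found = tree.contains("<missing> <p> <o> .", visits);
        System.out.println("union depth=" + tree.depth() + ", leaves visited=" + visits[0] + ", found=" + found);
        Set<String> all = accumulate(files);
        System.out.println("single set size=" + all.size() + ", found=" + all.contains("<missing> <p> <o> ."));
    }
}
```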

Re: Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri
Hi, the code is:
--
public class RDFMerge {
    private static File folder;
    private static Model kg;
    // private static OutputStream out;
    public static void iterate() {
        folder = new File("C:\\Users\\Admin\\Desktop\\KG");
        kg =

Re: Merging a massive amount of RDFs

2021-02-12 Thread Andy Seaborne
Hi there - the attachment didn't make it through. Could you include the code in the body of the message please? Or put in somewhere like a gist on github / pastebin /... and send a link. Andy On 12/02/2021 11:45, emri mbiemri wrote: Dear all, Do you know how I can merge some thousands

Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri
Dear all, Do you know how I can merge some thousands of RDF models into a single one? I have tried it by iterating through all files within a folder and then using Jena's union function to merge them one by one! The problem is that the program has been running for more than 13 hours and is still not