Hello all,

Can anyone suggest any code for cleaning RDF data?

Regards,
Samita

________________________________
From: Andy Seaborne <a...@apache.org>
Sent: Friday, February 12, 2021 6:59:55 PM
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Merging a massive amount of RDFs

Yes, getting rid of the union model and using a single model to collect
all the RDF is better.

Have you determined where the time is going? On reading or writing?

You have a lot of files, and starting up the XML parser is expensive, let
alone the RDF/XML parser on top of that. If you have the option of a
different format such as Turtle, that is worth trying.

If this is a fixed set of files to use many times, consider using a
database, loading it once and then using that persistent database.

    tdbloader --loc DB *rdf

You are writing with pretty printing turned on. Writing RDF/XML in pretty
format is costly; try RDFFormat.RDFXML_PLAIN.

If you are trying to read all the files and print them, you can do that
with the command line:

    # Read all RDF/XML files and print a single graph in turtle:
    riot --stream Turtle *.rdf

Andy

On 12/02/2021 13:43, Alexis Armin Huf wrote:
> Hi, emri.
>
> In my experience with Jena I have observed that Graphs are more efficient
> than Models when there is too much data being iterated. Also, at every
> createUnion() call, your code is creating a new Union graph which in the
> end will yield a tree of 700 models that will potentially be traversed when
> doing searches (which the RDF/XML serializer will eventually have to do in
> order to lay out resources properly inside the XML).
>
> Maybe this will be faster:
>
> public static void main(String[] args) throws Exception {
>     File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
>     Graph acc = GraphFactory.createDefaultGraph();
>     RDFDataMgr.read(acc,
>         "file:C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
>     for (File file : folder.listFiles()) {
>         if (file.isFile())
>             RDFDataMgr.read(acc, file.toURI().toString());
>     }
>     try (FileOutputStream out = new
>             FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
>         RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
>     }
> }
>
> Note that I have not tested this; you may want to run some iterations under
> the debugger.
>
> On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri <emrimbiemri8...@gmail.com>
> wrote:
>
>> Hi, the code is:
>> ----------------------------------
>>
>> public class RDFMerge {
>>
>>     private static File folder;
>>     private static Model kg;
>>     // private static OutputStream out;
>>
>>     public static void iterate() {
>>
>>         folder = new File("C:\\Users\\Admin\\Desktop\\KG");
>>         kg = ModelFactory.createDefaultModel()
>>             .read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
>>
>>         File[] listOfFiles = folder.listFiles(); // Iterating through the directory
>>         for (File file : listOfFiles) {
>>             if (file.isFile()) {
>>                 kg = merge(kg, file);
>>             }
>>         }
>>
>>         OutputStream out1;
>>         try {
>>             out1 = new FileOutputStream(
>>                 new File("C:\\Users\\Admin\\Desktop\\merged.rdf"));
>>             kg.write(out1, "RDF/XML", null);
>>             System.out.println("RDFs merged successfully!");
>>         } catch (FileNotFoundException e) {
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     public static Model merge(Model k, File i) {
>>         // Model kg_in =
>>         //     ModelFactory.createDefaultModel().read(k.getAbsolutePath());
>>         Model other_in =
>>             ModelFactory.createDefaultModel().read(i.getAbsolutePath());
>>
>>         Model union = ModelFactory.createUnion(k, other_in);
>>         return union;
>>     }
>> }
>>
>> On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne <a...@apache.org> wrote:
>>
>>> Hi there - the attachment didn't make it through.
>>>
>>> Could you include the code in the body of the message please? Or put it
>>> somewhere like a gist on github / pastebin /... and send a link.
>>>
>>> Andy
>>>
>>> On 12/02/2021 11:45, emri mbiemri wrote:
>>>> Dear all,
>>>>
>>>> Do you know how I can merge some thousands of RDF models into a single
>>>> one? I have tried it by iterating through all files within a folder and
>>>> then using Jena's union function to merge them one by one! The problem is
>>>> that the program has been running for more than 13 hours and is still not
>>>> stopping (with only 50 models as a test).
>>>>
>>>> So far I have close to 700 models, in total 68MB.
>>>>
>>>> Attached you can see the code I am using.
>>>>
>>>> Do you have any idea what I can do to merge all these files into a
>>>> single knowledge-graph?
>>>>
>>>> Thanks for your help.
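
The point made in this thread about createUnion() building a tree of nested unions can be seen in a toy model that has no Jena dependency. This is only a rough sketch: the names NestedUnionDemo, TripleSource, Leaf and Union are hypothetical, triples are plain strings, and Jena's real union graphs are more involved than this. It just counts how many component sets one failed lookup has to visit when the data sits behind chained lazy unions, compared with one merged set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only (not Jena code): counts component visits per lookup.
class NestedUnionDemo {

    /** Hypothetical stand-in for a read-only graph; visits[0] counts leaf lookups. */
    interface TripleSource {
        boolean contains(String triple, int[] visits);
    }

    /** One in-memory set of triples; a lookup costs exactly one visit. */
    static final class Leaf implements TripleSource {
        private final Set<String> triples;

        Leaf(String... triples) {
            this.triples = new HashSet<>(Arrays.asList(triples));
        }

        @Override
        public boolean contains(String triple, int[] visits) {
            visits[0]++;
            return triples.contains(triple);
        }
    }

    /** Lazy union view: nothing is copied, so a lookup may ask both sides. */
    static final class Union implements TripleSource {
        private final TripleSource left;
        private final TripleSource right;

        Union(TripleSource left, TripleSource right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public boolean contains(String triple, int[] visits) {
            return left.contains(triple, visits) || right.contains(triple, visits);
        }
    }

    /** Mimics the loop "kg = union(kg, next)" and looks up an absent triple. */
    static int visitsForMissingTriple(int files) {
        TripleSource kg = new Leaf("file0");
        for (int i = 1; i < files; i++) {
            kg = new Union(kg, new Leaf("file" + i));
        }
        int[] visits = {0};
        kg.contains("missing", visits);
        return visits[0];
    }

    /** Same data copied into a single set, then the same lookup. */
    static int visitsForMergedSet(int files) {
        String[] all = new String[files];
        for (int i = 0; i < files; i++) {
            all[i] = "file" + i;
        }
        int[] visits = {0};
        new Leaf(all).contains("missing", visits);
        return visits[0];
    }

    public static void main(String[] args) {
        System.out.println("chained unions: " + visitsForMissingTriple(700) + " visits");
        System.out.println("single merged set: " + visitsForMergedSet(700) + " visits");
    }
}
```

With 700 inputs, the chained view visits all 700 leaves for a single failed lookup, while the merged set is visited once. A serializer that performs many such lookups pays that cost every time, which is the motivation for reading every file into one graph or model instead of chaining unions.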