Hello all,

Can anyone suggest any code for cleaning RDF data?


Regards,
Samita


________________________________
From: Andy Seaborne <a...@apache.org>
Sent: Friday, February 12, 2021 6:59:55 PM
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Merging a massive amount of RDFs

Yes, getting rid of the union model and using a single model to collect
all the RDF is better.
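
To illustrate why the chained union is the problem, here is a minimal sketch in plain Java. It does not use Jena; TripleSource, Leaf and Union are hypothetical stand-ins for a model and a union of two models. Each union wraps the previous one, so the nesting grows by one level per file and a lookup may have to walk the whole chain, while a single accumulating store stays flat.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UnionChainDemo {

    // Hypothetical stand-in for "something that can be asked about a triple".
    interface TripleSource {
        boolean contains(String triple);
        int depth();
    }

    // A flat store: one hash set holding every triple.
    static final class Leaf implements TripleSource {
        private final Set<String> triples = new HashSet<>();

        Leaf(String... initial) {
            triples.addAll(Arrays.asList(initial));
        }

        void addAll(Leaf other) {
            triples.addAll(other.triples);
        }

        public boolean contains(String triple) {
            return triples.contains(triple);
        }

        public int depth() {
            return 1;
        }
    }

    // A view over two sources: every lookup is delegated to both sides.
    static final class Union implements TripleSource {
        private final TripleSource left;
        private final TripleSource right;

        Union(TripleSource left, TripleSource right) {
            this.left = left;
            this.right = right;
        }

        public boolean contains(String triple) {
            return left.contains(triple) || right.contains(triple);
        }

        public int depth() {
            return 1 + Math.max(left.depth(), right.depth());
        }
    }

    // Mirrors the merge loop in the question: wrap the result in a new union per file.
    static TripleSource chain(int files) {
        TripleSource result = new Leaf("t0");
        for (int i = 1; i <= files; i++) {
            result = new Union(result, new Leaf("t" + i));
        }
        return result;
    }

    // The alternative: copy every file's triples into one store.
    static Leaf accumulate(int files) {
        Leaf result = new Leaf("t0");
        for (int i = 1; i <= files; i++) {
            result.addAll(new Leaf("t" + i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("chained depth: " + chain(700).depth());
        System.out.println("single depth:  " + accumulate(700).depth());
    }
}
```

Both variants answer the same lookups; only the amount of work per lookup differs.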

Have you determined where the time is going? On reading or writing?

You have a lot of files, and starting up the XML parser is expensive, let
alone the RDF/XML parser on top of that.  If you have the option of a
different format such as Turtle, that is worth trying.

If this is a fixed set of files to use many times, consider using a
database, loading it once and then using that persistent database.

     tdbloader --loc DB *rdf
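
Once the database has been loaded, it can be opened from Java without re-parsing any files. The following is a minimal sketch, assuming a Jena 3.x/4.x release where the TDB1 API lives in org.apache.jena.tdb (tdbloader builds a TDB1 database) and assuming the location "DB" from the command above; the class and method names are made up for illustration.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb.TDBFactory;

public class OpenLoadedDatabase {

    // Opens the TDB1 database at the given directory and counts the
    // triples in its default graph inside a read transaction.
    static long countTriples(String location) {
        Dataset dataset = TDBFactory.createDataset(location);
        try {
            return Txn.calculateRead(dataset, () -> dataset.getDefaultModel().size());
        } finally {
            dataset.close();
        }
    }

    public static void main(String[] args) {
        // "DB" is the directory passed to tdbloader via --loc (assumption).
        System.out.println("Triples in DB: " + countTriples("DB"));
    }
}
```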

You are writing with pretty printing turned on. Writing RDF/XML in
pretty format is costly, try RDFFormat.RDFXML_PLAIN.
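
Putting the points so far together, a minimal, untested-against-your-data sketch: read every file into one model, write it with RDFFormat.RDFXML_PLAIN, and time the two phases separately to see where the time goes. The class name, the method names and the "*.rdf" file pattern are assumptions; the input directory and output file are taken from the command line. The output file should live outside the input directory so it is not picked up on a later run.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class MergeToSingleModel {

    // Reads every *.rdf file in the directory into one in-memory model.
    static Model mergeAll(Path dir) throws IOException {
        Model merged = ModelFactory.createDefaultModel();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.rdf")) {
            for (Path file : files) {
                // The syntax is guessed from the file extension (.rdf means RDF/XML).
                RDFDataMgr.read(merged, file.toUri().toString());
            }
        }
        return merged;
    }

    // Writes the model as plain (not pretty-printed) RDF/XML.
    static void writePlain(Model model, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            RDFDataMgr.write(out, model, RDFFormat.RDFXML_PLAIN);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);     // folder holding the input files
        Path target = Paths.get(args[1]);  // merged output file

        long t0 = System.nanoTime();
        Model merged = mergeAll(dir);
        long t1 = System.nanoTime();
        writePlain(merged, target);
        long t2 = System.nanoTime();

        System.out.printf("read: %d ms, write: %d ms, triples: %d%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, merged.size());
    }
}
```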

If you are trying to read all the files and print them, you can do that
with the command line

# Read all RDF/XML files and print a single graph in turtle:
riot --stream Turtle *.rdf

     Andy

On 12/02/2021 13:43, Alexis Armin Huf wrote:
> Hi, emri.
>
> In my experience with Jena, Graphs are more efficient than Models when a
> lot of data is being iterated over. Also, at every createUnion() call,
> your code creates a new union model, which in the end yields a tree of
> 700 models that will potentially be traversed when doing searches (which
> the RDF/XML serializer will eventually have to do in order to lay out
> resources properly inside the XML).
>
> Maybe this will be faster:
>
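> import java.io.File;
> import java.io.FileOutputStream;
> import org.apache.jena.graph.Graph;
> import org.apache.jena.riot.RDFDataMgr;
> import org.apache.jena.riot.RDFFormat;
> import org.apache.jena.sparql.graph.GraphFactory;
>
> public class MergeGraphs {
>      public static void main(String[] args) throws Exception {
>          File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
>          // One in-memory graph accumulates the triples of every file.
>          Graph acc = GraphFactory.createDefaultGraph();
>          // listFiles() also returns AccessComponent.bo.rdf, so it needs
>          // no separate read before the loop.
>          for (File file : folder.listFiles()) {
>              if (file.isFile())
>                  RDFDataMgr.read(acc, file.toURI().toString());
>          }
>          try (FileOutputStream out =
>                  new FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
>              RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
>          }
>      }
> }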
>
>
> Note that I have not tested this, you may want to run some iterations under
> the debugger.
>
> On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri <emrimbiemri8...@gmail.com>
> wrote:
>
>> Hi, the code is:
>> ----------------------------------
>>
>>
>> public class RDFMerge {
>>
>> private static File folder;
>> private static Model kg;
>> // private static OutputStream out;
>>
>> public static void iterate() {
>>
>> folder = new File("C:\\Users\\Admin\\Desktop\\KG");
>> kg =
>>
>> ModelFactory.createDefaultModel().read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
>>
>> File[] listOfFiles = folder.listFiles(); // Iterating through the directory
>> for (File file : listOfFiles) {
>>
>> if (file.isFile()) {
>>
>> kg = merge(kg, file);
>>
>> }
>>
>> }
>>
>>
>>   OutputStream out1;
>>   try { out1 = new FileOutputStream( new
>> File("C:\\Users\\Admin\\Desktop\\merged.rdf"));
>>   kg.write( out1, "RDF/XML", null );
>>   System.out.println("RDFs merged successfully!"); }
>>   catch(FileNotFoundException e) { e.printStackTrace(); }
>>
>>
>> }
>>
>>
>> public static Model merge(Model k, File i) {
>>
>> // Model kg_in = ModelFactory.createDefaultModel().read(k.getAbsolutePath());
>> Model other_in =
>> ModelFactory.createDefaultModel().read(i.getAbsolutePath());
>>
>> Model union = ModelFactory.createUnion(k, other_in);
>> return union;
>> }}
>>
>> On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne <a...@apache.org> wrote:
>>
>>> Hi there - the attachment didn't make it through.
>>>
>>> Could you include the code in the body of the message please? Or put it
>>> somewhere like a gist on github / pastebin /... and send a link.
>>>
>>>       Andy
>>>
>>> On 12/02/2021 11:45, emri mbiemri wrote:
>>>> Dear all,
>>>>
>>>> Do you know how I can merge several thousand RDF models into a single
>>>> one?
>>>> I have tried iterating through all files within a folder and then
>>>> using Jena's union function to merge them one by one! The problem is
>>>> that the program has been running for more than 13 hours and is still
>>>> not stopping (with only 50 models as a test).
>>>>
>>>> So far I have close to 700 models, in total 68MB.
>>>>
>>>> Attached you can see the code I am using.
>>>>
>>>> Do you have any idea what I can do to merge all these files into a
>>>> single knowledge-graph?
>>>>
>>>> Thanks for your help.
>>>
>>
>
>
