Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne

BTW - the very long time may be the garbage collector on a nearly 
exhausted heap.
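One way to check whether garbage collection dominates the run is to compare wall-clock time with the time the JVM reports spending in its collectors. The sketch below uses only the standard `java.lang.management` API. The class name `GcCheck` and its helper methods are made up for illustration, and the merge itself is left as a placeholder comment.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Jena: reports GC time and heap headroom.
class GcCheck {
    /** Total milliseconds the JVM reports having spent in garbage collection so far. */
    public static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report a time
            if (t > 0) total += t;
        }
        return total;
    }

    /** Fraction of the maximum heap currently in use, between 0 and 1. */
    public static double heapUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    public static void main(String[] args) {
        long gcBefore = gcMillis();
        long start = System.nanoTime();
        // ... run the merge here (placeholder) ...
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("wall=%d ms, gc=%d ms, heap used=%.0f%%%n",
                wallMs, gcMillis() - gcBefore, 100 * heapUsedFraction());
    }
}
```

If the GC figure is a large share of the wall time and heap use sits close to 100%, raising the heap limit with the JVM's `-Xmx` option is a reasonable first experiment.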


Actually, this couples with the way the data is being written.

The RDFFormat.RDFXML_PLAIN does not have expensive corner cases.

Andy

On 12/02/2021 11:45, emri mbiemri wrote:

Dear all,

Do you know how I can merge several thousand RDF models into a single one?
I have tried iterating through all the files in a folder and then
using Jena's union function to merge them one by one. The problem is
that the program has been running for more than 13 hours and still has
not stopped (with only 50 models as a test).


So far I have close to 700 models, 68MB in total.

I have attached the code I am using.

Do you have any idea what I can do to merge all these files into a 
single knowledge-graph?


Thanks for your help.

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne





On 13/02/2021 13:53, Alexis Armin Huf wrote:

> Thanks for clarifying, Andy. I hadn't followed the code in
> RDFDataMgr.read/write().
>
> The "In my experience" bit comes from two cases:
> - iterating a Model
> - iterating the QuerySolutions of a ResultSet

Right - that's when the query work is actually done. QueryExecution
creation sets up a plan, but does not execute the query.

> In both cases the culprit turned out to be just the GC overhead. The Model
> instances in that application were short-lived, so despite the Model
> internally caching ResourceImpl/LiteralImpl classes, there was simply too
> much object churn.

Seems highly likely.  The "convenience" of Model has a cost.

Andy




Re: Merging a massive amount of RDFs

2021-02-13 Thread Alexis Armin Huf

Thanks for clarifying, Andy. I hadn't followed the code in
RDFDataMgr.read/write().

The "In my experience" bit comes from two cases:
- iterating a Model
- iterating the QuerySolutions of a ResultSet

In both cases the culprit turned out to be just the GC overhead. The Model
instances in that application were short-lived, so despite the Model
internally caching ResourceImpl/LiteralImpl classes, there was simply too
much object churn.



-- 
Alexis Armin Huf

Re: Merging a massive amount of RDFs

2021-02-13 Thread Andy Seaborne





On 12/02/2021 13:43, Alexis Armin Huf wrote:

> Hi, emri.
>
> In my experience with Jena I have observed that Graphs are more efficient
> than Models when there is too much data being iterated.

The actual parsing should go straight into the graph, having picked it
out of the model.

What can happen with large numbers of small files, called one by one, is
that the model creation overhead shows up. That used to be quite a lot
but still today it's not zero because of initialization of internal objects.

If data is added via the model API, so not the parsing case, there are
additional layers that have a cost and also cause the JIT to take longer
to fully optimize the code paths.

> Also, at every
> createUnion() call, your code is creating a new Union graph which in the
> end will yield a tree of 700 models that will potentially be traversed when

Not even "potentially" :-)

The ModelFactory.createUnion will add layers and layers of union graphs
so that "add triple" becomes a call to N separate graphs in a stack
before getting to the graph that actually stores the triple.

Andy

> doing searches (which the RDF/XML serializer will eventually have to do in
> order to lay out resources properly inside the XML).
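To make the cost of the layering concrete, here is a toy model of the effect. It assumes nothing about Jena's internals: `ToyGraph`, `leaf`, `union` and the other names are invented for this sketch, and a "triple" is just a string. It only counts how many graph objects one lookup touches when unions are nested pairwise, as in the loop discussed in this thread, compared with a single flat collection.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration only: these types are NOT Jena classes.
class UnionDepthDemo {
    /** Minimal stand-in for a graph: answers "contains" and counts visits. */
    interface ToyGraph {
        boolean contains(String triple, int[] visits);
    }

    /** A leaf graph that really stores triples. */
    public static ToyGraph leaf(Set<String> triples) {
        return (t, visits) -> { visits[0]++; return triples.contains(t); };
    }

    /** A union layer: asks the left side first, then the right. */
    public static ToyGraph union(ToyGraph left, ToyGraph right) {
        return (t, visits) -> {
            visits[0]++;
            return left.contains(t, visits) || right.contains(t, visits);
        };
    }

    /** Builds union(union(union(g0, g1), g2), ...), one new layer per file. */
    public static ToyGraph nested(int files) {
        ToyGraph acc = leaf(Set.of("t0"));
        for (int i = 1; i < files; i++) {
            acc = union(acc, leaf(Set.of("t" + i)));
        }
        return acc;
    }

    /** Puts the same triples into one single leaf graph. */
    public static ToyGraph flat(int files) {
        Set<String> all = new HashSet<>();
        for (int i = 0; i < files; i++) all.add("t" + i);
        return leaf(all);
    }

    /** Number of graph objects touched when looking up a triple that is absent. */
    public static int visitsForMiss(ToyGraph g) {
        int[] visits = {0};
        g.contains("missing", visits);
        return visits[0];
    }

    public static void main(String[] args) {
        System.out.println("nested: " + visitsForMiss(nested(700))); // 700 leaves + 699 unions = 1399
        System.out.println("flat:   " + visitsForMiss(flat(700)));   // 1
    }
}
```

In this toy, a single failed lookup over 700 nested inputs touches 1399 objects, against 1 for the flat collection. A serializer that issues many lookups pays that factor on every one of them.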


Re: Merging a massive amount of RDFs

2021-02-13 Thread Lorenz Buehmann

wrong thread and the question is too vague ...

1) open another thread and

2) what is "cleaning" in your context, i.e. which kind of data quality
issues? Also, Jena is pretty much not a data cleansing tool, you can use
its RDF capabilities to write your own algorithms though.

On 12.02.21 16:52, Samita Bai / PhD CS Scholar @ City Campus wrote:
> Hello all,
>
> Can anyone suggest me any code for cleaning RDF data?
>
>
> Regards,
> Samita

Re: Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri

Hi Mr.Huf,

I think your advice worked fine for me. In the end I have one RDF/XML file,
which I think contains all the data from the other files.

Thanks a lot.

Re: Merging a massive amount of RDFs

2021-02-12 Thread Samita Bai / PhD CS Scholar @ City Campus

Hello all,

Can anyone suggest any code for cleaning RDF data?


Regards,
Samita


Re: Merging a massive amount of RDFs

2021-02-12 Thread Andy Seaborne

Yes, getting rid of the union model and using a single model to collect 
all the RDF is better.


Have you determined where the time is going? On reading or writing?
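A minimal way to answer that question is to time the two phases separately. The sketch below is plain Java with no Jena dependency. `PhaseTimer` is a made-up helper, and the `sleep` calls in `main` are placeholders standing in for the real read loop and the real write call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration: accumulates elapsed time per named phase.
class PhaseTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the named phase. */
    public void time(String phase, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Elapsed milliseconds recorded for a phase, or 0 if it never ran. */
    public long millis(String phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholders: swap in the real read loop and the real write call.
        timer.time("read", () -> sleep(20));
        timer.time("write", () -> sleep(5));
        System.out.println("read ms:  " + timer.millis("read"));
        System.out.println("write ms: " + timer.millis("write"));
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling `time("read", ...)` once per file adds up the per-file costs, so the final numbers show directly whether parsing or serialization is the slow part.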

You have a lot of files, and starting up the XML parser is expensive, let
alone the RDF/XML parser on top of that.  If you have the option of a
different format such as Turtle, that is worth trying.


If this is a fixed set of files to use many times, consider using a 
database, loading it once and then using that persistent database.


tdbloader --loc DB *rdf

You are writing with pretty printing turned on. Writing RDF/XML in 
pretty format is costly, try RDFFormat.RDFXML_PLAIN.


If you are trying to read all the files and print them, you can do that 
with the command line


# Read all RDF/XML files and print a single graph in turtle:
riot --stream Turtle *.rdf

Andy


Re: Merging a massive amount of RDFs

2021-02-12 Thread Alexis Armin Huf

Hi, emri.

In my experience with Jena, Graphs are more efficient than Models when a
lot of data is being iterated. Also, at every createUnion() call, your code
creates a new Union graph, which in the end yields a tree of 700 models
that will potentially be traversed when doing searches (which the RDF/XML
serializer will eventually have to do in order to lay out resources
properly inside the XML).

Maybe this will be faster:

public static void main(String[] args) throws Exception {
    File folder = new File("C:\\Users\\Admin\\Desktop\\KG");
    Graph acc = GraphFactory.createDefaultGraph();
    RDFDataMgr.read(acc, "file:C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
    for (File file : folder.listFiles()) {
        if (file.isFile())
            RDFDataMgr.read(acc, file.toURI().toString());
    }
    try (FileOutputStream out = new FileOutputStream("C:\\Users\\Admin\\Desktop\\merged.rdf")) {
        RDFDataMgr.write(out, acc, RDFFormat.RDFXML);
    }
}


Note that I have not tested this, you may want to run some iterations under
the debugger.
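The loop above reads every regular file in the folder, whatever its extension, and `File.listFiles()` returns null if the folder cannot be read. A stricter selection could look like the following sketch. It assumes all inputs end in `.rdf`, as the `*.rdf` globs elsewhere in this thread suggest. `RdfInputs` and `rdfFiles` are hypothetical names, and only the standard `java.nio.file` API is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration: picks the input files to merge.
class RdfInputs {
    /** Returns the regular files in dir whose names end in ".rdf" (any case), sorted by path. */
    public static List<Path> rdfFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".rdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : rdfFiles(dir)) {
            // Each URI string could be handed to a reader, e.g. RDFDataMgr.read(acc, uri).
            System.out.println(p.toUri());
        }
    }
}
```

Unlike `listFiles()`, `Files.list` throws an `IOException` for an unreadable folder, so a wrong path shows up as an error instead of a `NullPointerException` in the loop.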

On Fri, Feb 12, 2021 at 9:29 AM emri mbiemri 
wrote:

> Hi, the code is:
> --
>
>
> public class RDFMerge {
>
>     private static File folder;
>     private static Model kg;
>     // private static OutputStream out;
>
>     public static void iterate() {
>
>         folder = new File("C:\\Users\\Admin\\Desktop\\KG");
>         kg = ModelFactory.createDefaultModel().read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");
>
>         File[] listOfFiles = folder.listFiles(); // Iterating through the directory
>         for (File file : listOfFiles) {
>             if (file.isFile()) {
>                 kg = merge(kg, file);
>             }
>         }
>
>         OutputStream out1;
>         try {
>             out1 = new FileOutputStream(new File("C:\\Users\\Admin\\Desktop\\merged.rdf"));
>             kg.write(out1, "RDF/XML", null);
>             System.out.println("RDFs merged successfully!");
>         } catch (FileNotFoundException e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static Model merge(Model k, File i) {
>         // Model kg_in = ModelFactory.createDefaultModel().read(k.getAbsolutePath());
>         Model other_in = ModelFactory.createDefaultModel().read(i.getAbsolutePath());
>
>         Model union = ModelFactory.createUnion(k, other_in);
>         return union;
>     }
> }
>
> On Fri, Feb 12, 2021 at 2:12 PM Andy Seaborne  wrote:
>
> > Hi there - the attachment didn't make it through.
> >
> > Could you include the code in the body of the message please? Or put in
> > somewhere like a gist on github / pastebin /... and send a link.
> >
> >  Andy
> >
> > On 12/02/2021 11:45, emri mbiemri wrote:
> > > Dear all,
> > >
> > > Do you know how I can merge some thousands RDF models into a single
> one?
> > > I have tried it by iteration through all files within a folder and then
> > > using Jena's union function to merge them one by one! The problem is
> > > that the program is running for more than 13  hours and is still not
> > > stopping (with only 50 models as test).
> > >
> > > So far I have close to 700 models, in total 68MB.
> > >
> > > Attached you seen the code I am using for.
> > >
> > > Do you have any idea what I can do to merge all these files into a
> > > single knowledge-graph?
> > >
> > > Thanks for your help.
> >
>


-- 
Alexis Armin Huf

Re: Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri

Hi, the code is:
--


import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RDFMerge {

    private static File folder;
    private static Model kg;
    // private static OutputStream out;

    public static void iterate() {
        folder = new File("C:\\Users\\Admin\\Desktop\\KG");
        kg = ModelFactory.createDefaultModel()
                .read("C:\\Users\\Admin\\Desktop\\KG\\AccessComponent.bo.rdf");

        File[] listOfFiles = folder.listFiles(); // Iterating through the directory
        for (File file : listOfFiles) {
            if (file.isFile()) {
                kg = merge(kg, file);
            }
        }

        // try-with-resources so the output stream is closed after writing
        try (OutputStream out1 = new FileOutputStream(
                new File("C:\\Users\\Admin\\Desktop\\merged.rdf"))) {
            kg.write(out1, "RDF/XML", null);
            System.out.println("RDFs merged successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Model merge(Model k, File i) {
        // Model kg_in = ModelFactory.createDefaultModel().read(k.getAbsolutePath());
        Model other_in = ModelFactory.createDefaultModel().read(i.getAbsolutePath());

        // Each call wraps the previous result in a new union model.
        Model union = ModelFactory.createUnion(k, other_in);
        return union;
    }
}


Re: Merging a massive amount of RDFs

2021-02-12 Thread Andy Seaborne


Hi there - the attachment didn't make it through.

Could you include the code in the body of the message please? Or put it
somewhere like a gist on GitHub / pastebin /... and send a link.


Andy


Merging a massive amount of RDFs

2021-02-12 Thread emri mbiemri

Dear all,

Do you know how I can merge several thousand RDF models into a single one?
I have tried iterating through all files within a folder and then using
Jena's union function to merge them one by one. The problem is that the
program has been running for more than 13 hours and still has not stopped
(with only 50 models as a test).

So far I have close to 700 models, 68 MB in total.

Attached you can see the code I am using.

Do you have any idea what I can do to merge all these files into a single
knowledge graph?

Thanks for your help.
