Holger,

If possible, a patch would be very helpful. On further checking, one of 
my workarounds, which allowed 2000 files to be input per pass, caused 
most of the output to be omitted. (I had moved the imports outside of 
the loop, but because of replace=true, this caused the logic to fail.) I 
have tried other arrangements, with only minimal improvement. In a 
successful execution, I am getting only about 250 files per pass, which 
is a big headache, since I will eventually need to process about 14000 
files in all.
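
For illustration only, here is a minimal sketch of how the list of file 
URIs could be cut into fixed-size control files outside of TBC, so that 
each pass stays at a size that completes. It assumes the control 
spreadsheet can be exported as a plain list of file URIs. The function 
names, the fileURI column header, and the control_NNN.csv file naming 
are hypothetical and are not part of SPARQLMotion. At 250 files per 
pass, 14000 files works out to 56 passes.

```python
import csv
from pathlib import Path


def split_into_batches(rows, batch_size):
    """Return consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def write_control_files(file_uris, batch_size, out_dir, column="fileURI"):
    """Write one small control CSV per batch and return the paths written.

    The column name and the control_NNN.csv naming are assumptions made
    for this sketch; adjust them to whatever the import script expects.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batches = split_into_batches(file_uris, batch_size)
    for n, batch in enumerate(batches, start=1):
        path = out_dir / f"control_{n:03d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([column])                  # header row
            writer.writerows([uri] for uri in batch)   # one URI per row
        paths.append(path)
    return paths
```

Each resulting control file would then be run as its own pass, with a 
restart in between if memory is not released.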

Jim 





Holger Knublauch <[email protected]>
Sent by: [email protected]
05/22/2009 01:51 PM
Please respond to: [email protected]
To: [email protected]
cc:
Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports


Hi James,

You may have hit a problem in the current SPARQLMotion engine. It 
currently leaves all opened graphs in memory because future steps of a 
script may want to access them again, and reloading them would be very 
slow. So while a smart garbage collection algorithm may work here, a 
short-term workaround would be to provide a way to trigger this 
garbage collection manually. I will look into this.

Meanwhile, restarting TBC will work to a limited extent - as you have 
found out.

Thanks,
Holger


On May 22, 2009, at 10:43 AM, James A Miller wrote:

I was having Java heap space errors in my SPARQLMotion script, which was 
processing 7000 input spreadsheets. It uses a "control" spreadsheet with 
file URIs and an IterateOverSelect that, within each iteration, 
constructs triples from the files, adds reification triples, and finally 
writes them out to a Sesame store. According to the console, it 
processed about 3500 files before dying. 

So I split the "control" spreadsheet into 4 spreadsheets, each containing 
about 2000 file references. The first file ran to completion. (Hooray!) 
So I started the 2nd one, and it died after processing 1500 files. As a 
test, I restarted Eclipse and ran it again, and then it succeeded. 

Is there a command that I should be issuing that will clear out memory 
from earlier executions, rather than restarting TBC/Eclipse repeatedly? 

Also, at one point I was accumulating all of the data and attempting to 
write it all out after the iterations. Due to the heap problems, I 
started writing to a Sesame store within the iterations, but that didn't 
really fix anything. I am still getting the heap problems (as I 
described). I guess it's an extension of the same question: is there 
something I can do within the iterations that will clear out memory, so 
that I don't run into this problem and can extract all of my files in 
the same process? 

I have my memory set for Eclipse at -Xmx1500m. 

Jim






