Holger,

That is excellent news--thanks!  I will look forward to the patch...

Jim 





Holger Knublauch <[email protected]>
Sent by: [email protected]
05/24/2009 12:48 PM
Please respond to: [email protected]
To: [email protected]
cc:
Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports






Hi Jim,

Thanks for sending me your file off-list. I see that you were using 
sml:ImportRDFFromWorkspace to do the spreadsheet import, and this 
indicates that you did indeed hit a memory leak. When the script iterates 
over thousands of files, it is difficult for the engine to determine when 
to garbage collect any one of them. In the absence of a smarter approach 
in the SM engine, I have added a new module type, sml:CollectGarbage, to 
which you just need to provide the base URI of the file to "unload"; it 
will unregister that file from memory so that the next round of Java 
garbage collection can pick it up.
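
To illustrate the idea, here is a minimal sketch. It is not TopBraid code: 
Graph, GraphRegistry and process_all are hypothetical names, and Python 
stands in for Java. As long as a registry holds a reference to each loaded 
graph, the runtime cannot reclaim it; removing the entry by base URI makes 
the graph collectable. The weak references are used only to observe whether 
a graph is still alive.

```python
import gc
import weakref


class Graph:
    """Stand-in for an in-memory RDF graph (hypothetical, not a TopBraid class)."""

    def __init__(self, base_uri):
        self.base_uri = base_uri


class GraphRegistry:
    """Hypothetical registry that keeps every loaded graph strongly referenced."""

    def __init__(self):
        self._graphs = {}

    def load(self, base_uri):
        # Reuse a graph that is already registered; otherwise create one.
        graph = self._graphs.get(base_uri)
        if graph is None:
            graph = Graph(base_uri)
            self._graphs[base_uri] = graph
        return graph

    def unload(self, base_uri):
        # Drop the registry's reference; True if an entry was actually removed.
        return self._graphs.pop(base_uri, None) is not None

    def __len__(self):
        return len(self._graphs)


def process_all(registry, base_uris, unload_after_use):
    """Load each graph in turn, optionally unloading it after use.

    Returns weak references so the caller can check which graphs survive.
    """
    refs = []
    for uri in base_uris:
        graph = registry.load(uri)
        refs.append(weakref.ref(graph))
        # ... the per-file work of one iteration would go here ...
        del graph  # release the local reference held by this loop
        if unload_after_use:
            registry.unload(uri)
    gc.collect()  # ask the collector to reclaim anything now unreachable
    return refs
```

Calling process_all with unload_after_use=False leaves every graph 
registered and alive, which mirrors the growth in heap use over thousands 
of files. With unload_after_use=True the registry is empty after the loop 
and the graphs can be reclaimed.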

This will be part of the next official release, and I will contact you 
off-list as soon as we have an intermediate patch for you this coming 
week.

Holger


On May 23, 2009, at 11:33 AM, Holger Knublauch wrote:

Jim,

In order to help me narrow down the problem, could you kindly either 
send me the script or list the module types that you are using? In 
particular, I would like to know what type of file import modules you are 
using. We may do an unofficial test build next week, and I might be able 
to send you a link to that.

Thanks
Holger


On May 23, 2009, at 7:59 AM, James A Miller wrote:

Holger, 

If it is possible, a patch would be very helpful--upon further checking, 
one of my workarounds, which allowed 2000 files to be input per pass, 
caused most of the output to be omitted.  (I had moved the imports outside 
of the loop, but due to replace=true, this caused the logic to fail).  I 
have tried other arrangements, with only minimal improvement.  In a 
successful execution, I am getting  only about 250 files per pass, which 
is a big headache--I will eventually need to process about 14000 files in 
all. 

Jim 




Holger Knublauch <[email protected]>
Sent by: [email protected]
05/22/2009 01:51 PM
Please respond to: [email protected]
To: [email protected]
cc:
Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports








Hi James, 

You may have hit a problem in the current SPARQLMotion engine. It 
currently leaves all opened graphs in memory because later steps of a 
script may want to access them again, and reloading them would be very 
slow. So while a smart garbage collection algorithm may work here, a 
short-term workaround would be a way to trigger this garbage collection 
manually. I will look into this. 

Meanwhile, restarting TBC will work to a limited extent - as you have 
found out. 

Thanks, 
Holger 


On May 22, 2009, at 10:43 AM, James A Miller wrote: 

I was having Java heap space errors in my SparqlMotion script, which was 
processing 7000 input spreadsheets (using a "control" spreadsheet with 
file URIs and an IterateOverSelect to construct triples from them, then 
adding reification triples, and finally writing them out to a Sesame 
store within each iteration).  According to the console, it processed 
about 3500 files before dying. 

So, I split the 'control' spreadsheet into 4 spreadsheets, each containing 
about 2000 file references.  The first file ran to completion.  (hooray!) 
So I started the 2nd one, and it died, after processing 1500 files.  As a 
test, I restarted Eclipse, and ran it again, and then it succeeded. 

Is there a command that I should be issuing that will clear out memory 
from earlier executions, rather than restarting TBC/Eclipse repeatedly? 

Also, at one point I was accumulating all of the data, and attempting to 
write it all out after the iterations.  Due to the heap problems, I 
started writing to a Sesame store within the iterations, but that didn't 
really fix anything.   I am still getting the heap problems (as I 
described).  I guess it's an extension of the same question--is there 
something I can do within the iterations, which will clear out memory so 
that I don't run into this problem, and can extract all of my files in the 
same process? 

I have my memory set for Eclipse at -Xmx1500m. 

Jim






















