Hi Jim,

thanks for sending me your file off-list. I see that you are using sml:ImportRDFFromWorkspace for the spreadsheet import, which confirms that you have indeed hit a memory leak. When a script iterates over thousands of files, it is difficult for the engine to determine when any one of them can be garbage collected. In the absence of a smarter approach in the SM engine, I have added a new module type, sml:CollectGarbage. You just need to provide the base URI of the file to "unload", and the module will unregister that file from memory so that the next round of Java garbage collection can pick it up.
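To illustrate the idea of unregistering a file by its base URI, here is a minimal sketch in plain Java. It is not the SPARQLMotion engine's actual code: the class GraphRegistry, its methods, and the example.org URIs are hypothetical names chosen for illustration. It only shows why a graph held in a registry cannot be collected, and how dropping the entry makes it eligible for the next garbage collection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an engine-level cache of loaded graphs.
// This is an illustration only, not TopBraid's or SPARQLMotion's API.
final class GraphRegistry {

    // Strong references: while an entry is in this map,
    // the graph it points to cannot be garbage collected.
    private final Map<String, Object> graphsByBaseUri = new HashMap<>();

    // What an import step would do: keep the loaded graph under its base URI.
    void register(String baseUri, Object graph) {
        graphsByBaseUri.put(baseUri, graph);
    }

    // What an "unload" step would do: drop the registry's reference so that
    // the next garbage collection may reclaim the graph.
    // Returns true if a graph was registered under that base URI.
    boolean unregister(String baseUri) {
        return graphsByBaseUri.remove(baseUri) != null;
    }

    int size() {
        return graphsByBaseUri.size();
    }

    public static void main(String[] args) {
        GraphRegistry registry = new GraphRegistry();
        for (int i = 0; i < 1000; i++) {
            String baseUri = "http://example.org/sheet" + i; // assumed URI scheme
            registry.register(baseUri, new byte[1024]);      // placeholder for a loaded graph
            // ... process the graph within this iteration ...
            registry.unregister(baseUri);                    // release it before the next file
        }
        System.out.println("Graphs still registered: " + registry.size());
    }
}
```

Without the unregister call inside the loop, every imported file would stay reachable until the end of the run, which matches the heap growth described in this thread.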
This will be part of the next official release, and I will contact you off-list as soon as we have an intermediate patch for you this coming week.

Holger

On May 23, 2009, at 11:33 AM, Holger Knublauch wrote:

> Jim,
>
> in order to enable me to narrow down the problem, could you kindly
> either send me the script or enumerate the module types that you are
> using? In particular I would like to know what type of file import
> modules you are using. We may do an unofficial test build next week
> and I might be able to send you a link to that.
>
> Thanks
> Holger
>
>
> On May 23, 2009, at 7:59 AM, James A Miller wrote:
>
>> Holger,
>>
>> If it is possible, a patch would be very helpful--upon further
>> checking, one of my workarounds, which allowed 2000 files to be
>> input per pass, caused most of the output to be omitted. (I had
>> moved the imports outside of the loop, but due to replace=true,
>> this caused the logic to fail). I have tried other arrangements,
>> with only minimal improvement. In a successful execution, I am
>> getting only about 250 files per pass, which is a big headache--I
>> will eventually need to process about 14000 files in all.
>>
>> Jim
>>
>>
>> Holger Knublauch <[email protected]>
>> Sent by: [email protected]
>> 05/22/2009 01:51 PM
>> Please respond to: [email protected]
>> To: [email protected]
>> cc:
>> Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports
>>
>>
>> Hi James,
>>
>> you may have hit a problem in the current SPARQLMotion engine. It
>> currently leaves all opened graphs in memory because future steps
>> of a script may want to access them again, and then reloading would
>> be very slow. So while a smart garbage collection algorithm may
>> work here, a short-term workaround would be to manually have a way
>> to call this garbage collection. I will look into this.
>>
>> Meanwhile, restarting TBC will work to a limited extent - as you
>> have found out.
>>
>> Thanks,
>> Holger
>>
>>
>> On May 22, 2009, at 10:43 AM, James A Miller wrote:
>>
>> I was having Java heap space errors in my SparqlMotion script,
>> which was processing 7000 input spreadsheets, (using a "control"
>> spreadsheet with file URIs, and an IterateOverSelect for
>> constructing triples from them, then adding reification triples,
>> and finally writing them out to a Sesame store, within each
>> iteration). According to the console, it processed about 3500
>> files before dying.
>>
>> So, I split the 'control' spreadsheet into 4 spreadsheets, each
>> containing about 2000 file references. The first file ran to
>> completion. (hooray!) So I started the 2nd one, and it died,
>> after processing 1500 files. As a test, I restarted Eclipse, and
>> ran it again, and then it succeeded.
>>
>> Is there a command that I should be issuing that will clear out
>> memory from earlier executions, rather than restarting TBC/Eclipse
>> repeatedly?
>>
>> Also, at one point I was accumulating all of the data, and
>> attempting to write it all out after the iterations. Due to the
>> heap problems, I started writing to a Sesame store within the
>> iterations, but that didn't really fix anything. I am still
>> getting the heap problems (as I described). I guess it's an
>> extension of the same question--is there something I can do within
>> the iterations, which will clear out memory so that I don't run
>> into this problem, and can extract all of my files in the same
>> process?
>>
>> I have my memory set for Eclipse at -Xmx1500m.
>>
>> Jim

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "TopBraid Composer Users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/topbraid-composer-users?hl=en
-~----------~----~----~----~------~----~------~--~---
