Holger,

That is excellent news--thanks! I will look forward to the patch...
Jim

Holger Knublauch <[email protected]>
Sent by: [email protected]
05/24/2009 12:48 PM
Please respond to [email protected]
To: [email protected]
cc:
Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports

Hi Jim,

(with thanks for sending me your file off-list).

I have seen that you were using sml:ImportRDFFromWorkspace to do the spreadsheet import, and this indicates that you did indeed hit a memory leak. When the script iterates over thousands of files, it is difficult for the engine to determine when any one of them can be garbage collected. In the absence of a smarter approach in the SM engine, I have added a new module type, sml:CollectGarbage, in which you just need to provide the base URI of the file to "unload". It will unregister that file from memory so that the next round of Java garbage collection can pick it up.

This will be part of the next official release, and I will contact you off-list as soon as we have an intermediate patch for you this coming week.

Holger

On May 23, 2009, at 11:33 AM, Holger Knublauch wrote:

Jim,

to help me narrow down the problem, could you kindly either send me the script or enumerate the module types that you are using? In particular, I would like to know what type of file import modules you are using. We may do an unofficial test build next week, and I might be able to send you a link to that.

Thanks
Holger

On May 23, 2009, at 7:59 AM, James A Miller wrote:

Holger,

If it is possible, a patch would be very helpful--upon further checking, one of my workarounds, which allowed 2000 files to be input per pass, caused most of the output to be omitted. (I had moved the imports outside of the loop, but due to replace=true, this caused the logic to fail.) I have tried other arrangements, with only minimal improvement. In a successful execution, I am getting only about 250 files per pass, which is a big headache--I will eventually need to process about 14000 files in all.
Jim

Holger Knublauch <[email protected]>
Sent by: [email protected]
05/22/2009 01:51 PM
Please respond to [email protected]
To: [email protected]
cc:
Subject: [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports

Hi James,

you may have hit a problem in the current SPARQLMotion engine. It currently leaves all opened graphs in memory because future steps of a script may want to access them again, and reloading them would be very slow. So while a smart garbage collection algorithm may work here, a short-term workaround would be a way to trigger this garbage collection manually. I will look into this. Meanwhile, restarting TBC will work to a limited extent, as you have found out.

Thanks,
Holger

On May 22, 2009, at 10:43 AM, James A Miller wrote:

I was having Java heap space errors in my SparqlMotion script, which was processing 7000 input spreadsheets (using a "control" spreadsheet with file URIs and an IterateOverSelect that, within each iteration, constructs triples from them, adds reification triples, and finally writes them out to a Sesame store). According to the console, it processed about 3500 files before dying.

So, I split the "control" spreadsheet into 4 spreadsheets, each containing about 2000 file references. The first file ran to completion. (hooray!) So I started the 2nd one, and it died after processing 1500 files. As a test, I restarted Eclipse and ran it again, and then it succeeded.

Is there a command that I should be issuing that will clear out memory from earlier executions, rather than restarting TBC/Eclipse repeatedly?

Also, at one point I was accumulating all of the data and attempting to write it all out after the iterations. Due to the heap problems, I started writing to a Sesame store within the iterations, but that didn't really fix anything. I am still getting the heap problems (as I described).
I guess it's an extension of the same question--is there something I can do within the iterations, which will clear out memory so that I don't run into this problem, and can extract all of my files in the same process?

I have my memory set for Eclipse at -Xmx1500m.

Jim
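
Note for readers of this thread: the behaviour discussed here (opened graphs staying registered in memory until they are explicitly unregistered by base URI) can be illustrated with a minimal Java sketch. This is not the SPARQLMotion engine's code. GraphRegistry, open, unregister, and the example.org URIs are hypothetical names chosen for illustration, and a byte array stands in for a loaded graph. The only point is that an object held in a long-lived map cannot be reclaimed by the Java garbage collector until the map entry is removed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an engine-wide registry of opened graphs. */
class GraphRegistry {
    // Strong references: anything stored here cannot be garbage collected.
    private final Map<String, byte[]> graphsByBaseUri = new HashMap<>();

    /** Fakes loading a graph and keeps it registered for later script steps. */
    byte[] open(String baseUri) {
        return graphsByBaseUri.computeIfAbsent(baseUri, uri -> new byte[1024]);
    }

    /** Drops the registry's reference so the JVM may reclaim the graph later. */
    boolean unregister(String baseUri) {
        return graphsByBaseUri.remove(baseUri) != null;
    }

    int size() {
        return graphsByBaseUri.size();
    }
}

class CollectGarbageSketch {
    public static void main(String[] args) {
        GraphRegistry registry = new GraphRegistry();
        for (int i = 0; i < 14000; i++) {
            // Assumed URI scheme, for illustration only.
            String baseUri = "http://example.org/sheet" + i;
            registry.open(baseUri);
            // ... construct triples and write them out here ...
            // Without this call, all 14000 entries would stay reachable.
            registry.unregister(baseUri);
        }
        System.out.println("Graphs still registered: " + registry.size());
    }
}
```

With the unregister call inside the loop, the registry is empty at the end of the run. Removing that call leaves every opened entry reachable, which is the pattern that would exhaust the heap over thousands of iterations regardless of the -Xmx setting.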
