Holger,

If it is possible, a patch would be very helpful. Upon further checking, one of my workarounds, which allowed 2000 files to be input per pass, caused most of the output to be omitted. (I had moved the imports outside of the loop, but because of replace=true, this caused the logic to fail.) I have tried other arrangements, with only minimal improvement. In a successful execution, I am getting only about 250 files per pass, which is a big headache, since I will eventually need to process about 14000 files in all.
Jim

Holger Knublauch <[email protected]>
Sent by: [email protected]
05/22/2009 01:51 PM
Please respond to [email protected]
To [email protected]
cc
Subject [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports

Hi James,

you may have hit a problem in the current SPARQLMotion engine. It currently leaves all opened graphs in memory because future steps of a script may want to access them again, and then reloading would be very slow. So while a smart garbage collection algorithm may work here, a short-term work around would be to manually have a way to call this garbage collection. I will look into this. Meanwhile, restarting TBC will work to a limited extent - as you have found out.

Thanks,
Holger

On May 22, 2009, at 10:43 AM, James A Miller wrote:

I was having Java heap space errors in my SparqlMotion script, which was processing 7000 input spreadsheets, (using a "control" spreadsheet with file URIs, and an IterateOverSelect for constructing triples from them, then adding reification triples, and finally writing them out to a Sesame store, within each iteration). According to the console, it processed about 3500 files before dying.

So, I split the 'control' spreadsheet into 4 spreadsheets, each containing about 2000 file references. The first file ran to completion. (hooray!) So I started the 2nd one, and it died, after processing 1500 files. As a test, I restarted Eclipse, and ran it again, and then it succeeded.

Is there a command that I should be issuing that will clear out memory from earlier executions, rather than restarting TBC/Eclipse repeatedly?

Also, at one point I was accumulating all of the data, and attempting to write it all out after the iterations. Due to the heap problems, I started writing to a Sesame store within the iterations, but that didn't really fix anything. I am still getting the heap problems (as I described).
I guess it's an extension of the same question--is there something I can do within the iterations, which will clear out memory so that I don't run into this problem, and can extract all of my files in the same process? I have my memory set for Eclipse at -Xmx1500m.

Jim
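
The batching workaround discussed in this thread (splitting the control spreadsheet so that each pass stays within the heap) can be sketched outside of SPARQLMotion. The Python below is only an illustration of the arithmetic and the slicing; the function names are hypothetical and nothing here is part of TopBraid Composer. The figures of 250 files per pass and 14000 files in total are the ones reported in the thread.

```python
import math
from typing import Iterator, List


def batches(file_uris: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size file URIs.

    Each slice would become one smaller 'control' list, to be run
    as a separate pass (hypothetical helper, not a TBC API).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(file_uris), batch_size):
        yield file_uris[start:start + batch_size]


def passes_needed(total_files: int, batch_size: int) -> int:
    """Number of passes required to cover total_files at batch_size per pass."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_files / batch_size)
```

At about 250 files per pass, `passes_needed(14000, 250)` gives 56 passes, which is why a fix in the engine, rather than ever smaller batches, is the more practical route.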
