Jim, to help me narrow down the problem, could you either send me the script or list the module types you are using? In particular, I would like to know which types of file import modules you are using. We may do an unofficial test build next week, and I might be able to send you a link to it.
Thanks
Holger

On May 23, 2009, at 7:59 AM, James A Miller wrote:

> Holger,
>
> If it is possible, a patch would be very helpful--upon further
> checking, one of my workarounds, which allowed 2000 files to be
> input per pass, caused most of the output to be omitted. (I had
> moved the imports outside of the loop, but due to replace=true, this
> caused the logic to fail). I have tried other arrangements, with
> only minimal improvement. In a successful execution, I am getting
> only about 250 files per pass, which is a big headache--I will
> eventually need to process about 14000 files in all.
>
> Jim
>
>
> Holger Knublauch <[email protected]>
> Sent by: [email protected]
> 05/22/2009 01:51 PM
> Please respond to
> [email protected]
>
> To
> [email protected]
> cc
> Subject
> [tbc-users] Re: SparqlMotion Heap errors on spreadsheet imports
>
>
> Hi James,
>
> you may have hit a problem in the current SPARQLMotion engine. It
> currently leaves all opened graphs in memory because future steps of
> a script may want to access them again, and then reloading would be
> very slow. So while a smart garbage collection algorithm may work
> here, a short-term work around would be to manually have a way to
> call this garbage collection. I will look into this.
>
> Meanwhile, restarting TBC will work to a limited extent - as you
> have found out.
>
> Thanks,
> Holger
>
>
> On May 22, 2009, at 10:43 AM, James A Miller wrote:
>
> I was having Java heap space errors in my SparqlMotion script, which
> was processing 7000 input spreadsheets, (using a "control"
> spreadsheet with file URIs, and an IterateOverSelect for
> constructing triples from them, then adding reification triples, and
> finally writing them out to a Sesame store, within each iteration).
> According to the console, it processed about 3500 files before dying.
>
> So, I split the 'control' spreadsheet into 4 spreadsheets, each
> containing about 2000 file references. The first file ran to
> completion. (hooray!) So I started the 2nd one, and it died, after
> processing 1500 files. As a test, I restarted Eclipse, and ran it
> again, and then it succeeded.
>
> Is there a command that I should be issuing that will clear out
> memory from earlier executions, rather than restarting TBC/Eclipse
> repeatedly?
>
> Also, at one point I was accumulating all of the data, and
> attempting to write it all out after the iterations. Due to the
> heap problems, I started writing to a Sesame store within the
> iterations, but that didn't really fix anything. I am still
> getting the heap problems (as I described). I guess it's an
> extension of the same question--is there something I can do within
> the iterations, which will clear out memory so that I don't run into
> this problem, and can extract all of my files in the same process?
>
> I have my memory set for Eclipse at -Xmx1500m.
>
> Jim
