On Mon, 8 Aug 2011, Matthew Reeves wrote:
I have a use case which involves reading a single large Excel file and writing into multiple smaller plain text files. I'm using the event model to achieve a low memory footprint. What I would like to be able to do is save progress. If I've broken my single large file into 1000-row chunks and have recorded that I've worked through 100 chunks, I would like to be able to start reading at row number 100,000.
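(For context, a minimal event-model read of this kind looks roughly like the following; the file name and the listener body are placeholder assumptions, not something taken from the thread:)

import java.io.FileInputStream;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class EventModelRead {
    public static void main(String[] args) throws Exception {
        FileInputStream fin = new FileInputStream("large.xls"); // placeholder file name
        POIFSFileSystem fs = new POIFSFileSystem(fin);

        HSSFRequest request = new HSSFRequest();
        // Records are delivered one at a time, so only a little state is held in memory
        request.addListenerForAllRecords(new HSSFListener() {
            public void processRecord(Record record) {
                // write the record's data out to the current plain text chunk here
            }
        });

        new HSSFEventFactory().processWorkbookEvents(request, fs);
        fin.close();
    }
}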

There's no way to do this, sorry. What you'll need to do is track when you hit a new row, record its row number, and flush your data out. When you start again, skip until you hit that row again, and away you go.
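(A rough sketch of that row-tracking approach, assuming the cell records of interest are NumberRecord and LabelSSTRecord; the class name, resumeRow and lastSeenRow are illustrative, not part of POI:)

import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;

public class ResumingListener implements HSSFListener {
    private final int resumeRow;   // first row that still needs processing, loaded from saved progress
    private int lastSeenRow = -1;  // persisted each time a chunk is flushed

    public ResumingListener(int resumeRow) {
        this.resumeRow = resumeRow;
    }

    public void processRecord(Record record) {
        int row;
        if (record instanceof NumberRecord) {
            row = ((NumberRecord) record).getRow();
        } else if (record instanceof LabelSSTRecord) {
            row = ((LabelSSTRecord) record).getRow();
        } else {
            return; // not a cell record we track here
        }

        if (row < resumeRow) {
            return; // already handled on a previous run, so skip it
        }
        lastSeenRow = row;
        // ... write the cell out to the current chunk; when the chunk is full,
        // flush it and save lastSeenRow + 1 as the next run's resumeRow
    }
}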

Would it be reasonable to assume the member variable, _unreadRecordIndex, in RecordFactoryInputStream contains both the number I'm looking for (how many records I've processed/read) and the number I could set if I wanted to start reading at, say, row number 100,000? (Obviously I would need to make code changes to make this work)

I'd advise against trying to do it at a raw record level. Because of continue records etc., you might find yourself trying to resume at a point that doesn't really make sense. Instead, I'd suggest you just track the last seen row number, and use that.

You might also want to look at the MissingRecordAware code. One option is to use that, so you can be sure to always hit the right number of rows in a chunk, no matter how many of them are blank. The other is just to review that code to see how best to do the row tracking.
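(A sketch of wiring that up, wrapping the listener in MissingRecordAwareHSSFListener so blank rows still produce dummy events and the row count per chunk stays consistent; ResumingListener is the illustrative class from the earlier sketch, and the command-line arguments are placeholders:)

import java.io.FileInputStream;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ResumeDriver {
    public static void main(String[] args) throws Exception {
        int resumeRow = Integer.parseInt(args[1]); // e.g. 100000, saved at the end of the last run
        FileInputStream fin = new FileInputStream(args[0]);
        POIFSFileSystem fs = new POIFSFileSystem(fin);

        HSSFRequest request = new HSSFRequest();
        // The wrapper fires MissingRowDummyRecord / LastCellOfRowDummyRecord events for
        // gaps, so chunks cover the same number of rows even when many rows are blank.
        request.addListenerForAllRecords(
                new MissingRecordAwareHSSFListener(new ResumingListener(resumeRow)));

        new HSSFEventFactory().processWorkbookEvents(request, fs);
        fin.close();
    }
}

The listener would also need to handle LastCellOfRowDummyRecord (it has a getRow() as well) if blank rows should count towards the 1000-row chunk size.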

Nick
