Yeah, I agree it's not Rhino ETL causing the problem. That was my
mistake, sorry about that. I think the problem is my third
operation. Even though I'm submitting the rows to the base Execute
method in batches, the rows collection that's coming into the
operation is growing. The reason for that is my last line: return
rows;
I'm not doing a yield, so it's trying to accumulate all 64 million rows.
I think the solution is the same, though. Break it up into chunks.
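For what it's worth, the difference between the two shapes can be sketched in a language-agnostic way. This is Python rather than the C# in the actual operations, and transform is a made-up stand-in for the per-row work, but the memory behavior is the same:

```python
def transform(row):
    # placeholder for the real per-row work (the pivot, in my case)
    return dict(row)

# Materializing: every processed row stays alive until the whole
# input is consumed -- with 64 million rows this exhausts memory.
def execute_materialized(rows):
    out = []
    for row in rows:
        out.append(transform(row))
    return out

# Streaming: each row is handed downstream as soon as it is produced,
# so only one row (plus downstream buffers) is alive at a time.
def execute_streaming(rows):
    for row in rows:
        yield transform(row)
```

Both produce the same rows; only the streaming version keeps memory flat regardless of input size.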
Thanks!
On Apr 7, 2010, at 8:30 AM, Ayende Rahien <[email protected]> wrote:
This looks good. I took a look at the ThreadPoolPipelineExecuter and
there isn't anything there that should cause an OOM.
It is possible that SqlBulkCopy is holding the references, if that
is the case, there isn't much we can do about that.
Your solution of splitting the bulk load into several parts is
probably the best.
On Sun, Apr 4, 2010 at 6:17 PM, Dan Jensen <[email protected]> wrote:
I'm inheriting directly from EtlProcess, and the only thing I'm
overriding is the Initialize() method. So it's
using the default ThreadPoolPipelineExecuter. Here's my Initialize
method (Entity is an enum indicating
which Db table to work on):
protected override void Initialize()
{
    Register(new LoadUdfsOperation(ConnectionStringName, Entity));
    Register(new PivotUdfsOperation(Entity));
    Register(new WriteUdfsToDBOperation(ConnectionStringName, Entity));
}
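In case it helps to see the shape: each registered operation's output enumerable becomes the next operation's input, so the whole pipeline is a chain of lazy sequences. Here is a rough Python analogue of that chaining (the function names and the three-row source are illustrative, not the Rhino ETL API):

```python
def load_udfs():
    # stand-in for reading rows from the source table
    for i in range(3):
        yield {"id": i, "a": i * 10, "b": i * 100}

def pivot_udfs(rows):
    # emit one pivoted row per non-key column of each input row
    for row in rows:
        for col, val in row.items():
            if col != "id":
                yield {"id": row["id"], "column": col, "value": val}

def write_udfs(rows):
    # stand-in for the bulk insert; collects what it would write
    written = []
    for row in rows:
        written.append(row)
    return written

# the pipeline: each stage consumes the previous stage's enumerable
result = write_udfs(pivot_udfs(load_udfs()))
```

Because the first two stages yield, nothing is materialized until the final stage pulls rows through; a final stage that collects everything (like write_udfs here, or a return rows;) is where the whole set piles up.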
I posted the code to the operations on pastebin:
LoadUdfsOperation: http://pastebin.com/PT9WMFT5
PivotUdfsOperation: http://pastebin.com/UbpRgPkW
WriteUdfsToDBOperation: http://pastebin.com/wGUyb822
Here's a basic summary:
LoadUdfsOperation gets a data reader with the rows from the source
table.
PivotUdfsOperation iterates through each row and yields one pivoted
row for each column in the original row.
WriteUdfsToDBOperation inherits from SqlBulkInsertOperation to
insert the pivoted rows into the target table. It does them in
batches, so that the bulk insert itself doesn't run out of memory.
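The batching idea can be sketched generically. This is a Python sketch (the real operation is C#, and bulk_insert here is a hypothetical callback) showing one way to consume an arbitrarily large row stream so that at most batch_size rows are materialized at once:

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from an iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def write_in_batches(rows, batch_size, bulk_insert):
    # only one batch is resident in memory at a time
    for batch in batches(rows, batch_size):
        bulk_insert(batch)
```

The same pattern works whether the chunking happens inside the write operation (as here) or by slicing the source table and running the whole process once per slice.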
So based on what I saw in ANTS, I believe the IEnumerable<Row>
collection that's being passed between the operations
is what's growing and causing the OutOfMemoryException. If you
see something in the operations that looks wrong, please
let me know. At this point the only idea I have is to break the
table into chunks, and run each chunk through the process separately.
Thanks,
Dan
--
You received this message because you are subscribed to the Google Groups "Rhino
Tools Dev" group.