Yeah, I agree it's not Rhino ETL causing the problem. That was my
mistake, sorry about that. I think the problem is my third
operation. Even though I'm submitting the rows to the base Execute
method in batches, the rows collection that's coming into the
operation is growing. The reason for that is my last line: return
rows;
I'm not doing a yield, so it's trying to accumulate all 64 million rows.
I think the solution is the same, though. Break it up into chunks.
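For what it's worth, the difference between the two shapes can be sketched in a language-agnostic way. This is Python rather than the C# in the actual operations, and transform is a made-up stand-in for the per-row work, but the memory behavior is the same:

```python
def transform(row):
    # placeholder for the real per-row work (the pivot, in my case)
    return dict(row)

# Materializing: every processed row stays alive until the whole
# input is consumed -- with 64 million rows this exhausts memory.
def execute_materialized(rows):
    out = []
    for row in rows:
        out.append(transform(row))
    return out

# Streaming: each row is handed downstream as soon as it is produced,
# so only one row (plus downstream buffers) is alive at a time.
def execute_streaming(rows):
    for row in rows:
        yield transform(row)
```

Both produce the same rows; only the streaming version keeps memory flat regardless of input size.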
Thanks!
On Apr 7, 2010, at 8:30 AM, Ayende Rahien <[email protected]> wrote:
This looks good. I took a look at the ThreadPoolPipelineExecuter and
there isn't anything there that should cause an OOM.
It is possible that SqlBulkCopy is holding the references, if that
is the case, there isn't much we can do about that.
Your solution of splitting the bulk load into several parts is
probably the best.
On Sun, Apr 4, 2010 at 6:17 PM, Dan Jensen <[email protected]> wrote:
I'm inheriting directly from EtlProcess, and the only thing I'm
overriding is the Initialize() method. So it's
using the default ThreadPoolPipelineExecuter. Here's my Initialize
method (Entity is an enum indicating
which Db table to work on):
protected override void Initialize()
{
    Register(new LoadUdfsOperation(ConnectionStringName, Entity));
    Register(new PivotUdfsOperation(Entity));
    Register(new WriteUdfsToDBOperation(ConnectionStringName, Entity));
}
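In case it helps to see the shape: each registered operation's output enumerable becomes the next operation's input, so the whole pipeline is a chain of lazy sequences. Here is a rough Python analogue of that chaining (the function names and the three-row source are illustrative, not the Rhino ETL API):

```python
def load_udfs():
    # stand-in for reading rows from the source table
    for i in range(3):
        yield {"id": i, "a": i * 10, "b": i * 100}

def pivot_udfs(rows):
    # emit one pivoted row per non-key column of each input row
    for row in rows:
        for col, val in row.items():
            if col != "id":
                yield {"id": row["id"], "column": col, "value": val}

def write_udfs(rows):
    # stand-in for the bulk insert; collects what it would write
    written = []
    for row in rows:
        written.append(row)
    return written

# the pipeline: each stage consumes the previous stage's enumerable
result = write_udfs(pivot_udfs(load_udfs()))
```

Because the first two stages yield, nothing is materialized until the final stage pulls rows through; a final stage that collects everything (like write_udfs here, or a return rows;) is where the whole set piles up.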
I posted the code to the operations on pastebin:
LoadUdfsOperation: http://pastebin.com/PT9WMFT5
PivotUdfsOperation: http://pastebin.com/UbpRgPkW
WriteUdfsToDBOperation: http://pastebin.com/wGUyb822
Here's a basic summary:
LoadUdfsOperation gets a data reader with the rows from the source
table.
PivotUdfsOperation iterates through each row and yields one pivoted
row for each column in the original row.
WriteUdfsToDBOperation inherits from SqlBulkInsertOperation to
insert the pivoted rows into the target table. It does them in
batches, so that the bulk insert itself doesn't run out of memory.
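The batching idea can be sketched generically. This is a Python sketch (the real operation is C#, and bulk_insert here is a hypothetical callback) showing one way to consume an arbitrarily large row stream so that at most batch_size rows are materialized at once:

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from an iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def write_in_batches(rows, batch_size, bulk_insert):
    # only one batch is resident in memory at a time
    for batch in batches(rows, batch_size):
        bulk_insert(batch)
```

The same pattern works whether the chunking happens inside the write operation (as here) or by slicing the source table and running the whole process once per slice.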
So based on what I saw in ANTS, I believe the IEnumerable<Row>
collection that's being passed between the operations
is what's growing and causing the OutOfMemoryException. If you
see something in the operations that looks wrong, please
let me know. At this point the only idea I have is to break the
table into chunks, and run each chunk through the process separately.
Thanks,
Dan
--
You received this message because you are subscribed to the Google Groups "Rhino
Tools Dev" group.