This looks good. I took a look at the ThreadPoolPipelineExecuter, and there
isn't anything there that should cause an OOM.
It is possible that SqlBulkCopy is holding the references; if that is the
case, there isn't much we can do about it.
Your solution of splitting the bulk load into several parts is probably the best one.
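A rough sketch of that chunked approach (the chunk size, totalRows, and the
UdfEtlProcess constructor here are illustrative, not from your code):

    // Illustrative only: run one EtlProcess per key range, so each run
    // only materializes a bounded number of rows.
    const int chunkSize = 50000;
    for (int start = 0; start < totalRows; start += chunkSize)
    {
        // hypothetical ctor that takes the key range to process
        var process = new UdfEtlProcess(start, start + chunkSize);
        process.Execute();
    }

LoadUdfsOperation would then restrict its source reader to that key range.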

On Sun, Apr 4, 2010 at 6:17 PM, Dan Jensen <[email protected]> wrote:

>  I'm inheriting directly from EtlProcess, and the only thing I'm
> overriding is the Initialize() method.  So it's
> using the default ThreadPoolPipelineExecuter.  Here's my Initialize
> method (Entity is an enum indicating
> which Db table to work on):
>
> protected override void Initialize()
> {
>     Register(new LoadUdfsOperation(ConnectionStringName, Entity));
>     Register(new PivotUdfsOperation(Entity));
>     Register(new WriteUdfsToDBOperation(ConnectionStringName, Entity));
> }
>
> I posted the code to the operations on pastebin:
> LoadUdfsOperation:  http://pastebin.com/PT9WMFT5
> PivotUdfsOperation:  http://pastebin.com/UbpRgPkW
> WriteUdfsToDBOperation:  http://pastebin.com/wGUyb822
>
> Here's a basic summary:
>
> LoadUdfsOperation gets a data reader with the rows from the source table.
> PivotUdfsOperation iterates through each row and yields one pivoted row
> for each column in the original row.
> WriteUdfsToDBOperation inherits from SqlBulkInsertOperation to insert the
> pivoted rows into the target table.  It does them in batches, so that the
> bulk insert itself doesn't run out of memory.
>
> So based on what I saw in ANTS, I believe the IEnumerable<Row> collection
> that's being passed between the operations
> is what's growing and causing the Out of Memory exception.  If you see
> something in the operations that looks wrong, please
> let me know.  At this point the only idea I have is to break the table into
> chunks, and run each chunk through the process separately.
>
> Thanks,
> Dan
>
>
>  *From:* Ayende Rahien <[email protected]>
> *Sent:* Sunday, April 04, 2010 3:46 AM
> *To:* Dan Jensen <[email protected]>; 
> rhino-tools-dev <[email protected]>
> *Subject:* Re: Rhino ETL
>
> Dan,
> The proper place to ask is on the mailing list: "rhino-tools-dev"
> <[email protected]>.
> Answers are below (and the mailing list is CCed).
>
> On Fri, Apr 2, 2010 at 5:52 PM, Dan Jensen <[email protected]> wrote:
>
>>  Paul Barrier (you linked to his talk about ETL recently) has since
>> commented and suggested a couple
>> of things to try.  I tried the SqlBatch option, but it was much slower.
>> So now I've started overriding the
>> Execute method of the SqlBulkInsertOperation in order to do smaller
>> batches of inserts at a time.  Now
>> the bulk insert itself is small, but I'm still getting the memory
>> exception.  I next profiled the app with ANTS,
>> and discovered that the cause of the exception was the collection of Row
>> objects that is being passed
>> between the operations in the process.  So this is my question:  Is there
>> a way around this?
>>
>>
>
> Hm, what is that collection? Who is maintaining it? Is it Rhino ETL?
> If so, what executer are you using?
>
>
>>  I thought that by yielding to each subsequent operation, only one row at
>> a time would be passed between
>> the operations.
>>
>
> Yes, that is how it should work.
> Off the top of my head, I can't think of a reason it would behave in this
> manner.
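> For what it's worth, a streaming operation in Rhino ETL should look roughly
> like this (a sketch, not your actual pivot code):
>
> public class PivotSketchOperation : AbstractOperation
> {
>     public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
>     {
>         foreach (Row row in rows)            // pulls one row at a time
>         {
>             foreach (string column in row.Columns)
>             {
>                 Row pivoted = new Row();
>                 pivoted["Name"] = column;
>                 pivoted["Value"] = row[column];
>                 yield return pivoted;        // handed downstream immediately
>             }
>         }
>     }
> }
>
> As long as every operation iterates its input like this and never collects
> the rows into a list, memory use should stay flat.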
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rhino-tools-dev?hl=en.
