[rhino-tools-dev] Re: Questions on ETL: Follow up

Jason Meckley Mon, 11 Jan 2010 09:42:19 -0800

So joining is similar to InputCommandOperation in that the
IEnumerable<Row> parameter is ignored. The rows are generated by the
partial process within the JoinOperation, and this is purely a design
choice. OK, makes sense.
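To check that I read this right, here is a minimal, self-contained sketch of the idea. It is not Rhino ETL's actual JoinOperation code. Execute receives the upstream rows but never reads them, because the join pulls rows from its own left and right sources and matches them with a nested loop. Rows are modeled as Dictionary<string, object>, and NestedLoopsJoinSketch, left, right and match are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an ETL row: column name -> value.
using Row = System.Collections.Generic.Dictionary<string, object>;

public static class NestedLoopsJoinSketch
{
    // Shaped like an operation's Execute: the upstream rows are accepted
    // but ignored, because the join reads from its own left/right sources.
    public static IEnumerable<Row> Execute(
        IEnumerable<Row> ignoredInput,
        IEnumerable<Row> left,
        IEnumerable<Row> right,
        Func<Row, Row, bool> match)
    {
        // Materialize the right side once so it can be rescanned per left row.
        List<Row> rightRows = right.ToList();
        foreach (Row l in left)
        {
            foreach (Row r in rightRows)
            {
                if (!match(l, r)) continue;
                // Merge both rows; right-hand values win on key collisions.
                var merged = new Row(l);
                foreach (var kv in r) merged[kv.Key] = kv.Value;
                yield return merged;
            }
        }
    }
}
```

Since the inner loop only emits matching pairs, this behaves like an inner join. A left join would also yield each left row that found no match.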


"It looks like JoinOp is vestigial remains from NestedLoopJoinOp
refactoring."
I had to look up vestigial (http://www.google.com/search?q=define%3Avestigial).
So is NestedLoop the preferred operation, or Join? :)

"You can join after a branch; the syntax is just plain ugly. Basically,
you need to register a join op and then register the final stage of each
branch."
So I can join after branching, albeit with ugly syntax. Not a problem.

"I think that in order to make your process happen you need an
intermediary operation that would drain the previous operation's rows
(executing them) and then execute the next in line."
So instead of using the provided database operations, I should define
my own [Abstract/Database]Operation that processes each row and
then yields it? Something like:
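using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Operations;

class PassThroughOutputOperation : AbstractOperation
{
    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            // save row to database (not shown)
            yield return row; // then hand the row on to the next operation
        }
    }
}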
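To check my understanding of the A1, B1, A2, B2 ordering and of the "drain" suggestion, here is a self-contained sketch. It assumes operations are chained as lazy IEnumerable iterators, like the yield-based operation above. Ints stand in for rows, and PipelineOrderSketch, Step, Drain and Run are made-up names, not Rhino ETL types.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipelineOrderSketch
{
    // A pass-through step in the style of PassThroughOutputOperation: it does
    // its work on each row (here, logging name + row) and then yields the row.
    public static IEnumerable<int> Step(string name, IEnumerable<int> rows, List<string> log)
    {
        foreach (int row in rows)
        {
            log.Add(name + row); // stands in for "save row to database"
            yield return row;
        }
    }

    // Hypothetical draining step: pulls every upstream row (forcing all
    // upstream work to run) before handing any row to the next step.
    public static IEnumerable<int> Drain(IEnumerable<int> rows)
    {
        List<int> all = rows.ToList();
        foreach (int row in all) yield return row;
    }

    public static List<string> Run(bool drainBetween)
    {
        var log = new List<string>();
        IEnumerable<int> source = new[] { 1, 2 };
        IEnumerable<int> afterA = Step("A", source, log);
        if (drainBetween) afterA = Drain(afterA);
        // Enumerating the last step is what pulls rows through the chain.
        foreach (int ignored in Step("B", afterA, log)) { }
        return log;
    }
}
```

Run(false) logs A1, B1, A2, B2 because each row flows through both steps before the next row is read. Run(true) logs A1, A2, B1, B2 because Drain forces all of step A to finish before step B sees any row.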


On Jan 11, 12:13 pm, Ayende Rahien <aye...@ayende.com> wrote:
> Wow, so many questions.
> The reason the join op ignores the source input is that it is joining
> its left & right ops; what would it do with additional input?
> You could build it so that the input is the "left" side (and now that I
> think about it, I could argue that way), but that isn't how it is designed.
> A join op is always a root operation.
> It looks like JoinOp is vestigial remains from NestedLoopJoinOp refactoring.
>
> Your branching reasoning is solid.
> You can join after a branch; the syntax is just plain ugly.
> Basically, you need to register a join op and then register the final stage
> of each branch.
>
> I think that in order to make your process happen you need an intermediary
> operation that would drain the previous operation's rows (executing them) and
> then execute the next in line.
>
> On Mon, Jan 11, 2010 at 6:51 PM, Jason Meckley <jasonmeck...@gmail.com> wrote:
>
> > I'm digging more into ETL and I have come across
> > NestedLoopsJoinOperation and JoinOperation. I cannot tell what the
> > difference is, or why I would use one over the other. Also, why are
> > the rows ignored rather than passed to the left and right operations?
>
> > I'm also trying to understand the PartialEtlProcess. Is the idea of
> > Partial to load multiple subsets? For example, if I wanted to branch my
> > operations with each branch containing multiple processes:
> > Register(new GetData())
> >     .Register(new BranchingOperation()
> >         .Add(Partial
> >             .Register(new OperationA())
> >             .Register(new OperationB()))
> >         .Add(Partial
> >             .Register(new OperationC())
> >             .Register(new OperationD()))
> >     );
>
> > In this scenario the operations would run:
> > [Branch 1] A1, B1, A2, B2
> > [Branch 2] C1, D1, C2, D2
> > where A & C would be intermediate logical operations (transformations)
> > and B & D are output operations?
>
> > Trying to follow the thread
> > http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/e...,
> > why isn't it possible to join after a branch? Is this because the left
> > and right operations are passed null?
>
> > Here is what I have:
> > a single flat DBF file that I need to import into a SQL database. The SQL
> > database has the tables Parent 1-N Children 1-N GrandChildren.
> > When the data is imported I need to perform 5 different operations:
> > 1. group the source by field
> > 2. insert new groups into Parent
> > 3. insert children & grand children in the DBF that are not in SQL (left
> > join)
> > 4. update existing children and grand children (inner join; there will
> > always be the same number of grand children)
> > 5. delete Parents from SQL that do not have any children
>
> > Operation 1 only applies to operation 2, so I figure this could be a
> > branch.
> > Operations 3 and 4 can also be done independently of one another,
> > again branching.
> > Operations 3 and 4 are dependent on the completion of operation 2.
> > Operation 5 must be executed after 3 and 4 are complete.
>
> > It looks like operations 3 & 4 are actually 2 output commands, in which
> > case I must break this logic into two operations and branch them together.
>
> > I would like this to occur within a single transaction/ETL process,
> > but I'm not sure if that's possible, or reasonable?
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Rhino Tools Dev" group.
> > To post to this group, send email to rhino-tools-...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > rhino-tools-dev+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/rhino-tools-dev?hl=en.
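For steps 3 to 5 of the import described in the quoted question, the row routing could be sketched like this. It assumes each child and parent can be identified by a single string key. ImportSplitSketch, Split and OrphanParents are hypothetical names, and the sketch uses hash-set lookups rather than a Rhino ETL join operation.

```csharp
using System.Collections.Generic;

public static class ImportSplitSketch
{
    // Steps 3 and 4: route each source (DBF) key either to "insert"
    // (the left join found no match in SQL) or to "update" (inner join matched).
    public static (List<string> Inserts, List<string> Updates) Split(
        IEnumerable<string> sourceKeys, IEnumerable<string> existingKeys)
    {
        var existing = new HashSet<string>(existingKeys);
        var inserts = new List<string>();
        var updates = new List<string>();
        foreach (string key in sourceKeys)
        {
            if (existing.Contains(key)) updates.Add(key);
            else inserts.Add(key);
        }
        return (inserts, updates);
    }

    // Step 5: parents whose key is not referenced by any child row.
    public static List<string> OrphanParents(
        IEnumerable<string> parentKeys, IEnumerable<string> childParentKeys)
    {
        var referenced = new HashSet<string>(childParentKeys);
        var orphans = new List<string>();
        foreach (string parent in parentKeys)
        {
            if (!referenced.Contains(parent)) orphans.Add(parent);
        }
        return orphans;
    }
}
```

OrphanParents needs the child keys as they stand after steps 3 and 4. That is why operation 5 has to run last.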
