One of the things I'm finding is that:

1. Input Operations often need input to help them decide what to pull from a source, so that the data they deliver to the Output is correct.
2. Output Operations often need to act as input to the next Operation (e.g. when returning the last inserted id).

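To make point 2 concrete, here is a minimal sketch of an output step that passes its rows through. It deliberately does not use the Rhino ETL types: Row is a plain dictionary stand-in, PassThroughWithIdOperation and the insert delegate are hypothetical names, and a counter stands in for the database identity column.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Rhino ETL's Row, used only to keep this sketch self-contained.
public class Row : Dictionary<string, object> { }

// Hypothetical pass-through output step: writes each row, records the
// generated id on the row, then yields the row to the next operation.
public class PassThroughWithIdOperation
{
    // Assumed to perform the insert and return the last inserted id.
    private readonly Func<Row, int> insert;

    public PassThroughWithIdOperation(Func<Row, int> insert)
    {
        this.insert = insert;
    }

    public IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            row["Id"] = insert(row); // output side effect
            yield return row;        // the same row becomes input downstream
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var nextId = 0; // fake identity column, in place of a real database
        var op = new PassThroughWithIdOperation(_ => ++nextId);

        var source = new[]
        {
            new Row { { "Name", "parent A" } },
            new Row { { "Name", "parent B" } },
        };

        // Each row arrives here already carrying the id assigned on insert.
        foreach (var row in op.Execute(source))
        {
            Console.WriteLine($"{row["Name"]} -> {row["Id"]}");
        }
    }
}
```

Because Execute is an iterator, nothing is inserted until something downstream enumerates the rows, so a step like this only does its work when it is followed by a consumer that drains it.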
This brings me to the Join Operation: if the first operation on either side of the join is an Input Operation of the type mentioned above, it would be helpful if a Join didn't HAVE to be a root.

I've got a couple of PassThroughOutputOperations I could contribute:

-o- ConventionPassThroughOutputOperation : AbstractCommandOperation
-o- ConventionPassThroughWithScalarOutputOperation<T> : ConventionPassThroughOutputOperation

-rb

-----Original Message-----
From: rhino-tools-dev@googlegroups.com [mailto:rhino-tools-...@googlegroups.com] On Behalf Of Jason Meckley
Sent: Monday, January 11, 2010 9:42 AM
To: Rhino Tools Dev
Subject: [rhino-tools-dev] Re: Questions on ETL: Follow up

So joining is similar to InputCommandOperations in that the IEnumerable<Row> parameter is ignored. The rows are generated by the partial process within the JoinOperation. And this is purely a design choice. Ok, makes sense.

"It looks like JoinOp is vestigial remains from NestedLoopJoinOp refactoring."
I had to define vestigial (http://www.google.com/search?q=define%3Avestigial). So is NestedLoop the preferred operation, or Join :) ?

"The reason that you can join after a branch is that you can, the syntax is just plain ugly. Basically, you need to register a join op and then register the final stage of each branch."
I can join after branching, albeit with ugly syntax. Not a problem.

"I think that in order to make your process happen you need an intermediary operation that would drain the previous operation rows (executing them) and then execute the next in line."
So instead of using the provided Database Operations, I should define my own [Abstract/Database]Operation that would process the row and then yield it? Something like:

    class PassThroughOutputOperation : AbstractOperation
    {
        public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
        {
            foreach (var row in rows)
            {
                // save row to database
                yield return row;
            }
        }
    }

On Jan 11, 12:13 pm, Ayende Rahien <aye...@ayende.com> wrote:
> Wow, so many questions.
> The reason that join op is ignoring the source input is that it is
> joining its left & right ops, what would it do with additional input?
> You could build it (and now that I think about it, I could argue that
> way) that the input is the "left" side, but that isn't how it is designed.
> A join op is always a root operation.
> It looks like JoinOp is vestigial remains from NestedLoopJoinOp refactoring.
>
> Your branching reasoning is solid.
> The reason that you can join after a branch is that you can, the
> syntax is just plain ugly.
> Basically, you need to register a join op and then register the final
> stage of each branch.
>
> I think that in order to make your process happen you need an
> intermediary operation that would drain the previous operation rows
> (executing them) and then execute the next in line.
>
> On Mon, Jan 11, 2010 at 6:51 PM, Jason Meckley <jasonmeck...@gmail.com> wrote:
>
> > I'm digging more into ETL and I have come across
> > NestedLoopsJoinOperation and JoinOperation. I cannot tell what the
> > difference is, or why I would use one over the other. Also, why are
> > the rows ignored rather than passed to the left and right operations?
> >
> > I'm also trying to understand the PartialEtlProcess. Is the idea of
> > Partial to load multiple subsets? Like if I wanted to branch my
> > operations with each branch containing multiple processes?
> >
> >     Register(new GetData())
> >         .Register(new BranchingOperation()
> >             .Add(Partial
> >                 .Register(OperationA)
> >                 .Register(OperationB))
> >             .Add(Partial
> >                 .Register(OperationC)
> >                 .Register(OperationD)));
> >
> > In this scenario the operations would run:
> > [Branch 1] A1, B1, A2, B2
> > [Branch 2] C1, D1, C2, D2
> > where A & C would be intermediate logical operations
> > (transformations) and B & D are output operations?
> >
> > Trying to follow thread
> > http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/e...
> > , why isn't it possible to join after a branch?
> > Is this because the left and right operations are passed null?
> >
> > Here is what I have:
> > a single flat DBF file. I need to import this into a SQL database.
> > SQL has 2 tables Parent 1-N Children 1-N GrandChildren. When the data
> > is imported I need to perform 3 different operations:
> > 1. group the source by field
> > 2. insert new groups into Parent
> > 3. insert children & grand children in DBF that are not in SQL (left
> >    join)
> > 4. update existing children and grand children (inner join, there
> >    will always be the same number of grand children)
> > 5. delete Parents from SQL that do not have any children
> >
> > Operation 1 only applies to operation 2, so I figure this could be a
> > branch.
> > Operations 3 and 4 can also be done independently of one another,
> > again branching.
> > Operations 3 and 4 are dependent on the completion of operation 2.
> > Operation 5 must be executed after 3 and 4 are complete.
> >
> > It looks like Operations 3 & 4 are actually 2 output commands, in
> > which case I must break this logic into two operations and branch together.
> >
> > I would like this to occur within a single transaction/ETL process,
> > but I'm not sure if that's possible, or reasonable?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Rhino Tools Dev" group.
> > To post to this group, send email to rhino-tools-...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > rhino-tools-dev+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/rhino-tools-dev?hl=en.