Exactly. When choosing between the two, just try both and measure whether the multi-threaded executor gives you any advantage; it's not obvious in advance.
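To make the difference concrete, here is a rough sketch using only standard .NET types. It is not the Rhino ETL executer code: OperationA, OperationB, Eagerly and Time are made-up names, the 100 ms sleeps only simulate intensive work, and it assumes a runtime recent enough to have Task.Run. The first pipeline is pulled lazily from the tail on one thread; the second iterates A on a thread-pool thread and buffers its rows so that B finds them ready when it asks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class PipelineSketch
{
    // Hypothetical stand-in for operation A: slow to produce each row.
    static IEnumerable<int> OperationA(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            Thread.Sleep(100); // pretend A does something intensive
            Console.WriteLine($"A produced row {i}");
            yield return i;
        }
    }

    // Hypothetical stand-in for operation B: slow to process each row.
    static IEnumerable<int> OperationB(IEnumerable<int> rows)
    {
        foreach (var row in rows)
        {
            Thread.Sleep(100); // pretend B does something intensive
            Console.WriteLine($"B processed row {row}");
            yield return row * 10;
        }
    }

    // Starts iterating the source immediately on a thread-pool thread and
    // caches the rows, so the consumer only blocks when the buffer is empty.
    // Exceptions thrown by the source are not propagated in this sketch.
    static IEnumerable<int> Eagerly(IEnumerable<int> source)
    {
        var buffer = new BlockingCollection<int>();
        Task.Run(() =>
        {
            try
            {
                foreach (var row in source) buffer.Add(row);
            }
            finally
            {
                buffer.CompleteAdding(); // lets the consumer's loop finish
            }
        });
        return buffer.GetConsumingEnumerable();
    }

    // Drains a pipeline and reports how long it took.
    static void Time(string label, IEnumerable<int> pipeline)
    {
        var watch = Stopwatch.StartNew();
        foreach (var result in pipeline) { }
        Console.WriteLine($"{label}: {watch.ElapsedMilliseconds} ms");
    }

    static void Main()
    {
        Time("pulled lazily", OperationB(OperationA(5)));
        Time("iterated eagerly", OperationB(Eagerly(OperationA(5))));
    }
}
```

In the lazy run the "A produced" and "B processed" lines strictly alternate, because A only advances when B asks for the next row. In the eager run the interleaving depends on thread scheduling, and whether the total time actually improves depends on how expensive each operation is, which is why measuring on your own process is the only reliable answer.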
On Fri, Jan 8, 2010 at 15:32, Jason Meckley <jasonmeck...@gmail.com> wrote:
> At first Simone's response did not make any sense. If B is dependent
> on A, B cannot do anything until A has processed at least 1 row. once
> the row is processed it's passed to the next operator.
>
> But giving this some thought I think I see the value now.
> If operations B and A are run in parallel and B depends on A.
> A could have completed processing rows 1 - 10 on Thread[1] while B may
> still be processing row 2 on Thread[2].
> In a single threaded pipeline A would have only iterated rows 1 and 2
> because operation B is still processing row 2.
>
> Now things make sense :)
>
> It would seem then, that Thread Pool would be overkill, maybe even
> more resource intensive, on a process which is either (1) dealing with
> a small number of records or (2) is executing some very simple
> operations.
>
> On Jan 8, 9:09 am, Simone Busoli <simone.bus...@gmail.com> wrote:
> > Yes, more or less. The multi threaded case means just that each operation is
> > executed as soon as the process starts executing. In the single threaded
> > case an operation is not executed (that is, its rows iterated) until the
> > operation which comes next in the pipeline starts asking for them.
> >
> > The reason why it makes sense is the same as when you are executing an
> > operation asynchronously, and while you're waiting for that operation to
> > complete you do something else. Say that I am operation B, which needs to
> > perform some intensive computation on each of the rows which come from
> > operation A, which in turn does something intensive before being able to
> > provide me a row. If I execute them in a single thread then operation B
> > starts executing and asks A for its first row. A starts executing and B has
> > to wait until A provides the row. What if as soon as I (B) ask A for its row
> > A has one ready to give me?
> > This happens in the multi threaded case because
> > each operation starts executing (and caches its result) even before the
> > following operation in the pipeline asks for the first row. That's what I
> > meant with executing eagerly.
> >
> > On Fri, Jan 8, 2010 at 14:57, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > OK I think this is starting to come together. here is an example:
> > > Operations: One, Two, Three
> > > Rows: a, b, c, d
> > >
> > > Single Threaded:
> > > [Thread 1] One a, Two a, Three a
> > > [Thread 1] One b, Two b, Three b
> > > [Thread 1] One c, Two c, Three c
> > > [Thread 1] One d, Two d, Three d
> > >
> > > Thread Pool:
> > > [Thread 1] One a, One b, One c, One d
> > > [Thread 2] Two a, Two b, Two c, Two d
> > > [Thread 3] Three a, Three b, Three c, Three d
> > >
> > > I'm not seeing the value with Thread Pool though, if I understand this
> > > correctly. Is the idea of Thread Pool like a one way messaging? The
> > > main thread would call Process.Execute(); and the rows would be
> > > handled on background thread and the Main thread would continue with
> > > processing?
> > >
> > > On Jan 8, 2:32 am, Simone Busoli <simone.bus...@gmail.com> wrote:
> > > > inline
> > > >
> > > > On Fri, Jan 8, 2010 at 03:08, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > > > Thanks Simone makes sense. One thing I'm a bit cloudy on is this
> > > > > statement:
> > > > > "you're no longer pulling from the tail but eagerly execute each
> > > > > operation which will cache the result and feed it to the following
> > > > > operation
> > > > > as soon as it asks for it. "
> > > > >
> > > > > I understand how yield works. I don't understand what you mean by
> > > > > "tail" and "eagerly executing". I'm assuming all operations of an
> > > > > EtlProcess are either Single Threaded or Thread Pooled. Can you
> > > > > selectively change which Pipeline Executor is used per operation.
> > > > > If so, what would be the benefit of putting some operations on a single
> > > > > thread, and others on the thread pool queue?
> > > >
> > > > This thread explains how it works:
> > > > http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/c...
> > > > BTW, I suggest you use the SingleThreaded one and debug a test like that.
> > > > You'll realize how it works, and then just take a look at the
> > > > implementations on the two pipeline executors, which is a
> > > > single overridden method.
> > > >
> > > > > When you used branching as an insert, update or delete how did you
> > > > > separate the rows? I'm assuming you had a validation clause in each
> > > > > operation checking this flag.
> > > >
> > > > I just added a new entry in each row with key, say "action", and value
> > > > either "toInsert", "toDelete" or "toUpdate". Then branched on them and each
> > > > of the three branches just performed its own action on the rows having the
> > > > action they knew about.
> > > >
> > > > > On Jan 7, 5:27 pm, Simone Busoli <simone.bus...@gmail.com> wrote:
> > > > > > Hi Jason, inline
> > > > > >
> > > > > > On Thu, Jan 7, 2010 at 21:30, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > > > > > I am experimenting with ETL and I have a few questions. So in no
> > > > > > > particular order.
> > > > > > >
> > > > > > > 1. Branching
> > > > > > > I'm trying to understand branching. Branching allows you to copy the
> > > > > > > row into multiple operations. Each operation can manipulate the row
> > > > > > > independent of the other operations on the branch(sending it to
> > > > > > > separate outputs, aggregating the data differently, etc.), correct?
> > > > > > > The test for BranchingOperation repeats the FibonacciBulkInsert
> > > > > > > operation a specific number of times, but I don't readily see the
> > > > > > > value since each operation is the same, other than low memory
> > > > > > > consumption.
> > > > > >
> > > > > > branching will feed clones of its input rows to all the operations it
> > > > > > branches on. I can give you a use case in which I adopted it. I had a list
> > > > > > of rows and had to decide whether each row was to delete, insert or update.
> > > > > > In an operation I marked each row with a marker attribute as a field in the
> > > > > > row itself which said whether the row was to delete, insert or update, then
> > > > > > I branched the rows into three branches, one to deal with rows to delete,
> > > > > > another with rows to insert and the other one with those to update. Each
> > > > > > branch dealt only and performed actions just on the rows whose marker
> > > > > > attribute has the value it knew about (again, delete, insert, update)
> > > > > >
> > > > > > > 2. Pipeline Executors
> > > > > > > Is the ThreadPoolPipelineExecuter designed to execute multiple ETL
> > > > > > > Processes simultaneously? At first I thought it would execute each row
> > > > > > > on a separate thread, but that doesn't seem to be the case. So now I'm
> > > > > > > thinking It would be used like this:
> > > > > > > new EtlProcessOne { PipelineExecuter = new ThreadPoolPipelineExecuter() }.Execute();
> > > > > > > new EtlProcessTwo { PipelineExecuter = new ThreadPoolPipelineExecuter() }.Execute();
> > > > > > > Where One and Two are executed on separate threads in parallel rather
> > > > > > > than on the same thread in serial. So in theory One may fail, but Two
> > > > > > > could pass. If they use the SingleThreadedPipelineExecuter and One
> > > > > > > fails, Two would not execute.
> > > > > >
> > > > > > The purpose of the multi threaded one is to eagerly iterate the rows of each
> > > > > > operation using a thread from the thread pool, thus improving performance
> > > > > > since you're no longer pulling from the tail but eagerly execute each
> > > > > > operation which will cache the result and feed it to the following operation
> > > > > > as soon as it asks for it.
> > > > > >
> > > > > > > 3. Magic Strings
> > > > > > > I understand why a Row is basically a HashTable, but this makes for
> > > > > > > laborious code. I found the convenience methods ToObject<T>,
> > > > > > > FromObject, etc. but wanted a way to easily merge rows. I didn't
> > > > > > > readily see any options, so I built an extension method.
> > > > > > > public static class RowExtensions
> > > > > > > {
> > > > > > >     public static void MergeWith(this Row row, Row other)
> > > > > > >     {
> > > > > > >         foreach(var key in other.Keys)
> > > > > > >         {
> > > > > > >             row[key] = other[key];
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > here is the usage in terms of an AggregationOperation.
> > > > > > > internal class AggregateData : AbstractAggregationOperation
> > > > > > > {
> > > > > > >     protected override void Accumulate(Row row, Row aggregate)
> > > > > > >     {
> > > > > > >         var source = row.ToObject<Source>();
> > > > > > >         var destination = aggregate.ToObject<Destination>();
> > > > > > >
> > > > > > >         destination.Grouping = source.ColumnToGroupBy;
> > > > > > >         destination.Total += source.ColumnToSum;
> > > > > > >
> > > > > > >         aggregate.MergeWith(Row.FromObject(destination));
> > > > > > >     }
> > > > > > >
> > > > > > >     protected override string[] GetColumnsToGroupBy()
> > > > > > >     {
> > > > > > >         return new[] { "ColumnToGroupBy" };
> > > > > > >     }
> > > > > > > }
> > > > > > > This isn't out of line is it?
> > > > > >
> > > > > > I don't think it's out of line, whatever is working for you then it makes
> > > > > > sense.
> > > > > >
> > > > > > > All in all I'm excited about the possibilities with this tool!
--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to rhino-tools-...@googlegroups.com.
To unsubscribe from this group, send email to rhino-tools-dev+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.