Exactly. When choosing between the two, just try them both and measure whether
the multi-threaded executor gives any advantage; it's not obvious in advance.
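
To make the comparison concrete, here is a minimal, self-contained sketch of the two execution styles discussed in this thread. It does not use Rhino ETL at all: OperationA, OperationB, RunSingleThreaded and RunEager are hypothetical stand-ins, the Thread.Sleep calls simulate per-row work, and the BlockingCollection buffer is only an assumed analogue of the caching the thread pool executer does, not its actual implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PipelineComparison
{
    // Hypothetical operation A: takes a while to produce each row.
    public static IEnumerable<int> OperationA(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            Thread.Sleep(10); // simulated work per row
            yield return i;
        }
    }

    // Hypothetical operation B: takes a while to process each row from A.
    public static IEnumerable<int> OperationB(IEnumerable<int> rows)
    {
        foreach (var row in rows)
        {
            Thread.Sleep(10); // simulated work per row
            yield return row * 2;
        }
    }

    // Lazy pull: A produces a row only when B asks for the next one.
    public static List<int> RunSingleThreaded(int count)
    {
        return OperationB(OperationA(count)).ToList();
    }

    // Eager: A runs on a pool thread and buffers rows, so B usually finds
    // a row already waiting when it asks for one.
    public static List<int> RunEager(int count)
    {
        using (var buffer = new BlockingCollection<int>())
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (var row in OperationA(count))
                        buffer.Add(row);
                }
                finally
                {
                    buffer.CompleteAdding(); // lets the consumer finish
                }
            });

            var result = OperationB(buffer.GetConsumingEnumerable()).ToList();
            producer.Wait(); // surfaces any exception thrown by A
            return result;
        }
    }

    public static void Main()
    {
        var watch = Stopwatch.StartNew();
        var lazy = RunSingleThreaded(20);
        Console.WriteLine("single threaded: {0} ms", watch.ElapsedMilliseconds);

        watch.Restart();
        var eager = RunEager(20);
        Console.WriteLine("eager:           {0} ms", watch.ElapsedMilliseconds);

        Console.WriteLine("same rows: {0}", lazy.SequenceEqual(eager));
    }
}
```

Both variants return the same rows in the same order; only the timing differs, and by how much depends on how expensive each operation really is. With cheap operations or few rows the extra thread and buffer may not pay for themselves, which is why measuring is the only reliable way to choose.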

On Fri, Jan 8, 2010 at 15:32, Jason Meckley <jasonmeck...@gmail.com> wrote:

> At first Simone's response did not make any sense. If B is dependent
> on A, B cannot do anything until A has processed at least 1 row. Once
> the row is processed it's passed to the next operation.
>
> But giving this some thought I think I see the value now.
> If operations A and B are run in parallel and B depends on A,
> A could have completed processing rows 1 - 10 on Thread[1] while B may
> still be processing row 2 on Thread[2].
> In a single threaded pipeline A would have only iterated rows 1 and 2
> because operation B is still processing row 2.
>
> Now things make sense :)
>
> It would seem then, that Thread Pool would be overkill, maybe even
> more resource intensive, on a process which is either (1) dealing with
> a small number of records or (2) is executing some very simple
> operations.
>
> On Jan 8, 9:09 am, Simone Busoli <simone.bus...@gmail.com> wrote:
> > Yes, more or less. The multi threaded case means just that each operation
> > is executed as soon as the process starts executing. In the single threaded
> > case an operation is not executed (that is, its rows iterated) until the
> > operation which comes next in the pipeline starts asking for them.
> >
> > The reason why it makes sense is the same as when you are executing an
> > operation asynchronously, and while you're waiting for that operation to
> > complete you do something else. Say that I am operation B, which needs to
> > perform some intensive computation on each of the rows which come from
> > operation A, which in turn does something intensive before being able to
> > provide me a row. If I execute them in a single thread then operation B
> > starts executing and asks A for its first row. A starts executing and B
> > has to wait until A provides the row. What if as soon as I (B) ask A for
> > its row A has one ready to give me? This happens in the multi threaded
> > case because each operation starts executing (and caches its result) even
> > before the following operation in the pipeline asks for the first row.
> > That's what I meant by executing eagerly.
> >
> > > On Fri, Jan 8, 2010 at 14:57, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > > OK, I think this is starting to come together. Here is an example:
> > > Operations: One, Two, Three
> > > Rows: a, b, c, d
> >
> > > Single Threaded:
> > >     [Thread 1] One a, Two a, Three a
> > >     [Thread 1] One b, Two b, Three b
> > >     [Thread 1] One c, Two c, Three c
> > >     [Thread 1] One d, Two d, Three d
> >
> > > Thread Pool:
> > >     [Thread 1] One a, One b, One c, One d
> > > >     [Thread 2] Two a, Two b, Two c, Two d
> > >     [Thread 3] Three a, Three b, Three c, Three d
> >
> > > I'm not seeing the value with Thread Pool though, if I understand this
> > > correctly. Is the idea of Thread Pool like a one way messaging? The
> > > main thread would call Process.Execute(); and the rows would be
> > > handled on background thread and the Main thread would continue with
> > > processing?
> >
> > > On Jan 8, 2:32 am, Simone Busoli <simone.bus...@gmail.com> wrote:
> > > > inline
> >
> > > > > On Fri, Jan 8, 2010 at 03:08, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > > > > Thanks Simone, that makes sense. One thing I'm a bit cloudy on is this
> > > > > > statement:
> > > > > > "you're no longer pulling from the tail but eagerly execute each
> > > > > > operation which will cache the result and feed it to the following
> > > > > > operation as soon as it asks for it."
> > >
> > > > > > I understand how yield works. I don't understand what you mean by
> > > > > > "tail" and "eagerly executing". I'm assuming all operations of an
> > > > > > EtlProcess are either Single Threaded or Thread Pooled. Can you
> > > > > > selectively change which Pipeline Executor is used per operation? If
> > > > > > so, what would be the benefit of putting some operations on a single
> > > > > > thread, and others on the thread pool queue?
> >
> > > > > This thread explains how it works:
> > > > > http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/c...
> > > > > BTW, I suggest you use the SingleThreaded one and debug a test like that.
> > > > > You'll realize how it works, and then just take a look at the
> > > > > implementations of the two pipeline executors, each of which is a
> > > > > single overridden method.
> >
> > > > > When you used branching as an insert, update or delete how did you
> > > > > separate the rows? I'm assuming you had a validation clause in each
> > > > > operation checking this flag.
> >
> > > > > I just added a new entry in each row with key, say, "action", and value
> > > > > either "toInsert", "toDelete" or "toUpdate". Then I branched on them and
> > > > > each of the three branches just performed its own action on the rows
> > > > > having the action they knew about.
> >
> > > > > On Jan 7, 5:27 pm, Simone Busoli <simone.bus...@gmail.com> wrote:
> > > > > > Hi Jason, inline
> >
> > > > > > > On Thu, Jan 7, 2010 at 21:30, Jason Meckley <jasonmeck...@gmail.com> wrote:
> > > > > > > > I am experimenting with ETL and I have a few questions. So, in no
> > > > > > > > particular order:
> >
> > > > > > > > 1. Branching
> > > > > > > > I'm trying to understand branching. Branching allows you to copy the
> > > > > > > > row into multiple operations. Each operation can manipulate the row
> > > > > > > > independently of the other operations on the branch (sending it to
> > > > > > > > separate outputs, aggregating the data differently, etc.), correct?
> > > > > > > > The test for BranchingOperation repeats the FibonacciBulkInsert
> > > > > > > > operation a specific number of times, but I don't readily see the
> > > > > > > > value since each operation is the same, other than low memory
> > > > > > > > consumption.
> >
> > > > > > > Branching will feed clones of its input rows to all the operations it
> > > > > > > branches on. I can give you a use case in which I adopted it. I had a
> > > > > > > list of rows and had to decide whether each row was to be deleted,
> > > > > > > inserted or updated. In an operation I marked each row with a marker
> > > > > > > attribute, stored as a field in the row itself, which said whether the
> > > > > > > row was to be deleted, inserted or updated. Then I branched the rows
> > > > > > > into three branches, one to deal with rows to delete, another with rows
> > > > > > > to insert and the other one with those to update. Each branch performed
> > > > > > > its action only on the rows whose marker attribute had the value it
> > > > > > > knew about (again, delete, insert, update).
> >
> > > > > > > > 2. Pipeline Executors
> > > > > > > > Is the ThreadPoolPipelineExecuter designed to execute multiple ETL
> > > > > > > > Processes simultaneously? At first I thought it would execute each row
> > > > > > > > on a separate thread, but that doesn't seem to be the case. So now I'm
> > > > > > > > thinking it would be used like this:
> > > > > > > >       new EtlProcessOne { PipelineExecuter = new ThreadPoolPipelineExecuter() }.Execute();
> > > > > > > >       new EtlProcessTwo { PipelineExecuter = new ThreadPoolPipelineExecuter() }.Execute();
> > > > > > > > Where One and Two are executed on separate threads in parallel rather
> > > > > > > > than on the same thread in serial. So in theory One may fail, but Two
> > > > > > > > could pass. If they use the SingleThreadedPipelineExecuter and One
> > > > > > > > fails, Two would not execute.
> >
> > > > > > > The purpose of the multi threaded one is to eagerly iterate the rows of
> > > > > > > each operation using a thread from the thread pool, thus improving
> > > > > > > performance since you're no longer pulling from the tail but eagerly
> > > > > > > execute each operation which will cache the result and feed it to the
> > > > > > > following operation as soon as it asks for it.
> >
> > > > > > > > 3. Magic Strings
> > > > > > > > I understand why a Row is basically a HashTable, but this makes for
> > > > > > > > laborious code. I found the convenience methods ToObject<T>,
> > > > > > > > FromObject, etc. but wanted a way to easily merge rows. I didn't
> > > > > > > > readily see any options, so I built an extension method.
> > > > > > >    public static class RowExtensions
> > > > > > >    {
> > > > > > >        public static void MergeWith(this Row row, Row other)
> > > > > > >        {
> > > > > > >            foreach(var key in other.Keys)
> > > > > > >            {
> > > > > > >                row[key] = other[key];
> > > > > > >            }
> > > > > > >        }
> > > > > > >    }
> >
> > > > > > > > Here is the usage in terms of an AggregationOperation.
> > > > > > > >    internal class AggregateData : AbstractAggregationOperation
> > > > > > > >    {
> > > > > > > >        protected override void Accumulate(Row row, Row aggregate)
> > > > > > >        {
> > > > > > >            var source = row.ToObject<Source>();
> > > > > > >            var destination = aggregate.ToObject<Destination>();
> >
> > > > > > >            destination.Grouping = source.ColumnToGroupBy;
> > > > > > >            destination.Total += source.ColumnToSum;
> >
> > > > > > >            aggregate.MergeWith(Row.FromObject(destination));
> > > > > > >        }
> >
> > > > > > >        protected override string[] GetColumnsToGroupBy()
> > > > > > >        {
> > > > > > >            return new[] { "ColumnToGroupBy" };
> > > > > > >        }
> > > > > > >    }
> > > > > > > This isn't out of line is it?
> >
> > > > > > > I don't think it's out of line; whatever works for you makes
> > > > > > > sense.
> >
> > > > > > > All in all I'm excited about the possibilities with this tool!
> >
> > > > > > > --
> > > > > > > You received this message because you are subscribed to the
> Google
> > > > > Groups
> > > > > > > "Rhino Tools Dev" group.
> > > > > > > To post to this group, send email to
> > > rhino-tools-...@googlegroups.com.
> > > > > > > To unsubscribe from this group, send email to
> > > > > > > rhino-tools-dev+unsubscr...@googlegroups.com<rhino-tools-dev%2bunsubscr...@googlegroups.com>
> <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> >
> > > <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> <rhino-tools-dev%252bunsubscr...@googlegroups.com<rhino-tools-dev%25252bunsubscr...@googlegroups.com>
> >
> >
> > > > > <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> <rhino-tools-dev%252bunsubscr...@googlegroups.com<rhino-tools-dev%25252bunsubscr...@googlegroups.com>
> >
> > > <rhino-tools-dev%252bunsubscr...@googlegroups.com<rhino-tools-dev%25252bunsubscr...@googlegroups.com>
> <rhino-tools-dev%25252bunsubscr...@googlegroups.com<rhino-tools-dev%2525252bunsubscr...@googlegroups.com>
> >
> >
> > > > > > > .
> > > > > > > For more options, visit this group at
> > > > > > >http://groups.google.com/group/rhino-tools-dev?hl=en.
> >
> > > > > --
> > > > > You received this message because you are subscribed to the Google
> > > Groups
> > > > > "Rhino Tools Dev" group.
> > > > > To post to this group, send email to
> rhino-tools-...@googlegroups.com.
> > > > > To unsubscribe from this group, send email to
> > > > > rhino-tools-dev+unsubscr...@googlegroups.com<rhino-tools-dev%2bunsubscr...@googlegroups.com>
> <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> >
> > > <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> <rhino-tools-dev%252bunsubscr...@googlegroups.com<rhino-tools-dev%25252bunsubscr...@googlegroups.com>
> >
> >
> > > > > .
> > > > > For more options, visit this group at
> > > > >http://groups.google.com/group/rhino-tools-dev?hl=en.
> >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "Rhino Tools Dev" group.
> > > To post to this group, send email to rhino-tools-...@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > rhino-tools-dev+unsubscr...@googlegroups.com<rhino-tools-dev%2bunsubscr...@googlegroups.com>
> <rhino-tools-dev%2bunsubscr...@googlegroups.com<rhino-tools-dev%252bunsubscr...@googlegroups.com>
> >
> > > .
> > > For more options, visit this group at
> > >http://groups.google.com/group/rhino-tools-dev?hl=en.
> >
>
--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to rhino-tools-...@googlegroups.com.
To unsubscribe from this group, send email to rhino-tools-dev+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.