Hi Bill,
Thanks for such a clear and concise explanation. 

If I'm correct, dynamic partitioner is the one explained in the ASPLOS'06 
paper, and it also implements software pipelining in addition to filter fusion 
and fission for task and data parallelization. Is software pipelining working 
for multicore backend? Based on our results, I guess it's functional, since in 
many of our tests, we got a sequenced stream graph (no task or data parallel 
structs), but the resulting program ran faster than the one targeted for 
uniprocessor.

One more question, we noticed the -standalone option to the Main java method in 
case of compiling for uniprocessor (no cluster option). What type of 
partitioner is used for that case, could you explain the difference of that 
path compared to path with the -cluster x (x could be 1,2,...) option. I'm 
asking that because I noticed that in the case for uniprocessor target, 
before-partitions.dot and after-partitions.dot are equal.

Regarding your benchmarks, I suppose they were run on the Raw simulator, not 
the cluster or multicore machine. I'm asking this, because we have been testing 
a set of apps written in streamit on our side, and found out that a clever 
selection of -cluster target and greedy/greedier algorithm gave quite better 
results than dynamic partitioner, but admittedly these were only short 
inspections, we haven't run a thorough tests.


Best regards,

Josip

------------------------------------------------ 
Mr. Sc. Josip Knezovic, Ph.D. student 
Faculty of Electrical Engineering and Computing
Dept. of Control and Computer Engineering

Unska 3, 10000 Zagreb, Croatia 
Tel.: +385 1 6129 619 
Fax.: +385 1 6129 785 
E-mail: [EMAIL PROTECTED] 
------------------------------------------------ 


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Thies
Sent: Monday, October 27, 2008 6:33 PM
To: Josip Knezović
Cc: [email protected]
Subject: Re: [StreamIt-users] Dynamic-Greedy partitioner difference

Hi Josip,

For the benefit of the rest of the list, the role of the partitioner
is to adjust the granularity of filters in the stream graph to obtain
good load balancing on a parallel target.  This is done by fusing
(combining) and fissing (splitting) filters.

The dynamic programming partitioner uses a dynamic programming
algorithm to estimate which filters should be fused and fissed.  In a
pipeline (or splitjoin), it considers all possible allocations of
machine resources to all possible sub-segments of the pipeline (or
splitjoin).  If a splitjoin contains parallel pipelines, then it also
considers all possible decompositions of that splitjoin into a
pipeline of successive splitjoins.  Likewise, it canonicalizes
sequences of splitjoins into a single large splitjoin.

While the dynamic programming algorithm is "optimal" under a certain
formulation, it has important limitations:

1) Once a filter has been fused, it can not subsequently be fissed.
This sequence of transformations was very important for obtaining good
performance in our latest (and unfortunately unreleased) partitioner,
described in ASPLOS'06.

2) Filter fission is disabled on the cluster/multicore backend, due to
an incompatibility with the dynamic programming partitioner.

3) As opposed to greedy partitioners, it must only estimate the cost
of fusion and fission, rather than inspecting the real cost following
fusion or fission.

The greedy partitioner performs a simple iterative algorithm.  If
there are more filters than target processors, it selects the stream
structure (pipeline, splitjoin, feedbackloop) that contains the least
work in the graph, and it fuses it into a single filter.  (Depending
on the number of processors remaining, it may fuse only half of the
container -- for example, fuse each neighboring filter with itself.)
If there are fewer filters than target processors, then it selects the
filter with the most work in the graph, and fisses it into multiple
filters.

The "greedier" partitioner is a more fine-grained version of the
greedy partitioner.  Rather than selecting the target of
transformations based on an entire stream container, it analyzes each
pair of adjacent filters in the graph, and fuses the pair that has the
smallest amount of work.  When fission is needed, it acts analogously
to the greedy partitioner.

So, which partitioner should you use?  Out of the partitioners in the
current release, we did a comparison across our benchmark suite a few
years ago.  Relative to the greedier partitioner, the dynamic
programming partitioner performed 1.47x better on average, but only
1.01x better at the median.  Thus, approximately half of the
benchmarks do better with greedy partitioning, and half with dynaimc
programming partitioning.

Out of the partitioners in our development version, the best algorithm
(from ASPLOS'06) is currently being ported from the Raw backend to the
cluster/multicore backend.  On Raw, it performs up to 1.84x better
than the partitioners above.  We hope to release it soon.

Please let me know if you have any further questions

Best,
-Bill

On Mon, Oct 27, 2008 at 9:48 PM, Josip Knezović <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm just wondering what's the difference between dynamic partitioner (which
> is, if I'm correct, a default one for cluster backend) and a greedy
> (greedier) partitioner?
>
> Thanks,
>
> Josip Knezovic
>
> Faculty of EE & Computing
>
> University of Zagreb, Croatia
>
> _______________________________________________
> StreamIt-users mailing list
> [email protected]
> https://lists.csail.mit.edu/mailman/listinfo/streamit-users
>
>

_______________________________________________
StreamIt-users mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/streamit-users

Reply via email to