Hi Bill, Thanks for such a clear and concise explanation. If I'm correct, dynamic partitioner is the one explained in the ASPLOS'06 paper, and it also implements software pipelining in addition to filter fusion and fission for task and data parallelization. Is software pipelining working for multicore backend? Based on our results, I guess it's functional, since in many of our tests, we got a sequenced stream graph (no task or data parallel structs), but the resulting program ran faster than the one targeted for uniprocessor.
One more question, we noticed the -standalone option to the Main java method in case of compiling for uniprocessor (no cluster option). What type of partitioner is used for that case, could you explain the difference of that path compared to path with the -cluster x (x could be 1,2,...) option. I'm asking that because I noticed that in the case for uniprocessor target, before-partitions.dot and after-partitions.dot are equal. Regarding your benchmarks, I suppose they were run on the Raw simulator, not the cluster or multicore machine. I'm asking this, because we have been testing a set of apps written in streamit on our side, and found out that a clever selection of -cluster target and greedy/greedier algorithm gave quite better results than dynamic partitioner, but admittedly these were only short inspections, we haven't run a thorough tests. Best regards, Josip ------------------------------------------------ Mr. Sc. Josip Knezovic, Ph.D. student Faculty of Electrical Engineering and Computing Dept. of Control and Computer Engineering Unska 3, 10000 Zagreb, Croatia Tel.: +385 1 6129 619 Fax.: +385 1 6129 785 E-mail: [EMAIL PROTECTED] ------------------------------------------------ -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Thies Sent: Monday, October 27, 2008 6:33 PM To: Josip Knezović Cc: [email protected] Subject: Re: [StreamIt-users] Dynamic-Greedy partitioner difference Hi Josip, For the benefit of the rest of the list, the role of the partitioner is to adjust the granularity of filters in the stream graph to obtain good load balancing on a parallel target. This is done by fusing (combining) and fissing (splitting) filters. The dynamic programming partitioner uses a dynamic programming algorithm to estimate which filters should be fused and fissed. In a pipeline (or splitjoin), it considers all possible allocations of machine resources to all possible sub-segments of the pipeline (or splitjoin). If a splitjoin contains parallel pipelines, then it also considers all possible decompositions of that splitjoin into a pipeline of successive splitjoins. Likewise, it canonicalizes sequences of splitjoins into a single large splitjoin. While the dynamic programming algorithm is "optimal" under a certain formulation, it has important limitations: 1) Once a filter has been fused, it can not subsequently be fissed. This sequence of transformations was very important for obtaining good performance in our latest (and unfortunately unreleased) partitioner, described in ASPLOS'06. 2) Filter fission is disabled on the cluster/multicore backend, due to an incompatibility with the dynamic programming partitioner. 3) As opposed to greedy partitioners, it must only estimate the cost of fusion and fission, rather than inspecting the real cost following fusion or fission. The greedy partitioner performs a simple iterative algorithm. If there are more filters than target processors, it selects the stream structure (pipeline, splitjoin, feedbackloop) that contains the least work in the graph, and it fuses it into a single filter. (Depending on the number of processors remaining, it may fuse only half of the container -- for example, fuse each neighboring filter with itself.) If there are fewer filters than target processors, then it selects the filter with the most work in the graph, and fisses it into multiple filters. The "greedier" partitioner is a more fine-grained version of the greedy partitioner. Rather than selecting the target of transformations based on an entire stream container, it analyzes each pair of adjacent filters in the graph, and fuses the pair that has the smallest amount of work. When fission is needed, it acts analogously to the greedy partitioner. So, which partitioner should you use? Out of the partitioners in the current release, we did a comparison across our benchmark suite a few years ago. Relative to the greedier partitioner, the dynamic programming partitioner performed 1.47x better on average, but only 1.01x better at the median. Thus, approximately half of the benchmarks do better with greedy partitioning, and half with dynaimc programming partitioning. Out of the partitioners in our development version, the best algorithm (from ASPLOS'06) is currently being ported from the Raw backend to the cluster/multicore backend. On Raw, it performs up to 1.84x better than the partitioners above. We hope to release it soon. Please let me know if you have any further questions Best, -Bill On Mon, Oct 27, 2008 at 9:48 PM, Josip Knezović <[EMAIL PROTECTED]> wrote: > Hi all, > > I'm just wondering what's the difference between dynamic partitioner (which > is, if I'm correct, a default one for cluster backend) and a greedy > (greedier) partitioner? > > Thanks, > > Josip Knezovic > > Faculty of EE & Computing > > University of Zagreb, Croatia > > _______________________________________________ > StreamIt-users mailing list > [email protected] > https://lists.csail.mit.edu/mailman/listinfo/streamit-users > > _______________________________________________ StreamIt-users mailing list [email protected] https://lists.csail.mit.edu/mailman/listinfo/streamit-users
