LIMIT 4 would make the join stop processing after 4 records. It is not a
good idea to add it if you are testing the performance of the join.
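
If dumping the whole result is too slow, one option (a sketch I have not run
against your tables) is to aggregate the join output down to a single row, so
the join still has to process every record but only one number is written
out. The aliases all_rows and row_count are just illustrative names, and the
relations are the ones from your query:

```pig
-- Same two joins as in the original query, with no LIMIT.
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
    catalog_returns BY (cr_item_sk, cr_order_number);

-- Collapse the full join result into one group and count its rows.
-- (all_rows and row_count are hypothetical alias names.)
all_rows = GROUP Bad_OrderRes ALL;
row_count = FOREACH all_rows GENERATE COUNT_STAR(Bad_OrderRes);

-- Only a single row is printed, but the whole join has to run first.
DUMP row_count;
```

The measured time then includes the extra group-and-count step, so it is only
meaningful for comparing join orders if both plans are timed the same way.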

On Tue, Dec 6, 2016 at 8:13 PM mingda li <limingda1...@gmail.com> wrote:

> Thanks for your quick reply. If so, I can use the limit operator to compare
> good and bad join plans. It takes time to dump everything.
>
>
>
> Bests,
>
> Mingda
>
>
>
>
>
> On Tue, Dec 6, 2016 at 5:23 PM, Zhang, Liyun <liyun.zh...@intel.com>
> wrote:
>
>
>
> > Hi:
>
> > I don't think the execution time of the multiple-join part depends on the
> > number given to the limit operator (4 in your case). When the query is
> > executed, limit_data runs after Bad_OrderRes: only once the join
> > (Bad_OrderRes) has finished does the limit (limit_data) start.
> >
> > If I have missed something, please tell me.
>
> >
>
> >
>
> > Best Regards
>
> > Kelly Zhang/Zhang,Liyun
>
> >
>
> >
>
> >
>
> > -----Original Message-----
>
> > From: mingda li [mailto:limingda1...@gmail.com]
>
> > Sent: Wednesday, December 7, 2016 8:18 AM
>
> > To: d...@pig.apache.org; user@pig.apache.org
>
> > Subject: How to test the efficiency of multiple join
>
> >
>
> > Dear all,
>
> >
>
> > I want to test the efficiency of different multiple-join orders. However,
> > since a Pig query is executed lazily, I need to use dump or store to
> > make the query execute.
>
> >
>
> > Now, I use the following query to test the efficiency.
>
> >
>
> > Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> > Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
> >     catalog_returns BY (cr_item_sk, cr_order_number);
> > limit_data = LIMIT Bad_OrderRes 4;
> > DUMP limit_data;
>
> >
>
> > Do you think it is OK to show just 4 of the results? Could this query's
> > execution time represent the efficiency of the multiple join? I am not sure
> > whether it will just get 4 items and stop without processing the rest.
>
> >
>
> > Bests,
>
> > Mingda
>
> >
>
>
