How to test the efficiency of multiple join

2016-12-06 Thread mingda li
Dear all, I want to compare the efficiency of different multi-way join orders. However, since Pig queries are executed lazily, I need to use DUMP or STORE to force the query to run. For now, I use the following query to test the efficiency. *Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales

Re: How to test the efficiency of multiple join

2016-12-06 Thread mingda li
Thanks for your quick reply. If so, I can use the LIMIT operator to compare the good and bad join plans. Dumping everything takes a long time. Best, Mingda On Tue, Dec 6, 2016 at 5:23 PM, Zhang, Liyun wrote: > Hi: > I think the query time about multiple join part is not related with the > number of limit

File could only be replicated to 0 nodes, instead of 1

2016-12-06 Thread mingda li
Hi, I am running a multi-way join over 100 GB of TPC-DS data with a bad join order on our cluster. Each time, the log file it returns contains the exception below. Has anyone run into this before? Is it caused by the data exceeding the available disk space? * org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp