By the way, just for clarification, these queries are used for gathering performance data.
Zheng

From: Zheng Shao
Sent: Monday, June 22, 2009 10:37 PM
To: 'email@example.com'
Subject: asking for comments on benchmark queries

Hi Pig team,

We'd like to get your feedback on a set of queries we implemented on Pig. We've attached the Hadoop configuration and Pig queries to this email. We start the queries by issuing "pig xxx.pig".

The queries are from a SIGMOD'2009 paper. More details are at https://issues.apache.org/jira/browse/HIVE-396 (Shall we open a JIRA on PIG for this?)

One improvement is that we are going to change Hadoop to use LZO as the intermediate compression algorithm very soon. Previously we used gzip for all performance tests, including Hadoop, Hive and Pig.

The reason we specify the number of reducers in the query is to match the number of reducers that Hive automatically suggested. Please let us know the best way to set the number of reducers in Pig.

Are there any other improvements we can make to the Pig queries and the Hadoop configuration?

Thanks,
Zheng