Issue with the number of reducers spawned in a Pig job

rajan sthapit Thu, 28 Jan 2016 11:20:05 -0800

I have a Pig job that looks something like this:

data = LOAD 'database.table' USING org.apache.hive.hcatalog.pig.HCatLoader();

SPLIT data INTO A IF type == 'A', B OTHERWISE;

A = DISTINCT A PARALLEL 200;
B = DISTINCT B PARALLEL 10;

STORE A INTO 'database.output_table' USING
org.apache.hive.hcatalog.pig.HCatStorer('type=A');
STORE B INTO 'database.output_table' USING
org.apache.hive.hcatalog.pig.HCatStorer('type=B');

However, it runs as a single MapReduce job, and both the A and B
partitions end up with 200 files. I want partition A to have 200 files and
partition B to have 10. How can I fix this?
