[ https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-1537.
---------------------------------
Resolution: Fixed
> Column pruner causes wrong results when using both Custom Store UDF and
> PigStorage
> ----------------------------------------------------------------------------------
>
> Key: PIG-1537
> URL: https://issues.apache.org/jira/browse/PIG-1537
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> I have a script of the following pattern, which uses 2 StoreFuncs:
> {code}
> register loader.jar;
> register piggy-bank/java/build/storage.jar;
> %DEFAULT OUTPUTDIR /user/viraj/prunecol/
> ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);
> ss_sc_filtered_0 = FILTER ss_sc_0 BY
> a#'id' matches '1.*' OR
> a#'id' matches '2.*' OR
> a#'id' matches '3.*' OR
> a#'id' matches '4.*';
> ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c);
> ss_sc_filtered_1 = FILTER ss_sc_1 BY
> a#'id' matches '65.*' OR
> a#'id' matches '466.*' OR
> a#'id' matches '043.*' OR
> a#'id' matches '044.*' OR
> a#'id' matches '0650.*' OR
> a#'id' matches '001.*';
> ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1;
> ss_sc_all_proj = FOREACH ss_sc_all GENERATE
> a#'query' as query,
> a#'testid' as testid,
> a#'timestamp' as timestamp,
> a,
> b,
> c;
> ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10;
> ss_sc_all_map = FOREACH ss_sc_all_ord GENERATE a, b, c;
> STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage();
> ss_sc_all_map_count = group ss_sc_all_map all;
> count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as
> record_count,COUNT($1);
> STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009');
> {code}
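> I run this script in two ways, (a) with the default optimizations and
> (b) with the column pruner turned off via -t PruneColumns: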
> a) java -cp pig0.7.jar script.pig
> b) java -cp pig0.7.jar -t PruneColumns script.pig
> The alias "count" produces the same number of records in both runs, but the
> output of "ss_sc_all_map" differs in size between the two.
> Is this due to the fact that two StoreFuncs are used?
> Viraj