[ https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1537: -------------------------------- Assignee: Daniel Dai Fix Version/s: 0.8.0 Daniel, can we test if this is a problem with 0.8 Viraj, is this data specific and if so can you provide data tp reproduce. Also, do you know which one produces correct results. > Column pruner causes wrong results when using both Custom Store UDF and > PigStorage > ---------------------------------------------------------------------------------- > > Key: PIG-1537 > URL: https://issues.apache.org/jira/browse/PIG-1537 > Project: Pig > Issue Type: Bug > Affects Versions: 0.7.0 > Reporter: Viraj Bhat > Assignee: Daniel Dai > Fix For: 0.8.0 > > > I have script which is of this pattern and it uses 2 StoreFunc's: > {code} > register loader.jar > register piggy-bank/java/build/storage.jar; > %DEFAULT OUTPUTDIR /user/viraj/prunecol/ > ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c); > ss_sc_filtered_0 = FILTER ss_sc_0 BY > a#'id' matches '1.*' OR > a#'id' matches '2.*' OR > a#'id' matches '3.*' OR > a#'id' matches '4.*'; > ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c); > ss_sc_filtered_1 = FILTER ss_sc_1 BY > a#'id' matches '65.*' OR > a#'id' matches '466.*' OR > a#'id' matches '043.*' OR > a#'id' matches '044.*' OR > a#'id' matches '0650.*' OR > a#'id' matches '001.*'; > ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1; > ss_sc_all_proj = FOREACH ss_sc_all GENERATE > a#'query' as query, > a#'testid' as testid, > a#'timestamp' as timestamp, > a, > b, > c; > ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10; > ss_sc_all_map = FOREACH ss_sc_all_ord GENERATE a, b, c; > STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage(); > ss_sc_all_map_count = group ss_sc_all_map all; > count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as > record_count,COUNT($1); > STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009'); > {code} > I run this script using: > a) java -cp pig0.7.jar script.pig > b) java -cp pig0.7.jar -t PruneColumns script.pig > What I observe is that the alias "count" produces the same number of records > but "ss_sc_all_map" have different sizes when run with above 2 options. > Is due to the fact that there are 2 store func's used? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.