Thanks Elliot. Nice Christmas present. Those settings in that Stack Overflow link look to me to be exactly what I need to set for MR jobs to pick up the data that Tez created.
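For the archive, here is a sketch of the kind of session settings usually suggested for reading a table whose data sits in subdirectories. The property names below are an assumption about what the linked answer recommends, not a quote from it, so check them against your Hive and Hadoop versions before relying on them.

```sql
-- Assumed settings: commonly cited for this problem, not copied from the linked answer.

-- Let Hive accept subdirectories under a table location as table data.
SET hive.mapred.supports.subdirectories=true;

-- Have the MapReduce input format descend into subdirectories (older property name).
SET mapred.input.dir.recursive=true;

-- Newer Hadoop name for the same property.
SET mapreduce.input.fileinputformat.input.dir.recursive=true;

-- Sanity check, assuming the table is staging.tmp_pv_v4c__loc_4 as the HDFS path suggests.
SELECT count(*) FROM staging.tmp_pv_v4c__loc_4;
```

If the count is still zero with these set in the same session, the subdirectories are probably not the whole story and hive.log is the next place to look.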

Cheers,
Stephen.

On Sun, Dec 25, 2016 at 2:45 AM, Elliot West <tea...@gmail.com> wrote:

> I believe that tez will generate subfolders for unioned data. As far as I
> know, this is the expected behaviour and there is no alternative.
> Presumably this is to prevent multiple tasks from attempting to write the
> same file?
>
> We've experienced issues when switching from mr to tez; downstream jobs
> weren't expecting subfolders and had trouble reading previously accessible
> datasets.
>
> Apparently there are workarounds within Hive:
> http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data
>
> Merry Christmas,
>
> Elliot.
>
> On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamo...@apache.org>
> wrote:
>
>> Are there any exceptions in hive.log? Is tmp_pv_v4* table part of the
>> select query?
>>
>> Assuming you are creating the table in staging.db, it would have created
>> the table location as staging.db/foo (as you have not specified the
>> location).
>>
>> Adding user@hive.apache.org as this is hive related.
>>
>> ~Rajesh.B
>>
>> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <sprag...@gmail.com>
>> wrote:
>>
>>> all,
>>>
>>> i'm running tez with the sql pattern:
>>>
>>> * create table foo as select * from (select... UNION select... UNION
>>> select...)
>>>
>>> in the logs the final step is this:
>>>
>>> * Moving data to directory
>>> hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>>> from
>>> hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002
>>>
>>> when querying the table i got zero rows returned which made me curious.
>>> so i queried the hdfs location and see this:
>>>
>>> $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>>>
>>> Found 3 items
>>> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
>>> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
>>> drwxrwxrwx - dwr supergroup 0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3
>>>
>>> and yes the data files are under these three dirs.
>>>
>>> so i ask... i'm not used to seeing sub-directories under the tablename
>>> unless the table is partitioned. is this legit? might there be some config
>>> settings i need to set to see this data via sql?
>>>
>>> thanks,
>>> Stephen.