I have run into similar issue with avro files . The solution was to fix upstream jobs that were writing data to those directories . In our case the writers were not flushed/closed correctly during certain events which caused the issue . Fixing those prevented these 0 sized files.
-NS On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam < mahender.bigd...@outlook.com> wrote: > ping.. > > On 5/1/2018 3:57 AM, Mahender Sarangam wrote: > > Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 > (Folder) and same has Filename. actually in our upstream, our source table > is partitioned by Date. Whenever a table is partitioned, we see Zero Byte. > Now when we create external table with partitioned by columns and fire > select query no data is returned. . If I delete manually those files (Zero > Bytes), we were able to read. > > > /Mahender > > On 4/28/2018 6:36 AM, Thai Bui wrote: > > Your external table is referencing the .../day=201803250 location which is > empty. Point your table to the capital .../DAY=201803250 and you should be > able to read the data there. > > Also, it looks like you want external partitioned table. You’ll need to > create an external table with a partition clause, then alter the table and > add partition for each of the ../DAY=someday path that you have. > > On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam < > mahender.bigd...@outlook.com> wrote: > >> Gentle Ping. Please help me on below issue. Has any one faced same issue >> >> On 4/27/2018 1:28 AM, Mahender Sarangam wrote: >> >> Hi, >> >> Can any one faced issue while fetching data from external table. We are >> copying data from upstream system into our storage S3. As part of copy, >> directories along with Zero bytes files are been copied. Source File Format >> is in JSON format. Below is Folder Hierarchy Structure >> >> >> DATE --> <Folder> >> >> <DAY=201803250> ---> Folder >> >> 1.json.gz --> File >> >> 2.json.gz >> >> <day=201803250> ---> Empty Zero Bytes Files. >> >> Please find below screenshot >> >> We are trying to create external table with JSON Serde. >> >> ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json- >> serde-1.3.9.jar; >> SET hive.mapred.supports.subdirectories=TRUE; >> SET mapred.input.dir.recursive=TRUE; >> SET hive.merge.mapfiles = true; >> SET hive.merge.mapredfiles = true; >> SET hive.merge.tezfiles = true; >> >> >> DROP TABLE IF EXISTS Ext_STG1; >> CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) >> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH >> SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = >> "true") >> STORED AS TEXTFILE LOCATION 'wasb://contain...@xyz.blob. >> core.windows.net/date/day=201803250/' TBLPROPERTIES >> ('serialization.null.format' = ''); >> >> select * from Ext_STG1 limit 100; >> >> >> Above Query shows Empty Results. >> >> >> When I delete Zero bytes files, then i could see data from select >> external table. Is this expected behaviour. Is there any setting for >> ignoring Zero bytes files in hive external table >> >> >> -Mahens >> >> >> -- > Thai > > > >