I have run into a similar issue with Avro files. The solution was to fix the
upstream jobs that were writing data to those directories. In our case, the
writers were not flushed/closed correctly during certain events, which
caused the issue. Fixing those prevented these zero-sized files.

-NS

On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam <
mahender.bigd...@outlook.com> wrote:

> Ping.
>
> On 5/1/2018 3:57 AM, Mahender Sarangam wrote:
>
> Thanks, Thai. I gave the folder name wrongly: it is the same DAY=20180325
> folder, and the zero-byte file has the same name as the folder. In our
> upstream system, the source table is partitioned by date, and whenever a
> table is partitioned we see a zero-byte file. When we create an external
> table with partition columns and run a select query, no data is returned.
> If I manually delete those zero-byte files, we are able to read the data.
>
>
> /Mahender
>
> On 4/28/2018 6:36 AM, Thai Bui wrote:
>
> Your external table references the .../day=201803250 location, which is
> empty. Point your table to the uppercase .../DAY=201803250 location and you
> should be able to read the data there.
>
> Also, it looks like you want an external partitioned table. You'll need to
> create an external table with a PARTITIONED BY clause, then alter the table
> and add a partition for each ../DAY=someday path that you have.
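A minimal HiveQL sketch of that approach, reusing the SerDe and columns from the original Ext_STG1 definition. It assumes the json-serde JAR has already been added with ADD JAR as in the original script. The table name Ext_STG1_Part and the partition column name day are hypothetical, the elided wasb://contain...@ URL is kept as given, and only the one DAY=201803250 directory from this thread is registered; each further DAY=... directory would need its own ADD PARTITION statement.

```sql
-- Hypothetical partitioned variant of Ext_STG1.
-- The partition column must not be repeated in the regular column list.
CREATE EXTERNAL TABLE Ext_STG1_Part (Col1 STRING, Col2 STRING, Col3 STRING)
PARTITIONED BY (day STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/';

-- Register one partition per DAY=... directory, pointing explicitly
-- at the uppercase path that actually holds the .json.gz files.
ALTER TABLE Ext_STG1_Part ADD IF NOT EXISTS PARTITION (day = '201803250')
LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/DAY=201803250/';

-- Filter on the partition column so only that directory is scanned.
SELECT * FROM Ext_STG1_Part WHERE day = '201803250' LIMIT 100;
```

The explicit LOCATION on each partition matters here: without it, Hive would by default place the partition under the table location using the lowercase column name, i.e. the .../day=201803250 path that was empty.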
>
> On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <
> mahender.bigd...@outlook.com> wrote:
>
>> Gentle ping. Please help me with the issue below. Has anyone faced the same issue?
>>
>> On 4/27/2018 1:28 AM, Mahender Sarangam wrote:
>>
>> Hi,
>>
>> Has anyone faced an issue while fetching data from an external table? We
>> are copying data from an upstream system into our S3 storage. As part of
>> the copy, directories are copied along with zero-byte files. The source
>> file format is JSON. Below is the folder hierarchy:
>>
>>
>>  DATE                    --> folder
>>      <DAY=201803250>     --> folder
>>          1.json.gz       --> file
>>          2.json.gz       --> file
>>      <day=201803250>     --> empty, zero-byte files
>>
>> Please find the screenshot below.
>>
>> We are trying to create an external table with the JSON SerDe.
>>
>> ADD JAR wasb://jsonse...@xyz.blob.core.windows.net/json/json-serde-1.3.9.jar;
>> SET hive.mapred.supports.subdirectories=TRUE;
>> SET mapred.input.dir.recursive=TRUE;
>> SET hive.merge.mapfiles = true;
>> SET hive.merge.mapredfiles = true;
>> SET hive.merge.tezfiles = true;
>>
>> DROP TABLE IF EXISTS Ext_STG1;
>> CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String)
>> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
>> WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
>> STORED AS TEXTFILE
>> LOCATION 'wasb://contain...@xyz.blob.core.windows.net/date/day=201803250/'
>> TBLPROPERTIES ('serialization.null.format' = '');
>>
>> select * from Ext_STG1 limit 100;
>>
>>
>> The above query returns empty results.
>>
>>
>> When I delete the zero-byte files, I can see data when selecting from the
>> external table. Is this expected behaviour? Is there any setting for
>> ignoring zero-byte files in a Hive external table?
>>
>>
>> -Mahens
>>
>>
>> --
> Thai
>
>
>
>
