Re: HFile is empty if kylin.hbase.cluster.fs is set to s3

ShaoFeng Shi Thu, 10 Aug 2017 07:42:38 -0700

How about leaving "kylin.hbase.cluster.fs" empty? This property is for
two-cluster deployments (one Hadoop cluster for cube build, the other for
query).

When it is empty, the HFile will be written to the default fs (HDFS in EMR)
and then loaded into HBase. I'm not sure whether EMR HBase (using S3 as
storage) can bulk load files from HDFS. If it can, that would be great, as
the write performance of HDFS would be better than that of S3.
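
Whichever file system ends up holding the HFiles, the log quoted further
down suggests the bulk load only picks up files that sit in subdirectories
named after column families. Below is a minimal sketch of a check on a
directory listing, assuming you already have the paths relative to the
hfile output directory (for example from a listing command). The function
name, the "F1" family name and the sample paths are hypothetical, and
treating every top-level entry that starts with an underscore as job
bookkeeping is a simplification, not HBase's exact rule.

```python
from pathlib import PurePosixPath


def summarize_hfile_dir(relative_paths):
    """Split paths (relative to the hfile output dir) into two lists.

    loadable: files shaped like <column family>/<file>, the layout the
              bulk-load warning in the quoted log asks about.
    leftover: files still sitting under an underscore-prefixed directory
              such as _temporary (i.e. never moved to the final location).
    """
    loadable, leftover = [], []
    for raw in relative_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue  # ignore empty entries
        if parts[0].startswith("_"):
            # Assumption: _SUCCESS and _temporary are job bookkeeping.
            if len(parts) > 1:
                leftover.append(raw)
            continue
        if len(parts) == 2 and not parts[1].startswith("_"):
            loadable.append(raw)
    return loadable, leftover


if __name__ == "__main__":
    # Hypothetical listing resembling the situation described in this thread.
    listing = [
        "_SUCCESS",
        "_temporary/1/_temporary/attempt_x/F1/abc123",
    ]
    loadable, leftover = summarize_hfile_dir(listing)
    print("loadable:", loadable)
    print("leftover:", leftover)
```

An empty "loadable" list together with a non-empty "leftover" list would
match what Alexander describes: the reducers wrote data, but nothing was
moved from _temporary into the final directory.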

2017-08-10 22:29 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:

> I also thought about it, but no, it's not consistency.
>
> Consistent view is enabled. I use the same S3 for my own map-reduce jobs
> and it's OK.
>
> I also checked whether it lost consistency (emrfs diff). No problems.
>
> In case of S3 inconsistency, files disappear right after they are written
> and appear some time later. The HFiles didn't appear after a day, but
> _temporary is there.
>
> It's 100% reproducible. I think I'll investigate this problem by running
> the conversion job manually.
>
> On 10 Aug 2017 at 17:18, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:
>
>> Did you enable the Consistent View? This article explains the challenges
>> of using S3 directly for ETL processes:
>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
>>
>>
>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>
>>> Yes, it's empty. Also I see this message in the log:
>>>
>>> 2017-08-09 09:02:35,947 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
>>> 2017-08-09 09:02:36,009 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:252 : Skipping non-file FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1; isDirectory=true; modification_time=0; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>>> 2017-08-09 09:02:36,014 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile.  Does it contain files in subdirectories that correspond to column family names?
>>>
>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <shaofeng...@apache.org>
>>> wrote:
>>>
>>>> The HFile will be moved to the HBase data folder when the bulk load
>>>> finishes. Did you check whether the HTable has data?
>>>>
>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>
>>>>> Hi!
>>>>>
>>>>> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
>>>>>
>>>>> Step "Convert Cuboid Data to HFile" finished without errors.
>>>>> Statistics at the end of the job said that it had written lots of data
>>>>> to s3.
>>>>>
>>>>> But there are no hfiles in the kylin_metadata folder
>>>>> (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile),
>>>>> only a _temporary folder and a _SUCCESS file.
>>>>>
>>>>> _temporary contains hfiles inside attempt folders. It looks like they
>>>>> were not copied from _temporary to the result dir. But there are no
>>>>> errors in either the kylin log or the reducers' logs.
>>>>>
>>>>> Then loading the empty hfile directory produces empty segments.
>>>>>
>>>>> Is that a bug, or am I doing something wrong?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>


-- 
Best regards,

Shaofeng Shi 史少锋
