Hi Alexander,

That makes sense. Using S3 for Cube build and storage is required for a
cloud hadoop environment.

I tried to reproduce this problem. I created an EMR cluster with S3 as HBase
storage, and in kylin.properties I set "kylin.env.hdfs-working-dir"
and "kylin.storage.hbase.cluster-fs" to the S3 bucket. But in the "Convert
Cuboid Data to HFile" step, Kylin still writes to local HDFS. Did you
modify core-site.xml to make S3 the default FS?
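
To compare setups, here is a minimal Python sketch (not part of Kylin) that
reads kylin.properties-style text and reports which filesystem scheme the two
properties above point to. The helper names and the sample bucket are
hypothetical. A value without a scheme would be resolved against Hadoop's
default FS, so it is reported as None here.

```python
from urllib.parse import urlparse

# Property names are taken from the message above; sample values are hypothetical.
FS_PROPERTIES = ("kylin.env.hdfs-working-dir", "kylin.storage.hbase.cluster-fs")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def fs_schemes(props):
    """Map each filesystem-related property to its URI scheme (None if absent)."""
    result = {}
    for name in FS_PROPERTIES:
        value = props.get(name, "")
        result[name] = urlparse(value).scheme or None
    return result


if __name__ == "__main__":
    sample = """
    # hypothetical bucket name
    kylin.env.hdfs-working-dir=s3://example-bucket/kylin
    kylin.storage.hbase.cluster-fs=s3://example-bucket
    """
    for name, scheme in fs_schemes(parse_properties(sample)).items():
        print(name, "->", scheme)
```

If both properties report "s3" and the step still writes to HDFS, the
default FS in core-site.xml is the next thing to compare.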




2017-08-10 22:53 GMT+08:00 Alexander Sterligov <[email protected]>:

> Yes, I worked around the problem this way, and it works.
>
> One problem with this solution is that I have to use a fairly large HDFS,
> which is expensive. I also have to garbage-collect it manually, because the
> data is copied to S3, not moved. The Kylin cleanup job doesn't work for it,
> because the main metadata folder is on S3. So it would be really nice to put
> everything on S3.
>
> Another problem is that I had to raise the HBase RPC timeout, because bulk
> loading from HDFS takes a long time. That was not trivial. 3 minutes works
> well, but the drawback is that queries or metadata writes hang for 3 minutes
> if something goes wrong. That's a rare event, though.
>
> On Aug 10, 2017, at 17:42, "ShaoFeng Shi" <[email protected]>
> wrote:
>
> How about leaving "kylin.hbase.cluster.fs" empty? This property is
>> for two-cluster deployments (one Hadoop cluster for cube build, the other
>> for query).
>>
>> When it is empty, the HFiles are written to the default FS (HDFS in EMR)
>> and then loaded into HBase. I'm not sure whether EMR HBase (using S3 as
>> storage) can bulk load files from HDFS. If it can, that would be great, as
>> the write performance of HDFS would be better than that of S3.
>>
>> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <[email protected]>:
>>
>>> I also thought about it, but no, it's not consistency.
>>>
>>> Consistent view is enabled. I use the same S3 for my own MapReduce jobs
>>> and it works fine.
>>>
>>> I also checked whether it lost consistency (emrfs diff). No problems.
>>>
>>> In case of S3 inconsistency, files disappear right after they are
>>> written and reappear some time later. HFiles didn't appear after a day, but
>>> _template is there.
>>>
>>> It's 100% reproducible. I think I'll investigate this problem by running
>>> the conversion job manually.
>>>
>>> On Aug 10, 2017, at 17:18, "ShaoFeng Shi" <
>>> [email protected]> wrote:
>>>
>>> Did you enable the Consistent View? This article explains the challenges
>>>> of using S3 directly for ETL processes:
>>>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
>>>>
>>>>
>>>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>
>>>>> Yes, it's empty. Also I see this message in the log:
>>>>>
>>>>> 2017-08-09 09:02:35,947 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
>>>>> 2017-08-09 09:02:36,009 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:252 : Skipping non-file FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1; isDirectory=true; modification_time=0; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>>>>> 2017-08-09 09:02:36,014 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile.  Does it contain files in subdirectories that correspond to column family names?
>>>>>
>>>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The HFiles are moved to the HBase data folder when the bulk load
>>>>>> finishes. Did you check whether the HTable has data?
>>>>>>
>>>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I set kylin.hbase.cluster.fs to the S3 bucket where HBase lives.
>>>>>>>
>>>>>>> Step "Convert Cuboid Data to HFile" finished without errors.
>>>>>>> Statistics at the end of the job said that it had written lots of
>>>>>>> data to S3.
>>>>>>>
>>>>>>> But there are no HFiles in the kylin_metadata folder
>>>>>>> (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile),
>>>>>>> only a _temporary folder and a _SUCCESS file.
>>>>>>>
>>>>>>> _temporary contains HFiles inside attempt folders. It looks like
>>>>>>> they were not copied from _temporary to the result dir. But there are
>>>>>>> no errors in either the Kylin log or the reducers' logs.
>>>>>>>
>>>>>>> Then loading empty hfiles produces empty segments.
>>>>>>>
>>>>>>> Is that a bug, or am I doing something wrong?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>> Shaofeng Shi 史少锋
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>


-- 
Best regards,

Shaofeng Shi 史少锋
