What if we add the direct-output settings in kylin_job_conf.xml
and kylin_job_conf_inmem.xml?

hbase.zookeeper.quorum, for example, doesn't take effect if it is not specified in these
configs.
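For illustration, the additions might look roughly like this; the quorum host and the EMR direct-output property shown here are assumptions for the sketch, not values taken from this thread:

```xml
<!-- Sketch of possible additions to kylin_job_conf.xml /
     kylin_job_conf_inmem.xml; the host name is a placeholder. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-10-0-0-1.ec2.internal</value>
</property>
<!-- EMR's direct-output switch from mapred-site.xml (assumed name) -->
<property>
  <name>mapred.output.direct.EmrFileSystem</name>
  <value>true</value>
</property>
```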

On Fri, Aug 11, 2017 at 3:13 PM, ShaoFeng Shi <shaofeng...@apache.org>
wrote:

> EMR enables the direct output in mapred-site.xml, while in this step it
> seems these settings don't work (although the job's configuration shows
> they are there). I disabled the direct output, but the behavior didn't
> change. I did some searching but found nothing. I need to drop the EMR
> now, and may get back to it later.
>
> If you have any ideas or findings, please share them. We'd like to make Kylin
> have better support for the cloud.
>
> Thanks for your feedback!
>
> 2017-08-11 19:19 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>
>> Any ideas how to fix that?
>>
>> On Fri, Aug 11, 2017 at 2:16 PM, ShaoFeng Shi <shaofeng...@apache.org>
>> wrote:
>>
>>> I got the same problem as you:
>>>
>>> 2017-08-11 08:44:16,342 WARN  [Job 
>>> 2c86b4b6-7639-4a97-ba63-63c9dca095f6-2255]
>>> mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find
>>> any files to load in directory s3://privatekeybucket-anac5h41
>>> 523l/kylin/kylin_default_instance/kylin-2c86b4b6-7639-4a97-
>>> ba63-63c9dca095f6/kylin_sales_cube_clone3/hfile.  Does it contain files
>>> in subdirectories that correspond to column family names?
>>>
>>> In the S3 view, I see the files exist in the "_temporary" folder; it seems
>>> they were not moved to the target folder on completion. It seems EMR tries to
>>> write directly to the output path, but actually doesn't.
>>>
>>> 2017-08-11 16:34 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>
>>>> No, defaultFs is hdfs.
>>>>
>>>> I've seen such behavior when I set the working dir to s3 but didn't set
>>>> cluster-fs at all. Maybe you have a typo in the name of the property. I
>>>> used the old one, "kylin.hbase.cluster.fs".
>>>>
>>>> When both working-dir and cluster-fs were set to s3, I got the _temporary
>>>> dir of the convert job on s3, but no hfiles. I also saw the correct output path
>>>> for the job in the log. I didn't check whether the job creates temporary files
>>>> in s3 but then copies the results to hdfs; I hardly believe that happens.
>>>>
>>>> Do you see proper arguments for the step in the log?
>>>>
>>>>
>>>> On Aug 11, 2017, at 11:17, ShaoFeng Shi <shaofeng...@apache.org>
>>>> wrote:
>>>>
>>>> Hi Alexander,
>>>>
>>>> That makes sense. Using S3 for the cube build and storage is required in a
>>>> cloud Hadoop environment.
>>>>
>>>> I tried to reproduce this problem. I created an EMR cluster with S3 as HBase
>>>> storage; in kylin.properties, I set "kylin.env.hdfs-working-dir"
>>>> and "kylin.storage.hbase.cluster-fs" to the S3 bucket. But in the "Convert
>>>> Cuboid Data to HFile" step, Kylin still writes to local HDFS. Did you
>>>> modify core-site.xml to make S3 the default FS?
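For reference, the two settings described here would look something like this in kylin.properties (the bucket name is a placeholder, not one from this thread):

```properties
# Sketch: working dir and HBase cluster FS both pointed at S3.
kylin.env.hdfs-working-dir=s3://mybucket/kylin
kylin.storage.hbase.cluster-fs=s3://mybucket
```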
>>>>
>>>> 2017-08-10 22:53 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>
>>>>> Yes, I worked around the problem that way and it works.
>>>>>
>>>>> One problem with this solution is that I have to use a pretty large hdfs,
>>>>> and it's expensive. I also have to garbage collect it manually, because the
>>>>> data is not moved to s3 but copied. The Kylin cleanup job doesn't work for it,
>>>>> because the main metadata folder is on s3. So it would be really nice to put
>>>>> everything on s3.
>>>>>
>>>>> Another problem is that I had to raise the hbase rpc timeout, because bulk
>>>>> loading from hdfs takes long. That was not trivial. 3 minutes works well,
>>>>> but with the drawback that queries or metadata writes hang for 3 minutes if
>>>>> something bad happens. But that's a rare event.
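For reference, a 3-minute timeout like the one described would be set in hbase-site.xml roughly as follows (a sketch; related client-side timeouts may also need raising, depending on the deployment):

```xml
<!-- hbase-site.xml: raise the RPC timeout to 3 minutes (180000 ms)
     so that bulk loads of large HFiles have time to complete -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value>
</property>
```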
>>>>>
>>>>> On Aug 10, 2017 at 17:42, "ShaoFeng Shi" <
>>>>> shaofeng...@apache.org> wrote:
>>>>>
>>>>>> How about leaving "kylin.hbase.cluster.fs" empty? This property
>>>>>> is for a two-cluster deployment (one Hadoop cluster for the cube build,
>>>>>> the other for query).
>>>>>>
>>>>>> When it is empty, the HFile will be written to the default fs (HDFS in EMR)
>>>>>> and then loaded into HBase. I'm not sure whether EMR HBase (using S3 as
>>>>>> storage) can bulk load files from HDFS or not. If it can, that would be
>>>>>> great, as the write performance of HDFS would be better than S3's.
>>>>>>
>>>>>> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>>
>>>>>>> I also thought about that, but no, it's not a consistency issue.
>>>>>>>
>>>>>>> Consistent view is enabled. I use the same s3 for my own map-reduce
>>>>>>> jobs and it's ok.
>>>>>>>
>>>>>>> I also checked whether it lost consistency (emrfs diff). No problems.
>>>>>>>
>>>>>>> In case of s3 inconsistency, files disappear right after they are
>>>>>>> written and reappear some time later. The hfiles didn't appear after a
>>>>>>> day, but _temporary is there.
>>>>>>>
>>>>>>> It's 100% reproducible; I think I'll investigate this problem by
>>>>>>> running the conversion job manually.
>>>>>>>
>>>>>>> On Aug 10, 2017 at 17:18, "ShaoFeng Shi" <
>>>>>>> shaofeng...@apache.org> wrote:
>>>>>>>
>>>>>>>> Did you enable the Consistent View? This article explains the
>>>>>>>> challenge of using S3 directly in an ETL process:
>>>>>>>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistenc
>>>>>>>> y-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-
>>>>>>>> workflows/
>>>>>>>>
>>>>>>>>
>>>>>>>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>>>>
>>>>>>>>> Yes, it's empty. I also see these messages in the log:
>>>>>>>>>
>>>>>>>>> 2017-08-09 09:02:35,947 WARN  [Job 
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory
>>>>>>>>> s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kyl
>>>>>>>>> in-1e436685-7102-4621-a4cb-6472b866126d
>>>>>>>>> /main_event_1_main/hfile/_SUCCESS
>>>>>>>>> 2017-08-09 09:02:36,009 WARN  [Job 
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:252 : Skipping non-file
>>>>>>>>> FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin
>>>>>>>>> /kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/m
>>>>>>>>> ain_event_1_main/hfile/_temporary/1; isDirectory=true;
>>>>>>>>> modification_time=0; access_time=0; owner=; group=; 
>>>>>>>>> permission=rwxrwxrwx;
>>>>>>>>> isSymlink=false}
>>>>>>>>> 2017-08-09 09:02:36,014 WARN  [Job 
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not
>>>>>>>>> find any files to load in directory s3://joom.emr.fs/home/producti
>>>>>>>>> on/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-647
>>>>>>>>> 2b866126d/main_event_1_main/hfile.  Does it contain files in
>>>>>>>>> subdirectories that correspond to column family names?
>>>>>>>>>
>>>>>>>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <
>>>>>>>>> shaofeng...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> The HFile will be moved to the HBase data folder when the bulk load
>>>>>>>>>> finishes. Did you check whether the HTable has data?
>>>>>>>>>>
>>>>>>>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <
>>>>>>>>>> sterligo...@joom.it>:
>>>>>>>>>>
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
>>>>>>>>>>>
>>>>>>>>>>> The "Convert Cuboid Data to HFile" step finished without errors.
>>>>>>>>>>> The statistics at the end of the job said that it had written lots
>>>>>>>>>>> of data to s3.
>>>>>>>>>>>
>>>>>>>>>>> But there are no hfiles in the kylin_metadata folder (kylin_metadata
>>>>>>>>>>> /kylin-1e436685-7102-4621-a4cb-6472b866126d/<table
>>>>>>>>>>> name>/hfile), only a _temporary folder and a _SUCCESS file.
>>>>>>>>>>>
>>>>>>>>>>> _temporary contains hfiles inside the attempt folders. It looks like
>>>>>>>>>>> they were not copied from _temporary to the result dir. But there are
>>>>>>>>>>> no errors in either the kylin log or the reducers' logs.
>>>>>>>>>>>
>>>>>>>>>>> Loading the empty hfile output then produces empty segments.
>>>>>>>>>>>
>>>>>>>>>>> Is that a bug, or am I doing something wrong?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
>
