What if we add the direct-output settings to kylin_job_conf.xml and kylin_job_conf_inmem.xml? hbase.zookeeper.quorum, for example, doesn't take effect if it is not specified in these configs.
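For illustration, a minimal sketch of what such an override might look like (kylin_job_conf.xml uses standard Hadoop configuration XML; the direct-output property name is the one EMR ships in its mapred-site.xml, and the quorum value is a placeholder for the EMR master node — treat both as assumptions to verify against your own cluster):

    <configuration>
      <!-- Assumption: EMR's direct-output switch, copied from the
           cluster's mapred-site.xml -->
      <property>
        <name>mapred.output.direct.EmrFileSystem</name>
        <value>true</value>
      </property>
      <!-- Placeholder: point the job at the EMR master's ZooKeeper -->
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ip-10-0-0-1.ec2.internal</value>
      </property>
    </configuration>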
On Fri, Aug 11, 2017 at 3:13 PM, ShaoFeng Shi <[email protected]> wrote:

> EMR enables direct output in mapred-site.xml, but in this step these
> settings don't seem to work (although the job's configuration shows
> they are there). I disabled the direct output, but the behavior did
> not change. I did some searching but found nothing. I need to drop
> EMR for now, and may get back to it later.
>
> If you have any ideas or findings, please share them. We'd like Kylin
> to have better support for the cloud.
>
> Thanks for your feedback!
>
> 2017-08-11 19:19 GMT+08:00 Alexander Sterligov <[email protected]>:
>
>> Any ideas on how to fix that?
>>
>> On Fri, Aug 11, 2017 at 2:16 PM, ShaoFeng Shi <[email protected]>
>> wrote:
>>
>>> I got the same problem as you:
>>>
>>> 2017-08-11 08:44:16,342 WARN [Job
>>> 2c86b4b6-7639-4a97-ba63-63c9dca095f6-2255]
>>> mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not
>>> find any files to load in directory
>>> s3://privatekeybucket-anac5h41523l/kylin/kylin_default_instance/kylin-2c86b4b6-7639-4a97-ba63-63c9dca095f6/kylin_sales_cube_clone3/hfile.
>>> Does it contain files in subdirectories that correspond to column
>>> family names?
>>>
>>> In the S3 view, I see the files exist in the "_temporary" folder; it
>>> seems they were not moved to the target folder on completion. It
>>> looks like EMR tries to write directly to the output path, but
>>> actually does not.
>>>
>>> 2017-08-11 16:34 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>
>>>> No, defaultFs is HDFS.
>>>>
>>>> I've seen such behavior when the working dir was set to S3 but
>>>> cluster-fs was not set at all. Maybe you have a typo in the name
>>>> of the property. I used the old one, "kylin.hbase.cluster.fs".
>>>>
>>>> When both working-dir and cluster-fs were set to S3, I got the
>>>> _temporary dir of the convert job on S3, but no HFiles. I also saw
>>>> the correct output path for the job in the log. I didn't check
>>>> whether the job creates temporary files on S3 and then copies the
>>>> results to HDFS, though; I can hardly believe that happens.
>>>>
>>>> Do you see the proper arguments for the step in the log?
>>>>
>>>>
>>>> On Aug 11, 2017, at 11:17, ShaoFeng Shi <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Alexander,
>>>>
>>>> That makes sense. Using S3 for cube build and storage is required
>>>> for a cloud Hadoop environment.
>>>>
>>>> I tried to reproduce this problem. I created an EMR cluster with
>>>> S3 as HBase storage; in kylin.properties, I set
>>>> "kylin.env.hdfs-working-dir" and "kylin.storage.hbase.cluster-fs"
>>>> to the S3 bucket. But in the "Convert Cuboid Data to HFile" step,
>>>> Kylin still writes to local HDFS. Did you modify core-site.xml to
>>>> make S3 the default FS?
>>>>
>>>>
>>>> 2017-08-10 22:53 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>
>>>>> Yes, I worked around this problem that way, and it works.
>>>>>
>>>>> One problem with this solution is that I have to use a pretty
>>>>> large HDFS, and that's expensive. I also have to garbage collect
>>>>> it manually, because the data is copied to S3 rather than moved,
>>>>> and the Kylin cleanup job doesn't handle it since the main
>>>>> metadata folder is on S3. So it would be really nice to put
>>>>> everything on S3.
>>>>>
>>>>> Another problem is that I had to raise the HBase RPC timeout,
>>>>> because bulk loading from HDFS takes a long time. That was not
>>>>> trivial. 3 minutes works well, with the drawback that queries or
>>>>> metadata writes hang for 3 minutes if something bad happens. But
>>>>> that's a rare event.
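For reference, a minimal sketch of the timeout change described above, assuming it is made in hbase-site.xml; hbase.rpc.timeout is HBase's client RPC timeout in milliseconds, so 180000 corresponds to the 3 minutes mentioned:

    <!-- hbase-site.xml: raise the client RPC timeout to 3 minutes so a
         slow bulk load of HFiles from HDFS does not time out; as noted
         above, failing calls may then hang for the full 3 minutes -->
    <property>
      <name>hbase.rpc.timeout</name>
      <value>180000</value>
    </property>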
>>>>>
>>>>> On Aug 10, 2017 at 17:42, "ShaoFeng Shi" <[email protected]> wrote:
>>>>>
>>>>>> How about leaving "kylin.hbase.cluster.fs" empty? This property
>>>>>> is for a two-cluster deployment (one Hadoop cluster for cube
>>>>>> build, the other for query).
>>>>>>
>>>>>> When it is empty, the HFiles will be written to the default FS
>>>>>> (HDFS in EMR) and then bulk loaded into HBase. I'm not sure
>>>>>> whether EMR HBase (using S3 as storage) can bulk load files from
>>>>>> HDFS or not. If it can, that would be great, as the write
>>>>>> performance of HDFS is better than that of S3.
>>>>>>
>>>>>> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>>>
>>>>>>> I also thought about that, but no, it's not a consistency issue.
>>>>>>>
>>>>>>> Consistent view is enabled. I use the same S3 bucket for my own
>>>>>>> MapReduce jobs and it's fine.
>>>>>>>
>>>>>>> I also checked whether it lost consistency (emrfs diff). No
>>>>>>> problems.
>>>>>>>
>>>>>>> In case of S3 inconsistency, files disappear right after they
>>>>>>> were written and reappear some time later. The HFiles didn't
>>>>>>> appear after a day, but _temporary is there.
>>>>>>>
>>>>>>> It's 100% reproducible; I think I'll investigate this problem by
>>>>>>> running the conversion job manually.
>>>>>>>
>>>>>>> On Aug 10, 2017 at 17:18, "ShaoFeng Shi" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Did you enable the Consistent View? This article explains the
>>>>>>>> challenge of using S3 directly for an ETL process:
>>>>>>>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
>>>>>>>>
>>>>>>>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>>>>>
>>>>>>>>> Yes, it's empty. I also see these messages in the log:
>>>>>>>>>
>>>>>>>>> 2017-08-09 09:02:35,947 WARN [Job
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory
>>>>>>>>> s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
>>>>>>>>> 2017-08-09 09:02:36,009 WARN [Job
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:252 : Skipping non-file
>>>>>>>>> FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1;
>>>>>>>>> isDirectory=true; modification_time=0; access_time=0; owner=;
>>>>>>>>> group=; permission=rwxrwxrwx; isSymlink=false}
>>>>>>>>> 2017-08-09 09:02:36,014 WARN [Job
>>>>>>>>> 1e436685-7102-4621-a4cb-6472b866126d-7608]
>>>>>>>>> mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did
>>>>>>>>> not find any files to load in directory
>>>>>>>>> s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile.
>>>>>>>>> Does it contain files in subdirectories that correspond to
>>>>>>>>> column family names?
>>>>>>>>>
>>>>>>>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> The HFiles will be moved to the HBase data folder when the
>>>>>>>>>> bulk load finishes. Did you check whether the HTable has
>>>>>>>>>> data?
>>>>>>>>>>
>>>>>>>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> I set kylin.hbase.cluster.fs to the S3 bucket where HBase
>>>>>>>>>>> lives.
>>>>>>>>>>>
>>>>>>>>>>> The "Convert Cuboid Data to HFile" step finished without
>>>>>>>>>>> errors. The statistics at the end of the job said that it
>>>>>>>>>>> had written lots of data to S3.
>>>>>>>>>>>
>>>>>>>>>>> But there are no HFiles in the kylin_metadata folder
>>>>>>>>>>> (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile),
>>>>>>>>>>> only a _temporary folder and a _SUCCESS file.
>>>>>>>>>>>
>>>>>>>>>>> _temporary contains HFiles inside the attempt folders. It
>>>>>>>>>>> looks like they were not copied from _temporary to the
>>>>>>>>>>> result dir, yet there are no errors in either the Kylin log
>>>>>>>>>>> or the reducers' logs.
>>>>>>>>>>>
>>>>>>>>>>> Loading the empty HFile directory then produces empty
>>>>>>>>>>> segments.
>>>>>>>>>>>
>>>>>>>>>>> Is that a bug, or am I doing something wrong?
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
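A footnote for readers hitting the same "Does it contain files in subdirectories that correspond to column family names?" warning: a sketch of the layout LoadIncrementalHFiles expects under the hfile output directory, versus what the thread describes. The column family name F1 is an assumption (Kylin names cube column families F1, F2, ...), and the attempt path follows the standard FileOutputCommitter layout:

    expected (bulk load finds the HFiles):
        .../hfile/F1/<hfile>

    observed (reducer output never promoted out of the temporary dirs):
        .../hfile/_SUCCESS
        .../hfile/_temporary/1/_temporary/attempt_.../F1/<hfile>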
