I got the same problem as you:

2017-08-11 08:44:16,342 WARN [Job 2c86b4b6-7639-4a97-ba63-63c9dca095f6-2255] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://privatekeybucket-anac5h41523l/kylin/kylin_default_instance/kylin-2c86b4b6-7639-4a97-ba63-63c9dca095f6/kylin_sales_cube_clone3/hfile. Does it contain files in subdirectories that correspond to column family names?
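As a rough illustration of what this warning implies: the bulk loader wants HFiles under subdirectories named after column families, so output still parked under _temporary does not count. A minimal Python sketch, assuming a recursive listing of the hfile directory is available as plain path strings; the function name, the bucket path, the attempt folder name, and the column family name F1 are all hypothetical:

```python
from typing import Dict, Iterable, List


def classify_hfile_output(paths: Iterable[str], hfile_dir: str) -> Dict[str, List[str]]:
    """Sort a recursive file listing of the HFile output directory.

    Assumed layout (inferred from the warning text, not from HBase source):
    a committed HFile sits at <hfile_dir>/<column family>/<file>, while
    job output that was never committed stays under <hfile_dir>/_temporary/.
    """
    root = hfile_dir.rstrip("/") + "/"
    report: Dict[str, List[str]] = {"committed": [], "uncommitted": [], "other": []}
    for path in paths:
        if not path.startswith(root):
            continue  # outside the directory being inspected
        parts = [p for p in path[len(root):].split("/") if p]
        if not parts:
            continue  # the directory entry itself
        if parts[0] == "_temporary":
            report["uncommitted"].append(path)
        elif len(parts) == 2 and not parts[0].startswith("_"):
            report["committed"].append(path)  # <column family>/<file>
        else:
            report["other"].append(path)  # e.g. the _SUCCESS marker
    return report


# Hypothetical listing that mirrors the symptom described in this thread.
HFILE_DIR = "s3://example-bucket/kylin/kylin-job-id/example_cube/hfile"
listing = [
    HFILE_DIR + "/_SUCCESS",
    HFILE_DIR + "/_temporary/1/attempt_0001/F1/part-0001",
]
report = classify_hfile_output(listing, HFILE_DIR)
print(len(report["committed"]), "committed,", len(report["uncommitted"]), "uncommitted")
```

An empty "committed" list next to a non-empty "uncommitted" list would match the case here: the job wrote data, but nothing was moved out of _temporary.
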
In the S3 console, I see that the files exist in the "_temporary" folder; it seems they were not moved to the target folder on completion. It seems EMR tries to write directly to the output path, but in fact it does not.

2017-08-11 16:34 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:

> No, defaultFs is hdfs.
>
> I’ve seen such behavior when I set the working dir to s3 but didn’t set
> cluster-fs at all. Maybe you have a typo in the name of the property. I
> used the old one, «kylin.hbase.cluster.fs».
>
> When both working-dir and cluster-fs were set to s3, I got the _temporary
> dir of the convert job at s3, but no hfiles. Also, I saw the correct output
> path for the job in the log. But I didn’t check whether the job creates
> temporary files in s3 and then copies the results to hdfs. I can hardly
> believe that happens.
>
> Do you see proper arguments for the step in the log?
>
> On Aug 11, 2017, at 11:17, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> Hi Alexander,
>
> That makes sense. Using S3 for Cube build and storage is required for a
> cloud hadoop environment.
>
> I tried to reproduce this problem. I created an EMR cluster with S3 as
> HBase storage; in kylin.properties, I set "kylin.env.hdfs-working-dir"
> and "kylin.storage.hbase.cluster-fs" to the S3 bucket. But in the "Convert
> Cuboid Data to HFile" step, Kylin still writes to local HDFS. Did you
> modify core-site.xml to make S3 the default FS?
>
> 2017-08-10 22:53 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>
>> Yes, I worked around this problem that way and it works.
>>
>> One problem with this solution is that I have to use a pretty large hdfs,
>> and it's expensive. I also have to garbage collect it manually, because
>> the data is not moved to s3, but copied. The Kylin cleanup job doesn't work
>> for it, because the main metadata folder is at s3. So it would be really
>> nice to put everything in s3.
>>
>> Another problem is that I had to raise the hbase rpc timeout, because bulk
>> loading from hdfs takes long. That was not trivial.
>> 3 minutes works well, but with the drawback of queries or metadata writes
>> hanging for 3 minutes if something bad happens. But that's a rare event.
>>
>> On Aug 10, 2017, at 17:42, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:
>>
>>> How about leaving "kylin.hbase.cluster.fs" empty? This property is
>>> for two-cluster deployment (one Hadoop for cube build, the other for
>>> query).
>>>
>>> When it is empty, the HFile will be written to the default fs (HDFS in
>>> EMR) and then loaded into HBase. I'm not sure whether EMR HBase (using
>>> S3 as storage) can bulk load files from HDFS or not. If it can, that
>>> would be great, as the write performance of HDFS would be better than S3.
>>>
>>> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>
>>>> I also thought about it, but no, it's not consistency.
>>>>
>>>> Consistent view is enabled. I use the same s3 for my own map-reduce jobs
>>>> and it's ok.
>>>>
>>>> I also checked whether it lost consistency (emrfs diff). No problems.
>>>>
>>>> In case of s3 inconsistency, files disappear right after they are
>>>> written and appear some time later. The hfiles didn't appear after a day,
>>>> but _temporary is there.
>>>>
>>>> It's 100% reproducible. I think I'll investigate this problem by
>>>> running the conversion job manually.
>>>>
>>>> On Aug 10, 2017, at 17:18, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:
>>>>
>>>>> Did you enable the Consistent View? This article explains the challenge
>>>>> when using S3 directly for ETL process:
>>>>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
>>>>>
>>>>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>
>>>>>> Yes, it's empty.
>>>>>> Also I see this message in the log:
>>>>>>
>>>>>> 2017-08-09 09:02:35,947 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
>>>>>> 2017-08-09 09:02:36,009 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:252 : Skipping non-file FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1; isDirectory=true; modification_time=0; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>>>>>> 2017-08-09 09:02:36,014 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile. Does it contain files in subdirectories that correspond to column family names?
>>>>>>
>>>>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>>>>>
>>>>>>> The HFile will be moved to the HBase data folder when the bulk load
>>>>>>> finishes. Did you check whether the HTable has data?
>>>>>>>
>>>>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
>>>>>>>>
>>>>>>>> Step "Convert Cuboid Data to HFile" finished without errors.
>>>>>>>> Statistics at the end of the job said that it had written lots of
>>>>>>>> data to s3.
>>>>>>>>
>>>>>>>> But there are no hfiles in the kylin_metadata folder
>>>>>>>> (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile),
>>>>>>>> only the _temporary folder and the _SUCCESS file.
>>>>>>>>
>>>>>>>> _temporary contains hfiles inside attempt folders. It looks like
>>>>>>>> they were not copied from _temporary to the result dir. But there are
>>>>>>>> no errors either in the kylin log or in the reducers' logs.
>>>>>>>>
>>>>>>>> Then loading empty hfiles produces empty segments.
>>>>>>>>
>>>>>>>> Is that a bug or am I doing something wrong?
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋