Re: HFile is empty if kylin.hbase.cluster.fs is set to s3

Alexander Sterligov Fri, 11 Aug 2017 01:34:52 -0700

No, defaultFs is hdfs.

I've seen this behavior when I set the working dir to s3 but didn't set 
cluster-fs at all. Maybe you have a typo in the property name. I used the old 
one, «kylin.hbase.cluster.fs».
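
A quick way to rule out a typo is to scan kylin.properties for the relevant keys. Below is a minimal Python sketch, not a Kylin tool: the property names are the ones mentioned in this thread, while the helper names, the sample values and the "looks related" heuristic are assumptions made for illustration. The parser is simplified and only handles key=value lines.

```python
# Property names taken from this thread; everything else here is hypothetical.
KNOWN_KEYS = {
    "kylin.env.hdfs-working-dir",
    "kylin.storage.hbase.cluster-fs",
    "kylin.hbase.cluster.fs",  # the old name
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored by this simplified parser
        props[key.strip()] = value.strip()
    return props


def check_kylin_fs_keys(text):
    """Return (known keys that are set, keys that look related but are unknown)."""
    props = parse_properties(text)
    found = {k: v for k, v in props.items() if k in KNOWN_KEYS}
    suspicious = sorted(
        k for k in props
        if k not in KNOWN_KEYS and ("cluster" in k or "working" in k)
    )
    return found, suspicious
```

Any key returned in the second result looks related to the working dir or cluster fs but does not match one of the known names exactly, so it is worth a second look.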


When both working-dir and cluster-fs were set to s3, I got the _temporary dir 
of the convert job on s3, but no hfiles. I also saw the correct output path for 
the job in the log. I didn't check whether the job creates temporary files on 
s3 and then copies the results to hdfs, but I doubt that happens.
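
To tell these cases apart, one can list the job output directory and group the entries. A minimal sketch, assuming the object paths are already available as a list of strings (for example from a directory listing); the function name and the sample bucket, job, attempt and column family names are hypothetical.

```python
def summarize_hfile_dir(output_dir, paths):
    """Split paths under output_dir into committed, temporary and marker entries."""
    prefix = output_dir.rstrip("/") + "/"
    committed = {}   # column family directory -> names of files found under it
    temporary = []   # entries still under _temporary (uncommitted task output)
    markers = []     # top-level files such as _SUCCESS
    for path in paths:
        if not path.startswith(prefix):
            continue  # not part of this output directory
        parts = [p for p in path[len(prefix):].split("/") if p]
        if not parts:
            continue
        if parts[0] == "_temporary":
            temporary.append("/".join(parts))
        elif len(parts) == 1:
            markers.append(parts[0])
        else:
            committed.setdefault(parts[0], []).append(parts[-1])
    return committed, temporary, markers


# Hypothetical listing that mirrors the symptom: only _SUCCESS and _temporary.
out = "s3://example-bucket/kylin_metadata/kylin-job/table/hfile"
listing = [out + "/_SUCCESS", out + "/_temporary/1/attempt_0/F1/hfile_0"]
committed, temporary, markers = summarize_hfile_dir(out, listing)
```

An empty `committed` together with a non-empty `temporary` corresponds to the case where there is nothing for the bulk load to pick up.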

Do you see the proper arguments for the step in the log?


> On 11 Aug 2017, at 11:17, ShaoFeng Shi <[email protected]> wrote:
> 
> Hi Alexander,
> 
> That makes sense. Using S3 for Cube build and storage is required for a cloud 
> hadoop environment.
> 
> I tried to reproduce this problem. I created an EMR cluster with S3 as HBase 
> storage, and in kylin.properties I set "kylin.env.hdfs-working-dir" and 
> "kylin.storage.hbase.cluster-fs" to the S3 bucket. But in the "Convert Cuboid 
> Data to HFile" step, Kylin still writes to local HDFS. Did you modify 
> core-site.xml to make S3 the default FS?
> 
> 
> 
> 
> 2017-08-10 22:53 GMT+08:00 Alexander Sterligov <[email protected]>:
> Yes, I worked around the problem that way and it works.
> 
> One problem with this solution is that I have to use a pretty large hdfs, and 
> it's expensive. I also have to garbage collect it manually, because the data 
> is copied to s3 rather than moved. The Kylin cleanup job doesn't work for it, 
> because the main metadata folder is on s3. So it would be really nice to put 
> everything on s3. 
> 
> Another problem is that I had to raise the hbase rpc timeout, because bulk 
> loading from hdfs takes a long time. That was not trivial. 3 minutes works 
> well, but with the drawback that queries or metadata writes hang for 3 
> minutes if something goes wrong. But that's a rare event. 
> 
> On 10 Aug 2017 at 17:42, "ShaoFeng Shi" <[email protected]> wrote:
> 
> How about leaving "kylin.hbase.cluster.fs" empty? This property is for 
> two-cluster deployments (one Hadoop cluster for cube build, the other for 
> query). 
> 
> When it is empty, the HFile will be written to the default fs (HDFS in EMR) 
> and then loaded into HBase. I'm not sure whether EMR HBase (using S3 as 
> storage) can bulk load files from HDFS or not. If it can, that would be great, 
> as the write performance of HDFS would be better than S3.
> 
> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <[email protected]>:
> I also thought about that, but no, it's not a consistency problem. 
> 
> Consistent view is enabled. I use the same s3 for my own map-reduce jobs and 
> it's fine.
> 
> I also checked whether it lost consistency (emrfs diff). No problems. 
> 
> With s3 inconsistency, files disappear right after they are written and 
> reappear some time later. The hfiles didn't appear after a day, but _temporary 
> is there. 
> 
> It's 100% reproducible. I think I'll investigate this problem by running the 
> conversion job manually. 
> 
> On 10 Aug 2017 at 17:18, "ShaoFeng Shi" <[email protected]> wrote:
> 
> Did you enable the Consistent View? This article explains the challenges of 
> using S3 directly for ETL processes:
> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
> 
> 
> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <[email protected]>:
> Yes, it's empty. Also I see this message in the log:
> 
> 2017-08-09 09:02:35,947 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
> 2017-08-09 09:02:36,009 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:252 : Skipping non-file FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1; isDirectory=true; modification_time=0; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
> 2017-08-09 09:02:36,014 WARN  [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile.  Does it contain files in subdirectories that correspond to column family names?
> 
> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <[email protected]> wrote:
> The HFiles are moved to the HBase data folder when the bulk load finishes. 
> Did you check whether the HTable has data?
> 
> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <[email protected]>:
> Hi!
> 
> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
> 
> The step "Convert Cuboid Data to HFile" finished without errors. Statistics at 
> the end of the job said that it had written lots of data to s3.
> 
> But there are no hfiles in the kylin_metadata folder 
> (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile), 
> only a _temporary folder and a _SUCCESS file.
> 
> _temporary contains hfiles inside attempt folders. It looks like they were 
> not copied from _temporary to the result dir. But there are no errors in 
> either the kylin log or the reducers' logs.
> 
> Loading the empty hfile dir then produces empty segments.
> 
> Is that a bug, or am I doing something wrong?
> 
> 
> 
> 
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋
> 
>
