What if we add the direct output settings to kylin_job_conf.xml?
hbase.zookeeper.quorum, for example, doesn't work if not specified in these
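If that is the route, a minimal sketch of such an override in kylin_job_conf.xml might look like the following, assuming the direct-output switches are the same ones EMR sets in mapred-site.xml (whether Kylin's per-job conf can actually override them is exactly what is in question here):

```xml
<!-- Hypothetical override in kylin_job_conf.xml: the EMR direct-output
     switches that are normally set in mapred-site.xml. -->
<property>
  <name>mapred.output.direct.EmrFileSystem</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.direct.NativeS3FileSystem</name>
  <value>true</value>
</property>
```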
On Fri, Aug 11, 2017 at 3:13 PM, ShaoFeng Shi
EMR enables the direct output in mapred-site.xml, while in this step it
seems these settings don't work (although the job's configuration shows
they are there). I disabled the direct output, but the behavior didn't
change. I did some searching but found nothing. I need to drop the EMR now, and may
Any ideas how to fix that?
On Fri, Aug 11, 2017 at 2:16 PM, ShaoFeng Shi
I got the same problem as you:
2017-08-11 08:44:16,342 WARN [Job 2c86b4b6-7639-4a97-ba63-63c9dca095f6-2255]
mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any
files to load in directory
No, defaultFs is HDFS.
I've seen such behavior when the working dir was set to S3 but cluster-fs was not
set at all. Maybe you have a typo in the name of the property. I used the old one
When both working-dir and cluster-fs were set to S3, I got a _temporary dir of
That makes sense. Using S3 for cube build and storage is required for a
cloud Hadoop environment.
I tried to reproduce this problem. I created an EMR cluster with S3 as HBase
storage; in kylin.properties, I set "kylin.env.hdfs-working-dir"
and "kylin.storage.hbase.cluster-fs" to the S3
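A sketch of the kylin.properties fragment being described, with a placeholder bucket name. Note that the thread also uses older property names (e.g. "kylin.hbase.cluster.fs"), which earlier Kylin releases expect, so the correct pair depends on the Kylin version:

```properties
# Hypothetical bucket name; both properties point at the same S3 location
# so intermediate cube data and HFiles stay on S3.
kylin.env.hdfs-working-dir=s3://my-kylin-bucket/kylin
kylin.storage.hbase.cluster-fs=s3://my-kylin-bucket
```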
Yes, I worked around this problem that way, and it works.
One problem with this solution is that I have to use a pretty large HDFS, and
it's expensive. I also have to garbage collect it manually, because the data is
not moved to S3, but copied. The Kylin cleanup job doesn't work for it, because
How about leaving "kylin.hbase.cluster.fs" empty? This property is for a
two-cluster deployment (one Hadoop cluster for cube build, the other for query).
When it is empty, the HFile will be written to the default FS (HDFS in EMR) and
then loaded into HBase. I'm not sure whether EMR HBase (using S3 as storage)
I also thought about that, but no, it's not a consistency issue.
Consistent View is enabled. I use the same S3 for my own MapReduce jobs, and
I also checked whether it lost consistency (emrfs diff). No problems.
In the case of S3 inconsistency, files disappear right after they were
written and appear some
Did you enable the Consistent View? This article explains the challenges
of using S3 directly for an ETL process:
2017-08-09 18:19 GMT+08:00 Alexander Sterligov
Yes, it's empty. Also, I see this message in the log:
2017-08-09 09:02:35,947 WARN [Job
mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory
The HFiles are moved to the HBase data folder when the bulk load finishes. Did
you check whether the HTable has data?
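One quick way to check is from the hbase shell; the table name below is a placeholder (Kylin-created HTables have generated names, visible in the cube's storage information):

```
hbase shell
> count 'KYLIN_EXAMPLE_TABLE'                 # placeholder table name
> scan 'KYLIN_EXAMPLE_TABLE', {LIMIT => 1}    # non-empty scan => bulk load landed
```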
2017-08-09 17:54 GMT+08:00 Alexander Sterligov:
> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
> Step "Convert Cuboid Data to HFile"