Hi,

I need to do some integration testing for a Hive UDF I have written, so I start a docker-based cluster using a centos-7 config that contains components: [hdfs, yarn, mapreduce, hive].
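For reference, the relevant parts of that config yaml look roughly like this (other keys omitted, and the memory_limit value is just an example of what I increased it to):

    docker:
        memory_limit: "4g"
    ...
    components: [hdfs, yarn, mapreduce, hive]
    ...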
I create the cluster with

    ./docker-hadoop.sh -c 3

The problem I have is that this UDF needs a few hundred MiB of RAM, while the Java heap size of the processes is fixed at 1000m (I see -Xmx1000m for most processes).

I have already increased the docker memory_limit, but I have not been able to figure out the proper way to raise this 1000m value to something like 2000m. I searched the code and the documentation and could not find how to change it. I have tried several things (setting HADOOP_HEAPSIZE, YARN_HEAPSIZE, ...), but none of those attempts had any impact.

Adding config lines like these also had no effect:

    configurations::yarn-env::yarn_heapsize: 2222
    yarn-env::yarn_heapsize: 2223
    yarn_heapsize: 2224

So right now I have hacked around it by doing

    ./docker-hadoop.sh -e 1 sed -i 's@Xmx1000m@Xmx2000m@g' /usr/lib/hadoop/libexec/hadoop-config.sh

and then forcing the various services to restart... which feels "so totally wrong".

My question is simply: what is the clean/correct way to specify the Java heap size for these processes?

--
Best regards / Met vriendelijke groeten,

Niels Basjes