unsubscribe

2024-02-01 Thread Jakub Stransky

Configuring Hadoop in Azure on Linux using Azure Blob storage

2016-01-28 Thread Jakub Stransky
Hello, we are trying to configure Hadoop HDP 2.2 running in the Azure cloud to use Azure Blob storage instead of regular HDFS. The cluster is up and running, and we can list files in Azure Blob storage via hadoop fs commands. But when trying to run the teragen MapReduce smoke test we are getting the following
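
For orientation, pointing HDP at Azure Blob storage is normally a core-site.xml matter; a minimal sketch, assuming a hypothetical storage account mystorageacct and container mycontainer (both placeholders, not from the thread):

    <property>
      <name>fs.defaultFS</name>
      <value>wasb://mycontainer@mystorageacct.blob.core.windows.net</value>
    </property>
    <property>
      <!-- the account key value is a placeholder -->
      <name>fs.azure.account.key.mystorageacct.blob.core.windows.net</name>
      <value>YOUR_ACCOUNT_KEY</value>
    </property>

With that in place, hadoop fs -ls against the wasb:// URI should list the container, as the poster already observed.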

Re: Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
, Jakub Stransky stransky...@gmail.com wrote: Hello, I am configuring the Capacity Scheduler; all seems OK, but I cannot find the meaning of the following property: yarn.scheduler.capacity.root.unfunded.capacity. I just found that everywhere it is set to 50 and the description is "No description"

Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
Hello, I am configuring the Capacity Scheduler; all seems OK, but I cannot find the meaning of the following property: yarn.scheduler.capacity.root.unfunded.capacity. I just found that everywhere it is set to 50 and the description is "No description". Can anybody clarify, or point to where to find
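
For context, regular per-queue settings in capacity-scheduler.xml look like the sketch below (queue names and values are illustrative, not from the thread); sibling queue capacities must sum to 100. The unfunded.capacity property asked about appears to be an undocumented leftover in some HDP configuration templates rather than a scheduler knob:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>default,analytics</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.default.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.analytics.capacity</name>
      <value>30</value>
    </property>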

Memory consumption by AM

2014-10-23 Thread Jakub Stransky
Hello experienced users, we are new to Hadoop, hence using a nearly default configuration, including the scheduler, which I guess by default is the Capacity Scheduler. Lately we were confronted with the following behaviour on the cluster. We are using Apache Oozie for job submission of various data pipes. We
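
For readers with similar AM questions: the MapReduce ApplicationMaster's container size and heap are set in mapred-site.xml; a minimal sketch with illustrative values (the -Xmx heap should stay below the container size):

    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx768m</value>
    </property>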

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Jakub Stransky
requested (and used, of course) by my pig-script (not as a YARN queue configuration or some such stuff.. I want to limit it from outside on a per-job basis. I would ideally like to set the number in my pig-script.) Can I do this? Thanks, Sunil. -- Jakub Stransky cz.linkedin.com

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Jakub Stransky
or code, then we can, I think. We do have this property: mapreduce.job.maps. Regards, Shahab On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky stransky...@gmail.com wrote: Hello, as far as I understand, the number of mappers you cannot control; the number of reducers you can control via the PARALLEL keyword
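
A sketch of the per-script approach discussed here, in Pig Latin (values and the input path are illustrative). mapreduce.job.maps is only a hint, since the map count is normally driven by input splits, whereas reducer parallelism can be set directly:

    -- hint only: input splits usually decide the real map count
    set mapreduce.job.maps 10;
    -- script-wide default for reducers
    set default_parallel 4;
    data = LOAD 'input' AS (key:chararray, val:int);
    -- or per operator, via the PARALLEL keyword
    grouped = GROUP data BY key PARALLEL 4;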

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Jakub Stransky
Distcp? On 17 Oct 2014 20:51, Alexander Pivovarov apivova...@gmail.com wrote: try to run on a dest cluster datanode: $ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/ On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approx input size? Do
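
For a copy of any real size, the distcp suggestion is the usual answer, since it runs as a distributed MapReduce job rather than a single-client copy; a sketch with placeholder namenode addresses:

    hadoop distcp -m 20 hdfs://nn1:8020/src/path hdfs://nn2:8020/dest/path

The -m flag caps the number of simultaneous copy maps, which also bounds the load placed on both clusters.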

Cannot find profiling log file

2014-09-23 Thread Jakub Stransky
Hello experienced users, I did try to use profiling of tasks during mapreduce: <property> <name>mapreduce.task.profile</name> <value>true</value> </property> <property> <name>mapreduce.task.profile.maps</name> <value>0-5</value> </property> <property>
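
A fuller version of that configuration, for reference, assuming the default HPROF profiler (the %s placeholder is filled in by the framework; the task ranges are illustrative). The resulting profile output typically lands in the task's log directory, which is the first place to look when the file seems missing:

    <property>
      <name>mapreduce.task.profile</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.task.profile.maps</name>
      <value>0-5</value>
    </property>
    <property>
      <name>mapreduce.task.profile.reduces</name>
      <value>0-2</value>
    </property>
    <property>
      <name>mapreduce.task.profile.params</name>
      <value>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</value>
    </property>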

CPU utilization

2014-09-12 Thread Jakub Stransky
Hello experienced hadoop users, I have one beginner's question regarding CPU utilization on datanodes when running an MR job. Cluster of 5 machines, 2 NN + 3 DN, really inexpensive hardware, using the following parameters: # hadoop - yarn-site.xml yarn.nodemanager.resource.memory-mb: 2048
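
For comparison, the NodeManager advertises both memory and CPU to the scheduler in yarn-site.xml; a sketch matching the thread's 2048 MB figure (the vcores value is illustrative). Note that the Capacity Scheduler's default resource calculator schedules on memory alone, which is a common reason CPUs sit underutilized:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>2</value>
    </property>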

Re: Enable Debug logging for a job

2014-09-12 Thread Jakub Stransky
, Siddhi -- Jakub Stransky cz.linkedin.com/in/jakubstransky

Re: CPU utilization

2014-09-12 Thread Jakub Stransky
! Adam 2014-09-12 17:51 GMT+02:00 Jakub Stransky stransky...@gmail.com: Hello experienced hadoop users, I have one beginner's question regarding CPU utilization on datanodes when running an MR job. Cluster of 5 machines, 2 NN + 3 DN, really inexpensive hardware, using the following parameters: # hadoop - yarn

Re: CPU utilization

2014-09-12 Thread Jakub Stransky
(mapreduce.reduce.memory.mb). If you run the MapReduce app master, you need 1024 MB (yarn.app.mapreduce.am.resource.mb). Therefore, when you run a MapReduce job, you can run only 2 containers per NodeManager (3 x 768 = 2304 > 2048) on your setup. 2014-09-12 20:37 GMT+02:00 Jakub Stransky stransky...@gmail.com
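
The arithmetic in this reply comes straight from the per-container sizes in mapred-site.xml; a sketch with the values quoted in the thread, which together must fit inside the NodeManager's 2048 MB:

    <!-- each map container: 768 MB -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>768</value>
    </property>
    <!-- each reduce container: 1024 MB -->
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>1024</value>
    </property>
    <!-- the per-job ApplicationMaster container: 1024 MB -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1024</value>
    </property>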

virtual memory consumption

2014-09-11 Thread Jakub Stransky
Hello hadoop users, I am facing the following issue when running an M/R job, during the reduce phase: Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory

Re: virtual memory consumption

2014-09-11 Thread Jakub Stransky
as 768M, reduce memory as 1024M, and AM as 1024M. With the AM and a single map task it is 1.7G, and it cannot start another container for the reducer. Reduce these values and check. On 9/11/14, Jakub Stransky stransky...@gmail.com wrote: Hello hadoop users, I am facing the following issue when running
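
The 2.1 GB ceiling in these errors is no coincidence: it is the default yarn.nodemanager.vmem-pmem-ratio of 2.1 applied to a 1 GB container. The usual yarn-site.xml remedies are to raise the ratio or disable the check (values illustrative):

    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>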

task slowness

2014-09-11 Thread Jakub Stransky
Hello experienced hadoop users, I have a data pipeline consisting of two Java MR jobs coordinated by the Oozie scheduler. Both of them process the same data, but the first one is more than 10 times slower than the second. Job counters on the RM page are not much help in that matter. I have

running beyond virtual memory limits

2014-09-10 Thread Jakub Stransky
Hello, I am getting the following error when running on a 500 MB dataset compressed in the Avro data format. Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual

Error could only be replicated to 0 nodes instead of minReplication (=1)

2014-08-28 Thread Jakub Stransky
Hello, we are using Hadoop 2.2.0 (HDP 2.0) and Avro 1.7.4, running on CentOS 6.3. I am facing the following issue when using AvroMultipleOutputs with dynamic output files. My M/R job works fine for a smaller amount of data, or at least the error hasn't appeared there so far. With a bigger amount of data I
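
This error generally means no DataNode was able to accept the block: commonly full disks, datanodes marked dead, or (plausible here, with many dynamic Avro outputs held open at once) exhausted datanode transfer threads. A quick first check with standard HDFS commands:

    # live datanodes, capacity, and remaining space
    hdfs dfsadmin -report
    # overall filesystem usage
    hdfs dfs -df -h /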