Hi, I was running some benchmarks and noticed that a lot of TCP connections are initiated while my Pig jobs run over YARN (MR2). These TCP connections go to the DataNodes. Short-circuit reads are enabled on my DataNodes, but a lot of TCP connections are still being created.
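For what it's worth, a quick way to double-check that the short-circuit settings are present in a given hdfs-site.xml could look like the sketch below. It assumes the standard HDFS keys dfs.client.read.shortcircuit and dfs.domain.socket.path; the function names and the sample socket path are made up for illustration, and it only inspects the configuration text, not what the running tasks actually do.

```python
import xml.etree.ElementTree as ET

# Standard HDFS property names for short-circuit local reads.
SHORT_CIRCUIT_KEYS = ("dfs.client.read.shortcircuit", "dfs.domain.socket.path")


def read_short_circuit_settings(xml_text):
    """Return {key: value or None} for the short-circuit keys found in
    the text of an hdfs-site.xml file (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        found[name] = value
    return {key: found.get(key) for key in SHORT_CIRCUIT_KEYS}


def short_circuit_configured(settings):
    """True only if the flag is 'true' and a domain socket path is set."""
    flag = (settings.get("dfs.client.read.shortcircuit") or "").lower()
    socket_path = settings.get("dfs.domain.socket.path") or ""
    return flag == "true" and socket_path != ""


# Sample configuration; the socket path is a placeholder, not my real one.
SAMPLE = """
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    settings = read_short_circuit_settings(SAMPLE)
    print(settings)
    print("configured:", short_circuit_configured(settings))
```

Running the same check against the hdfs-site.xml on the nodes where the map tasks run might show whether the client side sees the same settings as the DataNodes, though I am not sure that is the whole story.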
I wanted to check how we can enable the YARN ApplicationMaster to read data from the DataNode using short-circuit reads, i.e. Unix domain sockets. I believe that would improve the performance of our jobs. Can someone please help me understand how I can make sure that the MR2 jobs created by Pig scripts read data from the DataNode using short-circuit reads instead of TCP connections?

Regards,
Sandeep
