You will need to set the limit on each server in the cluster: run ulimit on
each server to set the value for the user that runs Hadoop. Once you decide
what the number should be, remember to (a) set it using ulimit, and (b)
put this command in a script that is invoked during reboot. To decide the
number, find the maximum number of files the application has open at any
given time, and then add a buffer of, say, 2x on top of that.
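As a rough sketch of the steps above (the user name "hdfs" and the value
65536 are only examples -- substitute your own Hadoop user and a value
derived from your measurements):

```shell
# Current per-user limits in this shell (soft and hard):
ulimit -Sn
ulimit -Hn

# Count file handles currently open by the Hadoop user
# ("hdfs" here is an assumed user name -- substitute yours):
lsof -u hdfs 2>/dev/null | wc -l

# Raise the soft limit for this session (65536 is only an example;
# pick ~2x the max open files you observed):
ulimit -Sn 65536 2>/dev/null || true

# To survive reboots, add lines like these to /etc/security/limits.conf
# (read by pam_limits at login) instead of, or in addition to, a boot script:
#   hdfs  soft  nofile  65536
#   hdfs  hard  nofile  65536
```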
There is also a system-wide file limit that you should set using sysctl.
This number should be higher than the per-user limit, as other processes
will also need to open files.
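For example, on Linux the system-wide ceiling can be inspected and raised
like this (the value 2097152 is illustrative, not a recommendation):

```shell
# System-wide ceiling on open file handles:
cat /proc/sys/fs/file-max       # same value as: sysctl fs.file-max

# Handles currently allocated vs. the max (allocated, free, max):
cat /proc/sys/fs/file-nr

# Raise it on the running kernel (needs root; value is illustrative):
#   sysctl -w fs.file-max=2097152
# Persist across reboots by adding to /etc/sysctl.conf:
#   fs.file-max = 2097152
```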
On Mon, Oct 17, 2016 at 5:33 AM, chiranjeevi vasupilli <chiru....@gmail.com>
wrote:
> Thank you Priyanka,
> We are opening a large number of files in our app. Can you please let me
> know how to find the current limit on the number of open files in the
> cluster, so that we can update it.
> On Mon, Oct 17, 2016 at 11:13 AM, Priyanka Gugale <pri...@apache.org>
> wrote:
>> Looks like you are reaching the limit set for max open file handles on
>> your system.
>> Is your application opening a lot of files at the same time? If you
>> expect your application to open many files at once, you can increase the
>> max open files limit on the Hadoop nodes by changing some settings.
>> (Check the command "ulimit -n <limit>" to update your open file handle
>> limit.)
>> If your application is not supposed to open many files at the same time,
>> please check why so many file handles are open.
>> On Fri, Oct 14, 2016 at 6:58 PM, chiranjeevi vasupilli <
>> chiru....@gmail.com> wrote:
>>> Hi Team,
>>> Can you please let me know under what conditions we get this kind of
>>> exception. In my application, containers are getting killed with the
>>> below exception:
>>> java.lang.RuntimeException: java.io.IOException: All datanodes
>>> are bad. Aborting...
>>> at com.datatorrent.lib.io.fs.AbstractFileOutputOperator.endWind
>>> at com.datatorrent.stram.engine.GenericNode.processEndWindow(Ge