Re: Retrieving Pig script from MR job config

2016-05-27 Thread Rohini Palaniswamy
You can find the Pig script in the pig.script setting of the job configuration. It is base64-encoded, so you will have to decode it. If the script is too long, it will be truncated to 10K lines. Regards, Rohini On Tue, May 10, 2016 at 7:27 AM, Harish Gopalan wrote: > Hi, > > Is it possible to
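
A minimal sketch of the decoding step, assuming the pig.script value is standard base64 over UTF-8 text. The helper name decode_pig_script and the sample script are hypothetical, used only for illustration; how you read the value out of the job configuration is up to you.

```python
import base64


def decode_pig_script(encoded_value):
    """Decode a base64 string (assumed to hold UTF-8 text) back to script text."""
    raw_bytes = base64.b64decode(encoded_value)
    # errors="replace" keeps the call from failing if a truncated script
    # happens to end in the middle of a multi-byte character.
    return raw_bytes.decode("utf-8", errors="replace")


# Hypothetical stand-in for the value copied from the job configuration.
sample_script = "A = LOAD 'input' AS (line:chararray);\nDUMP A;"
sample_value = base64.b64encode(sample_script.encode("utf-8")).decode("ascii")

print(decode_pig_script(sample_value))
```

Since a long script may be truncated, the decoded text may end before the script's last statement.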

Re: should I set a different number of mappers?

2016-05-27 Thread Rohini Palaniswamy
15K mappers on a 4-node system will definitely crash it unless you have tuned YARN (RM, NM) well. That many mappers reading data off a few disks in parallel can create a disk storm, and disk can also turn out to be your bottleneck. Pig creates 1 map per 128MB (pig.maxCombinedSplitSize default value)
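
As a rough back-of-the-envelope sketch of the "1 map per 128MB" rule above: the function name estimate_map_count is hypothetical, and the estimate treats the input as one contiguous stream, ignoring file boundaries, compression, and how small files are actually combined into splits.

```python
import math

MB = 1024 * 1024
DEFAULT_SPLIT_BYTES = 128 * MB  # the 128MB figure quoted in the message


def estimate_map_count(input_bytes, split_bytes=DEFAULT_SPLIT_BYTES):
    """Estimate the number of map tasks as ceil(input size / split size)."""
    if input_bytes <= 0:
        return 0
    return math.ceil(input_bytes / split_bytes)


# Input size that would produce roughly 15K maps at 128MB per map.
input_bytes = 15000 * 128 * MB
print(estimate_map_count(input_bytes))
# Doubling the split size halves the estimate.
print(estimate_map_count(input_bytes, split_bytes=256 * MB))
```

Under these assumptions, a larger split size is the lever that lowers the map count for a fixed amount of input.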

Re: ToDate does not parse the date properly

2016-05-27 Thread Rohini Palaniswamy
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html You need to use '-MM-dd HH:mm:ss.SSS' instead of '-MM-DD HH:mm:ss.SSS'. DD stands for day of the year and dd stands for day of the month. The 11th day of the year can only be in January. So month always comes out as
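
The difference between the two fields can be illustrated with plain date arithmetic. This is a sketch in Python rather than a call to SimpleDateFormat or to Pig's ToDate, and the helper names are hypothetical; it only shows why reading 11 as a day of the year lands in January no matter which month the input string contains.

```python
from datetime import date, timedelta


def from_day_of_year(year, day_of_year):
    """Interpret the number as a day of the year (what 'DD' asks for)."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def from_day_of_month(year, month, day):
    """Interpret the number as a day of the month (what 'dd' asks for)."""
    return date(year, month, day)


# The same "11" read two ways, for an input dated in May 2016.
print(from_day_of_year(2016, 11))      # day 11 of the year
print(from_day_of_month(2016, 5, 11))  # day 11 of May
```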

Re: should I set a different number of mappers?

2016-05-27 Thread Olaf Collider
Hello Rohini, Super helpful, thanks! I was able to get the exact characteristics of my cluster. Here they are: block size 128MB, 300TB of raw data storage (100TB if you account for replication), and each of the 4 nodes has 384GB RAM. Does that change your answer? Thanks again!! On 27 May 2016 at