Hi all,

I'm reading large text files from S3. Sizes are between 30GB and 40GB.
Every stage runs in 8-9s, except the last 32, which jump to 1-2 minutes for some reason!

Here is my sample code:

    val myDF = sc.textFile(input_file).map { x =>
      val p = x.split("\t", -1)
      new zzzzzzzz(....)
    }.toDF()
    myDF.registerTempTable("tbl")
    sqlContext.sql("select count(1) from tbl").collect()

Any help or ideas?

Thanks,
Younes Naguib
Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC H3G 1R8
Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.nag...@tritondigital.com