RE: yarn spark job submit problem

2017-01-09 Thread Chinnappan Chandrasekaran
Hi, I suspect the code is wrong. Try the following: file:///home/spark/spark-1.5.1-bin-hadoop2.6/local/spark-b9c155cf-4624-4c68-9d0d-d3b9d5748601/__spark_conf__6786864409197988390.zip Regards, Chandrasekaran, Technical Consultant, Big Data & Analytics, Business Solution Group, Jardine OneSolution

Re: Why is the size of a HDFS file changed?

2017-01-09 Thread Mungeol Heo
Yes, that's why I wonder why this one specific file causes the problem while the other data files of the Hive table do not.

On Tue, Jan 10, 2017 at 3:42 AM, Ravi Prakash wrote:
> I have not been able to reproduce this:
>
> [raviprak@ravi ~]$ hdfs dfs -put

Re: merging small files in HDFS

2017-01-09 Thread Gabriel Balan
Hi, here are a couple more alternatives. If the goal is writing the least amount of code, I'd look into using Hive. Create an external table over the directory with lots of small data files, and another external table over the directory where I want the compacted data files. Select * from one table and

Re: Why is the size of a HDFS file changed?

2017-01-09 Thread Ravi Prakash
I have not been able to reproduce this:

[raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
[raviprak@ravi ~]$ cd /tmp
[raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
[raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
[raviprak@ravi tmp]$ md5sum hck
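
For anyone who wants to script the same comparison, here is a minimal Python sketch. It assumes both copies are already on local disk (for example, the file fetched with `hdfs dfs -get` and the one written by redirecting `hdfs dfs -cat`). The helper names `md5_of_file` and `same_content` are made up for illustration and are not part of Hadoop.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if the two local files have identical MD5 digests."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

With the files from the transcript above, `same_content("HuckleberryFinn.txt", "hck")` run from /tmp would be expected to return True if the two retrieval paths yield the same bytes. A mismatch would only show that the bytes differ, not why.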