Please close this query, as we've identified the problem: incorrect data types were used in the schema.
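For reference, the fix amounts to declaring numeric fields with numeric Avro types rather than strings. A minimal sketch (the record and field names here are hypothetical, not taken from the actual job):

```python
# Hypothetical schemas; the point is the type change, not the field names.
# Numeric values declared as Avro "string" keep their full digit text plus a
# length prefix; declaring them "long"/"double" lets Avro use its compact
# binary encodings instead.
schema_before = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "value", "type": "string"},   # numeric data stored as text
    ],
}

schema_after = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "value", "type": "double"},   # fixed 8-byte binary encoding
    ],
}
```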
On 19 September 2014 11:31, diplomatic Guru <[email protected]> wrote:
> I've been experimenting with a MapReduce job using the CSV and Avro formats.
> What I find strange is that the Avro output is larger than the CSV input.
>
> For example, I exported some data as CSV, which came to about 1.6GB. I then
> wrote a schema and a MapReduce job to read that CSV, serialize it, and write
> the output back to HDFS.
>
> When I checked the size of the output, it was 2.4GB. I assumed it would be
> smaller because the data is converted to binary, but I was wrong. Do you know
> the reason, and can you refer me to some documentation on this?
>
> I've checked the .avro file and can see that the header contains the schema
> and the rest are data blocks.
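To illustrate why the declared types matter so much: Avro encodes int/long values with a zigzag-then-varint scheme, so a number declared as a string costs its full digit text (plus a length prefix) instead of a few bytes. A rough sketch of that encoding (simplified; real Avro writers also add block framing and optional compression):

```python
def avro_long(n: int) -> bytes:
    """Rough sketch of Avro's int/long encoding: zigzag, then varint."""
    z = (n << 1) ^ (n >> 63)          # zigzag: maps small magnitudes to small codes
    out = bytearray()
    while True:
        b = z & 0x7F                  # low 7 bits per output byte
        z >>= 7
        if z:
            out.append(b | 0x80)      # high bit set: more bytes follow
        else:
            out.append(b)
            return bytes(out)

n = 123456
print(len(avro_long(n)))              # 3 bytes as an Avro long
print(len(str(n).encode("utf-8")))    # 6 bytes for the digits alone if the same
                                      # value is stored as an Avro string
```

With the wrong types, every value pays the text cost on top of Avro's container overhead, which is enough to make the output larger than the source CSV.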
