Please close this query as we've identified the problem: incorrect data
types were used in the schema.
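
For reference, here is a minimal sketch (not our actual job) of the kind of
fix this involved: declaring numeric fields as long/double in the Avro
schema rather than string, so Avro can use its compact binary encoding for
them. The schema, field names, and output file name below are purely
illustrative.

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class AvroTypeExample {
    // Illustrative schema: numeric fields declared as long/double instead of
    // string, so Avro stores them in its compact binary form.
    private static final String SCHEMA_JSON =
        "{ \"type\": \"record\", \"name\": \"Measurement\", \"fields\": ["
        + "  { \"name\": \"id\",    \"type\": \"long\"   },"
        + "  { \"name\": \"value\", \"type\": \"double\" },"
        + "  { \"name\": \"label\", \"type\": \"string\" }"
        + "] }";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 42L);
        record.put("value", 3.14);
        record.put("label", "sample");

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Optional: block compression further reduces the container size.
            writer.setCodec(CodecFactory.deflateCodec(6));
            writer.create(schema, new File("measurements.avro"));
            writer.append(record);
        }
    }
}

Avro encodes ints and longs with variable-length zig-zag encoding, so a
numeric value declared as a string will generally take more bytes than the
same value declared with a numeric type. Setting a codec on the writer
(deflate or snappy) also compresses the data blocks; without one the
container file is written uncompressed.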

On 19 September 2014 11:31, diplomatic Guru <[email protected]>
wrote:

> I've been experimenting with a MapReduce job using CSV and Avro formats.
> What I find strange is that the Avro output is larger than the CSV.
>
> For example, I exported some data in CSV, which is about 1.6GB. I then
> wrote a schema and a MapReduce job to take that CSV, serialize it, and
> write the output back to HDFS.
>
> When I checked the file size of the output, it was 2.4GB. I assumed the
> size would be smaller because it converts the data to binary, but I was
> wrong. Do you know what the reason is? Could you refer me to some
> documentation on this?
>
> I've checked the .avro file and could see that the header contains the
> schema and the rest are data blocks.
>
