Re: binary file deserialization

Saurabh Bajaj Wed, 09 Mar 2016 09:58:27 -0800

You can load that binary up as a String RDD, then map over that RDD and
convert each row to a case class representing your data. Alternatively, in
the map stage you could turn each input string into a JSON value and pass the
resulting RDD of JSON strings to the following function to get a DataFrame:
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets


val anotherPeople = sqlContext.read.json(anotherPeopleRDD)
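
For the case class route, here is a rough sketch of the decoding step in plain
Scala. The record layout is only an assumption for illustration (a 4-byte
little-endian item count followed by that many 4-byte integers), and Record
and RecordParser are made-up names; the real layout has to come from your
format's documentation.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical record layout, assumed for illustration only:
//   [int32 itemCount][itemCount x int32 item], all little-endian.
final case class Record(items: Vector[Int])

object RecordParser {
  /** Decode every record in `bytes`; trailing bytes shorter than a header are ignored. */
  def parseAll(bytes: Array[Byte]): Vector[Record] = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val out = Vector.newBuilder[Record]
    while (buf.remaining() >= 4) {
      // The header says how many items belong to this record.
      val count = buf.getInt()
      require(
        count >= 0 && count.toLong * 4 <= buf.remaining(),
        s"truncated or corrupt record: count=$count"
      )
      // Read exactly `count` items; the buffer position advances with each getInt().
      out += Record(Vector.fill(count)(buf.getInt()))
    }
    out.result()
  }
}
```

In Spark, something like sc.binaryFiles(path).flatMap { case (_, stream) =>
RecordParser.parseAll(stream.toArray()) } should give an RDD[Record], which
can then be turned into a DataFrame with toDF() after importing
sqlContext.implicits._. Note that binaryFiles hands each file to a single
task as a whole, so one huge file would probably need to be split on record
boundaries first.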


On Wed, Mar 9, 2016 at 9:15 AM, Ruslan Dautkhanov <[email protected]>
wrote:

> We have a huge binary file in a custom serialization format (e.g. header
> tells the length of the record, then there is a varying number of items for
> that record). This is produced by an old C++ application.
> What would be the best approach to deserialize it into a Hive table or a Spark
> RDD?
> Format is known and well documented.
>
>
> --
> Ruslan Dautkhanov
>
