Thanks to both of you; this should get me started.
It should be easy to start with a custom Hadoop InputFormat that reads the
file and creates an `RDD[Row]`. Since you know the record size, it should
be pretty easy to make the InputFormat produce splits, so that you can
read the file in parallel.
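For fixed-size records you may not even need a custom InputFormat: Hadoop
ships a `FixedLengthInputFormat` that splits on record boundaries. A minimal
sketch below; the path, the 21-byte record length, the field layout, the
column names and little-endian byte order are all assumptions to adjust to
your actual format:

```scala
import java.nio.{ByteBuffer, ByteOrder}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("custom-binary").getOrCreate()
val sc = spark.sparkContext

// Hypothetical layout: int64 + int32 string index + float64 + 1-byte boolean = 21 bytes.
val recordLength = 21
val conf = new Configuration(sc.hadoopConfiguration)
FixedLengthInputFormat.setRecordLength(conf, recordLength)

// Each (key, value) pair is (byte offset in the file, one fixed-length record).
val raw = sc.newAPIHadoopFile(
  "/path/to/data.bin",              // hypothetical path
  classOf[FixedLengthInputFormat],
  classOf[LongWritable],
  classOf[BytesWritable],
  conf)

// Decode each record into a Row; little-endian is an assumption.
val rows = raw.map { case (_, record) =>
  val buf = ByteBuffer.wrap(record.copyBytes()).order(ByteOrder.LITTLE_ENDIAN)
  Row(buf.getLong, buf.getInt, buf.getDouble, buf.get() != 0)
}

val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("labelIndex", IntegerType),
  StructField("value", DoubleType),
  StructField("flag", BooleanType)))

val df = spark.createDataFrame(rows, schema)
```

The string-index columns could then be resolved by joining against a small
DataFrame built from your catalogs, though that depends on how the catalogs
are stored.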
On Mon, Jun 12, 2017 at 6:01 AM, OBones wrote:
Try
https://mapr.com/blog/spark-data-source-api-extending-our-spark-sql-query-engine/
Thanks,
Assaf.
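In case it helps, the Data Source API that the blog post walks through boils
down to a `RelationProvider` plus a `BaseRelation` with `TableScan`. A
hypothetical skeleton follows; the package name, schema and record layout
are placeholders, and `buildScan` reuses the fixed-length InputFormat idea
from the other reply:

```scala
package com.example.binaryfile  // hypothetical package

import java.nio.{ByteBuffer, ByteOrder}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types._

// Spark resolves this class by package name when you call .format(...).
class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new BinaryFileRelation(parameters("path"))(sqlContext)
}

class BinaryFileRelation(path: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan with Serializable {

  // Placeholder schema; in practice, derive it from the file's column list.
  override def schema: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("labelIndex", IntegerType),
    StructField("value", DoubleType),
    StructField("flag", BooleanType)))

  override def buildScan(): RDD[Row] = {
    val conf = new Configuration(sqlContext.sparkContext.hadoopConfiguration)
    FixedLengthInputFormat.setRecordLength(conf, 21) // hypothetical record size
    sqlContext.sparkContext
      .newAPIHadoopFile(path, classOf[FixedLengthInputFormat],
        classOf[LongWritable], classOf[BytesWritable], conf)
      .map { case (_, record) =>
        val buf = ByteBuffer.wrap(record.copyBytes()).order(ByteOrder.LITTLE_ENDIAN)
        Row(buf.getLong, buf.getInt, buf.getDouble, buf.get() != 0)
      }
  }
}
```

Once that class is on the classpath, something like
`spark.read.format("com.example.binaryfile").load("/path/to/data.bin")`
should give you a DataFrame.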
-----Original Message-----
From: OBones [mailto:obo...@free.fr]
Sent: Monday, June 12, 2017 1:01 PM
To: user@spark.apache.org
Subject: [How-To] Custom file format as source
Hello,
I have an application here that generates data files in a custom binary
format that provides the following information:
A column list, where each column has a data type (64-bit integer, 32-bit
string index, 64-bit IEEE float, 1-byte boolean)
Catalogs that give modalities for some columns (i.e.,