Are these mainly in CSV format?
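For what it's worth, the three-step plan you outline below should work fine. Here is a minimal, untested sketch of it in the Java API (Spark 1.x), assuming you already have an SQLContext, a JavaRDD<String> of lines, a known delimiter, and a consistent column count per line:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public final class StringDataFrames {

  // Turns an RDD of delimited lines into a DataFrame of all-StringType columns.
  // columnNames and delimiter are placeholders you would supply per input format.
  public static DataFrame toStringDataFrame(SQLContext sqlContext,
                                            JavaRDD<String> lines,
                                            String[] columnNames,
                                            String delimiter) {
    // Step 1: map each line to a Row of its split parts.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    JavaRDD<Row> rows = lines.map(
        line -> RowFactory.create((Object[]) line.split(delimiter, -1)));

    // Step 2: build a StructType schema with a StringType field per column.
    List<StructField> fields = new ArrayList<>();
    for (String name : columnNames) {
      fields.add(DataTypes.createStructField(name, DataTypes.StringType, true));
    }
    StructType schema = DataTypes.createStructType(fields);

    // Step 3: create the DataFrame from the RDD<Row> and the schema.
    return sqlContext.createDataFrame(rows, schema);
  }
}
```

One caveat: createDataFrame does not validate row widths eagerly, so a line that splits into the wrong number of fields will only fail later at query time; if your inputs are messy you may want to filter or pad rows in the map step.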

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 June 2016 at 20:38, Everett Anderson <ever...@nuna.com.invalid> wrote:

> Hi,
>
> I have a system with files in a variety of non-standard input formats,
> though they're generally flat text files. I'd like to dynamically create
> DataFrames of string columns.
>
> What's the best way to go from an RDD<String> to a DataFrame of StringType
> columns?
>
> My current plan is
>
>    - Call map() on the RDD<String> with a function to split the String
>    into columns and call RowFactory.create() with the resulting array,
>    creating an RDD<Row>
>    - Construct a StructType schema using column names and StringType
>    - Call SQLContext.createDataFrame(RDD, schema) to create the result
>
> Does that make sense?
>
> I looked through the spark-csv package a little and noticed that it's
> using baseRelationToDataFrame(), but BaseRelation looks like it might be a
> restricted developer API. Anyone know if it's recommended for use?
>
> Thanks!
>
> - Everett
>
>
