Hi Pandees,

It may also help to look into the ability to read and write Parquet
files, which is available in the current release.  Parquet files let
you store columnar data in HDFS, and Spark "infers" the schema from
the Parquet file itself.  In pyspark, the methods you'd be interested
in are "parquetFile" and "inferSchema" in SQLContext, and
"saveAsParquetFile" in SchemaRDD.
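
For example, here's a rough (untested) pyspark sketch of that round
trip, reusing the people.txt file from the programming guide and a
made-up output path:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Build an RDD of dicts and let Spark infer the schema from it.
lines = sc.textFile("examples/src/main/resources/people.txt")
people = lines.map(lambda l: l.split(",")) \
              .map(lambda p: {"name": p[0], "age": int(p[1])})
schemaPeople = sqlContext.inferSchema(people)

# Write it out as Parquet, then read it back.  The schema comes back
# from the Parquet metadata, so no columns need to be listed.
schemaPeople.saveAsParquetFile("people.parquet")
parquetPeople = sqlContext.parquetFile("people.parquet")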

Hope that helps.
-Brad

On Wed, Jul 16, 2014 at 4:31 PM, Michael Armbrust
<mich...@databricks.com> wrote:
> I think what you might be looking for is the ability to programmatically
> specify the schema, which is coming in 1.1.
>
> Here's the JIRA: SPARK-2179
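>
> To give a rough idea of what that enables (untested, and the exact
> names could still change before 1.1 is released), the pyspark version
> would look something like this:
>
> from pyspark.sql import SQLContext, StructType, StructField, StringType
>
> sqlContext = SQLContext(sc)
>
> # Describe the columns once, as plain data, instead of hard-coding
> # p(0), p(1), ... into a case class or lambda.
> schema = StructType([StructField(name, StringType(), True)
>                      for name in ["name", "age"]])
>
> rows = sc.textFile("examples/src/main/resources/people.txt") \
>          .map(lambda l: tuple(x.strip() for x in l.split(",")))
> people = sqlContext.applySchema(rows, schema)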
>
>
> On Wed, Jul 16, 2014 at 8:24 AM, pandees waran <pande...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am a newbie to Spark SQL and I would like to know how to read all
>> the columns from a file in Spark SQL. I have referred to the programming
>> guide here:
>> http://people.apache.org/~tdas/spark-1.0-docs/sql-programming-guide.html
>>
>> The example says:
>>
>> val people = sc.textFile("examples/src/main/resources/people.txt")
>>   .map(_.split(","))
>>   .map(p => Person(p(0), p(1).trim.toInt))
>>
>> But instead of explicitly specifying p(0), p(1), I would like to read all
>> the columns from a file. It would be difficult if my source dataset has a
>> large number of columns.
>>
>> Is there any shortcut for that?
>>
>> And instead of a single file, I would like to read multiple files from a
>> directory which share a similar structure.
>>
>> Could you please share your thoughts on this?
>>
>> It would be great if you could share any documentation that has details
>> on these.
>>
>> Thanks
>
>
