Re: making dataframe for different types using spark-csv

2015-07-02 Thread Kohler, Curt E (ELS-STL)
You should be able to do something like this (assuming an input file formatted 
as:  String, IntVal, LongVal)


import org.apache.spark.sql.types._

val recSchema = StructType(List(StructField("strVal", StringType, false),
                                StructField("intVal", IntegerType, false),
                                StructField("longVal", LongType, false)))

val filePath = "some path to your dataset"

val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
  Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))
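As a lighter-weight alternative, spark-csv also offers an `inferSchema` option that samples the data to guess column types. A minimal sketch, assuming that option is available in the spark-csv version in use:

```scala
// Sketch: let spark-csv infer column types instead of supplying a schema.
// Assumes the "inferSchema" option is supported by your spark-csv release.
val inferred = sqlContext.load("com.databricks.spark.csv",
  Map("path" -> filePath,       // same filePath as above
      "header" -> "false",
      "inferSchema" -> "true")) // sample the data to guess int/long/double/etc.
inferred.printSchema()          // columns should come back typed, not all strings
```

Inference costs an extra pass over the data, so an explicit schema (as above) is preferable for large or well-known inputs.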

From: Hafiz Mujadid hafizmujadi...@gmail.com
Date: Wednesday, July 1, 2015 at 10:59 PM
To: Mohammed Guller moham...@glassbeam.com
Cc: Krishna Sankar ksanka...@gmail.com, user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com wrote:
Another option is to provide the schema to the load method. One variant of
sqlContext.load takes a schema as an input parameter. You can define the schema
programmatically as shown here:

https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed

From: Krishna Sankar [mailto:ksanka...@gmail.com]
Sent: Wednesday, July 1, 2015 3:09 PM
To: Hafiz Mujadid
Cc: user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

·  use .cast(...).alias('...') after the DataFrame is read.
·  sql.functions.udf for any domain-specific conversions.
Cheers
k/
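For the first bullet, a minimal sketch (`df` is assumed to be a DataFrame read by spark-csv with all-string columns; the column names are hypothetical):

```scala
import org.apache.spark.sql.functions.col

// Cast string columns to typed columns, keeping readable names via alias.
val typed = df.select(
  col("strVal"),
  col("intVal").cast("int").alias("intVal"),
  col("dblVal").cast("double").alias("dblVal"))
typed.printSchema() // intVal and dblVal should now report int / double
```

For conversions `cast` cannot express (e.g. custom date formats), the second bullet applies: wrap the parsing logic in a `udf` and apply it in the same `select`.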

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote:
Hi experts!


I am using spark-csv to load CSV data into a dataframe. By default it makes
the type of each column string. Is there some way to get a dataframe of the
actual types like int, double, etc.?


Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: 
user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org
For additional commands, e-mail: 
user-h...@spark.apache.orgmailto:user-h...@spark.apache.org




--
Regards: HAFIZ MUJADID


Re: making dataframe for different types using spark-csv

2015-07-02 Thread Hafiz Mujadid
Thanks

On Thu, Jul 2, 2015 at 5:40 PM, Kohler, Curt E (ELS-STL) 
c.koh...@elsevier.com wrote:

  You should be able to do something like this (assuming an input file
 formatted as:  String, IntVal, LongVal)


  import org.apache.spark.sql.types._

  val recSchema = StructType(List(StructField("strVal", StringType, false),
                                  StructField("intVal", IntegerType, false),
                                  StructField("longVal", LongType, false)))

  val filePath = "some path to your dataset"

  val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
    Map("path" -> filePath, "header" -> "false", "delimiter" -> ",",
    "mode" -> "FAILFAST"))

   From: Hafiz Mujadid hafizmujadi...@gmail.com
 Date: Wednesday, July 1, 2015 at 10:59 PM
 To: Mohammed Guller moham...@glassbeam.com
 Cc: Krishna Sankar ksanka...@gmail.com, user@spark.apache.org 
 user@spark.apache.org

 Subject: Re: making dataframe for different types using spark-csv

   hi Mohammed Guller!

  How can I specify schema in load method?



 On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com
 wrote:

  Another option is to provide the schema to the load method. One variant
 of the sqlContext.load takes a schema as an input parameter. You can define
 the schema programmatically as shown here:




 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema



 Mohammed



 *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
 *Sent:* Wednesday, July 1, 2015 3:09 PM
 *To:* Hafiz Mujadid
 *Cc:* user@spark.apache.org
 *Subject:* Re: making dataframe for different types using spark-csv



 ·  use .cast(...).alias('...') after the DataFrame is read.

 ·  sql.functions.udf for any domain-specific conversions.

 Cheers

 k/



 On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com
 wrote:

 Hi experts!


 I am using spark-csv to load CSV data into a dataframe. By default it makes
 the type of each column string. Is there some way to get a dataframe of the
 actual types like int, double, etc.?


 Thanks









  --
 Regards: HAFIZ MUJADID




-- 
Regards: HAFIZ MUJADID


making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
Hi experts!


I am using spark-csv to load CSV data into a dataframe. By default it makes
the type of each column string. Is there some way to get a dataframe of the
actual types like int, double, etc.?


Thanks






Re: making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com
wrote:

  Another option is to provide the schema to the load method. One variant
 of the sqlContext.load takes a schema as an input parameter. You can define
 the schema programmatically as shown here:




 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema



 Mohammed



 *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
 *Sent:* Wednesday, July 1, 2015 3:09 PM
 *To:* Hafiz Mujadid
 *Cc:* user@spark.apache.org
 *Subject:* Re: making dataframe for different types using spark-csv



 ·  use .cast(...).alias('...') after the DataFrame is read.

 ·  sql.functions.udf for any domain-specific conversions.

 Cheers

 k/



 On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com
 wrote:

 Hi experts!


 I am using spark-csv to load CSV data into a dataframe. By default it makes
 the type of each column string. Is there some way to get a dataframe of the
 actual types like int, double, etc.?


 Thanks









-- 
Regards: HAFIZ MUJADID


RE: making dataframe for different types using spark-csv

2015-07-01 Thread Mohammed Guller
Another option is to provide the schema to the load method. One variant of
sqlContext.load takes a schema as an input parameter. You can define the schema
programmatically as shown here:

https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed

From: Krishna Sankar [mailto:ksanka...@gmail.com]
Sent: Wednesday, July 1, 2015 3:09 PM
To: Hafiz Mujadid
Cc: user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

·  use .cast(...).alias('...') after the DataFrame is read.
·  sql.functions.udf for any domain-specific conversions.
Cheers
k/

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote:
Hi experts!


I am using spark-csv to load CSV data into a dataframe. By default it makes
the type of each column string. Is there some way to get a dataframe of the
actual types like int, double, etc.?


Thanks






Re: making dataframe for different types using spark-csv

2015-07-01 Thread Krishna Sankar
   - use .cast(...).alias('...') after the DataFrame is read.
   - sql.functions.udf for any domain-specific conversions.

Cheers
k/

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:

 Hi experts!


 I am using spark-csv to load CSV data into a dataframe. By default it makes
 the type of each column string. Is there some way to get a dataframe of the
 actual types like int, double, etc.?


 Thanks


