Re: making dataframe for different types using spark-csv
You should be able to do something like this (assuming an input file formatted as: String, IntVal, LongVal):

    import org.apache.spark.sql.types._

    // One StructField per CSV column, in file order; false = not nullable
    val recSchema = StructType(List(
      StructField("strVal", StringType, false),
      StructField("intVal", IntegerType, false),
      StructField("longVal", LongType, false)))

    val filePath = "some path to your dataset"

    // FAILFAST aborts the load on a malformed record
    val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
      Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))

From: Hafiz Mujadid hafizmujadi...@gmail.com
Date: Wednesday, July 1, 2015 at 10:59 PM
To: Mohammed Guller moham...@glassbeam.com
Cc: Krishna Sankar ksanka...@gmail.com, user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

hi Mohammed Guller! How can I specify the schema in the load method?
Re: making dataframe for different types using spark-csv
Thanks

On Thu, Jul 2, 2015 at 5:40 PM, Kohler, Curt E (ELS-STL) c.koh...@elsevier.com wrote:

You should be able to do something like this (assuming an input file formatted as: String, IntVal, LongVal):

    import org.apache.spark.sql.types._

    val recSchema = StructType(List(
      StructField("strVal", StringType, false),
      StructField("intVal", IntegerType, false),
      StructField("longVal", LongType, false)))

    val filePath = "some path to your dataset"

    val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
      Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))

-- Regards: HAFIZ MUJADID
making dataframe for different types using spark-csv
Hi experts! I am using spark-csv to load CSV data into a DataFrame. By default it makes the type of each column string. Is there some way to get a DataFrame with the actual types, like int, double, etc.?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: making dataframe for different types using spark-csv
hi Mohammed Guller! How can I specify the schema in the load method?

On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com wrote:

Another option is to provide the schema to the load method. One variant of sqlContext.load takes a schema as an input parameter. You can define the schema programmatically as shown here: https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed

-- Regards: HAFIZ MUJADID
RE: making dataframe for different types using spark-csv
Another option is to provide the schema to the load method. One variant of sqlContext.load takes a schema as an input parameter. You can define the schema programmatically as shown here: https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed

From: Krishna Sankar [mailto:ksanka...@gmail.com]
Sent: Wednesday, July 1, 2015 3:09 PM
To: Hafiz Mujadid
Cc: user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

- use .cast(...).alias('...') after the DataFrame is read.
- sql.functions.udf for any domain-specific conversions.

Cheers
k/
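The suggestion above, passing a programmatically defined schema to the load call, can also be sketched with the DataFrameReader API (`sqlContext.read`), which Spark 1.4 introduced as the replacement for the deprecated `sqlContext.load`. This is only a sketch under assumptions: it needs Spark 1.4 or later with the spark-csv package on the classpath, the object and helper names (`TypedCsvLoad`, `loadTyped`) are hypothetical, and the column names and option values are the ones used elsewhere in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types._

object TypedCsvLoad {
  // Schema for a headerless file whose columns are (String, Int, Long).
  // Column names are illustrative; they follow the names used in this thread.
  val recSchema: StructType = StructType(Seq(
    StructField("strVal", StringType, nullable = false),
    StructField("intVal", IntegerType, nullable = false),
    StructField("longVal", LongType, nullable = false)))

  // Hypothetical helper: reads filePath through spark-csv with the explicit
  // schema above, so the columns are typed instead of all being strings.
  def loadTyped(sqlContext: SQLContext, filePath: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv")
      .schema(recSchema)
      .option("header", "false")
      .option("delimiter", ",")
      .option("mode", "FAILFAST") // abort on a malformed record
      .load(filePath)
}
```

Calling `TypedCsvLoad.loadTyped(sqlContext, path).printSchema()` in spark-shell should show the three typed columns rather than three string columns.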
Re: making dataframe for different types using spark-csv
- use .cast(...).alias('...') after the DataFrame is read.
- sql.functions.udf for any domain-specific conversions.

Cheers
k/

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote:

Hi experts! I am using spark-csv to load CSV data into a DataFrame. By default it makes the type of each column string. Is there some way to get a DataFrame with the actual types, like int, double, etc.?

Thanks
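The two suggestions above (casting columns after the read, and a UDF for domain-specific conversions) might look roughly like this in Scala. It is a sketch only, not the poster's code: the column names "age", "price" and "active", the yes/no convention, and the names `CastAfterRead`, `yesNoToBool` and `withTypes` are all invented for illustration, and it assumes the input DataFrame came from spark-csv with a header row, so every column is a string.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{DoubleType, IntegerType}

object CastAfterRead {
  // Hypothetical domain-specific conversion: "yes"/"no" text to a Boolean.
  // Null or any other text maps to false.
  val yesNoToBool = udf((s: String) => s != null && s.trim.equalsIgnoreCase("yes"))

  // Assumes `raw` has string columns named "age", "price" and "active".
  // Each column is converted and re-aliased to its original name.
  def withTypes(raw: DataFrame): DataFrame =
    raw.select(
      raw("age").cast(IntegerType).alias("age"),
      raw("price").cast(DoubleType).alias("price"),
      yesNoToBool(raw("active")).alias("active"))
}
```

Compared with passing a schema at load time, this approach keeps the read simple and moves the type decisions into an explicit select, which is convenient when only a few columns need converting.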