Thanks. This library is only available with Spark 1.3. I am using version 1.2.1. Before I upgrade to 1.3, I want to try what can be done in 1.2.1.
So I am using the following:

val MyDataset = sqlContext.sql("my select query")

MyDataset.map(t => t(0)+"|"+t(1)+"|"+t(2)+"|"+t(3)+"|"+t(4)+"|"+t(5)).saveAsTextFile("/my_destination_path")

But it is giving the following error:

15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID 106)
java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:453)
        at java.lang.Long.parseLong(Long.java:483)
        at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)

Is there something wrong with the TSTAMP field, which is of Long datatype?

Thanks & Regards
-----------------------
Ananda Basak
Ph: 425-213-7092


From: Yin Huai [mailto:yh...@databricks.com]
Sent: Monday, March 23, 2015 8:55 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

To store to a CSV file, you can use the Spark-CSV <https://github.com/databricks/spark-csv> library.

On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Thanks. This worked well as per your suggestions. I had to run the following:

val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))

Now I am stuck at another step. I have run a SQL query where I am selecting all the fields, with a WHERE clause filtering TSTAMP by a date range and an ORDER BY TSTAMP clause. That is running fine.

Then I am trying to store the output in a CSV file using the saveAsTextFile("filename") function, but it is giving an error. Can you please help me write the proper syntax to store the output in a CSV file?

Thanks & Regards
-----------------------
Ananda Basak
Ph: 425-213-7092


From: BASAK, ANANDA
Sent: Tuesday, March 17, 2015 3:08 PM
To: Yin Huai
Cc: user@spark.apache.org
Subject: RE: Date and decimal datatype not working

Ok, thanks for the suggestions. Let me try and I will confirm all.

Regards
Ananda


From: Yin Huai [mailto:yh...@databricks.com]
Sent: Tuesday, March 17, 2015 3:04 PM
To: BASAK, ANANDA
Cc: user@spark.apache.org
Subject: Re: Date and decimal datatype not working

p(0) is a String, so you need to explicitly convert it to a Long, e.g. p(0).trim.toLong. You also need to do the same for p(2). For the BigDecimal values, you need to create BigDecimal objects from your String values.
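For reference, a minimal sketch of those conversions as a helper, using the ROW_A case class quoted in the original message below; the parseRow name is illustrative, not from the thread:

def parseRow(p: Array[String]): ROW_A =
  ROW_A(p(0).trim.toLong,   // TSTAMP: String -> Long
        p(1),               // USIDAN stays a String
        p(2).trim.toInt,    // SECNT: String -> Int
        p(3),               // SECT stays a String
        BigDecimal(p(4)),   // BLOCK_NUM: String -> BigDecimal
        BigDecimal(p(5)),   // BLOCK_DEN
        BigDecimal(p(6)))   // BLOCK_PCT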
On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA <ab9...@att.com> wrote:

Hi All,
I am very new to the Spark world; I just started some test coding last week. I am using spark-1.2.1-bin-hadoop2.4 and coding in Scala. I am having issues while using the Date and decimal data types.

Following is my code, which I am simply running at the Scala prompt. I am trying to define a table and point it to my flat file containing raw data (pipe-delimited format). Once that is done, I will run some SQL queries and put the output data into another flat file, also pipe-delimited.

*******************************************************
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

// Define row and table
case class ROW_A(
  TSTAMP: Long,
  USIDAN: String,
  SECNT: Int,
  SECT: String,
  BLOCK_NUM: BigDecimal,
  BLOCK_DEN: BigDecimal,
  BLOCK_PCT: BigDecimal)

val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

TABLE_A.registerTempTable("TABLE_A")
***************************************************

The second-to-last command is giving an error like the following:

<console>:17: error: type mismatch;
 found   : String
 required: Long

It looks like the content from my flat file is always treated as String, never as Date or decimal. How can I make Spark take them as Date or decimal types?

Regards
Ananda
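A note on the NumberFormatException at the top of this thread: String.split takes a regular expression, and an unescaped "|" is regex alternation, so split("|") yields empty and single-character tokens; p(0).trim.toLong then fails on the empty string "". Escaping the delimiter as split("\\|") is the likely fix, though that diagnosis is an assumption. Below is a minimal end-to-end sketch under that assumption, reusing the ROW_A layout, file paths, and placeholder query from the thread; it targets the 1.2.x spark-shell, where sc is predefined.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD   // 1.2.x implicit: case-class RDD -> SchemaRDD

case class ROW_A(
  TSTAMP: Long,
  USIDAN: String,
  SECNT: Int,
  SECT: String,
  BLOCK_NUM: BigDecimal,
  BLOCK_DEN: BigDecimal,
  BLOCK_PCT: BigDecimal)

// split("\\|") escapes the pipe; a bare "|" would split between every character
val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt")
  .map(_.split("\\|"))
  .map(p => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
                  BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6))))

TABLE_A.registerTempTable("TABLE_A")

// "my select query" is the thread's placeholder; substitute the real statement
val MyDataset = sqlContext.sql("my select query")

// Row is a Seq in 1.2.x, so mkString("|") rebuilds a pipe-delimited line
// without concatenating each column by hand
MyDataset.map(_.mkString("|")).saveAsTextFile("/my_destination_path")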