Hi Florin,

I might be wrong, but TIMESTAMP looks like a keyword in SQL that the engine gets confused by. If it is a column name in your table, you might want to change it.
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types)
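If renaming the column is not an option, one common workaround is to quote the identifier with backticks when building the query. This is a minimal sketch; the `quote` helper is hypothetical, and it assumes your Spark SQL version's parser accepts backtick-quoted identifiers (otherwise renaming, e.g. `TimeStamp` -> `event_time`, is the safe route):

```java
public class QueryBuilder {
    // Hypothetical helper: wrap an identifier in backticks so the SQL
    // parser treats it as a plain column name rather than a keyword.
    // (Assumption: your Spark SQL version accepts backtick quoting;
    // if not, rename the column instead.)
    static String quote(String column) {
        return "`" + column + "`";
    }

    public static void main(String[] args) {
        String query = "SELECT " + quote("TimeStamp") + " FROM mytable";
        System.out.println(query);
    }
}
```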
I'm constantly working with CSV files with Spark. However, I didn't use the spark-csv package; I did that manually, so I cannot comment on spark-csv.

HTH,

Jerry

On Thu, Feb 5, 2015 at 9:32 AM, Spico Florin <[email protected]> wrote:
> Hello!
> I'm using spark-csv 2.10 with Java from the Maven repository:
>
>     <groupId>com.databricks</groupId>
>     <artifactId>spark-csv_2.10</artifactId>
>     <version>0.1.1</version>
>
> I would like to use Spark SQL to filter my data. I'm using the
> following code:
>
>     JavaSchemaRDD cars = new JavaCsvParser().withUseHeader(true).csvFile(
>         sqlContext, logFile);
>     cars.registerAsTable("mytable");
>     JavaSchemaRDD doll = sqlContext.sql("SELECT TimeStamp FROM mytable");
>     doll.saveAsTextFile("dolly.csv");
>
> but I'm getting the following error:
>
>     Exception in thread "main" java.lang.RuntimeException: [1.8] failure:
>     ``UNION'' expected but `TimeStamp' found
>
>     SELECT TimeStamp FROM mytable
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
>
> Can you please tell me what is the best approach to filter the CSV data
> with SQL?
>
> Thank you.
> Regards,
> Florin
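For what it's worth, the manual approach Jerry mentions can be sketched roughly like this: read the file as plain text lines, use the header row to find the column index, then map/filter the remaining lines. The sample data, column names, and `column` helper below are hypothetical, and the snippet uses a plain `List` where you would use the RDD from `sc.textFile(logFile)` (it also ignores quoted fields and escaping):

```java
import java.util.*;
import java.util.stream.*;

public class ManualCsvFilter {
    // Pick one column out of CSV lines (first line is the header).
    // This mirrors a map() over an RDD of lines from sc.textFile();
    // naive split on commas, no quoting or escaping handled.
    static List<String> column(List<String> lines, String name) {
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int idx = header.indexOf(name);
        return lines.stream()
                .skip(1)                              // drop the header row
                .map(line -> line.split(",")[idx])    // keep the wanted field
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the CSV file.
        List<String> lines = Arrays.asList(
                "Make,Model,TimeStamp",
                "Ford,Focus,2015-02-05T09:32:00",
                "Audi,A4,2015-02-05T09:33:00");
        System.out.println(column(lines, "TimeStamp"));
    }
}
```

Since nothing goes through the SQL parser here, a reserved-word column name like `TimeStamp` causes no trouble.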
