Hi Florin,

I might be wrong, but TIMESTAMP looks like a keyword in SQL that the parser
trips over (it is a data type in Hive). If it is a column name in your
table, you might want to rename it. (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types)
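
If renaming the column is not an option, wrapping the identifier in backticks often gets it past the parser. Here is a minimal, hypothetical sketch of that escaping in plain Java (the reserved-word set below is an illustrative subset, not Spark SQL's actual keyword list, and `escape` is a made-up helper name):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SqlIdentifiers {
    // Illustrative subset of reserved words; NOT the full Spark SQL list.
    private static final Set<String> RESERVED = new HashSet<>(
            Arrays.asList("TIMESTAMP", "SELECT", "FROM", "WHERE", "UNION"));

    // Wrap a column name in backticks when it collides with a keyword.
    static String escape(String column) {
        return RESERVED.contains(column.toUpperCase())
                ? "`" + column + "`" : column;
    }

    public static void main(String[] args) {
        // Builds: SELECT `TimeStamp` FROM mytable
        System.out.println("SELECT " + escape("TimeStamp") + " FROM mytable");
    }
}
```

So the query in your snippet would become "SELECT \`TimeStamp\` FROM mytable" (assuming the parser version you are on accepts backtick quoting).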

I work with CSV files in Spark all the time, but I parse them manually
rather than with the spark-csv package, so I can't comment on spark-csv
itself.
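
For what it's worth, by "manually" I mean reading the file as lines and splitting on commas, using the header to name the fields. A rough sketch of that splitting step in plain Java (column names are made up, and the naive split does not handle quoted fields containing commas):

```java
import java.util.HashMap;
import java.util.Map;

public class CsvRow {
    // Map one data line to column values using the header line.
    // Naive split: does not handle quoted fields containing commas.
    static Map<String, String> parse(String header, String line) {
        String[] cols = header.split(",", -1);
        String[] vals = line.split(",", -1);
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < cols.length; i++) {
            row.put(cols[i].trim(), i < vals.length ? vals[i].trim() : "");
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row =
                parse("TimeStamp,Speed", "2015-02-05 09:32:00,42");
        System.out.println(row.get("Speed")); // prints 42
    }
}
```

In Spark you would apply something like this inside a map over the lines of the file, then filter on the resulting maps instead of going through SQL.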

HTH,

Jerry


On Thu, Feb 5, 2015 at 9:32 AM, Spico Florin <[email protected]> wrote:

> Hello!
> I'm using spark-csv 2.10 with Java from the maven repository
> <groupId>com.databricks</groupId>
> <artifactId>spark-csv_2.10</artifactId>
> <version>0.1.1</version>
>
> I would like to use Spark-SQL to filter out my data. I'm using the
> following code:
> JavaSchemaRDD cars = new JavaCsvParser().withUseHeader(true).csvFile(
> sqlContext, logFile);
> cars.registerAsTable("mytable");
>  JavaSchemaRDD doll = sqlContext.sql("SELECT TimeStamp FROM mytable");
> doll.saveAsTextFile("dolly.csv");
>
> but I'm getting the following error:
> Exception in thread "main" java.lang.RuntimeException: [1.8] failure:
> ``UNION'' expected but `TimeStamp' fo
>
> SELECT TimeStamp FROM mytablel
>         at scala.sys.package$.error(package.scala:27)
>         at
> org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
>
> Can you please tell me what is the best approach to filter the CSV data
> with SQL?
> Thank you.
>  Regards,
>  Florin
>
