So is the only issue that Impala does not see changes until you refresh the table? That sounds like a configuration that needs to be changed on the Impala side.
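If that's the case, the fix is an Impala statement rather than anything on the Spark side. Assuming the table name from the thread below, it would be something like:

  REFRESH ParqTable;             -- pick up data files added to an existing table
  INVALIDATE METADATA ParqTable; -- needed if the table itself was created outside Impala

REFRESH is the cheaper of the two; INVALIDATE METADATA discards and reloads all metadata for the table.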
On Fri, Aug 1, 2014 at 7:20 AM, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:

> Sorry, sent early, wasn't finished typing.
>
> CREATE EXTERNAL TABLE ....
>
> Then we can select the data using Impala. But this is registered as an
> external table and must be refreshed if new data is inserted.
>
> Obviously this doesn't seem good and doesn't seem like the correct
> solution.
>
> How should we insert data from SparkSQL into a Parquet table which can be
> directly queried by Impala?
>
> Best regards,
> Patrick
>
>
> On 1 August 2014 16:18, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:
>
>> Hi,
>>
>> We would like to use Spark SQL to store data in Parquet format and then
>> query that data using Impala.
>>
>> We've tried to come up with a solution and it is working but it doesn't
>> seem good. So I was wondering if you guys could tell us what is the
>> correct way to do this. We are using Spark 1.0 and Impala 1.3.1.
>>
>> First we are registering our tables using SparkSQL:
>>
>> val sqlContext = new SQLContext(sc)
>> sqlContext.createParquetFile[ParqTable]("hdfs://localhost:8020/user/hive/warehouse/ParqTable.pqt", true)
>>
>> Then we are using the HiveContext to register the table and do the insert:
>>
>> val hiveContext = new HiveContext(sc)
>> import hiveContext._
>>
>> hiveContext.parquetFile("hdfs://localhost:8020/user/hive/warehouse/ParqTable.pqt").registerAsTable("ParqTable")
>> eventsDStream.foreachRDD(event => event.insertInto("ParqTable"))
>>
>> Now we have the data stored in a Parquet file. To access it in Hive or
>> Impala we run
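To expand on my point above: rather than creating the Parquet file with createParquetFile and then layering an external table over it, you could create the table through the Hive metastore itself, so Hive and Impala both see it without the CREATE EXTERNAL TABLE step. An untested sketch (the columns are made up, since the ParqTable case class isn't shown, and STORED AS PARQUET assumes a Hive 0.13+ metastore; on 0.12 the Parquet SerDe has to be spelled out explicitly):

  val hiveContext = new HiveContext(sc)

  // DDL issued through HiveContext lands in the shared metastore,
  // so Impala can see the table without an external-table workaround.
  hiveContext.hql(
    "CREATE TABLE IF NOT EXISTS ParqTable (id INT, payload STRING) STORED AS PARQUET")

  // Each micro-batch inserts into the metastore-backed table, as in your code.
  eventsDStream.foreachRDD(events => events.insertInto("ParqTable"))

Even with this, Impala caches table metadata, so a REFRESH ParqTable after inserts (or on a schedule) is still expected whenever files are written behind Impala's back.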