So is the only issue that Impala does not see changes until you refresh the table? That sounds like a configuration that needs to be changed on the Impala side.
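If that's the case, the fix is an Impala statement rather than anything on the Spark side. Assuming the table name from the thread below, it would be something like:

  REFRESH ParqTable;             -- pick up data files added to an existing table
  INVALIDATE METADATA ParqTable; -- needed if the table itself was created outside Impala

REFRESH is the cheaper of the two; INVALIDATE METADATA discards and reloads all metadata for the table.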
On Fri, Aug 1, 2014 at 7:20 AM, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:

> Sorry, sent early, wasn't finished typing.
>
> CREATE EXTERNAL TABLE ....
>
> Then we can select the data using Impala. But this is registered as an
> external table and must be refreshed if new data is inserted.
>
> Obviously this doesn't seem good and doesn't seem like the correct
> solution.
>
> How should we insert data from SparkSQL into a Parquet table which can be
> directly queried by Impala?
>
> Best regards,
> Patrick
>
>
> On 1 August 2014 16:18, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:
>
>> Hi,
>>
>> We would like to use Spark SQL to store data in Parquet format and then
>> query that data using Impala.
>>
>> We've tried to come up with a solution and it is working but it doesn't
>> seem good. So I was wondering if you guys could tell us what is the
>> correct way to do this. We are using Spark 1.0 and Impala 1.3.1.
>>
>> First we are registering our tables using SparkSQL:
>>
>> val sqlContext = new SQLContext(sc)
>> sqlContext.createParquetFile[ParqTable]("hdfs://localhost:8020/user/hive/warehouse/ParqTable.pqt", true)
>>
>> Then we are using the HiveContext to register the table and do the insert:
>>
>> val hiveContext = new HiveContext(sc)
>> import hiveContext._
>>
>> hiveContext.parquetFile("hdfs://localhost:8020/user/hive/warehouse/ParqTable.pqt").registerAsTable("ParqTable")
>> eventsDStream.foreachRDD(event => event.insertInto("ParqTable"))
>>
>> Now we have the data stored in a Parquet file. To access it in Hive or
>> Impala we run
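To expand on my point above: rather than creating the Parquet file with createParquetFile and then layering an external table over it, you could create the table through the Hive metastore itself, so Hive and Impala both see it without the CREATE EXTERNAL TABLE step. An untested sketch (the columns are made up, since the ParqTable case class isn't shown, and STORED AS PARQUET assumes a Hive 0.13+ metastore; on 0.12 the Parquet SerDe has to be spelled out explicitly):

  val hiveContext = new HiveContext(sc)

  // DDL issued through HiveContext lands in the shared metastore,
  // so Impala can see the table without an external-table workaround.
  hiveContext.hql(
    "CREATE TABLE IF NOT EXISTS ParqTable (id INT, payload STRING) STORED AS PARQUET")

  // Each micro-batch inserts into the metastore-backed table, as in your code.
  eventsDStream.foreachRDD(events => events.insertInto("ParqTable"))

Even with this, Impala caches table metadata, so a REFRESH ParqTable after inserts (or on a schedule) is still expected whenever files are written behind Impala's back.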