Awesome, thank you Michael for the detailed example!
I'll look into whether I can use this approach for my use case. If so, I
could avoid the overhead of repeatedly registering a temp table for one-off
queries, instead registering the table once and relying on the injected
strategy. Don't know how …
Thanks for the link, I hadn't come across this.
According to
https://forums.databricks.com/questions/400/what-is-the-difference-between-registertemptable-a.html,
and I quote:
> "registerTempTable()
>
> registerTempTable() creates an in-memory table that is scoped to the
> cluster in which it was created."
Hi again Mich,
"But the thing is that I don't explicitly cache the tempTables ..."
>
> I believe tempTable is created in-memory and is already cached
>
That surprises me, since there is a sqlContext.cacheTable method to
explicitly cache a table in memory. Or am I missing something? This could
explain …
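For what it's worth, my understanding of the distinction can be sketched like this (Spark 1.x API as used in this thread; `df` and the table name are hypothetical):

```scala
// registerTempTable only binds a name to the DataFrame's logical plan --
// it moves no data and caches nothing by itself.
df.registerTempTable("events")

// Caching is a separate, explicit request, and it is lazy: the table is
// materialised in memory on the first action that touches it.
sqlContext.cacheTable("events")
sqlContext.sql("SELECT count(*) FROM events").show()

// The cache can likewise be dropped explicitly.
sqlContext.uncacheTable("events")
```

So unless `cacheTable` (or `df.cache`) is called somewhere, I'd expect queries against a temp table to go back to the underlying source each time.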
Hi Mich,
Thank you again for your reply.
As I see, you are caching the table already sorted
>
> val keyValRDDSorted = keyValRDD.sortByKey().cache
>
> and the next stage is you are creating multiple tempTables (different
> ranges) that cache a subset of rows already cached in the RDD. The data stored …
>
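If I understand the approach, it could be sketched roughly like this (Spark 1.x, all names hypothetical; I'm assuming a pair RDD of strings):

```scala
import sqlContext.implicits._

// Sort once and cache; sortByKey leaves the RDD range-partitioned.
val keyValRDDSorted = keyValRDD.sortByKey().cache()

// Expose a key range as its own temp table. filterByRange can use the
// range partitioning left behind by sortByKey to skip whole partitions.
def registerRange(name: String, lo: String, hi: String): Unit =
  keyValRDDSorted
    .filterByRange(lo, hi)        // bounds are inclusive
    .toDF("key", "value")
    .registerTempTable(name)

registerRange("range_a", "a000", "azzz")
```

Each such temp table re-derives its rows from the cached RDD; nothing extra should be materialised unless it is explicitly cached as well.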
Hi Mich,
Thank you for your quick reply!
What type of table is the underlying table? Is it HBase, Hive ORC, or what?
>
It is a custom datasource, but ultimately backed by HBase.
> By Key you mean a UNIQUE ID or something similar and then you do multiple
> scans on the tempTable which stores dat …
Hello,
I've got a Spark SQL dataframe containing a "key" column. The queries I
want to run start by filtering on the key range. My question in outline: is
it possible to sort the dataset by key so as to do efficient key range
filters, before subsequently running a more complex SQL query?
I'm awar …
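Roughly, the pattern I have in mind is this (all names hypothetical, Spark 1.x API); my question is whether the sort actually makes the range filter cheap, or whether it still scans everything:

```scala
// Hypothetical sketch: sort by key once, cache, then restrict to a key
// range before running the heavier SQL on top.
val sortedDf = df.sort("key").cache()
sortedDf.registerTempTable("events_sorted")

val slice = sqlContext.sql(
  """SELECT key, payload
    |FROM events_sorted
    |WHERE key >= 'a000' AND key < 'b000'
    |""".stripMargin)
slice.registerTempTable("events_slice")

// ...then the more complex query runs over the pre-filtered slice...
sqlContext.sql("SELECT payload, count(*) FROM events_slice GROUP BY payload")
```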
Hi,
I want to take a SQL string as user input, then transform it before
execution. In particular, I want to modify the top-level projection (select
clause), injecting additional columns to be retrieved by the query.
I was hoping to achieve this by hooking into Catalyst using
sparkSession.e…
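As a stopgap while I work out the Catalyst hook, a purely textual rewrite is possible, though fragile (subqueries, CTEs and quoted identifiers all break it), which is exactly why a plan-level rewrite is more attractive. A minimal sketch in plain Scala:

```scala
// Naive, illustrative sketch: inject extra columns into the top-level
// SELECT clause of a SQL string. Breaks on subqueries, CTEs, quoted
// identifiers etc. -- rewriting the parsed plan is the robust route.
def injectColumns(sql: String, extra: Seq[String]): String = {
  val SelectFrom = """(?is)^\s*select\s+(.*?)\s+from\s+(.*)$""".r
  sql match {
    case SelectFrom(projection, rest) =>
      s"SELECT $projection, ${extra.mkString(", ")} FROM $rest"
    case _ => sql // leave anything we don't understand untouched
  }
}

// injectColumns("select a, b from t where a > 1", Seq("rowkey"))
//   returns "SELECT a, b, rowkey FROM t where a > 1"
```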