Hi Roberto,

At present, I don't believe there's any way to pass a query hint
explicitly, as the SELECT statement is built based on the table name and
columns, down in this method:

https://github.com/apache/phoenix/blob/892be13985658169ae581b3cb318845891f36b92/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java#L176

However, the Hive integration does appear to have this built in; the
equivalent doesn't exist in the rest of the Phoenix MR codebase:

https://github.com/apache/phoenix/blob/616cd057d3c7d587aafe278948f8cff84efc9d29/phoenix-hive/src/main/java/org/apache/phoenix/hive/query/PhoenixQueryBuilder.java#L220-L235

Would you mind filing a JIRA ticket? As always, patches are welcome as
well. I suspect we should disable the block cache for phoenix-spark by
default, as Hive does.
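In the meantime, here's a rough sketch of what hint injection could look
like if it were added to the MR query construction. The helper name below
(buildSelectWithHint) is hypothetical, not an existing Phoenix API;
phoenix-hive's PhoenixQueryBuilder prepends the hint to its generated
query in a similar way:

```java
// Hypothetical sketch only: inject an optional Phoenix hint right after
// the SELECT keyword, mirroring how phoenix-hive assembles its query.
public class NoCacheHintSketch {

    // Phoenix hints are written inline as /*+ HINT */ immediately after
    // SELECT, e.g. SELECT /*+ NO_CACHE */ col FROM table.
    static String buildSelectWithHint(String table, String[] columns, String hint) {
        StringBuilder sb = new StringBuilder("SELECT ");
        if (hint != null && !hint.isEmpty()) {
            sb.append("/*+ ").append(hint).append(" */ ");
        }
        sb.append(String.join(", ", columns));
        sb.append(" FROM ").append(table);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: SELECT /*+ NO_CACHE */ ID, NAME FROM MY_TABLE
        System.out.println(buildSelectWithHint("MY_TABLE",
                new String[] {"ID", "NAME"}, "NO_CACHE"));
    }
}
```

A patch along these lines would presumably thread the hint through from a
job configuration property down to where PhoenixInputFormat builds the
SELECT statement.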

Thanks!

Josh


On Wed, Aug 30, 2017 at 7:11 AM, Roberto Coluccio <roberto.coluc...@eng.it>
wrote:

> Hello folks,
>
> I'm trying to prevent the records I select from my Spark application
> from being added to the block cache when reading them as a DataFrame
> (e.g. sqlContext.phoenixTableAsDataFrame(myTable, myColumns, myPredicate,
> myZkUrl, myConf)).
>
> I know I can force the no cache on a query basis when issuing SQL queries
> leveraging the /*+ NO_CACHE */ hint.
> I know I can disable the caching on a per-table or per-column-family
> basis through an ALTER TABLE HBase shell command.
>
> What I don't know is how to do so when leveraging Phoenix-Spark APIs. I
> think my problem can be stated as a more general purpose question:
>
> *how can Phoenix hints be specified when using the Phoenix-Spark APIs?*
> For my specific use case, I tried to set the property
> *hfile.block.cache.size=0* in a Configuration object before creating the
> DataFrame, but I found that records returned by the underlying scan were
> still cached.
>
> Thank you in advance for your help.
>
> Best regards,
> Roberto
>