Hi Roberto,

At present, I don't believe there's any way to pass a query hint explicitly: the SELECT statement is built from just the table name and columns, down in this method:

https://github.com/apache/phoenix/blob/892be13985658169ae581b3cb318845891f36b92/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java#L176

However, the Hive integration does have this built in, although it doesn't exist in the rest of the Phoenix MR codebase:

https://github.com/apache/phoenix/blob/616cd057d3c7d587aafe278948f8cff84efc9d29/phoenix-hive/src/main/java/org/apache/phoenix/hive/query/PhoenixQueryBuilder.java#L220-L235

Would you mind filing a JIRA ticket? As always, patches are welcome as well. I suspect we should disable the block cache for phoenix-spark by default, as Hive does.

Thanks!

Josh

On Wed, Aug 30, 2017 at 7:11 AM, Roberto Coluccio <roberto.coluc...@eng.it> wrote:
> Hello folks,
>
> I'm facing the issue of preventing records from being added to the block
> cache when my Spark application reads them as a DataFrame (e.g.
> sqlContext.phoenixTableAsDataFrame(myTable, myColumns, myPredicate,
> myZkUrl, myConf)).
>
> I know I can force no caching on a per-query basis when issuing SQL
> queries by leveraging the /*+ NO_CACHE */ hint.
> I know I can disable caching on a table-specific or column-family-specific
> basis through an ALTER TABLE HBase shell command.
>
> What I don't know is how to do so when leveraging the Phoenix-Spark APIs.
> I think my problem can be stated as a more general-purpose question:
>
> *how can Phoenix hints be specified when using the Phoenix-Spark APIs?*
> For my specific use case, I tried setting the property
> *hfile.block.cache.size=0* in a Configuration object before creating the
> DataFrame, but I realized that records resulting from the underlying scan
> were still cached.
>
> Thank you in advance for your help.
>
> Best regards,
> Roberto
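P.S. For reference, the kind of hint splicing that the Hive PhoenixQueryBuilder does could be sketched roughly as follows. This is only an illustration of the idea, not actual Phoenix code: the class and method names (`HintedQueryBuilder`, `withHint`) are hypothetical, and a real patch would plug into the SELECT construction in PhoenixInputFormat instead.

```java
// Hypothetical sketch of splicing an optional hint into a generated
// SELECT, placing it right after the SELECT keyword where Phoenix
// expects hints (e.g. SELECT /*+ NO_CACHE */ ...).
public class HintedQueryBuilder {

    // Returns the query unchanged when no hint is given; otherwise
    // inserts a /*+ ... */ hint comment after the leading SELECT.
    static String withHint(String select, String hint) {
        if (hint == null || hint.isEmpty()) {
            return select;
        }
        return select.replaceFirst("(?i)^SELECT", "SELECT /*+ " + hint + " */");
    }

    public static void main(String[] args) {
        String base = "SELECT ID, NAME FROM MY_TABLE WHERE AGE > 21";
        // Prints: SELECT /*+ NO_CACHE */ ID, NAME FROM MY_TABLE WHERE AGE > 21
        System.out.println(withHint(base, "NO_CACHE"));
    }
}
```

A hint parameter like this is presumably what a phoenix-spark/MR patch would expose, defaulting it to NO_CACHE as Hive does.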