Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-20 Thread Richard Eggert
Having to restructure my queries isn't a very satisfactory solution,
unfortunately.
I did notice that if I implement the CatalystScan interface instead, the
filters DO get passed in, though the column identifiers would need some
translation to be usable, so that's another option. Unfortunately, filters
still don't get passed in for JOIN conditions.

On Sat, Sep 19, 2015 at 11:26 PM, Zhan Zhang  wrote:

> Hi Richard,
>
>
> I am not sure how to support user-defined types, but regarding your second
> question, you can use a workaround like the following.
>
>
> Suppose you have a struct a and want to filter on a.c with a.c > X. You can
> define an alias C for a.c, add the extra column C to the schema of the
> relation, and write your query as C > X instead of a.c > X. That way, in
> buildScan you will receive GreaterThan(C, X), which you can programmatically
> convert back to a.c. Note that the required columns passed to buildScan will
> also include the extra column C, which you need to return in the RDD built
> by buildScan.
>
>
> It looks complicated, but I think it would work.
>
>
> Thanks.
>
>
> Zhan Zhang
> --
> *From:* Richard Eggert 
> *Sent:* Saturday, September 19, 2015 3:59 PM
> *To:* User
> *Subject:* PrunedFilteredScan does not work for UDTs and Struct fields
>
> I defined my own relation (extending BaseRelation) and implemented the
> PrunedFilteredScan interface, but discovered that if the column referenced
> in a WHERE equality clause is a user-defined type or a field of a struct
> column, then Spark SQL passes NO filters to the
> PrunedFilteredScan.buildScan method, rendering the interface useless. Is
> there really no way to implement a relation that optimizes on such fields?
>
> --
> Rich
>



-- 
Rich


Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-19 Thread Zhan Zhang
Hi Richard,


I am not sure how to support user-defined types, but regarding your second
question, you can use a workaround like the following.


Suppose you have a struct a and want to filter on a.c with a.c > X. You can
define an alias C for a.c, add the extra column C to the schema of the
relation, and write your query as C > X instead of a.c > X. That way, in the
buildScan you will receive GreaterThan(C, X), which you can programmatically
convert back to a.c. Note that the required columns passed to buildScan will
also include the extra column C, which you need to return in the RDD built by
buildScan.
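A rough sketch of that workaround (hypothetical names throughout: MyRelation, the struct column a, its field c, and the helper scanWithTranslatedFilters are all made up; the interfaces are from Spark 1.5's org.apache.spark.sql.sources):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, GreaterThan, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation whose real data has a struct column "a" with a
// field "c"; the schema additionally exposes a flat alias column "C".
class MyRelation(override val sqlContext: SQLContext,
                 override val schema: StructType)
    extends BaseRelation with PrunedFilteredScan {

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // The query says C > X, so the planner hands us GreaterThan("C", X);
    // translate the alias back to the real struct field before scanning.
    val translated: Array[Filter] = filters.map {
      case GreaterThan("C", value) => GreaterThan("a.c", value)
      case other                   => other
    }
    // requiredColumns will also contain "C", so every Row returned must
    // carry a value for it (duplicated from a.c).
    scanWithTranslatedFilters(requiredColumns, translated)
  }

  // Made-up helper standing in for the actual data access.
  private def scanWithTranslatedFilters(cols: Array[String],
                                        fs: Array[Filter]): RDD[Row] = ???
}
```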


It looks complicated, but I think it would work.


Thanks.


Zhan Zhang


From: Richard Eggert 
Sent: Saturday, September 19, 2015 3:59 PM
To: User
Subject: PrunedFilteredScan does not work for UDTs and Struct fields

I defined my own relation (extending BaseRelation) and implemented the 
PrunedFilteredScan interface, but discovered that if the column referenced in a 
WHERE equality clause is a user-defined type or a field of a struct column, then 
Spark SQL passes NO filters to the PrunedFilteredScan.buildScan method, 
rendering the interface useless. Is there really no way to implement a relation 
that optimizes on such fields?
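A minimal way to observe this (hypothetical relation, Spark 1.5-era API) is to log what buildScan receives: for a predicate on a flat column the filters array carries the pushed-down Filter, while for a struct field it arrives empty, per the behavior reported above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

// Hypothetical relation used only to observe what the planner pushes down.
class ProbeRelation(override val sqlContext: SQLContext)
    extends BaseRelation with PrunedFilteredScan {

  // One flat column and one struct column with a field "c".
  override val schema: StructType = StructType(Seq(
    StructField("flat", IntegerType),
    StructField("a", StructType(Seq(StructField("c", IntegerType))))))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // With WHERE flat > 5, filters contains GreaterThan("flat", 5);
    // with WHERE a.c > 5, filters arrives empty (the problem at hand).
    println(s"columns=${requiredColumns.mkString(",")} " +
            s"filters=${filters.mkString(",")}")
    sqlContext.sparkContext.emptyRDD[Row]
  }
}
```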

--
Rich