I see now. It optimizes the selection semantics so that fewer things need to
be included just to do a count(). Very nice. I did a collect() instead of a
count() just to see what would happen, and it looks like all the expected
select fields were propagated down. Thanks.
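For the archives, the quickest way to see which columns reach the relation is to compare the two actions directly (the table and column names here are hypothetical):

```scala
// Spark 1.2: sqlContext.sql() returns a SchemaRDD. count() lets the planner
// prune the SELECT list away, while collect() forces the selected columns
// all the way down to the relation.
val rdd = sqlContext.sql("SELECT name, age FROM myTable WHERE age > 21")
rdd.count()   // requiredColumns may contain only the WHERE-clause columns
rdd.collect() // requiredColumns contains name and age as well
```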
On Sat, Jan
How are you running your test here? Are you perhaps doing a .count()?
On Sat, Jan 17, 2015 at 12:54 PM, Corey Nolet wrote:
> Michael,
>
> What I'm seeing (in Spark 1.2.0) is that the required columns being pushed
> down to the DataRelation are not the product of the SELECT clause but
> rather just the columns explicitly included in the WHERE clause.
>
> Examples from my testing:
>
> SELECT * FROM myTable --> The required columns are
>
> 1) The fields in the SELECT clause are not pushed down to the predicate
> pushdown API. I have many optimizations that allow fields to be filtered
> out before the resulting object is serialized on the Accumulo tablet
> server. How can I get the selection information from the execution plan?
>
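For getting at the selection information: in the sources API, PrunedFilteredScan hands buildScan() both the pruned column list and the pushed-down filters. A rough sketch (the Accumulo wiring is hypothetical; import paths shown are the post-1.2 ones and moved around a bit between releases):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation over an Accumulo table. Spark calls buildScan with
// exactly the columns and filters it decided to push down, so this is the
// place to observe what actually arrives for a given query.
class AccumuloRelation(val sqlContext: SQLContext, val schema: StructType)
  extends BaseRelation with PrunedFilteredScan {

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // Inspect what the planner pushed down.
    println(s"required: ${requiredColumns.mkString(", ")}")
    println(s"filters:  ${filters.mkString(", ")}")
    // A real implementation would configure the Accumulo scan to fetch only
    // requiredColumns and translate the filters into server-side seeks.
    sqlContext.sparkContext.emptyRDD[Row] // placeholder
  }
}
```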
>> Examples also can be found in the unit test:
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources
From: Corey Nolet [mailto:cjno...@gmail.com]
Sent: Friday, January 16, 2015 1:51 PM
To: user
Subject: Spark SQL Custom Predicate Pushdown
I have document storage services in Accumulo that I'd like to expose to
Spark SQL. I am able to push down predicate logic to Accumulo to have it
perform only the seeks necessary on each tablet server to grab the results
being asked for.
I'm interested in using Spark SQL to push those predicates down.
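As a sketch of that translation step (Scanner.fetchColumnFamily and Range.exact are real Accumulo client calls, but the mapping of SQL columns to column families and of "rowId" to the row key is hypothetical):

```scala
import org.apache.accumulo.core.client.Scanner
import org.apache.accumulo.core.data.Range
import org.apache.hadoop.io.Text
import org.apache.spark.sql.sources.{EqualTo, Filter}

// Configure a Scanner so the tablet servers only seek and serialize what
// the query actually needs.
def configureScan(scanner: Scanner,
                  requiredColumns: Array[String],
                  filters: Array[Filter]): Unit = {
  // Fetch only the pushed-down columns, treating each as a column family.
  requiredColumns.foreach(col => scanner.fetchColumnFamily(new Text(col)))
  filters.foreach {
    case EqualTo("rowId", value) =>
      // An equality predicate on the hypothetical row-id column becomes an
      // exact-row seek on the tablet server.
      scanner.setRange(Range.exact(value.toString))
    case _ =>
      // Unsupported predicates are left for Spark to evaluate after the scan.
  }
}
```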