Re: Does Apache Spark take into account JDBC indexes / statistics when optimizing queries?

Mich Talebzadeh Thu, 19 Oct 2017 15:19:56 -0700

Sorry, what do you mean by your JDBC table having an index on it? Where are
you reading the table data from?

I assume you are referring to the "id" column of the table that you are
reading through the JDBC connection.

You then register a temp table called "df". That temp table is created in
Spark's temporary workspace and does not have any index. The "id" column is
used to split the JDBC read into parallel partitions when loading the data
into RDDs, not when querying the temp table.
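
To make the parallel read concrete, here is a rough Python sketch of how the
"column", "lowerBound", "upperBound" and "numPartitions" options can be turned
into one range predicate per partition. This is an approximation of Spark's
behaviour written from memory, not its actual source: the function name
partition_predicates is made up for illustration, and it assumes non-negative
integer bounds.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the per-partition WHERE clauses of a partitioned JDBC read.

    Hypothetical helper for illustration only; assumes non-negative
    integer bounds. Returns one predicate string per partition.
    """
    if num_partitions <= 1 or lower_bound >= upper_bound:
        return [None]  # single partition: no range predicate at all

    # Width of each partition's id range.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None

        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        elif upper:
            # NULL ids are assigned to the first partition.
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            predicates.append(lower)
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("id", 0, 100, 4):
        print(clause)
```

Each of these clauses ends up in the WHERE clause of a separate query against
the database, which is where an index on "id" can help on the database side.
The bounds only decide how the rows are split across partitions; they do not
filter rows out.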

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 19 October 2017 at 23:10, lucas.g...@gmail.com <lucas.g...@gmail.com>
wrote:

> IE:  If my JDBC table has an index on it, will the optimizer consider that
> when pushing predicates down?
>
> I noticed in a query like this:
>
> df = spark.hiveContext.read.jdbc(
>   url=jdbc_url,
>   table="schema.table",
>   column="id",
>   lowerBound=lower_bound_id,
>   upperBound=upper_bound_id,
>   numPartitions=numberPartitions
> )
> df.registerTempTable("df")
>
> filtered_df = spark.hiveContext.sql("""
>     SELECT
>         *
>     FROM
>         df
>     WHERE
>         type = 'type'
>         AND action = 'action'
>         AND audited_changes LIKE '---\ncompany_id:\n- %'
> """)
> filtered_df.registerTempTable("filtered_df")
>
>
> The queries sent to the DB look like this:
> "Select fields from schema.table where type='type' and action='action' and
> id > lower_bound and id <= upper_bound"
>
> And then it evaluates the LIKE ( LIKE '---\ncompany_id:\n- %') in memory,
> which is great!
>
> However I'm wondering why it chooses that optimization.  In this case
> there aren't any indexes on any of these except ID.
>
> So, does Spark take into account JDBC indexes in its query plan where it
> can?
>
> Thanks!
>
> Gary Lucas
>
