Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

ilya.kasnacheev Fri, 28 Sep 2018 07:00:40 -0700

Hello!

I have indeed try a use case like yours:


0: jdbc:ignite:thin://127.0.0.1/> create index on b(x,y); 
No rows affected (9,729 seconds)
0: jdbc:ignite:thin://127.0.0.1/> select count(*) from a;
COUNT(*)  1

1 row selected (0,017 seconds)
0: jdbc:ignite:thin://127.0.0.1/> select count(*) from b;
COUNT(*)  4194304

1 row selected (0,024 seconds)
0: jdbc:ignite:thin://127.0.0.1/> select a.x,a.y from a join b where a.y =
b.y and a.x = b.x; 
X  1
Y  1

1 row selected (0,005 seconds)
0: jdbc:ignite:thin://127.0.0.1/> explain select a.x,a.y from a join b where
a.y = b.y and a.x = b.x;
PLAN  SELECT
    __Z0.X AS __C0_0,
    __Z0.Y AS __C0_1
FROM PUBLIC.A __Z0
    /* PUBLIC.A.__SCAN_ */
INNER JOIN PUBLIC.B __Z1
    /* PUBLIC."b_x_asc_y_asc_idx": Y = __Z0.Y
        AND X = __Z0.X
     */
    ON 1=1
WHERE (__Z0.Y = __Z1.Y)
    AND (__Z0.X = __Z1.X)

PLAN  SELECT
    __C0_0 AS X,
    __C0_1 AS Y
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */

2 rows selected (0,007 seconds)
^ very fast, compared to 1,598 seconds before index was created

My standing idea is that you have very low selectivity on b.x. I.e. if 10
million out of 14 million b rows will have x = 1, then index will not be
able to help and will only hurt. Can you execute SELECT COUNT(*) FROM b
WHERE x = 1; on your dataset?

Regards,



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

Reply via email to