Hi,
I want to make sure that the cache table indeed would accelerate sql queries.
Here is one of my use case :
impala table size : 24.59GB, no partitions, with about 1 billion+ rows.
I use sqlContext.sql to run queries over this table and try to do cache and
uncache command to see if there
is any performance disparity. I ran the following query : select * from
video1203 where id > 10 and id < 20 and added_year != 1989
I can see the following results :
1 If I did not run cache table and just ran sqlContext.sql(), I can see the
above query run about 25 seconds.
2 If I firstly run sqlContext.cacheTable("video1203"), the query runs super
slow and would cause driver OOM exception, but I can
get final results with about running 9 minuts.
Would any expert can explain this for me ? I can see that cacheTable cause OOM
just because the in-memory columnar storage
cannot hold the 24.59GB+ table size into memory. But why the performance is so
different and even so bad ?
Best,
Sun.
[email protected]