Re: Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-05 Thread Igor Kuzmenko
Thanks for the reply, Gopal. Very helpful.

On Thu, Aug 4, 2016 at 10:15 PM, Gopal Vijayaraghavan wrote:
> > where res_url like '%mts.ru%'
> ...
> > where res_url like '%mts_ru%'
> ...
> > Why '_' wildcard decrease perfomance?
>
> Because it misses the fast path by just one "_".
>

Re: Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-04 Thread Gopal Vijayaraghavan
> where res_url like '%mts.ru%'
...
> where res_url like '%mts_ru%'
...
> Why '_' wildcard decrease perfomance?

Because it misses the fast path by just one "_".

The ORC vectorized reader has a zero-copy check for 3 patterns - prefix, suffix and middle.

That means "https://%", "%.html", "%mts.ru%"
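
To illustrate the fast-path idea described above, here is a minimal Python sketch. It is not Hive's or ORC's actual code: the function names (classify_like, like_to_regex, like_match) are hypothetical, and Python string operations stand in for the zero-copy byte comparisons mentioned in the reply. The assumption is that a pattern whose only wildcards are a leading and/or trailing '%' can be answered with a plain prefix, suffix or substring check, while anything containing '_' (or an inner '%') has to go through a general matcher, here a regular expression.

```python
import re


def classify_like(pattern):
    """Return (kind, literal) for a SQL LIKE pattern.

    kind is 'exact', 'prefix', 'suffix', 'middle' or 'complex'.
    Hypothetical helper for illustration only.
    """
    leading = pattern.startswith("%")
    trailing = len(pattern) > 1 and pattern.endswith("%")
    start = 1 if leading else 0
    end = len(pattern) - 1 if trailing else len(pattern)
    core = pattern[start:end]
    # Any wildcard or escape left inside the core rules out the fast path.
    if not core or any(ch in core for ch in "%_\\"):
        return ("complex", None)
    if leading and trailing:
        return ("middle", core)   # '%mts.ru%' -> substring check
    if leading:
        return ("suffix", core)   # '%.html'   -> ends-with check
    if trailing:
        return ("prefix", core)   # 'https://%' -> starts-with check
    return ("exact", core)


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex string (general, slower path)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next character, which is taken literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like_match(value, pattern):
    """Evaluate value LIKE pattern, using a fast path when one applies."""
    kind, literal = classify_like(pattern)
    if kind == "prefix":
        return value.startswith(literal)
    if kind == "suffix":
        return value.endswith(literal)
    if kind == "middle":
        return literal in value
    if kind == "exact":
        return value == literal
    # No fast path: fall back to a full regex match.
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None
```

Under these assumptions, classify_like("%mts.ru%") yields the "middle" case, while classify_like("%mts_ru%") yields "complex" because of the single '_', so that pattern goes to the regex fallback for every row.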

Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-04 Thread Igor Kuzmenko
I have a Hive transactional table 'data_http' in ORC format, containing around 100,000,000 rows.

When I execute the query

select * from data_http where res_url like '%mts.ru%'

it completes in 10 seconds. But executing the query

select * from data_http where res_url like '%mts_ru%'

takes more than