I think so -- if you would like to scrutinize the results, sorting them and
running a diff would probably be the best way. If you would like to test the
results quickly and can tolerate a bit of uncertainty, I guess comparing the
number of rows would be sufficient because two different results are
I like the approach of applying an arbitrary limit. Hive's q files tend to
add an ordering to everything. Would it make sense to simply order by
multiple columns in the result set and run a large diff on them?
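
For what it's worth, here is a minimal sketch of the two checks discussed in
this thread: the quick row-count comparison and the sort-then-diff comparison.
It assumes each result set has already been fetched into a Python list of
tuples. The function names (normalize, same_row_count, sorted_diff) and the
"engine_a"/"engine_b" labels are hypothetical, and rendering every value as a
string (with NULL for None) is my own simplification, not something taken from
the article.

```python
import difflib


def normalize(rows):
    """Render each row as a tab-separated line, then sort the lines.

    Sorting the rendered lines orders by all columns at once, so the
    comparison does not depend on the order each engine returned rows in.
    """
    return sorted(
        "\t".join("NULL" if value is None else str(value) for value in row)
        for row in rows
    )


def same_row_count(rows_a, rows_b):
    """Quick check: it can only prove a mismatch, never a match."""
    return len(rows_a) == len(rows_b)


def sorted_diff(rows_a, rows_b, name_a="engine_a", name_b="engine_b"):
    """Return unified-diff lines between the two sorted result sets.

    An empty list means the result sets contain the same rows.
    """
    return list(
        difflib.unified_diff(
            normalize(rows_a),
            normalize(rows_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

Note that the sort here is lexical on the rendered text, which is fine for an
equality check because both sides are rendered the same way, but values that
the engines format differently (floating-point numbers, for example) would
show up as differences.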
On Wednesday, June 26, 2019, Sungwoo Park wrote:
I have published a new article on the correctness of Hive on MR3, Presto,
and Impala:
https://mr3.postech.ac.kr/blog/2019/06/26/correctness-hivemr3-presto-impala/
Hope you enjoy reading the article.
--- Sungwoo