Hi everyone,

Let's say I have a Hive table in two datacenters. The table format can be TEXTFILE or ORC. A Sqoop job runs every day and adds data to the table. Each datacenter has its own instance of the Sqoop job.

Ideally, the data in the two tables should be the same, meaning that the row count is the same and the tables contain the same rows. However, the row order can differ, and the number of files and their sizes can differ as well.

Is there a way to scan a table and get some hash code that can be used to compare the tables?

Thank you,
Alex
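
To make the requirement concrete, here is a minimal Python sketch of the kind of order-independent fingerprint described above. It is only an illustration under stated assumptions, not a Hive feature: the names `row_hash` and `table_fingerprint` are hypothetical, rows are assumed to arrive as tuples of already-parsed column values, and SHA-256 truncated to 64 bits is an arbitrary choice. Per-row hashes are combined with addition modulo 2^64, so neither the row order nor the file layout affects the result, while duplicate rows still count (a plain XOR would cancel them out).

```python
import hashlib

MASK_64 = (1 << 64) - 1  # keep the running sum within 64 bits


def row_hash(row):
    """Hash one row (a tuple of column values) to a 64-bit integer.

    Assumption: columns are joined with \x01 and NULLs are written as \\N,
    so that ("a", None) and ("a", "None") do not collide.
    """
    encoded = "\x01".join("\\N" if v is None else str(v) for v in row)
    digest = hashlib.sha256(encoded.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def table_fingerprint(rows):
    """Return (row_count, combined_hash) for an iterable of rows.

    Addition is commutative, so the result does not depend on row order.
    """
    count = 0
    total = 0
    for row in rows:
        count += 1
        total = (total + row_hash(row)) & MASK_64
    return count, total


if __name__ == "__main__":
    dc1 = [(1, "alice"), (2, "bob"), (2, "bob")]
    dc2 = [(2, "bob"), (1, "alice"), (2, "bob")]  # same rows, different order
    print(table_fingerprint(dc1) == table_fingerprint(dc2))
```

Two tables would then be considered equal when both the row count and the combined hash match. Inside Hive, the same idea could probably be expressed as an aggregate such as `count(*)` together with a `sum` over the built-in `hash()` of the columns, though whether that gives identical values for TEXTFILE and ORC copies depends on the column types being read back identically, which is an assumption worth checking.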