Hello,

We have two tables (tab1, tab2) exposed through Hive. Their data lives in different HDFS folders. We are trying to join these two tables on a single column using the SparkR join. Although the join columns contain the same values, the join returns zero rows.
However, when I run the same join as SQL from the Hive console to get the count(*), I get millions of records matching the join criteria. The join columns are of type 'int'. Also, when I separately join tab1 (one of the two tables for which the join is not working) with a third table, tab3, that join works.

To debug, we selected just one row from tab1 in the SparkR script, and one row from tab2 with the same value in the join column, using the SparkR 'select' function. Each dataframe now has a single row, and the join column has the same value in both, yet joining them still returns zero rows.

We are running the script from RStudio. It runs without any error, but the join returns zero rows, whereas in Hive the same join returns many rows. Any idea what might be causing this?

--
Architect
Infoworks.io
http://Infoworks.io