Hello,

I have a big table in Impala with raw numbers and ids. When I generate reports, I get a set of ids (from 100 to 100,000, depending on the query) for which I need to show the corresponding names, which are stored in another table in an RDBMS. So the raw ids and their key figures live in Impala, the user-friendly names for those ids live in the RDBMS, and I need to present all of this information together.
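A minimal sketch of resolving the names at the application level, assuming the aggregated Impala result has already been fetched as (id, value) pairs. Python's built-in sqlite3 stands in for the RDBMS here, and the table `id_names` and the helpers `fetch_names` and `enrich_report` are hypothetical names for illustration only. The ids are looked up in batches so that a 100,000-id report does not turn into one enormous IN list.

```python
import sqlite3
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def fetch_names(conn: sqlite3.Connection, ids: Sequence[int],
                batch_size: int = 500) -> Dict[int, str]:
    """Look up display names for the given ids, batch_size ids per query.

    Assumes a hypothetical table id_names(id INTEGER PRIMARY KEY, name TEXT).
    Batching keeps each query under the driver's bound-parameter limit.
    """
    names: Dict[int, str] = {}
    unique_ids = sorted(set(ids))  # deduplicate so each id is fetched once
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        cur = conn.execute(
            f"SELECT id, name FROM id_names WHERE id IN ({placeholders})",
            batch,
        )
        names.update(cur.fetchall())  # rows are (id, name) pairs
    return names


def enrich_report(rows: Iterable[Tuple[int, float]], conn: sqlite3.Connection,
                  batch_size: int = 500) -> List[Tuple[int, Optional[str], float]]:
    """Attach a name to each (id, value) row; unknown ids get None."""
    rows = list(rows)
    names = fetch_names(conn, [id_ for id_, _ in rows], batch_size)
    return [(id_, names.get(id_), value) for id_, value in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE id_names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO id_names VALUES (?, ?)",
                     [(1, "Alpha"), (2, "Beta"), (3, "Gamma")])
    # Stand-in for an aggregated result already returned by Impala.
    impala_rows = [(3, 12.5), (1, 40.0), (7, 3.0)]
    for row in enrich_report(impala_rows, conn):
        print(row)
```

Ids with no match come back with the name None, so the report can fall back to showing the raw id. Whether this is fast enough in practice depends on the RDBMS and the network round trips per batch.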
So I need to sync the two data sources at query level and show user-friendly names alongside all the aggregated key figures. Could you please give some advice on the preferred way to handle such tasks with Impala (this probably applies to the big data area in general)? A few options I see:

1. Sync the RDBMS with Impala somehow (how to do that is also a question, but I guess it can be solved with a key-based store like Kudu or HBase), do the joins on the Impala side (since subqueries are not supported in the SELECT list), and finally show the id with its corresponding name. But joins could slow everything down. I'm not sure yet; of course I want to run various tests with the full data set, but maybe some of you have already tried this.

2. Get the report from Impala and resolve the names from the RDBMS at the application level. This seems too slow to me, and I need different reports and a flexible way to create them. Maybe it's possible to export to a third-party stack such as BI tools, but I need more real-time queries, and in my limited experience BI tools are good at compiling all the data and visualizing it well, but that's it. Maybe some of you are already using something reliable and flexible; I'd be interested to hear about your experience.

3. Store the id together with the name in the big table from the beginning, using the id for aggregation and the name for display. Do you think this is a good approach? Could performance suffer for some reason, or are there other pitfalls?

I would really appreciate it if you could share your experience with this kind of task.

With kind regards,
Oleksandr Baliev
