Hello,

I have a big table with raw numbers and ids in Impala. When I generate
reports, I end up with a set of ids (from 100 to 100,000, depending on the
query) for which I need to show the corresponding names, which are stored
in another table in an RDBMS. So I have raw ids with their key figures in
Impala and user-friendly names for these ids in the RDBMS, and I need to
show all of this information together.

So I need to combine the 2 data sources at query level and show
user-friendly names next to the aggregated key figures. Could you please
give some advice on the preferred way to handle such tasks with Impala
(this is probably a question for the big data area in general)?

A few ways I see:
1. Sync the RDBMS data into Impala somehow (how exactly is also an open
question, but I guess it can be solved with a key-based store like Kudu or
HBase), do the joins on the Impala side (since subqueries are not supported
in the SELECT list) and show id + corresponding name in the end. But joins
could slow everything down. I am not sure yet and want to run tests with
the full data set, but maybe some of you have already tried that.

2. Get the report from Impala and resolve the names from the RDBMS at the
application level. This seems too slow to me: I need many different reports
and a flexible way to create them. Maybe it's possible to export to some
third-party stack, like BI tools, but I need closer to real-time queries,
and from my limited experience BI tools are good at compiling the data
and visualizing it, but that's it. Maybe some of you are already using
something reliable and flexible; I would be interested to hear about your
experience.

3. Store the name together with the id in the big table from the
beginning, and use the id for aggregation and the name for display. Do you
think this is a good approach? Could it slow down performance for any
reason, or are there other pitfalls?
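
To make the three options concrete, here is a rough sketch in Python. It
uses in-memory SQLite databases as stand-ins for both Impala and the RDBMS,
so it only illustrates the shape of each approach and says nothing about
Impala behaviour or performance. The table names (facts, names,
facts_wide), the column names and the function names are all made up for
illustration.

```python
import sqlite3


def make_facts_db():
    # Stand-in for the Impala table: raw ids with a key figure (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO facts VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)])
    return conn


def make_names_db():
    # Stand-in for the RDBMS table: id -> user-friendly name (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?)",
                     [(1, "alpha"), (2, "beta"), (3, "gamma")])
    return conn


def report_with_join(facts_db, names_db):
    # Option 1: copy the names next to the facts, aggregate first, then join,
    # so the join only touches one row per id.
    facts_db.execute(
        "CREATE TABLE IF NOT EXISTS names (id INTEGER PRIMARY KEY, name TEXT)")
    facts_db.execute("DELETE FROM names")
    facts_db.executemany("INSERT INTO names VALUES (?, ?)",
                         names_db.execute("SELECT id, name FROM names"))
    return facts_db.execute(
        "SELECT a.id, n.name, a.total "
        "FROM (SELECT id, SUM(amount) AS total FROM facts GROUP BY id) a "
        "LEFT JOIN names n ON n.id = a.id "
        "ORDER BY a.id").fetchall()


def report_with_lookup(facts_db, names_db, batch_size=500):
    # Option 2: aggregate on the facts side, then resolve names in the
    # application with batched IN (...) lookups against the names database.
    rows = facts_db.execute(
        "SELECT id, SUM(amount) FROM facts GROUP BY id ORDER BY id").fetchall()
    ids = [row[0] for row in rows]
    names = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        names.update(names_db.execute(
            f"SELECT id, name FROM names WHERE id IN ({placeholders})", batch))
    return [(id_, names.get(id_), total) for id_, total in rows]


def report_denormalized(wide_rows):
    # Option 3: the name is stored with every fact row; group by id and
    # pick the name with MAX(), assuming one name per id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts_wide (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO facts_wide VALUES (?, ?, ?)", wide_rows)
    return conn.execute(
        "SELECT id, MAX(name), SUM(amount) FROM facts_wide "
        "GROUP BY id ORDER BY id").fetchall()
```

In this toy setup all three functions return the same (id, name, total)
rows; what I cannot judge from it is how each one behaves on the real data
volume.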

I would really appreciate it if you could share your experience with this
kind of task.

With kind regards,
Oleksandr Baliev
