Hi,

When doing micro-batch streaming of trade data, we need to tokenize certain columns before the data lands in HBase in a Lambda architecture.
There are two ways of tokenizing data: vault-based and vaultless, using something like Protegrity tokenization. Vault-based tokenization requires the clear text and token values to be stored in a vault, say HBase, and crucially the vault cannot be on the same Hadoop cluster we are processing on in real time; it has to sit on another Hadoop cluster dedicated to tokenization. This causes latency for real-time analytics, because token values have to be calculated and then stored in the remote HBase vault.

What is the general approach to this type of issue? It seems better to use vaultless tokenization?

Thanks,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
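P.S. To make the vaultless idea concrete, below is a minimal sketch (my own illustration, not Protegrity's actual algorithm) of a deterministic, keyed tokenizer that could be called inside the micro-batch, e.g. wrapped in a Spark UDF, so no round trip to a remote vault is needed. The object name, function name and key handling are just placeholders.

import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
import java.util.Base64

object VaultlessTokenizerSketch {
  // Illustrative only: in practice the key would come from a KMS/HSM, never be hard-coded.
  private val key = new SecretKeySpec("replace-with-key-from-kms".getBytes("UTF-8"), "HmacSHA256")

  // Deterministic, keyed tokenization: the token is derived from the clear text plus a secret key,
  // so no clear-text/token pair ever needs to be written to (or looked up in) a remote vault.
  def tokenize(clearText: String): String = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(key)
    Base64.getUrlEncoder.withoutPadding.encodeToString(mac.doFinal(clearText.getBytes("UTF-8")))
  }

  def main(args: Array[String]): Unit = {
    // e.g. tokenize an account number inside the micro-batch before the row is written to HBase
    println(tokenize("ACC-1234567890"))
  }
}

Note that an HMAC like this is one-way; a real vaultless product would use a reversible, format-preserving scheme so tokens can be de-tokenized with the key. This is only meant to show the shape of the approach, i.e. tokens computed locally in the batch with no remote vault lookup.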