We are an investment firm with an MDM platform in Oracle hosted at a vendor location, and we use Oracle GoldenGate to replicate data to our data center for reporting needs.
Our data is not big data: the total size is 6 TB, including 2 TB of archive data. Moreover, it is not updated often: once nightly (around 50 MB), plus some correction transactions during the day (<10 MB). We have no external users, so the data doesn't grow in real time the way e-commerce data does.

When we replicate data from source to target, we transfer the changes through files. If there are DML operations (corrections) on a source table during the day, the corresponding file would contain perhaps 100 rows of table data that need to be loaded into the target database. Because of the low data volume we built this in Informatica, and it runs in under 2-5 minutes. Could Spark be used in this case, or would that be technological overkill?
