To follow on:
I asked the developer how we incrementally load data, and the response was:
no, a UNION is run only for updated records (every night).
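If that nightly UNION works the way I think it does, the job would look
roughly like the sketch below. This is only my reading of it, not the
actual job: the table names events_cached and events_updates and the key
column id are stand-ins I made up.

    -- Hedged sketch of the nightly refresh: keep the cached rows that
    -- were NOT updated, and union in the updated records.
    -- events_cached, events_updates, and id are hypothetical names.
    INSERT OVERWRITE TABLE events_cached
    SELECT t.*
    FROM (
      SELECT c.*
      FROM events_cached c
      LEFT OUTER JOIN events_updates u ON (c.id = u.id)
      WHERE u.id IS NULL             -- rows with no newer version
      UNION ALL
      SELECT * FROM events_updates   -- the updated records
    ) t;

Hive stages the result before swapping it in, so reading from the table
being overwritten is safe; that is why this "rebuild via UNION ALL"
pattern was the usual pre-ACID way to apply updates.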
For the every-minute export, the algorithm is as follows (see the sketch
after this list):
1. Upload the file to Hadoop (HDFS).
2. LOAD DATA INPATH ... OVERWRITE INTO TABLE _incremental;
3. INSERT INTO TABLE ..._cached
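For reference, here is a rough sketch of what those three steps might look
like in Shark/Hive QL. The file path and the table names events_incremental
and events_cached are hypothetical stand-ins for the elided names above,
and it assumes a Hive/Shark version with INSERT INTO support (Hive 0.8+).

    -- Step 1 happens outside of Shark, e.g. from a shell:
    --   hadoop fs -put export.tsv /data/incoming/export.tsv
    -- Step 2: move the uploaded file into the staging table,
    -- replacing whatever the previous minute left there.
    LOAD DATA INPATH '/data/incoming/export.tsv'
    OVERWRITE INTO TABLE events_incremental;
    -- Step 3: append the new rows to the cached table.
    INSERT INTO TABLE events_cached
    SELECT * FROM events_incremental;

Note that LOAD DATA INPATH moves the file within HDFS rather than copying
it, so the staging directory is emptied as a side effect of step 2.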
Hi,
We are seeing poor performance as we incrementally load data. Here is the
configuration:
Spark standalone cluster:
spark01 (spark master, shark, hadoop namenode): 15 GB RAM, 4 vCPUs
spark02 (spark worker, hadoop datanode): 15 GB RAM, 8 vCPUs
spark03 (spark worker): 15 GB RAM, 8 vCPUs
spark04 (spark worker)