spark3.0 read kudu data

冯宝利 Thu, 03 Dec 2020 20:21:28 -0800
Hi,
    We are upgrading Spark from 2.4 to 3.0, and performance testing has turned 
up a problem: in a side-by-side comparison, Spark 3.0 reads Kudu data much more 
slowly than Spark 2.4. For the same amount of data, Spark 2.4 normally takes 
0.1-1 s, while Spark 3.0 takes 1 to 2 minutes. Both versions use the same 
spark-submit parameters and run in local mode, and the Kudu cluster, tables, 
and query conditions are identical.
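
A timed read of this kind might look roughly like the sketch below. It is only 
an illustration: the master address, table name, and filter column are 
hypothetical placeholders, and it assumes the matching kudu-spark jar for the 
Spark version under test is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KuduReadTiming {
  def main(args: Array[String]): Unit = {
    // Local mode, as in the comparison described above.
    val spark = SparkSession.builder()
      .appName("kudu-read-timing")
      .master("local[*]")
      .getOrCreate()

    val start = System.nanoTime()

    // Placeholder master address, table name, and predicate (assumptions).
    val df = spark.read
      .format("kudu")
      .option("kudu.master", "kudu-master.example.com:7051")
      .option("kudu.table", "example_table")
      .load()
      .filter("id >= 0")

    // count() forces the scan, so the elapsed time covers the actual read.
    val rows = df.count()
    val elapsedMs = (System.nanoTime() - start) / 1000000

    println(s"read $rows rows in $elapsedMs ms")
    spark.stop()
  }
}
```

Running the same job unchanged under both Spark versions, swapping only the 
kudu-spark jar, keeps the package as the only variable in the comparison.
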
    The only difference is the kudu-spark package. Spark 2.4 uses 
kudu-spark2_2.11 with Scala 2.11, while Spark 3.0 uses kudu-spark3_2.12 with 
Scala 2.12. (This package was built from the Kudu 1.13 Java source, using a 
pom.xml file set to Spark 3.0.0 and Scala 2.12.)
    Our cluster runs CDH 6.3.1 with Kudu 1.10. Given this setup, what can we 
optimize, or do you have any suggestions for improving Kudu read performance?
    Thanks!