Marton Elek created HDDS-3552:
---------------------------------
Summary: OzoneFS is slow compared to HDFS using Spark job
Key: HDDS-3552
URL: https://issues.apache.org/jira/browse/HDDS-3552
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Marton Elek
Reported by "Andrey Mindrin" on the the-asf Slack:
{quote}
We ran a few tests comparing Ozone (0.4 and 0.5, on Cloudera Runtime 7.0.3
with 3 nodes) against HDFS, and Ozone is slower in most cases. For example, a
Spark application with 18 containers that copies a 6 GB Parquet file is about
50% slower on OzoneFS. The only case that shows the same performance is Hive
queries over partitioned tables.
The simple Spark code we used:
{code}
val file = spark.read.format(format).load(path_input)
file.write.format(format).save(path_output)
{code}
Tested on a CSV file with 800 million records and 2 columns, and on a Parquet
file converted from that CSV. We simply copied the file from HDFS to HDFS and
from Ozone to Ozone. Application time is 1m 14s on HDFS and 1m 51s (+50%) on
Ozone (Parquet file). Ozone has default settings.
{quote}
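For anyone trying to reproduce the comparison, here is a minimal sketch of a timing helper, not part of the original report. The names CopyBenchmark, timedSeconds and slowdownPercent are hypothetical; the only inputs taken from the report are the two application times (1m 14s = 74 s, 1m 51s = 111 s).

```scala
// Hypothetical helper for comparing copy times; not from the original report.
object CopyBenchmark {

  /** Runs `body` once and returns the elapsed wall-clock time in seconds. */
  def timedSeconds(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e9
  }

  /** Relative slowdown of `candidateSec` versus `baselineSec`, in percent. */
  def slowdownPercent(baselineSec: Double, candidateSec: Double): Double =
    (candidateSec - baselineSec) / baselineSec * 100.0

  def main(args: Array[String]): Unit = {
    // Application times quoted in the report: 1m 14s on HDFS, 1m 51s on Ozone.
    val hdfsSec = 74.0
    val ozoneSec = 111.0
    println(f"Ozone slowdown vs HDFS: ${slowdownPercent(hdfsSec, ozoneSec)}%.1f%%")
  }
}
```

In a spark-shell session, the copy from the report could presumably be wrapped as CopyBenchmark.timedSeconds { spark.read.format(format).load(path_input).write.format(format).save(path_output) }, run once per filesystem, with the two results passed to slowdownPercent. A single run is noisy, so repeating each measurement a few times is likely worthwhile.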