[ https://issues.apache.org/jira/browse/HDDS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101603#comment-17101603 ]
Marton Elek commented on HDDS-3552:
-----------------------------------
I synced with @Shashikant Banerjee and learned that the main reason for the
slowness is that Ozone (by default) provides stronger guarantees on replication.
(For example, flush() on HDFS means sending the data to the network, while in
Ozone it can guarantee that the data is flushed to disk on the remote side.)
There is an ongoing effort to adjust these guarantees and make them configurable
(so that Ozone can provide guarantees similar to HDFS).
If you have time, you can set ozone.client.stream.buffer.flush.delay to true,
which makes the flush implementation similar to HDFS (and should make writes
faster), and repeat your test.
I am not sure if it's possible with Cloudera Runtime (this setting is not yet
released), but I will try to reproduce your result and show how it works with
different settings.
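A minimal sketch of how the setting could be passed to the Spark job from the
report, assuming the Ozone client in use already supports the key and reads it
from the Hadoop configuration that Spark hands to the file system (Spark copies
any spark.hadoop.* property into that configuration). The application name,
format value and o3fs paths are placeholders, not values from the report:
{code}
import org.apache.spark.sql.SparkSession

// Assumption: the Ozone client version on the cluster recognizes this key.
// The comment above notes that the setting is not yet released.
val spark = SparkSession.builder()
  .appName("ozone-copy-test") // hypothetical application name
  .config("spark.hadoop.ozone.client.stream.buffer.flush.delay", "true")
  .getOrCreate()

// Placeholder values; substitute the ones used in the original test.
val format = "parquet"
val path_input = "o3fs://bucket.volume/input"   // hypothetical path
val path_output = "o3fs://bucket.volume/output" // hypothetical path

// Same copy job as in the report.
val file = spark.read.format(format).load(path_input)
file.write.format(format).save(path_output)
{code}
The same property could presumably also be set cluster-wide in ozone-site.xml;
setting it per job keeps the comparison between the two flush behaviours easy
to repeat.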
> OzoneFS is slow compared to HDFS using Spark job
> ------------------------------------------------
>
> Key: HDDS-3552
> URL: https://issues.apache.org/jira/browse/HDDS-3552
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Priority: Major
>
> Reported by "Andrey Mindrin" on the the-asf Slack:
> {quote}
> We have made a few tests to compare OZONE (0.4 and 0.5 on Cloudera Runtime
> 7.0.3 with 3 nodes) performance with HDFS and OZONE is slower in most cases.
> For example, a Spark application with 18 containers that copies a 6 GB parquet
> file is about 50% slower on OzoneFS. Only one case shows the same
> performance: Hive queries over partitioned tables.
> Simple Spark code we used:
> {code}
> val file = spark.read.format(format).load(path_input)
> file.write.format(format).save(path_output)
> {code}
> Tested on a CSV file with 800 million records and 2 columns, and on a parquet
> file converted from that CSV. We just copied the file from HDFS to HDFS and
> from Ozone to Ozone. Application time is 1m 14s on HDFS and 1m 51s (+50%) on
> Ozone (parquet file). Ozone has default settings.
> {quote}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)