[ 
https://issues.apache.org/jira/browse/HDDS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102396#comment-17102396
 ] 

Shashikant Banerjee commented on HDDS-3552:
-------------------------------------------

[~elek], was the test done on a disaggregated compute and storage setup or a 
colocated one? In our tests so far, Ozone has performed worse than HDFS in 
colocated compute and storage setups.

> OzoneFS is slow compared to HDFS using Spark job
> ------------------------------------------------
>
>                 Key: HDDS-3552
>                 URL: https://issues.apache.org/jira/browse/HDDS-3552
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Priority: Major
>
> Reported by "Andrey Mindrin" on the the-asf Slack:
> {quote}
> We ran a few tests comparing Ozone (0.4 and 0.5 on Cloudera Runtime 
> 7.0.3 with 3 nodes) with HDFS, and Ozone is slower in most cases. 
> For example, a Spark application with 18 containers that copies a 6 GB 
> Parquet file is about 50% slower on OzoneFS. Only one case shows the same 
> performance: Hive queries over partitioned tables.
> The simple Spark code we used:
> {code}
> val file = spark.read.format(format).load(path_input)
> file.write.format(format).save(path_output)
> {code}
> Tested on a CSV file with 800 million records and 2 columns, and on a 
> Parquet file converted from that CSV. We just copied the file from HDFS to 
> HDFS and from Ozone to Ozone. Application time is 1m 14s on HDFS and 
> 1m 51s (+50%) on Ozone (Parquet file). Ozone uses default settings.
> {quote}
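
For reference, the "+50%" figure in the report follows directly from the two 
quoted application times. The sketch below is only an arithmetic check; the 
object and method names (Slowdown, percent) are hypothetical and not part of 
the original report or of any Ozone or Spark API.

```scala
// Hypothetical helper, not from the original report: relative slowdown of
// one run against a baseline run, expressed in percent.
object Slowdown {
  def percent(baselineSeconds: Int, otherSeconds: Int): Double =
    (otherSeconds - baselineSeconds) * 100.0 / baselineSeconds

  def main(args: Array[String]): Unit = {
    val hdfsSeconds  = 1 * 60 + 14 // 1m 14s, as quoted for HDFS
    val ozoneSeconds = 1 * 60 + 51 // 1m 51s, as quoted for Ozone
    // (111 - 74) / 74 = 0.5, i.e. Ozone took 50% longer in this run.
    println(percent(hdfsSeconds, ozoneSeconds))
  }
}
```

This covers only the single Parquet copy quoted above; it says nothing about 
the CSV run, for which no timings were given.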



