[jira] [Commented] (HDDS-3552) OzoneFS is slow compared to HDFS using Spark job

Marton Elek (Jira) Fri, 08 May 2020 05:56:08 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102552#comment-17102552
 ]


Marton Elek commented on HDDS-3552:
-----------------------------------

I don't know; it was reported by Andrey Mindrin on the Apache Slack.

> OzoneFS is slow compared to HDFS using Spark job
> ------------------------------------------------
>
>                 Key: HDDS-3552
>                 URL: https://issues.apache.org/jira/browse/HDDS-3552
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Priority: Major
>
> Reported by "Andrey Mindrin" on the the-asf Slack:
> {quote}
> We ran a few tests to compare the performance of Ozone (0.4 and 0.5 on
> Cloudera Runtime 7.0.3 with 3 nodes) with HDFS, and Ozone is slower in most
> cases. For example, a Spark application with 18 containers that copies a
> 6 GB Parquet file is about 50% slower on OzoneFS. Only one case shows the
> same performance: Hive queries over partitioned tables.
> The simple Spark code we used:
> {code}
> val file = spark.read.format(format).load(path_input)
> file.write.format(format).save(path_output)
> {code}
> Tested on a CSV file with 800 million records and 2 columns, and on a
> Parquet file converted from that CSV. We simply copied the file from HDFS to
> HDFS and from Ozone to Ozone. Application time is 1m 14s on HDFS and 1m 51s
> (+50%) on Ozone (Parquet file). Ozone has default settings.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

