Marton Elek created HDDS-3552:
---------------------------------

             Summary: OzoneFS is slow compared to HDFS using Spark job
                 Key: HDDS-3552
                 URL: https://issues.apache.org/jira/browse/HDDS-3552
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Marton Elek


Reported by "Andrey Mindrin" on the the-asf Slack:

{quote}
We ran a few tests to compare Ozone (0.4 and 0.5 on Cloudera Runtime 7.0.3 
with 3 nodes) with HDFS, and Ozone is slower in most cases. For example, a 
Spark application with 18 containers that copies a 6 GB Parquet file is about 
50% slower on OzoneFS. The only case that shows the same performance is Hive 
queries over partitioned tables.

The simple Spark code we used:

{code}
val file = spark.read.format(format).load(path_input)
file.write.format(format).save(path_output)
{code}

Tested on a CSV file with 800 million records and 2 columns, and on a Parquet 
file converted from that CSV. We simply copied the file from HDFS to HDFS and 
from Ozone to Ozone. Application time is 1m 14s on HDFS and 1m 51s (+50%) on 
Ozone (Parquet file). Ozone uses default settings.
{quote}
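
To make the comparison easier to repeat, the copy job quoted above could be wrapped in a small timing harness. This is only a sketch under assumptions: the object name `CopyBenchmark`, the helpers `timedSeconds` and `slowdownPercent`, and the three command-line arguments are hypothetical and not part of the original report. It also measures wall-clock time of the copy inside the driver, which is not necessarily the same "application time" the reporter read from the cluster UI.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical harness around the two-line copy job from the report.
object CopyBenchmark {

  // Runs `body` once and returns the elapsed wall-clock time in seconds.
  def timedSeconds(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e9
  }

  // Relative slowdown of `candidateSec` versus `baselineSec`, in percent.
  def slowdownPercent(baselineSec: Double, candidateSec: Double): Double =
    (candidateSec - baselineSec) / baselineSec * 100.0

  def main(args: Array[String]): Unit = {
    // Assumed arguments: <format> <input path> <output path>
    val Array(format, pathInput, pathOutput) = args

    val spark = SparkSession.builder().appName("copy-benchmark").getOrCreate()

    // Same read-then-write copy as in the report; the output path must not
    // exist yet, because the default save mode fails on existing data.
    val elapsed = timedSeconds {
      spark.read.format(format).load(pathInput)
        .write.format(format).save(pathOutput)
    }

    println(f"$format copy $pathInput -> $pathOutput took $elapsed%.1f s")
    spark.stop()
  }
}
```

Running it once against an HDFS path pair and once against an Ozone path pair gives two timings to compare. With the reported numbers (74 s on HDFS, 111 s on Ozone), `slowdownPercent(74, 111)` evaluates to 50.0, which matches the +50% quoted above.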


