Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/21043
  
    @gatorsmile 
    
    On my laptop, running
    <pre>
    spark.sql("select * from hive_table").write.mode(SaveMode.Overwrite).csv("outputfile.csv")
    </pre>
    Input | master<br>runtime | branch<br>runtime
    --- | --- | ---
    6000 cols, 150k rows | 59 minutes | 2.6 minutes
    3000 cols, 150k rows | 13.6 minutes | 1.2 minutes
    20 cols, 150k rows | 7.6 seconds | 7.7 seconds
    20 cols, 1m rows | 10 seconds | 8.6 seconds
    
    The branch runtimes scale roughly in proportion to the number of columns. They are much faster than master for a large number of columns, and about the same for a small number of columns.
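    
    For reference, here is a minimal sketch of how a wide-column run like the ones above could be reproduced with synthetic data instead of the Hive table; the column count, row count, and output path below are placeholders, not the exact setup used for the numbers in the table:
    <pre>
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.functions.lit

    val numCols = 3000
    val numRows = 150000L

    // Build a wide DataFrame: numRows rows, each with numCols constant string columns.
    val cols = (1 to numCols).map(i => lit("x").as(s"c$i"))
    val df = spark.range(numRows).select(cols: _*)

    // Time a single overwrite of the CSV output (wall clock, includes job scheduling).
    val start = System.nanoTime()
    df.write.mode(SaveMode.Overwrite).csv("/tmp/wide_csv_benchmark")
    println(s"write took ${(System.nanoTime() - start) / 1e9} seconds")
    </pre>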
    
    


