GitHub user junegunn commented on the issue:

    https://github.com/apache/spark/pull/16347
  
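    Hive makes sure that the output file is properly sorted by the column 
specified in the `SORT BY` clause by having only one reduce task (and thus one 
output file) for each partition. In the plan below, rows are shuffled on the 
partition column (`_col18`) and sorted on `_col0` within each reducer.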
    
    ```
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: __________________
                Statistics: Num rows: 183663543 Data size: 313697356092 Basic stats: COMPLETE Column stats: PARTIAL
                Select Operator
                  expressions: __ (type: bigint), ________ (type: string), ___ (type: string), _________ (type: string), _______________ (type: string), _______ (type: string), _____________ (type: string), ________ (type: string), __ (type: string), _________ (type: string), ________ (type: string), _______________ (type: string), _____________ (type: string), _____________ (type: string), ____________ (type: string), __________ (type: string), _____ (type: string), __________________ (type: string), ___ (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18
                  Statistics: Num rows: 183663543 Data size: 313697356092 Basic stats: COMPLETE Column stats: PARTIAL
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col18 (type: string)
                    Statistics: Num rows: 183663543 Data size: 313697356092 Basic stats: COMPLETE Column stats: PARTIAL
                    value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string)
          Execution mode: vectorized
          Reduce Operator Tree:
            Select Operator
              expressions: KEY.reducesinkkey0 (type: bigint), VALUE._col0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: string), VALUE._col3 (type: string), VALUE._col4 (type: string), VALUE._col5 (type: string), VALUE._col6 (type: string), VALUE._col7 (type: string), VALUE._col8 (type: string), VALUE._col9 (type: string), VALUE._col10 (type: string), VALUE._col11 (type: string), VALUE._col12 (type: string), VALUE._col13 (type: string), VALUE._col14 (type: string), VALUE._col15 (type: string), VALUE._col16 (type: string), VALUE._col17 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18
              Statistics: Num rows: 183663543 Data size: 33794091912 Basic stats: COMPLETE Column stats: PARTIAL
              File Output Operator
                compressed: false
                Statistics: Num rows: 183663543 Data size: 33794091912 Basic stats: COMPLETE Column stats: PARTIAL
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: _______________.________________
    ```
    
    The later stage simply moves the files to the corresponding directories.
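
To make the mechanism in the plan concrete, here is a minimal sketch in plain Python. It is a simulation under stated assumptions, not Hive or Spark code: `shuffle_and_sort`, `num_reducers`, and the sample rows are hypothetical names and data chosen for illustration. Rows are routed to a reducer by the partition column, each reducer sorts its input by the sort key, and the rows are then grouped by partition value to stand in for the per-partition output files.

```python
from collections import defaultdict


def shuffle_and_sort(rows, num_reducers):
    """Simulate a shuffle on the partition column followed by a per-reducer sort.

    rows: iterable of (sort_key, partition_value) tuples.
    Returns a dict mapping each partition value to its list of sort keys,
    in the order a reducer would emit them.
    """
    # Shuffle: every row with the same partition value goes to the same reducer.
    reducers = defaultdict(list)
    for key, part in rows:
        reducers[hash(part) % num_reducers].append((key, part))

    # Reduce: each reducer processes its rows in sort-key order and appends
    # them to the output that belongs to the row's partition value.
    output = defaultdict(list)
    for reducer_rows in reducers.values():
        for key, part in sorted(reducer_rows, key=lambda r: r[0]):
            output[part].append(key)
    return dict(output)


rows = [(5, "a"), (1, "b"), (3, "a"), (2, "b"), (4, "a"), (0, "c")]
files = shuffle_and_sort(rows, num_reducers=2)
for part, keys in sorted(files.items()):
    print(part, keys)
```

Because a given partition value is handled by exactly one reducer, and that reducer emits rows in key order, the keys collected for each partition come out sorted regardless of how many reducers there are or which partitions happen to share a reducer.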
    
    Since the patch no longer merges and I think I have made my point, I'm 
closing this.



