Kengo Seki created AIRFLOW-2254:
-----------------------------------

             Summary: Fix header output on RedshiftToS3Transfer
                 Key: AIRFLOW-2254
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
             Project: Apache Airflow
          Issue Type: Bug
          Components: aws, redshift
            Reporter: Kengo Seki


The current implementation of RedshiftToS3Transfer is shown below. It appears
to be based on [this 
post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].

{code}
        unload_query = """
                        UNLOAD ('SELECT {0}
                        UNION ALL
                        SELECT {1} FROM {2}.{3}
                        ORDER BY 1 DESC')
                        TO 's3://{4}/{5}/{3}_'
                        with
                        credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
                        {8};
                        """.format(column_names, column_castings, self.schema, self.table,
                                   self.s3_bucket, self.s3_key, credentials.access_key,
                                   credentials.secret_key, unload_options)
{code}

{{ORDER BY 1 DESC}} is intended to output the header row first, but as [this 
post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374]
 points out, it works only if the first column is not a character type (for
example, a numeric column).
In addition, this query should be used with the PARALLEL OFF option; without
it, multiple files are written, but only the first one contains the header line.
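
The ordering problem can be reproduced outside Redshift. The following is a
minimal sketch in plain Python, not the operator's code: it assumes that
column_castings casts every column to text, and it uses Python's string
comparison as a stand-in for Redshift's ordering of text values. The function
name unload_order and the sample rows are made up for illustration.

```python
def unload_order(rows, header):
    """Mimic SELECT <header> UNION ALL SELECT <rows> ORDER BY 1 DESC."""
    # Assumption: every value is cast to text before the UNION ALL.
    as_text = [tuple(str(value) for value in row) for row in rows]
    # ORDER BY 1 DESC: sort on the first column only, descending.
    return sorted([header] + as_text, key=lambda row: row[0], reverse=True)


# First column is numeric: digits compare lower than letters, so the
# header string sorts above every data value and ends up on top.
numeric_first = unload_order([(1, "apple"), (20, "zebra")], ("id", "name"))

# First column is character: a value such as "zebra" compares higher than
# the header string "name", so a data row is emitted before the header.
char_first = unload_order([("apple", 1), ("zebra", 20)], ("name", "id"))

print(numeric_first[0])
print(char_first[0])
```

With a numeric first column the header lands on the first line; with a
character first column it ends up in the middle of the data.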



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
