Kengo Seki created AIRFLOW-2254:
-----------------------------------

             Summary: Fix header output on RedshiftToS3Transfer
                 Key: AIRFLOW-2254
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
             Project: Apache Airflow
          Issue Type: Bug
          Components: aws, redshift
            Reporter: Kengo Seki

The current implementation of RedshiftToS3Transfer is as follows and appears to be based on [this post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].

{code}
unload_query = """
    UNLOAD ('SELECT {0}
    UNION ALL
    SELECT {1} FROM {2}.{3}
    ORDER BY 1 DESC')
    TO 's3://{4}/{5}/{3}_'
    with
    credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
    {8};
    """.format(column_names, column_castings, self.schema, self.table,
               self.s3_bucket, self.s3_key, credentials.access_key,
               credentials.secret_key, unload_options)
{code}

{{ORDER BY 1 DESC}} is intended to output the header first, but as [this post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374] says, it works only if the first column's type is not character (e.g. numeric). When the first column holds character data, a value can sort after the column name in descending order and therefore appear before the header row.

In addition, this query should be used with the PARALLEL OFF option. Without it, many files are output but only the first one has the header line.
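
One possible way to address both points is sketched below. It is not the operator's actual code: the function name {{build_unload_query}}, the {{sort_order}} column, and the subquery alias {{t}} are hypothetical names introduced for illustration. The idea is to sort on an explicit marker column instead of the first data column, and to append PARALLEL OFF unless the caller has already set a PARALLEL option.

```python
def build_unload_query(columns, schema, table, s3_bucket, s3_key,
                       access_key, secret_key, unload_options=()):
    """Build an UNLOAD statement whose header row sorts first (sketch)."""
    # Header row: each column name as a string literal. The quotes are
    # backslash-escaped because the SELECT sits inside UNLOAD ('...').
    header = ", ".join("\\'{0}\\' AS {0}".format(c) for c in columns)
    # Data rows: cast to text so both UNION ALL branches have matching types.
    castings = ", ".join("CAST({0} AS text) AS {0}".format(c) for c in columns)
    column_list = ", ".join(columns)

    # Append PARALLEL OFF unless the caller already passed a PARALLEL option.
    options = list(unload_options)
    if not any(o.strip().upper().startswith("PARALLEL") for o in options):
        options.append("PARALLEL OFF")

    # sort_order is 1 for the header and 2 for data, so the ordering no
    # longer depends on the type or contents of the first data column.
    return """
UNLOAD ('SELECT {column_list} FROM (
             SELECT 1 AS sort_order, {header}
             UNION ALL
             SELECT 2 AS sort_order, {castings} FROM {schema}.{table}
         ) AS t ORDER BY sort_order')
TO 's3://{bucket}/{key}/{table}_'
with credentials
'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
{options};
""".format(column_list=column_list, header=header, castings=castings,
           schema=schema, table=table, bucket=s3_bucket, key=s3_key,
           access_key=access_key, secret_key=secret_key,
           options=" ".join(options))
```

The outer SELECT lists only the real columns, so the marker column is used for ordering but is not written to S3. The generated statement has not been run against a Redshift cluster here and should be verified before use.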