[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439110#comment-16439110 ]

ASF subversion and git services commented on AIRFLOW-2254:
----------------------------------------------------------

Commit a148043107f147ce7d3617308f119be27810ec5a in incubator-airflow's branch refs/heads/master from [~sathyaprakashg]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a148043 ]

[AIRFLOW-2254] Put header as first row in unload

Currently, data is ordered by first column in descending order
Header row comes as first only if the first column is integer
This fix puts header as first row regardless of first column data type

Closes #3180 from sathyaprakashg/AIRFLOW-2254


> Fix header output on RedshiftToS3Transfer
> -----------------------------------------
>
>                 Key: AIRFLOW-2254
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws, redshift
>            Reporter: Kengo Seki
>            Assignee: Sathyaprakash Govindasamy
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The current implementation of RedshiftToS3Transfer is as follows and seems to
> have referred to [this
> post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].
> {code}
> unload_query = """
>                 UNLOAD ('SELECT {0}
>                 UNION ALL
>                 SELECT {1} FROM {2}.{3}
>                 ORDER BY 1 DESC')
>                 TO 's3://{4}/{5}/{3}_'
>                 with
>                 credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
>                 {8};
>                 """.format(column_names, column_castings,
>                            self.schema, self.table,
>                            self.s3_bucket, self.s3_key,
>                            credentials.access_key, credentials.secret_key,
>                            unload_options)
> {code}
> {{ORDER BY 1 DESC}} is intended to output the header first, but as [this
> post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374]
> says, it works only if the first column type is not character (e.g. numeric).
> In addition, this query should be used with PARALLEL OFF option, because
> without that, many files are output but only the first one has the header
> line.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
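
One way to put the header first regardless of the first column's type is to sort on an explicit ordering column rather than on the data itself. The sketch below is illustrative only and is not taken from the commit: `build_header_first_select`, `build_unload`, the `sort_order` column, the subquery alias `t`, and the example schema, table, and bucket names are all hypothetical. It assumes plain, trusted column names and leaves out credentials and other UNLOAD options, so the printed statement is not complete enough to run as-is.

```python
def build_header_first_select(schema, table, columns):
    """Return a SELECT that emits a header row followed by the table rows."""
    # Header branch: each column name as a string literal.
    header = ", ".join("'{0}' AS {0}".format(c) for c in columns)
    # Data branch: cast to text so both UNION ALL branches have matching types.
    data = ", ".join("CAST({0} AS text) AS {0}".format(c) for c in columns)
    # sort_order (hypothetical name) is 0 for the header and 1 for data rows,
    # so the ordering no longer depends on the values in the first column.
    return (
        "SELECT {cols} FROM ("
        "SELECT 0 AS sort_order, {header} "
        "UNION ALL "
        "SELECT 1 AS sort_order, {data} FROM {schema}.{table}"
        ") AS t ORDER BY sort_order"
    ).format(cols=", ".join(columns), header=header, data=data,
             schema=schema, table=table)


def build_unload(select_sql, s3_path):
    """Wrap a SELECT in an UNLOAD statement (authorization clause omitted)."""
    # UNLOAD takes the query as a quoted string, so embedded single quotes
    # are doubled here.
    return "UNLOAD ('{0}') TO '{1}'".format(select_sql.replace("'", "''"), s3_path)


query = build_header_first_select("public", "users", ["name", "age"])
print(build_unload(query, "s3://my-bucket/my-key/users_"))
```

The outer SELECT lists only the real columns, so the helper ordering column should not appear in the unloaded file.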
[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427947#comment-16427947 ]

Kengo Seki commented on AIRFLOW-2254:
-------------------------------------

[~sathyaprakashg] Thanks for the comment and for submitting a PR!

bq. There is an argument called unload_options. You can pass PARALLEL OFF as value to unload_options and it should work.

Yeah, that's right. I meant that it should be documented explicitly. :)
[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421607#comment-16421607 ]

Sathyaprakash Govindasamy commented on AIRFLOW-2254:
----------------------------------------------------

Pull request created to fix the issue where the header row does not come first when the data type of the first column is not integer:
https://github.com/apache/incubator-airflow/pull/3180
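
The ordering problem can be reproduced outside Redshift. Because the query casts every column to text, `ORDER BY 1 DESC` compares the header against the data as strings. The sketch below mimics that with Python's string sort; `header_first_after_desc_sort` is a hypothetical helper, and it assumes ASCII data, for which Python's code-point comparison should agree with a byte-wise string comparison.

```python
def header_first_after_desc_sort(header, values):
    """Mimic ORDER BY 1 DESC over text values; True if the header sorts first."""
    rows = sorted([header] + list(values), reverse=True)
    return rows[0] == header


# Numeric first column: digits sort below letters, so the header ends up on top.
print(header_first_after_desc_sort("id", ["1", "20", "300"]))   # True
# Character first column: "zoe" sorts above "name", so the header is buried.
print(header_first_after_desc_sort("name", ["alice", "zoe"]))   # False
```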
[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416506#comment-16416506 ]

Sathyaprakash Govindasamy commented on AIRFLOW-2254:
----------------------------------------------------

[~sekikn] There is an argument called *unload_options*. You can pass *PARALLEL OFF* as a value to unload_options and it should work. Please let me know.
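
As a rough illustration of the suggestion above, a caller could make sure PARALLEL OFF is present before handing the options to the operator. This is a sketch under assumptions: `with_parallel_off` is a hypothetical helper, and it assumes `unload_options` is a list of option strings; the thread only states that PARALLEL OFF can be passed as a value.

```python
def with_parallel_off(unload_options):
    """Return a copy of the options, adding PARALLEL OFF unless PARALLEL is set."""
    options = list(unload_options)
    # Leave an explicit PARALLEL setting from the caller untouched.
    if not any(opt.strip().upper().startswith("PARALLEL") for opt in options):
        options.append("PARALLEL OFF")
    return options


print(with_parallel_off(["DELIMITER ','"]))
```

The returned list would then be passed as the `unload_options` argument mentioned in the comment.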