[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer

2018-04-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439110#comment-16439110
 ] 

ASF subversion and git services commented on AIRFLOW-2254:
--

Commit a148043107f147ce7d3617308f119be27810ec5a in incubator-airflow's branch 
refs/heads/master from [~sathyaprakashg]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a148043 ]

[AIRFLOW-2254] Put header as first row in unload

Currently, data is ordered by the first column in
descending order, so the header row comes first only
if the first column is an integer.
This fix puts the header as the first row regardless
of the first column's data type.

Closes #3180 from sathyaprakashg/AIRFLOW-2254


> Fix header output on RedshiftToS3Transfer
> -
>
> Key: AIRFLOW-2254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws, redshift
> Reporter: Kengo Seki
> Assignee: Sathyaprakash Govindasamy
> Priority: Major
> Fix For: 2.0.0
>
>
> The current implementation of RedshiftToS3Transfer is as follows and seems to 
> have referred to [this 
> post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].
> {code}
> unload_query = """
> UNLOAD ('SELECT {0}
> UNION ALL
> SELECT {1} FROM {2}.{3}
> ORDER BY 1 DESC')
> TO 's3://{4}/{5}/{3}_'
> with
> credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
> {8};
> """.format(column_names, column_castings, self.schema, self.table,
>            self.s3_bucket, self.s3_key, credentials.access_key,
>            credentials.secret_key, unload_options)
> {code}
> {{ORDER BY 1 DESC}} is intended to output the header first, but as [this 
> post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374]
> says, it works only if the first column is not a character type (e.g. it is numeric).
> In addition, this query should be used with the PARALLEL OFF option, because 
> without it, many files are output but only the first one has the header 
> line.
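
To make the ordering problem described above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 purely as a stand-in for Redshift, with a made-up `users` table, so it only illustrates the sorting behaviour and not UNLOAD itself. The second query, which sorts on an explicit marker column (`is_data`, a hypothetical name), is one possible type-independent approach; it is an assumption for illustration and not necessarily what the PR for this issue does.

```python
import sqlite3

# Stand-in for Redshift: sqlite3 is used only to demonstrate row ordering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("zoe", 31), ("adam", 25)])

# Approach from the issue: header and data rows are sorted together on the
# first column. With a character first column, 'zoe' sorts after 'name',
# so the header row is not first in descending order.
naive_query = """
SELECT 'name', 'age'
UNION ALL
SELECT CAST(name AS TEXT), CAST(age AS TEXT) FROM users
ORDER BY 1 DESC
"""
naive_rows = conn.execute(naive_query).fetchall()

# Hypothetical alternative: tag each row with a marker column and sort on
# that instead, so the result does not depend on the first column's type.
keyed_query = """
SELECT name, age FROM (
    SELECT 0 AS is_data, 'name' AS name, 'age' AS age
    UNION ALL
    SELECT 1, CAST(name AS TEXT), CAST(age AS TEXT) FROM users
) ORDER BY is_data
"""
keyed_rows = conn.execute(keyed_query).fetchall()

print("ORDER BY 1 DESC:", naive_rows)
print("ORDER BY marker:", keyed_rows)
```

Only the position of the header row is guaranteed by the marker sort; the relative order of the data rows is left unspecified in this sketch.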



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer

2018-04-05 Thread Kengo Seki (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427947#comment-16427947
 ] 

Kengo Seki commented on AIRFLOW-2254:
-

[~sathyaprakashg] Thanks for the comment and for submitting a PR!

bq. There is an argument called unload_options. You can pass PARALLEL OFF as 
the value of unload_options and it should work.

Yeah, that's right. I meant that it should be documented explicitly. :)



[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer

2018-04-01 Thread Sathyaprakash Govindasamy (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421607#comment-16421607
 ] 

Sathyaprakash Govindasamy commented on AIRFLOW-2254:


Created a pull request to fix the issue where the header row does not come 
first when the data type of the first column is not integer:

https://github.com/apache/incubator-airflow/pull/3180



[jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer

2018-03-27 Thread Sathyaprakash Govindasamy (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416506#comment-16416506
 ] 

Sathyaprakash Govindasamy commented on AIRFLOW-2254:


[~sekikn] There is an argument called *unload_options*. You can pass 
*PARALLEL OFF* as the value of unload_options and it should work. Please let me 
know.
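
As a rough illustration of the suggestion above, the sketch below shows how caller-supplied options such as PARALLEL OFF might end up at the tail of the generated statement. The function `render_unload`, its parameters, the bucket path, and the role ARN are all hypothetical and are not Airflow's API; joining the options with spaces into the trailing slot is an assumption modelled on the `{8}` placeholder in the template quoted in the issue.

```python
def render_unload(select_sql, s3_path, auth_clause, unload_options=()):
    """Hypothetical helper: append caller-supplied options to an UNLOAD statement.

    This only mimics the shape of the template quoted in the issue; it is
    not the operator's actual implementation.
    """
    # Assumption: options are plain strings placed after the authorization clause.
    options = " ".join(unload_options)
    statement = "UNLOAD ('{0}') TO '{1}' {2} {3}".format(
        select_sql, s3_path, auth_clause, options
    )
    return statement.strip() + ";"


# Placeholder values for illustration only.
query = render_unload(
    "SELECT * FROM public.users",
    "s3://my-bucket/exports/users_",
    "IAM_ROLE 'arn:aws:iam::123456789012:role/example'",
    unload_options=["PARALLEL OFF"],
)
print(query)
```

With PARALLEL OFF in the options, Redshift writes the result serially rather than in one file per slice, which is why the header row then appears only once at the top of the output.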
