[ https://issues.apache.org/jira/browse/AIRFLOW-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fokko Driesprong resolved AIRFLOW-1862. --------------------------------------- Resolution: Fixed Fix Version/s: (was: Airflow 1.8) 2.0.0 > redshift_to_s3_operator fails on BOOLEAN column in source table > --------------------------------------------------------------- > > Key: AIRFLOW-1862 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1862 > Project: Apache Airflow > Issue Type: Bug > Components: redshift > Affects Versions: Airflow 1.8 > Reporter: Mark S Weiss > Assignee: Mark S Weiss > Priority: Major > Labels: easyfix > Fix For: 2.0.0 > > > The {{airflow/operators/redshift_to_s3_operator}} module generates an > {{UNLOAD}} query using an SQL fragment for the {{column_castings}} that is > generated by this line: > https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87 > This is a bug, because a {{CAST()}} in Redshift on a column of type > {{BOOLEAN}} will raise an error and abort execution of the SQL statement. The > error raised is {{ERROR: cannot cast type boolean to character varying}}. > This can be trivially verified in Redshift using the following code: > {noformat} > my_db=# CREATE TABLE temp (BOOLEAN flag); > my_db=# INSERT INTO temp (flag) VALUES(false); > my_db=# SELECT CAST (flag AS text) FROM temp; > ERROR: cannot cast type boolean to character varying > {noformat} > The solution is to handle the case of {{BOOLEAN}} columns, by: > * Modifying the {{columns_query}} here > https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L75 > to also select column {{data_type}} > * Modify the expression here to use the {{data_type}} to generate a valid > {{CAST}} here: > https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87 > I have implemented and locally tested this alternative, which I believe is > both minimal in scope and robust in all cases, and will submit a pull request. > {noformat} > columns_query = """SELECT column_name, data_type > FROM information_schema.columns > WHERE table_schema = '{0}' > AND table_name = '{1}' > ORDER BY ordinal_position > """.format(self.schema, self.table) > .... > .... > column_castings = (', ').join(["CAST({0} AS text) AS > {1}".format(columns[i], columns[i]) > if types[i] != 'boolean' else > "CAST(CAST({0} AS SMALLINT)AS text) AS > {1}".format(columns[i], columns[i]) > for i in xrange(len(columns))]) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)