[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator

2018-12-26 Thread GitBox
yohei1126 commented on a change in pull request #4371: 
[AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / 
S3ToGoogleCloudStorageOperator
URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244072388
 
 

 ##
 File path: airflow/contrib/operators/gcs_to_s3.py
 ##
 @@ -101,7 +101,7 @@ def execute(self, context):
 # Google Cloud Storage and not in S3
 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key)
 existing_files = s3_hook.list_keys(bucket_name)
-files = set(files) - set(existing_files)
+files = list(set(files) - set(existing_files))
 
 Review comment:
  If you think `xcom` should support `set`, then this is a bug in `xcom`: the current implementation uses `json.dumps`, which only supports JSON-serializable objects (`set` is not JSON serializable).
   
   ```
   >>> import json
   >>> lst = [1, 2, 3, 1]
   >>> json.dumps(lst).encode('UTF-8')
   b'[1, 2, 3, 1]'
   >>> s = set(lst)
   >>> json.dumps(s).encode('UTF-8')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/__init__.py", line 231, in dumps
       return _default_encoder.encode(obj)
     File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 199, in encode
       chunks = self.iterencode(o, _one_shot=True)
     File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 257, in iterencode
       return _iterencode(o, 0)
     File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 179, in default
       raise TypeError(f'Object of type {o.__class__.__name__} '
   TypeError: Object of type set is not JSON serializable
   ```
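
  (Editorial aside, not part of the original review: if a `set` does need to cross an XCom boundary, the usual workaround is to convert it to a list before serializing. A minimal sketch; the filenames are illustrative.)

  ```python
  import json

  files = {"a.csv", "b.csv", "c.csv"}

  # Sets are not JSON serializable; convert to a sorted list first
  # (sorting also makes the serialized bytes deterministic).
  payload = json.dumps(sorted(files)).encode("UTF-8")

  # Alternatively, let json.dumps coerce any non-serializable object
  # via the `default` hook, which is called with the offending value.
  payload2 = json.dumps(files, default=list).encode("UTF-8")
  ```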


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator

2018-12-26 Thread GitBox
yohei1126 commented on a change in pull request #4371: 
[AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / 
S3ToGoogleCloudStorageOperator
URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244071926
 
 

 ##
 File path: airflow/contrib/operators/gcs_to_s3.py
 ##
 @@ -101,7 +101,7 @@ def execute(self, context):
 # Google Cloud Storage and not in S3
 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key)
 existing_files = s3_hook.list_keys(bucket_name)
-files = set(files) - set(existing_files)
+files = list(set(files) - set(existing_files))
 
 Review comment:
  I added the traceback to the description. According to the traceback, `models.py` internally uses `json.dumps(value).encode`, which does not support `set` because `set` is not JSON serializable.




[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator

2018-12-26 Thread GitBox
yohei1126 commented on a change in pull request #4371: 
[AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / 
S3ToGoogleCloudStorageOperator
URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244070449
 
 

 ##
 File path: airflow/contrib/operators/gcs_to_s3.py
 ##
 @@ -101,7 +101,7 @@ def execute(self, context):
 # Google Cloud Storage and not in S3
 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key)
 existing_files = s3_hook.list_keys(bucket_name)
-files = set(files) - set(existing_files)
+files = list(set(files) - set(existing_files))
 
 Review comment:
  Actually, the error log shows that it does not support `set` (`TypeError: 'NoneType' object is not iterable`), but it works with a `list`.
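
(A hypothetical standalone sketch, not the operator's actual code: `files_to_copy` is my own name, and `sorted` is used for deterministic output where the diff uses `list`. It illustrates both guarding against a `None` key listing and returning a JSON-serializable `list` instead of a `set`.)

```python
def files_to_copy(gcs_files, existing_s3_keys):
    # A key listing can come back as None when the bucket is empty,
    # which would trigger "TypeError: 'NoneType' object is not iterable".
    existing = set(existing_s3_keys or [])
    # Return a list, not a set, so the result stays JSON-serializable
    # (e.g. when pushed to XCom).
    return sorted(set(gcs_files) - existing)

files_to_copy(["a", "b", "b"], None)   # deduplicated, None-safe
files_to_copy(["a", "b"], ["a"])       # only keys missing from S3
```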

