Thanks! I'll try to test it, but it only happens sporadically, so I'm not sure
I'll be able to verify it in a test cluster.
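
Since the failure is sporadic, one option is to exercise the check locally
with a simulated 410 instead of waiting for a real one. Below is a minimal
sketch, not wal-e's actual retry code: FakeGoneError, is_gone_error,
upload_with_retry and flaky_upload are hypothetical names, and it assumes the
exception exposes the HTTP status as a `code` attribute, as the function
quoted further down does.

```python
import time

try:
    # Real base class, used only if google-api-core happens to be installed.
    from google.api_core.exceptions import GoogleAPICallError
except ImportError:
    GoogleAPICallError = None


class FakeGoneError(Exception):
    """Hypothetical stand-in for a GCS API error carrying HTTP 410."""
    code = 410


def is_gone_error(typ, value, base=None):
    """Return True if (typ, value) looks like an API error with status 410."""
    base = base or GoogleAPICallError
    if base is None or not issubclass(typ, base):
        return False
    return getattr(value, "code", None) == 410


def upload_with_retry(upload, max_attempts=3, base=None, delay=0.0):
    """Call upload(), retrying only when it fails with a 410 (hypothetical)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except Exception as exc:
            # Give up on the last attempt or on any error that is not a 410.
            if attempt == max_attempts or not is_gone_error(type(exc), exc, base):
                raise
            time.sleep(delay)


# Simulate an upload session that disappears twice and then succeeds.
calls = {"n": 0}


def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeGoneError("410 PUT (simulated)")
    return "uploaded"


result = upload_with_retry(flaky_upload, base=FakeGoneError)
```

If google-api-core is installed, leaving `base` unset makes the check use the
real GoogleAPICallError class instead of the stand-in.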

On Fri, Jun 21, 2019 at 2:56 PM Samuel Kohonen <[email protected]>
wrote:

> Looks OK on a quick look. I tried to verify that this is the exception that
> bubbles up at the point in the code I marked in my first email. Unless I
> completely misread the traceback, that should be the correct exception to
> catch.
>
> I hope you can test this in a test cluster before going to production. I
> guess the worst case for this change is that WAL uploads would start
> failing if there is a typo or something similar somewhere.
>
> On Fri, Jun 21, 2019 at 2:29 PM 'Yun Guo' via wal-e <
> [email protected]> wrote:
>
>> Thanks Samuel!
>> Could you review whether the function below can be used to check for the
>> 410 exception?
>>
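>> try:
>>     # GoogleAPICallError is defined in google.api_core.exceptions.
>>     from google.api_core import exceptions as gcs_exceptions
>> except ImportError:
>>     # Library not installed: the check below always returns False.
>>     gcs_exceptions = None
>>
>> def is_gcs_response_error(typ, value):
>>     """Return True if the exception is a GCS API error with HTTP 410."""
>>     if gcs_exceptions is None:
>>         return False
>>
>>     if not issubclass(typ, gcs_exceptions.GoogleAPICallError):
>>         return False
>>
>>     return value.code == 410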
>>
>>
>> On Fri, Jun 21, 2019 at 11:27 AM Samuel Kohonen <[email protected]>
>> wrote:
>>
>>> Hey,
>>>
>>> It seems that, for some reason, the resumable upload session that the
>>> Google Python library uses for large files disappeared. I have no idea how
>>> common that is or why it would happen, but unfortunately the Google library
>>> doesn't seem to retry those errors itself anymore now that we have stopped
>>> using the deprecated num_retries parameter directly.
>>>
>>> Are you open to hacking your wal-e installation a bit to see if just
>>> checking for the GoogleAPICallError (and maybe specifically 410) and
>>> retrying would fix this? We can think about cleaner solutions
>>> afterwards. Checking for the exception somewhere in this if-branch (
>>> https://github.com/wal-e/wal-e/blob/master/wal_e/worker/upload.py#L119)
>>> and making sure it doesn't reach the else block and get re-raised should
>>> force wal-e to retry the upload. Is this something you could try adding to
>>> your local installation to see if it fixes the situation for you?
>>>
>>> Cheers,
>>> Samuel
>>>
>>> On Fri, Jun 21, 2019 at 9:43 AM 'Yun Guo' via wal-e <
>>> [email protected]> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> We are using wal-e v1.1 to back up to GCS. The total backup is around
>>>> 3.2T. We noticed that the wal-e process fails sporadically with HTTP 410;
>>>> the log is below.
>>>>
>>>> Jun 21 02:30:57  wal_e.worker.upload INFO     MSG: beginning volume compression
>>>>         DETAIL: Building volume 1142.
>>>>         STRUCTURED: time=2019-06-21T02:30:57.666929-00 pid=37373
>>>> Jun 21 02:30:58  wal_e.worker.upload INFO     MSG: beginning volume compression
>>>>         DETAIL: Building volume 1143.
>>>>         STRUCTURED: time=2019-06-21T02:30:58.958880-00 pid=37373
>>>> Jun 21 02:31:13  wal_e.worker.upload INFO     MSG: beginning volume compression
>>>>         DETAIL: Building volume 1144.
>>>>         STRUCTURED: time=2019-06-21T02:31:13.820819-00 pid=37373
>>>> Jun 21 02:31:14  wal_e.operator.backup WARNING  MSG: blocking on sending WAL segments
>>>>         DETAIL: The backup was not completed successfully, but we have to wait anyway.  See README: TODO about pg_cancel_backup
>>>>         STRUCTURED: time=2019-06-21T02:31:14.716392-00 pid=37373
>>>> Jun 21 02:31:17  wal_e.main   CRITICAL MSG: An unprocessed exception has avoided all error handling
>>>>         DETAIL: Traceback (most recent call last):
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1041, in upload_from_file
>>>>             size, num_retries, predefined_acl)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 957, in _do_upload
>>>>             num_retries, predefined_acl)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 904, in _do_resumable_upload
>>>>             response = upload.transmit_next_chunk(transport)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/requests/upload.py", line 396, in transmit_next_chunk
>>>>             self._process_response(result, len(payload))
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_upload.py", line 574, in _process_response
>>>>             self._get_status_code, callback=self._make_invalid)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_helpers.py", line 93, in require_status_code
>>>>             status_code, u'Expected one of', *status_codes)
>>>>         google.resumable_media.common.InvalidResponse: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>>>>
>>>>         During handling of the above exception, another exception occurred:
>>>>
>>>>         Traceback (most recent call last):
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in shim
>>>>             return f(*args, **kwargs)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in put_file_helper
>>>>             return self.blobstore.uri_put_file(self.creds, url, tf)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line 38, in uri_put_file
>>>>             blob.upload_from_file(fp, size=size, content_type=content_type)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1044, in upload_from_file
>>>>             _raise_from_invalid_response(exc)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1914, in _raise_from_invalid_response
>>>>             response.status_code, message, response=response)
>>>>         google.api_core.exceptions.GoogleAPICallError: 410 PUT https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>>>>
>>>>         During handling of the above exception, another exception occurred:
>>>>
>>>>         Traceback (most recent call last):
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/cmd.py", line 652, in main
>>>>             pool_size=args.pool_size)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 197, in database_backup
>>>>             **kwargs)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 500, in _upload_pg_cluster_dir
>>>>             pool.put(tpart)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 108, in put
>>>>             self._wait()
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 65, in _wait
>>>>             raise val
>>>>           File "src/gevent/greenlet.py", line 716, in gevent._greenlet.Greenlet.run
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 145, in __call__
>>>>             k = put_file_helper()
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 101, in shim
>>>>             exc_processor_cxt=exc_processor_cxt)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 139, in retry_with_count_internal
>>>>             side_effect_func(exc_tup, exc_processor_cxt)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 135, in log_volume_failures_on_error
>>>>             raise typ(value).with_traceback(tb)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in shim
>>>>             return f(*args, **kwargs)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in put_file_helper
>>>>             return self.blobstore.uri_put_file(self.creds, url, tf)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line 38, in uri_put_file
>>>>             blob.upload_from_file(fp, size=size, content_type=content_type)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1044, in upload_from_file
>>>>             _raise_from_invalid_response(exc)
>>>>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1914, in _raise_from_invalid_response
>>>>             response.status_code, message, response=response)
>>>>         google.api_core.exceptions.GoogleAPICallError: None 410 PUT https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>>>>
>>>>         STRUCTURED: time=2019-06-21T02:31:17.960909-00 pid=37373
>>>>
>>>>
>>>> Any idea what we can do to fix it?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> --
>>>>
>>>> Yun Guo
>>>> Senior Database Engineer | GitLab
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "wal-e" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>> --
>>
>> Yun Guo
>> Senior Database Engineer | GitLab
>>
>>
>

-- 

Yun Guo
Senior Database Engineer | GitLab
