I know from experience that Flink's shaded S3A FileSystem does not read core-site.xml, though I don't remember offhand which file(s) it does read. However, since it's shaded, maybe this could be fixed by building a Flink FS against Hadoop 3.3.0? Last I checked, I think it was built against 3.1.0.
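For context on the leak in the original report below: java.io.File.deleteOnExit() adds the path to a JVM-wide set that is only drained at shutdown, so in a long-running taskmanager the entries pile up even after the files themselves are gone. A rough sketch of the alternative pattern, tracking temp files yourself and deleting them explicitly, is below. TempBlockFiles and its methods are hypothetical names made up for illustration; this is not Hadoop's or Flink's actual code, nor a reproduction of the HADOOP-15658 patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop or Flink): it tracks its own
 * temp files and deletes them explicitly, so nothing is ever registered
 * with File.deleteOnExit().
 */
final class TempBlockFiles implements AutoCloseable {
    private final String prefix;
    private final List<Path> live = new ArrayList<>();

    TempBlockFiles(String prefix) {
        this.prefix = prefix;
    }

    /** Creates a temp file in the default temp directory and remembers it. */
    synchronized Path create() throws IOException {
        Path p = Files.createTempFile(prefix, ".tmp");
        live.add(p);
        return p;
    }

    /** Deletes one file and forgets it, so the tracking list stays bounded. */
    synchronized void release(Path p) throws IOException {
        Files.deleteIfExists(p);
        live.remove(p);
    }

    /** Number of files currently tracked. */
    synchronized int tracked() {
        return live.size();
    }

    /** Deletes whatever is still tracked. */
    @Override
    public synchronized void close() throws IOException {
        for (Path p : live) {
            Files.deleteIfExists(p);
        }
        live.clear();
    }

    public static void main(String[] args) throws IOException {
        try (TempBlockFiles blocks = new TempBlockFiles("s3ablock-demo-")) {
            Path block = blocks.create();
            blocks.release(block); // file is deleted and no longer tracked
            System.out.println("tracked after release: " + blocks.tracked());
        }
    }
}
```

The point of release() is that the bookkeeping shrinks as soon as a block file is done with, which is exactly what the deleteOnExit set never does.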
On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddra...@gmail.com> wrote:

> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
> load any configurations from core-site.xml.
>
> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.har...@hivehome.com>
> wrote:
>
>> Hi Piotr,
>>
>> Thanks for the link to the issue.
>>
>> Do you know if there's a workaround? I've tried setting the following in
>> my core-site.xml:
>>
>> fs.s3a.fast.upload.buffer=true
>>
>> To try and avoid writing the buffer files, but the taskmanager breaks
>> with the same problem.
>>
>> Best regards,
>>
>> Mark
>> ------------------------------
>> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
>> Nowojski <pi...@ververica.com>
>> *Sent:* 22 January 2020 13:29
>> *To:* Till Rohrmann <trohrm...@apache.org>
>> *Cc:* Mark Harris <mark.har...@hivehome.com>; flink-u...@apache.org <
>> flink-u...@apache.org>; kkloudas <kklou...@apache.org>
>> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
>> hooks for S3a files
>>
>> Hi,
>>
>> This is probably a known issue of Hadoop [1]. Unfortunately it was only
>> fixed in 3.3.0.
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>>
>> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> Thanks for reporting this issue Mark. I'm pulling Klou into this
>> conversation who knows more about the StreamingFileSink. @Klou does the
>> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>
>> Cheers,
>> Till
>>
>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.har...@hivehome.com>
>> wrote:
>>
>> Hi,
>>
>> We're using flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs hadoop
>> v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
>> (causing all the jobs running on them to fail) with a
>> "java.lang.OutOfMemoryError: GC overhead limit exceeded".
>> The taskmanager (and jobs that should be running on it) remain down until
>> manually restarted.
>>
>> I managed to take and analyze a memory dump from one of the afflicted
>> taskmanagers.
>>
>> It showed that 85% of the heap was made up of
>> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
>> that hashset (9041060 out of ~9041100) pointed to files that began
>> /tmp/hadoop-yarn/s3a/s3ablock
>>
>> The problem seems to affect jobs that make use of the StreamingFileSink
>> - all of the taskmanager crashes have been on a taskmanager running at
>> least one job using this sink, and a cluster running only a single
>> taskmanager / job that uses the StreamingFileSink crashed with the GC
>> overhead limit exceeded error.
>>
>> I've had a look for advice on handling this error more broadly without
>> luck.
>>
>> Any suggestions or advice gratefully received.
>>
>> Best regards,
>>
>> Mark Harris
>>
>> The information contained in or attached to this email is intended only
>> for the use of the individual or entity to which it is addressed. If you
>> are not the intended recipient, or a person responsible for delivering it
>> to the intended recipient, you are not authorised to and must not disclose,
>> copy, distribute, or retain this message or any part of it. It may contain
>> information which is confidential and/or covered by legal professional or
>> other privilege under applicable law.
>>
>> The views expressed in this email are not necessarily the views of
>> Centrica plc or its subsidiaries, and the company, its directors, officers
>> or employees make no representation or accept any liability for its
>> accuracy or completeness unless expressly stated to the contrary.
>>
>> Additional regulatory disclosures may be found here:
>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>>
>> PH Jones is a trading name of British Gas Social Housing Limited.
>> British Gas Social Housing Limited (company no: 01026007), British Gas Trading
>> Limited (company no: 03078711), British Gas Services Limited (company no:
>> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
>> New Heating Limited (company no: 06723244), British Gas Services
>> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
>> Limited (company no: 02877397) are all wholly owned subsidiaries of
>> Centrica plc (company no: 3033654). Each company is registered in England
>> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
>> Berkshire SL4 5GD.
>>
>> British Gas Insurance Limited is authorised by the Prudential Regulation
>> Authority and regulated by the Financial Conduct Authority and the
>> Prudential Regulation Authority. British Gas Services Limited and Centrica
>> Energy (Trading) Limited are authorised and regulated by the Financial
>> Conduct Authority. British Gas Trading Limited is an appointed
>> representative of British Gas Services Limited which is authorised and
>> regulated by the Financial Conduct Authority.