Do you see any other exceptions in Stackdriver, e.g. (guess) OOM errors?
"worker lost contact with the service" is commonly caused by OOM crashes,
and GC thrashing could also cause network issues of the sort that you're
seeing.
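
If it does turn out to be memory pressure, one common mitigation is to give
each worker more RAM. A minimal sketch, assuming the Dataflow Java runner
(the machine type here is only an example, not a recommendation specific to
your job):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Inside main(String[] args): request higher-memory workers instead of
    // the default machine type; equivalent to passing
    // --workerMachineType=n1-highmem-4 on the command line.
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
    options.setWorkerMachineType("n1-highmem-4");
    Pipeline p = Pipeline.create(options);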

On Fri, May 11, 2018 at 2:54 AM Jose Ignacio Honrado Benítez <
jihonra...@gmail.com> wrote:

> Hi there,
>
> I'm running a pipeline in Google Cloud Dataflow that reads from GCS and
> writes into several BQ tables. This pipeline has been successfully used to
> load several TB of data from GCS to BQ.
>
> In one particular execution, the documents loaded from GCS are "exploded"
> into several documents in one of the steps (each input produces between 1
> and 25 outputs, depending on the document type) before being inserted into
> several BQ tables. For example, one document of "type a" is exploded into
> three documents: "type a.1", "type a.2" and "type a.3".
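>
> For illustration, the fan-out step looks roughly like the sketch below. It
> is not my actual code: the field value "doc_subtype", the table names and
> the write disposition are placeholders, and it assumes docs is the
> PCollection<TableRow> already parsed from GCS.
>
>     import com.google.api.services.bigquery.model.TableRow;
>     import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
>     import org.apache.beam.sdk.transforms.DoFn;
>     import org.apache.beam.sdk.transforms.ParDo;
>     import org.apache.beam.sdk.values.PCollectionTuple;
>     import org.apache.beam.sdk.values.TupleTag;
>     import org.apache.beam.sdk.values.TupleTagList;
>
>     final TupleTag<TableRow> typeA1 = new TupleTag<TableRow>() {};
>     final TupleTag<TableRow> typeA2 = new TupleTag<TableRow>() {};
>     final TupleTag<TableRow> typeA3 = new TupleTag<TableRow>() {};
>
>     PCollectionTuple exploded = docs.apply("ExplodeDoc",
>         ParDo.of(new DoFn<TableRow, TableRow>() {
>           @ProcessElement
>           public void process(ProcessContext c) {
>             TableRow doc = c.element();
>             // One input document fans out into several derived documents;
>             // clone before modifying so the input element is not mutated.
>             c.output(doc.clone().set("doc_subtype", "a.1"));          // main output
>             c.output(typeA2, doc.clone().set("doc_subtype", "a.2"));  // tagged output
>             c.output(typeA3, doc.clone().set("doc_subtype", "a.3"));  // tagged output
>           }
>         }).withOutputTags(typeA1, TupleTagList.of(typeA2).and(typeA3)));
>
>     // Each tagged output is written to its own BigQuery table.
>     exploded.get(typeA1).apply(BigQueryIO.writeTableRows()
>         .to("my-project:my_dataset.type_a_1")
>         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
>     // ...and likewise for typeA2 / typeA3.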
>
> I have also run this kind of job successfully several times, but in this
> specific case I am getting the following errors:
>
> java.io.IOException: Failed to close some writers
>     at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:245)
>     Suppressed: java.io.IOException: java.io.IOException: insufficient data written
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
>         at org.apache.beam.sdk.io.gcp.bigquery.TableRowWriter.close(TableRowWriter.java:81)
>         at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:239)
>         at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeFinishBundle(Unknown Source)
>         at org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
>         at com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:405)
>         at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:55)
>         at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.io.IOException: insufficient data written
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3501)
>         at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
>         at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
>         ... 4 more
>
> and at the end the job fails with:
>
> Workflow failed. Causes: ..., A work item was attempted 4 times without
> success. Each time the worker eventually lost contact with the service. The
> work item was attempted on:
>   xxx-d-05090856-5x3e-harness-86r6,
>   xxx-d-05090856-5x3e-harness-f296,
>   xxx-d-05090856-5x3e-harness-x5tz,
>   xxx-d-05090856-5x3e-harness-g362
>
> The only difference between this execution that failed and the previous
> ones that succeeded is that the former has a lot more input documents than
> the latter, but I don't understand why that would be an issue, as I have
> executed jobs with much more input in terms of GBs.
>
> I found this topic on StackOverflow:
> https://stackoverflow.com/questions/48285232/dataflow-batch-job-fails-with-failed-to-close-some-writers
> but the proposed solution does not work for me, as I am already setting the
> maxNumWorkers argument to 180.
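>
> For context, maxNumWorkers is set on the Dataflow options roughly like this
> (a sketch rather than my exact setup; same effect as passing
> --maxNumWorkers=180 on the command line):
>
>     DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
>         .withValidation()
>         .as(DataflowPipelineOptions.class);
>     options.setMaxNumWorkers(180);  // upper bound for Dataflow autoscaling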
>
> Any idea of what could be happening?
>
> If it helps, this is the job id on Google's
> Dataflow: 2018-05-11_01_09_21-2559209214303012602
>
> Cheers
>
