Re: java.io.IOException: Failed to close some writers

2018-05-11 Thread Eugene Kirpichov
Do you see any other exceptions in Stackdriver, e.g. (guess) OOM errors?
"worker lost contact with the service" is commonly caused by OOM crashes,
and GC thrashing could also cause network issues of the sort that you're
seeing.
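
As a minimal sketch of how an OOM hypothesis could be confirmed from the pipeline side, assuming the Beam Java SDK on the Dataflow runner: the DataflowPipelineDebugOptions heap-dump settings below ask the worker JVM to dump its heap on OutOfMemoryError and upload it to GCS (the bucket path is a placeholder, not from this job):

import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class HeapDumpDebugOptionsSketch {
  public static void main(String[] args) {
    // Parse the usual command-line options, then view them as Dataflow debug options.
    DataflowPipelineDebugOptions debugOptions =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineDebugOptions.class);

    // Dump the worker heap on OutOfMemoryError and save it to GCS so the
    // OOM hypothesis can be checked after the job fails.
    debugOptions.setDumpHeapOnOOM(true);
    debugOptions.setSaveHeapDumpsToGcsPath("gs://my-debug-bucket/heap-dumps"); // placeholder bucket

    // ... build and run the pipeline with these options ...
  }
}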

On Fri, May 11, 2018 at 2:54 AM Jose Ignacio Honrado Benítez <jihonra...@gmail.com> wrote:

> Hi there,
>
> I'm running a pipeline in Google Cloud Dataflow that reads from GCS and
> writes into several BQ tables. This pipeline has been successfully used to
> load several TB of data from GCS to BQ.
>
> In one particular execution, I am loading documents from GCS that are
> "exploded" into several documents in one of the steps (each document is
> multiplied by a factor between 1 and 25, depending on its type) before
> being inserted into several BQ tables. Example: one document of "type a"
> is exploded into three documents: "type a.1", "type a.2" and "type a.3".
>
> I have also performed successful executions of this kind several times,
> but in this specific case I am getting the following errors:
>
> java.io.IOException: Failed to close some writers
>     at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:245)
>     Suppressed: java.io.IOException: java.io.IOException: insufficient data written
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
>         at org.apache.beam.sdk.io.gcp.bigquery.TableRowWriter.close(TableRowWriter.java:81)
>         at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:239)
>         at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeFinishBundle(Unknown Source)
>         at org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
>         at com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:405)
>         at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:55)
>         at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
>         at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
>         at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.io.IOException: insufficient data written
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3501)
>         at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
>         at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
>         at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>         at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
>         ... 4 more
>
> and at the end the job fails with:
>
> Workflow failed. Causes: ..., A work item was attempted 4 times without
> success. Each time the worker eventually lost contact with the service. The
> work item was attempted on:
>   xxx-d-05090856-5x3e-harness-86r6,
>   xxx-d-05090856-5x3e-harness-f296,
>   xxx-d-05090856-5x3e-harness-x5tz,
>   xxx-d-05090856-5x3e-harness-g362
>
> The only difference between this execution that failed and the previous
> ones that succeeded is that the former has a lot 

java.io.IOException: Failed to close some writers

2018-05-11 Thread Jose Ignacio Honrado Benítez
Hi there,

I'm running a pipeline in Google Cloud Dataflow that reads from GCS and
writes into several BQ tables. This pipeline has been successfully used to
load several TB of data from GCS to BQ.
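
For context, a minimal sketch of this kind of pipeline, assuming newline-delimited files on GCS and the default batch file-load path into BigQuery; the bucket, table name and parsing logic below are placeholders rather than the actual job:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class GcsToBigQuerySketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromGcs", TextIO.read().from("gs://my-bucket/input/*.json")) // placeholder path
        .apply("ParseLine",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((String line) -> new TableRow().set("raw", line)))       // placeholder parsing
        .setCoder(TableRowJsonCoder.of())
        .apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")                         // placeholder table
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    // In batch mode BigQueryIO uses file loads by default: rows are first written to
    // temporary files on GCS (the WriteBundlesToFiles step seen in the stack trace
    // below) and then loaded into BigQuery.

    p.run();
  }
}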

In one particular execution, I am loading documents from GCS that are
"exploded" into several documents in one of the steps (each document is
multiplied by a factor between 1 and 25, depending on its type) before being
inserted into several BQ tables. Example: one document of "type a" is
exploded into three documents: "type a.1", "type a.2" and "type a.3".
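
A minimal sketch of that kind of fan-out step, assuming a multi-output DoFn with one tag per target table; the tag names, fields and the fixed 1-to-3 fan-out are placeholders for the real type-dependent logic:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ExplodeDocumentsSketch {
  // One output tag per target BQ table (placeholder names for "type a.1".."a.3").
  static final TupleTag<TableRow> TYPE_A1 = new TupleTag<TableRow>() {};
  static final TupleTag<TableRow> TYPE_A2 = new TupleTag<TableRow>() {};
  static final TupleTag<TableRow> TYPE_A3 = new TupleTag<TableRow>() {};

  // "Explodes" one input document into several typed rows, one per output tag.
  static class ExplodeFn extends DoFn<TableRow, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      TableRow doc = c.element();
      // Placeholder fan-out: the real pipeline emits between 1 and 25 rows
      // depending on the document type.
      c.output(TYPE_A1, new TableRow().set("part", "a.1").set("doc", doc));
      c.output(TYPE_A2, new TableRow().set("part", "a.2").set("doc", doc));
      c.output(TYPE_A3, new TableRow().set("part", "a.3").set("doc", doc));
    }
  }

  static PCollectionTuple explode(PCollection<TableRow> docs) {
    // Each tagged output (result.get(TYPE_A1), ...) then gets its own
    // BigQueryIO.writeTableRows() targeting its own table.
    return docs.apply("ExplodeByType",
        ParDo.of(new ExplodeFn())
            .withOutputTags(TYPE_A1, TupleTagList.of(TYPE_A2).and(TYPE_A3)));
  }
}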

I have also performed successful executions of this kind several times, but
in this specific case I am getting the following errors:

java.io.IOException: Failed to close some writers
    at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:245)
    Suppressed: java.io.IOException: java.io.IOException: insufficient data written
        at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
        at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
        at org.apache.beam.sdk.io.gcp.bigquery.TableRowWriter.close(TableRowWriter.java:81)
        at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:239)
        at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeFinishBundle(Unknown Source)
        at org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
        at com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:405)
        at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:55)
        at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
        at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
        at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
        at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: insufficient data written
        at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3501)
        at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
        at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
        at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
        at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
        at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
        ... 4 more

and at the end the job fails with:

Workflow failed. Causes: ..., A work item was attempted 4 times without
success. Each time the worker eventually lost contact with the service. The
work item was attempted on:
  xxx-d-05090856-5x3e-harness-86r6,
  xxx-d-05090856-5x3e-harness-f296,
  xxx-d-05090856-5x3e-harness-x5tz,
  xxx-d-05090856-5x3e-harness-g362

The only difference between this execution that failed and the previous
ones that succeeded is that the former has a lot more input documents than
the latter, but I don't understand why that would be an issue, as I have
executed jobs with much more input in terms of GBs.

I found this thread on Stack Overflow:
https://stackoverflow.com/questions/48285232/dataflow-batch-job-fails-with-failed-to-close-some-writers
but the proposed solution does not work for me, as I am already setting
maxNumWorkers to allow up to 180 workers.
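
For reference, a minimal sketch of how the worker pool is sized, assuming the options are set programmatically rather than on the command line; maxNumWorkers=180 comes from the job described above, while the machine type is only an illustrative knob related to the memory question:

import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class WorkerPoolSizingSketch {
  public static void main(String[] args) {
    // Equivalent to passing --maxNumWorkers=180 on the command line.
    DataflowPipelineWorkerPoolOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .as(DataflowPipelineWorkerPoolOptions.class);

    options.setMaxNumWorkers(180);                 // autoscaling upper bound, as in the job above
    options.setWorkerMachineType("n1-highmem-4");  // illustrative: more memory per worker

    // ... create and run the pipeline with these options ...
  }
}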

Any idea what could be happening?

If it helps, this is the job id on Google's