[
https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404352#comment-17404352
]
Chris Nix edited comment on HADOOP-17847 at 9/1/21, 3:41 PM:
-
Not wishing to divert the original report regarding Spark, but I thought it
worth noting that I'm also seeing this error bubble up from Flink 1.8's
`StreamingFileSink` when using an s3a URL. I believe Flink 1.8 is built against
Hadoop 2.4.1. In this Flink case, it seems to occur occasionally as the sink's
`RollingPolicy` rotates output files.
We're streaming Avro from Kafka into Parquet files on S3. I've verified that a
small proportion of the Avro messages on Kafka have no corresponding Parquet
rows on S3, which might imply this is more than just an instrumentation issue.
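For reference, the sink wiring in our job is roughly as follows (the bucket path, schema string, and the `buildKafkaSource` helper are illustrative placeholders, not the real job). Note that with bulk formats such as Parquet, Flink's `StreamingFileSink` only finalises part files when a checkpoint completes, so checkpointing must be enabled for the data to become visible on S3:

```java
// Sketch of the Flink 1.8 pipeline described above. Bucket path, schema,
// and the Kafka-source helper are illustrative placeholders.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class AvroToParquetJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk formats commit part files only on checkpoint completion, so
        // without checkpointing no Parquet output ever becomes visible on S3.
        env.enableCheckpointing(60_000);

        Schema schema = new Schema.Parser().parse("{...}"); // placeholder schema
        DataStream<GenericRecord> records = buildKafkaSource(env);

        StreamingFileSink<GenericRecord> sink = StreamingFileSink
            .forBulkFormat(new Path("s3a://example-bucket/output"),
                           ParquetAvroWriters.forGenericRecord(schema))
            .build();
        records.addSink(sink);

        env.execute("avro-to-parquet");
    }

    // Hypothetical helper; the real Kafka consumer wiring is elided.
    static DataStream<GenericRecord> buildKafkaSource(
            StreamExecutionEnvironment env) {
        throw new UnsupportedOperationException("Kafka source wiring elided");
    }
}
```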
EDIT: This is a false alarm. After some digging, we found other reasons for the
missing data from our Flink app. So whilst the S3AInstrumentation warning
appears in the logs, I can't correlate it with missing data in S3. Apologies
for the misleading intervention here.
> S3AInstrumentation Closing output stream statistics while data is still
> marked as pending upload in OutputStreamStatistics
> --
>
> Key: HADOOP-17847
> URL: https://issues.apache.org/jira/browse/HADOOP-17847
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version: 1.11.1033
> Reporter: Li Rong
>Priority: Major
> Attachments: logs.txt
>
>
> When using the Hadoop s3a file upload for Spark event logs, the logs were
> queued up but not uploaded before the process shut down:
> {code:java}
> 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client
> has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics
> while data is still marked as pending upload in
> OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0,
> blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716,
> bytesUploaded=0, blocksAllocated=1, blocksReleased=1,
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0,
> transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms,
> totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.
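The statistics in the warning above ({{blocksSubmitted=1, bytesPendingUpload=106716, bytesUploaded=0}}) suggest the process exited before the queued block was uploaded. A minimal sketch of a clean shutdown, assuming the application controls its own lifecycle (the app name and job body are placeholders): stopping the SparkSession in a {{finally}} block gives the s3a event-log output stream a chance to close and flush pending blocks before the JVM exits:

```java
// Sketch only: stop the SparkSession before JVM exit so the s3a event-log
// stream is closed and any pending block uploads complete.
// App name and job body are illustrative placeholders.
import org.apache.spark.sql.SparkSession;

public class CleanShutdownExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("example-app")   // placeholder name
            .getOrCreate();
        try {
            // ... job logic ...
        } finally {
            // Closes the event-log output stream, flushing queued s3a
            // uploads, before the process terminates.
            spark.stop();
        }
    }
}
```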
--
This message was sent by Atlassian Jira
(v8.3.4#803005)