[jira] [Comment Edited] (HADOOP-17847) S3AInstrumentation Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics

2021-09-01 Thread Chris Nix (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404352#comment-17404352
 ] 

Chris Nix edited comment on HADOOP-17847 at 9/1/21, 3:41 PM:
-

Not wishing to divert the original report regarding Spark, but I thought it
worth noting that I'm also seeing this error bubble up from Flink 1.8's
`StreamingFileSink` when using an s3a URL.  I think Flink 1.8 uses Hadoop
2.4.1.  In this Flink case, I believe it may occur as the sink's
`RollingPolicy` rotates output files periodically.

We're streaming Avro from Kafka into Parquet files on S3. I've verified that a
small proportion of the Avro messages on Kafka have no corresponding Parquet
rows on S3.  This might imply it's more than just an instrumentation issue.

EDIT:  This is a false alarm.  After some digging, there are other reasons for
the missing data from our Flink app.  So whilst the S3AInstrumentation warning
appears in the logs, I can't correlate it with missing data in S3.  Apologies
for the misleading intervention here.
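For readers unfamiliar with the warning, the semantics of the counters it prints can be sketched with a toy model. This is a hypothetical, simplified illustration, not Hadoop's actual `OutputStreamStatistics` implementation; all names below are illustrative except the counter names taken from the log line.

```java
// Toy model of the accounting behind "Closing output stream statistics
// while data is still marked as pending upload". Hypothetical code,
// not the S3A implementation.
class ToyStreamStats {
    private long blocksSubmitted;
    private long blockUploadsCompleted;
    private long bytesPendingUpload;
    private boolean closedWithPending;

    // A buffered block of data was queued for upload.
    void blockSubmitted(long bytes) {
        blocksSubmitted++;
        bytesPendingUpload += bytes;
    }

    // An upload finished; its bytes are no longer pending.
    void blockUploadCompleted(long bytes) {
        blockUploadsCompleted++;
        bytesPendingUpload -= bytes;
    }

    long bytesPendingUpload() { return bytesPendingUpload; }
    boolean closedWithPending() { return closedWithPending; }

    // On close, anything still pending means buffered data was queued
    // but its upload never completed before the statistics were closed.
    void close() {
        if (bytesPendingUpload > 0) {
            closedWithPending = true;
            System.out.println("WARN Closing output stream statistics while"
                + " data is still marked as pending upload:"
                + " bytesPendingUpload=" + bytesPendingUpload);
        }
    }
}
```

In the attached log, `blocksSubmitted=1` with `blockUploadsCompleted=0` and `bytesPendingUpload=106716` matches the "submitted but never completed" path of this toy model.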


was (Author: chrisnix):
Not wishing to divert the original report regarding Spark, but I thought it
worth noting that I'm also seeing this error bubble up from Flink 1.8's
`StreamingFileSink` when using an s3a URL.  I think Flink 1.8 uses Hadoop
2.4.1.  In this Flink case, I believe it may occur as the sink's
`RollingPolicy` rotates output files periodically.

We're streaming Avro from Kafka into Parquet files on S3. I've verified that a
small proportion of the Avro messages on Kafka have no corresponding Parquet
rows on S3.  This might imply it's more than just an instrumentation issue.

> S3AInstrumentation Closing output stream statistics while data is still 
> marked as pending upload in OutputStreamStatistics
> --
>
> Key: HADOOP-17847
> URL: https://issues.apache.org/jira/browse/HADOOP-17847
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.2.1
> Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version:1.11.1033
>Reporter: Li Rong
>Priority: Major
> Attachments: logs.txt
>
>
> When using the Hadoop s3a file upload for Spark event logs, the logs were 
> queued up and not uploaded before the process was shut down:
> {code}
> 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
> has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics 
> while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0, 
> blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716, 
> bytesUploaded=0, blocksAllocated=1, blocksReleased=1, 
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
> transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms, 
> totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.
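The report describes a race between buffered writes and process shutdown: if the process exits before the stream's close() finalizes the queued block upload, the buffered bytes never reach the store. A minimal, hypothetical sketch (generic java.io, not Hadoop's S3ABlockOutputStream) of why close ordering matters:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical block-buffering stream: writes are buffered and only
// "uploaded" (copied to the destination) on close(), mirroring how
// S3A queues blocks and finalizes them when the stream is closed.
class BufferedUploadDemo {
    static class BlockUploadStream extends OutputStream {
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private final ByteArrayOutputStream destination;

        BlockUploadStream(ByteArrayOutputStream destination) {
            this.destination = destination;
        }

        @Override public void write(int b) { pending.write(b); }

        // close() drains the pending block to the destination; if the
        // process shuts down first, those bytes stay "pending upload".
        @Override public void close() throws IOException {
            pending.writeTo(destination);
            pending.reset();
        }
    }

    static byte[] writeAndClose(byte[] data) {
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        // try-with-resources guarantees close() (the "upload") runs
        // before any shutdown logic that follows this block.
        try (BlockUploadStream out = new BlockUploadStream(store)) {
            out.write(data);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return store.toByteArray();
    }
}
```

In the toy model, only a close() that runs to completion moves the bytes out of the pending buffer; skipping it (as a hard shutdown would) leaves the destination empty, which is the shape of the behaviour the warning hints at.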



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17847) S3AInstrumentation Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics

2021-08-25 Thread Chris Nix (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404352#comment-17404352
 ] 

Chris Nix commented on HADOOP-17847:


Not wishing to divert the original report regarding Spark, but I thought it
worth noting that I'm also seeing this error bubble up from Flink 1.8's
`StreamingFileSink` when using an s3a URL.  I think Flink 1.8 uses Hadoop
2.4.1.  In this Flink case, I believe it may occur as the sink's
`RollingPolicy` rotates output files periodically.

We're streaming Avro from Kafka into Parquet files on S3. I've verified that a
small proportion of the Avro messages on Kafka have no corresponding Parquet
rows on S3.  This might imply it's more than just an instrumentation issue.

> S3AInstrumentation Closing output stream statistics while data is still 
> marked as pending upload in OutputStreamStatistics
> --
>
> Key: HADOOP-17847
> URL: https://issues.apache.org/jira/browse/HADOOP-17847
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.2.1
> Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version:1.11.1033
>Reporter: Li Rong
>Priority: Minor
> Attachments: logs.txt
>
>
> When using the Hadoop s3a file upload for Spark event logs, the logs were 
> queued up and not uploaded before the process was shut down:
> {code}
> 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
> has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics 
> while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0, 
> blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716, 
> bytesUploaded=0, blocksAllocated=1, blocksReleased=1, 
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
> transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms, 
> totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.


