[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-17 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976253#comment-16976253
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/18/19 3:01 AM:


Before my PR [https://github.com/apache/hadoop-ozone/pull/163] I had also seen 
this error. I think goofys has logic where, if a complete multipart upload 
fails, it aborts and re-uploads. (An upload after an abort fails with 
NO_SUCH_MULTIPART_UPLOAD_ERROR; this is expected from the Ozone/S3 perspective.)
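
For context, here is a minimal sketch of that request sequence (my illustration, not goofys source; it assumes the AWS SDK for Java v1 pointed at the S3 gateway, and the endpoint/bucket/key names are placeholders):
{code:java}
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import java.io.File;

public class AbortThenRetry {
  public static void main(String[] args) {
    // Hypothetical repro of the abort-then-retry sequence described above.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://s3g:9878", "us-east-1"))
        .enablePathStyleAccess()
        .build();

    InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest("ozone-test", "key1"));
    String uploadId = init.getUploadId();

    // ... parts are uploaded, completeMultipartUpload fails, the client aborts ...
    s3.abortMultipartUpload(
        new AbortMultipartUploadRequest("ozone-test", "key1", uploadId));

    // Any further uploadPart with the same uploadId now fails; Ozone surfaces
    // this as NO_SUCH_MULTIPART_UPLOAD_ERROR (S3 error code: NoSuchUpload).
    s3.uploadPart(new UploadPartRequest()
        .withBucketName("ozone-test").withKey("key1")
        .withUploadId(uploadId).withPartNumber(1)
        .withFile(new File("/tmp/part1")));
  }
}
{code}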

 

With the above PR, I was able to upload 1 GB, 2 GB, ..., 6 GB files. Please have 
a look at the comment on PR #163. Also, for testing with the PR, did you use the 
branch and set up a new cluster, or did you replace the jars? Could you provide 
some information on this?

 

So, we need to look for any COMPLETE_MULTIPART_UPLOAD_ERROR failure for this 
key; the reason for this is explained in HDDS-2477. Could you also upload the 
om-audit log if COMPLETE_MULTIPART_UPLOAD_ERROR still occurs?
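
A quick way to check is a throwaway scan of the audit log; a minimal sketch (the file name is taken from the attachments here, and the "op=... | ret=..." layout is assumed from the audit lines quoted later in this thread):
{code:java}
import java.nio.file.*;
import java.util.stream.Stream;

public class AuditScan {
  public static void main(String[] args) throws Exception {
    // Print every failed completeMultipartUpload audit entry.
    try (Stream<String> lines =
        Files.lines(Paths.get("om-audit-VM_50_210_centos.log"))) {
      lines.filter(l -> l.contains("op=COMPLETE_MULTIPART_UPLOAD"))
           .filter(l -> l.contains("ret=FAILURE"))
           .forEach(System.out::println);
    }
  }
}
{code}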



> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2018-11-15-OM-logs.txt, 2019-11-06_18_13_57_422_ERROR, 
> hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png, 
> om-audit-VM_50_210_centos.log, om_audit_log_plc_1570863541668_9278.txt
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950
> at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at 
> 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-13 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973836#comment-16973836
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/14/19 12:41 AM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see completeMultipartUpload is called with 286 parts, and OM is throwing an 
InvalidPart error, but from the audit log I was not able to tell which part is 
missing in OM (because we don't print any such info in the log/exception 
message). (And I see 286 successful commit multipart upload calls for the key.)
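
For reference, a simplified sketch of the kind of validation that produces this error (illustrative only, not the actual S3MultipartUploadCompleteRequest code; names are made up): OM tracks committed parts as partNumber -> partName and rejects a complete request whose entries do not match.
{code:java}
import java.util.Map;

class PartListCheck {
  static void validate(Map<Integer, String> committed,   // from MultipartInfoTable
                       Map<Integer, String> requested) { // from the complete request
    for (Map.Entry<Integer, String> e : requested.entrySet()) {
      String committedName = committed.get(e.getKey());
      if (committedName == null || !committedName.equals(e.getValue())) {
        // In OM this becomes an OMException with ResultCodes.INVALID_PART.
        // Logging the offending pair here is exactly the information that
        // is missing from the audit log today.
        throw new IllegalStateException("Invalid part: number=" + e.getKey()
            + " name=" + e.getValue());
      }
    }
  }
}
{code}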

I think there is a chance we are hitting the scenario described in HDDS-2477. 
(Not completely sure; this is my analysis after looking at the logs.) I have 
opened a couple of Jiras, HDDS-2477, HDDS-2471 and HDDS-2470, which will help in 
analyzing/debugging this issue. (Let's see whether HDDS-2477 fixes it.)




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-12 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972874#comment-16972874
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:22 PM:
-

Hi [~timmylicheng]

Thanks for sharing the logs.

I see an abort multipart upload request for the key plc_1570863541668_9278 once 
the complete multipart upload failed.

 

 
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | 
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1  
 5626 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
   5627 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
     . .   5911 partName: 
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
  5912 ]} | ret=FAILURE | INVALID_PART 
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload 
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: 
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO  | OMAudit | user=root | ip=9.134.50.210 | 
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, 
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[]} 
{code}
 

And after that, allocateBlock still continues for the key, because the entry in 
openKeyTable is not removed by the abortMultipartUpload request. (Abort removes 
only the entry that was created during the initiateMPU request; that is why, 
after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during 
commitMultipartUploadKey, as we removed the entry from the MultipartInfo table.) 
(But the strange thing I have observed is that the clientID does not match any 
of the names in the part list, even though the last part of a partName is the 
clientID.)
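
To cross-check that observation, a small illustrative helper (the partName layout "/<volume>/<bucket>/<key><clientID>" is inferred from the audit lines above, not from OM source):
{code:java}
class PartNames {
  // Extract the trailing clientID from a partName, given the key name.
  static String clientIdOf(String partName, String keyName) {
    int idx = partName.lastIndexOf(keyName);
    if (idx < 0) {
      throw new IllegalArgumentException("key not in partName: " + partName);
    }
    return partName.substring(idx + keyName.length());
  }

  public static void main(String[] args) {
    String part = "/s325d55ad283aa400af464c76d713c07ad/ozone-test/"
        + "plc_1570863541668_9278103102209374356085";
    // Prints 103102209374356085, which can then be compared against the
    // clientID values in the ALLOCATE_BLOCK audit entries.
    System.out.println(clientIdOf(part, "plc_1570863541668_9278"));
  }
}
{code}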

 

And in the OM audit log I see partNumber 1 and then a list of part names; I am 
not sure whether some of the log is truncated here, as it should show 
partName/partNumber pairs.
 # Could you confirm which parts OM has for this key? You can get this from 
listParts (a sketch follows this list), but it has to be done before the abort 
request.
 # Check in the OM audit log what part list we received for this key; I am not 
sure whether it is truncated in the uploaded log. 
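
A minimal sketch for step 1 (it assumes OzoneBucket#listParts(key, uploadID, partNumberMarker, maxParts); the volume/bucket/key are copied from the logs here, and <uploadID> is a placeholder you have to fill in; run it before any abort):
{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.*;

public class ListPartsCheck {
  public static void main(String[] args) throws Exception {
    try (OzoneClient client =
        OzoneClientFactory.getRpcClient(new OzoneConfiguration())) {
      OzoneBucket bucket = client.getObjectStore()
          .getVolume("s325d55ad283aa400af464c76d713c07ad")
          .getBucket("ozone-test");
      // Lists the parts OM currently has in its MultipartInfoTable.
      OzoneMultipartUploadPartListParts parts = bucket.listParts(
          "plc_1570863541668_9278", "<uploadID>", 0, 1000);
      parts.getPartInfoList().forEach(p ->
          System.out.println(p.getPartNumber() + " -> " + p.getPartName()));
    }
  }
}
{code}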

 

On my cluster the audit logs look like the entries below, where for 
completeMultipartUpload I can see both partNumber and partName. (Whereas in the 
uploaded log I don't see this.)

 

 
{code:java}
2019-11-12 14:57:18,580 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, 
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,967 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, 
keyLocationInfo=[]} | ret=SUCCESS | 
2019-11-12 14:57:53,974 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, 
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, 
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 
2019-11-12 14:57:54,154 | INFO  | OMAudit | user=root | ip=10.65.53.160 | 
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, 
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, 
replicationFactor=ONE, keyLocationInfo=[blockID {
  containerBlockID {
    containerID: 6
    localID: 103127415126327331
  }
  blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
  leaderID: ""
  members {
    uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    ipAddress: "10.65.49.251"
    hostName: "bh-ozone-3.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    ipAddress: "10.65.51.23"
    hostName: "bh-ozone-4.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      value: 9859
    }
    networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
    networkLocation: "/default-rack"
  }
  members {
    uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
    ipAddress: "10.65.53.160"
    hostName: "bh-ozone-2.vpc.cloudera.com"
    ports {
      name: "RATIS"
      value: 9858
    }
    ports {
      name: "STANDALONE"
      

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-10 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971290#comment-16971290
 ] 

Li Cheng edited comment on HDDS-2356 at 11/11/19 7:21 AM:
--

[~bharat] Regarding the key in the last stack trace:

2019-11-08 20:08:24,832 ERROR 
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest:
 MultipartUpload Complete request failed for Key: plc_1570863541668_9278 in 
Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
 INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete 
Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: 
ozone-testkey: plc_1570863541668_9278

 

OM audit logs show a bunch of entries for key plc_1570863541668_9278 with 
different clientID, for instance:

2019-11-08 20:19:56,241 | INFO | OMAudit | user=root | ip=9.134.50.210 | 
op=ALLOCATE_BLOCK {volume=s325d55ad283aa400af464c76d713c07ad, 
bucket=ozone-test, key=plc_1570863541668_9278, dataSize=5242880, 
replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], 
clientID=103102209394803336} | ret=SUCCESS |

 

I'm uploading the entire OM audit logs about this key plc_1570863541668_9278.

The file size:

[root@VM_50_210_centos /data/idex_data/zip]# ls -altr -h ./20191012/plc_1570863541668_9278
-rw-r--r-- 1 1003 users 1.4G Oct 22 10:33 ./20191012/plc_1570863541668_9278

 

You can try creating a file of a similar size on your own and see whether it 
reproduces the issue. Please refer to the description for env details.
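
For example, a quick sketch to produce a comparable file (paths are placeholders; a sparse file only matches the ~1.4 GB size, not the original content):
{code:java}
import java.io.RandomAccessFile;

public class MakeBigFile {
  public static void main(String[] args) throws Exception {
    // Create a ~1.4 GB sparse file to push through the multipart path.
    try (RandomAccessFile f = new RandomAccessFile("/data/test/plc_repro", "rw")) {
      f.setLength(1400L * 1024 * 1024);
    }
    // Then copy it onto the goofys mount path so it is written through the
    // S3 gateway as a multipart upload.
  }
}
{code}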




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-10 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971289#comment-16971289
 ] 

Li Cheng edited comment on HDDS-2356 at 11/11/19 3:14 AM:
--

[~bharat] As I said, debug logs for goofys don't help, since goofys debug mode 
does everything in a single thread. Is the error not reproduced on your side? 
Are you modeling a sample dataset and environment to test the issue? I can try 
reproducing from my side and upload the OM log, the s3g log, as well as the 
audit log here. Does that work?

 

Also, do you see 2019-11-06_18_13_57_422_ERROR in the attachment? Does it help?




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-08 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970430#comment-16970430
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/8/19 5:29 PM:
---

Hi [~timmylicheng]

On every run we are seeing a new error and stack trace, and from the logs we 
have not got much information about the root cause.

I think to debug this we need to know why no multipart upload is found for the 
key, or why we sometimes see the invalid-part error. We can look at the audit 
logs to see what is passed in the multipart upload requests, and for the same 
key we can use listParts to find out which parts OM has in its 
MultipartInfoTable (this will help with the InvalidPart error).

I also think we should enable trace/debug logging to see the incoming requests 
and why we see these errors for multipart upload. (Not sure whether it is a bug 
in the cache logic or some handling we missed for MPU requests.)

To debug this we need the complete OM log, audit log, and S3 gateway log. 
Please also enable trace logging to see what requests are incoming; I think we 
log them in OzoneManagerProtocolServerSideTranslatorPB.
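
For the trace suggestion, a minimal sketch of the log4j change (assuming OM uses the stock log4j.properties; the exact file and appenders depend on the deployment):
{code}
# Hedged example only: enable request tracing for the OM protocol translator.
log4j.logger.org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB=TRACE
{code}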

 

Let us know if you have any suggestions.

 




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-07 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969847#comment-16969847
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/8/19 5:26 AM:
---

I think this error is not related to NO_SUCH_MULTIPART_UPLOAD_ERROR.

I have fixed MISMATCH_MULTIPART_LIST in HDDS-2395.



> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs 
> are in the attachment.
>  2019-11-05 18:12:37,766 ERROR 
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950
> at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-11-06 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968967#comment-16968967
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/7/19 6:38 AM:
---

Hi [~timmylicheng]

When uploading a part file, if no such multipart upload is found in the 
multipartInfoTable, we throw an error.

In your tests, are you aborting any upload while some client is still trying to 
upload a part?

See the code snippet below.

 
{code:java}
if (multipartKeyInfo == null) {
  // This can occur when user started uploading part by the time commit
  // of that part happens, in between the user might have requested
  // abort multipart upload. If we just throw exception, then the data
  // will not be garbage collected, so move this part to delete table
  // and throw error
  // Move this part to delete table.
  throw new OMException("No such Multipart upload is with specified " +
      "uploadId " + uploadID,
      OMException.ResultCodes.NO_SUCH_MULTIPART_UPLOAD_ERROR);
}{code}
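
To make that race concrete, here is a hedged sketch of the client-side sequence that can trigger this path (AWS SDK v1 style, reusing an `s3` client and `uploadId` set up as in the earlier sketch in this thread; the bucket/key/part values and `partFile` are placeholders):
{code:java}
// Thread A: still uploading a part of the ongoing MPU.
s3.uploadPart(new UploadPartRequest()
    .withBucketName("ozone-test").withKey("key1")
    .withUploadId(uploadId).withPartNumber(7)
    .withFile(partFile));

// Thread B: aborts the same upload concurrently.
s3.abortMultipartUpload(
    new AbortMultipartUploadRequest("ozone-test", "key1", uploadId));

// If the abort is applied before thread A's part commit reaches OM,
// multipartKeyInfo is null in the snippet above and OM returns
// NO_SUCH_MULTIPART_UPLOAD_ERROR for the part commit.
{code}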
 

I don't see any log attached to the Jira BTW.




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-31 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964554#comment-16964554
 ] 

Li Cheng edited comment on HDDS-2356 at 11/1/19 3:45 AM:
-

I also saw a core dump in RocksDB during last night's testing. Please check the 
attachment for the entire log.

 

At first glance, it looks like an STL memory error happens during memory 
movement while RocksDB is iterating the write_batch to insert it into the 
memtable. It might not be related to Ozone, but it would cause a RocksDB 
failure.
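
For reference, the Java-side entry point at the top of this stack is a plain batched write; a minimal sketch (paths/keys are placeholders) of the call that ends in the JNI write0 frame:
{code:java}
import org.rocksdb.*;

public class BatchWrite {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options opts = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(opts, "/tmp/om-db-sketch");
         WriteBatch batch = new WriteBatch();
         WriteOptions writeOpts = new WriteOptions()) {
      batch.put("key".getBytes(), "value".getBytes());
      // Native side: DBImpl::Write -> WriteBatchInternal::InsertInto
      // -> MemTableInserter::PutCF, the frames in the dump below.
      db.write(writeOpts, batch);
    }
  }
}
{code}
This mirrors what OzoneManagerDoubleBuffer.flushTransactions() does with the OM DB batch.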

 

Created https://issues.apache.org/jira/browse/HDDS-2396 to track the core dump 
in OM rocksdb.

Below is some part of the stack:

{code}
C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0
C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c
C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17
C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21
C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc
J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe]
J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc]
j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4
{code}


was (Author: timmylicheng):
Also see a core dump in rocksdb during last night's testing. Please check the 
attachment for the entire log.

 

From first glance, it looks like when rocksdb iterates the write_batch to 
insert into the memtable, an STL memory error occurs during a memory move. It 
might not be related to Ozone, but it would cause a rocksdb failure. 

Below is part of the stack trace:

C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0
C [librocksdbjni3192271038586903156.so+0x358fec] 
rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, 
rocksdb::Slice const&, rocksdb::ValueType)+0x51c
C [librocksdbjni3192271038586903156.so+0x359d17] 
rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, 
rocksdb::Slice const&)+0x17
C [librocksdbjni3192271038586903156.so+0x3513bc] 
rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
C [librocksdbjni3192271038586903156.so+0x354df9] 
rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, 
unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, 
unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
C [librocksdbjni3192271038586903156.so+0x29fd79] 
rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, 
rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, 
unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
C [librocksdbjni3192271038586903156.so+0x2a0431] 
rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21
C [librocksdbjni3192271038586903156.so+0x1a064c] 
Java_org_rocksdb_RocksDB_write0+0xcc
J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe 
[0x7f58f1872d00+0xbe]
J 10093% C1 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V 
(400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc]
j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-31 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964482#comment-16964482
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 11/1/19 12:41 AM:


Opened HDDS-2395 to handle CompleteMPU error cases.


was (Author: bharatviswa):
Opened HDDS-2359 to handle CompleteMPU error cases.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 
> s3c89e813c80ffcea9543004d57b2a1239bucket:
>  ozone-testkey: 20191012/plc_1570863541668_9278
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
>  .java:1104)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
>  at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
>  at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
>  at 
> org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
>  at 
> org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>  
> The following errors has been resolved in 
> https://issues.apache.org/jira/browse/HDDS-2322. 
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
>  java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-31 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964156#comment-16964156
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/31/19 3:55 PM:


{quote} What are you looking for in audit logs?
{quote}
For the key that is failing with the mismatched part list, the audit log will 
record which parts were received with the completeMultipartUpload request. For 
the same key, we can get the parts stored in the OM DB using aws s3api 
--endpoint <<[http://s3:port|http://s3:port/]>> list-parts --bucket <> --key 
<>. Comparing the two tells us the reason for the mismatched part list, and we 
can then see whether it matches any of the cases above that are not currently 
handled in Ozone S3.

 

Once you have the above information, can you share it? Thanks.
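For illustration, here is a minimal boto3 sketch of the same list-parts 
diagnostic; the endpoint URL and bucket name are placeholders, not values from 
this cluster:
{code:python}
# List the parts OM has recorded for each in-progress multipart upload, to
# compare against the part list the client sent in completeMultipartUpload.
# The endpoint and bucket below are placeholders.
import boto3

s3 = boto3.client("s3", endpoint_url="http://s3-gateway:9878")
bucket = "ozone-test"

# The upload id of an in-progress MPU comes from list_multipart_uploads.
for upload in s3.list_multipart_uploads(Bucket=bucket).get("Uploads", []):
    parts = s3.list_parts(Bucket=bucket, Key=upload["Key"],
                          UploadId=upload["UploadId"])
    for part in parts.get("Parts", []):
        print(upload["Key"], part["PartNumber"], part["ETag"], part["Size"])
{code}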


was (Author: bharatviswa):
{quote} What are you looking for in audit logs?
{quote}
For the key that is failing with the mismatched part list, the audit log will 
record which parts were received with the completeMultipartUpload request. For 
the same key, we can get the parts stored in the OM DB using aws s3api 
--endpoint <<[http://s3:port|http://s3:port/]>> list-parts --bucket <> --key 
<>. Comparing the two tells us the reason for the mismatched part list, and we 
can then see whether it matches any of the cases above that are not currently 
handled in Ozone S3.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 
> s3c89e813c80ffcea9543004d57b2a1239bucket:
>  ozone-testkey: 20191012/plc_1570863541668_9278
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
>  .java:1104)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
>  at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
>  at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
>  at 
> org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
>  at 
> org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>  at 
> 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-31 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964156#comment-16964156
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/31/19 3:53 PM:


{quote} What are you looking for in audit logs?
{quote}
For the key that is failing with the mismatched part list, the audit log will 
record which parts were received with the completeMultipartUpload request. For 
the same key, we can get the parts stored in the OM DB using aws s3api 
--endpoint <<[http://s3:port|http://s3:port/]>> list-parts --bucket <> --key 
<>. Comparing the two tells us the reason for the mismatched part list, and we 
can then see whether it matches any of the cases above that are not currently 
handled in Ozone S3.


was (Author: bharatviswa):
{quote} What are you looking for in audit logs?
{quote}
For which key we are failing with Mismatch part list, it will print the 
information in audit logs what are the parts which we have got with 
completeMultipartUpload request. For the same key we are the parts in OM DB 
using aws s3api --endpoint <> list-parts --bucket <> 
--key <> In this way we can know what is the reason for Mismatch part 
list. Once we get this information we can see if it matches with any of the 
above case which are not currently handled in OzoneS3.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete 
> Multipart Upload Request for bucket: ozone-test, key: 
> 20191012/plc_1570863541668_927
>  8
>  MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
> Complete Multipart Upload Failed: volume: 
> s3c89e813c80ffcea9543004d57b2a1239bucket:
>  ozone-testkey: 20191012/plc_1570863541668_9278
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
>  .java:1104)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
>  at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
>  at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
>  at 
> org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
>  at 
> org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
>  at 
> 

[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-29 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962583#comment-16962583
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/30/19 1:23 AM:


# When 2 parts are uploaded and the complete request specifies only 1 part: no 
error.
 # When the part name/part number in the complete multipart upload request does 
not match an uploaded part and part number: InvalidPart error.
 # When parts are not specified in sorted order: InvalidPartOrder.
 # When no parts have been uploaded but the complete request specifies some 
parts: also InvalidPart.
 # With parts 1, 2, 3 uploaded, completing with parts 1 and 3 succeeds (no 
error).
 # With only part 3 uploaded, completing with part 3 can be done.

We found a few scenarios where we behave differently; these need to be handled 
in Ozone S3 (see the sketch below).

 

Also, if you can enable audit logs and see which parts we receive during 
complete MPU and which parts are available for that key, we can see where the 
error occurs. We should handle the above scenarios in Ozone, but the question 
is why goofys uploads like this, or whether we have missed any other cases; 
the audit logs will help answer that.
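As an illustration of scenario 5, here is a minimal boto3 sketch that uploads 
parts 1, 2, 3 and completes with only parts 1 and 3; the endpoint URL, bucket, 
and key are placeholders, not values from this issue:
{code:python}
# Sketch of scenario 5: upload parts 1, 2, 3, then complete the MPU with only
# parts 1 and 3; no error is expected. Endpoint, bucket, and key are
# placeholders.
import boto3

s3 = boto3.client("s3", endpoint_url="http://s3-gateway:9878")
bucket, key = "ozone-test", "mpu-demo-key"

upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

parts = []
for part_number in (1, 2, 3):
    # Every part except the last must be at least 5 MiB.
    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                          PartNumber=part_number,
                          Body=b"x" * (5 * 1024 * 1024))
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

# Complete with a subset of the uploaded parts (1 and 3).
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": [parts[0], parts[2]]})
{code}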


was (Author: bharatviswa):
# When 2 parts are uploaded and the complete request specifies only 1 part: no 
error.
 # When the part name/part number in the complete multipart upload request does 
not match an uploaded part and part number: InvalidPart error.
 # When parts are not specified in sorted order: InvalidPartOrder.
 # When no parts have been uploaded but the complete request specifies some 
parts: also InvalidPart.
 # With parts 1, 2, 3 uploaded, completing with parts 1 and 3 succeeds (no 
error).
 # With only part 3 uploaded, completing with part 3 can be done.

We found a few scenarios where we behave differently; these need to be handled 
in Ozone S3.

 

Also, if you can enable audit logs and see which parts we receive during 
complete MPU and which parts are available for that key, we can see where the 
error occurs. We should handle the above scenarios in Ozone, but the question 
is why goofys uploads like this, or whether we have missed any other cases.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-29 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962583#comment-16962583
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/30/19 1:23 AM:


# When 2 parts are uploaded and the complete request specifies only 1 part: no 
error.
 # When the part name/part number in the complete multipart upload request does 
not match an uploaded part and part number: InvalidPart error.
 # When parts are not specified in sorted order: InvalidPartOrder.
 # When no parts have been uploaded but the complete request specifies some 
parts: also InvalidPart.
 # With parts 1, 2, 3 uploaded, completing with parts 1 and 3 succeeds (no 
error).
 # With only part 3 uploaded, completing with part 3 can be done.

We found a few scenarios where we behave differently; these need to be handled 
in Ozone S3.

 

Also, if you can enable audit logs and see which parts we receive during 
complete MPU and which parts are available for that key, we can see where the 
error occurs. We should handle the above scenarios in Ozone, but the question 
is why goofys uploads like this, or whether we have missed any other cases.


was (Author: bharatviswa):
# When 2 parts are uploaded and the complete request specifies only 1 part: no 
error.
 # When the part name/part number in the complete multipart upload request does 
not match an uploaded part and part number: InvalidPart error.
 # When parts are not specified in sorted order: InvalidPartOrder.
 # When no parts have been uploaded but the complete request specifies some 
parts: also InvalidPart.
 # With parts 1, 2, 3 uploaded, completing with parts 1 and 3 succeeds (no 
error).
 # With only part 3 uploaded, completing with part 3 can be done.

We found a few scenarios where we behave differently; these need to be handled 
in Ozone S3.

 

Also, if you can enable audit logs and see which parts we receive during 
complete MPU and which parts are available for that key, we can see where the 
error occurs. But we should handle the above scenarios in Ozone.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-29 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 6:53 AM:
--

[~bharat] I'm using Python to write to an OS path, something like:
{code:python}
# Copy every file under dir to the goofys mount point, buffering up to
# 2000 lines per write. The two directories are placeholders.
import os

dir = "plc-data/"          # placeholder: source directory (relative path)
dest_dir = "/mnt/ozone/"   # placeholder: goofys mount point

# The destination mirrors the source directory under the mount point.
os.makedirs(dest_dir + dir, exist_ok=True)

sub_files = os.listdir(dir)
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    num = 0
    output = ""
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if num >= 2000:
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            # Flush whatever is left in the buffer for this file.
            fw.write(output)
{code}
 

Also, I'm using the S3 gateway to connect to Ozone and mounting a local file 
path via fuse (goofys). Have you tested the S3 gateway? Most unit tests go 
through RPC.


was (Author: timmylicheng):
[~bharat] I'm using Python to write to an OS path, something like:
{code:python}
# Copy every file under dir to the goofys mount point, buffering up to
# 2000 lines per write. The two directories are placeholders.
import os

dir = "plc-data/"          # placeholder: source directory (relative path)
dest_dir = "/mnt/ozone/"   # placeholder: goofys mount point

# The destination mirrors the source directory under the mount point.
os.makedirs(dest_dir + dir, exist_ok=True)

sub_files = os.listdir(dir)
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    num = 0
    output = ""
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if num >= 2000:
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            # Flush whatever is left in the buffer for this file.
            fw.write(output)
{code}

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:29 AM:
--

[~bharat] I'm using Python to write to an OS path, something like:
{code:python}
# Copy every file under dir to the goofys mount point, buffering up to
# 2000 lines per write. The two directories are placeholders.
import os

dir = "plc-data/"          # placeholder: source directory (relative path)
dest_dir = "/mnt/ozone/"   # placeholder: goofys mount point

# The destination mirrors the source directory under the mount point.
os.makedirs(dest_dir + dir, exist_ok=True)

sub_files = os.listdir(dir)
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    num = 0
    output = ""
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if num >= 2000:
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            # Flush whatever is left in the buffer for this file.
            fw.write(output)
{code}


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
 for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)
{code}

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:25 AM:
--

[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
 for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)
{code}


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
{code}
for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:24 AM:
--

[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
{code}
for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:

for sub_file in sub_files:
 sub_file_path = dir + sub_file
 with open(dest_dir + sub_file_path, "w") as fw:
 with open(sub_file_path, "r") as fr:
 line = fr.readline()
 while line:
 num += 1
 output += line
 if (num >= 2000):
 fw.write(output)
 output = ""
 num = 0
 line = fr.readline()
 fw.write(output)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961568#comment-16961568
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/29/19 12:10 AM:
-

{quote}[~bharat] In term of reproduction, I have a dataset which includes small 
files as well as big files and I'm using s3 gateway from ozone and mount ozone 
cluster to a local path by goofys. All the data are recursively written to the 
mount path, which essentially leads to ozone cluster. The ozone cluster is 
deployed on a 3-node VMs env and each VM has only 1 disk for ozone data 
writing. I think it's a pretty simple scenario to reproduce. The solely 
operation is writing to ozone cluster thru fuse. 
{quote}
 

I have tried a test that runs parallel MPU for a key, and it still passes. 

 
{quote}All the data are recursively written to the mount path, which 
essentially leads to ozone cluster.
{quote}
 

Do you mean using cp to move the files to the mount path? 

 

If possible, could you give the steps/exact commands to repro this? That will 
help in debugging the issue. I have tried the mount on docker, but after 
cp-ing a few large files I get OutOfMemory from docker. 


was (Author: bharatviswa):
[~bharat] In term of reproduction, I have a dataset which includes small files 
as well as big files and I'm using s3 gateway from ozone and mount ozone 
cluster to a local path by goofys. All the data are recursively written to the 
mount path, which essentially leads to ozone cluster. The ozone cluster is 
deployed on a 3-node VMs env and each VM has only 1 disk for ozone data 
writing. I think it's a pretty simple scenario to reproduce. The solely 
operation is writing to ozone cluster thru fuse. 

 

I have tried with a test to run parallel MPU for a key, and it still passes. 

 

All the data are recursively written to the mount path, which essentially leads 
to ozone cluster.

 

Mean here using cp to move the files to mount path?. 

 

If possible, could you give some steps/commands to repro this, which will help 
in debug this issue. I have tried mount on docker, but after few large files 
cp, I get OutofMemory from docker. 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:




[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961223#comment-16961223
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/28/19 4:26 PM:


{quote}Also it's print the same pipeline id in s3g logs like crazy. Wonder if 
that's expected. [~bharat]

2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with 
ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []}
 ...skipping...
 eID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, 
PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a,
{quote}
This is a log from allocateNewBlock() in BlockOutputStreamEntryPool.java; 
HDDS-2286 changed the behavior to print this info to the logs. Since the 
excludeList holds a list of pipelineIds, re-adding the same entry appends it 
to the list again, so I think we should add a check and only add an entry if 
it is not already in the list. One thing, though: the line is printed with 
pipelineIds=[], but after that it prints the same pipelineID multiple times. 
Can you paste the complete log?
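
A minimal sketch of that de-duplication idea; the names here are hypothetical 
and do not correspond to the actual Ozone ExcludeList API:
{code:python}
# Hypothetical sketch: back the exclude list with a set so that re-adding the
# same pipeline does not grow the list (and the log line) on every retry.
class ExcludeList:
    def __init__(self):
        self.pipeline_ids = []
        self._seen = set()

    def add_pipeline(self, pipeline_id):
        # Append only if this pipeline is not already excluded.
        if pipeline_id not in self._seen:
            self._seen.add(pipeline_id)
            self.pipeline_ids.append(pipeline_id)
{code}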


was (Author: bharatviswa):
{quote}Also it's print the same pipeline id in s3g logs like crazy. Wonder if 
that's expected. [~bharat]

2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with 
ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []}
...skipping...
eID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, 
PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a,
{quote}
This is a log from allocateNewBlock() in BlockOutputStreamEntryPool.java; 
HDDS-2286 changed the behavior to print this info to the logs. Since the 
excludeList holds a list of pipelineIds, re-adding the same entry appends it 
to the list again, so I think we should add a check and only add an entry if 
it is not already in the list.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-27 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960747#comment-16960747
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/28/19 4:15 AM:


Thank you [~timmylicheng] for trying out the fix.

This occurs when the map of parts uploaded for an MPU key does not match the 
part list sent in the complete request.

 

Example: OM has parts 1, 2, 3 recorded for the uploaded multipart key, but the 
complete multipart upload request specifies different part numbers, or 
more/fewer entries than were uploaded; in that case this error is thrown.

 

It would also help if you could provide the steps used to reproduce this 
error.

 
{code:java}
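// Per the explanation above: partKeyInfoMap holds the parts recorded in OM
// for this MPU key, and multipartMap holds the parts named in the client's
// complete request; a size mismatch fails the whole request.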
if (partKeyInfoMap.size() != multipartMap.size()) {
 throw new OMException("Complete Multipart Upload Failed: volume: " +
 volumeName + "bucket: " + bucketName + "key: " + keyName,
 OMException.ResultCodes.MISMATCH_MULTIPART_LIST);
}{code}


was (Author: bharatviswa):
Thank You [~timmylicheng] for trying out the fix.

This occurs when uploaded parts map is not matching with the parts map for a 
MPU key.

Example: Uploaded MPU key has 1,2,3 as its parts in OM for uploaded Multipart 
key, where as in upload if different values or greater/less than size of the 
entries are specified during complete multipart upload, this error will be 
thrown.

 

It would also help if you could provide the steps used to reproduce this 
error. 

 
{code:java}
if (partKeyInfoMap.size() != multipartMap.size()) {
 throw new OMException("Complete Multipart Upload Failed: volume: " +
 volumeName + "bucket: " + bucketName + "key: " + keyName,
 OMException.ResultCodes.MISMATCH_MULTIPART_LIST);
}{code}

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path 
> on VM0, while reading data from VM0 local disk and write to mount path. The 
> dataset has various sizes of files from 0 byte to GB-level and it has a 
> number of ~50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related with Multipart upload. This error eventually causes the  writing to 
> terminate and OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-27 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960747#comment-16960747
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/28/19 4:14 AM:


Thank You [~timmylicheng] for trying out the fix.

This occurs when uploaded parts map is not matching with the parts map for a 
MPU key.

Example: Uploaded MPU key has 1,2,3 as its parts in OM for uploaded Multipart 
key, where as in upload if different values or greater/less than size of the 
entries are specified during complete multipart upload, this error will be 
thrown.

 

It would also help if you could provide the steps used to reproduce this 
error. 

 
{code:java}
if (partKeyInfoMap.size() != multipartMap.size()) {
 throw new OMException("Complete Multipart Upload Failed: volume: " +
 volumeName + "bucket: " + bucketName + "key: " + keyName,
 OMException.ResultCodes.MISMATCH_MULTIPART_LIST);
}{code}






[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959419#comment-16959419
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/25/19 4:25 AM:


The above is an issue in OM that can occur intermittently: another handler 
thread in OM updates the partInfo map while the double-buffer flush thread is 
committing those entries. (During commit we convert OmMultipartKeyInfo to 
proto, and that is when the ConcurrentModificationException above is seen.)
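
As an illustration of that race, here is a minimal standalone sketch (not 
Ozone code): one thread keeps inserting into a TreeMap, standing in for the 
handler updating the part map, while another iterates it with forEach, as 
OmMultipartKeyInfo.getProto does during the flush. TreeMap is not thread-safe, 
so the iterating thread fails fast; the outcome is timing-dependent, so it may 
take a few runs to trigger.
{code:java}
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class PartMapRaceSketch {
  public static void main(String[] args) throws InterruptedException {
    TreeMap<Integer, String> partKeyInfoMap = new TreeMap<>();
    for (int i = 1; i <= 1_000; i++) {
      partKeyInfoMap.put(i, "partInfo-" + i);
    }

    // Stands in for the OM handler thread committing new parts.
    Thread handler = new Thread(() -> {
      for (int i = 1_001; i <= 100_000; i++) {
        partKeyInfoMap.put(i, "partInfo-" + i);
      }
    });

    // Stands in for the double-buffer flush thread serializing to proto.
    Thread flusher = new Thread(() -> {
      try {
        partKeyInfoMap.forEach((part, info) -> { /* build proto entry */ });
      } catch (ConcurrentModificationException e) {
        System.out.println("flush thread hit: " + e);
      }
    });

    handler.start();
    flusher.start();
    handler.join();
    flusher.join();
  }
}
{code}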

 

The configs mentioned above are not related to OM; they apply on the SCM side. 

 
{quote}However, writing fails due to no more blocks allocated. I guess my 
cluster cannot keep up with the writing. 
{quote}
The SCM logs should show why no more blocks are being allocated; OM will also 
receive this exception.

 





[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959238#comment-16959238
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/24/19 9:44 PM:


This will be fixed as part of HDDS-2322. Thank you [~timmylicheng] for 
reporting this issue.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org