[jira] [Created] (HDFS-15897) SCM HA should be disabled in secure cluster
Bharat Viswanadham created HDFS-15897: - Summary: SCM HA should be disabled in secure cluster Key: HDFS-15897 URL: https://issues.apache.org/jira/browse/HDFS-15897 Project: Hadoop HDFS Issue Type: Task Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham SCM HA security work is still in progress. [~elek] brought up the point that, until the SCM HA branch is merged, we should add a safeguard check that fails bringing up the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
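The safeguard the ticket asks for amounts to a startup check. The sketch below is illustrative only: the two boolean flags stand in for whatever actual Ozone configuration keys (security and SCM-HA settings) the real implementation would read, and `ScmHaSafeguard` is a hypothetical name, not a class in the codebase.

```java
// Illustrative sketch of the requested safeguard; the flags are stand-ins
// for the real Ozone security and SCM HA configuration keys.
public class ScmHaSafeguard {

    // Refuse to start when both security and SCM HA are enabled,
    // since SCM HA security work is still in progress.
    static void checkScmHaAllowed(boolean securityEnabled, boolean scmHaEnabled) {
        if (securityEnabled && scmHaEnabled) {
            throw new IllegalStateException(
                "SCM HA is not yet supported in a secure cluster; "
                + "disable one of the two before bringing up the cluster.");
        }
    }

    public static void main(String[] args) {
        checkScmHaAllowed(true, false);   // secure, non-HA: allowed
        checkScmHaAllowed(false, true);   // insecure, HA: allowed
        try {
            checkScmHaAllowed(true, true); // secure + HA: startup must fail
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

The check would run early in SCM startup so the failure is immediate and explicit rather than a later security hole.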
[jira] [Updated] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-15165: -- Attachment: HDFS-15165.01.patch > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > Attachments: HDFS-15165.00.patch, HDFS-15165.01.patch, > example-test.patch > > > HDFS-12130 changed the behavior of the DU command: it merged permission checking and computation into a single step. > During this change, where it needs INodeAttributes it simply used > inode.getAttributes(). But when an attribute provider class is configured, the configured provider object should be called to get the INodeAttributes, and the returned INodeAttributes should be used during checkPermission. > After HDFS-12130, the code looks like this: > > {code:java} > byte[][] localComponents = {inode.getLocalNameBytes()}; > INodeAttributes[] iNodeAttr = {inode.getSnapshotINode(snapshotId)}; > enforcer.checkPermission( > fsOwner, supergroup, callerUgi, > iNodeAttr, // single inode attr in the array > new INode[]{inode}, // single inode in the array > localComponents, snapshotId, > null, -1, // this will skip checkTraverse() because > // not checking ancestor here > false, null, null, > access, // the target access to be checked against the inode > null, // passing null sub access avoids checking children > false); > {code} > > The second line misses the check that, when an attribute provider class is > configured, the provider should be used to obtain the INodeAttributes. Because of this, when an HDFS > path is managed by Sentry and the attribute provider class is configured > as SentryINodeAttributesProvider, the provider object is not consulted and its > AclFeature is not used even when ACLs are set. 
This caused an AccessControlException when the du > command is run against an HDFS path managed by Sentry. > > {code:java} > [root@gg-620-1 ~]# hdfs dfs -du /dev/edl/sc/consumer/lpfg/str/edf/abc/ > du: Permission denied: user=systest, access=READ_EXECUTE, > inode="/dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging":impala:hive:drwxrwx--x{code}
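As a minimal sketch of the fix the description calls for (not the actual HDFS code; the types below are simplified stand-ins for the real `INodeAttributes`/`INodeAttributeProvider` classes), the attribute lookup should route through the configured provider when one is present, and only fall back to the inode's own attributes otherwise:

```java
// Simplified stand-ins for the HDFS types, illustrating the selection logic
// the report says is missing in the DU permission check.
import java.util.Optional;

interface INodeAttributes { String aclSource(); }

class Inode implements INodeAttributes {
    public String aclSource() { return "hdfs"; } // the inode's own attributes
}

interface INodeAttributeProvider {
    INodeAttributes getAttributes(String[] pathComponents, INodeAttributes defaultAttrs);
}

public class AttrLookup {
    // Mirrors the described fix: prefer the configured provider
    // (e.g. a Sentry-backed one) over the inode's own attributes.
    static INodeAttributes resolve(Optional<INodeAttributeProvider> provider,
                                   String[] path, INodeAttributes inodeAttrs) {
        return provider.map(p -> p.getAttributes(path, inodeAttrs)).orElse(inodeAttrs);
    }

    public static void main(String[] args) {
        INodeAttributes inode = new Inode();
        // No provider configured: fall back to the inode's own attributes.
        System.out.println(resolve(Optional.empty(), new String[]{"a"}, inode).aclSource());
        // Provider configured: its attributes (with any provider-side ACLs) win.
        INodeAttributeProvider sentryLike = (path, d) -> () -> "provider";
        System.out.println(resolve(Optional.of(sentryLike), new String[]{"a"}, inode).aclSource());
    }
}
```

The bug described above is the equivalent of always taking the fallback branch, so provider-side ACLs (Sentry's AclFeature) never reach `checkPermission`.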
[jira] [Commented] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039643#comment-17039643 ] Bharat Viswanadham commented on HDFS-15165: --- Thank you [~sodonnell] for the review. {quote}I believe you need to pass the iNodeAttr into the AccessControlException as the first parameter rather than inode at the bottom of the method. Otherwise, if there is an access exception, the log message will contain the HDFS permissions for the inode, rather than the provider permissions, which would be confusing eg: {quote} Yes, I agree. Done. Used the test case provided by you, thanks for the test. > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Updated] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-15165: -- Status: Patch Available (was: Open) > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Comment Edited] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036441#comment-17036441 ] Bharat Viswanadham edited comment on HDFS-15165 at 2/13/20 7:12 PM: Tried the suggested fix on the cluster; the du command now works on an HDFS path managed by Sentry. {code:java} [root@gg-620-1 ~]# hdfs dfs -du /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging 805306368 2147483648 /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging/9f4f1d1b671d0714_1fbeb6b7 [root@gg-620-1 ~]# hdfs dfs -du -s /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging 1073741824 2684354560 /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging [root@gg-620-1 ~]#{code} was (Author: bharatviswa): With the suggested fix, the du command now works on an HDFS path managed by Sentry (same output as above). > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Commented] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036441#comment-17036441 ] Bharat Viswanadham commented on HDFS-15165: --- With the suggested fix, the du command now works on an HDFS path managed by Sentry. {code:java} [root@gg-620-1 ~]# hdfs dfs -du /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging 805306368 2147483648 /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging/9f4f1d1b671d0714_1fbeb6b7 [root@gg-620-1 ~]# hdfs dfs -du -s /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging 1073741824 2684354560 /dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging [root@gg-620-1 ~]#{code} > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Updated] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-15165: -- Attachment: HDFS-15165.00.patch > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Updated] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDFS-15165: -- Description: HDFS-12130 changed the behavior of the DU command: it merged permission checking and computation into a single step. During this change, where it needs INodeAttributes it simply used inode.getAttributes(). But when an attribute provider class is configured, the configured provider object should be called to get the INodeAttributes, and the returned INodeAttributes should be used during checkPermission. After HDFS-12130, the code looks like this: {code:java} byte[][] localComponents = {inode.getLocalNameBytes()}; INodeAttributes[] iNodeAttr = {inode.getSnapshotINode(snapshotId)}; enforcer.checkPermission( fsOwner, supergroup, callerUgi, iNodeAttr, // single inode attr in the array new INode[]{inode}, // single inode in the array localComponents, snapshotId, null, -1, // this will skip checkTraverse() because // not checking ancestor here false, null, null, access, // the target access to be checked against the inode null, // passing null sub access avoids checking children false); {code} The second line misses the check that, when an attribute provider class is configured, the provider should be used to obtain the INodeAttributes. Because of this, when an HDFS path is managed by Sentry and the attribute provider class is configured as SentryINodeAttributesProvider, the provider object is not consulted and its AclFeature is not used even when ACLs are set. This caused an AccessControlException when the du command is run against an HDFS path managed by Sentry. {code:java} [root@gg-620-1 ~]# hdfs dfs -du /dev/edl/sc/consumer/lpfg/str/edf/abc/ du: Permission denied: user=systest, access=READ_EXECUTE, inode="/dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging":impala:hive:drwxrwx--x{code} was: HDFS-12130 has changed the behavior of DU. 
During that change, it missed calling getAttributesProvider().getAttributes when a provider is configured. Because of this, when Sentry is configured for an HDFS path and the attribute provider class is set, the AclFeature from Sentry was missing, so running the DU command on a Sentry-managed HDFS path raises an AccessControlException. This Jira is to fix this issue. {code:java} dfs.namenode.inode.attributes.provider.class org.apache.sentry.hdfs.SentryINodeAttributesProvider {code} > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165
[jira] [Created] (HDFS-15165) In Du missed calling getAttributesProvider
Bharat Viswanadham created HDFS-15165: - Summary: In Du missed calling getAttributesProvider Key: HDFS-15165 URL: https://issues.apache.org/jira/browse/HDFS-15165 Project: Hadoop HDFS Issue Type: Bug Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham HDFS-12130 has changed the behavior of DU. During that change, it missed calling getAttributesProvider().getAttributes when a provider is configured. Because of this, when Sentry is configured for an HDFS path and the attribute provider class is set, the AclFeature from Sentry was missing, so running the DU command on a Sentry-managed HDFS path raises an AccessControlException. This Jira is to fix this issue. {code:java} dfs.namenode.inode.attributes.provider.class org.apache.sentry.hdfs.SentryINodeAttributesProvider {code}
[jira] [Resolved] (HDDS-2536) Add ozone.om.internal.service.id to OM HA configuration
[ https://issues.apache.org/jira/browse/HDDS-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2536. -- Fix Version/s: 0.5.0 Resolution: Fixed > Add ozone.om.internal.service.id to OM HA configuration > --- > > Key: HDDS-2536 > URL: https://issues.apache.org/jira/browse/HDDS-2536 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > > This Jira is to add ozone.om.internal.serviceid to let the OM know it belongs to > a particular service, > > since we now have ozone.om.service.ids, where we can define all service IDs > in a cluster (this can happen if the same config is shared across the cluster).
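For illustration, the resulting configuration might look like the fragment below. Only the property names come from the ticket (which itself spells the internal-id key both as ozone.om.internal.service.id and ozone.om.internal.serviceid, so the exact spelling should be checked against the merged change); the service-id values are hypothetical examples.

```xml
<!-- Hypothetical example values; only the property names come from the ticket. -->
<property>
  <name>ozone.om.service.ids</name>
  <!-- All OM service ids defined in the (possibly shared) cluster config. -->
  <value>omServiceId1,omServiceId2</value>
</property>
<property>
  <name>ozone.om.internal.service.id</name>
  <!-- The one service id this OM instance itself belongs to. -->
  <value>omServiceId1</value>
</property>
```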
[jira] [Comment Edited] (HDDS-2591) No tailMap needed for startIndex 0 in ContainerSet#listContainer
[ https://issues.apache.org/jira/browse/HDDS-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978910#comment-16978910 ] Bharat Viswanadham edited comment on HDDS-2591 at 11/21/19 2:10 AM: Hi [~adoroszlai] This API was added so that it can be used by the Scanner implementation. I think, for now, we can leave it and fix the issue reported. was (Author: bharatviswa): Hi [~adoroszlai] This API is used so that it can be used by Scanners. I think, for now, we can leave it, and fix the issue reported. > No tailMap needed for startIndex 0 in ContainerSet#listContainer > > > Key: HDDS-2591 > URL: https://issues.apache.org/jira/browse/HDDS-2591 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode > Reporter: Attila Doroszlai > Assignee: Attila Doroszlai > Priority: Minor > > {{ContainerSet#listContainer}} has this code: > {code:title=https://github.com/apache/hadoop-ozone/blob/3c334f6a7b344e0e5f52fec95071c369286cfdcb/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L198} > map = containerMap.tailMap(containerMap.firstKey(), true); > {code} > This is equivalent to: > {code} > map = containerMap; > {code} > since {{tailMap}} is a sub-map with all keys larger than or equal to > ({{inclusive=true}}) {{firstKey}}, which is the lowest key in the map. So it > is a sub-map with all keys, ie. the whole map.
[jira] [Commented] (HDDS-2591) No tailMap needed for startIndex 0 in ContainerSet#listContainer
[ https://issues.apache.org/jira/browse/HDDS-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978910#comment-16978910 ] Bharat Viswanadham commented on HDDS-2591: -- Hi [~adoroszlai] This API is used so that it can be used by Scanners. I think, for now, we can leave it, and fix the issue reported. > No tailMap needed for startIndex 0 in ContainerSet#listContainer > > > Key: HDDS-2591 > URL: https://issues.apache.org/jira/browse/HDDS-2591
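The equivalence claimed in the issue description is easy to check directly: a `tailMap` starting at `firstKey()` with `inclusive=true` contains every entry, so it equals the whole map. A small self-contained demonstration (the map and keys below are made up for illustration; `ContainerSet` uses a concurrent sorted map of container ids in a similar way):

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class TailMapDemo {
    public static void main(String[] args) {
        NavigableMap<Long, String> containerMap = new ConcurrentSkipListMap<>();
        containerMap.put(1L, "container-1");
        containerMap.put(2L, "container-2");
        containerMap.put(5L, "container-5");

        // tailMap from the lowest key, inclusive, keeps every entry...
        NavigableMap<Long, String> tail =
            containerMap.tailMap(containerMap.firstKey(), true);

        // ...so it is equal to the whole map, and the tailMap call is redundant.
        System.out.println(tail.equals(containerMap)); // prints "true"
    }
}
```

This is why the `startIndex == 0` case can simply use `containerMap` itself.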
[jira] [Updated] (HDDS-2594) S3 RangeReads failing with NumberFormatException
[ https://issues.apache.org/jira/browse/HDDS-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2594: - Status: Patch Available (was: Open) > S3 RangeReads failing with NumberFormatException > > > Key: HDDS-2594 > URL: https://issues.apache.org/jira/browse/HDDS-2594 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > > {code:java} > 2019-11-20 15:32:04,684 WARN org.eclipse.jetty.servlet.ServletHandler: > javax.servlet.ServletException: java.lang.NumberFormatException: For input > string: "3977248768" > at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:432) > at > org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1780) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1609) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:539) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > {code}
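The offending string 3977248768 is larger than Integer.MAX_VALUE (2147483647), which suggests the range offset was being parsed as an int. A plausible fix, sketched below as an assumption rather than the actual patch, is to parse range-header byte offsets as long:

```java
public class RangeParse {
    public static void main(String[] args) {
        String offset = "3977248768"; // the value from the stack trace; > Integer.MAX_VALUE

        // Parsing as int reproduces the reported NumberFormatException.
        boolean intParseFailed = false;
        try {
            Integer.parseInt(offset);
        } catch (NumberFormatException e) {
            intParseFailed = true;
        }
        System.out.println(intParseFailed);      // true

        // Parsing as long handles byte offsets beyond 2 GiB.
        long start = Long.parseLong(offset);
        System.out.println(start);               // 3977248768
    }
}
```

Since HTTP Range headers address byte positions in objects that can be far larger than 2 GiB, offsets in range parsing generally need 64-bit arithmetic throughout.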
[jira] [Created] (HDDS-2594) S3 RangeReads failing with NumberFormatException
Bharat Viswanadham created HDDS-2594: Summary: S3 RangeReads failing with NumberFormatException Key: HDDS-2594 URL: https://issues.apache.org/jira/browse/HDDS-2594 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham {code:java} 2019-11-20 15:32:04,684 WARN org.eclipse.jetty.servlet.ServletHandler: javax.servlet.ServletException: java.lang.NumberFormatException: For input string: "3977248768" [... full Jetty/Jersey stack trace elided; identical to the one quoted in the HDDS-2594 update above ...] at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Assigned] (HDDS-2594) S3 RangeReads failing with NumberFormatException
[ https://issues.apache.org/jira/browse/HDDS-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2594: Assignee: Bharat Viswanadham > S3 RangeReads failing with NumberFormatException > > > Key: HDDS-2594 > URL: https://issues.apache.org/jira/browse/HDDS-2594 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > > {code:java} > 2019-11-20 15:32:04,684 WARN org.eclipse.jetty.servlet.ServletHandler: > javax.servlet.ServletException: java.lang.NumberFormatException: For input string: "3977248768" > [... full stack trace elided; identical to the copy quoted above ...] > at java.lang.Thread.run(Thread.java:748) > {code}
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978812#comment-16978812 ] Bharat Viswanadham commented on HDDS-2356: -- With HDDS-2477 PR, I was able to verify that MPU for larger size files is working. https://github.com/apache/hadoop-ozone/pull/159#issuecomment-556527944 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2018-11-15-OM-logs.txt, 2019-11-06_18_13_57_422_ERROR, > hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png, > om-audit-VM_50_210_centos.log, om_audit_log_plc_1570863541668_9278.txt > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at j
[jira] [Resolved] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipelines for a key
[ https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2241. -- Fix Version/s: 0.5.0 Resolution: Fixed > Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the > pipelines for a key > > > Key: HDDS-2241 > URL: https://issues.apache.org/jira/browse/HDDS-2241 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, while looking up a key, the Ozone Manager gets the pipeline > information from SCM through an RPC for every block in the key. For large > files > 1GB, we may end up making a lot of RPC calls for this. This can be > optimized in a couple of ways > * We can implement a batch getContainerWithPipeline API in SCM using which we > can get the pipeline info locations for all the blocks for a file. To keep > the number of containers passed in to SCM in a single call, we can have a > fixed container batch size on the OM side. _Here, Number of calls = 1 (or k > depending on batch size)_ > * Instead, a simpler change would be to have a map (method local) of > ContainerID -> Pipeline that we get from SCM so that we don't need to make > repeated calls to SCM for the same containerID for a key. _Here, Number of > calls = Number of unique containerIDs_ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
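The second, simpler option described above (a method-local ContainerID -> Pipeline map) can be sketched as follows. The lookup function stands in for the SCM getContainerWithPipeline RPC; PipelineResolver and its method names are illustrative, not the real KeyManagerImpl code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

public class PipelineResolver {
  // Resolves a pipeline for every block of a key, but issues the
  // (simulated) SCM lookup only once per unique container ID.
  static <P> List<P> resolve(List<Long> containerIdPerBlock,
                             LongFunction<P> scmLookup) {
    Map<Long, P> cache = new LinkedHashMap<>();
    List<P> result = new ArrayList<>();
    for (long id : containerIdPerBlock) {
      // computeIfAbsent calls scmLookup only on a cache miss.
      result.add(cache.computeIfAbsent(id, scmLookup::apply));
    }
    return result;
  }
}
```

With this shape, the number of remote calls equals the number of unique container IDs, matching the estimate in the issue description.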
[jira] [Resolved] (HDDS-2247) Delete FileEncryptionInfo from KeyInfo when a Key is deleted
[ https://issues.apache.org/jira/browse/HDDS-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2247. -- Fix Version/s: 0.5.0 Resolution: Fixed > Delete FileEncryptionInfo from KeyInfo when a Key is deleted > > > Key: HDDS-2247 > URL: https://issues.apache.org/jira/browse/HDDS-2247 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Dinesh Chitlangia >Assignee: Dinesh Chitlangia >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As part of HDDS-2174 we are deleting GDPR Encryption Key on delete file > operation. > However, if KMS is enabled, we are skipping GDPR Encryption Key approach when > writing file in a GDPR enforced Bucket. > {code:java} > final FileEncryptionInfo feInfo = keyOutputStream.getFileEncryptionInfo(); > if (feInfo != null) { > KeyProvider.KeyVersion decrypted = getDEK(feInfo); > final CryptoOutputStream cryptoOut = > new CryptoOutputStream(keyOutputStream, > OzoneKMSUtil.getCryptoCodec(conf, feInfo), > decrypted.getMaterial(), feInfo.getIV()); > return new OzoneOutputStream(cryptoOut); > } else { > try{ > GDPRSymmetricKey gk; > Map openKeyMetadata = > openKey.getKeyInfo().getMetadata(); > if(Boolean.valueOf(openKeyMetadata.get(OzoneConsts.GDPR_FLAG))){ > gk = new GDPRSymmetricKey( > openKeyMetadata.get(OzoneConsts.GDPR_SECRET), > openKeyMetadata.get(OzoneConsts.GDPR_ALGORITHM) > ); > gk.getCipher().init(Cipher.ENCRYPT_MODE, gk.getSecretKey()); > return new OzoneOutputStream( > new CipherOutputStream(keyOutputStream, gk.getCipher())); > } > }catch (Exception ex){ > throw new IOException(ex); > } > {code} > In such scenario, when KMS is enabled & GDPR enforced on a bucket, if user > deletes a file, we should delete the {{FileEncryptionInfo}} from KeyInfo, > before moving it to deletedTable, else we cannot guarantee Right to Erasure. 
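The fix described above amounts to clearing the encryption metadata on the delete path before the key record lands in the deleted table. A minimal sketch, with KeyInfo as an illustrative stand-in for the real OM key-info type (this is not the actual OM code):

```java
public class GdprDelete {
  // Illustrative stand-in for the OM key record: only the fields this
  // sketch needs.
  static class KeyInfo {
    String keyName;
    Object fileEncryptionInfo; // non-null when KMS encryption is in use

    KeyInfo(String keyName, Object feInfo) {
      this.keyName = keyName;
      this.fileEncryptionInfo = feInfo;
    }
  }

  // Before moving the key to deletedTable, drop the FileEncryptionInfo for
  // GDPR-enforced buckets so the data can no longer be decrypted
  // (Right to Erasure).
  static KeyInfo prepareForDeletedTable(KeyInfo key, boolean gdprEnforced) {
    if (gdprEnforced) {
      key.fileEncryptionInfo = null;
    }
    return key;
  }
}
```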
[jira] [Created] (HDDS-2581) Make OM HA config to use Java Configs
Bharat Viswanadham created HDDS-2581: Summary: Make OM HA config to use Java Configs Key: HDDS-2581 URL: https://issues.apache.org/jira/browse/HDDS-2581 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham This Jira is created based on the comments from [~aengineer] during the HDDS-2536 review: can we please use the Java Configs instead of this old-style config to add a config? This Jira is to move all OM HA configs to the new style (Java-config-based approach).
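The "Java config" style referred to above replaces scattered string-key lookups with a typed, annotated configuration class. The sketch below uses a home-grown annotation purely to show the pattern; Ozone's real @Config/@ConfigGroup annotations and their attributes may differ:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.Properties;

public class JavaStyleConfig {
  // Hypothetical annotation; Ozone's real annotations have more attributes
  // (description, tags, default handling, etc.).
  @Retention(RetentionPolicy.RUNTIME)
  @interface ConfKey {
    String key();
    String defaultValue();
  }

  // Typed OM HA config object: keys and defaults live next to the fields
  // instead of being string constants looked up at each call site.
  public static class OmHaConfig {
    @ConfKey(key = "ozone.om.ratis.enable", defaultValue = "false")
    public boolean ratisEnabled;

    @ConfKey(key = "ozone.om.service.ids", defaultValue = "")
    public String serviceIds;
  }

  // Reflectively fills the annotated fields from a Properties object.
  public static OmHaConfig load(Properties props) throws Exception {
    OmHaConfig conf = new OmHaConfig();
    for (Field f : OmHaConfig.class.getDeclaredFields()) {
      ConfKey a = f.getAnnotation(ConfKey.class);
      if (a == null) {
        continue;
      }
      String raw = props.getProperty(a.key(), a.defaultValue());
      if (f.getType() == boolean.class) {
        f.setBoolean(conf, Boolean.parseBoolean(raw));
      } else {
        f.set(conf, raw);
      }
    }
    return conf;
  }
}
```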
[jira] [Updated] (HDDS-2581) Make OM HA config to use Java Configs
[ https://issues.apache.org/jira/browse/HDDS-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2581: - Labels: newbie (was: ) > Make OM HA config to use Java Configs > - > > Key: HDDS-2581 > URL: https://issues.apache.org/jira/browse/HDDS-2581 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Labels: newbie > > This Jira is created based on the comments from [~aengineer] during the HDDS-2536 review: can we please use the Java Configs instead of this old-style config to add a config? > > This Jira is to move all OM HA configs to the new style (Java-config-based approach).
[jira] [Updated] (HDDS-2486) Sonar: Avoid empty test methods
[ https://issues.apache.org/jira/browse/HDDS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2486: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Sonar: Avoid empty test methods > --- > > Key: HDDS-2486 > URL: https://issues.apache.org/jira/browse/HDDS-2486 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Dinesh Chitlangia >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{TestRDBTableStore#toIOException}} is empty. > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-5kKcVY8lQ4ZsQH&open=AW5md-5kKcVY8lQ4ZsQH > Also {{TestTypedRDBTableStore#toIOException}}: > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-5qKcVY8lQ4ZsQJ&open=AW5md-5qKcVY8lQ4ZsQJ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2536) Add ozone.om.internal.service.id to OM HA configuration
[ https://issues.apache.org/jira/browse/HDDS-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2536: - Parent: HDDS-505 Issue Type: Sub-task (was: Bug) > Add ozone.om.internal.service.id to OM HA configuration > --- > > Key: HDDS-2536 > URL: https://issues.apache.org/jira/browse/HDDS-2536 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This Jira is to add ozone.om.internal.serviceid to let the OM know it belongs to a particular service. > > As of now we have ozone.om.service.ids -> where we can define all the service ids in a cluster. (This can happen if the same config is shared across the cluster.)
[jira] [Created] (HDDS-2536) Add ozone.om.internal.service.id to OM HA configuration
Bharat Viswanadham created HDDS-2536: Summary: Add ozone.om.internal.service.id to OM HA configuration Key: HDDS-2536 URL: https://issues.apache.org/jira/browse/HDDS-2536 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham This Jira is to add ozone.om.internal.serviceid to let the OM know it belongs to a particular service. As of now we have ozone.om.service.ids -> where we can define all the service ids in a cluster. (This can happen if the same config is shared across the cluster.)
[jira] [Comment Edited] (HDDS-2535) TestOzoneManagerDoubleBufferWithOMResponse is flaky
[ https://issues.apache.org/jira/browse/HDDS-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976850#comment-16976850 ] Bharat Viswanadham edited comment on HDDS-2535 at 11/18/19 8:25 PM: Hi [~elek] {quote}Independent from the flakiness I think a test where the timeout is 8 minutes and starts 1000 threads to insert 500 buckets (500_000 buckets all together) it's more like an integration test and would be better to move the slowest part to the integration-test project. {quote} I think now it should run quickly with the fix, and also I think it will not take that much of time. On my local laptop, I see, it is always completed in 30sec. And on github run I see it is completed in 53 seconds. I just want to keep this test in UT, as this will detect any failure in the DoubleBuffer issue which is a critical component in OM. (Why I want in UT, because we are going to force sooner, UT should be always green) [INFO] Running org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse [1164|https://github.com/bharatviswa504/hadoop-ozone/runs/308637202#step:3:1164][INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.536 s - in org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse was (Author: bharatviswa): Hi [~elek] {quote}Independent from the flakiness I think a test where the timeout is 8 minutes and starts 1000 threads to insert 500 buckets (500_000 buckets all together) it's more like an integration test and would be better to move the slowest part to the integration-test project. {quote} I think now it should run quickly with the fix, and also I think it will not take that much of time. On my local laptop, I see, it is always completed in 30sec. 
> TestOzoneManagerDoubleBufferWithOMResponse is flaky > --- > > Key: HDDS-2535 > URL: https://issues.apache.org/jira/browse/HDDS-2535 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Marton Elek >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Flakiness can be reproduced locally. Usually it passes, but when I started to > run it 100 times parallel with high cpu load it failed with the 3rd attempt > (timed out) > {code:java} > --- > Test set: > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > --- > Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 503.297 s <<< > FAILURE! - in > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > testDoubleBuffer(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse) > Time elapsed: 500.122 s <<< ERROR! > java.lang.Exception: test timed out after 500000 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:382) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:385) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {code} > Independent from the flakiness I think a test where the timeout is 8 minutes > and starts 1000 threads to insert 500 buckets (500_000 buckets all together) > it's more like an integration test and would be better to move the slowest > part to the integration-test project.
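The GenericTestUtils.waitFor frame in the trace above is a poll-until-true helper; when the condition never flips, the surrounding JUnit timeout fires, which is exactly what the flaky run shows. A minimal sketch of that polling pattern (a stand-in, not Hadoop's actual implementation):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitFor {
  // Polls check every intervalMs until it returns true, or throws after
  // timeoutMs. Mirrors the shape of GenericTestUtils.waitFor; the details
  // are a sketch, not Hadoop's actual code.
  public static void waitFor(BooleanSupplier check, long intervalMs,
                             long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException(
            "condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs); // the Thread.sleep frame seen in the trace
    }
  }
}
```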
[jira] [Comment Edited] (HDDS-2535) TestOzoneManagerDoubleBufferWithOMResponse is flaky
[ https://issues.apache.org/jira/browse/HDDS-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976850#comment-16976850 ] Bharat Viswanadham edited comment on HDDS-2535 at 11/18/19 8:25 PM: Hi [~elek] {quote}Independent from the flakiness I think a test where the timeout is 8 minutes and starts 1000 threads to insert 500 buckets (500_000 buckets all together) it's more like an integration test and would be better to move the slowest part to the integration-test project. {quote} I think now it should run quickly with the fix, and also I think it will not take that much of time. On my local laptop, I see, it is always completed in 30sec. And on github run I see it is completed in 53 seconds. I just want to keep this test in UT, as this will detect any failure in the DoubleBuffer issue which is a critical component in OM. (Why I want in UT, because we are going to force sooner, UT should be always green) {code:java} 1164[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.536 s - in org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse{code} was (Author: bharatviswa): Hi [~elek] {quote}Independent from the flakiness I think a test where the timeout is 8 minutes and starts 1000 threads to insert 500 buckets (500_000 buckets all together) it's more like an integration test and would be better to move the slowest part to the integration-test project. {quote} I think now it should run quickly with the fix, and also I think it will not take that much of time. On my local laptop, I see, it is always completed in 30sec. And on github run I see it is completed in 53 seconds. I just want to keep this test in UT, as this will detect any failure in the DoubleBuffer issue which is a critical component in OM. 
(Why I want in UT, because we are going to force sooner, UT should be always green) [INFO] Running org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse [1164|https://github.com/bharatviswa504/hadoop-ozone/runs/308637202#step:3:1164][INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.536 s - in org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > TestOzoneManagerDoubleBufferWithOMResponse is flaky > --- > > Key: HDDS-2535 > URL: https://issues.apache.org/jira/browse/HDDS-2535 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Marton Elek >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Flakiness can be reproduced locally. Usually it passes, but when I started to > run it 100 times parallel with high cpu load it failed with the 3rd attempt > (timed out) > {code:java} > --- > Test set: > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > --- > Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 503.297 s <<< > FAILURE! - in > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > testDoubleBuffer(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse) > Time elapsed: 500.122 s <<< ERROR! 
> java.lang.Exception: test timed out after 500000 milliseconds > [... stack trace elided; identical to the copy quoted earlier in this digest ...]
[jira] [Commented] (HDDS-2535) TestOzoneManagerDoubleBufferWithOMResponse is flaky
[ https://issues.apache.org/jira/browse/HDDS-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976850#comment-16976850 ] Bharat Viswanadham commented on HDDS-2535: -- Hi [~elek] {quote}Independent from the flakiness I think a test where the timeout is 8 minutes and starts 1000 threads to insert 500 buckets (500_000 buckets all together) it's more like an integration test and would be better to move the slowest part to the integration-test project. {quote} I think now it should run quickly with the fix, and also I think it will not take that much of time. On my local laptop, I see, it is always completed in 30sec. > TestOzoneManagerDoubleBufferWithOMResponse is flaky > --- > > [... quoted issue description and stack trace elided; identical to the copy earlier in this digest ...]
[jira] [Updated] (HDDS-2535) TestOzoneManagerDoubleBufferWithOMResponse is flaky
[ https://issues.apache.org/jira/browse/HDDS-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2535: - Status: Patch Available (was: Open) > TestOzoneManagerDoubleBufferWithOMResponse is flaky
[jira] [Assigned] (HDDS-2535) TestOzoneManagerDoubleBufferWithOMResponse is flaky
[ https://issues.apache.org/jira/browse/HDDS-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2535: Assignee: Bharat Viswanadham > TestOzoneManagerDoubleBufferWithOMResponse is flaky
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976256#comment-16976256 ] Bharat Viswanadham commented on HDDS-2356: -- [~timmylicheng] For testing with the PR, did you use the branch to set up a new cluster, or did you replace the jars? Could you provide some information on this? > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2018-11-15-OM-logs.txt, 2019-11-06_18_13_57_422_ERROR, > hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png, > om-audit-VM_50_210_centos.log, om_audit_log_plc_1570863541668_9278.txt > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at ja
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976253#comment-16976253 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/18/19 3:01 AM: Before my PR [https://github.com/apache/hadoop-ozone/pull/163] I have too seen this error. I think in goofys there is a logic if complete Multipart upload failed, it aborts and uploads. (Upload after abort, fails with No_SUCH_MULTIPART_ERROR, this is expected from Ozone/S3 perspective) With the above PR, I was able to upload 1GB,2GB, ... ,6GB files. Please have a look in to PR #163 comment. And for testing with PR, have you used the branch and set up a new cluster or replaced jars. Could you provide some information on this. So, we need to look for is there any failure for COMPLETE_MULTIPART_UPLOAD_ERROR for the key. The reason for this cause is explained in HDDS-2477. Can you also upload om-audit log, if there is an occurrence of COMPLETE_MULTIPART_UPLOAD_ERROR still. was (Author: bharatviswa): Before my PR [https://github.com/apache/hadoop-ozone/pull/163] I have too seen this error. I think in goofys there is a logic if complete Multipart upload failed, it aborts and uploads. (Upload after abort, fails with No_SUCH_MULTIPART_ERROR, this is expected from Ozone/S3 perspective) With the above PR, I was able to upload 1GB,2GB, ... ,6GB files. Please have a look in to PR #163 comment. And for testing with PR, have you used the branch and set up a new cluster or replaced jars. Could you provide some information on this. So, we need to look for is there any failure for COMPLETE_MULTIPART_UPLOAD_ERROR for the key. The reason for this cause is explained in HDDS-2477. Can you also upload om-audit log, if there is an occurrence of COMPLETE_MULTIPART_UPLOAD_ERROR still. 
> Multipart upload report errors while writing to ozone Ratis pipeline
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976253#comment-16976253 ] Bharat Viswanadham commented on HDDS-2356: -- Before my PR [https://github.com/apache/hadoop-ozone/pull/163] I had also seen this error. I think goofys has logic to abort and re-upload when a complete multipart upload fails. (An upload after an abort fails with NO_SUCH_MULTIPART_UPLOAD_ERROR; this is expected from the Ozone/S3 perspective.) With the above PR, I was able to upload 1GB, 2GB, ..., 6GB files. Please have a look at the PR #163 comment. For testing with the PR, did you use the branch to set up a new cluster, or did you replace the jars? Could you provide some information on this? So, we need to look for any failure with COMPLETE_MULTIPART_UPLOAD_ERROR for the key. The reason for this is explained in HDDS-2477. Can you also upload the om-audit log if there is still an occurrence of COMPLETE_MULTIPART_UPLOAD_ERROR? > Multipart upload report errors while writing to ozone Ratis pipeline
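The abort-then-retry behaviour described in the comments explains the NO_SUCH_MULTIPART_UPLOAD_ERROR: once an uploadId has been aborted (or completed), a later part commit can no longer find it. A toy in-memory model of that lifecycle, purely illustrative (the real state lives in the Ozone Manager's multipart-info table, which this does not reproduce):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;

public class MultipartSketch {
  // uploadId -> committed part numbers. Absence of the key is exactly the
  // "No such Multipart upload is with specified uploadId" condition.
  private final Map<String, List<Integer>> uploads = new HashMap<>();

  // Initiate a multipart upload and hand back its uploadId.
  public String initiate() {
    String uploadId = UUID.randomUUID().toString();
    uploads.put(uploadId, new ArrayList<>());
    return uploadId;
  }

  // Commit a part against an existing uploadId; fails if the upload was
  // never initiated or has since been aborted/completed.
  public void commitPart(String uploadId, int partNumber) {
    List<Integer> parts = uploads.get(uploadId);
    if (parts == null) {
      throw new NoSuchElementException(
          "No such Multipart upload with uploadId " + uploadId);
    }
    parts.add(partNumber);
  }

  // Abort discards the uploadId, so any in-flight part commit racing with
  // the abort will fail as above.
  public void abort(String uploadId) {
    uploads.remove(uploadId);
  }
}
```

In this model a client that aborts and then blindly retries a part commit with the old uploadId reproduces the error, which matches the "expected from Ozone/S3 perspective" reading in the comment.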
[jira] [Resolved] (HDDS-2461) Logging by ChunkUtils is misleading
[ https://issues.apache.org/jira/browse/HDDS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2461. -- Fix Version/s: 0.5.0 Resolution: Fixed > Logging by ChunkUtils is misleading > --- > > Key: HDDS-2461 > URL: https://issues.apache.org/jira/browse/HDDS-2461 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During a k8s based test I found a lot of log messages like: > {code:java} > 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk > request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} > {code} > I was very surprised, as at ChunkManagerImpl:209 there were no similar lines. > It turned out that the message is logged by ChunkUtils, but it used the logger of > ChunkManagerImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
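The fix is to give ChunkUtils its own per-class logger instead of borrowing ChunkManagerImpl's, so each message names the class that actually emitted it. The real code uses SLF4J; to stay self-contained this sketch shows the same per-class pattern with java.util.logging, and the nested classes are stand-ins for the real Ozone datanode classes:

```java
import java.util.logging.Logger;

public class LoggerSketch {
  // Stand-in for the manager class whose logger was being borrowed.
  static final class ChunkManagerImpl {
    static final Logger LOG =
        Logger.getLogger(ChunkManagerImpl.class.getName());
  }

  // Stand-in for the utility class. Before the fix its warnings went
  // through ChunkManagerImpl.LOG, so the "ChunkManagerImpl:209" prefix in
  // the log pointed at a class that contained no such statement.
  static final class ChunkUtils {
    // The fix: a logger keyed by ChunkUtils' own name.
    static final Logger LOG = Logger.getLogger(ChunkUtils.class.getName());

    static void warnDuplicateWrite(String chunkInfo) {
      LOG.warning("Duplicate write chunk request. Chunk overwrite without"
          + " explicit request. " + chunkInfo);
    }
  }
}
```

With a per-class logger the emitting class is recoverable directly from the log line, which is the whole point of the ticket.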
[jira] [Updated] (HDDS-2513) Remove this unused "COMPONENT" private field.
[ https://issues.apache.org/jira/browse/HDDS-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2513: - Issue Type: Bug (was: Improvement) > Remove this unused "COMPONENT" private field. > - > > Key: HDDS-2513 > URL: https://issues.apache.org/jira/browse/HDDS-2513 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Abhishek Purohit >Assignee: Abhishek Purohit >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Remove this unused "COMPONENT" private field in class > XceiverClientGrpc > [https://sonarcloud.io/project/issues?id=hadoop-ozone&open=AW5md_AGKcVY8lQ4ZsWG&resolved=false] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2513) Remove this unused "COMPONENT" private field.
[ https://issues.apache.org/jira/browse/HDDS-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2513. -- Fix Version/s: 0.5.0 Resolution: Fixed > Remove this unused "COMPONENT" private field. > - > > Key: HDDS-2513 > URL: https://issues.apache.org/jira/browse/HDDS-2513 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Abhishek Purohit >Assignee: Abhishek Purohit >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Remove this unused "COMPONENT" private field in class > XceiverClientGrpc > [https://sonarcloud.io/project/issues?id=hadoop-ozone&open=AW5md_AGKcVY8lQ4ZsWG&resolved=false] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2405) int2ByteString unnecessary byte array allocation
[ https://issues.apache.org/jira/browse/HDDS-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2405: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > int2ByteString unnecessary byte array allocation > > > Key: HDDS-2405 > URL: https://issues.apache.org/jira/browse/HDDS-2405 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{int2ByteString}} implementations (currently duplicated in > [RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289] > and > [Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73], > but the first one is being removed in HDDS-2375) result in unnecessary byte > array allocations: > # {{ByteString.Output}} creates 128-byte buffer by default, which is too > large for writing a single int > # {{DataOutputStream}} allocates an [extra 8-byte > array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204], > used only for writing longs > # {{ByteString.Output}} also creates 10-element array for {{flushedBuffers}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
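The wasteful path described in HDDS-2405 wraps a ByteString.Output (128-byte default buffer) in a DataOutputStream (extra 8-byte scratch array) just to emit the four bytes of one int. A direct encoding avoids both allocations; this sketch uses a plain byte array in place of protobuf's ByteString to stay self-contained:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class Int2BytesSketch {
  // Direct big-endian encoding of one int into exactly four bytes, with no
  // intermediate stream and no oversized buffer.
  public static byte[] int2Bytes(int n) {
    return new byte[] {
        (byte) (n >>> 24), (byte) (n >>> 16), (byte) (n >>> 8), (byte) n };
  }

  public static void main(String[] args) {
    // Sanity check: agrees with ByteBuffer's big-endian int encoding.
    int n = 0x0A0B0C0D;
    if (!Arrays.equals(int2Bytes(n),
        ByteBuffer.allocate(4).putInt(n).array())) {
      throw new AssertionError("encodings differ");
    }
  }
}
```

In the real fix the four bytes would be wrapped into a ByteString once, rather than streamed through a DataOutputStream.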
[jira] [Resolved] (HDDS-2502) Close ScmClient in RatisInsight
[ https://issues.apache.org/jira/browse/HDDS-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2502. -- Fix Version/s: 0.5.0 Resolution: Fixed > Close ScmClient in RatisInsight > --- > > Key: HDDS-2502 > URL: https://issues.apache.org/jira/browse/HDDS-2502 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{ScmClient}} in {{RatisInsight}} should be closed after use. > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-mYKcVY8lQ4Zr_s&open=AW5md-mYKcVY8lQ4Zr_s > Also two other minor issues reported in the same file: > https://sonarcloud.io/project/issues?fileUuids=AW5md-HeKcVY8lQ4ZrXL&id=hadoop-ozone&resolved=false -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2507) Remove the hard-coded exclusion of TestMiniChaosOzoneCluster
[ https://issues.apache.org/jira/browse/HDDS-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2507. -- Fix Version/s: 0.5.0 Resolution: Fixed > Remove the hard-coded exclusion of TestMiniChaosOzoneCluster > > > Key: HDDS-2507 > URL: https://issues.apache.org/jira/browse/HDDS-2507 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We excluded the execution of TestMiniChaosOzoneCluster from the > hadoop-ozone/dev-support/checks/integration.sh because it was not stable > enough. > Unfortunately this exclusion makes it impossible to use custom exclusion > lists (-Dsurefire.excludesFile=) as excludesFile can't be used if > -Dtest=!... is already used. > I propose to remove this exclusion to make it possible to use different > exclusion for different runs (pr check, daily, etc.) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2511) Fix Sonar issues in OzoneManagerServiceProviderImpl
[ https://issues.apache.org/jira/browse/HDDS-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2511. -- Resolution: Fixed > Fix Sonar issues in OzoneManagerServiceProviderImpl > --- > > Key: HDDS-2511 > URL: https://issues.apache.org/jira/browse/HDDS-2511 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Recon >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Link to the list of issues : > https://sonarcloud.io/project/issues?fileUuids=AW5md-HdKcVY8lQ4ZrUn&id=hadoop-ozone&resolved=false -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2511) Fix Sonar issues in OzoneManagerServiceProviderImpl
[ https://issues.apache.org/jira/browse/HDDS-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2511: - Priority: Minor (was: Major) > Fix Sonar issues in OzoneManagerServiceProviderImpl > --- > > Key: HDDS-2511 > URL: https://issues.apache.org/jira/browse/HDDS-2511 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Recon >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Link to the list of issues : > https://sonarcloud.io/project/issues?fileUuids=AW5md-HdKcVY8lQ4ZrUn&id=hadoop-ozone&resolved=false -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2515) No need to call "toString()" method as formatting and string conversion is done by the Formatter
[ https://issues.apache.org/jira/browse/HDDS-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2515. -- Fix Version/s: 0.5.0 Resolution: Fixed > No need to call "toString()" method as formatting and string conversion is > done by the Formatter > > > Key: HDDS-2515 > URL: https://issues.apache.org/jira/browse/HDDS-2515 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Abhishek Purohit >Assignee: Abhishek Purohit >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://sonarcloud.io/project/issues?id=hadoop-ozone&open=AW5md_AGKcVY8lQ4ZsV4&resolved=false] > Class: XceiverClientGrpc > {code:java} > if (LOG.isDebugEnabled()) { LOG.debug("Nodes in pipeline : {}", > pipeline.getNodes().toString()); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2515) No need to call "toString()" method as formatting and string conversion is done by the Formatter
[ https://issues.apache.org/jira/browse/HDDS-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2515: - Issue Type: Bug (was: Improvement) > No need to call "toString()" method as formatting and string conversion is > done by the Formatter > > > Key: HDDS-2515 > URL: https://issues.apache.org/jira/browse/HDDS-2515 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Abhishek Purohit >Assignee: Abhishek Purohit >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://sonarcloud.io/project/issues?id=hadoop-ozone&open=AW5md_AGKcVY8lQ4ZsV4&resolved=false] > Class: XceiverClientGrpc > {code:java} > if (LOG.isDebugEnabled()) { LOG.debug("Nodes in pipeline : {}", > pipeline.getNodes().toString()); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2515) No need to call "toString()" method as formatting and string conversion is done by the Formatter
[ https://issues.apache.org/jira/browse/HDDS-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2515: - Priority: Minor (was: Major) > No need to call "toString()" method as formatting and string conversion is > done by the Formatter > > > Key: HDDS-2515 > URL: https://issues.apache.org/jira/browse/HDDS-2515 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Abhishek Purohit >Assignee: Abhishek Purohit >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://sonarcloud.io/project/issues?id=hadoop-ozone&open=AW5md_AGKcVY8lQ4ZsV4&resolved=false] > Class: XceiverClientGrpc > {code:java} > if (LOG.isDebugEnabled()) { LOG.debug("Nodes in pipeline : {}", > pipeline.getNodes().toString()); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
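For context on the Sonar finding above: SLF4J-style formatters already call String.valueOf() on each argument when substituting {}, so the explicit toString() in XceiverClientGrpc adds nothing. A minimal sketch, using a stand-in for the {} substitution (illustrative only, not the SLF4J API):

```java
import java.util.Arrays;
import java.util.List;

public class FormatterDemo {
    // Stand-in for SLF4J's "{}" substitution: the formatter itself invokes
    // String.valueOf (and hence toString()) on the argument.
    static String format(String template, Object arg) {
        return template.replace("{}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
        // Redundant: the explicit toString() changes nothing.
        String verbose = format("Nodes in pipeline : {}", nodes.toString());
        // Preferred: let the formatter do the conversion.
        String concise = format("Nodes in pipeline : {}", nodes);
        assert verbose.equals(concise);
        System.out.println(concise);
    }
}
```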
[jira] [Resolved] (HDDS-2472) Use try-with-resources while creating FlushOptions in RDBStore.
[ https://issues.apache.org/jira/browse/HDDS-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2472. -- Resolution: Fixed > Use try-with-resources while creating FlushOptions in RDBStore. > --- > > Key: HDDS-2472 > URL: https://issues.apache.org/jira/browse/HDDS-2472 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Link to the sonar issue flag - > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-zwKcVY8lQ4ZsJ4&open=AW5md-zwKcVY8lQ4ZsJ4. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
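RocksDB's FlushOptions wraps a native handle that must be closed to free off-heap memory, which is why the Sonar fix for RDBStore is try-with-resources. Since RocksDB is not assumed on the classpath here, the sketch below uses a stand-in AutoCloseable to show the pattern:

```java
public class FlushDemo {
    // Stand-in for org.rocksdb.FlushOptions (illustrative only): a
    // native-backed resource that must be closed.
    static class FlushOptions implements AutoCloseable {
        boolean closed = false;
        FlushOptions setWaitForFlush(boolean wait) { return this; }
        @Override public void close() { closed = true; }
    }

    static boolean flushWithTryWithResources() {
        FlushOptions observed;
        // try-with-resources guarantees close() even if the flush throws.
        try (FlushOptions opts = new FlushOptions().setWaitForFlush(true)) {
            observed = opts;
            // db.flush(opts) would go here in RDBStore.
        }
        return observed.closed;
    }

    public static void main(String[] args) {
        assert flushWithTryWithResources();
    }
}
```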
[jira] [Updated] (HDDS-2487) Ensure streams are closed
[ https://issues.apache.org/jira/browse/HDDS-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2487: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Ensure streams are closed > - > > Key: HDDS-2487 > URL: https://issues.apache.org/jira/browse/HDDS-2487 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > * ContainerDataYaml: > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-6IKcVY8lQ4ZsQU&open=AW5md-6IKcVY8lQ4ZsQU > * OmUtils: > https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-hdKcVY8lQ4Zr76&open=AW5md-hdKcVY8lQ4Zr76 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
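The pattern for both flagged spots is the same: open the stream in the try-with-resources header so close() runs on every path, including the exceptional one. A hedged sketch (not the actual ContainerDataYaml/OmUtils code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StreamDemo {
    // An InputStream opened outside try/finally leaks a descriptor if an
    // exception is thrown mid-read; try-with-resources closes it always.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            // ByteArrayInputStream never actually throws, but the API declares it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        assert countBytes(new byte[]{1, 2, 3}) == 3;
    }
}
```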
[jira] [Resolved] (HDDS-2494) Sonar - BigDecimal(double) should not be used
[ https://issues.apache.org/jira/browse/HDDS-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2494. -- Fix Version/s: 0.5.0 Resolution: Fixed > Sonar - BigDecimal(double) should not be used > - > > Key: HDDS-2494 > URL: https://issues.apache.org/jira/browse/HDDS-2494 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Matthew Sharp >Assignee: Matthew Sharp >Priority: Minor > Labels: pull-request-available, sonar > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Sonar Issue: > [https://sonarcloud.io/project/issues?id=hadoop-ozone&issues=AW5md-0AKcVY8lQ4ZsKR&open=AW5md-0AKcVY8lQ4ZsKR] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
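For context on this Sonar rule: new BigDecimal(double) captures the exact binary approximation of the double, while BigDecimal.valueOf(double) goes through Double.toString and yields the short decimal a reader expects. A small demonstration:

```java
import java.math.BigDecimal;

public class BigDecimalDemo {
    public static void main(String[] args) {
        // Captures the binary approximation of 0.1:
        BigDecimal fromDouble = new BigDecimal(0.1);
        // Goes through Double.toString and yields exactly "0.1":
        BigDecimal fromValueOf = BigDecimal.valueOf(0.1);

        System.out.println(fromDouble);   // 0.1000000000000000055511151231...
        System.out.println(fromValueOf);  // 0.1
        assert !fromDouble.equals(fromValueOf);
        assert fromValueOf.toString().equals("0.1");
    }
}
```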
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973836#comment-16973836 ] Bharat Viswanadham commented on HDDS-2356: -- Hi [~timmylicheng] Thanks for sharing the logs. I see completeMultipartUpload is called with 286 parts, and OM is throwing an error InvalidPart, but from an audit log, I was not able to know which part is missing in OM. (And I see 286 success commit Multipart upload for the key). I think there might be a chance of the scenario HDDS-2477 we are hitting here. (Not completely sure, this is my analysis after looking up logs) I have opened couple of Jira's HDDS-2477 HDDS-2471 and HDDS-2470 which will help in analyzing/debugging this issue. (Let's see HDDS-2477 will fix it or not) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png, om-audit-VM_50_210_centos.log, > om_audit_log_plc_1570863541668_9278.txt > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. 
This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.a
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973836#comment-16973836 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/14/19 12:41 AM: - Hi [~timmylicheng] Thanks for sharing the logs. I see completeMultipartUpload is called with 286 parts, and OM is throwing an error InvalidPart, but from an audit log, I was not able to know which part is missing in OM(because we don't print any such info in log/exception message). (And I see 286 success commit Multipart upload for the key). I think there might be a chance of the scenario HDDS-2477 we are hitting here. (Not completely sure, this is my analysis after looking up logs) I have opened couple of Jira's HDDS-2477 HDDS-2471 and HDDS-2470 which will help in analyzing/debugging this issue. (Let's see HDDS-2477 will fix it or not) was (Author: bharatviswa): Hi [~timmylicheng] Thanks for sharing the logs. I see completeMultipartUpload is called with 286 parts, and OM is throwing an error InvalidPart, but from an audit log, I was not able to know which part is missing in OM. (And I see 286 success commit Multipart upload for the key). I think there might be a chance of the scenario HDDS-2477 we are hitting here. (Not completely sure, this is my analysis after looking up logs) I have opened couple of Jira's HDDS-2477 HDDS-2471 and HDDS-2470 which will help in analyzing/debugging this issue. 
(Let's see HDDS-2477 will fix it or not) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png, om-audit-VM_50_210_centos.log, > om_audit_log_plc_1570863541668_9278.txt > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroup
[jira] [Updated] (HDDS-2477) TableCache cleanup issue for OM non-HA
[ https://issues.apache.org/jira/browse/HDDS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2477: - Component/s: Ozone Manager > TableCache cleanup issue for OM non-HA > -- > > Key: HDDS-2477 > URL: https://issues.apache.org/jira/browse/HDDS-2477 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In OM in non-HA case, the ratisTransactionLogIndex is generated by > OmProtocolServersideTranslatorPB.java. And in OM non-HA > validateAndUpdateCache is called from multipleHandler threads. So think of a > case where one thread which has an index - 10 has added to doubleBuffer. (0-9 > still have not added). DoubleBuffer flush thread flushes and call cleanup. > (So, now cleanup will go and cleanup all cache entries with less than 10 > epoch) This should not have cleanup those which might have put in to cache > later and which are in process of flush to DB. This will cause inconsitency > for few OM requests. > > > Example: > 4 threads Committing 4 parts. > 1st thread - part 1 - ratis Index - 3 > 2nd thread - part 2 - ratis index - 2 > 3rd thread - part3 - ratis index - 1 > > First thread got lock, and put in to doubleBuffer and cache with > OmMultipartInfo (with part1). And cleanup is called to cleanup all entries in > cache with less than 3. In the mean time 2nd thread and 1st thread put 2,3 > parts in to OmMultipartInfo in to Cache and doubleBuffer. But first thread > might cleanup those entries, as it is called with index 3 for cleanup. > > Now when the 4th part upload came -> when it is commit Multipart upload when > it gets multipartinfo it get Only part1 in OmMultipartInfo, as the > OmMultipartInfo (with 1,2,3 is still in process of committing to DB). 
So now > after 4th part upload is complete in DB and Cache we will have 1,4 parts > only. We will miss part2,3 information. > > So for non-HA case cleanup will be called with list of epochs that need to be > cleanedup. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2477) TableCache cleanup issue for OM non-HA
[ https://issues.apache.org/jira/browse/HDDS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2477: - Status: Patch Available (was: Open) > TableCache cleanup issue for OM non-HA > -- > > Key: HDDS-2477 > URL: https://issues.apache.org/jira/browse/HDDS-2477 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In OM in non-HA case, the ratisTransactionLogIndex is generated by > OmProtocolServersideTranslatorPB.java. And in OM non-HA > validateAndUpdateCache is called from multipleHandler threads. So think of a > case where one thread which has an index - 10 has added to doubleBuffer. (0-9 > still have not added). DoubleBuffer flush thread flushes and call cleanup. > (So, now cleanup will go and cleanup all cache entries with less than 10 > epoch) This should not have cleanup those which might have put in to cache > later and which are in process of flush to DB. This will cause inconsitency > for few OM requests. > > > Example: > 4 threads Committing 4 parts. > 1st thread - part 1 - ratis Index - 3 > 2nd thread - part 2 - ratis index - 2 > 3rd thread - part3 - ratis index - 1 > > First thread got lock, and put in to doubleBuffer and cache with > OmMultipartInfo (with part1). And cleanup is called to cleanup all entries in > cache with less than 3. In the mean time 2nd thread and 1st thread put 2,3 > parts in to OmMultipartInfo in to Cache and doubleBuffer. But first thread > might cleanup those entries, as it is called with index 3 for cleanup. > > Now when the 4th part upload came -> when it is commit Multipart upload when > it gets multipartinfo it get Only part1 in OmMultipartInfo, as the > OmMultipartInfo (with 1,2,3 is still in process of committing to DB). So now > after 4th part upload is complete in DB and Cache we will have 1,4 parts > only. 
We will miss part2,3 information. > > So for non-HA case cleanup will be called with list of epochs that need to be > cleanedup. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2477) TableCache cleanup issue for OM non-HA
Bharat Viswanadham created HDDS-2477: Summary: TableCache cleanup issue for OM non-HA Key: HDDS-2477 URL: https://issues.apache.org/jira/browse/HDDS-2477 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham In the OM non-HA case, the ratisTransactionLogIndex is generated by OmProtocolServersideTranslatorPB.java, and validateAndUpdateCache is called from multiple handler threads. Consider a thread holding index 10 that has added its entry to the doubleBuffer while indexes 0-9 have not yet been added. The DoubleBuffer flush thread flushes and calls cleanup, which removes every cache entry with an epoch below 10. It should not remove entries that were put into the cache later but are still in the process of being flushed to the DB; doing so causes inconsistency for some OM requests. Example: 4 threads committing 4 parts. 1st thread - part 1 - ratis index 3; 2nd thread - part 2 - ratis index 2; 3rd thread - part 3 - ratis index 1. The first thread acquires the lock and puts OmMultipartInfo (with part 1) into the doubleBuffer and cache, and cleanup is called to remove all entries with an epoch less than 3. Meanwhile the 2nd and 3rd threads put parts 2 and 3 into OmMultipartInfo in the cache and doubleBuffer, but the first thread's cleanup, called with index 3, may remove those entries. When the 4th part upload arrives, commitMultipartUpload fetches the multipart info and finds only part 1 in OmMultipartInfo, because the OmMultipartInfo with parts 1, 2, and 3 is still being committed to the DB. After the 4th part upload completes, the DB and cache hold only parts 1 and 4; the information for parts 2 and 3 is lost. So for the non-HA case, cleanup will be called with the list of epochs that need to be cleaned up.
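The race described in HDDS-2477 can be reduced to a few lines. This is an illustrative model, not Ozone's actual TableCache API: entries carry the epoch that wrote them, threshold-based cleanup collaterally removes in-flight entries with smaller epochs, and list-based cleanup (the proposed fix) does not:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TableCacheDemo {
    // Toy stand-in for the OM table cache: key -> epoch (the transaction
    // index that wrote the entry). Names are illustrative.
    final Map<String, Long> cache = new ConcurrentHashMap<>();

    // Buggy non-HA behavior: drop every entry at or below a threshold epoch,
    // including entries written by slower threads that hold smaller indexes
    // but have not yet been flushed to the DB.
    void cleanupBelow(long epoch) {
        cache.entrySet().removeIf(e -> e.getValue() <= epoch);
    }

    // Fix sketched in the issue: drop only the epochs actually flushed.
    void cleanupEpochs(List<Long> flushedEpochs) {
        cache.entrySet().removeIf(e -> flushedEpochs.contains(e.getValue()));
    }

    public static void main(String[] args) {
        TableCacheDemo buggy = new TableCacheDemo();
        buggy.cache.put("part1", 3L); // flushed by the fast thread
        buggy.cache.put("part2", 2L); // still in the doubleBuffer
        buggy.cache.put("part3", 1L); // still in the doubleBuffer
        buggy.cleanupBelow(3L);       // wipes parts 2 and 3 as collateral
        assert buggy.cache.isEmpty();

        TableCacheDemo fixed = new TableCacheDemo();
        fixed.cache.put("part1", 3L);
        fixed.cache.put("part2", 2L);
        fixed.cache.put("part3", 1L);
        fixed.cleanupEpochs(Arrays.asList(3L)); // removes only epoch 3
        assert fixed.cache.size() == 2;
    }
}
```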
[jira] [Assigned] (HDDS-2471) Improve exception message for CompleteMultipartUpload
[ https://issues.apache.org/jira/browse/HDDS-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2471: Assignee: Bharat Viswanadham > Improve exception message for CompleteMultipartUpload > - > > Key: HDDS-2471 > URL: https://issues.apache.org/jira/browse/HDDS-2471 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When InvalidPart error occurs, the exception message does not have any > information about partName and partNumber, it will be good to have this > information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2471) Improve exception message for CompleteMultipartUpload
[ https://issues.apache.org/jira/browse/HDDS-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2471: - Status: Patch Available (was: Open) > Improve exception message for CompleteMultipartUpload > - > > Key: HDDS-2471 > URL: https://issues.apache.org/jira/browse/HDDS-2471 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When InvalidPart error occurs, the exception message does not have any > information about partName and partNumber, it will be good to have this > information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2471) Improve exception message for CompleteMultipartUpload
Bharat Viswanadham created HDDS-2471: Summary: Improve exception message for CompleteMultipartUpload Key: HDDS-2471 URL: https://issues.apache.org/jira/browse/HDDS-2471 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham When an InvalidPart error occurs, the exception message does not include the partName and partNumber; it would be good to have this information.
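A hedged sketch of the improvement: the exact OMException wording is up to the patch, and the helper below is hypothetical, but the point is that the message should carry the part number and part name so audit logs like the ones attached to HDDS-2356 become actionable:

```java
public class InvalidPartMessageDemo {
    // Hypothetical helper, not Ozone's actual code: build an InvalidPart
    // message that names the offending part instead of only the key.
    static String invalidPartMessage(int partNumber, String partName, String keyName) {
        return String.format(
            "Complete Multipart Upload failed for key %s: part %d (name: %s) "
            + "was not found or does not match the committed part",
            keyName, partNumber, partName);
    }

    public static void main(String[] args) {
        String msg = invalidPartMessage(7, "/vol/bucket/key-7-103084",
                "plc_1570863541668_9278");
        assert msg.contains("part 7");
        assert msg.contains("/vol/bucket/key-7-103084");
        System.out.println(msg);
    }
}
```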
[jira] [Updated] (HDDS-2470) Add partName, partNumber for CommitMultipartUpload
[ https://issues.apache.org/jira/browse/HDDS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2470: - Description: Right now when complete Multipart Upload is not printing partName and partNumber into the audit log. This will help in analyzing audit logs for MPU. 2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=xx.xx.xx.xx | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 2 localID: 103129366531867089 } blockCommitSequenceId: 4978 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" ipAddress: "xx.xx.xx.xx" hostName: "xx.xx.xx.xx" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" networkLocation: "/default-rack" } members { uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" ipAddress: "9.134.51.25" hostName: "9.134.51.25" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" networkLocation: "/default-rack" } members { uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" ipAddress: "9.134.51.215" hostName: "9.134.51.215" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } } ]} | ret=SUCCESS | was: Right now when complete Multipart Upload is not printing partName and partNumber into the audit log. This will help in analyzing audit logs for MPU. 
2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 2 localID: 103129366531867089 } blockCommitSequenceId: 4978 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" ipAddress: "9.134.51.232" hostName: "9.134.51.232" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" networkLocation: "/default-rack" } members { uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" ipAddress: "9.134.51.25" hostName: "9.134.51.25" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" networkLocation: "/default-rack" } members { uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" ipAddress: "9.134.51.215" hostName: "9.134.51.215" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } } ]} | ret=SUCCESS | > Add partName, partNumber for CommitMultipartUpload > -- > > Key: HDDS-2470 > URL: https://issues.apache.org/jira/browse/HDDS-2470 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Right now when complete Multipart Upload is not printing partName and > partNumber into the audit log. This will help in analyzing audit logs for MPU. 
> > > 2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=xx.xx.xx.xx | > op=COMMIT_MULTIPART_UPLOAD_PARTKEY > {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, > key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, > replicationFactor=ONE, keyLocationInfo=[blockID { > containerBlockID > { containerID: 2 localID: 103129366531867089 } > blockCommitSequenceId: 4978 > } > offset: 0 > length: 5242880 > createVersion: 0 > pipeline {
[jira] [Updated] (HDDS-2470) Add partName, partNumber for CommitMultipartUpload
[ https://issues.apache.org/jira/browse/HDDS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2470: - Status: Patch Available (was: Open) > Add partName, partNumber for CommitMultipartUpload > -- > > Key: HDDS-2470 > URL: https://issues.apache.org/jira/browse/HDDS-2470 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Right now when complete Multipart Upload is not printing partName and > partNumber into the audit log. This will help in analyzing audit logs for MPU. > > > 2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=9.134.50.210 | > op=COMMIT_MULTIPART_UPLOAD_PARTKEY > {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, > key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, > replicationFactor=ONE, keyLocationInfo=[blockID { > containerBlockID > { containerID: 2 localID: 103129366531867089 } > blockCommitSequenceId: 4978 > } > offset: 0 > length: 5242880 > createVersion: 0 > pipeline { > leaderID: "" > members { > uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" > ipAddress: "9.134.51.232" > hostName: "9.134.51.232" > ports > { name: "RATIS" value: 9858 } > ports > { name: "STANDALONE" value: 9859 } > networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" > networkLocation: "/default-rack" > } > members { > uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" > ipAddress: "9.134.51.25" > hostName: "9.134.51.25" > ports > { name: "RATIS" value: 9858 } > ports > { name: "STANDALONE" value: 9859 } > networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" > networkLocation: "/default-rack" > } > members { > uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" > ipAddress: "9.134.51.215" > hostName: "9.134.51.215" > ports > { name: "RATIS" value: 9858 } > ports > { name: "STANDALONE" value: 9859 } > networkName: 
"79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" > networkLocation: "/default-rack" > } > state: PIPELINE_OPEN > type: RATIS > factor: THREE > id > { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } > } > ]} | ret=SUCCESS | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2470) Add partName, partNumber for CommitMultipartUpload
[ https://issues.apache.org/jira/browse/HDDS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2470: - Description: Right now when complete Multipart Upload is not printing partName and partNumber into the audit log. This will help in analyzing audit logs for MPU. 2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 2 localID: 103129366531867089 } blockCommitSequenceId: 4978 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" ipAddress: "9.134.51.232" hostName: "9.134.51.232" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" networkLocation: "/default-rack" } members { uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" ipAddress: "9.134.51.25" hostName: "9.134.51.25" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" networkLocation: "/default-rack" } members { uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" ipAddress: "9.134.51.215" hostName: "9.134.51.215" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } } ]} | ret=SUCCESS | was: Right now when complete Multipart Upload is not printing partName and partNumber into the audit log. 
2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 2 localID: 103129366531867089 } blockCommitSequenceId: 4978 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" ipAddress: "9.134.51.232" hostName: "9.134.51.232" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" networkLocation: "/default-rack" } members { uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" ipAddress: "9.134.51.25" hostName: "9.134.51.25" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" networkLocation: "/default-rack" } members { uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" ipAddress: "9.134.51.215" hostName: "9.134.51.215" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } } ]} | ret=SUCCESS | > Add partName, partNumber for CommitMultipartUpload > -- > > Key: HDDS-2470 > URL: https://issues.apache.org/jira/browse/HDDS-2470 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > Right now when complete Multipart Upload is not printing partName and > partNumber into the audit log. This will help in analyzing audit logs for MPU. 
[jira] [Created] (HDDS-2470) Add partName, partNumber for CommitMultipartUpload
Bharat Viswanadham created HDDS-2470: Summary: Add partName, partNumber for CommitMultipartUpload Key: HDDS-2470 URL: https://issues.apache.org/jira/browse/HDDS-2470 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham Assignee: Bharat Viswanadham Right now completeMultipartUpload does not print partName and partNumber into the audit log. 2019-11-13 15:14:10,191 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570850798896_2991, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 2 localID: 103129366531867089 } blockCommitSequenceId: 4978 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "5d03aed5-cfb3-4689-b168-0c9a94316551" ipAddress: "9.134.51.232" hostName: "9.134.51.232" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5d03aed5-cfb3-4689-b168-0c9a94316551" networkLocation: "/default-rack" } members { uuid: "a71462ae-7865-4ed5-b84e-60616df60a0d" ipAddress: "9.134.51.25" hostName: "9.134.51.25" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a71462ae-7865-4ed5-b84e-60616df60a0d" networkLocation: "/default-rack" } members { uuid: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" ipAddress: "9.134.51.215" hostName: "9.134.51.215" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "79bf7bdf-ed29-49d4-bf7c-e88fdbd2ce03" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: THREE id { id: "ec6b06c5-193f-4c30-879b-5a12284dc4f8" } } ]} | ret=SUCCESS |
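The change requested above amounts to adding the committed part's name and number to the audit parameter map so they appear in the `op=COMMIT_MULTIPART_UPLOAD_PARTKEY {...}` line. The following is a minimal, hypothetical sketch of that idea; the class and method names are illustrative and are not Ozone's actual OMAudit/AuditLogger API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical audit-message builder. Ozone's real audit framework differs;
// this only illustrates including partName and partNumber in the logged map,
// which is what HDDS-2470 asks for.
class AuditLogSketch {
    // Render the parameter map in the "op=... {k=v, ...} | ret=... |" shape
    // seen in the OM audit logs quoted above.
    static String format(String op, Map<String, String> params, String ret) {
        String body = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", "));
        return "op=" + op + " {" + body + "} | ret=" + ret + " |";
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("volume", "vol1");
        params.put("bucket", "bucket1");
        params.put("key", "key1");
        // The HDDS-2470 addition: record which part was committed.
        params.put("partName", "/vol1/bucket1/key1103129366531867089");
        params.put("partNumber", "1");
        System.out.println(format("COMMIT_MULTIPART_UPLOAD_PARTKEY", params, "SUCCESS"));
    }
}
```

With these two extra entries, a failed COMPLETE_MULTIPART_UPLOAD can be correlated against the individual part commits purely from the audit log.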
[jira] [Commented] (HDDS-2465) S3 Multipart upload failing
[ https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972951#comment-16972951 ] Bharat Viswanadham commented on HDDS-2465: -- cc [~elek] > S3 Multipart upload failing > --- > > Key: HDDS-2465 > URL: https://issues.apache.org/jira/browse/HDDS-2465 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Attachments: MPU.java > > > When I run attached java program, facing below error, during > completeMultipartUpload. > {code:java} > ERROR StatusLogger No Log4j 2 configuration file found. Using default > configuration (logging only errors to the console), or user programmatically > provided configurations. Set system property 'log4j2.debug' to show Log4j 2 > internal initialization logging. See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 > configuration file found. Using default configuration (logging only errors to > the console), or user programmatically provided configurations. Set system > property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2Exception in thread "main" > com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: > Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: > c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), > S3 Extended Request ID: 7tnVbqgc4bgb at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) > at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at > com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) > at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} > When I debug it is not the request is not been received by S3Gateway, and I > don't see any trace of this in audit log. 
[jira] [Updated] (HDDS-2465) S3 Multipart upload failing
[ https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2465: - Attachment: MPU.java > S3 Multipart upload failing > --- > > Key: HDDS-2465 > URL: https://issues.apache.org/jira/browse/HDDS-2465 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Attachments: MPU.java > > > When I run attached java program, facing below error, during > completeMultipartUpload. > {code:java} > ERROR StatusLogger No Log4j 2 configuration file found. Using default > configuration (logging only errors to the console), or user programmatically > provided configurations. Set system property 'log4j2.debug' to show Log4j 2 > internal initialization logging. See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 > configuration file found. Using default configuration (logging only errors to > the console), or user programmatically provided configurations. Set system > property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2Exception in thread "main" > com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: > Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: > c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), > S3 Extended Request ID: 7tnVbqgc4bgb at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) > at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at > com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) > at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} > When I debug it is not the request is not been received by S3Gateway, and I > don't see any trace of this in audit log. 
[jira] [Created] (HDDS-2465) S3 Multipart upload failing
Bharat Viswanadham created HDDS-2465: Summary: S3 Multipart upload failing Key: HDDS-2465 URL: https://issues.apache.org/jira/browse/HDDS-2465 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham When I run the attached Java program, I get the below error during completeMultipartUpload. {code:java} ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), S3 Extended Request ID: 7tnVbqgc4bgb at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} When I debug, I see the request has not been received by the S3Gateway, and I don't see any trace of it in the audit log.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:22 PM: - Hi [~timmylicheng] Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 after the complete multipart upload failed. {code:java} 2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07ad bucket: ozone-test key: plc_1570863541668_9278 2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo= []} {code} And after that, allocateBlock still continues for the key, because its entry in the openKeyTable is not removed by the abortMultipartUpload request. (Abort removes only the entry created during the initiateMPU request; that is why after some time you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey, as the entry was removed from the MultipartInfo table.) 
(But the strange thing I have observed is that the clientID does not match any of the names in the part list; the last part of each partName is the clientID.) And from the OM audit log, I see partNumber 1 followed by a list of part names; I am not sure if some log is truncated here, as it should show partName, partNumber pairs. # If you can confirm what the parts for this key are in OM — you can get this from listParts (but this should be done before the abort request). # Check in the OM audit log what part list we get for this key; I am not sure whether it is truncated in the uploaded log. On my cluster, audit logs look like below, where for completeMultipartUpload I can see partNumber and partName (whereas in the uploaded log, I don't). {code:java} 2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: 
"a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDA
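The abort-vs-in-flight-commit interaction described in the comment above can be modeled with a toy sketch. This is a hedged illustration, not Ozone's actual KeyManager/table API: the table shapes, key formats, and method names below are invented for clarity. It shows why, once abortMultipartUpload removes the MultipartInfo entry, a still-in-flight part commit fails with NO_SUCH_MULTIPART_UPLOAD, while the per-part openKeyTable entry it created is left behind.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the two OM tables discussed in the thread. Illustrative only.
class MpuTablesSketch {
    final Set<String> multipartInfoTable = new HashSet<>();
    final Map<String, String> openKeyTable = new HashMap<>();

    void initiateMpu(String key, String uploadId) {
        multipartInfoTable.add(key + "#" + uploadId);
        openKeyTable.put(key + "#" + uploadId, "initiate-entry");
    }

    // Each in-flight part writer gets its own open entry, keyed by clientID.
    void openPart(String key, long clientId) {
        openKeyTable.put(key + "#" + clientId, "part-entry");
    }

    void abortMpu(String key, String uploadId) {
        // Abort removes only the initiate-time entries ...
        multipartInfoTable.remove(key + "#" + uploadId);
        openKeyTable.remove(key + "#" + uploadId);
        // ... but NOT the per-part open entries, which stay behind,
        // so allocateBlock can keep succeeding for those writers.
    }

    String commitPart(String key, String uploadId, long clientId) {
        if (!multipartInfoTable.contains(key + "#" + uploadId)) {
            return "NO_SUCH_MULTIPART_UPLOAD"; // what the thread observes
        }
        openKeyTable.remove(key + "#" + clientId);
        return "OK";
    }

    public static void main(String[] args) {
        MpuTablesSketch om = new MpuTablesSketch();
        om.initiateMpu("plc_key", "upload1");
        om.openPart("plc_key", 103102211984655258L);
        om.abortMpu("plc_key", "upload1");
        // The in-flight writer's commit now fails:
        System.out.println(om.commitPart("plc_key", "upload1", 103102211984655258L));
    }
}
```

In this model the stale "part-entry" survives the abort, matching the observation that allocateBlock continues for the key even after ABORT_MULTIPART_UPLOAD returns SUCCESS.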
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ] Bharat Viswanadham commented on HDDS-2356: -- Hi [~timmylicheng] Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 once complete multipart upload failed. 2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationIn fo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278 2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo= []} | ret=SUCCESS | And after that still, allocateBlock is continuing for the key because the entry from openKeyTable is not removed by abortMultipartUpload request.(Abort removed only entry which has been created during initiateMPU request, so that is the reason after some time you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey, as we removed entry from MultipartInfo table. 
(But the strange thing I have observed is that the clientID does not match any of the names in the part list; the last component of a partName is the clientID.) Also, from the OM audit log I see partNumber 1 followed by a list of multipart names; it should show each part name together with its partNumber, so I am not sure whether some of the log is truncated here.
 # Confirm what parts OM has for this key; you can get this from listParts (but this must be done before the abort request).
 # Check in the OM audit log what part list we received for this key; I am not sure whether it was truncated in the uploaded log.
On my cluster the audit logs look like below.
{code:java}
2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name:
"RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: T
[jira] [Created] (HDDS-2453) Add Freon tests for S3Bucket/MPU Keys
Bharat Viswanadham created HDDS-2453: Summary: Add Freon tests for S3Bucket/MPU Keys Key: HDDS-2453 URL: https://issues.apache.org/jira/browse/HDDS-2453 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham This Jira is to create Freon tests for:
 # S3 bucket creation.
 # S3 MPU key uploads.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2453) Add Freon tests for S3Bucket/MPU Keys
[ https://issues.apache.org/jira/browse/HDDS-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2453: Assignee: Bharat Viswanadham > Add Freon tests for S3Bucket/MPU Keys > - > > Key: HDDS-2453 > URL: https://issues.apache.org/jira/browse/HDDS-2453 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > This Jira is to create freon tests for > # S3Bucket creation. > # S3 MPU Key uploads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2410) Ozoneperf docker cluster should use privileged containers
[ https://issues.apache.org/jira/browse/HDDS-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2410. -- Fix Version/s: 0.5.0 Resolution: Fixed > Ozoneperf docker cluster should use privileged containers > - > > Key: HDDS-2410 > URL: https://issues.apache.org/jira/browse/HDDS-2410 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The profiler > [servlet|https://github.com/elek/hadoop-ozone/blob/master/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ProfileServlet.java] > (which helps to run java profiler in the background and publishes the result > on the web interface) requires privileged docker containers. > > This flag is missing from the ozoneperf docker-compose cluster (which is > designed to run performance tests). > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
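For reference, the fix described in HDDS-2410 amounts to setting the standard docker-compose `privileged` flag on the services that run the ProfileServlet. A hypothetical fragment (service and image names are illustrative, not taken from the actual ozoneperf compose file):

```yaml
services:
  om:                    # illustrative service name
    image: apache/ozone  # illustrative image name
    privileged: true     # required so the in-process profiler can use kernel perf events
```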
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970430#comment-16970430 ] Bharat Viswanadham commented on HDDS-2356: -- Hi [~timmylicheng] Since every run shows a new error and stack trace, and the logs do not give much information about the root cause, I think to debug this we need to know why the multipart upload is not found for the key, and why we sometimes see the InvalidMultipartUpload error. We can look at the audit logs to see what is passed in the multipart upload requests, and for the same key we can use listParts to find out which parts OM has in its MultipartInfoTable (this will help with the InvalidPart error). I also think we should enable trace/debug logging to see the incoming requests and understand why we see these errors for multipart upload. (Not sure whether there is a bug in the cache logic, or some handling we missed for MPU requests.) To debug this we need the complete OM log, audit log, and S3 gateway log. Also enable trace to see what requests are incoming; I think we log them in OzoneManagerProtocolServerSideTranslatorPB. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. 
The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. 
> java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload
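The comment above suggests enabling trace/debug logging for incoming OM requests. A sketch of what that could look like in OM's log4j.properties (the logger names are assumptions and should be verified against the deployed version):

```properties
# Trace every incoming OM request as it is dispatched.
log4j.logger.org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB=TRACE
# Debug the S3 multipart request handlers specifically.
log4j.logger.org.apache.hadoop.ozone.om.request.s3.multipart=DEBUG
```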
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970430#comment-16970430 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/8/19 5:29 PM: --- Hi [~timmylicheng] Since every run shows a new error and stack trace, and the logs do not give much information about the root cause, I think to debug this we need to know why the multipart upload is not found for the key, and why we sometimes see the InvalidMultipartUpload error. We can look at the audit logs to see what is passed in the multipart upload requests, and for the same key we can use listParts to find out which parts OM has in its MultipartInfoTable (this will help with the InvalidPart error). I also think we should enable trace/debug logging to see the incoming requests and understand why we see these errors for multipart upload. (Not sure whether there is a bug in the cache logic, or some handling we missed for MPU requests.) To debug this we need the complete OM log, audit log, and S3 gateway log. Also enable trace to see what requests are incoming; I think we log them in OzoneManagerProtocolServerSideTranslatorPB. Let us know if you have any suggestions. was (Author: bharatviswa): Hi [~timmylicheng] As every run, we are seeing the new error and the stack trace and from log not got much information about the root cause. I think to debug this we need to know why for the Multipartupload key is not finding multipart upload or why some times we see InvalidMultipartupload error. We can see audit logs and see what request is passing for Multipartupload requests, and for the same key we can use listParts to know what are the parts OM is having in its MultipartInfoTable(This will help in InvalidPart error). And also I think we should enable trace/debug log to see the incoming requests, and why for Multipart upload we see these errors. 
(Not sure some bug in Cache logic, or some handling we missed for MPU requests) To debug this we need a complete OM log, audit log, S3gateway log. And also enable trace to see what requests are incoming, I think we log them in OzoneManagerProtocolServerSideTranslatorPB. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log, > image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB
[jira] [Updated] (HDDS-2427) Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar
[ https://issues.apache.org/jira/browse/HDDS-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2427: - Fix Version/s: 0.5.0 > Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar > - > > Key: HDDS-2427 > URL: https://issues.apache.org/jira/browse/HDDS-2427 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This has caused issue for DN UI loading. > hadoop-ozone-filesystem-lib-current-xx.jar is in the classpath which > accidentally loaded Ozone datanode web application instead of Hadoop datanode > application. This leads to the reported error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2399) Update mailing list information in CONTRIBUTION and README files
[ https://issues.apache.org/jira/browse/HDDS-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2399. -- Fix Version/s: 0.5.0 Resolution: Fixed > Update mailing list information in CONTRIBUTION and README files > > > Key: HDDS-2399 > URL: https://issues.apache.org/jira/browse/HDDS-2399 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Priority: Major > Labels: newbie, pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We have new mailing lists: > [ozone-...@hadoop.apache.org|mailto:ozone-...@hadoop.apache.org] > [ozone-iss...@hadoop.apache.org|mailto:ozone-iss...@hadoop.apache.org] > [ozone-comm...@hadoop.apache.org|mailto:ozone-comm...@hadoop.apache.org] > > We need to update CONTRIBUTION.md and README.md to use ozone-dev instead of > hdfs-dev (optionally we can mention the issues/commits lists, but only in > CONTRIBUTION.md) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969847#comment-16969847 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/8/19 5:26 AM: --- I think this error is not related to the NO_SUCH_MULTIPART_UPLOAD_ERROR. I have fixed MISMATCH_MULTIPART_LIST in HDDS-2395. was (Author: bharatviswa): I think this error is not related to the NO_SUCH_MULTIPART_UPLOAD_ERROR. I have fixed MISMATCH_ERROR in HDDS-2395. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMetho
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969847#comment-16969847 ] Bharat Viswanadham commented on HDDS-2356: -- I think this error is not related to the NO_SUCH_MULTIPART_UPLOAD_ERROR. I have fixed MISMATCH_ERROR in HDDS-2395. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMu
[jira] [Resolved] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2395. -- Fix Version/s: 0.5.0 Resolution: Fixed > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no uploaded parts, and we specify > some parts then also InvalidPart > # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) > # When part 3 uploaded, complete with part 3 can be done -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
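The six behaviors listed in HDDS-2395 boil down to one rule: the requested parts must be a sorted subset of the uploaded parts, matched by both part number and part name. A minimal sketch of that rule (hypothetical code, not the actual Ozone implementation, which also handles further edge cases):

```java
import java.util.List;
import java.util.Map;

public class CompleteMpuCheck {
    enum Result { OK, INVALID_PART, INVALID_PART_ORDER }

    // uploaded: part number -> part name recorded by OM.
    // requested: (part number, part name) pairs from the complete request.
    static Result validate(Map<Integer, String> uploaded,
                           List<Map.Entry<Integer, String>> requested) {
        int prev = Integer.MIN_VALUE;
        for (Map.Entry<Integer, String> part : requested) {
            if (part.getKey() <= prev) {
                return Result.INVALID_PART_ORDER; // parts must be strictly ascending
            }
            prev = part.getKey();
            // Each requested (number, name) must match an uploaded part;
            // this also covers the "no parts uploaded at all" case.
            if (!part.getValue().equals(uploaded.get(part.getKey()))) {
                return Result.INVALID_PART;
            }
        }
        return Result.OK; // completing with only a subset of uploaded parts is fine
    }

    public static void main(String[] args) {
        Map<Integer, String> up = Map.of(1, "p1", 2, "p2", 3, "p3");
        // Uploaded 1,2,3; completing with 1,3 succeeds (no error).
        System.out.println(validate(up,
            List.of(Map.entry(1, "p1"), Map.entry(3, "p3"))));
        // Unsorted part list.
        System.out.println(validate(up,
            List.of(Map.entry(3, "p3"), Map.entry(1, "p1"))));
        // Part never uploaded.
        System.out.println(validate(up, List.of(Map.entry(4, "p4"))));
    }
}
```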
[jira] [Assigned] (HDDS-2427) Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar
[ https://issues.apache.org/jira/browse/HDDS-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2427: Assignee: Bharat Viswanadham > Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar > - > > Key: HDDS-2427 > URL: https://issues.apache.org/jira/browse/HDDS-2427 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This has caused issue for DN UI loading. > hadoop-ozone-filesystem-lib-current-xx.jar is in the classpath which > accidentally loaded Ozone datanode web application instead of Hadoop datanode > application. This leads to the reported error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2427) Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar
[ https://issues.apache.org/jira/browse/HDDS-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2427: - Status: Patch Available (was: Open) > Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar > - > > Key: HDDS-2427 > URL: https://issues.apache.org/jira/browse/HDDS-2427 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This has caused issue for DN UI loading. > hadoop-ozone-filesystem-lib-current-xx.jar is in the classpath which > accidentally loaded Ozone datanode web application instead of Hadoop datanode > application. This leads to the reported error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2427) Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar
Bharat Viswanadham created HDDS-2427: Summary: Exclude webapps from hadoop-ozone-filesystem-lib-current uber jar Key: HDDS-2427 URL: https://issues.apache.org/jira/browse/HDDS-2427 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham This has caused issue for DN UI loading. hadoop-ozone-filesystem-lib-current-xx.jar is in the classpath which accidentally loaded Ozone datanode web application instead of Hadoop datanode application. This leads to the reported error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
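Assuming the lib-current uber jar is assembled with the maven-shade-plugin (an assumption; the actual build setup may differ), excluding the webapps resources could look like:

```xml
<!-- Hypothetical maven-shade-plugin filter dropping the Ozone webapps
     directory so it cannot shadow the Hadoop datanode web application
     on the classpath. -->
<filter>
  <artifact>*:*</artifact>
  <excludes>
    <exclude>webapps/**</exclude>
  </excludes>
</filter>
```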
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968967#comment-16968967 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/7/19 6:38 AM: --- Hi [~timmylicheng] When uploading a part, if no such multipart upload is found in the multipartInfoTable, we throw an error. In your tests, are you trying to abort an upload while some client is still trying to upload a part? See the snippet of the code below.
{code:java}
if (multipartKeyInfo == null) {
  // This can occur when user started uploading part by the time commit
  // of that part happens, in between the user might have requested
  // abort multipart upload. If we just throw exception, then the data
  // will not be garbage collected, so move this part to delete table
  // and throw error
  // Move this part to delete table.
  throw new OMException("No such Multipart upload is with specified " +
      "uploadId " + uploadID,
      OMException.ResultCodes.NO_SUCH_MULTIPART_UPLOAD_ERROR);
}{code}
BTW, I don't see any log attached to the Jira.
was (Author: bharatviswa): HI [~timmylicheng] When uploading part file, when no such Multipartupload is found in multipartInfoTable we throw error. In your tests, are you trying to abort any upload. See the below snippet of the code.
{code:java}
if (multipartKeyInfo == null) {
  // This can occur when user started uploading part by the time commit
  // of that part happens, in between the user might have requested
  // abort multipart upload. If we just throw exception, then the data
  // will not be garbage collected, so move this part to delete table
  // and throw error
  // Move this part to delete table.
  throw new OMException("No such Multipart upload is with specified " +
      "uploadId " + uploadID,
      OMException.ResultCodes.NO_SUCH_MULTIPART_UPLOAD_ERROR);
}{code}
I don't see any log attached to the Jira BTW. 
> Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. 
> 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. > java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:10
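The comment inside the `{code:java}` snippet above notes that simply throwing on a missing upload ID would leak the part's data; the intended handling is to move the orphaned part to the delete table before failing the commit. A minimal in-memory sketch of that idea (hypothetical class and table names; this is not the actual OzoneManager code, which operates on RocksDB-backed tables):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the commit-vs-abort race handled in
// S3MultipartUploadCommitPartRequest: if the multipart upload entry is
// gone (e.g. the upload was aborted), park the orphaned part in a
// delete table so its data can still be garbage collected, then fail.
public class MultipartCommitSketch {
    final Map<String, List<String>> multipartInfoTable = new HashMap<>();
    final List<String> deleteTable = new ArrayList<>();

    void initiateUpload(String uploadId) {
        multipartInfoTable.put(uploadId, new ArrayList<>());
    }

    void abortUpload(String uploadId) {
        multipartInfoTable.remove(uploadId);
    }

    void commitPart(String uploadId, String partKey) {
        List<String> parts = multipartInfoTable.get(uploadId);
        if (parts == null) {
            // Upload was aborted in between: keep the part's data
            // reachable for deletion instead of leaking it.
            deleteTable.add(partKey);
            throw new IllegalStateException(
                "NO_SUCH_MULTIPART_UPLOAD_ERROR: " + uploadId);
        }
        parts.add(partKey);
    }
}
```

The key point of the sketch is the ordering: the part is recorded for deletion before the exception is raised, so a racing abort never strands data.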
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968967#comment-16968967 ] Bharat Viswanadham commented on HDDS-2356: -- HI [~timmylicheng] When uploading part file, when no such Multipartupload is found in multipartInfoTable we throw error. In your tests, are you trying to abort any upload. See the below snippet of the code. {code:java} if (multipartKeyInfo == null) { // This can occur when user started uploading part by the time commit // of that part happens, in between the user might have requested // abort multipart upload. If we just throw exception, then the data // will not be garbage collected, so move this part to delete table // and throw error // Move this part to delete table. throw new OMException("No such Multipart upload is with specified " + "uploadId " + uploadID, OMException.ResultCodes.NO_SUCH_MULTIPART_UPLOAD_ERROR); }{code} I don't see any log attached to the Jira BTW. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. 
As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. > > Updated on 11/06/2019: > See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs > are in the attachment. > 2019-11-05 18:12:37,766 ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for Key:./2 > 0191012/plc_1570863541668_9278 in Volume/Bucket > s325d55ad283aa400af464c76d713c07ad/ozone-test > NO_SUCH_MULTIPART_UPLOAD_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload > is with specified uploadId fcda8608-b431-48b7-8386- > 0a332f1a709a-103084683261641950 > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:1 > 56) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB. 
> java:217) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > Updated on 10/28/2019: > See MISMATCH_MULTIPART_LIST error. > > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolCl
[jira] [Resolved] (HDDS-2377) Speed up TestOzoneManagerHA#testOMRetryProxy and #testTwoOMNodesDown
[ https://issues.apache.org/jira/browse/HDDS-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2377. -- Fix Version/s: 0.5.0 Resolution: Fixed > Speed up TestOzoneManagerHA#testOMRetryProxy and #testTwoOMNodesDown > > > Key: HDDS-2377 > URL: https://issues.apache.org/jira/browse/HDDS-2377 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Marton's comment: > https://github.com/apache/hadoop-ozone/pull/30#pullrequestreview-302465440 > Out of curiosity, I ran the entire TestOzoneManagerHA locally. The entire test > class finished in 10m 30s. I discovered {{testOMRetryProxy}} and > {{testTwoOMNodesDown}} are taking the most time (2m and 2m 30s respectively) > to finish. Most of the time is wasted on retry and wait. We could reasonably reduce > the amount of time spent on the wait. > As I tested, with the patch, {{testOMRetryProxy}} and {{testTwoOMNodesDown}} > finish in 20 sec each, saving almost 4 min runtime on those two tests alone. > The whole TestOzoneManagerHA test finishes in 5m 44s with the patch.
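The speedup described in the HDDS-2377 message above comes from shrinking retry counts and wait intervals, not from changing test behavior: a test that must exhaust retries runs in roughly maxAttempts × waitMillis worst case. A generic bounded-retry loop makes the arithmetic concrete (illustrative only; the real tests tune Ozone's OM failover/retry configuration keys):

```java
import java.util.function.BooleanSupplier;

// Generic bounded retry: the worst-case wall-clock time is roughly
// maxAttempts * waitMillis, so lowering either bound directly shortens
// a test that has to wait for the condition (or exhaust the retries).
public class RetrySketch {
    static int waitFor(BooleanSupplier condition, int maxAttempts,
                       long waitMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return attempt;  // number of attempts actually needed
            }
            Thread.sleep(waitMillis);
        }
        throw new IllegalStateException(
            "condition not met after " + maxAttempts + " attempts");
    }
}
```

For example, dropping a 10 s retry interval to 1 s with the same attempt budget cuts the bound by 10x, which is the same shape of change that took the two tests from ~2 min each to ~20 s each.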
[jira] [Commented] (HDDS-2404) Add support for Registered id as service identifier for CSR.
[ https://issues.apache.org/jira/browse/HDDS-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968096#comment-16968096 ] Bharat Viswanadham commented on HDDS-2404: -- Can we move this task under HDDS-505, as it is related to OM HA work? > Add support for Registered id as service identifier for CSR. > > > Key: HDDS-2404 > URL: https://issues.apache.org/jira/browse/HDDS-2404 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Reporter: Anu Engineer >Assignee: Abhishek Purohit >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The SCM HA needs the ability to represent a group as a single entity, so that > tokens for each OM that is part of an HA group can be honored by the > datanodes. > This patch adds the notion of a service group ID to the Certificate > Infrastructure. In the next JIRAs, we will use this capability when issuing > certificates to OM -- especially when they are in HA mode.
[jira] [Resolved] (HDDS-1643) Send hostName also part of OMRequest
[ https://issues.apache.org/jira/browse/HDDS-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-1643. -- Fix Version/s: 0.5.0 Resolution: Fixed > Send hostName also part of OMRequest > > > Key: HDDS-1643 > URL: https://issues.apache.org/jira/browse/HDDS-1643 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: YiSheng Lien >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This Jira was created based on the comment from [~eyang] on the HDDS-1600 jira. > [~bharatviswa] can hostname be used as part of an OM request? For running in a > docker container, the virtual private network address may not be routable or > exposed to the outside world. Using the IP to identify the source client location may > not be enough. It would be nice to have the ability to support hostname-based > requests too.
[jira] [Resolved] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2064. -- Fix Version/s: 0.5.0 Resolution: Fixed > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist.- > -Root cause:- > -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when the local OM doesn't match any `ozone.om.address.idX.omX` in > the config.- > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config.
[jira] [Resolved] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
[ https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2359. -- Fix Version/s: 0.5.0 Resolution: Fixed > Seeking randomly in a key with more than 2 blocks of data leads to > inconsistent reads > - > > Key: HDDS-2359 > URL: https://issues.apache.org/jira/browse/HDDS-2359 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During Hive testing we found the following exception: > {code} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 16 more > Caused by: java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > ... 18 more > Caused by: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835) > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > ... 24 more > Caused by: java.io.IOException: Error reading file: > o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_001_001_/bucket_0 > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(Rec
[jira] [Resolved] (HDDS-2255) Improve Acl Handler Messages
[ https://issues.apache.org/jira/browse/HDDS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2255. -- Fix Version/s: 0.5.0 Resolution: Fixed > Improve Acl Handler Messages > > > Key: HDDS-2255 > URL: https://issues.apache.org/jira/browse/HDDS-2255 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: om >Reporter: Hanisha Koneru >Assignee: YiSheng Lien >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > In the Add/Remove/Set Acl Key/Bucket/Volume handlers, we print a message about > whether the operation was successful or not. If we try to add an ACL > that already exists, we convey the message that the operation failed. > It would be better if the message conveyed more clearly why the operation > failed, i.e. that the ACL already exists.
[jira] [Resolved] (HDDS-2398) Remove usage of LogUtils class from ratis-common
[ https://issues.apache.org/jira/browse/HDDS-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2398. -- Fix Version/s: 0.5.0 Resolution: Fixed > Remove usage of LogUtils class from ratis-common > > > Key: HDDS-2398 > URL: https://issues.apache.org/jira/browse/HDDS-2398 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > MiniOzoneChaosCluster.java uses LogUtils from ratis-common to set the log level, > but this method was removed from LogUtils as part of RATIS-508. > We can avoid depending on Ratis for this and use GenericTestUtils from > hadoop-common test: > LogUtils.setLogLevel(GrpcClientProtocolClient.LOG, Level.WARN);
[jira] [Created] (HDDS-2398) Remove usage of LogUtils class from ratis-common
Bharat Viswanadham created HDDS-2398: Summary: Remove usage of LogUtils class from ratis-common Key: HDDS-2398 URL: https://issues.apache.org/jira/browse/HDDS-2398 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham MiniOzoneChaosCluster.java uses LogUtils from ratis-common to set the log level, but this method was removed from LogUtils as part of RATIS-508. We can avoid depending on Ratis for this and use GenericTestUtils from hadoop-common test: LogUtils.setLogLevel(GrpcClientProtocolClient.LOG, Level.WARN);
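The suggestion in HDDS-2398 above is to replace the Ratis-internal `LogUtils.setLogLevel(...)` call with a test-side helper (the actual Ozone fix uses GenericTestUtils from hadoop-common, which wraps log4j loggers). As a dependency-free stand-in, the same one-liner can be written against java.util.logging:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for the removed LogUtils.setLogLevel(...) call: a tiny local
// helper keeps test code from depending on a Ratis-internal class.
// (The real Ozone change uses GenericTestUtils with log4j, not JUL;
// this sketch only illustrates the shape of the replacement.)
public class LogLevelSketch {
    static void setLogLevel(Logger logger, Level level) {
        logger.setLevel(level);
    }
}
```

The point is that nothing about setting a log level requires ratis-common; any local one-line wrapper over the logging framework in use suffices.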
[jira] [Assigned] (HDDS-2398) Remove usage of LogUtils class from ratis-common
[ https://issues.apache.org/jira/browse/HDDS-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2398: Assignee: Bharat Viswanadham > Remove usage of LogUtils class from ratis-common > > > Key: HDDS-2398 > URL: https://issues.apache.org/jira/browse/HDDS-2398 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > MiniOzoneChaosCluster.java uses LogUtils from ratis-common to set the log level, > but this method was removed from LogUtils as part of RATIS-508. > We can avoid depending on Ratis for this and use GenericTestUtils from > hadoop-common test: > LogUtils.setLogLevel(GrpcClientProtocolClient.LOG, Level.WARN);
[jira] [Comment Edited] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964947#comment-16964947 ] Bharat Viswanadham edited comment on HDDS-2396 at 11/1/19 4:30 PM: --- Hi [~aengineer] The issue resolved with try-with-resource is HDDS-2379. Below is the stack trace error. [~timmylicheng] In your cluster setup testing does it have a fix for HDDS-2379.(Not sure it will resolve or not, just want to check if it is a new issue.) {code:java} 2019-10-29 11:15:15,131 ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered err or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread java.io.IOException: Unable to write the batch. at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) at java.lang.Thread.run(Thread.java:745) Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:1421) at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) ... 3 more{code} was (Author: bharatviswa): Hi [~aengineer] The issue resolved with try-with-resource is HDDS-2379. Below is the stack trace error. [~timmylicheng] In your cluster setup testing does it have a fix for HDDS-2379.(Not sure it will resolve or not, just want to check.) {code:java} 2019-10-29 11:15:15,131 ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered err or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread java.io.IOException: Unable to write the batch. 
at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) at java.lang.Thread.run(Thread.java:745) Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:1421) at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) ... 3 more{code} > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > There happens core dump in rocksdb while it's occasional. 
> > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x00
[jira] [Comment Edited] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964947#comment-16964947 ] Bharat Viswanadham edited comment on HDDS-2396 at 11/1/19 4:30 PM: --- Hi [~aengineer] The issue resolved with try-with-resource is HDDS-2379. Below is the stack trace error. [~timmylicheng] In your cluster setup testing does it have a fix for HDDS-2379.(Not sure it will resolve or not, just want to check.) {code:java} 2019-10-29 11:15:15,131 ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered err or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread java.io.IOException: Unable to write the batch. at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) at java.lang.Thread.run(Thread.java:745) Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:1421) at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) ... 3 more{code} was (Author: bharatviswa): Hi [~aengineer] The issue resolved with try-with-resource is HDDS-2379. Below is the stack trace error. [~timmylicheng] In your cluster setup testing does it have a fix for HDDS-2379.(Not sure it will resolve or not, just want to check.) 2019-10-29 11:15:15,131 ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered err or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread java.io.IOException: Unable to write the batch. 
at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) at java.lang.Thread.run(Thread.java:745) Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:1421) at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) ... 3 more > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > There happens core dump in rocksdb while it's occasional. 
> > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 byt
[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964947#comment-16964947 ] Bharat Viswanadham commented on HDDS-2396: -- Hi [~aengineer] The issue resolved with try-with-resource is HDDS-2379. Below is the stack trace error. [~timmylicheng] In your cluster setup testing does it have a fix for HDDS-2379.(Not sure it will resolve or not, just want to check.) 2019-10-29 11:15:15,131 ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with exit status 1: During flush to DB encountered err or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread java.io.IOException: Unable to write the batch. at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) at java.lang.Thread.run(Thread.java:745) Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:1421) at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) ... 3 more > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. 
> > A core dump occasionally happens in rocksdb.
> >
> Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0
> C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c
> C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17
> C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
> C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
> C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
> C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21
> C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc
> J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe]
> J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc]
> j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4
> j java.lang.Thread.run()V+11
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: hdfs-issues-h...@hadoop.apache.org
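The HDDS-2379 fix referenced in the comment above is to commit the batch with try-with-resources so the batch's native handle is released exactly once and never reused. The real classes in the trace (RDBBatchOperation, org.rocksdb.WriteBatch) need the rocksdbjni native library, so the sketch below illustrates the same pattern with a hypothetical AutoCloseable `Batch` stand-in; the class and its methods are illustrative, not the actual Ozone API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batch operation wrapping a native handle.
// AutoCloseable lets try-with-resources release the handle even if commit throws.
class Batch implements AutoCloseable {
    static final List<String> log = new ArrayList<>();
    private boolean closed = false;

    void put(String key, String value) {
        if (closed) throw new IllegalStateException("batch already closed");
        log.add("put " + key + "=" + value);
    }

    void commit() {
        log.add("commit");
    }

    @Override
    public void close() {
        // In the real code this would free the native WriteBatch handle.
        closed = true;
        log.add("close");
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        // close() is guaranteed to run after commit(), and also on any
        // exception path, so the native handle cannot leak or be reused.
        try (Batch batch = new Batch()) {
            batch.put("volume/bucket/key", "info");
            batch.commit();
        }
        System.out.println(Batch.log);
    }
}
```

Without the try-with-resources block, an exception between creating and closing the batch can leave a half-written native WriteBatch behind, which is one way to end up with the "unknown WriteBatch tag" error above.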
[jira] [Assigned] (HDDS-2397) Fix calling cleanup for few missing tables in OM
[ https://issues.apache.org/jira/browse/HDDS-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2397: Assignee: Bharat Viswanadham > Fix calling cleanup for few missing tables in OM > > > Key: HDDS-2397 > URL: https://issues.apache.org/jira/browse/HDDS-2397 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > After the DoubleBuffer flushes, we call cleanup on the table caches. For a few tables, the cache cleanup is missed: > # PrefixTable > # S3SecretTable > # DelegationTable
[jira] [Created] (HDDS-2397) Fix calling cleanup for few missing tables in OM
Bharat Viswanadham created HDDS-2397: Summary: Fix calling cleanup for few missing tables in OM Key: HDDS-2397 URL: https://issues.apache.org/jira/browse/HDDS-2397 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham After the DoubleBuffer flushes, we call cleanup on the table caches. For a few tables, the cache cleanup is missed: # PrefixTable # S3SecretTable # DelegationTable
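The shape of the HDDS-2397 bug is that the post-flush cleanup touched a hard-coded subset of tables, skipping PrefixTable, S3SecretTable and DelegationTable. A minimal sketch of the fix's idea, iterating every registered table instead of an explicit list; `CacheTable`, `cleanupCache`, and the registry map are illustrative, not the actual OM API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative cache-bearing table; cleanupCache(epoch) would evict entries
// already flushed to RocksDB up to the given double-buffer flush epoch.
class CacheTable {
    long cleanedUpTo = -1;

    void cleanupCache(long epoch) {
        cleanedUpTo = epoch;
    }
}

public class CleanupDemo {
    // Registering every table in one map means the flush path cannot
    // "miss" a table, which is the bug for the three tables named above.
    static final Map<String, CacheTable> TABLES = new LinkedHashMap<>();
    static {
        for (String name : new String[] {
                "keyTable", "bucketTable", "volumeTable",
                "prefixTable", "s3SecretTable", "delegationTable"}) {
            TABLES.put(name, new CacheTable());
        }
    }

    // Called after the DoubleBuffer flushes a batch of transactions.
    static void cleanupAll(long flushedEpoch) {
        for (CacheTable t : TABLES.values()) {
            t.cleanupCache(flushedEpoch);
        }
    }

    public static void main(String[] args) {
        cleanupAll(42L);
        System.out.println("all caches cleaned to epoch 42");
    }
}
```

The design point is simply that cleanup driven by a registry of all tables stays correct when a new table is added, whereas an enumerated list silently goes stale.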
[jira] [Issue Comment Deleted] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2356: - Comment: was deleted (was: Has the above error caused crash in OM? If so, can you share stack trace?) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable the ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and writing to the mount path. The > dataset has various sizes of files from 0 bytes to GB-level, around 50,000 files in total. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. Looking > at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors > related to Multipart upload. This error eventually causes the writing to > terminate and the OM to shut down. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239 bucket: ozone-test key: 20191012/plc_1570863541668_9278
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
> at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
> at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
> at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
> at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
> at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
> at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>
> The following error has been resolved in https://issues.apache.org/jira/browse/HDDS-2322.
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush thread OMDoubleBufferFlushThread encountered Throwable error
> java.util.ConcurrentModificationException
> at java.util.TreeMap.forEach(TreeMap.java:1004)
> at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
> at
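The ConcurrentModificationException in the trace above comes from serializing a live TreeMap (OmMultipartKeyInfo.getProto iterates it with forEach) while another thread mutates it. A small self-contained demonstration of the failure mode and the usual remedy, serializing from a defensive copy; the map contents and method names here are illustrative, not the HDDS-2322 patch itself:

```java
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class CmeDemo {
    // Mutating a TreeMap inside its own forEach (a same-thread stand-in for a
    // racing writer adding a part mid-serialization) trips TreeMap's
    // fail-fast modCount check.
    static boolean liveIterationThrows(TreeMap<Integer, String> parts) {
        try {
            parts.forEach((k, v) -> parts.put(99, "late"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Remedy: snapshot first, then iterate the copy. Mutations to the
    // original during iteration no longer disturb the walk.
    static String serializeSnapshot(TreeMap<Integer, String> parts) {
        TreeMap<Integer, String> snapshot = new TreeMap<>(parts);
        StringBuilder proto = new StringBuilder();
        snapshot.forEach((k, v) -> {
            parts.put(100 + k, "concurrent-" + k); // simulated racing writer
            proto.append(k).append(':').append(v).append(';');
        });
        return proto.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> parts = new TreeMap<>();
        parts.put(1, "part-1");
        parts.put(2, "part-2");
        System.out.println("live map threw CME: " + liveIterationThrows(parts));
        System.out.println("serialized from snapshot: " + serializeSnapshot(parts));
    }
}
```

Copying the part list before building the protobuf (or holding the bucket lock across serialization) is the general shape of the fix; the snapshot costs an allocation but makes the flush thread immune to concurrent part uploads.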
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964586#comment-16964586 ] Bharat Viswanadham commented on HDDS-2356: -- Has the above error caused a crash in the OM? If so, can you share the stack trace? > Multipart upload report errors while writing to ozone Ratis pipeline > --
[jira] [Commented] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964585#comment-16964585 ] Bharat Viswanadham commented on HDDS-2395: -- Hi [~timmylicheng] The Exclude List issue is fixed as part of HDDS-2381. Thanks. > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When 2 parts are uploaded and the complete request lists only 1 part, there is no error. > # During complete multipart upload, if a part name/part number does not match an uploaded part, return an InvalidPart error. > # When parts are not specified in sorted order, return InvalidPartOrder. > # During complete multipart upload with no uploaded parts, specifying any parts also returns InvalidPart. > # With parts 1, 2, 3 uploaded, complete can be done with parts 1, 3 (no error). > # When only part 3 is uploaded, complete with part 3 can be done.
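The rules listed in HDDS-2395 amount to one validation pass over the client's part list against the recorded uploads. A sketch of that check, not the actual OM code; the error names follow AWS S3, and the maps and `validate` signature are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompleteMpuCheck {
    enum Result { OK, INVALID_PART, INVALID_PART_ORDER }

    /**
     * uploadedParts: partNumber -> partName recorded at upload time.
     * requested: (partNumber, partName) pairs from the complete request,
     * in the order the client sent them.
     */
    static Result validate(Map<Integer, String> uploadedParts,
                           List<Map.Entry<Integer, String>> requested) {
        int prev = Integer.MIN_VALUE;
        for (Map.Entry<Integer, String> part : requested) {
            if (part.getKey() <= prev) {
                return Result.INVALID_PART_ORDER; // rule 3: ascending order
            }
            prev = part.getKey();
            String name = uploadedParts.get(part.getKey());
            if (name == null || !name.equals(part.getValue())) {
                return Result.INVALID_PART;       // rules 2 and 4
            }
        }
        // Rules 1, 5, 6: completing with any subset of uploaded parts is fine.
        return Result.OK;
    }

    public static void main(String[] args) {
        Map<Integer, String> uploaded = new LinkedHashMap<>();
        uploaded.put(1, "p1");
        uploaded.put(2, "p2");
        uploaded.put(3, "p3");

        // Parts 1 and 3 only: OK (rules 1 and 5).
        System.out.println(validate(uploaded,
                List.of(Map.entry(1, "p1"), Map.entry(3, "p3"))));
        // Unsorted order: rejected (rule 3).
        System.out.println(validate(uploaded,
                List.of(Map.entry(3, "p3"), Map.entry(1, "p1"))));
        // Never-uploaded part: rejected (rule 2).
        System.out.println(validate(uploaded,
                List.of(Map.entry(5, "p5"))));
    }
}
```

Checking order before membership matters: AWS reports InvalidPartOrder for an out-of-order list even when each listed part exists, and the loop above preserves that precedence.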
[jira] [Updated] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2395: - Issue Type: Bug (was: Task) > Handle Ozone S3 completeMPU to match with aws s3 behavior. > --
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964482#comment-16964482 ] Bharat Viswanadham commented on HDDS-2356: -- Opened HDDS-2359 to handle CompleteMPU error cases. > Multipart upload report errors while writing to ozone Ratis pipeline > --
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964482#comment-16964482 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/1/19 12:41 AM: Opened HDDS-2395 to handle CompleteMPU error cases. was (Author: bharatviswa): Opened HDDS-2359 to handle CompleteMPU error cases. > Multipart upload report errors while writing to ozone Ratis pipeline > --