Hi Luca,

Thanks for sharing your findings. 
It makes sense to me that the parallel storage/data dirs upgrade is likely the 
cause of the OOM that we hit. In our upgrade experience, we only hit the OOM 
issue on the cluster where the average number of blocks per DN is very high 
(~8M block replicas per DN).
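
As a rough back-of-envelope (replication factor 3 and roughly one block per 
file on your side are my guesses):

    ~50M files x 3 replicas / 60 DNs ~= 2.5M block replicas per DN
    16GB heap / 2.5M replicas        ~= ~6.5KB of heap per replica
    8M replicas x ~6.5KB             ~= ~52GB, roughly in line with the
                                        64GB that worked for us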

-Jason

On 2/14/21, 11:41 PM, "Luca Toscano" <toscano.l...@gmail.com> wrote:

    This is my idea of what happened, let me know if it makes sense
    (question extended to everybody reading, of course!):

    * Any upgrade from 2.6 to something more recent needs to go through a
    restructuring of the datanode volumes/directories, as described in
    https://issues.apache.org/jira/browse/HDFS-3290 and
    https://issues.apache.org/jira/browse/HDFS-6482 .
    * From https://issues.apache.org/jira/browse/HDFS-8782 it seems that
    the procedure takes time, and until the volumes are upgraded the DN
    doesn't register with the NameNode. This is what we observed during
    the upgrade: a lot of DNs took a long time to register, but
    eventually they all did (without hitting OOMs).
    * https://issues.apache.org/jira/browse/HDFS-8578 was created to
    process the datanode volumes/dirs in parallel on upgrade
    (independently from the datanode dir structure upgrade mentioned
    above, IIUC), but this may cause OOMs, as described in
    https://issues.apache.org/jira/browse/HDFS-9536 (which looks like an
    open problem); see the config sketch after this list.
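
    For reference, a minimal sketch of how that parallelism could be
    capped, assuming dfs.datanode.parallel.volumes.load.threads.num (the
    knob HDFS-8578 introduced, if I read it correctly) is what applies
    here:

        <!-- hdfs-site.xml: limit the threads used to load/upgrade the
             volumes, trading upgrade speed for lower peak heap usage -->
        <property>
          <name>dfs.datanode.parallel.volumes.load.threads.num</name>
          <value>1</value>
        </property>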

    In theory, then, any upgrade from a distro shipping 2.6 (like CDH 5)
    needs to go through the directory restructuring, and additionally any
    upgrade can hit OOMs due to the parallel processing of storage
    volumes/dirs. Does that make sense?

    Luca

    On Sat, Feb 13, 2021 at 7:11 PM Luca Toscano <toscano.l...@gmail.com> wrote:
    >
    > Hi Jason,
    >
    > Thanks a lot for sharing your story too, I definitely feel way better
    > about the upgrade plan that we used, knowing that the exact same issue
    > happened to other people. I tried to check in Hadoop's Jira whether
    > this upgrade memory requirement was mentioned, but didn't find
    > anything. Do you have some more info to share about how to best size
    > the DNs' JVM heaps before the upgrade starts? In my case it was a
    > restart/fail/double-the-heap procedure until we found that 16G was a
    > good value for our DNs, but I see that in your case it was probably
    > worse (4GB -> 64GB). I wouldn't really know what to suggest to
    > somebody doing a similar upgrade and asking for advice, and since you
    > encountered the issue upgrading to Hadoop 3.x this will also be
    > relevant for people upgrading from Bigtop 1.4/1.5 to the future 3.x
    > release. The more info we can collect, the better for the community
    > in my opinion!
    >
    > Luca
    >
    > On Fri, Feb 12, 2021 at 7:49 PM Jason Wen <zhenshan....@workday.com> wrote:
    > >
    > > Hi Luca,
    > >
    > > Thanks for sharing your upgrade experience.
    > > We hit the exact same HDFS inconsistent-status issue when we upgraded
    > > one cluster from CDH 5.16.2 to CDH 6.3.2. At that time some DNs crashed
    > > due to OOM and some other DNs were still running but failed to upgrade
    > > their volumes. We finally resolved the issue by increasing the max heap
    > > size from 4GB to 64GB (our DNs have either 256GB or 512GB of memory)
    > > and then restarting all the DNs.
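    > >
    > > In case it helps others, a minimal sketch of that heap bump for a
    > > vanilla hadoop-env.sh (we applied it through CDH's tooling, so treat
    > > the exact variable name as an assumption for your setup):
    > >
    > >     # hadoop-env.sh: raise the DataNode max heap before restarting.
    > >     # Hadoop 2.x reads HADOOP_DATANODE_OPTS; on Hadoop 3.x the
    > >     # equivalent is HDFS_DATANODE_OPTS.
    > >     export HADOOP_DATANODE_OPTS="-Xmx64g ${HADOOP_DATANODE_OPTS}"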
    > >
    > > -Jason
    > >
    > > On 2/12/21, 12:52 AM, "Luca Toscano" <toscano.l...@gmail.com> wrote:
    > >
    > >     Hi everybody,
    > >
    > >     We have finally migrated our CDH cluster to Bigtop 1.5, so I can say
    > >     that we are now happy Bigtop users :)
    > >
    > >     The upgrade of the production cluster (60 worker nodes, ~50M
    > >     files on HDFS) was harder than I expected, since we bumped into a
    > >     strange performance issue that slowed down the HDFS upgrade. I
    > >     wrote a summary in
    > >     https://phabricator.wikimedia.org/T273711#6818136 for whoever is
    > >     interested; it is surely something to highlight in the CDH->Bigtop
    > >     guide. Speaking of which, the last thing that we did was starting
    > >     https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit
    > >     some time ago, so I am wondering if we could find a more permanent
    > >     location. Would it make sense to start a wiki page somewhere? Or
    > >     even a .md file in the GitHub repo, as you prefer (the latter
    > >     would be more convenient for reviewers, etc.).
    > >
    > >     Anyway, thanks a lot to all for the support! It was a looong project
    > >     but we eventually did it!
    > >
    > >     Luca
    > >
