** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/
  
  [Impact]
  
  Kernels 5.0-rc1 and later cannot mount a multi-tier cifs DFS setup,
  while kernels 4.20 and earlier mount the same share without problems.
  
  The DFS tiering structure looks like this:
  
  Domain virtual DFS (i.e. \\company.com\folders\share)
  |-- Domain controller DFS (i.e. \\regional-dc.company.com\folders\share)
-     |-- Regional DFS Server (i.e. \\regional-dfs.company.com\folders\share)
-         |-- Actual file server (i.e. \\regional-svr.company.com\share)
+     |-- Regional DFS Server (i.e. \\regional-dfs.company.com\folders\share)
+         |-- Actual file server (i.e. \\regional-svr.company.com\share)
  
  On the 5.x series kernels, the DFS referral chain is followed as far as
  the Regional DFS Server, which responds with the correct server/share.
  Instead of then connecting to the Actual file server, the kernel
  backtracks from the Regional DFS Server to the Domain controller and
  requests the share there. This share does not exist on the Domain
  controller, as it only exists on the Actual file server, so the
  connection fails.
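
  The difference between descending and backtracking can be sketched with
  a toy model. This is not the kernel's code: the referral table, the
  function names and the rule that a server only answers for paths it
  hosts are all assumptions made for illustration, and backtracking is
  simplified to "always ask the previously contacted server".

```python
# Toy referral table built from the example tree above (illustrative only).
REFERRALS = {
    r"\\company.com\folders\share": r"\\regional-dc.company.com\folders\share",
    r"\\regional-dc.company.com\folders\share": r"\\regional-dfs.company.com\folders\share",
    r"\\regional-dfs.company.com\folders\share": r"\\regional-svr.company.com\share",
}


def host_of(unc_path):
    """Return the server component of a UNC path."""
    return unc_path.lstrip("\\").split("\\", 1)[0]


def get_referral(server, path):
    """Assumed rule: a server only answers referral requests for its own paths."""
    if host_of(path) != server:
        return None  # stands in for the failed referral request
    return REFERRALS.get(path)


def resolve(path, ask_newest_server=True):
    """Follow referrals until a path with no further referral is reached.

    With ask_newest_server=False, each request goes to the previously
    contacted server instead, which models the backtracking.
    """
    previous = host_of(path)
    while path in REFERRALS:
        current = host_of(path)
        server = current if ask_newest_server else previous
        target = get_referral(server, path)
        if target is None:
            return None  # the mount would fail here
        previous, path = current, target
    return path


good = resolve(r"\\company.com\folders\share")
bad = resolve(r"\\company.com\folders\share", ask_newest_server=False)
```

  In this model `good` ends at the Actual file server path, while `bad`
  is `None` because a request was sent to a server that does not host the
  path being resolved.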
  
  We have collected a packet capture, and the flow looks like this:
  
+ Legend:
+ --------------------------------------------------
+ DC = Domain Controller / Domain DFS Root
+ RDC = Regional Domain Controller / Domain DFS Root
+ RDS = Regional DFS Server
+ AFS = Actual File Server
+ 
  4.18.0-21-generic Ubuntu kernel - Good
  
- Host                                            request/response
- --------------------------------------------    ----------------------------------------------------
- Domain controller / Domain DFS Root             company.com\folders
- Domain controller / Domain DFS Root             Referral List
- Regional Domain Controller / Domain DFS Root    start convo
- Regional Domain Controller / Domain DFS Root    <Regional Domain Controller>\Folders\Country\<Share> referral
- Regional Domain Controller / Domain DFS Root    <Regional Domain Controller>\Folders\Country\<Share> referral
- Regional DFS server                             start convo
- Regional DFS server                             <Regional DFS Server>\Root\Country\<Share>
- Regional DFS server                             STATUS_PATH_NOT_COVERED
- Regional DFS server                             request referrals
- Regional DFS server                             Referral List
- Actual File Server                              convo started
- Actual File Server                              <Actual File Server>\<Share>
- Actual File Server                              Good response
+ Host:   request/response
+ --------------------------------------------------------------------
+ DC:     company.com\folders
+ DC:     Referral List
+ RDC:    start convo
+ RDC:    <Regional Domain Controller>\Folders\Country\<Share> referral
+ RDC:    <Regional Domain Controller>\Folders\Country\<Share> referral
+ RDS:    start convo
+ RDS:    <Regional DFS Server>\Root\Country\<Share>
+ RDS:    STATUS_PATH_NOT_COVERED
+ RDS:    request referrals
+ RDS:    Referral List
+ AFS:    convo started
+ AFS:    <Actual File Server>\<Share>
+ AFS:    Good response
  
  5.0.0-26-generic Ubuntu kernel - Bad
  
- Host                                            request/response
- --------------------------------------------    -------------------------------------------
- Domain controller / Domain DFS Root             company.com\folders
- Regional Domain Controller / Domain DFS Root    start convo
- Regional Domain Controller / Domain DFS Root    <Regional Domain Controller>\Folders\Country\<Share>
- Regional Domain Controller / Domain DFS Root    STATUS_PATH_NOT_COVERED
- Regional DFS server                             start convo
- Regional DFS server                             <Regional DFS Server>\Root\Country\<Share>
- Regional DFS server                             STATUS_PATH_NOT_COVERED
- Regional Domain Controller / Domain DFS Root    <Regional DFS Server>\Root\Country\<Share>
- Regional Domain Controller / Domain DFS Root    STATUS_PATH_NOT_COVERED
+ Host:   request/response
+ ------------------------------------------------------------
+ DC:     company.com\folders
+ RDC:    start convo
+ RDC:    <Regional Domain Controller>\Folders\Country\<Share>
+ RDC:    STATUS_PATH_NOT_COVERED
+ RDS:    start convo
+ RDS:    <Regional DFS Server>\Root\Country\<Share>
+ RDS:    STATUS_PATH_NOT_COVERED
+ RDC:    <Regional DFS Server>\Root\Country\<Share>
+ RDC:    STATUS_PATH_NOT_COVERED
  
  From there the debugging output was more or less the same between the
  two kernel versions, until the problematic area:
  
  Linux 4.18:
  
  Full log: https://paste.ubuntu.com/p/D9XwBbvTXc/
  
  Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
  fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
  fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
  fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
  fs/cifs/misc.c: num_referrals: 1 dfs flags: 0x2 ...
  fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: <Actual File Server> to <IPV4 Address>
  fs/cifs/connect.c: Username: XXX
  // mounts the share successfully
  
  Linux 5.0:
  
  Full log: https://paste.ubuntu.com/p/9sXPj7WMQv/
  
  Status code returned 0xc0000257 STATUS_PATH_NOT_COVERED
  fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
  fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
  fs/cifs/connect.c: build_unc_path_to_root: full_path=\\<Regional DFS Server>\Root\Country\<Share>
  fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \<Regional DFS Server>\Root\Country\<Share>
  fs/cifs/dfs_cache.c: do_dfs_cache_find: cache miss
  fs/cifs/dfs_cache.c: do_dfs_cache_find: DFS referral request for \<Regional DFS Server>\Root\Country\<Share>
  fs/cifs/smb2ops.c: smb2_get_dfs_refer path <\<Regional DFS Server>\Root\Country\<Share>>
  fs/cifs/smb2pdu.c: SMB2 IOCTL
  Status code returned 0xc0000225 STATUS_NOT_FOUND
  fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000225 to POSIX err -2
  // mounting the share fails shortly after
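
  When comparing full debug logs from the two kernels, it can help to
  pull out just the status-code mappings. The following is a minimal
  sketch that assumes the log lines have the same shape as the excerpts
  above; the function name and the `sample` text are made up for
  illustration, and only the two status names seen in this report are
  known to it.

```python
import re

# Matches lines such as:
#   fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
MAPPING_RE = re.compile(
    r"Mapping SMB2 status code (0x[0-9a-fA-F]+) to POSIX err (-?\d+)"
)

# Status names copied from the two log excerpts in this report.
STATUS_NAMES = {
    0xC0000257: "STATUS_PATH_NOT_COVERED",
    0xC0000225: "STATUS_NOT_FOUND",
}


def parse_status_mappings(log_text):
    """Return (name, status_code, posix_err) for each mapping line found."""
    results = []
    for match in MAPPING_RE.finditer(log_text):
        code = int(match.group(1), 16)
        err = int(match.group(2))
        results.append((STATUS_NAMES.get(code, "unknown"), code, err))
    return results


# Sample input assembled from the Linux 5.0 excerpt above.
sample = """\
fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000257 to POSIX err -66
fs/cifs/smb2pdu.c: SMB2 IOCTL
fs/cifs/smb2maperror.c: Mapping SMB2 status code 0xc0000225 to POSIX err -2
"""

mappings = parse_status_mappings(sample)
```

  On the 5.0 excerpt this yields the STATUS_PATH_NOT_COVERED mapping
  followed by STATUS_NOT_FOUND, and the second entry is the one that is
  absent from the 4.18 log.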
  
  This has a significant impact on customers who need to mount multi-tier
  DFS shares, as they have to remain on the 4.15 bionic kernel and cannot
  use the HWE kernel for their machines.
  
  [Fix]
  
  After some debugging, I narrowed the cause down to a new DFS caching
  feature introduced in 5.0-rc1. I started a discussion with the upstream
  maintainer of cifs, which you can read here:
  
  https://lore.kernel.org/linux-cifs/05aa2995-e85e-0ff4-d003-5bb08bd17...@canonical.com/T/#u
  
  This discussion resulted in the below upstream commit, which was merged
  in the 5.5 development window:
  
  commit 5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
  Author: Paulo Alcantara (SUSE) <p...@cjr.nz>
  Date:   Fri Nov 22 12:30:56 2019 -0300
  Subject: cifs: Fix retrieval of DFS referrals in cifs_mount()
  
  You can read it here:
  
  https://github.com/torvalds/linux/commit/5bb30a4dd60e2a10a4de9932daff23e503f1dd2b
  
  This commit makes referral requests go to the most recently resolved
  root server, instead of an older one further up the chain. This ensures
  that we keep descending the tree instead of backtracking, which is what
  was happening.
  
  This commit has been submitted for upstream -stable, and is still being
  processed. The commit is needed on kernels 5.0 and up. I will update
  this section if it is accepted for -stable.
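
  As a rough aid for triage, the version range can be checked from a
  kernel release string. This sketch assumes that "merged in the 5.5
  development window" means mainline 5.5 and later carry the commit, and
  it knows nothing about -stable or distro backports, so a True result
  only means the kernel is worth checking. The function name is
  hypothetical.

```python
import re


def in_unfixed_mainline_range(release):
    """Return True if the release is >= 5.0 and < 5.5.

    Only the leading "major.minor" is examined; backported fixes in
    distro or -stable kernels are not detected.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    version = (int(match.group(1)), int(match.group(2)))
    return (5, 0) <= version < (5, 5)


# Release strings taken from the packet capture headings above.
bad_kernel = in_unfixed_mainline_range("5.0.0-26-generic")
good_kernel = in_unfixed_mainline_range("4.18.0-21-generic")
```

  On a running system the release string could be obtained with
  `platform.release()`.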
  
  [Testcase]
  
  To test this commit you need a multi-tier cifs DFS with a structure
  similar to the tree shown in the Impact section. From there, simply try
  to mount a cifs share.
  
  On patched kernels, the mount will succeed. On broken kernels, the mount
  will fail.
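
  A mount attempt could be scripted along these lines. This is only a
  sketch: the share path is the example from the Impact section, and the
  mount point, the username and the helper names are placeholders rather
  than values from the customer environment. Real setups will likely need
  further options (credentials, domain, security mode).

```python
def unc_to_mount_source(unc_path):
    """Convert a Windows-style UNC path to the //server/share form."""
    return unc_path.replace("\\", "/")


def build_mount_command(unc_path, mount_point, username):
    """Build an argv list for mounting a cifs share (nothing is executed)."""
    return [
        "mount", "-t", "cifs",
        unc_to_mount_source(unc_path),
        mount_point,
        "-o", "username=" + username,  # placeholder option set
    ]


# Example share from the Impact section; mount point and user are made up.
cmd = build_mount_command(r"\\company.com\folders\share", "/mnt/share", "testuser")
```

  The resulting list can be passed to `subprocess.run(cmd, check=True)`
  as root; a failing mount then raises `CalledProcessError`.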
  
  I have prepared a test kernel for Bionic HWE, based on 5.0.0-37.40~18.04
  which you can find here:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf245466-test
  
  This test kernel has been tested by the customer and mounts the cifs DFS
  correctly.
  
  [Regression Potential]
  
  I believe the risk of regression for this commit is low. All changes are
  limited to DFS within cifs, and only change which root server referral
  requests are sent to.
  
  The commit is a clean cherry pick for disco, eoan and focal. The
  maintainer has submitted the commit for upstream -stable, and we have
  tested the commit with the customer, and things are now working as
  intended.

** Description changed:

- BugLink: https://bugs.launchpad.net/bugs/
+ BugLink: https://bugs.launchpad.net/bugs/1854887
  

https://bugs.launchpad.net/bugs/1854887

Title:
  cifs: DFS Caching feature causing problems traversing multi-tier DFS
  setups
