Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-19 Thread Steve Thompson

On Wed, 18 Nov 2020, Faaland, Olaf P. wrote:

You mentioned you may have a fix for zfs_send.c in ZFS. Although Lustre 
tickles the bug, it is likely not the only way to tickle it.


Is there already a bug report for your issue at 
https://github.com/openzfs/zfs/issues?  If not, can you create one, even 
if your patch isn't successful?  That's the place to get your patch 
landed, and/or get help with the issue.


The issue is already reported as #8067. The patch mentioned in that issue 
is still valid, but for a different file; the txg_wait_synced() call is at 
line 2215 of dmu_send.c in ZFS 0.7.13. This patch does not fix the very 
slow 'zfs recv' performance, but it does fix the 'dataset does not exist' 
problem.
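
For anyone who wants to locate the call being referred to, a simple grep 
against a 0.7.13 source tree should find it (the module/zfs path is the 
usual layout for that release and is an assumption here, not quoted from 
the patch):

  grep -n txg_wait_synced module/zfs/dmu_send.c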


Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
  "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-18 Thread Faaland, Olaf P.
Hi Steve,

You mentioned you may have a fix for zfs_send.c in ZFS. Although Lustre 
tickles the bug, it is likely not the only way to tickle it.

Is there already a bug report for your issue at 
https://github.com/openzfs/zfs/issues?  If not, can you create one, even if 
your patch isn't successful?  That's the place to get your patch landed, and/or 
get help with the issue.

thanks,
-Olaf


From: lustre-discuss  on behalf of 
Steve Thompson 
Sent: Tuesday, November 10, 2020 5:06 AM
To: Hans Henrik Happe
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] ZFS w/Lustre problem

On Mon, 9 Nov 2020, Hans Henrik Happe wrote:

> It sounds like this issue, but I'm not sure what your dnodesize is:
>
> https://github.com/openzfs/zfs/issues/8458
>
> ZFS 0.8.1+ on the receiving side should fix it. Then again, ZFS 0.8 is
> not supported in Lustre 2.12, so it's a bit hard to restore without
> copying the underlying devices.

Hans Henrik,

Many thanks for your input. I had in fact known about the dnodesize issue,
and tested a workaround. Unfortunately, it turned out not to be the cause.
Instead, I have tested a patch to zfs_send.c, which does appear to have
solved the issue. The zfs send/recv is still running, however; if it
completes successfully, I will post again with details of the patch.

Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
   "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-10 Thread Steve Thompson

On Mon, 9 Nov 2020, Hans Henrik Happe wrote:


It sounds like this issue, but I'm not sure what your dnodesize is:

https://github.com/openzfs/zfs/issues/8458

ZFS 0.8.1+ on the receiving side should fix it. Then again, ZFS 0.8 is
not supported in Lustre 2.12, so it's a bit hard to restore without
copying the underlying devices.


Hans Henrik,

Many thanks for your input. I had in fact known about the dnodesize issue, 
and tested a workaround. Unfortunately, it turned out not to be the cause. 
Instead, I have tested a patch to zfs_send.c, which does appear to have 
solved the issue. The zfs send/recv is still running, however; if it 
completes successfully, I will post again with details of the patch.


Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
  "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-09 Thread Hans Henrik Happe
It sounds like this issue, but I'm not sure what your dnodesize is:

https://github.com/openzfs/zfs/issues/8458
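
For reference, dnodesize is a per-dataset ZFS property and can be read 
directly; the dataset name here just follows the fs0pool/mdt0 examples 
used elsewhere in this thread, so treat it as a placeholder:

  zfs get dnodesize fs0pool/mdt0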

ZFS 0.8.1+ on the receiving side should fix it. Then again, ZFS 0.8 is
not supported in Lustre 2.12, so it's a bit hard to restore without
copying the underlying devices.

Cheers,
Hans Henrik

On 06.11.2020 21.23, Steve Thompson wrote:
> This may be a question for the ZFS list...
>
> I have Lustre 2.12.5 on CentOS 7.8 with ZFS 0.7.13, 10 GbE network. I
> make snapshots of the Lustre filesystem with 'lctl snapshot_create'
> and at a later time transfer these snapshots to a backup system with
> zfs send/recv. This works well for everything but the MDT. For the
> MDT, I find that the zfs recv always fails when a little less than 1GB
> has been transferred (this being an incremental send/recv of snapshots
> taken a day apart):
>
> # zfs send -v -c -i fs0pool/mdt0@03-nov-2020 fs0pool/mdt0@04-nov-2020 | \
> zfs recv -F backups/fs0pool/mdt0
> 
> 12:11:18    946M   fs0pool/mdt0@04-nov-2020-01:00
> 12:11:19    946M   fs0pool/mdt0@04-nov-2020-01:00
> 12:11:20    946M   fs0pool/mdt0@04-nov-2020-01:00
> cannot receive incremental stream: dataset does not exist
>
> while if the data transfer is much smaller, the send/recv works. Since
> once I get a failure it is not possible to complete a send/recv for
> any subsequent day, I am doing a full snapshot send to a file; this
> always works and takes about 5-6 minutes for my MDT. When using zfs
> send/recv, the recv is always very, very slow (several hours to get to
> the above failure point, even when using mbuffer). I am using custom
> zfs replication scripts, but it also fails using the zrep package.
>
> Does anyone know of a possible explanation? Is there any version of
> ZFS 0.8 that works with Lustre 2.12.5?
>
> Thanks,
> Steve

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ZFS w/Lustre problem

2020-11-06 Thread Steve Thompson

This may be a question for the ZFS list...

I have Lustre 2.12.5 on CentOS 7.8 with ZFS 0.7.13, 10 GbE network. I make 
snapshots of the Lustre filesystem with 'lctl snapshot_create' and at a 
later time transfer these snapshots to a backup system with zfs send/recv. 
This works well for everything but the MDT. For the MDT, I find that the 
zfs recv always fails when a little less than 1GB has been transferred 
(this being an incremental send/recv of snapshots taken a day apart):


# zfs send -v -c -i fs0pool/mdt0@03-nov-2020 fs0pool/mdt0@04-nov-2020 | \
zfs recv -F backups/fs0pool/mdt0

12:11:18    946M   fs0pool/mdt0@04-nov-2020-01:00
12:11:19    946M   fs0pool/mdt0@04-nov-2020-01:00
12:11:20    946M   fs0pool/mdt0@04-nov-2020-01:00
cannot receive incremental stream: dataset does not exist

while if the data transfer is much smaller, the send/recv works. Since 
once I get a failure it is not possible to complete a send/recv for any 
subsequent day, I am doing a full snapshot send to a file; this always 
works and takes about 5-6 minutes for my MDT. When using zfs send/recv, 
the recv is always very, very slow (several hours to get to the above 
failure point, even when using mbuffer). I am using custom zfs replication 
scripts, but it also fails using the zrep package.
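
For concreteness, here is a minimal sketch of the daily cycle described 
above. The pool, dataset, and snapshot names follow the examples in this 
message; the Lustre fsname (fs0), the backup host, the output path, and 
the use of ssh as the transport are assumptions for illustration, not the 
actual replication scripts referred to above:

  # Create the day's Lustre-consistent snapshot (run on the MGS node).
  lctl snapshot_create -F fs0 -n 04-nov-2020

  # Incremental replication of the MDT dataset to the backup pool
  # (the step that fails with 'dataset does not exist' above).
  zfs send -v -c -i fs0pool/mdt0@03-nov-2020 fs0pool/mdt0@04-nov-2020 | \
      ssh backuphost zfs recv -F backups/fs0pool/mdt0

  # Fallback that always works: a full (non-incremental) stream written
  # to a file, to be replayed later with 'zfs recv' reading from it.
  zfs send -c fs0pool/mdt0@04-nov-2020 > /backup/mdt0-full-04-nov-2020.zstream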


Does anyone know of a possible explanation? Is there any version of ZFS 
0.8 that works with Lustre 2.12.5?


Thanks,
Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
  "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org