Re: posix_fallocate on ZFS

2018-02-10 Thread Willem Jan Withagen

On 11/02/2018 00:10, Alan Somers wrote:
On Sat, Feb 10, 2018 at 3:50 PM, Willem Jan Withagen wrote:


On 10/02/2018 20:43, Ian Lepore wrote:

On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:

On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
wrote:


Hi,

This has been disabled on ZFS since last November.
And I do understand the rationale on this.

BUT

I've now upgraded some of my HEAD Ceph test systems and
they now fail,
since Ceph uses posix_fallocate() to allocate space for the
FileStore-journal.

Is there any expectation that this is going to be fixed in
the near future?

--WjW

No.  It's fundamentally impossible to support
posix_fallocate on a COW
filesystem like ZFS.  Ceph should be taught to ignore an
EINVAL result,
since the system call is merely advisory.

-Alan


Unfortunately, POSIX documents that the function returns EINVAL only
due to bad input parameters, so ignoring that seems like a bad idea.

Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
situation?  That could be safely ignored.


It would probably help in my situation.

And I've been looking at the manpage, but cannot seem to find any
indication that EINVAL is returned when running it on FreeBSD.


It's in the manpage, but only on head.  It hasn't been in any stable 
release yet.

https://svnweb.freebsd.org/base/head/lib/libc/sys/posix_fallocate.2?revision=325422&view=markup#l112


Right, it is. And it is even in the man-page where I looked. :(
Just plainly read over it.

To be honest, I would expect it to have a bit more prose in the header of 
the manpage, because it is rather significant that it does not work on 
certain filesystems, and not just to hide this in a single line in the 
explanation of an error value...


--WjW

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
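The handling discussed in this thread — treating posix_fallocate() as advisory and tolerating both the EINVAL that FreeBSD head returns on ZFS and the EOPNOTSUPP alternative Ian raises — might be sketched like this. This is a hypothetical helper for illustration, not Ceph's actual code:

```c
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: treat posix_fallocate() as advisory.
 * Returns 0 when space was preallocated OR when the filesystem
 * (e.g. ZFS) cannot support preallocation; any other error
 * number is passed through to the caller.
 */
int
try_preallocate(int fd, off_t offset, off_t len)
{
	/* posix_fallocate() returns an error number directly, not -1/errno. */
	int rc = posix_fallocate(fd, offset, len);

	if (rc == EINVAL || rc == EOPNOTSUPP) {
		/* Unsupported here; carry on without the guarantee. */
		return (0);
	}
	return (rc);
}
```

A caller would still need to cope with running out of space later, since nothing was actually reserved in the unsupported case.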


Re: posix_fallocate on ZFS

2018-02-10 Thread Alan Somers
On Sat, Feb 10, 2018 at 3:50 PM, Willem Jan Withagen 
wrote:

> On 10/02/2018 20:43, Ian Lepore wrote:
>
>> On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:
>>
>>> On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
>>> wrote:
>>>
>>>
 Hi,

 This has been disabled on ZFS since last November.
 And I do understand the rationale on this.

 BUT

 I've now upgraded some of my HEAD Ceph test systems and they now fail,
 since Ceph uses posix_fallocate() to allocate space for the
 FileStore-journal.

 Is there any expectation that this is going to be fixed in the near future?

 --WjW

 No.  It's fundamentally impossible to support posix_fallocate on a COW
>>> filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
>>> since the system call is merely advisory.
>>>
>>> -Alan
>>>
>>
>> Unfortunately, POSIX documents that the function returns EINVAL only
>> due to bad input parameters, so ignoring that seems like a bad idea.
>>
>> Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
>> situation?  That could be safely ignored.
>>
>
> It would probably help in my situation.
>
> And I've been looking at the manpage, but cannot seem to find any
> indication that EINVAL is returned when running it on FreeBSD.
>

It's in the manpage, but only on head.  It hasn't been in any stable
release yet.

https://svnweb.freebsd.org/base/head/lib/libc/sys/posix_fallocate.2?revision=325422&view=markup#l112


Re: posix_fallocate on ZFS

2018-02-10 Thread Willem Jan Withagen

On 10/02/2018 20:43, Ian Lepore wrote:

On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:

On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
wrote:



Hi,

This has been disabled on ZFS since last November.
And I do understand the rationale on this.

BUT

I've now upgraded some of my HEAD Ceph test systems and they now fail,
since Ceph uses posix_fallocate() to allocate space for the
FileStore-journal.

Is there any expectation that this is going to be fixed in the near future?

--WjW


No.  It's fundamentally impossible to support posix_fallocate on a COW
filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
since the system call is merely advisory.

-Alan


Unfortunately, POSIX documents that the function returns EINVAL only
due to bad input parameters, so ignoring that seems like a bad idea.

Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
situation?  That could be safely ignored.


It would probably help in my situation.

And I've been looking at the manpage, but cannot seem to find any 
indication that EINVAL is returned when running it on FreeBSD.


--WjW


Re: posix_fallocate on ZFS

2018-02-10 Thread Ian Lepore
On Sat, 2018-02-10 at 12:45 -0700, Alan Somers wrote:
> On Sat, Feb 10, 2018 at 12:43 PM, Ian Lepore  wrote:
> 
> > 
> > On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:
> > > 
> > > On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
> > > wrote:
> > > 
> > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > This has been disabled on ZFS since last November.
> > > > And I do understand the rationale on this.
> > > > 
> > > > BUT
> > > > 
> > > > I've now upgraded some of my HEAD Ceph test systems and they now fail,
> > > > since Ceph uses posix_fallocate() to allocate space for the
> > > > FileStore-journal.
> > > > 
> > > > Is there any expectation that this is going to be fixed in the near
> > > > future?
> > > 
> > > > 
> > > > 
> > > > --WjW
> > > > 
> > > No.  It's fundamentally impossible to support posix_fallocate on a COW
> > > filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
> > > since the system call is merely advisory.
> > > 
> > > -Alan
> > Unfortunately, POSIX documents that the function returns EINVAL only
> > due to bad input parameters, so ignoring that seems like a bad idea.
> > 
> > Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
> > situation?  That could be safely ignored.
> > 
> > 
> I'm afraid you are mistaken.  POSIX _should have_ required EOPNOTSUPP here,
> but it actually requires EINVAL.
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html

Oops, I apparently was looking at the prior version of the spec.
Never mind. :)

-- Ian



Re: posix_fallocate on ZFS

2018-02-10 Thread Alan Somers
On Sat, Feb 10, 2018 at 12:43 PM, Ian Lepore  wrote:

> On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:
> > On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen
> > wrote:
> >
> > >
> > > Hi,
> > >
> > > This has been disabled on ZFS since last November.
> > > And I do understand the rationale on this.
> > >
> > > BUT
> > >
> > > I've now upgraded some of my HEAD Ceph test systems and they now fail,
> > > since Ceph uses posix_fallocate() to allocate space for the
> > > FileStore-journal.
> > >
> > > Is there any expectation that this is going to be fixed in the near
> > > future?
> > >
> > > --WjW
> > >
> > No.  It's fundamentally impossible to support posix_fallocate on a COW
> > filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
> > since the system call is merely advisory.
> >
> > -Alan
>
> Unfortunately, POSIX documents that the function returns EINVAL only
> due to bad input parameters, so ignoring that seems like a bad idea.
>
> Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
> situation?  That could be safely ignored.
>

I'm afraid you are mistaken.  POSIX _should have_ required EOPNOTSUPP here,
but it actually requires EINVAL.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html


Re: posix_fallocate on ZFS

2018-02-10 Thread Ian Lepore
On Sat, 2018-02-10 at 11:24 -0700, Alan Somers wrote:
> On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen 
> wrote:
> 
> > 
> > Hi,
> > 
> > This has been disabled on ZFS since last November.
> > And I do understand the rationale on this.
> > 
> > BUT
> > 
> > I've now upgraded some of my HEAD Ceph test systems and they now fail,
> > since Ceph uses posix_fallocate() to allocate space for the
> > FileStore-journal.
> > 
> > Is there any expectation that this is going to be fixed in the near future?
> > 
> > --WjW
> > 
> No.  It's fundamentally impossible to support posix_fallocate on a COW
> filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
> since the system call is merely advisory.
> 
> -Alan

Unfortunately, POSIX documents that the function returns EINVAL only
due to bad input parameters, so ignoring that seems like a bad idea.

Wouldn't it be better if we returned EOPNOTSUPP if that's the actual
situation?  That could be safely ignored.

-- Ian



Re: posix_fallocate on ZFS

2018-02-10 Thread Alan Somers
On Sat, Feb 10, 2018 at 11:50 AM, Willem Jan Withagen 
wrote:

> On 10/02/2018 19:24, Alan Somers wrote:
>
>> On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen wrote:
>>
>> Hi,
>>
>> This has been disabled on ZFS since last November.
>> And I do understand the rationale on this.
>>
>> BUT
>>
>> I've now upgraded some of my HEAD Ceph test systems and they now
>> fail, since Ceph uses posix_fallocate() to allocate space for the
>> FileStore-journal.
>>
>> Is there any expectation that this is going to be fixed in the near
>> future?
>>
>> --WjW
>>
>>
>> No.  It's fundamentally impossible to support posix_fallocate on a COW
>> filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
>> since the system call is merely advisory.
>>
>
> Yup, that is what I'm going to do.
> But then I would like to know how to annotate it.
>
> And I guess that I'd get reactions when submitting code to fix this, since
> the journal could run out of space.
> So I'd better know what is going on.
>
> I seem to remember that at the pool level it is possible to reserve space
> whilst creating a filesystem? Then it could/should be fixed when
> building the disk infrastructure for an OSD.
>

Yes, you can easily reserve space for an entire filesystem.  Just do, for
example, "zfs create -o reservation=64GB mypool/myfs".
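The reservation approach might look like this in practice; the pool and dataset names here are hypothetical, and the second command shows the related refreservation property that comes up later in the thread:

```sh
# Sketch: reserve space for a journal dataset at creation time.
# "reservation" accounts for the dataset and its descendants
# (including snapshots); "refreservation" guarantees space for the
# dataset's own data only.
zfs create -o reservation=64G mypool/ceph-journal
zfs get reservation,refreservation mypool/ceph-journal
```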


Re: posix_fallocate on ZFS

2018-02-10 Thread Garrett Wollman
asom...@freebsd.org writes:

>On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen 
>wrote:

>> Is there any expectation that this is going to be fixed in the near future?

>No.  It's fundamentally impossible to support posix_fallocate on a COW
>filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
>since the system call is merely advisory.

I don't think it's true that this is _fundamentally_ impossible.  What
the standard requires would in essence be a per-object refreservation.
ZFS supports refreservation, obviously, but not on a per-object basis.
Furthermore, there are mechanisms to preallocate blocks for things
like dumps.  So it *could* be done (as in, the concept is there), but
it may not be practical.  (And ultimately, there are ways in which the
administrator might manage the system that would defeat the desired
effect, but that's out of the standard's scope.)  Given the semantic
mismatch, though, I suspect it's unreasonable to expect anyone to
prioritize implementation of such a feature.

-GAWollman



Re: posix_fallocate on ZFS

2018-02-10 Thread Willem Jan Withagen

On 10/02/2018 19:24, Alan Somers wrote:
On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen wrote:


Hi,

This has been disabled on ZFS since last November.
And I do understand the rationale on this.

BUT

I've now upgraded some of my HEAD Ceph test systems and they now
fail, since Ceph uses posix_fallocate() to allocate space for the
FileStore-journal.

Is there any expectation that this is going to be fixed in the near future?

--WjW


No.  It's fundamentally impossible to support posix_fallocate on a COW 
filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result, 
since the system call is merely advisory.


Yup, that is what I'm going to do.
But then I would like to know how to annotate it.

And I guess that I'd get reactions when submitting code to fix this, since 
the journal could run out of space.

So I'd better know what is going on.

I seem to remember that at the pool level it is possible to reserve space 
whilst creating a filesystem? Then it could/should be fixed when 
building the disk infrastructure for an OSD.


--WjW




Re: posix_fallocate on ZFS

2018-02-10 Thread Alan Somers
On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen 
wrote:

> Hi,
>
> This has been disabled on ZFS since last November.
> And I do understand the rationale on this.
>
> BUT
>
> I've now upgraded some of my HEAD Ceph test systems and they now fail,
> since Ceph uses posix_fallocate() to allocate space for the
> FileStore-journal.
>
> Is there any expectation that this is going to be fixed in the near future?
>
> --WjW
>

No.  It's fundamentally impossible to support posix_fallocate on a COW
filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
since the system call is merely advisory.

-Alan


posix_fallocate on ZFS

2018-02-10 Thread Willem Jan Withagen

Hi,

This has been disabled on ZFS since last November.
And I do understand the rationale on this.

BUT

I've now upgraded some of my HEAD Ceph test systems and they now fail, 
since Ceph uses posix_fallocate() to allocate space for the 
FileStore-journal.


Is there any expectation that this is going to be fixed in the near future?

--WjW


Re: VIMAGE: vnet, epair and lots of jails on bridgeX - routing

2018-02-10 Thread Olivier Cochard-Labbé
On Sat, Feb 10, 2018 at 8:52 AM, O. Hartmann  wrote:

>
> The moment any of the bridges gets an additional member epair interface
> (so the bridge has at least three members, including the one reaching into
> the virtual router jail), the vbridge seems to operate unpredictably (to
> me). Jails that are members of that vbridge become unreachable to ping.
>
>
​First idea:
Did you try a simpler setup, like just 3 jails as members of one
bridge?
=> I've tried it on -head, and all 4 members (3 jails and the host)
manage to communicate.

Second idea:
Can you check that all epairs each have a different MAC address?
I hit this bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=176671

Regards,

Olivier


Lua Loader changes posted

2018-02-10 Thread Warner Losh
I'm at a good point in the lua loader, so I've posted changes to
https://reviews.freebsd.org/D14295 for review.

It works great in the userboot.so loader test harness, but I've not booted
this on real hardware yet, so anyone wanting to take it for a spin should be
sure they have an old, working loader or some other way to recover their
machine / VM.

It's purely opt-in right now. You must build WITHOUT_FORTH=yes and
WITH_LOADER_LUA=yes. The good news is you don't need a full buildworld. You
can just build stand (but don't forget the above options on both the build
and the install).

It's getting close. But there's no man page (it will be committed without
one unless someone else writes it). I've done a lot of cleanup of the
original GSoC code that was then cleaned up a bit by Rui, etc (all the
names of the contributors are in the review and will be in the commit
message). However, I'm sure the lua code could use some more cleanup. The
menu code is especially inefficient at drawing boxes and could likely use
some additional cleanup. Despite the slight roughness around the edges, I'm
grateful for Zakary Nafziger's menu changes from this past fall. They look a
lot better than the final GSoC state. I arbitrarily decided to follow the
same conventions we do with style(9) for the lua code.

The loader and os module code (the C code in lutil.c) could also use some
close scrutiny. I've cleaned up the Lua code to match the new lua modules
stuff, but haven't done this code at all and it likely could benefit from a
final pass there. The original code was written before you could have
integers, so there may be a couple of stray places that still use numbers.
If anyone with a lot of Lua embedding experience could look, that would be
great.

Please let me know what you think. It's been a long road to get /stand into
decent enough shape to allow us to build lua w/o changes to lua (OK, only one
change to lua, since there was a format that was forced to be floating point
and couldn't be changed with a #define). It's also quite a bit different than
either the final GSoC code, or the Rui branch, or the cleaned up, forward
ported Rui branch. Most of my changes have been to the C glue code, though
I did some tweaks to the lua code.

Warner
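The opt-in build described above might be sketched like this; the source tree path is an assumption, and as noted in the message the options must be given on both the build and the install:

```sh
# Hypothetical sketch of the opt-in lua loader build.
cd /usr/src/stand
make WITHOUT_FORTH=yes WITH_LOADER_LUA=yes clean all
make WITHOUT_FORTH=yes WITH_LOADER_LUA=yes install
```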


Re: VIMAGE: vnet, epair and lots of jails on bridgeX - routing

2018-02-10 Thread Marko Zec
On Sat, 10 Feb 2018 08:52:21 +0100
"O. Hartmann"  wrote:

> Am Fri, 09 Feb 2018 16:43:17 +
> "Bjoern A. Zeeb"  schrieb:
> 
> > On 9 Feb 2018, at 16:22, O. Hartmann wrote:
> >   
> > > Am Thu, 8 Feb 2018 09:31:15 +0100
> > > "O. Hartmann"  schrieb:
> > >
> > > Is this problem too trivial?
> > 
> > I read through it yesterday and found myself in the position that I
> > need a whiteboard or paper and pencil or an ASCII art of your
> > situation.  But by the time I made it to the question I was
> > basically lost.  Could you massively simplify this and maybe
> > produce the ASCII art?
> > 
> > /bz
> > ___
> > freebsd-current@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to
> > "freebsd-current-unsubscr...@freebsd.org"  
> 
> All right.
> 
> I'm not much of an artist and at this very moment, I haven't much
> experience with neat ASCII art tools. But I'll provide a sketch
> later, but I also will simplify  the situation.
> 
> Consider three "vswitches", basically based on the creation of
> bridges, bridge0, bridge1, bridge2. Create at least three individual
> vnet-jails attached to each vbridge. Those jails have epair pseudo
> devices. The jail itself owns the "a-part" of the epair and the
> b-part is "member of the bridge". Each jail's epairXXXa has an IP
> assigned of the network the vswitch is part of. I mention a- and
> b-part of the epair here, because I thought it could matter, but I
> think for symmetry reasons it doesn't.
> 
> Now consider a further, special jail. This jail is supposed to have
> three epair devices, each one is reaching into one of the vbridges.
> This jail is the router/routing jail. Later, this jail should filter
> via IPFW the traffic between the three vbridges according to rules,
> but this doesn't matter here, beacuase the basics are not working as
> expected.
> 
> Now the problems. It doesn't matter which jail of the three
> vswitches I log in to: the moment a vbridge has more than two member
> epairs (one is always a member of the routing jail; now consider a
> database jail and a webserver jail), pinging each jail or the routing
> jail fails. It works sometimes for a couple of ICMP packets and then
> stops.
> 
> If each vbridge has only one member jail, I have NO PROBLEMS
> traversing accordingly to the static routing rules from one vbridge
> to any other, say from vbridge1 to vbridge0 or vbridge2 and any
> permutation of that.
> 
> The moment any of the bridges gets an additional member epair
> interface (so the bridge has at least three members, including the one
> reaching into the virtual router jail), the vbridge seems to operate
> unpredictably (to me). Jails that are members of that vbridge become
> unreachable to ping.
> 
> Technical information:
> 
> The kernel has options IPFIREWALL, VIMAGE. The host's ipfw (kernel)
> declines packets by default. Each jail is configured to have ipfw
> "open".
> 
> Thanks for the patience.

If you could provide a script which sets up the topology you described
in two lengthy posts then others could reproduce this, and your chances
of getting useful feedback would certainly increase.
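A minimal script along those lines might be sketched as follows; the interface units, jail names, and addresses are all hypothetical, chosen only to illustrate one bridge with three vnet jails attached:

```sh
#!/bin/sh
# Hypothetical sketch: one bridge, three vnet jails, the epair
# a-sides moved into the jails, the b-sides made bridge members.
ifconfig bridge0 create up
n=0
for j in router db web; do
    jail -c name=${j} vnet persist
    ifconfig epair${n} create
    ifconfig epair${n}a vnet ${j}
    jexec ${j} ifconfig epair${n}a 192.0.2.$((n + 1))/24 up
    ifconfig epair${n}b up
    ifconfig bridge0 addm epair${n}b
    n=$((n + 1))
done
```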

We also have a graphical tool (https://github.com/imunes/imunes) which
can set up a topology like you described in a few clicks of a mouse,
albeit using netgraph and ng_eiface instead of epairs, but I assume this
is irrelevant as long as you are not aiming for maximum packet
throughput.  If you attempt to use this tool, note that selecting
"stpswitch" will create if_bridge instances, whereas "lanswitch"
creates ng_bridge instances.

Good luck,

Marko


