Re: [RFC] Replace gnu groff in base by heirloom doctools

2015-05-13 Thread Chris H
On Thu, 14 May 2015 02:02:11 +0200 Baptiste Daroussin  wrote

> Hi,
> 
> I plan to work on replacing GNU groff in base with heirloom doctools for
> FreeBSD 11.0.
> 
> This mostly concerns documentation in share/docs and the fallback when
> mandoc(1) is not able to render a manpage.
> 
> Heirloom doctools has progressed a lot recently and is now able to correctly
> render all the documents we provide in base. It is actively developed and
> integrates new features quickly.
> 
> Upstream has been very responsive to the bug reports I have sent and fixed
> them very quickly.
> 
> Heirloom has multiple advantages over GNU groff:
> - it is licensed partially under CDDL and partially under BSD.
> - it is mainly written in C (with the exception of a single tool in C++,
>   which I do not plan to import)
> - it is derived from the original macros from AT&T (in particular ms(7))
> - it is smaller than GNU groff
> - it has better Unicode support than GNU groff
> - it has better error reporting than GNU groff (which allowed me to fix a
>   couple of the documents there)
> - heirloom manpages are mandoc(1) friendly, which is not the case for GNU
>   groff's
> 
> I only plan to incorporate part of it, keeping our own versions of the tools
> we already have, like col(1), soelim(1), checknr(1) and vgrind(1).
> 
> mandoc(1) is still the target for rendering manpages, but I think keeping
> a fully functional roff(7) toolchain as part of the base system is very good
> on a unix.
> 
> The issue we have with GNU groff is that newer versions are under GPLv3, so we
> cannot upgrade the base version to a newer one. The base version of GNU groff
> is a stripped-down version, so users wanting to use some extra functionality
> of GNU groff will have to rely on the ports tree.
> 
> What has already been done:
> - col(1): updated and fixed, based on work with the heirloom doctools and
>   collaboration with OpenBSD folks. While there I have already sandboxed
>   col(1) using capsicum.
> - checknr(1): now handles more roff(7) commands.
> - vgrind(1): modernized the code base and synchronized some changes from
>   NetBSD
> 
> I plan to import heirloom doctools later this month.
> 
> So far the only issue we have is with documents using pic(1) when rendering
> in ASCII (PostScript and PDF rendering are OK). Upstream is working on a fix,
> but I do not consider this a blocker.
> 
> Allowing both GNU groff and heirloom to coexist, switchable via an option,
> will be hard, so I plan to make the switch all at once.
> 
> From what I could check, I cannot find any regression when migrating from GNU
> groff to heirloom doctools. If there is a particular area where you think
> extra care is needed, please share it.
> 
> Heirloom doctools: https://github.com/n-t-roff/heirloom-doctools
> 
> Best regards,
> Bapt
+1
Please do, and *thank you* for all the work you put into this!

--Chris


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [RFC] Replace gnu groff in base by heirloom doctools

2015-05-13 Thread Pedro Giffuni
+1

Great idea.

Pedro.


[RFC] Replace gnu groff in base by heirloom doctools

2015-05-13 Thread Baptiste Daroussin
Hi,

I plan to work on replacing GNU groff in base with heirloom doctools for
FreeBSD 11.0.

This mostly concerns documentation in share/docs and the fallback when mandoc(1)
is not able to render a manpage.

Heirloom doctools has progressed a lot recently and is now able to correctly
render all the documents we provide in base. It is actively developed and
integrates new features quickly.

Upstream has been very responsive to the bug reports I have sent and fixed
them very quickly.

Heirloom has multiple advantages over GNU groff:
- it is licensed partially under CDDL and partially under BSD.
- it is mainly written in C (with the exception of a single tool in C++, which
  I do not plan to import)
- it is derived from the original macros from AT&T (in particular ms(7))
- it is smaller than GNU groff
- it has better Unicode support than GNU groff
- it has better error reporting than GNU groff (which allowed me to fix a couple
  of the documents there)
- heirloom manpages are mandoc(1) friendly, which is not the case for GNU
  groff's

I only plan to incorporate part of it, keeping our own versions of the tools we
already have, like col(1), soelim(1), checknr(1) and vgrind(1).

mandoc(1) is still the target for rendering manpages, but I think keeping a
fully functional roff(7) toolchain as part of the base system is very good on a
unix.

The issue we have with GNU groff is that newer versions are under GPLv3, so we
cannot upgrade the base version to a newer one. The base version of GNU groff is
a stripped-down version, so users wanting to use some extra functionality of
GNU groff will have to rely on the ports tree.

What has already been done:
- col(1): updated and fixed, based on work with the heirloom doctools and
  collaboration with OpenBSD folks. While there I have already sandboxed
  col(1) using capsicum.
- checknr(1): now handles more roff(7) commands.
- vgrind(1): modernized the code base and synchronized some changes from NetBSD

I plan to import heirloom doctools later this month.

So far the only issue we have is with documents using pic(1) when rendering in
ASCII (PostScript and PDF rendering are OK). Upstream is working on a fix, but I
do not consider this a blocker.

Allowing both GNU groff and heirloom to coexist, switchable via an option,
will be hard, so I plan to make the switch all at once.

From what I could check, I cannot find any regression when migrating from GNU
groff to heirloom doctools. If there is a particular area where you think extra
care is needed, please share it.

Heirloom doctools: https://github.com/n-t-roff/heirloom-doctools

Best regards,
Bapt




r282420 omits /usr/lib/private/libssh_p.a

2015-05-13 Thread Trond Endrestøl
make delete-old can't finish off the /usr/lib/private directory due to 
the presence of libssh_p.a. Manual intervention is required UFN.


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Ian Lepore
On Wed, 2015-05-13 at 11:13 -0700, John-Mark Gurney wrote:
> Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> > The reason I ask about "why is it faster?" is because for embedded-y
> > things with low RAM we may not want that to happen due to memory
> > constraints. However, we may actually want to do some form of
> > autotuning on some platforms.
> 
> If you're already running a program, the difference between 1k and
> 8k isn't significant... I'll give you 64k can be significant for
> embedded-y platforms...  But this goes back to the, we need a global
> knob saying I want low memory usage, and I am willing to pay for it
> in performance...
> 

It is NOT just a difference of 1K vs 8K.  It's that much times however
many BUFSIZ-sized things a program has allocated at once.  It's where
they are allocated.  As I've already pointed out, BUFSIZ appears in the
base code over 2000 times.  Where is the analysis of the impact an 8x
change is going to have on all those uses?

-- Ian




Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> The reason I ask about "why is it faster?" is because for embedded-y
> things with low RAM we may not want that to happen due to memory
> constraints. However, we may actually want to do some form of
> autotuning on some platforms.

If you're already running a program, the difference between 1k and
8k isn't significant... I'll give you 64k can be significant for
embedded-y platforms...  But this goes back to the, we need a global
knob saying I want low memory usage, and I am willing to pay for it
in performance...

> So, if it's underlying block size, maybe BUFSIZ isn't the thing to
> tweak, but based on disk io buffer size.
> If it's filling L1 or L2 cache with useful work, maybe auto-tune it
> based on that.

I'm pretty sure this is simply that syscalls+copies are expensive,
and larger block sizes reduce the number of calls. Going from 1k to
64k means 64 times fewer syscalls...

So, in my benchmark, we went from 148271 syscalls/second to 3228
syscalls/second for 64k block size, and we got a 40% perf increase on
top of this...  i.e. we spend ~40% of the cpu time to do 145k syscalls
instead of doing real work...
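
The arithmetic behind those figures can be sketched in a few lines of C. This is only an illustration, not code from the benchmark: the function names are made up, every read is assumed to fill its buffer completely, and "MB" is taken to mean 2^20 bytes. Under those assumptions the 145MB/sec and 202MB/sec throughputs reported elsewhere in this thread give roughly the syscall rates quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of full-buffer reads needed to consume
 * file_size bytes (the final zero-length EOF read is not counted).
 */
uint64_t
reads_needed(uint64_t file_size, uint64_t bufsize)
{
	assert(bufsize > 0);
	return ((file_size + bufsize - 1) / bufsize);
}

/*
 * Hypothetical helper: reads per second implied by a throughput given
 * in MB/sec (1 MB assumed to be 1048576 bytes) and a buffer size.
 */
uint64_t
reads_per_second(uint64_t mb_per_sec, uint64_t bufsize)
{
	assert(bufsize > 0);
	return (mb_per_sec * 1048576 / bufsize);
}

/*
 * Example values for a 512MB file:
 *   reads_needed(512 << 20, 1024)   is 524288
 *   reads_needed(512 << 20, 65536)  is 8192 (64 times fewer)
 *   reads_per_second(145, 1024)     is 148480, close to the measured 148271
 *   reads_per_second(202, 65536)    is 3232, close to the measured 3228
 */
```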

> Please don't take this as bikeshedding, I'd really like to see some
> "this is why it's faster" analysis rather than just numbers thrown
> around.

I don't really see a need to analyze this any more... We are batching
work in a more efficient manner...  I could list many other examples
of where we do similar optimizations...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Hans Petter Selasky wrote this message on Wed, May 13, 2015 at 10:35 +0200:
> On 05/13/15 10:27, David Chisnall wrote:
> > On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> >>
> >> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> >>> 
> >>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> >>>
> >>>> Also, you'd probably see even better performance by increasing the
> >>>> size to 64k, [...]
> >>>
> >>> easy:
> >>>   8K on 32bit
> >>>   64k on 64bit
> >>
> >> Sounds good to me...  Just for people who care... I did a quick set of
> >> benchmarks on sha256.. This is using my preliminary patch to use sse4
> >> optimized sha256...  But this should be the same for others...
> >>
> >> The numbers in ministat output are the time in seconds it takes my
> >> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> >> numbers are better..  I've processed them into easier to read format:
> >> BUFSIZ: 145MB/sec
> >> 8k:     193MB/sec
> >> 16k:    198MB/sec
> >> 64k:    202MB/sec
> >> 128k:   202MB/sec
> >> -t:     211MB/sec
> >
> > It looks like most of the benefit is gained at 16KB.  Did you try running 
> > the benchmark with something else running at the same time to see if there 
> > is any advantage in trashing the caches a bit less (simple case, what 
> > happens if you run two instances of the same benchmark at once)?
> >
> > I suspect that you’re about right anyway - I recently did some tests 
> > while playing with JavaScript FFI generation with a multithreaded process 
> > JavaScript environment calling out to OpenSSL to do SHA calculations and 
> > having each of 8 threads reading in 128KB chunks gave the fastest 
> > performance (Core i7, 4 cores + hyperthreading), with only a negligible 
> > gain over 64KB.  In all cases, the JavaScript implementation was 
> > significantly faster than the openssl tool, which used 8KB buffers.
> 
> You should also try this using a USB disk. The performance numbers 
> depend heavily on the hardware's interrupt moderation values.

This shouldn't matter... I wasn't flushing the buffer cache between
runs, so this was entirely from the buffer cache.  This is purely
syscall+copy overhead that is being measured here.  No matter what
your source is (NFS, USB disk), you'll always have this overhead...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: CFR: a new __unreachable() builtin

2015-05-13 Thread David Chisnall
On 13 May 2015, at 17:05, Pedro Giffuni  wrote:
> 
> Hello;
> 
> I am looking at the cdefs in other BSDs hoping to avoid adopting the
> same definitions with incompatible names and I noticed NetBSD is using
> a new __builtin_unreachable (void) function from gcc 4.6:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> 
> Apparently it was interesting enough that clang implemented it too so
> I created a code review differential for it.
> 
> https://reviews.freebsd.org/D2536
> 
> I don't want to add new C definitions unless they are going to be used
> so feel free to comment on the convenience or not of having it.

LLVM uses this quite heavily, in a macro that expands to something equivalent 
to assert(0 && "unreachable reached!") in debug mode and 
__builtin_unreachable() in release mode.  When you’re debugging, you get errors 
if you reach unreachable code and in deployment the compiler gets a useful hint 
for optimisation.
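
A minimal sketch of that pattern in C, assuming a compiler that provides __builtin_unreachable() (GCC or clang). The macro name UNREACHABLE, the enum and the function are hypothetical and are not taken from LLVM or from the proposed cdefs change. The abort() after the assert is an addition so that the debug branch is also visibly non-returning to the compiler.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical macro: trap loudly in debug builds, hand the optimiser
 * a hint in release builds (NDEBUG defined).
 */
#ifdef NDEBUG
#define UNREACHABLE(msg)	__builtin_unreachable()
#else
#define UNREACHABLE(msg)	do { assert(0 && msg); abort(); } while (0)
#endif

enum color { RED, GREEN, BLUE };

/* Every valid enumerator returns inside the switch. */
const char *
color_name(enum color c)
{
	switch (c) {
	case RED:
		return ("red");
	case GREEN:
		return ("green");
	case BLUE:
		return ("blue");
	}
	/* Only reached if c holds a value outside the enum. */
	UNREACHABLE("invalid enum color value");
}
```

In a release build, reaching the macro is undefined behaviour, so it only belongs on paths that really cannot be taken.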

David


CFR: a new __unreachable() builtin

2015-05-13 Thread Pedro Giffuni

Hello;

I am looking at the cdefs in other BSDs hoping to avoid adopting the
same definitions with incompatible names and I noticed NetBSD is using
a new __builtin_unreachable (void) function from gcc 4.6:

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Apparently it was interesting enough that clang implemented it too so
I created a code review differential for it.

https://reviews.freebsd.org/D2536

I don't want to add new C definitions unless they are going to be used
so feel free to comment on the convenience or not of having it.

Pedro.


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Adrian Chadd
[snip]

The reason I ask about "why is it faster?" is because for embedded-y
things with low RAM we may not want that to happen due to memory
constraints. However, we may actually want to do some form of
autotuning on some platforms.

So, if it's underlying block size, maybe BUFSIZ isn't the thing to
tweak, but based on disk io buffer size.
If it's filling L1 or L2 cache with useful work, maybe auto-tune it
based on that.
If it's hiding interrupt latency over USB, then that should be addressed.
etc, etc.

Please don't take this as bikeshedding, I'd really like to see some
"this is why it's faster" analysis rather than just numbers thrown
around.



-adrian


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Ian Lepore
On Wed, 2015-05-13 at 10:35 +0200, Hans Petter Selasky wrote:
> On 05/13/15 10:27, David Chisnall wrote:
> > On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> >>
> >> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> >>> 
> >>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> >>>
> >>>> Also, you'd probably see even better performance by increasing the
> >>>> size to 64k, [...]
> >>>
> >>> easy:
> >>>   8K on 32bit
> >>>   64k on 64bit
> >>
> >> Sounds good to me...  Just for people who care... I did a quick set of
> >> benchmarks on sha256.. This is using my preliminary patch to use sse4
> >> optimized sha256...  But this should be the same for others...
> >>
> >> The numbers in ministat output are the time in seconds it takes my
> >> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> >> numbers are better..  I've processed them into easier to read format:
> >> BUFSIZ: 145MB/sec
> >> 8k:     193MB/sec
> >> 16k:    198MB/sec
> >> 64k:    202MB/sec
> >> 128k:   202MB/sec
> >> -t:     211MB/sec
> >
> > It looks like most of the benefit is gained at 16KB.  Did you try running 
> > the benchmark with something else running at the same time to see if there 
> > is any advantage in trashing the caches a bit less (simple case, what 
> > happens if you run two instances of the same benchmark at once)?
> >
> > I suspect that you’re about right anyway - I recently did some tests while 
> > playing with JavaScript FFI generation with a multithreaded process 
> > JavaScript environment calling out to OpenSSL to do SHA calculations and 
> > having each of 8 threads reading in 128KB chunks gave the fastest 
> > performance (Core i7, 4 cores + hyperthreading), with only a negligible 
> > gain over 64KB.  In all cases, the JavaScript implementation was 
> > significantly faster than the openssl tool, which used 8KB buffers.
> >
> 
> Hi,
> 
> You should also try this using a USB disk. The performance numbers 
> depend heavily on the hardware's interrupt moderation values.


All this discussion should be happening in phabricator, not the email
that announces the review on phab.  But, since it's now happening here,
I guess I'll transplant my comments from there to here...

There are 2125 occurrences of BUFSIZ in the base code (probably 95% of
them inappropriately used to size a local temp buffer or string). Do you
really want to perturb that much working tested software because it
makes md5 faster? How many of those occurrences are stack-allocated
variables and is it wise to allocate 8k buffers on the stack for all of
them? How about existing programs (not necessarily in base) that open
many streams concurrently... what will be the impact of a sudden 8x
increase in memory usage for them?

It seems to me that if libmd needs bigger buffers to perform well, it
should use setvbuf().
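
As a rough sketch of what that could look like for a single stream, assuming the caller reads through stdio: the helper name open_with_buffer is made up for illustration, and nothing here is taken from libmd's actual code. Only the one stream that benefits gets the larger buffer, and BUFSIZ stays untouched for everything else.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a file for reading and give the stream a
 * fully buffered stdio buffer of bufsize bytes.  Returns NULL on failure.
 */
FILE *
open_with_buffer(const char *path, size_t bufsize)
{
	FILE *fp;

	assert(path != NULL);
	fp = fopen(path, "rb");
	if (fp == NULL)
		return (NULL);
	/*
	 * setvbuf() has to be called before any other operation on the
	 * stream.  A NULL buffer pointer asks stdio to allocate it.
	 */
	if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
		fclose(fp);
		return (NULL);
	}
	return (fp);
}
```

A caller would use it as, for example, open_with_buffer(path, 64 * 1024), then read with fread() as usual and fclose() the stream when done.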

-- Ian



Re: Increase BUFSIZ to 8192

2015-05-13 Thread Hans Petter Selasky

On 05/13/15 10:27, David Chisnall wrote:
> On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
>>
>> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
>>>
>>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
>>>
>>>> Also, you'd probably see even better performance by increasing the
>>>> size to 64k, [...]
>>>
>>> easy:
>>>   8K on 32bit
>>>   64k on 64bit
>>
>> Sounds good to me...  Just for people who care... I did a quick set of
>> benchmarks on sha256.. This is using my preliminary patch to use sse4
>> optimized sha256...  But this should be the same for others...
>>
>> The numbers in ministat output are the time in seconds it takes my
>> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
>> numbers are better..  I've processed them into easier to read format:
>> BUFSIZ: 145MB/sec
>> 8k:     193MB/sec
>> 16k:    198MB/sec
>> 64k:    202MB/sec
>> 128k:   202MB/sec
>> -t:     211MB/sec
>
> It looks like most of the benefit is gained at 16KB.  Did you try running
> the benchmark with something else running at the same time to see if there
> is any advantage in trashing the caches a bit less (simple case, what
> happens if you run two instances of the same benchmark at once)?
>
> I suspect that you’re about right anyway - I recently did some tests while
> playing with JavaScript FFI generation with a multithreaded process
> JavaScript environment calling out to OpenSSL to do SHA calculations and
> having each of 8 threads reading in 128KB chunks gave the fastest
> performance (Core i7, 4 cores + hyperthreading), with only a negligible
> gain over 64KB.  In all cases, the JavaScript implementation was
> significantly faster than the openssl tool, which used 8KB buffers.

Hi,

You should also try this using a USB disk. The performance numbers 
depend heavily on the hardware's interrupt moderation values.


--HPS


Re: Increase BUFSIZ to 8192

2015-05-13 Thread David Chisnall
On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> 
> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
>> 
>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
>> 
>>> Also, you'd probably see even better performance by increasing the
>>> size to 64k, [...]
>> 
>> easy:
>>  8K on 32bit
>>  64k on 64bit
> 
> Sounds good to me...  Just for people who care... I did a quick set of
> benchmarks on sha256.. This is using my preliminary patch to use sse4
> optimized sha256...  But this should be the same for others...
> 
> The numbers in ministat output are the time in seconds it takes my
> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> numbers are better..  I've processed them into easier to read format:
> BUFSIZ:   145MB/sec
> 8k:   193MB/sec
> 16k:  198MB/sec
> 64k:  202MB/sec
> 128k: 202MB/sec
> -t:   211MB/sec

It looks like most of the benefit is gained at 16KB.  Did you try running the 
benchmark with something else running at the same time to see if there is any 
advantage in trashing the caches a bit less (simple case, what happens if you 
run two instances of the same benchmark at once)?

I suspect that you’re about right anyway - I recently did some tests while 
playing with JavaScript FFI generation with a multithreaded process JavaScript 
environment calling out to OpenSSL to do SHA calculations and having each of 8 
threads reading in 128KB chunks gave the fastest performance (Core i7, 4 cores 
+ hyperthreading), with only a negligible gain over 64KB.  In all cases, the 
JavaScript implementation was significantly faster than the openssl tool, which 
used 8KB buffers.

David


Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> 
> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> 
> >Also, you'd probably see even better performance by increasing the
> >size to 64k, [...]
> 
> easy:
>   8K on 32bit
>   64k on 64bit

Sounds good to me...  Just for people who care... I did a quick set of
benchmarks on sha256.. This is using my preliminary patch to use sse4
optimized sha256...  But this should be the same for others...

The numbers in ministat output are the time in seconds it takes my
3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
numbers are better..  I've processed them into easier to read format:
BUFSIZ: 145MB/sec
8k:     193MB/sec
16k:    198MB/sec
64k:    202MB/sec
128k:   202MB/sec
-t:     211MB/sec

x def.times
+ 8k.times
* 16k.times
% 64k.times
# 128k.times
[ministat distribution plot: the 64k (%) and 128k (#) runs sit at the left,
the 8k (+) and 16k (*) runs in the middle, the default BUFSIZ runs (x) at
the right]
    N           Min           Max        Median           Avg        Stddev
x   5          3.53          3.55          3.53         3.536  0.0089442719
+   5          2.65          2.66          2.65         2.654  0.0054772256
Difference at 95.0% confidence
        -0.882 +/- 0.0108161
        -24.9434% +/- 0.305885%
        (Student's t, pooled s = 0.0074162)
*   5          2.58          2.59          2.58         2.584  0.0054772256
Difference at 95.0% confidence
        -0.952 +/- 0.0108161
        -26.9231% +/- 0.305885%
        (Student's t, pooled s = 0.0074162)
%   5          2.53          2.54          2.54         2.538   0.004472136
Difference at 95.0% confidence
        -0.998 +/- 0.0103127
        -28.224% +/- 0.29165%
        (Student's t, pooled s = 0.00707107)
#   5          2.53          2.54          2.53         2.532   0.004472136
Difference at 95.0% confidence
        -1.004 +/- 0.0103127
        -28.3937% +/- 0.29165%
        (Student's t, pooled s = 0.00707107)
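
For anyone wanting to check the conversion, the MB/sec figures follow from the ministat averages by dividing the 512MB file size by the average time. The C sketch below is only an illustration with made-up function names; it also reproduces the relative difference ministat reports between two averages.

```c
#include <assert.h>

/* Hypothetical helper: throughput in MB/sec for a file of the given size. */
double
throughput_mb_per_sec(double megabytes, double seconds)
{
	assert(seconds > 0.0);
	return (megabytes / seconds);
}

/* Hypothetical helper: percent change of the new average against the baseline. */
double
relative_change_pct(double baseline, double newavg)
{
	assert(baseline > 0.0);
	return ((newavg - baseline) / baseline * 100.0);
}

/*
 * With the averages above and a 512MB file:
 *   3.536 s (default BUFSIZ) gives about 144.8 MB/sec
 *   2.654 s (8k)             gives about 192.9 MB/sec
 *   2.538 s (64k)            gives about 201.7 MB/sec
 * and relative_change_pct(3.536, 2.654) is about -24.94.
 */
```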

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."