Re: levels of Windows support

2017-05-09 Thread Eli Zaretskii
> From: Paul Eggert 
> Date: Tue, 9 May 2017 14:08:19 -0700
> 
> I suppose it'd be nicer if Emacs could use the Gnulib versions of the 
> code instead of having its own version, as that would help Emacs and 
> other GNU programs share solutions better; but this would require some 
> hacking and it's not clear it's worth the effort (which I wouldn't be 
> able to do, myself).

As explained elsewhere, Emacs cannot rely on Gnulib completely, due to
some features Gnulib doesn't support, like file names which cannot be
expressed within the current system codepage's character set.  Another
class of issues that prevent full Gnulib acceptance is the support for
'select' on non-socket handles, which in Emacs goes together with
support for async subprocesses and SIGCHLD emulation.

> I hope we don't need to put "#ifdef EMACS_CONFIGURATION" in a lot of 
> places in Gnulib.

I don't expect that to be in a lot of places, because we generally
disable incorporation of Gnulib modules in the MinGW build at the
module level.



Re: levels of Windows support

2017-05-09 Thread Eli Zaretskii
> From: Bruno Haible 
> Date: Tue, 09 May 2017 22:17:58 +0200
> 
> Would a gnulib-wide option "ignore mingw and MSVC portability" be useful for
> Emacs?

Not as a global option, because the MinGW port of Emacs does use some
Gnulib functions.

> Would a gnulib-wide option "ignore MSVC portability" be useful for Emacs?

Yes, most probably, because Emacs no longer supports MSVC builds
(although you might still find traces of that in the sources).



Re: plans for file related modules on Windows

2017-05-09 Thread Paul Eggert

On 05/09/2017 01:34 PM, Bruno Haible wrote:

> - windows-year2038 : define time_t as 64-bit (might involve renaming module
>   time to time-h)


Such a module could be useful on non-MS-Windows platforms too. The 
module could support functions like localtime even on 32-bit platforms 
that can't handle time stamps past 2038. (It'd have trouble with 
functions like 'stat', of course.) This would be useful on GNU/Linux 
x86, for example. It'd be a no-op on platforms like recent OpenBSD x86, 
which already uses 64-bit time_t. A similar point applies to some of the 
other modules you mentioned, e.g., windows-uid. So perhaps these modules 
should not have "windows-" in their names.
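
The point about supporting functions like localtime where time_t is only
32 bits wide can be illustrated with a minimal sketch that keeps the time
stamp in an int64_t and derives the UTC calendar date arithmetically,
without going through time_t. The type civil_time and the function
civil_from_seconds are hypothetical names chosen for this example, not part
of Gnulib or of the proposed module; time zones and leap seconds are ignored.

```c
#include <stdint.h>

/* Broken-down UTC time; a hypothetical type for this sketch.  */
struct civil_time
{
  int64_t year;
  int month;                    /* 1..12 */
  int day;                      /* 1..31 */
  int hour, minute, second;
};

/* Convert seconds since 1970-01-01 00:00:00 UTC, held in a 64-bit
   integer, to a UTC date in the proleptic Gregorian calendar.  */
struct civil_time
civil_from_seconds (int64_t t)
{
  /* Floor division, so that negative time stamps work too.  */
  int64_t days = t / 86400;
  int64_t rem = t % 86400;
  if (rem < 0)
    {
      rem += 86400;
      days--;
    }

  /* Count days from 0000-03-01, so that leap days fall at year end.  */
  int64_t z = days + 719468;
  int64_t era = (z >= 0 ? z : z - 146096) / 146097;     /* 400-year cycle */
  int64_t doe = z - era * 146097;                       /* 0..146096 */
  int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);        /* 0..365 */
  int64_t mp = (5 * doy + 2) / 153;                     /* 0 = March */

  struct civil_time c;
  c.day = (int) (doy - (153 * mp + 2) / 5 + 1);
  c.month = (int) (mp < 10 ? mp + 3 : mp - 9);
  c.year = yoe + era * 400 + (c.month <= 2);
  c.hour = (int) (rem / 3600);
  c.minute = (int) (rem % 3600 / 60);
  c.second = (int) (rem % 60);
  return c;
}
```

For example, civil_from_seconds (2147483647) gives 2038-01-19 03:14:07, the
last second a signed 32-bit time_t can represent, and 2147483648 gives
03:14:08 on the same day.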



Simon Josefsson proposed something along these lines a decade ago:

https://lists.gnu.org/archive/html/bug-gnulib/2007-03/msg00106.html

The project is somewhat more urgent now than it was back then.


Also please see plans in time_t area in the Linux kernel and in glibc, 
summarized here:


https://lwn.net/Articles/643234/
https://sourceware.org/glibc/wiki/Y2038ProofnessDesign







Re: levels of Windows support

2017-05-09 Thread Paul Eggert

On 05/09/2017 01:17 PM, Bruno Haible wrote:

> Hi Eli and Paul,
>
> I'm trying to understand whether the #ifs you are adding for Emacs could
> be generalized for gnulib users other than Emacs.
>
> On one hand, Eli writes:
>
> > Emacs dropped MSVC support a few years ago.  Only MinGW, with its 2
> > flavors, is currently supported, in addition to Cygwin.
>
> On the other hand, Paul commits patches that disable mingw AND MSVC support,
> presumably for functions that are implemented in Emacs' w32.c.


Yes, the idea there is to avoid having Emacs have two sets of C source 
code to support the same low-level function, as this complicates 
maintenance (and could lead to bugs).


I suppose it'd be nicer if Emacs could use the Gnulib versions of the 
code instead of having its own version, as that would help Emacs and 
other GNU programs share solutions better; but this would require some 
hacking and it's not clear it's worth the effort (which I wouldn't be 
able to do, myself).



> Would a gnulib-wide option "ignore mingw and MSVC portability" be useful for
> Emacs? Is it something that we should offer as a documented feature?


It could be a little tricky to define exactly what we mean by "ignore 
mingw portability". Aside from the EMACS_CONFIGURATION hack in 
gnulib/lib/utimens.c, gnulib/lib/time.in.h looks at __MINGW32__; 
presumably the latter should be retained for Emacs even if Emacs uses 
the gnulib-tool option you're thinking of. These are the only two 
collisions I know of right now.



> Would a gnulib-wide option "ignore MSVC portability" be useful for Emacs?
> Is it something that we should offer as a documented feature?
> (I would see it as quite arbitrary: it is like supporting GCC but not SUNWspro
> cc on Solaris.)
>
> If the answer to both questions is "no", then OK, the best approach then is to
> continue with "#ifndef EMACS_CONFIGURATION" here and there.



I hope we don't need to put "#ifdef EMACS_CONFIGURATION" in a lot of 
places in Gnulib. Currently it's present in only one place, which is 
something we can tolerate. But a dozen places would indicate that we 
need a better solution, perhaps along the lines you're suggesting.




Re: plans for file related modules on Windows

2017-05-09 Thread Ben Pfaff
On Tue, May 09, 2017 at 10:34:04PM +0200, Bruno Haible wrote:
> It's clear that different gnulib users need different levels of native Windows
> support. Some will want to avoid 'struct stat', some will want to use the ino_t
> values in struct stat.
> 
> Here's my current plan: Introduce a set of orthogonal, transversal modules.
> ("transversal" in the sense that such a module does not provide a function,
> but rather provides guarantees for other modules.)
> 
> - windows-year2038 : define time_t as 64-bit (might involve renaming module
>   time to time-h)
> - windows-symlinks : add symlink support to stat, lstat, readlink etc.
> - windows-stat-timespec : add 100ns resolution to file times
> - windows-stat-inodes : redefine dev_t, ino_t
> - windows-uid : redefine uid_t [1]
> - windows-gid : redefine gid_t [1]
> - windows-utf8-filenames : implies override of all file-related functions
> - largefile : support for files > 2 GB (already partially implemented in 2012)
> 
> An override of 'struct stat' will be necessary for windows-year2038,
> windows-stat-timespec, windows-stat-inodes, windows-uid, windows-gid, largefile.
> 
> What do you think about it?
> 
> Which of these modules would you like to see implemented first?

I did not realize that Windows could even support a proper
implementation of the struct stat st_dev and st_ino.  I'd find this
useful in multiple programs, although in some of them I might really
just use the code you write as an educational resource.



plans for file related modules on Windows

2017-05-09 Thread Bruno Haible
Hi,

It's clear that different gnulib users need different levels of native Windows
support. Some will want to avoid 'struct stat', some will want to use the ino_t
values in struct stat.

Here's my current plan: Introduce a set of orthogonal, transversal modules.
("transversal" in the sense that such a module does not provide a function,
but rather provides guarantees for other modules.)

- windows-year2038 : define time_t as 64-bit (might involve renaming module
  time to time-h)
- windows-symlinks : add symlink support to stat, lstat, readlink etc.
- windows-stat-timespec : add 100ns resolution to file times
- windows-stat-inodes : redefine dev_t, ino_t
- windows-uid : redefine uid_t [1]
- windows-gid : redefine gid_t [1]
- windows-utf8-filenames : implies override of all file-related functions
- largefile : support for files > 2 GB (already partially implemented in 2012)

An override of 'struct stat' will be necessary for windows-year2038,
windows-stat-timespec, windows-stat-inodes, windows-uid, windows-gid, largefile.
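
As an illustration of the 100ns resolution mentioned for
windows-stat-timespec: native Windows file times are counted in 100ns units
since 1601-01-01 UTC, so converting one into seconds plus nanoseconds since
1970 is a matter of integer arithmetic. The following is only a sketch;
struct ts64 and ts64_from_filetime_ticks are hypothetical names, and the
tick count is passed as a plain uint64_t so that the example does not depend
on <windows.h>.

```c
#include <stdint.h>

/* Seconds and nanoseconds since 1970-01-01 00:00:00 UTC; a hypothetical
   stand-in for a struct timespec with a 64-bit seconds field.  */
struct ts64
{
  int64_t sec;
  long nsec;                    /* 0..999999900, in steps of 100 */
};

/* Number of seconds from 1601-01-01 to 1970-01-01 (UTC).  */
#define EPOCH_DIFF_SECONDS INT64_C (11644473600)

/* Convert a count of 100ns ticks since 1601-01-01 UTC.  */
struct ts64
ts64_from_filetime_ticks (uint64_t ticks)
{
  struct ts64 r;
  r.sec = (int64_t) (ticks / 10000000) - EPOCH_DIFF_SECONDS;
  r.nsec = (long) (ticks % 10000000) * 100;
  return r;
}
```

A tick count of 116444736000000000 corresponds to sec = 0, nsec = 0, and one
tick more gives nsec = 100.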

What do you think about it?

Which of these modules would you like to see implemented first?

Bruno

[1] as proposed by Bastien Roucariès
https://lists.gnu.org/archive/html/bug-gnulib/2011-09/msg00130.html




Re: tzset: add native Windows workaround

2017-05-09 Thread Bruno Haible
Ken Brown wrote:
> ... Here's what I see on my Cygwin system:
> 
> $ echo $TZ
> America/New_York
> 
> $ date +'%Y-%m-%d %H:%M:%S %z (%Z)'
> 2017-05-03 07:29:03 -0400 (EDT)
> 
> $ TZ= date +'%Y-%m-%d %H:%M:%S %z (%Z)'
> 2017-05-03 11:29:14 +0000 (GMT)
> 
> $ unset TZ
> 
> $ date +'%Y-%m-%d %H:%M:%S %z (%Z)'
> 2017-05-03 07:29:30 -0400 (EDT)

Thanks for correcting me. I had (incorrectly) assumed that an unset value
and an empty value are equivalent, like for LANG.
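
In C the two cases are distinguishable through getenv, which returns NULL
for an unset variable and a pointer to an empty string for a variable set to
the empty value. A minimal sketch; the enum and the function classify_env
are hypothetical names used only for this example:

```c
#include <stdlib.h>

enum env_state { ENV_UNSET, ENV_EMPTY, ENV_NONEMPTY };

/* Tell whether the environment variable NAME is unset, set to the
   empty string, or set to a nonempty value.  */
enum env_state
classify_env (const char *name)
{
  const char *value = getenv (name);
  if (value == NULL)
    return ENV_UNSET;
  return value[0] == '\0' ? ENV_EMPTY : ENV_NONEMPTY;
}
```

With the shell session quoted above, classify_env ("TZ") would be
ENV_NONEMPTY in the first 'date' invocation, ENV_EMPTY in the second and
ENV_UNSET in the third.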

I've now brought up the issue on the Cygwin mailing list:
  
Let's see...

Bruno




Re: tzset: add native Windows workaround

2017-05-09 Thread Bruno Haible
Hi Paul,

> > Only for Cygwin, an empty or absent TZ environment variable means GMT.
> 
> That's weird; I don't know of any other system that does that.

Misunderstanding: I meant only among the platforms on Windows. I.e. for mingw
and MSVC, an empty or absent TZ environment variable means the system's notion
of time zone.

> > How about this revised comment?
> 
> > + There are two possible kinds of such values:
> > +   - Traditional US time zone names, e.g. "PST8PDT",
> > +   - tzdata time zone names, based on geography.  They contain one
> > + or more slashes.
> 
> As a point of terminology, "PST8PDT" is as much of a tzdata time zone name as
> "America/Los_Angeles" is. They are both implemented by consulting a file by
> that name. And some of the slashless tzdata names are based on geography,
> e.g., "Singapore", "GB-Eire" (these are mostly present for backward
> compatibility, but some people still use them). Perhaps it would be better to
> summarize things by wording the comment something like this:
> 
> 
> TZ values are of two kinds:
> 
> - Values supported by the Microsoft CRT, e.g., "PST+8PDT". See
> . The documented values of this form are matched by the POSIX extended
> regular expression
> "^[A-Za-z]{3}[-+]?[01]?[0-9](:[0-5][0-9](:[0-5][0-9])?)?([A-Za-z]{3})?$".
> 
> - Values supported by Cygwin, e.g., "America/Los_Angeles". Typically, each of 
> these values corresponds to the name of a file installed somewhere on the 
> system. However, some of these values are analyzed programmatically based on 
> rules specified by POSIX, e.g., "PST8PDT,M3.2.0,M11.1.0"; see 
> .
> 
> The two kinds of TZ values overlap, e.g., "PST8PDT" is valid in both
> implementations.  However, most TZ values supported by Cygwin do not work
> with the Microsoft CRT, which silently uses UTC when given such values. In
> practice most of these troublesome TZ values contain '/', and no TZ value
> supported by the Microsoft CRT contains '/', so as a heuristic neutralize any
> TZ value containing '/'. For the Microsoft CRT, an absent or empty TZ means
> the time zone that the user has set in the Windows Control Panel.
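
The extended regular expression quoted above can be tried out with the POSIX
<regex.h> interface. This is a sketch only, assuming a system that provides
regcomp and regexec; the function name tz_matches_crt_syntax is hypothetical
and nothing like it is part of the patch under discussion.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The pattern quoted above for the documented Microsoft CRT TZ values.  */
static const char crt_tz_pattern[] =
  "^[A-Za-z]{3}[-+]?[01]?[0-9](:[0-5][0-9](:[0-5][0-9])?)?([A-Za-z]{3})?$";

/* Return true if TZ matches the pattern.  Return false if it does not
   match, if TZ is NULL, or if the pattern cannot be compiled.  */
bool
tz_matches_crt_syntax (const char *tz)
{
  regex_t re;
  if (tz == NULL
      || regcomp (&re, crt_tz_pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool matched = regexec (&re, tz, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

"PST8PDT" and "PST+8PDT" match, whereas "America/Los_Angeles", "Singapore"
and "PST8PDT,M3.2.0,M11.1.0" do not.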

Thanks for the explanations. However, your wording is quite confusing to me,
because it talks about the possible syntaxes and their different interpretations
at the same time, and because of the overlap. For clarity, I prefer to talk
about disjoint cases. Here's what I have come up with:


2017-05-09 Bruno Haible  

tzset: Expand comment about TZ problem on native Windows.
* lib/tzset.c (tzset): Elaborate comment, based on explanations by
Paul Eggert.
* lib/ctime.c (rpl_ctime): Likewise.
* lib/localtime.c (rpl_localtime): Likewise.
* lib/mktime.c (mktime): Likewise.
* lib/strftime-fixes.c (rpl_strftime): Likewise.
* lib/wcsftime.c (rpl_wcsftime): Likewise.

diff --git a/lib/tzset.c b/lib/tzset.c
index ce854b9..bec4dfe 100644
--- a/lib/tzset.c
+++ b/lib/tzset.c
@@ -41,9 +41,28 @@ tzset (void)
 #endif
 
 #if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
-  /* If the environment variable TZ has been set by Cygwin, neutralize it.
- The Microsoft CRT interprets TZ differently than Cygwin and produces
- incorrect results if TZ has the syntax used by Cygwin.  */
+  /* Rectify the value of the environment variable TZ.
+ There are four possible kinds of such values:
+   - Traditional US time zone names, e.g. "PST8PDT".  Syntax: see
+ 
+   - Time zone names based on geography, that contain one or more
+ slashes, e.g. "Europe/Moscow".
+   - Time zone names based on geography, without slashes, e.g.
+ "Singapore".
+   - Time zone names that contain explicit DST rules.  Syntax: see
+ 

+ The Microsoft CRT understands only the first kind.  It produces incorrect
+ results if the value of TZ is of the other kinds.
+ But in a Cygwin environment, /etc/profile.d/tzset.sh sets TZ to a value
+ of the second kind for most geographies, or of the first kind in a few
+ other geographies.  If it is of the second kind, neutralize it.  For the
+ Microsoft CRT, an absent or empty TZ means the time zone that the user
+ has set in the Windows Control Panel.
+ If the value of TZ is of the third or fourth kind -- Cygwin programs
+ understand these syntaxes as well --, it does not matter whether we
+ neutralize it or not, since these values occur only when a Cygwin user
+ has set TZ explicitly; this case is 1. rare and 2. under the user's
+ responsibility.  */
   const char *tz = getenv ("TZ");
   if (tz != NULL && 
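
The excerpt breaks off at this point. The slash heuristic that the comment
describes can be sketched in isolation as follows. This is an illustration
under stated assumptions, not the committed code: neutralize_slash_tz is a
hypothetical name, and the non-Windows branch with setenv exists only so
that the sketch can be compiled and tried on a POSIX system.

```c
#if !defined _WIN32 && !defined _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L        /* for the declaration of setenv */
#endif
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* If TZ is set and contains a slash, replace it with an empty value.
   Return true if that was done successfully, false if TZ was left alone
   or the environment could not be changed.  */
bool
neutralize_slash_tz (void)
{
  const char *tz = getenv ("TZ");
  if (tz == NULL || strchr (tz, '/') == NULL)
    return false;
#ifdef _WIN32
  /* With the Microsoft CRT, an empty value removes the variable.  */
  return _putenv ("TZ=") == 0;
#else
  return setenv ("TZ", "", 1) == 0;
#endif
}
```

With TZ=Europe/Moscow the function empties TZ and returns true; with
TZ=PST8PDT it returns false and TZ is unchanged.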

Re: The "regex" module brings in GPLv3 code even with --lgpl

2017-05-09 Thread Ævar Arnfjörð Bjarmason
On Mon, May 8, 2017 at 9:43 PM, Bruno Haible  wrote:
> Ævar Arnfjörð Bjarmason wrote:
>> > When you use the --lgpl option to gnulib-tool, it should replace the
>> > copyright headers of the files accordingly. If not, that's a bug in
>> > gnulib-tool.
>>
>> That wasn't happening in the latest gnulib.git today:
>>
>> ---cut---
>> $ rm -rf /tmp/rx.tmp; ./gnulib-tool --version; ./gnulib-tool --lgpl --create-testdir --dir=/tmp/rx.tmp regex >/dev/null
>
> Well; I should have said "When you use the --lgpl option to gnulib-tool,
> together with the --import / --add-import / --remove-import / --update
> options". --create-testdir does not do this processing, because it's not
> useful for a testdir.
>
>> As an aside, is there a way to make gnulib-tool emit just the *.c &
>> *.h files needed for e.g. the regex module in some target directory?
>
> Well, you can "rm -rf glm4" after having run gnulib-tool. But this mode
> of using gnulib-tool is not supported, because there is *plenty* of
> autoconf macro processing that prepares the subsequent build.
>
>> I'm using gnulib-distributed files in a project that doesn't use the
>> autoconf/automake/m4 macros
>
> In this situation I would suggest creating a subdirectory of the
> project, with a configure.ac file of its own, just for building the
> libgnu.a. This way, the configuration of the project stays the same
> way as it is, i.e. the way the developers know it and like it, and
> the subdirectory's Makefile.am and configure.ac deal only with gnulib.
>
>> But if the license header in the files themselves can't be trusted
>
> The license header in the files can be trusted _after_ you have used --lgpl
> in combination with --import / --add-import / --remove-import / --update.
>
>> Aside from me not finding some invocation to the tool that surely
>> exists to do this, is there a reason the file in git wouldn't just
>> have the most permissive license it's licensed under in the header
>> itself?
>
> The way we do the license handling in gnulib is the result of lengthy
> discussions and lots of considerations. Of course your suggestion was
> one of the approaches that were discussed, but it was not the one that
> was adopted.

All makes sense. Thanks a lot for your help.