Re: multiple distfiles from multiple sources and with multiple compression types

2016-04-01 Thread René J. V. Bertin
Ryan Schmidt wrote:

>> $ file -b /opt/local/var/macports/distfiles/scapy/scapy-2.3.1.zip
>> POSIX shell script executable (binary data)

Those ought to be detected as executable scripts by libmagic, not as archives.

> 
> Interesting, I did not know that kind of thing existed / was possible.

Many installers on MS Windows are, or used to be, executable zip files: zip
archives with an executable wrapper. Zip recognises its own wrapper, of course,
and will skip to the archive part if you hand it such a file.
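
A minimal Python sketch of the same idea, assuming a made-up shell stub and
file name: a zip reader locates the archive from the end-of-central-directory
record at the end of the file, so data placed in front of the archive does
not prevent it from being read. This illustrates the principle only; it is
not how any particular installer is built.

```python
import io
import zipfile

# Build an ordinary zip archive in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello from inside the archive\n")

# Prepend a shell stub, as an executable wrapper would (hypothetical stub).
stub = b"#!/bin/sh\necho 'run me or unzip me'\nexit 0\n"
polyglot = stub + archive.getvalue()

# The file now starts like a script, not like a zip ("PK\x03\x04") ...
starts_like_script = polyglot.startswith(b"#!")
starts_like_zip = polyglot.startswith(b"PK\x03\x04")

# ... yet the zip reader still finds the archive, because it searches for
# the end-of-central-directory record at the end of the file.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt").decode()
```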

Not 100% the same, but

man shar
man bzexe

;)

R

___
macports-dev mailing list
macports-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/macports-dev


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-31 Thread Ryan Schmidt

On Mar 31, 2016, at 11:35 AM, Rainer Müller wrote:

> Using magic bytes would even result in wrong results in some cases. For
> example for scapy, the distfile is intentionally a polyglot. It is both
> a valid shell script and a zip file:
> 
> $ file -b /opt/local/var/macports/distfiles/scapy/scapy-2.3.1.zip
> POSIX shell script executable (binary data)

Interesting, I did not know that kind of thing existed / was possible.



Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-31 Thread Rainer Müller
On 2016-03-31 03:16, Ryan Schmidt wrote:
> libmagic should be used to determine the compression format; filename 
> extension should not be used.

Was there ever a case with a .tar.gz that was actually compressed with
bzip2 or something like this? The current approach of 'use_* yes' also
assumes ${extract.suffix} and ${extract.cmd} to match...

Even then, ports could still override the extract phase. We could also
still allow setting an explicit extract.cmd to override the guessed
extract command.

I do not want to complicate this more than needed.

Using magic bytes would even result in wrong results in some cases. For
example for scapy, the distfile is intentionally a polyglot. It is both
a valid shell script and a zip file:

$ file -b /opt/local/var/macports/distfiles/scapy/scapy-2.3.1.zip
POSIX shell script executable (binary data)
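
To illustrate why leading magic bytes can mislead, here is a small Python
sketch. The magic-number table, the function names and the sample file names
are assumptions made for this example; the polyglot is a stand-in, not the
real scapy distfile. A sniffer that looks only at the first bytes reports
such a file as a script, even though it is also a valid zip archive.

```python
import bz2
import gzip
import io
import lzma
import zipfile

# Leading "magic" bytes of a few common distfile formats.
MAGIC = [
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"PK\x03\x04", "zip"),
    (b"#!", "script"),
]

def sniff(data):
    """Return a format name based only on the first bytes, or None."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return None

def make_polyglot():
    """Build a zip archive with a shell stub in front (hypothetical stand-in)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("setup.py", "# placeholder\n")
    return b"#!/bin/sh\nexit 0\n" + buf.getvalue()

samples = {
    "a.tar.gz": gzip.compress(b"data"),
    "b.tar.bz2": bz2.compress(b"data"),
    "c.tar.xz": lzma.compress(b"data"),
    "d.zip": make_polyglot(),
}

# What a first-bytes sniffer reports for each sample.
detected = {name: sniff(data) for name, data in samples.items()}

# The polyglot is nevertheless a readable zip archive.
polyglot_is_zip = zipfile.is_zipfile(io.BytesIO(samples["d.zip"]))
```

In this sketch the filename extension gives the right answer for d.zip and
the magic bytes do not, which is the situation described above.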

Rainer


PS: For the curious, I found the scapy example with the following shell
snippet:

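find /opt/local/var/macports/distfiles -type f -exec file -i {} \; \
| perl -nle 'if (/^[a-zA-Z0-9.\/-]+\.(.*): .*\/x-(.*);/) {
  # Save both captures first: a successful s/// resets $1 and $2.
  my ($ext, $m) = ($1, $2);
  # Map MIME subtypes to the usual filename suffixes.
  $m =~ s/^gzip$/gz/; $m =~ s/^bzip2$/bz2/;
  # Print files whose suffix disagrees with the detected type.
  print if $ext ne $m;
}'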


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-31 Thread René J. V. Bertin
On Wednesday March 30 2016 20:16:37 Ryan Schmidt wrote:

>libmagic should be used to determine the compression format; filename 
>extension should not be used.

I'd agree with that - if introducing an additional dependency isn't an issue.
I guess that the set of compression formats to recognise is limited and rather 
well-known, so alternatively a home-grown solution could be used. I've 
implemented something like that for an old "less" version long ago that I've 
been maintaining (because I never felt like porting it to newer versions =)). 
I'd be happy to donate that code so it can be bundled in a little Tcl package - 
that way it should be relatively straightforward to implement an updating 
mechanism for just that extension if ever a need arises to support new formats 
without requiring a full MacPorts upgrade.

On a related note: didn't I hear/see discussion about the possibilities of 
providing certain features as (more or less obligatory) ports?

R.


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-30 Thread Ryan Schmidt

On Mar 30, 2016, at 3:06 PM, Mojca Miklavec wrote:

> On 30 March 2016 at 19:25, René J. V. wrote:
>> 
>>> It is fairly obvious how to extract these files. There is usually no
>>> need to specify how these files should be extracted, since most
>>> compression methods can be deduced from the filename.
>>> 
>>> The command 'use_* yes' would then only set the extract.suffix
>>> accordingly, instead of the explicit extract.cmd. These two options
>>> would then be equivalent:
>>> 
>>> use_xz yes
>>> extract.suffix .tar.xz
>> 
>> Still, how would that be implemented in "base", except by an algorithm that
>> determines how to extract each file (that is to be handled in the extract
>> phase) based on the filename or (better) from a magic cookie?
> 
> Those are the details. If a distfile has a standard extension, it's no
> problem figuring out how to extract it.

libmagic should be used to determine the compression format; filename extension 
should not be used.



Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-30 Thread Mojca Miklavec
On 30 March 2016 at 19:25, René J. V. wrote:
>
>> It is fairly obvious how to extract these files. There is usually no
>> need to specify how these files should be extracted, since most
>> compression methods can be deduced from the filename.
>>
>> The command 'use_* yes' would then only set the extract.suffix
>> accordingly, instead of the explicit extract.cmd. These two options
>> would then be equivalent:
>>
>> use_xz yes
>> extract.suffix .tar.xz
>
> Still, how would that be implemented in "base", except by an algorithm that
> determines how to extract each file (that is to be handled in the extract
> phase) based on the filename or (better) from a magic cookie?

Those are the details. If a distfile has a standard extension, it's no
problem figuring out how to extract it.

I opened
https://trac.macports.org/ticket/50969

Mojca


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-30 Thread René J. V. Bertin
Rainer Müller wrote:


> In case of multiple distfiles you will always specify the full name
> anyway. That would be the case for an option such as this example:
> 
> distfiles foo.tar.gz \
>  bar.tar.xz

Yes, *fetching* isn't the problem here.

> It is fairly obvious how to extract these files. There is usually no
> need to specify how these files should be extracted, since most
> compression methods can be deduced from the filename.
> 
> The command 'use_* yes' would then only set the extract.suffix
> accordingly, instead of the explicit extract.cmd. These two options
> would then be equivalent:
> 
> use_xz yes
> extract.suffix .tar.xz

Still, how would that be implemented in "base", except by an algorithm that
determines how to extract each file (that is to be handled in the extract
phase) based on the filename or (better) from a magic cookie?

R.



Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-26 Thread Rainer Müller
On 2016-03-26 22:20, René J.V. Bertin wrote:
> On Saturday March 26 2016 02:27:49 Ryan Schmidt wrote:
> 
>> Don't think of them as commands. Think of them as radio buttons.
> 
> Typically that would mean that if you "push" another one of them, the 
> previous choice is undone.
> That doesn't seem to be the case here.
> 
>> Hmm, that could be considered a bug. I think you work around it by setting 
>> "use_xz no" as well.
> 
> I could try, but from what I remember of my understanding of the 
> implementation, it was simply not foreseen to unset every possible setting. 
> At least not to go back to the default, use_gz.
> 
>> There was some interest expressed at the MacPorts meeting in Slovenia to 
>> make MacPorts automatically detect compression formats during decompression 
>> and thus make the "use_xz" etc. options unnecessary. Notably, the tar 
>> command in OS X 10.9 (?) and later already supports automatic compression 
>> format detection (though we don't use that in MacPorts at this time: we 
>> decompress, then pipe the decompressed file to tar).
> 
> That would mean you'd need another method of specifying the suffix, no?

In case of multiple distfiles you will always specify the full name
anyway. That would be the case for an option such as this example:

distfiles foo.tar.gz \
 bar.tar.xz

It is fairly obvious how to extract these files. There is usually no
need to specify how these files should be extracted, since most
compression methods can be deduced from the filename.

The command 'use_* yes' would then only set the extract.suffix
accordingly, instead of the explicit extract.cmd. These two options
would then be equivalent:

use_xz yes
extract.suffix .tar.xz
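
As a rough sketch of what deducing the method from the filename could look
like, here is a short Python model. The suffix table, the tool names and the
function names are assumptions made for illustration; this is not code from
MacPorts base, which is written in Tcl.

```python
# Hypothetical mapping from filename suffix to decompressor.
DECOMPRESSOR_BY_SUFFIX = {
    ".tar.gz": "gzip",
    ".tgz": "gzip",
    ".tar.bz2": "bzip2",
    ".tar.xz": "xz",
    ".zip": "unzip",
}

def decompressor_for(distfile):
    """Pick a decompressor from the filename alone; None if unrecognised."""
    for suffix, tool in DECOMPRESSOR_BY_SUFFIX.items():
        if distfile.endswith(suffix):
            return tool
    return None

def extract_plan(distfiles):
    """Map every distfile to its own decompressor instead of one global choice."""
    return {name: decompressor_for(name) for name in distfiles}

# The two distfiles from the example above each get their own tool.
plan = extract_plan(["foo.tar.gz", "bar.tar.xz"])
```

The point of the design is that the choice is made per file, so distfiles
with different compression types can coexist in one port.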

Rainer


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-26 Thread René J. V. Bertin
On Saturday March 26 2016 02:27:49 Ryan Schmidt wrote:

> Don't think of them as commands. Think of them as radio buttons.

Typically that would mean that if you "push" another one of them, the previous 
choice is undone.
That doesn't seem to be the case here.

> Hmm, that could be considered a bug. I think you work around it by setting 
> "use_xz no" as well.

I could try, but from what I remember of my understanding of the 
implementation, it was simply not foreseen to unset every possible setting. At 
least not to go back to the default, use_gz.

> There was some interest expressed at the MacPorts meeting in Slovenia to make 
> MacPorts automatically detect compression formats during decompression and 
> thus make the "use_xz" etc. options unnecessary. Notably, the tar command in 
> OS X 10.9 (?) and later already supports automatic compression format 
> detection (though we don't use that in MacPorts at this time: we decompress, 
> then pipe the decompressed file to tar).

That would mean you'd need another method of specifying the suffix, no?

R


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-26 Thread Ryan Schmidt

On Mar 25, 2016, at 4:30 PM, René J.V. Bertin wrote:

> On Friday March 25 2016 20:59:47 Ian Rees wrote:
> 
> Hi Ian,
> 
> 
>> for ccx, and a .tgz for spooles.  As far as I could figure out, the fetch
>> step can handle multiple files, but the extract step cannot. -Ian-
> 
> That's what I figured would happen indeed.
> 
> A related observation: you can only use one of the use_{zip,bz2,xz,etc} 
> commands once; subsequent calls have no effect.

Don't think of them as commands. Think of them as radio buttons. You're 
indicating which of the available compression formats all of the files will 
use. You can only pick one.
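
For reference, a generic radio-button group can be modelled as in the Python
sketch below. The class, its method and the option names are invented for
illustration, and it does not model how MacPorts base actually handles
repeated use_* settings; it only shows the usual semantics, where a new
selection replaces the previous one and switching off the active choice
falls back to the default.

```python
class RadioGroup:
    """A set of mutually exclusive options with a default; last choice wins."""

    def __init__(self, options, default):
        self.options = set(options)
        self.default = default
        self.selected = default

    def set(self, option, enabled=True):
        if option not in self.options:
            raise KeyError(option)
        if enabled:
            # Selecting one option implicitly deselects the previous one.
            self.selected = option
        elif self.selected == option:
            # Switching off the active option falls back to the default.
            self.selected = self.default

fmt = RadioGroup({"gz", "bz2", "xz", "zip"}, default="gz")
fmt.set("xz")         # for example, chosen once for a whole family of ports
fmt.set("bz2")        # a later choice replaces the earlier one
after_override = fmt.selected
fmt.set("bz2", False)  # switching it off returns to the default
after_reset = fmt.selected
```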

> Thus, if all but a handful of the tarballs of a software family use xz 
> compression you cannot put a "use_xz" in the PortGroup and override it with, 
> say, use_bz2 for the few exceptions.

Hmm, that could be considered a bug. I think you work around it by setting 
"use_xz no" as well.


There was some interest expressed at the MacPorts meeting in Slovenia to make 
MacPorts automatically detect compression formats during decompression and thus 
make the "use_xz" etc. options unnecessary. Notably, the tar command in OS X 
10.9 (?) and later already supports automatic compression format detection 
(though we don't use that in MacPorts at this time: we decompress, then pipe 
the decompressed file to tar).
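
As an analogy for what automatic detection looks like from the caller's
side, Python's tarfile module can open a tarball without being told the
compression format. The sketch below uses that module only; it says nothing
about the OS X tar command or MacPorts base, and the helper names are made
up for the example.

```python
import io
import tarfile

def make_tarball(mode):
    """Return the bytes of a one-file tarball written with the given mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def list_members(data):
    """Open a tarball without being told its compression format."""
    # Mode "r:*" asks tarfile to detect gzip, bzip2 or xz transparently.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        return tar.getnames()

# The same reader handles all three compression formats.
results = {
    mode: list_members(make_tarball(mode))
    for mode in ("w:gz", "w:bz2", "w:xz")
}
```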




Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-25 Thread René J. V. Bertin
On Friday March 25 2016 20:59:47 Ian Rees wrote:

Hi Ian,


> for ccx, and a .tgz for spooles.  As far as I could figure out, the fetch
> step can handle multiple files, but the extract step cannot. -Ian-

That's what I figured would happen indeed.

A related observation: you can only use one of the use_{zip,bz2,xz,etc} 
commands once; subsequent calls have no effect.
Thus, if all but a handful of the tarballs of a software family use xz 
compression you cannot put a "use_xz" in the PortGroup and override it with, 
say, use_bz2 for the few exceptions.

R.


Re: multiple distfiles from multiple sources and with multiple compression types

2016-03-25 Thread Ian Rees
Hi René,

  I had a similar situation recently making the Portfile for ccx:
https://trac.macports.org/ticket/50810 .  There is one .tar.bz2 download
for ccx, and a .tgz for spooles.  As far as I could figure out, the fetch
step can handle multiple files, but the extract step cannot.

-Ian-

On Sat, Mar 26, 2016 at 1:33 AM René J.V.  wrote:

> Hi,
>
> I have a case where I'd need to grab an xz'ed source tarball from one
> location, and a gzip'ed tarball from another (github's snapshot generator,
> to be exact).
>
> Is that possible with "base" or do I need to fetch the 2nd one manually in
> the post-fetch?
>
> R.