Re: Undefined reference to gnutls_protocol_set_priority() when compiling latest wget version

2020-05-12 Thread Petr Pisar
On Tue, May 12, 2020 at 05:34:22PM -0600, Stephen Kirby wrote:
> I'm using GnuTLS version 3.6.13.  I believe it is the latest.  If anyone
> knows otherwise please let me know.
> 
> Sorry for the delay in getting back to you Tim (was swamped this morning)
> and thanks for your fast response!  I double-checked the versions of GnuTLS
> and wget I am using.  Both are the absolute latest (gnutls-3.6.13 and
> wget-1.20.3).  As such, I am not sure why the latest wget (in src/gnutls.c)
> would employ a deprecated/removed function, specifically,
> "gnutls_protocol_set_priority()"?  Do you recommend stepping back to an
> older version of GnuTLS to get around this, and if so, which one would work?
> Otherwise, would anyone know of a patch for the wget source code,
> specifically, for the file /src/gnutls.c so I can use the latest versions
> of GnuTLS and wget?  Thanks so much.
> 
I also have these latest versions and I do not observe your problem.

Indeed, GnuTLS 3.6.13 does not provide the gnutls_protocol_set_priority
symbol; it provides gnutls_priority_set_direct instead. You can check this by
inspecting the library:

$ nm -D /usr/lib64/libgnutls.so.30.27.0 |grep gnutls_priority_set_direct
00052bc0 T gnutls_priority_set_direct
$ nm -D /usr/lib64/libgnutls.so.30.27.0 |grep gnutls_protocol_set_priority

If you read the wget code, you will find that the gnutls_protocol_set_priority()
function is used only if the HAVE_GNUTLS_PRIORITY_SET_DIRECT C preprocessor
macro is not defined. Please check the src/config.h generated by running
./configure. I bet it does not define it.

If that's so, you need to find out why the configure check was unable to
discover support for gnutls_priority_set_direct. configure does this:

for ac_func in gnutls_priority_set_direct
do :
  ac_fn_c_check_func "$LINENO" "gnutls_priority_set_direct" "ac_cv_func_gnutls_priority_set_direct"
if test "x$ac_cv_func_gnutls_priority_set_direct" = xyes; then :
  cat >>confdefs.h <<_ACEOF
#define HAVE_GNUTLS_PRIORITY_SET_DIRECT 1
_ACEOF

fi
done

I recommend reading config.log (around the "checking for
gnutls_priority_set_direct" line) to find out.
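The two checks Petr suggests (the macro in src/config.h, then the configure test in config.log) can be scripted. This is a minimal sketch: the helper names are hypothetical, and the patterns assume the usual autoheader and autoconf output formats.

```shell
# Hypothetical helpers (not part of wget or autoconf) for the two checks
# described above. Run them from the top of the wget build directory.

# Succeeds if configure defined the macro in config.h. autoheader writes
# "#define NAME 1" when a check passed and "/* #undef NAME */" when it failed.
have_priority_set_direct() {
  grep -q '^#define HAVE_GNUTLS_PRIORITY_SET_DIRECT 1' "${1:-src/config.h}"
}

# Prints the config.log excerpt for one AC_CHECK_FUNCS test: from its
# "checking for NAME" line to the next "result:" line. The compiler and
# linker errors that made the check fail, if any, appear in between.
show_check() {
  sed -n "/checking for $1\$/,/result:/p" "${2:-config.log}"
}

# Usage:
#   have_priority_set_direct || show_check gnutls_priority_set_direct
```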

I suspect your GnuTLS installation is botched. Probably the library and header
files do not match.
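One rough way to test the mismatch theory is to compare the version a header declares with what the toolchain reports for the installed library. This is a minimal sketch: the helper name and the paths are hypothetical, and pkg-config only reports what the .pc file says, not what the shared library actually contains.

```shell
# Hypothetical helper: print the version string that a gnutls.h header
# declares in its GNUTLS_VERSION macro (e.g. 3.6.13). The pattern requires
# whitespace right after the name, so GNUTLS_VERSION_NUMBER and friends
# are not matched.
header_gnutls_version() {
  sed -n 's/^#[[:space:]]*define[[:space:]][[:space:]]*GNUTLS_VERSION[[:space:]][[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage (example paths; use the target sysroot when cross-compiling):
#   header_gnutls_version /usr/include/gnutls/gnutls.h
#   pkg-config --modversion gnutls
```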

-- Petr




Re: Undefined reference to gnutls_protocol_set_priority() when compiling latest wget version

2020-05-12 Thread Stephen Kirby
Hi Tim,

I'm using GnuTLS version 3.6.13.  I believe it is the latest.  If anyone
knows otherwise please let me know.

Sorry for the delay in getting back to you Tim (was swamped this morning)
and thanks for your fast response!  I double-checked the versions of GnuTLS
and wget I am using.  Both are the absolute latest (gnutls-3.6.13 and
wget-1.20.3).  As such, I am not sure why the latest wget (in src/gnutls.c)
would employ a deprecated/removed function, specifically,
"gnutls_protocol_set_priority()"?  Do you recommend stepping back to an
older version of GnuTLS to get around this, and if so, which one would work?
Otherwise, would anyone know of a patch for the wget source code,
specifically, for the file /src/gnutls.c so I can use the latest versions
of GnuTLS and wget?  Thanks so much.

Best regards,
Steve

On Tue, May 12, 2020 at 1:57 AM Tim Rühsen  wrote:

> Which version of GnuTLS are you using ?
>
> I saw this issue two or three weeks ago on OSS-Fuzz, which builds wget
> and its dependencies from master. But the build failure is gone now,
> and I never cared much for it, so I can't tell you anything but "try with
> latest GnuTLS".
>
> Regards, Tim
>
> On 12.05.20 00:51, Stephen Kirby wrote:
> > Hi all,
> >
> > I am trying to cross-compile wget using the target=x86_64-linux-gnueabi.
> > It makes it almost all the way through but is dying at the link stage
> > with the error I name in the subject line.  It looks as if wget is trying
> > to use a function in gnutls that has been deprecated/deleted.  Is there a
> > workaround for this?  From what I have found so far,
> > gnutls_protocol_set_priority() should be replaced with
> > gnutls_priority_set_direct().  But, I see that both of these functions
> > are listed in wget/src/gnutls.c.
> >
> > Can someone please advise me on this?
> >
> > Thanks so much!
> >
> > Best,
> > Steve
> >
>
>


RE: [bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Seymour J Metz
I've got code for parsing broken URLs at 
http://mason.gmu.edu/~smetz3/source/unobfuscate.zip if that's of any use to 
you. 


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: Bug-wget [bug-wget-bounces+smetz3=gmu@gnu.org] on behalf of Luca 
Bernardi [invalid.nore...@gnu.org]
Sent: Tuesday, May 12, 2020 6:57 AM
To: Luca Bernardi; gscriv...@gnu.org; tim.rueh...@gmx.de; bug-wget@gnu.org; 
dar...@gnu.org
Subject: [bug #58354] Wget doesn't parse URIs starting with http:/

Follow-up Comment #1, bug #58354 (project wget):

PS: This bug happened when trying to crawl a website with the default
WordPress template.

___

Reply to this item at:

  


___
  Message sent via Savannah
  
  https://savannah.gnu.org/






Re: Wget for Windows unicode issue

2020-05-12 Thread Gisle Vanem

This works here:
  wget.exe --local-encoding=utf-8 -i url-file.txt

With a 'url-file.txt' containing:
  www.seoghør.no

I used 'https://cafewebmaster.com/online_tools/utf8_encode'
to UTF-8 encode '.seoghør.no' into the above.

Wget downloaded a 300 kB index.html, although I had to
use '--no-check-certificate'!
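Leonid reports that a UTF-16 URL file broke wget entirely, so before passing a list to wget with --local-encoding=utf-8 it may help to confirm that the file really is UTF-8. This is a minimal sketch assuming a POSIX shell with iconv (for example MSYS2 or Cygwin on Windows); the helper name is hypothetical.

```shell
# Hypothetical helper: succeed only if the whole file is valid UTF-8.
# iconv exits with a non-zero status when it hits an invalid byte sequence,
# e.g. the FF FE byte-order mark at the start of many UTF-16 files.
is_utf8_file() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage:
#   is_utf8_file url-file.txt && wget.exe --local-encoding=utf-8 -i url-file.txt
```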

--
--gv



[bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Daniel Stenberg
Follow-up Comment #3, bug #58354 (project wget):

Note that the browsers' spec, The WHATWG URL Spec (TWUS), allows that kind of
abomination, while RFC 3986 explicitly requires two slashes to be there.
(Listed in my "URL interop" page at
https://github.com/bagder/docs/blob/master/URL-interop.md)

In fact, TWUS allows an unlimited number of slashes (and backslashes) to be
present and is still fine with it.

Because browsers allow this, instances of such URLs appear in the wild, so it
makes sense to be somewhat accommodating.

In the curl project (which primarily has RFC 3986 as guidance), we've still
decided to accept one, two or three slashes, as we've seen both one and three
happen in the wild. I don't believe in being more lenient than that.
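The acceptance window Daniel describes for curl (one, two or three slashes after the scheme, anything else rejected) can be expressed as a small check. This sketch only illustrates the rule as stated in the comment; it is not curl's code, and the helper names are hypothetical.

```shell
# Hypothetical helpers illustrating the rule above (not curl code).
# slash_count prints how many "/" immediately follow the first ":" in a URL.
slash_count() {
  rest=${1#*:}
  n=0
  while [ "${rest#/}" != "$rest" ]; do
    rest=${rest#/}
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

# accept_slashes succeeds for one, two or three slashes after the scheme.
accept_slashes() {
  n=$(slash_count "$1")
  [ "$n" -ge 1 ] && [ "$n" -le 3 ]
}
```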





[bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Tim Ruehsen
Update of bug #58354 (project wget):

  Status: None => Confirmed

___

Follow-up Comment #2:

We could accept this while parsing input *and* having a BASE.

It doesn't make much sense without a BASE, as the host/domain part is skipped
here.

I assume that 'https:' should also be recognized.
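Tim's suggestion (accept `http:/` and `https:/` only when a base URL is known, and take the missing host from it) can be sketched in shell. The helper name is hypothetical. It assumes, as the WHATWG URL Standard does for a reference whose scheme matches the base's, that `http:/path` relative to an `http://` base means a path on the base's host; mismatched schemes are simply rejected here for simplicity.

```shell
# Hypothetical helper (not wget code): resolve a single-slash reference such
# as "http:/path" against a base URL. Other references are printed unchanged.
# Returns 1 when there is no base or the base uses a different scheme.
# Plain POSIX sh has no "local", so the variables below are global.
resolve_single_slash() {
  ref=$1
  base=${2-}
  case $ref in
    http:/[!/]*|https:/[!/]*) ;;            # exactly one slash after "scheme:"
    *) printf '%s\n' "$ref"; return 0 ;;   # not the case discussed here
  esac
  [ -n "$base" ] || return 1               # no BASE: the host is unknown
  scheme=${ref%%:*}                        # "http" or "https"
  case $base in
    "$scheme"://*) ;;
    *) return 1 ;;                         # scheme mismatch or odd base
  esac
  rest=${base#*://}
  host=${rest%%/*}                         # authority of the base, host[:port]
  printf '%s://%s%s\n' "$scheme" "$host" "${ref#*:}"
}

# Usage:
#   resolve_single_slash 'http:/wp-includes/css/dist/block-library/style.min.css?ver=5.4.1' 'http://example.com/blog/'
#   prints http://example.com/wp-includes/css/dist/block-library/style.min.css?ver=5.4.1
```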






[bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Luca Bernardi
Follow-up Comment #1, bug #58354 (project wget):

PS: This bug happened when trying to crawl a website with the default
WordPress template.





Re: [bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Jeffrey Walton
On Tue, May 12, 2020 at 6:45 AM Luca Bernardi  wrote:
>
> URL:
>   
>
>  Summary: Wget doesn't parse URIs starting with http:/
>  Project: GNU Wget
> Submitted by: f0ff
> Submitted on: Tue 12 May 2020 10:45:17 AM UTC
> Category: None
> Severity: 3 - Normal
> Priority: 5 - Normal
>   Status: None
>  Privacy: Public
>  Assigned to: None
>  Originator Name:
> Originator Email:
>  Open/Closed: Open
>  Release: 1.14
>  Discussion Lock: Any
> Operating System: GNU/Linux
>  Reproducibility: Every Time
>Fixed Release: None
>  Planned Release: None
>   Regression: None
>Work Required: None
>   Patch Included: No
>
> ___
>
> Details:
>
> Hi,
> Wget refuses to parse URIs that start with http:/ (note single slash), e.g.
> http:/wp-includes/css/dist/block-library/style.min.css?ver=5.4.1. These are
> widely accepted by browsers.
>
> Command that I've used: `wget --user-agent=Mozilla --content-disposition
> --page-requisites --adjust-extension --restrict-file-names=windows -d -e
> robots=off -m -k -E -r -l 10 -p -N -F -P crawl  -nH $IP`

You may as well make the slashes optional in the protocol string.
Berners-Lee does not like them anyway,
https://www.mentalfloss.com/uk/history/27802/10-inventors-who-came-to-regret-their-creations.

Jeff



[bug #58354] Wget doesn't parse URIs starting with http:/

2020-05-12 Thread Luca Bernardi
URL:
  

 Summary: Wget doesn't parse URIs starting with http:/
 Project: GNU Wget
Submitted by: f0ff
Submitted on: Tue 12 May 2020 10:45:17 AM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
  Status: None
 Privacy: Public
 Assigned to: None
 Originator Name: 
Originator Email: 
 Open/Closed: Open
 Release: 1.14
 Discussion Lock: Any
Operating System: GNU/Linux
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
  Regression: None
   Work Required: None
  Patch Included: No

___

Details:

Hi,
Wget refuses to parse URIs that start with http:/ (note single slash), e.g.
http:/wp-includes/css/dist/block-library/style.min.css?ver=5.4.1. These are
widely accepted by browsers.

Command that I've used: `wget --user-agent=Mozilla --content-disposition
--page-requisites --adjust-extension --restrict-file-names=windows -d -e
robots=off -m -k -E -r -l 10 -p -N -F -P crawl  -nH $IP`



___

File Attachments:


---
Date: Tue 12 May 2020 10:45:17 AM UTC  Name: out.txt  Size: 17KiB   By: f0ff







Re: how to capture "20 redirections exceeded." error?

2020-05-12 Thread Tim Rühsen
You could parse the output from wget. It is extremely unlikely that we
will change this message in the future.

Regards, Tim

On 11.05.20 23:11, Peng Yu wrote:
> Hi,
> 
> wget returns 8 when it sees "20 redirections exceeded.". But how to
> capture such an error from the program calling wget? Thanks.
> 
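Following Tim's suggestion, a calling script can combine wget's exit status with a search of its log. Status 8 means "Server issued an error response" in the wget manual, so the status alone does not single out the redirect limit. This is a minimal sketch; the helper name and the log file name are hypothetical.

```shell
# Hypothetical helper: succeed if a finished wget run stopped because of the
# redirect limit. $1 is wget's exit status, $2 a file holding wget's log.
# The pattern omits the number so that it also matches when a non-default
# --max-redirect value changes the count in the message.
is_redirect_loop() {
  [ "$1" -eq 8 ] && grep -q 'redirections exceeded' "$2"
}

# Usage (needs network access; LC_ALL=C keeps the message untranslated):
#   LC_ALL=C wget -o wget.log -O /dev/null "$url"
#   if is_redirect_loop "$?" wget.log; then echo "redirect limit hit" >&2; fi
```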





Re: Undefined reference to gnutls_protocol_set_priority() when compiling latest wget version

2020-05-12 Thread Tim Rühsen
Which version of GnuTLS are you using ?

I saw this issue two or three weeks ago on OSS-Fuzz, which builds wget
and its dependencies from master. But the build failure is gone now,
and I never cared much for it, so I can't tell you anything but "try with
latest GnuTLS".

Regards, Tim

On 12.05.20 00:51, Stephen Kirby wrote:
> Hi all,
> 
> I am trying to cross-compile wget using the target=x86_64-linux-gnueabi.  It 
> makes it almost all the way through but is dying at the link stage with the 
> error I name in the subject line.  It looks as if wget is trying to use a 
> function in gnutls that has been deprecated/deleted.  Is there a workaround 
> for this?  From what I have found so far, gnutls_protocol_set_priority() 
> should be replaced with gnutls_priority_set_direct().  But, I see that both 
> of these functions are listed in wget/src/gnutls.c.  
> 
> Can someone please advise me on this?
> 
> Thanks so much!
> 
> Best,
> Steve
> 





Re: Wget for Windows unicode issue

2020-05-12 Thread Tim Rühsen
Hi,

the default character encoding on Windows is likely not UTF-8 (maybe
CP1252?), so UTF-8 characters read from myfile.txt are not converted
correctly.

But you can possibly use --remote-encoding=utf-8.

From the wget man page:
   --remote-encoding=encoding
       Force Wget to use encoding as the default remote server encoding.
       That affects how Wget converts URIs found in files from remote
       encoding to UTF-8 during a recursive fetch. This option is only
       useful for IRI support, for the interpretation of non-ASCII
       characters.

Regards, Tim

On 12.05.20 04:37, Leonid Pavel wrote:
> I'm trying to use wget for windows with unicode characters and getting
> issues with filename creation.
> 
> Passing in "wget http://example.com/á.png" directly works fine, however
> if I put the URL in a UTF-8 encoded file and run "wget -i myfile.txt",
> it downloads the file as "A¡.png" which is obviously incorrect.
> 
> Setting the file encoding as UTF-16 / UCS-2 just breaks entirely (tries
> to make a request to a gibberish URL)
> 
> However, writing the file as ANSI/ASCII works correctly. This works for
> my example, but characters that cannot be represented in ASCII will
> surely fail.
> 
> Is this not possible to fix? Why does mingw not take this into account?
> 
> 


