[Bug-wget] WGET2: '--convert-links' breaks from '--html-extension' as well as '--adjust-extension'

2019-01-24 Thread Jeffrey Fetterman
If you specify --html-extension or --adjust-extension when downloading a
page whose URL does not end with an extension (it might also be a problem
with any URL that doesn't end in .html), wget2 can't find the file to
convert the links afterward.
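
A minimal reproduction sketch, assuming the options above and a page whose
URL has no extension (the URL here is the one from a later thread):

wget2 --recursive --level=1 --convert-links --adjust-extension \
  https://lingojam.com/SuperscriptGenerator
# presumably the page is saved as SuperscriptGenerator.html, but the
# link-conversion pass then looks for the extensionless name and fails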

Can this please get looked into? It's been three weeks since I posted this
as an issue on gitlab and there hasn't been any response; I've been having
to use the December 12th build in the meantime. The build from the day
after, which was supposed to fix an issue related to --convert-links, is
what broke it.

At the time, it wasn't a big deal, but there have been a ton of updates
since then and no word on whether --convert-links is going to be fixed.


Re: [Bug-wget] Difficulty downloading a simple but JS-using website

2018-12-29 Thread Jeffrey Fetterman
Just to follow up, I've managed to identify the problems.

I needed to use the --cut-file-get-vars parameter, and I needed to remove
some strange code from translate.js:

window["\u0073\u0065\u0074\u0049\u006e\u0074\u0065\u0072\u0076\u0061\u006c"](function(){if(typeof
window["\u0061"]===typeof
undefined?true:false)$("\u0074\u0065\u0078\u0074\u0061\u0072\u0065\u0061").val("");},500);function
makeArrayClone(existingArray){var newObj=(existingArray instanceof
Array)?[]:{};for(i in
existingArray){if(i=='clone')continue;if(existingArray[i]&
existingArray[i]=="object"){newObj[i]=makeArrayClone(existingArray[i]);}else{newObj[i]=existingArray[i]}}
return newObj;}
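
For reference, a sketch of the full invocation this implies, assuming the
third command line from the earlier mail in this thread plus the flag
mentioned above:

wget2 --mirror --robots=off --random-wait --convert-links \
  --page-requisites --cut-file-get-vars \
  https://lingojam.com/SuperscriptGenerator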

It's some sort of auth check: the escaped strings decode to setInterval,
a, and textarea, so the first snippet blanks every textarea every 500 ms
unless window.a is defined. I'm not really sure what its purpose is, but
everything works after it's gone.

So you can pretty much just ignore my last email, aside from the gitlab
issue I posted regarding --convert-links, of course.

On Sat, Dec 29, 2018 at 1:51 PM Jeffrey Fetterman 
wrote:

> I'm using the latest version of wget2 as of 12/29/2018 (just freshly
> compiled it to make sure a bug I've reported is still an issue) and I've
> been running into problems.
>
> My first problem is something I've reported on the wget2 gitlab:
> '--convert-links' breaks if '--adjust-extension' or '--html-extension'
> is used. So I'm omitting that parameter for now.
>
> My second problem is that, for whatever reason, even with --span-hosts
> set, I'm not able to get all the content I need from the site to display
> it correctly. Here are some command lines I've tried:
>
> wget2 --recursive --timestamping --level=1 --robots=off --random-wait \
>   --convert-links --span-hosts https://lingojam.com/SuperscriptGenerator
> wget2 --recursive --timestamping --level=5 --robots=off --random-wait \
>   --convert-links --span-hosts --page-requisites \
>   https://lingojam.com/SuperscriptGenerator
> wget2 --mirror --robots=off --random-wait --convert-links \
>   --page-requisites https://lingojam.com/SuperscriptGenerator
>
> There are no errors in the console (and I do get color-coded errors, so
> I don't miss them), so I'm not sure what I should do now.
>


[Bug-wget] Difficulty downloading a simple but JS-using website

2018-12-29 Thread Jeffrey Fetterman
I'm using the latest version of wget2 as of 12/29/2018 (just freshly
compiled it to make sure a bug I've reported is still an issue) and I've
been running into problems.

My first problem is something I've reported on the wget2 gitlab:
'--convert-links' breaks if '--adjust-extension' or '--html-extension' is
used. So I'm omitting that parameter for now.

My second problem is that, for whatever reason, even with --span-hosts set,
I'm not able to get all the content I need from the site to display it
correctly. Here are some command lines I've tried:

wget2 --recursive --timestamping --level=1 --robots=off --random-wait \
  --convert-links --span-hosts https://lingojam.com/SuperscriptGenerator
wget2 --recursive --timestamping --level=5 --robots=off --random-wait \
  --convert-links --span-hosts --page-requisites \
  https://lingojam.com/SuperscriptGenerator
wget2 --mirror --robots=off --random-wait --convert-links --page-requisites \
  https://lingojam.com/SuperscriptGenerator

There are no errors in the console (and I do get color-coded errors, so I
don't miss them), so I'm not sure what I should do now.


[Bug-wget] wget2: exclude-directories, in documentation but not functional

2018-04-22 Thread Jeffrey Fetterman
So there's a directory in a site I've been using wget2 on that has a bunch
of files I don't need, but I can't figure out how to filter it out.

--exclude-directories is in the documentation, but wget2 reports it as an
unknown option.

Was it replaced by a different option? How do I filter out a certain
directory?
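
A possible workaround sketch, assuming your wget2 build accepts the
wget1-style regex filters (the URL and directory name are hypothetical):

wget2 --recursive --reject-regex '/unwanted-dir/' https://example.com/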


Re: [Bug-wget] retry_connrefused?

2018-04-10 Thread Jeffrey Fetterman
There is so much crap in the output that the text file for the debug
output is too big to even open. It only occurs occasionally; I can't
replicate it reliably.

On Tue, Apr 10, 2018 at 3:27 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

>
>
> On 10.04.2018 20:37, Jeffrey Fetterman wrote:
> > With --tries=5 set, Failed to connect (111) will still instantly abort
> > the operation.
>
> As I wrote, not reproducible here (see my debug output). Please append
> your debug output.
>
> Regards, Tim
>
> > On Tue, Apr 10, 2018 at 2:45 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> >
> >> On 04/10/2018 03:12 AM, Jeffrey Fetterman wrote:
> >>> --retry_connrefused is mentioned in the documentation but it doesn't
> seem
> >>> to be an option anymore. I can't find a replacement for it, either. My
> >> VPN
> >>> is being a bit fussy today and I keep having to restart my script
> because
> >>> of 111 errors.
> >>>
> >> I assume wget2... use --tries. That value is currently also used for
> >> connection failures.
> >>
> >> ...
> >> [0] Downloading 'http://localhost' ...
> >> 10.073423.223 cookie_create_request_header for host=localhost
> path=(null)
> >> Failed to write 207 bytes (111: Connection refused)
> >> 10.073423.223 host_increase_failure: localhost failures=1
> >> ...
> >>
> >> Regards, Tim
> >>
> >>
>
>
>


Re: [Bug-wget] retry_connrefused?

2018-04-10 Thread Jeffrey Fetterman
A 500 Internal Server Error isn't being retried either.

On Tue, Apr 10, 2018 at 2:45 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> On 04/10/2018 03:12 AM, Jeffrey Fetterman wrote:
> > --retry_connrefused is mentioned in the documentation but it doesn't seem
> > to be an option anymore. I can't find a replacement for it, either. My
> VPN
> > is being a bit fussy today and I keep having to restart my script because
> > of 111 errors.
> >
>
> I assume wget2... use --tries. That value is currently also used for
> connection failures.
>
> ...
> [0] Downloading 'http://localhost' ...
> 10.073423.223 cookie_create_request_header for host=localhost path=(null)
> Failed to write 207 bytes (111: Connection refused)
> 10.073423.223 host_increase_failure: localhost failures=1
> ...
>
> Regards, Tim
>
>


Re: [Bug-wget] retry_connrefused?

2018-04-10 Thread Jeffrey Fetterman
With --tries=5 set, Failed to connect (111) will still instantly abort the
operation.
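
A repro sketch for the behaviour described above, assuming nothing is
listening on the target port so that connect() fails with 111
(ECONNREFUSED):

wget2 --tries=5 http://localhost:1/
# expected: up to 5 connection attempts; observed: abort after the first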

On Tue, Apr 10, 2018 at 2:45 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> On 04/10/2018 03:12 AM, Jeffrey Fetterman wrote:
> > --retry_connrefused is mentioned in the documentation but it doesn't seem
> > to be an option anymore. I can't find a replacement for it, either. My
> VPN
> > is being a bit fussy today and I keep having to restart my script because
> > of 111 errors.
> >
>
> I assume wget2... use --tries. That value is currently also used for
> connection failures.
>
> ...
> [0] Downloading 'http://localhost' ...
> 10.073423.223 cookie_create_request_header for host=localhost path=(null)
> Failed to write 207 bytes (111: Connection refused)
> 10.073423.223 host_increase_failure: localhost failures=1
> ...
>
> Regards, Tim
>
>


[Bug-wget] retry_connrefused?

2018-04-09 Thread Jeffrey Fetterman
--retry_connrefused is mentioned in the documentation but it doesn't seem
to be an option anymore. I can't find a replacement for it, either. My VPN
is being a bit fussy today and I keep having to restart my script because
of 111 errors.


Re: [Bug-wget] --http2=off causes Segmentation fault but ./configure --without-libnghttp2 does not

2018-04-09 Thread Jeffrey Fetterman
I'm going to do some more testing first. I'm not sure what changed.


On Mon, Apr 9, 2018 at 6:18 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> On 04/09/2018 01:04 PM, Jeffrey Fetterman wrote:
> > So I wanted to see how scraping a large site compared with multiplexing
> > off. I used the -http2=off parameter, but I got a segfault.
>
> Not reproducible here. Could you give me the whole command line ?
>
> > So I decided I'd configure wget2 without the http2 library and just try
> the
> > same command again (without -http2=off since it wasn't compiled with it
> > anyway) and it worked just fine.
> >
> > (Also.. it does seem like wget2 is faster without http2, for the site
> full
> > of large pdfs I'm scraping anyway.)
>
> I also had the impression that http/2 at least sometimes is slower, but
> didn't make exact measurements. There are much pitfalls on the server
> side that an admin has to deal with.
> If you know a good site / command line for benchmarking, please let me
> know.
>
> Regards, Tim
>
>


Re: [Bug-wget] --http2=off causes Segmentation fault but ./configure --without-libnghttp2 does not

2018-04-09 Thread Jeffrey Fetterman
God damnit, I just got it to happen with ./configure --without-libnghttp2.

Now I'm not sure what is triggering it.

On Mon, Apr 9, 2018 at 6:04 AM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> So I wanted to see how scraping a large site compared with multiplexing
> off. I used the -http2=off parameter, but I got a segfault.
>
> So I decided I'd configure wget2 without the http2 library and just try
> the same command again (without -http2=off since it wasn't compiled with it
> anyway) and it worked just fine.
>
> (Also.. it does seem like wget2 is faster without http2, for the site full
> of large pdfs I'm scraping anyway.)
>


[Bug-wget] --http2=off causes Segmentation fault but ./configure --without-libnghttp2 does not

2018-04-09 Thread Jeffrey Fetterman
So I wanted to see how scraping a large site compared with multiplexing
off. I used the --http2=off parameter, but I got a segfault.

So I decided I'd configure wget2 without the http2 library and just try the
same command again (without --http2=off, since it wasn't compiled with it
anyway) and it worked just fine.

(Also... it does seem like wget2 is faster without http2, at least for the
site full of large PDFs I'm scraping.)
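
A sketch of the two configurations being compared, using only the flags
from this thread (the URL is hypothetical):

# built with HTTP/2 support, then disabled at runtime -> segfault
./configure && make
wget2 --http2=off https://example.com/

# built without libnghttp2 -> works
./configure --without-libnghttp2 && make
wget2 https://example.com/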


Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-09 Thread Jeffrey Fetterman
> You won't resume a download with TLS Resume. You refer to TLS Session
Resumption... that means the client stores parts of the TLS handshake and
uses it with the next connect to the same IP/Host to reduce RTT by 1. There
are several reasons why this might not work. If TLS False Start works for
you, leave --tls-resume away. And anyways, session resumption is only of
help in certain conditions (e.g. you need many files from a HTTPS server
that closes the connection after each one).

I understood you the first time. All I meant by 'resuming a download with
TLS Resume' is force-quitting a download and then starting again from the
same session file as last time.
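
A sketch of the usage in question, with the options as they appear in the
script later in this archive; the session file only caches TLS handshake
state, so it saves a round trip per reconnect but does not skip
re-checking already-downloaded files:

wget2 --mirror --tls-resume --tls-session-file=./tls.session \
  https://example.com/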

On Mon, Apr 9, 2018 at 3:36 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> On 04/09/2018 10:10 AM, Jeffrey Fetterman wrote:
> > I've tested wget2 with the following changes to libwget/ssl_gnutls.c
> >
> > if (ret < 0) {
> > -    if (errno == EINPROGRESS) {
> > +    if (errno == EINPROGRESS || errno == 22 || errno == 32) {
> >          errno = EAGAIN; // GnuTLS does not handle EINPROGRESS
> >      } else if (errno == EOPNOTSUPP) {
> >          // fallback from fastopen, e.g. when fastopen is disabled in system
> >          debug_printf("Fallback from TCP Fast Open... TFO is disabled at system level\n");
> >          tcp->tcp_fastopen = 0;
> >          ret = connect(tcp->sockfd, tcp->connect_addrinfo->ai_addr,
> >                        tcp->connect_addrinfo->ai_addrlen);
> > -        if (errno == ENOTCONN || errno == EINPROGRESS)
> > +        if (errno == ENOTCONN || errno == EINPROGRESS || errno == 22 || errno == 32)
> >              errno = EAGAIN;
> >      }
> > }
> >
>
> That's what I tested here as well with good results.
>
> >
> > However, I still end up with multiple 'Failed to write 305 bytes (32:
> > Broken pipe)' errors when resuming a previous download with TLS Resume.
>
> You won't resume a download with TLS Resume. You refer to TLS Session
> Resumption... that means the client stores parts of the TLS handshake
> and uses it with the next connect to the same IP/Host to reduce RTT by
> 1. There are several reasons why this might not work.
> If TLS False Start works for you, leave --tls-resume away. And anyways,
> session resumption is only of help in certain conditions (e.g. you need
> many files from a HTTPS server that closes the connection after each one).
>
> Regards, Tim
>
> >
> > On Sun, Apr 8, 2018 at 4:38 PM, Jeffrey Fetterman <
> jfett...@mail.ccsf.edu>
> > wrote:
> >
> >>>  The URLs are added first because of the way Wget will traverse the
> >> links. It just adds these URLs to the download queue, doesn't start
> >> downloading them instantly. If you traverse a web page and Wget finds
> >> links on it, it will obviously add them to the download queue. What else
> >> would you expect Wget to do?
> >>
> >> Not traverse the entire site at once, waiting until the queue gets low
> >> enough to continue traversing.
> >>
> >>> TLS Session Resume will simply reduce 1 RTT when starting a new TLS
> >> Session. It simply matters for the TLS handshake and nothing else. It
> >> doesn't resume the Wget session at all. Also, the ~/.wget-session file
> >> simply stores the TLS Session information for each TLS Session. So you
> can
> >> use it for multiple sessions. It is just a cache.
> >>
> >> Ah, I see, so I should switch over to using TLS False Start since
> there's
> >> no real difference performance-wise?
> >>
> >>
> >> On Sun, Apr 8, 2018 at 10:11 AM, Darshit Shah <dar...@gmail.com> wrote:
> >>
> >>> * Jeffrey Fetterman <jfett...@mail.ccsf.edu> [180408 04:53]:
> >>>> Yes! Multiplexing was indeed partially the culprit, I've changed it
> >>>> to --http2-request-window=5
> >>>>
> >>>> However the download queue (AKA 'Todo') still gets enormous. It's why
> I
> >>> was
> >>>> wanting to use non-verbose mode in the first place, screens and
> screens
> >>> of
> >>>> 'Adding url:'. There should really be a limit on how many urls it
> adds!
> >>>>
> >>> The URLs are added first because of the way Wget will traverse the
> links.
> >>> It
> >>> just adds these URLs to the download queue, doesn't start downloading
> them
> >>> instantly. If you traverse a web page and

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-09 Thread Jeffrey Fetterman
I've tested wget2 with the following changes to libwget/ssl_gnutls.c

if (ret < 0) {
-    if (errno == EINPROGRESS) {
+    if (errno == EINPROGRESS || errno == 22 || errno == 32) {
         errno = EAGAIN; // GnuTLS does not handle EINPROGRESS
     } else if (errno == EOPNOTSUPP) {
         // fallback from fastopen, e.g. when fastopen is disabled in system
         debug_printf("Fallback from TCP Fast Open... TFO is disabled at system level\n");
         tcp->tcp_fastopen = 0;
         ret = connect(tcp->sockfd, tcp->connect_addrinfo->ai_addr,
                       tcp->connect_addrinfo->ai_addrlen);
-        if (errno == ENOTCONN || errno == EINPROGRESS)
+        if (errno == ENOTCONN || errno == EINPROGRESS || errno == 22 || errno == 32)
             errno = EAGAIN;
     }
}
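
(For reference: on Linux, errno 22 is EINVAL and errno 32 is EPIPE, the
'Broken pipe' in the error below.)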


However, I still end up with multiple 'Failed to write 305 bytes (32:
Broken pipe)' errors when resuming a previous download with TLS Resume.

On Sun, Apr 8, 2018 at 4:38 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> >  The URLs are added first because of the way Wget will traverse the
> links. It just adds these URLs to the download queue, doesn't start
> downloading them instantly. If you traverse a web page and Wget finds
> links on it, it will obviously add them to the download queue. What else
> would you expect Wget to do?
>
> Not traverse the entire site at once, waiting until the queue gets low
> enough to continue traversing.
>
> > TLS Session Resume will simply reduce 1 RTT when starting a new TLS
> Session. It simply matters for the TLS handshake and nothing else. It
> doesn't resume the Wget session at all. Also, the ~/.wget-session file
> simply stores the TLS Session information for each TLS Session. So you can
> use it for multiple sessions. It is just a cache.
>
> Ah, I see, so I should switch over to using TLS False Start since there's
> no real difference performance-wise?
>
>
> On Sun, Apr 8, 2018 at 10:11 AM, Darshit Shah <dar...@gmail.com> wrote:
>
>> * Jeffrey Fetterman <jfett...@mail.ccsf.edu> [180408 04:53]:
>> > Yes! Multiplexing was indeed partially the culprit, I've changed it
>> > to --http2-request-window=5
>> >
>> > However the download queue (AKA 'Todo') still gets enormous. It's why I
>> was
>> > wanting to use non-verbose mode in the first place, screens and screens
>> of
>> > 'Adding url:'. There should really be a limit on how many urls it adds!
>> >
>> The URLs are added first because of the way Wget will traverse the links.
>> It
>> just adds these URLs to the download queue, doesn't start downloading them
>> instantly. If you traverse a web page and Wget finds links on it, it will
>> obviously add them to the download queue. What else would you expect Wget
>> to
>> do?
>>
>> > Darshit, as it stands it doesn't look like --force-progress does
>> anything
>> > because --progress=bar forces the same non-verbose mode, and
>> > --force-progress is meant to be something used in non-verbose mode.
>> >
>> > However, the progress bar is still really... not useful. See here:
>> > https://i.imgur.com/KvbGmKe.png
>> >
>> > It's a single bar displaying a nonsense percentage, and it sounds like
>> with
>> > multiplexing there's supposed to be, by default, 30 transfers going
>> > concurrently.
>> >
>> Yes, I am aware of this. Sadly, Wget is developed entirely on volunteer
>> effort.
>> And currently, I don't have the time on my hands to fix the progress bar.
>> It's
>> being caused due to HTTP/2 connection multiplexing. I will fix it when I
>> find
>> some time for it.
>>
>> > > Both reduce RTT by 1, but they can't be combined.
>> >
>> > I was using TLS Resume because, well, for a 300+GB download it just
>> seemed
>> > to make sense, so it wouldn't have to check over 100GB of files before
>> > getting back to where I left off.
>> >
>> > > You use TLS Resume, but you don't explicitly need to specify a file.
>> By
>> > default it will use ~/.wget-session.
>> >
>> > I figure a 300GB+ transfer should have its own session file just in
>> case I
>> > do something smaller between resumes that might overwrite .wget-session,
>> > plus you've got to remember I'm on WSL and I'd rather have relevant
>> files
>> > kept within my normal folders rather than my WSL filesystem.
>> >
>> I'm not sure if you've understood TLS Session Resume correctly. TLS
>> Session
>>

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-08 Thread Jeffrey Fetterman
>  The URLs are added first because of the way Wget will traverse the
links. It just adds these URLs to the download queue, doesn't start
downloading them instantly. If you traverse a web page and Wget finds links
on it, it will obviously add them to the download queue. What else would
you expect Wget to do?

Not traverse the entire site at once, waiting until the queue gets low
enough to continue traversing.

> TLS Session Resume will simply reduce 1 RTT when starting a new TLS
Session. It simply matters for the TLS handshake and nothing else. It
doesn't resume the Wget session at all. Also, the ~/.wget-session file
simply stores the TLS Session information for each TLS Session. So you can
use it for multiple sessions. It is just a cache.

Ah, I see, so I should switch over to using TLS False Start since there's
no real difference performance-wise?


On Sun, Apr 8, 2018 at 10:11 AM, Darshit Shah <dar...@gmail.com> wrote:

> * Jeffrey Fetterman <jfett...@mail.ccsf.edu> [180408 04:53]:
> > Yes! Multiplexing was indeed partially the culprit, I've changed it
> > to --http2-request-window=5
> >
> > However the download queue (AKA 'Todo') still gets enormous. It's why I
> was
> > wanting to use non-verbose mode in the first place, screens and screens
> of
> > 'Adding url:'. There should really be a limit on how many urls it adds!
> >
> The URLs are added first because of the way Wget will traverse the links.
> It
> just adds these URLs to the download queue, doesn't start downloading them
> instantly. If you traverse a web page and Wget finds links on it, it will
> obviously add them to the download queue. What else would you expect Wget
> to
> do?
>
> > Darshit, as it stands it doesn't look like --force-progress does anything
> > because --progress=bar forces the same non-verbose mode, and
> > --force-progress is meant to be something used in non-verbose mode.
> >
> > However, the progress bar is still really... not useful. See here:
> > https://i.imgur.com/KvbGmKe.png
> >
> > It's a single bar displaying a nonsense percentage, and it sounds like
> with
> > multiplexing there's supposed to be, by default, 30 transfers going
> > concurrently.
> >
> Yes, I am aware of this. Sadly, Wget is developed entirely on volunteer
> effort.
> And currently, I don't have the time on my hands to fix the progress bar.
> It's
> being caused due to HTTP/2 connection multiplexing. I will fix it when I
> find
> some time for it.
>
> > > Both reduce RTT by 1, but they can't be combined.
> >
> > I was using TLS Resume because, well, for a 300+GB download it just
> seemed
> > to make sense, so it wouldn't have to check over 100GB of files before
> > getting back to where I left off.
> >
> > > You use TLS Resume, but you don't explicitly need to specify a file. By
> > default it will use ~/.wget-session.
> >
> > I figure a 300GB+ transfer should have its own session file just in case
> I
> > do something smaller between resumes that might overwrite .wget-session,
> > plus you've got to remember I'm on WSL and I'd rather have relevant files
> > kept within my normal folders rather than my WSL filesystem.
> >
> I'm not sure if you've understood TLS Session Resume correctly. TLS Session
> Resume is not going to resume your download session from where it left
> off. Due
> to the way HTTP works, Wget will still have to scan all your existing
> files and
> send HEAD requests for each of them when resuming. This is just a
> limitation of
> HTTP and there's nothing anybody can do about it.
>
> TLS Session Resume will simply reduce 1 RTT when starting a new TLS
> Session. It
> simply matters for the TLS handshake and nothing else. It doesn't resume
> the
> Wget session at all. Also, the ~/.wget-session file simply stores the TLS
> Session information for each TLS Session. So you can use it for multiple
> sessions. It is just a cache.
> > On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <dar...@gmail.com> wrote:
> >
> > > Hi Jefferey,
> > >
> > > Thanks a lot for your feedback. This is what helps us improve.
> > >
> > > * Tim Rühsen <tim.rueh...@gmx.de> [180407 00:01]:
> > > >
> > > > On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > > > > Thanks to the fix that Tim posted on gitlab, I've got wget2 running
> > > just
> > > > > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but
> > > given
> > > > > how fast it's downloading a ton of files at once, it seems like it
> > > must've
> > > > > been only a small gain.
> >

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-07 Thread Jeffrey Fetterman
Yes! Multiplexing was indeed partially the culprit; I've changed it
to --http2-request-window=5
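
A sketch of the adjusted invocation, limiting the number of multiplexed
HTTP/2 requests per connection (the URL is hypothetical):

wget2 --mirror --http2-request-window=5 https://example.com/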

However, the download queue (AKA 'Todo') still gets enormous. It's why I
wanted to use non-verbose mode in the first place: screens and screens of
'Adding url:'. There should really be a limit on how many URLs it adds!

Darshit, as it stands it doesn't look like --force-progress does anything,
because --progress=bar forces the same non-verbose mode, and
--force-progress is meant to be used in non-verbose mode.

However, the progress bar is still really... not useful. See here:
https://i.imgur.com/KvbGmKe.png

It's a single bar displaying a nonsense percentage, and it sounds like with
multiplexing there are supposed to be, by default, 30 transfers going
concurrently.

> Both reduce RTT by 1, but they can't be combined.

I was using TLS Resume because, well, for a 300+GB download it just seemed
to make sense, so it wouldn't have to check over 100GB of files before
getting back to where I left off.

> You use TLS Resume, but you don't explicitly need to specify a file. By
default it will use ~/.wget-session.

I figure a 300GB+ transfer should have its own session file just in case I
do something smaller between resumes that might overwrite .wget-session;
plus, you've got to remember I'm on WSL, and I'd rather have relevant files
kept within my normal folders rather than in my WSL filesystem.

On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <dar...@gmail.com> wrote:

> Hi Jeffrey,
>
> Thanks a lot for your feedback. This is what helps us improve.
>
> * Tim Rühsen <tim.rueh...@gmx.de> [180407 00:01]:
> >
> > On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > > Thanks to the fix that Tim posted on gitlab, I've got wget2 running
> just
> > > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but
> given
> > > how fast it's downloading a ton of files at once, it seems like it
> must've
> > > been only a small gain.
> > >
> TCP Fast Open will not save you a lot in your particular scenario. It
> simply
> saves one round trip when opening a new connection. So, if you're using
> Wget2
> to download a lot of files, you are probably only opening ~5 connections
> at the
> beginning and reusing them all. It depends on your RTT to the server, but
> 1 RTT
> when downloading several megabytes is already an insignificant amount of
> time.
>
> > >
> > > I've come across a few annoyances however.
> > >
> > > 1. There doesn't seem to be any way to control the size of the download
> > > queue, which I dislike because I want to download a lot of large files
> at
> > > once and I wish it'd just focus on a few at a time, rather than over a
> > > dozen.
> > The number of parallel downloads ? --max-threads=n
>
> I don't think he meant --max-threads. Given how he is using HTTP/2,
> there's a
> chance what he's seeing is HTTP Stream Multiplexing. There is also,
> `--http2-request-window` which you can try.
> >
> > > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32:
> Broken
> > > pipe) error to be thrown', seems to be related to how certificate
> > > verification is handled upon resume, but I was worried at first that
> the
> > > WLS problems were rearing their ugly head again.
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> > considered 'insecure',
> > thus we have it disabled by default. There still is TLS False Start
> > enabled by default.
> >
> >
> > > 3. --no-check-certificate causes significantly more errors about how
> the
> > > certificate issuer isn't trusted to be thrown (even though it's not
> > > supposed to be doing anything related to certificates).
> > Maybe a bit too verbose - these should be warnings, not errors.
>
> @Tim: I think with `--no-check-certificate` these should not be either
> warnings
> or errors. The user explicitly stated that they don't care about the
> validity
> of the certificate. Why add any information there at all? Maybe we keep it
> only
> in debug mode
> >
> > > 4. --force-progress doesn't seem to do anything despite being
> recognized as
> > > a valid parameter, using it in conjunction with -nv is no longer
> beneficial.
> > You likely want to use --progress=bar. --force-progress is to enable the
> > progress bar even when redirecting (e.g. to a log file).
> > @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> I think the progress bar options are sometimes a little off since we don't
> have
> tests for those and I am the only one using them.
>
> When exactly did you try to use --force-progress

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Jeffrey Fetterman
> The number of parallel downloads ? --max-threads=n

Okay, well, when I was running it earlier, I noticed an entire directory
of PDFs slowly getting larger every time I refreshed the directory, and
there were something like 30 in there. It wasn't just five. I was very
confused; I'm not sure what's going on there, and I really would like it
to not do that.


> Likely the WSL issue is also affecting the TLS layer. TLS resume is
considered 'insecure', thus we have it disabled by default. There still is
TLS False Start enabled by default.

Are you implying TLS False Start will perform the same function as TLS
Resume?


> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file). @Darshit, we
> should adjust the behavior to be the same as in Wget1.x.

That does work but it's very buggy. Only one shows at a time and it doesn't
even always show the file that is downloading. Like it'll seem to be
downloading a txt file when it's really downloading several larger files in
the background.


> Did you build with http/2 and compression support ?

Yes, why?


P.S. I'm willing to help out with your documentation if you push some stuff
that makes my life on WSL a little less painful, haha. I'd run this in a VM
in an instant, but I feel like that would be a bottleneck on what's supposed
to be a high-performance program. Speaking of high performance, just how
much am I missing out on by not being able to take advantage of TCP Fast
Open?


On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> Hi Jeffrey,
>
>
> thanks for your feedback !
>
>
> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> > how fast it's downloading a ton of files at once, it seems like it
> must've
> > been only a small gain.
> >
> >
> > I've come across a few annoyances however.
> >
> > 1. There doesn't seem to be any way to control the size of the download
> > queue, which I dislike because I want to download a lot of large files at
> > once and I wish it'd just focus on a few at a time, rather than over a
> > dozen.
> The number of parallel downloads ? --max-threads=n
>
> > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> > pipe) error to be thrown', seems to be related to how certificate
> > verification is handled upon resume, but I was worried at first that the
> > WSL problems were rearing their ugly head again.
> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure',
> thus we have it disabled by default. There still is TLS False Start
> enabled by default.
>
>
> > 3. --no-check-certificate causes significantly more errors about how the
> > certificate issuer isn't trusted to be thrown (even though it's not
> > supposed to be doing anything related to certificates).
> Maybe a bit too verbose - these should be warnings, not errors.
>
> > 4. --force-progress doesn't seem to do anything despite being recognized
> as
> > a valid parameter, using it in conjunction with -nv is no longer
> beneficial.
> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file).
> @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> > 5. The documentation is unclear as to how to disable things that are
> > enabled by default. Am I to assume that --robots=off is equivalent to -e
> > robots=off?
>
> -e robots=off should still work. We also allow --robots=off or --no-robots.
>
> > 6. The documentation doesn't document being able to use 'M' for
> chunk-size,
> > e.g. --chunk-size=2M
>
> The wget2 documentation has to be brushed up - one of the blockers for
> the first release.
>
> >
> > 7. The documentation's instructions regarding --progress is all wrong.
> I'll take a look the next days.
>
> >
> > 8. The http/https proxy options return as unknown options despite being
> in
> > the documentation.
> Yeah, the docs... see above. Also, proxy support is currently limited.
>
>
> > Lastly I'd like someone to look at the command I've come up with and
> offer
> > me critiques (and perhaps help me address some of the remarks above if
> > possible).
>
> No need for --continue.
> Think about using TLS Session Resumption.
> --domains is not needed in your example.
>
> Did you build with http/2 and compression support ?
>
> Regards, Tim
> > #!/bin/bash
> >
> > wget2 \
> >   `#WSL com

[Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Jeffrey Fetterman
Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
how fast it's downloading a ton of files at once, it seems like it must've
been only a small gain.


I've come across a few annoyances however.

1. There doesn't seem to be any way to control the size of the download
queue, which I dislike because I want to download a lot of large files at
once and I wish it'd just focus on a few at a time, rather than over a
dozen.

2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
pipe)' error to be thrown; it seems to be related to how certificate
verification is handled upon resume, but I was worried at first that the
WSL problems were rearing their ugly head again.

3. --no-check-certificate causes significantly more errors to be thrown
about the certificate issuer not being trusted (even though it's not
supposed to be doing anything related to certificates).

4. --force-progress doesn't seem to do anything despite being recognized as
a valid parameter; using it in conjunction with -nv is no longer beneficial.

5. The documentation is unclear as to how to disable things that are
enabled by default. Am I to assume that --robots=off is equivalent to -e
robots=off?

6. The documentation doesn't mention that 'M' can be used with
--chunk-size, e.g. --chunk-size=2M

7. The documentation's instructions regarding --progress are all wrong.

8. The http/https proxy options return as unknown options despite being in
the documentation.


Lastly I'd like someone to look at the command I've come up with and offer
me critiques (and perhaps help me address some of the remarks above if
possible).

#!/bin/bash

wget2 \
  `#WSL compatibility` \
  --restrict-file-names=windows --no-tcp-fastopen \
  \
  `#No certificate checking` \
  --no-check-certificate \
  \
  `#Scrape the whole site` \
  --continue --mirror --adjust-extension \
  \
  `#Local viewing` \
  --convert-links --backup-converted \
  \
  `#Efficient resuming` \
  --tls-resume --tls-session-file=./tls.session \
  \
  `#Chunk-based downloading` \
  --chunk-size=2M \
  \
  `#Swiper no swiping` \
  --robots=off --random-wait \
  \
  `#Target` \
  --domains=example.com example.com


[Bug-wget] make.exe warnings

2018-04-05 Thread Jeffrey Fetterman
I've successfully built wget2 through msys2 as a Windows binary, and it
appears to be working (granted I've not used it much yet), but I'm
concerned about some of the warnings that occurred during compilation.

Unsurprisingly they seem to be socket-related.

https://spit.mixtape.moe/view/9f38bd83


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-04 Thread Jeffrey Fetterman
How well does TeamViewer work on Linux? My laptop has been collecting
dust; I can just leave it running for a couple of days with a fresh install
of Windows and a fresh install of WSL Debian (with apt-get update and
upgrade already run).

On Wed, Apr 4, 2018 at 3:22 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> Hi Jeffrey,
>
> possibly I can get my hands on a fast Win10 desktop the coming
> weekend... no promise but I'll try.
>
>
> With Best Regards, Tim
>
>
>
> On 04/04/2018 09:54 AM, Tim Rühsen wrote:
> > Hi Jeffrey,
> >
> > I can't tell you. Basically because the only WSL I can get my hands on
> > is on my wife's laptop which is *very* slow. And it needs some analysis
> > on that side, maybe with patches for gnulib. Send me a fast Win10
> > machine and I analyse+fix the problem ;-)
> >
> >
> > BTW, we are also not using SO_REUSEPORT. The links you provided assume
> > that it's a problem in that area. All I can say is that Wget2 was
> > definitely working on WSL just a few weeks ago.
> >
> >
> > Another option for you is to install Debian/Ubuntu in a VM. Until the
> > hickups with WSL have been solved one or another way.
> >
> >
> > With Best Regards, Tim
> >
> >
> > On 04/04/2018 09:01 AM, Jeffrey Fetterman wrote:
> >> Tim, do you know when you'll be able to examine and come up with a
> >> workaround for the issue? There are alternatives to wget2 but either
> >> they're not high performance or they're not really cut out for site
> >> scraping.
> >>
> >> On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <
> jfett...@mail.ccsf.edu>
> >> wrote:
> >>
> >>> I can tell you the exact steps I took from nothing to a fresh install,
> I
> >>> have the commands copied.
> >>>
> >>> install Debian from Windows Store, set up username/password
> >>>
> >>> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
> >>> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
> >>> (this is a workaround for Valgrind and anything else that relies
> >>> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
> >>>
> >>> $ sudo apt-get update
> >>> $ sudo apt-get upgrade
> >>> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
> >>> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
> >>> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
> >>> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
> >>> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
> >>> $ sudo update-alternatives --install /usr/bin/python python
> >>> /usr/bin/python3.5 1
> >>>
> >>> then the commands outlined as per the documentation. config.log
> attached.
> >>>
> >>> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de>
> wrote:
> >>>
> >>>> Hi Jeffrey,
> >>>>
> >>>>
> >>>> basically wget2 should work on WSL, I just tested it scarcely two
> weeks
> >>>> ago without issues.
> >>>>
> >>>>
> >>>> I suspect it might have to do with your dependencies (e.g. did you
> >>>> install libnghttp2-dev ?).
> >>>>
> >>>> To find out, please send your config.log. That allows me to see your
> >>>> compiler, CFLAGS and the detected dependencies etc..
> >>>>
> >>>> I will try to reproduce the issue then.
> >>>>
> >>>>
> >>>> Regards, Tim
> >>>>
> >>>>
> >>>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
> >>>>>  wget2 will not download any files, and I think there's some sort of
> >>>> disk
> >>>>> access issue.
> >>>>>
> >>>>> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu
> 16.04
> >>>>> LTS had the same issue.)
> >>>>>
> >>>>> Here's the output of strace -o strace.txt -ff wget2
> >>>> https://www.google.com
> >>>>>
> >>>>> https://pastebin.com/4MEL88qs
> >>>>>
> >>>>> wget2 -d https://www.google.com just hangs after the line
> >>>> '02.103350.008
> >>>>> ALPN offering http/1.1'
> >>>>>
> >>>>> ultimately I might have to submit a bug to WSL but I wouldn't know
> what
> >>>> to
> >>>>> report, I don't know what's wrong. And it'd be great if there was a
> >>>>> workaround
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-04 Thread Jeffrey Fetterman
Tim, do you know when you'll be able to examine and come up with a
workaround for the issue? There are alternatives to wget2 but either
they're not high performance or they're not really cut out for site
scraping.

On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> I can tell you the exact steps I took from nothing to a fresh install, I
> have the commands copied.
>
> install Debian from Windows Store, set up username/password
>
> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
> (this is a workaround for Valgrind and anything else that relies
> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>
> $ sudo apt-get update
> $ sudo apt-get upgrade
> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
> $ sudo update-alternatives --install /usr/bin/python python
> /usr/bin/python3.5 1
>
> then the commands outlined as per the documentation. config.log attached.
>
> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>
>> Hi Jeffrey,
>>
>>
>> basically wget2 should work on WSL, I just tested it scarcely two weeks
>> ago without issues.
>>
>>
>> I suspect it might have to do with your dependencies (e.g. did you
>> install libnghttp2-dev ?).
>>
>> To find out, please send your config.log. That allows me to see your
>> compiler, CFLAGS and the detected dependencies etc..
>>
>> I will try to reproduce the issue then.
>>
>>
>> Regards, Tim
>>
>>
>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>> >  wget2 will not download any files, and I think there's some sort of
>> disk
>> > access issue.
>> >
>> > this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
>> > LTS had the same issue.)
>> >
>> > Here's the output of strace -o strace.txt -ff wget2
>> https://www.google.com
>> >
>> > https://pastebin.com/4MEL88qs
>> >
>> > wget2 -d https://www.google.com just hangs after the line
>> '02.103350.008
>> > ALPN offering http/1.1'
>> >
>> > ultimately I might have to submit a bug to WSL but I wouldn't know what
>> to
>> > report, I don't know what's wrong. And it'd be great if there was a
>> > workaround
>>
>>
>>
>


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
I've submitted an issue here: https://gitlab.com/gnuwget/wget2/issues/370

On Mon, Apr 2, 2018 at 6:12 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> It looks like there is a way to fix it:
> https://github.com/Rich-Harris/port-authority/pull/5
>
> On Mon, Apr 2, 2018 at 6:02 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
> wrote:
>
>> I think I may have found the problem...
>>
>> https://github.com/Microsoft/WSL/issues/1419
>>
>> There's no workaround posted so I may be SOL unless anyone has any ideas.
>>
>> On Mon, Apr 2, 2018 at 4:45 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu
>> > wrote:
>>
>>> Few other notes: I am on the latest slow ring build, which is
>>> practically a necessity if you're using WSL. The build I'm on is probably
>>> about to be released publicly seeing as there's no version info on the
>>> desktop.
>>>
>>> I did try this with Windows Firewall and my antivirus disabled.
>>>
>>> I also tried this with openSUSE aside from learning that WSL openSUSE is
>>> a mess, once I got it working I ran into the same issues as on WSL Debian &
>>> WSL Ubuntu.
>>>
>>> On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <
>>> jfett...@mail.ccsf.edu> wrote:
>>>
>>>> oh, and the hang with HTTPS and repeating errors with HTTP is exactly
>>>> the same issue I'm experiencing, yes.
>>>>
>>>> On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <
>>>> jfett...@mail.ccsf.edu> wrote:
>>>>
>>>>> Why'd you use your wife's laptop? You can have Debian and Ubuntu
>>>>> installed on the same machine. Typing 'bash' in command prompt will go to
>>>>> your primary (generally the first one you installed) and you just type the
>>>>> OS name to get one specifically.
>>>>>
>>>>> I was thinking of trying to get it running on openSUSE but I'm worried
>>>>> I'd just run into the same issue.
>>>>>
>>>>> On Mon, Apr 2, 2018 at 3:55 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>>>>
>>>>>> Hi Jeffrey,
>>>>>>
>>>>>>
>>>>>> back then I installed Ubuntu via WSL. A fresh build of Wget2 took
>>>>>> ~30mins on my wife's laptop. Time-wasting.
>>>>>>
>>>>>> But I can reproduce a hang with HTTPS and (repeating) errors with
>>>>>> HTTP.
>>>>>>
>>>>>>
>>>>>> This might be an issue with Windows Sockets... maybe someone has a
>>>>>> faster machine to do some testing !?
>>>>>>
>>>>>>
>>>>>> Regards, Tim
>>>>>>
>>>>>> On 02.04.2018 19:30, Jeffrey Fetterman wrote:
>>>>>> > I can tell you the exact steps I took from nothing to a fresh
>>>>>> install,
>>>>>> > I have the commands copied.
>>>>>> >
>>>>>> > install Debian from Windows Store, set up username/password
>>>>>> >
>>>>>> > $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>>>>>> > /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>>>>>> > (this is a workaround for Valgrind and anything else that relies
>>>>>> > on prctl(PR_SET_PTRACER) and the wget2 problem will occur either
>>>>>> way)
>>>>>> >
>>>>>> > $ sudo apt-get update
>>>>>> > $ sudo apt-get upgrade
>>>>>> > $ sudo apt-get install autoconf autogen automake autopoint doxygen
>>>>>> > flex gettext git gperf lcov libtool lzip make pandoc python3.5
>>>>>> > pkg-config texinfo valgrind libbz2-dev libgnutls28-dev
>>>>>> libgpgme11-dev
>>>>>> > libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>>>>>> > libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev
>>>>>> zlib1g-dev
>>>>>> > $ sudo update-alternatives --install /usr/bin/python python
>>>>>> > /usr/bin/python3.5 1
>>>>>> >
>>>>>> > then the commands outlined as per the documentation. config.log
>>>>>> attached.
>>>>>> >
>>>>>> > On Mon, Apr 2, 2018 at 1

Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
I think I may have found the problem...

https://github.com/Microsoft/WSL/issues/1419

There's no workaround posted so I may be SOL unless anyone has any ideas.

On Mon, Apr 2, 2018 at 4:45 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> Few other notes: I am on the latest slow ring build, which is practically
> a necessity if you're using WSL. The build I'm on is probably about to be
> released publicly seeing as there's no version info on the desktop.
>
> I did try this with Windows Firewall and my antivirus disabled.
>
> I also tried this with openSUSE aside from learning that WSL openSUSE is a
> mess, once I got it working I ran into the same issues as on WSL Debian &
> WSL Ubuntu.
>
> On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
> wrote:
>
>> oh, and the hang with HTTPS and repeating errors with HTTP is exactly the
>> same issue I'm experiencing, yes.
>>
>> On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu
>> > wrote:
>>
>>> Why'd you use your wife's laptop? You can have Debian and Ubuntu
>>> installed on the same machine. Typing 'bash' in command prompt will go to
>>> your primary (generally the first one you installed) and you just type the
>>> OS name to get one specifically.
>>>
>>> I was thinking of trying to get it running on openSUSE but I'm worried
>>> I'd just run into the same issue.
>>>
>>> On Mon, Apr 2, 2018 at 3:55 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>>
>>>> Hi Jeffrey,
>>>>
>>>>
>>>> back then I installed Ubuntu via WSL. A fresh build of Wget2 took
>>>> ~30mins on my wife's laptop. Time-wasting.
>>>>
>>>> But I can reproduce a hang with HTTPS and (repeating) errors with HTTP.
>>>>
>>>>
>>>> This might be an issue with Windows Sockets... maybe someone has a
>>>> faster machine to do some testing !?
>>>>
>>>>
>>>> Regards, Tim
>>>>
>>>> On 02.04.2018 19:30, Jeffrey Fetterman wrote:
>>>> > I can tell you the exact steps I took from nothing to a fresh install,
>>>> > I have the commands copied.
>>>> >
>>>> > install Debian from Windows Store, set up username/password
>>>> >
>>>> > $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>>>> > /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>>>> > (this is a workaround for Valgrind and anything else that relies
>>>> > on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>>>> >
>>>> > $ sudo apt-get update
>>>> > $ sudo apt-get upgrade
>>>> > $ sudo apt-get install autoconf autogen automake autopoint doxygen
>>>> > flex gettext git gperf lcov libtool lzip make pandoc python3.5
>>>> > pkg-config texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>>>> > libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>>>> > libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>>>> > $ sudo update-alternatives --install /usr/bin/python python
>>>> > /usr/bin/python3.5 1
>>>> >
>>>> > then the commands outlined as per the documentation. config.log
>>>> attached.
>>>> >
>>>> > On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de
>>>> > <mailto:tim.rueh...@gmx.de>> wrote:
>>>> >
>>>> > Hi Jeffrey,
>>>> >
>>>> >
>>>> > basically wget2 should work on WSL, I just tested it scarcely two
>>>> > weeks
>>>> > ago without issues.
>>>> >
>>>> >
>>>> > I suspect it might have to do with your dependencies (e.g. did you
>>>> > install libnghttp2-dev ?).
>>>> >
>>>> > To find out, please send your config.log. That allows me to see
>>>> your
>>>> > compiler, CFLAGS and the detected dependencies etc..
>>>> >
>>>> > I will try to reproduce the issue then.
>>>> >
>>>> >
>>>> > Regards, Tim
>>>> >
>>>> >
>>>> > On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>>> > >  wget2 will not download any files, and I think there's some
>>>> > sort of disk
>>>> > > access issue.
>>>> > >
>>>> > > this is on Windows Subsystem for Linux Debian 9.3 Stretch.
>>>> > (Ubuntu 16.04
>>>> > > LTS had the same issue.)
>>>> > >
>>>> > > Here's the output of strace -o strace.txt -ff wget2
>>>> > https://www.google.com
>>>> > >
>>>> > > https://pastebin.com/4MEL88qs
>>>> > >
>>>> > > wget2 -d https://www.google.com just hangs after the line
>>>> > '02.103350.008
>>>> > > ALPN offering http/1.1'
>>>> > >
>>>> > > ultimately I might have to submit a bug to WSL but I wouldn't
>>>> > know what to
>>>> > > report, I don't know what's wrong. And it'd be great if there
>>>> was a
>>>> > > workaround
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
A few other notes: I am on the latest slow ring build, which is practically
a necessity if you're using WSL. The build I'm on is probably about to be
released publicly, seeing as there's no version info on the desktop.

I did try this with Windows Firewall and my antivirus disabled.

I also tried this with openSUSE; aside from learning that WSL openSUSE is
a mess, once I got it working I ran into the same issues as on WSL Debian &
WSL Ubuntu.

On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> oh, and the hang with HTTPS and repeating errors with HTTP is exactly the
> same issue I'm experiencing, yes.
>
> On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
> wrote:
>
>> Why'd you use your wife's laptop? You can have Debian and Ubuntu
>> installed on the same machine. Typing 'bash' in command prompt will go to
>> your primary (generally the first one you installed) and you just type the
>> OS name to get one specifically.
>>
>> I was thinking of trying to get it running on openSUSE but I'm worried
>> I'd just run into the same issue.
>>
>> On Mon, Apr 2, 2018 at 3:55 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>>
>>> Hi Jeffrey,
>>>
>>>
>>> back then I installed Ubuntu via WSL. A fresh build of Wget2 took
>>> ~30mins on my wife's laptop. Time-wasting.
>>>
>>> But I can reproduce a hang with HTTPS and (repeating) errors with HTTP.
>>>
>>>
>>> This might be an issue with Windows Sockets... maybe someone has a
>>> faster machine to do some testing !?
>>>
>>>
>>> Regards, Tim
>>>
>>> On 02.04.2018 19:30, Jeffrey Fetterman wrote:
>>> > I can tell you the exact steps I took from nothing to a fresh install,
>>> > I have the commands copied.
>>> >
>>> > install Debian from Windows Store, set up username/password
>>> >
>>> > $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>>> > /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>>> > (this is a workaround for Valgrind and anything else that relies
>>> > on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>>> >
>>> > $ sudo apt-get update
>>> > $ sudo apt-get upgrade
>>> > $ sudo apt-get install autoconf autogen automake autopoint doxygen
>>> > flex gettext git gperf lcov libtool lzip make pandoc python3.5
>>> > pkg-config texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>>> > libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>>> > libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>>> > $ sudo update-alternatives --install /usr/bin/python python
>>> > /usr/bin/python3.5 1
>>> >
>>> > then the commands outlined as per the documentation. config.log
>>> attached.
>>> >
>>> > On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de
>>> > <mailto:tim.rueh...@gmx.de>> wrote:
>>> >
>>> > Hi Jeffrey,
>>> >
>>> >
>>> > basically wget2 should work on WSL, I just tested it scarcely two
>>> > weeks
>>> > ago without issues.
>>> >
>>> >
>>> > I suspect it might have to do with your dependencies (e.g. did you
>>> > install libnghttp2-dev ?).
>>> >
>>> > To find out, please send your config.log. That allows me to see
>>> your
>>> > compiler, CFLAGS and the detected dependencies etc..
>>> >
>>> > I will try to reproduce the issue then.
>>> >
>>> >
>>> > Regards, Tim
>>> >
>>> >
>>> > On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>> > >  wget2 will not download any files, and I think there's some
>>> > sort of disk
>>> > > access issue.
>>> > >
>>> > > this is on Windows Subsystem for Linux Debian 9.3 Stretch.
>>> > (Ubuntu 16.04
>>> > > LTS had the same issue.)
>>> > >
>>> > > Here's the output of strace -o strace.txt -ff wget2
>>> > https://www.google.com
>>> > >
>>> > > https://pastebin.com/4MEL88qs
>>> > >
>>> > > wget2 -d https://www.google.com just hangs after the line
>>> > '02.103350.008
>>> > > ALPN offering http/1.1'
>>> > >
>>> > > ultimately I might have to submit a bug to WSL but I wouldn't
>>> > know what to
>>> > > report, I don't know what's wrong. And it'd be great if there
>>> was a
>>> > > workaround
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
oh, and the hang with HTTPS and repeating errors with HTTP is exactly the
same issue I'm experiencing, yes.

On Mon, Apr 2, 2018 at 3:59 PM, Jeffrey Fetterman <jfett...@mail.ccsf.edu>
wrote:

> Why'd you use your wife's laptop? You can have Debian and Ubuntu installed
> on the same machine. Typing 'bash' in command prompt will go to your
> primary (generally the first one you installed) and you just type the OS
> name to get one specifically.
>
> I was thinking of trying to get it running on openSUSE but I'm worried I'd
> just run into the same issue.
>
> On Mon, Apr 2, 2018 at 3:55 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
>
>> Hi Jeffrey,
>>
>>
>> back then I installed Ubuntu via WSL. A fresh build of Wget2 took
>> ~30mins on my wife's laptop. Time-wasting.
>>
>> But I can reproduce a hang with HTTPS and (repeating) errors with HTTP.
>>
>>
>> This might be an issue with Windows Sockets... maybe someone has a
>> faster machine to do some testing !?
>>
>>
>> Regards, Tim
>>
>> On 02.04.2018 19:30, Jeffrey Fetterman wrote:
>> > I can tell you the exact steps I took from nothing to a fresh install,
>> > I have the commands copied.
>> >
>> > install Debian from Windows Store, set up username/password
>> >
>> > $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
>> > /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
>> > (this is a workaround for Valgrind and anything else that relies
>> > on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>> >
>> > $ sudo apt-get update
>> > $ sudo apt-get upgrade
>> > $ sudo apt-get install autoconf autogen automake autopoint doxygen
>> > flex gettext git gperf lcov libtool lzip make pandoc python3.5
>> > pkg-config texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
>> > libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
>> > libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
>> > $ sudo update-alternatives --install /usr/bin/python python
>> > /usr/bin/python3.5 1
>> >
>> > then the commands outlined as per the documentation. config.log
>> attached.
>> >
>> > On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de
>> > <mailto:tim.rueh...@gmx.de>> wrote:
>> >
>> > Hi Jeffrey,
>> >
>> >
>> > basically wget2 should work on WSL, I just tested it scarcely two
>> > weeks
>> > ago without issues.
>> >
>> >
>> > I suspect it might have to do with your dependencies (e.g. did you
>> > install libnghttp2-dev ?).
>> >
>> > To find out, please send your config.log. That allows me to see your
>> > compiler, CFLAGS and the detected dependencies etc..
>> >
>> > I will try to reproduce the issue then.
>> >
>> >
>> > Regards, Tim
>> >
>> >
>> > On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>> > >  wget2 will not download any files, and I think there's some
>> > sort of disk
>> > > access issue.
>> > >
>> > > this is on Windows Subsystem for Linux Debian 9.3 Stretch.
>> > (Ubuntu 16.04
>> > > LTS had the same issue.)
>> > >
>> > > Here's the output of strace -o strace.txt -ff wget2
>> > https://www.google.com
>> > >
>> > > https://pastebin.com/4MEL88qs
>> > >
>> > > wget2 -d https://www.google.com just hangs after the line
>> > '02.103350.008
>> > > ALPN offering http/1.1'
>> > >
>> > > ultimately I might have to submit a bug to WSL but I wouldn't
>> > know what to
>> > > report, I don't know what's wrong. And it'd be great if there was
>> a
>> > > workaround
>> >
>> >
>> >
>>
>>
>>
>


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
Why'd you use your wife's laptop? You can have Debian and Ubuntu installed
on the same machine. Typing 'bash' in command prompt will go to your
primary (generally the first one you installed) and you just type the OS
name to get one specifically.

I was thinking of trying to get it running on openSUSE but I'm worried I'd
just run into the same issue.

On Mon, Apr 2, 2018 at 3:55 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:

> Hi Jeffrey,
>
>
> back then I installed Ubuntu via WSL. A fresh build of Wget2 took
> ~30mins on my wife's laptop. Time-wasting.
>
> But I can reproduce a hang with HTTPS and (repeating) errors with HTTP.
>
>
> This might be an issue with Windows Sockets... maybe someone has a
> faster machine to do some testing !?
>
>
> Regards, Tim
>
> On 02.04.2018 19:30, Jeffrey Fetterman wrote:
> > I can tell you the exact steps I took from nothing to a fresh install,
> > I have the commands copied.
> >
> > install Debian from Windows Store, set up username/password
> >
> > $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
> > /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
> > (this is a workaround for Valgrind and anything else that relies
> > on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
> >
> > $ sudo apt-get update
> > $ sudo apt-get upgrade
> > $ sudo apt-get install autoconf autogen automake autopoint doxygen
> > flex gettext git gperf lcov libtool lzip make pandoc python3.5
> > pkg-config texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
> > libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
> > libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
> > $ sudo update-alternatives --install /usr/bin/python python
> > /usr/bin/python3.5 1
> >
> > then the commands outlined as per the documentation. config.log attached.
> >
> > On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen <tim.rueh...@gmx.de
> > <mailto:tim.rueh...@gmx.de>> wrote:
> >
> > Hi Jeffrey,
> >
> >
> > basically wget2 should work on WSL, I just tested it scarcely two
> > weeks
> > ago without issues.
> >
> >
> > I suspect it might have to do with your dependencies (e.g. did you
> > install libnghttp2-dev ?).
> >
> > To find out, please send your config.log. That allows me to see your
> > compiler, CFLAGS and the detected dependencies etc..
> >
> > I will try to reproduce the issue then.
> >
> >
> > Regards, Tim
> >
> >
> > On 02.04.2018 17:42, Jeffrey Fetterman wrote:
> > >  wget2 will not download any files, and I think there's some
> > sort of disk
> > > access issue.
> > >
> > > this is on Windows Subsystem for Linux Debian 9.3 Stretch.
> > (Ubuntu 16.04
> > > LTS had the same issue.)
> > >
> > > Here's the output of strace -o strace.txt -ff wget2
> > https://www.google.com
> > >
> > > https://pastebin.com/4MEL88qs
> > >
> > > wget2 -d https://www.google.com just hangs after the line
> > '02.103350.008
> > > ALPN offering http/1.1'
> > >
> > > ultimately I might have to submit a bug to WSL but I wouldn't
> > know what to
> > > report, I don't know what's wrong. And it'd be great if there was a
> > > workaround
> >
> >
> >
>
>
>


[Bug-wget] wget2 hanging, possible I/O issue

2018-04-02 Thread Jeffrey Fetterman
wget2 will not download any files, and I think there's some sort of disk
access issue.

this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
LTS had the same issue.)

Here's the output of strace -o strace.txt -ff wget2 https://www.google.com

https://pastebin.com/4MEL88qs

wget2 -d https://www.google.com just hangs after the line '02.103350.008
ALPN offering http/1.1'

Ultimately I might have to submit a bug to WSL, but I wouldn't know what to
report; I don't know what's wrong. And it'd be great if there was a
workaround.