Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Jeffrey Fetterman
> The number of parallel downloads? --max-threads=n

Okay, well, when I was running it earlier, I noticed an entire directory of
PDFs slowly growing every time I refreshed it, and there were something like
30 files in there, not just five. I'm not sure what's going on there, and I'd
really like it not to do that.


> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure', thus we have it disabled by default. There still is
> TLS False Start enabled by default.

Are you implying TLS False Start will perform the same function as TLS
Resume?


> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file). @Darshit, we should
> adjust the behavior to be the same as in Wget1.x.

That does work, but it's very buggy: only one bar shows at a time, and it
doesn't always show the file that is actually downloading. It'll seem to be
downloading a txt file when it's really downloading several larger files in
the background.


> Did you build with http/2 and compression support?

Yes, why?


P.S. I'm willing to help out with your documentation if you push some stuff
that makes my life on WSL a little less painful, haha. I'd run this in a VM in
an instant, but I feel like that would bottleneck what's supposed to be a
high-performance program. Speaking of high performance, just how much am I
missing out on by not being able to take advantage of TCP Fast Open?


On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen  wrote:

> Hi Jeffrey,
>
>
> thanks for your feedback!
>
>
> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> > how fast it's downloading a ton of files at once, it seems like it must've
> > been only a small gain.
> >
> >
> > I've come across a few annoyances however.
> >
> > 1. There doesn't seem to be any way to control the size of the download
> > queue, which I dislike because I want to download a lot of large files at
> > once and I wish it'd just focus on a few at a time, rather than over a
> > dozen.
> The number of parallel downloads? --max-threads=n
>
> > 2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> > pipe)' error to be thrown; seems to be related to how certificate
> > verification is handled upon resume, but I was worried at first that the
> > WSL problems were rearing their ugly head again.
> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure',
> thus we have it disabled by default. There still is TLS False Start
> enabled by default.
>
>
> > 3. --no-check-certificate causes significantly more errors about how the
> > certificate issuer isn't trusted to be thrown (even though it's not
> > supposed to be doing anything related to certificates).
> Maybe a bit too verbose - these should be warnings, not errors.
>
> > 4. --force-progress doesn't seem to do anything despite being recognized as
> > a valid parameter; using it in conjunction with -nv is no longer beneficial.
> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file).
> @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> > 5. The documentation is unclear as to how to disable things that are
> > enabled by default. Am I to assume that --robots=off is equivalent to -e
> > robots=off?
>
> -e robots=off should still work. We also allow --robots=off or --no-robots.
>
> > 6. The documentation doesn't mention that 'M' can be used for chunk-size,
> > e.g. --chunk-size=2M
>
> The wget2 documentation has to be brushed up - one of the blockers for
> the first release.
>
> >
> > 7. The documentation's instructions regarding --progress are all wrong.
> I'll take a look in the next few days.
>
> >
> > 8. The http/https proxy options return as unknown options despite being in
> > the documentation.
> Yeah, the docs... see above. Also, proxy support is currently limited.
>
>
> > Lastly I'd like someone to look at the command I've come up with and offer
> > me critiques (and perhaps help me address some of the remarks above if
> > possible).
>
> No need for --continue.
> Think about using TLS Session Resumption.
> --domains is not needed in your example.
>
> Did you build with http/2 and compression support?
>
> Regards, Tim
> > #!/bin/bash
> >
> > wget2 \
> >   `#WSL compatibility` \
> >   --restrict-file-names=windows --no-tcp-fastopen \
> >   \
> >   `#No certificate checking` \
> >   --no-check-certificate \
> >   \
> >   `#Scrape the whole site` \
> >   --continue --mirror --adjust-extension \
> >   \
> >   `#Local viewing` \
> >   --convert-links --backup-converted \
> >   \
> >   `#Efficient resuming` \
> > 

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Tim Rühsen
Hi Jeffrey,


thanks for your feedback!


On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> how fast it's downloading a ton of files at once, it seems like it must've
> been only a small gain.
>
>
> I've come across a few annoyances however.
>
> 1. There doesn't seem to be any way to control the size of the download
> queue, which I dislike because I want to download a lot of large files at
> once and I wish it'd just focus on a few at a time, rather than over a
> dozen.
The number of parallel downloads? --max-threads=n
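
For example, limiting wget2 to five parallel downloads could look like the
sketch below (the URL is a placeholder and the command is echoed rather than
run, since this is illustrative only):

```shell
#!/bin/bash
# Sketch: cap wget2 at 5 parallel download threads instead of the default.
# example.com is a placeholder; the command is echoed, not executed.
threads=5
cmd="wget2 --max-threads=${threads} --mirror https://example.com"
echo "$cmd"
# Remove the echo and invoke wget2 directly to actually download.
```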

> 2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> pipe)' error to be thrown; seems to be related to how certificate
> verification is handled upon resume, but I was worried at first that the
> WSL problems were rearing their ugly head again.
Likely the WSL issue is also affecting the TLS layer. TLS resume is
considered 'insecure', thus we have it disabled by default. There still is
TLS False Start enabled by default.


> 3. --no-check-certificate causes significantly more errors about how the
> certificate issuer isn't trusted to be thrown (even though it's not
> supposed to be doing anything related to certificates).
Maybe a bit too verbose - these should be warnings, not errors.

> 4. --force-progress doesn't seem to do anything despite being recognized as
> a valid parameter; using it in conjunction with -nv is no longer beneficial.
You likely want to use --progress=bar. --force-progress is to enable the
progress bar even when redirecting (e.g. to a log file).
@Darshit, we should adjust the behavior to be the same as in Wget1.x.
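
The distinction might be sketched as follows (hypothetical invocations; the
URL and log file name are placeholders, and the commands are echoed rather
than executed):

```shell
#!/bin/bash
# Sketch of the distinction described above: --progress=bar draws the bar in
# an interactive run; adding --force-progress keeps the bar even when output
# is redirected to a log file.
interactive="wget2 --progress=bar https://example.com"
redirected="wget2 --progress=bar --force-progress https://example.com 2> wget2.log"
echo "$interactive"
echo "$redirected"
```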

> 5. The documentation is unclear as to how to disable things that are
> enabled by default. Am I to assume that --robots=off is equivalent to -e
> robots=off?

-e robots=off should still work. We also allow --robots=off or --no-robots.
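
For reference, the three equivalent spellings can be sketched like this
(echoed, not executed; example.com is a placeholder):

```shell
#!/bin/bash
# Three equivalent ways to disable robots.txt handling in wget2,
# per the note above. Each command is only printed, as a sketch.
for opt in "-e robots=off" "--robots=off" "--no-robots"; do
  echo "wget2 $opt https://example.com"
done
```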

> 6. The documentation doesn't mention that 'M' can be used for chunk-size,
> e.g. --chunk-size=2M

The wget2 documentation has to be brushed up - one of the blockers for
the first release.

>
> 7. The documentation's instructions regarding --progress are all wrong.
I'll take a look in the next few days.

>
> 8. The http/https proxy options return as unknown options despite being in
> the documentation.
Yeah, the docs... see above. Also, proxy support is currently limited.


> Lastly I'd like someone to look at the command I've come up with and offer
> me critiques (and perhaps help me address some of the remarks above if
> possible).

No need for --continue.
Think about using TLS Session Resumption.
--domains is not needed in your example.

Did you build with http/2 and compression support?

Regards, Tim
> #!/bin/bash
>
> wget2 \
>   `#WSL compatibility` \
>   --restrict-file-names=windows --no-tcp-fastopen \
>   \
>   `#No certificate checking` \
>   --no-check-certificate \
>   \
>   `#Scrape the whole site` \
>   --continue --mirror --adjust-extension \
>   \
>   `#Local viewing` \
>   --convert-links --backup-converted \
>   \
>   `#Efficient resuming` \
>   --tls-resume --tls-session-file=./tls.session \
>   \
>   `#Chunk-based downloading` \
>   --chunk-size=2M \
>   \
>   `#Swiper no swiping` \
>   --robots=off --random-wait \
>   \
>   `#Target` \
>   --domains=example.com example.com
>
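
Folding those critiques back into the script might look like the sketch below:
--continue dropped (redundant with --mirror, per the note above), --domains
dropped as unneeded for a single host, and the session-file path corrected to
a forward slash. The command is echoed rather than executed here, and
example.com remains a placeholder.

```shell
#!/bin/bash
# Revised sketch of the mirror command, per the critiques above.
# Built as an array so the grouping comments survive.
args=(
  --restrict-file-names=windows --no-tcp-fastopen   # WSL compatibility
  --no-check-certificate                            # no certificate checking
  --mirror --adjust-extension                       # scrape the whole site
  --convert-links --backup-converted                # local viewing
  --tls-resume --tls-session-file=./tls.session     # TLS session resumption
  --chunk-size=2M                                   # chunk-based downloading
  --robots=off --random-wait                        # robots and politeness
  example.com                                       # target (placeholder)
)
echo "wget2 ${args[*]}"
# To run for real: wget2 "${args[@]}"
```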





[Bug-wget] Miscellaneous thoughts & concerns

2018-04-06 Thread Jeffrey Fetterman
Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
how fast it's downloading a ton of files at once, it seems like it must've
been only a small gain.


I've come across a few annoyances however.

1. There doesn't seem to be any way to control the size of the download
queue, which I dislike because I want to download a lot of large files at
once and I wish it'd just focus on a few at a time, rather than over a
dozen.

2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
pipe)' error to be thrown; seems to be related to how certificate
verification is handled upon resume, but I was worried at first that the
WSL problems were rearing their ugly head again.

3. --no-check-certificate causes significantly more errors about how the
certificate issuer isn't trusted to be thrown (even though it's not
supposed to be doing anything related to certificates).

4. --force-progress doesn't seem to do anything despite being recognized as
a valid parameter; using it in conjunction with -nv is no longer beneficial.

5. The documentation is unclear as to how to disable things that are
enabled by default. Am I to assume that --robots=off is equivalent to -e
robots=off?

6. The documentation doesn't mention that 'M' can be used for chunk-size,
e.g. --chunk-size=2M

7. The documentation's instructions regarding --progress are all wrong.

8. The http/https proxy options return as unknown options despite being in
the documentation.


Lastly I'd like someone to look at the command I've come up with and offer
me critiques (and perhaps help me address some of the remarks above if
possible).

#!/bin/bash

wget2 \
  `#WSL compatibility` \
  --restrict-file-names=windows --no-tcp-fastopen \
  \
  `#No certificate checking` \
  --no-check-certificate \
  \
  `#Scrape the whole site` \
  --continue --mirror --adjust-extension \
  \
  `#Local viewing` \
  --convert-links --backup-converted \
  \
  `#Efficient resuming` \
  --tls-resume --tls-session-file=./tls.session \
  \
  `#Chunk-based downloading` \
  --chunk-size=2M \
  \
  `#Swiper no swiping` \
  --robots=off --random-wait \
  \
  `#Target` \
  --domains=example.com example.com


Re: [Bug-wget] make.exe warnings

2018-04-06 Thread Darshit Shah
All of these warnings happen because gnulib replaces standard Unix API calls
with system-specific implementations.

Removing the redefinition warnings in the gnulib code definitely makes sense;
redefining functions is quite literally its job.

As for the pointer-type warnings, I'd want to look at them once and decide
whether to fix or silence them.

On April 6, 2018 7:39:30 AM UTC, "Tim Rühsen"  wrote:
>On 04/06/2018 04:30 AM, Jeffrey Fetterman wrote:
>> I've successfully built wget2 through msys2 as a Windows binary, and
>it
>> appears to be working (granted I've not used it much yet), but I'm
>> concerned about some of the warnings that occurred during
>compilation.
>> 
>> Unsurprisingly they seem to be socket-related.
>> 
>> https://spit.mixtape.moe/view/9f38bd83
>
>These are warnings from gnulib code. The code itself looks good to me.
>Our CFLAGS for building the gnulib code may be too strict; I'll see
>whether reducing verbosity is warranted here.
>
>With Best Regards, Tim

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [Bug-wget] wget2 hanging, possible I/O issue

2018-04-06 Thread Tim Rühsen
On 04/04/2018 01:32 PM, Jeffrey Fetterman wrote:
> How well does TeamViewer work on Linux? My laptop has been collecting dust;
> I can just leave it running for a couple of days with a fresh install of
> Windows and a fresh install of WSL Debian (with apt-get update and upgrade
> already run).

I made some tests yesterday without success.
--no-tcp-fastopen makes a small difference, write() sets errno to 32
(broken pipe).
Removing the gnulib wrapper code didn't make a difference, neither did
removal of SO_REUSEADDR.

Regards, Tim

> 
> On Wed, Apr 4, 2018 at 3:22 AM, Tim Rühsen  wrote:
> 
>> Hi Jeffrey,
>>
>> possibly I can get my hands on a fast Win10 desktop the coming
>> weekend... no promise but I'll try.
>>
>>
>> With Best Regards, Tim
>>
>>
>>
>> On 04/04/2018 09:54 AM, Tim Rühsen wrote:
>>> Hi Jeffrey,
>>>
>>> I can't tell you. Basically because the only WSL I can get my hands on
>>> is on my wife's laptop which is *very* slow. And it needs some analysis
>>> on that side, maybe with patches for gnulib. Send me a fast Win10
>>> machine and I analyse+fix the problem ;-)
>>>
>>>
>>> BTW, we are also not using SO_REUSEPORT. The links you provided assume
>>> that it's a problem in that area. All I can say is that Wget2 was
>>> definitely working on WSL just a few weeks ago.
>>>
>>>
>>> Another option for you is to install Debian/Ubuntu in a VM. Until the
>>> hiccups with WSL have been solved one way or another.
>>>
>>>
>>> With Best Regards, Tim
>>>
>>>
>>> On 04/04/2018 09:01 AM, Jeffrey Fetterman wrote:
 Tim, do you know when you'll be able to examine and come up with a
 workaround for the issue? There are alternatives to wget2 but either
 they're not high performance or they're not really cut out for site
 scraping.

 On Mon, Apr 2, 2018 at 12:30 PM, Jeffrey Fetterman <
>> jfett...@mail.ccsf.edu>
 wrote:

> I can tell you the exact steps I took from nothing to a fresh install; I
> have the commands copied.
>
> install Debian from Windows Store, set up username/password
>
> $ sudo sh -c "echo kernel.yama.ptrace_scope = 0 >>
> /etc/sysctl.d/10-ptrace.conf; sysctl --system -a -p | grep yama"
> (this is a workaround for Valgrind and anything else that relies
> on prctl(PR_SET_PTRACER) and the wget2 problem will occur either way)
>
> $ sudo apt-get update
> $ sudo apt-get upgrade
> $ sudo apt-get install autoconf autogen automake autopoint doxygen flex
> gettext git gperf lcov libtool lzip make pandoc python3.5 pkg-config
> texinfo valgrind libbz2-dev libgnutls28-dev libgpgme11-dev
> libiconv-hook-dev libidn2-0-dev liblzma-dev libnghttp2-dev
> libmicrohttpd-dev libpcre3-dev libpsl-dev libunistring-dev zlib1g-dev
> $ sudo update-alternatives --install /usr/bin/python python
> /usr/bin/python3.5 1
>
> then the commands outlined as per the documentation. config.log attached.
>
> On Mon, Apr 2, 2018 at 11:53 AM, Tim Rühsen  wrote:
>
>> Hi Jeffrey,
>>
>>
>> basically wget2 should work on WSL; I just tested it scarcely two weeks
>> ago without issues.
>>
>>
>> I suspect it might have to do with your dependencies (e.g. did you
>> install libnghttp2-dev?).
>>
>> To find out, please send your config.log. That allows me to see your
>> compiler, CFLAGS and the detected dependencies etc..
>>
>> I will try to reproduce the issue then.
>>
>>
>> Regards, Tim
>>
>>
>> On 02.04.2018 17:42, Jeffrey Fetterman wrote:
>>> wget2 will not download any files, and I think there's some sort of disk
>>> access issue.
>>>
>>> this is on Windows Subsystem for Linux Debian 9.3 Stretch. (Ubuntu 16.04
>>> LTS had the same issue.)
>>>
>>> Here's the output of strace -o strace.txt -ff wget2 https://www.google.com
>>>
>>> https://pastebin.com/4MEL88qs
>>>
>>> wget2 -d https://www.google.com just hangs after the line '02.103350.008
>>> ALPN offering http/1.1'
>>>
>>> ultimately I might have to submit a bug to WSL, but I wouldn't know what
>>> to report; I don't know what's wrong. And it'd be great if there was a
>>> workaround.
>>
>>
>>
>

>>>
>>
>>
> 





Re: [Bug-wget] make.exe warnings

2018-04-06 Thread Tim Rühsen
On 04/06/2018 04:30 AM, Jeffrey Fetterman wrote:
> I've successfully built wget2 through msys2 as a Windows binary, and it
> appears to be working (granted I've not used it much yet), but I'm
> concerned about some of the warnings that occurred during compilation.
> 
> Unsurprisingly they seem to be socket-related.
> 
> https://spit.mixtape.moe/view/9f38bd83

These are warnings from gnulib code. The code itself looks good to me.
Our CFLAGS for building the gnulib code may be too strict; I'll see
whether reducing verbosity is warranted here.

With Best Regards, Tim


