Public bug reported: The attached apport file was created from a segfault/core-dump observed while using wget to try to audit a large number of websites to determine which ones were online, which were redirects and where they redirected to, etc.
The exact command line attempts a considerable amount of obfuscation and does not care at all about the files that are actually downloaded, which are occasionally harvested to reclaim free space. The harvester did not run anywhere near the time of this crash, though.

wget --tries=3 -i /path/to/getlist.txt \
  -U 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36' \
  --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" \
  --header="Accept-Encoding: gzip, deflate, br" \
  --header="Accept-Language: en-US,en;q=0.8" \
  --header="Cache-Control: max-age=0" \
  --header="Referer: https://www.google.com/" \
  -e robots=off --wait 0.5 --random-wait 2>&1 | tee /path/to/logfile.txt

The getlist contained 144,551 URLs to process; the crash happened at the 44,417th URL. Wget now downloads the nearby URLs just fine, but here are the last several lines of logfile.txt:

- - - - - - - -
--2017-07-15 04:05:13-- http://urlshortener.actorsandcrew.com/
Resolving urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)... 64.13.228.85
Connecting to urlshortener.actorsandcrew.com (urlshortener.actorsandcrew.com)|64.13.228.85|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1515 (1.5K) [text/html]
Saving to: ‘index.html.4732’

     0K .     100% 127M=0s

2017-07-15 04:05:19 (127 MB/s) - ‘index.html.4732’ saved [1515/1515]

--2017-07-15 04:05:19-- http://varganess.soclog.se/p
Resolving varganess.soclog.se (varganess.soclog.se)... 83.140.155.4
Connecting to varganess.soclog.se (varganess.soclog.se)|83.140.155.4|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Cookie coming from varganess.soclog.se attempted to set domain to bilddagboken.se
Location: http://dayviews.com [following]
--2017-07-15 04:05:25-- http://dayviews.com/
Connecting to dayviews.com (dayviews.com)|83.140.155.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘p’

     0K .......... ........     115K=0.2s

2017-07-15 04:05:26 (115 KB/s) - ‘p’ saved [19057]
- - - - - - - -

The next site up for audit after this "saved" message was logged was http://drivingrevenue.com/, which also downloads just fine when I run it as a one-off.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: wget 1.17.1-1ubuntu1.2
ProcVersionSignature: Ubuntu 4.4.0-75.96-generic 4.4.59
Uname: Linux 4.4.0-75-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.9
Architecture: amd64
Date: Mon Jul 17 12:40:33 2017
InstallationDate: Installed on 2014-06-23 (1120 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: wget
UpgradeStatus: Upgraded to xenial on 2016-05-05 (437 days ago)

** Affects: wget (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug xenial

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to wget in Ubuntu.
https://bugs.launchpad.net/bugs/1704843

Title:
  segfault processing very large input list

Status in wget package in Ubuntu:
  New
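A minimal sketch for retrying only the URLs around the failing position of the input list, instead of rerunning all 144,551 entries. It assumes getlist.txt holds exactly one URL per line with no blank or comment lines, so that line 44,417 is the 44,417th URL wget processed. The function name url_window and the window radius are hypothetical illustrations, not part of wget:

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the URLs in a window around a
# 1-based line number of a wget input list. Assumes one URL per line and no
# blank/comment lines, so a line number equals the URL's position in the list.
url_window() {
    uw_list=$1     # path to the URL list, e.g. /path/to/getlist.txt
    uw_center=$2   # 1-based position of the suspect URL (44417 in this report)
    uw_radius=$3   # number of URLs to keep on each side of the suspect one

    uw_start=$((uw_center - uw_radius))
    # Clamp the window so it never starts before the first line.
    if [ "$uw_start" -lt 1 ]; then
        uw_start=1
    fi
    uw_end=$((uw_center + uw_radius))

    # Print only lines uw_start..uw_end; sed simply stops at end of file
    # if uw_end is past the last line.
    sed -n "${uw_start},${uw_end}p" "$uw_list"
}
```

For example, url_window /path/to/getlist.txt 44417 50 > /tmp/slice.txt writes the 101 URLs centered on the crash point to a scratch file, and the original wget command can then be rerun with -i /tmp/slice.txt. Since the nearby URLs already succeed as one-offs, a slice like this only helps if the crash depends on the sequence of requests rather than on a single URL, which is an assumption.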

