Re: [Bug-wget] Wget not finding image references in javascript source
Sounds reasonable, Darshit; thanks for the explanation. Rather than actually parsing JavaScript (or using a headless browser, etc.), I was thinking wget could use a regex for the simplest case: an image with a jpg/png/gif extension embedded in a JavaScript string. I do realize there is overhead to that, and there are many edge cases in how a JavaScript string might be built dynamically, so it may be too much risk to even try. But maybe not, if it's only for the specific case of a valid absolute or relative path to an image.

-- Zane

On Mon, Feb 29, 2016 at 10:59 PM, Darshit Shah wrote:
> Hi Zane,
>
> The question of supporting links and images embedded via JavaScript pops up
> fairly often. JS is a dynamic scripting language and the code path taken
> depends on the user's interaction with the page. To simulate this, we would
> need a full JS engine inside of Wget. Apart from being large and clumsy, this
> would also be impossible for us to maintain. As a result, we do not support,
> and have no plans to support, parsing JS code in Wget in the near future.
>
> If you have any ideas that would help implement this without needing a full
> JS engine, do let us know. We'd be interested in hearing and evaluating new
> options.
>
> On 02/29, Zane Staggs wrote:
>>
>> It seems wget ignores image paths that exist in JavaScript source, such as
>> a simple path string like "/path/to/my/image.jpg". I realize it's
>> probably not easy to parse every JS string for an image path, but I'm
>> wondering if there are ways to make it work or plans to implement it.
>> I got around it for now by creating a dummy hidden img element with
>> the src so wget could find it in the DOM. Thanks.
>>
>
> --
> Thanking You,
> Darshit Shah
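Zane's regex idea can be sketched for the narrow case he describes: a quoted JavaScript string literal that consists solely of an image path. This is a minimal Python sketch under that assumption; the pattern and the `find_image_paths()` helper are hypothetical and not part of wget. Strings assembled at runtime, the dynamic case Darshit points out, are deliberately not matched.

```python
import re

# Hypothetical pattern (not part of wget): a single- or double-quoted JS string
# literal whose entire content looks like a path ending in .jpg/.jpeg/.png/.gif.
# Strings built by concatenation or inside CSS url(...) values are missed.
IMAGE_IN_JS_STRING = re.compile(
    r"""(["'])                              # opening quote, captured for the closing match
        ([^"'\s\\]+?\.(?:jpe?g|png|gif))    # path: no quotes, whitespace or escapes
        \1                                  # the same closing quote
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_image_paths(js_source: str) -> list[str]:
    """Return image-like paths found in quoted JS string literals, in order."""
    return [m.group(2) for m in IMAGE_IN_JS_STRING.finditer(js_source)]


if __name__ == "__main__":
    js = '''
        var hero = "/path/to/my/image.jpg";
        el.style.backgroundImage = 'url(img/bg.PNG)';   // not matched: url(...) wrapper
        var icon = 'icons/' + name + '.gif';            // not matched: built dynamically
        loadSprite("sprites/sheet.png");
    '''
    print(find_image_paths(js))
    # → ['/path/to/my/image.jpg', 'sprites/sheet.png']
```

Requiring the whole literal to be a path keeps false positives low at the cost of recall, which matches the "only the simplest case" scope Zane proposes.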
Re: [Bug-wget] Google Summer of Code 2016
Just more ideas for you, Kushagra:

There are many command-line options from Wget still missing in Wget2; you should have a look at https://github.com/rockdaboot/wget2/wiki anyway. Feel free to work on the wiki yourself (e.g. fork the wiki pages: https://help.github.com/articles/adding-and-editing-wiki-pages-locally/ or let me know and I'll give you write access).

You can search the Wget bug tracker (https://savannah.gnu.org/bugs/?group=wget) for wishlist items. My favorite is https://savannah.gnu.org/bugs/?45803. Special popen(2|3) functions/code are already in the libwget/ directory. E.g., that would allow Wget2 to be used as part of a recursive website malware checker.

The authorization code in the test suite is incomplete/not implemented. I once tested authorization (MD5, MD5-sess) 'by hand' with my local Apache, but an automated test is badly needed.

We thought of a statistics module (very basic code exists) for spider mode to output very detailed diagnostics: missing pages, response times, server load (e.g. using the RTT/ping time), etc.

Tim

On Wednesday 02 March 2016 10:51:02 Kushagra Singh wrote:
> Hi,
>
> Thanks for the quick reply. I went through the repository and the issues,
> and found a couple of things I would like to work on.
>
> I have a couple of questions about Wget2. Is it a complete rewrite of the
> Wget project, available at git://git.savannah.gnu.org/wget.git, or are we
> using existing code and extending functionality? I guess it is the second
> one because I saw `libwget` in the repo. However, if that is the case,
> how do we change existing functions in wget? For example, implementing [2]
> would require making changes to the file cookies.c, which is present in
> /src in the wget repo, but not in /src in the wget2 repo.
>
> I was looking at #43 [1], and have already submitted a patch for
> consideration for the first suggestion [2]. The second suggestion mentioned
> [3] is one of the things I'd like to work on; however, this is not something
> that will take three months :)
>
> Another project I am interested in is implementing FTPS. I saw this listed
> under one of the ideas of GSoC 2015, but I'm not sure whether it was
> implemented, as I didn't see it under 'Development Status' in the wget2
> readme on GitHub.
>
> Also, in #67 [4], we are talking about adhering to some specific parts of
> RFC 7230. I'm not sure which parts would be right, as the discussion
> thread mentions that it won't be good to stick to every point of the RFC.
> WDYT?
>
>
> [1] https://github.com/rockdaboot/wget2/issues/43
> [2] https://tools.ietf.org/html/draft-west-leave-secure-cookies-alone-04
> [3] https://tools.ietf.org/html/draft-west-cookie-prefixes-05
> [4] https://github.com/rockdaboot/wget2/issues/67
>
> On Tue, Mar 1, 2016 at 9:57 PM, Giuseppe Scrivano wrote:
> > Kushagra Singh writes:
> > > Hi,
> > >
> > > Will we be taking part in GSoC this year? I would really like to work on a
> > > project related to Wget this summer. Are there any specific ideas that are
> > > of importance to the community presently?
> >
> > Yes, we will take part in GSoC. I think we would like to see more
> > work happening on wget2. At the moment there is a list of issues on
> > GitHub that can be useful to you to pick some ideas to work on:
> > https://github.com/rockdaboot/wget2/issues
> >
> > Could you take a look at it? Do you see anything interesting that you
> > would like to work on?
> >
> > Regards,
> > Giuseppe
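Tim notes that the test suite has no automated check for Digest authorization with MD5 and MD5-sess. As a starting point, here is a minimal Python sketch of the value a test server would need to compare against: the expected Digest `response` for `qop=auth`, following RFC 2617 (section 3.2.2). `digest_response()` and `_h()` are hypothetical helper names, not existing testenv code, and `auth-int` is not covered.

```python
import hashlib


def _h(data: str) -> str:
    """H() from RFC 2617: lowercase hex MD5 of the string (UTF-8 encoded here)."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str, nc: str, cnonce: str,
                    algorithm: str = "MD5") -> str:
    """Expected Digest 'response' for qop=auth (hypothetical test helper)."""
    ha1 = _h(f"{username}:{realm}:{password}")
    if algorithm.upper() == "MD5-SESS":
        # MD5-sess: A1 additionally covers the server nonce and client nonce.
        ha1 = _h(f"{ha1}:{nonce}:{cnonce}")
    elif algorithm.upper() != "MD5":
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    ha2 = _h(f"{method}:{uri}")  # A2 = Method ":" digest-uri for qop=auth
    return _h(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")


if __name__ == "__main__":
    # Example values from RFC 2617, section 3.5.
    print(digest_response("Mufasa", "testrealm@host.com", "Circle Of Life",
                          "GET", "/dir/index.html",
                          "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                          "00000001", "0a4f113b"))
    # → 6629fae49393a05397450978507c4ef1
```

A test server could issue a fixed nonce, parse the client's `Authorization` header, and compare its `response` field against this value for both algorithms, replacing the manual check against a local Apache.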
Re: [Bug-wget] buildbot failure in OpenCSW Buildbot on wget-solaris10-sparc
The problem is not related to the latest commit(s). It is SSLv2-related stuff on the build farm:

Running Test HSTS basic test
Traceback (most recent call last):
  File "./Test-hsts.py", line 75, in <module>
    test.setup()
  File "/home/rockdaboot/wget/testenv/test/http_test.py", line 30, in setup
    self.server_setup()
  File "/home/rockdaboot/wget/testenv/test/base_test.py", line 85, in server_setup
    instance = self.instantiate_server_by(protocol)
  File "/home/rockdaboot/wget/testenv/test/http_test.py", line 51, in instantiate_server_by
    HTTPS: HTTPSd}[protocol]()
  File "/home/rockdaboot/wget/testenv/server/http/http_server.py", line 470, in __init__
    self.server_inst = self.server_class(addr, self.handler)
  File "/home/rockdaboot/wget/testenv/server/http/http_server.py", line 38, in __init__
    import ssl
  File "/opt/csw/lib/python3.3/ssl.py", line 60, in <module>
    import _ssl             # if we can't import it, let the error propagate
ImportError: ld.so.1: python3.3: fatal: relocation error: file /opt/csw/lib/python3.3/lib-dynload/_ssl.so: symbol SSLv2_method: referenced symbol not found
FAIL Test-hsts.py (exit status: 1)

I *guess* that it has to do with the latest SSLv2 vulnerability, and that the underlying OpenSSL library has been replaced without taking care of the Python module.

Tim

On Thursday 03 March 2016 10:13:55 build...@opencsw.org wrote:
> The Buildbot has detected a new failure on builder wget-solaris10-sparc
> while building wget. Full details are available at:
> https://buildfarm.opencsw.org/buildbot/builders/wget-solaris10-sparc/builds/131
>
> Buildbot URL: https://buildfarm.opencsw.org/buildbot/
>
> Buildslave for this Build: unstable10s
>
> Build Reason: scheduler
> Build Source Stamp: [branch master] 44aedd832197e32abbb4cb9582774c2ca8b8fa43
> Blamelist: Giuseppe Scrivano, Maks Orlovich
>
> BUILD FAILED: failed shell_3
>
> sincerely,
> -The Buildbot
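Tim's diagnosis points at Python's `_ssl` extension rather than wget: `import ssl` fails because `_ssl.so` references `SSLv2_method`, which the replaced OpenSSL no longer exports. A minimal Python sketch of a hypothetical guard, `ssl_status()`, that an HTTPS test could run first so the linker error is reported plainly. It assumes the tests run under Automake's test harness, where exit status 77 means SKIP; neither the helper nor the skip behaviour exists in the testenv today.

```python
import importlib
import sys

# Exit status that Automake's test harness reports as SKIP (assumption: the
# testenv scripts run under that harness).
AUTOMAKE_SKIP = 77


def ssl_status() -> tuple[bool, str]:
    """Return (True, OpenSSL version) if `ssl` imports, else (False, error text)."""
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # A relocation error such as the missing SSLv2_method symbol in
        # _ssl.so surfaces here as an ImportError.
        return False, str(exc)
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    ok, detail = ssl_status()
    if not ok:
        print(f"SKIP: Python ssl module unusable: {detail}", file=sys.stderr)
        sys.exit(AUTOMAKE_SKIP)
    print(f"ssl module OK ({detail})")
```

Whether skipping is preferable is a judgment call: a SKIP keeps unrelated commits from being blamed, but it also hides a broken build-farm environment that a FAIL would surface.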
[Bug-wget] buildbot failure in OpenCSW Buildbot on wget-solaris10-sparc
The Buildbot has detected a new failure on builder wget-solaris10-sparc while building wget. Full details are available at:
https://buildfarm.opencsw.org/buildbot/builders/wget-solaris10-sparc/builds/131

Buildbot URL: https://buildfarm.opencsw.org/buildbot/

Buildslave for this Build: unstable10s

Build Reason: scheduler
Build Source Stamp: [branch master] 44aedd832197e32abbb4cb9582774c2ca8b8fa43
Blamelist: Giuseppe Scrivano, Maks Orlovich

BUILD FAILED: failed shell_3

sincerely,
-The Buildbot
[Bug-wget] buildbot failure in OpenCSW Buildbot on wget-solaris10-i386
The Buildbot has detected a new failure on builder wget-solaris10-i386 while building wget. Full details are available at:
https://buildfarm.opencsw.org/buildbot/builders/wget-solaris10-i386/builds/126

Buildbot URL: https://buildfarm.opencsw.org/buildbot/

Buildslave for this Build: unstable10x

Build Reason: scheduler
Build Source Stamp: [branch master] 44aedd832197e32abbb4cb9582774c2ca8b8fa43
Blamelist: Giuseppe Scrivano, Maks Orlovich

BUILD FAILED: failed shell_3

sincerely,
-The Buildbot
Re: [Bug-wget] Patch for understanding srcset= on img tags.
Maksim Orlovich writes:

>> should the condition be (c == ')' && in_paren)?
>
> Indeed.
>
> Thanks,
> Maks

Thanks for the changes, I am going to push it shortly.

Regards,
Giuseppe
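The condition under discussion belongs to the srcset tokenizer: while skipping an image candidate's descriptors, a comma normally ends the candidate, but not inside parentheses, and a `)` should only matter if a `(` was actually opened. A minimal Python sketch of that idea, loosely following the HTML Standard's srcset parsing algorithm. `srcset_urls()` is a hypothetical name; it only extracts candidate URLs (what a crawler needs for following links), does not validate descriptors, and is not the C code from Maksim's patch.

```python
def srcset_urls(srcset: str) -> list[str]:
    """Return the candidate URLs of a srcset attribute value, in order."""
    urls = []
    i, n = 0, len(srcset)
    while i < n:
        # Skip whitespace and commas separating image candidates.
        while i < n and (srcset[i].isspace() or srcset[i] == ","):
            i += 1
        if i >= n:
            break
        # The URL is the next run of non-whitespace characters.
        start = i
        while i < n and not srcset[i].isspace():
            i += 1
        url = srcset[start:i]
        if url.endswith(","):
            # "a.png, b.png": trailing commas end the candidate; no descriptors follow.
            url = url.rstrip(",")
        else:
            # Skip descriptors such as "2x" or "480w". A comma ends them,
            # unless it appears inside parentheses.
            in_paren = False
            while i < n:
                c = srcset[i]
                i += 1
                if c == "(":
                    in_paren = True
                elif c == ")" and in_paren:
                    in_paren = False
                elif c == "," and not in_paren:
                    break
        urls.append(url)
    return urls


if __name__ == "__main__":
    print(srcset_urls("img.jpg 100w (foo, bar), next.jpg 2x"))
    # → ['img.jpg', 'next.jpg']
```

Without the `in_paren` tracking, the comma after `foo` would end the first candidate early, and `bar)` would then be picked up as if it were a URL.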