Re: About Automated Unit Test for Wget
2008/4/5, Micah Cowan [EMAIL PROTECTED]:

>> Daniel Stenberg wrote:
>> This system allows us to write unit tests if we'd like to, but mostly
>> so far we've focused on testing it system-wide. It is hard enough for us!
>
> Yeah, I thought I'd seen something like that; I was thinking we might
> even be able to appropriate some of that, if it looked doable. Except
> that I preferred faking the server completely, so I could deal better
> with cross-site issues, which AFAICT are significantly more important
> to Wget than they are to Curl.

It seems that the abstraction of the network API needs more discussion,
so I would focus on the server emulation.

By the way, how about using LD_PRELOAD? I tested it a little and it
seems to work. If we use this, we can test by overriding the socket
interface, and still we don't change Wget's real source code.

-- main.c --

    #include <stdio.h>

    int main(void)
    {
        puts("Helow Wgets\n");
        return 0;
    }

-- testputs.c --

    #include <stdio.h>

    int puts(const char *str)
    {
        while (*str)
            putchar(*str++);
        printf("This is a test module");
        putchar('\n');
        return 0;
    }

-- Compile like below:

    [EMAIL PROTECTED] Test]$ gcc main.c -o main
    [EMAIL PROTECTED] Test]$ gcc -fPIC -shared -o testputs.so testputs.c

-- Execute like below:

    [EMAIL PROTECTED] Test]$ ./main
    Helow Wgets

    [EMAIL PROTECTED] Test]$ LD_PRELOAD=./testputs.so ./main
    Helow Wgets
    This is a test module

I found this technique on the net, and the sample was using wget!
They override socket, close, and connect:
http://www.t-dori.net/forensics/hook_tcp.cpp

--
Yoshihiro TANAKA
Re: About Automated Unit Test for Wget
2008/4/4, Micah Cowan [EMAIL PROTECTED]:

> IMO, if it's worth testing, it's probably better to have external
> linkage anyway.

I got it.

>> 1) Select the functions which can be tested in unit tests. But "how
>> can I select them?" is the difficult part. Basically, the fewer
>> dependencies a function has, the easier it is to include in a unit
>> test, but I'm not sure where the boundary line is.
>
> This is precisely the problem, and one reason I've been thinking that
> this might not make an ideal topic for a GSoC proposal, unless you
> want to include refactoring existing functions like gethttp and
> http_loop into more logically discrete sets of functions. Essentially,
> to get better coverage of the code that needs it the most, that code
> will need to be rewritten. I believe this can be an iterative process
> (find one function to factor out, write a unit test for it, make it
> work...).

Yes, since I want to write a proposal for unit testing, I can't skip
this problem. But considering that the GSoC program is only two months,
I'd rather narrow down the target -- to the gethttp function. Although
I don't know the source code well yet, I'm thinking along these lines.
In gethttp there are roughly six chunks of functionality:

1. Preparation of the request
2. Building the HTTP header (proxy_auth, generate_hosthead, and the
   other header-construction steps)
3. Connection handling (persistent_available_p, establishing the
   connection to the host, SSL connection setup)
4. HTTP request processing (sending the request, reading the response,
   checking status codes)
5. Local-file-related processing (a bunch of steps: determining the
   filename, file existence check, noclobber / -O check, timestamping
   check, Content-Length check, Keep-Alive response check,
   authorization, Set-Cookie header, Content-Range check, filename
   handling (HTML extension), 416 status code handling, opening the
   local file)
6. Downloading the body and writing it to the local file

So, basically, I think gethttp could be divided along these lines.
And after that, each piece of functionality could be divided into
smaller pieces, to the extent that unit tests can be run separately.

In addition to the above, we have to think about abstraction of the
network API and the file I/O API. But the network API (such as
fd_read_body and fd_read_hunk) lives in retr.c, and the socket is
opened in connect.c, so it looks like abstracting the network API would
require major modification of the interfaces, and I don't think I
should design that alone. So what I want to suggest is that I'd like to
ask for help with the interface _design_. What do you think? At least,
I want to narrow the scope down to something I can take responsibility
for.

> What I'm most keenly interested in is the ability to verify the logic
> of how follow/don't-follow is decided (that actually may not be too
> hard to write tests against now), how Wget handles various
> protocol-level situations, how it chooses the filename and deals with
> the local filesystem, etc. I will be very, _very_ happy when
> everything that's in http_loop and gethttp is verified by unit tests.
> But a lot of getting to where we can test that may mean abstracting
> out things like the Berkeley Sockets API and filesystem interactions,
> so that we can drop in fake replacements for testing.

I'd like to try, if we can settle the problem of interface design...

> I'm familiar with a framework called (simply) Check, which might be
> worth considering. It forks a new process for each test, which
> isolates it from interfering with the other tests, and also provides a
> clean way to handle things like segmentation violations or aborts.
> However, it's intended for Unix, and probably doesn't compile on other
> systems. http://check.sourceforge.net/

Thank you for your information :)

--
Yoshihiro TANAKA
Re: About Automated Unit Test for Wget
2008/4/5, Yoshihiro Tanaka [EMAIL PROTECTED]:

> In addition to the above, we have to think about abstraction of the
> network API and the file I/O API. But the network API (such as
> fd_read_body and fd_read_hunk) lives in retr.c, and the socket is
> opened in connect.c, so it looks like abstracting the network API
> would require major modification of the interfaces.

Or did you mean to write a Wget version of the socket interface? That
is, to write our own versions of socket, connect, write, read, close,
bind, listen, accept, and so on? Sorry, I'm confused.

--
Yoshihiro TANAKA
About Automated Unit Test for Wget
Hello,

I want to ask about unit testing of Wget going forward. Currently,
unit tests are written only for the following .c files:

    http.c
    init.c
    main.c
    res.c
    url.c
    utils.c
    (test.c)

So, as written in the Wiki, a new unit test suite is necessary.
(ref. http://wget.addictivecode.org/FeatureSpecifications/Testing)

To build a new unit test suite, I think the following work is needed:

1) Select the functions which can be tested in unit tests. But "how
   can I select them?" is the difficult part. Basically, the fewer
   dependencies a function has, the easier it is to include in a unit
   test, but I'm not sure where to draw the boundary line.

2) To support 1), how about making (and maintaining) a list of all
   functions in Wget? The advantages of this are:

   * it makes clear which functions are included in the unit tests;
   * it makes clear which functions are _not_ yet covered;
   * it makes the testing easier to manage;
   * it makes the testing work easier to divide up.

   Once this list is done, it becomes easier to maintain the testing
   schedule, progress, etc. And when the unit test suite is done, the
   list should be generatable automatically, so to run a regression
   test, all we do is run the unit tests again :)

3) The contents I have in mind for each list entry:

   * Wget version number
   * filename
   * function name
   * whether it is included in the unit tests
   * a simple call graph of the function

So let me ask your opinions. Are there any other suggestions about
unit testing Wget (test tools, other preliminary work, how to manage
it, ...)?

Thank you for your time.

--
Yoshihiro TANAKA
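A first pass at the function list in item 2) could be scripted. The snippet below is only a crude grep heuristic over an invented stand-in file (`demo_src/url.c` is not real Wget source); a real pass would use a proper tool such as ctags rather than pattern matching:

```shell
# Create a toy source file standing in for a Wget .c file.
mkdir -p demo_src
cat > demo_src/url.c <<'EOF'
int url_parse(void) { return 0; }
static int url_unescape(void) { return 0; }
EOF

# Heuristic scan for function definitions, printing just the names.
grep -hE '^(static )?int [a-z_]+\(' demo_src/url.c \
  | sed -E 's/^(static )?int ([a-z_]+)\(.*/\2/'
```

This prints `url_parse` and `url_unescape`, one per line; the output could be diffed against the list of functions the test suite exercises to find the untested ones.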
Re: About file format for MetaDataBase
2008/4/5, Micah Cowan [EMAIL PROTECTED]:

> ...it doesn't understand _anyway_, and any other important changes
> will pretty much require a major version bump, does it actually make
> sense to distinguish an SIDB 1.0 from an SIDB 1.1?

At least a minor version would help when we check the contents of an
SIDB file, in cases like: why is this item (not) written here?

> That's true; but actually, using the Wget version number instead
> could be more informative in that way. We could write that
> information as well (but give it no semantic meaning: just intended
> for human readers). That way, we wouldn't have to remember to be sure
> to bump the SIDB version number every time we add a new header type
> (I'm not as worried about the major version bumps: I think we'll
> remember to bump for truly incompatible changes).

Yes, if we can do without the extra information, that would be better;
I was just wondering whether it might be useful. How about a case like
this?

    Wget 1.12    SIDB 1.0
    Wget 1.13    SIDB 1.1
    Wget 1.14    SIDB 1.1
    Wget 1.15    SIDB 1.1
    Wget 1.16    SIDB 1.2

To me, if the SIDB file has its own version number, it is clear which
version of Wget uses which format of SIDB. This is just my impression,
so please tell me how you feel.

Thank you for your time.

--
Yoshihiro TANAKA
About file format for MetaDataBase
Hello,

My name is Yoshihiro TANAKA. I'm interested in GSoC and in the
MetaDataBase project, so let me ask about the file format for the
MetaDataBase (SIDB).

Considering forwards compatibility, Wget should be able to ignore
items it does not recognize. For this, Wget has to know which data
belongs to which item. So how about CSV-style lines with '|' as the
delimiter? It would look like this:

    first line:         Wget Start at MMSSMMHH-DDMM
    second line:        SIDB Version:1.13
    third line:         Wget invocation configuration
    fourth line:        title line: URL|StatusCode|Filepath|MIME-Type|...
    fifth line onward:  data lines  bra|bra|bra|bra|bra|bra|...
                        data lines  bra|bra|bra|bra|bra|bra|...
    last line:          Wget End at MMSSMMHH-DDMM

The advantages of this format are:

1. Wget can recognize the start/end of a session.
2. Wget can recognize which data belongs to which item
   (the title line includes the configuration info).
3. Wget can recognize the version of the SIDB file
   (which does not have to match Wget's own version).

Case 1: When an older Wget reads a newer SIDB file, it can read just
the items it recognizes.

Case 2: When a newer Wget wants to use an older SIDB file, it can
check the file's version and cope with it.

Case 3: When a new Wget wants to treat a new SIDB file as an
old-version SIDB file, it can specify the SIDB version like:

    # Wget -VSIDB 1.12

which means that even if the SIDB file's version is 1.13, Wget treats
it as a version 1.12 file.

Please comment on this file format. Thank you for your time.

--
Yoshihiro TANAKA
SFSU CS Department