Re: About Automated Unit Test for Wget
2008/4/5, Micah Cowan [EMAIL PROTECTED]:

> Daniel Stenberg wrote:
> > This system allows us to write unit tests if we'd like to, but mostly so far we've focused on testing it system-wide. It is hard enough for us!
>
> Yeah, I thought I'd seen something like that; I was thinking we might even be able to appropriate some of that, if that looked doable. Except that I preferred faking the server completely, so I could deal better with cross-site issues, which AFAICT are significantly more important to Wget than they are to curl.

It seems that abstraction of the network API needs more discussion, so I would focus on the server emulation.

By the way, how about using LD_PRELOAD? I tested it a little, and it seems to work. If we use this, we can test by overriding the socket interface, and still we don't change Wget's real source code.

--main.c--

#include <stdio.h>

int main(void)
{
    puts("Helow Wgets\n");
    return 0;
}

--testputs.c--

#include <stdio.h>

int puts(const char *str)
{
    while (*str)
        putchar(*str++);
    printf("This is a test module");
    putchar('\n');
    return 0;
}

--Compile like below:

[EMAIL PROTECTED] Test]$ gcc main.c -o main
[EMAIL PROTECTED] Test]$ gcc -fPIC -shared -o testputs.so testputs.c

--Execute like below:

[EMAIL PROTECTED] Test]$ ./main
Helow Wgets

[EMAIL PROTECTED] Test]$ LD_PRELOAD=./testputs.so ./main
Helow Wgets
This is a test module

I found this way on the net, and the sample was using wget! They are overriding socket, close, and connect.
http://www.t-dori.net/forensics/hook_tcp.cpp

--
Yoshihiro TANAKA
Re: About Automated Unit Test for Wget
Micah Cowan [EMAIL PROTECTED] writes:

> I don't see what you see wrt making the code harder to follow and reason about (true abstraction rarely does, AFAICT)

I was referring to the fact that adding an abstraction layer requires learning about the abstraction layer: both its concepts and its implementation, including its quirks and limitations. Overly general abstractions added to application software tend to be underspecified (for the domain they attempt to cover) and incomplete. Programmers tend to ignore the hidden cost of adding an abstraction layer until the cost becomes apparent, by which time it is too late.

Application-specific abstractions are usually worth it because they are well-justified: they directly benefit the application by making the code base simpler and removing duplication. Some general abstractions are worth it because the alternative is worse; you wouldn't want to have two versions of the network code, one for regular sockets and one for SSL, since the whole point of SSL is that you're supposed to use it as if it were sockets, with the encryption happening behind the scenes. But adding a whole new abstraction layer over something as general as Berkeley sockets just to facilitate an automated test suite definitely sounds like ignoring the costs of such a layer.

> I _am_ thinking that it'd probably be best to forgo the idea of one-to-one correspondence with Berkeley sockets, and pass around a struct net_connector * (and struct net_listener *), so we're not forced to deal with file descriptor silliness (where obviously we'd have wanted to avoid the values 0 through 2, and I was even thinking it might _possibly_ be worthwhile to allocate real file descriptors to get the numbers, just to avoid clashes).

I have no idea what file descriptor silliness with values 0-2 you're referring to. :-) I do agree that an application-specific struct is better than a more general abstraction, because it is easier to design and more useful to Wget in the long run.

> This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc. on sockets, which would be using our emulated versions.

Unless you're willing to spend a lot of time in careful design of these abstractions, I think this is a mistake. Why? Because implementing a file I/O abstraction is much harder and more time-consuming than it sounds. To paraphrase Greenspun, it would appear that every sufficiently large code base contains an ad-hoc, informally-specified, bug-ridden implementation of a streaming layer. There are streaming libraries out there; maybe we should consider using some of them.
Re: About Automated Unit Test for Wget
Micah Cowan wrote:
> Yeah. But we're not doing streaming. And you still haven't given much explanation for _why_ it's as hard and time-consuming as you say. Making a claim and demonstrating it are different things, I think.

To be clear, I'm not trying to say, "I don't believe you"; I'm saying: argue the case, please, don't just make assertions. Clearly, you're concerned about something I'm unable to see: help me to see it! If I ignore your warnings and wind up running headlong into what you saw in the first place, you can't claim you gave fair warning if you didn't provide examples of what I might run into.

For my part, I see something which, at least for a first cut, I could whip up in a couple of hours (the server emulation and associated state-tracking, of course, would be _quite_ a bit more work). What is it that causes our two perspectives to differ so wildly?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
Re: About Automated Unit Test for Wget
2008/4/4, Micah Cowan [EMAIL PROTECTED]:

> IMO, if it's worth testing, it's probably better to have external linkage anyway.

I got it.

> > 1) Select functions which can be tested in a unit test. But "How can I select them?" is difficult. Basically, the less dependency a function has, the easier it is to include in a unit test, but I'm not sure about the boundary line.
>
> This is precisely the problem, and one reason I've been thinking that this might not make an ideal topic for a GSoC proposal, unless you want to include refactoring existing functions like gethttp and http_loop into more logically discrete sets of functions. Essentially, to get better coverage of the code that needs it the most, that code will need to be rewritten. I believe this can be an iterative process (find one function to factor out, write a unit test for it, make it work...).

Yes, since I want to write a proposal for unit testing, I can't skip this problem. But considering that the GSoC program is only two months, I'd rather narrow down the target to the gethttp function. Although I'm not very familiar with the source code, I'm thinking along these lines: in the gethttp function there are roughly six chunks of functionality.

1. Preparation of the request
2. Making the header part of HTTP: proxy_auth, generate_hosthead, and other processing to build the header
3. Connection: persistent_available_p, establishment of the connection to the host, ssl_connection processing
4. HTTP request processing: sending the request, reading the request response, checking status codes
5. Local-file-related processing (a bunch of it...): determine filename, file existence check, noclobber/-O check, timestamping check, Content-Length check, Keep-Alive response check, Authorize processing, Set-Cookie header, Content-Range check, filename handling (HTML extension), 416 status code handling, opening the local file
6. Downloading the body: writing into the local file

So basically I think it could be divided into these functionalities. And after that, each functionality would be divided into smaller pieces, to the extent that unit tests can be done separately.

In addition to the above, we have to think about abstraction of the network API and the file I/O API. But the network API (such as fd_read_body, fd_read_hunk) lives in retr.c, and the socket is opened in connect.c, so it looks like abstraction of the network API would require major modification of the interfaces. And the design of this would not be proper for me, I think. So what I want to suggest is that I want to ask for the interface _design_. What do you think? At least I want to narrow down the scope to what I can take responsibility for.

> What I'm most keenly interested in is the ability to verify the logic of how follow/don't-follow is decided (that actually may not be too hard to write tests against now), how Wget handles various protocol-level situations, how it chooses the filename and deals with the local filesystem, etc. I will be very, _very_ happy when everything that's in http_loop and gethttp is verified by unit tests. But a lot of getting to where we can test that may mean abstracting out things like the Berkeley Sockets API and filesystem interactions, so that we can drop in fake replacements for testing.

I'd like to try, if we could settle the problem of interface design...

> I'm familiar with a framework called (simply) Check, which might be worth considering. It forks a new process for each test, which isolates it from interfering with the other tests, and also provides a clean way to handle things like segmentation violations or aborts. However, it's intended for Unix, and probably doesn't compile on other systems. http://check.sourceforge.net/

Thank you for your information :)

--
Yoshihiro TANAKA
Re: About Automated Unit Test for Wget
2008/4/5, Yoshihiro Tanaka [EMAIL PROTECTED]:

> In addition to the above, we have to think about abstraction of the network API and the file I/O API. But the network API (such as fd_read_body, fd_read_hunk) lives in retr.c, and the socket is opened in connect.c, so it looks like abstraction of the network API would require major modification of the interfaces.

Or did you mean to write a Wget version of the socket interface? i.e., to write our own versions of socket, connect, write, read, close, bind, listen, accept...? Sorry, I'm confused.

--
Yoshihiro TANAKA
Re: About Automated Unit Test for Wget
Yoshihiro Tanaka wrote:

> Yes, since I want to write a proposal for unit testing, I can't skip this problem. But considering that the GSoC program is only two months, I'd rather narrow down the target to the gethttp function.

I have a sneaking suspicion that some chunks of functionality that you'd want to farm out in gethttp also have code-change repercussions elsewhere (probably http_loop, usually). So it may be difficult to restrict yourself to gethttp. :) Probably better to identify the specific chunks of logic that can be farmed out, find out how far-reaching separating those chunks might be, and choose some specific ones to do. You've already identified some areas; I'll comment on those when I have a chance to look more closely at the code, for comparison with your remarks.

> In addition to the above, we have to think about abstraction of the network API and the file I/O API. But the network API (such as fd_read_body, fd_read_hunk) lives in retr.c, and the socket is opened in connect.c, so it looks like abstraction of the network API would require major modification of the interfaces. Or did you mean to write a Wget version of the socket interface? i.e., to write our own versions of socket, connect, write, read, close, bind, listen, accept...? Sorry, I'm confused.

Yes! That's what I meant. (Except, we don't need listen or accept; and we only need bind to support --bind-address. We're a client, not a server. ;) ) It would be enough to write function pointers for (say) wg_socket, wg_connect, wg_sock_write, wg_sock_read, etc., and point them at the system socket, connect, etc. for real Wget, but at wg_test_socket, wg_test_connect, etc. for our emulated servers.

This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc. on sockets, which would be using our emulated versions. Ideally, we'd replace the use of file descriptor ints with a more opaque mechanism; but that can be done later.

If you'd prefer, you might choose to write a proposal focusing on the server emulation, which would easily take up a summer by itself (and then some), particularly when you realize that we would need a file format describing the virtual server's state (what domains and URLs exist, what sort of headers it should respond with to certain requests, etc.). If you chose to take it on, you'd probably need to settle for a subset of the final expected product.

Note that, down the road, we'll want to encapsulate the whole sockets-layer abstraction into an object we'd pass around as an argument (struct net_connector *?), as we might want to use it to handle SOCKS for some URLs while using direct connections for others. But that doesn't have to happen right now; once we've got the actual abstraction done, it should be pretty easy to move it to an object-based mechanism (just use conn->connect(...) instead of wg_connect(...)). But if you want to go ahead and do that now, that'd be great too.
Re: About Automated Unit Test for Wget
On Sat, 5 Apr 2008, Micah Cowan wrote:

> > Or did you mean to write a Wget version of the socket interface? i.e., to write our own versions of socket, connect, write, read, close, bind, listen, accept...? Sorry, I'm confused.
>
> Yes! That's what I meant. (Except, we don't need listen or accept; and we only need bind to support --bind-address. We're a client, not a server. ;) )

Except, you do need listen, accept, and bind in a server sense, since even if wget is a client, I believe it still supports the PORT command for ftp...
Re: About Automated Unit Test for Wget
Micah Cowan [EMAIL PROTECTED] writes:

> Yes! That's what I meant. (Except, we don't need listen or accept; and we only need bind to support --bind-address. We're a client, not a server. ;) ) It would be enough to write function pointers for (say) wg_socket, wg_connect, wg_sock_write, wg_sock_read, etc., and point them at the system socket, connect, etc. for real Wget, but at wg_test_socket, wg_test_connect, etc. for our emulated servers.

This seems like a neat idea, but it should be carefully weighed against the drawbacks. Adding an ad-hoc abstraction layer is harder than it sounds, and has more repercussions than is immediately obvious. An underspecified, unfinished abstraction layer over sockets makes the code harder, not easier, to follow and reason about. You no longer deal with BSD sockets; you deal with an abstraction over them. Is it okay to call getsockname on such a socket? How about setsockopt? What about the listen/bind mechanism (which we do need, as Daniel points out)?

> This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc. on sockets, which would be using our emulated versions.

Unless you're willing to spend a lot of time in careful design of these abstractions, I think this is a mistake.
Re: About Automated Unit Test for Wget
On Sat, 5 Apr 2008, Hrvoje Niksic wrote:

> > This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc. on sockets, which would be using our emulated versions.
>
> Unless you're willing to spend a lot of time in careful design of these abstractions, I think this is a mistake.

Related: in the curl project we took a simpler route. We have our own dumb test servers in the test suite to run tests against, and we have single files that describe each test case: what the server should respond, what the protocol dump should look like, what output to expect, what return code, etc. Then we have a script that reads the test case description, fires up the correct server(s), and verifies all the outputs (optionally using valgrind). This system allows us to write unit tests if we'd like to, but mostly so far we've focused on testing it system-wide. It is hard enough for us!
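To give that idea shape, a description-driven test case for Wget in the same spirit might look something like the following. The format is entirely hypothetical, invented for illustration; it is not curl's actual file format, nor anything Wget has today:

```
# Hypothetical test description: a 404 should leave no file behind
[server]
GET /missing => status: 404, body: "not here"

[command]
wget http://localhost:$PORT/missing

[expect]
exit-code: nonzero
files-created: none
log-contains: "404"
```

A driver script would parse the [server] section to configure the dumb test server, run the [command] line against it, then check each [expect] condition.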
Re: About Automated Unit Test for Wget
Daniel Stenberg wrote:

> > > Or did you mean to write a Wget version of the socket interface? i.e., to write our own versions of socket, connect, write, read, close, bind, listen, accept...? Sorry, I'm confused.
> >
> > Yes! That's what I meant. (Except, we don't need listen or accept; and we only need bind to support --bind-address. We're a client, not a server. ;) )
>
> Except, you do need listen, accept, and bind in a server sense, since even if wget is a client, I believe it still supports the PORT command for ftp...

Damn FTP... :) Yeah, of course. Sorry, my view of the web tends frequently to be very HTTP-colored. :) (Well, technically, that _is_ the WWW, but anyway...)
Re: About Automated Unit Test for Wget
Daniel Stenberg wrote:

> In the curl project we took a simpler route: we have our own dumb test servers in the test suite to run tests against, and we have single files that describe each test case: what the server should respond, what the protocol dump should look like, what output to expect, what return code, etc. Then we have a script that reads the test case description, fires up the correct server(s), and verifies all the outputs (optionally using valgrind). This system allows us to write unit tests if we'd like to, but mostly so far we've focused on testing it system-wide. It is hard enough for us!

Yeah, I thought I'd seen something like that; I was thinking we might even be able to appropriate some of that, if that looked doable. Except that I preferred faking the server completely, so I could deal better with cross-site issues, which AFAICT are significantly more important to Wget than they are to curl.

I was thinking, and should have said, that if we go this route, we'd want to focus on high-level tests first. That also has the advantage that if we accidentally change something during the refactoring process (not unlikely), we will notice it, whereas focusing just on unit tests would mean we'd have to change the code to be testable in units _before_ verification.

We already _do_ have some spawn-a-server test code, but much of it needs rewriting, and it still suffers when you bring in the idea of multiple servers. The servers are driven by Perl code, rather than a driver script or description file.
Re: About Automated Unit Test for Wget
Hrvoje Niksic wrote:

> This seems like a neat idea, but it should be carefully weighed against the drawbacks. Adding an ad-hoc abstraction layer is harder than it sounds, and has more repercussions than is immediately obvious. An underspecified, unfinished abstraction layer over sockets makes the code harder, not easier, to follow and reason about. You no longer deal with BSD sockets; you deal with an abstraction over them. Is it okay to call getsockname on such a socket? How about setsockopt? What about the listen/bind mechanism (which we do need, as Daniel points out)?

I'm having some trouble seeing how most of those present problems. Obviously, you wouldn't call _any_ system functions on these, so yeah, no setsockopt() unless it's a wg_setsockopt() (a wg_setsockopt would probably be a poor way to handle it anyway, as it'd be mainly true-TCP-specific). I don't see what you see wrt making the code harder to follow and reason about (true abstraction rarely does, AFAICT, though there are some counter-examples, usually of things that are much, much more abstract than we are used to thinking about). Did you have some specific concerns?

I _am_ thinking that it'd probably be best to forgo the idea of one-to-one correspondence with Berkeley sockets, and pass around a struct net_connector * (and struct net_listener *), so we're not forced to deal with file descriptor silliness (where obviously we'd have wanted to avoid the values 0 through 2, and I was even thinking it might _possibly_ be worthwhile to allocate real file descriptors to get the numbers, just to avoid clashes). Then we can focus on actual abstraction (which we don't obtain by emulating Berkeley sockets), rather than just emulation.

While Daniel was of course right that we'd need listen, accept, etc., we _wouldn't_ need them to begin using this layer to test against http.c. We wouldn't even need bind, if we didn't include --bind-address in our first tests of the http code.

> > This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc. on sockets, which would be using our emulated versions.
>
> Unless you're willing to spend a lot of time in careful design of these abstractions, I think this is a mistake.

Why?
Re: About Automated Unit Test for Wget
Yoshihiro Tanaka wrote:

> Hello, I want to ask about the unit testing of Wget in the future. Now unit tests of Wget are written only for the following .c files:
>
> http.c init.c main.c res.c url.c utils.c (test.c)
>
> So as written in the Wiki, a new unit test suite is necessary. (ref. http://wget.addictivecode.org/FeatureSpecifications/Testing)

Well, or expansions to the existing one. However, it's my desire (as expressed on that page) that the test code be separated from the .c files containing the code-to-test. This may mean making some functions that are currently static be externally linked. IMO, if it's worth testing, it's probably better to have external linkage anyway.

> In order to make a new unit test suite, I think the following work is necessary.
>
> 1) Select functions which can be tested in a unit test. But "How can I select them?" is difficult. Basically, the less dependency a function has, the easier it is to include in a unit test, but I'm not sure about the boundary line.

This is precisely the problem, and one reason I've been thinking that this might not make an ideal topic for a GSoC proposal, unless you want to include refactoring existing functions like gethttp and http_loop into more logically discrete sets of functions. Essentially, to get better coverage of the code that needs it the most, that code will need to be rewritten. I believe this can be an iterative process (find one function to factor out, write a unit test for it, make it work...).

> 2) In order to do 1) above, how about making a list of all functions in Wget, and maintaining it? The advantages of 2) are:
> * make clear which functions are included in the unit tests
> * make clear which functions are _not_ in the unit tests
> * make it easy to manage testing
> * make it easy to divide the testing work

Hm... I'm not sure that the benefits are worth the effort. If we _really_ wanted this, I'd propose that we use a naming convention (or processed comment, etc.) for the unit test functions, so that the list of functions that are covered can be determined automatically by a program; the ones that aren't covered would be any functions remaining.

However, I personally wouldn't find this useful. I don't intend that every function in existence has to have a unit test covering it. Some functions will have already been tested through the exercise of higher-level calling functions (in which case they should probably have internal linkage); others may have been tested through the exercise of the functions they call.

What I'm most keenly interested in is the ability to verify the logic of how follow/don't-follow is decided (that actually may not be too hard to write tests against now), how Wget handles various protocol-level situations, how it chooses the filename and deals with the local filesystem, etc. I will be very, _very_ happy when everything that's in http_loop and gethttp is verified by unit tests. But a lot of getting to where we can test that may mean abstracting out things like the Berkeley Sockets API and filesystem interactions, so that we can drop in fake replacements for testing.

> (test tools, other preliminary work for unit tests, how to manage it...)

There is an incredibly basic test framework, completely defined in src/test.h. See src/test.c for how it is being used.

I'm familiar with a framework called (simply) Check, which might be worth considering. It forks a new process for each test, which isolates it from interfering with the other tests, and also provides a clean way to handle things like segmentation violations or aborts. However, it's intended for Unix, and probably doesn't compile on other systems. http://check.sourceforge.net/