Re: About Automated Unit Test for Wget

2008-04-06 Thread Yoshihiro Tanaka
2008/4/5, Micah Cowan [EMAIL PROTECTED]:

  Daniel Stenberg wrote:
   This system allows us to write unit tests if we'd like to, but mostly so
   far we've focused on testing it system-wide. It is hard enough for us!


 Yeah, I thought I'd seen something like that; I was thinking we might
  even be able to appropriate some of that, if that looked doable. Except
  that I preferred faking the server completely, so I could deal better
  with cross-site issues, which AFAICT are significantly more important to
  Wget than they are to Curl.


It seems that the abstraction of the network API needs more discussion,
so I will focus on the server emulation instead.

By the way, how about using LD_PRELOAD?
I tested it a little and it seems to work. If we use this, we can test
by overriding the socket interface while still not changing Wget's real
source code.

--main.c --
#include <stdio.h>

int main(void)
{
    puts("Helow Wgets\n");
    return 0;
}



--testputs.c --
#include <stdio.h>

/* Override of puts(3): print the string, then a marker line. */
int puts(const char *str)
{
    while (*str)
        putchar(*str++);
    printf("This is a test module");
    putchar('\n');
    return 0;  /* puts returns a nonnegative value on success */
}
-


--Compile as follows:

[EMAIL PROTECTED] Test]$ gcc main.c -o main
[EMAIL PROTECTED] Test]$ gcc -fPIC -shared -o testputs.so testputs.c



--Execute as follows:

[EMAIL PROTECTED] Test]$ ./main
Helow Wgets

[EMAIL PROTECTED] Test]$ LD_PRELOAD=./testputs.so ./main
Helow Wgets
This is a test module


--
I found this approach on the net, and the sample was using wget! It
overrides socket, close, and connect:
http://www.t-dori.net/forensics/hook_tcp.cpp
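
Following that idea, here is a minimal sketch of hooking connect() the
same way (redirecting every IPv4 connection to a local test server is
only my assumption for illustration; the real sample does more):

--fake_connect.c --
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <dlfcn.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    /* Look up the real connect() so the call can still complete. */
    int (*real_connect)(int, const struct sockaddr *, socklen_t) =
        (int (*)(int, const struct sockaddr *, socklen_t))
        dlsym(RTLD_NEXT, "connect");

    if (addr->sa_family == AF_INET && len >= sizeof(struct sockaddr_in)) {
        /* Rewrite the destination to 127.0.0.1, keeping the port. */
        struct sockaddr_in fake;
        memcpy(&fake, addr, sizeof fake);
        fake.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        return real_connect(fd, (struct sockaddr *) &fake, sizeof fake);
    }
    return real_connect(fd, addr, len);
}
-------------------

Compile with gcc -fPIC -shared -o fake_connect.so fake_connect.c -ldl,
then LD_PRELOAD=./fake_connect.so wget http://example.com/ should send
the request to a server listening on localhost instead.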

-- 
Yoshihiro TANAKA
SFSU CS Department


Re: About Automated Unit Test for Wget

2008-04-06 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 I don't see what you see wrt making the code harder to follow and reason
 about (true abstraction rarely does, AFAICT,

I was referring to the fact that adding an abstraction layer requires
learning about the abstraction layer, both its concepts and its
implementation, including its quirks and limitations.  Overly general
abstractions added to application software typically turn out to be
underspecified (for the domain they attempt to cover) and incomplete.
Programmers tend to ignore the hidden cost of adding an abstraction
layer until the cost becomes apparent, by which time it is too late.

Application-specific abstractions are usually worth it because they
are well-justified: they directly benefit the application by making
the code base simpler and removing duplication.  Some general
abstractions are worth it because the alternative is worse; you
wouldn't want to have two versions of SSL-using code, one for regular
sockets, and one for SSL, since the whole point of SSL is that you're
supposed to use it as if it were sockets behind the scenes.  But
adding a whole new abstraction layer over something as general as
Berkeley sockets to facilitate an automated test suite definitely
sounds like ignoring the costs of such an abstraction layer.

 I _am_ thinking that it'd probably be best to forgo the idea of
 one-to-one correspondence with Berkeley sockets, and pass around a struct
 net_connector * (and struct net_listener *), so we're not forced to
 deal with file descriptor silliness (where obviously we'd have wanted to
 avoid the values 0 through 2, and I was even thinking it might
 _possibly_ be worthwhile to allocate real file descriptors to get the
 numbers, just to avoid clashes).

I have no idea what file descriptor silliness with values 0-2 you're
referring to.  :-)  I do agree that an application-specific struct is
better than a more general abstraction because it is easier to design
and more useful to Wget in the long run.

 This would mean we'd need to separate uses of read() and write() on
 normal files (which should continue to use the real calls, until we
 replace them with the file I/O abstractions), from uses of read(),
 write(), etc on sockets, which would be using our emulated versions.
 
 Unless you're willing to spend a lot of time in careful design of
 these abstractions, I think this is a mistake.

 Why?

Because implementing a file I/O abstraction is much harder and more
time-consuming than it sounds.  To paraphrase Greenspun, it would
appear that every sufficiently large code base contains an ad-hoc,
informally-specified, bug-ridden implementation of a streaming layer.
There are streaming libraries out there; maybe we should consider
using some of them.


Re: About Automated Unit Test for Wget

2008-04-06 Thread Micah Cowan

Micah Cowan wrote:
 Yeah. But we're not doing streaming. And you still haven't given much
 explanation for _why_ it's as hard and time-consuming as you say. Making
 a claim and demonstrating it are different things, I think.

To be clear, I'm not trying to say, I don't believe you; I'm saying,
argue the case, please, don't just make assertions. Clearly, you're
concerned about something I'm unable to see: help me to see it! If I
ignore your warnings, and wind up running headlong into what you saw in
the first place, you can't claim you gave fair warning if you didn't
provide examples of what I might run into.

For my part, I see something which, at least for a first cut, I could whip
up in a couple of hours (the server emulation and associated
state-tracking, of course, would be _quite_ a bit more work). What is it
that causes our two perspectives to differ so wildly?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: About Automated Unit Test for Wget

2008-04-05 Thread Yoshihiro Tanaka
2008/4/4, Micah Cowan [EMAIL PROTECTED]:

  IMO, if it's worth testing, it's probably better to have external
  linkage anyway.

I got it.


1) Select functions which can be tested in unit tests.
   But how to select them is difficult.
   Basically, the less dependency a function has, the easier it is to
   include in a unit test; but I'm not sure where the boundary line is.


 This is precisely the problem, and one reason I've been thinking that
  this might not make an ideal topic for a GSoC proposal, unless you want
  to include refactoring existing functions like gethttp and http_loop
  into more logically discrete sets of functions. Essentially, to get
  better coverage of the code that needs it the most, that code will need
  to be rewritten. I believe this can be an iterative process (find one
  function to factor out, write a unit test for it, make it work...).

Yes, since I want to write a proposal for unit testing, I can't skip this
problem. But considering the GSoC program is only two months, I'd rather
narrow down the target, to the gethttp function.

Although I'm not deeply familiar with the source code, I'm thinking
along these lines:

In the gethttp function there are roughly six chunks of functionality:

1. preparation of the request

2. making the header part of HTTP
   proxy_auth
   generate_hosthead
   , and other processing to build the header

3. connection
   persistent_available_p
   establishment of the connection to the host
   ssl_connection process

4. HTTP request processing
   request send
   read request response
   checking status codes

5. local-file-related processing (a bunch of processing...)
   determine filename
   file existence check
   noclobber, -O check
   timestamping check
   Content-Length check
   Keep-Alive response check
   authorization process
   Set-Cookie header
   Content-Range check
   filename handling (HTML extension)
   416 status code handling
   open local file

6. download body part & write it into the local file



So, basically, I think gethttp could be divided along these lines of
functionality. After that, each piece of functionality could be divided
into smaller pieces, to the extent that unit tests can be run separately.
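
To make the split concrete, hypothetical prototypes for those chunks
might look like this (every name below is invented for illustration;
none of these functions exist in Wget today):

--sketch: a possible factoring of gethttp --
struct url;        /* Wget's parsed-URL type */
struct request;    /* HTTP request being built */
struct response;   /* parsed response headers */

struct request *http_build_request (const struct url *u);          /* 1, 2 */
int http_open_conn (const struct url *u);                          /* 3 */
struct response *http_exchange (int sock, struct request *req);    /* 4 */
const char *http_choose_local_file (const struct url *u,
                                    const struct response *resp);  /* 5 */
int http_fetch_body (int sock, const struct response *resp,
                     const char *filename);                        /* 6 */
--------------------------------------------

Each of these would be small enough to call directly from a unit test.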

In addition to the above, we have to think about abstraction of the
network API and the file I/O API.

But the network API (such as fd_read_body and fd_read_hunk) lives in
retr.c, and the socket is opened in connect.c, so it looks like
abstracting the network API would require major modification of the
interfaces.

And I don't think I am the right person to design this. So what I want
to suggest is that I ask you for the interface _design_. What do you
think? At least I want to narrow the scope down to what I can take
responsibility for.


  What I'm most keenly interested in, is the ability to verify the logic
  of how follow/don't-follow is decided (that actually may not be too hard
  to write tests against now), how Wget handles various protocol-level
  situations, how it chooses the filename and deals with the local
  filesystem, etc. I will be very, _very_ happy when everything that's in
  http_loop and gethttp is verified by unit tests.

  But a lot of getting to where we can test that may mean abstracting out
  things like the Berkeley Sockets API and filesystem interactions, so
  that we can drop in fake replacements for testing.


I'd like to try, if we can settle the problem of interface design...


  I'm familiar with a framework called (simply) Check, which might be
  worth considering. It forks a new process for each test, which isolates
  it from interfering with the other tests, and also provides a clean way
  to handle things like segmentation violations or aborts. However, it's
  intended for Unix, and probably doesn't compile on other systems.

  http://check.sourceforge.net/

Thank you for your information :)


-- 
Yoshihiro TANAKA


Re: About Automated Unit Test for Wget

2008-04-05 Thread Yoshihiro Tanaka
2008/4/5, Yoshihiro Tanaka [EMAIL PROTECTED]:

  But the network API (such as fd_read_body and fd_read_hunk) lives in
  retr.c, and the socket is opened in connect.c, so it looks like
  abstracting the network API would require major modification of the
  interfaces.

Or did you mean to write a Wget version of the socket interface?
I.e., to write our own versions of socket, connect, write, read, close,
bind, listen, accept...? Sorry, I'm confused.

-- 
Yoshihiro TANAKA


Re: About Automated Unit Test for Wget

2008-04-05 Thread Micah Cowan

Yoshihiro Tanaka wrote:
 2008/4/5, Yoshihiro Tanaka [EMAIL PROTECTED]:

 Yes, since I want to write a proposal for unit testing, I can't skip this
  problem. But considering the GSoC program is only two months, I'd rather
  narrow down the target, to the gethttp function.

I have a sneaking suspicion that some chunks of functionality that you'd
want to farm out of gethttp also have code-change repercussions
elsewhere (probably in http_loop, usually). So it may be difficult to
restrict yourself to gethttp. :)

Probably better to identify the specific chunks of logic that can be
farmed out, find out how far-reaching separating those chunks might be,
and choose some specific ones to do.

You've already identified some areas; I'll comment on those when I have a
chance to look more closely at the code, for comparison with your remarks.

  In addition to the above, we have to think about abstraction of the
  network API and the file I/O API.

  But the network API (such as fd_read_body and fd_read_hunk) lives in
  retr.c, and the socket is opened in connect.c, so it looks like
  abstracting the network API would require major modification of the
  interfaces.
 
 Or did you mean to write a Wget version of the socket interface?
 I.e., to write our own versions of socket, connect, write, read, close,
 bind, listen, accept...? Sorry, I'm confused.

Yes! That's what I meant. (Except, we don't need listen, accept; and we
only need bind to support --bind-address. We're a client, not a server. ;) )

It would be enough to write function pointers for, say, wg_socket,
wg_connect, wg_sock_write, wg_sock_read, etc., and point them at the
system socket, connect, etc. for real Wget, but at wg_test_socket,
wg_test_connect, etc. for our emulated servers.
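
To make the idea concrete, here is a minimal sketch of that indirection
(the wiring below is hypothetical; only the wg_* names are the actual
proposal):

--sketch: wg_* function pointers --
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Default the pointers to the real system calls for normal Wget. */
int (*wg_socket) (int domain, int type, int protocol) = socket;
int (*wg_connect) (int fd, const struct sockaddr *addr,
                   socklen_t len) = connect;
ssize_t (*wg_sock_read) (int fd, void *buf, size_t count) = read;
ssize_t (*wg_sock_write) (int fd, const void *buf, size_t count) = write;

/* A test harness would repoint them at the emulation instead:
   wg_socket = wg_test_socket; wg_connect = wg_test_connect; ...  */
-----------------------------------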

This would mean we'd need to separate uses of read() and write() on
normal files (which should continue to use the real calls, until we
replace them with the file I/O abstractions), from uses of read(),
write(), etc on sockets, which would be using our emulated versions.

Ideally, we'd replace the use of file descriptor ints with a more opaque
mechanism; but that can be done later.

If you'd prefer, you might choose to write a proposal focusing on the
server emulation, which could easily take up a summer by itself (and
then some); particularly when you realize that we would need a file
format describing the virtual server's state (what domains and URLs
exist, what sort of headers it should respond with to certain requests,
etc). If you chose to take this on, you'd probably need to settle for a
subset of the final expected product.

Note that, down the road, we'll want to encapsulate the whole
sockets-layer abstraction into an object we'd pass around as an argument
(struct net_connector * ?), as we might want to use it to handle SOCKS
for some URLs, while using direct connections for others. But that
doesn't have to happen right now; once we've got the actual abstraction
done it should be pretty easy to move it to an object-based mechanism
(just use conn->connect(...) instead of wg_connect(...)). But, if you
want to go ahead and do that now, that'd be great too.
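
A hypothetical shape for that object, just for illustration (every
field name here is invented):

--sketch: struct net_connector --
#include <sys/types.h>

struct net_connector {
    void *impl;  /* real-socket state, or emulated-server state in tests */
    int     (*connect) (struct net_connector *c, const char *host, int port);
    ssize_t (*read)    (struct net_connector *c, void *buf, size_t len);
    ssize_t (*write)   (struct net_connector *c, const void *buf, size_t len);
    void    (*close)   (struct net_connector *c);
};
---------------------------------

Call sites would then write conn->connect(conn, host, port) and never
see a raw file descriptor at all.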

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: About Automated Unit Test for Wget

2008-04-05 Thread Daniel Stenberg

On Sat, 5 Apr 2008, Micah Cowan wrote:

Or did you mean to write a Wget version of the socket interface? I.e., to
write our own versions of socket, connect, write, read, close, bind,
listen, accept...? Sorry, I'm confused.


Yes! That's what I meant. (Except, we don't need listen, accept; and we only 
need bind to support --bind-address. We're a client, not a server. ;) )


Except, you do need listen, accept, and bind in a server sense, since even
if wget is a client, I believe it still supports the PORT command for FTP...




Re: About Automated Unit Test for Wget

2008-04-05 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 Or did you mean to write a Wget version of the socket interface? I.e.,
 to write our own versions of socket, connect, write, read, close, bind,
 listen, accept...? Sorry, I'm confused.

 Yes! That's what I meant. (Except, we don't need listen, accept; and
 we only need bind to support --bind-address. We're a client, not a
 server. ;) )

 It would be enough to write function-pointers for (say), wg_socket,
 wg_connect, wg_sock_write, wg_sock_read, etc, etc, and point them at
 system socket, connect, etc for real Wget, but at wg_test_socket,
 wg_test_connect, etc for our emulated servers.

This seems like a neat idea, but it should be carefully weighed
against the drawbacks.  Adding an ad-hoc abstraction layer is harder
than it sounds, and has more repercussions than is immediately
obvious.  An underspecified, unfinished abstraction layer over sockets
makes the code harder, not easier, to follow and reason about.  You no
longer deal with BSD sockets, you deal with an abstraction over them.
Is it okay to call getsockname on such a socket?  How about
setsockopt?  What about the listen/bind mechanism (which we do need,
as Daniel points out)?

 This would mean we'd need to separate uses of read() and write() on
 normal files (which should continue to use the real calls, until we
 replace them with the file I/O abstractions), from uses of read(),
 write(), etc on sockets, which would be using our emulated versions.

Unless you're willing to spend a lot of time in careful design of
these abstractions, I think this is a mistake.


Re: About Automated Unit Test for Wget

2008-04-05 Thread Daniel Stenberg

On Sat, 5 Apr 2008, Hrvoje Niksic wrote:

This would mean we'd need to separate uses of read() and write() on normal 
files (which should continue to use the real calls, until we replace them 
with the file I/O abstractions), from uses of read(), write(), etc on 
sockets, which would be using our emulated versions.


Unless you're willing to spend a lot of time in careful design of these 
abstractions, I think this is a mistake.


Related:

In the curl project we took a simpler route: we have our own dumb test
servers in the test suite to run tests against, and we have single files
that describe each test case: what the server should respond, what the
protocol dump should look like, what output to expect, what return code,
etc. Then we have a script that reads the test case description, fires up
the correct server(s), and verifies all the outputs (optionally using
valgrind).

This system allows us to write unit tests if we'd like to, but mostly so
far we've focused on testing it system-wide. It is hard enough for us!
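
Schematically, one of those description files holds sections like this
(the syntax below is made up to show the idea, not our exact format):

--sketch: a test case description --
# What the fake HTTP server should reply:
[reply]
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 6

hello

# How the client under test is invoked:
[command]
wget http://localhost:%SERVERPORT/want/file.html

# What the request on the wire must look like:
[verify-protocol]
GET /want/file.html HTTP/1.1

# Expected exit code:
[returncode]
0
------------------------------------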


Re: About Automated Unit Test for Wget

2008-04-05 Thread Micah Cowan

Daniel Stenberg wrote:
 On Sat, 5 Apr 2008, Micah Cowan wrote:
 
 Or did you mean to write a Wget version of the socket interface? I.e.,
 to write our own versions of socket, connect, write, read, close, bind,
 listen, accept...? Sorry, I'm confused.

 Yes! That's what I meant. (Except, we don't need listen, accept; and
 we only need bind to support --bind-address. We're a client, not a
 server. ;) )
 
 Except, you do need listen, accept, and bind in a server sense, since even
 if wget is a client, I believe it still supports the PORT command for FTP...

Damn FTP... :)

Yeah, of course. Sorry, my view of the web tends frequently to be very
HTTP-colored. :)

(Well, technically, that _is_ the WWW, but anyway...)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: About Automated Unit Test for Wget

2008-04-05 Thread Micah Cowan

Daniel Stenberg wrote:
 In the curl project we took a simpler route: we have our own dumb test
 servers in the test suite to run tests against, and we have single files
 that describe each test case: what the server should respond, what the
 protocol dump should look like, what output to expect, what return code,
 etc. Then we have a script that reads the test case description, fires up
 the correct server(s), and verifies all the outputs (optionally using
 valgrind).
 
 This system allows us to write unit tests if we'd like to, but mostly so
 far we've focused on testing it system-wide. It is hard enough for us!

Yeah, I thought I'd seen something like that; I was thinking we might
even be able to appropriate some of that, if that looked doable. Except
that I preferred faking the server completely, so I could deal better
with cross-site issues, which AFAICT are significantly more important to
Wget than they are to Curl.

I was thinking, and should have said, that if we go this route, we'd
want to focus on high-level tests first. That also has the advantage
that if we accidentally change something during the refactoring process
(not unlikely), we will notice it, whereas focusing just on unit tests
would mean we'd have to change the code to be testable in units _before_
verification.

We already _do_ have some spawn-a-server test code, but much of it
needs rewriting, and it still suffers when you bring in the idea of
multiple servers. The servers are driven by Perl code, rather than by a
driver script or description file.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: About Automated Unit Test for Wget

2008-04-05 Thread Micah Cowan

Hrvoje Niksic wrote:
 Micah Cowan [EMAIL PROTECTED] writes:
 
 Or did you mean to write a Wget version of the socket interface? I.e.,
 to write our own versions of socket, connect, write, read, close, bind,
 listen, accept...? Sorry, I'm confused.
 Yes! That's what I meant. (Except, we don't need listen, accept; and
 we only need bind to support --bind-address. We're a client, not a
 server. ;) )

 It would be enough to write function-pointers for (say), wg_socket,
 wg_connect, wg_sock_write, wg_sock_read, etc, etc, and point them at
 system socket, connect, etc for real Wget, but at wg_test_socket,
 wg_test_connect, etc for our emulated servers.
 
 This seems like a neat idea, but it should be carefully weighed
 against the drawbacks.  Adding an ad-hoc abstraction layer is harder
 than it sounds, and has more repercussions than is immediately
 obvious.  An underspecified, unfinished abstraction layer over sockets
 makes the code harder, not easier, to follow and reason about.  You no
 longer deal with BSD sockets, you deal with an abstraction over them.
 Is it okay to call getsockname on such a socket?  How about
 setsockopt?  What about the listen/bind mechanism (which we do need,
 as Daniel points out)?

I'm having some trouble seeing how most of those present problems.
Obviously, you wouldn't call _any_ system functions on these, so yeah,
no setsockopt() unless it's a wg_setsockopt() (a wg_setsockopt would
probably be a poor way to handle it anyway, as it'd be mainly true-TCP
specific).

I don't see what you see wrt making the code harder to follow and reason
about (true abstraction rarely does, AFAICT, though there are some
counter-examples, usually of things that are much, much more abstract
than we are used to thinking about). Did you have some specific concerns?

I _am_ thinking that it'd probably be best to forgo the idea of
one-to-one correspondence with Berkeley sockets, and pass around a struct
net_connector * (and struct net_listener *), so we're not forced to
deal with file descriptor silliness (where obviously we'd have wanted to
avoid the values 0 through 2, and I was even thinking it might
_possibly_ be worthwhile to allocate real file descriptors to get the
numbers, just to avoid clashes). Then we can focus on actual abstraction
(which we don't obtain by emulating Berkeley sockets), rather than just
emulation.

While Daniel was of course right that we'd need listen, accept, etc, we
_wouldn't_ need them to begin using this layer to test against http.c.
We wouldn't even need bind, if we didn't include --bind-address in our
first tests of the http code.

 This would mean we'd need to separate uses of read() and write() on
 normal files (which should continue to use the real calls, until we
 replace them with the file I/O abstractions), from uses of read(),
 write(), etc on sockets, which would be using our emulated versions.
 
 Unless you're willing to spend a lot of time in careful design of
 these abstractions, I think this is a mistake.

Why?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: About Automated Unit Test for Wget

2008-04-04 Thread Micah Cowan

Yoshihiro Tanaka wrote:
 Hello, I want to ask about unit testing of Wget.
 
 Now unit tests of Wget are written only for the following .c files:
  -- http.c init.c main.c res.c url.c utils.c (test.c)
 
 So as written in the Wiki, a new unit test suite is necessary.
 (ref. http://wget.addictivecode.org/FeatureSpecifications/Testing)

Well, or expansions to the existing one.

However, it's my desire (as expressed on that page) that the test code
be separated from the .c files containing the code-to-test. This may
mean making some functions that are currently static to be externally
linked.

IMO, if it's worth testing, it's probably better to have external
linkage anyway.

 In order to make a new unit test suite, I think the following work is
 necessary.
 
  1) Select functions which can be tested in unit tests.
     But how to select them is difficult.
     Basically, the less dependency a function has, the easier it is to
     include in a unit test; but I'm not sure where the boundary line is.

This is precisely the problem, and one reason I've been thinking that
this might not make an ideal topic for a GSoC proposal, unless you want
to include refactoring existing functions like gethttp and http_loop
into more logically discrete sets of functions. Essentially, to get
better coverage of the code that needs it the most, that code will need
to be rewritten. I believe this can be an iterative process (find one
function to factor out, write a unit test for it, make it work...).

  2) In order to do 1) above, how about making a list of all functions
     in Wget, and maintaining it?
 
 The advantages of 2) are...
 * makes clear which functions are included in the unit tests
 * makes clear which functions are _not_ in the unit tests
 * makes testing easy to manage
 * makes the testing work easy to divide

Hm... I'm not sure that the benefits are worth the effort.

If we _really_ wanted this, I'd propose that we use a naming convention
(or processed comment, etc) for the unit test functions so that the list
of functions that are covered can be determined automatically by a
program; the ones that aren't covered would be any functions remaining.

However, I personally wouldn't find this useful. I don't intend that
every function in existence has to have a unit test covering it. Some
functions will have already been tested through the exercise of
higher-level calling functions (in which case they should probably have
internal linkage); others may have been tested through the exercise of
the functions they call.

What I'm most keenly interested in, is the ability to verify the logic
of how follow/don't-follow is decided (that actually may not be too hard
to write tests against now), how Wget handles various protocol-level
situations, how it chooses the filename and deals with the local
filesystem, etc. I will be very, _very_ happy when everything that's in
http_loop and gethttp is verified by unit tests.

But a lot of getting to where we can test that may mean abstracting out
things like the Berkeley Sockets API and filesystem interactions, so
that we can drop in fake replacements for testing.

 (test tools, other preliminary work for unit tests, how to manage it...)

There is an incredibly basic test framework, completely defined in
src/test.h. See src/test.c for how it is being used.

I'm familiar with a framework called (simply) Check, which might be
worth considering. It forks a new process for each test, which isolates
it from interfering with the other tests, and also provides a clean way
to handle things like segmentation violations or aborts. However, it's
intended for Unix, and probably doesn't compile on other systems.

http://check.sourceforge.net/
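
For instance, a minimal Check suite looks roughly like this (the test
body is a placeholder, not real Wget coverage):

--sketch: minimal Check test --
#include <check.h>
#include <stdlib.h>

START_TEST (test_sanity)
{
    /* Placeholder assertion; a real test would call a Wget function. */
    fail_unless (1 + 1 == 2, "arithmetic sanity");
}
END_TEST

int main (void)
{
    Suite *s = suite_create ("wget");
    TCase *tc = tcase_create ("core");
    tcase_add_test (tc, test_sanity);
    suite_add_tcase (s, tc);

    SRunner *sr = srunner_create (s);
    srunner_run_all (sr, CK_NORMAL);  /* each test runs in its own fork */
    int failed = srunner_ntests_failed (sr);
    srunner_free (sr);
    return failed == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
-------------------------------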

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/