Re: RFD: Built-in testing

2009-01-28 Thread Martin D Kealey

On Wed, 21 Jan 2009, Damian Conway wrote:
> > Maybe something in all caps. For what it's worth, :OK<> can be typed
> > with one hand while the other holds down the shift key. :)
>
> Typical right-hander fascism!

On the other hands, we have :QA ... which also happens to be an apposite
abbreviation. :-)

-Martin


Re: RFD: Built-in testing

2009-01-24 Thread dpuu
On Jan 23, 8:59 pm, jswit...@gmail.com (Jason Switzer) wrote:

> That sounds useful on the surface but often turns out to be more difficult
> to do than you might think. There are many cases where tests are performed
> from within loops. Something like S09.237 may or may not be in a loop, and may
> be difficult to identify in files with many tests.

There are at least two reasons to identify a test (or check): to
control it from afar, and to track its results.

If the reason for wanting identity is to control it (e.g.
Foo::Bar::Test.disable()), then the fact that it's in a loop
isn't necessarily important: if you want to disable it, then you
probably want to disable all iterations. If we do want finer grain
control, then it is probably possible to do something with resumable
exceptions that are thrown each time the test is potentially skipped.

If the reason for identifying a check is to track its result, then the
obvious solution is to not assume that its result is pass/fail, but
is instead a pair of pass/fail counts (or pass/total -- same thing). A
good testing approach is "directed-random", where the same test is run
multiple times with different random seeds so as to use different test
data. IMO, it is reasonable to think of a one-shot test as an
aberration.
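A minimal Python sketch of tracking a check as pass/total counts rather than a single pass/fail bit, with each seed driving a fresh random data set in the directed-random style described above. `ResultTracker`, `directed_random_check` and the test name `sort-idempotent` are hypothetical.

```python
import random
from collections import defaultdict


class ResultTracker:
    """Keeps [passes, total] per test identifier instead of one pass/fail bit."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, test_id, passed):
        entry = self.counts[test_id]
        entry[1] += 1          # one more run of this check
        if passed:
            entry[0] += 1      # and one more pass

    def summary(self, test_id):
        passes, total = self.counts[test_id]
        return f"{test_id}: {passes}/{total} passed"


def directed_random_check(seed):
    """Hypothetical check: build random data from `seed`, then test a property."""
    rng = random.Random(seed)
    data = [rng.randint(-100, 100) for _ in range(20)]
    return sorted(sorted(data)) == sorted(data)   # sorting is idempotent


tracker = ResultTracker()
for seed in range(50):
    tracker.record("sort-idempotent", directed_random_check(seed))
print(tracker.summary("sort-idempotent"))   # sort-idempotent: 50/50 passed
```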



Re: RFD: Built-in testing

2009-01-24 Thread Will Coleda
On Fri, Jan 23, 2009 at 4:08 PM, jerry gay  wrote:
> On Fri, Jan 23, 2009 at 12:37, Dave Whipp  wrote:
>> I could also imagine writing code that reads from an Sqlite database, and
>> imposes that info onto the test. Whatever mechanism is used, I think we need
>> a language-defined mechanism to supply a stable unique identifier for each
>> test, so that it can be individually tracked and manipulated. Perhaps "is
>> only" is the wrong way to implement the action-at-a-distance, but it does
>> seem better (IMO) than a preprocessor.
>>
> i don't understand the drive to have unique test identifiers. we don't
> have unique identifiers for every code statement, or every bit of
> documentation. why are tests so important/special/different that each
> warrants a unique id? that aside, this functionality sounds like it
> can be encapsulated in a module, if desired. as it stands, i can't see
> a reason it *has to* be made available in the core.

Unique test identifiers are helpful because you can then track the
progress of a specific test across platforms or revisions.

> as a recap, the discussions larry, patrick, moritz and i (and others,
> i'm sure) had on this topic long ago led to agreement that the most
> important characteristics for a portable specification test suite
> were:
>
> ~ the tests should be organized in such a way that it makes it easy to
> figure out what bit of spec is under scrutiny
>  (addressed by directory/filename standardization and smartlinks)
> ~ the test files mustn't be cluttered with code that implementations need
> to ignore
>  (comments are used, which are by default ignored, and can be
> preprocessed to customize the test for each implementation)
> ~ the skip/todo markers should be as close to the relevant tests as
> possible, so they're less likely to fall out-of-sync
>  (the markers are in comments in the test file, directly above the tests)
>
> it's my view that spec tests should be easy to maintain for developers
> of multiple implementations, and uniqueness is an overly burdensome
> constraint.

A simple algorithm (used by tcl's spec tests) is to have each named
test correspond roughly to the name of the file (which in turn
corresponds roughly to the name of the feature being tested), and then
increment vaguely numerically. e.g:

dict-1.1
dict-2.1
dict-2.2
dict-2.3

Then, if they have to add a test in a future revision, they can insert
it between dict-2.1 and dict-2.2, call it dict-2.1-a, and still know
that dict-2.2 is testing the same code, regardless of when that test
was run.
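A small Python sketch of why that naming scheme keeps IDs stable: a sort key that places an inserted `dict-2.1-a` between `dict-2.1` and `dict-2.2` without renaming any existing test. The ID grammar used here (feature, major.minor, optional letter suffix) is an assumption generalised from the example above, not tcl's documented format.

```python
import re

# Assumed ID grammar: <feature>-<major>.<minor>[-<letter suffix>]
ID_PATTERN = re.compile(r"([a-z]+)-(\d+)\.(\d+)(?:-([a-z]+))?")


def tcl_style_key(test_id):
    """Sort key that keeps inserted IDs (e.g. 'dict-2.1-a') between their neighbours."""
    m = ID_PATTERN.fullmatch(test_id)
    if m is None:
        raise ValueError(f"unrecognised test id: {test_id!r}")
    feature, major, minor, suffix = m.groups()
    # Numeric parts compare as numbers; no suffix ("") sorts before "a".
    return (feature, int(major), int(minor), suffix or "")


ids = ["dict-2.2", "dict-1.1", "dict-2.1-a", "dict-2.3", "dict-2.1"]
print(sorted(ids, key=tcl_style_key))
# ['dict-1.1', 'dict-2.1', 'dict-2.1-a', 'dict-2.2', 'dict-2.3']
```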

Regards.

-- 
Will "Coke" Coleda


Re: RFD: Built-in testing

2009-01-24 Thread Ovid
----- Original Message -----

> From: jerry gay 

> i don't understand the drive to have unique test identifiers. we don't
> have unique identifiers for every code statement, or every bit of
> documentation. why are tests so important/special/different that each
> warrants a unique id?

 
Actually, if code is well-written, we *do* sort of have unique identifiers.  
"Bob, you need to change &Customer::name to also show the middle initial".  We 
don't really have anything like that in tests unless we move close to the xUnit 
style.  TAP has no concept of this.

Unique identifiers are useful in that they can let you track changes over time 
(many of us use source control history to understand changes over time for 
code).  It would be very useful to have unique identifiers to persist to a db 
and create graphs of one's test suite behavior ("hey, we keep failing out 
credit card tests. We should look into this more carefully!").
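A minimal sketch of persisting results keyed by a stable test identifier, using Python's built-in `sqlite3` module with an in-memory database. The table layout, the test names and the run history are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (test_id TEXT, run TEXT, passed INTEGER)")


def record(test_id, run, passed):
    """Store one outcome; the stable test_id is what makes history queryable."""
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (test_id, run, int(passed)))


# Made-up history: the credit card test starts failing from run r2 on.
history = [("r1", True, True), ("r2", False, True), ("r3", False, True)]
for run, card_ok, login_ok in history:
    record("billing.credit_card.charge", run, card_ok)
    record("auth.login", run, login_ok)

failures = conn.execute(
    "SELECT test_id, COUNT(*) FROM results WHERE passed = 0 "
    "GROUP BY test_id ORDER BY COUNT(*) DESC"
).fetchall()
print(failures)  # [('billing.credit_card.charge', 2)]
```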

Cheers,
Ovid
--
Buy the book - http://www.oreilly.com/catalog/perlhks/
Tech blog- http://use.perl.org/~Ovid/journal/
Twitter  - http://twitter.com/OvidPerl
Official Perl 6 Wiki - http://www.perlfoundation.org/perl6


Re: RFD: Built-in testing

2009-01-23 Thread Brandon S. Allbery KF8NH

On 2009 Jan 21, at 7:35, Carl Mäsak wrote:

> Moritz (>):
>> So Larry and Patrick developed the idea of creating an
>> adverb on the test operator instead:
>>
>>   $x == 1e5   :ok('the :ok makes this a test');
>
> I'm trying to explain to myself why I don't like this idea at all. I'm
> only partially successful. Other people seem to have no problem with

I'm having SNOBOL flashbacks.  That's quite enough to put me off of it.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH




Re: RFD: Built-in testing

2009-01-23 Thread jason switzer
On Fri, Jan 23, 2009 at 6:39 PM, Dave Whipp  wrote:

> A spec-test is (or should be) different from an ad-hoc test. I want to be
> able to say "test S09.237 passes on pugs but not on Rakudo" (perhaps with a
> nicer name). Unique identifiers  allow comparisons of specific tests across
> multiple implementations, and over time. It is possible to derive IDs using
> line numbers (perhaps block-relative), but that's only a good idea if the
> test suite is reasonably stable (and it requires tool support).
>

That sounds useful on the surface but often turns out to be more difficult
to do than you might think. There are many cases where tests are performed
from within loops. Something like S09.237 may or may not be in a loop, and may
be difficult to identify in files with many tests. This sort of test name
could be the test message output by Test.pm's verbose output, but it then
makes the verbose output virtually useless in that Test.pm could just keep
records of the test numbers instead. There can also be multiple tests per
single line of code, especially if provided as an adverb, such as :ok

Test labels seem like an aspect that is highly susceptible to bit-rot due
to their ever-evolving nature. Given the multitude of things that can go wrong
trying to keep records, it might not be a good idea to focus on this.
Rather, it might be a good idea to have the language provide a base test and
a means to extend the test. This would allow for the tests previously
written to transparently change the back-end testing mechanism.

Here's a very crude example. Let's say that ok() is defined by the Core (and
thus the language):

  multi sub ok(Bool $test, Str $msg) {
      if $test { say "ok $msg" } else { say "not ok $msg" }
  }

Then let's say I don't want the default (pseudo-)TAP test output; I could
redefine what ok() does:

  multi sub ok(Bool $test) {
      say "A test has failed at some point somewhere" unless $test;
  }

  ok(?($x == 4), "no good has come of this");  # calls Core's ok()
  ok(?($x == 2));                              # calls my crappy ok()

That's just an example to show that the language could provide a basic
version that is extensible with various implementations and various
compilers such that I don't have to write constantly unique test names (or
poorly identified names) and still only have to write a test once.
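A rough Python analogue of the idea: tests only ever call one front-end `ok()`, and the back end that reports the result can be swapped at run time. The names `set_emitter` and `tap_emitter` are hypothetical; the Perl 6 example above uses multi dispatch instead, which this sketch does not try to model.

```python
from typing import Callable, List, Optional, Tuple

# A back end receives the outcome and the (optional) message.
Emitter = Callable[[bool, Optional[str]], None]


def tap_emitter(passed: bool, msg: Optional[str]) -> None:
    """Default back end: prints pseudo-TAP lines such as 'ok - message'."""
    line = "ok" if passed else "not ok"
    if msg:
        line += f" - {msg}"
    print(line)


_emitter: Emitter = tap_emitter


def set_emitter(emitter: Emitter) -> None:
    """Swap the back end without touching any test that calls ok()."""
    global _emitter
    _emitter = emitter


def ok(passed: bool, msg: Optional[str] = None) -> bool:
    """The one front end every test calls; returns the outcome."""
    result = bool(passed)
    _emitter(result, msg)
    return result


# Usage: collect results in a list instead of printing them.
collected: List[Tuple[bool, Optional[str]]] = []
set_emitter(lambda passed, msg: collected.append((passed, msg)))
x = 2
ok(x == 4, "no good has come of this")
ok(x == 2)
```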

-Jason "s1n" Switzer


Re: RFD: Built-in testing

2009-01-23 Thread Dave Whipp

jerry gay wrote:

> i don't understand the drive to have unique test identifiers. we don't
> have unique identifiers for every code statement, or every bit of
> documentation. why are tests so important/special/different that each
> warrants a unique id? that aside, this functionality sounds like it
> can be encapsulated in a module, if desired. as it stands, i can't see
> a reason it *has to* be made available in the core.


I have a mental model that says, for each implementation, there is a 
mapping that tells us which tests are runnable, non-runnable, etc. 
Imposing such information from without is difficult (or fuzzy) if tests 
aren't identifiable (giving a name to a group of tests allows a whole 
group to be enabled/disabled as one).


I'd point out that we do, in fact, name statements when it makes sense
to do so: with nested loops, labels allow you to refer to outer loops
explicitly for C or C statements (also C).


The fact that it's possible to name something doesn't require you to do 
so. But the ability to name things like tests is a useful capability, in
that it makes it possible to programmatically enable/disable them 
without touching the source code that defines them.


A spec-test is (or should be) different from an ad-hoc test. I want to 
be able to say "test S09.237 passes on pugs but not on Rakudo" (perhaps 
with a nicer name). Unique identifiers  allow comparisons of specific 
tests across multiple implementations, and over time. It is possible to 
derive IDs using line numbers (perhaps block-relative), but that's only 
a good idea if the test suite is reasonably stable (and it requires tool 
support).
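A rough Python illustration of deriving IDs from block-relative line numbers, using the standard `ast` module on Python source (Perl 6 tooling would look different). `block_relative_ids` is a hypothetical helper, and it assumes test functions are not nested.

```python
import ast


def block_relative_ids(source):
    """Label each ok(...) call as '<function>.<line offset inside that function>'.

    Assumes test functions are not nested (nested defs would be visited twice).
    """
    ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for inner in ast.walk(node):
            if (isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == "ok"):
                ids.append(f"{node.name}.{inner.lineno - node.lineno}")
    return sorted(ids)


before = "def test_dict():\n    ok(1)\n    ok(2)\n\ndef test_list():\n    ok(3)\n"
after = "def test_dict():\n    ok(0)\n    ok(1)\n    ok(2)\n\ndef test_list():\n    ok(3)\n"
print(block_relative_ids(before))  # ['test_dict.1', 'test_dict.2', 'test_list.1']
print(block_relative_ids(after))   # ['test_dict.1', 'test_dict.2', 'test_dict.3', 'test_list.1']
```

`test_list.1` survives the edit, but the old `test_dict.1` has become `test_dict.2`: block-relative numbering only shields other blocks, which is why it is only a good idea for a reasonably stable suite.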


Actually, if the truth be known, I don't really want to say that. I much 
prefer to define behavior using properties, and then say "random seed 
35467 generated a test that caused assertion XYZ to fail" (or "random 
seed 54578 generates a test that gives a different result on Rakudo Vs 
Pugs"). Specific hand-coded directed/focused tests are usually a last 
resort in my line of work.
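A minimal Python sketch of the seed-reporting style described above: a property is checked against inputs generated from successive seeds, and the first failing seed is reported so the exact input can be regenerated. `find_failing_seed`, `gen_list` and the deliberately false property `even_sum` are hypothetical.

```python
import random


def find_failing_seed(prop, gen, seeds):
    """Return the first seed whose generated input violates `prop`, else None."""
    for seed in seeds:
        data = gen(random.Random(seed))
        if not prop(data):
            return seed
    return None


def gen_list(rng):
    """Hypothetical generator: a short, non-empty list of small integers."""
    return [rng.randint(0, 9) for _ in range(rng.randint(1, 8))]


def even_sum(xs):
    """A deliberately wrong property: 'every list sums to an even number'."""
    return sum(xs) % 2 == 0


seed = find_failing_seed(even_sum, gen_list, range(1000))
print(f"random seed {seed} generated a test that caused even_sum to fail")
# The same seed always regenerates the same failing input.
failing_input = gen_list(random.Random(seed))
```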


I don't care if the functionality is "Core" or "Module" -- I'm not even 
sure that there's a distinction. I think the question is more "is it 
specified as part of the language, or not" -- and if it's used by the 
spec of the language then it seems reasonable to specify it.


Re: RFD: Built-in testing

2009-01-23 Thread jerry gay
On Fri, Jan 23, 2009 at 12:37, Dave Whipp  wrote:
> I could also imagine writing code that reads from an Sqlite database, and
> imposes that info onto the test. Whatever mechanism is used, I think we need
> a language-defined mechanism to supply a stable unique identifier for each
> test, so that it can be individually tracked and manipulated. Perhaps "is
> only" is the wrong way to implement the action-at-a-distance, but it does
> seem better (IMO) than a preprocessor.
>
i don't understand the drive to have unique test identifiers. we don't
have unique identifiers for every code statement, or every bit of
documentation. why are tests so important/special/different that each
warrants a unique id? that aside, this functionality sounds like it
can be encapsulated in a module, if desired. as it stands, i can't see
a reason it *has to* be made available in the core.

as a recap, the discussions larry, patrick, moritz and i (and others,
i'm sure) had on this topic long ago led to agreement that the most
important characteristics for a portable specification test suite
were:

~ the tests should be organized in such a way that it makes it easy to
figure out what bit of spec is under scrutiny
  (addressed by directory/filename standardization and smartlinks)
~ the test files mustn't be cluttered with code that implementations need to ignore
  (comments are used, which are by default ignored, and can be
preprocessed to customize the test for each implementation)
~ the skip/todo markers should be as close to the relevant tests as
possible, so they're less likely to fall out-of-sync
  (the markers are in comments in the test file, directly above the tests)

it's my view that spec tests should be easy to maintain for developers
of multiple implementations, and uniqueness is an overly burdensome
constraint.
~jerry


Re: RFD: Built-in testing

2009-01-23 Thread Dave Whipp

Larry Wall wrote:

>>   module MyTests {
>>     sub group1 {
>>       ok foo :name; ## Q - would a label be better?
>>     }
>>   }
>>
>>   ## Elsewhere
>>   MyTests.group1.test_foo is also broken;
>
> I guess I don't see offhand what you're trying to do with that.
>
> ...
>
> We must keep a clean
> separation between code that proves success and any indicator that says
> "don't try this yet".


That was the intent. The test (within the MyTests module) would define
tests in a platform-agnostic way. The "is also" clause would be added in
some other place (a platform-specific file) that says which tests are
currently broken (or perhaps adds some other tag that indicates that it
should be skipped for smoke testing, but not for full regressions).


The point is to have a mechanism within the language (i.e. not a
preprocessor) that imposes the tags from afar: a useful
action-at-a-distance that is necessary to separate the test from its
current status.


I could also imagine writing code that reads from an Sqlite database, 
and imposes that info onto the test. Whatever mechanism is used, I think 
we need a language-defined mechanism to supply a stable unique 
identifier for each test, so that it can be individually tracked and 
manipulated. Perhaps "is only" is the wrong way to implement the 
action-at-a-distance, but it does seem better (IMO) than a preprocessor.


Re: RFD: Built-in testing

2009-01-23 Thread Larry Wall
On Fri, Jan 23, 2009 at 11:16:21AM -0800, Dave Whipp wrote:
> I can see that. So the alternative is to give things names and/or tags,  
> so that we can attach parameters remotely.

Hmm, well, we also decided not to use any solutions that encourage
putting the metadata too far away from the place it modifies.
Somewhere else in the same file is perhaps okay (and I can see the
use of tags in messages if the message itself isn't unique, but
then why isn't the message unique?).  But as soon as you have unique
IDs people think they have to move the metadata out to a database,
and then you're back with the same kind of always out-of-date and
out-of-sync errors that we used to get with documentation before POD.
Plus you start getting back into uncertainty as to whether something
external to the program is cheating, unless you can prove a positive
cutoff to the fudging metadata while doing validation testing.
I really like the notion that final validation of 6.0.0 involves
simply running the test files without any reference to outside data.

> Such a mechanism should  
> probably be more general than just tests, so I'll overload "is also" to  
> impose additional traits:
>
>   module MyTests {
>     sub group1 {
>       ok foo :name; ## Q - would a label be better?
>     }
>   }
>
>   MyTests.group1.test_foo is also broken;
>
> presumably this would have some form of wildcarding, or inheritance of  
> the "broken" trait from outer scopes:
>
>   MyTests is also broken;
>
> Not sure if that could work.

I guess I don't see offhand what you're trying to do with that.
Modules are primarily about exportation, and seem like the wrong peg
to be hanging test info on--assuming such metadata even wants to look
like real code, which I don't think it does.  The real code wants to
look exactly like what it will look like when rakudo *isn't* broken
anymore.   Test code should rarely be in the business of asserting
that something is broken.  Or to put it another way, test code that
asserts failure a priori can never prove success.  We must keep a clean
separation between code that proves success and any indicator that says
"don't try this yet".  Every bit of code that is dependent on platform
dependencies is, by definition, not platform independent, and we've got
to keep at least the language validation tests platform independent.

Larry


Re: RFD: Built-in testing

2009-01-23 Thread Dave Whipp

Larry Wall wrote:

> On Fri, Jan 23, 2009 at 08:01:14AM -0800, Dave Whipp wrote:
>> For example, I could conceive of a trait:
>>
>>   ok foo, :broken
>>
>> which might downgrade the error to a warning on rakudo, but not on other
>> implementations.
>
> On the surface that seems like a good idea, and pugs started out doing
> things this way, but we discovered that it's a Terrible Mistake to
> mix platform dependencies in with the notation of the actual test,
> ...
>
> All that being said, fudge is a preprocessor, and preprocessors are
> a form of evil, so I'd certainly be open to the actual parser doing
> the fudging during compilation if explicitly requested to do so.
> My main concern is that the fudging directives not be intermixed with
> the actual test, and that they not look like real code.



I can see that. So the alternative is to give things names and/or tags, 
so that we can attach parameters remotely. Such a mechanism should 
probably be more general than just tests, so I'll overload "is also" to 
impose additional traits:


  module MyTests {
    sub group1 {
      ok foo :name; ## Q - would a label be better?
    }
  }

  MyTests.group1.test_foo is also broken;

presumably this would have some form of wildcarding, or inheritance of 
the "broken" trait from outer scopes:


  MyTests is also broken;

Not sure if that could work.
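A minimal Python sketch of imposing tags from afar with inheritance from outer scopes: tags live in a registry keyed by dotted names, and a lookup collects the tags of every enclosing prefix. `TagRegistry` and the tag names are hypothetical, and the sketch models only the lookup, not the proposed `is also` syntax.

```python
class TagRegistry:
    """Tags attached from afar to dotted test names, e.g. 'MyTests.group1.test_foo'.

    A tag on an outer scope (e.g. 'MyTests') is inherited by everything inside it.
    """

    def __init__(self):
        self._tags = {}  # dotted name -> set of tags

    def add(self, name, tag):
        self._tags.setdefault(name, set()).add(tag)

    def tags_for(self, name):
        parts = name.split(".")
        found = set()
        # Walk 'MyTests', 'MyTests.group1', 'MyTests.group1.test_foo', ...
        for i in range(1, len(parts) + 1):
            found |= self._tags.get(".".join(parts[:i]), set())
        return found


reg = TagRegistry()
reg.add("MyTests.group1.test_foo", "broken")   # like: MyTests.group1.test_foo is also broken
reg.add("MyTests", "slow")                     # like: MyTests is also slow
print(sorted(reg.tags_for("MyTests.group1.test_foo")))  # ['broken', 'slow']
```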


Dave.


Re: RFD: Built-in testing

2009-01-23 Thread Larry Wall
On Fri, Jan 23, 2009 at 08:01:14AM -0800, Dave Whipp wrote:
> For example, I could conceive of a trait:
>
>   ok foo, :broken
>
> which might downgrade the error to a warning on rakudo, but not on other  
> implementations.

On the surface that seems like a good idea, and pugs started out doing
things this way, but we discovered that it's a Terrible Mistake to
mix platform dependencies in with the notation of the actual test,
which is why we now use the "fudge" preprocessor approach, where any
platform-dependent cheating is listed on its own line and looks like
a comment to other platforms.  Plus it's very easy to measure whether
you're passing the test or not--you just turn off all the fudging,
which leaves all the annotations as mere comments.  If you mix the
notation in with the test, then the test harness has to explicitly
ignore the notations both for other platforms and also for this
platform when a complete validation is desired; whether that is
being done correctly is more difficult to prove, and it opens up
the test harness to potential accusations of perfidious cheating.
With the current approach it's drop-dead easy to see whether or not
the tests are cheating--either you're running "fudge" or you're not.

All that being said, fudge is a preprocessor, and preprocessors are
a form of evil, so I'd certainly be open to the actual parser doing
the fudging during compilation if explicitly requested to do so.
My main concern is that the fudging directives not be intermixed with
the actual test, and that they not look like real code.
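A simplified Python sketch of the fudge approach as described: directives sit on their own line, look like comments to every other implementation, and only the named implementation rewrites the following line. The `#?<impl> skip '<reason>'` syntax here is modeled loosely on fudge and deliberately minimal; it is not fudge's full directive language, and `fudge_source` is a hypothetical name.

```python
import re

# Hypothetical, minimal directive syntax: "#?<impl> skip '<reason>'".
# A directive applies only to the single line that follows it.
DIRECTIVE = re.compile(r"^\s*#\?(\w+)\s+skip\s+'([^']*)'\s*$")


def fudge_source(source, impl):
    """Rewrite lines marked for `impl`; other directives stay plain comments."""
    out = []
    pending = None  # reason for skipping the next line, if any
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if m:
            out.append(line)  # the directive itself is left as a comment
            if m.group(1) == impl:
                pending = m.group(2)
            continue
        if pending is not None:
            out.append(f"skip '{pending}';  # fudged: {line.strip()}")
            pending = None
        else:
            out.append(line)
    return "\n".join(out)


test_file = "ok 1 == 1;\n#?rakudo skip 'NYI'\nok foo();\nok 2 == 2;"
print(fudge_source(test_file, "rakudo"))
```

Not running the preprocessor at all leaves the file exactly as written, with every directive a mere comment, which is what makes the "are we cheating?" question easy to answer.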

Larry


Re: RFD: Built-in testing

2009-01-23 Thread Dave Whipp

Timothy S. Nelson wrote:

>>  method foo() does assume { ... }
>>  method bar() does ensure { ... }
>
> Is "ensure" equivalent to the "assert" that you describe above?

Yes. "does ensure" was meant to be an englishification of
"postcondition"; and "does assume" is "precondition".

From the perspective of formal specification, one assumes that a
precondition is true, and the body of the method/sub/block must ensure
that the postcondition is true (given the assumption of any preconditions).

>>  method baz() { bar; ok conserve_sum; foo; }
>
> I'd suggest that we don't even need to have "ok" here; we'd be
> better off just going "conserve_sum()", and assuming that, because it's
> a property, the "ok" will be automatically attached.  I know you're not
> being picky about syntax at the moment, but I wanted to throw the idea
> into the ring.


You really need some keyword there, to distinguish between the roles of 
"assume" and "assert". Also, it provides a construct to hang other 
traits on to. For example, I could conceive of a trait:


  ok foo, :broken

which might downgrade the error to a warning on rakudo, but not on other 
implementations.
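A minimal Python sketch of the assume/ensure split: a violated precondition is reported as a fault in the test feeding the code, while a violated postcondition is reported as a fault in the code under test. The `contract` decorator and both exception classes are hypothetical names, not an existing Perl 6 or Python API.

```python
import functools


class AssumptionViolated(Exception):
    """A precondition ("assume") failed: the test feeding the code is wrong."""


class AssertionViolated(Exception):
    """A postcondition ("ensure") failed: the code under test is wrong."""


def contract(assume=None, ensure=None):
    """Hypothetical decorator: `assume` checks the arguments, `ensure` the result."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if assume is not None and not assume(*args, **kwargs):
                raise AssumptionViolated(f"{fn.__name__}: precondition failed")
            result = fn(*args, **kwargs)
            if ensure is not None and not ensure(result, *args, **kwargs):
                raise AssertionViolated(f"{fn.__name__}: postcondition failed")
            return result
        return inner
    return wrap


@contract(assume=lambda x: x >= 0,
          ensure=lambda r, x: abs(r * r - x) < 1e-9)
def square_root(x):
    return x ** 0.5
```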


Re: RFD: Built-in testing

2009-01-22 Thread Timothy S. Nelson

On Wed, 21 Jan 2009, Dave Whipp wrote:

> Moritz Lenz wrote:
>> A few months ago Larry proposed to add some testing
>> facilities to the language itself, because we want to
>> culturally encourage testing, and because the test
>> suite defines the language, so we need to specify the
>> behaviour of our testing facilities anyway.
>
> If we're going to revamp the testing primitives, then I'd like to suggest
> importing some concepts from hardware verification languages, whose entire
> purpose is to define tests. Not too much, but just a few defns:

	I love the basic ideas, but I have a few queries along the way.

> * Define a "property" as an expression whose truth is of interest (properties
> may be named, or may be anonymous inline).
>
> * An "assert " statement (aka "ok ") indicates that a
> violation of the property is to be considered an error
>
> * An "assume " statement indicates that a violation of the property
> implies an incorrect test.

	It seems to me, from your description, that "assert "
is more like:

	if(! ) { throw exception }

...and that assume  is more like ok().

> class Foo {
>   has $.a;
>   has $.b;
>
>   property conserve_sum { $.a + $.b == 42 },
>     "a+b must sum to 42, but a=$.a + b=$.b == { $.a+$.b }";
>
>   method foo() does assume { ... }
>   method bar() does ensure { ... }

	Is "ensure" equivalent to the "assert" that you describe above?

>   method baz() { bar; ok conserve_sum; foo; }

	I'd suggest that we don't even need to have "ok" here; we'd be better
off just going "conserve_sum()", and assuming that, because it's a property,
the "ok" will be automatically attached.  I know you're not being picky about
syntax at the moment, but I wanted to throw the idea into the ring.

> An interesting type of property is one that tracks a series of events through
> time: a so-called "temporal" property. A simple idea might be that
> "conserve_sum" should actually mean "sum doesn't change", instead of "is
> constant 42":
>
> class Foo {
>   ...
>
>   coro property conserve_sum {
>     my $sum = $.a + $.b;
>     leave True;
>     ok $.a + $.b == $sum,
>       "sum not conserved: expected $sum, actual {$.a+$.b}"
>   }
>
>   method foo() does maintain { --$.a; ++$.b }
> }


Vote++ :)
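A rough Python analogue of the temporal `conserve_sum` property: a generator snapshots the sum, suspends while the method runs, then resumes to compare. The `maintain` decorator mirrors the `does maintain` idea but is a hypothetical helper; it does not try to reproduce the exact semantics of `coro` or `leave` in the quoted code.

```python
import functools


def maintain(prop):
    """Hypothetical decorator: run temporal property `prop` around each call."""
    def wrap(method):
        @functools.wraps(method)
        def inner(self, *args, **kwargs):
            checker = prop(self)
            next(checker)              # runs up to the yield: takes the "before" snapshot
            result = method(self, *args, **kwargs)
            next(checker, None)        # resumes after the yield: performs the check
            return result
        return inner
    return wrap


class Foo:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def conserve_sum(self):
        """Temporal property: the sum must be the same after the call as before."""
        expected = self.a + self.b
        yield
        actual = self.a + self.b
        if actual != expected:
            raise AssertionError(f"sum not conserved: expected {expected}, actual {actual}")

    @maintain(conserve_sum)
    def foo(self):
        self.a -= 1
        self.b += 1

    @maintain(conserve_sum)
    def leak(self):
        """Hypothetical buggy method that breaks the property."""
        self.a -= 1
```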


-
| Name: Tim Nelson | Because the Creator is,|
| E-mail: wayl...@wayland.id.au| I am   |
-

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- 
PE(+) Y+>++ PGP->+++ R(+) !tv b++ DI D G+ e++> h! y-

-END GEEK CODE BLOCK-



Re: RFD: Built-in testing

2009-01-22 Thread Timothy S. Nelson

On Thu, 22 Jan 2009, Richard Hainsworth wrote:

> 4) Testing software is different from debugging or running software. Running
> is about providing functionality to the user. Debugging is about getting
> expected behaviour and discovering why behaviour exhibited is not what is
> expected / specified. Testing is about demonstrating that the functionality
> provided is the functionality expected / specified under all specified
> conditions.


	I guess I've always seen it as having even more facets.  I hadn't 
thought about testing in this context before.  My suggestion, though, is that 
the facets would include:

-   "Useful" code (ie. the code that actually does the stuff you want)
-   Runtime error handling (ie. try/catch/whatever)
-   Debugging code (kind of like a log of watches[*], etc)
-   Tests (we're talking about these)
-   Comments # We know what these are :)
-   Documentation (POD, etc)

[*] by "watches", I mean those things in a GUI where you get it to show you 
the contents of a variable.


	If there is a name for what I've labelled '"Useful" code', then please 
let me know :).


	Anyway, I just wanted to highlight the contrast between code (which is 
essentially a 1-dimensional character stream), and the 2-dimensional nature 
we're trying to capture, in hopes that it will give someone ideas.


	In particular, I note that we have specialised syntax for each of 
these.  While it would presumably be confusing to unify the syntax for all of 
them, it seems to me that they naturally break into three groups:

-   "Useful" code (in a group by itself)
-   Checking code
    -   Runtime error handling
    -   Debugging code
    -   Tests
-   English
    -   Documentation
    -   Comments

	Whether we should somehow unify the syntax of either the checking 
group or the English group isn't something that I know the answer to, but it's 
a thought.  Perl 6 is confusing to the beginner, and that's OK, but I figure 
it should be no more confusing than it has to be.


:)





Re: RFD: Built-in testing

2009-01-22 Thread jason switzer
On Thu, Jan 22, 2009 at 4:51 PM, jerry gay  wrote:

>  $x == $y
>:ok({ .true ?? 'message' !! 'failure message' })
>:diag( 'tap comment', :some_tap_property)


I just want to stress again that I would rather not see the focus be solely on
TAP emitters. While I realize this is just an example, adverbs that apply to a
specific emitter would not be my preference. Extensible emitters would allow
integrators the opportunity to mix perl6 tests in with perl5 tests and xUnit
tests (for easily integrated test reports).
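To illustrate pluggable emitters, here is a hedged Python sketch that renders one list of results both as TAP and as JUnit-style XML (the `testsuite`/`testcase`/`failure` layout that many xUnit-aware tools read). The `(name, passed, message)` tuple shape is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def to_tap(results):
    """results: list of (name, passed, message) tuples, a shape assumed here."""
    lines = [f"1..{len(results)}"]
    for number, (name, passed, _message) in enumerate(results, start=1):
        lines.append(f"{'ok' if passed else 'not ok'} {number} - {name}")
    return "\n".join(lines)


def to_junit_xml(suite_name, results):
    """Same results, rendered in the JUnit-style XML layout."""
    failures = sum(1 for _name, passed, _message in results if not passed)
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for name, passed, message in results:
        case = ET.SubElement(suite, "testcase", name=name)
        if not passed:
            ET.SubElement(case, "failure", message=message or "")
    return ET.tostring(suite, encoding="unicode")


results = [("adds", True, None), ("divides", False, "ZeroDivisionError")]
print(to_tap(results))
print(to_junit_xml("demo", results))
```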

-Jason "s1n" Switzer


Re: RFD: Built-in testing

2009-01-22 Thread Ovid
----- Original Message -----

> From: jerry gay 

> On Thu, Jan 22, 2009 at 09:22, Moritz Lenz wrote:
> > Richard Hainsworth wrote:
> > But it is interesting to think about the case where a user wants two
> > different diagnostic test messages (to all the testing gurus out there:
> > do you actually want such a feature?). It shouldn't be too hard to do;
> > maybe just  :OK('True message', 'False message')?

I can't speak for others, but I only want one diagnostic message, with the 
option to turn it on for passing tests.  Having different messages for 
different conditions will confuse me :)

 
Cheers,
Ovid


Re: RFD: Built-in testing

2009-01-22 Thread jerry gay
On Thu, Jan 22, 2009 at 09:22, Moritz Lenz  wrote:
> Richard Hainsworth wrote:
> But it is interesting to think about the case where a user wants two
> different diagnostic test messages (to all the testing gurus out there:
> do you actually want such a feature?). It shouldn't be too hard to do;
> maybe just  :OK('True message', 'False message')?
>
maybe

  $x == $y :ok('message') :nok('failure message')

or

  $x == $y
:ok({ .true ?? 'message' !! 'failure message' })
:diag( 'tap comment', :some_tap_property)

to handle success and failure messages, and set custom diagnostic info
in the tap stream. that is, as long as the result of the comparison is
available in $_ to the :ok adverb.
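A minimal Python sketch of choosing between a success and a failure message, including the variant where the message is computed from the comparison's outcome. `check` is a hypothetical helper; passing the outcome to a callable only loosely mirrors the idea of the result being visible as `$_` inside `:ok`.

```python
from typing import Callable, Optional, Union

# A message is either fixed text or a callable that receives the outcome.
Message = Union[str, Callable[[bool], str]]


def check(result, ok: Message = "", nok: Optional[str] = None) -> str:
    """Format a TAP-ish line, picking the success or failure message."""
    passed = bool(result)
    if callable(ok):
        msg = ok(passed)                 # message computed from the outcome
    elif not passed and nok is not None:
        msg = nok                        # separate failure message supplied
    else:
        msg = ok                         # one message for both cases
    return f"{'ok' if passed else 'not ok'} - {msg}"


x, y = 3, 4
print(check(x == y, "message", "failure message"))                     # not ok - failure message
print(check(x == y, lambda r: "message" if r else "failure message"))  # not ok - failure message
```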

~jerry


Re: RFD: Built-in testing

2009-01-22 Thread Moritz Lenz
Ovid wrote:
> One concern is where Larry asks:
> 
> I wonder how often we'd have people making the error
> of trying to interpolate into :ok
>
> 
>  
> I'd be one of them.  The following is a very common idiom:
> 
> for my $method (@methods) {
> can_ok $object, $method;
> lives_ok { $object->$method } "... and calling '$method' isn't fatal";
> }

Single angle quotes are just like single quotes in that they don't
interpolate, whereas double angle quotes are just like double quotes;
they interpolate.

So you can just write :ok«... and calling '$method' isn't fatal», or
:ok<<...>> or :ok("...") - it's not like there were only one way to
write an attribute ;-)

Surely people will make mistakes when they blindly assume things, but
they'll learn it rather quickly.

(BTW I want the non-interpolating test description just as often as the
interpolating one, as in :ok; but that
might be because I'm testing Perl 6, not user-level applications).

> Interpolation in the test description is very important on iterative tests or 
> to distinguish similar tests

That's why there's still more than one way to do it ;-)

Cheers,
Moritz


Re: RFD: Built-in testing

2009-01-22 Thread Moritz Lenz
There are a few interesting points on which I'd like to comment

Richard Hainsworth wrote:
> In other words, test functionality sufficient for the compiler may not 
> be adequate for module testing. But other functions can be developed in 
> Test modules that can be hooked into a general testing approach.

That's clear to me, and our current approach doesn't require all tests
to be written as adverbs - only the most common ones. For example I
think the eval_lives_ok and eval_dies_ok functions will remain, as well
as a few others.

> a) a global variable $*TESTING which defaults to FALSE (or should it be 
> $?TESTING ?)
> 
> It could be set lexically so that specific software / modules can be 
> tested without triggering tests in other used modules.

I don't think that lexical is a good choice, since it means that you can
only ever turn it on from the inside, which means that every code that
contains TEST blocks also has to have some logic for switching on
$*TESTING - which smells like a lot of code duplication.

> b) When $*TESTING is TRUE, any TEST block is executed.
> 
> c) Within a TEST block, the ternary  is defined slightly
> differently, thus for
>  ??  !! 
> ;
> 
>  is guaranteed to return Boolean::FALSE if an 
> exception or failure condition is encountered when evaluating it.
> 
> Some advantages of this approach over :OK<>:
> - no new behaviour outside of a TEST block is defined, no change to 
> adverbs or boolean operators.

That is an advantage, but the definition of the :OK adverbs could also
somehow magically be scoped to TEST blocks.

> - any expression that leads to a boolean result (note that :OK is 
> suggested to be defined only on boolean operators) can be included in 
> the expression, eg., an entire block.

The same can be achieved with ? ... :OK

> - The programer has control over both the "True" diagnostic, as in the 
> :OK<> syntax, but also over the 'False' diagnostic, thus allowing a 
> degree of introspection on the component of the expression, which the 
> programmer has more knowledge about than the compiler.

Sadly he not only has the control, but is also obliged to cater for both
cases. I'm lazy, and I don't want to type all of my messages twice.

But it is interesting to think about the case where a user wants two
different diagnostic test messages (to all the testing gurus out there:
do you actually want such a feature?). It shouldn't be too hard to do;
maybe just  :OK('True message', 'False message')?

> - Since the variables used in the boolean expression are available to 
> the programmer for both diagnostics, there is no need for special magic 
> to generate the failure diagnostic, which seems to be the situation with 
> :OK<>.

No, the :OK solves that problem, it doesn't generate it.
Also it implies again that the programmer actually has to do it himself,
which goes against the principle of laziness.

> - Since it is the programmer that defines the False diagnostic, no extra 
> autogenerated macros are needed.
> - The minimum that is needed for a test would be to specify a 'true' 
> diagnostic and the $! error variable, eg.,
> TEST { 2 == 2 ?? say 'constants are constants' !! say $! };

And what would $! contain in this case?

(I think the real objection from is that ?? !! is just plain ugly; but
then again I might be blind here...)

> d) A TEST block is specified to react to exceptions / failures in a 
> different manner than in normal blocks. Uncaught exceptions are 
> discarded at the end of a block. Thus compiler / module / software 
> failures do not stop the software from continuing, unless specifically 
> required by the programmer to do so within the block.

That's quite a good idea.

> e) Other functions that are useful in test suites, such as plan, could 
> be defined later as wrappers around "?? !!"

Or just stay as plain functions.

Cheers,
Moritz


Re: RFD: Built-in testing

2009-01-22 Thread Moritz Lenz
Ovid wrote:
> Regarding the disadvantages:
> 
>> However nothing in life is free, we pay for it with a
>> few disadvantages:
>> * We nearly double the number of built-in operators
>>by adding an :ok multi
> 
> Yes, but conceptually this will be transparent to the end user, right?  
> They'll just know that they can add :ok to operators.  They'll mentally have 
> one extra piece of information, not twice as many.

Right.

>> * We force implementors to handle operator adverbs
>>and named arguments very early in their progress
>>(don't know how easy or hard that is)
> 
> This might be a problem.  After my (now possibly moot) rewrite of Test.pm was 
> finished, my plan was to write a basic Test.pm which required as few features 
> as needed but still allowed the spectests to run.  Then you simply provide 
> language developers a list of features they need to implement to run the test 
> suite.  Adding operator adverbs to the mix means a lot of rewriting of tests.
> 
> Alternatively, we can say "you don't need these at first" and Test.pm is 
> merely an older way of running tests.  It still remains a valid alternative 
> and new implementers don't need to worry about adverbs. 

But if the spectests are re-written in terms of adverbs, a compiler
can't use them without adverbs. If they are not re-written, there's
no point in introducing the syntax.

>> * Testing of operators becomes somewhat clumsy. If you
>> * want to test infix:<==>, you won't write
>>'2 == 2 :ok("== works")', because you test a
>>different multi there. Instead you'd have to write
>>something like '?(2 == 2) :ok("== works")', where
>:ok is an adverb on prefix:<?>.
> 
> Bad:
> 
>   2==2 :ok("== works");
> 
> Good:
> 
>  ?(2==2) :ok("== works");
> 
> I don't relish explaining, over and over again, why the first is bad and the 
> second is good.  That being said, if this is only used for internals tests, 
> is this likely going to be exposed?

This will only be a FAQ for the contributors of the official Test suite,
and for people who write and test their own boolean operators. I guess
we can live with that.

All other people will assume that the operators already work.

>> So I'd like to hear your opinions: do you think
>> adverb-based testing is a good idea? If you don't like
>> it, do you see any other good way to tackle the
>> problems I mentioned above?
> 
> So how would the following work?
> 
>   can_ok
>   lives_ok
>   throws_ok
>   isa_ok
>   is_deeply

They would remain subs (unless somebody has a much better idea).

> And so on?  Sure, I can write extensions for this, but they're so common that 
> it seems a shame to not have them built-in, but what operator would they hook 
> to?
> 
> Also, if we're going to go whole hog on this, then may I suggest a "tests" or 
> "test" keyword?  We might have :ok embedded in our code, in which case 
> running multiple sections of code might have multiple sections with :ok.  How 
> do test numbers work?  When Foo.pm calls Bar.pm calls Baz.pm and they all use 
> :ok, we may not know how many tests we have, so these might get handled 
> differently from something like this:
> 
>   test Unit::Customer plan 3 {
>       use Customer;
>       my Customer $cust .= new( :fname, :lname );
>       $cust.fname eq 'Billy' :ok;
> 
>       # plan assumes 2 referrals
>       # won't work because we can't interpolate?
>       for $cust.referrals -> $ref_cust {
>           $ref_cust.referrer === $cust :ok<{$ref_cust.name} should have correct referrer>;
>       }
>   }
> 
> With a scheme like this, we can separate tests explicitly written by 
> programmers for testing and those which are embedded.  If the &referrals 
> method has :ok in it, this shouldn't impact the overall plan, right?
> 
> Side note: for the desugar, I'd still prefer we go with 'have/want' instead 
> of 'got/expected'.  We've been wanting to do this with TAP for a while. It 
> reads well and also aligns nicely for fixed-width fonts.

I'll think a bit more about these points.

Cheers,
Moritz


Re: RFD: Built-in testing

2009-01-22 Thread Ovid
- Original Message 

> From: Moritz Lenz 

> >   test Unit::Customer plan 3 {
> >       use Customer;
> >       my Customer $cust .= new( :fname, :lname);
> >       $cust.fname eq 'Billy' :ok;
> > 
> >       # plan assumes 2 referrals
> >       # won't work because we can't interpolate?
> >       for $cust.referrals -> $ref_cust {
> >           $ref_cust.referrer === $cust :ok<{$ref_cust.name} should have correct referrer>;
> >       }
> >   }

> I'll think a bit more about these points.

I've been thinking about this and have realized that it also solves an 
intractable problem with Perl 5 tests:  identifying tests.

By promoting 'test' to a first class concept (not just adjectives), you can 
"name" a test.  Right now, I'm trying to write App::Prove::History 
(http://github.com/Ovid/app--prove--history/tree/master), a bad name for code 
which saves the state of test runs.

One incredibly thorny problem I have is that tests are identified by the name 
of the file.  Reorganize your tests in directories or rename 'em?  You've just 
lost your test history.  However, if tests have an implicit name, developers 
are no longer locked into a directory hierarchy to identify their tests.  This 
also brings us conceptually closer to the xUnit crowd.

I would say for the above, if &referrals had embedded :ok tests, they could be 
output as warnings (if failing) or be provided via some mechanism that would 
let them be embedded into a TAP stream (or other test protocol) so that the 
information is not lost.

I also wonder if 'plan' might not belong there.  Not all testing protocols 
implement that and perhaps some developers won't want it.  So long as their 
tests don't prematurely exit, they know they've run all of their tests.

 
Cheers,
Ovid
--
Buy the book - http://www.oreilly.com/catalog/perlhks/
Tech blog- http://use.perl.org/~Ovid/journal/
Twitter  - http://twitter.com/OvidPerl
Official Perl 6 Wiki - http://www.perlfoundation.org/perl6



Re: RFD: Built-in testing

2009-01-22 Thread Dave Whipp

Moritz Lenz wrote:


$x == 1e5   :ok('the :ok makes this is a test');


I can't help feeling that there's an end-weight problem here: The fact 
that it is a test is the essence of the statement.


If we're thinking of it as a library, then the MMD way of thinking might 
be appropriate: we know it's an equality test so there's no need to 
introspect it.


But if we're thinking of it as a core language feature, then using macro 
semantics -- and introspecting the AST -- isn't necessarily a bad thing.


It also depends on the context. If we're in a file that contains just 
tests, then there's nothing unexpected about seeing a test. Indeed, the 
fact that a statement is a test is no longer important (so no end weight 
issue). But if we want to see these ":ok" tests littering everyday code 
(i.e. as assertions) then it would be wrong to not make explicit the 
fact that the statement is a test.


Re: RFD: Built-in testing

2009-01-22 Thread Dave Whipp

Moritz Lenz wrote:

A few months ago Larry proposed to add some testing
facilites to the language itself, because we want to
culturally encourage testing, and because the test
suite defines the language, so we need to specify the
behaviour of our testing facilities anyway.


If we're going to revamp the testing primitives, then I'd like to 
suggest importing some concepts from hardware verification languages, 
whose entire purpose is to define tests. Not too much, just a few definitions:


* Define a "property" as an expression whose truth is of interest 
(properties may be named, or may be anonymous inline).


* An "assert <property>" statement (aka "ok <property>") indicates that 
a violation of the property is to be considered an error.


* An "assume <property>" statement indicates that a violation of the 
property implies an incorrect test.


Assumptions are very important when you write automated test generators, 
or need to validate that your tests are not doing something illegal. If 
you violate an assumption when running normal code then it's not really 
any different from hitting an assertion. You want as many assumptions as 
possible to be part of the type system (which we already do with "where" 
clauses).


(I'm using the word "assert" instead of "ok" because that's the 
standard terminology in verification languages. In Perl, "ok" is standard, 
so there's no need to rename it. But I do think that we need to qualify it with whether it's an
assumption or an assertion. You can also think of an assumption as a 
precondition. But adding "PRE" blocks to every function tends to 
encourage cargo-cult DBC programming.)
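The assert/assume split above can be sketched in a few lines. This is a language-neutral illustration in Python, not a proposed Perl 6 API. The names assume, assert_that, classify and buggy_abs are hypothetical. A violated assumption is reported as a broken test, and a violated assertion is reported as a failure of the code under test.

```python
# Hypothetical names throughout; this only illustrates the assert/assume split.


class AssumptionViolated(Exception):
    """The test drove the code outside its contract, so the test is wrong."""


class AssertionFailed(Exception):
    """The code under test broke a property it promised to keep."""


def assume(prop: bool, msg: str) -> None:
    if not prop:
        raise AssumptionViolated(msg)


def assert_that(prop: bool, msg: str) -> None:
    if not prop:
        raise AssertionFailed(msg)


def classify(test) -> str:
    """Run one test and say whether it passed, failed, or was itself invalid."""
    try:
        test()
    except AssumptionViolated as e:
        return f"bad test: {e}"
    except AssertionFailed as e:
        return f"not ok: {e}"
    return "ok"


def buggy_abs(x):
    assume(isinstance(x, int), "abs() expects an int")  # precondition on the caller
    result = x  # deliberate bug: negative inputs are not negated
    assert_that(result >= 0, f"abs({x}) returned {result}")  # promise to the caller
    return result


print(classify(lambda: buggy_abs(3)))    # ok
print(classify(lambda: buggy_abs(-3)))   # not ok: abs(-3) returned -3
print(classify(lambda: buggy_abs("3")))  # bad test: abs() expects an int
```

Passing a string to buggy_abs is the test's fault, not the function's, so it is reported differently from the genuine bug.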



The other thing I'd like to point out is that the concept of a 
"property" can be very general. We shouldn't assume that they just sit 
in the middle of procedural code. Specifically, it should be possible to 
define invariants on objects, which should be true at some specified 
point in time (however, it's not always obvious what that point in time is).


I'm thinking we might have something like:

class Foo {
  has $.a;
  has $.b;

  property conserve_sum { $.a + $.b == 42 },
      "a+b must sum to 42, but a=$.a + b=$.b == { $.a+$.b }";

  method foo() does assume { ... }
  method bar() does ensure { ... }
  method baz() { bar; ok conserve_sum; foo; }
}

A property is just a method that returns a Bool, but if you associate 
the failure message with it, then it becomes simple to assert/assume it 
in multiple places without needing to keep repeating the message.


An interesting type of property is one that tracks a series of events 
through time: a so called "temporal" property. A simple idea might be 
that "conserve_sum" should actually mean "sum doesn't change", instead of 
"is constant 42":


class Foo {
  ...

  coro property conserve_sum {
    my $sum = $.a + $.b;
    leave True;
    ok $.a + $.b == $sum,
       "sum not conserved: expected $sum, actual {$.a+$.b}"
  }

  method foo() does maintain { --$.a; ++$.b }
}


I don't think that it is necessary to be too cute with huffmanization 
of testing primitives. All my examples here are just thinking aloud.


Re: RFD: Built-in testing

2009-01-22 Thread Ovid
- Original Message 

> From: Moritz Lenz 

> So Larry and Patrick developed the idea of creating an
> adverb on the test operator instead:
> 
> $x == 1e5   :ok('the :ok makes this is a test');
> 
> This is an adverb on the infix:<==> operator, and might
> desugar to something like this:
> 
> multi sub infix:<==>($left, $right, :$ok) {
>     $*TEST_BACKEND.proclaim($left == $right, $ok)
>         or $*TEST_BACKEND.diag(
>             "Got: «$left.perl()»; Expected: «$right.perl()»");
> }

Regarding the disadvantages:

> However nothing in life is free, we pay for it with a
> few disadvantages:
> * We nearly double the number of built-in operators
>by adding an :ok multi

Yes, but conceptually this will be transparent to the end user, right?  They'll 
just know that they can add :ok to operators.  They'll mentally have one extra 
piece of information, not twice as many.

Are there other consequences of this?

> * We force implementors to handle operator adverbs
>and named arguments very early in their progress
>(don't know how easy or hard that is)

This might be a problem.  After my (now possibly moot) rewrite of Test.pm was 
finished, my plan was to write a basic Test.pm which required as few features 
as needed but still allowed the spectests to run.  Then you simply provide 
language developers a list of features they need to implement to run the test 
suite.  Adding operator adverbs to the mix means a lot of rewriting of tests.

Alternatively, we can say "you don't need these at first" and Test.pm is merely 
an older way of running tests.  It still remains a valid alternative and new 
implementers don't need to worry about adverbs. 

> * Testing of operators becomes somewhat clumsy. If you
> * want to test infix:<==>, you won't write
>'2 == 2 :ok("== works")', because you test a
>different multi there. Instead you'd have to write
>something like '?(2 == 2) :ok("== works")', where
>:ok is an adverb on prefix:<?>.

Bad:

  2==2 :ok("== works");

Good:

 ?(2==2) :ok("== works");

I don't relish explaining, over and over again, why the first is bad and the 
second is good.  That being said, if this is only used for internals tests, is 
this likely going to be exposed?

> So I'd like to hear your opinions: do you think
> adverb-based testing is a good idea? If you don't like
> it, do you see any other good way to tackle the
> problems I mentioned above?

So how would the following work?

  can_ok
  lives_ok
  throws_ok
  isa_ok
  is_deeply

And so on?  Sure, I can write extensions for this, but they're so common that 
it seems a shame to not have them built-in, but what operator would they hook 
to?

Also, if we're going to go whole hog on this, then may I suggest a "tests" or 
"test" keyword?  We might have :ok embedded in our code, in which case running 
multiple sections of code might have multiple sections with :ok.  How do test 
numbers work?  When Foo.pm calls Bar.pm calls Baz.pm and they all use :ok, we 
may not know how many tests we have, so these might get handled differently from 
something like this:

  test Unit::Customer plan 3 {
      use Customer;
      my Customer $cust .= new( :fname, :lname );
      $cust.fname eq 'Billy' :ok;

      # plan assumes 2 referrals
      # won't work because we can't interpolate?
      for $cust.referrals -> $ref_cust {
          $ref_cust.referrer === $cust :ok<{$ref_cust.name} should have correct referrer>;
      }
  }

With a scheme like this, we can separate tests explicitly written by 
programmers for testing and those which are embedded.  If the &referrals method 
has :ok in it, this shouldn't impact the overall plan, right?

Side note: for the desugar, I'd still prefer we go with 'have/want' instead of 
'got/expected'.  We've been wanting to do this with TAP for a while. It reads 
well and also aligns nicely for fixed-width fonts.

Cheers,
Ovid


Re: RFD: Built-in testing

2009-01-22 Thread Richard Hainsworth

Moritz Lenz wrote:


So I'd like to hear your opinions: do you think
adverb-based testing is a good idea? If you don't like
it, do you see any other good way to tackle the
problems I mentioned above?

  
After reading everything in this thread to date and in order to 
structure my thoughts, I wrote up some assertions and a suggestion.


1) A perl6 implementation is to be certified by its ability to pass the 
test suite. The test functionality is implemented in the implementation 
(the implicit recursion has been noted in other threads). This means the 
behaviour of the test functionality must be specified and verifiable.


Hence the assertion that perl6 is to be specified by a suite of 
documents and tests seems to me to imply that test functionality must be 
an inherent part of the language specification.


2) Test functionality for the compiler must be as simple to implement as 
possible, so that it can be incorporated into the implementation at an 
early stage.


3) Although test functionality for more complex software, eg., 
event-driven GUIs, could be constructed from simple specified test 
functionality, more complex forms will be developed to isolate the 
features that need testing and to provide the diagnostics.


In other words, test functionality sufficient for the compiler may not 
be adequate for module testing. But other functions can be developed in 
Test modules that can be hooked into a general testing approach.


4) Testing software is different from debugging or running software. 
Running is about providing functionality to the user. Debugging is about 
getting expected behaviour and discovering why behaviour exhibited is 
not what is expected / specified. Testing is about demonstrating that 
the functionality provided is the functionality expected / specified 
under all specified conditions.


It seems to me that the ethos of testing could be much wider than just a 
part of the development stage. From a risk-management perspective, 
mission-critical software (especially when it is complex and large) 
should be tested regularly against a standard test suite, because random 
errors may occur in the software (eg., a power surge subtly corrupts the 
contents of hard disk storage), and particularly after any upgrade of 
hardware or ancillary software, or any other environmental change, to 
say nothing of changes (upgrades) in the software itself.


Indeed, the inclusion of test functionality in the language and the 
focus on test suites as part of perl6 culture would make software 
written in perl6 extremely desirable in risk-sensitive companies.


Consequently, it seems to me that the following might be useful:

a) a global variable $*TESTING which defaults to FALSE (or should it be 
$?TESTING ?)


It could be set lexically so that specific software / modules can be 
tested without triggering tests in other used modules.


b) When $*TESTING is TRUE, any TEST block is executed.

c) Within a TEST block, the ternary operator ?? !! is defined slightly 
differently, thus for

<boolean expression> ?? <true diagnostic> !! <false diagnostic>;

<boolean expression> is guaranteed to return Boolean::FALSE if an 
exception or failure condition is encountered when evaluating it.


Some advantages of this approach over :OK<>:
- no new behaviour outside of a TEST block is defined, no change to 
adverbs or boolean operators.
- any expression that leads to a boolean result (note that :OK is 
suggested to be defined only on boolean operators) can be included in 
the expression, eg., an entire block.
- The programmer has control over both the "True" diagnostic, as in the 
:OK<> syntax, but also over the 'False' diagnostic, thus allowing a 
degree of introspection on the component of the expression, which the 
programmer has more knowledge about than the compiler.
- Since the variables used in the boolean expression are available to 
the programmer for both diagnostics, there is no need for special magic 
to generate the failure diagnostic, which seems to be the situation with 
:OK<>.
- Since it is the programmer that defines the False diagnostic, no extra 
autogenerated macros are needed.
- The minimum that is needed for a test would be to specify a 'true' 
diagnostic and the $! error variable, eg.,

TEST { 2 == 2 ?? say 'constants are constants' !! say $! };

d) A TEST block is specified to react to exceptions / failures in a 
different manner than in normal blocks. Uncaught exceptions are 
discarded at the end of a block. Thus compiler / module / software 
failures do not stop the software from continuing, unless specifically 
required by the programmer to do so within the block.


e) Other functions that are useful in test suites, such as plan, could 
be defined later as wrappers around "?? !!"


Regards,
Richard



Re: RFD: Built-in testing

2009-01-22 Thread Ovid
- Original Message 

> From: jerry gay 

> since the :ok adverb is modifying the operator, perl knows what kind
> of comparison is being attempted, and can automatically give smart
> diagnostics. this point was taken into consideration when the
> adverbial test syntax was originally designed. some examples of perl 6
> tests using adverbial notation:
> 
>   plan *;
>   3 === "3" :ok('int constant is equivalent to string constant integer');
>   3 !~~ "3" :ok('int constant smartmatch to string constant integer');
>   my $x = 284;
>   +$x == 284 :ok('$x is 284');
>   ?$x :ok('$x is True');
> 
> there will no longer be ok() and is() functions, so although is() is
> still a floor wax and a dessert topping, it has nothing to do with
> testing. the comparisons are now explicit, so the intent of the test
> isn't hidden behind a friendly-looking but difficult to debug function
> like is().

Reading through that log more carefully now.  Sorry I didn't do that earlier.

One concern is where Larry asks:

I wonder how often we'd have people making the error
of trying to interpolate into :ok

 
I'd be one of them.  The following is a very common idiom:

for my $method (@methods) {
can_ok $object, $method;
lives_ok { $object->$method } "... and calling '$method' isn't fatal";
}

Interpolation in the test description is very important on iterative tests or 
to distinguish similar tests (sometimes it would be nice to go so far as to ban 
identical test descriptions).

Cheers,
Ovid



Re: RFD: Built-in testing

2009-01-21 Thread jerry gay
On Wed, Jan 21, 2009 at 13:44, Ovid
 wrote:
> - Original Message 
>
>> From: Moritz Lenz 
>
>> * the word 'is' is overloaded in Perl 6
>>* if we export subs is() and ok(), we clutter the
>>  namespace with subs with short names
>>* is() is rather imprecise; it doesn't say *how*
>>  things are compared.
> 
>> So Larry and Patrick developed the idea of creating an
>> adverb on the test operator instead:
>>
>> $x == 1e5   :ok('the :ok makes this is a test');
>
> This may all be irrelevant, but I'm tossing it out here in case anyone thinks 
> of how it might impact things.
>
> I'm not entirely certain how I feel about this yet, but I love the core concept 
> of making testing a first class feature (well, duh ... of course I would say 
> that :)
>
> I'd like for this to be thought through really carefully lest we create an 
> interesting idea which is hampered by its implementation.  Specifically, I'm 
> concerned about diagnostics.  What we'd ultimately love to have in TAP is 
> some way of improving diagnostics (pseudo-TAP).
>
>  is 3, 3, 'constants are constants';
>  # ok 1 - constants are constants
>  # have: 3
>  # want: 3
>
> Now we have a curious situation:
>
>  multi sub foo(Str $bar);
>  multi sub foo(Int $bar);
>
> If we're testing what we should pass to &foo:
>
>  is 3, "3", 'constants are constants';
>  # ok 1 - constants are constants
>  # have: 3
>  # want: "3"
>
> Integration tests will still do OK, but unit tests may have issues and this 
> could be an expectation violation.  What does it mean that the string 3 eq 
> the integer 3?
>
> Worse:
>
>  my $bar = 284;
>  ok $bar, '$bar should be true';
>  # ok 1 - $bar should be true
>  # have: 284
>  # want: True
>
> That can also look a bit strange, particularly if someone is coming from a 
> different language background.
>
> How would this new system handle diagnostic information?  One thing which 
> might mitigate this is something we've wanted in newer versions of TAP:
>
>  my $bar = 284;
>  ok $bar, '$bar should be true';
>  # ok 1 - $bar should be true
>  # test: ok $bar, '$bar should be true';
>  # have: 284
>  # want: True
>
> By letting programmers see the exact line of code for the test, the type 
> information *might* not be as important.  I'm unsure.
>
> One possibility is to look at the &Test::More::cmp_ok function:
>
>  $ perl -MTest::Most=no_plan -e 'cmp_ok 3, "==","2"'
>  not ok 1
>  #   Failed test at -e line 1.
>  #  got: 3
>  # expected: 2
>  1..1
>
> If you change "2" to "3", the test still passes, but we could force it to not 
> pass unless eq is passed in as the second argument.  Then we could have the 
> following diagnostics:
>
>  perl6 $ perl -MTest::Most=no_plan -e 'cmp_ok 3, "eq","3"'
>  not ok 1
>  # have: 3
>  # test: eq
>  # want: "3"
>  1..1
>
> And then it's crystal clear why it failed.
>
since the :ok adverb is modifying the operator, perl knows what kind
of comparison is being attempted, and can automatically give smart
diagnostics. this point was taken into consideration when the
adverbial test syntax was originally designed. some examples of perl 6
tests using adverbial notation:

  plan *;
  3 === "3" :ok('int constant is equivalent to string constant integer');
  3 !~~ "3" :ok('int constant smartmatch to string constant integer');
  my $x = 284;
  +$x == 284 :ok('$x is 284');
  ?$x :ok('$x is True');

there will no longer be ok() and is() functions, so although is() is
still a floor wax and a dessert topping, it has nothing to do with
testing. the comparisons are now explicit, so the intent of the test
isn't hidden behind a friendly-looking but difficult to debug function
like is().
~jerry


Re: RFD: Built-in testing

2009-01-21 Thread Ovid
- Original Message 

> From: Moritz Lenz 

> * the word 'is' is overloaded in Perl 6
>* if we export subs is() and ok(), we clutter the
>  namespace with subs with short names 
>* is() is rather imprecise; it doesn't say *how*
>  things are compared.

> So Larry and Patrick developed the idea of creating an
> adverb on the test operator instead:
> 
> $x == 1e5   :ok('the :ok makes this is a test');

This may all be irrelevant, but I'm tossing it out here in case anyone thinks 
of how it might impact things.

I'm not entirely certain how I feel about this yet, but I love the core concept 
of making testing a first class feature (well, duh ... of course I would say 
that :)

I'd like for this to be thought through really carefully lest we create an 
interesting idea which is hampered by its implementation.  Specifically, I'm 
concerned about diagnostics.  What we'd ultimately love to have in TAP is some 
way of improving diagnostics (pseudo-TAP).

  is 3, 3, 'constants are constants';
  # ok 1 - constants are constants
  # have: 3
  # want: 3

Now we have a curious situation:

  multi sub foo(Str $bar);
  multi sub foo(Int $bar);

If we're testing what we should pass to &foo:

  is 3, "3", 'constants are constants';
  # ok 1 - constants are constants
  # have: 3
  # want: "3"

Integration tests will still do OK, but unit tests may have issues and this 
could be an expectation violation.  What does it mean that the string 3 eq the 
integer 3?

Worse:

  my $bar = 284;
  ok $bar, '$bar should be true';
  # ok 1 - $bar should be true
  # have: 284
  # want: True

That can also look a bit strange, particularly if someone is coming from a 
different language background.

How would this new system handle diagnostic information?  One thing which might 
mitigate this is something we've wanted in newer versions of TAP:

  my $bar = 284;
  ok $bar, '$bar should be true';
  # ok 1 - $bar should be true
  # test: ok $bar, '$bar should be true';
  # have: 284
  # want: True

By letting programmers see the exact line of code for the test, the type 
information *might* not be as important.  I'm unsure.

One possibility is to look at the &Test::More::cmp_ok function:

  $ perl -MTest::Most=no_plan -e 'cmp_ok 3, "==","2"'
  not ok 1
  #   Failed test at -e line 1.
  #  got: 3
  # expected: 2
  1..1

If you change "2" to "3", the test still passes, but we could force it to not 
pass unless eq is passed in as the second argument.  Then we could have the 
following diagnostics:
 
  perl6 $ perl -MTest::Most=no_plan -e 'cmp_ok 3, "eq","3"'
  not ok 1
  # have: 3
  # test: eq
  # want: "3"
  1..1

And then it's crystal clear why it failed.
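The have/test/want diagnostics can be sketched as a small TAP emitter. This Python version is illustrative only: cmp_ok, show and OPS are hypothetical names. The type-strict "==" encodes the policy floated above (3 == "3" fails, and only "eq" lets them agree). That policy is an assumption rather than settled behaviour.

```python
# Assumed comparison policy: "==" refuses to compare across str and number,
# so only "eq" (which compares string forms, like Perl's eq) lets 3 and "3" agree.
OPS = {
    "==": lambda a, b: not isinstance(a, str) and not isinstance(b, str) and a == b,
    "eq": lambda a, b: str(a) == str(b),
}


def show(value) -> str:
    """Quote strings so that 3 and "3" are visibly different in diagnostics."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def cmp_ok(n: int, have, op: str, want, name: str = "") -> list[str]:
    """Return the TAP lines for one comparison, with have/test/want on failure."""
    passed = OPS[op](have, want)
    line = f"{'ok' if passed else 'not ok'} {n}" + (f" - {name}" if name else "")
    if passed:
        return [line]
    return [line, f"# have: {show(have)}", f"# test: {op}", f"# want: {show(want)}"]


for line in cmp_ok(1, 3, "==", "3"):
    print(line)
# not ok 1
# # have: 3
# # test: ==
# # want: "3"
```

Because the operator is echoed as "test:" and strings are quoted, the reader sees at a glance that a number was compared numerically against a string.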

Cheers,
Ovid



Re: RFD: Built-in testing

2009-01-21 Thread Geoffrey Broadwell
On Wed, 2009-01-21 at 14:23 +, Peter Scott wrote:
> On Wed, 21 Jan 2009 13:35:50 +0100, Carl Mäsak wrote:
> > I'm trying to explain to myself why I don't like this idea at all. I'm
> > only partially successful. Other people seem to have no problem with it,
> > so I might just be wrong, or part of a very small, ignorable minority.
> > :) 
> 
> I find myself echoing you.  I don't have the language design skills others 
> are displaying here.  I can only evaluate this from an educator's point of 
> view and say that the P5 syntax of
> 
> is $x, 42, 'Got The Answer';
> 
> is just about the conceivable pinnacle of elegance for at least that form 
> of question.  (Compare, e.g., the logorrhoea of Java tests.)  I do not see 
> how I could tell a student with a straight face that the P6 proposal is an 
> improvement, at which point the conversation would devolve into a 
> defensive argument I do not want to have.
> 
> I get that 'is' is already taken and we do not want the grammar to engage 
> in Clintonesque parsing when it encounters the token.  Okay.  But how do I 
> justify the new syntax to a student?  What are they getting that makes up 
> for what looks like a fall in readability?

I don't quite understand the problem with using the same syntax as in
Perl 5, just uppercasing the verbs so they won't conflict with everyday
syntactic features:

OK($bool,  'Widget claimed success');
IS($x, 42, 'Widget produced the right answer');

(This is ignoring issues of placement of parens or curlies to make the
Perl 6 syntax attractive and consistent with other constructs -- I'm
just talking about using verb rather than adverb syntax, with our
already properly Huffmanized verb names intact.)

I do like the idea of having TEST {} blocks that go inactive when not in
testing mode (however that is defined).  But other than that, I don't
understand the value of the other syntactic changes suggested, the
adverb syntax in particular.  Maybe I'm missing something obvious 


-'f




Re: RFD: Built-in testing

2009-01-21 Thread Peter Scott
On Wed, 21 Jan 2009 13:35:50 +0100, Carl Mäsak wrote:

> Moritz (>):
>> So Larry and Patrick developed the idea of creating an adverb on the
>> test operator instead:
>>
>>$x == 1e5   :ok('the :ok makes this is a test');
> 
> I'm trying to explain to myself why I don't like this idea at all. I'm
> only partially successful. Other people seem to have no problem with it,
> so I might just be wrong, or part of a very small, ignorable minority.
> :) 

I find myself echoing you.  I don't have the language design skills others 
are displaying here.  I can only evaluate this from an educator's point of 
view and say that the P5 syntax of

is $x, 42, 'Got The Answer';

is just about the conceivable pinnacle of elegance for at least that form 
of question.  (Compare, e.g., the logorrhoea of Java tests.)  I do not see 
how I could tell a student with a straight face that the P6 proposal is an 
improvement, at which point the conversation would devolve into a 
defensive argument I do not want to have.

I get that 'is' is already taken and we do not want the grammar to engage 
in Clintonesque parsing when it encounters the token.  Okay.  But how do I 
justify the new syntax to a student?  What are they getting that makes up 
for what looks like a fall in readability?

-- 
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/


Re: RFD: Built-in testing

2009-01-21 Thread Carl Mäsak
Moritz (>):
> So Larry and Patrick developed the idea of creating an
> adverb on the test operator instead:
>
>$x == 1e5   :ok('the :ok makes this is a test');

I'm trying to explain to myself why I don't like this idea at all. I'm
only partially successful. Other people seem to have no problem with
it, so I might just be wrong, or part of a very small, ignorable
minority. :) Nevertheless, here is my main kvetch about the new syntax
proposal:

* Adverbs traditionally modify the behaviour of some construct, giving
it additional information or suggesting an alternative algorithm.
Well-known examples are :by on ranges, the adverbs on regexes, or the
:repl option on .pick(). All of these preserve the main objective of
the construct, only modifying it somewhat.

* The proposed :ok syntax changes the semantics of the comparison (or
whatever) from returning a value, to committing test-related actions,
probably resulting in output of some kind. The original comparison is
still syntactically prominent in the statement, but it's the testing
bit, whose syntax is pushed to the irrelevant far right, that does the
heavy lifting.

This can all be summarized in a feeling of mine that the suggested
testing :ok syntax makes a travesty of adverbs. For the above reasons,
I don't find it particularly elegant or intuitive. I do think that
it's possible to use adverbs to make a better testing framework, but
IMHO this is not the way.

// Carl


Re: RFD: Built-in testing

2009-01-21 Thread Richard Hainsworth



> (Daniel Ruoso also proposed to call the adverb :test
> instead of :ok, making it easier to read but a bit
> longer; my happiness doesn't depend on the exact name,
> but of course we can discuss it once we have settled
> on this scheme, if we do so).

  

My two-cents worth:

The adverb on a boolean changes the nature of the statement, so that if
the statement is true we get the diagnostic message in :OK<message>, but
if the statement is false we get a failure message from the compiler /
software.


Given that the diagnostic appears when the test succeeds, I, like
Fagyal, would prefer :OK<> to :TEST<>, because this is the way I use OK:
I expect a positive answer.


However, the nature of a test is that a program consisting of test
commands continues to run even if there is a failure. This is not a
problem if the boolean statements are 'standalones', meaning that the
subsequent flow of the program does not depend on the test, e.g.,

$x.value == 2 :OK;
$x.color eq 'red' :OK;
...

But if this is part of Perl 6, then it will be possible (I think) to write

if $x.value == 2 :OK {dosomething()} else {dosomeotherthing()};

What sort of behaviour would be expected? I see several alternatives:
a) Suppose it is decided that :OK could be part of ordinary software.
Then the program would fork depending on the boolean value, so :OK in
general generates a trace commentary that is explicitly defined for the
TRUE case, but implicitly defined by the compiler for the FALSE case.
b) Suppose instead it is considered best for :OK to operate only in test
contexts. Then a boolean test with :OK should be illegal unless it is a
standalone statement, i.e., the test should not be in a control
construct.
In this case, I would think :TEST should be the variant chosen, because
it focusses attention on the test behaviour.
c) Suppose, as Damian suggested (I think), that tests should be included
in normal software, but ring-fenced into a separate TEST {} block. That
way, TEST blocks would not normally run in production software.
In this case, the extra semantic hint of ok expecting a positive
response would be useful, so :OK would be the preferable variant.


Richard


Re: RFD: Built-in testing

2009-01-20 Thread cdumont

Hi

I assume that BDD (Behavior Driven Development) and the vocabulary it
implies is not a good choice at this stage?

:describe("");
$x.should be(1e5)   :it("");

and that a module based on the core testing facilities can be built if
someone feels like it.
Well, the vocabulary it implies is really nice anyway, if it can be of
any inspiration ^^


http://www.oreillynet.com/pub/a/ruby/2007/08/09/behavior-driven-development-using-ruby-part-1.html






Re: RFD: Built-in testing

2009-01-20 Thread Damian Conway
Larry observed:

> My feeling on this is that the compiler should simply hardwire this
> particular adverb so that all the tests can be autogenerated, and the
> multi system never needs to see those versions.

I strongly agree.


> We are merely hijacking the adverb syntax so that it is clear which
> operator is being modified. There is no need for the late binding of
> multi. It's just a "reserved adverb" if you will. Which probably means
> it should be something unlikely to collide with user-defined adverbs.
> Maybe something in all caps. For what it's worth, :OK<> can be typed
> with one hand while the other holds down the shift key. :)

Typical right-hander fascism!

We do indeed want to encourage testing by making it easy to write tests,
but naming it :TEST<> makes it far easier to *read* tests, which seems
to me a better long-term optimization.

We would probably also want a mechanism for switching tests on or off in
a given compilation unit, or globally, so they can be placed in (and left in!)
production code. Perhaps we could use the same mechanism for PRE{...}
and POST{...} blocks as well? Which also suggests that a general
TEST {...} block (which only runs if testing is enabled) might be valuable?
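A TEST {...} block of that kind can be roughly approximated today with an ordinary sub that runs its block only when a switch is on. This is a minimal sketch in present-day Raku syntax. The sub name TEST and the dynamic variable $*TESTING are assumptions made for illustration; neither is specified anywhere:

```raku
# Hypothetical stand-in for a built-in TEST {...} block.
# $*TESTING is an assumed dynamic switch; if nobody declares it,
# the lookup yields an undefined Failure and // turns it into False.
sub TEST(&body) {
    body() if $*TESTING // False;
}

# "Production" run: no $*TESTING in scope, so this block is skipped.
TEST { say 'not printed' };

{
    # Switch testing on for this dynamic scope only.
    my $*TESTING = True;
    TEST { say 'in-line test ran' };
}
```

Using a dynamic variable means the switch could be set per compilation unit or for a whole run by whoever calls the code, which is close to the "locally or globally" control suggested above.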

Damian


Re: RFD: Built-in testing

2009-01-20 Thread Fagyal Csongor

Hi,

I pretty much like this idea. Very perl6ish :)

- I don't think it's important whether it is called :ok, :OK or :test or 
:wellhowdidthatworkout. I assume people who will be testing their 
modules/code/etc. will be using more advanced modules for testing 
anyway. This is for testing the implementation against the specs, and 
they *will* know how it works :)


- I don't think we should be concerned about whether :ok is difficult to
implement. Implementations in an early stage are totally broken anyway :),
they won't even *parse* the tests well - they will have their own,
limited tests. Later they can choose to do some magic to make :ok work...
and finally implement it.


- I like "ok" better than "test", as the former kind of implies a 
boolean "was that true?" to me. YMMV, though.


- Fagzal




Re: RFD: Built-in testing

2009-01-20 Thread jason switzer
On Tue, Jan 20, 2009 at 1:08 PM, Moritz Lenz  wrote:

> So Larry and Patrick developed the idea of creating an
> adverb on the test operator instead:
>
> $x == 1e5   :ok('the :ok makes this a test');
>
> This is an adverb on the infix:<==> operator, and might
> desugar to something like this:
>
> multi sub infix:<==>($left, $right, :$ok) {
>     $*TEST_BACKEND.proclaim($left == $right, $ok)
>         or $*TEST_BACKEND.diag(
>             "Got: «$left.perl()»; Expected: «$right.perl()»");
> }
>
> (Daniel Ruoso also proposed to call the adverb :test
> instead of :ok, making it easier to read but a bit
> longer; my happiness doesn't depend on the exact name,
> but of course we can discuss it once we have settled
> on this scheme, if we do so).


I like this idea and with it built into the language itself, there will be
much less of an excuse to skip testing. I like the adverb form, which seems
more perl6 than C. Naming it something like :test is a better idea than :ok
as that seems a bit more direct.

There isn't much in the spec concerning namespaces, other than the default *
namespace. Is there any reason why the testing framework can't go in its
own namespace?


>  * We nearly double the number of built-in operators
>    by adding an :ok multi
>  * We force implementors to handle operator adverbs
>    and named arguments very early in their progress
>    (don't know how easy or hard that is)
>  * Testing of operators becomes somewhat clumsy. If you
>    want to test infix:<==>, you won't write
>    '2 == 2 :ok("== works")', because you test a
>    different multi there. Instead you'd have to write
>    something like '?(2 == 2) :ok("== works")', where
>    :ok is an adverb on prefix:<?>.
>

These are mostly disadvantages to implementors, not users of the testing
framework. I'd rather the implementations struggle to implement a built-in
testing functionality than users of the language struggle to use the
built-in testing.


> I'll send another mail on the subject of pluggable
> testing backends in order to allow different emitters
> (TAP output, storage into databases, whatever)


This is a requirement for me. Having only TAP emitters may not integrate
well. It would be nice if the spec, if added, would allow flexibility in
this realm. I would actually like to see a flexible system that allowed me
to define a new emitter, say for the cases where you want to integrate perl6
testing into an existing testing framework (think automated builds and
tests).

-Jason "s1n" Switzer


Re: RFD: Built-in testing

2009-01-20 Thread Larry Wall
On Tue, Jan 20, 2009 at 08:08:57PM +0100, Moritz Lenz wrote:
:  * We nearly double the number of built-in operators
:by adding an :ok multi

My feeling on this is that the compiler should simply hardwire
this particular adverb so that all the tests can be autogenerated,
and the multi system never needs to see those versions.  We are
merely hijacking the adverb syntax so that it is clear which operator
is being modified.  There is no need for the late binding of multi.
It's just a "reserved adverb" if you will.  Which probably means it
should be something unlikely to collide with user-defined adverbs.
Maybe something in all caps.  For what it's worth, :OK<> can be typed
with one hand while the other holds down the shift key. :)

Larry


RFD: Built-in testing

2009-01-20 Thread Moritz Lenz
A few months ago Larry proposed to add some testing
facilities to the language itself, because we want to
culturally encourage testing, and because the test
suite defines the language, so we need to specify the
behaviour of our testing facilities anyway.

We also discussed some possible changes to the current
testing syntax, please read the IRC discussion starting
from here:
http://irclog.perlgeek.de/perl6/2008-10-09#i_613827

There are some problems with the current approach,
especially if we make it built-in:

 * the word 'is' is overloaded in Perl 6: it is used
   for traits (class A is rw { ... }),
   implementation types (my @a is TiedArray) and
   inheritance (class A { is B; ... }). The last one
   in particular can appear in the same position
   as an is() test, and means something completely
   different

 * if we export subs is() and ok(), we clutter the
   namespace with subs with short names - not very nice

 * is() is rather imprecise; it doesn't say *how*
   things are compared. Currently two options seem to
   be open for the comparison semantics: either
   string based, or deep structural equality (with
   infix:<eqv>). Both are very problematic: the
   stringification of some structures can be
   implementation specific (for example for Ranges),
   and it often loses a lot of information. Also, Pugs
   used to stringify hashes in insertion order, but
   that's not required by the spec, so many tests
   relied on that behaviour, i.e. they were wrong.
   Comparison by infix:<eqv> is dangerous, because it
   is very strict, and things that were identical in
   the early stages of the compiler become distinct
   later on. Consider 'sub f { return 1, 2, 3}; say
   f() eqv (1, 2, 3)'. Pugs says this is true,
   because it doesn't implement return() as returning
   a capture (or in turn thinks that lists are eqv to
   captures), and testers would rely on this.
   Future versions of a compiler would contradict the
   previous results, and thus put an additional burden
   on the test suite maintainers (i.e. mostly me). There
   are other subtle distinctions (like between List
   and Array) that make it hard to write tests
   with infix:<eqv> correctly.
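To make the List/Array point concrete, here is a small sketch. It uses present-day Raku semantics, not the 2009 Pugs/Rakudo behaviour described above, and shows a string comparison hiding a difference that eqv reports:

```raku
my $list  = (1, 2, 3);    # a List
my $array = [1, 2, 3];    # an Array with the same elements

# String-based comparison, as an is() test might do it:
# both stringify to "1 2 3", so the type difference is invisible.
say ~$list eq ~$array;    # True

# Deep structural comparison also checks the container type,
# so a test written with eqv fails here.
say $list eqv $array;     # False
```

Neither answer is "wrong"; the problem is that is() leaves the test author unaware of which one was asked for.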


So we want to get rid of the is() test function. The
current workaround is to use ok() and an explicit
comparison operator like this:

ok $x == 1e5, 'with explicit numeric comparison';

However this just tells us if a test fails, not how it
fails - it would be nice to have something like
Test::More's output "Expected 1e5, got 1e4".  The only
way that ok() could produce such diagnostics would be
to declare it a macro that peeks into the AST of its
first argument and tries to find the two values. But
that's very advanced magic, and might not even be
possible in a DWIMmy way in the general case.

So Larry and Patrick developed the idea of creating an
adverb on the test operator instead:

$x == 1e5   :ok('the :ok makes this a test');

This is an adverb on the infix:<==> operator, and might
desugar to something like this:

multi sub infix:<==>($left, $right, :$ok) {
    $*TEST_BACKEND.proclaim($left == $right, $ok)
        or $*TEST_BACKEND.diag(
            "Got: «$left.perl()»; Expected: «$right.perl()»");
}

(Daniel Ruoso also proposed to call the adverb :test
instead of :ok, making it easier to read but a bit
longer; my happiness doesn't depend on the exact name,
but of course we can discuss it once we have settled
on this scheme, if we do so).

So every operator that returns a boolean would get a
multi with the named argument :ok, which could be
autogenerated for most operators, but hand-written for
others that need more verbose diagnostics (for example
the smart match operator could tell you which of its
hundred possible comparisons it used). That's not just
limited to infix operators: ok($x, '$x is true') could
be written as ?$x :ok('$x is true'), where the :ok is
an adverb on prefix:<?>.
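The desugaring above leaves $*TEST_BACKEND abstract. The following is a rough, hypothetical illustration of a pluggable backend, written in present-day Raku. The role TestBackend, the class TAPBackend and the sub num-eq-ok are invented names, and only proclaim and diag come from the sketch above. Because :ok is only a proposal, the sketch calls the comparison explicitly instead of using adverb syntax:

```raku
# Hypothetical interface for a pluggable test backend.
role TestBackend {
    method proclaim(Bool $ok, Str $desc --> Bool) { ... }
    method diag(Str $msg) { ... }
}

# One possible backend: collects TAP-style lines instead of printing them,
# so a harness (or a database writer) could consume them later.
class TAPBackend does TestBackend {
    has Int $!count = 0;
    has Str @.output;

    method proclaim(Bool $ok, Str $desc --> Bool) {
        $!count++;
        @!output.push: "{$ok ?? 'ok' !! 'not ok'} {$!count} - $desc";
        $ok
    }
    method diag(Str $msg) { @!output.push: "# $msg" }
}

# Roughly what `$got == $expected :ok($desc)` might turn into.
sub num-eq-ok(TestBackend $backend, $got, $expected, Str $desc --> Bool) {
    my $ok = $backend.proclaim($got == $expected, $desc);
    # Only a failing test gets the Got/Expected diagnostic.
    $backend.diag("Got: «{$got.raku}»; Expected: «{$expected.raku}»") unless $ok;
    $ok
}

my $tap = TAPBackend.new;
num-eq-ok($tap, 1e5, 1e5, 'numbers match');
num-eq-ok($tap, 1e4, 1e5, 'with explicit numeric comparison');
.say for $tap.output;
```

This prints an "ok 1" line, a "not ok 2" line and a "# Got: ...; Expected: ..." diagnostic. Another class that does TestBackend, for example one that writes results into a database, could be substituted without touching num-eq-ok.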

This approach gives us
 * good and easy diagnostics
 * exactness by forcing the explicit choice of
   comparison semantics
 * nice integration into the Perl 6 syntax
 * no cluttering of the namespace with short subs

However, nothing in life is free; we pay for it with a
few disadvantages:
 * We nearly double the number of built-in operators
   by adding an :ok multi
 * We force implementors to handle operator adverbs
   and named arguments very early in their progress
   (don't know how easy or hard that is)
 * Testing of operators becomes somewhat clumsy. If you
   want to test infix:<==>, you won't write
   '2 == 2 :ok("== works")', because you test a
   different multi there. Instead you'd have to write
   something like '?(2 == 2) :ok("== works")', where
   :ok is an adverb on prefix:<?>.

So I'd like to hear your opinions: do you think
adverb-based testing is a good idea? If you don't like
it, do you see any other good way to tackle the
problems I mentioned above?

I'll send another mail on the su