On 6/16/07, Chris Dolan <[EMAIL PROTECTED]> wrote:
On Jun 17, 2007, at 12:56 AM, Joshua ben Jore wrote:
> On 6/16/07, Chris Dolan <[EMAIL PROTECTED]> wrote:
>> Josh,
>>
>> Josh, can you explain to us in a little more depth what this means?
>> Are you showing that certain input values follow the same path
>> through the code?
>
> Yes.
>
>> It looks like the full path through the code is
>> the key to your hash of runs. If I've understood that part
>> correctly, then I'm still having trouble understanding where you go
>> from there. How is that a measure of code coverage? Are you
>> planning to then compare those paths to the full tree of opcodes?
>
> Ah, sorry. I have a problem at work where there's a pile of nasty code
> which deals in really gigantic piles of input per day. Much of the
> data is similar and I assume triggers identical paths through the
> code. If I wanted to get "full" coverage I could use a suitably
> gigantic pile of input, since that would probably be sufficient to
> cover all or many of the potential code paths.
>
> It is impractical to write tests using those monstrously large piles
> of input. I want to filter out input which triggers redundant code
> paths and retain only those data that cause a unique behaviour.
>
> My example was an evenness detector. In one possible scenario I figure
> "several thousand" numbers are sufficient to cover all the possible
> code paths because "I don't understand this function." Maybe I really
> have hundreds of millions of numbers being tested, and I want to
> know whether I get any additional behaviours by trying 4 if I've
> already tried 2.
>
> What I learn from using this is that I can get equivalent coverage
> from both inputs 2 and 4. I can now opt to write my tests using just
> the number 2 because I don't learn anything new by using any
> additional numbers like 4, 6, or 8.
>
> So I can test for which inputs will add to my code coverage and
> which will not.
>
> Josh
Got it. Thanks very much for the explanation. Does anyone know if
there's been research on a topic like this? That is, what size/
complexity can your code grow to be before small changes in inputs
create wildly divergent code paths? It sounds (distantly) related to
the code tracing that software like Coverity is doing, but on a macro
scale.
I wonder if path-finding algorithms (computer games, highway route
finders, etc) have any relevance in this space? Hmm, on further
thought that's probably wrong since (to put it mildly) code has much
stronger opinions about which path to take than people or bots.
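To make the earlier evenness example concrete, here is a rough sketch of
the filtering idea. This is Python rather than the Perl opcode machinery
I'm actually working with, and `sys.settrace` line events stand in for
real opcode paths, but the shape is the same: hash each input's execution
path and keep one representative input per unique path.

```python
import sys

def is_even(n):
    # The "evenness detector" under test: code we pretend
    # not to understand, with one branch per parity.
    if n % 2 == 0:
        return True
    return False

def path_signature(func, *args):
    """Call func and return the sequence of executed lines
    as a hashable tuple -- a crude code-path fingerprint."""
    path = []
    def tracer(frame, event, arg):
        if event == "line":
            path.append((frame.f_code.co_name, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return tuple(path)

def minimal_inputs(func, candidates):
    """Keep one representative input per unique execution path."""
    seen = {}
    for x in candidates:
        seen.setdefault(path_signature(func, x), x)  # first input wins
    return sorted(seen.values())

# 4, 6, and 8 follow the same path as 2, so only 2 survives;
# 3 takes the odd branch and is kept as well.
print(minimal_inputs(is_even, [2, 3, 4, 6, 7, 8]))  # prints [2, 3]
```

In other words, out of millions of candidate inputs you retain only the
handful whose paths differ, and those are the ones worth writing tests
around.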
I have other relevant code: Devel::Spy. It takes a value and traces how
it is used everywhere. I've used it to find out which values were being
used and how; usually this was to find booleans and string comparisons.
It worked great. I wrote a know-nothing set of regression tests without
ever having to understand the code I was testing against. Mostly I was
just facing a different but related gnarly function.
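For anyone curious, the flavor of that technique can be sketched in
Python (a hypothetical stand-in only; the real Devel::Spy works through
Perl's tie and overload mechanisms): wrap the value in a proxy that logs
every operation the surrounding code performs on it.

```python
class Spy:
    """Wrap a value and log every way the code under
    observation uses it (boolean tests, == comparisons,
    stringification)."""
    def __init__(self, value, log, name="input"):
        self._value = value
        self._log = log
        self._name = name

    def __bool__(self):
        self._log.append(f"{self._name} used as boolean")
        return bool(self._value)

    def __eq__(self, other):
        self._log.append(f"{self._name} compared == {other!r}")
        return self._value == other

    def __str__(self):
        self._log.append(f"{self._name} stringified")
        return str(self._value)

def gnarly(x):
    # Opaque code under observation -- we learn what it does
    # to its argument without reading it.
    if x:
        if x == "special":
            return "a"
        return "b"
    return "c"

log = []
gnarly(Spy("special", log))
print(log)  # prints ['input used as boolean', "input compared == 'special'"]
```

The log tells you exactly which tests and comparisons the gnarly
function applies, which is enough to write regression tests against it
blind.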
Josh