Paolo Bonzini <[email protected]> writes:

> [People in Cc are a mix of Python people, tracing people, and people
>  who followed the recent AI discussions. - Paolo]
>
> This series adds type annotations to tracetool. While useful on its own, 
> it also served as an experiment in whether AI tools could be useful and
> appropriate for mechanical code transformations that may not involve
> copyrightable expression.
>
> In this version, the types were added mostly with the RightTyper tool
> (https://github.com/RightTyper/RightTyper), which uses profiling to detect
> the types of arguments and return types at run time.  However, because
> adding type annotations is such a narrow and verifiable task, I also developed
> a parallel version using an LLM, to provide some data on topics such as:
>
> - how much choice/creativity is there in writing type annotations?
>   Is it closer to writing functional code or to refactoring?

Based on my work with John Snow on typing the QAPI generator: there
is some choice.

Consider typing a function's argument.  Should we pick the type based
on what the function requires from its argument?  Or should it reflect
what the function's callers actually pass?

Say the function iterates over the argument.  So we make the argument
Iterable[...], right?  But what if all callers pass a list?  Making it
List[...] could be clearer then.  It's a choice.
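Concretely, with made-up names (two names only so both definitions can
coexist; read them as two annotations of the same function):

    from typing import Iterable, List

    # By what the function needs: it only iterates over names.
    def format_names(names: Iterable[str]) -> str:
        return ", ".join(names)

    # By what callers pass: suppose every call site passes a list.
    def format_names_concrete(names: List[str]) -> str:
        return ", ".join(names)

mypy accepts either version; nothing in the code forces one over the
other.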

I think the choice depends on context and taste.  At some library's
external interface, picking a more general type can make the function
more generally useful.  But for some internal helper, I'd pick the
actual type.
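For instance (hypothetical names again), a sketch of how I'd split it:

    from typing import Iterable, List

    # External interface: the general type serves more callers.
    def emit_events(events: Iterable[str]) -> None:
        for event in events:
            print(event)

    # Internal helper: every call site passes a list, so say so.
    def _dedup_events(events: List[str]) -> List[str]:
        return sorted(set(events))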

My point isn't that an LLM could not possibly do the right thing based
on context, and maybe even "taste" distilled from its training data.  My
point is that this isn't entirely mechanical with basically one correct
output.

Once we have such judgement calls, there's the question of how an LLM's
choice depends on its training data (first-order approximation today:
nobody knows), and whether and when that makes the LLM's output a
derived work of its training data (to be settled in court).

[...]

> Based on this experience, my answer to the copyrightability question is
> that, for this kind of narrow request, the output of AI can be treated as
> the output of an imperfect tool, rather than as creative content potentially
> tainted by the training material.

Maybe.

>                                    Of course this is one data point and
> is intended as an experiment rather than a policy recommendation.

Understood.  We need to develop a better understanding of capabilities,
potential benefits and risks, and such experiments can only help with
that.

