On Tue, Jul 27, 2010 at 12:56 PM, Emile Anclin <[email protected]> wrote:
> * Remark about Inference: I once tried to cache the *direct* inferences,
> i.e. calling infer() without the context. There, I discovered that:
> - only 5% of the nodes are inferred more than once
> - we basically only infer Name and Getattr nodes directly
> - however, there should be a lot of double-inferring with regard
> to the Context / Callcontext management; I think there is some
> low-hanging fruit there (as was discovered during our
> last ASTNG refactoring) for a simple inference cache.

Many thanks, Emile, for your comments.

I think the essence of the situation, as far as global analysis goes, is that type analysis is "hard" in the sense that it depends on assignments that can happen anywhere in the program. But if none of those assignments change, then the type of an object (or rather, of a name) doesn't change.

> * Concerning your *Inc-lint*: What is quite obvious is that Pylint loses
> a lot of time parsing, almost always, basically all builtins and all
> of the Python standard library.

Yes, the symbol tables (and deductions) for those files should be cached.

> So the question is, how to store all the data about modified and
> unmodified files, including a relevant representation of the astng tree?

The ast (or astng) trees would be cached merely to discover the special case that the old and new versions of a file differ only by comments or whitespace. This is a cute check, but it's optional. The key to the design is that it is *not* based on diffs to ast trees.

Indeed, local analysis will be slightly slower than the corresponding parts of pylint. It does the same general ast-related checks that pylint does, but in addition it computes (from scratch, not incrementally) both the module symbol table and all the deductions that will have to be satisfied again.

> And how can reading all this information be done faster than just astng?

It's not faster. It's slower. But unpickling is pretty fast, and it's not much of a concern.
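To make this concrete, here is a minimal sketch of the caching and diffing idea, assuming a deliberately simplified symbol-table representation (a dict mapping each top-level name to a set of inferred type names). The function names and the on-disk format are hypothetical, not new-lint's or astng's actual API; the point is only that pickling is cheap and that a dict diff isolates the names whose deductions need re-checking.

```python
import pickle

def save_symbols(path, symbols):
    """Cache a module's symbol table on disk between runs."""
    with open(path, "wb") as f:
        pickle.dump(symbols, f)

def load_symbols(path):
    """Reload a cached symbol table; unpickling is fast."""
    with open(path, "rb") as f:
        return pickle.load(f)

def diff_symbols(old, new):
    """Return the names whose entries differ between two versions.

    Only deductions that mention these names need re-verification;
    the rest of the global deduction list is untouched.
    """
    changed = set()
    for name in old.keys() | new.keys():
        if old.get(name) != new.get(name):
            changed.add(name)
    return changed

# Example: adding a method to a class changes exactly one entry.
old = {"Foo": {"class"}, "Foo.spam": {"int"}}
new = {"Foo": {"class"}, "Foo.spam": {"int"}, "Foo.eggs": {"str"}}
print(diff_symbols(old, new))  # only the added method shows up
```

In this toy form, adding Foo.eggs yields a changed set of exactly one name, which is the point made below: writing new methods leaves the rest of the (very large) set of global deductions alone.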
The speedup comes because the diff phase shows where the old and new versions of the *symbol tables* differ. This in turn allows us to update the *global* list of needed deductions. So yes, there is overhead, but it should be hugely effective overhead.

In particular, suppose I am writing new methods of a class. I typically will not change the class attributes, except to add the new methods. These methods might satisfy some global deductions that previously failed, but otherwise *none* of the (very large) set of global deductions will have to be re-verified. So inc-lint will analyze the changed files, but none of the inference machinery will be needed, because the symbol tables will *already* contain all the needed information. Or so I hope :-)

> At least if we could just store the information about unmodified modules,
> it would probably be a huge improvement, and I wonder if going below
> would result in another important improvement: the _ast parsing of a
> single file is quite fast.

The problem is that type inference is a completely global matter. Any assignment (including a function return) anywhere in the entire program can change the set of types that a name can take on. There is simply no way around this: global inference can be expensive. Thus, the design must be ambitious enough to handle all possible ripple effects.

I believe that the data-flow algorithm prototyped in new-lint is ambitious (and general) enough, but that's not proven. Clearly, though, a data-flow algorithm is the only kind of general algorithm that has a chance of working. It is fundamentally iterative, in the sense that deductions can trigger other deductions indefinitely. The trick is to set things up so that a deduction fires only once, when all needed data are available. The new-lint prototype appears to handle these kinds of details properly.

Edward
------------------------------------------------------------------------------
Edward K. Ream
email: [email protected]
Leo: http://webpages.charter.net/edreamleo/front.html
------------------------------------------------------------------------------
_______________________________________________
Python-Projects mailing list
[email protected]
http://lists.logilab.org/mailman/listinfo/python-projects
