On Thu, Jul 23, 2009 at 1:50 PM, Greg Beaver<g...@chiaraquartet.net> wrote:
> Eddie Drapkin wrote:
>> Hey all,
>> we've got a repository here at work, with something like 55,000 files
>> in it. For the last few years, we've been naming $variables_like_this
>> and functions_the_same($way_too).  And now we've decided to switch to
>> camelCasing everything and I've been tasked with somehow determining
>> if it's possible to automate this process.  Usually, I'd just use the
>> IDE refactoring functionality, but doing it on a
>> per-method/per-function and a per-variable basis would take weeks, if
>> not longer, not to mention driving everyone insane.
>>
>> I've tried with regular expressions, but I can't make them smart
>> enough to distinguish between builtins and userland code.  I've looked
>> at the tokenizer and it seems to be the right way forward, but that's
>> also a huge project to get that to work.
>>
>> I was wondering if anyone had any experience doing this and could
>> either point me in the right direction or just flat-out tell me
>> how to do it.
>
> Hi Eddie,
>
> That's quite the task :).
>
> You're going to need to scan the source to generate a list of every
> variable and function name using the tokenizer.  Fortunately, this is
> easy - with the caveat that if you do this anywhere in your source:
>
> $a = $this->{$constructed . '_name'}();
>
> you will have to handle these manually.
>
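
A minimal sketch of how those dynamic accesses could be flagged for manual review before anything is renamed. It assumes the tokenizer extension is loaded; the function name findDynamicAccess is hypothetical and not part of Greg's script. It reports the line of every "->" that is followed by "{" or by a variable instead of a plain identifier.

```php
<?php
// Hypothetical helper: returns the line numbers of "->" tokens that are
// followed by "{" or a variable rather than a plain identifier.
function findDynamicAccess($source)
{
    $lines = array();
    $pendingLine = null; // line of the last "->" still awaiting its name
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] == T_WHITESPACE) {
                continue; // whitespace may sit between "->" and the name
            }
            if ($pendingLine !== null && $token[0] == T_VARIABLE) {
                $lines[] = $pendingLine; // form: $obj->$name
            }
            // remember the line only if this token is itself a "->"
            $pendingLine = ($token[0] == T_OBJECT_OPERATOR) ? $token[2] : null;
        } else {
            if ($pendingLine !== null && $token == '{') {
                $lines[] = $pendingLine; // form: $obj->{expression}
            }
            $pendingLine = null;
        }
    }
    return $lines;
}
```

Calling it on each file inside the directory-iterator loop and printing the path next to the returned line numbers would give a checklist of spots to fix by hand.
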
> Basically, run token_get_all() on the source and record every
> T_VARIABLE in an array.  Then scan for these two sequences, which mark
> function/method declarations and method calls or property accesses:
>
> 1) T_FUNCTION T_WHITESPACE* T_STRING
> 2) T_OBJECT_OPERATOR T_WHITESPACE* T_STRING
>
> <?php
> $replace = array();
> foreach (new RegexIterator(new RecursiveIteratorIterator(new
> RecursiveDirectoryIterator('/path/to/src')), '/\.php$/',
> RegexIterator::MATCH, RegexIterator::USE_KEY) as $path => $file) {
> $source = file_get_contents($path);
>
> $checkForID = false;
> $var = false;
> $last = '';
> foreach (token_get_all($source) as $token) {
>    if (!is_array($token)) continue;
>
>    if ($checkForID) {
>        if ($token[0] == T_WHITESPACE) {
>            $last .= $token[1];
>            continue;
>        }
>        if ($token[0] != T_STRING) {
>            $checkForID = false;
>            $last = '';
>            continue;
>        }
>        $token[1] = $last . $token[1];
>        $checkForID = false; // reset, or the next variable would be skipped
>        $last = '';
>    } elseif ($token[0] == T_FUNCTION || $token[0] == T_OBJECT_OPERATOR) {
>        $checkForID = true;
>        $last = $token[1];
>        continue;
>    } elseif ($token[0] == T_STRING) {
>        if (function_exists($token[1])) {
>            continue; // skip internal functions
>        }
>        if (strtolower($token[1]) != $token[1]) {
>            continue; // assuming you UPPER-CASE constants, this skips them
>        }
>    } elseif ($token[0] != T_VARIABLE) {
>        continue;
>    }
>
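>    // we get to here if we've found one to process
>    if (strpos($token[1], '$_') === 0) {
>        continue; // skip superglobals such as $_GET and $_SERVER
>    }
>    $new = explode('_', $token[1]);
>    $new = array_map('ucfirst', $new);
>    $new[0] = lcfirst($new[0]); // for your camelCasing
>
>    $new = implode('', $new);
>    if ($new !== $token[1]) {
>        // key by the old name so each rename is recorded only once
>        $replace[$token[1]] = array($token[1], $new);
>    }
> } // end of token loop
> } // end of file loop
> ?>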
>
> Next, load each file (you should use RecursiveIteratorIterator with a
> RecursiveDirectoryIterator and some kind of filter, probably
> RegexIterator, to grab the PHP source files), and then iterate over the
> list of variable names somewhat like this:
>
> <?php
> foreach (new RegexIterator(new RecursiveIteratorIterator(new
> RecursiveDirectoryIterator('/path/to/src')), '/\.php$/',
> RegexIterator::MATCH, RegexIterator::USE_KEY) as $path => $file) {
>    $source = file_get_contents($path);
>    foreach ($replace as $items) {
>
>        $source = str_replace($items[0], $items[1], $source);
>
>        if ($items[0][0] == '$') {
>            // also rename property accesses such as $this->some_name
>            $source = preg_replace('/->(\s*)' . substr($items[0], 1) . '/',
>                                   '->\\1' . substr($items[1], 1),
>                                   $source);
>        }
>    }
>    file_put_contents($path, $source);
> }
> ?>
>
> Voila, code refactored.
>
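
One caveat with the str_replace() pass above is that it also rewrites matches inside string literals, comments, and longer names that merely start with the old name. A minimal sketch of an alternative, assuming the old-to-new names are available as a plain associative array: rebuild each file from its tokens and rename only whole T_VARIABLE and T_STRING tokens. The function name renameTokens and the $map layout are hypothetical, not taken from the script above.

```php
<?php
// Hypothetical helper: $map is array('old_name' => 'newName', ...).
// Variables are keyed with their leading "$"; functions, methods and
// properties are keyed by bare name, since they arrive as T_STRING.
function renameTokens($source, array $map)
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character tokens such as ";" or "("
            continue;
        }
        $text = $token[1];
        $isName = ($token[0] == T_VARIABLE || $token[0] == T_STRING);
        if ($isName && isset($map[$text])) {
            $text = $map[$text]; // only whole-token matches are renamed
        }
        $out .= $text;
    }
    return $out;
}
```

Because the token texts concatenate back to the original source, everything that is not renamed comes out byte for byte as it went in, which makes the resulting diff easy to review.
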
> I trust you know this, but don't run that example code without testing
> it on a limited sandbox and comparing the results first :).  I did not
> test anything except the RegexIterator part, to make sure that it
> actually grabbed PHP files; the rest is based on my experience
> tokenizing for parsing PHP when writing tools like phpDocumentor.
>
> If I made any mistakes, it would be good for you to post your final
> scripts for posterity back on here.
>
> Greg
>


Thanks so much, man.  I'm using most of your methodology.  There were
definitely some hiccups along the way, but so far it seems to build a
map of what to replace and what to replace it with, although the code
is far from pretty.  I'll be sure to send it to the list when it's
done.
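
For illustration only, the name-mapping step could look roughly like the sketch below. This is not the final script; the function name snakeToCamel is hypothetical. It assumes that a leading "$" and any leading underscores should be preserved, so superglobals like $_GET pass through unchanged.

```php
<?php
// Hypothetical helper: converts one snake_case identifier to camelCase,
// keeping a leading "$" and any leading underscores untouched.
function snakeToCamel($name)
{
    if (!preg_match('/^(\$?_*)(.*)$/', $name, $m)) {
        return $name; // nothing recognisable to convert
    }
    $parts = explode('_', $m[2]);
    $first = array_shift($parts); // the first word keeps its case
    return $m[1] . $first . implode('', array_map('ucfirst', $parts));
}

// Build an old => new map from a list of collected names,
// leaving out names that would not change.
function buildRenameMap(array $names)
{
    $map = array();
    foreach ($names as $name) {
        $new = snakeToCamel($name);
        if ($new !== $name) {
            $map[$name] = $new;
        }
    }
    return $map;
}
```

Note that this collapses doubled or trailing underscores (foo__bar becomes fooBar), so two distinct old names could map to the same new name; checking the map for duplicate values before rewriting anything would catch that.
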

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
