>>>>> "vQ" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no> >>>>> on Sat, 30 May 2009 11:16:43 +0200 writes:
vQ> Martin Maechler wrote: >> Hi Waclav (and other interested parties), >> >> I have committed my working version of src/main/coerce.c >> so you can prepare your patch against that. >> vQ> some further investigation and reflections on the code in StringFromReal vQ> (henceforth SFR), src/main/coerce.c:315 (as in the vQ> patched version, now in r-devel). vQ> petr's elim_trailing (renamed to dropTrailing, vQ> henceforth referred to as DT) takes as input a const vQ> char*, and returns a const char*. vQ> const-ness of the return is not a problem; it is fed vQ> into mkChar, which (via mkCharLenCE) makes a local vQ> memcpy of the string, and there's no violation of the vQ> contract here. vQ> const-ness of the input is a consequence of the return vQ> type of EncodeReal (henceforth EC). however, it is vQ> hardly ever, "in principle", a good idea to vQ> destructively modify const input (as DT does) if it vQ> comes from a function that explicitly provides it as vQ> const (as ER does). yes, I think we've alway agreed on that "in principle". vQ> the first question is, why does ER return the string as const? it vQ> appears that the returned pointer provides the address of a buffer used vQ> internally in ER, which is allocated *statically*. that is, each call vQ> to ER operates on the same memory location, and each call to ER returns vQ> the address of that same location. i suspect this is intended to be a vQ> smart optimization, to avoid heap- or stack-allocating a new buffer in vQ> each call to ER, and deallocating it after use. however, this appraoch vQ> is problematic, in that any two calls to ER return the address of the vQ> same piece of memory, and this may easily lead to data corruption. Well, that would be ok if R could be used "threaded" / parallel / ... and we all know that there are many other pieces of code {not just R's "own", but also in Fortran/C algorithms ..} that are not thread-safe. "Yes, of course", R looks like a horrible piece of software to some, because of that vQ> under the assumption that the content of this piece of memory is copied vQ> before any destructive use, and that after the string is copied the vQ> address is not further distributed, the hack is relatively harmless. vQ> this is what mkChar (via mkCharLenCE) does; in SFR it copies the vQ> content of s with memcpy, and wraps it into a SEXP that becomes the vQ> return value from SFR. exactly. vQ> the original author of this hack seems to have had some concern about vQ> exporting (from ER) the address of a static buffer, hence the returned vQ> buffer is const. in principle, this should prevent corruption of the vQ> buffer's content in situations such as vQ> // hypothetical vQ> char *p1 = ER(...); vQ> // p1 is some string returned from ER vQ> char p2 = ER(...); vQ> // p2 is some other string returned from ER vQ> // some modifications performed on the string referred to by p1 vQ> p1[0] = 'x'; vQ> // p2[0] is 'x' -- possible data corruption vQ> still worse in a scenario with concurrent calls to ER. (which will not happen in the near future) vQ> however, since the output from ER is const, this is no longer possible vQ> -- at least, not without a deconstifying cast the petr style. the vQ> problem with petr's solution is not only that it modifies shared memory vQ> purposefully qualified as const (by virtue of ER's return type), but vQ> also that it effectively distributes the address for further use. vQ> unfortunately, like most of the r source code, ER is not appropriately vQ> commented at the declaration and the definition, and without looking at vQ> the code, one can hardly have any clue that ER always return the same vQ> address of a static location. while the original developer might be vQ> careful enough not to misuse ER, in a large multideveloper project it's vQ> hard expect that from others. petr's function is precisely an example vQ> of such misuse, and as it adds (again, without an appropriate comment) a vQ> step of indirection; any use of petr's function other than what you have vQ> in SFR (and can you guarantee no one will ever use DT for other vQ> purposes?) is even more likely to end up in data corruption. you have a point here, and as a consequence, I'm proposing to put the following version of DT into the source : ------------------------------------------------------------------------ /* Note that we modify a 'const char*' which is unsafe in general, * but ok in the context of filtering an Encode*() value into mkChar(): */ static const char* dropTrailing0(char *s, char cdec) { char *p = s; for (p = s; *p; p++) { if(*p == cdec) { char *replace = p++; while ('0' <= *p && *p <= '9') if(*(p++) != '0') replace = p; while((*(replace++) = *(p++))) ; break; } } return s; } ------------------------------------------------------------------------ so it has a comment along the lines you suggest, *and* as static is not callable from outside coerce.c vQ> one simple way to improve the code is as follows; instead of (simplified) vQ> const char* dropTrailing(const char* s, ...) { vQ> const char *p = s; vQ> char *replace; vQ> ... vQ> replace = (char*) p; vQ> ... vQ> return s; } vQ> ...mkChar(dropTrailing(EncodeReal(...), ...) ... vQ> you can have something like vQ> const char* dropTrailing(char* s, ...) { vQ> char *p = s, *replace; vQ> ... vQ> replace = p; vQ> ... vQ> return s; } vQ> ...mkChar(dropTrailing((char*)EncodeReal(...), ...) ... vQ> where it is clear, from DT's signature, that it may (as it purposefully vQ> does, in fact) modify the content of s. that is, you drop the vQ> promise-not-to-modify contract in DT, and move the need for vQ> deconstifying ER's return out of DT, making it more explicit. vQ> however, this is still an ad hoc hack; it still breaks the original vQ> developer's assumption (if i'm correct) that the return from ER vQ> (pointing to its internal buffer) should not be destructively modified vQ> outside of ER. I think that's a misinterpretation of the history of ER. IIRC, the main issue when changing from 'char*' to 'const char*' in *many* places "simultaneously" was the introduction of hashes / cashes of strings (in mkChar(.)) in order to make R-level character handling (and storage of STRSXP/CHARSXP) much more efficient. vQ> another issue is that even making the return from ER const does not vQ> protect against data corruption. for example, vQ> const char *p1 = ER(...) vQ> // p1 is some string returned from ER vQ> const char *p2 = ER(...) vQ> // p2 is some other string returned from ER vQ> // but p1 == p2 vQ> if p1 is used after the second call to ER, it's likely to lead to data vQ> corruption problems. frankly, i'd consider the design of ER essentially vQ> flawed. removing const from ER's return is obviously not an option; it vQ> would certainly be safer to have it return a pointer to a heap-allocated vQ> buffer, but then the returned buffer would have to be freed at some vQ> point in the client code, and this would require a whole cascade of vQ> modifications to the r source code. since it usually is more funny to vQ> introduce new bugs than repair old ones, i'd not expect r core to be vQ> ever willing to invest time in this. vQ> the following seem acceptable and relatively reasonable ways to address vQ> the issue: vQ> (1) noop: leave as is. vQ> (2) minimal: adopt the (inessential) changes suggested in my previous post: vQ> const char* dropTrailing(const char* s, ...) { vQ> char *p = (char*) s; vQ> // no further need for (char*) casting from p to replace vQ> ... } vQ> (3) modify petr's solution along the lines above, i.e., have the input vQ> in the signature non-const and deconst-cast the output from ER outside vQ> of the call to DT. that's what I have adopted, as I'm sure you've noticed when you saw the code above. So, let's get to something more interesting, "Happy Pentecoste!" Martin vQ> (4) modify petr's solution to operate on a local, non-const copy of s: vQ> const char* dropTrailing(const char* s, ...) { vQ> int length = strlen(s); vQ> char *ss = malloc((length+1)*sizeof(char)); vQ> memcpy(ss, s, length+1); vQ> // work on ss rather than on s vQ> ... vQ> return ss; } vQ> and don't forget to deallocate the return from DT after mkChar (or vQ> whoever calls DT) has no more need for it. (an alternative is to use vQ> the allocCharsxp approach of mkCharLenCE, but that would be overdoing vQ> the job.) vQ> to sum up: for a clean solution, i find (4) preferable. for vQ> efficiency, (3) seems better, provided that you add a clear comment to vQ> the effect that the assumption of non-modifiable output from ER is vQ> purposefully violated. (2) is acceptable provided that DT is vQ> appropriately documented (i.e., that it does not really treat s as const vQ> char*, but effectively as non-const char*), and with a note as above. vQ> a final, bitter remark: i'm surprised that it's so easy to have an r vQ> core developer submit to r-devel a patch that violates an explicit vQ> demand on the immutability of a shared, statically allocated piece of vQ> memory. (even if it's the already existing code that actually is vQ> problematic.) vQ> no patch attached, you need to consider your options. hope this helps vQ> anyway. vQ> best, vQ> vQ ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel