"M. Uli Kusterer" <[EMAIL PROTECTED]> wrote:
> >They could, of course, just call ParamCStringValue() and then copy
> >the memory it points to, which would seem to eliminate the need for
> >the Copy... functions altogether.
>
> As long as we have no typed parameters, that would be OK, but if the
> internal representation is any different from the kind of string the
> user requested (e.g. Unicode, long, double, Rect, whatever...) it will
> cause two copies of the string to be created, one owned by the host
> (even though the host doesn't need it) and another one owned by the
> user.

To my thinking, that was the reason for having both types of calls. You
would call ParamCStringValue() if you don't need a copy, as with most
simple parameters. If you just want to see whether a string is "yes" or
"no", for instance, you ask for a pointer to the string and do a quick
compare. No need to allocate memory, copy it, or worry about freeing it
later. And the host doesn't do any copying either, unless it needs to
(depending on how the value is stored internally).

In cases where you need a copy, such as a longer string that you'll be
searching through and modifying in some way, you use the Copy... function
and get your copy. As I tried to point out later (maybe not very well),
I'm not sure we really want to create both types of functions. Either
approach alone will sometimes lead to unnecessary copying that providing
both would avoid, but that may not be a strong enough reason to take on
the complexity of two approaches.
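
To make the two styles concrete, here is a minimal sketch of how a host
might offer both. The XParam struct layout and both signatures are
assumptions for illustration only; in particular this
CopyParamCStringValue() takes a buffer size and returns the full length
(like snprintf()), which is not necessarily the signature under
discussion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter record; a real host's layout would be private. */
typedef struct XParam {
    char *cString;           /* host-owned, NUL-terminated value */
} XParam, *XParamRef;

/* Borrowing style (assumed signature): returns a pointer the host keeps
   owning. Valid only until the external returns; never modify or free it. */
const char *ParamCStringValue(XParamRef param)
{
    return param->cString;
}

/* Copying style (assumed signature): writes at most bufSize-1 chars plus
   a NUL into a caller-owned buffer and returns the full length of the
   value, so the caller can detect truncation. */
size_t CopyParamCStringValue(XParamRef param, char *buf, size_t bufSize)
{
    size_t len = strlen(param->cString);
    if (bufSize > 0) {
        size_t n = len < bufSize - 1 ? len : bufSize - 1;
        memcpy(buf, param->cString, n);
        buf[n] = '\0';
    }
    return len;
}

/* Typical borrowing use: a quick flag check with no allocation at all. */
int ParamIsYes(XParamRef param)
{
    return strcmp(ParamCStringValue(param), "yes") == 0;
}
```

The borrowing call suits one-off checks like ParamIsYes(); the copying
call suits values the external wants to keep or edit, at the price of the
external managing the buffer.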

If we choose to include only one function for accessing a string, my
vote would be to eliminate the Copy... approach. The reason is that I'd
like to make it as easy as possible to write externals. In many (I would
guess most) cases, parameters are short, simple values that control the
operation of the external, such as "ascending" vs. "descending" or
"yes"/"no"/"maybe" or the like. For such parameters it seems rather a
lot of bother for the external author to have to do something like this
(apologies if I mess something up here -- it's getting late):
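
#include <stdlib.h>
#include <string.h>

int myFlag;                       /* param: the XParamRef passed in */
unsigned int len = GetParamStringLength( param );
char * paramPtr = (char *)malloc(len+1);     /* +1 for the NUL */
CopyParamCStringValue( param, paramPtr, len );
paramPtr[len] = '\0';                        /* make sure it's terminated */
myFlag = (strcmp(paramPtr, "yes") == 0);     /* strcmp() returns 0 on a match */
free(paramPtr);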

as opposed to:

int myFlag = (strcmp( ParamCStringValue( param ), "yes" ) == 0);

The latter will make life so much easier for external writers, while
still allowing them to copy to their heart's content in the cases where
they need to own a copy for themselves.

> >The reason I included the Copy...
> >functions in the first place was to save double-copying in some
> >cases. For example, if the host is holding a Unicode string and the
> >external requests a C string using ParamCStringValue(), the host will
> >have to convert to a C string internally and return a pointer to
> >that string which the external would then copy.
>
> Exactly. IMHO it would be much more efficient to have only the CopyXXX
> calls. Only one set of calls to maintain, and the engine wouldn't have
> to care about disposing of something behind the XCMD's back. It is the
> very reason why HyperCard's GetFieldTEHandle() callback (or whatever
> the exact name is) returns a copy of a field's TEHandle, and not the
> original. It is safer, and allows the host to use whatever internal
> representation it desires without having to care about maintaining
> memory for the user.
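
As a rough sketch of the Copy...-only argument: if the host converts
straight into the caller's buffer, no host-side temporary string exists,
so nothing has to be disposed of behind the external's back. This
assumes, purely for illustration, a host that stores text as Unicode code
points and hands out UTF-8; the struct and the CopyParamCStringValue()
signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side parameter storing text as Unicode code points;
   the external never sees this layout. */
typedef struct XParam {
    const uint32_t *chars;
    size_t count;
} XParam, *XParamRef;

/* Encodes one code point as UTF-8 (at most 4 bytes) and returns the
   number of bytes written. Assumes cp is a valid Unicode scalar value. */
static size_t EncodeUTF8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Copy-only accessor (hypothetical): converts directly into the caller's
   buffer. Writes at most bufSize-1 bytes plus a NUL, never splits a
   character, and returns the byte count the whole value needs. */
size_t CopyParamCStringValue(XParamRef param, char *buf, size_t bufSize)
{
    size_t needed = 0, written = 0;
    for (size_t i = 0; i < param->count; i++) {
        unsigned char enc[4];
        size_t n = EncodeUTF8(param->chars[i], enc);
        /* keep writing only while nothing has been dropped yet */
        if (bufSize > 0 && written == needed && written + n < bufSize) {
            for (size_t k = 0; k < n; k++)
                buf[written + k] = (char)enc[k];
            written += n;
        }
        needed += n;
    }
    if (bufSize > 0)
        buf[written] = '\0';
    return needed;
}
```

Returning the needed size lets the external call once with a small
buffer, then again with one that fits; the conversion happens once per
call and the result is owned by the caller.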

It sounds like you prefer to support only one approach rather than both,
but you chose Copy... as a way of being safer. I agree that a trusting
host which returns a pointer to its internal data runs the risk that the
X could muck around with its internals. But an X will have plenty of
opportunity to mess things up in any case; trust is necessary. As long
as it is clearly documented that an X shouldn't modify the memory
pointed to, I don't think returning such a pointer is really a problem.

As for something being disposed of behind the X's back, that shouldn't
be possible when running single-threaded during a single call to the X.
If the X will stay around and be called again, then clearly it needs to
make a copy of anything it wants to keep. If you're talking about
multi-threading then we've got a whole new can of worms to deal with,
but essentially we would need some mechanism to guarantee data validity
to the X for some period of time (like until the X says it's done).
Otherwise all bets are off.

> >Unless you're dealing with megabyte-size strings,
> >the performance gain is probably insignificant, so maybe we should
> >just eliminate the Copy... functions.
>
> Here are two people coming to exactly opposite decisions based on the
> same data. Maybe it's my thinking the Apple way again. Apple used to
> make the mistake of letting users directly access their data
> structures, meaning they couldn't change the way e.g. a window worked,
> because then applications would have had problems getting at the data
> they wanted. That's what opacity is all about.

Returning a pointer rather than a copy doesn't violate opacity. It's
entirely up to the host how it is implemented. It may be a pointer to
live internal data, or it may be a pointer to a copy managed by the
host, but in any case the host is free to change its implementation at
any time. We're not talking data structures here, just simple strings
(of course, if the host decides to change the definition of a char, it
might be in trouble ;-).

> And Apple took it one step further by also only returning copies of
> their data that are owned by the caller. This way, the caller can keep
> it exactly as long as needed, and the host doesn't have to babysit
> data the user might no longer need.

I think my earlier comments covered this, but just to be sure: the host
calls the X. When the X returns, the host is free to clean up or dump
any data it's holding, including anything it handed to the X through
callbacks. This might require some extra housekeeping by the host (or
not, depending on how it is implemented). But my philosophy is to treat
X writers the same way I treat scripters: I'll do the work once so they
don't have to, ever.

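
One way the "extra housekeeping" could look is sketched below: a
per-call scratch pool. ParamCStringValue() returns live data when the
value is already a C string and a converted copy from the pool otherwise,
so the external can't tell the difference, and the host frees the whole
pool once the X returns. ScratchPool, ReleaseCallScratch(), and the
XParam layout are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-call scratch pool: every string the host hands out
   during one call to the external is recorded here. */
typedef struct ScratchPool {
    char **blocks;
    size_t count, capacity;
} ScratchPool;

/* Hypothetical parameter: either already a C string, or a number that
   must be converted on request. */
typedef struct XParam {
    const char *text;   /* non-NULL if stored as a C string */
    long number;        /* used when text is NULL */
} XParam, *XParamRef;

static ScratchPool gPool;   /* only meaningful while the external runs */

/* Allocates a block and records it in the pool; NULL on failure. */
static char *PoolAlloc(ScratchPool *pool, size_t size)
{
    if (pool->count == pool->capacity) {
        size_t newCap = pool->capacity ? pool->capacity * 2 : 8;
        char **grown = realloc(pool->blocks, newCap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        pool->blocks = grown;
        pool->capacity = newCap;
    }
    char *block = malloc(size);
    if (block != NULL)
        pool->blocks[pool->count++] = block;
    return block;
}

/* Live internal data when possible, otherwise a pooled converted copy.
   Either way the host owns the result. */
const char *ParamCStringValue(XParamRef param)
{
    if (param->text != NULL)
        return param->text;
    char *buf = PoolAlloc(&gPool, 32);   /* ample for any long */
    if (buf != NULL)
        snprintf(buf, 32, "%ld", param->number);
    return buf;
}

/* Called by the host after the external returns: frees every string
   handed out during the call (the pool's index array is kept for reuse). */
void ReleaseCallScratch(void)
{
    for (size_t i = 0; i < gPool.count; i++)
        free(gPool.blocks[i]);
    gPool.count = 0;
}
```

An X that wants to keep a value past the call would copy it before
returning, which is the one rule it has to follow under this scheme.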
> >Okay, that's good to keep in mind, although in this case eliminating
> >the Copy... calls would also solve the problem. ;-)
>
> But it would also make many things much more effort. I recently wanted
> to take some ANSI code that fetches a whole file's data from a compound
> file format (not unlike having a hard disk in a file). Now here's the
> problem: QuickTime (which I needed to decode the file's data) requires
> me to pass it a Macintosh Handle, but I had that memory as a malloc()ed
> data block. I had to make an additional copy of that image file in a
> Handle, just to have QuickTime make a third copy in a graphics buffer.
> I don't even want to *see* the memory stats on this one.

I'm afraid I don't quite follow this. It sounds like it was a pain, but
I don't see how keeping Copy... would help it much. Or maybe I've lost
track of the context of this part of the discussion... ?

> Also, if the user knows it's always a copy of the data being passed
> around, they have a rule to follow every time they have to consider
> whether they are leaking. Otherwise they always have to try and
> remember: "Did I allocate the memory, or did the host?" This starts to
> get really bad when you have "if" clauses where one branch allocates
> the data itself while the other retrieves the data from the host.
> You'll always have to keep a flag recording who owns the memory.

I agree that knowing who owns the data is essential. But if you make a
call, you know which method you called. You'd only need a flag if you
had more than one way within your own code to obtain the pointer. I
think this is a good argument for having only one mechanism, however.
Then there will be just one rule (the host owns it) unless you allocate
memory yourself (in which case, of course, you own it).

> I don't mind calling an associative array a HashTable, but I don't
> like it if something else (e.g. a non-terminated string) is suddenly
> called an array. Tech support will love you if you do that.

Okay. Since there seems to be some disagreement on this, I'm fine with
not calling anything an array, although it may be awkward at times.

> >Getting back to the case at hand, what is the purpose of the
> >ParamCharArrayValue() call above, anyway? If its purpose is to
> >provide access to arbitrary binary data, then maybe "byte" or "data"
> >would be an appropriate term. How about changing it to
> >ParamDataValue(), which returns a pointer to an array of bytes:
>
> MetaCard has associative arrays. The way they work is that every
> variable can either contain a string value, or it can contain an
> arbitrary number of "entries" that are identified by strings. These
> entries are essentially also variables again. Although MetaCard
> doesn't allow that yet, you could technically take this idea a step
> further and allow these entries to have sub-entries.

This much I knew already.

> So, if a parameter is just a variable, and an array is simply a
> variable containing a list of variables, we could simplify all of this
> a great deal by having:
>
> XParamRef GetHashTableEntry( XParamRef param, char* hash );
>
> which would give you a reference to a variable in an array variable, and
> you'd identify this entry by the hash or index you pass in. Is this more
> comprehensible?
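
A minimal sketch of the "array is a variable containing variables" model
might look like this. The XVariable and XEntry types are hypothetical,
the key is taken as `const char *` rather than `char *`, and a linear
scan stands in for whatever hashing a real host would use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variable: holds either a string value or a list of named
   entries, each of which is itself a variable (so entries can nest). */
typedef struct XVariable *XParamRef;

typedef struct XEntry {
    const char *key;
    XParamRef value;
} XEntry;

typedef struct XVariable {
    const char *stringValue;  /* meaningful when entryCount == 0 */
    XEntry *entries;
    size_t entryCount;
} XVariable;

/* Returns the entry stored under `hash`, or NULL if `param` has no such
   entry (including when it holds a plain string). */
XParamRef GetHashTableEntry(XParamRef param, const char *hash)
{
    for (size_t i = 0; i < param->entryCount; i++)
        if (strcmp(param->entries[i].key, hash) == 0)
            return param->entries[i].value;
    return NULL;
}
```

Because an entry is just another XParamRef, the same string accessors
work on it, and sub-entries would come for free by calling
GetHashTableEntry() on the result.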

Umm, this is perfectly clear, but I'm afraid maybe we had a disconnect
somewhere along the way. My remarks above were intended to refer to
arrays (oops! I mean sequences) of bytes, not to hash tables. It was
actually my feeble attempt to come up with a term other than "array" to
use for accessing a non-terminated string or other chunk of binary data
(hence ParamDataValue()).

> >I nearly eliminated the ioLength parameter since we now have
> >GetParamSize(), but it's not clear that they would return the same
> >value -- should GetParamSize() return the raw size in bytes, or the
> >number of characters in a string? In the case of Unicode they will
> >be very different numbers. Always more questions...
>
> I also thought about that. We'd probably have to change the name to
> GetParamStringLength() or something like that. "Length" in the ANSI
> libs also means characters, while "size" means bytes, IIRC. Having
> length here would be smarter since it would work both for ASCII and
> Unicode strings, while otherwise the user would need several calls:
>
> GetParamAsciiStringLength()
> GetParamUnicodeStringLength()

Agreed -- length and size sound ok to me, as does GetParamStringLength().
Well, no, I take that back. We may have to have both ASCII and Unicode
length calls, because they can differ. The reason is that if you want an
ASCII string representation of a Unicode string, you'll end up with
stuff like "abcd\2453fg" where the \2453 represents a single Unicode
character. The Unicode length is 7 but the ASCII length is 11 -- darn!
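
To check the arithmetic, here is a small sketch computing all three
numbers for a value stored as UTF-16. Both the UTF-16 storage and the
escape format (a backslash followed by the character's decimal code,
chosen to match the "\2453" example) are assumptions; the function names
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw size in bytes of n UTF-16 code units (what a size call reports). */
size_t ParamSizeInBytes(size_t unitCount)
{
    return unitCount * sizeof(uint16_t);
}

/* Decodes the code point starting at units[*i] and advances *i past it. */
static uint32_t NextCodePoint(const uint16_t *units, size_t count, size_t *i)
{
    uint32_t u = units[(*i)++];
    if (u >= 0xD800 && u <= 0xDBFF && *i < count &&
        units[*i] >= 0xDC00 && units[*i] <= 0xDFFF) {
        uint32_t lo = units[(*i)++];          /* surrogate pair */
        return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }
    return u;   /* BMP character (or unpaired surrogate, passed through) */
}

/* Length in Unicode characters (code points). */
size_t ParamUnicodeLength(const uint16_t *units, size_t count)
{
    size_t len = 0, i = 0;
    while (i < count) {
        NextCodePoint(units, count, &i);
        len++;
    }
    return len;
}

/* Length of an ASCII rendering where each non-ASCII character becomes a
   backslash plus its decimal code (assumed escape format). */
size_t ParamAsciiLength(const uint16_t *units, size_t count)
{
    size_t len = 0, i = 0;
    while (i < count) {
        uint32_t cp = NextCodePoint(units, count, &i);
        if (cp < 0x80) {
            len += 1;
            continue;
        }
        len += 1;                                 /* the backslash */
        do { len++; cp /= 10; } while (cp > 0);   /* decimal digits */
    }
    return len;
}
```

For { 'a', 'b', 'c', 'd', 2453, 'f', 'g' } this gives a size of 14
bytes, a Unicode length of 7, and an ASCII length of 11, so three
different numbers for one value.
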
Cheers,
Doug