Do you have any statistics about what the hit rate of the identifier
lookup cache is? How many identifier lookups actually use the binary
search?
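For context, the binary-search identifier lookup mentioned here can be sketched as follows. This is a hypothetical illustration, not Pike's actual identifier tables: the table layout and the function name are invented, and real Pike compares shared-string pointers rather than calling strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sorted identifier table: a per-program array of names
 * kept in strcmp() order, searched with a classic binary search.  A
 * lookup cache in front of this would remember recent (name -> index)
 * results so that hot lookups skip the O(log n) search entirely. */
static int find_identifier(const char *const *names, size_t n,
                           const char *want)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(names[mid], want);
        if (cmp == 0)
            return (int)mid;
        if (cmp < 0)
            lo = mid + 1;   /* target sorts after names[mid] */
        else
            hi = mid;       /* target sorts before names[mid] */
    }
    return -1;              /* not found */
}
```

The hit-rate question above then amounts to: how often does the cache answer before `find_identifier` has to run at all?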
On Thu, 29 Mar 2012, Per Hedbor @ Pike developers forum wrote:
If it was not clear, before CRC32, almost 50% of share-string time was spent in the hash function.
Even if we were to get rid of the global lock, a global hash table for
strings probably wouldn't be a significant problem, since it can be
made lock-free.
The problem is that there may be work patterns where there's a
significant risk of getting very long identical strings, e.g. if a
file is read and cached for some time and then read again from another
part of the program, or if the same file is read concurrently by
different threads.
Does anyone know how often in the code we actually depend on the
fact that the same string will be at the same address in memory?
Because I'm contemplating an optimisation which would involve making
the string duplication avoidance opportunistic instead of mandatory.
I.e. something along the
On Thu, 29 Mar 2012, Stephen R. van den Berg wrote:
Because I'm contemplating an optimisation which would involve making
the string duplication avoidance opportunistic instead of mandatory.
I guess the point here is to skip the hashing in cases where the strings
are large, come from the
Arne Goedeke wrote:
On Thu, 29 Mar 2012, Stephen R. van den Berg wrote:
Because I'm contemplating an optimisation which would involve making
the string duplication avoidance opportunistic instead of mandatory.
I guess the point here is to skip the hashing in cases where the strings
are large,
(i.e. they're not
fully hashed all the time, to avoid the overhead of rehashing large strings
repeatedly when juggling around lots of strings).
Large strings are not fully hashed: the hash function considers
at most 72 characters, so strings longer than that do not take
longer to hash.
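The bounded hash described above can be sketched like this. It is a minimal illustration, not Pike's actual hash function: only the 72-character cap comes from the thread, while the FNV-1a mixing and the length folding are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded string hash: only the first HASH_LIMIT bytes
 * contribute, so hashing cost is O(min(len, HASH_LIMIT)) no matter how
 * long the string is.  This mirrors the behaviour described in the
 * thread (at most 72 characters considered), not Pike's real code. */
#define HASH_LIMIT 72

static uint64_t bounded_hash(const char *s, size_t len)
{
    size_t n = len < HASH_LIMIT ? len : HASH_LIMIT;
    uint64_t h = 1469598103934665603ULL;        /* FNV-1a offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;                  /* FNV-1a prime */
    }
    /* Fold in the total length so strings sharing a prefix but
     * differing in length still hash differently. */
    h ^= (uint64_t)len;
    h *= 1099511628211ULL;
    return h;
}
```

The trade-off is visible directly: two equally long strings that differ only after byte 72 collide by construction, which is why a full comparison on hash-hit is still required.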
That's exactly what I'm asking... How many places are there where we
explicitly depend on the fact that the address can be used to define
uniqueness?
All places where strings are compared.
Say, a few thousand places in the code, probably?
Most importantly: mappings and multisets, identifiers
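The reliance on address identity can be shown with a toy intern table. This sketch is hypothetical and far simpler than Pike's shared-string implementation (a linear scan instead of a hash table), but it shows why `==` on the pointers suffices once every string goes through the table:

```c
#include <stdlib.h>
#include <string.h>

/* Toy string-interning table.  Every string is funneled through
 * intern(), so two equal strings always come back as the same pointer,
 * and equality checks -- and mapping/multiset hashing -- can use the
 * address alone. */
#define MAX_INTERNED 1024

static const char *table[MAX_INTERNED];
static size_t table_len;

static const char *intern(const char *s)
{
    for (size_t i = 0; i < table_len; i++)
        if (strcmp(table[i], s) == 0)
            return table[i];            /* seen before: same address */

    size_t n = strlen(s) + 1;           /* first occurrence: keep a copy */
    char *copy = malloc(n);
    memcpy(copy, s, n);
    table[table_len] = copy;
    return table[table_len++];
}
```

This is exactly the invariant the proposed optimisation would weaken: if deduplication became opportunistic instead of mandatory, `a == b` could no longer stand in for string equality in mappings and multisets.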
The issue isn't necessarily the hashing but the fact that you need to
have this globally synced instead of e.g. creating a thread-local
string pool. Still, I agree with you that the fundamental properties of
mappings etc. are based on string uniqueness. There is other
low-hanging fruit that should be
The issue isn't necessarily the hashing but the fact that you need to
have this globally synced instead of e.g. creating a thread-local
string pool.
Well, yes, but as long as we do not have actual threads that can run
concurrently in Pike, this is not much of an issue, really.
C-code can (and
Does anyone know how often in the code we actually depend on the
fact that the same string will be at the same address in memory?
Often, but it's probably not hard to find a set of gatekeeper
functions that cover all the cases.
Because I'm contemplating an optimisation which would involve making
If it was not clear, before CRC32, almost 50% of share-string time was
spent in the hash function. But it was still a very small percentage
of the total CPU time used.
I optimized it because it was easy to do, mostly.
We spend significantly more time looking up identifiers in objects, as
an
Do you have some statistics on this? I'd imagine that most time
spent comparing hash-hits would be on short- to medium-length
strings, and not really long ones, since it's unlikely you'll find
another string with the exact same length in the hash bucket.
I do have some statistics from the
Note that it typically isn't the calculation of the hash that is
expensive, but the comparison on hash-hit.
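The cost pattern described here can be sketched as follows. The struct layout and helper name are invented for illustration, not taken from Pike; the point is that a stored length (and hash) lets most false hits be rejected in O(1), so the expensive byte-for-byte comparison mainly runs when the candidate really is the same string:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared-string node carrying its length and hash. */
struct sstr {
    size_t len;
    unsigned hval;
    const char *data;
};

/* On a hash hit, reject cheaply on hash or length mismatch; only
 * fall through to memcmp() -- the expensive part for long strings --
 * when both agree. */
static bool same_string(const struct sstr *a,
                        unsigned hval, const char *data, size_t len)
{
    if (a->hval != hval || a->len != len)
        return false;                       /* O(1) rejection */
    return memcmp(a->data, data, len) == 0; /* costly on long hits */
}
```

This matches the observation in the thread: the memcmp() that runs on a genuine hit is where the time goes, not the hash calculation itself.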