overrides?
>
> Good to know Vitaly!
> So a poor example then. Better example is an abstract class with a method
> implementation that no subtypes override, yet multiple subtypes are found
> to be the receiver of a particular call site. Should we expect a
> monomorphic call site in that case?
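A minimal sketch of that scenario (names invented): the abstract class supplies the only implementation, while the call site's receivers vary across subtypes, so class hierarchy analysis can still prove a single target.

abstract class Shape {
    // Implemented once here; no subtype overrides it.
    long cost() { return 42; }
}
final class Circle extends Shape {}
final class Square extends Shape {}

class CallSiteDemo {
    static long sum(Shape[] shapes) {
        long total = 0;
        for (Shape s : shapes) {
            // Receiver is sometimes Circle, sometimes Square, but class
            // hierarchy analysis sees a single target method, so the JIT
            // can devirtualize/inline this as if it were monomorphic.
            total += s.cost();
        }
        return total;
    }
}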
On Sun, Dec 29, 2019 at 10:22 AM Brian Harris
wrote:
> Hello!
>
> I was hoping to get one point of clarification about avoiding megamorphic
> call sites, after reading these excellent articles:
>
>
> http://www.insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
>
On Mon, Nov 25, 2019 at 11:50 AM Peter Veentjer wrote:
> I have a question about MESI.
>
> My question isn't about atomic operations; but about an ordinary write to
> the same cacheline done by 2 cores.
>
> If a CPU does a write, the write is placed on the store buffer.
>
> Then the CPU will
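A minimal sketch (not from the original mail) of the access pattern being asked about - two threads doing ordinary writes that land on the same cache line:

class SameLineWriters {
    // Two adjacent longs will usually share one 64-byte cache line.
    // volatile is only here to keep the JIT from collapsing the loops;
    // the coherence traffic is the same for any stores that reach the cache.
    volatile long a;
    volatile long b;

    void run() throws InterruptedException {
        Thread t1 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) a = i; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) b = i; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Under MESI, each store must first gain exclusive ownership of the
        // line (RFO), so the line ping-pongs between the two cores' caches.
    }
}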
I posed this question to this list a few years ago, but don’t recall much
participation - let’s try again :).
Has anyone moved their C, C++, Java, whatever low latency/high perf systems
(or components thereof) to Rust? If so, what type of system/component? What
has been your experience? Bonus
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this
thread:
1) use of non-temporal loads (ie weak ordering, normal x86 guarantees go
out the window)
2) controlling execution of non-serializing instructions like rdtsc
I’d be curious myself to hear of other cases.
On Fri,
I didn’t actually pick up on how often the termination protocol
triggers - I assumed it’s an uncommon/slow path.
>
>
> On Saturday, September 14, 2019 at 11:29:00 AM UTC-7, Vitaly Davidovich
> wrote:
>>
>> Unlike C++, where you can specify mem ordering for failure and success
On x86, I’ve never heard of failed CAS being cheaper. In theory, cache
snooping can inform the core whether its xchg would succeed without going
through the RFO dance. But, to perform the actual xchg it would need
ownership regardless (if not already owned/exclusive).
Sharing ordinary mutable
VarHandle does provide suitable (and better) replacements for the various
Unsafe.get/putXXX methods. But, U.objectFieldOffset() is the only (easy?)
Java way to inspect the layout of a class; e.g. say you apply a hacky
class-hierarchy based cacheline padding to a field, and then want to assert
that the fields were laid out the way you intended.
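A hedged sketch of that kind of check (class and field names invented; the reflective Unsafe grab is the usual unsupported hack):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

class PadBefore { long p0, p1, p2, p3, p4, p5, p6; }
class Hot extends PadBefore { volatile long value; }
class Padded extends Hot { long q0, q1, q2, q3, q4, q5, q6; }

class LayoutCheck {
    static void assertPadded() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe u = (Unsafe) f.get(null);

        long p0 = u.objectFieldOffset(PadBefore.class.getDeclaredField("p0"));
        long value = u.objectFieldOffset(Hot.class.getDeclaredField("value"));
        long q0 = u.objectFieldOffset(Padded.class.getDeclaredField("q0"));

        // Superclass fields are laid out before subclass fields, so the
        // 7 longs (56 bytes) of padding should sit between p0 and value,
        // and q0 should follow value.
        if (value - p0 < 56 || q0 - value < 8)
            throw new AssertionError("padding not laid out as expected");
    }
}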
Your understanding of how varargs calls are made is correct - it's nothing
more than sugar for an allocated array to store the args. Your bench,
however, explicitly disables inlining of the varargs method, and thus
prevents escape analysis from potentially eliminating the array
allocation. Try it again with inlining allowed.
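For reference, a hedged JMH sketch of that setup (method names invented): the same varargs body measured with and without forced non-inlining, so the effect of escape analysis on the args array is visible.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.CompilerControl;

public class VarargsBench {
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    static long sumNoInline(long... args) {
        long s = 0;
        for (long a : args) s += a;
        return s;
    }

    static long sumInlinable(long... args) { // identical body, inlining allowed
        long s = 0;
        for (long a : args) s += a;
        return s;
    }

    @Benchmark
    public long noInline() {
        return sumNoInline(1, 2, 3); // args array must really be allocated
    }

    @Benchmark
    public long inlinable() {
        return sumInlinable(1, 2, 3); // after inlining, EA may remove the array
    }
}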
On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev
wrote:
> On 04/24/2018 10:44 PM, John Hening wrote:
> > I'm reading the great article from
> https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks
> > Aleksey! :)) and I am not sure whether I understand
A few suggestions:
1) have you tried just reading the data in the prefaulting code, instead of
dirtying it with a dummy write? Since this is a disk backed mapping, it
should page fault and map the underlying file data (rather than mapping to
a zero page, e.g.). At a high rate of dirtying, this
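A minimal sketch of suggestion (1), assuming a read-only mapping and a file small enough for a single map (path and page size are assumptions):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class Prefault {
    static final int PAGE = 4096; // assumed page size

    static long prefaultByReading(String path) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            // Assumes file size < 2 GB so one mapping suffices.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sink = 0;
            for (int pos = 0; pos < map.limit(); pos += PAGE) {
                sink += map.get(pos); // read faults the page in; nothing is dirtied
            }
            return sink; // returned so the loads aren't dead-code eliminated
        }
    }
}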
This.
Also, I think the (Intel) adjacent sector prefetch is a feature enabled
through BIOS. I think that will pull the adjacent line to L1, whereas the
spatial prefetcher is probably for streaming accesses that are loading L2.
Also, I'd run the bench without atomic ops - just relaxed (atomic)
017 01:11 AM, Ross Bencina wrote:
>>
>>> On 25/01/2017 9:31 AM, Vitaly Davidovich wrote:
>>>
>>>> Interesting (not just) Mono bug:
>>>> http://www.mono-project.com/news/2016/09/12/arm64-icache/
>>>>
>>>
>>> Sc
Interesting (not just) Mono bug:
http://www.mono-project.com/news/2016/09/12/arm64-icache/
--
Sent from my phone
And should also mention that doing very early load scheduling will increase
register pressure as that value will need to be kept live across more
instructions. Stack spills and reloads suck in a hot/tight code sequence.
On Tue, Jan 17, 2017 at 7:08 PM Vitaly Davidovich <vita...@gmail.com>
aligned, so the guarantees of an atomic write don't apply, cache or no cache.
>
> On Wed, Jan 18, 2017 at 8:02 AM, Vitaly Davidovich <vita...@gmail.com> wrote:
>
> > On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev
The cache miss latency can be hidden either by this load being done ahead
of time or if there're other instructions that can execute while this load
is outstanding. So breaking dependency chains is good, but extending the
distance like this seems weird and may hurt common cases. If ICC does this
execute these instructions in OOO-manner.
> But if you schedule them this way
>
> mov (%rax), %rbx
> ... few instructions
> cmp %rbx, %rdx
> ... few instructions
> jxx Lxxx
>
> It would be possible to execute them out-of-order and calculate something
> additional.
On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev <
aleksey.shipi...@gmail.com> wrote:
> On 01/17/2017 12:55 PM, Vitaly Davidovich wrote:
> > Atomicity of values isn't something I'd assume happens automatically. Word
> > tearing isn't observable from single threaded co
Atomicity of values isn't something I'd assume happens automatically. Word
tearing isn't observable from single threaded code.
I think the only thing you can safely and portably assume is the high level
"single threaded observable behavior will occur" statement. It's also
interesting to note
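A sketch of the kind of non-atomicity in question: JLS 17.7 lets a JVM split reads and writes of non-volatile long/double into two 32-bit halves, so a racy reader may see a value neither writer ever wrote (rare on 64-bit JVMs, but permitted).

class TornLong {
    static long shared; // deliberately not volatile

    public static void main(String[] args) {
        new Thread(() -> { while (true) shared = 0L; }).start();
        new Thread(() -> { while (true) shared = -1L; }).start(); // all bits set
        while (true) {
            long v = shared;
            if (v != 0L && v != -1L) // half from each writer => torn read
                System.out.println("torn: " + Long.toHexString(v));
        }
    }
}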
Depends on which hardware. For instance, x86/64 is very specific about
what memory operations can be reordered (for cacheable operations), and two
stores aren't reordered. The only reordering is stores followed by loads,
where the load can appear to reorder with the preceding store.
On Mon, Jan
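That store-then-load case is the classic store-buffering litmus test; a Java sketch of the shape (racy on purpose):

class StoreLoad {
    static int x, y;   // plain fields; the race is the point
    static int r1, r2;

    static void demo() throws InterruptedException {
        x = 0; y = 0;
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // r1 == 0 && r2 == 0 is a legal outcome on x86: each store can sit
        // in its core's store buffer while the following load executes.
    }
}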
(even if the value at the address is still the expected
one).
On Wed, Jan 4, 2017 at 2:59 PM, Vitaly Davidovich <vita...@gmail.com> wrote:
> Probably worth a mention that "CAS" is a bit too generic. For instance,
> you can have weak and strong CAS, with some architectures o
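In Java terms (VarHandle, JDK 9+), a minimal sketch of the difference - the weak form may fail spuriously even when the value still matches, so it lives in a retry loop:

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class WeakVsStrongCas {
    static final VarHandle V;
    volatile int value;

    static {
        try {
            V = MethodHandles.lookup()
                    .findVarHandle(WeakVsStrongCas.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void increment() {
        int cur;
        do {
            cur = value;
        } while (!V.weakCompareAndSetPlain(this, cur, cur + 1)); // may fail spuriously
    }

    boolean strongCas(int expected, int next) {
        return V.compareAndSet(this, expected, next); // fails only on a real mismatch
    }
}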
On Thu, Dec 22, 2016 at 3:58 PM, Gil Tene <g...@azul.com> wrote:
>
> On Thursday, December 22, 2016 at 10:46:47 AM UTC-8, Vitaly Davidovich wrote:
>>
>> On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene <g...@azul.com> wrote:
Rajiv/Marshall,
Thanks for your comments. I guess I should rephrase my initial post - I'm
*particularly* interested in production and migration scenarios/stories,
but happy to hear others' casual dabbling experience as well.
I agree on the compile time, but there's good news and bad news. The
On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene <g...@azul.com> wrote:
>
> On Thursday, December 22, 2016 at 9:33:09 AM UTC-8, Vitaly Davidovich wrote:
>>
>> On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene <g...@azul.com> wrote:
> small incremental
> steps), or a combination of the two.
>
> And yes, doing that (concurrent generational GC) while supporting great
> latency, high throughput, and high efficiency all at the same time, is very
> possible. It is NOT the hard or inherent tradeoff many people seem to
choice for them and how they see Go being used.
Mind you, I'm not a fan nor a user of Go so I'm referring purely to their
stipulated strategy on how to evolve their GC.
On Thu, Dec 22, 2016 at 7:37 AM Remi Forax <fo...@univ-mlv.fr> wrote:
>
>
FWIW, I think the Go team is right in favoring lower latency over
throughput of their GC given the expected usage scenarios for Go.
In fact, most of the (Hotspot based) Java GC horror stories involve very
long pauses (G1 and CMS not excluded) - I've yet to hear anyone complain
that their "Big
Curious if anyone on this list is running any non-trivial Rust code in
production? And if so, would love to hear some thoughts on how that's going.
Also, if the code either interops with existing c/c++/java code or is a
replacement/rewrite/port of code from those languages, interested to hear
On Sunday, October 30, 2016, Aleksey Shipilev <aleksey.shipi...@gmail.com>
wrote:
> On 10/29/2016 10:31 PM, Vitaly Davidovich wrote:
> > There's one thing I still can't get someone at Oracle to clarify, which
> > is whether getOpaque ensures atomicity of the read. I believe
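For reference, the access in question looks like this (field name invented); getOpaque is a relaxed read with only coherence ordering, and the open question above is whether it also guarantees the 64-bit read isn't torn.

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class OpaqueRead {
    static final VarHandle COUNTER;
    long counter; // plain long, accessed in opaque mode

    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findVarHandle(OpaqueRead.class, "counter", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    long readOpaque() {
        return (long) COUNTER.getOpaque(this); // no ordering beyond coherence
    }
}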
Why is it egregious? It's detailed in the types of exceptions it throws,
yes, but that's good assuming you want to handle some of those types (and
there are cases where those exceptions can be handled properly). Even
before ReflectiveOperationException, you could use multi-catch since Java 7
to
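A small sketch of both styles (names invented): multi-catch over the specific reflective exceptions where they're handled differently, or the ReflectiveOperationException parent when one handler fits all.

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ReflectiveCall {
    static Object invokeMultiCatch(Object target, String name) {
        try {
            Method m = target.getClass().getMethod(name);
            return m.invoke(target);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return null; // the "expected" failures, handled specifically
        } catch (InvocationTargetException e) {
            throw new RuntimeException(e.getCause()); // the target itself threw
        }
    }

    static Object invokeBroad(Object target, String name) {
        try {
            return target.getClass().getMethod(name).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e); // one handler for the whole family
        }
    }
}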