Re: does call site polymorphism factor in method overrides?

2019-12-30 Thread Vitaly Davidovich
> Good to know Vitaly! So a poor example then. Better example is an abstract class with a method implementation that no subtypes override, yet multiple subtypes are found to be the receiver of a particular call site. Should we expect a monomorphic call site in that …
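A minimal sketch of the scenario being asked about (my own example, not code from the thread): Base.work() has a single implementation that no subtype overrides, yet the call site below observes receivers of several concrete subtypes.

public class CallSiteDemo {
    // Hypothetical types for illustration only.
    static abstract class Base {
        // Single implementation shared by all subtypes; never overridden.
        int work() { return 42; }
        // Forces Base to be abstract without affecting work().
        abstract void id();
    }

    static final class A extends Base { void id() {} }
    static final class B extends Base { void id() {} }
    static final class C extends Base { void id() {} }

    // The interesting call site: the receiver's concrete type varies (A, B, C),
    // but the resolved target is always Base.work(). Whether this counts as
    // monomorphic depends on whether the JIT profiles receiver types or
    // resolved targets, which is the question posed above.
    static int call(Base b) { return b.work(); }

    public static void main(String[] args) {
        Base[] receivers = { new A(), new B(), new C() };
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += call(receivers[i % receivers.length]);
        }
        System.out.println(sum);
    }
}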

Re: does call site polymorphism factor in method overrides?

2019-12-29 Thread Vitaly Davidovich
On Sun, Dec 29, 2019 at 10:22 AM Brian Harris wrote: > Hello! I was hoping to get one point of clarification about avoiding megamorphic call sites, after reading these excellent articles: http://www.insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/ …

Re: MESI and 'atomicity'

2019-11-25 Thread Vitaly Davidovich
On Mon, Nov 25, 2019 at 11:50 AM Peter Veentjer wrote: > I have a question about MESI. My question isn't about atomic operations, but about an ordinary write to the same cacheline done by 2 cores. If a CPU does a write, the write is placed on the store buffer. Then the CPU will …

Rust

2019-10-08 Thread Vitaly Davidovich
I posed this question to this list a few years ago, but don’t recall much participation - let’s try again :). Has anyone moved their C, C++, Java, whatever low latency/high perf systems (or components thereof) to Rust? If so, what type of system/component? What has been your experience? Bonus

Re: purpose of an LFENCE

2019-10-08 Thread Vitaly Davidovich
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this thread: 1) use of non-temporal loads (i.e. weak ordering, normal x86 guarantees go out the window); 2) controlling execution of non-serializing instructions like rdtsc. I’d be curious myself to hear of other cases. On Fri, …

Re: Volatile semantic for failed/noop atomic operations

2019-10-08 Thread Vitaly Davidovich
…didn’t actually pick up on how often the termination protocol triggers - I assumed it’s an uncommon/slow path. > On Saturday, September 14, 2019 at 11:29:00 AM UTC-7, Vitaly Davidovich wrote: >> Unlike C++, where you can specify mem ordering for failure and success …

Re: Volatile semantic for failed/noop atomic operations

2019-09-14 Thread Vitaly Davidovich
On x86, I’ve never heard of failed CAS being cheaper. In theory, cache snooping can inform the core whether its xchg would succeed without going through the RFO dance. But, to perform the actual xchg it would need ownership regardless (if not already owned/exclusive). Sharing ordinary mutable …
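One practical consequence of that (my own sketch, not code from the thread): when a CAS is likely to fail, spin on a plain read first so the doomed CAS, and its request for line ownership, is not issued at all. A test-and-test-and-set style lock illustrates the idea:

import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: the read in the inner loop keeps the line in shared
// state and avoids repeatedly issuing a CAS that is likely to fail, since even
// a failed CAS generally needs ownership of the cache line.
final class TtasLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        for (;;) {
            // Spin on a read first; only attempt the CAS when the lock looks free.
            while (locked.get()) {
                Thread.onSpinWait();   // Java 9+
            }
            if (locked.compareAndSet(false, true)) {
                return;
            }
        }
    }

    void unlock() {
        locked.set(false);
    }
}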

Re: how to replace Unsafe.objectFieldOffset in jdk 11

2019-07-17 Thread Vitaly Davidovich
VarHandle does provide suitable (and better) replacements for the various Unsafe.get/putXXX methods. But, U.objectFieldOffset() is the only (easy?) Java way to inspect the layout of a class; e.g. say you apply a hacky class-hierarchy based cacheline padding to a field, and then want to assert
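A rough sketch of that pattern, with made-up class and field names and the usual caveat that field layout is a JVM implementation detail: class-hierarchy padding in front of a hot field, then Unsafe.objectFieldOffset() used to assert that the padding actually landed before it (assuming sun.misc.Unsafe is still reachable via the theUnsafe field).

import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch only: hierarchy-based padding plus a layout assertion via
// Unsafe.objectFieldOffset(). Names are hypothetical.
class PadBefore { long p0, p1, p2, p3, p4, p5, p6, p7; }

class Hot extends PadBefore {
    volatile long counter;   // the field we want isolated from its neighbors
}

public class LayoutCheck {
    public static void main(String[] args) throws Exception {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        long minPad = Long.MAX_VALUE;
        for (Field pad : PadBefore.class.getDeclaredFields()) {
            minPad = Math.min(minPad, unsafe.objectFieldOffset(pad));
        }
        long counterOffset = unsafe.objectFieldOffset(Hot.class.getDeclaredField("counter"));

        System.out.println("first pad @ " + minPad + ", counter @ " + counterOffset);
        // Expect the 8 padding longs (64 bytes) to sit between the pads and counter.
        if (counterOffset - minPad < 64) {
            throw new AssertionError("padding did not end up in front of counter");
        }
    }
}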

Re: Varargs vs. explicit param method call

2018-05-06 Thread Vitaly Davidovich
Your understanding of how varargs calls are made is correct - it's nothing more than sugar for an allocated array to store the args. Your bench, however, explicitly disables inlining of the varargs method, and thus prevents escape analysis from potentially eliminating the array allocation. Try
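To make the point concrete, a small sketch (mine, not the benchmark from the thread): the varargs call compiles to an array allocation at the call site, which the JIT can only remove via escape analysis if the callee is inlined; fixed-arity overloads sidestep the allocation entirely.

// Illustrative only: varargs desugaring vs. fixed-arity overloads.
public class VarargsDemo {
    static long sumVarargs(long... values) {      // call sites allocate a long[]
        long s = 0;
        for (long v : values) s += v;
        return s;
    }

    // Common workaround: fixed-arity overloads for the hot arities,
    // so no array is allocated at the call site.
    static long sum(long a, long b) { return a + b; }
    static long sum(long a, long b, long c) { return a + b + c; }

    public static void main(String[] args) {
        // javac desugars this to: sumVarargs(new long[] {1, 2, 3})
        System.out.println(sumVarargs(1, 2, 3));
        System.out.println(sum(1, 2, 3));
    }
}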

Re: Nanotrusting the Nanotime and amortization.

2018-04-25 Thread Vitaly Davidovich
On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev wrote: > On 04/24/2018 10:44 PM, John Hening wrote: > > I'm reading the great article from https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks Aleksey! :)) and I am not sure whether I understand …

Re: Disk-based logger - write pretouch

2017-07-10 Thread Vitaly Davidovich
A few suggestions: 1) have you tried just reading the data in the prefaulting code, instead of dirtying it with a dummy write? Since this is a disk backed mapping, it should page fault and map the underlying file data (rather than mapping to a zero page, e.g.). At a high rate of dirtying, this
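A minimal sketch of suggestion 1 (my own code, assuming a 4 KiB page size): walk the file-backed mapping and touch one byte per page with a load, so the pages are faulted in and mapped to the existing file data without being dirtied.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of a read-only pretouch over a file-backed mapping.
public class Pretouch {
    private static final int PAGE_SIZE = 4096;   // assumed page size

    static long pretouchByReading(MappedByteBuffer buf) {
        long sink = 0;
        for (int pos = 0; pos < buf.capacity(); pos += PAGE_SIZE) {
            sink += buf.get(pos);   // read-only touch: faults the page in, no dirtying
        }
        return sink;                // return the sink so the loop isn't dead code
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of(args[0]),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            System.out.println(pretouchByReading(buf));
        }
    }
}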

Re: Prefetching and false sharing

2017-01-29 Thread Vitaly Davidovich
This. Also, I think the (Intel) adjacent sector prefetch is a feature enabled through BIOS. I think that will pull the adjacent line to L1, whereas the spatial prefetcher is probably for streaming accesses that are loading L2. Also, I'd run the bench without atomic ops - just relaxed (atomic)
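For reference, a hedged sketch of padding that also accounts for the adjacent-line pairing: @Contended pads by 128 bytes by default, which covers two 64-byte lines. The annotation lives in jdk.internal.vm.annotation (JDK 9+) and, as far as I recall, needs -XX:-RestrictContended plus an --add-exports flag for code outside the JDK; treat those flags as an assumption to verify.

import jdk.internal.vm.annotation.Contended;

// Sketch only: two hot fields written by different threads, each in its own
// contention group so the default 128-byte padding keeps them apart.
public class Counters {
    @Contended("a")
    volatile long writtenByThreadA;

    @Contended("b")
    volatile long writtenByThreadB;
}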

Re: SMP vs AMP: watch your cacheline sizes

2017-01-25 Thread Vitaly Davidovich
…2017 01:11 AM, Ross Bencina wrote: >>> On 25/01/2017 9:31 AM, Vitaly Davidovich wrote: >>>> Interesting (not just) Mono bug: http://www.mono-project.com/news/2016/09/12/arm64-icache/ >>> Sc…

SMP vs AMP: watch your cacheline sizes

2017-01-24 Thread Vitaly Davidovich
Interesting (not just) Mono bug: http://www.mono-project.com/news/2016/09/12/arm64-icache/ -- Sent from my phone

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
And should also mention that doing very early load scheduling will increase register pressure as that value will need to be kept live across more instructions. Stack spills and reloads suck in a hot/tight code sequence. On Tue, Jan 17, 2017 at 7:08 PM Vitaly Davidovich <vita...@gmail.com>

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
…aligned, so the guarantees of an atomic write don't apply, cache or no cache. > On Wed, Jan 18, 2017 at 8:02 AM, Vitaly Davidovich <vita...@gmail.com> wrote: >> On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev …

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
The cache miss latency can be hidden either by this load being done ahead of time or by other instructions that can execute while this load is outstanding. So breaking dependency chains is good, but extending the distance like this seems weird and may hurt common cases. If ICC does this …

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
> …execute these instructions in OOO-manner.
> But if you schedule them this way
>   mov (%rax), %rbx
>   ... few instructions
>   cmp %rbx, %rdx
>   ... few instructions
>   jxx Lxxx
> It would be possible to execute them out-of-order and calculate something additional. …

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev <aleksey.shipi...@gmail.com> wrote: > On 01/17/2017 12:55 PM, Vitaly Davidovich wrote: > > Atomicity of values isn't something I'd assume happens automatically. Word tearing isn't observable from single threaded co…

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
Atomicity of values isn't something I'd assume happens automatically. Word tearing isn't observable from single threaded code. I think the only thing you can safely and portably assume is the high level "single threaded observable behavior will occur" statement. It's also interesting to note
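A small illustration of that point (mine, not from the thread): tearing of a plain long can never be observed by the thread doing the writes, only by another thread, and then only on JVMs that split 64-bit plain accesses; the loop below may simply run forever where such accesses happen to be atomic.

public class WordTearing {
    static long value;   // plain 64-bit field: access atomicity is not guaranteed

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            for (;;) {
                value = 0L;
                value = -1L;   // 0xFFFFFFFFFFFFFFFF
            }
        });
        writer.setDaemon(true);
        writer.start();

        // The writer itself can never see a torn value; a second thread might,
        // e.g. on some 32-bit VMs that split plain 64-bit accesses.
        for (;;) {
            long v = value;
            if (v != 0L && v != -1L) {
                System.out.println("torn read: 0x" + Long.toHexString(v));
                return;
            }
        }
    }
}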

Re: Operation Reordering

2017-01-16 Thread Vitaly Davidovich
Depends on which hardware. For instance, x86/64 is very specific about what memory operations can be reordered (for cacheable operations), and two stores aren't reordered. The only reordering is stores followed by loads, where the load can appear to reorder with the preceding store. On Mon, Jan
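A hand-rolled litmus test for that store-load case (my sketch; a jcstress test would be the proper tool): with plain int accesses, both threads can observe 0 because each store may still be sitting in the store buffer when the following load executes; declaring x and y volatile rules the (0, 0) outcome out.

import java.util.concurrent.CyclicBarrier;

// Store->load reordering litmus test, hand-rolled for illustration.
public class StoreLoad {
    static int x, y, r1, r2;

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1_000_000; i++) {
            x = 0; y = 0;
            CyclicBarrier start = new CyclicBarrier(2);

            Thread t1 = new Thread(() -> { await(start); x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { await(start); y = 1; r2 = x; });
            t1.start(); t2.start();
            t1.join(); t2.join();

            if (r1 == 0 && r2 == 0) {
                System.out.println("observed store->load reordering on iteration " + i);
                return;
            }
        }
        System.out.println("no reordering observed (which is not a guarantee it cannot happen)");
    }

    private static void await(CyclicBarrier b) {
        try { b.await(); } catch (Exception e) { throw new RuntimeException(e); }
    }
}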

Re: How hardware implements CAS

2017-01-04 Thread Vitaly Davidovich
…(even if the value at the address is still the expected one). On Wed, Jan 4, 2017 at 2:59 PM, Vitaly Davidovich <vita...@gmail.com> wrote: > Probably worth a mention that "CAS" is a bit too generic. For instance, you can have weak and strong CAS, with some architectures o…
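In Java terms the distinction looks roughly like this (my own sketch): VarHandle.weakCompareAndSetPlain is allowed to fail spuriously, even when the current value equals the expected one, so it only makes sense inside a retry loop, while compareAndSet is the strong form.

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: weak CAS in a retry loop vs. a single strong CAS.
public class WeakVsStrongCas {
    private volatile long counter;

    private static final VarHandle COUNTER;
    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findVarHandle(WeakVsStrongCas.class, "counter", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    long incrementWithWeakCas() {
        for (;;) {
            long current = counter;
            // May fail spuriously; the retry loop makes that harmless.
            if (COUNTER.weakCompareAndSetPlain(this, current, current + 1)) {
                return current + 1;
            }
        }
    }

    boolean publishOnce(long value) {
        // Strong CAS: fails only if the value is genuinely not 0.
        return COUNTER.compareAndSet(this, 0L, value);
    }
}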

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
On Thu, Dec 22, 2016 at 3:58 PM, Gil Tene <g...@azul.com> wrote: > On Thursday, December 22, 2016 at 10:46:47 AM UTC-8, Vitaly Davidovich wrote: >> On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene <g...@azul.com> wrote: >>> …

Re: Any serious Rust users here?

2016-12-22 Thread Vitaly Davidovich
Rajiv/Marshall, Thanks for your comments. I guess I should rephrase my initial post - I'm *particularly* interested in production and migration scenarios/stories, but happy to hear others' casual dabbling experience as well. I agree on the compile time, but there's good news and bad news. The

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene <g...@azul.com> wrote: > On Thursday, December 22, 2016 at 9:33:09 AM UTC-8, Vitaly Davidovich wrote: >> On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene <g...@azul.com> wrote: >>> …

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
> …small incremental steps), or a combination of the two. And yes, doing that (concurrent generational GC) while supporting great latency, high throughput, and high efficiency all at the same time, is very possible. It is NOT the hard or inherent tradeoff many people seem to …

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
…choice for them and how they see Go being used. Mind you, I'm not a fan nor a user of Go, so I'm referring purely to their stipulated strategy on how to evolve their GC. On Thu, Dec 22, 2016 at 7:37 AM Remi Forax <fo...@univ-mlv.fr> wrote: …

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
FWIW, I think the Go team is right in favoring lower latency over throughput of their GC given the expected usage scenarios for Go. In fact, most of the (Hotspot based) Java GC horror stories involve very long pauses (G1 and CMS not excluded) - I've yet to hear anyone complain that their "Big

Any serious Rust users here?

2016-12-22 Thread Vitaly Davidovich
Curious if anyone on this list is running any non-trivial Rust code in production? And if so, would love to hear some thoughts on how that's going. Also, if the code either interops with existing c/c++/java code or is a replacement/rewrite/port of code from those languages, interested to hear

Re: Single writer counter: how expensive is a volatile read?

2016-10-30 Thread Vitaly Davidovich
On Sunday, October 30, 2016, Aleksey Shipilev <aleksey.shipi...@gmail.com> wrote: > On 10/29/2016 10:31 PM, Vitaly Davidovich wrote: > > There's one thing I still can't get someone at Oracle to clarify, which is whether getOpaque ensures atomicity of the read. I believ…
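For context, a sketch of the kind of single-writer counter in question (my framing, and it assumes VarHandle opaque mode gives per-access atomicity and eventual visibility with no ordering of surrounding accesses, which is exactly the point being clarified in the thread):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: one thread increments with a plain read plus an opaque store,
// any thread reads with an opaque load.
public class SingleWriterCounter {
    private long count;   // written by exactly one thread

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(SingleWriterCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Single writer: plain read of its own last value, opaque publish.
    void increment() {
        COUNT.setOpaque(this, (long) COUNT.get(this) + 1L);
    }

    // Any thread: an opaque read instead of full volatile-read semantics.
    long read() {
        return (long) COUNT.getOpaque(this);
    }
}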

Re: Unchecked exceptions for IO considered harmful.

2016-08-15 Thread Vitaly Davidovich
Why is it egregious? It's detailed in the types of exceptions it throws, yes, but that's good assuming you want to handle some of those types (and there are cases where those exceptions can be handled properly). Even before ReflectiveOperationException, you could use multi-catch since Java 7 to
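A quick sketch of that multi-catch point (my own example using reflection): the individual reflective exception types can be handled in one clause on Java 7 and later, even without leaning on the ReflectiveOperationException supertype.

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch: one catch clause covering several checked reflective exceptions.
public class MultiCatchDemo {
    static Object invokeToString(Object target) {
        try {
            Method m = target.getClass().getMethod("toString");
            return m.invoke(target);
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            // Handle all three reflective failures the same way.
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeToString("hello"));
    }
}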