Single writer counter: how expensive is a volatile read?

2016-10-29 Thread Peter Veentjer
How expensive is a volatile read for implementing a single writer performance counter? One can easily create some kind of performance counter using an AtomicLong or using a LongAdder if there is contention. Some example code: public class Counter { private final AtomicLong c = new Atomic

Re: Single writer counter: how expensive is a volatile read?

2016-10-29 Thread Peter Veentjer
hod will only be called every X seconds. So a volatile read is fine. On Saturday, October 29, 2016 at 11:13:38 AM UTC+3, Peter Veentjer wrote: > > How expensive is a volatile read for implementing a single writer > performance counter? > > One can easily create some kind of perfo
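
A minimal sketch of the kind of single-writer counter discussed in this thread, assuming exactly one thread ever calls inc() and any thread may occasionally call get(). The class name SingleWriterCounter and its methods are hypothetical and not taken from the original post; the writer uses AtomicLong.lazySet (an ordered store without the full fence of set), while the infrequent reader does an ordinary volatile read.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-writer counter: only ONE thread may call inc().
class SingleWriterCounter {
    private final AtomicLong c = new AtomicLong();

    // Writer side: read-modify-write without CAS. This is only safe because
    // there is a single writer. lazySet is an ordered (release) store.
    void inc() {
        c.lazySet(c.get() + 1);
    }

    // Reader side: a volatile read; expected to be called rarely
    // (for example by a metrics thread every few seconds).
    long get() {
        return c.get();
    }
}
```

With more than one writer this sketch would lose updates; in that case AtomicLong.incrementAndGet or a LongAdder is the safer choice.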

private final static optimization

2016-12-18 Thread Peter Veentjer
The following question comes from curiosity about how the compiler optimizes private final static fields. I created a very contrived example. The idea is to have some kind of switch that can be enabled/disabled when the JVM starts up. public class Foo{ private final static boolean ENABLED = B

Re: private final static optimization

2016-12-18 Thread Peter Veentjer
Hi Martin, so the JIT is able to remove the if(ENABLED) branch if ENABLED = false? How does it deal with reflection? Is the JIT triggered to deoptimize the code when ENABLED changes? On Sunday, December 18, 2016 at 10:50:48 AM UTC+2, Peter Veentjer wrote: > > The following question
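
A minimal sketch of the startup switch being asked about. The class name FeatureSwitch and the system property name "feature.enabled" are hypothetical; the original example is truncated, so this is only an illustration of the pattern. The field is initialized once during class initialization, and HotSpot's JIT compilers generally treat such static final fields as constants, which is what allows the guarded branch to be dropped from compiled code when the value is false.

```java
// Hypothetical startup switch; enable with -Dfeature.enabled=true (assumed name).
class FeatureSwitch {
    // Evaluated once, when the class is initialized.
    private static final boolean ENABLED = Boolean.getBoolean("feature.enabled");

    private long calls;

    void doWork() {
        // Candidate for constant folding: with ENABLED == false the JIT
        // can typically eliminate this branch entirely.
        if (ENABLED) {
            calls++;
        }
    }

    long calls() {
        return calls;
    }
}
```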

Method inlining: why does MaxInlineLevel exist?

2017-01-07 Thread Peter Veentjer
Perhaps a naive question, but why is there a max inlining depth? I understand there is a limit on the maximum inline size of a method; AFAIK the reason behind this limitation is that you otherwise would get a huge amount of code. And this could e.g. reduce the effectiveness of the instruct

Aeron zero'ing buffer?

2017-05-28 Thread Peter Veentjer
In Martin Thompson's talks about Aeron he mentions writing threads doing an increment on an AtomicLong counter to claim a region in a buffer, but the initial 4 bytes for the length field aren't written, but only on completion of the frame, the length is set. This signals to the reader of the b

Re: Aeron zero'ing buffer?

2017-05-28 Thread Peter Veentjer
otocol takes care it. But would like to get confirmation anyway. On Monday, May 29, 2017 at 6:17:33 AM UTC+3, Peter Veentjer wrote: > > In Martin Thompson's talks about Aeron he mentions writing threads doing > an increment on an AtomicLong counter to claim a region in a buffer, b

Re: Aeron zero'ing buffer?

2017-05-30 Thread Peter Veentjer
Thanks everyone. Another related question. When an atomic integer is used so that writers can claim their segment in the buffer, what prevents wrapping of this atomic integer and falsely assuming you have allocated a section in the buffer? So imagine the buffer is full, and the atomic integer >

Re: Detailed post on using SEDA-like algorithms to improve IPC

2017-07-23 Thread Peter Veentjer
Indeed a very interesting read. On Monday, July 17, 2017 at 5:44:13 PM UTC+3, Avi Kivity wrote: > > Some time ago I posted about how we ~ doubled our IPC; here is a > detailed blog post explaining the problem and solution: > > > > http://www.scylladb.com/2017/07/06/scyllas-approach-improve-perf

failing to understand the issues with transparent huge paging

2017-08-07 Thread Peter Veentjer
Hi Everyone, I am failing to understand the problem with transparent huge pages. I 'understand' how normal pages work. A page is typically 4 KB in a virtual address space; each process has its own. I understand how the TLB fits in; a cache providing a mapping of virtual to real addresses t

Re: failing to understand the issues with transparent huge paging

2017-08-07 Thread Peter Veentjer
This can lead to significant > pauses as the kernel decides that now is the time to rearrange your > process's memory to be more or less huge-page-y. I have experienced this > even with desktop java apps with relatively small heaps. > > You can look for CONFIG_TRANSPARENT_HUGEPAGE

Re: failing to understand the issues with transparent huge paging

2017-08-09 Thread Peter Veentjer
Thanks for your very useful replies Gil. Question: Using huge pages can give a big performance boost: https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/ $ time java -Xms4T -Xmx4T -XX:-UseTransparentHugePages -XX:+AlwaysPreTouch real 13m58.167s user 43m37.519s sys 1011m25.

Confusion regarding 'mark-sweep-compact' naming

2017-08-13 Thread Peter Veentjer
I have been improving my gc knowledge and there is some confusion on my side regarding the 'mark-sweep-compact' algorithm I frequently see mentioned in posts, articles and some not too formal books on the topic. E.g. https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objec

Confusion high performance loop

2017-12-27 Thread Peter Veentjer
As part of an experiment, I'm working on querying large volumes of data which is stored offheap. The content of each record is stored in a chunk of offheap memory. So instead of having an array of object references, it is an array of records (no pointer chasing). My confusion is about some cod

Re: Confusion high performance loop

2017-12-27 Thread Peter Veentjer
rmally apply if an int was used instead. You might want to see what > JITWatch can tell you. > > — Kirk > > On Dec 27, 2017, at 10:09 AM, Peter Veentjer > wrote: > > As part of an experiment, I'm working on querying large volumes of data > which is stored offheap

Re: Confusion high performance loop

2017-12-27 Thread Peter Veentjer
id makes two…. ;-) > > — Kirk > > On Dec 27, 2017, at 11:23 AM, Peter Veentjer > wrote: > > Hi Kirk, > > thanks for your reply. > > Unfortunately I made a very shameful mistake. The println at the end of > the loop always gets called; even if nothing

Determine memory bandwidth machine

2018-01-14 Thread Peter Veentjer
I'm working on some very simple aggregations on huge chunks of offheap memory (500GB+) for a hackathon. This is done using a very simple stride; every iteration the address increases by 20 bytes. So the prefetcher should not have any problems with it. According to my calculations I'm currently

Re: Determine memory bandwidth machine

2018-01-14 Thread Peter Veentjer
Some additional information. The memory is broken up into chunks of 8MB and are executed in parallel using the fork join framework. So all cores are busy iterating over the memory. On Sunday, January 14, 2018 at 8:44:00 PM UTC+2, Peter Veentjer wrote: > > I'm working on some

Re: Determine memory bandwidth machine

2018-01-14 Thread Peter Veentjer
maximum bandwidth of the memory bus. On Sunday, January 14, 2018 at 8:44:00 PM UTC+2, Peter Veentjer wrote: > > I'm working on some very simple aggregations on huge chunks of offheap > memory (500GB+) for a hackathon. This is done using a very simple stride; > every iteration th

LOCK XADD wait-free or lock free.

2018-08-24 Thread Peter Veentjer
I'm polishing up my lock free knowledge and one question keeps on bugging me. The question is about the LOCK XADD and why it is considered to be wait free. AFAIK for wait freedom there needs to be some fairness. So imagine a concurrent counter using a spin on a cas to increment a counter, then
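
A small sketch contrasting the two increment styles the question refers to. The class and method names are hypothetical. The CAS loop is lock free but not wait free: an individual thread can in principle keep losing the race and retry indefinitely. The fetch-and-add variant has no retry loop at the Java level; on x86-64, HotSpot typically compiles AtomicLong.incrementAndGet to a LOCK XADD instruction, although that mapping is an implementation detail and not something the API guarantees.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper showing two ways to increment a shared counter.
class CounterIncrements {

    // Lock free: some thread always makes progress, but this particular
    // thread may have to retry whenever another thread wins the CAS.
    static long incrementWithCasLoop(AtomicLong counter) {
        for (;;) {
            long current = counter.get();
            long next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Single fetch-and-add operation, no retry loop in this code.
    static long incrementWithFetchAdd(AtomicLong counter) {
        return counter.incrementAndGet();
    }
}
```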

Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Peter Veentjer
ee "Wait-Free Queues With > Multiple Enqueuers and Dequeuers" > <http://www.cs.technion.ac.il/~erez/Papers/wfquque-ppopp.pdf>, where wait > free queues are implemented using CAS constructs. > > On Friday, August 24, 2018 at 10:00:22 PM UTC-7, Peter Veentjer wrote:

RSS and CPU selection

2019-03-15 Thread Peter Veentjer
I have a question about RSS and how a CPU is selected to process the interrupt. So with RSS you have multiple rx-queues and each rx-queue has an IRQ associated with it. Each CPU can pick up the IRQ as clearly explained here: https://www.kernel.org/doc/Documentation/networking/scaling.txt It is po

Re: RSS and CPU selection

2019-03-15 Thread Peter Veentjer
x) , irq affinities > (/proc/interrupts/$irq/smp_affinity) > or configure relevant stack if you're using solarflare (through onload > profile). > > Hope this helps. > > > On Fri, 15 Mar 2019, 14:40 Peter Veentjer, > wrote: > >> I have a question about

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-05 Thread Peter Veentjer
e set'. Get rid of the sweep and complexity goes down to 'linear to the live set'. What is the added value of such an inefficient GC implementation? I'm sure there is value; but currently it doesn't compute. On Monday, August 14, 2017 at 8:47:01 AM UTC+3, Peter V

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-13 Thread Peter Veentjer
Thanks for your answer Aleksey, comments inline. On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote: > > On 8/6/19 7:38 AM, Peter Veentjer wrote: > > There is still some confusion on my side. This time it is regarding the > algorithmic complexity of

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-13 Thread Peter Veentjer
ep-compact. On Wednesday, August 14, 2019 at 9:34:15 AM UTC+3, Peter Veentjer wrote: > > Thanks for your answer Aleksey, > > comments inline. > > On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote: >> >> On 8/6/19 7:38 AM, Peter Veentjer wrote: >&

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-14 Thread Peter Veentjer
Hi Gil, thanks for your detailed answer. So mark-sweep-compact implementations really do exist. I'll need to update a set of flashcards. On Wednesday, August 14, 2019 at 12:30:10 PM UTC+3, Gil Tene wrote: > > > On Aug 13, 2019, at 11:34 PM, Peter Veentjer > wrote: > >

purpose of an LFENCE

2019-10-04 Thread Peter Veentjer
I have been checking out the new fence APIs in Java (Unsafe/VarHandle). I understand how the higher-level APIs are translated to the logical fences. E.g. release fence -> LoadStore+StoreStore. There are some great posts, including https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

Re: purpose of an LFENCE

2019-10-10 Thread Peter Veentjer
ch condition #2 would be needed. > > On Fri, Oct 4, 2019 at 10:10 AM Peter Veentjer > wrote: > >> I have been checking out the new fence APIs in Java >> (Unsafe/VarHandle). >> >> I understand how the higher-level APIs are translated to the logica

Re: JVMs and the new silicon

2019-11-05 Thread Peter Veentjer
The track-record of Java isn't very good IMHO. For example: 1) the Hotspot JVM isn't very good at identifying where SIMD instructions should be used (Azul Zing with the LLVM backend does a better job). 2) There is no official integration for GPGPU (General-Purpose computing on Graphics Processin

MESI and 'atomicity'

2019-11-25 Thread Peter Veentjer
I have a question about MESI. My question isn't about atomic operations; but about an ordinary write to the same cacheline done by 2 cores. If a CPU does a write, the write is placed on the store buffer. Then the CPU will send an invalidation request to the other cores (RFO) for the given cach

MMU and Meltdown

2020-03-12 Thread Peter Veentjer
I'm currently investigating Meltdown in combination with X86. The cause of Meltdown is that due to out of order execution, access to memory can be done in a page that doesn't have the correct permissions to be accessed. And using a side channel attack based on cache timing every byte from that

Re: Use HashMap for better multithreaded performance

2020-07-29 Thread Peter Veentjer
The volatile variable ensures that there is a happens-before edge between the write and the read. So the synchronized block isn't needed; it doesn't even provide any visibility guarantee because: - currentMap = newMap is not set in the synchronized block - currentMap isn't read in the synchronize
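
A minimal sketch of the pattern this reply describes: a plain HashMap published through a volatile reference. The class name CopyOnWriteLookup and its methods are hypothetical, and the sketch assumes a single writer thread and that a map is never modified after it has been published. Under those assumptions the volatile write of currentMap happens-before any volatile read that observes it, so readers see a fully populated map without any synchronized block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write lookup table; assumes ONE writer thread.
class CopyOnWriteLookup {
    // The volatile reference is the only synchronization between threads.
    private volatile Map<String, String> currentMap = new HashMap<>();

    // Readers: one volatile read, then plain reads on an unchanging map.
    String get(String key) {
        return currentMap.get(key);
    }

    // Writer: copy, modify the private copy, then publish it.
    void put(String key, String value) {
        Map<String, String> newMap = new HashMap<>(currentMap);
        newMap.put(key, value);
        currentMap = newMap; // volatile write publishes the new map
    }
}
```

With several writers, updates could be lost because two threads may copy the same map; writers would then need mutual exclusion among themselves, while readers could stay lock free.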

Re: Reliably allocating large arrays

2020-10-01 Thread Peter Veentjer
I forgot a very important part. There is a memory allocator used by the process. And this memory allocator either makes use of the program break or the mmap for an anonymous memory mapping. The ByteBuffer.allocateDirect forwards allocation requests to this memory allocator; it will not directly a

Re: Reliably allocating large arrays

2020-10-01 Thread Peter Veentjer
I should not have opened my mouth :) I thought multiple humongous regions for a single array did not need to be contiguous. But apparently this is incorrect: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc_tuning.html On Fri, Oct 2, 2020 at 8:19 AM Peter Veentjer wrote

Re: Thread safety of the shared piece of memory - Java Memory Model

2021-07-29 Thread Peter Veentjer
There is a happens before edge between calling the start method of a thread and the thread running. On JMM level that is sufficient. On Thu, Jul 29, 2021, 18:47 r r wrote: > Hello, > > We have the simple pseudocode: > > void main(String[] args) { > MemoryBuffer b = new MemoryBuffer(); // it
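
A small sketch of the guarantee mentioned in this reply, using a plain int array as a stand-in for the MemoryBuffer in the quoted pseudocode (the class name StartVisibility and the array stand-in are assumptions for illustration). Writes made before Thread.start() are visible to the started thread, and everything the thread did is visible after Thread.join() returns, so no volatile or lock is needed here.

```java
// Hypothetical demo of the start()/join() happens-before edges.
class StartVisibility {

    static int readAfterStart() {
        int[] buffer = new int[1]; // plain, non-volatile memory
        int[] seen = new int[1];

        buffer[0] = 42;            // written before start()

        // start() happens-before every action in the new thread,
        // so the thread is guaranteed to read 42.
        Thread t = new Thread(() -> seen[0] = buffer[0]);
        t.start();

        try {
            // All actions of t happen-before join() returns.
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```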

Interrupts and hyperthreading.

2022-07-01 Thread Peter Veentjer
Hi, I'm reading the following book "Developing High-Frequency Trading Systems". My goal is not to write any high-frequency trading systems, but to get some insights into the domain and learn as much as possible from the applied techniques. It is a Packt book and they are not known for their qualit

Re: Interrupts and hyperthreading.

2022-07-02 Thread Peter Veentjer
.html > > > On Sat, Jul 2, 2022 at 7:16 AM peter royal wrote: > >> is it possible the book was accurate at the time of writing but hardware >> has since improved? >> -pete >> >> -- >> (peter.royal|osi)@pobox.com - http://fotap.org/~osi >> >>

Re: Java Memory Model, ConcurrencyHashMap and guarantee of iterator

2022-09-12 Thread Peter Veentjer
If T1 would run first, the content of the ConcurrentHashMap is (1,true), and therefore there is only 1 value. So it will print 'true' and not 'true,true' because T2 has not run yet. On Mon, Sep 12, 2022 at 3:31 PM r r wrote: > Hello, > let's look for the following piece of code: > > c = new Con

Re: Java Memory Model, ConcurrencyHashMap and guarantee of iterator

2022-09-12 Thread Peter Veentjer
ent: > > > Actions in a thread prior to placing an object into any concurrent > collection happen-before actions subsequent to the access or removal of > that element from the collection in another thread. > > So either one of the threads will print (true, true) I guess. > > On

Re: volatile reads does happen before volatile write in JMM?

2022-09-14 Thread Peter Veentjer
On Wed, Sep 14, 2022 at 1:51 PM r r wrote: > Hello, > let's look for the following piece of code: > > int x; > volatile boolean v; // v = false by default > T1: > x = 1; (1) > v = true;(2) > doSth(3) > > T2: >doSth2(4) >if (v) {} (5) > > > When T2

Re: volatile reads does happen before volatile write in JMM?

2022-09-15 Thread Peter Veentjer
ertainly have a happens-before edge if v were volatile. > > On Wednesday, 14 September 2022 at 14:04:16 UTC+2 r r wrote: > >> Thanks >> >> On Wed, 14 Sep 2022, 13:26 Peter Veentjer >> wrote: >> >>> >>> >>> On Wed, Sep

Re: Difference between set and lazySet on AtomicLong

2022-09-18 Thread Peter Veentjer
On Sun, Sep 18, 2022 at 9:24 AM Antonio Rafael Rodrigues < antonio.rafael...@gmail.com> wrote: > Hello > > I have two questions. > > 1) > As we can see in the source code of AtomicLong, there should be a > difference between calling set and lazySet, given that *set* calls > Unsafe.putLongVolatile

Re: Difference between set and lazySet on AtomicLong

2022-09-18 Thread Peter Veentjer
This is also a very good read for your particular question: https://shipilev.net/blog/2014/on-the-fence-with-dependencies/ On Sun, Sep 18, 2022 at 10:31 AM Peter Veentjer wrote: > > > On Sun, Sep 18, 2022 at 9:24 AM Antonio Rafael Rodrigues < > antonio.rafael...@gmail.com>

Re: Difference between set and lazySet on AtomicLong

2022-09-18 Thread Peter Veentjer
erent address. On Sun, Sep 18, 2022 at 10:36 AM Peter Veentjer wrote: > This is also a very good read for your particular question: > > https://shipilev.net/blog/2014/on-the-fence-with-dependencies/ > > On Sun, Sep 18, 2022 at 10:31 AM Peter Veentjer > wrote: > >> &g

Re: JMM - is this program data race free?

2022-10-23 Thread Peter Veentjer
The program isn't clear to me. I guess you want something like this: int x; volatile int v; CPU1: write(x, 1) (1) write(v, 1)(2) CPU2 read(v) (3) read(x)(4) Then there exists an execution where (4) sees value written by (1) it is in a data race with
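
A minimal Java sketch of the x/v program outlined above, with one added assumption: the reader only touches x after it has observed v == 1. The class name MessagePassing and the -1 sentinel are hypothetical. In that case the volatile write (2) happens-before the volatile read (3) that sees it, so the read of x at (4) is ordered after the write at (1) and must return 1.

```java
// Hypothetical message-passing example: plain x, volatile v.
class MessagePassing {
    int x;           // plain field
    volatile int v;  // volatile flag

    // CPU1
    void writer() {
        x = 1; // (1) plain write
        v = 1; // (2) volatile write, publishes (1)
    }

    // CPU2: returns x once v == 1 has been observed, otherwise -1.
    int reader() {
        if (v == 1) {   // (3) volatile read
            return x;   // (4) ordered after (1) via the edge (2) -> (3)
        }
        return -1;      // flag not seen yet; x is deliberately not read
    }
}
```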