How expensive is a volatile read for implementing a single writer
performance counter?
One can easily create some kind of performance counter using an AtomicLong,
or a LongAdder if there is contention.
Some example code:
public class Counter {
private final AtomicLong c = new AtomicLong();
hod will only be called every X seconds. So a volatile read is
fine.
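A minimal sketch of the single-writer counter being discussed; the use of lazySet for the write side is my assumption about the intended pattern, not taken from the truncated original:

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-writer performance counter: one thread increments, a monitoring
// thread reads it only every X seconds, so a volatile read is cheap.
public class Counter {
    private final AtomicLong c = new AtomicLong();

    // Called only by the single writer thread.
    public void inc() {
        // lazySet avoids the full store fence of a volatile write;
        // this is safe precisely because there is only one writer.
        c.lazySet(c.get() + 1);
    }

    // Called occasionally by the monitoring thread: a plain volatile read.
    public long get() {
        return c.get();
    }
}
```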
On Saturday, October 29, 2016 at 11:13:38 AM UTC+3, Peter Veentjer wrote:
>
> How expensive is a volatile read for implementing a single writer
> performance counter?
>
> One can easily create some kind of perfo
The following question is out of curiosity about compiler optimization of
private static final fields.
I created a very contrived example. The idea is to have some kind of switch
that can be enabled/disabled when the JVM starts up.
public class Foo{
private final static boolean ENABLED = B
Hi Martin,
so the JIT is able to remove the if(ENABLED) branch if ENABLED is false?
How does it deal with reflection? Is the JIT triggered to deoptimize the
code when ENABLED changes?
On Sunday, December 18, 2016 at 10:50:48 AM UTC+2, Peter Veentjer wrote:
>
> The following question
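The startup-time switch described above can be sketched like this; the system property name is illustrative, not from the original post:

```java
// Sketch of a startup-time switch. Because ENABLED is a static final
// primitive initialized once at class load (here from a system property),
// the JIT treats it as a constant and can eliminate the dead branch.
public class Foo {
    private static final boolean ENABLED = Boolean.getBoolean("foo.enabled");

    public static String work() {
        if (ENABLED) {            // constant-folded; dead code when false
            return "enabled";
        }
        return "disabled";
    }
}
```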
Perhaps a naive question, but why is there a max inlining depth?
I understand there is a limit on the maximum inline size of a method; AFAIK
the reason behind this limitation is that you otherwise would get a huge
amount of code. And this could e.g. reduce the effectiveness of the
instruct
In Martin Thompson's talks about Aeron he mentions writing threads doing an
increment on an AtomicLong counter to claim a region in a buffer, but the
initial 4 bytes for the length field aren't written; only on completion of
the frame is the length set. This signals to the reader of
the b
otocol takes care it. But
would like to get confirmation anyway.
On Monday, May 29, 2017 at 6:17:33 AM UTC+3, Peter Veentjer wrote:
>
> In Martin Thompson's talks about Aeron he mentions writing threads doing
> an increment on an AtomicLong counter to claim a region in a buffer, b
Thanks everyone.
Another related question. When an atomic integer is used so that writers
can claim their segment in the buffer, what prevents wrapping of this atomic
integer and falsely assuming you have allocated a section in the buffer?
So imagine the buffer is full, and the atomic integer >
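One way the wrapping concern can be addressed is to check remaining capacity before claiming; this is a hypothetical sketch (class and field names are mine, not from the thread or from Aeron):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a writer claims a slot only after checking capacity
// against consumer progress, so the sequence never overruns the reader.
class ClaimingBuffer {
    private final int capacity;
    private final AtomicLong claimed = new AtomicLong(); // next free sequence
    private volatile long consumed = 0;                  // consumer progress

    ClaimingBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Returns the claimed slot index, or -1 if the buffer is full.
    long tryClaim() {
        for (;;) {
            long c = claimed.get();
            if (c - consumed >= capacity) {
                return -1; // full: refuse instead of wrapping past the reader
            }
            if (claimed.compareAndSet(c, c + 1)) {
                return c % capacity;
            }
        }
    }
}
```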
Indeed a very interesting read.
On Monday, July 17, 2017 at 5:44:13 PM UTC+3, Avi Kivity wrote:
>
> Some time ago I posted about how we ~ doubled our IPC; here is a
> detailed blog post explaining the problem and solution:
>
>
>
> http://www.scylladb.com/2017/07/06/scyllas-approach-improve-perf
Hi Everyone,
I'm failing to understand the problem with transparent huge pages.
I 'understand' how normal pages work. A page is typically 4 KB in a virtual
address space; each process has its own.
I understand how the TLB fits in; a cache providing a mapping of virtual to
real addresses t
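The back-of-envelope arithmetic for why page size matters to the TLB can be sketched as follows; the TLB entry count is an assumed, typical x86 figure, not something from this thread:

```java
// TLB-reach arithmetic: each TLB entry maps one page, so the memory covered
// without a page-table walk is (entries * pageSize). Bigger pages mean far
// more reach from the same TLB. 1536 entries is an assumed typical L2 TLB size.
public class TlbReach {
    public static long reach(long entries, long pageSize) {
        return entries * pageSize;
    }
    // reach(1536, 4 KB)  = 6 MiB of coverage
    // reach(1536, 2 MB)  = 3 GiB of coverage
}
```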
This can lead to significant
> pauses as the kernel decides that now is the time to rearrange your
> process's memory to be more or less huge-page-y. I have experienced this
> even with desktop java apps with relatively small heaps.
>
> You can look for CONFIG_TRANSPARENT_HUGEPAGE
Thanks for your very useful replies Gil.
Question:
Using huge pages can give a big performance boost:
https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/
$ time java -Xms4T -Xmx4T -XX:-UseTransparentHugePages -XX:+AlwaysPreTouch
real    13m58.167s
user    43m37.519s
sys     1011m25.
I have been improving my gc knowledge and there is some confusion on my
side regarding the 'mark-sweep-compact' algorithm I frequently see
mentioned in posts, articles and some not too formal books on the topic.
E.g.
https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objec
As part of an experiment, I'm working on querying large volumes of data
which is stored offheap.
The content of each record is stored in a chunk of offheap memory. So
instead of having an array of object references, it is an array of records
(no pointer chasing).
My confusion is about some cod
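The record layout described above (a flat chunk of memory instead of an array of object references) can be sketched like this; I use a direct ByteBuffer for illustration, while the original experiment may have used Unsafe, and the 20-byte record size echoes a later post in this digest:

```java
import java.nio.ByteBuffer;

// Sketch: fixed-size records stored contiguously in one off-heap buffer,
// so a scan walks addresses linearly with no pointer chasing.
public class RecordStore {
    private static final int RECORD_SIZE = 20; // assumed: long key + 12 payload bytes
    private final ByteBuffer buf;

    public RecordStore(int records) {
        this.buf = ByteBuffer.allocateDirect(records * RECORD_SIZE);
    }

    public void putKey(int index, long key) {
        buf.putLong(index * RECORD_SIZE, key);   // absolute put, no position change
    }

    public long getKey(int index) {
        return buf.getLong(index * RECORD_SIZE);
    }
}
```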
rmally apply if an int was used instead. You might want to see what
> JITWatch can tell you.
>
> — Kirk
>
> On Dec 27, 2017, at 10:09 AM, Peter Veentjer wrote:
>
> As part of an experiment, I'm working on querying large volumes of data
> which is stored offheap
id makes two…. ;-)
>
> — Kirk
>
> On Dec 27, 2017, at 11:23 AM, Peter Veentjer wrote:
>
> Hi Kirk,
>
> thanks for your reply.
>
> Unfortunately I made a very shameful mistake. The println at the end of
> the loop always gets called, even if nothing
I'm working on some very simple aggregations on huge chunks of offheap
memory (500GB+) for a hackathon. This is done using a very simple stride;
every iteration the address increases by 20 bytes, so the prefetcher
should not have any problems with it.
According to my calculations I'm currently
Some additional information.
The memory is broken up into 8 MB chunks that are processed in parallel
using the fork-join framework, so all cores are busy iterating over the
memory.
On Sunday, January 14, 2018 at 8:44:00 PM UTC+2, Peter Veentjer wrote:
>
> I'm working on some
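The 20-byte-stride aggregation can be sketched per chunk as below; this simulates the scan on an on-heap byte array for self-containment, whereas the original works on off-heap addresses:

```java
// Sketch of the described scan: records every 20 bytes, one long field read
// per record, one chunk summed per fork-join task (chunk shown in isolation).
public class StrideSum {
    static final int STRIDE = 20;

    // Sum the leading 8-byte field of every 20-byte record in a chunk.
    static long sumChunk(byte[] chunk) {
        long sum = 0;
        for (int offset = 0; offset + 8 <= chunk.length; offset += STRIDE) {
            sum += readLong(chunk, offset);
        }
        return sum;
    }

    // Big-endian read of 8 bytes starting at off.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }
}
```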
maximum bandwidth of the memory bus.
On Sunday, January 14, 2018 at 8:44:00 PM UTC+2, Peter Veentjer wrote:
>
> I'm working on some very simple aggregations on huge chunks of offheap
> memory (500GB+) for a hackathon. This is done using a very simple stride;
> every iteration th
I'm polishing up my lock-free knowledge and one question keeps bugging
me.
The question is about LOCK XADD and why it is considered to be wait
free.
AFAIK for wait freedom there needs to be some fairness.
So imagine a concurrent counter using a spin on a CAS to increment a counter,
then
ee "Wait-Free Queues With
> Multiple Enqueuers and Dequeuers"
> <http://www.cs.technion.ac.il/~erez/Papers/wfquque-ppopp.pdf>, where wait
> free queues are implemented using CAS constructs.
>
> On Friday, August 24, 2018 at 10:00:22 PM UTC-7, Peter Veentjer wrote:
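The contrast in question can be shown in Java; on x86, getAndIncrement is intrinsified to LOCK XADD and always completes in one instruction, while the hand-rolled CAS loop may retry indefinitely under contention:

```java
import java.util.concurrent.atomic.AtomicLong;

// Wait-free vs lock-free increment: LOCK XADD cannot fail and needs no
// retry loop; a CAS loop can in principle lose the race forever.
public class IncrementStyles {
    static long xaddStyle(AtomicLong c) {
        return c.getAndIncrement();      // one LOCK XADD on x86, no loop
    }

    static long casLoopStyle(AtomicLong c) {
        for (;;) {
            long cur = c.get();
            if (c.compareAndSet(cur, cur + 1)) {
                return cur;              // retries whenever another thread won
            }
        }
    }
}
```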
I have a question about RSS and how a CPU is selected to process the
interrupt.
So with RSS you have multiple rx-queue and each rx-queue has an IRQ
associated to it. Each CPU can pick up the IRQ as clearly explained here:
https://www.kernel.org/doc/Documentation/networking/scaling.txt
It is po
x) , irq affinities
> (/proc/interrupts/$irq/smp_affinity)
> or configure relevant stack if you're using solarflare (through onload
> profile).
>
> Hope this helps.
>
>
> On Fri, 15 Mar 2019, 14:40 Peter Veentjer wrote:
>
>> I have a question about
e set'. Get rid of the sweep and the complexity goes down
to 'linear to the live set'. What is the added value of such an inefficient
GC implementation?
I'm sure there is value, but currently it doesn't compute.
On Monday, August 14, 2017 at 8:47:01 AM UTC+3, Peter V
Thanks for your answer Aleksey,
comments inline.
On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote:
>
> On 8/6/19 7:38 AM, Peter Veentjer wrote:
> > There is still some confusion on my side. This time it is regarding the
> algorithmic complexity of
ep-compact.
On Wednesday, August 14, 2019 at 9:34:15 AM UTC+3, Peter Veentjer wrote:
>
> Thanks for your answer Aleksey,
>
> comments inline.
>
> On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote:
>>
>> On 8/6/19 7:38 AM, Peter Veentjer wrote:
>&
Hi Gil,
thanks for your detailed answer.
So mark-sweep-compact implementations really do exist. I'll need to update a
set of flashcards.
On Wednesday, August 14, 2019 at 12:30:10 PM UTC+3, Gil Tene wrote:
>
>
> On Aug 13, 2019, at 11:34 PM, Peter Veentjer wrote:
>
>
I have been checking out the new fence APIs in Java (Unsafe/VarHandle).
I understand how the higher-level APIs are translated to the logical fences,
e.g. release fence -> LoadStore+StoreStore. There are some great posts,
including
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
ch
condition #2 would be needed.
>
> On Fri, Oct 4, 2019 at 10:10 AM Peter Veentjer wrote:
>
>> I have been checking out the new fence APIs in Java
>> (Unsafe/VarHandle).
>>
>> I understand how the higher level API are translated to the logica
The track record of Java isn't very good IMHO. For example:
1) the HotSpot JVM isn't very good at identifying where SIMD instructions
should be used (Azul Zing with the LLVM backend does a better job).
2) There is no official integration for GPGPU (General-Purpose computing on
Graphics Processin
I have a question about MESI.
My question isn't about atomic operations, but about an ordinary write to
the same cache line done by 2 cores.
If a CPU does a write, the write is placed in the store buffer.
Then the CPU will send an invalidation request (RFO) to the other cores for
the given cach
I'm currently investigating Meltdown in combination with X86.
The cause of Meltdown is that, due to out-of-order execution, access to
memory can be done in a page that doesn't have the correct permissions to
be accessed. And using a side channel attack based on cache timing every
byte from that
The volatile variable ensures that there is a happens-before edge between
the write and the read.
So the synchronized block isn't needed; it doesn't even provide any
visibility guarantee because:
- currentMap = newMap is not performed inside the synchronized block
- currentMap isn't read in the synchronize
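The publication pattern under discussion can be sketched as follows; class and field names beyond currentMap/newMap are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Safe publication via a volatile field: the volatile write of currentMap
// happens-before any read that observes the new map, so readers see a fully
// populated map without any synchronized block.
public class MapPublisher {
    private volatile Map<String, String> currentMap = new HashMap<>();

    public void republish(Map<String, String> updates) {
        Map<String, String> newMap = new HashMap<>(currentMap);
        newMap.putAll(updates);        // all plain writes happen first...
        currentMap = newMap;           // ...then the volatile write publishes
    }

    public String get(String key) {
        return currentMap.get(key);    // volatile read pairs with the write
    }
}
```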
I forgot a very important part.
There is a memory allocator used by the process, and this memory allocator
makes use of either the program break or mmap for an anonymous memory
mapping.
The ByteBuffer.allocateDirect forwards allocation requests to this memory
allocator; it will not directly a
I should not have opened my mouth :)
I thought multiple humongous regions for a single array did not need to be
contiguous. But apparently this is incorrect:
https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc_tuning.html
On Fri, Oct 2, 2020 at 8:19 AM Peter Veentjer wrote
There is a happens before edge between calling the start method of a thread
and the thread running.
On JMM level that is sufficient.
On Thu, Jul 29, 2021, 18:47 r r wrote:
> Hello,
>
> We have the simple pseudocode:
>
> void main(String[] args) {
> MemoryBuffer b = new MemoryBuffer(); // it
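The guarantee stated above (Thread.start() happens-before the thread's first action) can be demonstrated with a sketch; MemoryBuffer is replaced here by a plain field since the original class isn't shown:

```java
// Thread.start() happens-before the started thread's first action (JLS 17.4.4),
// so a write made before start() is visible in the new thread without locks
// or volatile.
public class StartHb {
    static int plainField;              // deliberately not volatile

    static int runOnce() {
        plainField = 42;                // write before start()
        final int[] seen = new int[1];
        Thread t = new Thread(() -> seen[0] = plainField); // must observe 42
        t.start();                      // establishes the happens-before edge
        try {
            t.join();                   // join() orders seen[0] before our read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return seen[0];
    }
}
```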
Hi,
I'm reading the following book "Developing High-Frequency Trading Systems".
My goal is not to write any high-frequency trading systems, but to get some
insights into the domain and learn as much as possible from the applied
techniques. It is a Packt book and they are not known for their qualit
.html
>
>
> On Sat, Jul 2, 2022 at 7:16 AM peter royal wrote:
>
>> is it possible the book was accurate at the time of writing but hardware
>> has since improved?
>> -pete
>>
>> --
>> (peter.royal|osi)@pobox.com - http://fotap.org/~osi
>>
>>
If T1 ran first, the content of the ConcurrentHashMap is (1, true),
and therefore there is only one value.
So it will print 'true' and not 'true,true', because T2 has not run yet.
On Mon, Sep 12, 2022 at 3:31 PM r r wrote:
> Hello,
> let's look for the following piece of code:
>
> c = new Con
ent:
>
> > Actions in a thread prior to placing an object into any concurrent
> collection happen-before actions subsequent to the access or removal of
> that element from the collection in another thread.
>
> So ether one of threads will print (true, true) I guess.
>
> On
On Wed, Sep 14, 2022 at 1:51 PM r r wrote:
> Hello,
> let's look for the following piece of code:
>
> int x;
> volatile boolean v; // v = false by default
> T1:
> x = 1; (1)
> v = true;(2)
> doSth(3)
>
> T2:
>doSth2(4)
>if (v) {} (5)
>
>
> When T2
ertainly have a
happens-before edge if v were volatile.
>
> Wednesday, September 14, 2022 at 14:04:16 UTC+2, r r wrote:
>
>> Thanks
>>
>> Wed, Sep 14, 2022, 13:26 Peter Veentjer
>> wrote:
>>
>>>
>>>
>>> On Wed, Sep
On Sun, Sep 18, 2022 at 9:24 AM Antonio Rafael Rodrigues <
antonio.rafael...@gmail.com> wrote:
> Hello
>
> I have two questions.
>
> 1)
> As we can see in the source code of AtomicLong, there should be a
> difference between calling set and lazySet, given that *set* calls
> Unsafe.putLongVolatile
This is also a very good read for your particular question:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
On Sun, Sep 18, 2022 at 10:31 AM Peter Veentjer
wrote:
>
>
> On Sun, Sep 18, 2022 at 9:24 AM Antonio Rafael Rodrigues <
> antonio.rafael...@gmail.com>
erent address.
On Sun, Sep 18, 2022 at 10:36 AM Peter Veentjer
wrote:
> This is also a very good read for your particular question:
>
> https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
>
> On Sun, Sep 18, 2022 at 10:31 AM Peter Veentjer
> wrote:
>
>>
&g
The program isn't clear to me. I guess you want something like this:

int x;
volatile int v;

CPU1:
write(x, 1) (1)
write(v, 1) (2)

CPU2:
read(v) (3)
read(x) (4)

Then there exists an execution where (4) sees the value written by (1); it is
in a data race with
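The same two-thread example can be written as real Java; the class and method names are mine, the numbered operations match the pseudocode above:

```java
// Volatile ordering: the volatile write to v and the volatile read of v
// synchronize, so a read of x made after observing v == 1 must see x == 1.
// Reading x without having observed v == 1 would be a data race.
public class VolatileOrdering {
    static int x;
    static volatile int v;

    static void writer() {
        x = 1;          // (1) plain write
        v = 1;          // (2) volatile write releases (1)
    }

    static int reader() {
        if (v == 1) {   // (3) volatile read acquires
            return x;   // (4) guaranteed to see 1
        }
        return -1;      // v not observed; x is not safe to read here
    }
}
```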