Re: LMAX Disruptor vs Agrona RingBuffer (one consumer and one producer)

2024-02-11 Thread Martin Thompson
I did a talk about testing latency on FIFO structures some time ago.

https://vimeo.com/181814364

Benchmarks are here:

https://github.com/real-logic/benchmarks

Best,
Martin...

On Sunday 11 February 2024 at 09:19:38 UTC vitor@gmail.com wrote:

> Hi there,
>
> Having some trouble finding any latency comparison between the LMAX 
> Disruptor and the Agrona RingBuffer.
>
> Which one is better? Or maybe the question is: from a latency perspective, 
> should I go with Agrona or LMAX?
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/mechanical-sympathy/fe552ec3-f6b1-4ea9-ad48-2b072d78c719n%40googlegroups.com.


Re: Simple explanation of Histogram

2019-08-04 Thread Martin Thompson
You can ask here:

https://gitter.im/HdrHistogram/HdrHistogram

On Sunday, 4 August 2019 15:13:58 UTC+1, Mohan Radhakrishnan wrote:
>
> Hi,
>
>   I didn't find another forum for my simple question. But if there is 
> one I can ask there.
>
> I need to record millions of values like these - 0.81, 0.33 and so on. And 
> I need a true min. and true max. Is that possible? If it is not, then I 
> need some basic explanation. Please point out any other material
> if it can help.
>
> Thanks,
> Mohan
>



Re: Probing the CPU for metrics / info

2018-12-01 Thread Martin Thompson


> Thanks everyone who responded, they are good starting points. These look 
> like they will work on Linux or macOS machines; what about Windows? 


For general profiling on Windows, PerfView is worth trying.

https://github.com/Microsoft/perfview

There is an excellent overview by Sasha Goldshtein.

https://www.infoq.com/presentations/perfview-net

Martin...
 



Re: Mechanical sympathy for high schoolers?

2018-10-18 Thread Martin Thompson
Nicely put Greg. Building, debugging, benchmarking, and profiling data
structures is a great exercise which teaches many skills. Placing emphasis
on the debugging, benchmarking, and profiling will develop really useful
tools of the trade. The data structures are a great vehicle for that.
Discussing what is discovered at each stage really helps build
understanding, as you nicely point out regarding what has "changed" between
computers.

Martin...

On Thu, 18 Oct 2018 at 18:01, Greg Young  wrote:

> So I would probably focus on some more basic principles with them and
> focus on what is faster and *why*. I would also *NOT* use javascript if the
> point was to look at efficiency but instead something like C. To be fair
> you can teach an aspiring student enough C to write basic data structures
> very quickly if they already know javascript.
>
> When learning basic data structures and algorithms as an example learning
> sympathy with them is *required* for really understanding them. Let me give
> an example, a linked list or unbalanced tree made up of malloc'ed nodes vs
> a linked list made up of nodes pre-malloced into an array. How does each
> perform? What are the trade offs? What happens when the array is filled? If
> just malloced what happens when we introduce *other* mallocs into our code
> as it runs? Even such a simple implementation which is < 200 loc can go a
> long way towards understanding some trade offs involved. Looking using a
> tool at things like cache misses can also show *why* things may be doing
> what they are doing (eg when array gets too big to fit in cache and you
> suddenly see a slowdown due to the misses!)
>
> You can get pretty deep into what the computer is actually doing *very*
> quickly in such an example. I might also take the example to show the same
> code in assembler (as output by the JIT or compiler), most compilers can
> output, or you can view assembler from higher level languages such as
> java/C# (how to view asm from java/C# is also a useful skill few java/C#
> devs even KNOW (most professional devs don't know how to do this!)).
>
> Doing such you can very quickly get into things like how memory locations,
> caching, and other things cause differences in how code executes. Another
> obviously fun one here is to look at the JIT compiler in debug/production
> mode, what optimizations were applied, how things are laid out, etc.
>
> One thing to be careful about here is that each student's computer will
> have slightly different behaviour. As such the parameters where things
> "change" will be different. A great board exercise would be showing each
> students computer, where things "changed" and comparing it to their
> processor/memory specifications/etc to see if patterns can be found.
>
> As a side note, when dealing with mechanical sympathy I would almost
> *never* use javascript: you will be learning far more about javascript and
> how to *force* it to do things (and getting tons of false positives to weed
> through) than about the underlying machine.
>
> HTH,
>
> Greg
>
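Greg's point above about array-backed versus pointer-chasing node layouts can be sketched in Java rather than C. This is an illustrative micro-demo (naive timing, none of the rigour of a proper harness such as JMH), not a benchmark; the shuffled chain simply forces each hop to be a dependent load to an unpredictable address:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LayoutDemo {
    static final class Node { long value; Node next; }

    // Chain the nodes in shuffled order so each hop is a dependent load
    // to an unpredictable address, defeating the hardware prefetcher.
    static Node buildShuffledChain(int n) {
        List<Node> nodes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Node node = new Node();
            node.value = i;
            nodes.add(node);
        }
        Collections.shuffle(nodes);
        for (int i = 0; i < n - 1; i++) {
            nodes.get(i).next = nodes.get(i + 1);
        }
        return nodes.get(0);
    }

    static long sumArray(long[] array) {
        long sum = 0;
        for (long v : array) sum += v;   // contiguous, prefetch-friendly
        return sum;
    }

    static long sumChain(Node head) {
        long sum = 0;
        for (Node n = head; n != null; n = n.next) sum += n.value; // pointer chasing
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] array = new long[n];
        for (int i = 0; i < n; i++) array[i] = i;
        Node head = buildShuffledChain(n);

        long t0 = System.nanoTime();
        long arraySum = sumArray(array);
        long t1 = System.nanoTime();
        long chainSum = sumChain(head);
        long t2 = System.nanoTime();

        // Same answer, very different memory access patterns.
        System.out.println("sums equal: " + (arraySum == chainSum));
        System.out.println("array ns: " + (t1 - t0) + ", chain ns: " + (t2 - t1));
    }
}
```

On typical hardware the chain walk slows down sharply once the working set exceeds the cache, which is exactly the "where did things change" discussion Greg suggests having at the board.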



Re: aeron-driver (c++): localhost to localhost

2018-10-01 Thread Martin Thompson
This would be more appropriate on the Aeron Gitter group.

https://gitter.im/real-logic/Aeron

On Mon, 1 Oct 2018 at 17:16,  wrote:

> Hello to everybody.
>
> There's a wish to see how Aeron acts on localhost over *IPC*. First of
> all I'm interested in the *C++ port of the driver*.
> I built for *Ubuntu 18.04* version downloaded from git (*1.11.2-SNAPSHOT*)
> and ran on localhost *aeronmd, basicpublisher* and* basicsubscriber.*
> During communication I saw that *basicpublisher* put data to its log
> buffer and *basicsubscriber* listened to its one.
> However I was surprised to find out that *aeronmd*, the driver app,
> sent packets to itself over UDP (wireshark, lsof, *trace helped me).
> In this situation I have a question: is it possible to avoid the Aeron (UDP)
> transfer and copy data from one shared segment to another on *localhost*?
> I couldn't find it in the docs. In general I thought that on localhost
> *aeronmd* should copy data from one log buffer to another without network and
> packet overhead.
>



Re: How to keep socket session opened when one of the web-server fails?

2018-09-08 Thread Martin Thompson
Again is there a mechanical sympathy question in this?

On Saturday, 8 September 2018 17:27:23 UTC+1, MechSyQ wrote:
>
> In our system, our clients are connected to our web-servers via a load 
> balancer. They open web-socket connection and authenticate against a single 
> web server instance. It is very important for a connection to be 
> super-fast. Now, if one of the web-servers is down, currently our client must 
> re-login to the system and connect to another web-server instance. We want 
> to handle fail-over automatically without re-connection for our clients. 
> How can we achieve that without breaking WebSocket connection? Can we do 
> something on the load balancer level which will keep a connection with a 
> client open and re-connect with a web-server instance? Do we have any other 
> options?
>
>



Re: How does matching engine on the financial exchanges handle fail-over

2018-09-08 Thread Martin Thompson
Is there a mechanical sympathy question in this? It feels like someone is 
curious or looking for free consultancy.

Matching engines tend to be replicated state machines. Best if you start 
reading up on those.

Martin...

On Saturday, 8 September 2018 17:27:23 UTC+1, MechSyQ wrote:
>
> For example, on an FX or equities exchange where HFT/real-time trading is 
> going on, something goes terribly wrong: it is not possible to save a 
> transaction into a file, etc. What are the next actions: is trading put on 
> halt and developers involved? What about fail-over? Is active/active not 
> applicable in this case? So, is it just manual or automatic fail-over from 
> the active to the passive node?
>
>
> Thx
>



Re: LOCK XADD wait-free or lock free.

2018-08-25 Thread Martin Thompson
To perform an update via the XADD instruction the cache line containing the 
word must first be acquired. x86 uses the MESI cache coherence protocol, and 
to get the cache line for update an RFO (Read/Request For Ownership) message 
must be sent as a bus transaction. These requests are queued per core and 
in the uncore, which effectively avoids starvation. Events are available, e.g. 
OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO. From the perspective of 
instructions, each thread takes a bounded number of steps to complete. The 
CAS equivalent would be lock-free, rather than wait-free, as the number of 
steps per thread is not bounded.
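As an illustrative sketch (not from the original thread), the two styles map onto `AtomicLong` in Java: `getAndAdd` typically JIT-compiles to a single LOCK XADD on x86 and so completes in a bounded number of steps, while the hand-rolled CAS loop can in principle retry without bound if other threads keep winning:

```java
import java.util.concurrent.atomic.AtomicLong;

public class IncrementStyles {
    static final AtomicLong counter = new AtomicLong();

    // Wait-free on x86: getAndAdd typically compiles to a single LOCK XADD,
    // so each call completes in a bounded number of steps.
    static long xaddStyle() {
        return counter.getAndAdd(1);
    }

    // Lock-free but not wait-free: the CAS can fail and retry without bound
    // if other threads keep winning the race; there is no fairness guarantee.
    static long casLoopStyle() {
        long current;
        do {
            current = counter.get();
        } while (!counter.compareAndSet(current, current + 1));
        return current;
    }

    public static void main(String[] args) {
        long a = xaddStyle();    // returns previous value, counter advances by 1
        long b = casLoopStyle(); // same semantics, different progress guarantee
        System.out.println(a + " " + b + " " + counter.get());
    }
}
```

Both return the pre-increment value; the difference the thread is asking about is purely in the progress guarantee, not in the result.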

On Saturday, 25 August 2018 06:00:22 UTC+1, Peter Veentjer wrote:
>
> I'm polishing up my lock free knowledge and one question keeps on bugging 
> me.
>
> The question is about the LOCK XADD and why it is considered to be wait 
> free.
>  
> AFAIK for wait freedom there needs to be some fairness. 
>
> So imagine a concurrent counter using a spin on a CAS to increment a 
> counter; then at least 1 thread wins and makes progress. Therefore this 
> implementation is lock free. It isn't wait free because some thread could 
> be spinning without bound. The problem here is that there is no fairness.
>
> Now imagine this counter is implemented using a LOCK XADD and 
> therefore there is no need for a loop. What guarantees that every 
> thread is going to make progress in a bounded number of steps? E.g. could it 
> happen that one thread is continuously denied exclusive access to the 
> memory and therefore won't complete in a bounded number of steps? Or do the 
> requests get stored in some kind of list and processed in 
> that order? That order would provide fairness and then it would be wait free.
>
> I have been looking at the Intel documentation of LOCK XADD; but it 
> remains unclear.
>



Re: Intra-process queue between Java and C++

2018-03-30 Thread Martin Thompson
Aeron IPC is more functional than a plain queue but I get your point that 
it is not a drop in replacement.

The ring buffers have typical queue-based semantics and use an 8-byte header 
in Agrona and Aeron: 4 bytes for message length and 4 bytes for message 
type to give some flexibility. Fixed-format messages can optimise well but 
have limited applicability.
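A minimal sketch of such an 8-byte record header, assuming a 4-byte length followed by a 4-byte type; the field order, alignment, and endianness of Agrona's real record descriptor may differ, so treat this as an illustration of the layout idea only:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RecordHeader {
    static final int HEADER_LENGTH = 8; // 4 bytes length + 4 bytes type (assumed layout)

    // Write one record: header followed by the payload bytes.
    // Returns the next free offset so records can be appended back to back.
    static int write(ByteBuffer buffer, int offset, int msgType, byte[] payload) {
        buffer.putInt(offset, HEADER_LENGTH + payload.length); // total record length
        buffer.putInt(offset + 4, msgType);                    // message type
        for (int i = 0; i < payload.length; i++) {
            buffer.put(offset + HEADER_LENGTH + i, payload[i]);
        }
        return offset + HEADER_LENGTH + payload.length;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        int next = write(buffer, 0, 42, new byte[]{1, 2, 3});
        System.out.println(buffer.getInt(0));  // record length: 11
        System.out.println(buffer.getInt(4));  // message type: 42
        System.out.println(next);              // next free offset: 11
    }
}
```

The 4-byte type field is what gives the flexibility mentioned above: a consumer can dispatch on it without knowing the payload format in advance.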

On Friday, 30 March 2018 16:36:09 UTC+1, Roman Leventov wrote:
>
> Martin, thanks a lot!
>
> I thought about Aeron IPC, but as far as I understand it maps to the queue 
> model only when there is a single producer and a single consumer. Also it 
> felt a little too heavyweight for small fixed-sized messages. Generally 
> Aeron's Data frames have 32-byte headers. RingBuffers have only 16-byte 
> headers, and it looks like they could be harmlessly reduced down to 8 or even 
> 0 for e.g. fixed-format 32-byte messages.
>
> There are implementations of FIFO ring buffers for Java and C++ used in 
>> Aeron for doing exactly this.
>>
>>
>> https://github.com/real-logic/aeron/tree/master/aeron-client/src/main/cpp/concurrent
>>
>>
>> https://github.com/real-logic/agrona/tree/master/agrona/src/main/java/org/agrona/concurrent
>>
>> You could also use Aeron IPC.
>>
>> On Friday, 30 March 2018 09:55:23 UTC+1, Roman Leventov wrote:
>>>
>>> I think about the possibility of building an asynchronous application 
>>> with back pressure where some upstream operators are in Java and some 
>>> downstream ones are in C++. For this purpose, some queues would be needed 
>>> to pass the data between Java and C++ layers. It seems that porting 
>>> JCTools's bounded array queues to off-heap should be doable, but I couldn't 
> find existing prototypes or discussions of such a thing, so maybe I overlook 
> some inherent complications with this idea.
>>>
> Did anybody think about something like this or has implemented it in 
> proprietary systems?
>>>
>>
>



Re: garbage-free java.nio

2018-03-12 Thread Martin Thompson
There are challenges with implementing a communications system (messaging 
or RPC) on top of NIO with Java but it can be done. If you want to learn 
about this you could study code bases that have done this. Aeron and Netty 
are two examples. For reference Aeron is garbage free once connections are 
established regardless of the number of messages exchanged. To achieve this 
you have to do some ugly things with NIO to work around its weaknesses.
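One of the core techniques behind being garbage free is preallocating buffers once and reusing them on every read, so the steady-state path allocates nothing. A minimal stdlib sketch of that reuse idea (illustrative, not Aeron code); note the `String` copy at the end is for display only and would itself allocate in a real hot path, where you would parse in place instead:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class ReusedBufferRead {
    // One direct buffer allocated up front and reused for every read,
    // so the steady-state read path allocates nothing.
    static final ByteBuffer READ_BUFFER = ByteBuffer.allocateDirect(4096);

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));

        READ_BUFFER.clear();                 // reset position/limit, no allocation
        int read = pipe.source().read(READ_BUFFER);
        READ_BUFFER.flip();

        byte[] received = new byte[read];    // allocation here is for display only;
        READ_BUFFER.get(received);           // a garbage-free path would parse in place
        System.out.println(new String(received, StandardCharsets.US_ASCII));
    }
}
```

The ugly parts Martin alludes to mostly come from what the sketch glosses over: `Selector.selectedKeys()` allocating iterators, `InetSocketAddress` churn, and similar internal JDK allocations that have to be worked around.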

On Sunday, 11 March 2018 20:25:32 UTC, John Hening wrote:
>
> Hello,
>
> recently I am interested in the non-blocking Java API for networking. It seems 
> to be great. However, I would like to implement a garbage-free solution. I am 
> trying to do it only for learning purposes (I know that I won't implement a 
> "better" solution). Specifically, my solution is going to be garbage-free. 
>
> How to start? I see that the JDK implementation is not garbage-free. So what now? 
> My only idea is to implement a small native library (only for Linux) and 
> implement something like a "facade" and a lot of stuff around that in Java 
> (but garbage-free).
> I suppose that it can be very hard to integrate it with selectors for 
> example (if possible). But, I don't see another solution and this is why I 
> wrote here. What do you think? What is the best approach?
>



Re: JVM with 100 seconds minor GC

2017-12-21 Thread Martin Thompson
Start by enabling GC logging and then analyse the log to see what it is 
doing during GC.

On Thursday, 21 December 2017 17:25:09 UTC, Abhishek Agarwal wrote:
>
> Hello everyone,
> Recently, I had one of the JVMs going into this weird state wherein minor 
> GCs start kicking in almost every minute, but what is most surprising is 
> that one minor GC is taking as much as 100 seconds. The heap size is around 12 
> gigs and young gen is about 30% of that. As soon as GC starts, 
> understandably CPU usage started increasing as well. I am not able to 
> figure out how would one minor GC take this much time. The machine does not 
> have swapping enabled. I also ran sar and confirmed there was no swapping 
> during the issue. We are using parallel GC for young gen and CMS for old 
> gen GC. How can I go about troubleshooting this further? Any pointers will 
> be appreciated. 
>



Re: Befuddling deadlock (lock held but not in stack?)

2017-10-12 Thread Martin Thompson
I agree with Gil here. I often reject posts via moderation which range from 
things that belong in the lazy section of Stack Overflow to recruiters 
taking the piss. This question is from someone who has made good 
contributions to previous discussions and is posing a question that is 
trying to get to the bottom of something which is "befuddling" to someone 
trying to understand the mechanisms of a platform. To paraphrase Jackie 
Stewart who coined the mechanical sympathy phrase, to work in harmony with 
the machine one must understand the machine. If this was a question about 
the syntax of the Java language then that would be different.

In my own experiences I've often found discussions about managed runtimes 
and VMs in languages other than Java, which while not directly applicable, 
can be informative and give insights into the thinking behind the design of 
the "machine" on which we run. I don't write code in Pony or Rust for a 
living but studying the underlying implementations gives real insights into 
the possible future directions for memory management which is core to the 
"machine" for example.

I feel it is also valid to question the appropriateness and context for 
discussions. It is healthy to check our assumptions. We all win when 
discussions are civil and present opportunities to learn.

Martin...

On Thursday, 12 October 2017 17:09:06 UTC+1, Gil Tene wrote:
>
> The "machine" we run on is not just the hardware. It's also the BIOS, the 
> hypervisor, the kernel, the container system, the system libraries, and the 
> various runtimes. Stuff that is "interesting" and "strange" about how the 
> machine seems to behave is very appropriate to this group, IMO. And 
> mysterious deadlock behaviors and related issues at the machine level (e.g. 
> an apparent deadlock where no lock owner appears to exist, as is the case 
> discussed here) is certainly an "interesting" machine behavior. Java or 
> not, libposix or not, Linux or not, C#/C++/Rust or not. The fact that much 
> of concurrency work happens to be done in Java does make Java and JVMs a 
> more common context in which these issues are discussed, but the same can 
> be said about Linux, even tho this is not a Linux support group.
>
> E.g. the discussion we had a while back about Linux's futex wakeup bug 
>  
> (where certain kernel versions failed to wake up futex's, creating [among 
> other things] apparent deadlocks in non-deadlocking code) was presumably 
> appropriate for this group. I see Todd's query as no different. It is not a 
> "help! my program has a deadlock" question. It is an observed "deadlock 
> that isn't a deadlock" question. It may be a bug in tooling/reporting (e.g. 
> tooling might be deducing the deadlock based on non-atomic sampling of 
> stack state as some have suggested here), or it may be a bug in the lock 
> mechanisms. e.g. failed wakeup at the JVM level. Either way, it would be 
> just as interesting and relevant here as a failed wakeup or wrong lock 
> state intrumentation at the Linux kernel level. or at the .NET CLR level, 
> or at the golang runtime level, etc...
>
> On Wednesday, October 11, 2017 at 6:06:16 AM UTC-7, Jarkko Miettinen wrote:
>>
>> keskiviikko 11. lokakuuta 2017 11.27.17 UTC+3 Avi Kivity kirjoitti:
>>>
>>> If this is not off topic, what is the topic of this group?
>>>
>>>
>>> Is it a Java support group, or a "coding in a way that exploits the way 
>>> the hardware works" group?
>>>
>>
>>
>> I have to agree here with Avi. Better to have a group for "coding in a 
>> way that exploits the way the hardware works" and another group for Java 
>> support. Otherwise there will be a lot of discussion of no real relation to 
>> the topic except that people exploiting mechanical sympathy might have run 
>> into such problems.
>>
>> (I would also be interested in a Java support group for the level of 
>> problems that have been posted in this group before.)
>>
>>
>>>
>>> On 10/11/2017 10:29 AM, Kirk Pepperdine wrote:
>>>
>>> Not at all off topic… first, thread dumps lie like a rug… and here is 
>>> why… 
>>>
>>> for each thread {
>>> safe point
>>> create stack trace for that thread
>>> release threads from safe point
>>> }
>>>
>>> And while rugs may attempt to cover the debris that you’ve swept under 
>>> them, that debris leaves a clearly visible lump that suggests that you have 
>>> a congestion problem on locks in both sun.security.provider.Sun and 
>>> java.lang.Class…. What could possibly go wrong?
>>>
>>>
>>> Kind regards,
>>> Kirk
>>>
>>> On Oct 11, 2017, at 3:05 AM, Todd Lipcon  wrote:
>>>
>>> Hey folks, 
>>>
>>> Apologies for the slightly off-topic post, since this isn't performance 
>>> related, but I hope I'll be excused since this might be interesting to the 
>>> group members.
>>>
>>> We're recently facing an issue where a JVM is deadlocking in some SSL 
>>> code. The resulting 

Re: Lock and Little's Law

2017-09-16 Thread Martin Thompson
Little's law can be used to describe a system in steady state from a 
queuing perspective, i.e. arrival and leaving rates are balanced. In this 
case it is a crude way of modelling a system with a contention percentage 
of 100% under Amdahl's law, in that throughput is one over latency.

However, this is an inaccurate way to model a system with locks. Amdahl's 
law does not account for coherence costs. For example, if you wrote a 
microbenchmark with a single thread to measure the lock cost then it is 
much lower than in a multi-threaded environment where cache coherence, 
other OS costs such as scheduling, and lock implementations need to be 
considered.

Universal Scalability Law (USL) accounts for both the contention and the 
coherence costs.

http://www.perfdynamics.com/Manifesto/USLscalability.html

When modelling locks it is necessary to consider how contention and 
coherence costs vary given how they can be implemented. Consider in Java 
how we have biased locking, thin locks, fat locks, inflation, and revoking 
biases which can cause safe points that bring all threads in the JVM to a 
stop with a significant coherence component.
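As a quick check of the contention-only arithmetic (the crude 100% contention model described above, before any coherence costs), the 1/W calculation can be written out directly. The 10-microsecond service time is the figure from the question being answered:

```java
public class LittlesLawLockCap {
    public static void main(String[] args) {
        // W: average time spent holding/acquiring the lock, in seconds.
        double serviceTimeSeconds = 10e-6; // 10 microseconds

        // With L = 1 (a lock admits one thread at a time),
        // Little's law gives lambda = L / W = 1 / W.
        // Math.round guards against floating-point dust in the division.
        long maxThroughputPerSecond = Math.round(1.0 / serviceTimeSeconds);

        System.out.println(maxThroughputPerSecond); // 100000
    }
}
```

Per the USL point above, the real ceiling will be lower still once coherence costs grow with the number of contending threads.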

On Friday, 15 September 2017 02:30:55 UTC+1, Dain Ironfoot wrote:
>
> Hi Guys, 
>
> Can someone verify if my understanding of Little's law is correct in the 
> context of locking.
>
> Suppose I have a system where I acquire a lock, do some work and release 
> it. Further, suppose that doing some "work" takes no time.
>
>
> *λ = L/ *W  ( *λ = throughput, L=Average number of customers 
> in a stable system, W=Average time spent in the system)*
>
> *λ = 1/* W  (Since a lock will only allow one thread to 
> execute)
>
> *λ = 1/*10 micros   (Supposed average time taken to acquire the lock)
>
> *λ **= *100,000 per second
>
>
> Therefore, just by using a lock, the throughput of my system is capped at 
> 100,000 
> per second.
>
> Is my reasoning correct?
>
>
> Thanks
>



Re: Optimal disruptor size

2017-09-10 Thread Martin Thompson

>
> is there any recommendation for an optimal Disruptor size? say, disruptor 
> size 2,048 vs 65,536 should not make any difference except that more data 
> can fit in?
>

This would be best answered in the Disruptor discussion group.

https://groups.google.com/forum/#!forum/lmax-disruptor 



Re: Disk-based logger - write pretouch

2017-07-10 Thread Martin Thompson
Are you running recent MacOS with APFS which supports sparse files? Linux
supports sparse files but not all OSs do. You need to run Linux to get
reasonable profiling information from the OS that is generally applicable.
Then you can use nice tools like those from Brendan Gregg.

https://github.com/brendangregg/perf-tools

When considering an algorithm for pre-touch ahead it is often useful to
consider the rate to determine how far ahead you should be pre-touching. When
pre-touching then it can be better with a positional write of a byte, i.e.
use
https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#write(java.nio.ByteBuffer,%20long)
which should map to pwrite() on Linux. This will be a safepoint whereas the
mapped buffer write will not be. You can avoid corruption by gating on the
completion of this operation. Get the GC logs and compare GC time to
application stopped time.
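A minimal sketch of the positional-write pre-touch described above, using a hypothetical `pretouch` helper; the page size and look-ahead distance are illustrative, and on Linux the absolute `FileChannel.write(ByteBuffer, long)` should map to pwrite():

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalPretouch {
    static final ByteBuffer ZERO_BYTE = ByteBuffer.allocateDirect(1);

    // Pre-touch a region ahead of the writer by writing a single byte at an
    // absolute position, rather than touching the mapped buffer directly.
    static void pretouch(FileChannel channel, long position) throws IOException {
        ZERO_BYTE.clear();
        channel.write(ZERO_BYTE, position);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pretouch", ".dat");
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long pageSize = 4096;              // assumed page size, illustrative
            pretouch(channel, 10 * pageSize);  // touch a page well ahead of the writer
            System.out.println(channel.size()); // 40961: file extended to position + 1
        } finally {
            Files.delete(file);
        }
    }
}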



On 10 July 2017 at 17:56, Roman Leventov  wrote:

> I'm trying to improve the latency of synchronous logging for Log4j:
> https://github.com/apache/logging-log4j2/pull/87
>
> One of the means is "write pretouch" of the mapped file contents:
> https://github.com/leventov/logging-log4j2/blob/6d
> 1c0e6b9a73b46ba7dd28ff853694f2012873c0/log4j-core/src/main/
> java/org/apache/logging/log4j/core/appender/UnsafeMappedByte
> BufferWriteTouch.java . Currently it is done as the following: when
> logger hits the 1st MB of the file, a task is scheduled to write-pretouch
> file contents between 1MB and 2MB marks (i. e. the 2nd MB) in a background
> thread. When logger starts to write in the 2nd MB, a task for
> write-pretouch of the 3rd MB is scheduled, and so on, i. e. at 1 MB blocks.
>
> Some testing is done on 4-core MBP, in 3 logging threads:
>  - new version looks great at 2k messages/sec/thread: https://
> cdn.rawgit.com/leventov/eee84006a53dfb8d2acaebfe126bb499/raw/
> 805c548d1f41e49e67b6815dde84d92be0568012/load2k.html (and pretouch
> actually contributes a lot to this result, without pretouch the new version
> is just a little better than the old)
>  - decent at 20k: https://cdn.rawgit.com/leventov/
> eee84006a53dfb8d2acaebfe126bb499/raw/805c548d1f41e49e67b6815dde84d9
> 2be0568012/load20k.html
>  - but suddenly much worse at 100k messages/sec/thread, than the old
> version: https://cdn.rawgit.com/leventov/eee84006a53dfb8d2acaebfe126bb4
> 99/raw/805c548d1f41e49e67b6815dde84d92be0568012/load100k.html
>
> I tried different approaches to pretouch, but it consistently incurs
> extremely long pauses when running at "pretty extreme" message rates. Only
> not doing pretouch at all allows avoiding those pauses.
>
> Does anybody have a theory what's going on here? Is it more likely a
> feature of consumer-grade hardware, or does the pretouch approach interact
> badly with the MM implementation in OS X?
>



Re: Aeron zero'ing buffer?

2017-05-30 Thread Martin Thompson
The claim is made on a long, with a bounds check before the increment.
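One way to realise such a check-before-increment claim with stdlib primitives (illustrative only; Aeron's real claim logic differs) is an `AtomicLong` where the capacity test happens before any increment is attempted, so the counter never runs past the bound and, being 64-bit, cannot realistically wrap:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ClaimCounter {
    static final long CAPACITY = 1024;       // illustrative buffer capacity
    static final AtomicLong tail = new AtomicLong();

    // Claim `length` bytes. The bound is checked before the increment is
    // attempted, so on failure nothing was incremented and the counter
    // stays within [0, CAPACITY] - the wrap scenario cannot arise.
    static long tryClaim(int length) {
        while (true) {
            long current = tail.get();
            if (current + length > CAPACITY) {
                return -1; // buffer full, counter untouched
            }
            if (tail.compareAndSet(current, current + length)) {
                return current; // claimed [current, current + length)
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tryClaim(512));  // 0
        System.out.println(tryClaim(512));  // 512
        System.out.println(tryClaim(512));  // -1: would exceed capacity
    }
}
```

Using a 64-bit counter also addresses the questioner's fallback suggestion: even at billions of increments per second, a long takes centuries to overflow.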

On Tuesday, 30 May 2017 11:25:50 UTC+1, Peter Veentjer wrote:
>
> Thanks everyone.
>
> Another related question. When an atomic integer is used so that writers 
> can claim their segment in the buffer, what prevents wrapping of this atomic 
> integer and falsely assuming you have allocated a section in the buffer?
>
> So imagine the buffer is full, and the atomic integer > buffer.length. Any 
> thread that wants to write keeps increasing this atomic integer far far 
> beyond the maximum capacity of the buffer. Which in itself is fine because 
> one can detect what kind of write failure one had:
> - the first over commit
> - a subsequent over commit
>
> But what if the value wraps? In theory one could end up thinking one has 
> claimed a segment of memory still needed for reading purposes.
>
> One simple way to reduce the problem is to use an atomic long instead of 
> an atomic integer.
>



Re: Aeron zero'ing buffer?

2017-05-28 Thread Martin Thompson

>
> A related question. What happens if 2 threads do a plain write in the same 
> cache line but independent locations.
>
> If this happens concurrently, can the system run into a 'lost update'? I'm 
> sure it can't and I guess the cache coherence protocol takes care it. But 
> would like to get confirmation anyway.
>

If the writes are to independent, non-overlapping, addresses then no update 
will be lost even if in the same cache line. This will result in false 
sharing which is a performance issue but not a correctness issue.
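A small demonstration, under the assumption that adjacent `long[]` slots typically share a 64-byte cache line on common hardware. Each thread writes only its own slot, so coherence guarantees that no update is lost even though the two writers contend for the same line:

```java
public class SameCacheLineWrites {
    // Two adjacent slots will typically share a 64-byte cache line.
    static final long[] values = new long[2];

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) values[0]++; // writes only index 0
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) values[1]++; // writes only index 1
        });
        a.start(); b.start();
        a.join(); b.join(); // join gives visibility of each thread's final writes

        // Coherence guarantees no lost updates across distinct addresses:
        // false sharing hurts throughput, not correctness.
        System.out.println(values[0] + " " + values[1]); // 1000000 1000000
    }
}
```

Padding the two counters onto separate cache lines would remove the false sharing and speed this up considerably, without changing the result.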



Re: Question about SBE and DirectBuffer

2017-03-09 Thread Martin Thompson
If you can, it would be best to raise an issue on the GitHub repo with a 
repeatable test. This looks like a logic bug given the exception. 
You should check the source of the length parameter. This is 
not the best place to ask such questions.
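As a generic illustration of validating a suspect length field before use (plain `ByteBuffer` here, not Agrona's `UnsafeBuffer` API), a hypothetical `readFrame` helper: a corrupt or misaligned length like the 524370 in the stack trace then fails fast with a descriptive error instead of an `IndexOutOfBoundsException` deep inside the decoder:

```java
import java.nio.ByteBuffer;

public class SafeFrameRead {
    // Validate a length-prefixed frame before reading its payload.
    static byte[] readFrame(ByteBuffer buffer, int offset) {
        int length = buffer.getInt(offset);
        if (length < 0 || offset + 4 + length > buffer.capacity()) {
            throw new IllegalStateException(
                "bad frame length " + length + " at offset " + offset
                + ", capacity " + buffer.capacity());
        }
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            payload[i] = buffer.get(offset + 4 + i);
        }
        return payload;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putInt(0, 3);
        buffer.put(4, (byte) 7);
        System.out.println(readFrame(buffer, 0).length); // 3

        buffer.putInt(0, 524370); // simulate a corrupt length field
        try {
            readFrame(buffer, 0);
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

A check like this at the network boundary often reveals that the corruption happened in transit or framing, not in SBE itself, which is what the reply is hinting at.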

On Thursday, 9 March 2017 12:21:08 UTC, Fayanne King wrote:
>
> Hi,
>
> We are getting the following exceptions when deserializing using SBE. We 
> got the same issue on both 1.5.6 and 1.6.0 versions of SBE.
>
> java.lang.IndexOutOfBoundsException: index=22, length=524370, capacity=1601
> at 
> org.agrona.concurrent.UnsafeBuffer.boundsCheck0(UnsafeBuffer.java:1096) 
> ~[aeron-all-1.0.4.jar:?]
> at 
> org.agrona.concurrent.UnsafeBuffer.getBytes(UnsafeBuffer.java:823) 
> ~[aeron-all-1.0.4.jar:?]
>
> java.lang.IndexOutOfBoundsException: index=524396, capacity=1601
> at 
> org.agrona.concurrent.UnsafeBuffer.boundsCheck(UnsafeBuffer.java:1087) 
> ~[aeron-all-1.0.4.jar:?]
> at 
> org.agrona.concurrent.UnsafeBuffer.getByte(UnsafeBuffer.java:778) 
> ~[aeron-all-1.0.4.jar:?]
>
> When we run unit tests on our local machine, everything is serializing and 
> deserializing properly. However, when we send the serialized message over 
> the network, the deserializing process is getting the exception above. 
>
> Messages are serialized and deserialized in a single thread and we are 
> using a ThreadLocal.initialValue of new 
> UnsafeBuffer(ByteBuffer.allocateDirect(4096*4)).
>
> We are running on Java(TM) SE Runtime Environment (build 1.8.0_111-b14).
>
> Thanks a lot for your help.
>



Re: JVM crashed with memmap and unsafe operation for IPC

2017-02-15 Thread Martin Thompson
It might be related to the following bug, which was discussed on another 
thread.

https://bugs.openjdk.java.net/browse/JDK-8087134

On Wednesday, 15 February 2017 12:43:58 UTC, Yunpeng Li wrote:
>
> Hi there,
>  I'm trying to copycat using memmap and unsafe to build an IPC ping 
> pong test. Attached the source codes and crash reports, unfortunately, my 
> code crashes JVM each time when ping pong reaches 73100(+- 10), Run 
> PublisherTest first then SubscriberTest in separated process. Sometimes one 
> process crashes sometime both crash at the sametime.
>  Can someone help to check what's the reason it crashes at the very 
> place. I ran the test case inside eclipse and on Mint Linux 16(yeah old 
> version) and jvm 8.
>  And one more question on how JVM work with OS on memmap swap, e.g. in 
> my case, the locks are more than 1 page away from data, I follow the 
> reserve-modify-publish pattern from disruptor, First updating data then 
> publish lock by unsafe putOrdered, How did jvm make sure the lock is 
> visible behind data from another process, Given update dirty page is 
> "random" by OS.
>
> Thanks in advance.
> Yunpeng Li
>
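
The reserve-modify-publish pattern mentioned in the question can be sketched in plain Java (single-process and with illustrative names; cross-process visibility through a memory-mapped file is a separate concern, as the bug above shows). `AtomicLong.lazySet` provides the same ordered-store semantics as `Unsafe.putOrderedLong`:

```java
import java.util.concurrent.atomic.AtomicLong;

// Write the data with plain stores, then publish the slot index with an
// ordered store that cannot be reordered before the data write. A reader
// that observes the published index is guaranteed to see the data.
public class ReserveModifyPublish {
    final long[] data = new long[16];
    final AtomicLong published = new AtomicLong(-1);

    void write(int slot, long value) {
        data[slot] = value;          // modify: plain store
        published.lazySet(slot);     // publish: ordered (store-store) write
    }

    long readLatest() {
        long slot = published.get(); // volatile read pairs with the publish
        return slot < 0 ? -1 : data[(int) slot];
    }

    public static void main(String[] args) {
        ReserveModifyPublish b = new ReserveModifyPublish();
        b.write(3, 99);
        System.out.println(b.readLatest());
    }
}
```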



Re: Hiring Angular JS Developer at Washington DC

2017-01-30 Thread Martin Thompson
So far we have only gotten these infrequently. When we do, I delete the post 
and ban the person posting.

If it becomes more frequent, we can consider moderating first posts.

Martin...



Re: Fast serialization for Java

2016-12-20 Thread Martin Thompson
I've seen many attempts at fast codecs for Java and have done things like
those described. This led to me getting involved in SBE for encoding
financial market data, which includes some of the highest-volume feeds I
know of. We open-sourced an implementation of SBE here.

https://github.com/real-logic/simple-binary-encoding
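
Since Java cannot overload on return type, the approach SBE-style codecs take is one explicitly named accessor per type. A minimal illustrative sketch (names are mine, not SBE's API):

```java
import java.nio.ByteBuffer;

// A tiny flyweight codec: distinct method names per type replace the
// return-type overloading that the C++ template trick relies on.
public class Codec {
    private final ByteBuffer buffer;
    private int offset;

    Codec(ByteBuffer buffer) { this.buffer = buffer; }

    Codec writeInt(int v)       { buffer.putInt(offset, v);    offset += 4; return this; }
    Codec writeDouble(double v) { buffer.putDouble(offset, v); offset += 8; return this; }

    int readInt()       { int v = buffer.getInt(offset);       offset += 4; return v; }
    double readDouble() { double v = buffer.getDouble(offset); offset += 8; return v; }

    void rewind() { offset = 0; }

    public static void main(String[] args) {
        Codec c = new Codec(ByteBuffer.allocate(64));
        c.writeInt(7).writeDouble(2.5);
        c.rewind();
        System.out.println(c.readInt() + " " + c.readDouble());
    }
}
```

In practice SBE generates such accessors from a message schema, so field names and order are fixed at build time and no per-field dispatch happens at runtime.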

On 21 December 2016 at 02:45, Hải Nguyễn  wrote:
> Hi,
> I just came across an idea for fast serialization from here:
> https://accu.org/index.php/journals/2317
> Basically, the idea is that for each strongly typed datum, you overload the
> write function (a template in the C++ original), like this:
>
> write(OutputStream os, Integer value);
> write(OutputStream os, Double value);
> write(OutputStream os, String value);
>
> Then in the serialize() function, just iterate over all parameters and call
> the corresponding write() function.
> Things become harder for reading, i.e. deserialization:
>
> Integer read(InputStream is);
> Double read(InputStream is);
> String read(InputStream is);
>
> Java does not allow overloading on return type alone, so this kind of
> read() does not compile.
> So what can we do to implement something like that: for a specific data
> type (for example Integer), call the corresponding read() to get the
> desired value?
>



Re: private final static optimization

2016-12-18 Thread Martin Thompson
Yes this can be a useful optimisation to take advantage of. We use this in 
Agrona for the optional bounds checking on buffers.

https://github.com/real-logic/Agrona/blob/master/src/main/java/org/agrona/concurrent/UnsafeBuffer.java#L51

https://github.com/real-logic/Agrona/blob/master/src/main/java/org/agrona/concurrent/UnsafeBuffer.java#L360
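
A minimal sketch of the pattern (the property name is illustrative): because the field is `static final`, the JIT treats its value as a constant after class initialisation and eliminates the dead branch entirely when the check is disabled. Reflective writes to `static final` fields are not something the JIT is required to honour.

```java
// Agrona-style optional bounds checking behind a static final flag.
public class BoundsCheckFlag {
    private static final boolean ENABLED =
        !Boolean.getBoolean("example.bounds.check.disabled"); // illustrative name

    static void boundsCheck(int index, int capacity) {
        if (ENABLED) {                  // folded away by the JIT when false
            if (index < 0 || index >= capacity) {
                throw new IndexOutOfBoundsException("index=" + index);
            }
        }
    }

    public static void main(String[] args) {
        boundsCheck(3, 8);              // in range: passes silently
        try {
            boundsCheck(9, 8);          // out of range
            System.out.println("unchecked");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checked");
        }
    }
}
```

With the property unset, `ENABLED` is true and the out-of-range call throws; launched with `-Dexample.bounds.check.disabled=true`, the whole check compiles to nothing.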

On Sunday, 18 December 2016 08:50:48 UTC, Peter Veentjer wrote:
>
> The following question is one out of curiosity of compiler optimization of 
> private final static fields.
>
> I created a very contrived example. The idea is to have some kind of 
> switch that can be enabled/disabled when the JVM starts up.
>
> public class Foo{
>
>private final static boolean ENABLED = 
> Boolean.getBoolean("bla.enabled");
>
>private int x;
>
>public void foo(int x){
> if(ENABLED){
> if(x <0) throw new RuntimeException();
> }
> this.x = x;
>}
> } 
>
> The question is whether the JIT is allowed to remove the ENABLED check if 
> ENABLED is false. My worry is that, due to reflection, one could set 
> ENABLED to true and therefore the JIT wouldn't be allowed to remove the 
> ENABLED check.
>



Re: LLC-misses are high, and yet it runs slower - why?

2016-12-16 Thread Martin Thompson
Why is the cacheline size not 64 or 128 bytes?


https://github.com/elazarl/use_method_talk/blob/master/stress_mem_bus/cachemiss.c#L13

I could not see whether you tried this in the code, but it is worth 
allocating with aligned_alloc() to ensure you are not false sharing.

http://man7.org/linux/man-pages/man3/posix_memalign.3.html
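
For comparison, the same cache-line alignment can be had on the JVM (Java 9+) without Unsafe, by over-allocating a direct buffer and slicing from the first aligned address. The 64-byte line size is an assumption; the capacity here is a multiple of it so the aligned region is guaranteed to fit:

```java
import java.nio.ByteBuffer;

// Allocate one cache line of slack, then slice so the working region
// starts on a 64-byte boundary -- two such regions can never share a line.
public class AlignedAlloc {
    static final int CACHE_LINE = 64;   // assumed; not queried at runtime

    static ByteBuffer allocateAligned(int capacity) {   // capacity % 64 == 0
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity + CACHE_LINE);
        ByteBuffer aligned = raw.alignedSlice(CACHE_LINE); // Java 9+ API
        aligned.limit(capacity);
        return aligned.slice();
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateAligned(1024);
        // 0 means index 0 sits exactly on a cache-line boundary
        System.out.println(buf.alignmentOffset(0, CACHE_LINE));
        System.out.println(buf.capacity());
    }
}
```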

On Thursday, 15 December 2016 09:55:27 UTC, Elazar Leibovich wrote:
>
> Hi,
>
> I wrote a small piece of code that should
> demonstrate the fact that one CPU can affect
> the other CPU by thrashing the LLC cache.
>
> The code worked perfectly - the CPU PMU
> counters showed that a process reports
> higher LLC misses when another process,
> on another CPU, is thrashing the L3 cache.
>
> Alas, the run with many cache misses
> is pretty consistently faster in wall-clock time,
> with similar PMU and OS counters, such as
> context switches and L1 misses.
>
> Can you help me understand why?
>
> I was able to recreate the issue on a Linux Laptop and on a server.
>
> More details, in the readme file here
>
>
> https://github.com/elazarl/use_method_talk/tree/master/stress_mem_bus#readme
>
> Thanks,
>
>
> PS,
> Yeah, micro optimizations are bad, and easy to misinterpret.
> Yeah, I'm not a CPU expert, and I do not fully understand the internals of 
> the CPU.
>
> But if you can help me understand why, I'll be a bit smarter, and you'll 
> have helped spread the knowledge.
> Bonus points if you'd explain how to analyze such cases in the future.
>



Re: Open source sharding framework ?

2016-12-03 Thread Martin Thompson
That feels more like a question for Stackoverflow.

On Saturday, 3 December 2016 21:10:41 UTC, Dorian Hoxha wrote:
>
> Hi,
>
> Is there any ? I'd like to build a simple db and also do sharding on it 
> but I don't want to develop it. I know that no magical sharding exists.
> There is redis 4 with it's modules api, but it's in-memory + the cluster 
> can miss writes.
> There is also dynomitedb but you can't up/down scale a cluster (have to 
> migrate to new one) and no multithreading.
>
> Is there something like those in a fast-enough language like 
> c,c++,java,golang,rust ? 
>
> Thanks
>



Re: Unchecked exceptions for IO considered harmful.

2016-08-15 Thread Martin Thompson
+1

On 15 August 2016 at 15:05, Avi Kivity  wrote:

> Perhaps this discussion should be moved to a Java group.  As far as I can
> tell, it has nothing to do with mechanical sympathy.
>
