Re: Happens before between putting and getting from and to ConcurrentHashMap

2018-11-18 Thread John Hening
Gil and Vladimir, thanks a lot for your time and explanation. This 
discussion was very fruitful, at least for me.

On Sunday, November 18, 2018 at 7:35:30 PM UTC+1, Vladimir Sitnikov wrote:
>
> Jean-Philippe>is a write to value but no read of this value inside the 
> same thread, so the write is free to be reordered
>
> It ("reordering") does not really matter.
>
> For instance,
>
> 17.4.5. Happens-before Order> If the reordering produces results 
> consistent with a legal execution, it is not illegal.
>
> What matters is the set of possible "writes" that given "read" is allowed 
> to observe.
>
>
> In this case, simple transitivity is enough to establish hb.
> As Gil highlights, "negations" are a bit hard to deal with, and Aleksey 
> converts the negations into positive clauses: 
> https://shipilev.net/blog/2014/jmm-pragmatics/#_happens_before
>
> Shipilёv> Therefore, in the absence of races, we can only see the latest 
> write in HB.
>
> Note: we (as programmers) do not really care HOW the runtime and/or CPU 
> would make that possible. We have guarantees from JVM that "in the absence 
> of races, we can only see the latest write in HB".
> CPU can reorder things and/or execute multiple instructions in parallel. I 
> don't really need to know the way it is implemented in order to prove that 
> "CHM is fine to share objects across threads".
>
> Just in case: there are two writes for w.value field.
> "write1" is "the write of default value" which "synchronizes-with the 
> first action in every thread" (see 17.4.4.) + "If an action x 
> synchronizes-with a following action y, then we also have hb(x, y)." (see 
> 17.4.5)
> "write2" is "w.value=42"
>
> "value=0" (write1) happens-before "w.value=42" (write2) by definition 
> (17.4.4+17.4.5)
> w.value=42 happens-before map.put (program order implies happens-before)
> map.put happens-before the read of u.value (CHM guarantees that)
>
> In other words, "w.value=42" is the latest write in hb order for u.value 
> read, so u.value must observe 42.
> JRE must ensure that the only possible outcome for the program in question 
> is 42.
>
> Vladimir
>
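
For reference, here is a minimal sketch of the program under discussion (my 
reconstruction, with the happens-before edges from the proof above written 
as comments):

import java.util.concurrent.ConcurrentHashMap;

class HbSketch {
    static class W { int value; }   // "value = 0" is write1: the default-value write

    static final ConcurrentHashMap<Integer, W> map = new ConcurrentHashMap<>();

    static void writer() {
        W w = new W();
        w.value = 42;               // write2; write1 hb write2 (17.4.4 + 17.4.5)
        map.put(1, w);              // write2 hb put (program order)
    }

    static void reader() {
        W u = map.get(1);           // put hb get (CHM's documented guarantee)
        if (u != null) {
            System.out.println(u.value);  // latest HB write is write2: prints 42
        }
    }
}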



Re: Happens before between putting and getting from and to ConcurrentHashMap

2018-11-17 Thread John Hening
Gil, 

thank you.

In the docs it is written 

Retrievals reflect the results of the most recently *completed* update 
> operations holding upon their onset. (More formally, an update operation 
> for a given key bears a *happens-before* relation with any (non-null) 
> retrieval for that key reporting the updated value.)
>
>
and, indeed, you are right. It is written too generally. It does state an HB 
relation between updating and retrieving a value, but we cannot conclude that 
there is an HB between the initialization of an element and a (non-null) 
retrieval of it. So the volatile store in the implementation means nothing to 
users of ConcurrentHashMap, because we should not rely on implementation 
details.

 


On Saturday, November 17, 2018 at 10:46:51 PM UTC+1, Gil Tene wrote:
>
> I'd be happy to be wrong here, but...
>
> This statement in the CHM contract (JavaDoc quoted below) establishes a 
> happens-before relationship between the put into the CHM and any (non-null) 
> retrieval from the CHM. It amounts to a safe publication within the CHM 
> itself, but does not read on the constructor of the value being put() into 
> the CHM. There is no contract I've found that establishes a 
> happens-before relationship between the initialization of non-final fields 
> in some object construction and the put of that object (or of some object 
> that refers to it) as a value into the CHM.
>
> It is true that *in the current implementation* a put() involves a volatile 
> store of the value into a Node.val field, 
> and that this volatile store does establish the needed happens-before 
> relationship between constructor initialization of non-final fields in the 
> value object and subsequent reads from the val field (including subsequent 
> get() operations that return that value object). But (AFAICT) that quality 
> can only be deduced by reading the source code of a current implementation.
>
> It is also true that future implementations of CHM will *likely* maintain 
> similar qualities since they seem to be desirable and de-facto provided 
> currently, so it may be surprising for them to disappear in some future CHM 
> implementation.
>
> But AFAICT, nowhere in the CHM contract, or in the spec, does it actually 
> say that this relationship (a happens-before between non-final field 
> initialization in a constructor and an eventual get() of the constructed 
> object from a CHM) can be counted on.
>
> If someone finds that missing statement (in the documentation, not the 
> implementation source statements) that would provide the full 
> happens-before links to non-final-field initialization, please point it out.
>
> On Saturday, November 17, 2018 at 2:04:32 AM UTC-8, Vladimir Sitnikov 
> wrote:
>>
>> > storing into a CHM is a safe publication
>>
>> +1
>>
>> That's declared in CHM JavaDoc: 
>>
>> More formally, an update operation for a given key bears a
>> * happens-before relation with any (non-null) retrieval for
>> * that key reporting the updated value.
>>
>> Vladimir
>>
>
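
One way (my suggestion, not from the thread) to avoid relying on the 
undocumented implementation detail discussed above is to make the value 
object's fields final. The final-field semantics of JLS 17.5 then guarantee 
that any thread that obtains a reference to the object sees the 
constructor-written values, independent of how the map publishes the 
reference:

class Foo {
    final int x;   // final field: JLS 17.5 guarantees any reader that obtains
                   // a reference to this Foo (provided the reference does not
                   // escape during construction) sees the constructor-written
                   // value of x, regardless of CHM's internals

    Foo(int x) { this.x = x; }
}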



Re: Happens before between putting and getting from and to ConcurrentHashMap

2018-11-17 Thread John Hening
@Gil Tene, 
thanks for your response.

But in other cases where non-final fields are involved, e.g. if p.x was a 
> Foo with a non-final field y with a getter, p.x.getY() may return an 
> uninitialized value after the get() returns a non-null p.
>

So it means that in this case:

class Foo {
    public int x;
    public Foo(int i) { x = i; }
}

ConcurrentHashMap<Integer, Foo> x = new ConcurrentHashMap<>();

new Thread(() -> {            // Thread 1
    x.put(1, new Foo(1));     // 1
}).start();

new Thread(() -> {            // Thread 2
    Foo p = x.get(1);         // 2
    if (p != null) {          // 3
        print(p.x);           // 4
    }
}).start();




Thread 2 can read 0 at line (4).

I cannot agree with that, because:
1. Putting an element into the hashmap stores the value into a field declared 
volatile: 
https://github.com/bpupadhyaya/openjdk-8/blob/master/jdk/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#L622
2. Getting an element from the hashmap reads the value from that volatile 
field: 
https://github.com/bpupadhyaya/openjdk-8/blob/master/jdk/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#L622

So if the second thread observes a non-null value at line (3), there is a 
synchronizes-with relation between the store of the Foo instance to the 
volatile value field in the hashmap and the read of it. 

As a result we have a chain of HB relations: 

new Foo(1) --HB--> store to volatile value --HB--> read from volatile value 
--HB--> read of the Foo.x field

We have such a chain because happens-before is the transitive closure of the 
PO and SW orders: 

<https://shipilev.net/blog/2014/jmm-pragmatics/page-077.png>






On Saturday, November 17, 2018 at 8:52:53 AM UTC+1, Gil Tene wrote:
>
> Well, "yes, sort of".
>
> Your example works for String, because it is immutable.
>
> Specifically, there is a happens-before relationship between the 
> constructor's initialization of the final fields of the String and the 
> subsequent publishing store of the result of the constructor.
>
> However, non-final fields have no such happens-before relationship with 
> the publishing store. E.g. the cached hash field in String 
> <https://github.com/bpupadhyaya/openjdk-8/blob/master/jdk/src/share/classes/java/lang/String.java#L117> 
> may or may not have been initialized when p.x is read. 
>
> This [race on hash field initialization] has no visible side effect 
> because of how the hash field is used in String: it caches the hash code, 
> and will freshly compute the hash from the immutable char[] if the field 
> holds a 0. So even if the initialization races with a call to hashCode() 
> <https://github.com/bpupadhyaya/openjdk-8/blob/master/jdk/src/share/classes/java/lang/String.java#L1452> 
> [e.g. a call to hashCode() happens on another thread before the hash field 
> is initialized, and the initialization overwrites the cached value with a 0], 
> all that would happen is a recomputation of the same hash. The value 
> returned by hashCode() won't change.
>
> But in other cases where non-final fields are involved, e.g. if p.x was a 
> Foo with a non-final field y with a getter, p.x.getY() may return an 
> uninitialized value after the get() returns a non-null p.
>
> On Friday, November 16, 2018 at 3:59:47 AM UTC-8, John Hening wrote:
>>
>> Hello, let's look at:
>>
>>
>>
>> class P {
>>     public String x;
>> }
>>
>> ConcurrentHashMap<Integer, P> x = new ConcurrentHashMap<>();
>>
>> new Thread(() -> {                    // Thread 1
>>     P tmp = new P();
>>     tmp.x = new String("x");
>>     x.put(1, tmp);                    // 1
>> }).start();
>>
>> new Thread(() -> {                    // Thread 2
>>     P p = x.get(1);                   // 2
>>     if (p != null) {
>>         print(p.x);                   // 4
>>     }
>> }).start();
>>
>>
>>
>>
>> If thread 2 observes p != null, is it guaranteed by the JMM that p.x is 
>> initialized? To my eye, yes, because:
>>
>> Let's assume an execution in which p != null was true. It means that 
>> there is a synchronizes-with relation: x.put(1, tmp) --sw--> x.get(1).
>> Putting and getting an element of a ConcurrentHashMap involve 
>> synchronization actions (and, actually, a synchronizes-with edge exists 
>> between them). 
>>
>> As a result, there is a chain of happens-before relations:
>>
>> tmp.x = new String("x") --hb--> x.put(1, tmp) --hb--> x.get(1) --hb--> 
>> read(p.x)
>>
>>
>>
>>
>> Yes?
>>
>>
>>
>>
>>
>>
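
The benign-race idiom Gil describes above (String's hash caching) can be 
sketched like this; a simplified model for illustration, not the actual JDK 
code:

class CachedHash {
    private final char[] chars;   // immutable state, safely published via final
    private int hash;             // non-final cache; a racing thread may see 0

    CachedHash(char[] chars) { this.chars = chars.clone(); }

    int hash() {
        int h = hash;             // racy read: may observe 0 even after another
                                  // thread has already computed the hash
        if (h == 0) {
            for (char c : chars) h = 31 * h + c;
            hash = h;             // racy write: worst case the work is redone,
                                  // but every thread returns the same value
        }
        return h;
    }
}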



Happens before between putting and getting from and to ConcurrentHashMap

2018-11-16 Thread John Hening
Hello, let's look at:



class P {
    public String x;
}

ConcurrentHashMap<Integer, P> x = new ConcurrentHashMap<>();

new Thread(() -> {                    // Thread 1
    P tmp = new P();
    tmp.x = new String("x");
    x.put(1, tmp);                    // 1
}).start();

new Thread(() -> {                    // Thread 2
    P p = x.get(1);                   // 2
    if (p != null) {
        print(p.x);                   // 4
    }
}).start();




If thread 2 observes p != null, is it guaranteed by the JMM that p.x is 
initialized? To my eye, yes, because:

Let's assume an execution in which p != null was true. It means that there 
is a synchronizes-with relation: x.put(1, tmp) --sw--> x.get(1).
Putting and getting an element of a ConcurrentHashMap involve 
synchronization actions (and, actually, a synchronizes-with edge exists 
between them). 

As a result, there is a chain of happens-before relations:

tmp.x = new String("x") --hb--> x.put(1, tmp) --hb--> x.get(1) --hb--> 
read(p.x)




Yes?







Re: JMM- synchronization access in a concrete example.

2018-09-25 Thread John Hening
Tom, 

Actually, you are right. I get it! 

Gil, 
thanks for your note. You are obviously right. If I used a multithreaded 
executor, I would get a lot of races as a result. 
So, does it mean that both versions of my example are correct? 

How should I interpret the citation given by Cezary: "If x and y are actions 
of the same thread and x comes before y in program order, then hb(x, y)."?
To my eye the key is the interpretation of program order. So, if we have two 
statements [X, Y] and the order of execution does not matter because both 
orders are intra-thread consistent, then [Y, X] is also in program order and 
HB(Y, X) holds by the rule I cite above. 

So, if we had no guarantee from the Executor (or from anything else), they 
could be reordered. 


On Wednesday, September 26, 2018 at 4:49:21 AM UTC+2, Gil Tene wrote:
>
> As Tom noted, The Executor's submission happens-before promise prevents a 
> reordering of (1) and (2) above.
>
> Note that, as written, the reason you don't have data races between 
> (2) and (2) is that executor is known to be a single-threaded executor (and 
> will only run one task at a time). Without that quality, you would have 
> plenty of (2) vs. (2) races. It is not that "doers contain different 
> objects": your code submits executions of functions using the same x member 
> of xs to all doers, and it is only the guaranteed serialization in your 
> chosen executor implementation that prevents x.f()s from racing on the same 
> x...
>
> On Tuesday, September 25, 2018 at 8:52:14 AM UTC-7, John Hening wrote:
>>
>> public class Test {
>>     ArrayList<X> xs;
>>     ArrayList<Doer> doers;
>>     Executor executor = Executors.newSingleThreadExecutor();
>>
>>     static class Doer {
>>         public void does(X x) {
>>             x.f();                              // (2)
>>         }
>>     }
>>
>>     void test() {
>>         for (X x : xs) {
>>             x.f();                              // (1)
>>
>>             for (Doer d : doers) {
>>                 executor.execute(() -> d.does(x));
>>             }
>>         }
>>     }
>> }
>>
>>
>>
>>
>> To my eye, if X.f is not synchronized this is incorrect because of two 
>> facts (and only those two facts): 
>>
>> 1. Obviously, there is a data race between (1) and (2). There are no other 
>> data races here. (doers contains different objects.)
>> 2. There is no guarantee that (1) will be executed before (2). Yes?
>>
>> If X.f were synchronized, the code would be correct because:
>> 1. There is no data race.
>> 2. There is a guarantee that (1) will be executed before (2), because (1) 
>> is a synchronization action and Executor.execute also involves a 
>> synchronization access (not execute itself specifically).
>>
>> Yes?
>>
>
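
For reference, the Executor guarantee relied on above is documented in the 
java.util.concurrent package: actions in a thread prior to submitting a 
Runnable to an Executor happen-before its execution begins. A minimal sketch 
(hypothetical names, my illustration):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SubmissionHb {
    static int plain;                 // not volatile, not synchronized

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        plain = 42;                   // happens-before the submission below
        executor.execute(() -> {
            // guaranteed to print 42: submission happens-before execution
            System.out.println(plain);
        });
        executor.shutdown();
    }
}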



Re: JMM- synchronization access in a concrete example.

2018-09-25 Thread John Hening

>
> (1) should happen before "executor.execute()" (because they are on the 
> same thread), 
>

It does not matter that they are executed on the same thread. I do not see 
what would establish an HB relation here.



Re: JMM- conservative and formal point of view.

2018-05-18 Thread John Hening
Aleksey, thanks for your explanation. By the way, your presentation (I mean 
the slides) is great, in content and not only :-). 


On Wednesday, May 16, 2018 at 7:33:22 PM UTC+2, John Hening wrote:
>
> The following picture comes from: 
> https://shipilev.net/blog/2014/jmm-pragmatics/
>
> <https://shipilev.net/blog/2014/jmm-pragmatics/page-101.png>
>
> As I understand this slide points that JMM does not constitute that a 
> volatile write does not work as a (Load/Store)Store barrier, and it doesn't 
> in fact. 
>
>
> But the JVM is allowed to reorder such a situation only in special cases, 
> when the execution will be *equivalent as if write(x,1) weren't reordered 
> with release(g)*. In other words, here the reordering is allowed if and 
> only if the JVM is able to prove that it is equivalent to the situation 
> without reordering. 
>
>
> The question is:
>
> So, when considering *correctness* of a program (and only *correctness*), 
> can we assume that a volatile store works as a (Load/Store)Store barrier in 
> fact?
>
>
> However, such a simplified view doesn't explain why the JVM is able in 
> some cases to do hostile things to code like:
>
>
> volatile int x;
> for (...)
>     x += 1;
>
>
> What do you think?
>



Re: JMM- conservative and formal point of view.

2018-05-16 Thread John Hening


Caveat: when the *outcome* would be equivalent to some allowed JMM 
execution. It does not mean the 
actual implementation should issue machine instructions in any given order. 

Obviously yes.

Aleksey, thanks for your response and presentation. I need to rethink the 
whole JMM, unfortunately :(. 

By the way, in your transcript you said:

int a;
volatile boolean ready = false;

void actor(IntResult1 r) {
    while (!ready) {}
    r.r1 = a;
}

void observer() {
    a = 41;
    a = 42;
    ready = true;
    a = 43;
}


You said that the possible values for r1 are 42 and 43. I cannot understand 
why 41 is impossible. What prevents write(a, 41) and write(a, 42) from being 
reordered?  
The JMM is a weak model. It is allowed to reorder normal memory operations 
provided the reordered program is equivalent in a single-threaded execution. 
In my example I see nothing that prevents the reordering: 

a = 42
a = 41
ready = true
a = 43


is equivalent to
a = 41;
a = 42;
ready = true;
a = 43;
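
My current understanding of the answer (my annotation, not from the talk): 
once the actor reads ready == true, the volatile read synchronizes-with the 
volatile write, and program order gives a = 41 --hb--> a = 42 --hb--> 
ready = true, so a = 42 is the latest write to a in happens-before order at 
the point of the read; only the racy a = 43 can additionally be observed:

// observer thread                  // actor thread
a = 41;        // hb (PO)
a = 42;        // hb (PO)
ready = true;  // volatile write    while (!ready) {}  // volatile read: true
a = 43;        // races with read   r1 = a;  // sees 42 (latest HB write) or
                                             // 43 (via the race); never 41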



On Wednesday, May 16, 2018 at 7:49:04 PM UTC+2, Aleksey Shipilev wrote:
>
> On 05/16/2018 07:33 PM, John Hening wrote: 
> > Subject: JMM- conservative and formal point of view. 
>
> There is no "conservative" JMM point of view. There is JSR 133 Cookbook 
> for Compiler Writers that 
> describes the conservative implementation, but it is not the JMM itself. 
> Reductio ad absurdum: 
> suppose JSR 133 Fanfic for Concurrency Experts would say the easy way to 
> be JMM conformant is to 
> acquire lock before entering any thread -- basically implementing Global 
> Interpreter Lock. Would you 
> be willing to take the existence of GIL in some implementations as the 
> guidance for writing reliable 
> software? I hope not. 
>
> > As I understand this slide points that JMM does not constitute that a 
> > volatile write does not work as a (Load/Store)Store barrier, and it 
> > doesn't in fact. 
>
> Double negation is confusing here. That slide points out that roach motel 
> semantics basically says 
> that some transformations are easy (putting stuff "into" the acq/rel 
> block), and others are still 
> possible, but hard. There are cases where it is not hard, actually: for 
> example, when we know the 
> access is provably thread-local. 
>
>
> > But the JVM is allowed to reorder such a situation only in special 
> > cases, when the execution will be *equivalent as if write(x,1) weren't 
> > reordered with release(g)*. In other words, here the reordering is 
> > allowed if and only if the JVM is able to prove that it is equivalent to 
> > the situation without reordering. 
>
> Caveat: when the *outcome* would be equivalent to some allowed JMM 
> execution. It does not mean the 
> actual implementation should issue machine instructions in any given 
> order. 
>
>
> > The question is: 
> > 
> > So, when considering *correctness* of a program (and only 
> > *correctness*), can we assume that a volatile store works as a 
> > (Load/Store)Store barrier in fact? 
>
> No. The fluff about barriers is implementation details, which gets 
> confusing very quickly as 
> implementations screw with your code. 
>
> I ranted about this a week ago at GeeCON: 
>  https://shipilev.net/talks/geecon-May2018-jmm.pdf 
>
> -Aleksey 
>
>



JMM- conservative and formal point of view.

2018-05-16 Thread John Hening


The following picture comes from: 
https://shipilev.net/blog/2014/jmm-pragmatics/



As I understand this slide points that JMM does not constitute that a 
volatile write does not work as a (Load/Store)Store barrier, and it doesn't 
in fact. 


But the JVM is allowed to reorder such a situation only in special cases, 
when the execution will be *equivalent as if write(x,1) weren't reordered 
with release(g)*. In other words, here the reordering is allowed if and only 
if the JVM is able to prove that it is equivalent to the situation without 
reordering. 


The question is:

So, when considering *correctness* of a program (and only *correctness*), 
can we assume that a volatile store works as a (Load/Store)Store barrier in 
fact?


However, such a simplified view doesn't explain why the JVM is able in some 
cases to do hostile things to code like:


volatile int x;
for (...)
    x += 1;


What do you think?



Re: Throughput test of OpenHFT networking

2018-05-12 Thread John Hening
Why do you think that I have a production system? I do it for learning 
purposes. 


On Saturday, May 12, 2018 at 9:59:53 PM UTC+2, Greg Young wrote:
>
> Will your production system be running over loopback? 
>
> On Sun, May 13, 2018 at 1:59 AM, John Hening <goci...@gmail.com 
> > wrote:
>
>> I know that testing over loopback isn't the best idea (and that it omits 
>> part of the network stack), but it is the simplest option, and I have just 
>> a laptop with one physical network interface, so I don't see a way to test 
>> it without a second physical host. 
>>
>>
>>  Not mentioning that open HFT may employ socket mechanics different from 
>>> what's available in jdk. 
>>>
AFAIK (perhaps I am wrong) OpenHFT uses the networking available in the JDK.
>>
>> On Saturday, May 12, 2018 at 8:34:57 PM UTC+2, Wojciech Kudla wrote:
>>>
>>> It's probably not the response that you were hoping to see but I'd avoid 
>>> testing for performance using loopback interface. 
>>> There are whole parts of the network stack omitted by the Linux kernel 
>>> in such scenarios. 
>>> Not mentioning that open HFT may employ socket mechanics different from 
>>> what's available in jdk. 
>>>
>>> On Sat, 12 May 2018, 19:03 John Hening, <goci...@gmail.com> wrote:
>>>
>>>> Hello, 
>>>> I am trying to examine the throughput of OpenHFT networking in version 
>>>> 1.12.2 to compare it with my toy. 
>>>>
>>>> My test works in the following way: 
>>>>
>>>> I have few concurrent clients that send (by loopback) to the server 
>>>> (written with OpenHFT networking) 131`072 messages of size 4KB and 
>>>> blocking-wait for a (1-byte) response that confirms that message was 
>>>> processed by the server.
>>>>
>>>> I've run a test with -DServerThreadingStrategy=CONCURRENT, 
>>>> MULTI_THREADED_BUSY_WAITING  and SINGLE_THREADED.
>>>>
>>>>
>>>>
>>>> avg. messages per second, by number of clients (run on different threads);
>>>> my toy has no single-threaded strategy:
>>>>
>>>> clients   OpenHFT     OpenHFT          OpenHFT                      my toy
>>>>           CONCURRENT  SINGLE_THREADED  MULTI_THREADED_BUSY_WAITING  CONCURRENT
>>>>    1      52.47       50.4             49.95                        41.4
>>>>    2      40.21       48.57            39.54                        44.65
>>>>    4      21.92       23.68            21.51                        32.04
>>>>    8      10.78       12.83            10.91                        23.06
>>>>   16       5.53        6.02             5.57                        11.77
>>>>   32       2.68        2.79             2.76                         6.46
>>>>
>>>>
>>>>
>>>> 
>>>>
>>>>
>>>>
>>>> I suppose that the problem is with my usage of the library, but I 
>>>> cannot figure out what is wrong; it is not easy to write a test because 
>>>> the documentation and examples are lacking or obsolete.
>>>>
>>>> The server is run as follows (I skipped unimportant details to make it 
>>>> shorter):
>>>>
>>>> private static EventLoop eg;
>>>>
>>>> public static void startServer() {
>>>>     eg = new EventGroup(true);
>>>>     eg.start();
>>>>     TCPRegistry.createServerSocketChannelFor(desc);
>>>>     // Confirmer sends a one-byte message to a client to confirm
>>>>     // processing of a message
>>>>     AcceptorEventHandler eah = new AcceptorEventHandler(desc,
>>>>         LegacyHanderFactory.legacyTcpEventHandlerFactory(nc -> new Confirmer()),
>>>>         VanillaNetworkContext::new);
>>>>     eg.addHandler(eah);
>>>> }
>>>>
>>>>  

Throughput test of OpenHFT networking

2018-05-12 Thread John Hening
Hello, 
I am trying to examine the throughput of OpenHFT networking in version 
1.12.2 to compare it with my toy. 

My test works in the following way: 

I have few concurrent clients that send (by loopback) to the server 
(written with OpenHFT networking) 131`072 messages of size 4KB and 
blocking-wait for a (1-byte) response that confirms that message was 
processed by the server.

I've run a test with -DServerThreadingStrategy=CONCURRENT, 
MULTI_THREADED_BUSY_WAITING  and SINGLE_THREADED.



avg. messages per second, by number of clients (run on different threads);
my toy has no single-threaded strategy:

clients   OpenHFT     OpenHFT          OpenHFT                      my toy
          CONCURRENT  SINGLE_THREADED  MULTI_THREADED_BUSY_WAITING  CONCURRENT
   1      52.47       50.4             49.95                        41.4
   2      40.21       48.57            39.54                        44.65
   4      21.92       23.68            21.51                        32.04
   8      10.78       12.83            10.91                        23.06
  16       5.53        6.02             5.57                        11.77
  32       2.68        2.79             2.76                         6.46






I suppose that the problem is with my usage of the library, but I cannot 
figure out what is wrong; it is not easy to write a test because the 
documentation and examples are lacking or obsolete.

The server is run as follows (I skipped unimportant details to make it 
shorter):

private static EventLoop eg;

public static void startServer() {
    eg = new EventGroup(true);
    eg.start();
    TCPRegistry.createServerSocketChannelFor(desc);
    // Confirmer sends a one-byte message to a client to confirm
    // processing of a message
    AcceptorEventHandler eah = new AcceptorEventHandler(desc,
        LegacyHanderFactory.legacyTcpEventHandlerFactory(nc -> new Confirmer()),
        VanillaNetworkContext::new);
    eg.addHandler(eah);
}

class Confirmer implements TcpHandler {
    @Override
    public void process(@NotNull final Bytes in, @NotNull final Bytes out,
                        NetworkContext nc) {
        if (in.readRemaining() == 0) {
            return;
        }
        out.write(in, in.readPosition(), 1); // send a confirmation
        in.readSkip(Math.min(in.readRemaining(), out.writeRemaining()));
    }
}



The client code (run on different threads):

private void sender(long numberOfMessage, int sizeOfMessage) {
    SocketChannel sc = TCPRegistry.createSocketChannel(desc);
    sc.configureBlocking(true);
    ByteBuffer buffer = ByteBuffer.allocateDirect(sizeOfMessage);
    ByteBuffer recv = ByteBuffer.allocateDirect(1);
    buffer.putInt(64); // required bytes; to be frank, I don't understand
                       // exactly why 64
    buffer.put(bytes.getBytes()); // a nearly-4KB text message

    long start = System.nanoTime();
    for (int i = 0; i < numberOfMessage; i++) {
        buffer.clear();
        recv.clear();
        sc.write(buffer);
        sc.read(recv);
    }
    times.add(System.nanoTime() - start); // used to compute an average later
    sc.close();
}



And I use times to compute the average throughput. 

P.S. I see that the system is loaded (in the sense of threads and the kernel 
network stack), but the results seem low nonetheless. I don't have the 
experience to decide whether these results are normal.
P.S.2 I also know that my test is primitive.



Re: Experiments with send in Linux

2018-05-07 Thread John Hening
Thanks Avi,
 

> The send doesn't wait, but the kernel has to serve both the send side and 
> the receive side, and if they happen to run on the same CPU and possibly in 
> the same thread, then you'll see them in the flame graph for that CPU or 
> thread.


OK. I hadn't noticed that I created the flame graph for a specific PID, so 
that is not the case here; I mean that I did not sample a CPU but a specific 
process.

In this case the kernel decided to run the receive processing immediately, 
> so that's what you get.


I see, you are right. This is the way the kernel implements loopback.

use unix domain sockets or (better) shared memory for IPC.
>

I cannot; my communication is not inter-process.

Now everything is clear, thanks ;-)



Re: Nanotrusting the Nanotime and amortization.

2018-04-25 Thread John Hening
So by amortization you mean just that, in real code, the volatile write is 
negligible relative to the whole execution.

Yes?



Nanotrusting the Nanotime and amortization.

2018-04-24 Thread John Hening
I'm reading the great article at 
https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks Aleksey! :)) 
and I am not sure whether I understand it correctly. 

First, it compares the performance of plain and volatile writes:

Benchmark                            Mode  Samples    Mean  Mean error  Units
o.s.VolatileWriteSucks.incrPlain     avgt      250   3.589       0.025  ns/op
o.s.VolatileWriteSucks.incrVolatile  avgt      250  15.219       0.114  ns/op


and then it is written that: 

"In real code, the heavy-weight operations are mixed with relatively low-weight 
ops, which amortize the costs."

And my question is: what does it mean to amortize the costs, exactly? I 
explain it to myself as: the amortization is caused by out-of-order execution 
in the CPU, yes? So even if a volatile write takes much more time than a 
plain write, it isn't so painful because the CPU executes other instructions 
out of order (if it can). 

What do you think?
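
To make the amortization claim concrete, here is a sketch (my illustration, 
hypothetical workload): if a plain write costs about 3.6 ns and a volatile 
write about 15 ns (the numbers from the article), then doing 100 plain 
writes per volatile write makes the extra volatile cost roughly 
(15 - 3.6) / (100 * 3.6 + 15), i.e. about 3% of the loop:

class Amortization {
    static int plain;                 // plain field: cheap writes
    static volatile int flag;         // volatile field: expensive write

    static void work(int iterations) {
        for (int i = 0; i < iterations; i++) {
            for (int j = 0; j < 100; j++) {
                plain++;              // low-weight op
            }
            flag = i;                 // heavy-weight op, amortized over the above
        }
    }
}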



Analysis of perf sched for simple server.

2018-04-23 Thread John Hening
Hello,

1. I have a simple single-threaded TCP server written in Java. I am trying 
to measure its receive-and-process-response latency. More precisely, the 
client sends (over loopback) a 128-byte message with a timestamp in the 
header. The server receives the message, reads the content byte by byte and 
computes the difference between `now` and the timestamp from the header. The 
difference is more or less 6 μs.

Now, I am trying to make it faster.

But first I would like to examine a scheduling issue, so I've collected 
results with:

perf sched record -p  -- sleep 10

and then:

perf sched timehist -V

Excerpt from the result is presented below: (I've filtered it for my server 
thread)

 
           time    cpu   task name [tid/pid]   wait time  sch delay  run time
                                                  (msec)     (msec)    (msec)
--------------- ------  ---------------------  ----------  ---------  --------
   21849.531842 [0002]  java[7340/7338]             0.000      0.000    56.003
   21849.587844 [0002]  java[7340/7338]             0.000      0.000    56.001
   21849.607838 [0002]  java[7340/7338]             0.000      0.000    19.994
   21849.615836 [0002]  java[7340/7338]             0.000      0.000     7.998
   ...
   21849.691834 [0002]  java[7340/7338]             0.000      0.000     4.000
   21849.703837 [0001]  java[7340/7338]             0.000      0.000    38.330
   21849.711838 [0005]  java[7340/7338]             0.000      0.000     0.000
   21849.719834 [0005]  java[7340/7338]             0.000      0.000     7.996

My question is:
How is it possible that the wait time is always zero? After all, that seems 
impossible: my CPU is multi-core, but there are a lot of processes (threads) 
that need CPU time. How should I interpret that? 
On the other hand, I am going to reduce CPU migration. Perhaps it will help 
:-)


2. The second issue is:

The excerpt from perf sched script:

java  7340 [002] 21848.012360:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
java  7340 [002] 21848.012375:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
java  7340 [002] 21848.012391:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
java  7340 [002] 21848.012406:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
...
swapper 0 [007] 21848.012554:   sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007
swapper 0 [007] 21848.012554:   sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007
...
java  7340 [002] 21848.012555:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
java  7377 [007] 21848.012582: sched:sched_stat_runtime: comm=java pid=7377 runtime=37420 [ns] vruntime=1628300433237 [ns]
java  7340 [002] 21848.012585:   sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
java  7377 [007] 21848.012587:   sched:sched_switch: prev_comm=java prev_pid=7377 prev_prio=120 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120

Why does my server receive sched_wakeup events that look like spurious 
wakeups? And what is the swapper?



Re: Exclusive core for a process, is it reasonable?

2018-04-10 Thread John Hening
Thanks for your responses 

On Sunday, April 8, 2018 at 2:51:52 PM UTC+2, John Hening wrote:
>
> Hello,
>
> I've read about thread affinity and I see that it is popular in 
> high-performance-libraries (for example 
> https://github.com/OpenHFT/Java-Thread-Affinity). Ok, jugglery a thread 
> between cores has impact (generally) on performance so it is reasonable to 
> bind a specific thread to a specific core. 
>
> *Intro*:
> It is an obvious idea to make it possible for a process to be the owner of 
> a core [let's call it X] (in a multi-core CPU). I mean that the main 
> thread of the process would be the one and only thread executed on core X. 
> So there would be no problem with context switching and cache flushing 
> [except for system calls]. 
> I know that it requires a special scheduler implementation in the kernel, 
> so it requires a modification of the [Linux] kernel. I know that it is not 
> so easy, and so on.
>
> *Question*:
> But we know that we have systems that need high performance, so a solution 
> that context-switches once and for all could make sense. Why is there no 
> such solution? My suspicions are:
>
> * it is pointless, the bottleneck is elsewhere [however, it is meaningful 
> to use thread affinity]
> * it is too hard, and the risk of getting it wrong is too high
> * there is no need
> * forking one's own Linux kernel doesn't sound like a good idea.
>
>
>



Re: garbage-free java.nio

2018-03-12 Thread John Hening
Thanks! :)

On Sunday, March 11, 2018 at 9:25:32 PM UTC+1, John Hening wrote:
>
> Hello,
>
> recently I have become interested in the non-blocking Java API for 
> networking. It seems to be great. However, I would like to implement a 
> garbage-free solution. I am doing it only for learning purposes (I know 
> that I won't implement a "better" solution). The point is that my solution 
> is going to be garbage-free. 
>
> How do I start? I see that the JDK implementation is not garbage-free. So 
> what then? My only idea is to implement a small native library (only for 
> Linux) and something like a "facade", plus a lot of stuff around it, in 
> Java (but garbage-free).
> I suppose that it can be very hard to integrate it with selectors, for 
> example (if at all possible). But I don't see another solution, and this 
> is why I wrote here. What do you think? What is the best approach?
>



garbage-free java.nio

2018-03-11 Thread John Hening
Hello,

recently I have become interested in the non-blocking Java API for 
networking. It seems to be great. However, I would like to implement a 
garbage-free solution. I am doing it only for learning purposes (I know that 
I won't implement a "better" solution). The point is that my solution is 
going to be garbage-free. 

How do I start? I see that the JDK implementation is not garbage-free. So 
what then? My only idea is to implement a small native library (only for 
Linux) and something like a "facade", plus a lot of stuff around it, in Java 
(but garbage-free).
I suppose that it can be very hard to integrate it with selectors, for 
example (if at all possible). But I don't see another solution, and this is 
why I wrote here. What do you think? What is the best approach?
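
The allocation-free part of such a design can at least be sketched in pure 
Java (my illustration, hypothetical class; the JDK Selector machinery itself 
still allocates internally, which is the part that is hard to make 
garbage-free without native code): preallocate direct buffers once and reuse 
them for every read/write, instead of allocating per operation:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class ReusableIo {
    private final ByteBuffer readBuf  = ByteBuffer.allocateDirect(64 * 1024);
    private final ByteBuffer writeBuf = ByteBuffer.allocateDirect(64 * 1024);

    int readInto(SocketChannel ch) throws IOException {
        readBuf.clear();          // reuse: reset position/limit, no allocation
        return ch.read(readBuf);
    }

    int writeFrom(SocketChannel ch) throws IOException {
        writeBuf.flip();          // prepare previously filled data for writing
        int n = ch.write(writeBuf);
        writeBuf.compact();       // keep unwritten bytes, stay allocation-free
        return n;
    }
}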



Re: Looking at reordering memory operations

2018-03-11 Thread John Hening

>
> That (t1 always see the same value of x when x is modified elsewhere) is 
> possible, e.g. in a tight loop reading x and nothing else
>

And this is why I dislike it when people say a "volatile field" gives you 
guarantees *only* about ordering. It is not true, because it also says: 

Mr. javac/java/JIT, 

I have synchronized access to this object, so please be careful during 
reordering *AND* during optimizations like (from the Java perspective): "I 
won't reload the value from memory, because no one can modify it (it is not 
volatile/synchronized, so I have no clue the object is shared!)"


Gil, thanks for your explanation. 

On Friday, March 9, 2018 at 11:20:37 PM UTC+1, John Hening wrote:
>
>
> executor = Executors.newFixedThreadPool(16);
> while (true) {
>     SocketChannel connection = serverSocketChannel.accept();
>     connection.configureBlocking(false);
>     executor.execute(() -> writeTask(connection)); 
> }
>
> void writeTask(SocketChannel s) {
>     s.isBlocking();
> }
>
> public final SelectableChannel configureBlocking(boolean block) throws 
> IOException
> {
>     synchronized (regLock) {
>         ...
>         blocking = block;
>     }
>     return this;
> }
>
>
>
> We see the following situation: the main thread is setting 
> connection.configureBlocking(false)
> and another thread (launched by the executor) is reading that. So it looks 
> like a data race.
>
> My question is:
>
> 1. Here configureBlocking is synchronized, so it behaves as a memory 
> barrier. It means that the code is OK, even though reading/writing the 
> blocking field is not itself synchronized; reading/writing a boolean is 
> atomic.
>
> 2. What if configureBlocking were not synchronized? What then? I think it 
> would be necessary to emit a memory barrier, because it is theoretically 
> possible that the setting of the blocking field could be reordered. 
>
> Am I right?
>



Re: Looking at reordering memory operations

2018-03-10 Thread John Hening
Gil, thanks for your response. It is very helpful. 

In your specific example above, there is actually no ordering question, 
because your writeTask() operations doesn't actually observe the state 
changed by connection.configueBlocking(false)


I agree that my question wasn't phrased correctly. There is no "ordering" 
issue. I meant visibility. 


 Without the use of synchronized in isBlocking(), the use of synchronized 
in configureBlocking() wouldn't make a difference.

Yes, semi-synchronized doesn't work. So I conclude that without 
synchronization the result of `blocking = false` could be invisible to 
writeTask; am I right?


As your question about the possibility of "skipping" some write operations.


By "skipping" I meant "being invisible to observers". For example, if one 
thread t1 reads a non-volatile int x, it is possible that t1 always sees the 
same value of x (even though there is another thread t2 that modifies x). 


1. It is interesting to me what happens in a situation like this:

while (true) {
    SocketChannel connection = serverSocketChannel.accept();
    connection.configureBlocking(false);
    Unsafe.storeFence();
    executor.execute(() -> writeTask(connection));
}

void writeTask(SocketChannel s) {
    // (***)
    any_static_global_field = s.isBlocking();
}

To my eye it should work, but I have doubts. What does storeFence mean? 
"Please flush this to memory immediately!" So the write will be visible 
before the executor thread starts. But it seems that a load fence is not 
necessary here (***). Why? The blocking field must be read from memory 
(there is no possibility that it is cached in a register, because it is read 
for the first time by the executor thread). As for the CPU cache, it may be 
cached there, but the cache is coherent, so no problem. Moreover, there is 
no need to ensure ordering here. So a loadFence is not necessary. Yes? 

2. 
volatile int foo;
...
foo = 1;
foo = 2;
foo = 3;

It is very interesting. So, after being JITed on x86 it could look like:

mov [foo], 1
sfence
mov [foo], 2
sfence
mov [foo], 3
sfence

Are you sure that the CPU can execute that as:

mov [foo], 3
sfence

?

I know that: 

mov [foo], 1
mov [foo], 2
mov [foo], 3 

an x86 CPU can legally optimize. 
















On Friday, March 9, 2018 at 11:20:37 PM UTC+1, John Hening wrote:
>
>
> executor = Executors.newFixedThreadPool(16);
> while (true) {
>     SocketChannel connection = serverSocketChannel.accept();
>     connection.configureBlocking(false);
>     executor.execute(() -> writeTask(connection)); 
> }
>
> void writeTask(SocketChannel s) {
>     s.isBlocking();
> }
>
> public final SelectableChannel configureBlocking(boolean block) throws 
> IOException
> {
>     synchronized (regLock) {
>         ...
>         blocking = block;
>     }
>     return this;
> }
>
>
>
> We see the following situation: the main thread is setting 
> connection.configureBlocking(false)
> and another thread (launched by the executor) is reading that. So it looks 
> like a data race.
>
> My question is:
>
> 1. Here configureBlocking is synchronized, so it behaves as a memory 
> barrier. It means that the code is OK, even though reading/writing the 
> blocking field is not itself synchronized; reading/writing a boolean is 
> atomic.
>
> 2. What if configureBlocking were not synchronized? What then? I think it 
> would be necessary to emit a memory barrier, because it is theoretically 
> possible that the setting of the blocking field could be reordered. 
>
> Am I right?
>



Re: Looking at reordering memory operations

2018-03-10 Thread John Hening
OK, reordering is not the right thing to consider here. But please note that 
if configureBlocking weren't synchronized, then the statement

blocking = false 

could be "skipped" at the compilation level, because the JMM doesn't 
guarantee that every memory access will be committed to main memory. 


On Friday, March 9, 2018 at 11:20:37 PM UTC+1, John Hening wrote:
>
>
> executor = Executors.newFixedThreadPool(16);
> while (true) {
>     SocketChannel connection = serverSocketChannel.accept();
>     connection.configureBlocking(false);
>     executor.execute(() -> writeTask(connection)); 
> }
>
> void writeTask(SocketChannel s) {
>     s.isBlocking();
> }
>
> public final SelectableChannel configureBlocking(boolean block) throws 
> IOException
> {
>     synchronized (regLock) {
>         ...
>         blocking = block;
>     }
>     return this;
> }
>
>
>
> We see the following situation: the main thread is setting 
> connection.configureBlocking(false)
> and another thread (launched by the executor) is reading that. So it looks 
> like a data race.
>
> My question is:
>
> 1. Here configureBlocking is synchronized, so it behaves as a memory 
> barrier. It means that the code is OK, even though reading/writing the 
> blocking field is not itself synchronized; reading/writing a boolean is 
> atomic.
>
> 2. What if configureBlocking were not synchronized? What then? I think it 
> would be necessary to emit a memory barrier, because it is theoretically 
> possible that the setting of the blocking field could be reordered. 
>
> Am I right?
>



Looking at reordering memory operations

2018-03-09 Thread John Hening

executor = Executors.newFixedThreadPool(16);
while (true) {
    SocketChannel connection = serverSocketChannel.accept();
    connection.configureBlocking(false);
    executor.execute(() -> writeTask(connection)); 
}

void writeTask(SocketChannel s) {
    s.isBlocking();
}

public final SelectableChannel configureBlocking(boolean block) throws 
IOException
{
    synchronized (regLock) {
        ...
        blocking = block;
    }
    return this;
}



We see the following situation: the main thread is setting 
connection.configureBlocking(false)
and another thread (launched by the executor) is reading that. So it looks 
like a data race.

My question is:

1. Here configureBlocking is synchronized, so it behaves as a memory 
barrier. It means that the code is OK, even though reading/writing the 
blocking field is not itself synchronized; reading/writing a boolean is 
atomic.

2. What if configureBlocking were not synchronized? What then? I think it 
would be necessary to emit a memory barrier, because it is theoretically 
possible that the setting of the blocking field could be reordered. 

Am I right?
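
A sketch of a lock-free alternative (my illustration, not from the thread; 
the real AbstractSelectableChannel needs regLock for other reasons too): if 
the blocking field were declared volatile, the write in configureBlocking() 
and a read in isBlocking() would form a happens-before edge on their own, 
without the monitor:

// Hypothetical variant of the channel's field, for illustration only:
private volatile boolean blocking;   // volatile write/read pair provides the
                                     // happens-before edge without regLock

public final SelectableChannel configureBlocking(boolean block) throws IOException {
    blocking = block;                // volatile store: publishes the new value
    return this;
}

public final boolean isBlocking() {
    return blocking;                 // volatile load: sees the latest HB write
}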



String.interning is surprisingly cpu-consuming.

2018-02-27 Thread John Hening
Hello! 

I have a very simple utility function: 

public static Unsafe getUnsafe() throws Exception {
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    return (Unsafe) f.get(null);
}

And, as it turned out from my analysis with perf, it is a bottleneck in my 
"scheme" of the program. The function is called in the main loop. It is a 
sketch, so it wasn't optimized (I know that I should do it better). 
Nevertheless, perf shows that `String::intern` is CPU-cycle-consuming. 
I cannot understand why that method wasn't compiled to something like this 
("theUnsafe" was placed in permgen):

load string_theUnsafe_from_permgen
call Unsafe.class.getDeclaredField
...



instead of
public static getUnsafe()Lsun/misc/Unsafe;
...
   L0
LDC Lsun/misc/Unsafe;.class
LDC "theUnsafe"
INVOKEVIRTUAL java/lang/Class.getDeclaredField (Ljava/lang/String;)Ljava
/lang/reflect/Field;
...
}

In particular, I don't see where the method

String.intern

is called. I suppose that it is called by the interpreter on

LDC "theUnsafe"


Please explain that to me.
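
Independent of the interning question, the bottleneck itself has an easy fix 
(my suggestion, not from the thread): do the reflective lookup once and 
cache the result in a static final field, so the main loop never calls 
getDeclaredField again:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class UnsafeHolder {
    // The lookup runs once, at class initialization; the loop just reads UNSAFE.
    public static final Unsafe UNSAFE;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private UnsafeHolder() {}
}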

Greetings,

John



Re: JITed very simple method.

2018-02-05 Thread John Hening
@Aleksey,

1. Why do you think it is a bug in the tiered compilation machinery? It is 
not, to my eye. The compilation level is C2, level 4.

2. I still have a doubt: why is the profiling counter not synchronized? What 
if 2 or more threads are executing the function simple()?

Can you explain?




On Sunday, February 4, 2018 at 12:31:28 AM UTC+1, John Hening wrote:
>
> public static int dontOptimize;
>
>
> public static int simple(){
>  return dontOptimize;
> }
>
> And JITed version of that:
>
>
>   0x7fdc75112aa0: mov    %eax,-0x14000(%rsp)
>   0x7fdc75112aa7: push   %rbp
>   0x7fdc75112aa8: sub    $0x30,%rsp
>   *0x7fdc75112aac*: mov    $0x7fdc7443be90,%rax  ;   {metadata(method 
> data for {method} {0x7fdc7443bb30} 'simple' '()I' in 'Main')}
>   0x7fdc75112ab6: mov    0xdc(%rax),%esi
>   0x7fdc75112abc: add    $0x8,%esi
>   0x7fdc75112abf: mov    %esi,0xdc(%rax)
>   0x7fdc75112ac5: mov    $0x7fdc7443bb30,%rax  ;   {metadata({method} 
> {0x7fdc7443bb30} 'simple' '()I' in 'Main')}
>   0x7fdc75112acf: and    $0x1ff8,%esi
>   0x7fdc75112ad5: cmp    $0x0,%esi
>   *0x7fdc75112ad8*: je     0x7fdc75112af7
>   0x7fdc75112ade: mov    $0x76d2414f0,%rax  ;   {oop(a 
> 'java/lang/Class' = 'Main')}
>   0x7fdc75112ae8: mov    0x68(%rax),%eax    ;*getstatic dontOptimize
>                                             ; - Main::simple@0 (line 11)
>
>   0x7fdc75112aeb: add    $0x30,%rsp
>   0x7fdc75112aef: pop    %rbp
>   0x7fdc75112af0: test   %eax,0x1861060a(%rip)    # 0x7fdc8d723100
>                                             ;   {poll_return}
>   0x7fdc75112af6: retq   
>
>
>
>
>
> Though I know assembly, I cannot understand such simple code. In 
> particular, what do the lines between 0x7fdc75112aac and *0x7fdc75112ad8* 
> mean? I highlighted them. 
>



Re: JITed very simple method.

2018-02-04 Thread John Hening
Alex,

thanks for your response. But how would one know that? I suspected that it 
is a kind of statistic (because it is metadata and it is incremented every 
time), but I didn't know what exactly it is.

On Sunday, February 4, 2018 at 12:28:54 PM UTC+1, Alex Blewitt wrote:
>
>
> On 3 Feb 2018, at 23:31, John Hening <goci...@gmail.com > 
> wrote:
>
> public static int dontOptimize;
>
>
> public static int simple(){
>  return dontOptimize;
> }
>
> And JITed version of that:
>
>
>   0x7fdc75112aa0: mov    %eax,-0x14000(%rsp)
>   0x7fdc75112aa7: push   %rbp
>   0x7fdc75112aa8: sub    $0x30,%rsp
>   *0x7fdc75112aac*: mov    $0x7fdc7443be90,%rax  ;   {metadata(method 
> data for {method} {0x7fdc7443bb30} 'simple' '()I' in 'Main')}
>   0x7fdc75112ab6: mov    0xdc(%rax),%esi
>   0x7fdc75112abc: add    $0x8,%esi
>   0x7fdc75112abf: mov    %esi,0xdc(%rax)
>   0x7fdc75112ac5: mov    $0x7fdc7443bb30,%rax  ;   {metadata({method} 
> {0x7fdc7443bb30} 'simple' '()I' in 'Main')}
>   0x7fdc75112acf: and    $0x1ff8,%esi
>   0x7fdc75112ad5: cmp    $0x0,%esi
>   *0x7fdc75112ad8*: je     0x7fdc75112af7
>   0x7fdc75112ade: mov    $0x76d2414f0,%rax  ;   {oop(a 
> 'java/lang/Class' = 'Main')}
>   0x7fdc75112ae8: mov    0x68(%rax),%eax    ;*getstatic dontOptimize
>                                             ; - Main::simple@0 (line 11)
>
>   0x7fdc75112aeb: add    $0x30,%rsp
>   0x7fdc75112aef: pop    %rbp
>   0x7fdc75112af0: test   %eax,0x1861060a(%rip)    # 0x7fdc8d723100
>                                             ;   {poll_return}
>   0x7fdc75112af6: retq   
>
>
>
>
>
> Though I know assembly, I cannot understand such simple code. In 
> particular, what do the lines between 0x7fdc75112aac and *0x7fdc75112ad8* 
> mean? I highlighted them. 
>
>
> This is profiling information. It is accessing the metadata for the 
> method, and adding 8 each time it’s called. The comparison is checking for 
> overflow of that counter, and jumping elsewhere if that’s the case. 
>
> It’s effectively equivalent to:
>
> if method.getMetadata().count++ > 0x1ff8 / 8 goto elsewhere
>
> The purpose is to determine whether or not the method has been called a 
> number of times (i.e. it is “hot”) and will then get called into the C2 
> compiler to re-optimise the method. 
>
> Alex 
>
>



Re: allocation memory in Java

2017-11-20 Thread John Hening

>
> With perf. OpenJDK might need to be configured with debug symbols: 
> --with-native-debug-symbols=internal 
>

@Aleksey, when I try to configure OpenJDK 8 with that option I get an error:

configure: error: unrecognized options: --with-native-debug-symbols

To be exact, I downloaded jdk8u and ran:

./configure --enable-debug --with-freetype-include=/usr/include/freetype2/ 
--with-freetype-lib=/usr/lib/x86_64-linux-gnu/ 
--with-native-debug-symbols=internal

Can you help?

 



Re: allocation memory in Java

2017-11-18 Thread John Hening
Thanks for your replies :)

@Aleksey, how do I get that?

-   17.12% 0.00%  org.openjdk.All  perf-31615.map
   - 0x7faaa3b2d125
      - 16.59% OptoRuntime::new_instance_C
         - 11.49% InstanceKlass::allocate_instance
              2.33% BlahBlahBlahCollectedHeap::mem_allocate  <--- entry point to GC
              0.35% AllocTracer::send_allocation_outside_tlab_event

I mean, how do I get such a stack trace with profiling information? I see 
perf-31615.map, which points to perf.



allocation memory in Java

2017-11-17 Thread John Hening
I am reading the HotSpot source code, src/share/vm/memory/allocation.hpp, 
and I see:

// All classes in the virtual machine must be subclassed
> // by one of the following allocation classes:
> //
> // For objects allocated in the resource area (see resourceArea.hpp).
> // - ResourceObj
> //
> // For objects allocated in the C-heap (managed by: free & malloc).
> // - CHeapObj


To my eye, a common Java object, like 

new String();

is a CHeapObj. I looked at the implementation of operator new in that class, 
and it looks like this:

void* CHeapObj::operator new(size_t size) {
    return (void*) malloc(size);
}



Now I have no idea why it is said that allocation in Java is very fast; that 
it is faster than in C/C++.
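
For context (my addition, certain facts but schematic code): CHeapObj is the 
base for the VM's own internal C++ objects, not for Java objects. An 
ordinary Java object is normally allocated from a thread-local allocation 
buffer (TLAB) by bumping a pointer, which is what makes Java allocation 
fast. A schematic sketch, not HotSpot's actual code:

// Schematic of TLAB bump-pointer allocation (illustration only):
final class TlabSketch {
    private long top;   // next free address in this thread's buffer
    private long end;   // end of the buffer

    long allocate(int size) {
        long newTop = top + size;
        if (newTop <= end) {           // fast path: just bump the pointer,
            long obj = top;            // no locking, no free-list search
            top = newTop;
            return obj;
        }
        return allocateSlow(size);     // slow path: new TLAB or direct heap alloc
    }

    private long allocateSlow(int size) {
        throw new UnsupportedOperationException("out of scope for this sketch");
    }
}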



Master thesis in mechanical sympathy, Java performance.

2017-11-16 Thread John Hening
Hi,
 
I know that there are a lot of Java experts oriented toward "mechanical 
sympathy" here. I am very interested in that subject; however, I am a 
beginner, though not clueless about it.  
I'm a bit familiar with processor architecture, lock-free and garbage-free 
techniques, and so on. My question is:
Does someone have an idea for a master's thesis in that area? I'm graduating 
from my university and I would like to write a thesis on a subject that 
interests me. 

If someone has an idea, feel free to suggest something, even a general idea. 
If someone considers this post inadequate, feel free to give me a sign as 
well.




Executing thread by JVM.

2017-11-12 Thread John Hening


Hello, 


I would like to ask about threads in Java. As we know, the JVM uses system 
threads (native threads); on Linux, for example, it uses Linux threads. In 
simplified terms, a thread is a stack plus a piece of code to execute. 


Let's consider:


Thread t = new Thread(() -> {`any_code`});
t.start();
t.join();



`any_code` will be compiled to bytecode. So how is the thread executed? We can 
assume that the code won't be JIT-compiled. I cannot understand what the 
content of that thread is. After all, the bytecode must be interpreted 
by the JVM. So does the thread execute JVM code that interprets the bytecode 
of `any_code`?
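
Roughly, yes: each java.lang.Thread maps to one native thread, and that native 
thread's entry point runs JVM code; in interpreted mode that code is a 
fetch-decode-execute loop over the method's bytecode. A toy stack-machine 
sketch of the idea (illustrative only -- HotSpot's template interpreter 
actually generates machine-code stubs per bytecode, and the opcodes here are 
made up):

#include <cstdint>
#include <cstdio>
#include <vector>

enum Op : uint8_t { PUSH, ADD, PRINT, HALT };

// The "content of the thread": a loop that fetches, decodes and executes
// one bytecode at a time.
void interpret(const std::vector<uint8_t>& code) {
    std::vector<int> stack;
    size_t pc = 0;                        // bytecode pointer
    for (;;) {
        switch (code[pc++]) {             // fetch and decode
            case PUSH:  stack.push_back(code[pc++]); break;
            case ADD: { int b = stack.back(); stack.pop_back();
                        stack.back() += b; break; }
            case PRINT: std::printf("%d\n", stack.back()); break;
            case HALT:  return;
        }
    }
}

int main() {
    interpret({PUSH, 2, PUSH, 40, ADD, PRINT, HALT});  // prints 42
}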




Re: sun.misc.Unsafe.getAndAddInt implementation, question.

2017-10-17 Thread John Hening
thanks :)

On Monday, 16 October 2017 at 22:47:07 UTC+2, Gil Tene wrote:
>
> The bytecode doesn't matter. It's not the javac compiler that will be 
> doing the optimizations you should be worried about. It's the JIT compilers 
> in the JVM. The javac-generated bytecode is only executed by the 
> interpreter. The bytecode is eventually transformed to machine code by the 
> JIT compiler, during which it will undergo aggressive optimization.
>
> The "CPU" you should worry about and model in your mind is not x86, SPARC, 
> or ARM. It's the JVM's execution engine and the JIT-generated machine code 
> that does most of the actual execution. And that "CPU" will reorder the 
> code more aggressively than any HW CPU ever would. The JIT's optimizing 
> transformations include arbitrary and massive re-ordering, reshaping, 
> folding-together, and completely eliminating big parts of your apparent 
> bytecode instructions. And the JIT will do all those as long as it can 
> prove that the transformations are allowed.
>
> On Monday, October 16, 2017 at 3:30:13 PM UTC+1, John Hening wrote:
>>
>> Thanks Gil Tene! 
>>
>> You're obviously right. The read is not volatile, so the compiler is allowed 
>> to reorder it. Moreover, since the read is not volatile, the compiler assumes 
>> that no one changes the value read by getInt(), so it can hoist curr out.
>>
>>
>> The last question (sorry for being inquisitive)  is:
>>
>> Let's assume that the compiler generated bytecode equivalent to 
>>
>> public final int getAndAddInt(Object ptr, long offset, int value) {
>>  int curr;
>>  do {
>> curr = this.getInt(ptr, offset); (1)
>>  } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
>>  return curr;
>> }
>>
>>
>>
>> so it wasn't optimized. 
>>
>> Now, it seems to work correctly. But note that the CPU can reorder 
>> operations. We are lucky, because the CPU cannot reorder here: there is a 
>> data dependency: (1) -> (2). So, on every sensible CPU (one that respects 
>> data dependencies) it works, yes?
>>
>> On Monday, 16 October 2017 at 12:13:07 UTC+2, Gil Tene wrote:
>>>
>>> Ok. So the question below (ignoring other optimizations in the JVM that 
>>> are specific to this method) is "If I were doing this myself in some other 
>>> method, would this logic be valid if Unsafe.getIntVolatile() could be be 
>>> replaced with Unsafe.getInt()?"
>>>
>>> The answer IMO is "no".
>>>
>>> The issue here is that unlike e.g. AtomicInteger.compareAndSet(), which 
>>> is explicitly specified to include the behavior of a volatile read on the 
>>> field involved, Unsafe.compareAndSwapInt() does not make any claims about 
>>> exhibiting volatile read semantics. As a result, if you replace 
>>> Unsafe.getIntVolatile() with Unsafe.getInt(), the resulting code:
>>>
>>> public final int getAndAddInt(Object ptr, long offset, int value) {
>>>  int curr;
>>>  do {
>>> curr = this.getInt(ptr, offset); (1)
>>>  } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
>>>  return curr;
>>> }
>>>
>>> Can be validly transformed by the optimizer to:
>>>
>>> public final int getAndAddInt(Object ptr, long offset, int value) {
>>>  int curr = this.getInt(ptr, offset); (1)
>>>  do {  
>>>  } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
>>>  return curr;
>>> }
>>>
>>> Because:
>>>
>>> (a) The optimizer can prove that if the compareAndSwapInt ever actually 
>>> wrote to the field, the method would return and curr wouldn't be read again.
>>> (b) Since the read of curr is not volatile, and the read in 
>>> Unsafe.compareAndSwapInt() is not required to act like a volatile read, all 
>>> the reads of curr can be reordered with the all the reads in the 
>>> compareAndSwapInt() calls, which means that they can be folded together and 
>>> hoisted out of the loop.
>>>
>>> If this valid optimization happened, the resulting code would get stuck 
>>> in an infinite loop if another thread modified the field between the read 
>>> of curr and the compareAndSwapInt call, and that is obviously not the 
>>> intended behavior of getAndAddInt()...
>>>
>>> On Sunday, October 15, 2017 at 2:12:30 AM UTC-7, John Hening wrote:
>>>>
>> Gil Tene, thank you very much. 
>>

Re: sun.misc.Unsafe.getAndAddInt implementation, question.

2017-10-16 Thread John Hening
Thanks Gil Tene! 

You're obviously right. The read is not volatile, so the compiler is allowed to 
reorder it. Moreover, since the read is not volatile, the compiler assumes 
that no one changes the value read by getInt(), so it can hoist curr out.


The last question (sorry for being inquisitive)  is:

Let's assume that the compiler generated bytecode equivalent to 

public final int getAndAddInt(Object ptr, long offset, int value) {
 int curr;
 do {
curr = this.getInt(ptr, offset); (1)
 } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
 return curr;
}



so it wasn't optimized. 

Now, it seems to work correctly. But note that the CPU can reorder 
operations. We are lucky, because the CPU cannot reorder here: there is a 
data dependency: (1) -> (2). So, on every sensible CPU (one that respects 
data dependencies) it works. 

On Monday, 16 October 2017 at 12:13:07 UTC+2, Gil Tene wrote:
>
> Ok. So the question below (ignoring other optimizations in the JVM that 
> are specific to this method) is "If I were doing this myself in some other 
> method, would this logic be valid if Unsafe.getIntVolatile() could be 
> replaced with Unsafe.getInt()?"
>
> The answer IMO is "no".
>
> The issue here is that unlike e.g. AtomicInteger.compareAndSet(), which is 
> explicitly specified to include the behavior of a volatile read on the 
> field involved, Unsafe.compareAndSwapInt() does not make any claims about 
> exhibiting volatile read semantics. As a result, if you replace 
> Unsafe.getIntVolatile() with Unsafe.getInt(), the resulting code:
>
> public final int getAndAddInt(Object ptr, long offset, int value) {
>  int curr;
>  do {
> curr = this.getInt(ptr, offset); (1)
>  } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
>  return curr;
> }
>
> Can be validly transformed by the optimizer to:
>
> public final int getAndAddInt(Object ptr, long offset, int value) {
>  int curr = this.getInt(ptr, offset); (1)
>  do {  
>  } while(!this.compareAndSwapInt(ptr, offset, curr, curr + value)); (2)
>  return curr;
> }
>
> Because:
>
> (a) The optimizer can prove that if the compareAndSwapInt ever actually 
> wrote to the field, the method would return and curr wouldn't be read again.
> (b) Since the read of curr is not volatile, and the read in 
> Unsafe.compareAndSwapInt() is not required to act like a volatile read, all 
> the reads of curr can be reordered with all the reads in the 
> compareAndSwapInt() calls, which means that they can be folded together and 
> hoisted out of the loop.
>
> If this valid optimization happened, the resulting code would get stuck in 
> an infinite loop if another thread modified the field between the read of 
> curr and the compareAndSwapInt call, and that is obviously not the intended 
> behavior of getAndAddInt()...
>
> On Sunday, October 15, 2017 at 2:12:30 AM UTC-7, John Hening wrote:
>>
>> Gil Tene, thank you very much. 
>>
>> Ok, so does it mean that Unsafe.getIntVolatile() could be replaced 
>> with Unsafe.getInt()?
>>
>> On Sunday, 15 October 2017 at 01:34:34 UTC+2, Gil Tene wrote:
>>>
>>> A simple answer would be that the field is treated by the method as a 
>>> volatile, and the code is simply staying consistent with that notion. Is an 
>>> optimization possible here? Possibly. Probably. But does it matter? No. The 
>>> source code involved is not performance critical, and is not worth 
>>> optimizing. The interpreter may be running this logic, but no hot path 
>>> would be executing the actual logic in this code... 
>>>
>>> Why?  Because the java code you see there is NOT what the hot code would 
>>> be doing on (most) JVMs. Specifically, optimizing JITs can and will 
>>> identify and intrinsify the method, replacing its body with code that does 
>>> whatever they want it to do. They don't have to perform any of the actual 
>>> logic in the method, as long as they make sure the method performs 
>>> its intended (contracted) function, and that contracted functionality is 
>>> to perform a getAndAddInt on a field, treating it logically as a volatile.
>>>
>>> For example, on x86 there is support for atomic add via the XADD 
>>> instruction. Using XADD for this method's functionality has multiple 
>>> advantages over doing the as-coded CAS loop. And most optimizing JITs will 
>>> [transparently] use an XADD in place of a CAS in this case and get rid of 
>>> the loop altogether.
>>>
>>> On Saturday, October 14, 2017 at 6:58:17 AM UTC-7, John Hening wrote:
>>>>
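
As a small illustration of the XADD point above: in C++, std::atomic's 
fetch_add typically compiles to a single lock xadd instruction on x86-64 -- no 
CAS loop -- which is the shape an optimizing JIT aims for when it intrinsifies 
getAndAddInt (a sketch for comparison, not actual JIT output):

#include <atomic>
#include <cstdio>

// Like getAndAddInt, fetch_add returns the previous value; on x86-64 it is
// typically a single "lock xadd" rather than a CAS retry loop.
int get_and_add(std::atomic<int>& x, int delta) {
    return x.fetch_add(delta);
}

int main() {
    std::atomic<int> counter{40};
    int old = get_and_add(counter, 2);
    std::printf("old=%d new=%d\n", old, counter.load());  // old=40 new=42
}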

sun.misc.Unsafe.getAndAddInt implementation, question.

2017-10-14 Thread John Hening


Hello


Here is an implementation from sun.misc.Unsafe:



public final int getAndAddInt(Object ptr, long offset, int value) {
    int curr;
    do {
        curr = this.getIntVolatile(ptr, offset);                         // (1)
    } while (!this.compareAndSwapInt(ptr, offset, curr, curr + value));  // (2)
    return curr;
}





Why is Unsafe.getIntVolatile() called here instead of Unsafe.getInt()?


I am basically familiar with memory models, memory barriers, etc., but perhaps 
I am missing something important. 


As I understand it, *getIntVolatile* here means: ensure the order of execution (1) -> (2).


It looks something like:


curr = read(); 
acquire();
CAS operation



Obviously, acquire() depends on the CPU; for example, on x86 it is empty (a 
no-op), while on ARM it is a memory barrier, etc. 


My question/misunderstanding:


To my eye, the order is ensured by the data dependency between the read of *(ptr + 
offset)* and the *CAS* operation on it, so I don't see a reason to worry about 
memory (re)ordering. 


