[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-24 Thread roshannaik
Github user roshannaik commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200924901
  
yes .. that corresponds with my observations with profiling... clojure 
lookups and reflection are among the most common bottlenecks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-24 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200853358
  
@roshannaik 

Yes we need to do a renewed effort in profiling once a lot of the code has 
moved to java for JStorm.  I know that the throughput is about 1/2 of what it 
is in 0.10, and 1.0.  I didn't do any profiling, but I have chocked it up to 
clojure reflection showing up again, because anytime clojure calls into java it 
is ridiculously easy to have clojure use reflection to get the job done.

But I have not profiled so I don't know for sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread roshannaik
Github user roshannaik commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200609948
  
My gut feeling based on profiling storm recently is that there are much 
bigger bottlenecks elsewhere that will eclipse any potential improvement this 
can deliver.  I would recommend verifying with a simple and quantify the 
performance improvement in throughput / latency this delivers.  

  Having said that, assuming it is a safe optimization, even if it does not 
improve perf in the current code base, as other bottlenecks get addressed.. 
eventually this should help.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread hustfxj
Github user hustfxj closed the pull request at:

https://github.com/apache/storm/pull/1250


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200490900
  
@hustfxj 

If you want to close this pull request I will leave it up to you. In some 
situations the uniqueness of a number is important, and having Random emit 
truly unique values in a thread safe way is important.  This is specifically 
when creating the tuple IDs that will be used with acking.

In other situations we are using Random to do sub-sampling where the 
uniqueness of the numbers is not critical.  The correctness of the system is 
not compromised if the numbers are repeated or poorly distributed.  The only 
concern for those situations would be if we violate a contract, like we 
generate a random number that is not within the range specified by the API, or 
we deadlock, etc.  Looking at the code here we will never violate those 
constraints.  The worst thing that happens is that we may repeat some "random" 
numbers in different threads because the compiler cached the seed in a local 
register and didn't write back to memory.  For me the extra performance can 
outweigh the less then ideal situation.

Looking at ThreadLocalRandom, they are using unsafe operations and I don't 
know enough about the internal implementation to feel comfortable saying if it 
will or will not violate any of the constraints, but I don't think it will.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread hustfxj
Github user hustfxj commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200444929
  
@revans2  It is used in a non-thread safe way, especialy spout/bolt thread. 
So we think it may not make sense. So I will close the PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread hustfxj
Github user hustfxj commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200402469
  
@revans2  ThreadLocalRandom is 20% faster than XORShiftRandom. But 
ThreadLocalRandom is static.Yes, we can't use XORShiftRandom in 
executor.clj due to thread safety now. But if we assure every spout/bolt thread 
has itself XORShiftRandom object. Thus we  can. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57174134
  
--- Diff: storm-core/src/clj/org/apache/storm/config.clj ---
@@ -53,7 +54,7 @@
   [freq]
   (let [freq (int freq)
 start (int 0)
-r (java.util.Random.)
+r (XORShiftRandom.)
--- End diff --

Here too the code can be called from multiple threads, but the worst that 
happens is we double up on sampling some items for metrics.  Ideally this would 
have a comment explaining this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57173711
  
--- Diff: storm-core/src/clj/org/apache/storm/daemon/executor.clj ---
@@ -75,7 +76,7 @@
   "Returns a function that returns a vector of which task indices to send 
tuple to, or just a single task index."
   [^WorkerTopologyContext context component-id stream-id ^Fields 
out-fields thrift-grouping ^List target-tasks topo-conf]
   (let [num-tasks (count target-tasks)
-random (Random.)
+random (XORShiftRandom.)
--- End diff --

Again here the grouping could be called from multiple threads, but in this 
case doubling up on some numbers should not be a big deal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57167482
  
--- Diff: 
storm-core/src/jvm/org/apache/storm/transactional/TransactionalSpoutCoordinator.java
 ---
@@ -71,7 +72,7 @@ public ITransactionalSpout getSpout() {
 
 @Override
 public void open(Map conf, TopologyContext context, 
SpoutOutputCollector collector) {
-_rand = new Random(Utils.secureRandomLong());
+_rand = new XORShiftRandom(Utils.secureRandomLong());
--- End diff --

This should be fine it is only ever called from a single thread.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57167305
  
--- Diff: storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java 
---
@@ -35,7 +36,7 @@
 
 @Override
 public void prepare(WorkerTopologyContext context, GlobalStreamId 
stream, List targetTasks) {
-random = new Random();
+random = new XORShiftRandom();
--- End diff --

Same as above.  Having repeated numbers is not a big deal here, but we do 
want to have a comment declaring that this is safe and what the ramifications 
are.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57167152
  
--- Diff: 
storm-core/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
@@ -36,7 +37,7 @@
 
 @Override
 public void prepare(WorkerTopologyContext context, GlobalStreamId 
stream, List targetTasks) {
-random = new Random();
+random = new XORShiftRandom();
--- End diff --

Mostly if we are going to use something that is not thread safe here we 
should explain why it is OK, and what the drawbacks are, because this is 
supposed to be thread safe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57167003
  
--- Diff: 
storm-core/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
@@ -36,7 +37,7 @@
 
 @Override
 public void prepare(WorkerTopologyContext context, GlobalStreamId 
stream, List targetTasks) {
-random = new Random();
+random = new XORShiftRandom();
--- End diff --

Thread safety here is nice, but not a requirement.  If we double up on a 
few routs it is probably not that big of a deal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57166843
  
--- Diff: storm-core/src/clj/org/apache/storm/daemon/executor.clj ---
@@ -698,7 +699,7 @@
 executor-stats (:stats executor-data)
 {:keys [storm-conf component-id worker-context transfer-fn 
report-error sampler
 open-or-prepare-was-called?]} executor-data
-rand (Random. (Utils/secureRandomLong))
+rand (XORShiftRandom. (Utils/secureRandomLong))
--- End diff --

Same reason as with the spout, but differently.  There is a much higher 
likelihood that this will be used in a non-thread safe way.  But it is less 
likely that it will result in a tuple being counted improperly, unless multiple 
threads are emitting tuples for the same tuple tree.  It is unlikely, but still 
not necessarily safe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/1250#discussion_r57166147
  
--- Diff: storm-core/src/clj/org/apache/storm/daemon/executor.clj ---
@@ -508,7 +509,7 @@
 ^Integer max-spout-pending (if max-spout-pending (int 
max-spout-pending))
 last-active (atom false)
 spouts (ArrayList. (map :object (vals task-datas)))
-rand (Random. (Utils/secureRandomLong))
+rand (XORShiftRandom. (Utils/secureRandomLong))
--- End diff --

We cannot use a thread local random here, because it is placed into 
functions that are not called from a single thread.  What is more the 
implementation of XORShiftRandom is that if we do lose the race, it is likely 
that we will repeat a number which could result in tuples not being tracked 
properly. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/1250#issuecomment-200363377
  
How does the performance compare to 


http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html

which is also not thread safe, but more standard?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1650] improve performance by XORShiftRa...

2016-03-23 Thread hustfxj
GitHub user hustfxj opened a pull request:

https://github.com/apache/storm/pull/1250

[STORM-1650] improve performance by XORShiftRandom

XORShiftRandom have much better performance than Random, So I use 
XORShiftRandom replace Random at some places

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hustfxj/storm rand

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1250


commit 56f27e5d58d7abd1bdd9aff95dfb862540b166ef
Author: xiaojian.fxj 
Date:   2016-03-16T06:02:10Z

Merge branch 'master' of github.com:apache/storm

commit d63167cc4a13289ef46b5fa1650621c57b191d3b
Author: xiaojian.fxj 
Date:   2016-03-17T01:29:54Z

Merge branch 'master' of github.com:apache/storm

commit 2e2ffb29df039e9339e7b2b44352c744efb5caf0
Author: xiaojian.fxj 
Date:   2016-03-18T13:16:44Z

Merge branch 'master' of github.com:apache/storm

commit 28867372a4fc96d744ccd00a27d9e38dab2bd49e
Author: xiaojian.fxj 
Date:   2016-03-23T03:10:08Z

Merge branch 'master' of github.com:apache/storm

commit b3c4d810be30a98b6c874abe535dd82bc2d4e13c
Author: xiaojian.fxj 
Date:   2016-03-23T12:22:34Z

improve performance by XORShiftRandom




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---