[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111240#comment-14111240
 ] 

ASF GitHub Bot commented on HELIX-470:
--

GitHub user brandtg opened a pull request:

https://github.com/apache/helix/pull/2

[HELIX-470] Netty-based IPC layer



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brandtg/helix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/2.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2


commit f2475fa9a6123052fea2588cdd4e439ddc7af020
Author: Greg Brandt 
Date:   2014-08-26T20:14:36Z

[HELIX-470] Netty-based IPC layer




> Add performant IPC (Helix actors)
> -
>
> Key: HELIX-470
> URL: https://issues.apache.org/jira/browse/HELIX-470
> Project: Apache Helix
>  Issue Type: Improvement
>  Components: helix-core
>Affects Versions: 0.7.1, 0.6.4
>Reporter: Greg Brandt
>
> Helix is missing a high-performance way to exchange messages among resource 
> partitions, with a user-friendly API.
> Currently, the Helix messaging service relies on creating many nodes in 
> ZooKeeper, which can lead to ZooKeeper outages if messages are sent too 
> frequently.
> In order to avoid this, high-performance NIO-based {{HelixActors}} should be 
> implemented (in rough accordance with the actor model). {{HelixActors}} 
> exchange messages asynchronously without waiting for a response, and are 
> partition/state-addressable.
> The API would look something like this:
> {code}
> public interface HelixActor<T> {
>   void send(Partition partition, String state, T message);
>   void register(String resource, HelixActorCallback<T> callback);
> }
> public interface HelixActorCallback<T> {
>   void onMessage(Partition partition, State state, T message);
> }
> {code}
> {{#send}} should likely support wildcards for partition number and state, or 
> its method signature might need to be massaged a little bit for more 
> flexibility. But that's the basic idea.
> Nothing is inferred about the format of the messages; the only metadata we 
> need to be able to interpret is (1) partition name and (2) state. The user 
> provides a codec to encode / decode messages, which makes 
> {{HelixActor#send}} and {{HelixActorCallback#onMessage}} nicer to implement.
> {code}
> public interface HelixActorMessageCodec<T> {
>   byte[] encode(T message);
>   T decode(byte[] message);
> }
> {code}
> Actors should support somewhere around 100k to 1M messages per second. The 
> Netty framework is a potential implementation candidate, but should be 
> thoroughly evaluated w.r.t. performance.
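A minimal in-process sketch of partition/state addressing with the wildcard idea for HelixActor#send. It assumes that plain Strings stand in for Helix's Partition and State types, that "*" matches any partition or state, and that send takes an explicit resource name (the quoted interface does not). LocalActorRouter, ActorCallback and setState are hypothetical names. There is no networking here, and this is not the API of the helix-ipc PR.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Callback shape from the issue; plain Strings stand in for Helix's
// Partition and State types (an assumption to keep the sketch self-contained).
interface ActorCallback<T> {
  void onMessage(String partition, String state, T message);
}

// Hypothetical in-process router: no networking, it only shows how "*"
// wildcards for partition and state could select the recipients of a send.
final class LocalActorRouter<T> {
  static final String WILDCARD = "*";

  // resource -> (partition -> state currently held by this process)
  private final Map<String, Map<String, String>> states = new ConcurrentHashMap<>();
  // resource -> callbacks registered for it
  private final Map<String, List<ActorCallback<T>>> callbacks = new ConcurrentHashMap<>();

  void setState(String resource, String partition, String state) {
    states.computeIfAbsent(resource, r -> new ConcurrentHashMap<>()).put(partition, state);
  }

  void register(String resource, ActorCallback<T> callback) {
    callbacks.computeIfAbsent(resource, r -> new CopyOnWriteArrayList<>()).add(callback);
  }

  // Delivers the message to every callback of the resource, once per local
  // partition whose name and state both match; returns the number of deliveries.
  int send(String resource, String partition, String state, T message) {
    Map<String, String> held = states.getOrDefault(resource, Collections.emptyMap());
    List<ActorCallback<T>> targets = callbacks.getOrDefault(resource, Collections.emptyList());
    int delivered = 0;
    for (Map.Entry<String, String> e : held.entrySet()) {
      boolean partitionMatches = WILDCARD.equals(partition) || partition.equals(e.getKey());
      boolean stateMatches = WILDCARD.equals(state) || state.equals(e.getValue());
      if (partitionMatches && stateMatches) {
        for (ActorCallback<T> cb : targets) {
          cb.onMessage(e.getKey(), e.getValue(), message);
          delivered++;
        }
      }
    }
    return delivered;
  }
}
```

For example, with MyDB_0 and MyDB_2 held as MASTER and MyDB_1 as SLAVE, send("MyDB", "*", "MASTER", msg) reaches only MyDB_0 and MyDB_2. A real implementation would presumably resolve matching replicas from the cluster's current assignment and route to remote instances over the transport instead of a local map.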

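Because only the partition name and the state need to be interpreted, a frame could carry those two fields as a small header in front of the opaque, codec-produced payload. Below is a sketch of one such length-prefixed layout using java.nio.ByteBuffer. The layout and the HelixActorFrame name are assumptions for illustration, not the wire format of the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout (not the format used by the helix-ipc PR):
// [partition length][partition UTF-8][state length][state UTF-8][payload length][payload]
// Each length is a 4-byte int in ByteBuffer's default big-endian order.
final class HelixActorFrame {
  final String partition;
  final String state;
  final byte[] payload; // opaque bytes produced by the user's codec

  HelixActorFrame(String partition, String state, byte[] payload) {
    this.partition = partition;
    this.state = state;
    this.payload = payload;
  }

  byte[] toBytes() {
    byte[] p = partition.getBytes(StandardCharsets.UTF_8);
    byte[] s = state.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(12 + p.length + s.length + payload.length);
    buf.putInt(p.length).put(p);
    buf.putInt(s.length).put(s);
    buf.putInt(payload.length).put(payload);
    return buf.array();
  }

  static HelixActorFrame fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    String partition = readString(buf);
    String state = readString(buf);
    byte[] payload = new byte[buf.getInt()];
    buf.get(payload);
    return new HelixActorFrame(partition, state, payload);
  }

  // Reads a length-prefixed UTF-8 string and advances the buffer past it.
  private static String readString(ByteBuffer buf) {
    byte[] b = new byte[buf.getInt()];
    buf.get(b);
    return new String(b, StandardCharsets.UTF_8);
  }
}
```

The receiver only has to decode the two header strings to pick a callback; the payload is handed to the codec untouched.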

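A user-supplied codec could be as simple as the UTF-8 string codec below. The sketch assumes the codec interface is generic in T (HelixActorMessageCodec<T>), which the archived text implies by using T but does not declare. StringMessageCodec is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Codec interface from the issue, with the assumed <T> type parameter.
interface HelixActorMessageCodec<T> {
  byte[] encode(T message);
  T decode(byte[] message);
}

// Hypothetical example codec: treats every message as UTF-8 text.
class StringMessageCodec implements HelixActorMessageCodec<String> {
  @Override
  public byte[] encode(String message) {
    return message.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public String decode(byte[] message) {
    return new String(message, StandardCharsets.UTF_8);
  }
}
```

Richer message types would plug in the same way, for example a codec wrapping an existing serialization library, without the transport needing to know anything about T.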

--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111314#comment-14111314
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16741557
  
--- Diff: helix-ipc/LICENSE ---
@@ -0,0 +1,273 @@
+
--- End diff --

No need for an extra LICENSE file here, since the main LICENSE file will be 
used when packaging.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113954#comment-14113954
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/2#issuecomment-53754601
  
If no one has any objections, I plan to merge this for our 0.7.1 beta 
release so that we can increase collaboration on it and move things forward 
iteratively.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113974#comment-14113974
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16853293
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

Since we release Helix as per-module binary bundles, personally I think 
having specific NOTICE files makes more sense -- that way, people can pick and 
choose which modules to use without needing to worry about dependencies present 
in other modules.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113983#comment-14113983
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16853492
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

So is this going to be a separate source zip? If it falls under the umbrella 
of the main source release, the NOTICE needs to be in the top-level directory [1].

[1] http://www.apache.org/legal/src-headers.html




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113972#comment-14113972
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16853182
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

Forgot to comment on this one. Please move the NOTICE file content to the NOTICE 
in the top-level Helix directory. Then you can remove this one.

The NOTICE files should be bundled into one when making releases, and just 
copying this into the main NOTICE will make packaging easier.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113997#comment-14113997
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16854005
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

@kanakb, sorry, it looks like Helix already puts a NOTICE file in each module 
that is released separately.
I guess if we package this separately, then the NOTICE for this module should 
stay, but I think the top-level NOTICE file should have a copy of its content.

Isn't the content of each submodule NOTICE file already copied into the 
top-level NOTICE file?




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113999#comment-14113999
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16854152
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

@hsaputra Yes, the top-level NOTICE is currently a superset of all 
submodule NOTICE files, though this is done manually.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113998#comment-14113998
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16854042
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

Ah, I think we have things somewhat backwards. Right now all of our 
submodules have LICENSE and NOTICE files, but I guess these only need to be at 
the top level? Helix has a single source release and per-module binary releases.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114012#comment-14114012
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16854782
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

In any case, we need to figure out what is right in general, and apply it 
to all our submodules. I have created 
https://issues.apache.org/jira/browse/HELIX-509 to track this work.




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114027#comment-14114027
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16855212
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

Sounds good. For this PR we could keep the NOTICE file, to be in line with 
the others.
@brandtg, could you add a copy of this module's NOTICE file to the top-level 
one, as the other modules do?




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114032#comment-14114032
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/2#discussion_r16855377
  
--- Diff: helix-ipc/NOTICE ---
@@ -0,0 +1,33 @@
+Apache Helix
--- End diff --

@hsaputra Greg is on vacation; I can take care of that during the merge




[jira] [Commented] (HELIX-470) Add performant IPC (Helix actors)

2014-08-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114067#comment-14114067
 ] 

ASF GitHub Bot commented on HELIX-470:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/2




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155861#comment-14155861
 ] 

ASF GitHub Bot commented on HELIX-524:
--

GitHub user dayzzz opened a pull request:

https://github.com/apache/helix/pull/6

[HELIX-524] Add a getProgress to the Task interface

[HELIX-524] Add a getProgress() method to the Task interface. This is very 
helpful for long-running tasks, since it tells us the status of a task and 
whether it is blocked. The return value is a double ranging from 0 to 1.0, where 
1.0 indicates that the task is finished.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dayzzz/helix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/6.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6


commit 5d1b27f81abc4c67301d864f122e82f5a0ce49c3
Author: Hongbo Zeng 
Date:   2014-10-02T00:19:49Z

[HELIX-524] Add a getProgress to the Task interface




> add getProgress() to Task interface
> ---
>
> Key: HELIX-524
> URL: https://issues.apache.org/jira/browse/HELIX-524
> Project: Apache Helix
>  Issue Type: Improvement
>  Components: helix-core
>Affects Versions: 0.6.4
>Reporter: Hongbo Zeng
> Fix For: 0.6.5
>
>
> Add a getProgress() method to the Task interface. This is very helpful for 
> long-running tasks, since it tells us the status of a task and whether it is 
> blocked. The return value is a double ranging from 0 to 1.0, where 1.0 
> indicates that the task is finished.
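
To make the proposed contract concrete, here is a minimal sketch of a task that reports its progress as a fraction in [0, 1]. `ProgressTask` and `BatchTask` are hypothetical names, and the interface is a simplified stand-in, not Helix's actual `Task` interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the proposed Task contract.
interface ProgressTask {
  void run();

  void cancel();

  /** Fraction of work completed, in [0.0, 1.0]; 1.0 means the task is finished. */
  double getProgress();
}

// Hypothetical task that processes a fixed number of items and tracks how many are done.
class BatchTask implements ProgressTask {
  private final int totalItems;
  private final AtomicInteger processed = new AtomicInteger();
  private volatile boolean cancelled;

  BatchTask(int totalItems) {
    // A positive total avoids a 0/0 progress computation.
    if (totalItems <= 0) {
      throw new IllegalArgumentException("totalItems must be positive");
    }
    this.totalItems = totalItems;
  }

  @Override
  public void run() {
    for (int i = 0; i < totalItems && !cancelled; i++) {
      processItem(i);
      processed.incrementAndGet();
    }
  }

  private void processItem(int index) {
    // Placeholder for the real per-item work.
  }

  @Override
  public void cancel() {
    cancelled = true;
  }

  @Override
  public double getProgress() {
    // Safe to call from a monitoring thread while run() is in progress.
    return (double) processed.get() / totalItems;
  }
}
```

A task with no meaningful notion of progress would still have to implement `getProgress()` under this contract, which is the concern raised in the comments that follow.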



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156118#comment-14156118
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-57588016
  
I like this in general, but I have a couple of questions/comments:

* This is an interface change. Is there a way we can provide this 
functionality without requiring that all existing task implementations be 
rewritten?
* What is the persistence story (if any)? Who calls getProgress?




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156204#comment-14156204
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user brandtg commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-57596146
  
For the persistence story, what about adding a new health report: 
`/STAND_ALONE/INSTANCES/{instanceName}/HEALTHREPORT/taskStatus`?

Another thing to consider is that adding `#getProgress` to the task 
interface requires users to be able to report task status intelligently. I 
suspect that in practice some users might be annoyed at this extra 
responsibility and provide dummy numbers (it sounds silly, but I've seen it 
before).

Maybe a better approach would be to monitor things we know a priori about a 
task (e.g. its lifetime) and provide tools to inspect tasks that seem stuck 
(e.g. a task/partition-addressable stack trace)?




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156218#comment-14156218
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user brandtg commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-57597306
  
Wait, that stuff's gone now? OK, never mind.

Is persisting the state of a running task strictly necessary? (For example, 
would JMX suffice?)




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157419#comment-14157419
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user dayzzz commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-57728028
  
* About the interface change: IMHO, this is a natural part of the task 
interface. If existing customers want to update, adding the support is 
encouraged. We could have a separate interface for monitoring the progress, but 
that doesn't seem like a nice design.
* The task framework should call the progress interface and expose the 
status somehow, as discussed below.
* Persistence story: ZK is a good place if we only want to record the final 
result. If we want to expose the progress as a task runs, putting these 
periodic status updates in ZK is not an option due to the heavy traffic; 
generally, ZK is not a good place for reporting and monitoring service status. I 
also discussed this with Jason, and we considered inGraph (which is not an 
option for open source), Kafka, or Riemann. (Greg, JMX sounds like a good idea.) 
Until we decide where to put these status stats, I agree that the progress 
interface is not of much value. As a first step, it would be good enough if we 
could monitor the progress. What do you guys think?
* As for bogus progress numbers, it's the customers themselves who need to 
track the progress; if they want to see a bogus value, I'm fine with that :). 
The controller should not act on these customer-reported (possibly bogus) values.




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157483#comment-14157483
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-57730714
  
Why not have an abstract ProgressReportingTask that implements Task and 
includes all the JMX persistence, and then have tasks extend from that and 
implement getProgress()? That avoids the interface change, but also allows 
"smarter" tasks to let their progress be known.
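
A minimal sketch of the abstract-class approach, under stated assumptions: `Task` below is a simplified stand-in for the real interface (a `String` status replaces the real result type), and `ProgressRegistry` is a hypothetical in-process registry used in place of JMX so the example stays self-contained. With JMX, registration would go through an MBean server instead. The point is that `run()` handles registration itself, so the task runner never needs to know about progress.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the Task interface.
interface Task {
  String run();

  void cancel();
}

// Hypothetical registry standing in for JMX: monitoring code pulls progress
// by task id, much as a JMX client would read an MBean attribute.
final class ProgressRegistry {
  private static final Map<String, ProgressReportingTask> TASKS = new ConcurrentHashMap<>();

  private ProgressRegistry() {}

  static void register(String id, ProgressReportingTask task) { TASKS.put(id, task); }

  static void unregister(String id) { TASKS.remove(id); }

  /** Current progress of the task, or -1.0 if no task with this id is running. */
  static double progressOf(String id) {
    ProgressReportingTask task = TASKS.get(id);
    return task == null ? -1.0 : task.getProgress();
  }
}

// Only tasks that opt in extend this class; plain Task implementations are unaffected.
abstract class ProgressReportingTask implements Task {
  private final String taskId;

  protected ProgressReportingTask(String taskId) { this.taskId = taskId; }

  @Override
  public final String run() {
    // Publishing is handled here, so the runner just calls run() as for any Task.
    ProgressRegistry.register(taskId, this);
    try {
      return doRun();
    } finally {
      ProgressRegistry.unregister(taskId);
    }
  }

  /** The task's actual work. */
  protected abstract String doRun();

  /** Fraction of work completed, in [0.0, 1.0]. */
  public abstract double getProgress();
}

// Example subclass: records what a monitor would observe halfway through.
class TwoStepTask extends ProgressReportingTask {
  volatile double progress = 0.0;
  double observedMidway = -2.0;

  TwoStepTask() { super("task-1"); }

  @Override
  protected String doRun() {
    progress = 0.5;
    observedMidway = ProgressRegistry.progressOf("task-1");
    progress = 1.0;
    return "COMPLETED";
  }

  @Override
  public double getProgress() { return progress; }

  @Override
  public void cancel() {}
}
```

The trade-off is that progress is only visible for tasks that choose to extend the base class, which is exactly what keeps the existing `Task` interface unchanged.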




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164230#comment-14164230
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user dayzzz commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58433144
  
Sorry for the delay. Are there any customers outside the Espresso team? If 
we are the only customer right now, it would not be too costly to update our 
implementation to match the interface. Putting getProgress into the interface 
has the advantage that it naturally goes into TaskRunner, so we don't need 
something like an "instanceof ProgressReportingTask" check before calling 
getProgress, or a TaskRunner subclass specific to ProgressReportingTask that 
calls getProgress.




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164469#comment-14164469
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58448716
  
Since this is an open source project, the real answer to this question is 
"I don't know." We've made two public releases with this change, and so if 
there is anyone who has integrated or is planning to integrate with the task 
framework, they would need to make a change.

If `ProgressReportingTask` handles the JMX reporting itself (it's an 
abstract class, not an interface), then `TaskRunner` wouldn't need to care 
about progress at all.

And back to the "bogus" return value discussion, I think customers who 
don't have a good sense of progress shouldn't be required to change their 
integration just to implement `getProgress()`.




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164660#comment-14164660
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58459371
  
Email from Kishore:

```
Apologies for throwing another idea into the mix. Why not have additional 
methods in the implementation that the agent can invoke on demand based on an 
annotation? This extends the way we invoke methods on the state model.

For example we have

@Transition(TO=MASTER, FROM=SLAVE)
void fromSlaveTOMaster(Message m, NotificationContext ctx){

}

Can we have something similar as 

@Method(name="getProgress")
Response getProgress(Message m, NotificationContext ctx){

}

Helix would then provide a generic way to invoke a method on a partition. I 
think this is more powerful, does not disturb any interfaces, and can be 
extended to do custom stuff.
Also, these methods will not be invoked via ZK; instead, the controller can 
directly invoke the method on the participant.

Feedback?
```
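
A sketch of the annotation-driven invocation described above, under simplifying assumptions: the annotation is named `Invocable` (hypothetical) rather than `Method` to avoid clashing with `java.lang.reflect.Method`, invoked methods take no arguments instead of `(Message, NotificationContext)`, and the transport that would carry the request from the controller to the participant is left out. `MethodDispatcher` and `ExampleTask` are likewise illustrative names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation marking a method that can be invoked by name on demand.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Invocable {
  String name();
}

// Participant-side dispatcher: finds the annotated method by name and calls it.
final class MethodDispatcher {
  private MethodDispatcher() {}

  /**
   * Invokes the public no-argument method on target annotated with
   * {@code @Invocable(name = name)}. Throws NoSuchMethodException when the
   * target does not support the operation, so existing classes need no changes.
   */
  static Object invoke(Object target, String name) throws ReflectiveOperationException {
    for (Method m : target.getClass().getMethods()) {
      Invocable annotation = m.getAnnotation(Invocable.class);
      if (annotation != null && annotation.name().equals(name)
          && m.getParameterTypes().length == 0) {
        return m.invoke(target);
      }
    }
    throw new NoSuchMethodException("no @Invocable method named " + name);
  }
}

// Example implementation: exposes progress without implementing any new interface.
class ExampleTask {
  private volatile double progress = 0.25;

  @Invocable(name = "getProgress")
  public double getProgress() {
    return progress;
  }
}
```

Exposing something else later (e.g. a high water mark) would only require another annotated method, which is the extensibility argument made in the comments.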




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164661#comment-14164661
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58459614
  
I think having a `@ProgressReporter` could be an elegant way to allow users 
to plug in additional functionality. Progress could be just the first thing a 
task could expose, and in the future if we think of others, we just need to add 
a new annotation.




[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165886#comment-14165886
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user zzhang5 commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58587902
  
If we are not using ZK to invoke these methods, are we opening some kind of 
end-point e.g. via Netty or JMX on each participant?






[jira] [Commented] (HELIX-524) add getProgress() to Task interface

2014-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167167#comment-14167167
 ] 

ASF GitHub Bot commented on HELIX-524:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/6#issuecomment-58689390
  
Kishore from JIRA comments:

```
Yes, that's the idea. With helix-ipc this will be possible, right? This can be
extended to write rebalancers that talk to nodes to get the high water mark
to decide the new master. What do you think?
```

Jason from JIRA comments:

```
Yes. Helix-IPC will do the job. We can also extend the idea to get the high
water mark, etc. The Helix task framework exposes the Task interface, so users
are not implementing StateModel directly. How can we make this available to
Task as well?
```




[jira] [Commented] (HELIX-525) Drop a partition from resource ideal-state shall bring partition to initial state and then DROPPED state

2014-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177505#comment-14177505
 ] 

ASF GitHub Bot commented on HELIX-525:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/7

[HELIX-525] Add integration tests to verify that dropping a partition from 
resource ...

Add integration tests to verify that dropping a partition from resource 
ideal-state should bring partition to initial state and then DROPPED state (for 
AUTO, SEMI_AUTO, and CUSTOM modes).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/7.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7


commit 51b757b71693ec2f774035eac441dd51956e5862
Author: Lei Xia 
Date:   2014-10-20T21:09:16Z

Add integration tests to verify that dropping a partition from resource 
ideal-state should bring partition to initial state and then DROPPED state (for 
AUTO, SEMI_AUTO, and CUSTOM modes).




> Drop a partition from resource ideal-state shall bring partition to initial 
> state and then DROPPED state
> 
>
> Key: HELIX-525
> URL: https://issues.apache.org/jira/browse/HELIX-525
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Zhen Zhang
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If we manually remove a partition from the ideal-state, Helix should bring the 
> partition to its initial state (e.g. OFFLINE) on all hosts and then to the 
> DROPPED state.
> - Add an integration test to verify this (for AUTO, SEMI_AUTO, and CUSTOM 
> modes)
> - Fix it if it does not behave in the expected way



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-525) Drop a partition from resource ideal-state shall bring partition to initial state and then DROPPED state

2014-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178000#comment-14178000
 ] 

ASF GitHub Bot commented on HELIX-525:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/7#discussion_r19130517
  
--- Diff: helix-core/src/test/java/org/apache/helix/integration/TestDrop.java ---
@@ -481,11 +507,27 @@ public void testDropSinglePartitionSemiAuto() throws Exception {
 4, // partitions per resource
 n, // number of nodes
 2, // replicas
-"MasterSlave", true); // do rebalance
+"MasterSlave", mode, (IdealState.RebalanceMode.FULL_AUTO.equals(mode) || IdealState.RebalanceMode.SEMI_AUTO
+.equals(mode))); // do rebalance only when it is in AUTO or SEMI-AUTO mode
--- End diff --

Why is the default rebalance behavior insufficient for CUSTOMIZED mode?




[jira] [Commented] (HELIX-525) Drop a partition from resource ideal-state shall bring partition to initial state and then DROPPED state

2014-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179299#comment-14179299
 ] 

ASF GitHub Bot commented on HELIX-525:
--

Github user lei-xia commented on the pull request:

https://github.com/apache/helix/pull/7#issuecomment-60014784
  
It is actually not necessary; thanks for pointing that out. I updated it, and I 
also use a TestNG data provider to supply the RebalanceMode, which avoids writing 
one test method per mode. If you guys are happy with it, I will apply the same 
strategy to many of our existing tests in follow-up check-ins to reduce 
redundancy in the test code.




[jira] [Commented] (HELIX-525) Drop a partition from resource ideal-state shall bring partition to initial state and then DROPPED state

2014-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179519#comment-14179519
 ] 

ASF GitHub Bot commented on HELIX-525:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/7#issuecomment-60031395
  
LGTM




[jira] [Commented] (HELIX-525) Drop a partition from resource ideal-state shall bring partition to initial state and then DROPPED state

2014-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179526#comment-14179526
 ] 

ASF GitHub Bot commented on HELIX-525:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/7




[jira] [Commented] (HELIX-537) org.apache.helix.task.TaskStateModel should have a shutdown method.

2014-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206903#comment-14206903
 ] 

ASF GitHub Bot commented on HELIX-537:
--

GitHub user atcurtis opened a pull request:

https://github.com/apache/helix/pull/8

[HELIX-537] Shutdown executors

https://issues.apache.org/jira/browse/HELIX-537


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atcurtis/helix HELIX-537

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/8.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8


commit d6fbcf17e01439a290a0cc273933660274841663
Author: Antony T Curtis 
Date:   2014-11-11T19:17:43Z

[HELIX-537] Shutdown executors




> org.apache.helix.task.TaskStateModel should have a shutdown method.
> ---
>
> Key: HELIX-537
> URL: https://issues.apache.org/jira/browse/HELIX-537
> Project: Apache Helix
>  Issue Type: Bug
>  Components: helix-core
>Affects Versions: 0.6.3
>Reporter: Antony T Curtis
>Priority: Blocker
>
> There should be a shutdown method to terminate the Timer and Executor that 
> the org.apache.helix.task.TaskStateModel class creates, i.e.:
> {noformat}
> public boolean shutdown(long timeout, TimeUnit unit)
>   throws InterruptedException
> {
>   reset();
>   _taskExecutor.shutdown();
>   _timer.cancel();
>   return _taskExecutor.awaitTermination(timeout, unit);
> }
> {noformat}
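
A self-contained sketch of the proposed shutdown pattern, using a hypothetical `ShutdownableModel` class in place of `TaskStateModel` (its `reset()` is reduced to an empty hook, and the pool size and daemon timer are illustrative choices). The method must return the result of `awaitTermination`, which reports whether the executor finished within the timeout.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a state model that owns an executor and a timer.
class ShutdownableModel {
  private final ExecutorService _taskExecutor = Executors.newFixedThreadPool(2);
  private final Timer _timer = new Timer("task-timer", true);

  void submit(Runnable work) {
    _taskExecutor.submit(work);
  }

  void schedule(TimerTask task, long delayMs) {
    _timer.schedule(task, delayMs);
  }

  void reset() {
    // In the real class this would cancel running tasks; empty in this sketch.
  }

  /**
   * Stops accepting new work, discards scheduled timer tasks, and waits for
   * already-submitted executor tasks. Returns true if the executor terminated
   * before the timeout elapsed.
   */
  public boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
    reset();
    _taskExecutor.shutdown(); // previously submitted tasks still run
    _timer.cancel();          // pending timer tasks are discarded
    return _taskExecutor.awaitTermination(timeout, unit);
  }
}
```

After `shutdown` returns, further calls to `submit` are rejected with a `RejectedExecutionException`, so callers should treat the model as unusable.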



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-537) org.apache.helix.task.TaskStateModel should have a shutdown method.

2014-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207673#comment-14207673
 ] 

ASF GitHub Bot commented on HELIX-537:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/8#issuecomment-62669496
  
```
[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /Users/kanak/Developer/incubator-helix/helix-core/src/main/java/org/apache/helix/task/TaskStateModel.java:[70,2] missing return statement
[INFO] 1 error
[INFO] -
[INFO] 
[INFO] BUILD FAILURE
```

`TaskStateModel#shutdown` has return type `boolean`, but returns nothing.




[jira] [Commented] (HELIX-537) org.apache.helix.task.TaskStateModel should have a shutdown method.

2014-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207674#comment-14207674
 ] 

ASF GitHub Bot commented on HELIX-537:
--

Github user atcurtis commented on the pull request:

https://github.com/apache/helix/pull/8#issuecomment-62669668
  
Oops,

I pushed the wrong commit to GitHub. I'll force-push the correct one to my
repo.

On Nov 11, 2014, at 8:42 PM, Kanak Biscuitwala wrote:

> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] /Users/kanak/Developer/incubator-helix/helix-core/src/main/java/org/apache/helix/task/TaskStateModel.java:[70,2] missing return statement
> [INFO] 1 error
> [INFO] -
> [INFO]
> [INFO] BUILD FAILURE
>
> TaskStateModel#shutdown has return type boolean, but returns nothing.




[jira] [Commented] (HELIX-537) org.apache.helix.task.TaskStateModel should have a shutdown method.

2014-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207681#comment-14207681
 ] 

ASF GitHub Bot commented on HELIX-537:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/8#issuecomment-62670141
  
LGTM, tests pass, will merge.




[jira] [Commented] (HELIX-537) org.apache.helix.task.TaskStateModel should have a shutdown method.

2014-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207745#comment-14207745
 ] 

ASF GitHub Bot commented on HELIX-537:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/8




[jira] [Commented] (HELIX-549) Discarding Throwable exceptions makes threads unkillable.

2014-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215180#comment-14215180
 ] 

ASF GitHub Bot commented on HELIX-549:
--

GitHub user atcurtis opened a pull request:

https://github.com/apache/helix/pull/9

[HELIX-549] Rethrow ThreadDeath instead of discarding.

https://issues.apache.org/jira/browse/HELIX-549


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atcurtis/helix HELIX-549

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/9.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9






> Discarding Throwable exceptions makes threads unkillable.
> -
>
> Key: HELIX-549
> URL: https://issues.apache.org/jira/browse/HELIX-549
> Project: Apache Helix
>  Issue Type: Bug
>Affects Versions: 0.6.3
>Reporter: Antony T Curtis
>Priority: Blocker
>
> Threads in loops which catch and discard Throwable end up discarding 
> ThreadDeath exceptions. This causes Thread.stop() to be effectively ignored.
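
The idea behind the fix (the PR title is "Rethrow ThreadDeath instead of discarding") can be sketched as a worker loop that still tolerates ordinary failures but lets `ThreadDeath` propagate. The class and method names below are hypothetical. Because `ThreadDeath` and `Thread.stop()` are deprecated in recent JDKs, the test raises `ThreadDeath` directly rather than calling `stop()`.

```java
// Hypothetical worker loop: ordinary failures are logged and the loop keeps
// going, but ThreadDeath (what Thread.stop() raises) is rethrown so the
// thread can actually terminate.
@SuppressWarnings({"deprecation", "removal"})
class ResilientLoop {
  interface Step {
    void run() throws Exception;
  }

  // Runs step up to maxIterations times; returns how many runs failed.
  static int runLoop(Step step, int maxIterations) {
    int failures = 0;
    for (int i = 0; i < maxIterations; i++) {
      try {
        step.run();
      } catch (ThreadDeath death) {
        throw death; // never swallow: this is how Thread.stop() ends a thread
      } catch (Throwable t) {
        failures++;  // other errors are discarded so the loop keeps running
        System.err.println("step failed: " + t);
      }
    }
    return failures;
  }
}
```

The `ThreadDeath` clause must come before `catch (Throwable t)`, since a subclass handler placed after its superclass handler would be unreachable.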





[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215282#comment-14215282
 ] 

ASF GitHub Bot commented on HELIX-550:
--

GitHub user atcurtis opened a pull request:

https://github.com/apache/helix/pull/10

[HELIX-550] ZKHelixManager should shutdown GenericHelixController.

https://issues.apache.org/jira/browse/HELIX-550

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atcurtis/helix HELIX-550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit 073adf5e9da0d9508cfbc42df4fde460055db714
Author: Antony T Curtis 
Date:   2014-11-17T22:04:44Z

[HELIX-550] ZKHelixManager should shutdown GenericHelixController.




> ZKHelixManager does not shutdown GenericHelixController threads.
> 
>
> Key: HELIX-550
> URL: https://issues.apache.org/jira/browse/HELIX-550
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Antony T Curtis
>Priority: Critical
>
> ZKHelixManager does not shutdown GenericHelixController threads.





[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215426#comment-14215426
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user atcurtis closed the pull request at:

https://github.com/apache/helix/pull/10




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215498#comment-14215498
 ] 

ASF GitHub Bot commented on HELIX-550:
--

GitHub user atcurtis opened a pull request:

https://github.com/apache/helix/pull/11

[HELIX-550] ZKHelixManager should shutdown GenericHelixController.

https://issues.apache.org/jira/browse/HELIX-550

Tested with `mvn test`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atcurtis/helix HELIX-550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11


commit af882ea025b1daf821f9f17f969a587ca7ec3e17
Author: Antony T Curtis 
Date:   2014-11-17T22:04:44Z

[HELIX-550] ZKHelixManager should shutdown GenericHelixController.






[jira] [Commented] (HELIX-549) Discarding Throwable exceptions makes threads unkillable.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217380#comment-14217380
 ] 

ASF GitHub Bot commented on HELIX-549:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/9#issuecomment-63589932
  
LGTM, merged




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217391#comment-14217391
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/11#discussion_r20557176
  
--- Diff: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java ---
@@ -573,6 +573,15 @@ public void shutdownClusterStatusMonitor(String clusterName) {
     }
   }
 
+  public void shutdown() throws InterruptedException {
+    stopRebalancingTimer();
+    while (_eventThread.isAlive())
+    {
+      _eventThread.interrupt();
+      _eventThread.join(1000);
--- End diff --

Can you change this to a constant variable?




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217393#comment-14217393
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user atcurtis commented on a diff in the pull request:

https://github.com/apache/helix/pull/11#discussion_r20557205
  

Sure. Any preference for the constant name?




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217396#comment-14217396
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/11#discussion_r20557259
  

Maybe something like `EVENT_THREAD_JOIN_TIMEOUT`?
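
With the suggested name, the shutdown loop from the diff could look like the sketch below. It is a standalone, hypothetical class: `stopRebalancingTimer()` is stubbed, and the 1000 ms value is simply the literal from the diff, now named `EVENT_THREAD_JOIN_TIMEOUT`.

```java
// Hypothetical stand-in for the controller's event-thread shutdown.
class EventThreadOwner {
  // Milliseconds to wait for the event thread per interrupt attempt
  // (the literal 1000 from the diff, given the name suggested in review).
  private static final long EVENT_THREAD_JOIN_TIMEOUT = 1000L;

  private final Thread _eventThread;

  EventThreadOwner(Runnable eventLoop) {
    _eventThread = new Thread(eventLoop, "event-thread");
    _eventThread.setDaemon(true);
    _eventThread.start();
  }

  private void stopRebalancingTimer() {
    // Stub: the real controller cancels its rebalancing timer here.
  }

  public void shutdown() throws InterruptedException {
    stopRebalancingTimer();
    // Keep interrupting until the thread exits; join() bounds each wait.
    while (_eventThread.isAlive()) {
      _eventThread.interrupt();
      _eventThread.join(EVENT_THREAD_JOIN_TIMEOUT);
    }
  }

  boolean isEventThreadAlive() {
    return _eventThread.isAlive();
  }
}
```

The timeout bounds each wait, but the loop as a whole still relies on the event thread eventually honoring the interrupt.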




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217397#comment-14217397
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/11#discussion_r20557289
  
--- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java ---
@@ -543,6 +554,19 @@ public void disconnect() {
   _zkclient.close();
   _zkclient = null;
   LOG.info("Cluster manager: " + _instanceName + " disconnected");
+
+  if (_controller != null) {
+    try {
+      _controller.shutdown();
+    }
--- End diff --

nit: can you make the `catch` start on the same line as the close brace of 
the `try`?
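
Formatted as requested, with `catch` on the same line as the closing brace of `try`, the added block in `disconnect()` might look like the sketch below. The `ShutdownableController` interface and `ManagerSketch` class are hypothetical. The review excerpt does not show what the real `catch` clause does, so restoring the interrupt flag is an assumption about sensible handling of `InterruptedException`.

```java
// Hypothetical minimal types standing in for the manager and its controller.
interface ShutdownableController {
  void shutdown() throws InterruptedException;
}

class ManagerSketch {
  private final ShutdownableController _controller;

  ManagerSketch(ShutdownableController controller) {
    _controller = controller;
  }

  public void disconnect() {
    if (_controller != null) {
      try {
        _controller.shutdown();
      } catch (InterruptedException e) {
        // Assumed handling: keep the interrupt visible to the caller.
        Thread.currentThread().interrupt();
      }
    }
  }
}
```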




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217400#comment-14217400
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user atcurtis commented on a diff in the pull request:

https://github.com/apache/helix/pull/11#discussion_r20557401
  

np.




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217412#comment-14217412
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/11#issuecomment-63591519
  
Merged, thanks!




[jira] [Commented] (HELIX-549) Discarding Throwable exceptions makes threads unkillable.

2014-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218074#comment-14218074
 ] 

ASF GitHub Bot commented on HELIX-549:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/9




[jira] [Commented] (HELIX-550) ZKHelixManager does not shutdown GenericHelixController threads.

2014-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218073#comment-14218073
 ] 

ASF GitHub Bot commented on HELIX-550:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/11




[jira] [Commented] (HELIX-555) ClusterStateVerifier leaks ZkClients.

2014-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220509#comment-14220509
 ] 

ASF GitHub Bot commented on HELIX-555:
--

GitHub user atcurtis opened a pull request:

https://github.com/apache/helix/pull/12

[HELIX-555] Fix deficiency in ClusterStateVerifier api

https://issues.apache.org/jira/browse/HELIX-555


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atcurtis/helix HELIX-555

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/12.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12


commit b2794e744e945da966c7ac6ae6408636951281d3
Author: Antony T Curtis 
Date:   2014-11-21T04:36:01Z

[HELIX-555] Fix deficiency in ClusterStateVerifier api




> ClusterStateVerifier leaks ZkClients.
> -
>
> Key: HELIX-555
> URL: https://issues.apache.org/jira/browse/HELIX-555
> Project: Apache Helix
>  Issue Type: Bug
>Affects Versions: 0.6.3
>Reporter: Antony T Curtis
>Priority: Blocker
>
> The classes in ClusterStateVerifier tend to leak ZkClients because there is 
> no way to provide an already constructed client to the class.
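
One common way to close this kind of leak is to let callers pass in an already-constructed client and to close only the clients the verifier created itself. The sketch below illustrates that general technique; it is not the code from the PR. `FakeZkClient` and `VerifierSketch` are hypothetical, and a real ZooKeeper client would take a connection string and timeouts.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a ZooKeeper client; only close() matters here.
class FakeZkClient implements AutoCloseable {
  private boolean _closed;

  @Override
  public void close() {
    _closed = true;
  }

  boolean isClosed() {
    return _closed;
  }
}

// Hypothetical verifier that tracks whether it owns its client.
class VerifierSketch implements AutoCloseable {
  private final FakeZkClient _client;
  private final boolean _ownsClient;

  // Caller supplies an existing client and stays responsible for closing it.
  VerifierSketch(FakeZkClient sharedClient) {
    _client = sharedClient;
    _ownsClient = false;
  }

  // Verifier creates its own client, so it must close it.
  VerifierSketch(Supplier<FakeZkClient> factory) {
    _client = factory.get();
    _ownsClient = true;
  }

  FakeZkClient client() {
    return _client;
  }

  @Override
  public void close() {
    if (_ownsClient) {
      _client.close();
    }
  }
}
```

Tracking ownership explicitly lets tests share one client across many verifiers without either leaking it or closing it out from under the caller.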





[jira] [Commented] (HELIX-555) ClusterStateVerifier leaks ZkClients.

2014-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220525#comment-14220525
 ] 

ASF GitHub Bot commented on HELIX-555:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/12#issuecomment-63924711
  
It looks like the original code already used `ZkClientPool`. How does this 
change improve the situation?




[jira] [Commented] (HELIX-555) ClusterStateVerifier leaks ZkClients.

2014-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220527#comment-14220527
 ] 

ASF GitHub Bot commented on HELIX-555:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/12#issuecomment-63924907
  
Or are there a bunch of tests/tools that create their own ZK client in 
addition to the one created by `ClusterStateVerifier`?




[jira] [Commented] (HELIX-555) ClusterStateVerifier leaks ZkClients.

2014-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220556#comment-14220556
 ] 

ASF GitHub Bot commented on HELIX-555:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/12#issuecomment-63927164
  
OK, in that case, merged.




[jira] [Commented] (HELIX-555) ClusterStateVerifier leaks ZkClients.

2014-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221098#comment-14221098
 ] 

ASF GitHub Bot commented on HELIX-555:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/12




[jira] [Commented] (HELIX-569) Update website docs to correctly pass the rebalance mode to addResource in helix-admin.sh

2015-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323126#comment-14323126
 ] 

ASF GitHub Bot commented on HELIX-569:
--

GitHub user acmcelwee opened a pull request:

https://github.com/apache/helix/pull/16

[HELIX-569] - Update docs to correctly pass rebalance mode during 
helix-admin.sh resource creation

I stumbled upon this the other day in #apachehelix IRC. I couldn't get the 
distributed locks example to work in full auto rebalance mode, and it turns out 
the docs just needed an update. Because the default rebalance mode has been 
SEMI_AUTO since 0.6.2, my testing only half worked and behaved very confusingly.

This updates the docs to keep future new users headed in the right 
direction.

https://issues.apache.org/jira/browse/HELIX-569

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/acmcelwee/helix helix-569

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16


commit 2525ecd8f37892201fc427450b85c342db55070c
Author: Adam McElwee 
Date:   2015-02-16T02:09:25Z

[HELIX-569] - Update docs to correctly pass rebalance mode during 
helix-admin.sh resource creation




> Update website docs to correctly pass the rebalance mode to addResource in 
> helix-admin.sh
> -
>
> Key: HELIX-569
> URL: https://issues.apache.org/jira/browse/HELIX-569
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Adam McElwee
>Priority: Trivial
> Fix For: 0.6.1-incubating, 0.6.2-incubating, 0.7.0-incubating, 
> 0.7.1, 0.6.3, 0.6.4
>
> Attachments: fix-rebalance-mode-arg.patch
>
>
> I stumbled upon this the other day in #apachehelix IRC. I couldn't get the 
> distributed locks example to work in full auto rebalance mode, and it turns 
> out the docs just needed an update. Patch incoming.





[jira] [Commented] (HELIX-569) Update website docs to correctly pass the rebalance mode to addResource in helix-admin.sh

2015-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327006#comment-14327006
 ] 

ASF GitHub Bot commented on HELIX-569:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/16#issuecomment-75000395
  
Thanks!




[jira] [Commented] (HELIX-569) Update website docs to correctly pass the rebalance mode to addResource in helix-admin.sh

2015-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327008#comment-14327008
 ] 

ASF GitHub Bot commented on HELIX-569:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/16




[jira] [Commented] (HELIX-578) NPE while deleting a job from a recurrent job queue

2015-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368037#comment-14368037
 ] 

ASF GitHub Bot commented on HELIX-578:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/18

[HELIX-578] NPE while deleting a job from a recurrent job queue.

This fixes the delete-job operation when deleting a job from a recurrent job 
queue. A new unit test was added; `mvn install package` passed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18


commit 944b16387b3ae5cf622b9d785dba17863341c084
Author: Lei Xia 
Date:   2015-03-18T06:13:55Z

[HELIX-578] NPE while deleting a job from a recurrent job queue.




> NPE while deleting a job from a recurrent job queue
> ---
>
> Key: HELIX-578
> URL: https://issues.apache.org/jira/browse/HELIX-578
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Karthiek
>Assignee: Lei Xia
>Priority: Critical
>
> Helix throws an NPE when we try to delete a job from a recurrent job queue.
> Partial stack trace:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:295)
> {noformat}
> Helix reads the workflow context's current state:
> {code}
> WorkflowContext wCtx = TaskUtil.getWorkflowContext(_propertyStore, queueName);
> String workflowState =
>     (wCtx != null) ? wCtx.getWorkflowState().name() : TaskState.NOT_STARTED.name();
> {code}
> But for a recurring workflow, there is no "state" in the parent workflow's 
> context; only the scheduled workflows have a "state". Hence the NPE.
> To ensure that the queue is stopped, Helix should look at the context of the
> last-scheduled workflow instead of the parent workflow.
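
The failing expression checks `wCtx` for `null` but then dereferences `getWorkflowState()` without checking it. The sketch below is a minimal, self-contained version of the fallback the description asks for. `QueueState`, `WfCtx`, and `effectiveState` are hypothetical simplifications, not Helix's `WorkflowContext`/`TaskState` API, and locating the last-scheduled workflow's context is left to the caller.

```java
// Hypothetical simplified types; not the Helix WorkflowContext API.
enum QueueState { NOT_STARTED, IN_PROGRESS, STOPPED, COMPLETED }

class WfCtx {
  private final QueueState _state; // null for a recurring queue's parent context

  WfCtx(QueueState state) {
    _state = state;
  }

  QueueState getWorkflowState() {
    return _state;
  }
}

class QueueStateSketch {
  // Use the parent's state if it has one; for a recurring queue (no parent
  // state) fall back to the last-scheduled workflow's context; otherwise
  // treat the queue as NOT_STARTED.
  static String effectiveState(WfCtx parent, WfCtx lastScheduled) {
    if (parent != null && parent.getWorkflowState() != null) {
      return parent.getWorkflowState().name();
    }
    if (lastScheduled != null && lastScheduled.getWorkflowState() != null) {
      return lastScheduled.getWorkflowState().name();
    }
    return QueueState.NOT_STARTED.name();
  }
}
```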





[jira] [Commented] (HELIX-578) NPE while deleting a job from a recurrent job queue

2015-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370043#comment-14370043
 ] 

ASF GitHub Bot commented on HELIX-578:
--

Github user lei-xia closed the pull request at:

https://github.com/apache/helix/pull/18




[jira] [Commented] (HELIX-578) NPE while deleting a job from a recurrent job queue

2015-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370123#comment-14370123
 ] 

ASF GitHub Bot commented on HELIX-578:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/19

[HELIX-578] NPE while deleting a job from a recurrent job queue.

Updated with the fix Zhen Zhang suggested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19


commit 49ceac0e9449940546e37628780e5098ac4e8678
Author: Lei Xia 
Date:   2015-03-19T20:54:05Z

[HELIX-578] NPE while deleting a job from a recurrent job queue.




[jira] [Commented] (HELIX-578) NPE while deleting a job from a recurrent job queue

2015-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370663#comment-14370663
 ] 

ASF GitHub Bot commented on HELIX-578:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/19


[jira] [Commented] (HELIX-584) SimpleDateFormat should not be used as singleton due to its race conditions

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372283#comment-14372283
 ] 

ASF GitHub Bot commented on HELIX-584:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/20

[HELIX-584] SimpleDateFormat should not be used as singleton due to its 
race conditions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/20.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20


commit e924a4c4ee1f1c52dcf6b478bbc88d3050e9d0f8
Author: Lei Xia 
Date:   2015-03-20T22:55:24Z

[HELIX-584] SimpleDateFormat should not be used as singleton due to its 
race conditions.




> SimpleDateFormat should not be used as singleton due to its race conditions
> ---
>
> Key: HELIX-584
> URL: https://issues.apache.org/jira/browse/HELIX-584
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Lei Xia
>Assignee: Lei Xia
>
> SimpleDateFormat is used in WorkflowConfig as a singleton. But since it is 
> not thread-safe (see: 
> http://www.hpenterprisesecurity.com/vulncat/en/vulncat/java/race_condition_format_flaw.html),
> concurrent calls can occasionally corrupt parsed or formatted dates due to race conditions.
> An example stack trace for such a failure:
> Message:
> For input string: "2003.E2003E22"
> Full Stacktrace:
> java.lang.NumberFormatException: For input string: "2003.E2003E22"
> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
> at java.lang.Double.parseDouble(Double.java:510)
> at java.text.DigitList.getDouble(DigitList.java:151)
> at java.text.DecimalFormat.parse(DecimalFormat.java:1302)
> at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1935)
> at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1311)
> at java.text.DateFormat.parse(DateFormat.java:335)
> at 
> org.apache.helix.task.TaskUtil.parseScheduleFromConfigMap(TaskUtil.java:365)
> at 
> org.apache.helix.task.WorkflowConfig$Builder.fromMap(WorkflowConfig.java:173)
> at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:113)
> at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:126)
> at 
> org.apache.helix.integration.task.TestUtil.pollForJobState(TestUtil.java:61)
> at 
> org.apache.helix.integration.task.TestTaskRebalancerStopResume.stopDeleteJobAndResumeRecurrentQueue(TestTaskRebalancerStopResume.java:420)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:673)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:846)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1170)
> at 
> org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
> at org.testng.TestRunner.runWorkers(TestRunner.java:1147)
> at org.testng.TestRunner.privateRun(TestRunner.java:749)
> at org.testng.TestRunner.run(TestRunner.java:600)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:317)
> at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:312)
> at org.testng.SuiteRunner.privateRun(SuiteRunner.java:274)
> at org.testng.SuiteRunner.run(SuiteRunner.java:223)
> at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
> at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
> at org.testng.TestNG.runSuitesSequentially(TestNG.java:1039)
> at org.testng.TestNG.runSuitesLocally(TestNG.java:964)
> at org.testng.TestNG.run(TestNG.java:900)
> at 
> org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:178)
> at 
> org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:92)
> at 
> org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:96)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.jav
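One common remedy for a shared SimpleDateFormat is to give each thread its own instance through ThreadLocal. This is not necessarily the change made in this pull request. The sketch below assumes a hypothetical `MM-dd-yyyy HH:mm:ss` pattern and the UTC time zone, and `SafeDateFormat` is an illustrative name. Its main method parses the same string from many threads and checks that every result matches.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SafeDateFormat {
  // Hypothetical pattern and time zone; the schedule format Helix uses may differ.
  static final String PATTERN = "MM-dd-yyyy HH:mm:ss";

  // One SimpleDateFormat per thread: instances are never shared, so their
  // mutable internal state (Calendar, DigitList) cannot be raced on.
  private static final ThreadLocal<DateFormat> FORMAT =
      ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
      });

  static Date parse(String s) throws ParseException {
    return FORMAT.get().parse(s);
  }

  static String format(Date d) {
    return FORMAT.get().format(d);
  }

  public static void main(String[] args) throws Exception {
    String input = "03-20-2015 22:55:24";
    long expected = parse(input).getTime();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<Long>> results = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        results.add(pool.submit(() -> parse(input).getTime()));
      }
      for (Future<Long> r : results) {
        if (r.get() != expected) {
          throw new IllegalStateException("corrupted parse: " + r.get());
        }
      }
      System.out.println("all parses consistent");
    } finally {
      pool.shutdown();
    }
  }
}
```

On Java 8 and later, java.time.format.DateTimeFormatter is another option. It is immutable and thread-safe, so a single shared instance avoids per-thread copies altogether.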

[jira] [Commented] (HELIX-584) SimpleDateFormat should not be used as singleton due to its race conditions

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372320#comment-14372320
 ] 

ASF GitHub Bot commented on HELIX-584:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/20


> SimpleDateFormat should not be used as singleton due to its race conditions
> ---
>
> Key: HELIX-584
> URL: https://issues.apache.org/jira/browse/HELIX-584
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Lei Xia
>Assignee: Lei Xia
>
> SimpleDateFormat is used in workflowConfig as a singleton. But since it is 
> not thread-safe (refer here: 
> http://www.hpenterprisesecurity.com/vulncat/en/vulncat/java/race_condition_format_flaw.html),
>  it will mess up the output date format sometime due to race condition. 
> An example trace stack for such failure:
> Message:
> For input string: "2003.E2003E22"
> Full Stacktrace:
> java.lang.NumberFormatException: For input string: "2003.E2003E22"
> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
> at java.lang.Double.parseDouble(Double.java:510)
> at java.text.DigitList.getDouble(DigitList.java:151)
> at java.text.DecimalFormat.parse(DecimalFormat.java:1302)
> at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1935)
> at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1311)
> at java.text.DateFormat.parse(DateFormat.java:335)
> at 
> org.apache.helix.task.TaskUtil.parseScheduleFromConfigMap(TaskUtil.java:365)
> at 
> org.apache.helix.task.WorkflowConfig$Builder.fromMap(WorkflowConfig.java:173)
> at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:113)
> at org.apache.helix.task.TaskUtil.getWorkflowCfg(TaskUtil.java:126)
> at 
> org.apache.helix.integration.task.TestUtil.pollForJobState(TestUtil.java:61)
> at 
> org.apache.helix.integration.task.TestTaskRebalancerStopResume.stopDeleteJobAndResumeRecurrentQueue(TestTaskRebalancerStopResume.java:420)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:673)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:846)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1170)
> at 
> org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
> at org.testng.TestRunner.runWorkers(TestRunner.java:1147)
> at org.testng.TestRunner.privateRun(TestRunner.java:749)
> at org.testng.TestRunner.run(TestRunner.java:600)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:317)
> at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:312)
> at org.testng.SuiteRunner.privateRun(SuiteRunner.java:274)
> at org.testng.SuiteRunner.run(SuiteRunner.java:223)
> at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
> at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
> at org.testng.TestNG.runSuitesSequentially(TestNG.java:1039)
> at org.testng.TestNG.runSuitesLocally(TestNG.java:964)
> at org.testng.TestNG.run(TestNG.java:900)
> at 
> org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:178)
> at 
> org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:92)
> at 
> org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:96)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:158)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:86)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-589) Delete job API throws NPE if the job does not exist in last scheduled workflow

2015-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393628#comment-14393628
 ] 

ASF GitHub Bot commented on HELIX-589:
--

GitHub user jicongrui opened a pull request:

https://github.com/apache/helix/pull/22

[HELIX-589] Delete job API throws NPE if the job does not exist in last ...

[HELIX-589] Delete job API throws NPE if the job does not exist in last 
scheduled workflow

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jicongrui/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/22.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22


commit e94a9f5f90099a248181d6dc50314aec0e8d9512
Author: Congrui Ji 
Date:   2015-04-02T22:40:09Z

[HELIX-589] Delete job API throws NPE if the job does not exist in last 
scheduled workflow




> Delete job API throws NPE if the job does not exist in last scheduled workflow
> --
>
> Key: HELIX-589
> URL: https://issues.apache.org/jira/browse/HELIX-589
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Karthiek
>
> When trying to delete a job from a recurrent job queue, Helix throws an 
> exception (an IllegalArgumentException, as shown below) if the job does not 
> exist in the last scheduled workflow:
> java.lang.IllegalArgumentException: Could not remove job BackupJob_MailboxDB 
> from DAG of queue BackupJobQueue_ESPRESSO_CHO_1_SCHEDULED_0
> at 
> org.apache.helix.task.TaskDriver.removeJobFromDag(TaskDriver.java:411)
> at 
> org.apache.helix.task.TaskDriver.deleteJobFromScheduledQueue(TaskDriver.java:345)
> at org.apache.helix.task.TaskDriver.deleteJob(TaskDriver.java:303)
> It is possible for a user to add a job and delete it immediately, before the 
> next workflow is scheduled. Helix should handle this case and check whether 
> the job exists in the last scheduled workflow before trying to delete it.
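A minimal sketch of the check the issue asks for, which verifies that a job is present before removing it. `SimpleJobDag` is a hypothetical, simplified graph of job names, not Helix's own DAG class. Reconnecting a removed job's parents to its children is an assumption made here so that a linear queue keeps its order.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified job DAG keyed by job name; not Helix's DAG API.
public class SimpleJobDag {
  private final Map<String, Set<String>> children = new HashMap<>();
  private final Map<String, Set<String>> parents = new HashMap<>();

  void addNode(String job) {
    children.putIfAbsent(job, new HashSet<>());
    parents.putIfAbsent(job, new HashSet<>());
  }

  void addEdge(String from, String to) {
    addNode(from);
    addNode(to);
    children.get(from).add(to);
    parents.get(to).add(from);
  }

  boolean contains(String job) {
    return children.containsKey(job);
  }

  Set<String> childrenOf(String job) {
    return Collections.unmodifiableSet(children.getOrDefault(job, Collections.emptySet()));
  }

  // Checks membership first and returns false instead of throwing when the job
  // is absent, e.g. it was added and deleted before the next workflow was scheduled.
  // On removal, links the job's parents to its children so a queue stays ordered.
  boolean removeJobIfPresent(String job) {
    if (!contains(job)) {
      return false;
    }
    Set<String> ps = parents.remove(job);
    Set<String> cs = children.remove(job);
    for (String p : ps) {
      children.get(p).remove(job);
      children.get(p).addAll(cs);
    }
    for (String c : cs) {
      parents.get(c).remove(job);
      parents.get(c).addAll(ps);
    }
    return true;
  }

  public static void main(String[] args) {
    SimpleJobDag dag = new SimpleJobDag();
    dag.addEdge("job1", "job2");
    dag.addEdge("job2", "job3");
    System.out.println(dag.removeJobIfPresent("job2"));       // true
    System.out.println(dag.childrenOf("job1"));               // [job3]
    System.out.println(dag.removeJobIfPresent("neverAdded")); // false
  }
}
```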



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-589) Delete job API throws NPE if the job does not exist in last scheduled workflow

2015-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396380#comment-14396380
 ] 

ASF GitHub Bot commented on HELIX-589:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/22


[jira] [Commented] (HELIX-589) Delete job API throws NPE if the job does not exist in last scheduled workflow

2015-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396383#comment-14396383
 ] 

ASF GitHub Bot commented on HELIX-589:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/22#issuecomment-89847798
  
Merged, thanks.


[jira] [Commented] (HELIX-584) SimpleDateFormat should not be used as singleton due to its race conditions

2015-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513347#comment-14513347
 ] 

ASF GitHub Bot commented on HELIX-584:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/24

[HELIX-584] SimpleDateFormat should not be used as singleton due to its 
race conditions

[HELIX-584] SimpleDateFormat should not be used as singleton due to its 
race conditions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #24


commit a29ac9fb35365d1fe8cf12ef95c58643a5fea36b
Author: Lei Xia 
Date:   2015-04-27T00:25:33Z

[HELIX-584] SimpleDateFormat should not be used as singleton due to its 
race conditions.




[jira] [Commented] (HELIX-584) SimpleDateFormat should not be used as singleton due to its race conditions

2015-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513557#comment-14513557
 ] 

ASF GitHub Bot commented on HELIX-584:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/24#issuecomment-96513218
  
thanks!


[jira] [Commented] (HELIX-584) SimpleDateFormat should not be used as singleton due to its race conditions

2015-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513556#comment-14513556
 ] 

ASF GitHub Bot commented on HELIX-584:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/24


[jira] [Commented] (HELIX-591) Provide getResourcesWithTag in HelixAdmin to retrieve all resources with a group tag

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515148#comment-14515148
 ] 

ASF GitHub Bot commented on HELIX-591:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/25

[HELIX-591] Provide getResourcesWithTag in HelixAdmin to retrieve all 
resources with a group tag.

[HELIX-591] Provide getResourcesWithTag in HelixAdmin to retrieve all 
resources with a group tag.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #25


commit 8a279a366d4e6c43366a1c5867a02f26768e5627
Author: Lei Xia 
Date:   2015-04-27T20:36:37Z

[HELIX-591] Provide getResourcesWithTag in HelixAdmin to retrieve all 
resources with a group tag.




> Provide getResourcesWithTag in HelixAdmin to retrieve all resources with a 
> group tag
> 
>
> Key: HELIX-591
> URL: https://issues.apache.org/jira/browse/HELIX-591
> Project: Apache Helix
>  Issue Type: Bug
>  Components: helix-core, helix-webapp-admin
>Reporter: Lei Xia
>Assignee: Lei Xia
>
> We need to retrieve resources with a given group tag. HelixAdmin should 
> provide getResourcesWithTag to retrieve all resources with a given tag.
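As an illustration of what the requested method does, here is a minimal sketch that filters resources by group tag. It assumes an in-memory map from resource name to tag instead of the ideal states stored in ZooKeeper; the class `TaggedResourceRegistry` and its methods are hypothetical and are not the code from pull request #25.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a cluster's resources: maps each
// resource name to its group tag (null when the resource has no tag).
class TaggedResourceRegistry {
  private final Map<String, String> tagByResource = new LinkedHashMap<>();

  void addResource(String resource, String groupTag) {
    tagByResource.put(resource, groupTag);
  }

  // Returns every resource whose group tag equals the given tag,
  // in the order the resources were added.
  List<String> getResourcesWithTag(String tag) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> entry : tagByResource.entrySet()) {
      if (tag.equals(entry.getValue())) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```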





[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520496#comment-14520496
 ] 

ASF GitHub Bot commented on HELIX-592:
--

GitHub user jicongrui opened a pull request:

https://github.com/apache/helix/pull/26

[HELIX-592] addCluster should respect overwriteExisitng when adding stateModelDefinations

Some tests expect an exception when creating a cluster that already exists, 
and I changed that behavior.
So the open question is the intended behavior when creating an existing cluster:
if we allow it and overwrite is false, should we throw an exception or do 
nothing?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jicongrui/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/26.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #26


commit 69cd1f27065710f6de157b82742673ab8baf5d11
Author: Congrui Ji 
Date:   2015-04-29T23:13:11Z

[HELIX-592] addCluster should respect overwriteExisitng when adding 
stateModelDefinations

Some tests expect an exception when creating a cluster that already exists, 
and I changed that behavior.
So the open question is the intended behavior when creating an existing cluster:
if we allow it and overwrite is false, should we throw an exception or do 
nothing?




> addCluster should respect overwriteExisitng when adding stateModelDefinations
> -
>
> Key: HELIX-592
> URL: https://issues.apache.org/jira/browse/HELIX-592
> Project: Apache Helix
>  Issue Type: Bug
>Reporter: Congrui Ji
>
> Currently, addCluster in ClusterSetup.java ignores the overwriteExisitng 
> parameter when adding state model definitions. This causes an exception 
> (StateModelDef already exists). Please help fix this.





[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525454#comment-14525454
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/26#discussion_r29550320
  
--- Diff: helix-core/src/main/java/org/apache/helix/tools/ClusterSetup.java ---
@@ -329,8 +329,9 @@ public HelixAdmin getClusterManagementTool() {
     return _admin;
   }
 
-  public void addStateModelDef(String clusterName, String stateModelDef, StateModelDefinition record) {
--- End diff --
--- End diff --

Please do not remove public methods. Instead, call the new method from the 
old one with a default value.
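The backward-compatible pattern the reviewer asks for can be sketched as follows: keep the old public signature and have it delegate to the new overload, passing the old behavior as the default. `StateModelSetup` is a hypothetical stand-in for ClusterSetup, and the StateModelDefinition argument is omitted to keep the example self-contained.

```java
// Hypothetical stand-in for a public API class such as ClusterSetup. The real
// class writes state model definitions to ZooKeeper; this one only records calls.
class StateModelSetup {
  String lastStateModelDef;
  boolean lastRecreateFlag;

  // Existing public method, kept so that current callers still compile.
  // It delegates to the new overload with the previous behavior as the default.
  public void addStateModelDef(String clusterName, String stateModelDef) {
    addStateModelDef(clusterName, stateModelDef, false);
  }

  // New overload that exposes the extra flag.
  public void addStateModelDef(String clusterName, String stateModelDef,
      boolean recreateIfExists) {
    lastStateModelDef = stateModelDef;
    lastRecreateFlag = recreateIfExists;
  }
}
```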




[jira] [Commented] (HELIX-591) Provide getResourcesWithTag in HelixAdmin to retrieve all resources with a group tag

2015-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525452#comment-14525452
 ] 

ASF GitHub Bot commented on HELIX-591:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/25




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538450#comment-14538450
 ] 

ASF GitHub Bot commented on HELIX-592:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/27

[HELIX-592] addCluster should respect overwriteExisitng when adding 
stateModel Definations.

Congrui Ji sent this fix earlier and received some review comments, but he is 
on vacation and we (LinkedIn) need the fix ASAP, so I am resubmitting it. 
Thanks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/27.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #27


commit 57a5324034b689c8ae1fab855b75e7fc7e7517ef
Author: Lei Xia 
Date:   2015-05-11T18:35:20Z

[HELIX-592] addCluster should respect overwriteExisitng when adding 
stateModel Definations.






[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539227#comment-14539227
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/27#discussion_r30104417
  
--- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java ---
@@ -715,14 +715,26 @@ public ExternalView getResourceExternalView(String clusterName, String resourceN
   @Override
   public void addStateModelDef(String clusterName, String stateModelDef,
       StateModelDefinition stateModel) {
+    addStateModelDef(clusterName, stateModelDef, stateModel, false);
+  }
+
+  @Override
+  public void addStateModelDef(String clusterName, String stateModelDef,
+      StateModelDefinition stateModel, boolean recreateIfExists) {
     if (!ZKUtil.isClusterSetup(clusterName, _zkClient)) {
       throw new HelixException("cluster " + clusterName + " is not setup yet");
     }
     String stateModelDefPath = HelixUtil.getStateModelDefinitionPath(clusterName);
     String stateModelPath = stateModelDefPath + "/" + stateModelDef;
     if (_zkClient.exists(stateModelPath)) {
-      logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
-      throw new HelixException("State model path " + stateModelPath + " already exists.");
+      if (recreateIfExists) {
+        logger.warn("Operation.State Model directory exists:" + stateModelPath +
+            ", remove and recreate.");
+        _zkClient.deleteRecursive(stateModelPath);
+      } else {
+        logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
+        return;
--- End diff --

Why was this changed to no longer throw an exception? It's better to fail 
loudly in methods that return `void`.




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539230#comment-14539230
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/27#discussion_r30104425
  
--- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java ---
@@ -715,14 +715,26 @@ public ExternalView getResourceExternalView(String clusterName, String resourceN
   @Override
   public void addStateModelDef(String clusterName, String stateModelDef,
       StateModelDefinition stateModel) {
+    addStateModelDef(clusterName, stateModelDef, stateModel, false);
+  }
+
+  @Override
+  public void addStateModelDef(String clusterName, String stateModelDef,
+      StateModelDefinition stateModel, boolean recreateIfExists) {
     if (!ZKUtil.isClusterSetup(clusterName, _zkClient)) {
       throw new HelixException("cluster " + clusterName + " is not setup yet");
     }
     String stateModelDefPath = HelixUtil.getStateModelDefinitionPath(clusterName);
     String stateModelPath = stateModelDefPath + "/" + stateModelDef;
     if (_zkClient.exists(stateModelPath)) {
-      logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
-      throw new HelixException("State model path " + stateModelPath + " already exists.");
+      if (recreateIfExists) {
+        logger.warn("Operation.State Model directory exists:" + stateModelPath +
--- End diff --

Change to `info` log level.




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540673#comment-14540673
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user lei-xia commented on a diff in the pull request:

https://github.com/apache/helix/pull/27#discussion_r30178346
  
--- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java ---
@@ -715,14 +715,26 @@ public ExternalView getResourceExternalView(String clusterName, String resourceN
   @Override
   public void addStateModelDef(String clusterName, String stateModelDef,
       StateModelDefinition stateModel) {
+    addStateModelDef(clusterName, stateModelDef, stateModel, false);
+  }
+
+  @Override
+  public void addStateModelDef(String clusterName, String stateModelDef,
+      StateModelDefinition stateModel, boolean recreateIfExists) {
     if (!ZKUtil.isClusterSetup(clusterName, _zkClient)) {
       throw new HelixException("cluster " + clusterName + " is not setup yet");
     }
     String stateModelDefPath = HelixUtil.getStateModelDefinitionPath(clusterName);
     String stateModelPath = stateModelDefPath + "/" + stateModelDef;
     if (_zkClient.exists(stateModelPath)) {
-      logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
-      throw new HelixException("State model path " + stateModelPath + " already exists.");
+      if (recreateIfExists) {
+        logger.warn("Operation.State Model directory exists:" + stateModelPath +
+            ", remove and recreate.");
+        _zkClient.deleteRecursive(stateModelPath);
+      } else {
+        logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
+        return;
--- End diff --

This aligns with the behavior of addCluster, which returns success if the 
cluster already exists and the overwrite flag is false.
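The behavior described here (recreate when the flag is set, otherwise leave the existing definition alone and return) can be modeled with a small sketch. It assumes an in-memory map standing in for ZooKeeper; `InMemoryStateModelStore` is hypothetical, and unlike the real `void` method it returns a boolean so the caller can tell whether anything was written, which is one possible answer to the reviewer's concern about succeeding silently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the ZooKeeper paths managed by
// ZKHelixAdmin: each key is a state model path, each value its definition.
class InMemoryStateModelStore {
  private final Map<String, String> nodes = new HashMap<>();

  // Mirrors the control flow of the diff: if the path exists, either delete and
  // recreate it (recreateIfExists == true) or return without changes, matching
  // addCluster's "succeed if it already exists" behavior.
  // Unlike the real void method, returns true only if the definition was written.
  boolean addStateModelDef(String stateModelPath, String definition,
      boolean recreateIfExists) {
    if (nodes.containsKey(stateModelPath)) {
      if (recreateIfExists) {
        nodes.remove(stateModelPath); // stands in for deleteRecursive
      } else {
        return false; // skip: the existing definition is left untouched
      }
    }
    nodes.put(stateModelPath, definition);
    return true;
  }

  String get(String stateModelPath) {
    return nodes.get(stateModelPath);
  }
}
```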




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540675#comment-14540675
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user lei-xia commented on a diff in the pull request:

https://github.com/apache/helix/pull/27#discussion_r30178394
  
--- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java ---
@@ -715,14 +715,26 @@ public ExternalView getResourceExternalView(String clusterName, String resourceN
   @Override
   public void addStateModelDef(String clusterName, String stateModelDef,
       StateModelDefinition stateModel) {
+    addStateModelDef(clusterName, stateModelDef, stateModel, false);
+  }
+
+  @Override
+  public void addStateModelDef(String clusterName, String stateModelDef,
+      StateModelDefinition stateModel, boolean recreateIfExists) {
     if (!ZKUtil.isClusterSetup(clusterName, _zkClient)) {
       throw new HelixException("cluster " + clusterName + " is not setup yet");
     }
     String stateModelDefPath = HelixUtil.getStateModelDefinitionPath(clusterName);
     String stateModelPath = stateModelDefPath + "/" + stateModelDef;
     if (_zkClient.exists(stateModelPath)) {
-      logger.warn("Skip the operation.State Model directory exists:" + stateModelPath);
-      throw new HelixException("State model path " + stateModelPath + " already exists.");
+      if (recreateIfExists) {
+        logger.warn("Operation.State Model directory exists:" + stateModelPath +
--- End diff --

Fixed in new diff.  Thanks!




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541244#comment-14541244
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/27




[jira] [Commented] (HELIX-592) addCluster should respect overwriteExisitng when adding stateModelDefinations

2015-05-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541246#comment-14541246
 ] 

ASF GitHub Bot commented on HELIX-592:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/27#issuecomment-101492973
  
This closes #26.




[jira] [Commented] (HELIX-596) Message throttling of controller behavior unexpectedly, throttled messages still take the constraint quota

2015-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553697#comment-14553697
 ] 

ASF GitHub Bot commented on HELIX-596:
--

GitHub user hangqi opened a pull request:

https://github.com/apache/helix/pull/28

[HELIX-596] fix throttled messages still take constraints' quota

Corresponding review request:  https://reviews.apache.org/r/34345/

Main changes in this pull request:

perMessageThrottleQuotaMap records the quota of all constraints matched by this 
message, and the overall throttleMap is updated iff the message has not been 
throttled. Previously, a message always took the quota of its matched 
constraints, whether or not it was actually sent out.

@zzhang5 
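The fix described above can be sketched as a two-phase check: compute each message's tentative quota usage in a per-message map, and commit it to the shared quota map only if no matched constraint would go negative. All names here (`ThrottleSketch`, `Msg`, the string constraint keys) are hypothetical simplifications; the real MessageThrottleStage operates on Helix messages and constraint items.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the throttling fix; not Helix code.
class ThrottleSketch {
  // A message plus the keys of the constraints it matches, e.g. "node:A", "partition:p0".
  static final class Msg {
    final String id;
    final List<String> matchedConstraints;

    Msg(String id, List<String> matchedConstraints) {
      this.id = id;
      this.matchedConstraints = matchedConstraints;
    }
  }

  // Returns the ids of messages that may be sent. throttleMap holds the remaining
  // quota per constraint key (keys without an entry are treated as unlimited) and
  // is updated only for messages that are actually sent.
  static List<String> throttle(List<Msg> messages, Map<String, Integer> throttleMap) {
    List<String> sent = new ArrayList<>();
    for (Msg msg : messages) {
      // Phase 1: tentative quota usage for this message only.
      Map<String, Integer> perMessageQuota = new HashMap<>();
      boolean throttled = false;
      for (String key : msg.matchedConstraints) {
        int remaining = throttleMap.getOrDefault(key, Integer.MAX_VALUE) - 1;
        perMessageQuota.put(key, remaining);
        if (remaining < 0) {
          throttled = true;
        }
      }
      // Phase 2: commit the quota usage only if the message is not throttled.
      if (!throttled) {
        throttleMap.putAll(perMessageQuota);
        sent.add(msg.id);
      }
    }
    return sent;
  }
}
```

With the reporter's setup (a per-node quota of 3 on A and B, a per-partition quota of 1 on p0-p7, and all messages for A ordered before those for B), this sketch sends p0, p1, p2 on A and p3, p4, p5 on B, which is the behavior the issue expected.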

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hangqi/helix fix_constrain_quota

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #28


commit 9ddbefcacff6b8e229e6413299d53d89f1cbcd43
Author: Hang Qi 
Date:   2015-05-18T06:06:20Z

[HELIX-596] fix throttled messages still take constraints' quota





[jira] [Commented] (HELIX-596) Message throttling of controller behavior unexpectedly, throttled messages still take the constraint quota

2015-05-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554866#comment-14554866
 ] 

ASF GitHub Bot commented on HELIX-596:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/28


> Message throttling of controller behavior unexpectedly, throttled messages 
> still take the constraint quota
> --
>
> Key: HELIX-596
> URL: https://issues.apache.org/jira/browse/HELIX-596
> Project: Apache Helix
>  Issue Type: Bug
>  Components: helix-core
>Affects Versions: 0.6.4
>Reporter: Hang Qi
> Fix For: master
>
>
> We found some very strange controller behavior in message throttling when 
> there are multiple constraints. Here is our setup (we are using helix-0.6.4, 
> with only one resource):
>   - constraint 1: a per-node constraint; we only allow 3 state transitions 
> to happen on one node concurrently.
>   - constraint 2: a per-partition constraint; we define the state transition 
> priorities in the state model, and only allow one state transition to happen 
> on a single partition concurrently.
> We are using the MasterSlave state model. Suppose we have two nodes, A and B, 
> each with 8 partitions (p0-p7). Initially both A and B are shut down, and 
> we start them at the same time (say A slightly earlier than B).
> The expected behavior might be:
>   - p0, p1, p2 on A start Offline -> Slave; p3, p4, p5 on B start 
> Offline -> Slave
> But the actual result is:
>   - p0, p1, p2 on A start Offline -> Slave; nothing happens on B
>   - only after p0, p1, p2 have all transitioned to Master do p3, p4, p5 on A 
> start Offline -> Slave, and p0, p1, p2 on B start Offline -> Slave
> Since the Offline -> Slave step can take a long time, this behavior makes it 
> take very long to bring up these two nodes (and long downtime in turn means a 
> long catch-up time), although ideally we should not let both nodes go down at 
> the same time.
> Looking at the controller code, I like the stage- and pipeline-based 
> implementation; it is well designed and very easy to understand and reason 
> about.
> The logic of MessageThrottleStage#throttle is:
>   - it goes through each message selected by MessageSelectionStage;
>   - for each message, it goes through all of its matched constraints and 
> decreases the quota of each constraint;
>  - if any constraint's quota is less than 0, the message is marked 
> as throttled.
>  
> I think something is wrong here: a message takes the quota of its 
> constraints even if it is not going to be sent out (throttled). That explains 
> our case:
>   - all the messages are generated at the beginning: (p0, A, 
> Offline->Slave), ... (p7, A, Offline->Slave), (p0, B, Offline->Slave), ..., 
> (p7, B, Offline->Slave)
>   - in MessageThrottleStage#throttle:
> - (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A, 
> Offline->Slave) pass; constraint 1 on A reaches 0, and constraint 2 on 
> p0, p1, p2 reaches 0 as well
> - (p3, A, Offline->Slave), ... (p7, A, Offline->Slave) are throttled by 
> constraint 1 on A, but still take the quota of constraint 2 on those 
> partitions.
> - (p0, B, Offline->Slave), ... (p7, B, Offline->Slave) are throttled by 
> constraint 2
> - thus only (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A, 
> Offline->Slave) are sent out by the controller.
> Does that make sense, or can you think of anything else that could cause 
> this unexpected behavior? And is there any workaround for it? One thing that 
> comes to mind is to update constraint 2 so that only one state transition is 
> allowed on a single partition for certain state transitions.
> Thanks very much.





[jira] [Commented] (HELIX-600) Task scheduler fails to schedule a recurring workflow if the startTime is set to a future timestamp

2015-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579731#comment-14579731
 ] 

ASF GitHub Bot commented on HELIX-600:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/29

[HELIX-600] Task scheduler fails to schedule a recurring workflow if the 
startTime is set to a future timestamp.

Ticket: https://issues.apache.org/jira/browse/HELIX-600

mvn test passed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/29.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #29


commit a84c9b1f55cb5c01f2c39fb437c6d4effcee3874
Author: Lei Xia 
Date:   2015-06-09T21:40:32Z

[HELIX-600] Task scheduler fails to schedule a recurring workflow if the 
startTime is set to a future timestamp.




> Task scheduler fails to schedule a recurring workflow if the startTime is set 
> to a future timestamp
> ---
>
> Key: HELIX-600
> URL: https://issues.apache.org/jira/browse/HELIX-600
> Project: Apache Helix
>  Issue Type: Bug
>Affects Versions: 0.6.3, 0.6.4
>Reporter: Karthiek
>Assignee: Lei Xia
>
> If we define a recurrent job queue with a start-time value in the future (say 
> current time + 5 minutes), Helix does not schedule the queue even after the 
> start-time timestamp elapses. Helix should schedule jobs once the recurrence 
> timestamp is hit.
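The expected behavior amounts to a simple rule: if the start time is still in the future, the first run is at the start time; otherwise, the next run is the first start + k * interval that is not in the past. A minimal sketch, assuming a fixed interval in milliseconds; `RecurrenceSketch.nextFireTime` is hypothetical and is not the change made in pull request #29.

```java
// Hypothetical helper, not part of the Helix task framework. All values are
// epoch milliseconds; the schedule fires at startTime + k * intervalMs, k >= 0.
final class RecurrenceSketch {
  static long nextFireTime(long startTime, long intervalMs, long now) {
    if (intervalMs <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    if (now <= startTime) {
      // Start time is in the future (or exactly now): the first run is the start time.
      return startTime;
    }
    // Start time has passed: round up to the next multiple of the interval.
    long elapsed = now - startTime;
    long periods = (elapsed + intervalMs - 1) / intervalMs;
    return startTime + periods * intervalMs;
  }
}
```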





[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593786#comment-14593786
 ] 

ASF GitHub Bot commented on HELIX-601:
--

GitHub user jicongrui opened a pull request:

https://github.com/apache/helix/pull/30

[HELIX-601] Allow work flow to schedule dependency jobs in parallel

Currently, Helix won't schedule dependent jobs in the same workflow in parallel. 
For example, if Job2 depends on Job1, Job2 won't be scheduled until every 
partition of Job1 is completed.
However, if a participant is very slow, all dependent jobs end up 
waiting for that single participant.
Helix should be able to schedule multiple jobs according to a parameter.
A.C. (acceptance criteria):
1. Introduce a parallel count parameter in workflows and job queues.
2. Dependent jobs can be scheduled according to the parameter (for now the 
parameter is always 1, so there is no parallelism).
3. If Job2 depends on Job1, Job1 is scheduled before Job2.
4. No parallel jobs on the same instance. If an instance is running Job1, it 
won't run Job2 until Job1 is finished.
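Criteria 2 and 3 can be read as a scheduling gate. The sketch below is one possible interpretation, not the pull request's code: it assumes the parameter caps how many jobs in a dependency chain may be incomplete at once, so a value of 1 reproduces the current behavior. `ParallelGateSketch` and `AncestorState` are hypothetical names; Helix's own `TaskState` enum is not used.

```java
import java.util.Collection;

/** Hypothetical ancestor states for this sketch (not Helix's TaskState). */
enum AncestorState { NOT_STARTED, IN_PROGRESS, COMPLETED }

final class ParallelGateSketch {
  private ParallelGateSketch() {}

  /**
   * Decides whether a job may be scheduled, given the states of its ancestor
   * jobs. With parallelCount == 1 every ancestor must already be complete.
   */
  static boolean canSchedule(Collection<AncestorState> ancestors, int parallelCount) {
    if (parallelCount < 1) {
      throw new IllegalArgumentException("parallelCount must be >= 1");
    }
    int notStartedCount = 0;
    int incompleteCount = 0;
    for (AncestorState s : ancestors) {
      if (s == AncestorState.NOT_STARTED) notStartedCount++;
      if (s != AncestorState.COMPLETED) incompleteCount++;
    }
    // Criterion 3: an ancestor that has not been scheduled yet blocks this job.
    if (notStartedCount > 0) return false;
    // Criterion 2: incomplete ancestors plus this job must fit within parallelCount.
    return incompleteCount < parallelCount;
  }
}
```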

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jicongrui/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #30


commit 8819220738b18c54652e4b32b9677ea78d585da2
Author: Congrui Ji 
Date:   2015-06-19T18:51:19Z

[HELIX-601] Allow work flow to schedule dependency jobs in parallel

Currently, Helix won't schedule dependent jobs in the same workflow in parallel. 
For example, if Job2 depends on Job1, Job2 won't be scheduled until every 
partition of Job1 is completed.
However, if a participant is very slow, all dependent jobs end up 
waiting for that single participant.
Helix should be able to schedule multiple jobs according to a parameter.
A.C. (acceptance criteria):
1. Introduce a parallel count parameter in workflows and job queues.
2. Dependent jobs can be scheduled according to the parameter (for now the 
parameter is always 1, so there is no parallelism).
3. If Job2 depends on Job1, Job1 is scheduled before Job2.
4. No parallel jobs on the same instance. If an instance is running Job1, it 
won't run Job2 until Job1 is finished.




> Allow work flow to schedule dependency jobs in parallel
> ---
>
> Key: HELIX-601
> URL: https://issues.apache.org/jira/browse/HELIX-601
> Project: Apache Helix
>  Issue Type: New Feature
>Reporter: Congrui Ji
>
> Currently, Helix won't schedule dependent jobs in the same workflow in 
> parallel. For example, if Job2 depends on Job1, Job2 won't be scheduled until 
> every partition of Job1 is completed.
> However, if a participant is very slow, all dependent jobs end up 
> waiting for that single participant.
> Helix should be able to schedule multiple jobs according to a parameter.
> A.C. (acceptance criteria):
> 1. Introduce a parallel count parameter in workflows and job queues.
> 2. Dependent jobs can be scheduled according to the parameter (for now the 
> parameter is always 1, so there is no parallelism).
> 3. If Job2 depends on Job1, Job1 is scheduled before Job2.
> 4. No parallel jobs on the same instance. If an instance is running Job1, it 
> won't run Job2 until Job1 is finished. 





[jira] [Commented] (HELIX-600) Task scheduler fails to schedule a recurring workflow if the startTime is set to a future timestamp

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595340#comment-14595340
 ] 

ASF GitHub Bot commented on HELIX-600:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/29




[jira] [Commented] (HELIX-600) Task scheduler fails to schedule a recurring workflow if the startTime is set to a future timestamp

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595339#comment-14595339
 ] 

ASF GitHub Bot commented on HELIX-600:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/29#issuecomment-113996929
  
Merged -- thanks!




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595344#comment-14595344
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/30#discussion_r32904755
  
--- Diff: 
helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerParallel.java
 ---
@@ -0,0 +1,195 @@
+package org.apache.helix.integration.task;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.AccessOption;
+import org.apache.helix.HelixDataAccessor;
+import org.apache.helix.HelixManager;
+import org.apache.helix.HelixManagerFactory;
+import org.apache.helix.InstanceType;
+import org.apache.helix.PropertyKey;
+import org.apache.helix.TestHelper;
+import org.apache.helix.integration.ZkIntegrationTestBase;
+import org.apache.helix.integration.manager.ClusterControllerManager;
+import org.apache.helix.integration.manager.MockParticipantManager;
+import org.apache.helix.participant.StateMachineEngine;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobContext;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.Task;
+import org.apache.helix.task.TaskCallbackContext;
+import org.apache.helix.task.TaskConstants;
+import org.apache.helix.task.TaskDriver;
+import org.apache.helix.task.TaskFactory;
+import org.apache.helix.task.TaskPartitionState;
+import org.apache.helix.task.TaskResult;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskStateModelFactory;
+import org.apache.helix.task.TaskUtil;
+import org.apache.helix.task.Workflow;
+import org.apache.helix.tools.ClusterSetup;
+import org.apache.helix.tools.ClusterStateVerifier;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskRebalancerParallel extends ZkIntegrationTestBase {
--- End diff --

Apache license header is missing




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595347#comment-14595347
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/30#discussion_r32904781
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java ---
@@ -134,14 +134,22 @@ public ResourceAssignment 
computeBestPossiblePartitionState(ClusterDataCache clu
   workflowCtx.setStartTime(System.currentTimeMillis());
 }
 
-// Check parent dependencies
-for (String parent : 
workflowCfg.getJobDag().getDirectParents(resourceName)) {
-  if (workflowCtx.getJobState(parent) == null
-  || !workflowCtx.getJobState(parent).equals(TaskState.COMPLETED)) 
{
-return emptyAssignment(resourceName, currStateOutput);
+// check ancestor job status
+int unStartCount = 0;
--- End diff --

Please rename to `notStartedCount` and `incompleteCount`




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595348#comment-14595348
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on a diff in the pull request:

https://github.com/apache/helix/pull/30#discussion_r32904867
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java ---
@@ -219,6 +227,32 @@ public ResourceAssignment 
computeBestPossiblePartitionState(ClusterDataCache clu
 return newAssignment;
   }
 
+  private Set getWorkflowAssignedInstances(String currentJobName,
--- End diff --

Method name should indicate that the returned value does not consider the 
current job.
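One way to make the exclusion explicit in the name is sketched below. `getInstancesAssignedToOtherJobs` and its map-based input are hypothetical illustrations, not the code in the pull request.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BusyInstancesSketch {
  private BusyInstancesSketch() {}

  /**
   * Collects instances running tasks for any job in the workflow other than
   * currentJobName. The current job's own instances are deliberately left out,
   * and the method name says so.
   *
   * @param runningInstancesByJob job name -> instances currently running its tasks
   */
  static Set<String> getInstancesAssignedToOtherJobs(
      String currentJobName, Map<String, Set<String>> runningInstancesByJob) {
    Set<String> busy = new HashSet<>();
    for (Map.Entry<String, Set<String>> e : runningInstancesByJob.entrySet()) {
      if (!e.getKey().equals(currentJobName)) {
        busy.addAll(e.getValue());
      }
    }
    return Collections.unmodifiableSet(busy);
  }
}
```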




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595359#comment-14595359
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-113998619
  
This feels like a hack. If A depends on B, then A should never run before 
B. If it is acceptable for A and B to run in parallel, then A should not depend 
on B.




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596247#comment-14596247
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user jicongrui commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-114181050
  
This is kind of hacky, but Helix has no better way to handle it.
The request is something between totally unordered (a workflow with no 
dependencies) and totally ordered (a job DAG, where job2 can't run until job1 
completes).
The request is that job1 be scheduled before job2, and that job2 can be 
scheduled even if some participants are stuck on job1.




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596328#comment-14596328
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user jicongrui commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-114198676
  
Updated the pull request based on the comments.




[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596405#comment-14596405
 ] 

ASF GitHub Bot commented on HELIX-599:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/31

[HELIX-599] Support creating/maintaining/routing resources with same names 
in different instance groups.


More details on the problems and our proposed solution are in the JIRA 
description: https://issues.apache.org/jira/browse/HELIX-599

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/31.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #31


commit 2f88e070fb698c1420873c1bffa63640638de1ba
Author: Lei Xia 
Date:   2015-05-11T17:54:27Z

[HELIX-599] Support creating/maintaining/routing resources with same names 
in different instance groups.




> Support creating/maintaining/routing resources with same names in different 
> instance groups
> ---
>
> Key: HELIX-599
> URL: https://issues.apache.org/jira/browse/HELIX-599
> Project: Apache Helix
>  Issue Type: New Feature
>  Components: helix-core, helix-webapp-admin
>Reporter: Lei Xia
>Assignee: Lei Xia
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> At LinkedIn, we have a new use case in which multiple databases with the same 
> name sit in the same Helix cluster, but on different instance groups. What we 
> need is:
>  1) Allow resources (databases) with the same name, where these resources are 
> on different instance groups (with different tags).
>  2) The routing table (Spectator) is able to aggregate and return all instances 
> (from multiple instance groups) that hold the database with a given name.
> Our proposed solution is:
>  1) Add a "Resource Group" field in IdealState for the databases with the 
> same name from different instance groups.
>  2) Use the Instance Group Tag (or a new "Resource Tag") to differentiate 
> databases (with the same name) from different instance groups.
>  3) Use name mangling for the IdealState; for example, with database TestDB in 
> instance group "testGroup", the IdealState and ExternalView id would be 
> "TestDB$testGroup". 
>  4) Change the Helix routing table to be able to aggregate databases from the 
> same resource group.
>  
> Four new APIs are going to be added to RoutingTableProvider:
> public class RoutingTableProvider {
>  
> /**
>  * Returns the instances that contain the given partition in a specific state,
>  * from all resources with the given resource name.
>  */
> public List<InstanceConfig> getInstances(String resource, String partition, String state);
>  
> /**
>  * Returns the instances that contain the given partition in a specific state,
>  * from the selected resources with the given name and tags.
>  */
> public List<InstanceConfig> getInstances(String resource, String partition, String state,
>     List<String> resourceTags);
>  
> /**
>  * Returns the instances that contain the given resource in a specific state.
>  */
> public Set<InstanceConfig> getInstances(String resource, String state);
>  
> /**
>  * Returns the instances that contain the given resource, with the given tags,
>  * in a specific state.
>  */
> public Set<InstanceConfig> getInstances(String resource, String state,
>     List<String> groupTags);
> }
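A sketch of how the name mangling in step 3 and the aggregation in step 4 could fit together. It assumes the `$` separator from the TestDB$testGroup example, treats the suffix after `$` as the group tag, assumes resource names themselves contain no `$`, and models each external view as a nested map. `ResourceGroupRoutingSketch` is hypothetical, and instances are plain strings rather than `InstanceConfig` objects.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ResourceGroupRoutingSketch {
  private static final String SEPARATOR = "$";

  // mangled resource id -> partition -> instance -> state
  private final Map<String, Map<String, Map<String, String>>> views = new HashMap<>();

  static String mangle(String resourceName, String groupTag) {
    return resourceName + SEPARATOR + groupTag;
  }

  void putState(String resourceName, String groupTag, String partition,
      String instance, String state) {
    views.computeIfAbsent(mangle(resourceName, groupTag), k -> new HashMap<>())
        .computeIfAbsent(partition, k -> new HashMap<>())
        .put(instance, state);
  }

  /** Instances holding the partition in the given state, across all groups. */
  Set<String> getInstances(String resourceName, String partition, String state) {
    return getInstances(resourceName, partition, state, null);
  }

  /** Same as above, restricted to groupTags when it is non-null. */
  Set<String> getInstances(String resourceName, String partition, String state,
      Collection<String> groupTags) {
    Set<String> result = new TreeSet<>();
    String prefix = resourceName + SEPARATOR;
    for (Map.Entry<String, Map<String, Map<String, String>>> e : views.entrySet()) {
      String id = e.getKey();
      if (!id.startsWith(prefix)) continue;              // a different logical resource
      String tag = id.substring(prefix.length());
      if (groupTags != null && !groupTags.contains(tag)) continue;
      Map<String, String> stateMap =
          e.getValue().getOrDefault(partition, Collections.emptyMap());
      for (Map.Entry<String, String> s : stateMap.entrySet()) {
        if (s.getValue().equals(state)) result.add(s.getKey());
      }
    }
    return result;
  }
}
```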





[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609556#comment-14609556
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-117433383
  
This should fail fast if you try to set parallelism on workflows that do 
not have a target resource. Also, what if the target resource has its 
partitions assigned to other instances? I'm not fully convinced that this is 
safe except in the case where the task has a target resource and that target 
resource is assigned to a fixed set of instances.
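To make the reassignment concern concrete, here is a toy model, assuming one node per partition: partition p0 is on node A and p1 on node B, Job 0 has finished p0 but not p1, and then B fails and p1 moves to A. The model mimics a check that only looks at which instances are busy. `InversionSketch` and its inputs are hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Toy model: instance-level busy checks vs. partition-level dependencies. */
final class InversionSketch {
  private InversionSketch() {}

  /**
   * Returns true if an instance-only check would let Job 1 run some partition
   * whose Job 0 task has not completed, i.e. a dependency inversion.
   *
   * @param partitionToNode current assignment of partitions to nodes
   * @param job0Completed   partitions for which Job 0 has finished
   * @param busyNodes       nodes still running Job 0 tasks
   */
  static boolean job1WouldInvert(Map<String, String> partitionToNode,
      Set<String> job0Completed, Set<String> busyNodes) {
    for (Map.Entry<String, String> e : partitionToNode.entrySet()) {
      boolean instanceCheckAllows = !busyNodes.contains(e.getValue());
      if (instanceCheckAllows && !job0Completed.contains(e.getKey())) {
        return true;
      }
    }
    return false;
  }
}
```

Before the failure, p1 sits on the busy node B, so the check holds. After B fails, p1 lands on A, which is no longer running Job 0, so Job 1 would start p1 before Job 0 finished it.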




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-06-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609577#comment-14609577
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user jicongrui commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-117438877
  
The assignment follows the same logic as before, and it can be considered 
a black box whose input is a job and whose output is a task assignment.
So this diff only checks the output (the task assignment) and removes busy 
instances from it.

E.g. if the target resource moves to a different set of 
instances, the task assignment would contain no busy instances, so job2 can be 
executed on any of the new instances.
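The post-filtering step described above can be sketched as follows, assuming the black-box assignment is a map from task partition to instance and the busy set is computed elsewhere. `AssignmentFilterSketch` is a hypothetical name, not the pull request's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AssignmentFilterSketch {
  private AssignmentFilterSketch() {}

  /**
   * Takes the unchanged assignment logic's output (task partition -> instance)
   * and drops every entry whose instance is busy with another job. Dropped
   * entries are simply not scheduled in this pass.
   */
  static Map<String, String> removeBusyInstances(
      Map<String, String> proposedAssignment, Set<String> busyInstances) {
    Map<String, String> filtered = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : proposedAssignment.entrySet()) {
      if (!busyInstances.contains(e.getValue())) {
        filtered.put(e.getKey(), e.getValue());
      }
    }
    return filtered;
  }
}
```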




[jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613456#comment-14613456
 ] 

ASF GitHub Bot commented on HELIX-601:
--

Github user kanakb commented on the pull request:

https://github.com/apache/helix/pull/30#issuecomment-118425221
  
1. Let's say we have a target resource with 2 partitions and 0 replicas, 
with one partition assigned to node A, and one partition assigned to node B. 
Job 0 runs on nodes A and B, it finishes on node A, and then Job 1 starts on 
node A. Then imagine node B fails, and both partitions are now on node A. Job 1 
is running on node A, but Job 0 did not finish for the partition that was 
reassigned to node A. We have a dependency inversion, and that's why this is 
unsafe.

2. If the job does not have a target resource, this change doesn't make 
sense. An exception should be thrown if you attempt to submit an untargeted 
workflow that has parallelism set.
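A fail-fast check along the lines of point 2 could look like this. It is a sketch, assuming "parallelism set" means a parallel count above 1 and that an untargeted job has a null or empty target resource; `ParallelismValidationSketch` and its parameters are hypothetical, not `JobConfig` fields.

```java
/** Hypothetical submission-time validation for parallel job scheduling. */
final class ParallelismValidationSketch {
  private ParallelismValidationSketch() {}

  static void validate(String workflowName, int parallelCount, String targetResource) {
    if (parallelCount < 1) {
      throw new IllegalArgumentException(
          workflowName + ": parallel count must be at least 1, got " + parallelCount);
    }
    boolean untargeted = targetResource == null || targetResource.isEmpty();
    if (parallelCount > 1 && untargeted) {
      // Reject at submission time rather than misbehaving later in the rebalancer.
      throw new IllegalArgumentException(
          workflowName + ": parallelism requires a target resource");
    }
  }
}
```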




[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613459#comment-14613459
 ] 

ASF GitHub Bot commented on HELIX-599:
--

Github user kishoreg commented on a diff in the pull request:

https://github.com/apache/helix/pull/31#discussion_r33881022
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
 ---
@@ -127,7 +127,9 @@ public void process(ClusterEvent event) throws 
Exception {
 Message message =
 createMessage(manager, resourceName, 
partition.getPartitionName(), instanceName,
 currentState, nextState, 
sessionIdMap.get(instanceName), stateModelDef.getId(),
-resource.getStateModelFactoryname(), bucketSize);
--- End diff --

Can we simply pass in the entire resource and let createMessage fetch the 
required attributes from it?




[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613460#comment-14613460
 ] 

ASF GitHub Bot commented on HELIX-599:
--

Github user kishoreg commented on a diff in the pull request:

https://github.com/apache/helix/pull/31#discussion_r33881054
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
 ---
@@ -190,7 +192,8 @@ public void process(ClusterEvent event) throws 
Exception {
 
   private Message createMessage(HelixManager manager, String resourceName, 
String partitionName,
   String instanceName, String currentState, String nextState, String 
sessionId,
-  String stateModelDefName, String stateModelFactoryName, int 
bucketSize) {
--- End diff --

This method has too many parameters; we should just pass in the resource or 
use a message builder class.
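A message builder of the kind suggested here might look like the sketch below. `TransitionMessageSpec` is a hypothetical parameter object mirroring the arguments of `createMessage` shown in the diff (minus the manager); it is not Helix's `Message` API, and which fields are required is an assumption.

```java
import java.util.Objects;

/** Hypothetical parameter object replacing a long positional argument list. */
final class TransitionMessageSpec {
  final String resourceName;
  final String partitionName;
  final String instanceName;
  final String fromState;
  final String toState;
  final String sessionId;
  final String stateModelDefName;
  final String stateModelFactoryName; // optional, may be null
  final int bucketSize;               // optional, defaults to 0

  private TransitionMessageSpec(Builder b) {
    // Required fields fail fast with a named NullPointerException.
    resourceName = Objects.requireNonNull(b.resourceName, "resourceName");
    partitionName = Objects.requireNonNull(b.partitionName, "partitionName");
    instanceName = Objects.requireNonNull(b.instanceName, "instanceName");
    fromState = Objects.requireNonNull(b.fromState, "fromState");
    toState = Objects.requireNonNull(b.toState, "toState");
    sessionId = Objects.requireNonNull(b.sessionId, "sessionId");
    stateModelDefName = Objects.requireNonNull(b.stateModelDefName, "stateModelDefName");
    stateModelFactoryName = b.stateModelFactoryName;
    bucketSize = b.bucketSize;
  }

  static final class Builder {
    private String resourceName;
    private String partitionName;
    private String instanceName;
    private String fromState;
    private String toState;
    private String sessionId;
    private String stateModelDefName;
    private String stateModelFactoryName;
    private int bucketSize;

    Builder resource(String v) { resourceName = v; return this; }
    Builder partition(String v) { partitionName = v; return this; }
    Builder instance(String v) { instanceName = v; return this; }
    Builder fromState(String v) { fromState = v; return this; }
    Builder toState(String v) { toState = v; return this; }
    Builder sessionId(String v) { sessionId = v; return this; }
    Builder stateModelDef(String v) { stateModelDefName = v; return this; }
    Builder stateModelFactory(String v) { stateModelFactoryName = v; return this; }
    Builder bucketSize(int v) { bucketSize = v; return this; }

    TransitionMessageSpec build() { return new TransitionMessageSpec(this); }
  }
}
```

Adding a new attribute later (as this pull request does) then means one new builder method instead of another positional parameter at every call site.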


> Support creating/maintaining/routing resources with same names in different 
> instance groups
> ---
>
> Key: HELIX-599
> URL: https://issues.apache.org/jira/browse/HELIX-599
> Project: Apache Helix
>  Issue Type: New Feature
>  Components: helix-core, helix-webapp-admin
>Reporter: Lei Xia
>Assignee: Lei Xia
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> At LinkedIn, we have a new use case in which multiple databases with the same 
> name sit in the same Helix cluster, but on different instance groups. 
> What we need is:
>  1) Allow resources (databases) with the same name, where these resources are 
> on different instance groups (with different tags).
>  2) The routing table (Spectator) is able to aggregate and return all instances 
> (from multiple instance groups) that hold the database with a given name.
> Our proposed solution is:
>  1) Add a "Resource Group" field in IdealState for the databases with the 
> same name from different instance groups.
>  2) Use the Instance Group Tag (or a new "Resource Tag") to differentiate databases 
> (with the same name) from different instance groups.
>  3) Use name mangling for IdealState; for example, with database TestDB in 
> instance group "testGroup", the IdealState and ExternalView id would be 
> "TestDB$testGroup". 
>  4) Change the Helix routing table to be able to aggregate databases from the 
> same resource group.
>  
> Four new APIs are going to be added to RoutingTableProvider:
> public class RoutingTableProvider {
>  
> /**
>  * returns the instances that contain the given partition in a specific state 
> from all resources with given resource name
>  */
> public List getInstances(String resource, String partition, 
> String state);
>  
> /**
>  * returns the instances that contain the given partition in a specific state 
> from selected resources with given name and tags
>  */
> public List getInstances(String resource, String partition, 
> String state, List resourceTags);
>  
> /**
>  * returns instances that contain given resource that are in a specific state
>  */
> public Set getInstances(String resource, String state);
>  
> /**
>  * returns instances that contain given resource with tags that are in a 
> specific state
>  */
> public Set getInstances(String resource, String state,  
> List groupTags);
> }
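The name-mangling convention in item 3 can be sketched as a small helper. Only the `$` separator and the `TestDB$testGroup` example come from the proposal; `ResourceNames`, `mangle`, `resourceGroupOf` and `tagOf` are hypothetical names, and rejecting resource names that already contain `$` is an assumption made so the mapping stays reversible.

```java
// Hypothetical helper for the proposed "<resource>$<groupTag>" id convention.
// Not a Helix API; names are illustrative only.
final class ResourceNames {
  static final String SEPARATOR = "$";

  private ResourceNames() {}

  // Builds the IdealState/ExternalView id for a resource in an instance group.
  static String mangle(String resourceName, String groupTag) {
    if (resourceName.contains(SEPARATOR)) {
      // Assumed restriction: otherwise the id could not be split unambiguously.
      throw new IllegalArgumentException("resource name must not contain " + SEPARATOR);
    }
    return resourceName + SEPARATOR + groupTag;
  }

  // Recovers the shared resource (group) name; an unmangled id is returned as-is.
  static String resourceGroupOf(String mangledId) {
    int i = mangledId.indexOf(SEPARATOR);
    return i < 0 ? mangledId : mangledId.substring(0, i);
  }

  // Recovers the instance-group tag, or null if the id is not mangled.
  static String tagOf(String mangledId) {
    int i = mangledId.indexOf(SEPARATOR);
    return i < 0 ? null : mangledId.substring(i + 1);
  }
}
```

Because the separator is split at its first occurrence, a tag may itself contain `$`, but a resource name may not.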





[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613461#comment-14613461
 ] 

ASF GitHub Bot commented on HELIX-599:
--

Github user kishoreg commented on a diff in the pull request:

https://github.com/apache/helix/pull/31#discussion_r33881099
  
--- Diff: helix-core/src/main/java/org/apache/helix/model/IdealState.java ---
@@ -55,7 +55,9 @@
 MAX_PARTITIONS_PER_INSTANCE,
 INSTANCE_GROUP_TAG,
 REBALANCER_CLASS_NAME,
-HELIX_ENABLED
+HELIX_ENABLED,
+RESOURCE_GROUP_NAME,
+RESOURCE_GROUP_ENABLED
--- End diff --

Why do we need the ResourceGroupEnabled flag? Would things work as expected if 
there were only a resourceGroupName, and by default we set resourceGroupName to 
resourceName?
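A sketch of the suggested default, assuming the group name is stored as a simple string field keyed by `RESOURCE_GROUP_NAME` (the enum constant from the diff). `IdealStateSketch` and `isGrouped` are hypothetical; this is not the Helix `IdealState` class, only an illustration of how the enabled flag could be derived instead of stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IdealState; the simple-field map loosely mimics
// how properties are kept as strings. Not the real Helix class.
final class IdealStateSketch {
  private static final String RESOURCE_GROUP_NAME = "RESOURCE_GROUP_NAME";

  private final String resourceName;
  private final Map<String, String> simpleFields = new HashMap<>();

  IdealStateSketch(String resourceName) {
    this.resourceName = resourceName;
  }

  void setResourceGroupName(String groupName) {
    simpleFields.put(RESOURCE_GROUP_NAME, groupName);
  }

  // Falls back to the resource's own name, so an ungrouped resource is
  // simply a group of one.
  String getResourceGroupName() {
    return simpleFields.getOrDefault(RESOURCE_GROUP_NAME, resourceName);
  }

  // Derived rather than stored: no separate RESOURCE_GROUP_ENABLED field.
  boolean isGrouped() {
    return !getResourceGroupName().equals(resourceName);
  }
}
```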




[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613462#comment-14613462
 ] 

ASF GitHub Bot commented on HELIX-599:
--

Github user kishoreg commented on a diff in the pull request:

https://github.com/apache/helix/pull/31#discussion_r33881174
  
--- Diff: helix-core/src/main/java/org/apache/helix/spectator/RoutingTableProvider.java ---
@@ -73,6 +75,73 @@ public RoutingTableProvider() {
   }
 
   /**
+   * returns the instances for {resource,partition} pair that are in a specific {state} if
+   * aggregateGrouping is turned on, find all resources belongs to the given resourceGroupName and
+   * aggregate all partition states from all these resources.
+   *
+   * @param resourceName
+   * @param partitionName
+   * @param state
+   * @param groupingEnabled
+   *
+   * @return empty list if there is no instance in a given state
+   */
+  public List getInstances(String resourceName, String partitionName, String state,
--- End diff --

Having a boolean here does not make sense. We should probably have 
getInstancesForResource and getInstancesForResourceGroup.
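A sketch of the two-method shape suggested above, using a hypothetical in-memory table instead of ExternalViews. It assumes each resource records the resource group it belongs to and that members of a group share partition names (e.g. `TestDB_0`); instances are plain strings, and `RoutingTableSketch` and `add` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical routing table: two explicit lookups instead of one method
// with a boolean switch. Not the Helix RoutingTableProvider.
final class RoutingTableSketch {
  // resource (e.g. "TestDB$groupA") -> partition -> state -> instances
  private final Map<String, Map<String, Map<String, List<String>>>> table = new HashMap<>();
  // resource -> resource group (e.g. "TestDB")
  private final Map<String, String> groupOf = new HashMap<>();

  void add(String resource, String group, String partition, String state, String instance) {
    groupOf.put(resource, group);
    table.computeIfAbsent(resource, r -> new HashMap<>())
        .computeIfAbsent(partition, p -> new HashMap<>())
        .computeIfAbsent(state, s -> new ArrayList<>())
        .add(instance);
  }

  // Instances of exactly one resource; empty list if none are in the state.
  List<String> getInstancesForResource(String resource, String partition, String state) {
    Map<String, Map<String, List<String>>> partitions =
        table.getOrDefault(resource, Collections.emptyMap());
    Map<String, List<String>> states = partitions.getOrDefault(partition, Collections.emptyMap());
    return Collections.unmodifiableList(states.getOrDefault(state, Collections.emptyList()));
  }

  // Aggregates the same partition/state across every resource in the group.
  // Result order is unspecified because groupOf is a HashMap.
  List<String> getInstancesForResourceGroup(String group, String partition, String state) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : groupOf.entrySet()) {
      if (e.getValue().equals(group)) {
        result.addAll(getInstancesForResource(e.getKey(), partition, state));
      }
    }
    return result;
  }
}
```

Keeping the lookups separate makes the caller's intent explicit at the call site, and the single-resource path never scans other resources.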




[jira] [Commented] (HELIX-599) Support creating/maintaining/routing resources with same names in different instance groups

2015-07-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613486#comment-14613486
 ] 

ASF GitHub Bot commented on HELIX-599:
--

Github user kishoreg commented on a diff in the pull request:

https://github.com/apache/helix/pull/31#discussion_r33882177
  
--- Diff: helix-core/src/main/java/org/apache/helix/model/IdealState.java ---
@@ -536,4 +574,16 @@ public boolean isEnabled() {
   public void enable(boolean enabled) {
 _record.setSimpleField(IdealStateProperty.HELIX_ENABLED.name(), Boolean.toString(enabled));
   }
+
+  /**
+   * Get the mangled IdealState name if resourceGroup is enable.
+   *
+   * @param resourceName
+   * @param resourceTag
+   *
+   * @return
+   */
+  public static String getIdealStateName(String resourceName, String resourceTag) {
--- End diff --

Why do we need this method? Couldn't this convention be handled entirely on the 
client side?

