Re: [VOTE] Apache Helix 0.8.2 Release

2018-07-28 Thread Hunter Lee
+1

On Wed, Jul 25, 2018 at 4:46 PM, Wang Jiajun  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 0.8.2. This is the 14th release of Helix as an Apache project, as
> well as the 10th release as a top-level Apache project.
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> Release notes:
> http://helix.apache.org/0.8.2-docs/releasenotes/release-0.8.2.html
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1018
>
> Distribution:
> * binaries:
> https://dist.apache.org/repos/dist/dev/helix/0.8.2/binaries/
> * sources:
> https://dist.apache.org/repos/dist/dev/helix/0.8.2/src/
>
> The 0.8.2 release tag:
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.8.2
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>


Re: Warn messages in controller log

2018-11-27 Thread Hunter Lee
Hi Dlmuthu,

Could you give us a bit more context? What kind of operations are you doing
and what are the characteristics (config settings) for these workflows?

On a cursory look, it seems that the workflow
"Workflow_of_process_PROCESS_4660666a-6acf-477e-bf97-0671b666dd19-PRE-6a568552-f517-49a1-94c8-c1d3814c3244"'s
WorkflowConfig is missing even though its IdealState exists. A few possible
scenarios are as follows:

1. A purge or delete operation was performed on this particular workflow: the
WorkflowConfig was removed successfully, but the IdealState removal failed.
This causes WorkflowRebalancer to print the log line. This is not very
serious; you'll just see a lot of WARN logs. The fix is to force-delete the
IdealState that failed to be deleted.
2. Some fundamental assumption of Helix was broken (for example, the
CLUSTER/CONFIG/RESOURCE folder was removed and re-created), which caused the
CallbackHandler to be removed. This will break the Helix Controller, and no
workflows will be scheduled. In this case, you have to ensure the cluster
is set up properly and restart the Controller.
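
For the first scenario, a rough sketch of force-deleting the stale IdealState via Helix's data accessor might look like the following. This is untested and requires a live cluster; the ZK address, cluster name, and workflow name are placeholders:

```java
import org.apache.helix.HelixDataAccessor;
import org.apache.helix.PropertyKey;
import org.apache.helix.ZNRecord;
import org.apache.helix.manager.zk.ZKHelixDataAccessor;
import org.apache.helix.manager.zk.ZkBaseDataAccessor;

public class StaleIdealStateCleanup {
  public static void main(String[] args) {
    String zkAddr = "localhost:2181";            // placeholder
    String cluster = "MyCluster";                // placeholder
    String workflow = "Workflow_of_process_..."; // the workflow whose config is gone

    HelixDataAccessor accessor = new ZKHelixDataAccessor(
        cluster, new ZkBaseDataAccessor<ZNRecord>(zkAddr));
    // Removes /<cluster>/IDEALSTATES/<workflow> so WorkflowRebalancer stops warning
    PropertyKey key = accessor.keyBuilder().idealStates(workflow);
    accessor.removeProperty(key);
  }
}
```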

Hope this helps,
Hunter

On Tue, Nov 20, 2018 at 7:12 AM DImuthu Upeksha 
wrote:

> Hi Folks,
>
> I'm seeing a set of log lines that are continuously printed in the controller,
> with some errors in between. Do you have an explanation for this?
>
> 2018-11-20 07:53:44,127 [GenericHelixController-event_process] WARN
> o.a.h.task.WorkflowRebalancer  - Workflow configuration is NULL for
>
> Workflow_of_process_PROCESS_4660666a-6acf-477e-bf97-0671b666dd19-PRE-6a568552-f517-49a1-94c8-c1d3814c3244
> 2018-11-20 07:53:44,662 [GenericHelixController-event_process] WARN
> o.a.h.task.WorkflowRebalancer  - Workflow configuration is NULL for
>
> Workflow_of_process_PROCESS_4660666a-6acf-477e-bf97-0671b666dd19-PRE-6a568552-f517-49a1-94c8-c1d3814c3244
> 2018-11-20 07:53:45,215 [GenericHelixController-event_process] WARN
> o.a.h.task.WorkflowRebalancer  - Workflow configuration is NULL for
>
> Workflow_of_process_PROCESS_4660666a-6acf-477e-bf97-0671b666dd19-PRE-6a568552-f517-49a1-94c8-c1d3814c3244
> 2018-11-20 07:53:46,709 [GenericHelixController-event_process] WARN
> o.a.h.task.WorkflowRebalancer  - Workflow configuration is NULL for
>
> Workflow_of_process_PROCESS_4660666a-6acf-477e-bf97-0671b666dd19-PRE-6a568552-f517-49a1-94c8-c1d3814c3244
> 2018-11-20 07:53:47,177 [GenericHelixController-event_process] WARN
> o.a.h.task.WorkflowRebalancer  - Workflow configuration is NULL for
>
> Workflow_of_process_PROCESS_85d960c4-d4c5-4308-aa67-eed87adb7eae-POST-5d7ffcdd-aa19-444d-b5e0-a7d8ecf7a450
> 2018-11-20 07:53:47,271 [pool-2-thread-1] ERROR
> o.a.h.m.zk.ZkBaseDataAccessor  - Exception while updating path:
>
> /AiravataDemoCluster/IDEALSTATES/Workflow_of_process_PROCESS_1e38a959-744b-4124-a79b-ff74f0980416-POST-1cdd0eec-1e0a-4f96-b9ca-e40c85365d76
> org.I0Itec.zkclient.exception.ZkInterruptedException:
> java.lang.InterruptedException
> at
>
> org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:943)
> at
>
> org.apache.helix.manager.zk.zookeeper.ZkClient.writeDataReturnStat(ZkClient.java:1141)
> at
>
> org.apache.helix.manager.zk.zookeeper.ZkClient.writeDataGetStat(ZkClient.java:1160)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.doUpdate(ZkBaseDataAccessor.java:262)
> at
>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.update(ZkBaseDataAccessor.java:237)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:190)
> at
>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:170)
> at
>
> org.apache.helix.controller.rebalancer.util.RebalanceScheduler.invokeRebalance(RebalanceScheduler.java:144)
> at
>
> org.apache.helix.controller.rebalancer.util.RebalanceScheduler$RebalanceInvoker.run(RebalanceScheduler.java:129)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InterruptedException: null
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1259)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1226)
> at
> org.I0Itec.zkclient.ZkConnection.writeDataReturnStat(ZkConnection.java:131)
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient$12.call(ZkClient.java:1143)
> at
>
> 

Re: [VOTE] Apache Helix 0.8.3 Release

2018-11-26 Thread Hunter Lee
+1

On Mon, Nov 26, 2018 at 6:09 PM Lei Xia  wrote:

> +1
>
> On Mon, Nov 26, 2018 at 5:48 PM Xue Junkai  wrote:
>
> > Hi,
> >
> >
> > This is to call for a vote on releasing the following candidate as Apache
> > Helix 0.8.3. This is the 15th release of Helix as an Apache project, as
> > well as the 11th release as a top-level Apache project.
> >
> >
> > Apache Helix is a generic cluster management framework that makes it easy
> > to build partitioned and replicated, fault-tolerant and scalable
> > distributed systems.
> >
> >
> > Release notes:
> >
> > https://helix.apache.org/0.8.3-docs/releasenotes/release-0.8.3.html
> >
> >
> > Release artifacts:
> >
> > https://repository.apache.org/content/repositories/orgapachehelix-1021
> >
> >
> > Distribution:
> >
> > * binaries:
> >
> > https://dist.apache.org/repos/dist/dev/helix/0.8.3/binaries/
> >
> > * sources:
> >
> > https://dist.apache.org/repos/dist/dev/helix/0.8.3/src/
> >
> >
> > The 0.8.3 release tag:
> >
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.8.3
> >
> >
> > KEYS file available here:
> >
> > https://dist.apache.org/repos/dist/dev/helix/KEYS
> >
> >
> > Please vote on the release. The vote will be open for at least 72 hours.
> >
> >
> > [+1] -- "YES, release"
> >
> > [0] -- "No opinion"
> >
> > [-1] -- "NO, do not release"
> >
> >
> > Thanks,
> >
> > The Apache Helix Team
> >
>


Re: Proposal: Moving Helix to Java 1.8 and upgrading Maven version

2019-03-27 Thread Hunter Lee
Seeing as we have a majority vote from the PMC as well as the active
committers, we will go ahead and put this into action.

Corresponding work will be scoped out and worked on. A few more details
around how we're going to version this are still to be discussed/decided.

Thanks!
Hunter

On Tue, Mar 26, 2019 at 12:57 PM Wang Jiajun  wrote:

> +1
>
> Best Regards,
> Jiajun
>
>
> On Mon, Mar 25, 2019 at 1:43 PM Lei Xia  wrote:
>
> > +1
> >
> > On Sun, Mar 24, 2019 at 10:56 PM Hunter Lee  wrote:
> >
> > > I would like to start a discussion on making Java 8 a minimum
> requirement
> > > and upgrading the Maven version for Helix's next feature release. I'd
> > like
> > > to see how people feel about it.
> > >
> > > Did some homework on this and dug up a few precedents: other
> > > top-level Apache projects dependent on ZooKeeper. The following
> > > documentation lists many pros of moving to Java 8 as well, many of
> which
> > I
> > > will not include in this email for the sake of brevity (see the links
> > > below).
> > >
> > > Open-source community discussions for
> > >
> > > Apache Samza: link1
> > > <
> > >
> >
> https://mail-archives.apache.org/mod_mbox/samza-dev/201610.mbox/%3CCAHUevGGnOQD_VmLWEdpFNq3Lv%2B6gQQmw_JKx9jDr5Cw%2BxFfGtQ%40mail.gmail.com%3E
> > > >
> > >
> > > Apache Kafka: link1 <https://markmail.org/message/gnrn5ccql7a2pmc5>
> > link2
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7
> > > >
> > >
> > > I've also had informal chats with PMC members of both Samza and Kafka
> > about
> > > this specifically for more context, and from what they said, the
> > transition
> > > has been very smooth.
> > >
> > > Here are Helix-specific reasons why I think the move would be
> beneficial:
> > >
> > > - Other Apache open-source platforms built on Helix such as Pinot and
> > > Gobblin all cite Java 8 as the minimum requirement. Building Helix in
> > Java
> > > 8 will help contributors of Helix respond to feature/debugging requests
> > in
> > > a more timely manner (without having to jump back and forth between
> Java
> > 7
> > > and 8).
> > >
> > > - The recent change in Maven
> > > <
> > >
> >
> https://central.sonatype.org/articles/2018/May/04/discontinued-support-for-tlsv11-and-below/
> > > >
> > > (Central
> > > Repository). Long story short, Helix build using JDK 7 on Maven 3.0.5+
> > will
> > > fail. Using JDK 8 solves this problem.
> > >
> > > The cost of moving to Java 8 is relatively low. Java 7 code is forward
> > > compatible with Java 8. However, there may be some backporting work
> > needed
> > > due to the way Java 8 changed the ConcurrentHashMap implementation.
> > >
> > > As for Maven, Helix's current requirement is 3.0.4, a version
> > > just below the one other dependent Apache projects require (say,
> > > Pinot <https://media.readthedocs.org/pdf/pinot/latest/pinot.pdf>).
> > Again,
> > > to save the contributors the trouble of having to navigate between
> Maven
> > > versions, I am also suggesting that we update this requirement.
> > >
> > >
> > > -Hunter
> > >
> >
>


Re: Sporadic delays in task execution

2019-03-22 Thread Hunter Lee
Let me add a caveat to my previous email. Although it comes with
scalability improvements, there are currently a few known issues with the
latest version. We'd encourage you to check back to make sure your current
usage isn't affected.

Hunter

On Fri, Mar 22, 2019 at 12:35 PM Hunter Lee  wrote:

> No problem. If you have further questions, let us know what kind of load
> you're putting on Helix as well. The newest version of Helix contains Task
> Framework 2.0, and has greater scalability in scheduling tasks, so you
> might want to consider using the newest version as well.
>
> Hunter
>
> On Fri, Mar 22, 2019 at 8:59 AM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com> wrote:
>
>> Hi Lee,
>>
>> Thanks for the trick. I didn't know that we can poke the controller like
>> that :) However now we can see that tasks are moving smoothly in our
>> staging setup. This behavior can be seen from time to time and gets resolved
>> automatically in a few hours. I can't find a particular pattern, but my
>> best guess is that this happens when the load is high. I will put some load
>> on the testing setup, see if I can reproduce this issue, try your
>> instructions, and then get back to you.
>>
>> Thanks
>> Dimuthu
>>
>> On Thu, Mar 21, 2019 at 5:27 PM Hunter Lee  wrote:
>>
>> > Hi Dimuthu,
>> >
>> > What Junkai meant by touching the IdealState is this:
>> >
>> > 1) use Zooinspector to log into ZK
>> > 2) Locate the IDEALSTATES/ path
>> > 3) grab any ZNode under that path and try to modify (just add a
>> > whitespace) and save
>> > 4) This will trigger a ZK callback which should tell Helix Controller to
>> > rebalance/schedule things
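
The four quoted steps above can also be done without Zooinspector, using ZooKeeper's bundled zkCli.sh. A hedged sketch, with the cluster and workflow names as placeholders and a running ZooKeeper assumed:

```shell
# Hypothetical sketch: re-write an IdealState ZNode to fire a data-change
# callback, which prompts the Helix Controller to rebalance/schedule.
bin/zkCli.sh -server localhost:2181
# inside the CLI:
#   get /MyCluster/IDEALSTATES/MyWorkflow        # copy the JSON payload
#   set /MyCluster/IDEALSTATES/MyWorkflow '{...same JSON, plus a space...}'
# the set fires the ZK watcher even if the content is effectively unchanged
```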
>> >
>> > On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <
>> > dimuthu.upeks...@gmail.com> wrote:
>> >
>> >> Hi Junkai,
>> >>
>> >> What do you mean by touching ideal state to trigger an event? I didn't
>> >> quite get what you said. Is that like creating some path in zookeeper?
>> >> Workflows are eventually scheduled but the problem is, it is very slow
>> due
>> >> to that 30s freeze.
>> >>
>> >> Thanks
>> >> Dimuthu
>> >>
>> >> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai 
>> wrote:
>> >>
>> >> > Can you try one thing? Touch the ideal state to trigger an event. If
>> >> > workflows are still not scheduled, it means scheduling has a problem.
>> >> >
>> >> > Best,
>> >> >
>> >> > Junkai
>> >> >
>> >> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <
>> >> > dimuthu.upeks...@gmail.com> wrote:
>> >> >
>> >> >> Hi Junkai,
>> >> >>
>> >> >> We are using 0.8.1
>> >> >>
>> >> >> Dimuthu
>> >> >>
>> >> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai 
>> >> wrote:
>> >> >>
>> >> >> > Hi Dimuthu,
>> >> >> >
>> >> >> > What's the version of Helix you are using?
>> >> >> >
>> >> >> > Best,
>> >> >> >
>> >> >> > Junkai
>> >> >> >
>> >> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <
>> >> >> > dimuthu.upeks...@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Hi Helix Dev,
>> >> >> > >
>> >> >> > > We are again seeing this delay in task execution. Please have a
>> >> look
>> >> >> at
>> >> >> > the
>> >> >> > > screencast [1] of logs printed in participant (top shell) and
>> >> >> controller
>> >> >> > > (bottom shell). When I record this, there were about 90 - 100
>> >> >> workflows
>> >> >> > > pending to be executed. As you can see some tasks were suddenly
>> >> >> executed
>> >> > > and then the participant froze for about 30 seconds before
>> executing
>> >> >> next
>> >> >> > set
>> >> >> > > of tasks. I can see some WARN logs on controller log. I feel
>> like
>> >> >> this 30
>> >> >> > > second delay is some sort of a pattern

Proposal: Moving Helix to Java 1.8 and upgrading Maven version

2019-03-24 Thread Hunter Lee
I would like to start a discussion on making Java 8 a minimum requirement
and upgrading the Maven version for Helix's next feature release. I'd like
to see how people feel about it.

Did some homework on this and dug up a few precedents: other
top-level Apache projects dependent on ZooKeeper. The following
documentation lists many pros of moving to Java 8 as well, many of which I
will not include in this email for the sake of brevity (see the links
below).

Open-source community discussions for

Apache Samza: link1


Apache Kafka: link1  link2


I've also had informal chats with PMC members of both Samza and Kafka about
this specifically for more context, and from what they said, the transition
has been very smooth.

Here are Helix-specific reasons why I think the move would be beneficial:

- Other Apache open-source platforms built on Helix such as Pinot and
Gobblin all cite Java 8 as the minimum requirement. Building Helix in Java
8 will help contributors of Helix respond to feature/debugging requests in
a more timely manner (without having to jump back and forth between Java 7
and 8).

- The recent change in Maven

(Central
Repository). Long story short, Helix build using JDK 7 on Maven 3.0.5+ will
fail. Using JDK 8 solves this problem.

The cost of moving to Java 8 is relatively low. Java 7 code is forward
compatible with Java 8. However, there may be some backporting work needed
due to the way Java 8 changed the ConcurrentHashMap implementation.

As for Maven, Helix's current requirement is 3.0.4, a version
just below the one other dependent Apache projects require (say,
Pinot ). Again,
to save the contributors the trouble of having to navigate between Maven
versions, I am also suggesting that we update this requirement.
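
The two build changes above (compiling for Java 8 and enforcing a newer minimum Maven) could look roughly like this in pom.xml. The plugin choices and version numbers are illustrative assumptions, not the actual Helix change:

```xml
<build>
  <plugins>
    <!-- compile for Java 8 -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <!-- fail fast if the build runs on an older Maven -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <executions>
        <execution>
          <id>enforce-maven</id>
          <goals><goal>enforce</goal></goals>
          <configuration>
            <rules>
              <requireMavenVersion>
                <version>[3.5.0,)</version>
              </requireMavenVersion>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```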


-Hunter


Re: Scaling participants to improve throughput of task execution

2019-04-05 Thread Hunter Lee
Hi Dimuthu -

1. In Task Framework, tasks are units of work that are mutually independent
- that is, Helix will schedule A and B without considering any dependency.
When you said that task A depends on B, what did you mean exactly?
Is Task A blocking (sleeping) until some condition is met by calling some
remote call? Or are these actually jobs where Job A depends on B in the
JobDAG?
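
If it is the latter, a hedged sketch of modeling the dependency as two jobs in one workflow, so that Task Framework honors the ordering. This is untested; the workflow name, job names, and commands are placeholders, and `helixManager` is assumed to be an existing HelixManager connection:

```java
import org.apache.helix.task.JobConfig;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.Workflow;

// "jobA runs after jobB" expressed as a parent-child edge in the JobDAG
Workflow.Builder builder = new Workflow.Builder("myWorkflow"); // placeholder name
JobConfig.Builder jobB = new JobConfig.Builder().setCommand("RunB");
JobConfig.Builder jobA = new JobConfig.Builder().setCommand("RunA");
builder.addJob("jobB", jobB);
builder.addJob("jobA", jobA);
builder.addParentChildDependency("jobB", "jobA"); // jobB is the parent
new TaskDriver(helixManager).start(builder.build());
```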

2. Could you expand on the configs you are using? Your WorkflowConfig and
JobConfig. How are you modeling your workload? 1 workflow - 1 job - 2 tasks?

3. Let us also check that the cluster has 3 instances live. When you boot
up the cluster, do you see 3 ZNodes under LIVEINSTANCES? 3 ZNodes under
CONFIGS/PARTICIPANT? 3 directories in /INSTANCES?

As Kishore said, in theory the throughput should increase as you give the
cluster more nodes. My hunch is that by some combination of configs and
dependency setting, the workload is somehow getting 'linearized,' which
explains the "almost same" time to execute.

Hunter

On Thu, Apr 4, 2019 at 3:38 PM DImuthu Upeksha 
wrote:

> Hi Kishore,
>
> There are two tasks (A [1], B [2]). I submit 1000 workflows at a time which
> includes both task A and task B. Task A depends on task B. In both tasks, I
> connect to a Thrift API to fetch some data and in task B, there is a remote
> ssh call to a compute host.
>
> In first test
> 1 Controller
> 1 Participant
> 1 Zookeeper
>
> In second test
> 1 Controller
> 2 Participants
> 1 Zookeeper
>
> In third test
> 1 Controller
> 3 Participants
> 1 Zookeeper
>
> However, in all three cases the time to complete all 1000 submitted workflows
> was almost the same. In fact, in the 2nd and 3rd cases it took a little more
> time than in the 1st case.
>
> I understand that there are lots of moving parts in this scenario (Thrift
> API performance, SSH client delays) however I need to know whether I have
> set up the cluster correctly. Are there additional steps to be followed
> when adding a new participant? In my case, I just created a copy of 1st
> participant, changed the participant name and started it.
>
> [1]
>
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/task/submission/DefaultJobSubmissionTask.java
> [2]
>
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/task/env/EnvSetupTask.java
>
> Thanks
> Dimuthu
>
> On Thu, Apr 4, 2019 at 5:34 PM kishore g  wrote:
>
> > It should ideally but might depend on what happens within each task. Can
> > you give more information about the setup (how many nodes, tasks) etc.
> >
> > On Thu, Apr 4, 2019 at 2:15 PM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com
> > >
> > wrote:
> >
> > > Hi Folks,
> > >
> > > In the task framework, is it expected that the throughput of executed
> > > tasks will improve significantly if I add a new participant to the
> > > cluster? The reason for asking is that I'm seeing almost the same
> > > throughput with one participant and two participants. I'm using Helix
> > > 0.8.4 for this setup.
> > >
> > > Thanks
> > > Dimuthu
> > >
> >
>


Re: [ANNOUNCE] New Committer: Hunter Lee

2019-03-28 Thread Hunter Lee
Thanks everyone! Excited to be part of the group :)

On Thu, Mar 28, 2019 at 7:31 PM Lei Xia  wrote:

> Welcome, Hunter!
>
> On Thu, Mar 28, 2019 at 5:54 PM Xue Junkai  wrote:
>
> > Hi, All
> >
> >
> >   The Project Management Committee (PMC) for Apache Helix has asked
> Hunter
> > Lee to become a committer and we are pleased to announce that he has
> > accepted.
> >
> >
> >   Being a committer enables easier contribution to the project since
> there
> > is no need to go via the patch submission process. This should enable
> > better productivity.
> >
> >
> >   Welcome Hunter!
> >
> >
> > Helix Team
> >
>


Re: [VOTE] Apache Helix 0.8.4 Release

2019-02-28 Thread Hunter Lee
+1

On Thu, Feb 28, 2019 at 1:49 PM Wang Jiajun  wrote:

> +1
>
> Best Regards,
> Jiajun
>
>
> On Wed, Feb 27, 2019 at 3:26 PM Lei Xia  wrote:
>
> > +1
> >
> > On Wed, Feb 27, 2019 at 2:07 PM Xue Junkai  wrote:
> >
> > > Hi,
> > >
> > >
> > > This is to call for a vote on releasing the following candidate as
> Apache
> > > Helix 0.8.4. This is the 16th release of Helix as an Apache project, as
> > > well as the 12th release as a top-level Apache project.
> > >
> > >
> > > Apache Helix is a generic cluster management framework that makes it
> easy
> > > to build partitioned and replicated, fault-tolerant and scalable
> > > distributed systems.
> > >
> > >
> > > Release notes:
> > >
> > > https://helix.apache.org/0.8.4-docs/releasenotes/release-0.8.4.html
> > >
> > >
> > > Release artifacts:
> > >
> > > https://repository.apache.org/content/repositories/orgapachehelix-1026
> > >
> > >
> > > Distribution:
> > >
> > > * binaries:
> > >
> > > https://dist.apache.org/repos/dist/dev/helix/0.8.4/binaries/
> > >
> > > * sources:
> > >
> > > https://dist.apache.org/repos/dist/dev/helix/0.8.4/src/
> > >
> > >
> > > The 0.8.4 release tag:
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.8.4
> > >
> > >
> > > KEYS file available here:
> > >
> > > https://dist.apache.org/repos/dist/dev/helix/KEYS
> > >
> > >
> > > Please vote on the release. The vote will be open for at least 72
> hours.
> > >
> > >
> > > [+1] -- "YES, release"
> > >
> > > [0] -- "No opinion"
> > >
> > > [-1] -- "NO, do not release"
> > >
> > >
> > > Thanks,
> > >
> > > The Apache Helix Team
> > >
> >
>


Re: Apache Helix 0.8.3 (Major Release) - please read

2019-02-22 Thread Hunter Lee
It has come to our attention that there was a bug in Task Framework in
0.8.3. In short, Task Framework fails to honor instance tag constraints
when assigning tasks. This is critical and may break other services built
on Helix.

In order to fix forward, I'd like to solicit advice on how to potentially
deprecate 0.8.3, call for a vote, and publish another major version with
the fix. Although it's only been days since 0.8.3 was released, this bug is
critical enough that it warrants a new release.

Please reply to this email thread with your thoughts,
Hunter


Re: Sporadic delays in task execution

2019-03-22 Thread Hunter Lee
No problem. If you have further questions, let us know what kind of load
you're putting on Helix as well. The newest version of Helix contains Task
Framework 2.0, and has greater scalability in scheduling tasks, so you
might want to consider using the newest version as well.

Hunter

On Fri, Mar 22, 2019 at 8:59 AM DImuthu Upeksha 
wrote:

> Hi Lee,
>
> Thanks for the trick. I didn't know that we can poke the controller like
> that :) However now we can see that tasks are moving smoothly in our
> staging setup. This behavior can be seen from time to time and gets resolved
> automatically in a few hours. I can't find a particular pattern, but my
> best guess is that this happens when the load is high. I will put some load
> on the testing setup, see if I can reproduce this issue, try your
> instructions, and then get back to you.
>
> Thanks
> Dimuthu
>
> On Thu, Mar 21, 2019 at 5:27 PM Hunter Lee  wrote:
>
> > Hi Dimuthu,
> >
> > What Junkai meant by touching the IdealState is this:
> >
> > 1) use Zooinspector to log into ZK
> > 2) Locate the IDEALSTATES/ path
> > 3) grab any ZNode under that path and try to modify (just add a
> > whitespace) and save
> > 4) This will trigger a ZK callback which should tell Helix Controller to
> > rebalance/schedule things
> >
> > On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <
> > dimuthu.upeks...@gmail.com> wrote:
> >
> >> Hi Junkai,
> >>
> >> What do you mean by touching ideal state to trigger an event? I didn't
> >> quite get what you said. Is that like creating some path in zookeeper?
> >> Workflows are eventually scheduled but the problem is, it is very slow
> due
> >> to that 30s freeze.
> >>
> >> Thanks
> >> Dimuthu
> >>
> >> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai 
> wrote:
> >>
> >> > Can you try one thing? Touch the ideal state to trigger an event. If
> >> > workflows are still not scheduled, it means scheduling has a problem.
> >> >
> >> > Best,
> >> >
> >> > Junkai
> >> >
> >> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <
> >> > dimuthu.upeks...@gmail.com> wrote:
> >> >
> >> >> Hi Junkai,
> >> >>
> >> >> We are using 0.8.1
> >> >>
> >> >> Dimuthu
> >> >>
> >> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai 
> >> wrote:
> >> >>
> >> >> > Hi Dimuthu,
> >> >> >
> >> >> > What's the version of Helix you are using?
> >> >> >
> >> >> > Best,
> >> >> >
> >> >> > Junkai
> >> >> >
> >> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <
> >> >> > dimuthu.upeks...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Helix Dev,
> >> >> > >
> >> >> > > We are again seeing this delay in task execution. Please have a
> >> look
> >> >> at
> >> >> > the
> >> >> > > screencast [1] of logs printed in participant (top shell) and
> >> >> controller
> >> >> > > (bottom shell). When I record this, there were about 90 - 100
> >> >> workflows
> >> >> > > pending to be executed. As you can see some tasks were suddenly
> >> >> executed
> >> >> > > and then the participant froze for about 30 seconds before
> executing
> >> >> next
> >> >> > set
> >> >> > > of tasks. I can see some WARN logs on controller log. I feel like
> >> >> this 30
> >> >> > > second delay is some sort of a pattern. What do you think is the
> >> >> > > reason for this? I can provide you more information by turning on verbose
> >> logs on
> >> >> > > controller if you want.
> >> >> > >
> >> >> > > [1] https://youtu.be/3EUdSxnIxVw
> >> >> > >
> >> >> > > Thanks
> >> >> > > Dimuthu
> >> >> > >
> >> >> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <
> >> >> > dimuthu.upeks...@gmail.com
> >> >> > > >
> >> >> > > wrote:
> >> >> > &

For PMC - enabling GitHub issues and wiki

2019-05-26 Thread Hunter Lee
Could a member of the PMC update the ticket for GitHub issues and wiki?
This was discussed informally offline, so please mention that we do not
have a record of it; as long as the PMC can verify that we want
this for Helix, the infra team should be able to go ahead and do it for us.
https://issues.apache.org/jira/browse/INFRA-18471

Thanks,
Hunter


Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread Hunter Lee
Hey Dimuthu -

We are actually in the process of preparing a new release, and this will
come with the previously mentioned bug fixes in Task Framework. It also
contains various ZK-related fixes. I don't know what your deployment
schedule is, but it might be worth waiting another week or so.

Hunter

On Fri, May 31, 2019 at 10:27 AM DImuthu Upeksha 
wrote:

> Now I'm seeing the following error in the controller log. Restarting the
> controller fixed the issue. We see this in the controller from time to time
> with ZK connection issues. Is this also something to do with the ZK client
> version?
>
> 2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN
>  o.apache.zookeeper.ClientCnxn  - Session 0x16b0ebbee1d000e for server
> localhost/127.0.0.1:2181, unexpected error, closing socket connection and
> attempting reconnect
> java.io.IOException: Broken pipe
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102)
> at
>
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com>
> wrote:
>
> > Hi Lei,
> >
> > We use 0.8.2. We initially had 0.8.4 but it contains an issue with task
> > retry logic so we downgraded to 0.8.2. We are planning to go into
> > production with 0.8.2 by next week, so can you please advise a better way
> > to solve this without upgrading to 0.8.4.
> >
> > Thanks
> > Dimuthu
> >
> > On Fri, May 31, 2019 at 1:04 PM Lei Xia  wrote:
> >
> >> Which Helix version do you use? This may be caused by this ZooKeeper bug (
> >> https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
> >> ZkClient in later Helix versions.
> >>
> >>
> >> Lei
> >>
> >> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
> >> dimuthu.upeks...@gmail.com> wrote:
> >>
> >>> Hi Folks,
> >>>
> >>> I'm getting the following error in the controller log, and it seems like
> >>> the controller is not moving forward after that point
> >>>
> >>> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
> >>> Starting helix controller
> >>> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
> >>> Settings loaded from
> >>>
> >>>
> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> >>> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> >>> Connection to helix cluster : AiravataDemoCluster with name :
> >>> helixcontroller2
> >>> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
> >>> Zookeeper connection string localhost:2181
> >>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
> >>> o.a.h.c.GenericHelixController  - Exception while executing
> >>> DEFAULTpipeline:
> >>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
> >>> cluster .AiravataDemoCluster. Will not continue to next pipeline
> >>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
> >>> get
> >>> full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> >>> at
> >>>
> >>>
> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> >>> at
> >>>
> >>>
> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> >>> at
> org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> >>> at
> >>>
> >>>
> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> >>> at
> >>>
> >>>
> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> >>> Caused by:
> org.apache.helix.api.exceptions.HelixMetaDataAccessException:
> >>> Fail to read nodes for
> >>> [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
> >>> at
> >>>
> >>>
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
> >>> ... 7 common frames omitted
> >>>
> >>> In the zookeeper log I can see following warning getting printed
> >>> 

[RESULT][VOTE] Apache Helix 0.9.0 Release

2019-06-14 Thread Hunter Lee
Thanks for voting on the 0.9.0 release. It has now exceeded 72 hours so I
am closing the vote.

Binding +1s:
Kishore G.
Lei Xia
Junkai Xue

Nonbinding +1s:
Jiajun Wang

Binding 0s:
-

Nonbinding 0s:
-

Binding -1s:
-

Nonbinding -1s:
-

The vote has passed, thanks a lot to everyone for voting!


Re: Multiple instance group tags for Job config

2019-06-24 Thread Hunter Lee
https://github.com/apache/helix/issues/323


On Mon, Jun 24, 2019 at 10:09 AM Hunter Lee  wrote:

> Hi all -
>
> Moving forward, let's use GitHub Issues over JIRA. It is closer to the
> code itself and arguably easier to link to and work with the code. I have
> created an issue based on the JIRA Dimuthu provided for the time being.
>
> Hunter
>
> On Mon, Jun 24, 2019 at 8:56 AM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com> wrote:
>
>> Hi Junkai,
>>
>> Thanks for the quick response. I created the ticket [2] as you have asked.
>>
>> [2]
>>
>> https://issues.apache.org/jira/projects/HELIX/issues/HELIX-817?filter=allopenissues
>>
>> Thanks
>> Dimuthu
>>
>> On Mon, Jun 24, 2019 at 11:34 AM Xue Junkai  wrote:
>>
>> > Hi Dimuthu,
>> >
>> > That's a good feature to support in the future. We don't have a plan to
>> > support it right now. Could you please create a Helix ticket for that?
>> >
>> > Best,
>> >
>> > Junkai
>> >
>> > On Mon, Jun 24, 2019 at 7:59 AM DImuthu Upeksha <
>> > dimuthu.upeks...@gmail.com>
>> > wrote:
>> >
>> > > Hi Folks,
>> > >
>> > > Currently we can set only one instance group tag for a job.
>> > >
>> > > jobCfg.setInstanceGroupTag("INSTANCEGROUPTAG");
>> > >
>> > > Do you have anything planned to support multiple instance group tags
>> for
>> > > one job so that the job can be run either in group A or group B? This
>> is
>> > > somewhat similar to Node Affinity [1] concept in Kubernetes.
>> > >
>> > > [1]
>> https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
>> > >
>> > > Thanks
>> > > Dimuthu
>> > >
>> >
>> >
>> > --
>> > Junkai Xue
>> >
>>
>
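The OR semantics described above (a job eligible to run in group A or group B) amount to filtering the instance pool by tag overlap. Below is a self-contained Java sketch of that filtering logic; it is illustrative only and not the Helix API (JobConfig today accepts a single tag via setInstanceGroupTag, and the class/method names here are invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only -- NOT the Helix API. Models what multi-tag ("OR")
// job-to-instance affinity would mean: an instance is eligible if it
// carries at least one of the job's tags.
public class MultiTagFilter {
  // instanceTags: instance name -> tags assigned to that instance
  public static List<String> eligibleInstances(
      Map<String, Set<String>> instanceTags, Set<String> jobTags) {
    List<String> eligible = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e : instanceTags.entrySet()) {
      // Any overlap between instance tags and job tags => eligible
      if (!Collections.disjoint(e.getValue(), jobTags)) {
        eligible.add(e.getKey());
      }
    }
    Collections.sort(eligible); // deterministic order for display
    return eligible;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> instances = new HashMap<>();
    instances.put("node1", new HashSet<>(Arrays.asList("groupA")));
    instances.put("node2", new HashSet<>(Arrays.asList("groupB")));
    instances.put("node3", new HashSet<>(Arrays.asList("groupC")));
    // A job allowed to run in groupA OR groupB
    System.out.println(eligibleInstances(
        instances, new HashSet<>(Arrays.asList("groupA", "groupB"))));
    // prints: [node1, node2]
  }
}
```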


Re: Multiple instance group tags for Job config

2019-06-24 Thread Hunter Lee
Hi all -

Moving forward, let's use GitHub Issues over JIRA. It is closer to the code
itself and arguably easier to link to and work with the code. I have
created an issue based on the JIRA Dimuthu provided for the time being.

Hunter

On Mon, Jun 24, 2019 at 8:56 AM DImuthu Upeksha 
wrote:

> Hi Junkai,
>
> Thanks for the quick response. I created the ticket [2] as you have asked.
>
> [2]
>
> https://issues.apache.org/jira/projects/HELIX/issues/HELIX-817?filter=allopenissues
>
> Thanks
> Dimuthu
>
> On Mon, Jun 24, 2019 at 11:34 AM Xue Junkai  wrote:
>
> > Hi Dimuthu,
> >
> > > That's a good feature to support in the future. We don't have a plan to
> > support it right now. Could you please create a Helix ticket for that?
> >
> > Best,
> >
> > Junkai
> >
> > On Mon, Jun 24, 2019 at 7:59 AM DImuthu Upeksha <
> > dimuthu.upeks...@gmail.com>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > Currently we can set only one instance group tag for a job.
> > >
> > > jobCfg.setInstanceGroupTag("INSTANCEGROUPTAG");
> > >
> > > Do you have anything planned to support multiple instance group tags
> for
> > > one job so that the job can be run either in group A or group B? This
> is
> > > somewhat similar to Node Affinity [1] concept in Kubernetes.
> > >
> > > [1] https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
> > >
> > > Thanks
> > > Dimuthu
> > >
> >
> >
> > --
> > Junkai Xue
> >
>


[VOTE] Apache Helix 0.9.0 Release

2019-06-11 Thread Hunter Lee
Hi,

This is to call for a vote on releasing the following candidate as Apache
Helix 0.9.0. This is the 17th release of Helix as an Apache project, as
well as the 13th release as a top-level Apache project.

Apache Helix is a generic cluster management framework that makes it easy
to build partitioned and replicated, fault-tolerant and scalable
distributed systems.

Release notes:
https://helix.apache.org/0.9.0-docs/releasenotes/release-0.9.0.html

Release artifacts:
https://repository.apache.org/content/repositories/orgapachehelix-1029/


Distribution:
* binaries:
https://dist.apache.org/repos/dist/dev/helix/0.9.0/binaries/

* sources:
https://dist.apache.org/repos/dist/dev/helix/0.9.0/src/


The 0.9.0 release tag:
https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.0


KEYS file available here:
https://dist.apache.org/repos/dist/dev/helix/KEYS

Please vote on the release. The vote will be open for at least 72 hours.

[+1] -- "YES, release"
[0] -- "No opinion"
[-1] -- "NO, do not release"

Thanks,
The Apache Helix Team


Enabling issues and wiki on Apache GitHub

2019-05-23 Thread Hunter Lee
Hi Apache infra -

We would like to enable the issues feature and the wiki feature GitHub
provides on Apache Helix's GitHub. How should we go about this?

Thanks in advance!
Hunter


Syncing from GitHub issues to Apache JIRA

2019-05-23 Thread Hunter Lee
Hi Apache Infra -

I'm wondering if there is a way to sync the GitHub issues created with our
corresponding Apache JIRA board. As in, if we create an issue/PR in GitHub,
is there a tool/support available that creates a JIRA ticket as well?

Or when Apache measures a project's activity, is there a way to do so using
the activity on our GitHub mirror, instead of the activity on the JIRA
board?

Thank you,
Hunter


Re: Enabling issues and wiki on Apache GitHub

2019-05-23 Thread Hunter Lee
Created: https://issues.apache.org/jira/browse/INFRA-18471

Thanks!
Hunter

On Thu, May 23, 2019 at 4:49 PM Chris Lambertus  wrote:

>
>
>
> > On May 23, 2019, at 4:46 PM, Hunter Lee  wrote:
> >
> > Hi Apache infra -
> >
> > We would like to enable the issues feature and the wiki feature GitHub
> provides on Apache Helix's GitHub. How should we go about this?
>
> Infra jira ticket please.
>
>
> -Chris
> ASF Infra
>
>
> >
> > Thanks in advance!
> > Hunter
>
>


Re: Upgrade I0Itec ZkClient version

2019-06-27 Thread Hunter Lee
Upon discussion, the direction should actually be to move away from using
IOItec's ZkClient due to the following reasons:
1. ZK version dependency
2. Helix's own ZkClient contains a lot of custom logic, different from the
I0Itec ZkClient.

We will proceed in this direction.

Hunter

On Thu, Jun 27, 2019 at 10:28 AM Hunter Lee  wrote:

> This email is to suggest the version bump up of ZkClient library used by
> Helix.
>
>1. We have noticed that sometimes ZK calls hang due to unknown
>reasons. This kind of issue seems to be commonly experienced by HBase
>users, but various fixes have been incorporated to ZK in versions 3.4+. The
>client version 0.5 is based on an older version of 3.4. I will attach a
>jstack log reported by one of our open source users, Gobblin at the end of
>this email.
>2. We have already upgraded ZK server to 3.4.13. The corresponding
>version for the ZkClient library is 0.11 (See
>    https://github.com/sgroschupf/zkclient/blob/master/CHANGELOG.markdown).
>Other heavy users of ZooKeeper such as Kafka have already upgraded to 0.11.
>3. We will first proceed by testing it at LinkedIn's testing clusters
>to make sure there are no obvious signs of regression.
>
> Overall, the goal is to further stabilize ZK-related operations in Helix.
> Please take a look at the CHANGELOG linked above for more details on what
> changed across ZkClient versions.
>
> Let me know what you think,
> Hunter
>
> --
>
> "FetchJobSpecExecutor" #88 prio=5 os_prio=0 tid=0x7f8f8ab2c800
> nid=0x25e9 in Object.wait() [0x7f8f5c13b000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
> - locked <0x00076af719c0> (a
> org.apache.zookeeper.ClientCnxn$Packet)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1470)
>
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>
> at
> org.apache.helix.manager.zk.zookeeper.ZkConnection.getChildren(ZkConnection.java:127)
>
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient$2.call(ZkClient.java:698)
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient$2.call(ZkClient.java:695)
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1102)
>
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient.getChildren(ZkClient.java:695)
>
> at
> org.apache.helix.manager.zk.zookeeper.ZkClient.getChildren(ZkClient.java:689)
>
> at
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildNames(ZkBaseDataAccessor.java:507)
>
> at
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:463)
>
> at
> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:431)
>
> at
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:409)
>
> at
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:468)
>
> at
> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:459)
>
> at
> org.apache.helix.task.TaskDriver.getWorkflows(TaskDriver.java:847)
> at
> org.apache.gobblin.cluster.HelixUtils.getWorkflowIdsFromJobNames(HelixUtils.java:287)
>
> at
> org.apache.gobblin.cluster.GobblinHelixJobScheduler.cancelJobIfRequired(GobblinHelixJobScheduler.java:363)
>
> at
> org.apache.gobblin.cluster.GobblinHelixJobScheduler.handleDeleteJobConfigArrival(GobblinHelixJobScheduler.java:352)
>
> at
> org.apache.gobblin.cluster.GobblinHelixJobScheduler.handleUpdateJobConfigArrival(GobblinHelixJobScheduler.java:322)
>


Upgrade I0Itec ZkClient version

2019-06-27 Thread Hunter Lee
This email is to suggest the version bump up of ZkClient library used by
Helix.

   1. We have noticed that sometimes ZK calls hang due to unknown reasons.
   This kind of issue seems to be commonly experienced by HBase users, but
   various fixes have been incorporated to ZK in versions 3.4+. The client
   version 0.5 is based on an older version of 3.4. I will attach a jstack log
   reported by one of our open source users, Gobblin, at the end of this email.
   2. We have already upgraded ZK server to 3.4.13. The corresponding
   version for the ZkClient library is 0.11 (See
   https://github.com/sgroschupf/zkclient/blob/master/CHANGELOG.markdown).
   Other heavy users of ZooKeeper such as Kafka have already upgraded to 0.11.
   3. We will first proceed by testing it at LinkedIn's testing clusters to
   make sure there are no obvious signs of regression.

Overall, the goal is to further stabilize ZK-related operations in Helix.
Please take a look at the CHANGELOG linked above for more details on what
changed across ZkClient versions.

Let me know what you think,
Hunter

--

"FetchJobSpecExecutor" #88 prio=5 os_prio=0 tid=0x7f8f8ab2c800
nid=0x25e9 in Object.wait() [0x7f8f5c13b000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
- locked <0x00076af719c0> (a
org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1470)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at
org.apache.helix.manager.zk.zookeeper.ZkConnection.getChildren(ZkConnection.java:127)

at
org.apache.helix.manager.zk.zookeeper.ZkClient$2.call(ZkClient.java:698)
at
org.apache.helix.manager.zk.zookeeper.ZkClient$2.call(ZkClient.java:695)
at
org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1102)

at
org.apache.helix.manager.zk.zookeeper.ZkClient.getChildren(ZkClient.java:695)

at
org.apache.helix.manager.zk.zookeeper.ZkClient.getChildren(ZkClient.java:689)

at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildNames(ZkBaseDataAccessor.java:507)

at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:463)

at
org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:431)

at
org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:409)

at
org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:468)

at
org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:459)

at
org.apache.helix.task.TaskDriver.getWorkflows(TaskDriver.java:847)
at
org.apache.gobblin.cluster.HelixUtils.getWorkflowIdsFromJobNames(HelixUtils.java:287)

at
org.apache.gobblin.cluster.GobblinHelixJobScheduler.cancelJobIfRequired(GobblinHelixJobScheduler.java:363)

at
org.apache.gobblin.cluster.GobblinHelixJobScheduler.handleDeleteJobConfigArrival(GobblinHelixJobScheduler.java:352)

at
org.apache.gobblin.cluster.GobblinHelixJobScheduler.handleUpdateJobConfigArrival(GobblinHelixJobScheduler.java:322)
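Until an upgraded client lands, one client-side stopgap for calls that hang like the jstack above is to bound them with a timeout. A self-contained sketch using plain java.util.concurrent — this is not Helix/ZkClient code, and the sleeping Callable merely stands in for a hung read:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative stopgap: run a potentially-hanging call on a worker thread and
// give up after a deadline instead of blocking the caller forever.
public class BoundedCall {
  public static String callWithTimeout(Callable<String> call, long timeoutMs)
      throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<String> future = pool.submit(call);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the hung worker
        return "TIMED_OUT";
      }
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a ZK read that never returns within budget
    System.out.println(callWithTimeout(() -> {
      Thread.sleep(5_000);
      return "children";
    }, 200));
    // prints: TIMED_OUT
  }
}
```

This does not fix the underlying hang (the worker thread may still be stuck until interrupted), but it keeps caller threads from piling up while the upgrade is tested.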


Re: How to do Helix hot release?

2019-07-29 Thread Hunter Lee
This might be a question for Apache Infra. Let me also write a quick email
and include the Helix dev mailing list.

Hunter

On Mon, Jul 29, 2019 at 4:20 PM Wang Jiajun  wrote:

> Hi Helix devs,
>
> I'm sending this mail for discussing Helix hot release. This is requested
> by our customer Pinot recently.
> Unfortunately, it seems there is no official process to release a hot-fix
> version in the Helix project. Is there any suggestion on how we can do it?
> Note that Pinot requires a published Helix jar. So a code patch or a GitHub
> branch is probably not good enough for them.
>
> Thanks.
>
> Best Regards,
> Jiajun
>


Re: [VOTE] Apache Helix 0.9.1 Release

2019-08-14 Thread Hunter Lee
+1

On Wed, Aug 14, 2019 at 2:07 PM Wang Jiajun  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 0.9.1. This is the 18th release of Helix as an Apache project, as
> well as the 14th release as a top-level Apache project.
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> Release notes:
> https://helix.apache.org/0.9.1-docs/releasenotes/release-0.9.1.html
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1032/
>
> Distribution:
> * binaries:
> https://dist.apache.org/repos/dist/dev/helix/0.9.1/binaries/
> * sources:
> https://dist.apache.org/repos/dist/dev/helix/0.9.1/src/
>
> The 0.9.1 release tag:
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.1
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>


Re: Task is running periodically

2019-11-25 Thread Hunter Lee
Hi Dimuthu,

Are you using periodicalRebalance? That might trigger a rebalance
periodically even if there's no activity in the cluster.

As for the ConcurrentModification exception, I believe there was a patch
for that going from 0.8.2 to the latest on the upstream.

Hunter

On Mon, Nov 25, 2019 at 10:13 AM DImuthu Upeksha 
wrote:

> Hi Folks,
>
> We have noticed a task in Helix cluster was running periodically even
> though it was completed at each run. When we look at the logs of the
> controller, I can see some Concurrent Modification exceptions [1]. However
> this is a very rare occurrence. We have been using the current Helix version
> for a few months on our production deployments, but this is the first time we
> have seen this behavior.
>
> Do you have any insight into this?
>
> We are using helix 0.8.2
>
> [1] https://gist.github.com/DImuthuUpe/33db7bbe5d53fcc38dfc66eb0d45df55
>
> Thanks
> Dimuthu
>


Re: [VOTE] Apache Helix 0.9.4 Release

2020-01-22 Thread Hunter Lee
It is up now.

Hunter

On Wed, Jan 22, 2020 at 8:48 AM Lei Xia  wrote:

> Thanks Hunter, the release notes link doesn't seem to work?
>
>
> Lei
>
> On Tue, Jan 21, 2020 at 11:40 PM Hunter Lee  wrote:
>
> > Hi,
> >
> > This is to call for a vote on releasing the following candidate as Apache
> > Helix 0.9.4. This is the 19th release of Helix as an Apache project, as
> > well as the 15th release as a top-level Apache project.
> >
> > Apache Helix is a generic cluster management framework that makes it easy
> > to build partitioned and replicated, fault-tolerant and scalable
> > distributed systems.
> >
> > Release notes:
> > https://helix.apache.org/0.9.4-docs/releasenotes/release-0.9.4.html
> >
> > Release artifacts:
> > https://repository.apache.org/content/repositories/orgapachehelix-1036/
> >
> > Distribution:
> > * binaries:
> > https://dist.apache.org/repos/dist/dev/helix/0.9.4/binaries/
> > * sources:
> > https://dist.apache.org/repos/dist/dev/helix/0.9.4/src/
> >
> > The 0.9.4 release tag:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.4
> >
> > KEYS file available here:
> > https://dist.apache.org/repos/dist/dev/helix/KEYS
> >
> > Please vote on the release. The vote will be open for at least 72 hours.
> >
> > [+1] -- "YES, release"
> > [0] -- "No opinion"
> > [-1] -- "NO, do not release"
> >
> > Thanks,
> > The Apache Helix Team
> >
>


[ANNOUNCE] Apache Helix 0.9.4 Release

2020-01-27 Thread Hunter Lee
The Apache Helix Team is pleased to announce the 19th release,
0.9.4, of the Apache Helix project.

Apache Helix is a generic cluster management framework that makes it easy
to build partitioned, fault tolerant, and scalable distributed systems.

The full release notes are available here:
https://helix.apache.org/0.9.4-docs/releasenotes/release-0.9.4.html

You can declare a maven dependency to use it:


<dependency>
  <groupId>org.apache.helix</groupId>
  <artifactId>helix-core</artifactId>
  <version>0.9.4</version>
</dependency>


Or download the release sources:
http://helix.apache.org/0.9.4-docs/download.cgi

Additional info

Website: http://helix.apache.org/
Helix mailing lists: http://helix.apache.org/mail-lists.html

We hope you will enjoy using the latest release of Apache Helix!

Cheers,
Apache Helix Team


[RESULT][VOTE] Apache Helix 0.9.4 Release

2020-01-27 Thread Hunter Lee
Thanks for voting on the 0.9.4 release. It has now exceeded 72 hours so I
am closing the vote.

Binding +1s:
 Lei Xia
 Junkai Xue
 Kishore G

Nonbinding +1s:

Binding 0s:

Nonbinding 0s:

Binding -1s:

Nonbinding -1s:

The vote has passed, thanks a lot to everyone for voting!


Review and try out new modules (helix-common and zookeeper-api)

2020-01-29 Thread Hunter Lee
Hi Helix users and devs-

I sent out an email describing the separation of ZkClient into a separate
module a few weeks ago and would like to follow up.

I am currently getting https://github.com/apache/helix/pull/684 reviewed.
Here is the changelist:
1. Remove IOItec import
2. Create helix-common to resolve circular dependencies
3. Create zookeeper-api

This should be pretty straightforward since there is no change in logic.

Here is the *migration guide*:
1. All the classes in helix-common kept the same package structure they had
in helix-core, so there should be no code change required.
2. For ZK-related classes, most of them are *backported* by way of
subclassing. But if you used IOItec classes explicitly, you'll need to fix
the imports (because we no longer import IOItec!). Do the following:

Find-replace in your project:
- org.I0Itec -> org.apache.helix.zookeeper.api
E.g.)
import org.I0Itec.zkclient.ZkServer;
becomes
import org.apache.helix.zookeeper.api.zkclient.ZkServer;
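The rename above is purely mechanical, so it can be applied or verified with a one-line string replace; here is a tiny self-contained sketch (the class name ImportMigration is mine, not part of Helix):

```java
// Illustrative helper for the migration guide's find-replace:
// org.I0Itec -> org.apache.helix.zookeeper.api
public class ImportMigration {
  public static String migrate(String sourceLine) {
    return sourceLine.replace("org.I0Itec", "org.apache.helix.zookeeper.api");
  }

  public static void main(String[] args) {
    System.out.println(migrate("import org.I0Itec.zkclient.ZkServer;"));
    // prints: import org.apache.helix.zookeeper.api.zkclient.ZkServer;
  }
}
```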

I've tested binary level compatibility with some production-level codebases
(at LinkedIn) and haven't come across any major problems. If deemed good,
this change will probably be included in the next major release.

Please review this change and let me know if you have any issues using it
in your projects.

Hunter


Re: Review and try out new modules (helix-common and zookeeper-api)

2020-02-05 Thread Hunter Lee
Since I haven't heard back from anyone for some time, I'll go ahead with
the merge/review. Migration guide has been updated and published in the
GitHub wiki.

Hunter

On Wed, Jan 29, 2020 at 7:30 PM Hunter Lee  wrote:

> Hi Helix users and devs-
>
> I sent out an email describing the separation of ZkClient into a separate
> module a few weeks ago and would like to follow up.
>
> I am currently getting https://github.com/apache/helix/pull/684 reviewed.
> Here is the changelist:
> 1. Remove IOItec import
> 2. Create helix-common to resolve circular dependencies
> 3. Create zookeeper-api
>
> This should be pretty straightforward since there is no change in logic.
>
> Here is the *migration guide*:
> 1. All the classes in helix-common kept the same package structure they
> had in helix-core, so there should be no code change required.
> 2. For ZK-related classes, most of them are *backported* by way of
> subclassing. But if you used IOItec classes explicitly, you'll need to fix
> the imports (because we no longer import IOItec!). Do the following:
>
> Find-replace in your project:
> - org.I0Itec -> org.apache.helix.zookeeper.api
> E.g.)
> import org.I0Itec.zkclient.ZkServer;
> becomes
> import org.apache.helix.zookeeper.api.zkclient.ZkServer;
>
> I've tested binary level compatibility with some production-level
> codebases (at LinkedIn) and haven't come across any major problems. If
> deemed good, this change will probably be included in the next major
> release.
>
> Please review this change and let me know if you have any issues using it
> in your projects.
>
> Hunter
>


Re: [VOTE] Apache Helix 0.9.300 Release

2020-01-21 Thread Hunter Lee
I am stopping the vote for 0.9.300 and re-creating 0.9.4.

Hunter

On Tue, Jan 21, 2020 at 4:58 PM Hunter Lee  wrote:

> Correction:
>
> Release notes:
> https://helix.apache.org/0.9.300-docs/releasenotes/release-0.9.300.html
>
> This page will be updated shortly. All other links have already been made
> available.
>
> Hunter
>
> On Tue, Jan 21, 2020 at 4:57 PM Hunter Lee  wrote:
>
>> Hi,
>>
>> This is to call for a vote on releasing the following candidate as
>> Apache Helix 0.9.300. This is the 19th release of Helix as an Apache project,
>> as well as the 15th release as a top-level Apache project.
>>
>> Apache Helix is a generic cluster management framework that makes it
>> easy to build partitioned and replicated, fault-tolerant and scalable
>> distributed systems.
>>
>> Release notes:
>> https://helix.apache.org/0.9.1-docs/releasenotes/release-0.9.300.html
>>
>> Release artifacts:
>> https://repository.apache.org/content/repositories/orgapachehelix-1035/
>>
>> Distribution:
>> * binaries:
>> https://dist.apache.org/repos/dist/dev/helix/0.9.300/binaries/
>> * sources:
>> https://dist.apache.org/repos/dist/dev/helix/0.9.300/src/
>>
>> The 0.9.1 release tag:
>>
>> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.300
>>
>> KEYS file available here:
>> https://dist.apache.org/repos/dist/dev/helix/KEYS
>>
>> Please vote on the release. The vote will be open for at least 72 hours.
>>
>> [+1] -- "YES, release"
>> [0] -- "No opinion"
>> [-1] -- "NO, do not release"
>>
>> Thanks,
>> The Apache Helix Team
>>
>


Re: [VOTE] Apache Helix 0.9.300 Release

2020-01-21 Thread Hunter Lee
Correction:

Release notes:
https://helix.apache.org/0.9.300-docs/releasenotes/release-0.9.300.html

This page will be updated shortly. All other links have already been made
available.

Hunter

On Tue, Jan 21, 2020 at 4:57 PM Hunter Lee  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 0.9.300. This is the 19th release of Helix as an Apache project, as
> well as the 15th release as a top-level Apache project.
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> Release notes:
> https://helix.apache.org/0.9.1-docs/releasenotes/release-0.9.300.html
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1035/
>
> Distribution:
> * binaries:
> https://dist.apache.org/repos/dist/dev/helix/0.9.300/binaries/
> * sources:
> https://dist.apache.org/repos/dist/dev/helix/0.9.300/src/
>
> The 0.9.1 release tag:
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.300
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>


[VOTE] Apache Helix 0.9.4 Release

2020-01-21 Thread Hunter Lee
Hi,

This is to call for a vote on releasing the following candidate as Apache
Helix 0.9.4. This is the 19th release of Helix as an Apache project, as
well as the 15th release as a top-level Apache project.

Apache Helix is a generic cluster management framework that makes it easy
to build partitioned and replicated, fault-tolerant and scalable
distributed systems.

Release notes:
https://helix.apache.org/0.9.4-docs/releasenotes/release-0.9.4.html

Release artifacts:
https://repository.apache.org/content/repositories/orgapachehelix-1036/

Distribution:
* binaries:
https://dist.apache.org/repos/dist/dev/helix/0.9.4/binaries/
* sources:
https://dist.apache.org/repos/dist/dev/helix/0.9.4/src/

The 0.9.4 release tag:
https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.4

KEYS file available here:
https://dist.apache.org/repos/dist/dev/helix/KEYS

Please vote on the release. The vote will be open for at least 72 hours.

[+1] -- "YES, release"
[0] -- "No opinion"
[-1] -- "NO, do not release"

Thanks,
The Apache Helix Team


Re: [VOTE] Apache Helix 0.9.5 Release

2020-05-11 Thread Hunter Lee
+1

On Mon, May 11, 2020 at 12:00 PM Xue Junkai  wrote:

> Hi,
>
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 0.9.5. This is the 22nd release of Helix as an Apache project, as
> well as the 18th release as a top-level Apache project. This release
> supports customers who are using the 0.9 series.
>
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
>
> Release notes:
>
> http://helix.apache.org/0.9.5-docs/releasenotes/release-0.9.5.html
>
>
> Release artifacts:
>
> https://repository.apache.org/content/repositories/orgapachehelix-1038
>
>
> Distribution:
>
> * binaries:
>
> https://dist.apache.org/repos/dist/dev/helix/0.9.5/binaries/
>
> * sources:
>
> https://dist.apache.org/repos/dist/dev/helix/0.9.5/src/
>
>
> The 0.9.5 release tag:
>
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.5
>
>
> KEYS file available here:
>
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
>
> [+1] -- "YES, release"
>
> [0] -- "No opinion"
>
> [-1] -- "NO, do not release"
>
>
> Thanks,
>
> The Apache Helix Team
>


Re: [jira] [Created] (HELIX-823) run-helix-controller.sh command gives error

2020-05-20 Thread Hunter Lee
Hi,

We are tracking this issue. This was caused by an issue in the auto-merge
process. We'll be creating a fix shortly.

Hunter

On Wed, May 20, 2020 at 5:04 AM anil (Jira)  wrote:

> anil created HELIX-823:
> --
>
>  Summary: run-helix-controller.sh command gives error
>  Key: HELIX-823
>  URL: https://issues.apache.org/jira/browse/HELIX-823
>  Project: Apache Helix
>   Issue Type: Bug
>   Components: helix-core
>  Environment: Linux Red Hat 4.8.5-4
> Reporter: anil
>
>
> Download Helix version 1.0.0 binary and set up a two-node cluster. Everything
> works fine.
>
> but when I fire the below command
>
> ./run-helix-controller.sh --zkSvr localhost:2181 --cluster jbpm-cluster
>
> It gives error -
>
> sterName:jbpm-cluster, controllerName:null, mode:STANDALONE
> Exception in thread "main" *java.lang.NoSuchFieldError: Rebalancer*
>  at org.apache.helix.InstanceType.(InstanceType.java:39)
>  at
> org.apache.helix.controller.HelixControllerMain.startHelixController(HelixControllerMain.java:156)
>  at
> org.apache.helix.controller.HelixControllerMain.main(HelixControllerMain.java:212)
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: [VOTE] Apache Helix 0.9.8 Release

2020-10-14 Thread Hunter Lee
+1. Thanks for putting this together!

On Wed, Oct 14, 2020 at 6:25 PM Wang Jiajun  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 0.9.8. This is the 23rd release of Helix as an Apache project, as
> well as the 19th release as a top-level Apache project. This release is
> supporting the customers who are using the 0.9 series.
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> Release notes:
> https://helix.apache.org/0.9.8-docs/releasenotes/release-0.9.8.html
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1042/
>
> Distribution:
> * binaries:
> https://dist.apache.org/repos/dist/dev/helix/0.9.8/binaries/
> * sources:
> https://dist.apache.org/repos/dist/dev/helix/0.9.8/src/
>
> The 0.9.8 release tag:
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.8
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>


Re: Workflow task retry intervals

2020-08-19 Thread Hunter Lee
There is a field you could set. You could look in JobConfig to see which
enum value to set for task retries. It allows you to set two types of
delays: 1) a delay before starting a task, and 2) a delay between retries.
These are static values, and we don't support exponential backoff yet.

Hunter

On Wed, Aug 19, 2020 at 1:50 PM DImuthu Upeksha 
wrote:

> Hi folks,
>
> Is there a way to define time intervals between task retries or even to
> provide exponential backoff intervals?
>
> Thanks
> Dimuthu
>
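Since only static delays are exposed here, exponential backoff between retries has to be computed on the application side (for example, in the task's own retry bookkeeping). A self-contained sketch of capped exponential backoff; the class and method names are illustrative, not Helix API:

```java
// Illustrative, self-contained: capped exponential backoff for task retries.
// Helix JobConfig itself only supports static delays, so logic like this
// would live in application code.
public class RetryBackoff {
  // attempt 0 -> base, attempt 1 -> 2*base, ..., capped at maxDelayMs
  public static long backoffDelayMs(long baseDelayMs, long maxDelayMs, int attempt) {
    long delay = baseDelayMs * (1L << Math.min(attempt, 30)); // clamp shift to avoid overflow
    return Math.min(delay, maxDelayMs);
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 5; attempt++) {
      System.out.println("attempt " + attempt + " -> "
          + backoffDelayMs(1000, 8000, attempt) + " ms");
    }
    // attempts 0..4 -> 1000, 2000, 4000, 8000, 8000 (capped) ms
  }
}
```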


Re: [VOTE] Apache Helix 0.9.9 Release

2020-11-21 Thread Hunter Lee
+1

On Sat, Nov 21, 2020 at 7:43 AM Lei Xia  wrote:

> +1
>
> On Fri, Nov 20, 2020 at 8:04 PM Xue Junkai  wrote:
>
> > Hi,
> >
> > This is to call for a vote on releasing the following candidate as Apache
> > Helix 0.9.9. This is the 24th release of Helix as an Apache project, as
> > well as the 20th release as a top-level Apache project.
> >
> > Apache Helix is a generic cluster management framework that makes it easy
> > to build partitioned and replicated, fault-tolerant and scalable
> > distributed systems.
> >
> > Release notes:
> > http://helix.apache.org/0.9.9-docs/releasenotes/release-0.9.9.html
> > Release artifacts:
> > https://repository.apache.org/content/repositories/orgapachehelix-1043
> >
> > Distribution:
> > * binaries: https://dist.apache.org/repos/dist/dev/helix/0.9.9/binaries/
> > * sources: https://dist.apache.org/repos/dist/dev/helix/0.9.9/src/
> >
> > The 0.9.9 release tag:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-0.9.9
> >
> > KEYS file available here:
> > https://dist.apache.org/repos/dist/dev/helix/KEYS
> >
> > Please vote on the release.
> > The vote will be open for at least 72 hours.
> > [+1] -- "YES, release"
> > [0] -- "No opinion"
> > [-1] -- "NO, do not release"
> >
> > Thanks,
> >
> > The Apache Helix Team
> >
>


Re: [VOTE] Apache Helix 1.0.2 Release

2021-06-08 Thread Hunter Lee
+1

On Tue, Jun 8, 2021 at 1:10 PM Junkai Xue  wrote:

> Hi,
>
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 1.0.2. This is the 22nd release of Helix as an Apache project, as
> well as the 18th release as a top-level Apache project.
>
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
>
> Release notes:
>
> https://helix.apache.org/1.0.2-docs/releasenotes/release-1.0.2.html
>
>
> Release artifacts:
>
> https://repository.apache.org/content/repositories/orgapachehelix-1045
>
>
> Distribution:
>
> * binaries:
>
> https://dist.apache.org/repos/dist/dev/helix/1.0.2/binaries/
>
> * sources:
>
> https://dist.apache.org/repos/dist/dev/helix/1.0.2/src/
>
>
> The 1.0.2 release tag:
>
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-1.0.2
>
>
> KEYS file available here:
>
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
>
> [+1] -- "YES, release"
>
> [0] -- "No opinion"
>
> [-1] -- "NO, do not release"
>
>
> Thanks,
>
> The Apache Helix Team
>


Re: [VOTE] Apache Helix 1.0.2 Release

2021-08-19 Thread Hunter Lee
+1

On Thu, Aug 19, 2021 at 2:34 PM Junkai Xue  wrote:

> Hi,
>
>
> This is to call for a vote on releasing the following candidate as Apache
> Helix 1.0.2. This is the 22nd release of Helix as an Apache project, as
> well as the 18th release as a top-level Apache project.
>
>
> Apache Helix is a generic cluster management framework that makes it easy
> to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
>
> Release notes:
>
> https://helix.apache.org/1.0.2-docs/releasenotes/release-1.0.2.html
>
>
> Release artifacts:
>
> https://repository.apache.org/content/repositories/orgapachehelix-1046
>
>
> Distribution:
>
> * binaries:
>
> https://dist.apache.org/repos/dist/dev/helix/1.0.2/binaries/
>
> * sources:
>
> https://dist.apache.org/repos/dist/dev/helix/1.0.2/src/
>
>
> The 1.0.2 release tag:
>
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-1.0.2
>
>
> KEYS file available here:
>
> https://dist.apache.org/repos/dist/dev/helix/KEYS
>
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
>
> [+1] -- "YES, release"
>
> [0] -- "No opinion"
>
> [-1] -- "NO, do not release"
>
>
> Thanks,
>
> The Apache Helix Team
>


Re: Log4J

2021-12-16 Thread Hunter Lee
Thanks Brent for a quick turnaround.

With Helix, we find that laptops usually aren't powerful enough to run the
full test suite. Around last year we started relying on GitHub CI for
consistent test results.

It seems the CI run is still in progress, so let's wait it out and see what
we get.

Hunter

On Thu, Dec 16, 2021 at 5:17 PM Junkai Xue  wrote:

> Thanks Brent! Right, I was commenting on your PR with that. Maybe we need
> to run the patch you provided to double-verify it before merging.
> Anyway, thanks for contributing to this!
>
> Best,
>
> Junkai
>
> On Thu, Dec 16, 2021 at 2:11 PM Brent  wrote:
>
> > I'm sure you all saw the notifications, but I pushed a PR for this at
> > https://github.com/apache/helix/pull/1922
> >
> > I describe some of this in the PR, but the changes rippled out a little
> > further than I thought, partly due to the Zookeeper dependency still
> > bringing in vulnerable versions and partly due to a few places in code
> > referencing Log4j 1.x APIs/packages/classes directly.
> >
> > My main concern, other than the magnitude of the change, is that I
> > successfully ran all of the tests except helix-core.  All of the
> helix-core
> > tests succeeded up until the last 150 or so when I started getting out of
> > memory errors, e.g.:
> > [ERROR] Failures:
> > [ERROR]   TestConfigAccessor.testBasic:50 » OutOfMemory unable to create
> > new native thre...
> > [ERROR]   TestConfigAccessor.testDeleteCloudConfig:329 » OutOfMemory
> unable
> > to create ne...
> > [ERROR]   TestConfigAccessor.testSetRestConfig:219 » OutOfMemory unable
> to
> > create new na...
> >
> > I can't tell if that's just my laptop or if it's a legitimate problem
> > introduced by this change, so any independent verification (maybe the PR
> > hooks already do this) would be greatly appreciated.  I'm going to try to
> > test this in one of our dev environments, but would it would be great if
> > someone else could independently verify too.
> >
> > Thanks!
> >
> > ~Brent
> >
> > On Wed, Dec 15, 2021 at 11:01 AM Hunter Lee  wrote:
> >
> > > Thanks Brent. We'll keep an eye out for it.
> > >
> > > Hunter
> > >
> > > On Wed, Dec 15, 2021 at 12:42 AM Brent 
> > wrote:
> > >
> > > > I filed this issue so we have something to track:
> > > > https://github.com/apache/helix/issues/1921
> > > >
> > > > I'm attempting to get Log4J 2.16.x building and running properly
> > locally.
> > > > I will submit a PR if I can get it working.
> > > >
> > > > Thanks!
> > > >
> > > > On Tue, Dec 14, 2021 at 8:40 AM Brent 
> > wrote:
> > > >
> > > > > Thanks Hunter, much appreciated!  I will try to put together a
> patch
> > > with
> > > > > what I've done for remediation elsewhere (good news is it's not
> much
> > > > since
> > > > > Helix still inherits Log4J 1.x).  If you wouldn't mind, I might
> also
> > > file
> > > > > an issue to consider upgrading to Log4J 2.16.x that was just pushed
> > > out (
> > > > > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4).
> > > That
> > > > > one will require some more thought to make sure things don't break
> I
> > > > > suspect.
> > > > >
> > > > > ~Brent
> > > > >
> > > > > On Mon, Dec 13, 2021 at 1:42 PM Hunter Lee 
> > wrote:
> > > > >
> > > > >> This is being discussed. Feel free to post a patch if you're
> > > interested
> > > > >> (but do let us know so there's no duplicate effort being made
> here).
> > > > >>
> > > > >> On Fri, Dec 10, 2021 at 1:33 PM Brent 
> > > > wrote:
> > > > >>
> > > > >> > [Feel free to take this offline or out-of-band if this is an
> > > > >> inappropriate
> > > > >> > place to discuss this]
> > > > >> >
> > > > >> > Is there any hotfixing planned as a result of the Log4J zero day
> > > going
> > > > >> > around?
> > > > >> >
> > > > >> > Reference: https://www.lunasec.io/docs/blog/log4j-zero-day/
> > > > >> > CVE: https://nvd.nist.gov/vuln/detail/CVE-2021-44228
> > > > >> >
> > > > >> > From what I can tell, Helix seems to be building with
> > > > >> >
> https://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12/1.7.14
> > > > >> which in
> > > > >> > turn maps to
> > https://mvnrepository.com/artifact/log4j/log4j/1.2.17
> > > > >> >
> > > > >> > The exploit is more prevalent in the 2.x versions of Log4J, but
> > > there
> > > > >> are
> > > > >> > scenarios where 1.x is exploitable and it's been pointed out
> that
> > > 1.x
> > > > is
> > > > >> > also end of life and has other vulnerabilities.
> > > > >> >
> > > > >> > See:
> > > > >> >
> > > > >>
> > > >
> > https://github.com/apache/logging-log4j2/pull/608#issuecomment-990494126
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> > ~Brent
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>


Re: Log4J

2021-12-15 Thread Hunter Lee
Thanks Brent. We'll keep an eye out for it.

Hunter

On Wed, Dec 15, 2021 at 12:42 AM Brent  wrote:

> I filed this issue so we have something to track:
> https://github.com/apache/helix/issues/1921
>
> I'm attempting to get Log4J 2.16.x building and running properly locally.
> I will submit a PR if I can get it working.
>
> Thanks!
>
> On Tue, Dec 14, 2021 at 8:40 AM Brent  wrote:
>
> > Thanks Hunter, much appreciated!  I will try to put together a patch with
> > what I've done for remediation elsewhere (good news is it's not much
> since
> > Helix still inherits Log4J 1.x).  If you wouldn't mind, I might also file
> > an issue to consider upgrading to Log4J 2.16.x that was just pushed out (
> > https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4).  That
> > one will require some more thought to make sure things don't break I
> > suspect.
> >
> > ~Brent
> >
> > On Mon, Dec 13, 2021 at 1:42 PM Hunter Lee  wrote:
> >
> >> This is being discussed. Feel free to post a patch if you're interested
> >> (but do let us know so there's no duplicate effort being made here).
> >>
> >> On Fri, Dec 10, 2021 at 1:33 PM Brent 
> wrote:
> >>
> >> > [Feel free to take this offline or out-of-band if this is an
> >> inappropriate
> >> > place to discuss this]
> >> >
> >> > Is there any hotfixing planned as a result of the Log4J zero day going
> >> > around?
> >> >
> >> > Reference: https://www.lunasec.io/docs/blog/log4j-zero-day/
> >> > CVE: https://nvd.nist.gov/vuln/detail/CVE-2021-44228
> >> >
> >> > From what I can tell, Helix seems to be building with
> >> > https://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12/1.7.14
> >> which in
> >> > turn maps to https://mvnrepository.com/artifact/log4j/log4j/1.2.17
> >> >
> >> > The exploit is more prevalent in the 2.x versions of Log4J, but there
> >> are
> >> > scenarios where 1.x is exploitable and it's been pointed out that 1.x
> is
> >> > also end of life and has other vulnerabilities.
> >> >
> >> > See:
> >> >
> >>
> https://github.com/apache/logging-log4j2/pull/608#issuecomment-990494126
> >> >
> >> > Thanks!
> >> >
> >> > ~Brent
> >> >
> >>
> >
>


Re: Log4J

2021-12-13 Thread Hunter Lee
This is being discussed. Feel free to post a patch if you're interested
(but do let us know so there's no duplicate effort being made here).

On Fri, Dec 10, 2021 at 1:33 PM Brent  wrote:

> [Feel free to take this offline or out-of-band if this is an inappropriate
> place to discuss this]
>
> Is there any hotfixing planned as a result of the Log4J zero day going
> around?
>
> Reference: https://www.lunasec.io/docs/blog/log4j-zero-day/
> CVE: https://nvd.nist.gov/vuln/detail/CVE-2021-44228
>
> From what I can tell, Helix seems to be building with
> https://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12/1.7.14 which in
> turn maps to https://mvnrepository.com/artifact/log4j/log4j/1.2.17
>
> The exploit is more prevalent in the 2.x versions of Log4J, but there are
> scenarios where 1.x is exploitable and it's been pointed out that 1.x is
> also end of life and has other vulnerabilities.
>
> See:
> https://github.com/apache/logging-log4j2/pull/608#issuecomment-990494126
>
> Thanks!
>
> ~Brent
>
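For downstream consumers of Helix who cannot wait for a patched release, one common stopgap is to exclude the transitive Log4j 1.x binding and route logging through a patched Log4j 2.x. A hedged sketch of the Maven configuration (artifact coordinates and version numbers are illustrative; verify them against your build):

```xml
<!-- Sketch: exclude the transitive Log4j 1.x binding pulled in via Helix. -->
<dependency>
  <groupId>org.apache.helix</groupId>
  <artifactId>helix-core</artifactId>
  <version>1.0.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Route SLF4J calls to a patched Log4j 2.x instead. -->
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-slf4j-impl</artifactId>
  <version>2.16.0</version>
</dependency>
```

Note this only addresses the consumer's own classpath; code that references Log4j 1.x classes directly (as mentioned elsewhere in this thread) still needs source changes.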


Re: [VOTE] Apache Helix 0.9.10 Release

2022-04-05 Thread Hunter Lee
+1

On Tue, Apr 5, 2022 at 1:10 PM Junkai Xue  wrote:

> Hi,
>
> We would like to do a retrospective vote for Apache Helix 0.9.10, since
> 0.9.10 is a replacement for 0.9.9, which did not reflect the latest changes
> when it was released.
>
> Content-wise, there is no change in the release notes for 0.9.10 from 0.9.9.
>
> * binaries:
> https://dist.apache.org/repos/dist/release/helix/0.9.10/binaries/
>
> * sources:
> https://dist.apache.org/repos/dist/release/helix/0.9.10/src/
>
> Thanks,
>
> The Apache Helix Team
>


Re: [VOTE] Apache Helix 1.0.3 Release

2022-04-18 Thread Hunter Lee
+1

On Mon, Apr 18, 2022 at 6:26 PM Junkai Xue  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as
> Apache Helix 1.0.3. This is the 24th release of Helix as an Apache
> project, as well as the 20th release as a top-level Apache project.
>
> Apache Helix is a generic cluster management framework that makes it
> easy to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> (*ATTENTION: Due to a webpage publishing problem, we are linking the
> release notes from the source tree to unblock the Apache release. We are
> asking Apache folks for help now!*)
> Release notes:
> https://github.com/apache/helix/blob/master/website/1.0.3/src/site/apt/releasenotes/release-1.0.3.apt
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1050
>
> Distribution:
> * binaries: https://dist.apache.org/repos/dist/dev/helix/1.0.3/binaries/
> * sources: https://dist.apache.org/repos/dist/dev/helix/1.0.3/src/
>
> The 1.0.3 release tag:
> https://gitbox.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-1.0.3
>
> KEYS file available here: https://dist.apache.org/repos/dist/dev/helix/KEYS
>
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>


Re: [VOTE] Apache Helix 1.0.4 Release

2022-05-12 Thread Hunter Lee
+1

On Thu, May 12, 2022 at 1:34 AM Junkai Xue  wrote:

> Hi,
>
> This is to call for a vote on releasing the following candidate as
> Apache Helix 1.0.4. This is the 25th release of Helix as an Apache
> project, as well as the 21st release as a top-level Apache project.
>
> Apache Helix is a generic cluster management framework that makes it
> easy to build partitioned and replicated, fault-tolerant and scalable
> distributed systems.
>
> Release notes:
> https://helix.apache.org/1.0.4-docs/releasenotes/release-1.0.4.html
>
> Release artifacts:
> https://repository.apache.org/content/repositories/orgapachehelix-1052
>
> Distribution:
> * binaries: https://dist.apache.org/repos/dist/dev/helix/1.0.4/binaries/
> * sources: https://dist.apache.org/repos/dist/dev/helix/1.0.4/src/
>
> The 1.0.4 release tag:
> https://gitbox.apache.org/repos/asf?p=helix.git;a=tag;h=refs/tags/helix-1.0.4
>
> KEYS file available here: https://dist.apache.org/repos/dist/dev/helix/KEYS
> Please vote on the release. The vote will be open for at least 72 hours.
>
> [+1] -- "YES, release"
> [0] -- "No opinion"
> [-1] -- "NO, do not release"
>
> Thanks,
> The Apache Helix Team
>