from:"Benjamin Reed"

zk-merge-pr.py and PR 1444

2021-03-01 Thread Benjamin Reed

i found an interesting issue when merging PR 1444: an unintended
icloud email popped up. (it looks like the simple retrigger commits
were done with that email.) i thought i could override it with the
author prompt but that only adds authors. i ended up modifying the
script slightly to account for it. i'm wondering if others have run
into this issue.

ben

Re: Re: [Commit Accident Case Study] Commit 4faf507 broke the build

2021-02-12 Thread Benjamin Reed

thank you for figuring this out! two questions:

1) is there a daily test build still running? if so, where can we see
its status?
2) what is the easiest way to retrigger tests? (sorry, i know i've
asked this before :'( )

ben

On Wed, Feb 10, 2021 at 2:18 AM Szalay-Bekő Máté
 wrote:
>
> > For punishment:
> > I will freeze/suspend my committership permission for three months
>
> I think you took this too seriously. Mistakes / accidents happen when
> someone is working (I did much more serious ones myself on different
> projects). And the community is grateful for the contribution, no one
> should expect perfection. At least I hope so, for my sake :p
>
> Independently of this issue, we really should focus on making our CI
> rock-solid. So if the CI is red, then we could assume the PR broke
> something. Currently I think flaky tests and independent CI issues are more
> frequently causing red builds than actual failures introduced by PRs.
>
> Cheers,
> Mate
>
> On Wed, Feb 10, 2021 at 10:26 AM Justin Ling Mao 
> wrote:
>
> > Haha, it scared me. Let me go through this accident.
> > The root cause is: I was over-confident, frivolous and hasty. I flattered
> > myself that it was just a typo and that committing it could not cause
> > anything bad to happen. I also didn't give this PR any buffer time for other
> > people's review.
> > An accident is bad, but it's much worse if we cannot reflect on it
> > and think about how to avoid it next time.
> > For remedy:
> > I will add a new section, "Commit Accident Case Study", in [1] for
> > successors to learn from (can anyone give me permission to edit that wiki?).
> > I will sum up our commit rules and the checklist to go through before committing a
> > patch, and do some work to use the GitHub CI and the commit script to
> > check/enforce these constraints.
> > For punishment:
> > I will freeze/suspend my committership permission for three months (02-10 ~
> > 05-10). During this period, I must not commit anything. I hope I can
> > reflect on my fault and gain a better understanding of the saying: "With
> > great power comes great responsibility"
> >
> > Reference:[1]
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
> > - Original Message -
> > From: Andor Molnar 
> > To: maoling199210...@sina.com
> > Cc: dev 
> > Subject: Re: Commit 4faf507 broke the build
> > Date: 2021-02-10 00:26
> >
> > I’m sorry Justin. There’s no excuse for a mistake like this. We should not
> > show mercy for anybody, otherwise it would erode the trust in our
> > community. Your committership is now revoked.
> > Just kidding. Don’t worry at all. ;-)
> > I reverted the patch, so now please create a new PR with all the required
> > changes included.
> > Also I second Enrico’s comment: if CI is in bad shape, we should fix it.
> > Regards,
> > Andor
> > > On 2021. Feb 9., at 13:45, Justin Ling Mao 
> > wrote:
> > >
> > > Oops, it's my fault. I'm very sorry for my mistake. Since the CI has been
> > in disorder these days and it was just a typo, I didn't wait for the CI check and
> > forgot that a UT already covered this change, even though I wrote that
> > code. It's all my mistake, and I will summarize our submission process and
> > this accident. I will write another letter to discuss the commit rules and
> > how to improve our code review throughput.
> > >
> > >
> > > - Original Message -
> > > From: Andor Molnar 
> > > To: DevZooKeeper 
> > > Subject: Commit 4faf507 broke the build
> > > Date: 2021-02-09 19:43
> > >
> > > Hi,
> > > I noticed that the latest commit 4faf507 ZOOKEEPER-4007: A typo in the
> > ZKUtil#validateFileInput method broke the build, because the unit test has
> > not been amended.
> > > I reverted the commit to fix the build. Please create a new PR with a
> > proper patch.
> > > Has the committer verified that the build is green before submitting it?
> > > Andor
> >

rebase and retest on github

2021-01-28 Thread Benjamin Reed

i would really like to get ZOOKEEPER-3922 in, but it needs to be rebased
and retested. is there a nice way to do that on github? or does the
pull requestor need to do that?

happy new year,
ben

[jira] [Created] (ZOOKEEPER-3922) Add support for two server ZooKeeper with hardware oracle

2020-08-27 Thread Benjamin Reed (Jira)

Benjamin Reed created ZOOKEEPER-3922:


 Summary: Add support for two server ZooKeeper with hardware oracle
 Key: ZOOKEEPER-3922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3922
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Benjamin Reed


Currently, we cannot really have ZooKeeper ensembles of size less than 3 and 
still tolerate failures. However, with hardware support for failure detection, 
we could support a 2 server ensemble and still tolerate the failure of one 
machine.
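The arithmetic behind "size less than 3": with plain majority quorums an ensemble of n servers needs floor(n/2)+1 votes, so 2 servers tolerate no failures, exactly like 1. The sketch below shows that arithmetic plus a deliberately simplified, hypothetical oracle rule (a lone survivor of a 2-server ensemble proceeds only if an external failure detector confirms its peer is down). The class and method names are invented for illustration; this is not the design proposed in the issue.

```java
// Hypothetical sketch: majority-quorum arithmetic and a simplified oracle rule.
final class QuorumMath {
    // Smallest majority of an ensemble of n voting servers.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Number of servers that can fail while a majority is still reachable.
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    // Assumed oracle rule, for illustration only: with 2 servers, a lone
    // survivor may proceed if the hardware oracle says its peer is down;
    // otherwise a normal majority is required.
    static boolean canMakeProgress(int ensembleSize, int alive, boolean oracleSaysPeerDown) {
        if (alive >= majority(ensembleSize)) {
            return true;
        }
        return ensembleSize == 2 && alive == 1 && oracleSaysPeerDown;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
        System.out.println("2 servers, 1 alive, oracle confirms peer down: "
                + canMakeProgress(2, 1, true));
    }
}
```

The loop makes the point of the issue visible: going from 1 to 2 servers buys no fault tolerance under majority voting, and 3 is the smallest ensemble that survives a failure without outside help.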



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: committing changes

2020-04-16 Thread Benjamin Reed

great thanx! i've put a reference to this page at the top of the other
page. (i'm glad we have a script!!!)

thanx
ben

On Thu, Apr 16, 2020 at 11:05 AM Enrico Olivelli  wrote:
>
> On Thu 16 Apr 2020, 19:39 Benjamin Reed  wrote:
>
> > i want to start getting christopher's maven changes in! starting with
> > https://github.com/apache/zookeeper/pull/1313. it's been a while since
> > I've actually pushed. i wanted to make sure
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Committing+changes
> > is still up to date before i followed the instructions there. (i was kind
> > of hoping it got easier. the CHANGES.TXT thing is a pain!) can another
> > committer confirm?
> >
>
> Once github reports a successful build on Jenkins we are good to go.
>
> In order to commit the patch just use the script
>
> python3 zk-merge-pr.py
>
> The right guide is
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Merging+Github+Pull+Requests
>
> Feel free to ping me on slack if you have problems.
>
> Enrico
>
>
>
> > thanx
> > ben
> >

committing changes

2020-04-16 Thread Benjamin Reed

i want to start getting christopher's maven changes in! starting with
https://github.com/apache/zookeeper/pull/1313. it's been a while since
I've actually pushed. i wanted to make sure
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Committing+changes
is still up to date before i followed the instructions there. (i was kind
of hoping it got easier. the CHANGES.TXT thing is a pain!) can another
committer confirm?

thanx
ben

Re: Contribs as separate git repos

2020-04-10 Thread Benjamin Reed

did you mean to mention the c client in the list of things to move to
another repo? it's not in contribs right now.

i think it would be nice to clean up contrib a bit. i don't really
support moving it to a separate repo. the nice thing about keeping it
in the same repo is that when there is a change that might break
contrib, we can catch it.

one problem with "cleaning" contrib is that different people have
different perspectives on the usefulness. for example, i only use
fatjar files. it makes everything easier when using zookeeper. on the
other hand, i never use the python binding; i wouldn't recommend it to
anyone either: kazoo is just too good!

having said all that, i'm extremely happy you are working on this
Christopher! your maven cleanups and enhancements are great!

ben

On Thu, Apr 9, 2020 at 7:57 PM Christopher  wrote:
>
> On Thu, Apr 9, 2020 at 7:21 AM Enrico Olivelli  wrote:
> >
> > Answers inline
> >
> > On Thu 9 Apr 2020, 05:28 Christopher  wrote:
> >
> > > On Wed, Apr 8, 2020 at 2:01 PM Damien Diederen 
> > > wrote:
> > > >
> > > >
> > > > Hi Christopher,
> > > >
> > > > > I am just curious if anybody has thought about, or perhaps discussed,
> > > > > the idea that the projects in the zookeeper-contrib folder should be
> > > > > in their own separate git repos?
> > > >
> > > > We were discussing this a few days ago:
> > > >
> > > > https://github.com/apache/zookeeper/pull/1068#issuecomment-607160440
> > > >
> > > > My (only) concern was that I wouldn't want to see contribs *even more*
> > > > abandoned than they are now:
> > >
> >
> > Yep
> > But if no one contributes to them, it is better to drop them from master.
> >
> > >
> > > That's a fair concern. It is always sad to see code get abandoned.
> > > Moving them out won't solve a "lack of interest" problem. Apache is
> > > composed of volunteers... and sometimes interest in a project withers.
> > > But, it can help organize whatever remaining (or future) interest
> > > there is by decoupling the contrib and presenting it as a smaller,
> > > more focused project.
> > >
> > > >
> > > > >> While I wouldn't be opposed to moving "unpopular" bindings to their
> > > > >> own repository, it would probably only make sense to do so if merge
> > > > >> rules are somewhat relaxed—as I suspect it would otherwise be even
> > > > >> more difficult to meet the "two PMC approvals" threshold.
> > >
> >
> > We need +1s from two committers, not strictly PMC members.
> > This sets the quality bar for our product:
> > everything we deliver must meet the same level.
>
> Are those different for ZooKeeper? I'm not sure what the norm is here.
>
> For Accumulo and Fluo, we invite people to be PMC at the same time as
> committer. (It's easier, because ASF policy is private@ is for PMC,
> and PMC status is required to vote on releases.) But, I know lots of
> projects keep them separate... as sort of a tiered merit system. Both
> have their pros and cons.
>
> >
> > Personally I think it is good to keep the python binding, and maybe to make it a
> > sibling of the C client inside the zookeeper-clients module.
> > I saw recent activity on fat-jar.
> >
> > Other modules seem abandoned, so I see no value in keeping them.
> >
>
> It would seem kind of wasteful to create a new repo for contrib
> modules that are already known to be abandoned. They could just be
> dropped. They can always be restored from the history and be moved
> into their own repo if there is renewed interest in future.
>
> I guess there are several options, to mix-and-match:
>
> 1. delete an already abandoned contrib module from the source tree
> (can restore from history if desired),
> 2. move a contrib to a new git repo to isolate it from the core build, or
> 3. keep in the main build... but maybe reorganize, as needed to
> improve the build in other ways (better maven profiles?).
>
> Personally, I think option 2 would be good for the C client and the
> python binding, as contributors often have expertise in one language
> over another, and it seems that these wouldn't need to be released as
> often as the main project. So, they could be more easily maintained
> and released independently.
>
> But, I'm not really in a position to assess the current state of any
> particular contrib. Until a few were mentioned here, I hadn't even
> looked closely enough to know how many there were or what they each
> were for. If I were to help with any of this, I'd rely heavily on
> current expertise about the state of existing contribs.

Re: Decrease number of threads in Jenkins builds to reduce flakyness

2018-10-12 Thread Benjamin Reed

i think the unique port assignment (d) is more problematic than it
appears. there is a race between finding a free port and actually
grabbing it. i think that contributes to the flakiness.
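The race can be reproduced with plain java.net.ServerSocket. A minimal sketch, assuming a test helper that probes for a free port and then releases it; PortRaceSketch and its methods are hypothetical and are not ZooKeeper's actual test port assignment code. The probe pattern leaves a window in which another test can take the port; keeping the bound socket and handing it over closes that window.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch of the "find a free port, then grab it" race.
final class PortRaceSketch {
    // Racy pattern: bind to port 0, read the port, and close the socket.
    // Between this return and the later bind, anyone else can take the port.
    static int findFreePortRacy() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    // Safer pattern: bind once and keep the socket, so there is no gap
    // between choosing the port and owning it.
    static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress("127.0.0.1", 0));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        int port = findFreePortRacy();
        // Simulate a concurrently running test stealing the port first.
        try (ServerSocket thief = new ServerSocket(port)) {
            try (ServerSocket mine = new ServerSocket(port)) {
                System.out.println("unexpectedly bound " + port);
            } catch (IOException e) {
                System.out.println("lost the race for port " + port + ": " + e.getMessage());
            }
        }
        try (ServerSocket held = bindEphemeral()) {
            System.out.println("holding port " + held.getLocalPort());
        }
    }
}
```

Running tests on more threads widens the window only in the sense that more tests are probing and binding at the same time, which fits the observation that fewer threads looks less flaky.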

ben
On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar  wrote:
>
> That is a completely valid point. I started to investigate flakies for 
> exactly the same reason, if you remember the thread that I started a while 
> ago. It was later abandoned unfortunately, because I’ve run into a few issues:
>
> - We nailed down that in order to release 3.5 stable, we have to make sure 
> it’s not worse than 3.4 by comparing the builds: but these builds are not 
> comparable, because 3.4 tests run single-threaded while 3.5 tests run 
> multithreaded, exposing problems which might also exist on 3.4,
>
> - Neither of them runs the C++ tests for some reason, but that’s not really an 
> issue here,
>
> - Looks like tests on 3.5 are just as solid as on 3.4, because running them on 
> a dedicated, single-threaded environment shows almost all tests succeeding,
>
> - I think the root cause of failing unit tests could be one (or more) of the 
> following:
> a) Environmental: Jenkins slave gets overloaded with other builds and 
> multithreaded test running makes things even worse: starving JDK threads and 
> ZK instances (both clients and servers) are unable to operate
> b) Conceptual: ZK unit tests were not designed to run on multiple 
> threads: I investigated the unique port assignment feature, which looks 
> good, but there could be other gaps that make them unreliable when 
> running simultaneously.
> c) Bad testing: testing ZK in the wrong way, making bad assumptions 
> (e.g. not syncing clients), etc.
> d) Bug in the server.
>
> I feel that finding case d) with these tests is super hard, because a test 
> report doesn’t give any information on what could go wrong with ZooKeeper. 
> More or less guessing is your only option.
>
> Finding c) is a little bit easier; I’m trying to submit patches for them and 
> hopefully making some progress.
>
> The huge pain in the arse, though, is a) and b): people desperately keep 
> commenting “please retest this” on github to get a green build, while testing 
> is heading in a direction that hides real problems: I mean people have stopped 
> caring about a failing build, because “it must be some flaky test unrelated to my 
> patch”. Which is bad, but the shame is it’s true in 90% of cases.
>
> I’m just trying to find some ways - besides fixing c) and d) flakies - to get 
> more reliable and more informative Jenkins builds. Don’t want to make a huge 
> turnaround, but I think if we can get a significantly more reliable build for 
> the price of slightly longer build time running on 4 threads instead of 8, I 
> say let’s do it.
>
> As always, any help from the community is more than welcome and appreciated.
>
> Thanks,
> Andor
>
>
>
>
> > On 2018. Oct 12., at 16:52, Patrick Hunt  wrote:
> >
> > iirc the number of threads was increased to improve performance. Reducing
> > is fine, but do we understand why it's failing? Perhaps it's finding real
> > issues as a result of the artificial concurrency/load.
> >
> > Patrick
> >
> > On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar 
> > wrote:
> >
> >> Thanks for the feedback.
> >> I'm running a few tests now: branch-3.5 on 2 threads and trunk on 4 threads
> >> to see what's the impact on the build time.
> >>
> >> Github PR job is hard to configure, because its settings are hard coded
> >> into a shell script in the codebase. I have to open a PR for that.
> >>
> >> Andor
> >>
> >>
> >>
> >> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
> >> nkal...@cloudera.com.invalid> wrote:
> >>
> >>> +1, running the tests locally with 1 thread always passes (well, I ran it
> >>> about 5 times, but still)
> >>> On the other hand, running it on 8 threads yields similarly flaky results
> >>> as Apache runs. (It is much faster, but not if we sometimes have to run it
> >>> 6-8-10 times to get a green run...)
> >>>
> >>> Norbert
> >>>
> >>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli 
> >>> wrote:
> >>>
>  +1
> 
>  Enrico
> 
>  On Fri 12 Oct 2018, 13:52 Andor Molnar  wrote:
> 
> > Hi,
> >
> > What do you think of changing the number of threads running unit tests in
> > Jenkins from the current 8 to 4 or even 2?
> >
> > Running unit tests inside the Cloudera environment on a single thread
> > makes the builds much more stable. That would probably be too slow, but maybe
> > running on at least fewer threads would improve the situation.
> >
> > It's getting very annoying that I cannot get a green build on GitHub
> > with only a few retests.
> >
> > Regards,
> > Andor
> >
>  --
> 
> 
>  -- Enrico Olivelli
> 
> >>>
> >>
>

[jira] [Commented] (ZOOKEEPER-3108) use a new property server.id in the zoo.cfg to substitute for myid file

2018-09-14 Thread Benjamin Reed (JIRA)



[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615566#comment-16615566
 ] 

Benjamin Reed commented on ZOOKEEPER-3108:
--

the reason we kept myid out of the config is so that all the servers can use 
the same configuration file. the id would then be tied to the data.

the id of the server should be a rather permanent thing. for example, if you 
have an ensemble that has ids host1=1, host2=2, host3=3 and an observer with id 
of host4=4, today to make host3 an observer and host4 a participant you have to 
go through reconfiguration. with this id configuration option it is tempting to 
just change the ids (host4=3 and host3=4). this can result in data loss or 
corruption.

it's not a show stopper, but we do need to document it properly: even though 
the id can be set via the configuration file, it should be considered bound to 
the data directory.
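One way to keep the id bound to the data directory even when it comes from zoo.cfg is to record it next to the data on first start and refuse to start when the configured id later disagrees. A minimal sketch under that assumption: `myid` is the real file name ZooKeeper uses in dataDir, but ServerIdBinding and bindServerId are hypothetical and not part of ZooKeeper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a configured server id stays bound to its data directory.
final class ServerIdBinding {
    static long bindServerId(Path dataDir, long configuredId) throws IOException {
        Path myid = dataDir.resolve("myid");
        if (Files.exists(myid)) {
            String stored = new String(Files.readAllBytes(myid), StandardCharsets.US_ASCII).trim();
            long storedId = Long.parseLong(stored);
            if (storedId != configuredId) {
                // Refuse to start: this data belongs to a different server id,
                // e.g. someone swapped host3=4 and host4=3 instead of reconfiguring.
                throw new IllegalStateException("configured id " + configuredId
                        + " does not match id " + storedId + " stored in " + myid);
            }
            return storedId;
        }
        // First start with this data directory: record the id alongside the data.
        Files.createDirectories(dataDir);
        Files.write(myid, Long.toString(configuredId).getBytes(StandardCharsets.US_ASCII));
        return configuredId;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-data");
        System.out.println("bound id " + bindServerId(dir, 3));
        try {
            bindServerId(dir, 4);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```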

> use a new property server.id in the zoo.cfg to substitute for myid file
> ---
>
> Key: ZOOKEEPER-3108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using ZK in distributed mode, we need to create a myid file in 
> dataDir and write a unique number to it. This is inconvenient and not 
> user-friendly. Look at an example from another distributed system such as 
> Kafka: it just uses broker.id=0 in server.properties to identify a unique 
> server node. This issue is going to abandon the myid file and use a new 
> property such as server.id=0 in zoo.cfg. This fix will be applied to the 
> master branch and branch-3.5+,
> keeping branch-3.4 unchanged.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Jute max buffer size related question

2018-08-24 Thread Benjamin Reed

we stop reading the socket once we hit max buffer size so we don't
overflow memory. it was put in when a buggy client caused the server to
think it was getting a 1G packet and ran out of memory trying to
allocate memory for it. in theory we could read in the data and just
drop it on the floor. this would allow us to get to the next packet,
but really this is a sanity check. if the packets are coming in that
big, the client is insane, so we need to drop them.
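The sanity check amounts to validating a packet's length prefix before allocating a buffer for it. ZooKeeper exposes the limit as the `jute.maxbuffer` system property; the FrameReader class and its error message below are hypothetical, a sketch of the idea rather than the server's actual read path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: length-prefixed read with a sanity limit.
final class FrameReader {
    static byte[] readFrame(DataInputStream in, int maxBuffer) throws IOException {
        int len = in.readInt(); // 4-byte big-endian length prefix
        if (len < 0 || len > maxBuffer) {
            // Reject before allocating: a bogus 1G length would otherwise make
            // us try to allocate 1G. The caller then closes the connection.
            throw new IOException("Len error " + len);
        }
        byte[] payload = new byte[len];
        in.readFully(payload);
        return payload;
    }

    // Helper that builds a length-prefixed frame, for the demo and tests.
    static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int maxBuffer = 16;
        byte[] ok = frame("hello".getBytes(StandardCharsets.US_ASCII));
        byte[] read = readFrame(new DataInputStream(new ByteArrayInputStream(ok)), maxBuffer);
        System.out.println("read " + read.length + " bytes");

        // A header claiming 1 GiB is rejected without allocating the payload.
        ByteArrayOutputStream bogus = new ByteArrayOutputStream();
        new DataOutputStream(bogus).writeInt(1 << 30);
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(bogus.toByteArray())), maxBuffer);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Reading and discarding the oversized payload instead would keep the stream in sync, but it still means trusting a length the sender got wrong, which is why dropping the connection is the simpler choice.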

ben
On Thu, Aug 23, 2018 at 11:01 PM Karan Mehta  wrote:
>
> Hello everyone,
>
> Why do we close the clientCnxn whenever a client sends a request whose
> payload is larger than the jute max buffer size? (and similarly for the client)
>
> Is it a security issue if we send a relevant KeeperException instead? Even
> better, could we send the parameter value to the client so that the client can
> chunk up the request accordingly? If not, can somebody elaborate on the reason?
>
> Thanks
> Karan

[jira] [Resolved] (ZOOKEEPER-3104) Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync

2018-08-03 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3104.
--
   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 583
[https://github.com/apache/zookeeper/pull/583]

> Potential data inconsistency due to NEWLEADER packet being sent too early 
> during SNAP sync
> --
>
> Key: ZOOKEEPER-3104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, in SNAP sync, the leader starts queuing the proposals/commits 
> and the NEWLEADER packet before sending the snapshot over the wire. So it's 
> possible that the zxid associated with the snapshot is higher than all 
> the packets queued before NEWLEADER.
>  
> When the follower receives the snapshot, it applies all the txns queued 
> before NEWLEADER, which may not cover all the txns up to the zxid in the 
> snapshot. After that, it writes the snapshot out to disk with the zxid 
> associated with the snapshot. If the server crashes after writing this 
> out, then when loading the data from disk it will use the zxid of the snapshot 
> file to sync with the leader, which could cause data inconsistency, because we 
> only replayed part of the historical data during the previous sync.
>  
> The NEWLEADER packet means the learner now has the correct and almost up-to-date 
> state of the leader, so it makes more sense to move the NEWLEADER packet after 
> sending the snapshot, and this is what we did in the fix.
>  
> Besides this, the socket timeout is changed to the smaller sync timeout after 
> receiving the NEWLEADER ack. In high write traffic ensembles with large snapshots, 
> the follower might be timed out by the leader before all of those queued txns 
> have been sent over after the snapshot is written out, which could leave the 
> follower in the syncing state forever. Moving the NEWLEADER packet after sending 
> the snapshot avoids this issue as well.




[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2018-07-30 Thread Benjamin Reed (JIRA)



[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561968#comment-16561968
 ] 

Benjamin Reed commented on ZOOKEEPER-3036:
--

can you give a bit more detail as to what happened? there were 3 servers. it 
sounds like one of the followers failed, right? the leader should keep working 
with the other follower alive. did the leader actually shutdown as well?

> Unexpected exception in zookeeper
> -
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
>Reporter: Oded
>Priority: Critical
>
> We had an issue with one of the zookeepers (Leader), causing the entire kafka 
> cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
> /192.168.0.91:42490 
>  
> We would expect that zookeeper will choose another Leader and the Kafka 
> cluster will continue to work as expected, but that was not the case.
>  




[jira] [Resolved] (ZOOKEEPER-3095) Connect string fix for non-existent hosts

2018-07-27 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3095.
--
Resolution: Fixed

Issue resolved by pull request 579
[https://github.com/apache/zookeeper/pull/579]

> Connect string fix for non-existent hosts
> -
>
> Key: ZOOKEEPER-3095
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3095
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: other
>Affects Versions: 3.4.0
>Reporter: Mohamed Jeelani
>Assignee: Mohamed Jeelani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Connect string fix for non-existent hosts




[jira] [Resolved] (ZOOKEEPER-3072) Race condition in throttling

2018-07-27 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3072.
--
Resolution: Fixed

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the server throttling code. It is possible that 
> the disableRecv is called after enableRecv.
> Basically, the I/O worker thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() will create a logical request and en-queue it so that 
> Processor thread can pick it up. After being de-queued by Processor thread, 
> it does necessary handling, and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), so that Processor thread completes 
> cnxn.sendResponse() call before I/O thread switches back, then enableRecv() 
> will happen before disableRecv(), and enableRecv() will fail the CAS ops, 
> while disableRecv() will succeed, resulting in a deadlock: un-throttle is 
> needed for letting in requests, and sendResponse is needed to trigger 
> un-throttle, but sendResponse() requires an incoming message. From that point 
> on, ZK server will no longer select the affected client socket for read, 
> leading to the observed client-side failure in the subject.
> If you would like to reproduce this, setting globalOutstandingLimit 
> down to 1 makes it easier to reproduce, as throttling starts with fewer 
> requests. 
>  
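The interleaving above can be replayed deterministically in a simplified, single-threaded model, assuming the throttle state is one boolean flipped with compareAndSet (the real NIOServerCnxn code is more involved, and ThrottleRaceSketch is a hypothetical name). If the un-throttle CAS runs before the throttle CAS, the first fails and the second succeeds, leaving the connection throttled with no response left to release it. Counting the request as outstanding before submitting it is one ordering that avoids this; that is an assumption here, not necessarily the change merged for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the throttle flag and the racy interleaving.
final class ThrottleRaceSketch {
    private final AtomicBoolean throttled = new AtomicBoolean(false);

    // Limit breached (incrOutstandingRequests path): turn throttling on.
    boolean disableRecv() {
        return throttled.compareAndSet(false, true);
    }

    // Response sent (sendResponse path): turn throttling off again.
    boolean enableRecv() {
        return throttled.compareAndSet(true, false);
    }

    boolean isThrottled() {
        return throttled.get();
    }

    public static void main(String[] args) {
        // Bad interleaving: the Processor thread's un-throttle check runs
        // before the I/O thread gets back to incrOutstandingRequests().
        ThrottleRaceSketch racy = new ThrottleRaceSketch();
        boolean enabled = racy.enableRecv();   // CAS fails: not throttled yet
        boolean disabled = racy.disableRecv(); // CAS succeeds: now throttled
        System.out.println("enable=" + enabled + " disable=" + disabled
                + " stuck=" + racy.isThrottled());

        // Intended order: throttle first, then the response un-throttles.
        ThrottleRaceSketch ordered = new ThrottleRaceSketch();
        ordered.disableRecv();
        ordered.enableRecv();
        System.out.println("stuck=" + ordered.isThrottled());
    }
}
```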




[jira] [Updated] (ZOOKEEPER-3072) Race condition in throttling

2018-07-27 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-3072:
-
Fix Version/s: 3.5.4
   3.6.0


[jira] [Resolved] (ZOOKEEPER-3061) add more details to 'Unhandled scenario for peer' log.warn message

2018-07-27 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3061.
--
   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 555
[https://github.com/apache/zookeeper/pull/555]

> add more details to 'Unhandled scenario for peer' log.warn message
> --
>
> Key: ZOOKEEPER-3061
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3061
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-3061.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A few lines earlier, the {{LOG.info("Synchronizing with Follower sid: ...}} 
> logging already contains most relevant details, but it would be convenient to 
> have the full details directly in the {{LOG.warn("Unhandled scenario for 
> peer sid: ...}} itself.




[jira] [Resolved] (ZOOKEEPER-3073) fix couple of typos

2018-07-10 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3073.
--
Resolution: Fixed

> fix couple of typos
> ---
>
> Key: ZOOKEEPER-3073
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3073
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Saw a number of open pull requests concerning typos but without an associated
> JIRA ticket, and so I'm taking the opportunity here to gather them up (where
> not already otherwise taken care of), plus a couple of additions I noticed
> whilst my other code was doing its compiling-and-testing thing.




[jira] [Resolved] (ZOOKEEPER-3078) Remove unused print_completion_queue function

2018-07-10 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3078.
--
Resolution: Fixed

> Remove unused print_completion_queue function
> -
>
> Key: ZOOKEEPER-3078
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3078
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.5.4
>Reporter: Kent R. Spillner
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The function print_completion_queue in zookeeper.c causes compilation errors 
> with GCC 8.  However, this function is unused and can safely be removed.




[jira] [Resolved] (ZOOKEEPER-3079) Fix unsafe use of sprintf(3) for creating IP address strings

2018-07-10 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3079.
--
Resolution: Fixed

> Fix unsafe use of sprintf(3) for creating IP address strings
> 
>
> Key: ZOOKEEPER-3079
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3079
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.5.4
>Reporter: Kent R. Spillner
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The function format_endpoint_info in zookeeper.c causes compiler errors when 
> building with GCC 8 due to a potentially unsafe use of sprintf(3).




[jira] [Resolved] (ZOOKEEPER-2886) Permanent session moved error in multi-op only connections

2018-07-10 Thread Benjamin Reed (JIRA)



 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2886.
--
Resolution: Fixed

> Permanent session moved error in multi-op only connections
> --
>
> Key: ZOOKEEPER-2886
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2886
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If there are slow followers, it's possible that the leader and the client
> disagree about which server the client is connected to, so the client keeps
> getting "Session Moved" errors. Part of the issue was fixed in
> ZOOKEEPER-710, but the issue remains for multi-op-only connections.




rerunning jenkins test

2018-06-25 Thread Benjamin Reed

is there a button on jenkins somewhere that we can push to rerun a test?

thanx
ben

[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-11 Thread Benjamin Reed (JIRA)



[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508441#comment-16508441
 ] 

Benjamin Reed commented on ZOOKEEPER-3056:
--

how are you getting into a state where you have a log file but no snapshot? is it
that a machine starts up with no data and then diff syncs with the leader? or
is there another case that i'm missing?

trying to use a txn log with no base snapshot seems fraught with danger.

 

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing 
> snapshot file.
> The code that complains about the missing snapshot file is
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206]
>  which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even if the
> transaction log files are present and valid. This could be a problem for
> restoring a ZK instance which does not have a snapshot file but has a sound
> state (e.g. it crashes before being able to take its first snapshot with a
> large snapCount parameter configured).




Re: Meetup at Cloudera

2017-10-18 Thread Benjamin Reed

is there still going to be a meetup tomorrow? i don't see an
announcement anywhere.

ben

On Fri, Oct 13, 2017 at 12:02 PM, Jordan Zimmerman
 wrote:
> Damn - I'm in Germany - that's 2 in the morning. Can't make it. If it's the 
> week after, though, I can do it.
>
> -Jordan
>
>> On Oct 13, 2017, at 9:00 PM, Abraham Fine  wrote:
>>
>> The current plan is 5PM-8PM PST, is that acceptable?
>>
>> Abe
>>
>> On Fri, Oct 13, 2017, at 11:43, Jordan Zimmerman wrote:
>>> OK - however, it's turning out that Oct 19 may not work for me. It depends
>>> on the time. The following week is much better. FYI
>>>
 On Oct 13, 2017, at 8:00 PM, Abraham Fine  wrote:

 Hey Jordan-

 I think having a presentation on persistent watches would be great.

 I'll send out info on joining remotely as soon as it's available to me.

 Thanks,
 Abe

 On Wed, Oct 11, 2017, at 00:40, Jordan Zimmerman wrote:
> As usual I'd like to attend remotely if that's possible. I'm in Europe
> until the 25th though but if it's at the right time I can present on some
> nice new features in Curator or possibly the work I've been doing for
> Persistent Watches in ZooKeeper itself.
>
> -Jordan
>
>> On Oct 11, 2017, at 1:06 AM, Abraham Fine  wrote:
>>
>> Hello ZooKeeper Community-
>>
>> It has been a while since our last meetup and it would be great to bring
>> everyone together again. Cloudera would be able to host a meetup at our
>> headquarters in Palo Alto, CA next week (I'm thinking 10/19).
>>
>> I was hoping to use the mailing lists to gauge interest. Please reply if
>> you think you would be able to attend or would prefer a different date.
>>
>> Looking forward to hearing from everyone.
>>
>> Thanks,
>> Abe
>
>>>
>

[jira] [Resolved] (ZOOKEEPER-2772) Delete node command does not honor Acl policy

2017-05-17 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2772.
--
Resolution: Not A Bug

> Delete node command does not honor Acl policy
> -
>
> Key: ZOOKEEPER-2772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2772
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.4.8, 3.4.10
>Reporter: joe smith
>
> I set the acl to not be able to delete a node - but was able to delete 
> regardless.
> I am not familiar with the code, but a reply from Martin in the user@ mailing 
> list seems to confirm the issue.  I will paste his response below - sorry for 
> the long listing.
> Martin's replies are inline, prefixed with: MG>
> --
> From: joe smith <water4...@yahoo.com.INVALID>
> Sent: Tuesday, May 2, 2017 8:40 AM
> To: u...@zookeeper.apache.org
> Subject: Acl block detete not working
> Hi,
> I'm using 3.4.10 and setting a custom acl to block deletion of a znode.
> However, I'm able to delete the node even after I've set the acl from cdrwa to
> cra.
> Can anyone point out whether I missed a step?
> Thanks for the help
> Here is the trace:
> [zk: localhost:2181(CONNECTED) 0] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 1] create /test "data"
> Created /test
> [zk: localhost:2181(CONNECTED) 2] ls /
> [zookeeper, test]
> [zk: localhost:2181(CONNECTED) 3] addauth myfqdn localhost
> [zk: localhost:2181(CONNECTED) 4] setAcl /test myfqdn:localhost:cra
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> MG>in SetAclCommand you can see the acl being parsed and acl being set by 
> setAcl into zk object
> List<ACL> acl = AclParser.parse(aclStr);
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> Stat stat = zk.setACL(path, acl, version);
> MG>later on in DeleteCommand there is no check for aforementioned acl 
> parameter
>   public boolean exec() throws KeeperException, InterruptedException {
> String path = args[1];
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> zk.delete(path, version);
> } catch(KeeperException.BadVersionException ex) {
> err.println(ex.getMessage());
> }
> return false;
> MG>as seen here the testCase works properly saving the Zookeeper object
> LsCommand entity = new LsCommand();
> entity.setZk(zk);
> MG>but setACL does not save the zookeeper object anywhere but instead seems 
> to discard zookeeper object with accompanying ACLs
> MG>can you report this bug to Zookeeper?
> https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> MG>Thanks Joe!
> [zk: localhost:2181(CONNECTED) 5] getAcl /test
> 'myfqdn,'localhost
> : cra
> [zk: localhost:2181(CONNECTED) 6] get /test
> data
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> [zk: localhost:2181(CONNECTED) 7] set /test "testwrite"
> Authentication is not valid : /test
> [zk: localhost:2181(CONNECTED) 8] delete /test
> [zk: localhost:2181(CONNECTED) 9] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 10]
> The auth provider implementation is here:
> http://s000.tinyupload.com/?file_id=42827186839577179157



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (ZOOKEEPER-2772) Delete node command does not honor Acl policy

2017-05-13 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009215#comment-16009215
 ] 

Benjamin Reed commented on ZOOKEEPER-2772:
--

this appears to be a misunderstanding of what the DELETE acl protects. CREATE 
and DELETE are about restricting operations on children of the znode, not the 
znode itself.
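A toy Java model can make the distinction concrete. `AclModel` below is hypothetical (not ZooKeeper's server code) and ignores authentication, versions and the non-empty-node check; it only encodes the rule that CREATE and DELETE are checked against the parent znode's ACL, while WRITE and ADMIN are checked against the znode itself. Replaying the reported trace, `set` on /test is denied but `delete` succeeds, because the delete check runs against `/`.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not ZooKeeper's server code) of which znode's ACL
 * each operation is checked against. It ignores authentication schemes,
 * versions and the "node has children" check.
 */
class AclModel {
    enum Perm { CREATE, DELETE, READ, WRITE, ADMIN }

    private final Map<String, Set<Perm>> acls = new HashMap<>();

    AclModel() {
        acls.put("/", EnumSet.allOf(Perm.class)); // root open to all, like cdrwa
    }

    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    private static Set<Perm> copy(Set<Perm> acl) {
        Set<Perm> s = EnumSet.noneOf(Perm.class);
        s.addAll(acl);
        return s;
    }

    private void require(String path, Perm perm) {
        Set<Perm> acl = acls.get(path);
        if (acl == null || !acl.contains(perm)) {
            throw new SecurityException("no " + perm + " permission on " + path);
        }
    }

    void create(String path, Set<Perm> acl) {
        require(parent(path), Perm.CREATE); // checked on the parent
        acls.put(path, copy(acl));
    }

    void delete(String path) {
        require(parent(path), Perm.DELETE); // checked on the parent, not the node
        acls.remove(path);
    }

    void setData(String path) {
        require(path, Perm.WRITE); // checked on the node itself
    }

    void setAcl(String path, Set<Perm> acl) {
        require(path, Perm.ADMIN); // checked on the node itself
        acls.put(path, copy(acl));
    }

    boolean exists(String path) {
        return acls.containsKey(path);
    }

    /** Replays the report: returns {setDataDenied, deleteSucceeded}. */
    static boolean[] replayReport() {
        AclModel zk = new AclModel();
        zk.create("/test", EnumSet.allOf(Perm.class));
        zk.setAcl("/test", EnumSet.of(Perm.CREATE, Perm.READ, Perm.ADMIN)); // "cra"
        boolean setDataDenied = false;
        try {
            zk.setData("/test");
        } catch (SecurityException e) {
            setDataDenied = true;
        }
        zk.delete("/test"); // allowed: "/" still grants DELETE
        return new boolean[] {setDataDenied, !zk.exists("/test")};
    }

    /** Removing DELETE from the parent is what protects the child. */
    static boolean deleteBlockedByParentAcl() {
        AclModel zk = new AclModel();
        zk.create("/parent", EnumSet.allOf(Perm.class));
        zk.create("/parent/child", EnumSet.allOf(Perm.class));
        zk.setAcl("/parent", EnumSet.complementOf(EnumSet.of(Perm.DELETE)));
        try {
            zk.delete("/parent/child");
            return false;
        } catch (SecurityException e) {
            return zk.exists("/parent/child");
        }
    }
}
```

In the model, protecting /parent/child from deletion means dropping DELETE from the ACL of /parent, which is what `deleteBlockedByParentAcl()` checks.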





Re: Ever considered using buck to build?

2017-05-08 Thread Benjamin Reed

from patrick's response: "My intent at this point is not to replace
anything, just to add the ability to build with buck." this isn't to
move to a boutique build system. patrick is just suggesting that he
can make available the build files we use inside facebook for others
to try.

ben

On Mon, May 8, 2017 at 12:39 PM, Jordan Zimmerman
 wrote:
> I beg you not to move to a boutique build system. Stay in Ant or go to Maven.
>
> 
> Jordan Zimmerman
>
>> On May 8, 2017, at 9:16 PM, Patrick White  wrote:
>>
>>
>> To address some points from over the weekend:
>>
 I thought we were moving to Maven
>>
>>
>> Yep. Nothing needs to change, and this doesn't impede those plans at all.
>>
>>
 Does it work on Jenkins?
>>
>>
>> Again, by no means an expert. I downloaded jenkins and set up a test project 
>> to build with buck. Seems to work?
>>
 Doesn't build a release-style tarball
>>
>>
>> I took a first cut at this yesterday, and was able to build something that 
>> looks similar to the release tarball. There's still some layout matching to 
>> do, but it's moved from 'can it be done?' to 'just needs doing'. I'll keep 
>> chipping away at it.
>>
>> 
>> From: Michael Han 
>> Sent: Friday, May 5, 2017 4:10:03 PM
>> To: dev@zookeeper.apache.org
>> Subject: Re: Ever considered using buck to build?
>>
 I thought we were moving to Maven at some point. Did that get sidelined?
>>
>> I think moving to maven is still the plan and there is definitely a lot of
>> interest in this - see ZOOKEEPER-1078
>>
>> On Fri, May 5, 2017 at 3:39 PM, Jordan Zimmerman >> wrote:
>>
>>> I thought we were moving to Maven at some point. Did that get sidelined?
>>>
>>> -Jordan
>>>
 On May 5, 2017, at 6:02 PM, Michael Han  wrote:

 Is this proposal intended to use BUCK to replace ant someday, or just add
 BUCK as an alternative build system? I thought it's not replacing ant, but
 I want to double check, because choosing a build system and supporting
 multiple build systems are different topics.


> On Fri, May 5, 2017 at 2:52 PM, Patrick White  wrote:
>
> My bad, I'll clarify.
>
>
> Internally, we build and test with buck, but we don't worry about the
> bin, conf, share, etc. folders. So it's possible (and I'll certainly do it
> if there's interest); we just haven't put effort behind it because... well,
> we don't use it that way.
>
> re: jenkins. u... I'll have to get back to you on that one. (never
> used it, but I'll go download it and see what shakes loose)
>
> 
> From: Camille Fournier 
> Sent: Friday, May 5, 2017 2:11:15 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Ever considered using buck to build?
>
> Did you... Just list as a con that actually it currently won't work?
>
> Does it work on Jenkins?
>
>> On May 5, 2017 4:51 PM, "Patrick White"  wrote:
>>
>> Howdy! I'm Patrick from the core systems team at Facebook, and I work on
>> ZooKeeper and ZooKeeper accessories all day long.
>>
>> Proposal: I want to add BUCK files to the zookeeper source tree.
>>
>>
>> Hear me out:
>>
>> TL; DR - I want to hear everyone's thoughts and opinions on the matter.
>>
>>
>> At Facebook, we use buck (buckbuild.com) to build everything. Buck turns
>> out to be a really nice build system. It's easy to set up and super fast. I
>> love buck.
>>
>>
>> Ben put together some nice BUCK files that we use internally to build
>> zookeeper and zkcli. Since we're already working to sync back with
>> upstream, we'd love to get them in.
>>
>>
>> Pros:
>>
>> Buck files are a lot easier to work with than maven, ant, or anything else
>>
>> Buck's fast
>>
>> These files do absolutely nothing for or against people who want to use
>> maven or ant
>>
>> 'java_binary' generates a single executable file containing all the jars
>>
>>
>> Cons:
>>
>> Not one of the "conventional" java build systems
>>
>> BUCK files lying around are just trash for people not interested in them
>>
>> Doesn't currently generate the typical layout of bin, conf, share, etc.
>>
>> - *currently*, it could probably be done
>>
>>
>> Thanks,
>>
>> Patrick
>>
>>
>



 --
 Cheers
 Michael.
>>>
>>>
>>
>>
>> --
>>

Re: Ever considered using buck to build?

2017-05-08 Thread Benjamin Reed

part of the reason we haven't moved to maven (this is supposition
since i have not been involved in the decision at all) is that it
doesn't buy that much over ant. both maven and ant are complex and
slow. the one thing maven has is the repository for dependencies.

i have been using buck for a while and i really like it. you can use
the maven repos without having to use maven. the gerrit devs are also
fans: http://gerrit-talks.commondatastorage.googleapis.com/buck-rant.html#1

it hasn't reached critical mass though, so i'm not sure we want to
move exclusively to buck, but having files there for people to try
could be useful.

ben

On Fri, May 5, 2017 at 4:10 PM, Michael Han  wrote:
>>> I thought we were moving to Maven at some point. Did that get sidelined?
>
> I think moving to maven is still the plan and there are definitely lots of
> interests on this - see ZOOKEEPER-1078
> 
>
>
> --
> Cheers
> Michael.

[jira] [Commented] (ZOOKEEPER-2748) Admin command to voluntarily drop client connections

2017-04-11 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964730#comment-15964730
 ] 

Benjamin Reed commented on ZOOKEEPER-2748:
--

we do this using the JMX interface. i think that is better since you avoid
security issues. well, at least you push the security issues to JMX.

> Admin command to voluntarily drop client connections
> 
>
> Key: ZOOKEEPER-2748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2748
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Marco P.
>Assignee: Marco P.
>Priority: Minor
>
> In certain circumstances, it would be useful to be able to move clients from 
> one server to another.
> One example: a quorum that consists of 3 servers (A, B, C) with 1000 active
> client sessions, where 900 clients are connected to server A and the
> remaining 100 are split over B and C (see below for an example of how this
> can happen).
> A will do a lot more work than B and C.
> Overall throughput would benefit from having the clients more evenly divided.
> If A fails, all of its clients will create an avalanche by migrating en
> masse to a different server.
> There are other possible use cases for a mechanism to move clients: 
>  - Migrate away all clients before a server restart
>  - Migrate away part of clients in response to runtime metrics (CPU/Memory 
> usage, ...)
>  - Shuffle clients after adding more server capacity (i.e. adding Observer 
> nodes)
> The simplest form of rebalancing which does not require major changes of 
> protocol or client code consists of requesting a server to voluntarily drop 
> some number of connections.
> Clients should be able to transparently move to a different server.
> Patch introducing 4-letter commands to shed clients:
> https://github.com/apache/zookeeper/pull/215
> -- -- --
> How client imbalance happens in the first place, an example.
> Imagine servers A, B, C and 1000 clients connected.
> Initially clients are spread evenly (i.e. 333 clients per server).
> A: 333 (restarts: 0)
> B: 333 (restarts: 0)
> C: 334 (restarts: 0)
> Now restart servers a few times, always in A, B, C order (e.g. to pick up
> software upgrades or configuration changes).
> Restart A:
> A: 0 (restarts: 1)
> B: 499 (restarts: 0)
> C: 500 (restarts: 0)
> Restart B:
> A: 250 (restarts: 1)
> B: 0 (restarts: 1)
> C: 750 (restarts: 0)
> Restart C:
> A: 625 (restarts: 1)
> B: 375 (restarts: 1)
> C: 0 (restarts: 1)
> The imbalance is pretty bad already. C is idle while A has a lot of work.
> A second round of restarts makes the situation even worse:
> Restart A:
> A: 0 (restarts: 2)
> B: 688 (restarts: 1)
> C: 313 (restarts: 1)
> Restart B:
> A: 344 (restarts: 2)
> B: 0 (restarts: 2)
> C: 657 (restarts: 1)
> Restart C:
> A: 673 (restarts: 2)
> B: 328 (restarts: 2)
> C: 0 (restarts: 2)
> Larger clusters (5, 7, 9 servers) make the imbalance even more evident.
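The arithmetic of the restart example can be checked with a small Java simulation. `RestartImbalance` is hypothetical and assumes that a restarted server's clients reconnect evenly to the other servers (any remainder going to the lowest-indexed ones) and that no client ever moves back on its own.

```java
import java.util.Arrays;

/**
 * Hypothetical simulation of rolling restarts. Assumption: a restarted
 * server's clients spread evenly over the remaining servers, remainder to
 * the lowest-indexed ones, and nobody moves back afterwards.
 */
class RestartImbalance {

    /** Moves all clients of `server` to the other servers. */
    static void restart(int[] clients, int server) {
        int moving = clients[server];
        clients[server] = 0;
        int others = clients.length - 1;
        int share = moving / others;
        int remainder = moving % others;
        for (int i = 0; i < clients.length; i++) {
            if (i == server) {
                continue;
            }
            clients[i] += share;
            if (remainder > 0) { // hand out leftover clients one at a time
                clients[i]++;
                remainder--;
            }
        }
    }

    /** Restarts every server in index order, `rounds` times. */
    static int[] rollingRestarts(int[] initial, int rounds) {
        int[] clients = initial.clone();
        for (int r = 0; r < rounds; r++) {
            for (int s = 0; s < clients.length; s++) {
                restart(clients, s);
            }
        }
        return clients;
    }

    public static void main(String[] args) {
        int[] start = {333, 333, 334}; // A, B, C
        System.out.println(Arrays.toString(rollingRestarts(start, 1)));
        System.out.println(Arrays.toString(rollingRestarts(start, 2)));
    }
}
```

Under that rule, one round of restarts gives [625, 375, 0] and two rounds give [672, 328, 0], which agrees with the figures above to within one client of rounding.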




[jira] [Commented] (ZOOKEEPER-2693) DOS attack on wchp/wchc four letter words (4lw)

2017-03-16 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929091#comment-15929091
 ] 

Benjamin Reed commented on ZOOKEEPER-2693:
--

can someone put a good link to the exploit in the description? a cache isn't an 
appropriate link to use.

> DOS attack on wchp/wchc four letter words (4lw)
> ---
>
> Key: ZOOKEEPER-2693
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2693
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security, server
>Affects Versions: 3.4.0, 3.5.1, 3.5.2
>Reporter: Patrick Hunt
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2693-01.patch
>
>
> The wchp/wchc four letter words can be exploited in a DOS attack on the ZK 
> client port - typically 2181. The following POC attack was recently published 
> on the web:
> https://webcache.googleusercontent.com/search?q=cache:_CNGIz10PRYJ:https://www.exploit-db.com/exploits/41277/+=14=en=clnk=us
> The most straightforward way to block this attack is to not allow access to 
> the client port to non-trusted clients - i.e. firewall the ZooKeeper service 
> and only allow access to trusted applications using it for coordination.




using java 8 code

2017-03-06 Thread Benjamin Reed

have people given any thought to moving to java8? i'm really not that
excited about moving to java8, but we are running into libraries that
we want to use that are compiled with java8.

ben

[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2017-02-19 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873747#comment-15873747
 ] 

Benjamin Reed commented on ZOOKEEPER-2184:
--

another option would be to have a background worker that periodically wakes up
and re-resolves hosts every few minutes. if we ever get a connection failure, we
could use that to kick the background worker to run right away.
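A minimal Java sketch of that background worker, under stated assumptions: `HostReResolver` and its `Resolver` interface are hypothetical (not part of the ZooKeeper client API), connect-string entries are assumed to be plain `host:port`, and the last good address list is kept if a whole pass fails to resolve. A `ScheduledExecutorService` re-resolves on a fixed delay, and `kick()` queues an immediate pass after a connection failure.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical periodic re-resolver for connect-string host names. */
class HostReResolver implements AutoCloseable {
    /** Resolves one host name; swappable so tests can avoid real DNS. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private final List<String> hosts; // assumed "host:port" entries
    private final Resolver resolver;
    private final AtomicReference<List<InetSocketAddress>> current =
        new AtomicReference<>(Collections.<InetSocketAddress>emptyList());
    private final ScheduledExecutorService worker =
        Executors.newSingleThreadScheduledExecutor();

    HostReResolver(List<String> hosts, Resolver resolver, long periodMinutes) {
        this.hosts = new ArrayList<>(hosts);
        this.resolver = resolver;
        resolveNow(); // initial resolution, as a static provider would do
        worker.scheduleWithFixedDelay(this::resolveNow,
            periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }

    /** Re-resolves every host; keeps the old list if nothing resolves. */
    final void resolveNow() {
        List<InetSocketAddress> fresh = new ArrayList<>();
        for (String hostPort : hosts) {
            int colon = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            try {
                for (InetAddress addr : resolver.resolve(host)) {
                    fresh.add(new InetSocketAddress(addr, port));
                }
            } catch (UnknownHostException e) {
                // skip this host for now; the next pass retries it
            }
        }
        if (!fresh.isEmpty()) {
            current.set(Collections.unmodifiableList(fresh));
        }
    }

    /** Call on a connection failure to re-resolve right away. */
    void kick() {
        worker.execute(this::resolveNow);
    }

    List<InetSocketAddress> addresses() {
        return current.get();
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

For real lookups the resolver could simply be `InetAddress::getAllByName`; the interface exists mainly so tests can avoid DNS. IPv6 literals in the connect string are not handled, and the JVM's own DNS cache (the `networkaddress.cache.ttl` security property) can still return stale results between passes.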

> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.5.0
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>  Labels: easyfix, patch
> Fix For: 3.5.3, 3.4.11
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.




[jira] [Assigned] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed reassigned ZOOKEEPER-27:
--

 Assignee: (was: Mahadev konar)
Fix Version/s: (was: 3.0.0)
   3.6.0

> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
> Fix For: 3.6.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.
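A sketch of the proposal above under stated assumptions: the names DatabaseId, leaderChooseId, followerAccepts and clientShouldCloseSession are hypothetical and do not exist in ZooKeeper. It assumes the leader of a brand-new cluster (lastZxid == 0) mints the random id, data-less followers adopt it, and any server or client that already holds data must carry the same id.

```java
import java.security.SecureRandom;

// Hypothetical sketch of the ZOOKEEPER-27 "unique DB identifier" idea.
final class DatabaseId {
    /** The leader of a fresh cluster mints a random id; otherwise it keeps the persisted one. */
    static long leaderChooseId(long lastZxid, long persistedId, SecureRandom rng) {
        return lastZxid == 0 ? rng.nextLong() : persistedId;
    }

    /**
     * A follower with no data (lastZxid == 0) adopts the leader's id. A follower
     * that has data must already carry the same id; otherwise it belongs to an
     * older incarnation of the database and must not join.
     */
    static boolean followerAccepts(long followerZxid, long followerId, long leaderId) {
        return followerZxid == 0 || followerId == leaderId;
    }

    /** A client that sees a different id mid-session should close the session. */
    static boolean clientShouldCloseSession(long sessionDbId, long serverDbId) {
        return sessionDbId != serverDbId;
    }
}
```

In the three-machine scenario, if the stale machine wins the election with its higher zxid, the two reinitialized servers carry a different id and data of their own, so followerAccepts rejects them and no quorum forms, instead of the stale machine overwriting new data.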




[jira] [Commented] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854717#comment-15854717
 ] 

Benjamin Reed commented on ZOOKEEPER-27:


had the joy of running into this problem today. this issue was prematurely 
closed.

> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
>Assignee: Mahadev konar
> Fix For: 3.0.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.




[jira] [Updated] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-27:
---
Description: 
Moved from SourceForge to Apache.
http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547

here is the text from sourceforge:

There should be a persistent unique identifier for an instance of ZooKeeper. 
Currently, if you bring a cluster down without stopping clients and 
reinitialize the servers, the servers will start logging client zxid errors 
because the clients have seen a later transaction than the server has. In 
reality the clients should detect that they are now talking to a new instance 
of the database and close the session.

A similar problem occurs when a server fails in a cluster of three machines, 
and the other two machines are reinitialized and restarted. If the failed 
machine starts up again, there is a chance that the old machine may get elected 
leader (since it will have the highest zxid) and overwrite new data.

A unique random id should probably get generated when a new cluster comes up. 
(It is easy to detect since the zxid will be zero.) Leader Election and the 
Leader should validate that the peers have the same database id. Clients should 
also validate that they are talking to servers with the same database id during 
a session.

  was:
Moved from SourceForge to Apache.
http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547


> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
>Assignee: Mahadev konar
> Fix For: 3.0.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.




Re: zookeeper git not getting bridged to github

2017-01-13 Thread Benjamin Reed

ok created https://issues.apache.org/jira/browse/INFRA-13331

On Fri, Jan 13, 2017 at 5:44 PM, Michael Han <h...@cloudera.com> wrote:
> Other Apache mirrors work fine (checked HBase, Kafka, some others), so it
> is unlikely a foundational infrastructure issue - might be ZK specific.
>
> On Fri, Jan 13, 2017 at 3:47 PM, Benjamin Reed <br...@apache.org> wrote:
>
>> i committed ZOOKEEPER-261 pull request to apache git using the perl
>> script. my .netrc wasn't set up properly and then i had a problem
>> trying to rerun with the failed branches in place. then i started
>> getting very weird errors at the last step of the script and realized
>> that the patch had already been uploaded, so i closed the jira
>> manually and waited for the sync to github to happen, but it hasn't.
>>
>> i'll open an INFRA jira, but before that i thought i would check to
>> see if anyone else sees anything obviously wrong.
>>
>> tl;dr https://git-wip-us.apache.org/repos/asf/zookeeper.git and
>> https://github.com/apache/zookeeper don't match.
>>
>
>
>
> --
> Cheers
> Michael.

zookeeper git not getting bridged to github

2017-01-13 Thread Benjamin Reed

i committed ZOOKEEPER-261 pull request to apache git using the perl
script. my .netrc wasn't set up properly and then i had a problem
trying to rerun with the failed branches in place. then i started
getting very weird errors at the last step of the script and realized
that the patch had already been uploaded, so i closed the jira
manually and waited for the sync to github to happen, but it hasn't.

i'll open an INFRA jira, but before that i thought i would check to
see if anyone else sees anything obviously wrong.

tl;dr https://git-wip-us.apache.org/repos/asf/zookeeper.git and
https://github.com/apache/zookeeper don't match.

[jira] [Resolved] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election

2017-01-13 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-261.
-
Resolution: Fixed

committed to master

> Reinitialized servers should not participate in leader election
> ---
>
> Key: ZOOKEEPER-261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-261
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection, quorum
>    Reporter: Benjamin Reed
>
> A server that has lost its data should not participate in leader election 
> until it has resynced with a leader. Our leader election algorithm and 
> NEW_LEADER commit assume that the followers voting on a leader have not lost 
> any of their data. We should have a flag in the data directory saying whether 
> or not the data is preserved, so that the flag will be cleared if the data 
> is ever cleared.
> Here is the problematic scenario: you have an ensemble of machines A, B, 
> and C. C is down. the last transaction seen by C is z. a transaction, z+1, is 
> committed on A and B. Now there is a power outage. B's data gets 
> reinitialized. when power comes back up, B and C come up, but A does not. C 
> will be elected leader and transaction z+1 is lost. (note, this can happen 
> even if all three machines are up and C just responds quickly. in that case C 
> would tell A to truncate z+1 from its log.) in theory we haven't violated our 
> 2f+1 guarantee, since A is failed and B still hasn't recovered from failure, 
> but it would be nice if the system stops working 
> rather than works incorrectly if we lose quorum.
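A minimal sketch of the "data preserved" flag, assuming it is a marker file inside the data directory (the file name and the class DataPreservedFlag are hypothetical). Keeping the marker inside the data directory means wiping the data also wipes the flag, and a server without the marker abstains from voting until it has resynced.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the ZOOKEEPER-261 flag; not ZooKeeper code.
final class DataPreservedFlag {
    static final String MARKER = "data-preserved.marker"; // hypothetical file name

    /** Only a server whose data survived may vote in leader election. */
    static boolean mayVote(Path dataDir) {
        return Files.exists(dataDir.resolve(MARKER));
    }

    /** Called once this server has fully resynced with an established leader. */
    static void markSynced(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        Path marker = dataDir.resolve(MARKER);
        if (!Files.exists(marker)) {
            Files.createFile(marker);
        }
    }

    /** A leader may only be elected by a majority of servers that are allowed to vote. */
    static boolean canElect(int eligibleVoters, int ensembleSize) {
        return eligibleVoters > ensembleSize / 2;
    }
}
```

In the scenario above, B has lost its marker and A is down, so only C may vote: canElect(1, 3) is false and the ensemble stays unavailable rather than electing C and losing z+1.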




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-15 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752196#comment-15752196
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

[~rgs] can you commit this? we need it to get ZOOKEEPER-261 in.

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.
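A sketch of one possible policy for not ignoring the -1L result, not necessarily what the committed patch does. SnapshotSource is a hypothetical stand-in for FileSnap.deserialize, not the real FileTxnSnapLog API. A server with neither snapshots nor transaction logs is genuinely new and may start empty; a server that has logs but no valid snapshot refuses to start instead of replaying logs on top of lastProcessed == 0.

```java
import java.io.IOException;

// Hypothetical sketch of the ZOOKEEPER-2325 check; names are illustrative only.
final class SnapshotRestore {
    interface SnapshotSource {
        /** Returns the zxid of the loaded snapshot, or -1L if no valid snapshot exists. */
        long deserialize() throws IOException;
    }

    static long restore(SnapshotSource snap, boolean transactionLogsPresent) throws IOException {
        long zxid = snap.deserialize();
        if (zxid == -1L && transactionLogsPresent) {
            // Replaying logs onto an empty tree would diverge from the ensemble.
            throw new IOException("no valid snapshot found, but transaction logs exist");
        }
        // A brand-new server (no snapshot, no logs) starts from zxid 0.
        return zxid == -1L ? 0L : zxid;
    }
}
```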




[jira] [Created] (ZOOKEEPER-2640) fix test coverage for single threaded C-API

2016-12-04 Thread Benjamin Reed (JIRA)

Benjamin Reed created ZOOKEEPER-2640:


 Summary: fix test coverage for single threaded C-API
 Key: ZOOKEEPER-2640
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2640
 Project: ZooKeeper
  Issue Type: Test
  Components: c client, tests
Reporter: Benjamin Reed


the tests for the C-API are mostly for the multithreaded API. we need to get 
better coverage for the single threaded API.




Re: committing doc changes

2016-11-30 Thread Benjamin Reed

we could also build the doc as part of the tests.

On Wed, Nov 30, 2016 at 3:26 PM, Flavio Junqueira <f...@apache.org> wrote:
> As part of the release process, we only copy the documentation, see it here:
>
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToRelease 
> <https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToRelease>
>
> I think the reason we have gone this way is to avoid issues compiling the 
> documentation at the time that we are preparing a release candidate or after 
> voting on a release candidate. We could for sure build the documentation 
> right before generating the first rc for a release and create blocker jiras 
> in the case there is any issue.
>
> -Flavio
>
>> On 30 Nov 2016, at 23:12, Benjamin Reed <br...@apache.org> wrote:
>>
>> yeah, that's a deeper question. pat or flavio can correct me on this,
>> but i think the reason we check it in is so that the website's "trunk"
>> documentation will work. now that we moved to git, i don't think it
>> works though... i also would just like to only build it when we do
>> releases.
>>
>> On Wed, Nov 30, 2016 at 2:24 PM, Jordan Zimmerman
>> <jor...@jordanzimmerman.com> wrote:
>>> I wondered about that myself. Why bother building the docs? Isn’t that only 
>>> needed for packaging/deployment? It ends up making PRs ugly because you 
>>> have all the unnecessary docs in the diff.
>>>
>>> -Jordan
>>>
>>>> On Nov 30, 2016, at 11:23 PM, Benjamin Reed <br...@apache.org> wrote:
>>>>
>>>> when we commit pull requests with doc changes, i think we should
>>>> commit the generated doc as a separate commit. what do you all think?
>>>> i would like to do that to keep the change from the contributors
>>>> pristine :) and i think it simplifies things a bit.
>>>>
>>>> ben
>>>
>

Re: committing doc changes

2016-11-30 Thread Benjamin Reed

yeah, that's a deeper question. pat or flavio can correct me on this,
but i think the reason we check it in is so that the website's "trunk"
documentation will work. now that we moved to git, i don't think it
works though... i also would just like to only build it when we do
releases.

On Wed, Nov 30, 2016 at 2:24 PM, Jordan Zimmerman
<jor...@jordanzimmerman.com> wrote:
> I wondered about that myself. Why bother building the docs? Isn’t that only 
> needed for packaging/deployment? It ends up making PRs ugly because you have 
> all the unnecessary docs in the diff.
>
> -Jordan
>
>> On Nov 30, 2016, at 11:23 PM, Benjamin Reed <br...@apache.org> wrote:
>>
>> when we commit pull requests with doc changes, i think we should
>> commit the generated doc as a separate commit. what do you all think?
>> i would like to do that to keep the change from the contributors
>> pristine :) and i think it simplifies things a bit.
>>
>> ben
>

committing doc changes

2016-11-30 Thread Benjamin Reed

when we commit pull requests with doc changes, i think we should
commit the generated doc as a separate commit. what do you all think?
i would like to do that to keep the change from the contributors
pristine :) and i think it simplifies things a bit.

ben

[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706785#comment-15706785
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

hey andrew, i've merged all the patches into a pull request. can you take a 
look and make sure everything looks ok?

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.




Re: Apache ZooKeeper Meetup - draft Community Update

2016-11-16 Thread Benjamin Reed

looks good!

On Wed, Nov 16, 2016 at 2:20 PM, Patrick Hunt  wrote:
> Hi folks. Ben asked me to give a quick 5-10 minute overview at the meetup
> tomorrow giving insight into progress over the last year. Please see the
> linked slides that I put together. If you have any comments or thoughts wrt
> things that should be highlighted please respond to me directly. I tried to
> review jira/etc... but I'm sure I missed some important changes. Thanks!
>
> https://docs.google.com/presentation/d/1aElXcVPNng60BEpRV6qj3j6fFjScK3xSj8_e02e-lFI
>
> Patrick

november 17 zk meetup (additional registration info required)

2016-11-16 Thread Benjamin Reed

sorry for the late notice, but we have found out from security that we
will need an email and phone number to get you in without the usual
visitor NDA. if you are coming, please send an email to acon...@fb.com
with that information by tonight.

if you forget to do this or decide to come at the last minute, you can
still attend; you will just need to sign the normal visitor NDA.

also, please be sure to bring a photo id.

hope to see you there!
ben

ps. the event info is at https://www.facebook.com/events/1228722650504268/

[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-07 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645962#comment-15645962
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

shall i commit it or are we waiting on something else?


> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (I am not sure that 
> everyone does, or how it would work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/
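For context on the superDigest requirement, a sketch of how the property value is commonly derived: "user:" followed by base64(SHA-1("user:password")). This mirrors our understanding of DigestAuthenticationProvider.generateDigest and should be checked against the ZooKeeper version in use; the class SuperDigest is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: builds a value for
// -Dzookeeper.DigestAuthenticationProvider.superDigest=<value>
final class SuperDigest {
    static String generate(String user, String password) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] hash = sha1.digest((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return user + ":" + Base64.getEncoder().encodeToString(hash);
    }
}
```

The admin client would then authenticate with `zk.addAuthInfo("digest", "super:<password>".getBytes())` before calling reconfig().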




[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635265#comment-15635265
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

ah that makes sense. i didn't dig deep enough :) it is sad that an exception 
"that should never happen" has such a big impact on the code. shouldn't we have 
thrown a runtime exception? i think it would have eliminated a lot of this 
patch...
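A sketch of the runtime-exception alternative, with hypothetical names. NeverHappens.NoNodeException is a local stand-in, not KeeperException.NoNodeException, and the assumption that /zookeeper/config always exists is made for illustration only. Catching the checked exception where it "cannot" occur and rethrowing it unchecked keeps it out of every caller's throws clause.

```java
import java.util.Map;

// Hypothetical illustration: confine a "should never happen" checked exception.
final class NeverHappens {
    static final class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("no node: " + path);
        }
    }

    /** Lower-level call that declares the checked exception. */
    static String lookup(Map<String, String> tree, String path) throws NoNodeException {
        String value = tree.get(path);
        if (value == null) {
            throw new NoNodeException(path);
        }
        return value;
    }

    /** Assumes the config node always exists, so its absence is a bug. */
    static String readConfig(Map<String, String> tree) {
        try {
            return lookup(tree, "/zookeeper/config");
        } catch (NoNodeException e) {
            throw new IllegalStateException("config node missing; this should never happen", e);
        }
    }
}
```

Callers of readConfig need no throws clause; a missing node still fails loudly, just without changing signatures up the call chain.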

this is just an observation not a vote :)

> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (I am not sure that 
> everyone does, or how it would work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/




[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634422#comment-15634422
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

just a curious observer: why are we propagating the NoNodeException everywhere? 
it wasn't clear from the patch why that suddenly popped up as part of the change.

> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (I am not sure that 
> everyone does, or how it would work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/




[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632820#comment-15632820
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

i would love to see ZOOKEEPER-22 fixed, but i don't think it will be fixed 
anytime soon. (it would be awesome to be surprised though :)

@diego perhaps you could implement your idea in your go client implementation 
and propose it again if it works out well? i like the getConnection proposal.
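A sketch of the getConnection() idea, assuming ZKConnection is a hypothetical per-TCP-connection handle (it is not a real ZooKeeper class). Once marked lost, every later operation on that same object fails, so an operation issued after an earlier one failed can never succeed on a newer connection. The sketch ignores the race between the check and the actual send, which a real implementation would have to close by queuing requests on the connection itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch of the ZOOKEEPER-2619 getConnection() proposal.
final class ZKConnection {
    static final class ConnectionLossException extends RuntimeException {
        ConnectionLossException() {
            super("connection loss");
        }
    }

    private final AtomicBoolean lost = new AtomicBoolean(false);

    /** Called by the (hypothetical) I/O layer when the TCP connection drops. */
    void markLost() {
        lost.set(true);
    }

    boolean isLost() {
        return lost.get();
    }

    /** Runs an operation only if this connection has never been lost. */
    <T> T submit(Supplier<T> op) {
        if (lost.get()) {
            throw new ConnectionLossException();
        }
        return op.get();
    }
}
```

In the /data-23857 example, the create of /pointer would then fail with connection loss as well, and the application would fetch a fresh connection only when it starts its recovery operations.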

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> ZKConnection would return a Connection Loss e

[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632811#comment-15632811
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

multi will handle some of the use cases, but a simple one that it doesn't 
handle is if you want to implement swap:

zk.getData(znode, ...)
zk.setData(znode, ...)

you can't do that with multi (and i don't think we should extend multi to do it 
:)

multi also doesn't handle the case where you are updating lots of data and would 
go over max packet size.
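A sketch of swap as a read followed by a conditional write, using a versioned compare-and-set loop. This is a different technique from relying on FIFO pipelining, and VersionedStore is a hypothetical in-memory stand-in whose getData/setData only loosely mimic the client's versioned calls. The caller gets the old value back, and the write succeeds only if nobody changed the node in between.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a versioned getData/setData pair.
final class VersionedStore {
    static final class BadVersionException extends Exception {}

    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    /** Returns the value and reports its current version in versionOut[0]. */
    synchronized byte[] getData(String path, int[] versionOut) {
        versionOut[0] = versions.get(path);
        return data.get(path);
    }

    /** Writes only if the node is still at expectedVersion. */
    synchronized void setData(String path, byte[] value, int expectedVersion) throws BadVersionException {
        if (versions.get(path) != expectedVersion) {
            throw new BadVersionException();
        }
        data.put(path, value);
        versions.put(path, expectedVersion + 1);
    }

    /** Replaces the value and returns the old one, retrying if another writer interferes. */
    byte[] swap(String path, byte[] newValue) {
        while (true) {
            int[] version = new int[1];
            byte[] old = getData(path, version);
            try {
                setData(path, newValue, version[0]);
                return old;
            } catch (BadVersionException e) {
                // another write landed between our read and our write; retry
            }
        }
    }
}
```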



> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operatio

[jira] [Commented] (ZOOKEEPER-2623) CheckVersion outside of Multi causes NullPointerException

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632776#comment-15632776
 ] 

Benjamin Reed commented on ZOOKEEPER-2623:
--

i agree that we should handle this gracefully :) we should fix this.
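One way to "handle this gracefully" is to reject a standalone check before the server processes it any further. According to the server log in the report, the NullPointerException is raised in ZKDatabase.addCommittedProposal, which FinalRequestProcessor.processRequest calls. The sketch below is minimal and assumes a validation hook that knows whether an op arrived inside a multi. CheckVersionGuard, Result and the error string are hypothetical. Only opcode 13 comes from the report, and a real server would reply with a KeeperException error code rather than a string.

```java
class CheckVersionGuard {
    // Opcode 13 is "check version", as given in the report.
    static final int OP_CHECK = 13;

    /** Hypothetical validation result; not a ZooKeeper class. */
    static final class Result {
        final boolean accepted;
        final String error;

        Result(boolean accepted, String error) {
            this.accepted = accepted;
            this.error = error;
        }
    }

    /**
     * Validates a request before it is processed further.
     * insideMulti is true when the op arrived as a sub-operation of a multi.
     */
    static Result validate(int opcode, boolean insideMulti) {
        if (opcode == OP_CHECK && !insideMulti) {
            // Answer with an error the client can see instead of letting the
            // request continue and fail later with a NullPointerException.
            return new Result(false, "check is only supported inside multi");
        }
        return new Result(true, null);
    }

    public static void main(String[] args) {
        System.out.println(validate(OP_CHECK, false).accepted); // false
        System.out.println(validate(OP_CHECK, true).accepted);  // true
    }
}
```

With this guard the session stays usable: the client gets an error for the bad request, and the server never builds a transaction for it.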

> CheckVersion outside of Multi causes NullPointerException
> -
>
> Key: ZOOKEEPER-2623
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2623
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>Priority: Minor
>
> I wasn't sure if check version (opcode 13) was permitted outside of a multi 
> op, so I tried it. My server crashed with a NullPointerException and became 
> unusable until restarted. I guess it's not allowed, but perhaps the server 
> should handle this more gracefully?
> Here are the server logs:
> {noformat}
> Accepted socket connection from /0:0:0:0:0:0:0:1:51737
> Session establishment request from client /0:0:0:0:0:0:0:1:51737 client's 
> lastZxid is 0x0
> Connection request from old client /0:0:0:0:0:0:0:1:51737; will be dropped if 
> server is in r-o mode
> Client attempting to establish new session at /0:0:0:0:0:0:0:1:51737
> :Fsessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0xfffe txntype:unknown reqpath:n/a
> Processing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0xfffe txntype:unknown reqpath:n/a
> Got zxid 0x6065e expected 0x1
> Creating new log file: log.6065e
> Committing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0x6065e txntype:-10 reqpath:n/a
> Processing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0x6065e txntype:-10 reqpath:n/a
> :Esessionid:0x10025651faa type:createSession cxid:0x0 zxid:0x6065e 
> txntype:-10 reqpath:n/a
> sessionid:0x10025651faa type:createSession cxid:0x0 zxid:0x6065e 
> txntype:-10 reqpath:n/a
> Add a buffer to outgoingBuffers, sk sun.nio.ch.SelectionKeyImpl@28e9f397 is 
> valid: true
> Established session 0x10025651faa with negotiated timeout 2 for 
> client /0:0:0:0:0:0:0:1:51737
> :Fsessionid:0x10025651faa type:check cxid:0x1 zxid:0xfffe 
> txntype:unknown reqpath:/
> Processing request:: sessionid:0x10025651faa type:check cxid:0x1 
> zxid:0xfffe txntype:unknown reqpath:/
> Processing request:: sessionid:0x10025651faa type:check cxid:0x1 
> zxid:0xfffe txntype:unknown reqpath:/
> Exception causing close of session 0x10025651faa: Connection reset by peer
> :Esessionid:0x10025651faa type:check cxid:0x1 zxid:0xfffe 
> txntype:unknown reqpath:/
> IOException stack trace
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:320)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
>   at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:162)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Unexpected exception
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:252)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:127)
>   at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:362)
>   at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:162)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Committing request:: sessionid:0x10025651faa type:error cxid:0x1 
> zxid:0x6065f txntype:-1 reqpath:n/a
> Unregister MBean 
> [org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=Connections

[jira] [Commented] (ZOOKEEPER-2592) Zookeeper is not recoverable once running system( machine on which zookeeper is running) is out of space

2016-11-03 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632763#comment-15632763
 ] 

Benjamin Reed commented on ZOOKEEPER-2592:
--

it sounds like we should close this as a duplicate. right?

> Zookeeper is not recoverable once running system( machine on which zookeeper 
> is running) is out of space
> 
>
> Key: ZOOKEEPER-2592
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2592
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
>Priority: Critical
>
> ZooKeeper is not recoverable once the machine it is running on runs out of 
> space.
> Steps to reproduce:
> 1. Install ZooKeeper in standalone mode and start it.
> 2. Fill up the machine's disk space.
> 3. Connect with a client and try to create some znodes with some data.
> 4. After some time, further znodes can no longer be created because the disk 
> is full.
> 5. Now free up space on the machine.
> 6. Connect with a client again. The connection succeeds, but any command, 
> such as "ls /", fails even though more than 11 GB is now free.
> Client log:-
> BLR107042:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin 
> # df -h
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda2   36G   24G   11G  70% /
> udev1.9G  116K  1.9G   1% /dev
> tmpfs   1.9G 0  1.9G   0% /dev/shm
> BLR107042:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin 
> # ./zkCli.sh
> Connecting to localhost:2181
> 2016-09-19 22:50:20,227 [myid:] - INFO  [main:Environment@109] - Client 
> environment:zookeeper.version=3.5.1-alpha--1, built on 08/18/2016 08:20 GMT
> 2016-09-19 22:50:20,231 [myid:] - INFO  [main:Environment@109] - Client 
> environment:host.name=BLR107042
> 2016-09-19 22:50:20,231 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.version=1.7.0_79
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.vendor=Oracle Corporation
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.home=/usr/java/jdk1.7.0_79/jre
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.class.path=/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/classes:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/lib/*.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-log4j12-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-api-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/servlet-api-2.5-20081211.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/netty-3.7.0.Final.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/log4j-1.2.16.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jline-2.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-util-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/javacc.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-mapper-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-core-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/commons-cli-1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../zookeeper-3.5.1-alpha.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../src/java/lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../conf:/usr/java/jdk1.7.0_79/lib
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.io.tmpdir=/tmp
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.compiler=
> 2016-09-19 22:50:20,235 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.name=Linux
> 2016-09-19 22:50:20,235 [myid:] - INFO  [main:Enviro

november 17 zk meetup reminder

2016-10-31 Thread Benjamin Reed

this is just a reminder that we have a zookeeper meetup at facebook on
november 17. note that usually you have to sign an NDA to get into
facebook, but for the meetup we are giving security a list of people
for the event so that they can get in without an NDA. that's why we
need you to mark that you are going ahead of time.

also, please email acon...@fb.com if you'd like to present, so that he
can put the agenda together.

for event details and to indicate you are going please visit
https://www.facebook.com/events/1228722650504268

hope to see you there
ben

Re: QA github pre-commit queue

2016-10-27 Thread Benjamin Reed

i also pushed a new version for
https://issues.apache.org/jira/browse/ZOOKEEPER-761 although that one might
be tricky since there are attached patches and a pr. should the pr still be
qaed?


On Thu, Oct 27, 2016 at 12:04 PM, Michael Han  wrote:

> Created PR94 to ZOOKEEPER-2014. It's been 2 hours, and no QA bot activity.
>
> On Thu, Oct 27, 2016 at 9:05 AM, Flavio Junqueira  wrote:
>
> > Ok, I have created this queue: PreCommit-ZOOKEEPER-github-pr-build. I
> > have configured it and would kindly appreciate it if anyone could update a
> > PR to test it.
> >
> > -Flavio
> >
> >
> > > On 27 Oct 2016, at 16:57, Edward Ribeiro wrote:
> > >
> > > Cool! Thanks for the heads up. :)
> > >
> > > Cheers
> > >
> > > On 27 Oct 2016 1:56 PM, "Flavio Junqueira" wrote:
> > >
> > >> There is no need to create an INFRA jira, I'm taking care of it, stay
> > >> tuned. In the meantime, please submit patches as usual through jira to
> > >> trigger QA.
> > >>
> > >> -Flavio
> > >>
> > >>> On 27 Oct 2016, at 16:54, Edward Ribeiro wrote:
> > >>>
> > >>> Dear community,
> > >>>
> > >>> As part of the GitHub move, we are still lacking the plumbing that
> > >>> allows us to run Jenkins CI tests, etc., on open pull requests. Please
> > >>> take a look at Kafka's pending PRs on GitHub to see what I am referring
> > >>> to.
> > >>>
> > >>> Could any committer open an INFRA JIRA to address this?
> > >>>
> > >>> Best regards,
> > >>> Eddie
> > >>
> > >>
> >
> >
>
>
> --
> Cheers
> Michael.
>

[jira] [Commented] (ZOOKEEPER-761) Remove synchronous calls from the single-threaded C clieant API, since they are documented not to work

2016-10-27 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612002#comment-15612002
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

not yet, i have a mac and tests don't seem to work on a mac. i haven't had a 
chance to test on linux yet.

> Remove *synchronous* calls from the *single-threaded* C clieant API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607124#comment-15607124
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

btw thanx for the script edward! even though there were problems it made the 
process very easy!

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Resolved] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2597.
--
Resolution: Fixed

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607121#comment-15607121
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i used the script to commit the pull request. i also followed your 
instructions. here are a couple of things that went wrong:

1) you have the wrong repo line for adding the apache repo. it should be:
{code:borderStyle=solid}
  git remote add apache https://git-wip-us.apache.org/repos/asf/zookeeper.git
{code}

2) when things go bad it doesn't delete the branches it creates. i'm not sure 
if that is a bug or a feature. we should document that you need to remove the 
temporary branches before rerunning the script.

3) the script asks {{List pull request commits in squashed commit message? 
(y/n):}} i think the answer should be {{n}}

4) after the script ran i was very disappointed that the jira integration 
didn't work. we should make sure we run the following before running the script:

{code}
sudo pip install jira
{code}

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Commented] (ZOOKEEPER-761) Remove synchronous calls from the single-threaded C clieant API, since they are documented not to work

2016-10-25 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605468#comment-15605468
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

for some reason my comments/changes do not get bridged. i've updated the pr to 
move zoo_remove_watchers into the #ifdef

> Remove *synchronous* calls from the *single-threaded* C clieant API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?




[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605414#comment-15605414
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i'll commit at end of day unless someone has an objection.

edward, can you put up the instructions? i'll follow them to do the commit :)

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605408#comment-15605408
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

for some reason my comments were not bridged over:

+1

i think there are still quite a few improvements that can be made to this 
script. for example, it assumes that the repo 'apache-github' is set up, so it 
would be nice to check for that at the start of the script and print out how 
to set it up if it isn't.

i'm thinking that we should get this checked in and then iterate on it as we 
use it. what do others think?
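The start-of-script check suggested above fits in a few lines. The merge script itself is Python. To match the other examples here, the sketch below is in Java, and it only inspects the output of `git remote`. The remote name 'apache-github' comes from the comment above. The class and method names are hypothetical, and the GitHub URL in the setup hint is an assumption about where that remote should point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MergeScriptPreflight {
    // Remote name the merge script assumes, per the comment above.
    static final String REQUIRED_REMOTE = "apache-github";

    /** True if the output of `git remote` lists the given remote name. */
    static boolean hasRemote(String gitRemoteOutput, String name) {
        return Arrays.stream(gitRemoteOutput.split("\\R"))
                .map(String::trim)
                .anyMatch(name::equals);
    }

    /** Setup instructions shown when the remote is missing (URL is an assumption). */
    static String setupHint(String name) {
        return "Remote '" + name + "' is not configured. Add it with:\n"
                + "  git remote add " + name + " https://github.com/apache/zookeeper.git";
    }

    /** Runs `git remote` in the current directory and returns its output. */
    static String runGitRemote() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("git", "remote").redirectErrorStream(true).start();
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes();
        }
        p.waitFor();
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        try {
            if (!hasRemote(runGitRemote(), REQUIRED_REMOTE)) {
                // A real script should stop here, before creating any temporary branches.
                System.err.println(setupHint(REQUIRED_REMOTE));
            }
        } catch (IOException e) {
            System.err.println("could not run git: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Running the check before any branch is created also avoids leaving behind the temporary branches mentioned in the review comments on this script.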

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-10-24 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604093#comment-15604093
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

thanx diego, you did express well what i was trying to say. i also like your 
proposal. there are probably more details to work out, like how would it look 
for the C api? i like how it encapsulates nicely the relation between a 
sequence of operations, and your example does make a compelling argument for 
also including the sync api.

do we have some applications that we can use to validate the api? it would be 
nice to validate the design before we standardize it.

what i meant by "i think it's a good idea to document this issue in this jira" 
is that it's good that we have this jira to discuss the problem and potential 
solutions.
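A validation application could check one simple invariant: if /pointer exists, the node it names must exist too. The sketch below is a toy, in-memory replay of the "unfortunate events" timeline from the description. It shows how a client that reconnects transparently can break that invariant. ReconnectingClient and its methods are hypothetical stand-ins, not the ZooKeeper client API. There is no networking, and connection loss is an explicit method call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FifoInvariantCheck {
    /** The application invariant from the report: /pointer implies its target exists. */
    static boolean invariantHolds(Map<String, String> namespace) {
        String target = namespace.get("/pointer");
        return target == null || namespace.containsKey(target);
    }

    /** Replays the "unfortunate events" timeline against the toy client. */
    static Map<String, String> runScenario() {
        Map<String, String> namespace = new HashMap<>();
        ReconnectingClient zk = new ReconnectingClient(namespace);
        zk.createAsync("/data-23857", "...");     // queued on connection #1
        zk.loseConnection();                      // fails with ConnectionLoss, never sent
        zk.createSync("/pointer", "/data-23857"); // sent on connection #2 and succeeds
        return namespace;
    }

    public static void main(String[] args) {
        System.out.println("invariant holds: " + invariantHolds(runScenario()));
        // prints "invariant holds: false"
    }
}

/** Toy client that reconnects transparently (hypothetical, no networking). */
class ReconnectingClient {
    private final Map<String, String> namespace;               // stand-in for server state
    private final Deque<String[]> queued = new ArrayDeque<>(); // {path, data} not yet sent
    final List<String> connectionLoss = new ArrayList<>();     // paths whose create failed

    ReconnectingClient(Map<String, String> namespace) {
        this.namespace = namespace;
    }

    void createAsync(String path, String data) {
        queued.add(new String[] {path, data});
    }

    /** The connection drops: queued requests fail, and a new connection is opened. */
    void loseConnection() {
        for (String[] op : queued) connectionLoss.add(op[0]);
        queued.clear();
    }

    /** Sends queued requests over the current connection; the server applies them. */
    void flush() {
        for (String[] op : queued) namespace.put(op[0], op[1]);
        queued.clear();
    }

    void createSync(String path, String data) {
        createAsync(path, data);
        flush();
    }
}
```

The server keeps its own guarantees throughout. The violation comes entirely from the client library sending the later request on a new connection after the earlier one was dropped.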

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add

[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-10-22 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15598842#comment-15598842
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

i think it's a good idea to document this issue in this jira. it would be 
really nice to surface this to clients in a way that both makes them aware of 
the problem and gives them a way to deal with it.

the nice thing about it is that it is a client side issue. the server maintains 
its guarantees. since you are implementing your own client you can actually 
experiment with different ideas.

it sounds to me that getConnection() and reenableOps() are basically the same. 
right? or are you proposing that when you get a ZKConnection object you can 
invoke the zookeeper operations on that?

i think this is really only an issue for async methods, since synchronous 
methods execute ... synchronously, thus one at a time. i kind of like the idea 
of getting an object that only has async methods, on which you have a strong 
guarantee of FIFO execution.

one problem i see with reenableOps is that it affects everything using the 
zookeeper handle, not just the ops in question.
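The connection-scoped idea can be modeled in memory: a per-connection object with only async methods, loosely following the proposed ZooKeeper.getConnection()/ZKConnection. Once a connection is lost, everything queued on it fails. So does every later call made through the same object, which keeps a dependent op from overtaking a failed one. Recovery ops need a connection the application explicitly asks for, and the failure stays scoped to that one object rather than to the whole handle. ToyZooKeeper, ToyZKConnection, the string result codes, flush() and connectionLost() are all hypothetical stand-ins, not the real client API. Network I/O is replaced by explicit calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

class ConnectionScopedDemo {
    /** Pipeline on one connection, lose it, then recover on an explicitly new one. */
    static Map<String, String> run(List<String> log) {
        ToyZooKeeper zk = new ToyZooKeeper();
        ToyZKConnection conn = zk.getConnection();
        conn.createAsync("/data-23857", "...", rc -> log.add("data:" + rc));
        conn.connectionLost(); // the queued create fails
        // Later ops on the same object fail too, so /pointer cannot overtake /data-23857.
        conn.createAsync("/pointer", "/data-23857", rc -> log.add("pointer:" + rc));

        // Recovery: the application opts in to a fresh connection.
        ToyZKConnection retry = zk.getConnection();
        retry.createAsync("/data-23857", "...", rc -> log.add("retry-data:" + rc));
        retry.createAsync("/pointer", "/data-23857", rc -> log.add("retry-pointer:" + rc));
        retry.flush();
        return zk.namespace();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Map<String, String> namespace = run(log);
        System.out.println(log);
        // [data:CONNECTIONLOSS, pointer:CONNECTIONLOSS, retry-data:OK, retry-pointer:OK]
        System.out.println(namespace.keySet()); // [/data-23857, /pointer]
    }
}

/** Hypothetical stand-in for a ZooKeeper handle that hands out connections. */
class ToyZooKeeper {
    private final Map<String, String> namespace = new TreeMap<>();
    private ToyZKConnection current;

    /** Returns the live connection, opening a new one only if the last one was lost. */
    ToyZKConnection getConnection() {
        if (current == null || current.isLost()) {
            current = new ToyZKConnection(namespace);
        }
        return current;
    }

    Map<String, String> namespace() { return namespace; }
}

/** Hypothetical per-connection object exposing only async operations. */
class ToyZKConnection {
    private static final class PendingCreate {
        final String path;
        final String data;
        final Consumer<String> callback;

        PendingCreate(String path, String data, Consumer<String> callback) {
            this.path = path;
            this.data = data;
            this.callback = callback;
        }
    }

    private final Map<String, String> namespace;
    private final Deque<PendingCreate> queued = new ArrayDeque<>();
    private boolean lost = false;

    ToyZKConnection(Map<String, String> namespace) { this.namespace = namespace; }

    boolean isLost() { return lost; }

    /** Queues a create; once this connection is lost, every later call fails at once. */
    void createAsync(String path, String data, Consumer<String> callback) {
        if (lost) {
            callback.accept("CONNECTIONLOSS");
            return;
        }
        queued.add(new PendingCreate(path, data, callback));
    }

    /** Sends queued requests in FIFO order; the toy server applies each one. */
    void flush() {
        while (!queued.isEmpty()) {
            PendingCreate op = queued.poll();
            namespace.put(op.path, op.data);
            op.callback.accept("OK");
        }
    }

    /** Simulates the TCP connection dropping: fail everything still queued. */
    void connectionLost() {
        lost = true;
        while (!queued.isEmpty()) {
            queued.poll().callback.accept("CONNECTIONLOSS");
        }
    }
}
```

Unlike a handle-wide reenableOps(), only code holding the lost connection object is affected. Other users of the same ToyZooKeeper get a fresh connection the next time they call getConnection().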

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or schedul

Re: svn ref in build.xml

2016-10-19 Thread Benjamin Reed

yes, thank you edward. that's what i was referring to.

On Wed, Oct 19, 2016 at 11:53 AM, Edward Ribeiro <edward.ribe...@gmail.com>
wrote:

> I guess Ben may be referring to these lines in build.xml:
>
> https://github.com/apache/zookeeper/blob/master/build.xml#L315-L341
>
> If so, there's an issue that Arshad opened some time ago that is somewhat
> related to this: https://issues.apache.org/jira/browse/ZOOKEEPER-2573
>
>
>
> On Wed, Oct 19, 2016 at 4:14 PM, Flavio Junqueira <f...@apache.org> wrote:
>
> > What's the problem exactly, Ben? Could you create a jira and propose a
> > patch or at least describe the issue?
> >
> > -Flavio
> >
> > > On 19 Oct 2016, at 18:25, Benjamin Reed <br...@apache.org> wrote:
> > >
> > > build.xml still has some svn stuff in it. can someone fix? pat?
> > >
> > > thanx
> > > ben
> >
> >
>

svn ref in build.xml

2016-10-19 Thread Benjamin Reed

build.xml still has some svn stuff in it. can someone fix? pat?

thanx
ben

[jira] [Commented] (ZOOKEEPER-761) Remove synchronous calls from the single-threaded C clieant API, since they are documented not to work

2016-10-18 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586819#comment-15586819
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

this is a pretty good patch to make things start working, but i think we should 
deprecate the single threaded API altogether. what do others think?

concretely, i propose that we add #ifdef THREADED around the sync APIs and also 
add a warning that the non-THREADED API is deprecated.
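to see why the synchronous calls cannot work in libzookeeper_st, here is a toy python model (not the C client's code). it assumes, as a simplification, that a sync call is built by issuing the async call and blocking until its completion callback fires, and that completions are only delivered by whoever drives the I/O loop. the class and method names are hypothetical. in threaded mode a background thread drives the loop; in single-threaded mode the application has to drive it, so blocking inside the call means nothing ever delivers the completion.

```python
import queue
import threading


class ToyClient:
    """Hypothetical model of a client whose completions are delivered by an I/O loop."""

    def __init__(self, threaded: bool):
        self._completions = queue.Queue()
        if threaded:
            # Threaded mode: a background thread delivers completions.
            threading.Thread(target=self._io_loop, daemon=True).start()

    def _io_loop(self):
        while True:
            callback, result = self._completions.get()
            callback(result)

    def process_pending(self):
        """Single-threaded mode: the application calls this from its own event loop."""
        while not self._completions.empty():
            callback, result = self._completions.get()
            callback(result)

    def aget(self, path, callback):
        # Pretend the server already replied; the callback still needs a loop to run it.
        self._completions.put((callback, f"data@{path}"))

    def get(self, path, timeout=1.0):
        """Synchronous wrapper: issue the async call, then block until it completes."""
        done = threading.Event()
        box = {}

        def on_complete(result):
            box["result"] = result
            done.set()

        self.aget(path, on_complete)
        if not done.wait(timeout):
            # A real blocking call without a timeout would simply hang here.
            raise TimeoutError(f"nothing delivered the completion for {path}")
        return box["result"]
```

with `ToyClient(threaded=True).get("/a")` the call returns; with `threaded=False` it times out, while `aget` followed by `process_pending()` works. leaving the sync symbols out of libzookeeper_st.so turns that runtime hang into a link-time error.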

> Remove *synchronous* calls from the *single-threaded* C clieant API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Pierre Habouzit
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: november 17, 2016 zookeeper meetup at facebook

2016-10-18 Thread Benjamin Reed

yes. we'll put up streaming information on the event page when the time
gets closer.

On Tue, Oct 18, 2016 at 5:10 AM, Edward Ribeiro <edward.ribe...@gmail.com>
wrote:

> There will be live streaming? :)
>
> Thanks,
> Edward
>
> On Tue, Oct 18, 2016 at 2:51 AM, Benjamin Reed <br...@apache.org> wrote:
>
> > we would like to invite you all to a zookeeper meetup at the facebook
> > campus on november 17, 2016 starting at 5pm.
> >
> > please use the following link to the facebook event to get details and
> > rsvp.
> >
> > https://www.facebook.com/events/1228722650504268
> >
> > we need you to rsvp to make sure we plan appropriately for food and swag
> > and so you can get past security :)
> >
> > hope to see you there.
> >
> > thanx
> > ben
> >
>

november 17, 2016 zookeeper meetup at facebook

2016-10-17 Thread Benjamin Reed

we would like to invite you all to a zookeeper meetup at the facebook
campus on november 17, 2016 starting at 5pm.

please use the following link to the facebook event to get details and rsvp.

https://www.facebook.com/events/1228722650504268

we need you to rsvp to make sure we plan appropriately for food and swag
and so you can get past security :)

hope to see you there.

thanx
ben

[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-11 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567516#comment-15567516
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

cool i opened an infra jira to see if they can turn on the bridging: INFRA-12752

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-11 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567338#comment-15567338
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

are you still working on this edward? do you want me to try and implement the 
changes i suggested?

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.




[jira] [Comment Edited] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559081#comment-15559081
 ] 

Benjamin Reed edited comment on ZOOKEEPER-2597 at 10/9/16 2:16 AM:
---

no problem. i made some reviews. (i thought that they would be bridged to 
jira...)


was (Author: breed):
no problem. i made some reviews. (i though that they would be bridged to 
jira...)





[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559081#comment-15559081
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

no problem. i made some reviews. (i thought that they would be bridged to 
jira...)





[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558730#comment-15558730
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

did you put up the pull request?





Re: [VOTE] move Apache Zookeeper to git

2016-10-07 Thread Benjamin Reed

 include a JIRA ticket ID will trigger an update on that specific ticket
> > > > >>
> > > > >> I checked a couple Kafka and Spark JIRAs but I don't see any of the
> > > > >> comments made in github PR were posted on JIRA, except the activities
> > > > >> (open a PR, close a PR). Since both projects have been using github
> > > > >> for a while I assume such practice of NOT integrating comments between
> > > > >> github and ASF JIRA is acceptable? Though I feel it would be really
> > > > >> useful if comments could converge in a single place as well, that will
> > > > >> provide a clear history for a given technical issue.
> > > > >>
> > > > >> On Tue, Oct 4, 2016 at 12:06 PM, Flavio Junqueira <f...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Until ZOOKEEPER-2597
> > > > >>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2597> is fixed, we
> > > > >>> can't merge via github.
> > > > >>>
> > > > >>> For code reviews, we can use GH as long as the
> > > > >>> opening/closing/commenting all get sent to the mailing list or
> > > > >>> recorded in jira. I don't think we have that yet, but it is possible
> > > > >>> according to this:
> > > > >>>
> > > > >>> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> > > > >>>
> > > > >>> For now, we do need to upload patches and converge using jira.
> > > > >>>
> > > > >>> I think Eddie has been looking at this process trying to replicate
> > > > >>> the Kafka setup, so perhaps he can give an update if I'm right. Kafka
> > > > >>> doesn't send every comment to the mailing list, though, but I'm not
> > > > >>> sure if that's acceptable according to the ASF, I need to
> > > > >>> double-check.
> > > > >>>
> > > > >>> -Flavio
> > > > >>>
> > > > >>>> On 04 Oct 2016, at 19:42, Michael Han <h...@cloudera.com> wrote:
> > > > >>>>
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> Now we've moved to git, what is the policy for uploading patches
> > > > >>>> and doing code reviews? I am asking because I've seen recently there
> > > > >>>> are git pull requests coming in without associated patch file
> > > > >>>> uploaded to JIRA. I've checked
> > > > >>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute,
> > > > >>>> looks like there is not much change regarding patch process - so
> > > > >>>> presumably we still need to generate and upload patch file to JIRA
> > > > >>>> for the record, while using github (maybe in addition of review
> > > > >>>> board, or in the future with gerrit) to do code reviews?
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, Sep 21, 2016 at 6:05 AM, Edward Ribeiro <
> > > > >>>> edward.ribe...@gmail.com> wrote:
> > > > >>>>
> > > > >>>>> Cool, just open https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> > > > >>>>>
> > > > >>>>> PS: I removed the REPO_HOME global variable.
> > > > >>>>>
> > > > >>>>> On Wed, Sep 21, 2016 at 6:53 AM, Flavio Junqueira <
> > > > >>>>> f...@apache.org> wrote:
> > > > >>>>>
> > > > >>>>>> Better to have that in the form of a pull request or diff.
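the thread relies on a PR title or commit message that includes a JIRA ticket ID triggering an update on that ticket. a minimal python sketch of how such a bridge might pick the IDs out of the text; the key pattern and the `jira_keys` helper are assumptions for illustration, not how the ASF infrastructure actually does it.

```python
import re
from typing import List

# Assumption: a JIRA key is an upper-case project name, a dash, and an issue number.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def jira_keys(text: str) -> List[str]:
    """Return the distinct JIRA keys mentioned in a PR title or commit message, in order."""
    keys: List[str] = []
    for key in JIRA_KEY.findall(text):
        if key not in keys:
            keys.append(key)
    return keys
```

for example, `jira_keys("ZOOKEEPER-2597: Add script to merge PR (see INFRA-12752)")` returns `["ZOOKEEPER-2597", "INFRA-12752"]`, so one PR could update both tickets.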

[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-07 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556598#comment-15556598
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

great thanx! i have a couple of questions about it and it would be nice to be 
able to comment on the diff in github :) i too would like to get this in asap!





[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-06 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552617#comment-15552617
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i'm just starting to look at this script. it's kind of ironic that this isn't a 
pull request ;)





Re: november meetup at facebook (take 2)

2016-10-03 Thread Benjamin Reed

great. i'll get an official invite set up in the next couple of days. we
will have a VC option for remote people.

thanx
ben

On Sun, Oct 2, 2016 at 8:25 AM, Jordan Zimmerman <randg...@apache.org>
wrote:

> Assuming
>
> On Oct 1, 2016, at 12:00 PM, Flavio P JUNQUEIRA <f...@apache.org> wrote:
>
> +1, great initiative, although it is very likely I won't be there. 
>
> On 1 Oct 2016 02:06, "Michael Han" <h...@cloudera.com> wrote:
>
> +1, thanks!
>
> On Fri, Sep 30, 2016 at 4:25 PM, Rahul R <rahul8...@gmail.com> wrote:
>
> +1 , would love to be a part of it.
>
> Thanks,
> ./Rahul
>
> On Fri, Sep 30, 2016 at 4:09 PM, Alexander Shraer <shra...@gmail.com>
> wrote:
>
> +1 for me too, thanks!
>
> On Fri, Sep 30, 2016 at 3:18 PM, Ryan Zhang <yangzhangr...@hotmail.com>
> wrote:
>
> +1. My coworkers in twitter would be interested.
>
> On Sep 30, 2016, at 2:35 PM, Raúl Gutiérrez Segalés <r...@itevenworks.net>
> wrote:
>
> +1 (probably bringing along some people from Pinterest as well).
>
> -rgs
>
> On Sep 30, 2016 2:26 PM, "Marshall McMullen" <marshall.mcmul...@gmail.com>
> wrote:
>
> +1. I would love to attend along with a few of my coworkers and this date
> works for us.
>
> On Fri, Sep 30, 2016 at 3:10 PM, Benjamin Reed <br...@apache.org>
> wrote:
>
> facebook would like to host a zookeeper meetup in our offices in menlo
> park, ca on november 17th (a thursday). before sending out an official
> invitation with details about logistics, i thought i would first do a
> quick date check and make sure that there isn't a big scheduling conflict
> that we didn't notice (like a big election or something like that...).
> it's a bit tricky to book facilities here, so we don't have a lot of
> options on dates.
>
> would this date work for most people?
>
> thanx
> ben
>
> ps - should i cross post to dev@? i assume that most subscribers of user@
> also subscribe to dev@
>
> --
> Cheers
> Michael.
>

Re: ZooKeeper clients does not handle new error codes properly

2016-10-03 Thread Benjamin Reed

did we bump the protocol version when we added the new errors? the server
could do the conversion when it responds to older clients.
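this suggestion depends on the server knowing which protocol version each client speaks. a python sketch of the server-side conversion, assuming the version is recorded per session at connect time; the codes, version numbers, and `error_for_client` helper are hypothetical, and falling back to a generic system error mirrors the option flavio raises in the quoted reply.

```python
from typing import Dict

# All numbers below are made up for illustration; they are not ZooKeeper's codes.
SYSTEM_ERROR = 1    # generic error every client version understands
OLD_ERROR = 10      # existed since protocol version 0
NEW_ERROR = 20      # added in protocol version 2

# Protocol version in which each error code was introduced.
INTRODUCED_IN: Dict[int, int] = {SYSTEM_ERROR: 0, OLD_ERROR: 0, NEW_ERROR: 2}


def error_for_client(code: int, client_protocol_version: int) -> int:
    """Downgrade error codes that a client speaking an older protocol cannot decode."""
    introduced = INTRODUCED_IN.get(code)
    if introduced is None or introduced > client_protocol_version:
        return SYSTEM_ERROR
    return code
```

`error_for_client(NEW_ERROR, 1)` returns `SYSTEM_ERROR`, while a version-2 client gets `NEW_ERROR` unchanged, so old clients never see a code they cannot decode.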

On Mon, Oct 3, 2016 at 3:05 AM, Flavio Junqueira  wrote:

> Hi Arshad,
>
> It makes sense to me. What if we convert unknown server errors to
> KeeperException.SystemErrorException? This is a generic error and it
> extends KeeperException.
>
> I don't see it as a big issue to make this change, but others may feel
> differently. If we do it, then we will need a release note pointing out the
> change of behavior.
>
> -Flavio
>
> > On 03 Oct 2016, at 08:54, Mohammad arshad 
> wrote:
> >
> > Hi All,
> > In a ZooKeeper rolling upgrade scenario where the server is new but the
> > client is old, when the server sends an error code that the client does not
> > understand, the client throws IllegalArgumentException. Generally,
> > IllegalArgumentException is not handled by ZK applications; it is too
> > generic. How should ZK applications handle this scenario?
> > My understanding is that instead of throwing IllegalArgumentException we
> > should throw a subclass of KeeperException, for example
> > InvalidErrorCodeException, so that zk apps can take more specific action.
> > Any thoughts?
> >
> > Thanks
> > -Arshad
> >
>
>
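on the client side, flavio's proposal amounts to decoding unknown codes into a generic exception that applications already catch, rather than into IllegalArgumentException. a python sketch of that decoding; the class names only mirror the java ones, and the code table and `error_from_code` function are hypothetical.

```python
from typing import Dict, Type


class KeeperError(Exception):
    """Stand-in for KeeperException; keeps the raw code so applications can log it."""

    def __init__(self, code: int):
        super().__init__(f"keeper error, code {code}")
        self.code = code


class NoNodeError(KeeperError):
    pass


class SystemErrorException(KeeperError):
    pass


# Hypothetical table of codes an *old* client knows about.
KNOWN_ERRORS: Dict[int, Type[KeeperError]] = {10: NoNodeError}


def error_from_code(code: int) -> KeeperError:
    """Map a server error code to an exception, never failing on unknown codes."""
    error_class = KNOWN_ERRORS.get(code)
    if error_class is None:
        # Newer server, older client: fall back to the generic error instead of
        # raising something outside the KeeperError hierarchy.
        return SystemErrorException(code)
    return error_class(code)
```

keeping the raw code on the generic exception is a design choice in this sketch: applications that catch the base class keep working, and the unknown value still shows up in logs.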

Re: ZooKeeper ping requests Unnecessarily go though request processor chain

2016-10-03 Thread Benjamin Reed

yes, vitalii is correct. the ping is a mutual test of health, so we want it
to go through the full pipeline.
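the point can be made concrete with a toy model (python, not zookeeper's request processors): if pings queue behind earlier requests, their latency reflects the health of the whole pipeline. the timings and the `completion_times` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def completion_times(requests: List[Tuple[str, float]],
                     fast_path_pings: bool = False) -> Dict[str, float]:
    """Toy single-stage FIFO pipeline.

    `requests` holds (unique name, service time in seconds) pairs, all arriving
    at t=0 on one session. Each request completes only after everything queued
    ahead of it, unless `fast_path_pings` lets requests named "ping" skip the queue.
    """
    clock = 0.0
    done: Dict[str, float] = {}
    for name, service_time in requests:
        if fast_path_pings and name == "ping":
            done[name] = 0.0  # answered immediately, whatever the queue is doing
            continue
        clock += service_time
        done[name] = clock
    return done
```

with a write stuck for 60 seconds on a stalled disk, `completion_times([("write", 60.0), ("ping", 0.0)])["ping"]` is `60.0`, so the client notices the stall; with `fast_path_pings=True` the ping answers at `0.0` and the server looks healthy. going through the pipeline also means a ping response never overtakes an earlier request's response.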

On Mon, Oct 3, 2016 at 5:26 AM, Vitalii Tymchyshyn  wrote:

> Hi.
>
> I think this would break the ordering guarantee, wouldn't it?
> Also, ping is supposed to test health, and I am not sure why you would want
> to skip testing part of the flow. Does it incur high load?
> What would happen if disk would stall for a minute?
>
> Best regards, Vitalii Tymchyshyn
>
> On Mon, Oct 3, 2016, 05:11 Mohammad arshad <mohammad.ars...@huawei.com>
> wrote:
>
> > Hi All
> > ZooKeeper clients send ping requests (heartbeats) to the ZooKeeper server
> > to keep their sessions alive. These ping requests do nothing but touch the
> > session on the server.
> >
> > If a client is connected to a follower, the ping request is processed in
> > the sequence ServerCnxn --> ZooKeeperServer --> FollowerRequestProcessor -->
> > CommitProcessor --> FinalRequestProcessor. The ping request waits in
> > CommitProcessor for previous requests to complete. This wait is
> > unnecessary for a ping request; I think it offers no benefit.
> >
> > Is the ping request doing more than touching its session? I think it is
> > only touching its session, not doing anything else.
> > If this is the case, we should process the ping request differently from
> > the other requests. It should be treated as a system request and should be
> > processed with higher priority. Maybe we can process it in the sequence
> > ServerCnxn --> ZooKeeperServer --> PingRequestProcessor
> >
> > Any thoughts?
> >
> > Thanks
> > -Arshad
> >
> >
>

[jira] [Resolved] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-24 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2600.
--
Resolution: Cannot Reproduce

> dangling ephemerals on overloaded server with local sessions
> 
>
> Key: ZOOKEEPER-2600
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2600
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>    Reporter: Benjamin Reed
>
> we had the following strange production bug:
> there was an ephemeral znode for a session that was no longer active.  it 
> happened even in the absence of failures.
> we are running with local sessions enabled and slightly different logic than 
> the open source zookeeper, but code inspection shows that the problem is also 
> in open source.
> the triggering condition was server overload. we had a traffic burst and
> we were having commit latencies of over 30 seconds.
> after digging through logs/code we realized from the logs that the create 
> session txn for the ephemeral node started (in the PrepRequestProcessor) at 
> 11:23:04 and committed at 11:23:38 (the "Adding global session" is output in 
> the commit processor). it took 34 seconds to commit the createSession, during 
> that time the session expired. due to delays it appears that the interleaving
> was as follows:
> 1) create session hits prep request processor and create session txn 
> generated 11:23:04
> 2) time passes as the create session is going through zab
> 3) the session expires, close session is generated, and close session txn 
> generated 11:23:23
> 4) the create session gets committed and the session gets re-added to the 
> sessionTracker 11:23:38
> 5) the create ephemeral node hits prep request processor and a create txn 
> generated 11:23:40
> 6) the close session gets committed (all ephemeral nodes for the session are 
> deleted) and the session is deleted from sessionTracker
> 7) the create ephemeral node gets committed
> the root cause seems to be that the global sessions are managed by both the
> PrepRequestProcessor and the CommitProcessor. also with the local session
> upgrading we can have changes in flight before our create session commits. i think
> there are probably two places to fix:
> 1) changes to session tracker should not happen in prep request processor.
> 2) we should not have requests in flight while create session is in process. 
> there are two options to prevent this:
> a) when a create session is generated in makeUpgradeRequest, we need to start 
> queuing the requests from the clients and only submit them once the create 
> session is committed
> b) the client should explicitly detect that it needs to change from local 
> session to global session and explicitly open a global session and get the 
> commit before it sends an ephemeral create request
> option 2a) is a more transparent fix, but architecturally and in the long 
> term i think 2b) might be better.




[jira] [Commented] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-24 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518765#comment-15518765
 ] 

Benjamin Reed commented on ZOOKEEPER-2600:
--

i investigated this further and it appears that a local change that has not 
been upstream is causing this problem. closing the bug.





[jira] [Created] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-22 Thread Benjamin Reed (JIRA)

Benjamin Reed created ZOOKEEPER-2600:


 Summary: dangling ephemerals on overloaded server with local 
sessions
 Key: ZOOKEEPER-2600
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2600
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Benjamin Reed


we had the following strange production bug:

there was an ephemeral znode for a session that was no longer active.  it 
happened even in the absence of failures.

we are running with local sessions enabled and slightly different logic than 
the open source zookeeper, but code inspection shows that the problem is also 
in open source.

the triggering condition was server overload. we had a traffic burst and we
were having commit latencies of over 30 seconds.

after digging through logs/code we realized from the logs that the create 
session txn for the ephemeral node started (in the PrepRequestProcessor) at 
11:23:04 and committed at 11:23:38 (the "Adding global session" is output in 
the commit processor). it took 34 seconds to commit the createSession, during 
that time the session expired. due to delays it appears that the interleaving was
as follows:

1) create session hits prep request processor and create session txn generated 
11:23:04
2) time passes as the create session is going through zab
3) the session expires, close session is generated, and close session txn 
generated 11:23:23
4) the create session gets committed and the session gets re-added to the 
sessionTracker 11:23:38
5) the create ephemeral node hits prep request processor and a create txn 
generated 11:23:40
6) the close session gets committed (all ephemeral nodes for the session are 
deleted) and the session is deleted from sessionTracker
7) the create ephemeral node gets committed
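the seven-step interleaving above can be replayed against a toy model to show why it leaves an ephemeral node behind. this python sketch is a simplification, not the PrepRequestProcessor/CommitProcessor code: it assumes the only liveness check for the ephemeral create happens at prep time and that committed txns are applied without re-validating the owner. the session id and path are made up.

```python
from typing import Dict, List, Set, Tuple

SESSION = "s1"   # hypothetical session id
PATH = "/lock"   # hypothetical ephemeral znode

# (stage, operation) in the order described in the report.
INTERLEAVING: List[Tuple[str, str]] = [
    ("prep", "create_session"),      # 1) 11:23:04
    ("prep", "close_session"),       # 3) 11:23:23, session expired during zab
    ("commit", "create_session"),    # 4) 11:23:38, session re-added to tracker
    ("prep", "create_ephemeral"),    # 5) 11:23:40, prep-time check passes
    ("commit", "close_session"),     # 6) ephemerals deleted, session removed
    ("commit", "create_ephemeral"),  # 7) node created for a dead session
]


def replay(events: List[Tuple[str, str]]) -> Tuple[Set[str], Dict[str, str]]:
    """Apply events to a toy session tracker and data tree; return (live sessions, ephemerals)."""
    sessions: Set[str] = set()
    ephemerals: Dict[str, str] = {}  # path -> owning session
    for stage, op in events:
        if stage == "prep" and op == "create_ephemeral":
            # The only liveness check happens here, against the tracker *now*.
            if SESSION not in sessions:
                raise RuntimeError("session expired")
        elif stage == "commit" and op == "create_session":
            sessions.add(SESSION)
        elif stage == "commit" and op == "close_session":
            for path in [p for p, owner in ephemerals.items() if owner == SESSION]:
                del ephemerals[path]
            sessions.discard(SESSION)
        elif stage == "commit" and op == "create_ephemeral":
            # Applied without re-checking that the owner is still alive.
            ephemerals[PATH] = SESSION
    return sessions, ephemerals
```

`replay(INTERLEAVING)` ends with no live session yet `/lock` still owned by `s1`; if the close session had committed before step 5, the prep check would have rejected the create.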

the root cause seems to be that the global sessions are managed by both the 
PrepRequestProcessor and the CommitProcessor. also with the local session 
upgrading we can have changes in flight before our create session commits. i think 
there are probably two places to fix:

1) changes to session tracker should not happen in prep request processor.
2) we should not have requests in flight while create session is in process. 
there are two options to prevent this:
a) when a create session is generated in makeUpgradeRequest, we need to start 
queuing the requests from the clients and only submit them once the create 
session is committed
b) the client should explicitly detect that it needs to change from local 
session to global session and explicitly open a global session and get the 
commit before it sends an ephemeral create request
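option 2a) can be sketched as a per-session gate: once the create session for an upgrade has been generated, later requests from that client are held and released, in order, only after the create session commits. python, with hypothetical class and callback names; in the server the hook point would be wherever makeUpgradeRequest generates the create session.

```python
from collections import deque
from typing import Callable, Deque


class UpgradingSession:
    """Hypothetical per-session gate for option 2a."""

    def __init__(self, submit: Callable[[str], None]):
        self._submit = submit              # hands a request to the rest of the pipeline
        self._held: Deque[str] = deque()   # requests waiting for the create session
        self.upgrading = False

    def start_upgrade(self) -> None:
        # Where makeUpgradeRequest would generate the create session txn.
        self.upgrading = True
        self._submit("createSession")

    def request(self, req: str) -> None:
        if self.upgrading:
            self._held.append(req)         # must not race the create session
        else:
            self._submit(req)

    def on_create_session_committed(self) -> None:
        # Release held requests in arrival order, preserving client FIFO order.
        self.upgrading = False
        while self._held:
            self._submit(self._held.popleft())
```

this keeps the fix transparent to clients, as the report says; the cost is that the session's requests stall for the full commit latency (34 seconds in the incident above).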

option 2a) is a more transparent fix, but architecturally and in the long term 
i think 2b) might be better.




Re: [VOTE] move Apache Zookeeper to git

2016-09-19 Thread Benjamin Reed

what you are suggesting sounds good, but i don't know how to do it. since
in the end we are still just accepting diffs as patches, the only thing
that changes is that we use git rather than svn, right?

i LOVE chris's idea! let's do it!

ben

On Sun, Sep 18, 2016 at 3:22 PM, Patrick Hunt <ph...@apache.org> wrote:

> Ben, do you also want to update the "Applying a patch" section to make it
> git specific?
>
> We (committers) should move to a model where authors get proper credit in
> git. Our old workflow in svn resulted in only the committer being listed
> (except that we listed the patch author in the commit message). We should
> move to a model where the author of the patch gets proper credit in git. I
> believe we will get that if we use git for patch creation/application?
>
> Chris brought up getting rid of CHANGES.txt recently on the dev list in a
> separate thread - Chris do you want to implement that change now that we've
> moved to git?
>
> Patrick
>
> On Wed, Sep 14, 2016 at 9:01 PM, Benjamin Reed <br...@apache.org> wrote:
>
>>> > 1) actually in the previous step that was just adding new files. you
>>> > still need the commit -a for the rest of the changes. that's my normal
>>> > workflow.
>>>
>>> I think that will be confusing for most folks. They typically stage
>>> all the changes and then commit or don't stage and use -a.
>>>
>>
>> do you mind fixing it with your workflow? commit -a doesn't get new
>> files, which is why you need to do the add, but i'm not the most
>> sophisticated git user, so
>>
>>
>>>
>>> > 2) i figured since we are using git now that we should use git's
>>> > default. the patch should work (by default it seems to strip the first
>>> > path element). does it not work for you?
>>> >
>>>
>>> It will fail precommit in its current state.
>>>
>>
>> fixed
>>
>
>
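patrick's point about proper author credit maps onto git's `--author` option for `git commit`: the committer stays whoever runs the merge, while the author field names the contributor. a python sketch of how a merge script could choose that author when a PR has several commits; picking the most frequent commit author and the helper names are assumptions, not a description of any existing script.

```python
from collections import Counter
from typing import List


def primary_author(commit_authors: List[str]) -> str:
    """Pick the author with the most commits; ties go to whoever appears first."""
    counts = Counter(commit_authors)
    # Counter keeps first-appearance order and max() returns the first maximum.
    return max(counts, key=lambda author: counts[author])


def squash_commit_argv(commit_authors: List[str], message: str) -> List[str]:
    """Build a `git commit` command that credits the PR author instead of the committer."""
    return ["git", "commit", f"--author={primary_author(commit_authors)}", "-m", message]
```

the resulting list can be passed to `subprocess.run` after the squashed changes are staged; a script would still want a prompt to override the choice when a stray identity shows up among the commits.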

Re: [VOTE] move Apache Zookeeper to git

2016-09-14 Thread Benjamin Reed

>
> > 1) actually in the previous step that was just adding new files. you
> > still need the commit -a for the rest of the changes. that's my normal
> > workflow.
>
> I think that will be confusing for most folks. They typically stage
> all the changes and then commit or don't stage and use -a.
>

do you mind fixing it with your workflow? commit -a doesn't get new files,
which is why you need to do the add, but i'm not the most sophisticated git
user, so


>
> > 2) i figured since we are using git now that we should use git's default.
> > the patch should work (by default it seems to strip the first path
> > element). does it not work for you?
> >
>
> It will fail precommit in its current state.
>

fixed

Re: [VOTE] move Apache Zookeeper to git

2016-09-14 Thread Benjamin Reed

1) actually in the previous step that was just adding new files. you still
need the commit -a for the rest of the changes. that's my normal workflow.
2) i figured since we are using git now that we should use git's default.
the patch should work (by default it seems to strip the first path
element). does it not work for you?
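the disagreement over `--no-prefix` comes down to how many leading path components the patch tool strips. by default `git diff` writes paths as `a/<path>` and `b/<path>`, which `patch -p1` (and `git apply`, by default) strips; with `--no-prefix` the paths are bare and need `-p0`. a simplified python model of the stripping (it ignores doubled slashes and absolute paths); the file path in the example is hypothetical.

```python
def strip_components(path: str, n: int) -> str:
    """Roughly mimic `patch -pN`: drop the first n slash-separated components of a path."""
    parts = path.split("/")
    if n >= len(parts):
        raise ValueError(f"cannot strip {n} components from {path!r}")
    return "/".join(parts[n:])
```

`strip_components("a/src/example/Foo.java", 1)` gives `src/example/Foo.java`, but applying the same `-p1` to a `--no-prefix` diff gives `example/Foo.java`, silently dropping a real directory. tooling that assumes `-p0` has the opposite problem with default-prefixed diffs, so every patch going through precommit needs to agree on one style.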

On Tue, Sep 13, 2016 at 2:02 PM, Patrick Hunt <ph...@apache.org> wrote:

> Hi Ben. I didn't review it fully, but it looks like there are at least
> two issues:
>
> 1) you shouldn't have the person "git commit -a" given the previous
> commands added the files the user is interested in submitting. I think
> you just mean to "git commit" ?
>
> 2) you dropped the "--no-prefix" from the git diff - this will break
> existing tooling. It's also not consistent with the section "Applying
> a patch" from that same wiki page. I recommend you add back the
> --no-prefix until we are able to address (also impacts any pending
> patches created using the old style)
>
> Patrick
>
> On Mon, Sep 12, 2016 at 10:13 PM, Benjamin Reed <br...@apache.org> wrote:
> > i've updated the contributing to zookeeper wiki:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
> >
> > there are many ways to do things in git, so if others feel that there is
> a
> > better way please updated.
> >
> > i think it would be cool to adopt a github based submission like other
> > projects have, but i'll leave that up to someone else that has used one of
> > those work flows to propose something.
> >
> > thanx
> > ben
> >
> > ps - i have a small change to the site to fix some svn references, but i'm
> > still trying to figure out if the site is on git or svn...
> >
> > On Mon, Sep 12, 2016 at 8:28 PM, Patrick Hunt <ph...@apache.org> wrote:
> >>
> >> Cool, thanks.
> >>
> >> I've re-enabled the precommit job, reconfigured it, and it seems to be
> >> working again. LMK if you notice any issues.
> >>
> >> Patrick
> >>
> >> On Mon, Sep 12, 2016 at 8:22 PM, Raúl Gutiérrez Segalés <r...@apache.org>
> >> wrote:
> >> > On 12 September 2016 at 17:48, Patrick Hunt <ph...@apache.org> wrote:
> >> >
> >> >> On Mon, Sep 12, 2016 at 10:15 AM, Raúl Gutiérrez Segalés
> >> >> <r...@itevenworks.net> wrote:
> >> >> > On 12 September 2016 at 09:58, Patrick Hunt <ph...@apache.org> wrote:
> >> >> >
> >> >> >> Here it is, please take a look, review, and commit it to master
> >> >> >> (remember, needs to be git now :-) )
> >> >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-2576
> >> >> >
> >> >> >
> >> >> > Merged:
> >> >> >
> >> >> > https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=8c4082647f89b0a92fa00a2af8de84b3c7314e23
> >> >> >
> >> >> > This is only needed in master?
> >> >> >
> >> >>
> >> >> Only on master. Our current pre-commit is only for master. If we move
> >> >> to something like Yetus I believe they will also check branches.
> >> >>
> >> >> Raul/Ben/et.al. can you commit the second part of the patch? I
> >> >> attached it to ZOOKEEPER-2576. Bit of a cleanup on the command naming
> >> >> (missed build.xml changes). Thanks.
> >> >>
> >> >
> >> > Merged:
> >> >
> >> > https://git-wip-us.apache.org/repos/asf?p=zookeeper.git;a=commitdiff;h=b2a484cfe743116d2531fe5d1e1d78b3960c511e
> >> >
> >> >
> >> > -rgs
> >
> >
>

Re: [VOTE] move Apache Zookeeper to git

2016-09-14 Thread Benjamin Reed

btw, there is an INFRA jira to get Gerrit, but the status of it is very
unclear: https://issues.apache.org/jira/browse/INFRA-2205

On Tue, Sep 13, 2016 at 1:30 AM, Flavio Junqueira <f...@apache.org> wrote:

> Thanks, Ben and Pat. Is there anyone willing to pick this up? I believe
> there are some scripts involved to do the workflow integration with github.
> I'd look at projects like Kafka and BookKeeper, which I know have done it,
> and replicate it.
>
> You don't have to be a committer to work on this, but you'd have to work
> with a committer. It is a good opportunity to learn a bit more about the
> moving parts of an Apache project if you consider becoming a ZK committer
> eventually.
>
> -Flavio

[jira] [Commented] (ZOOKEEPER-2465) Documentation copyright notice is out of date.

2016-09-12 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485391#comment-15485391
 ] 

Benjamin Reed commented on ZOOKEEPER-2465:
--

the last time a similar issue came up on a different project, an
unnamed company's legal team pointed out that you don't need to keep updating
the date. after a long and heated discussion among engineers with no legal
background, we decided to follow that advice and just leave the date.

i don't have the legal background to make a definitive statement, but it does 
make maintenance easier if we don't have to keep updating the year. we can just 
make it 2008 and not worry about changing it.

> Documentation copyright notice is out of date.
> --
>
> Key: ZOOKEEPER-2465
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2465
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Edward Ribeiro
>Priority: Blocker
> Fix For: 3.5.3
>
>
> As reported by [~eribeiro], all of the documentation pages show a copyright 
> notice dating "2008-2013".  This issue tracks updating the copyright notice 
> on all documentation pages to show the current year.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [VOTE] move Apache Zookeeper to git

2016-09-11 Thread Benjamin Reed

sure. i'll update it to reference git rather than svn.

if i understand correctly pull requests that were submitted via github were
reviewed by the qa bot (or something like that) in the past, but it was
turned off. we should turn that back on i think.

thanx
ben

On Sun, Sep 11, 2016 at 8:49 PM, Patrick Hunt <ph...@apache.org> wrote:

> FYI Apache INFRA has made the cutover -
> https://issues.apache.org/jira/browse/INFRA-12573
>
> At this point we need to update the "how to contribute" etc... Ben do
> you want to take a stab at that? I can update the respective Jenkins
> jobs.
>
> What else is there?
>
> Patrick
>
> On Wed, Sep 7, 2016 at 9:59 AM, Chris Nauroth <cnaur...@hortonworks.com>
> wrote:
> > Thank you for doing this, Eddie.  I just picked up the code review.
> >
> > --Chris Nauroth
> >
> > On 9/7/16, 9:49 AM, "Edward Ribeiro" <edward.ribe...@gmail.com> wrote:
> >
> > Hey folks, as part of this major change, I took a look at the gitignore,
> > and it already lacks a lot of file extensions for a modern Java project.
> > Therefore, I created a trivial patch (shameless plug) that adds more
> > common extensions: https://issues.apache.org/jira/browse/ZOOKEEPER-2557
> >
> > Could you please review it and (committers) get it incorporated into the
> > branches before the transition if everything is alright, whenever you have
> > time? The final gitignore isn't particularly big and covers mostly
> > the common IDE extensions and temporary files.
> >
> > Cheers,
> > Eddie
> >
> >
> > On Wed, Sep 7, 2016 at 7:31 AM, Flavio Junqueira <f...@apache.org> wrote:
> >
> > > +1
> > >
> > > > On 07 Sep 2016, at 06:10, Patrick Hunt <ph...@apache.org> wrote:
> > > >
> > > > Quick update (more details on the INFRA jira). It might take upwards of
> > > > 24 hours to do the svn->git migration although our repo isn't that large,
> > > > likely less. INFRA can do it, for example, on Saturday around 18:00 UTC.
> > > > Any concerns with such an approach?
> > > >
> > > > Patrick
> > > >
> > > > On Sun, Sep 4, 2016 at 9:20 PM, Patrick Hunt <ph...@apache.org> wrote:
> > > >
> > > >> Follow along here: https://issues.apache.org/jira/browse/INFRA-12573
> > > >>
> > > >> Patrick
> > >
> > >
> >
> >
>

[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-09-06 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469376#comment-15469376
 ] 

Benjamin Reed commented on ZOOKEEPER-2169:
--

can't you just do a stat to find this out?


> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?
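Ben's question points at the node's stat. Assuming a refresh shows up as a modification, so that the node's mtime (Stat.getMtime() in the Java client) records the last refresh, the expiry rule in the description reduces to a timestamp comparison. A hypothetical sketch of that check, not ZooKeeper's implementation:

```java
import java.util.concurrent.TimeUnit;

public final class TtlSketch {
    /**
     * Hypothetical check following the description above: the node expires
     * when it has not been refreshed within its TTL. lastRefreshedMillis is
     * assumed to be the node's mtime as reported by a stat.
     */
    public static boolean isExpired(long lastRefreshedMillis, long ttlMillis, long nowMillis) {
        if (ttlMillis <= 0) {
            throw new IllegalArgumentException("ttl must be positive");
        }
        return nowMillis - lastRefreshedMillis > ttlMillis;
    }

    /** Remaining lifetime in milliseconds, or 0 once expired. */
    public static long remainingMillis(long lastRefreshedMillis, long ttlMillis, long nowMillis) {
        return Math.max(0L, lastRefreshedMillis + ttlMillis - nowMillis);
    }

    public static void main(String[] args) {
        long ttl = TimeUnit.SECONDS.toMillis(30);
        long mtime = 1_000_000L; // illustrative timestamp
        System.out.println(isExpired(mtime, ttl, mtime + 10_000));       // false
        System.out.println(remainingMillis(mtime, ttl, mtime + 10_000)); // 20000
        System.out.println(isExpired(mtime, ttl, mtime + 31_000));       // true
    }
}
```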




Re: [VOTE] move Apache Zookeeper to git

2016-09-04 Thread Benjamin Reed

with 10 votes for (5 of which are from the PMC) and no votes against, the
vote passes.

pat please make git happen! :)

thanx for voting!

On Thu, Sep 1, 2016 at 9:25 AM, Michael Han <h...@cloudera.com> wrote:

> +1
>
> On Thu, Sep 1, 2016 at 6:08 AM, Michelle Tan <pheyyin...@gmail.com> wrote:
>
> > +1
> >
> > On Thu, Sep 1, 2016 at 2:01 PM, Flavio Junqueira <f...@apache.org> wrote:
> >
> > > +1
> > >
> > > > On 01 Sep 2016, at 13:28, Edward Ribeiro <edward.ribe...@gmail.com> wrote:
> > > >
> > > > +1 (non binding)
> > > >
> > > > On Thu, Sep 1, 2016 at 3:44 AM, Jordan Zimmerman <jor...@jordanzimmerman.com>
> > > > wrote:
> > > >
> > > >> +1 (non binding)
> > > >>
> > > >>> On Aug 31, 2016, at 8:29 PM, Benjamin Reed <br...@apache.org> wrote:
> > > >>>
> > > >>> flip the switch to git and update the relevant scripts and docs.
> > > >>>
> > > >>> i couldn't figure out which timeframe this falls under in the voting
> > > >>> procedure table, but i think it's safe to go with 3 days, so the vote
> > > >>> will close on Saturday, September 3 at 6:30pm pdt.
> > > >>>
> > > >>> +1 from me
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Cheers
> Michael.
>

[jira] [Commented] (ZOOKEEPER-2536) When provide path for "dataDir" with trailing space, it is taking correct path (by trucating space) for snapshot but creating temporary file with some junk folder n

2016-09-01 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457459#comment-15457459
 ] 

Benjamin Reed commented on ZOOKEEPER-2536:
--

BTW, i think the patch is not applying because you didn't do the diff relative 
to the root of the zookeeper repo.

> When provide path for "dataDir" with trailing space, it is taking correct 
> path (by trucating space) for snapshot but creating temporary file with some 
> junk folder name for zookeeper_server.pid
> 
>
> Key: ZOOKEEPER-2536
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2536
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
> Fix For: 3.5.1, 3.5.2
>
> Attachments: zkServer.sh.patch
>
>
> Scenario 1:-
> When provide path for "dataDir" with trailing space, it is taking correct 
> path (by trucating space) for snapshot but creating temporary file with some 
> junk folder name for zookeeper_server.pid
> Steps to reproduce:-
> 1. Configure the dataDir
> dataDir=/home/Rakesh/Zookeeper/18_Aug/zookeeper-3.5.1-alpha/data 
> Here there is a space after /data 
> 2. Start Zookeeper Server
> 3. The snapshot is getting created at location mentioned above by truncating 
> the trailing space but
> one temp folder with junk name (like -> D29D4X~J) is getting created for 
> zookeeper_server.pid
> Scenario 2:-
> When configure the heading and trailing space in above mentioned scenario. 
> the temp folder is getting created in zookeeper/bin folder




[jira] [Commented] (ZOOKEEPER-2537) When provide path for "dataDir" with heading space, it is taking correct path (by trucating space) for snapshot but zookeeper_server.pid is getting created in root

2016-09-01 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457457#comment-15457457
 ] 

Benjamin Reed commented on ZOOKEEPER-2537:
--

isn't this the same as ZOOKEEPER-2536?

> When provide path for "dataDir" with heading space, it is taking correct path 
> (by trucating space) for snapshot but zookeeper_server.pid is getting created 
> in root (/) folder
> --
>
> Key: ZOOKEEPER-2537
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2537
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
> Fix For: 3.5.1, 3.5.2
>
> Attachments: zkServer.sh.patch
>
>
> Scenario 1 :-
> When provide path for "dataDir" with heading space, it is taking correct path 
> (by trucating space) for snapshot but zookeeper_server.pid is getting created 
> in root (/) folder
> Steps to reproduce:-
> 1. Configure the dataDir
> dataDir= /home/Rakesh/Zookeeper/18_Aug/zookeeper-3.5.1-alpha/data
> Here there is a space after dataDir=
> 2. Start Zookeeper Server
> 3. The snapshot is getting created at location mentioned above by truncating 
> the heading space but
> zookeeper_server.pid is getting created at root (/) folder




[jira] [Commented] (ZOOKEEPER-2536) When provide path for "dataDir" with trailing space, it is taking correct path (by trucating space) for snapshot but creating temporary file with some junk folder n

2016-09-01 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457447#comment-15457447
 ] 

Benjamin Reed commented on ZOOKEEPER-2536:
--

+1 LGTM





[jira] [Commented] (ZOOKEEPER-2539) Throwing nullpointerException when run the command "config -c" when client port is mentioned as separate and not like new style

2016-09-01 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457427#comment-15457427
 ] 

Benjamin Reed commented on ZOOKEEPER-2539:
--

+1 looks good. just a formatting nit: you need to indent the line after the if() 
and put it in {}'s since it is on a separate line.
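The NPE in the report below arises when the client port comes from a separate clientPort line rather than the ';' suffix of the server.N value. A hypothetical, null-safe way to derive the client address for both styles is sketched here; clientAddress is not ZooKeeper's ConfigUtils API, IPv6 hosts are not handled, and every if body is braced, per the nit above.

```java
import java.util.Optional;

public final class ClientPortSketch {
    /**
     * Hypothetical helper: returns "host:port" for the client endpoint of a
     * server.N value. New-style values carry it after ';'
     * ("10.18.101.80:2888:3888:participant;2181"); old-style values
     * ("10.18.101.80:2888:3888") do not, so fall back to the separate clientPort.
     */
    public static Optional<String> clientAddress(String serverValue, String clientPort) {
        String[] halves = serverValue.split(";", 2);
        String host = halves[0].split(":", 2)[0];
        if (halves.length == 2) {
            String clientPart = halves[1].trim();
            // The part after ';' may be "port" or "host:port".
            return Optional.of(clientPart.contains(":") ? clientPart : host + ":" + clientPart);
        }
        if (clientPort == null || clientPort.trim().isEmpty()) {
            // Neither style provided a client port: report "absent" instead of an NPE.
            return Optional.empty();
        }
        return Optional.of(host + ":" + clientPort.trim());
    }

    public static void main(String[] args) {
        System.out.println(clientAddress("10.18.101.80:2888:3888:participant;2181", null)); // Optional[10.18.101.80:2181]
        System.out.println(clientAddress("10.18.101.80:2888:3888", "2181"));                // Optional[10.18.101.80:2181]
        System.out.println(clientAddress("10.18.101.80:2888:3888", null));                  // Optional.empty
    }
}
```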

> Throwing nullpointerException when run the command "config -c" when client 
> port is mentioned as separate and not like new style
> ---
>
> Key: ZOOKEEPER-2539
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2539
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
>Priority: Minor
> Fix For: 3.5.1, 3.5.2
>
> Attachments: ConfigUtils.java.patch
>
>
> Throwing nullpointerException when run the command "config -c" when client 
> port is mentioned as separate and not like new style
> 1. Configure the zookeeper to start in cluster mode like below-
> clientPort=2181
> server.1=10.18.101.80:2888:3888
> server.2=10.18.219.50:2888:3888
> server.3=10.18.221.194:2888:3888
> and not like below:-
> server.1=10.18.101.80:2888:3888:participant;2181
> server.2=10.18.219.50:2888:3888:participant;2181
> server.3=10.18.221.194:2888:3888:participant;2181
> 2. Start the cluster and one client using >zkCli.sh
> 3. execute command "config -c"
> It is throwing nullpointerException:-
> root@BLR110865:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin#
>  ./zkCli.sh 
> Connecting to localhost:2181
> 2016-08-29 21:45:19,558 [myid:] - INFO  [main:Environment@109] - Client 
> environment:zookeeper.version=3.5.1-alpha--1, built on 08/18/2016 08:20 GMT
> 2016-08-29 21:45:19,561 [myid:] - INFO  [main:Environment@109] - Client 
> environment:host.name=BLR110865
> 2016-08-29 21:45:19,562 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.version=1.7.0_17
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.vendor=Oracle Corporation
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.home=/usr/lib/jvm/oracle_jdk7/jre
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.class.path=/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/classes:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/lib/*.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-log4j12-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-api-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/servlet-api-2.5-20081211.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/netty-3.7.0.Final.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/log4j-1.2.16.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jline-2.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-util-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/javacc.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-mapper-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-core-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/commons-cli-1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../zookeeper-3.5.1-alpha.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../src/java/lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../conf:
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.io.tmpdir=/tmp
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.compiler=
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.name=Linux
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.arch=amd64
> 2016-08-29 21:45:19,565

Re: Issue with NettyServerCnxn.java

2016-09-01 Thread Benjamin Reed

i agree. propagating up is the best. i was looking at NIOServerCnxn rather
than checking the callers above it. behavior will change slightly since failing
on a PING will cause another attempt at sending a response. i'm not sure how
much that will change things, but it is a code path that we have never hit
with NIOServerCnxn.

ben
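The propagate-up approach agreed on here can be sketched with hypothetical classes (not ZooKeeper's ServerCnxn hierarchy): a send on a connection whose channel was already cleared by an asynchronous close surfaces as the declared IOException rather than an NPE, and the caller reacts by closing the connection, so nothing escapes to the request processing thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public final class PropagateSketch {
    /** Hypothetical connection; a null channel models one closed asynchronously. */
    public static final class Connection {
        private volatile Object channel = new Object();
        private volatile boolean closed;
        private final List<String> sent = new ArrayList<>();

        public void sendResponse(String payload) throws IOException {
            Object ch = channel; // read the field once instead of check-then-use
            if (ch == null) {
                // Report the closed channel through the declared IOException, not an NPE.
                throw new IOException("channel already closed");
            }
            sent.add(payload); // stand-in for writing to ch
        }

        public void close() {
            closed = true;
            channel = null;
        }

        public boolean isClosed() {
            return closed;
        }

        public int sentCount() {
            return sent.size();
        }
    }

    /** Caller in the style of a process() method: an IOException means "close this connection". */
    public static boolean process(Connection cnxn, String payload) {
        try {
            cnxn.sendResponse(payload);
            return true;
        } catch (IOException e) {
            cnxn.close();
            return false;
        }
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(process(c, "pong")); // true
        c.close(); // e.g. an asynchronous close completed in the meantime
        System.out.println(process(c, "pong")); // false, and no exception escapes
        System.out.println(c.sentCount());      // 1
    }
}
```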

On Thu, Sep 1, 2016 at 1:17 PM, yuliya Feldman <yufeld...@yahoo.com.invalid>
wrote:

> Yes,
> This is what I plan: propagate IOException up. We could convert other
> exceptions to IOException as well and propagate them up.
>
> Thank you guys for the replies.
> Yuliya
>
>   From: Michael Han <h...@cloudera.com>
>  To: UserZooKeeper <u...@zookeeper.apache.org>
> Cc: dev@zookeeper.apache.org; yuliya Feldman <yufeld...@yahoo.com>;
> Patrick Hunt <ph...@apache.org>
>  Sent: Thursday, September 1, 2016 12:17 PM
>  Subject: Re: Issue with NettyServerCnxn.java
>
> Make sense to me.
>
> On Thu, Sep 1, 2016 at 12:10 PM, Flavio Junqueira <f...@apache.org> wrote:
>
> > I guess that's precisely what I'm proposing we avoid. I think we should
> > propagate it up as an IOException, which the signature of the abstract
> > method already suggests we should be doing. If what I'm saying makes any
> > sense, we should instead remove the catch Exception block at the end of
> > NIOServerCnxn.sendResponse.
> >
> > -Flavio
> >
> > > On 01 Sep 2016, at 20:05, Michael Han <h...@cloudera.com> wrote:
> > >
> > > I think it is not just about IOException: the current
> > > NIOServerCnxn.sendResponse swallows any exception it catches (including the
> > > NPE this thread and the related JIRA are talking about). On the other hand,
> > > NettyServerCnxn.sendResponse only catches IOException, so the two differ in
> > > which exceptions they catch. Perhaps making NettyServerCnxn.sendResponse
> > > catch every exception is the solution here?
> > >
> > > On Thu, Sep 1, 2016 at 11:57 AM, Flavio Junqueira <f...@apache.org> wrote:
> > >
> > >> I'm not sure why you say that it is better to swallow the exception, Ben.
> > >> I checked the methods that call sendResponse and they seem to be able to
> > >> handle IOExceptions fine. For example, in NettyServerCnxn.process, we call
> > >> close upon IOException, which is exactly the behavior you say we
> > >> should have.
> > >>
> > >> I'm thinking that in this case, if the channel is closed and it is null,
> > >> we throw IOException. I'm trying to understand why that's a bad course of
> > >> action.
> > >>
> > >> -Flavio

Re: Issue with NettyServerCnxn.java

2016-09-01 Thread Benjamin Reed

i agree, the exception should not bubble up. if something bad happens we
should mark the connection as closed (if not already) and continue on.
elsewhere closed connections are cleaned up. (or at least they better be...)

ben
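The alternative described here (catch whatever goes wrong, mark only that connection closed, and leave removal to a separate cleanup pass) might look like the following. Class and method names are hypothetical, and the null channel stands in for the asynchronously closed Netty channel from the original report.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class MarkClosedSketch {
    /** Hypothetical connection; a null channel models one already closed. */
    public static final class Connection {
        private final String channel;
        private final List<String> sent = new ArrayList<>();
        private boolean closed;

        public Connection(String channel) {
            this.channel = channel;
        }

        void sendResponse(String payload) {
            // Dereferences channel, so this throws NullPointerException once it is gone.
            sent.add(channel.toUpperCase() + ":" + payload);
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Best effort: a failing connection is marked closed and the loop keeps going. */
    public static int sendToAll(List<Connection> cnxns, String payload) {
        int delivered = 0;
        for (Connection c : cnxns) {
            if (c.closed) {
                continue;
            }
            try {
                c.sendResponse(payload);
                delivered++;
            } catch (Exception e) {
                // Swallow, but never leave a broken connection looking healthy.
                c.closed = true;
            }
        }
        return delivered;
    }

    /** Separate cleanup pass that drops connections marked closed. */
    public static int reapClosed(List<Connection> cnxns) {
        int removed = 0;
        for (Iterator<Connection> it = cnxns.iterator(); it.hasNext(); ) {
            if (it.next().closed) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Connection> cnxns = new ArrayList<>();
        cnxns.add(new Connection("a"));
        cnxns.add(new Connection(null)); // already-closed channel
        cnxns.add(new Connection("c"));
        System.out.println(sendToAll(cnxns, "ping")); // 2
        System.out.println(reapClosed(cnxns));        // 1
        System.out.println(cnxns.size());             // 2
    }
}
```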

On Thu, Sep 1, 2016 at 11:02 AM, yuliya Feldman <yufeld...@yahoo.com.invalid>
wrote:

> Thank you Ben and Patrick for the replies.
> The problem I see with Netty exception handling (or rather not handling)
> is that if something happens, it bubbles up and the main request processing
> thread is stopped, which effectively halts all ZK server operations.
> I will submit a JIRA on this (hopefully today). Either we should not
> bubble up any exception other than IOException, or the ZK server should be
> stopped, as it is really hard to figure out what really happened without
> turning on tracing.
> Thanks,
> Yuliya

Re: Issue with NettyServerCnxn.java

2016-08-31 Thread Benjamin Reed

if i remember correctly the case in sendResponse where it is catching the
IOException is due to the fact that we are opportunistically trying to send
something on a non-blocking channel. if it works, ok, but if we can't send
because we are blocked then we will just send later.

in the case of NIOServerCnxn there really shouldn't be any exceptions in
sendResponse since it's just queuing. i think the catch is probably there
so that the exception does not get propagated up and kill everything.

ben
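The non-blocking pattern described above (queue the response, write whatever the channel accepts right now, and leave the rest for a later flush) can be sketched with java.nio. Here a Pipe stands in for a client socket, and the class is hypothetical rather than NIOServerCnxn; in a real server the later flush would typically be triggered when a Selector reports the channel writable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

public final class OpportunisticSendSketch {
    private final Pipe.SinkChannel sink;
    private final Queue<ByteBuffer> outgoing = new ArrayDeque<>();

    public OpportunisticSendSketch(Pipe.SinkChannel sink) throws IOException {
        this.sink = sink;
        sink.configureBlocking(false);
    }

    /** Queue the response, then try to flush right away without blocking. */
    public void sendResponse(ByteBuffer response) throws IOException {
        outgoing.add(response);
        flush();
    }

    /** Writes as much as the channel accepts now; the remainder waits for a later flush. */
    public void flush() throws IOException {
        while (!outgoing.isEmpty()) {
            ByteBuffer head = outgoing.peek();
            sink.write(head); // may write fewer bytes, even 0, if the pipe is full
            if (head.hasRemaining()) {
                return; // would block: keep the rest queued and try again later
            }
            outgoing.poll();
        }
    }

    public int pending() {
        return outgoing.size();
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        OpportunisticSendSketch sender = new OpportunisticSendSketch(pipe.sink());
        sender.sendResponse(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println("pending after send: " + sender.pending()); // 0: five bytes fit in an empty pipe

        ByteBuffer in = ByteBuffer.allocate(16);
        pipe.source().read(in);
        in.flip();
        System.out.println(StandardCharsets.UTF_8.decode(in)); // hello
    }
}
```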

On Wed, Aug 31, 2016 at 9:52 PM, Patrick Hunt  wrote:

> Hi Yuliya - my read is that sendResponse in NIOServerCnxn is logging,
> then dropping, any Exceptions encountered during sendResponse. In other
> words, it's doing a best-effort response. Not sure if that is "correct", but
> that's what it's currently doing in NIO. Surprisingly, it's also hiding any
> IOExceptions, which are part of the method signature as defined by
> ServerCnxn. Some of the calling code is trying to handle IOException in
> some cases, which is odd... I suspect it was an oversight in ZOOKEEPER-597,
> but I'm not sure.
>
> Ben any insight?
>
> Patrick
>
> On Tue, Aug 30, 2016 at 5:15 PM, yuliya Feldman <
> yufeld...@yahoo.com.invalid> wrote:
>
>> Hello there,
>> We have been extensively testing Netty connections versus NIO, and some
>> issues have shown up that I wanted to get the community's response on.
>> In the process of testing the fix for
>> https://issues.apache.org/jira/browse/ZOOKEEPER-2509, we identified that
>> the sendResponse() method may try to do some operations after close() was
>> invoked, since channel.close() in Netty is asynchronous, and this can
>> subsequently lead to an NPE.
>> An NPE by itself is not a good thing, but the problem is aggravated by the
>> fact that propagation of the NPE will lead to the main processing thread
>> exiting, at which point the ZK server becomes unresponsive, since no
>> requests will be processed anymore.
>> In NIOServerCnxn.java, sendResponse() catches Exception and just logs a
>> warning; this was added as part of
>> https://issues.apache.org/jira/browse/ZOOKEEPER-597
>> I am trying to understand what the behavior should be in case of any
>> exception in sendResponse.
>> Any insight would be highly appreciated.
>> Thanks,
>> Yuliya
>>
>>
>
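The race in the report above can be sketched in Python, assuming a connection whose `close()` only marks it closed and schedules the actual teardown on another thread (loosely analogous to Netty's asynchronous `channel.close()`). `send_response()` checks the state under a lock and drops the write instead of dereferencing a channel that may already be gone. `FakeChannel`, `AsyncCloseConnection`, and their methods are hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


class FakeChannel:
    """Stand-in for a network channel; records what was written."""

    def __init__(self) -> None:
        self.written = []

    def write(self, data: bytes) -> None:
        self.written.append(data)


class AsyncCloseConnection:
    """Hypothetical sketch: close() returns immediately and tears the
    channel down later, so send_response() must tolerate a closing or
    already-closed connection instead of assuming the channel exists."""

    def __init__(self, channel: FakeChannel, executor: ThreadPoolExecutor) -> None:
        self._lock = threading.Lock()
        self._closed = False
        self._channel: Optional[FakeChannel] = channel
        self._executor = executor

    def close(self) -> Optional[Future]:
        with self._lock:
            if self._closed:
                return None
            self._closed = True  # senders see this immediately
        # The real teardown happens later, on another thread.
        return self._executor.submit(self._teardown)

    def _teardown(self) -> None:
        with self._lock:
            self._channel = None

    def send_response(self, data: bytes) -> bool:
        """Return True if written, False if dropped because of close()."""
        with self._lock:
            if self._closed or self._channel is None:
                return False
            self._channel.write(data)
            return True


channel = FakeChannel()
with ThreadPoolExecutor(max_workers=1) as executor:
    conn = AsyncCloseConnection(channel, executor)
    conn.send_response(b"before close")
    pending_close = conn.close()
    conn.send_response(b"after close")  # dropped, no error
    pending_close.result()
print(channel.written)  # [b'before close']
```

Holding the lock across the write keeps the sketch simple; a real server would likely avoid doing I/O while holding a lock.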

[VOTE] move Apache Zookeeper to git

2016-08-31 Thread Benjamin Reed

flip the switch to git and update the relevant scripts and docs.

i couldn't figure out which timeframe this falls under in the voting
procedure table, but i think it's safe to go with 3 days (72 hours), so the
vote will close on Saturday, September 3 at 6:30pm PDT.
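A quick check of the date arithmetic, assuming the [VOTE] mail went out at 6:30pm PDT on Wednesday, August 31, 2016 (inferred from the stated close time, not from the message headers) and the 3-day period chosen above. `vote_close` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# PDT is a fixed UTC-7 offset; good enough for a date well inside
# daylight saving time.
PDT = timezone(timedelta(hours=-7), "PDT")

VOTE_PERIOD = timedelta(days=3)  # the "3 days" chosen in the mail


def vote_close(opened: datetime, period: timedelta = VOTE_PERIOD) -> datetime:
    """Return the earliest time a vote opened at `opened` may close."""
    return opened + period


# Assumption: the [VOTE] mail was sent at 6:30pm PDT on Wed, Aug 31, 2016.
opened = datetime(2016, 8, 31, 18, 30, tzinfo=PDT)
closes = vote_close(opened)
print(f"{closes:%A, %B} {closes.day} at {closes:%H:%M} {closes.tzname()}")
# Saturday, September 3 at 18:30 PDT
```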

+1 from me

Re: switching to git?

2016-08-30 Thread Benjamin Reed

i like kafka's contributor workflow!
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
i'd like to head toward that. of course there are other options as well,
but as pat points out the first phase is moving to git.


On Tue, Aug 30, 2016 at 5:37 AM, Flavio Junqueira <f...@apache.org> wrote:

>
> > On 30 Aug 2016, at 07:42, Benjamin Reed <br...@apache.org> wrote:
> >
> > i'm curious, can we get the qa
> > bot to trigger off a pull request?
>
> Yes, we can. See this Kafka build queue:
>
> https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/
>
> -Flavio
>
> >
> > On Mon, Aug 29, 2016 at 8:56 PM, Patrick Hunt <ph...@apache.org> wrote:
> >
> >> On Mon, Aug 29, 2016 at 8:48 AM, Benjamin Reed <br...@apache.org> wrote:
> >>
> >>> gerrit is pretty amazing. after you upload a patch, the whole life cycle
> >>> of review, verification, and committer committing can happen on the web.
> >>> in practice it means that reviewing and committing small correct changes
> >>> becomes two clicks of a button. but, this is not what i'm proposing.
> >>> talking with pat, there is no clear way to get gerrit set up. kudu uses
> >>> it, but it's a one-off that we can't really leverage.
> >>>
> >>> in moving to git we can enable more flexible workflows that get us closer
> >>> to the benefits of using something like gerrit. for example, other apache
> >>> projects use workflows that allow patch uploads and reviewing to happen as
> >>> git pull requests. i realize it is subjective, which is why it's good that
> >>> others share their opinions; in my experience working on a variety of
> >>> different projects with different contribution systems, i've found
> >>> jira-based patch uploading to be cumbersome for the contributor,
> >>> reviewer, and committer. even just the simple start of moving to git and
> >>> accepting github pull requests as a method of contribution will improve
> >>> things a lot.
> >>>
> >>> as far as who does the work: pat mentioned that he has looked into this
> >>> and has the skillz to make it happen.
> >>>
> >>>
> >> I am pretty sure I know how to get INFRA to make the switch from svn->git.
> >> However there are additional changes we'll need to make (e.g. updating the
> >> "how to release" process, patch submission process, etc...) that we'll
> >> have to handle together.
> >>
> >> Gerrit is not something that INFRA is supporting at this time. As such
> >> we'll either have to continue to use RB or collectively agree to something
> >> else. Also keep in mind that qabot triggers off JIRA patches, we don't
> >> want to lose that.
> >>
> >> This thread is "switch to git", I'd recommend we focus on one step at a
> >> time. That said it might be good to lay out some steps before we start
> >> making changes.
> >>
> >> Patrick
> >>
> >>
> >>> ben
> >>>
> >>> On Mon, Aug 29, 2016 at 7:33 AM, Flavio Junqueira <f...@apache.org> wrote:
> >>>
> >>>> I also don't know what's being proposed here exactly. I thought Ben was
> >>>> just pointing out that it'd be nice to improve our infrastructure and
> >>>> tooling.
> >>>>
> >>>> In any case, I wanted to point to a thread in which we have some
> >>>> discussion around moving to git and doing github pull requests along
> >>>> with some pointers that this was done in other projects, like BookKeeper
> >>>> and Kafka:
> >>>>
> >>>> http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201603.mbox/%3cCANLc_9L4kygfgXX4u-C33ccohtauNYpKExRHwUgJMHCkAws-d...@mail.gmail.com%3e
> >>>>
> >>>> I don't have experience with gerrit and I have seen only one project
> >>>> referring to gerrit in Apache: Kudu.
> >>>>
> >>>> -Flavio

Re: switching to git?

2016-08-30 Thread Benjamin Reed

i agree that we should take it one step at a time, which is why i've kept
the discussion focused on the git switch. i'm curious, can we get the qa
bot to trigger off a pull request? if not, would the qa bot be able to pull
from a git url in a jira? (the more i think about it, the more i realize
that the qabot is purposely set up to run arbitrary code from a web page. a
security nightmare :) )
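If the qa bot were to pull from a git URL posted in a JIRA, one hypothetical way to narrow the risk mentioned above is to accept only HTTPS URLs on an allowlisted host with a plain owner/repo path. This Python sketch is not how any Apache qa bot works; `ALLOWED_HOSTS`, `candidate_git_urls`, and `is_fetchable` are assumptions for illustration, and restricting where code comes from does not make running it safe.

```python
import re
from urllib.parse import urlparse

# Hypothetical policy: only fetch over HTTPS from these hosts.
ALLOWED_HOSTS = {"github.com"}

URL_RE = re.compile(r"https?://\S+")
SEGMENT_RE = re.compile(r"[A-Za-z0-9_.-]+")


def candidate_git_urls(comment: str) -> list:
    """Extract URL-looking tokens from free-form JIRA comment text."""
    return [m.group(0).rstrip(".,;)") for m in URL_RE.finditer(comment)]


def is_fetchable(url: str) -> bool:
    """Accept only https://<allowed host>/<owner>/<repo> with no credentials,
    explicit port, query, or fragment. This limits where code comes from;
    it does not make running that code safe."""
    parsed = urlparse(url)
    try:
        port = parsed.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        return False
    if parsed.username or parsed.password or port is not None:
        return False
    if parsed.query or parsed.fragment:
        return False
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) == 2 and all(
        SEGMENT_RE.fullmatch(s) and s not in (".", "..") for s in segments
    )


# "someone" is a placeholder account, not a real contributor.
comment = "please test https://github.com/someone/zookeeper.git, thanks"
for url in candidate_git_urls(comment):
    print(url, is_fetchable(url))
# https://github.com/someone/zookeeper.git True
```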

ben

On Mon, Aug 29, 2016 at 8:56 PM, Patrick Hunt <ph...@apache.org> wrote:

> On Mon, Aug 29, 2016 at 8:48 AM, Benjamin Reed <br...@apache.org> wrote:
>
> > gerrit is pretty amazing. after you upload a patch, the whole life cycle
> > of review, verification, and committer committing can happen on the web.
> > in practice it means that reviewing and committing small correct changes
> > becomes two clicks of a button. but, this is not what i'm proposing.
> > talking with pat, there is no clear way to get gerrit set up. kudu uses
> > it, but it's a one-off that we can't really leverage.
> >
> > in moving to git we can enable more flexible workflows that get us closer
> > to the benefits of using something like gerrit. for example, other apache
> > projects use workflows that allow patch uploads and reviewing to happen as
> > git pull requests. i realize it is subjective, which is why it's good that
> > others share their opinions; in my experience working on a variety of
> > different projects with different contribution systems, i've found
> > jira-based patch uploading to be cumbersome for the contributor, reviewer,
> > and committer. even just the simple start of moving to git and accepting
> > github pull requests as a method of contribution will improve things a lot.
> >
> > as far as who does the work: pat mentioned that he has looked into this and
> > has the skillz to make it happen.
> >
> >
> I am pretty sure I know how to get INFRA to make the switch from svn->git.
> However there are additional changes we'll need to make (e.g. updating the
> "how to release" process, patch submission process, etc...) that we'll have
> to handle together.
>
> Gerrit is not something that INFRA is supporting at this time. As such
> we'll either have to continue to use RB or collectively agree to something
> else. Also keep in mind that qabot triggers off JIRA patches, we don't want
> to lose that.
>
> This thread is "switch to git", I'd recommend we focus on one step at a
> time. That said it might be good to lay out some steps before we start
> making changes.
>
> Patrick
>
>
> > ben
> >
> > On Mon, Aug 29, 2016 at 7:33 AM, Flavio Junqueira <f...@apache.org> wrote:
> >
> > > I also don't know what's being proposed here exactly. I thought Ben was
> > > just pointing out that it'd be nice to improve our infrastructure and
> > > tooling.
> > >
> > > In any case, I wanted to point to a thread in which we have some
> > > discussion around moving to git and doing github pull requests along
> > > with some pointers that this was done in other projects, like BookKeeper
> > > and Kafka:
> > >
> > > http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201603.mbox/%3cCANLc_9L4kygfgXX4u-C33ccohtauNYpKExRHwUgJMHCkAws-d...@mail.gmail.com%3e
> > >
> > > I don't have experience with gerrit and I have seen only one project
> > > referring to gerrit in Apache: Kudu.
> > >
> > > -Flavio
> > >
> > > > On 29 Aug 2016, at 14:25, Camille Fournier <cami...@apache.org> wrote:
> > > >
> > > > I'm confused: what is the actual work that needs to happen to finalize
> > > > this, and who is going to do that work? Ben, are you volunteering, or is
> > > > there nothing left to do?
> > > >
> > > > On Aug 28, 2016 10:54 PM, "Patrick Hunt" <ph...@apache.org> wrote:
> > > >
> > > >> :-)
> > > >>
> > > >> In order to make it "official" my recommendation would be for you to
> > > >> start a VOTE thread on dev@ after a couple more days, if there are no
> > > >> objections here, that is.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Patrick
> > > >>
> > > >>
>
