Build failed in Jenkins: ZooKeeper-trunk-owasp #216

2018-12-26 Thread Apache Jenkins Server
See 

--
[...truncated 19.06 KB...]
[javac] Compiling 1 source file to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] 1 warning

git-revision:
[mkdir] Created dir: 


version-info:

process-template:

build-generated:
[javac] Compiling 63 source files to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] 1 warning

compile:
[javac] Compiling 300 source files to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] 
:37:
 warning: [deprecation] newInstance() in Class has been deprecated
[javac] true, 
Thread.currentThread().getContextClassLoader()).newInstance();
[javac] 
 ^
[javac]   where T is a type-variable:
[javac] T extends Object declared in class Class
[javac] 
:240:
 warning: [cast] redundant cast to ByteBuffer
[javac] b = (ByteBuffer) b.slice().limit(
[javac] ^
[javac] 
:78:
 warning: [cast] redundant cast to ByteBuffer
[javac] fileChannel.write((ByteBuffer) fill.position(0), 
newFileSize - fill.remaining());
[javac]   ^
[javac] 
:42:
 warning: [deprecation] newInstance() in Class has been deprecated
[javac] (IWatchManager) 
Class.forName(watchManagerName).newInstance();
[javac]^
[javac]   where T is a type-variable:
[javac] T extends Object declared in class Class
[javac] 5 warnings
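
The [deprecation] warnings above refer to Class#newInstance(), which newer JDKs 
flag as deprecated; the usual replacement is getDeclaredConstructor().newInstance(). 
A hypothetical sketch with placeholder names, not the actual ZooKeeper sources:

    // Illustrative replacement for the deprecated Class#newInstance() calls
    // flagged above; the class and method names here are placeholders.
    class ReflectiveCreateSketch {
        static Object create(String className) throws ReflectiveOperationException {
            // getDeclaredConstructor().newInstance() surfaces constructor
            // exceptions as InvocationTargetException instead of rethrowing
            // them unchecked, which is why plain newInstance() was deprecated.
            return Class.forName(className,
                    true, Thread.currentThread().getContextClassLoader())
                    .getDeclaredConstructor()
                    .newInstance();
        }
    }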

compile-test:
[mkdir] Created dir: 

[javac] Compiling 264 source files to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 warning
[javac] Compiling 11 source files to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 warning
[javac] Compiling 2 source files to 

[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 8
[javac] Note: 

 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 warning

init:

ivy-download:

ivy-taskdef:

ivy-init:

ivy-retrieve-owasp:
[ivy:retrieve] :: loading settings :: file = 

[ivy:retrieve] :: resolving dependencies :: 
org.apache.zookeeper#zookeeper;3.6.0-SNAPSHOT
[ivy:retrieve]  confs: [owasp]
[ivy:retrieve]  found org.owasp#dependency-check-ant;3.2.1 in maven2
[ivy:retrieve]  found org.owasp#dependency-check-core;3.2.1 in maven2
[ivy:retrieve]  found com.vdurmont#semver4j;2.2.0 in maven2
[ivy:retrieve]  found joda-time#joda-time;1.6 in maven2
[ivy:retrieve]  found org.slf4j#slf4j-api;1.7.25 in maven2
[ivy:retrieve]  found org.owasp#dependency-check-utils;3.2.1 in maven2
[ivy:retrieve]  found commons-io#commons-io;2.6 in maven2
[ivy:retrieve]  found org.apache.commons#commons-lang3;3.7 in maven2
[ivy:retrieve]  

[jira] [Comment Edited] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729095#comment-16729095
 ] 

maoling edited comment on ZOOKEEPER-3220 at 12/26/18 4:36 PM:
--

[~jiangjiafu]

1.>  "*In my environment, the save method returned successfully, that means 
no exception had been thrown. But, the data was not in disk! That's the problem 
I want to report!*"

why this situation happend? The disk is full? 
 snapshot does not call *fsync* may be the answer.
 Do you see some logs about *FileTxnSnapLog#save* at that time?
 2.Even if this situation that the size of snapshot is 0 could not cause data 
inconsistency.
 because when ZooKeeper server restarted again,the invalid snapshots will be 
skiped,if no any valid snapshot,the leader can do *SNAP* to sync with the 
follower


was (Author: maoling):
[~jiangjiafu]

--->"*In my environment, the save method returned successfully, that means no 
exception had been thrown. But, the data was not in disk! That's the problem I 
want to report!*"

1.why this situation happend? The disk is full? 
 snapshot does not call *fsync* may be the answer.
 Do you see some logs about *FileTxnSnapLog#save* at that time?
2.Even if this situation that the size of snapshot is 0 could not cause data 
inconsistency.
 because when ZooKeeper server restarted again,the invalid snapshots will be 
skiped,if no any invalid snapshot,
 the leader can do *SNAP* to sync with the follower

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We know that the ZooKeeper server calls fsync to make sure that log data has 
> been successfully saved to disk, but it does not call fsync to make sure that 
> a snapshot has been successfully saved, which may cause problems: closing a 
> file descriptor does not guarantee that the data has been written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully saved to disk, it may lead to data 
> inconsistency. Here is an example, which is also a real problem I have met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3; zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records log1 ~ logX, where X was the zxid.
> 3. The machine hosting zk1 restarted, and during the reboot log(X+1) ~ logY 
> were saved to the log files of both zk2 (leader) and zk3 (follower).
> 4. After zk1 restarted, it found itself to be a follower and began to 
> synchronize data with the leader. The leader sent a snapshot (records from 
> log1 ~ logY) to zk1, and zk1 saved the snapshot to local disk by calling 
> ZooKeeperServer.takeSnapshot. Unfortunately, when the method returned, the 
> snapshot data was not yet on disk; the snapshot file had been created, but its 
> size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say log records log(Y+1) ~ logZ were accepted by zk1 and saved to the 
> log file. Thanks to fsync, zk1 could make sure this log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it was not used, so 
> zk1 recovered from the log files alone. The records log(X+1) ~ logY were lost!
>  
> Sorry for my poor English.
>  
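
For illustration of step 6, a hypothetical sketch (not the actual ZooKeeper 
code) of why a zero-length snapshot is skipped on restart: only non-empty 
snapshot files are treated as usable recovery candidates. The helper name and 
file-selection logic are assumptions for illustration only.

    import java.io.File;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Optional;

    class SnapshotPickSketch {
        // Pick the most recent non-empty snapshot.<hexZxid> file; a 0-byte
        // snapshot (as in steps 4 and 6 above) is never a candidate, so
        // recovery falls back to older snapshots plus the transaction logs.
        // Assumes well-formed "snapshot.<hexZxid>" file names.
        static Optional<File> latestUsableSnapshot(File snapDir) {
            File[] snaps = snapDir.listFiles(
                    (dir, name) -> name.startsWith("snapshot."));
            if (snaps == null) {
                return Optional.empty();
            }
            return Arrays.stream(snaps)
                    .filter(f -> f.length() > 0)
                    .max(Comparator.comparingLong((File f) ->
                            Long.parseLong(
                                f.getName().substring("snapshot.".length()), 16)));
        }
    }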



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3218) zk server reopened,the interval for observer connect to the new leader is too long,then session expired

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729186#comment-16729186
 ] 

Brian Nixon commented on ZOOKEEPER-3218:


We had similar issues which we addressed by making the polling interval 
configurable. Attaching our patch to this issue (it adds 
"zookeeper.fastleader.minNotificationInterval").

 

> zk server reopened,the interval for observer connect to the new leader is too 
> long,then session expired
> ---
>
> Key: ZOOKEEPER-3218
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3218
> Project: ZooKeeper
>  Issue Type: Bug
> Environment: win7 32bits
> zookeeper 3.4.6、3.4.13
>Reporter: yangoofy
>Priority: Major
>
> Two participants and one observer; the zkclient connects to the observer.
> Then the two participants are shut down, so the ZooKeeper ensemble is down.
> Ten seconds later, the two participants are restarted and a leader is elected.
> 
> But the observer cannot connect to the new leader immediately. In 
> lookForLeader, the observer uses a blocking queue (recvqueue) to offer/poll 
> notifications; when recvqueue is empty, the poll blocks, and the timeout 
> doubles from 200ms to 400ms, 800ms, ... up to 60s.
> For example: at 09:59:59 the observer polls for a notification, recvqueue is 
> empty, and the timeout is 60s; at 10:00:00 the two participants are restarted 
> and a leader is re-elected; only at 10:00:59 does the observer poll the 
> notification and connect to the new leader.
> But maxSessionTimeout defaults to 40s, so the session expires.
> -
> Please improve this: the observer should connect to the new leader as soon as 
> possible.
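
For reference, a simplified sketch of the notification backoff described above, 
with the minimum poll interval made configurable along the lines of the patch 
mentioned in the earlier comment. The structure and names are illustrative, not 
the actual FastLeaderElection code.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    class NotificationPollSketch {
        static final int MIN_INTERVAL_MS =
                Integer.getInteger("zookeeper.fastleader.minNotificationInterval", 200);
        static final int MAX_INTERVAL_MS = 60_000;

        static Object waitForNotification(LinkedBlockingQueue<Object> recvqueue)
                throws InterruptedException {
            int timeout = MIN_INTERVAL_MS;
            while (true) {
                Object n = recvqueue.poll(timeout, TimeUnit.MILLISECONDS);
                if (n != null) {
                    return n; // e.g. the announcement of the new leader
                }
                // Back off: 200ms, 400ms, 800ms, ... capped at 60s, which is why
                // the observer in the example above can wait almost a minute.
                timeout = Math.min(timeout * 2, MAX_INTERVAL_MS);
            }
        }
    }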



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729136#comment-16729136
 ] 

Brian Nixon commented on ZOOKEEPER-2872:


Now that the patch is merged, was there any further work here?

> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>Priority: Major
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members of good standing with the ensemble and 
> continuing to serve client traffic when the following chain of events occurs.
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disk and after 
> some txns have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid the 
> case of missing snapshots causing data loss.
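
As a rough illustration of the proposed fix, a minimal sketch of forcing a 
freshly written snapshot to disk before closing the file. The names are 
placeholders; this is not the actual FileTxnSnapLog/Learner code.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    class SnapshotSyncSketch {
        // Write the serialized snapshot and force it to the storage device
        // before close(); close() alone does not guarantee durability.
        static void saveSnapshot(File snapFile, byte[] serializedSnapshot)
                throws IOException {
            try (FileOutputStream fos = new FileOutputStream(snapFile)) {
                fos.write(serializedSnapshot);
                fos.flush();
                fos.getFD().sync(); // fsync
            }
        }
    }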



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729138#comment-16729138
 ] 

Brian Nixon commented on ZOOKEEPER-3197:


Password is probably the wrong term for this variable (though it does suggest 
some potential future work). It's more of a checksum that's used in 
reconnection, carries no security weight, and is treated internally as if it 
carries no security weight.

 

[~breed] might be the only one left who knows the full story (it's telling that 
the secret decodes to "Ben is Cool").
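
For context, this value only surfaces to clients as the session password 
returned on connect, which a client hands back when re-attaching to an existing 
session. A minimal sketch using the public ZooKeeper client constructor; the 
connect string, timeout, and helper name are placeholders.

    import org.apache.zookeeper.ZooKeeper;

    class ReconnectSketch {
        static ZooKeeper reattach(ZooKeeper old) throws Exception {
            // The sessionId/sessionPasswd pair acts as the reconnection
            // "checksum" described above; it carries no other security weight.
            return new ZooKeeper("127.0.0.1:2181", 30000, event -> { },
                    old.getSessionId(), old.getSessionPasswd());
        }
    }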

 

> Improve documentation in ZooKeeperServer.superSecret
> 
>
> Key: ZOOKEEPER-3197
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Colm O hEigeartaigh
>Priority: Trivial
>
> A security scan flagged the use of a hard-coded secret 
> (ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
> generate a password:
> byte[] generatePasswd(long id) {
>     Random r = new Random(id ^ superSecret);
>     byte p[] = new byte[16];
>     r.nextBytes(p);
>     return p;
> }
> superSecret has the following javadoc:
>  /**
>     * This is the secret that we use to generate passwords, for the moment it
>     * is more of a sanity check.
>     */
> From this comment and from reading the code, it is unclear why this is not a 
> security risk. Updating the javadoc along the lines of "Using a hard-coded 
> secret with Random to generate a password is not a security risk because the 
> resulting passwords are used for X, Y, Z and not for authentication", or 
> something similar, would be very helpful for anyone else looking at the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3211) zookeeper standalone mode,found a high level bug in kernel of centos7.0 ,zookeeper Server's tcp/ip socket connections(default 60 ) are CLOSE_WAIT ,this lead to zk

2018-12-26 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729104#comment-16729104
 ] 

maoling commented on ZOOKEEPER-3211:


[~yss]

So it is not a deadlock, just an I/O problem in your environment?

Is it easy to reproduce this issue? Any progress?

> zookeeper standalone mode,found a high level bug in kernel of centos7.0 
> ,zookeeper Server's  tcp/ip socket connections(default 60 ) are CLOSE_WAIT 
> ,this lead to zk can't work for client any more
> --
>
> Key: ZOOKEEPER-3211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3211
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
> Environment: 1.zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel
> kernel:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
>Reporter: yeshuangshuang
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: 1.log, 2018-12-09_124131.png, 2018-12-09_124210.png, 
> 2018-12-09_132854.png, 2018-12-09_133017.png, 2018-12-09_133049.png, 
> 2018-12-09_133111.png, 2018-12-09_133131.png, 2018-12-09_133150.png, 
> 2018-12-09_133210.png, 2018-12-09_133229.png, 2018-12-09_133248.png, 
> 2018-12-09_133320.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1.config--zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel version
> version:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
> 3. Bug details:
> It happens occasionally, but the recurrence probability is extremely high. At 
> first, reads and writes time out after about 6s; after a few minutes, all 
> connections (including long-lived ones) end up in the CLOSE_WAIT state.
> 4. Workaround: once all connections have become CLOSE_WAIT, actively restart 
> the ZooKeeper server side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729095#comment-16729095
 ] 

maoling commented on ZOOKEEPER-3220:


[~jiangjiafu]

--->"*In my environment, the save method returned successfully, that means no 
exception had been thrown. But, the data was not in disk! That's the problem I 
want to report!*"

1.why this situation happend? The disk is full? 
 snapshot does not call *fsync* may be the answer.
 Do you see some logs about *FileTxnSnapLog#save* at that time?
2.Even if this situation that the size of snapshot is 0 could not cause data 
inconsistency.
 because when ZooKeeper server restarted again,the invalid snapshots will be 
skiped,if no any invalid snapshot,
 the leader can do *SNAP* to sync with the follower




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729134#comment-16729134
 ] 

Brian Nixon commented on ZOOKEEPER-3220:


I believe ZOOKEEPER-2872 addressed the fsyncing part of this issue and 
ZOOKEEPER-3082 added some nice cleanup around zero-size snapshot files. Neither 
of these changes was backported to 3.4, which suggests one potential path 
forward. Note that backporting ZOOKEEPER-2872 also requires backporting 
ZOOKEEPER-2870.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper_branch34_jdk8 - Build # 1638 - Failure

2018-12-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1638/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 42.79 KB...]
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.47 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.922 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.999 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.493 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.775 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.942 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.901 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.105 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.022 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.042 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.16 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.025 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.084 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.872 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.905 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.631 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.312 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.14 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.586 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
30.135 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
10.956 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.902 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.097 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1408: 
The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1411: 
Tests failed!

Total time: 41 minutes 0 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.ServerStatsTest.testLatencyMetrics

Error Message:
Min latency check
Expected: a value equal to or greater than <1001L>
 but: <1000L> was less than <1001L>

Stack Trace:
junit.framework.AssertionFailedError: Min latency check
Expected: a value equal to or greater than <1001L>
 but: <1000L> was less than <1001L>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at 
org.apache.zookeeper.server.ServerStatsTest.testLatencyMetrics(ServerStatsTest.java:77)
at 

[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729312#comment-16729312
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~nixon] Thanks very much!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729308#comment-16729308
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~maoling]

 

"Why did this happen? Is the disk full?"

No, but the machine restarted.

"Do you see any logs from *FileTxnSnapLog#save* at that time?"

There was no error log. In fact, some of the follower's logs were lost during 
the machine reboot. But from the leader's log, the follower had received a 
snapshot and had begun to receive further transaction logs, so 
*FileTxnSnapLog#save on the follower must have succeeded, yet the data was not 
on disk!*

 

"*2. Even if this happens, a snapshot of size 0 should not cause data 
inconsistency.*"

Yes, I know. ZooKeeper recovers its data from both the logs and the snapshot.

But if a ZooKeeper follower believes a snapshot has been saved, it believes the 
data in that snapshot is all on disk (when in fact it may not be), and it will 
begin to receive the logs that come after the snapshot. If the snapshot turns 
out to be invalid, the server will recover from the logs only, and some data 
will be missing, because that data was only ever saved in the snapshot.

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-12-26 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729405#comment-16729405
 ] 

maoling commented on ZOOKEEPER-3197:


Thanks [~nixon] for the good explanation of this magic number!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)