Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

2018-12-12 Thread Yongjun Zhang
Hi Konstantin,

Thanks for addressing my other question about failover.

Some thoughts to share about the suggestion Daryn made.  It seems we could
try this: let the ObserverNode throw a RetriableException back to the client,
saying it has not yet reached the transaction ID needed to serve the client,
and maybe even include the transaction ID gap information in the exception.
When the client receives the RetriableException, it can then decide whether
to send the request to the observer node again, or to the active NN when
the gap is too big.
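
To make the idea concrete, here is a minimal sketch of the server-side check.
The class, method, and parameter names are hypothetical, not the actual
HDFS-12943 code; only org.apache.hadoop.ipc.RetriableException is a real class.

{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.RetriableException;

// Hypothetical sketch: the ObserverNode rejects a read whose required
// transaction ID it has not yet caught up to, and reports the gap so the
// client can choose between retrying the observer and falling back to
// the active NN.
public class ObserverReadCheck {
  static void checkClientStateId(long clientSeenTxId, long observerTxId)
      throws IOException {
    long gap = clientSeenTxId - observerTxId;
    if (gap > 0) {
      throw new RetriableException("Observer has not caught up: client needs"
          + " txid " + clientSeenTxId + " but observer is at " + observerTxId
          + " (gap=" + gap + ")");
    }
  }
}
{code}

On the client side, the retry handler could carry the gap and fail over to
the active NN only when it exceeds some threshold.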

Though saving another RPC would help performance with the current
implementation, I expect the above-mentioned exception to happen only
infrequently, so the performance won't be too bad; plus the client has a
chance to try the ANN when it knows that the observer is too far behind in
extreme cases.

I wonder how different the performance is between these two approaches in a
cluster with a real workload.

Comments?

--Yongjun

On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko 
wrote:

> Hi Daryn,
>
> Wanted to backup Chen's earlier response to your concerns about rotating
> calls in the call queue.
> Our design
> 1. directly targets the livelock problem by rejecting calls on the Observer
> that are not likely to be responded to in a timely manner: HDFS-13873.
> 2. The call queue rotation is only done on Observers, and never on the
> active NN, so it stays free of attacks like the one you suggest.
>
> If this is a satisfactory mitigation for the problem, could you please
> reconsider your -1, so that people can continue voting on this thread.
>
> Thanks,
> --Konst
>
> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp  wrote:
>
> > -1 pending additional info.  After a cursory scan, I have serious
> concerns
> > regarding the design.  This seems like a feature that should have been
> > purely implemented in hdfs w/o touching the common IPC layer.
> >
> > The biggest issue is the alignment context.  Its purpose appears to be
> > to allow handlers to reinsert calls back into the call queue.  That's
> > completely unacceptable.  A buggy or malicious client can easily cause
> > livelock in the IPC layer, with handlers only looping on calls that never
> > satisfy the condition.  Why is this not implemented via
> > RetriableExceptions?
> >
> > On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang  >
> > wrote:
> >
> >> Great work guys.
> >>
> >> Wonder if we can elaborate on the impact of not having #2 fixed, and why
> >> #2 is not needed for the feature to be complete?
> >> 2. Need to fix automatic failover with ZKFC. Currently it doesn't know
> >> about ObserverNodes and tries to convert them to SBNs.
> >>
> >> Thanks.
> >> --Yongjun
> >>
> >>
> >> On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <
> shv.had...@gmail.com>
> >> wrote:
> >>
> >> > Hi Hadoop developers,
> >> >
> >> > I would like to propose to merge to trunk the feature branch
> HDFS-12943
> >> for
> >> > Consistent Reads from Standby Node. The feature is intended to scale
> >> read
> >> > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> >> > NameNode. We should be able to accommodate higher overall RPC
> workloads
> >> (up
> >> > to 4x by some estimates) by adding multiple ObserverNodes.
> >> >
> >> > The main functionality has been implemented; see sub-tasks of
> >> > HDFS-12943.
> >> > We followed up with the test plan. Testing was done on two independent
> >> > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> >> > We ran standard HDFS commands, MR jobs, admin commands including
> manual
> >> > failover.
> >> > We know of one cluster running this feature in production.
> >> >
> >> > There are a few outstanding issues:
> >> > 1. Need to provide proper documentation - a user guide for the new
> >> feature
> >> > 2. Need to fix automatic failover with ZKFC. Currently it doesn't know
> >> > about ObserverNodes and tries to convert them to SBNs.
> >> > 3. Scale testing and performance fine-tuning
> >> > 4. As testing progresses, we continue fixing non-critical bugs like
> >> > HDFS-14116.
> >> >
> >> > I attached a unified patch to the umbrella jira for the review and
> >> Jenkins
> >> > build.
> >> > Please vote on this thread. The vote will run for 7 days until Wed Dec
> >> 12.
> >> >
> >> > Thanks,
> >> > --Konstantin
> >> >
> >>
> >
> >
> > --
> >
> > Daryn
> >
>


Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

2018-12-05 Thread Yongjun Zhang
Great work guys.

Wonder if we can elaborate on the impact of not having #2 fixed, and why #2
is not needed for the feature to be complete?
2. Need to fix automatic failover with ZKFC. Currently it doesn't know about
ObserverNodes and tries to convert them to SBNs.

Thanks.
--Yongjun


On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko 
wrote:

> Hi Hadoop developers,
>
> I would like to propose to merge to trunk the feature branch HDFS-12943 for
> Consistent Reads from Standby Node. The feature is intended to scale read
> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> NameNode. We should be able to accommodate higher overall RPC workloads (up
> to 4x by some estimates) by adding multiple ObserverNodes.
>
> The main functionality has been implemented; see sub-tasks of HDFS-12943.
> We followed up with the test plan. Testing was done on two independent
> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> We ran standard HDFS commands, MR jobs, admin commands including manual
> failover.
> We know of one cluster running this feature in production.
>
> There are a few outstanding issues:
> 1. Need to provide proper documentation - a user guide for the new feature
> 2. Need to fix automatic failover with ZKFC. Currently it doesn't know
> about ObserverNodes and tries to convert them to SBNs.
> 3. Scale testing and performance fine-tuning
> 4. As testing progresses, we continue fixing non-critical bugs like
> HDFS-14116.
>
> I attached a unified patch to the umbrella jira for the review and Jenkins
> build.
> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>
> Thanks,
> --Konstantin
>


[jira] [Created] (HDFS-14012) Add diag info in RetryInvocationHandler

2018-10-20 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-14012:


 Summary: Add diag info in RetryInvocationHandler
 Key: HDFS-14012
 URL: https://issues.apache.org/jira/browse/HDFS-14012
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
 Environment: RetryInvocationHandler does the following logging:
{code:java}
} else {
  LOG.warn("A failover has occurred since the start of this method"
      + " invocation attempt.");
}{code}
It would be helpful to report the method name and call stack in this message.
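
For illustration, a hedged sketch of what the enriched warning might look
like, assuming the standard java.lang.reflect.InvocationHandler invoke()
signature where the Method object is in scope (this is not the actual patch):

{code:java}
} else {
  // Name the method being retried, and pass a Throwable so the logger
  // records the current call stack as diagnostic info.
  LOG.warn("A failover has occurred since the start of this method"
      + " invocation attempt. method=" + method.getName(),
      new Exception("diagnostic call stack"));
}
{code}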

Thanks.
        Reporter: Yongjun Zhang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-09-09 Thread Yongjun Zhang
Hi Vinod,

Thanks a lot for finding and fixing this! Sorry for having missed your email
earlier.

Best,

--Yongjun

On Mon, Aug 13, 2018 at 9:51 AM, Vinod Kumar Vavilapalli  wrote:

> Yongjun,
>
> Looks like you didn't add the links to 3.0.3 binary release on the
> http://hadoop.apache.org/releases.html page.
>
> I just did it, FYI: https://svn.apache.org/viewvc?view=revision=1837967
>
> Thanks
> +Vinod
>
>
> On May 31, 2018, at 10:48 PM, Yongjun Zhang  wrote:
>
> Greetings all,
>
> I've created the first release candidate (RC0) for Apache Hadoop
> 3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
> about 249
> important fixes and improvements, among which there are 8 blockers. See
> https://issues.apache.org/jira/issues/?filter=12343997
>
> The RC artifacts are available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>
> The maven artifacts are available via
> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>
> Please try the release and vote; the vote will run for the usual 5 working
> days, ending on 06/07/2018 PST time. Would really appreciate your
> participation here.
>
> I bumped into quite some issues along the way, many thanks to quite a few
> people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.
>
> Thanks,
>
> --Yongjun
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-07-10 Thread Yongjun Zhang
Welcome Jonathan.

http://hadoop.apache.org/releases.html stated:
"Hadoop is released as source code tarballs with corresponding binary
tarballs for convenience. "

and Andrew Wang said "The binary artifacts (including JARs) are technically
just convenience artifacts", and it seems not an uncommon practice to do
follow-up builds to release maven artifacts.

IIRC, Andrew once shared with me that starting in 3.x we use a single build
to do both release binaries creation and maven artifacts deployment; prior
releases used multiple builds:

Referring to https://wiki.apache.org/hadoop/HowToRelease

   - 3.x: step 4 in the "Creating the release candidate (X.Y.Z-RC)" section
   does both release binaries creation and maven artifacts deployment.
   - prior to 3.x: step 4 does release binary creation, and step 10 does
   maven artifacts deployment; *each step does its own build, so two builds
   here*. As a matter of fact, I did not run step 10 for 3.0.3.

That said, I agree that ideally it's better to generate release binaries and
deploy maven artifacts from a single build.

Hope it helps. Welcome other folks to chime in.

Best,

--Yongjun

On Mon, Jul 9, 2018 at 2:08 PM, Jonathan Eagles  wrote:

> Thank you, Yongjun Zhang for resolving this issue for me. I have verified
> the 3.0.3 build is now working for me for tez to specify as a hadoop
> dependency.
>
> As for release procedure, can someone comment on what to do now that the
> artifacts published to maven are different than the voted on artifacts. I
> believe the source code is what is voted on and the maven artifacts are
> just for convenience, but would like an "official" answer.
>
> Reference:
> https://issues.apache.org/jira/browse/TEZ-3955
>
> Regards,
> jeagles
>
> On Mon, Jul 9, 2018 at 12:26 PM, Yongjun Zhang 
> wrote:
>
>> HI Jonathan,
>>
>> I have updated the artifacts, so now
>>
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.2~~
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.3~~
>>
>> are more consistent, except that 3.0.3 has an extra entry for rbf. Would
>> you please try again?
>>
>> The propagation to
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>> will take some time. I did nothing different than last time, so keep
>> finger crossed that it will propagate there.
>>
>> Thanks Sammi Chen and Andrew Wang for info and advice, and sorry for the
>> inconvenience again.
>>
>> Best,
>>
>> --Yongjun
>>
>> On Mon, Jul 2, 2018 at 9:30 AM, Jonathan Eagles 
>> wrote:
>>
>>> Release 3.0.3 is still broken due to the missing artifacts. Any update
>>> on when these artifacts will be published?
>>>
>>> On Wed, Jun 27, 2018 at 8:25 PM, Chen, Sammi 
>>> wrote:
>>>
>>>> Hi Yongjun,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The artifacts will be pushed to
>>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>>>> after step 6 of the Publishing steps.
>>>>
>>>>
>>>> For 2.9.1, I remember I definitely did the step before. I redid step 6
>>>> today, and now 2.9.1 is pushed to the mvn repo.
>>>>
>>>> You can double-check it. I suspect Nexus may sometimes fail to notify
>>>> the user when there are unexpected failures.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Bests,
>>>>
>>>> Sammi
>>>>
>>>> *From:* Yongjun Zhang [mailto:yzh...@cloudera.com]
>>>> *Sent:* Sunday, June 17, 2018 12:17 PM
>>>> *To:* Jonathan Eagles ; Chen, Sammi <
>>>> sammi.c...@intel.com>
>>>> *Cc:* Eric Payne ; Hadoop Common <
>>>> common-...@hadoop.apache.org>; Hdfs-dev ;
>>>> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
>>>> *Subject:* Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>>>>
>>>>
>>>>
>>>> + Junping, Sammi
>>>>
>>>>
>>>>
>>>> Hi Jonathan,
>>>>
>>>>
>>>>
>>>> Many thanks for reporting the issues and sorry for the inconvenience.
>>>>
>>>>
>>>>
>>>> 1. Shouldn't the build be looking for artifacts in
>>>>
>>>>
>>>>
>>>> https://repository.apache.org/content/repositories/releases
>>>>
>>>> rather than
>>>>
>>

Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-07-09 Thread Yongjun Zhang
HI Jonathan,

I have updated the artifacts, so now

https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.2~~
https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.3~~

are more consistent, except that 3.0.3 has an extra entry for rbf. Would
you please try again?

The propagation to
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
will take some time. I did nothing different than last time, so keep fingers
crossed that it will propagate there.

Thanks Sammi Chen and Andrew Wang for info and advice, and sorry for the
inconvenience again.

Best,

--Yongjun

On Mon, Jul 2, 2018 at 9:30 AM, Jonathan Eagles  wrote:

> Release 3.0.3 is still broken due to the missing artifacts. Any update on
> when these artifacts will be published?
>
> On Wed, Jun 27, 2018 at 8:25 PM, Chen, Sammi  wrote:
>
>> Hi Yongjun,
>>
>>
>>
>>
>>
>> The artifacts will be pushed to
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>> after step 6 of the Publishing steps.
>>
>>
>> For 2.9.1, I remember I definitely did the step before. I redid step 6
>> today, and now 2.9.1 is pushed to the mvn repo.
>>
>> You can double-check it. I suspect Nexus may sometimes fail to notify
>> the user when there are unexpected failures.
>>
>>
>>
>>
>>
>> Bests,
>>
>> Sammi
>>
>> *From:* Yongjun Zhang [mailto:yzh...@cloudera.com]
>> *Sent:* Sunday, June 17, 2018 12:17 PM
>> *To:* Jonathan Eagles ; Chen, Sammi <
>> sammi.c...@intel.com>
>> *Cc:* Eric Payne ; Hadoop Common <
>> common-...@hadoop.apache.org>; Hdfs-dev ;
>> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
>> *Subject:* Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>>
>>
>>
>> + Junping, Sammi
>>
>>
>>
>> Hi Jonathan,
>>
>>
>>
>> Many thanks for reporting the issues and sorry for the inconvenience.
>>
>>
>>
>> 1. Shouldn't the build be looking for artifacts in
>>
>>
>>
>> https://repository.apache.org/content/repositories/releases
>>
>> rather than
>>
>>
>>
>> https://repository.apache.org/content/repositories/snapshots
>>
>> ?
>>
>>
>>
>> 2.
>>
>> Not seeing the artifact published here as well.
>>
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>>
>>
>>
>> Indeed, I did not see 2.9.1 there either, so I included Sammi Chen.
>>
>>
>>
>> Hi Junping, would you please share which step in
>>
>> https://wiki.apache.org/hadoop/HowToRelease
>>
>> should have done this?
>>
>>
>>
>> Thanks a lot.
>>
>>
>>
>> --Yongjun
>>
>>
>>
>> On Fri, Jun 15, 2018 at 10:52 PM, Jonathan Eagles 
>> wrote:
>>
>> Upgraded Tez dependency to hadoop 3.0.3 and found this issue. Anyone else
>> seeing this issue?
>>
>>
>>
>> [ERROR] Failed to execute goal on project hadoop-shim: Could not resolve
>> dependencies for project org.apache.tez:hadoop-shim:jar:0.10.0-SNAPSHOT:
>> Failed to collect dependencies at 
>> org.apache.hadoop:hadoop-yarn-api:jar:3.0.3:
>> Failed to read artifact descriptor for 
>> org.apache.hadoop:hadoop-yarn-api:jar:3.0.3:
>> Could not find artifact org.apache.hadoop:hadoop-project:pom:3.0.3 in
>> apache.snapshots.https
>> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
>>
>> [ERROR]
>>
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>> -e switch.
>>
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>
>> [ERROR]
>>
>> [ERROR] For more information about the errors and possible solutions,
>> please read the following articles:
>>
>> [ERROR] [Help 1]
>> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
>>
>> [ERROR]
>>
>> [ERROR] After correcting the problems, you can resume the build with the
>> command
>>
>> [ERROR]   mvn  -rf :hadoop-shim
>>
>>
>>
>> Not seeing the artifact published here as well.
>>
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>>
>>
>>
>> On Tue, Jun 12, 2018 at 6:44 PM, Yongjun Zhang 
>> wrote:
>>
>> Thanks Eric!
>>
>> --Yongjun
>>
>>
>> On Mon, Jun 11, 2018 at 8:05 AM, Eric Payne 
>> wrote:
>>
>> > Sorry, Yongjun. My

Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-16 Thread Yongjun Zhang
+ Junping, Sammi

Hi Jonathan,

Many thanks for reporting the issues and sorry for the inconvenience.

1. Shouldn't the build be looking for artifacts in

https://repository.apache.org/content/repositories/releases

rather than

https://repository.apache.org/content/repositories/snapshots
?

2.

Not seeing the artifact published here as well.
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project


Indeed, I did not see 2.9.1 there either, so I included Sammi Chen.

Hi Junping, would you please share which step in
https://wiki.apache.org/hadoop/HowToRelease
should have done this?

Thanks a lot.

--Yongjun

On Fri, Jun 15, 2018 at 10:52 PM, Jonathan Eagles  wrote:

> Upgraded Tez dependency to hadoop 3.0.3 and found this issue. Anyone else
> seeing this issue?
>
> [ERROR] Failed to execute goal on project hadoop-shim: Could not resolve
> dependencies for project org.apache.tez:hadoop-shim:jar:0.10.0-SNAPSHOT:
> Failed to collect dependencies at org.apache.hadoop:hadoop-yarn-api:jar:3.0.3:
> Failed to read artifact descriptor for 
> org.apache.hadoop:hadoop-yarn-api:jar:3.0.3:
> Could not find artifact org.apache.hadoop:hadoop-project:pom:3.0.3 in
> apache.snapshots.https
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :hadoop-shim
>
> Not seeing the artifact published here as well.
> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>
> On Tue, Jun 12, 2018 at 6:44 PM, Yongjun Zhang 
> wrote:
>
>> Thanks Eric!
>>
>> --Yongjun
>>
>> On Mon, Jun 11, 2018 at 8:05 AM, Eric Payne 
>> wrote:
>>
>> > Sorry, Yongjun. My +1 is also binding
>> > +1 (binding)
>> > -Eric Payne
>> >
>> > On Friday, June 1, 2018, 12:25:36 PM CDT, Eric Payne <
>> > eric.payne1...@yahoo.com> wrote:
>> >
>> >
>> >
>> >
>> > Thanks a lot, Yongjun, for your hard work on this release.
>> >
>> > +1
>> > - Built from source
>> > - Installed on 6 node pseudo cluster
>> >
>> >
>> > Tested the following in the Capacity Scheduler:
>> > - Verified that running apps in labelled queues restricts tasks to the
>> > labelled nodes.
>> > - Verified that various queue config properties for CS are refreshable
>> > - Verified streaming jobs work as expected
>> > - Verified that user weights work as expected
>> > - Verified that FairOrderingPolicy in a CS queue will evenly assign
>> > resources
>> > - Verified running yarn shell application runs as expected
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Friday, June 1, 2018, 12:48:26 AM CDT, Yongjun Zhang <
>> > yjzhan...@apache.org> wrote:
>> >
>> >
>> >
>> >
>> >
>> > Greetings all,
>> >
>> > I've created the first release candidate (RC0) for Apache Hadoop
>> > 3.0.3. This is our next maintenance release to follow up 3.0.2. It
>> includes
>> > about 249
>> > important fixes and improvements, among which there are 8 blockers. See
>> > https://issues.apache.org/jira/issues/?filter=12343997
>> >
>> > The RC artifacts are available at:
>> > https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>> >
>> > The maven artifacts are available via
>> > https://repository.apache.org/content/repositories/orgapachehadoop-1126
>> >
>> > Please try the release and vote; the vote will run for the usual 5
>> working
>> > days, ending on 06/07/2018 PST time. Would really appreciate your
>> > participation here.
>> >
>> > I bumped into quite some issues along the way, many thanks to quite a
>> few
>> > people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy
>> Xu.
>> >
>> > Thanks,
>> >
>> > --Yongjun
>> >
>>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-12 Thread Yongjun Zhang
Thanks Eric!

--Yongjun

On Mon, Jun 11, 2018 at 8:05 AM, Eric Payne  wrote:

> Sorry, Yongjun. My +1 is also binding
> +1 (binding)
> -Eric Payne
>
> On Friday, June 1, 2018, 12:25:36 PM CDT, Eric Payne <
> eric.payne1...@yahoo.com> wrote:
>
>
>
>
> Thanks a lot, Yongjun, for your hard work on this release.
>
> +1
> - Built from source
> - Installed on 6 node pseudo cluster
>
>
> Tested the following in the Capacity Scheduler:
> - Verified that running apps in labelled queues restricts tasks to the
> labelled nodes.
> - Verified that various queue config properties for CS are refreshable
> - Verified streaming jobs work as expected
> - Verified that user weights work as expected
> - Verified that FairOrderingPolicy in a CS queue will evenly assign
> resources
> - Verified running yarn shell application runs as expected
>
>
>
>
>
>
>
> On Friday, June 1, 2018, 12:48:26 AM CDT, Yongjun Zhang <
> yjzhan...@apache.org> wrote:
>
>
>
>
>
> Greetings all,
>
> I've created the first release candidate (RC0) for Apache Hadoop
> 3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
> about 249
> important fixes and improvements, among which there are 8 blockers. See
> https://issues.apache.org/jira/issues/?filter=12343997
>
> The RC artifacts are available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>
> The maven artifacts are available via
> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>
> Please try the release and vote; the vote will run for the usual 5 working
> days, ending on 06/07/2018 PST time. Would really appreciate your
> participation here.
>
> I bumped into quite some issues along the way, many thanks to quite a few
> people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.
>
> Thanks,
>
> --Yongjun
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-08 Thread Yongjun Zhang
Many thanks to all of you!

Now we have many +1s, including 3 from PMCs, and no -1s. I would like to
call the vote done!

I will work on the remaining steps to push forward the release.

Wish you all a very nice weekend,

--Yongjun




On Fri, Jun 8, 2018 at 2:45 PM, John Zhuge  wrote:

> Thanks Yongjun for the excellent work to drive this release!
>
>
> +1 (binding)
>
>- Verified checksums and signatures of tarballs
>- Built source with native, Oracle Java 1.8.0_152 on Mac OS X 10.13.5
>- Verified cloud connectors:
>   - ADLS integration tests passed with 1 failure, not a blocker
>- Deployed both binary and built source to a pseudo cluster, passed
>the following sanity tests in insecure and SSL mode:
>   - HDFS basic and ACL
>   - WebHDFS CLI ls and REST LISTSTATUS
>   - DistCp basic
>   - MapReduce wordcount
>   - KMS and HttpFS basic and servlets
>   - Balancer start/stop
>
>
> ADLS unit test failure:
>
>
> [ERROR] Tests run: 43, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
> 68.889 s <<< FAILURE! - in
> org.apache.hadoop.fs.adl.live.TestAdlFileSystemContractLive
>
> [ERROR] testMkdirsWithUmask(org.apache.hadoop.fs.adl.live.TestAdlFileSystemContractLive)
> Time elapsed: 0.851 s  <<< FAILURE!
>
> java.lang.AssertionError: expected:<461> but was:<456>
>
>
> See https://issues.apache.org/jira/browse/HADOOP-14435. I don't think it
> is a blocker.
>
> Thanks,
>
> On Fri, Jun 8, 2018 at 12:04 PM, Xiao Chen  wrote:
>
>> Thanks for the effort on this Yongjun.
>>
>> +1 (binding)
>>
>>- Built from src
>>- Deployed a pseudo distributed HDFS with KMS
>>- Ran basic hdfs commands with encryption
>>- Sanity checked webui and logs
>>
>>
>>
>> -Xiao
>>
>> On Fri, Jun 8, 2018 at 10:34 AM, Brahma Reddy Battula <
>> brahmareddy.batt...@hotmail.com> wrote:
>>
>> > Thanks yongjun zhang for driving this release.
>> >
>> > +1 (binding).
>> >
>> >
>> > ---Built from the source
>> > ---Installed HA cluster
>> > ---Execute the basic shell commands
>> > ---Browsed the UI's
>> > ---Ran sample jobs like pi,wordcount
>> >
>> >
>> >
>> > 
>> > From: Yongjun Zhang 
>> > Sent: Friday, June 8, 2018 1:04 PM
>> > To: Allen Wittenauer
>> > Cc: Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org;
>> > yarn-...@hadoop.apache.org
>> > Subject: Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>> >
>> > BTW, thanks Allen and Steve for discussing and suggestion about the site
>> > build problem I hit earlier, I did the following step
>> >
>> > mvn install -DskipTests
>> >
>> > before doing the steps Nanda listed helped to solve the problems.
>> >
>> > --Yongjun
>> >
>> >
>> >
>> >
>> > On Thu, Jun 7, 2018 at 6:15 PM, Yongjun Zhang 
>> wrote:
>> >
>> > > Thank you all very much for the testing, feedback and discussion!
>> > >
>> > > I was able to build outside docker, by following the steps Nanda
>> > > described, I saw the same problem; then I tried 3.0.2 released a while
>> > > back, it has the same issue.
>> > >
>> > > As Allen pointed out, it seems the steps to build site are not
>> correct. I
>> > > have not figured out the correct steps yet.
>> > >
>> > > At this point, I think this issue should not block the 3.0.3 issue.
>> While
>> > > at the same time we need to figure out the right steps to build the
>> site.
>> > > Would you please let me know if you think differently?
>> > >
>> > > We only have the site build issue reported so far. And we don't have
>> > > enough PMC votes yet. So need some more PMCs to help.
>> > >
>> > > Thanks again, and best regards,
>> > >
>> > > --Yongjun
>> > >
>> > >
>> > > On Thu, Jun 7, 2018 at 4:15 PM, Allen Wittenauer <
>> > a...@effectivemachines.com
>> > > > wrote:
>> > >
>> > >> > On Jun 7, 2018, at 11:47 AM, Steve Loughran <
>> ste...@hortonworks.com>
>> > >> wrote:
>> > >> >
>> > >> > Actually, Yongjun has been really good at helping me get set up
>> for a
>> > >> 2.7.7 release, including "things you need to do to get GPG working in
>> > the
>> > >> docker image”
>> > >>
>> > >> *shrugs* I use a different release script after some changes
>> > >> broke the in-tree version for building on OS X and I couldn’t get the
>> > fixes
>> > >> committed upstream.  So not sure what the problems are that you are
>> > hitting.
>> > >>
>> > >> > On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu <
>> > >> nvadiv...@hortonworks.com> wrote:
>> > >> >
>> > >> > It will be helpful if we can get the correct steps, and also update
>> > the
>> > >> wiki.
>> > >> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Release+Validation
>> > >>
>> > >> Yup. Looking forward to seeing it.
>> > >> 
>> -
>> > >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
>> > >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>> > >>
>> > >>
>> > >
>> >
>>
>
>
>
> --
> John
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-08 Thread Yongjun Zhang
Hello,

PMC votes are needed for the release; can I get some help from PMCs?

Really appreciate it!

--Yongjun

On Thu, Jun 7, 2018 at 6:15 PM, Yongjun Zhang  wrote:

> Thank you all very much for the testing, feedback and discussion!
>
> I was able to build outside docker, by following the steps Nanda
> described, I saw the same problem; then I tried 3.0.2 released a while
> back, it has the same issue.
>
> As Allen pointed out, it seems the steps to build site are not correct. I
> have not figured out the correct steps yet.
>
> At this point, I think this issue should not block the 3.0.3 issue. While
> at the same time we need to figure out the right steps to build the site.
> Would you please let me know if you think differently?
>
> We only have the site build issue reported so far. And we don't have
> enough PMC votes yet. So need some more PMCs to help.
>
> Thanks again, and best regards,
>
> --Yongjun
>
>
> On Thu, Jun 7, 2018 at 4:15 PM, Allen Wittenauer  > wrote:
>
>> > On Jun 7, 2018, at 11:47 AM, Steve Loughran 
>> wrote:
>> >
>> > Actually, Yongjun has been really good at helping me get set up for a
>> 2.7.7 release, including "things you need to do to get GPG working in the
>> docker image”
>>
>> *shrugs* I use a different release script after some changes
>> broke the in-tree version for building on OS X and I couldn’t get the fixes
>> committed upstream.  So not sure what the problems are that you are hitting.
>>
>> > On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu <
>> nvadiv...@hortonworks.com> wrote:
>> >
>> > It will be helpful if we can get the correct steps, and also update the
>> wiki.
>> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Release+Validation
>>
>> Yup. Looking forward to seeing it.
>> -
>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>>
>>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-07 Thread Yongjun Zhang
BTW, thanks Allen and Steve for the discussion and suggestions about the site
build problem I hit earlier. Doing the following step

mvn install -DskipTests

before the steps Nanda listed helped to solve the problems.
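
Putting it together, the full sequence that worked for me (combining the step
above with the steps Nanda listed; /tmp/site is just an example staging
directory):

{code}
mvn install -DskipTests
mvn site:site
mkdir -p /tmp/site && mvn site:stage -DstagingDirectory=/tmp/site
# then browse to file:///tmp/site/hadoop-project/index.html
{code}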

--Yongjun




On Thu, Jun 7, 2018 at 6:15 PM, Yongjun Zhang  wrote:

> Thank you all very much for the testing, feedback and discussion!
>
> I was able to build outside docker, by following the steps Nanda
> described, I saw the same problem; then I tried 3.0.2 released a while
> back, it has the same issue.
>
> As Allen pointed out, it seems the steps to build site are not correct. I
> have not figured out the correct steps yet.
>
> At this point, I think this issue should not block the 3.0.3 issue. While
> at the same time we need to figure out the right steps to build the site.
> Would you please let me know if you think differently?
>
> We only have the site build issue reported so far. And we don't have
> enough PMC votes yet. So need some more PMCs to help.
>
> Thanks again, and best regards,
>
> --Yongjun
>
>
> On Thu, Jun 7, 2018 at 4:15 PM, Allen Wittenauer  > wrote:
>
>> > On Jun 7, 2018, at 11:47 AM, Steve Loughran 
>> wrote:
>> >
>> > Actually, Yongjun has been really good at helping me get set up for a
>> 2.7.7 release, including "things you need to do to get GPG working in the
>> docker image”
>>
>> *shrugs* I use a different release script after some changes
>> broke the in-tree version for building on OS X and I couldn’t get the fixes
>> committed upstream.  So not sure what the problems are that you are hitting.
>>
>> > On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu <
>> nvadiv...@hortonworks.com> wrote:
>> >
>> > It will be helpful if we can get the correct steps, and also update the
>> wiki.
>> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Release+Validation
>>
>> Yup. Looking forward to seeing it.
>> -
>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>>
>>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-07 Thread Yongjun Zhang
Thank you all very much for the testing, feedback and discussion!

I was able to build outside docker; following the steps Nanda described,
I saw the same problem. Then I tried 3.0.2, released a while back, and it
has the same issue.

As Allen pointed out, it seems the steps to build the site are not correct.
I have not figured out the correct steps yet.

At this point, I think this issue should not block the 3.0.3 release, while
at the same time we need to figure out the right steps to build the site.
Please let me know if you think differently.

Only the site build issue has been reported so far. We don't have enough
PMC votes yet, so we need some more PMCs to help.

Thanks again, and best regards,

--Yongjun


On Thu, Jun 7, 2018 at 4:15 PM, Allen Wittenauer 
wrote:

> > On Jun 7, 2018, at 11:47 AM, Steve Loughran 
> wrote:
> >
> > Actually, Yongjun has been really good at helping me get set up for a
> 2.7.7 release, including "things you need to do to get GPG working in the
> docker image”
>
> *shrugs* I use a different release script after some changes broke
> the in-tree version for building on OS X and I couldn’t get the fixes
> committed upstream.  So not sure what the problems are that you are hitting.
>
> > On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu <
> nvadiv...@hortonworks.com> wrote:
> >
> > It will be helpful if we can get the correct steps, and also update the
> wiki.
> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Release+Validation
>
> Yup. Looking forward to seeing it.
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-13663) Should throw exception when incorrect block size is set

2018-06-07 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-13663:


 Summary: Should throw exception when incorrect block size is set
 Key: HDFS-13663
 URL: https://issues.apache.org/jira/browse/HDFS-13663
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


See

./hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java

{code}
void syncBlock(List<BlockRecord> syncList) throws IOException {


   newBlock.setNumBytes(finalizedLength);
break;
  case RBW:
  case RWR:
long minLength = Long.MAX_VALUE;
for(BlockRecord r : syncList) {
  ReplicaState rState = r.rInfo.getOriginalReplicaState();
  if(rState == bestState) {
minLength = Math.min(minLength, r.rInfo.getNumBytes());
participatingList.add(r);
  }
  if (LOG.isDebugEnabled()) {
LOG.debug("syncBlock replicaInfo: block=" + block +
", from datanode " + r.id + ", receivedState=" + rState.name() +
", receivedLength=" + r.rInfo.getNumBytes() + ", bestState=" +
bestState.name());
  }
}
// recover() guarantees syncList will have at least one replica with RWR
// or better state.
assert minLength != Long.MAX_VALUE : "wrong minLength"; // <= should throw exception
newBlock.setNumBytes(minLength);
break;
  case RUR:
  case TEMPORARY:
assert false : "bad replica state: " + bestState;
  default:
break; // we have 'case' all enum values
  }
{code}

When minLength is Long.MAX_VALUE, it should throw an exception.

There might be other places like this.

Otherwise, we would see the following WARN in datanode log
{code}
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block xyz 
because on-disk length 11852203 is shorter than NameNode recorded length 
9223372036854775807
{code}
where 9223372036854775807 is Long.MAX_VALUE.
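
A hedged sketch of the proposed change, replacing the assert (which is a
no-op unless the JVM runs with -ea) with an explicit exception; the message
text is illustrative:

{code:java}
// recover() guarantees syncList will have at least one replica with RWR
// or better state, but if that invariant is ever violated, fail loudly
// instead of propagating Long.MAX_VALUE as the block length.
if (minLength == Long.MAX_VALUE) {
  throw new IOException("No replica in sync list matched the best state "
      + bestState + " for block " + block);
}
newBlock.setNumBytes(minLength);
{code}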





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-06 Thread Yongjun Zhang
Thank you all so much for the testing and vote!

Nanda reported the following problem:

The following links in site documentation are broken
- Changelog and Release Notes
- Unix Shell API
- core-default.xml
- hdfs-default.xml
- hdfs-rbf-default.xml
- mapred-default.xml
- yarn-default.xml

Site documentation was generated using the below steps:
- mvn site:site
- mkdir -p /tmp/site && mvn site:stage -DstagingDirectory=/tmp/site
- Browse to file:///tmp/site/hadoop-project/index.html.

Thanks,
Nanda


My build in docker was fine, and when I tested the build outside docker, I
tested the RC binary tarball and a regular build with the source tarball,
but missed the site:site build.

I tried the steps Nanda did, and I'm even hitting an issue at the "mvn
site:site" step, which Nanda did not hit.

What I'm seeing is:

[INFO] 

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-site-plugin:3.6:site
(default-cli) on project hadoop-main: failed to get report for
org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
project hadoop-hdfs: Could not resolve dependencies for project
org.apache.hadoop:hadoop-hdfs:jar:3.0.3: Failure to find
org.apache.hadoop:hadoop-hdfs-client:jar:3.0.3
in https://s3-us-west-2.amazonaws.com/dynamodb-local/release was cached in
the local repository, resolution will not be reattempted until the update
interval of dynamodb-local-oregon has elapsed or updates are forced ->
[Help 1]
[ERROR]


I wonder if anyone else hit this problem. I was able to build 3.0.2 with
the same env. The build is looking for 3.0.3 artifacts in
https://s3-us-west-2.amazonaws.com/dynamodb-local/release, which I'm
sure is the right thing, because 3.0.3 is not yet released. Or probably I
missed something in my env for this, which I have to investigate.

If anyone can follow Nanda's steps and see a different outcome, that would be
interesting. Very much appreciated.

Thanks again!

--Yongjun

On Wed, Jun 6, 2018 at 12:59 AM, Kitti Nánási  wrote:

> Thanks Yongjun for working on this!
>
> +1 (non-binding)
>
> - checked out git tag release-3.0.3-RC0
> - built from source on Mac OS X 10.13.4, java version 1.8.0_171
> - deployed on a 3 node cluster
> - ran terasort, teragen, teravalidate with success
> - executed basic hdfs commands
> - created snapshots and produced snapshot diffs
>
> Thanks,
> Kitti Nanasi
>
>
> On Wed, Jun 6, 2018 at 8:06 AM, Takanobu Asanuma 
> wrote:
>
>> Sorry, I forgot to write my vote in the last message.
>>
>> +1 (non-binding)
>>
>> Regards,
>> Takanobu
>>
>> > -Original Message-----
>> > From: Takanobu Asanuma [mailto:tasan...@yahoo-corp.jp]
>> > Sent: Wednesday, June 06, 2018 10:22 AM
>> > To: Gabor Bota 
>> > Cc: Yongjun Zhang ; nvadiv...@hortonworks.com;
>> > sbaner...@hortonworks.com; Hadoop Common 
>> ;
>> > Hdfs-dev ; mapreduce-...@hadoop.apache.org;
>> > yarn-...@hadoop.apache.org
>> > Subject: RE: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>> >
>> > Thanks for driving this release, Yongjun!
>> >
>> > - Verified checksums
>> > - Succeeded native package build on CentOS 7
>> > - Started a cluster with 1 master and 5 slaves
>> > - Verified Web UI (NN, RM, JobHistory, Timeline)
>> > - Verified Teragen/Terasort jobs
>> > - Verified some operations for erasure coding
>> >
>> > Regards,
>> > Takanobu
>> >
>> > > -Original Message-
>> > > From: Gabor Bota [mailto:gabor.b...@cloudera.com]
>> > > Sent: Tuesday, June 05, 2018 9:18 PM
>> > > To: nvadiv...@hortonworks.com
>> > > Cc: Yongjun Zhang ; sbaner...@hortonworks.com;
>> > > Hadoop Common ; Hdfs-dev
>> > > ; mapreduce-...@hadoop.apache.org;
>> > > yarn-...@hadoop.apache.org
>> > > Subject: Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>> > >
>> > > Thanks for the work Yongjun!
>> > >
>> > > +1 (non-binding)
>> > >
>> > >- checked out git tag release-3.0.3-RC0. Thanks for adding this
>> > Yongjun,
>> > >it worked.
>> > >- S3A integration (mvn verify) test run were successful on
>> eu-west-1
>> > >besides of one test issue reported in HADOOP-14927.
>> > >- built from source on Mac OS X 10.13.4, java version
>> 8.0.171-oracle
>> > >- deployed on a 3 node cluster (HDFS HA, Non-HA YARN)
>> > >- verified

Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-02 Thread Yongjun Zhang
Hi Gabor,

I got the git tag in, it's release-3.0.3-RC0. Would you please give it a
try?

It should correspond to

commit 37fd7d752db73d984dc31e0cdfd590d252f5e075
Author: Yongjun Zhang 
Date:   Wed May 30 00:07:33 2018 -0700

Update version to 3.0.3 to prepare for 3.0.3 release


Thanks,

--Yongjun

On Fri, Jun 1, 2018 at 4:17 AM, Gabor Bota  wrote:

> Hi Yongjun,
>
> Thank you for working on this release. Is there a git tag in the upstream
> repo which can be checked out? I'd like to build the release from source.
>
> Regards,
> Gabor
>
> On Fri, Jun 1, 2018 at 7:57 AM Shashikant Banerjee <
> sbaner...@hortonworks.com> wrote:
>
>> Looks like the link with the filter seems to be private. I can't see the
>> blocker list.
>> https://issues.apache.org/jira/issues/?filter=12343997
>>
>> Meanwhile , I will be working on testing the release.
>>
>> Thanks
>> Shashi
>> On 6/1/18, 11:18 AM, "Yongjun Zhang"  wrote:
>>
>> Greetings all,
>>
>> I've created the first release candidate (RC0) for Apache Hadoop
>> 3.0.3. This is our next maintenance release to follow up 3.0.2. It
>> includes
>> about 249
>> important fixes and improvements, among which there are 8 blockers.
>> See
>> https://issues.apache.org/jira/issues/?filter=12343997
>>
>> The RC artifacts are available at:
>> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>>
>> The maven artifacts are available via
>> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>>
>> Please try the release and vote; the vote will run for the usual 5
>> working
>> days, ending on 06/07/2018 PST time. Would really appreciate your
>> participation here.
>>
>> I bumped into quite some issues along the way, many thanks to quite a
>> few
>> people who helped, especially Sammi Chen, Andrew Wang, Junping Du,
>> Eddy Xu.
>>
>> Thanks,
>>
>> --Yongjun
>>
>>
>>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-02 Thread Yongjun Zhang
Many thanks Shashikant, Zsolt, and Ajay!

Best,

--Yongjun

On Sat, Jun 2, 2018 at 7:34 AM, Shashikant Banerjee <
sbaner...@hortonworks.com> wrote:

> Thanks for working on this Yongjun!
>
> +1 (non-binding)
>  - verified signatures and checksums
>  - built from source and setup single node cluster
>  - ran basic hdfs operations
> - basic sanity check of NN
>
> Thanks
> Shashi
>
> On 6/2/18, 7:31 PM, "Zsolt Venczel"  wrote:
>
> Thanks Yongjun for working on this!
>
> +1 (non-binding)
>
>  - built from source with native library
>  - run hadoop-hdfs-client tests with native library that all passed
>  - set up a cluster with 3 nodes and run teragen, terasort and
> teravalidate
> successfully
>  - created two snapshots and produced a snapshot diff successfully
>  - checked out the webui, looked at the file structure and double
> checked
> the created snapshots
>
> Thanks and best regards,
> Zsolt
>
> On Sat, Jun 2, 2018 at 11:33 AM Ajay Kumar  >
> wrote:
>
> > Thanks for working on this Yongjun!!
> >
> > +1 (non-binding)
> > - verified signatures and checksums
> > - built from source and setup single node cluster
> > - ran basic hdfs operations
> > - ran TestDFSIO(read/write), wordcount, pi jobs.
> > - basic sanity check of NN, RM UI
> >
> > Thanks,
> > Ajay
> >
> > On 6/2/18, 12:45 AM, "Shashikant Banerjee" <
> sbaner...@hortonworks.com>
> > wrote:
> >
> > Hi Yongjun,
> >
> > I am able to see the list after logging in now.
> >
> > Thanks
> > Shashi
> >
> > From: Yongjun Zhang 
> > Date: Friday, June 1, 2018 at 9:11 PM
> > To: Gabor Bota 
> > Cc: Shashikant Banerjee , Hadoop
> Common <
> > common-...@hadoop.apache.org>, Hdfs-dev ,
> "
> > mapreduce-...@hadoop.apache.org" ,
> "
> > yarn-...@hadoop.apache.org" 
> > Subject: Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
> >
> >
> > Thanks for the feedback!
> >
> > Hi Shashikant, I thought I have made the filter visible to jira
> users,
> > now I changed it to be visible to all logged-in users of jira.
> Please let
> > me know if you can not see it.
> >
> > Hi Gabor,
> >
> > Good question. I forgot to mention, I have tried to add tag
> earlier,
> > as step 7,8 9 in
> > https://wiki.apache.org/hadoop/HowToRelease, but these steps
> seem to
> > not push anything to git. I suspect step 4 should have been run with
> > --rc-label , and checked with Andrew, he said it doesn't matter and
> often
> > people don't use the rc label.
> >
> > I probably should mention that the build is on commit id
> > 37fd7d752db73d984dc31e0cdfd590d252f5e075.
> >
> > The source is also available at
> > https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
> >
> > Thanks.
> >
> > --Yongjun
> >
> > On Fri, Jun 1, 2018 at 4:17 AM, Gabor Bota <
> gabor.b...@cloudera.com
> > <mailto:gabor.b...@cloudera.com>> wrote:
> > Hi Yongjun,
> >
> > Thank you for working on this release. Is there a git tag in the
> > upstream repo which can be checked out? I'd like to build the
> release from
> > source.
> >
> > Regards,
> > Gabor
> >
> > On Fri, Jun 1, 2018 at 7:57 AM Shashikant Banerjee <
> > sbaner...@hortonworks.com<mailto:sbaner...@hortonworks.com>> wrote:
> > Looks like the link with the filter seems to be private. I can't
> see
> > the blocker list.
> > https://issues.apache.org/jira/issues/?filter=12343997
> >
> > Meanwhile , I will be working on testing the release.
> >
> > Thanks
> > Shashi
> > On 6/1/18, 11:18 AM, "Yongjun Zhang"   > yjzhan...@apache.org>> wrote:
> >
> > Greetings all,
> >
> > I've created the first release candidate (RC0) for Apache
> Hadoop
> > 3.0.3. This is our next maintenance release to follow up
> 3.0.2. It
> > includes
> > about 249

Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-01 Thread Yongjun Zhang
Hi Eric,

Thanks so much for your work and quick feedback, that's cool!

--Yongjun

On Fri, Jun 1, 2018 at 10:25 AM, Eric Payne <
eric.payne1...@yahoo.com.invalid> wrote:

>
>
> Thanks a lot, Yongjun, for your hard work on this release.
>
> +1
> - Built from source
> - Installed on 6 node pseudo cluster
>
>
> Tested the following in the Capacity Scheduler:
> - Verified that running apps in labelled queues restricts tasks to the
> labelled nodes.
> - Verified that various queue config properties for CS are refreshable
> - Verified streaming jobs work as expected
> - Verified that user weights work as expected
> - Verified that FairOrderingPolicy in a CS queue will evenly assign
> resources
> - Verified running yarn shell application runs as expected
>
>
>
>
>
>
>
> On Friday, June 1, 2018, 12:48:26 AM CDT, Yongjun Zhang <
> yjzhan...@apache.org> wrote:
>
>
>
>
>
> Greetings all,
>
> I've created the first release candidate (RC0) for Apache Hadoop
> 3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
> about 249
> important fixes and improvements, among which there are 8 blockers. See
> https://issues.apache.org/jira/issues/?filter=12343997
>
> The RC artifacts are available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>
> The maven artifacts are available via
> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>
> Please try the release and vote; the vote will run for the usual 5 working
> days, ending on 06/07/2018 PST time. Would really appreciate your
> participation here.
>
> I bumped into quite some issues along the way, many thanks to quite a few
> people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.
>
> Thanks,
>
> --Yongjun
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-01 Thread Yongjun Zhang
Hi Rushabh,

Would you please try again?

Permissions

This filter is visible to:

   - *Group*: all-developers
   - *Group*: jira-users


Thanks.

--Yongjun

On Fri, Jun 1, 2018 at 8:48 AM, Rushabh Shah  wrote:

> Hi Yongjun,
> Thanks for all the effort you put in for this release.
>
>> https://issues.apache.org/jira/issues/?filter=12343997
>
>  Even after logging in, I am not able to see any jiras.
> It says "The requested filter doesn't exist or is private."
>
> On Fri, Jun 1, 2018 at 10:41 AM, Yongjun Zhang 
> wrote:
>
>> Thanks for the feedback!
>>
>> Hi Shashikant, I thought I have made the filter visible to jira users, now
>> I changed it to be visible to all logged-in users of jira. Please let me
>> know if you can not see it.
>>
>> Hi Gabor,
>>
>> Good question. I forgot to mention, I have tried to add tag earlier, as
>> step 7,8 9 in
>> https://wiki.apache.org/hadoop/HowToRelease, but these steps seem to not
>> push anything to git. I suspect step 4 should have been run with
>> *--rc-label* , and checked with Andrew, he said it doesn't matter and
>> often
>>
>> people don't use the rc label.
>>
>> I probably should mention that the build is on commit id
>> 37fd7d752db73d984dc31e0cdfd590d252f5e075.
>>
>> The source is also available at
>> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>>
>> Thanks.
>>
>> --Yongjun
>>
>> On Fri, Jun 1, 2018 at 4:17 AM, Gabor Bota 
>> wrote:
>>
>> > Hi Yongjun,
>> >
>> > Thank you for working on this release. Is there a git tag in the
>> upstream
>> > repo which can be checked out? I'd like to build the release from
>> source.
>> >
>> > Regards,
>> > Gabor
>> >
>> > On Fri, Jun 1, 2018 at 7:57 AM Shashikant Banerjee <
>> > sbaner...@hortonworks.com> wrote:
>> >
>> >> Looks like the link with the filter seems to be private. I can't see
>> the
>> >> blocker list.
>> >> https://issues.apache.org/jira/issues/?filter=12343997
>> >>
>> >> Meanwhile , I will be working on testing the release.
>> >>
>> >> Thanks
>> >> Shashi
>> >> On 6/1/18, 11:18 AM, "Yongjun Zhang"  wrote:
>> >>
>> >> Greetings all,
>> >>
>> >> I've created the first release candidate (RC0) for Apache Hadoop
>> >> 3.0.3. This is our next maintenance release to follow up 3.0.2. It
>> >> includes
>> >> about 249
>> >> important fixes and improvements, among which there are 8 blockers.
>> >> See
>> >> https://issues.apache.org/jira/issues/?filter=12343997
>> >>
>> >> The RC artifacts are available at:
>> >> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>> >>
>> >> The maven artifacts are available via
>> >> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>> >>
>> >> Please try the release and vote; the vote will run for the usual 5
>> >> working
>> >> days, ending on 06/07/2018 PST time. Would really appreciate your
>> >> participation here.
>> >>
>> >> I bumped into quite some issues along the way, many thanks to
>> quite a
>> >> few
>> >> people who helped, especially Sammi Chen, Andrew Wang, Junping Du,
>> >> Eddy Xu.
>> >>
>> >> Thanks,
>> >>
>> >> --Yongjun
>> >>
>> >>
>> >>
>>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-01 Thread Yongjun Zhang
Thanks for the feedback!

Hi Shashikant, I thought I had made the filter visible to jira users; now
I have changed it to be visible to all logged-in users of jira. Please let
me know if you cannot see it.

Hi Gabor,

Good question. I forgot to mention, I tried to add the tag earlier, as in
steps 7, 8, and 9 of https://wiki.apache.org/hadoop/HowToRelease, but these
steps did not seem to push anything to git. I suspected step 4 should have
been run with --rc-label, and checked with Andrew; he said it doesn't matter
and people often don't use the rc label.

I probably should mention that the build is on commit id
37fd7d752db73d984dc31e0cdfd590d252f5e075.

The source is also available at
https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/

Thanks.

--Yongjun

On Fri, Jun 1, 2018 at 4:17 AM, Gabor Bota  wrote:

> Hi Yongjun,
>
> Thank you for working on this release. Is there a git tag in the upstream
> repo which can be checked out? I'd like to build the release from source.
>
> Regards,
> Gabor
>
> On Fri, Jun 1, 2018 at 7:57 AM Shashikant Banerjee <
> sbaner...@hortonworks.com> wrote:
>
>> Looks like the link with the filter seems to be private. I can't see the
>> blocker list.
>> https://issues.apache.org/jira/issues/?filter=12343997
>>
>> Meanwhile , I will be working on testing the release.
>>
>> Thanks
>> Shashi
>> On 6/1/18, 11:18 AM, "Yongjun Zhang"  wrote:
>>
>> Greetings all,
>>
>> I've created the first release candidate (RC0) for Apache Hadoop
>> 3.0.3. This is our next maintenance release to follow up 3.0.2. It
>> includes
>> about 249
>> important fixes and improvements, among which there are 8 blockers.
>> See
>> https://issues.apache.org/jira/issues/?filter=12343997
>>
>> The RC artifacts are available at:
>> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>>
>> The maven artifacts are available via
>> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>>
>> Please try the release and vote; the vote will run for the usual 5
>> working
>> days, ending on 06/07/2018 PST time. Would really appreciate your
>> participation here.
>>
>> I bumped into quite some issues along the way, many thanks to quite a
>> few
>> people who helped, especially Sammi Chen, Andrew Wang, Junping Du,
>> Eddy Xu.
>>
>> Thanks,
>>
>> --Yongjun
>>
>>
>>


[VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-05-31 Thread Yongjun Zhang
Greetings all,

I've created the first release candidate (RC0) for Apache Hadoop
3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
about 249 important fixes and improvements, among which there are 8 blockers.
See
https://issues.apache.org/jira/issues/?filter=12343997

The RC artifacts are available at:
https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/

The maven artifacts are available via
https://repository.apache.org/content/repositories/orgapachehadoop-1126

Please try the release and vote; the vote will run for the usual 5 working
days, ending on 06/07/2018 PST time. Would really appreciate your
participation here.

I bumped into quite some issues along the way, many thanks to quite a few
people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.

Thanks,

--Yongjun


Re: Apache Hadoop 3.0.3 Release plan

2018-05-31 Thread Yongjun Zhang
Unfortunately I hit still another issue: step 4 in
https://wiki.apache.org/hadoop/HowToRelease doesn't deploy artifacts to
Nexus. According to Andrew, it should, and we should skip step 10.

Though I have the release tarballs, because of this deployment issue I still
have to investigate further. Sorry for the inconvenience.

--Yongjun

On Wed, May 30, 2018 at 11:49 AM, Yongjun Zhang 
wrote:

> Hi,
>
> The build issues are all solved, and I have cut the 3.0.3 branch and am
> close to getting a build out. Since it's taking me a bit more time (I expect
> to send the vote invitation email by today), I would like to send a heads-up
> notice now.
>
> Thank you all for feedback, and many thanks to Sammi Chen, Andrew Wang,
> Eddy Xu who helped when I tried to solve the build issues.
>
> At this point, please be aware of the existence of branch-3.0,
> branch-3.0.3.
>
> Best,
>
> --Yongjun
>
>
>
> On Sat, May 26, 2018 at 11:52 PM, Yongjun Zhang 
> wrote:
>
>> HI,
>>
>> I did build before cut branch and hit some issues, have not got to the
>> bottom, will cut branch after the build issues are resolved.
>>
>> Thanks.
>>
>> --Yongjun
>>
>> On Sat, May 26, 2018 at 1:46 PM, Yongjun Zhang 
>> wrote:
>>
>>> Hi All,
>>>
>>> I will be working on cutting the 3.0.3 branch and trying a build today.
>>>
>>> Thanks.
>>>
>>> --Yongjun
>>>
>>>
>>>
>>> On Wed, May 23, 2018 at 3:31 PM, Yongjun Zhang 
>>> wrote:
>>>
>>>> Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
>>>> sooner.
>>>>
>>>> --Yongjun
>>>>
>>>>
>>>> On Wed, May 23, 2018 at 2:18 PM, Eric Badger  wrote:
>>>>
>>>>> My thinking is to cut the branch in next couple of days and create RC
>>>>> for
>>>>> vote at the end of month.
>>>>>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th)
>>>>> and vote for RC on May 30th
>>>>>   I much prefer to wait to cut the branch until just before the
>>>>> production of the release and the vote. With so many branches, we
>>>>> sometimes miss putting critical bug fixes in unreleased branches if
>>>>> the branch is cut too early.
>>>>>
>>>>> Echoing Eric Payne, I think we should wait to cut the branch until we
>>>>> are actually creating the RC to vote on (i.e. on May 29 or 30 if the vote
>>>>> is to be on May 30).
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have gardened the jiras for 3.0.3, and have the following open
>>>>>> issues:
>>>>>>
>>>>>> https://issues.apache.org/jira/issues/?filter=12343970
>>>>>>
>>>>>> Two of them are blockers, one of them (YARN-8346) has already got +1
>>>>>> for
>>>>>> patch, the other (YARN-8108) will take longer time to resolve and it
>>>>>> seems
>>>>>> we can possibly push it to next release given 3.0.2 also has the
>>>>>> issue.
>>>>>>
>>>>>> My thinking is to cut the branch in next couple of days and create RC
>>>>>> for
>>>>>> vote at the end of month.
>>>>>>
>>>>>> Comments are welcome.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --Yongjun
>>>>>>
>>>>>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C 
>>>>>> wrote:
>>>>>>
>>>>>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a
>>>>>> fix that
>>>>>> > will enable HBase to use Hadoop 3.0.x in the production line.
>>>>>> >
>>>>>> > thanks
>>>>>> > Vrushali
>>>>>> >
>>>>>> >
>>>>>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Thanks Wei-Chiu and Haibo for the feedback!
>>>>>> >>
>>>>>> >> Good thing is that I have 

Re: Apache Hadoop 3.0.3 Release plan

2018-05-30 Thread Yongjun Zhang
Sorry, forgot to mention Junping Du, who also helped me a lot. Many thanks
Junping!

Thanks.

--Yongjun

On Wed, May 30, 2018 at 11:49 AM, Yongjun Zhang 
wrote:

> Hi,
>
> The build issues are all solved, and I have cut the 3.0.3 branch and am
> close to getting a build out. Since it's taking a bit more time (I expect
> to send the vote invitation email by today), I would like to send a
> heads-up notice now.
>
> Thank you all for feedback, and many thanks to Sammi Chen, Andrew Wang,
> Eddy Xu who helped when I tried to solve the build issues.
>
> At this point, please be aware of the existence of branch-3.0 and
> branch-3.0.3.
>
> Best,
>
> --Yongjun
>
>
>
> On Sat, May 26, 2018 at 11:52 PM, Yongjun Zhang 
> wrote:
>
>> Hi,
>>
>> I did a build before cutting the branch and hit some issues; I have not
>> gotten to the bottom of them yet, and will cut the branch once the build
>> issues are resolved.
>>
>> Thanks.
>>
>> --Yongjun
>>
>> On Sat, May 26, 2018 at 1:46 PM, Yongjun Zhang 
>> wrote:
>>
>>> Hi All,
>>>
>>> I will be working on cutting the 3.0.3 branch and trying a build today.
>>>
>>> Thanks.
>>>
>>> --Yongjun
>>>
>>>
>>>
>>> On Wed, May 23, 2018 at 3:31 PM, Yongjun Zhang 
>>> wrote:
>>>
>>>> Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
>>>> sooner.
>>>>
>>>> --Yongjun
>>>>
>>>>
>>>> On Wed, May 23, 2018 at 2:18 PM, Eric Badger  wrote:
>>>>
>>>>> My thinking is to cut the branch in next couple of days and create RC
>>>>> for
>>>>> vote at the end of month.
>>>>>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th)
>>>>> and vote for RC on May 30th
>>>>>   I much prefer to wait to cut the branch until just before the
>>>>> production of the release and the vote. With so many branches, we 
>>>>> sometimes
>>>>> miss putting critical bug fixes in unreleased branches if the branch is
>>>>> cut too early.
>>>>>
>>>>> Echoing Eric Payne, I think we should wait to cut the branch until we
>>>>> are actually creating the RC to vote on (i.e. on May 29 or 30 if the vote
>>>>> is to be on May 30).
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have gardened the jiras for 3.0.3, and have the following open
>>>>>> issues:
>>>>>>
>>>>>> https://issues.apache.org/jira/issues/?filter=12343970
>>>>>>
>>>>>> Two of them are blockers. One (YARN-8346) already has a +1 on the
>>>>>> patch; the other (YARN-8108) will take longer to resolve, and it seems
>>>>>> we can push it to the next release, given that 3.0.2 also has the
>>>>>> issue.
>>>>>>
>>>>>> My thinking is to cut the branch in next couple of days and create RC
>>>>>> for
>>>>>> vote at the end of month.
>>>>>>
>>>>>> Comments are welcome.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --Yongjun
>>>>>>
>>>>>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C 
>>>>>> wrote:
>>>>>>
>>>>>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a
>>>>>> fix that
>>>>>> > will enable HBase to use Hadoop 3.0.x in the production line.
>>>>>> >
>>>>>> > thanks
>>>>>> > Vrushali
>>>>>> >
>>>>>> >
>>>>>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Thanks Wei-Chiu and Haibo for the feedback!
>>>>>> >>
>>>>>> >> Good thing is that I made the following note a couple of days ago
>>>>>> >> when I looked at the branch diff, so we are on the same page:
>>>>>> >>
>

Re: Apache Hadoop 3.0.3 Release plan

2018-05-30 Thread Yongjun Zhang
Hi,

The build issues are all solved, and I have cut the 3.0.3 branch and am
close to getting a build out. Since it's taking a bit more time (I expect to
send the vote invitation email by today), I would like to send a heads-up
notice now.

Thank you all for feedback, and many thanks to Sammi Chen, Andrew Wang,
Eddy Xu who helped when I tried to solve the build issues.

At this point, please be aware of the existence of branch-3.0 and branch-3.0.3.

Best,

--Yongjun



On Sat, May 26, 2018 at 11:52 PM, Yongjun Zhang  wrote:

> Hi,
>
> I did a build before cutting the branch and hit some issues; I have not
> gotten to the bottom of them yet, and will cut the branch once the build
> issues are resolved.
>
> Thanks.
>
> --Yongjun
>
> On Sat, May 26, 2018 at 1:46 PM, Yongjun Zhang 
> wrote:
>
>> Hi All,
>>
>> I will be working on cutting the 3.0.3 branch and trying a build today.
>>
>> Thanks.
>>
>> --Yongjun
>>
>>
>>
>> On Wed, May 23, 2018 at 3:31 PM, Yongjun Zhang 
>> wrote:
>>
>>> Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
>>> sooner.
>>>
>>> --Yongjun
>>>
>>>
>>> On Wed, May 23, 2018 at 2:18 PM, Eric Badger  wrote:
>>>
>>>> My thinking is to cut the branch in next couple of days and create RC
>>>> for
>>>> vote at the end of month.
>>>>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and
>>>> vote for RC on May 30th
>>>>   I much prefer to wait to cut the branch until just before the
>>>> production of the release and the vote. With so many branches, we sometimes
>>>> miss putting critical bug fixes in unreleased branches if the branch is
>>>> cut too early.
>>>>
>>>> Echoing Eric Payne, I think we should wait to cut the branch until we
>>>> are actually creating the RC to vote on (i.e. on May 29 or 30 if the vote
>>>> is to be on May 30).
>>>>
>>>> Eric
>>>>
>>>>
>>>>
>>>> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have gardened the jiras for 3.0.3, and have the following open
>>>>> issues:
>>>>>
>>>>> https://issues.apache.org/jira/issues/?filter=12343970
>>>>>
>>>>> Two of them are blockers. One (YARN-8346) already has a +1 on the
>>>>> patch; the other (YARN-8108) will take longer to resolve, and it seems
>>>>> we can push it to the next release, given that 3.0.2 also has the issue.
>>>>>
>>>>> My thinking is to cut the branch in next couple of days and create RC
>>>>> for
>>>>> vote at the end of month.
>>>>>
>>>>> Comments are welcome.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Yongjun
>>>>>
>>>>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C 
>>>>> wrote:
>>>>>
>>>>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a fix
>>>>> that
>>>>> > will enable HBase to use Hadoop 3.0.x in the production line.
>>>>> >
>>>>> > thanks
>>>>> > Vrushali
>>>>> >
>>>>> >
>>>>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang 
>>>>> > wrote:
>>>>> >
>>>>> >> Thanks Wei-Chiu and Haibo for the feedback!
>>>>> >>
>>>>> >> Good thing is that I made the following note a couple of days ago
>>>>> >> when I looked at the branch diff, so we are on the same page:
>>>>> >>
>>>>> >>  496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x
>>>>> gets TSv2
>>>>> >> related hbase jars, not the user classpath. Contributed by Varun
>>>>> Saxena."
>>>>> >>
>>>>> >> *YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the
>>>>> >> comment below:
>>>>> >> https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649

Re: Apache Hadoop 3.0.3 Release plan

2018-05-27 Thread Yongjun Zhang
Hi,

I did a build before cutting the branch and hit some issues; I have not
gotten to the bottom of them yet, and will cut the branch once the build
issues are resolved.

Thanks.

--Yongjun

On Sat, May 26, 2018 at 1:46 PM, Yongjun Zhang <yjzhan...@apache.org> wrote:

> Hi All,
>
> I will be working on cutting the 3.0.3 branch and trying a build today.
>
> Thanks.
>
> --Yongjun
>
>
>
> On Wed, May 23, 2018 at 3:31 PM, Yongjun Zhang <yzh...@cloudera.com>
> wrote:
>
>> Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
>> sooner.
>>
>> --Yongjun
>>
>>
>> On Wed, May 23, 2018 at 2:18 PM, Eric Badger <ebad...@oath.com> wrote:
>>
>>> My thinking is to cut the branch in next couple of days and create RC for
>>> vote at the end of month.
>>>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and
>>> vote for RC on May 30th
>>>   I much prefer to wait to cut the branch until just before the
>>> production of the release and the vote. With so many branches, we sometimes
>>> miss putting critical bug fixes in unreleased branches if the branch is
>>> cut too early.
>>>
>>> Echoing Eric Payne, I think we should wait to cut the branch until we
>>> are actually creating the RC to vote on (i.e. on May 29 or 30 if the vote
>>> is to be on May 30).
>>>
>>> Eric
>>>
>>>
>>>
>>> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang <yzh...@cloudera.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have gardened the jiras for 3.0.3, and have the following open issues:
>>>>
>>>> https://issues.apache.org/jira/issues/?filter=12343970
>>>>
>>>> Two of them are blockers. One (YARN-8346) already has a +1 on the
>>>> patch; the other (YARN-8108) will take longer to resolve, and it seems
>>>> we can push it to the next release, given that 3.0.2 also has the issue.
>>>>
>>>> My thinking is to cut the branch in next couple of days and create RC
>>>> for
>>>> vote at the end of month.
>>>>
>>>> Comments are welcome.
>>>>
>>>> Thanks,
>>>>
>>>> --Yongjun
>>>>
>>>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C <vrushalic2...@gmail.com>
>>>> wrote:
>>>>
>>>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a fix
>>>> that
>>>> > will enable HBase to use Hadoop 3.0.x in the production line.
>>>> >
>>>> > thanks
>>>> > Vrushali
>>>> >
>>>> >
>>>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
>>>> > wrote:
>>>> >
>>>> >> Thanks Wei-Chiu and Haibo for the feedback!
>>>> >>
>>>> >> Good thing is that I made the following note a couple of days ago
>>>> >> when I looked at the branch diff, so we are on the same page:
>>>> >>
>>>> >>  496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x gets
>>>> TSv2
>>>> >> related hbase jars, not the user classpath. Contributed by Varun
>>>> Saxena."
>>>> >>
>>>> >> *YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the
>>>> >> comment below:
>>>> >> https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649
>>>> >>
>>>> >>
>>>> >> In addition, I will revert https://issues.apache.org/jira/browse/HADOOP-13055 from 3.0.3 since it's a feature.
>>>> >>
>>>> >>
>>>> >> Best,
>>>> >>
>>>> >> --Yongjun
>>>> >>
>>>> >> On Tue, May 8, 2018 at 8:57 AM, Haibo Chen <haiboc...@cloudera.com>
>>>> >> wrote:
>>>> >>
>>>> >> > +1 on adding YARN-7190 to Hadoo

Re: Apache Hadoop 3.0.3 Release plan

2018-05-26 Thread Yongjun Zhang
Hi All,

I will be working on cutting the 3.0.3 branch and trying a build today.

Thanks.

--Yongjun



On Wed, May 23, 2018 at 3:31 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:

> Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
> sooner.
>
> --Yongjun
>
>
> On Wed, May 23, 2018 at 2:18 PM, Eric Badger <ebad...@oath.com> wrote:
>
>> My thinking is to cut the branch in next couple of days and create RC for
>> vote at the end of month.
>>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and
>> vote for RC on May 30th
>>   I much prefer to wait to cut the branch until just before the
>> production of the release and the vote. With so many branches, we sometimes
>> miss putting critical bug fixes in unreleased branches if the branch is
>> cut too early.
>>
>> Echoing Eric Payne, I think we should wait to cut the branch until we are
>> actually creating the RC to vote on (i.e. on May 29 or 30 if the vote is to
>> be on May 30).
>>
>> Eric
>>
>>
>>
>> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang <yzh...@cloudera.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have gardened the jiras for 3.0.3, and have the following open issues:
>>>
>>> https://issues.apache.org/jira/issues/?filter=12343970
>>>
>>> Two of them are blockers. One (YARN-8346) already has a +1 on the
>>> patch; the other (YARN-8108) will take longer to resolve, and it seems
>>> we can push it to the next release, given that 3.0.2 also has the issue.
>>>
>>> My thinking is to cut the branch in next couple of days and create RC for
>>> vote at the end of month.
>>>
>>> Comments are welcome.
>>>
>>> Thanks,
>>>
>>> --Yongjun
>>>
>>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C <vrushalic2...@gmail.com>
>>> wrote:
>>>
>>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a fix
>>> that
>>> > will enable HBase to use Hadoop 3.0.x in the production line.
>>> >
>>> > thanks
>>> > Vrushali
>>> >
>>> >
>>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
>>> > wrote:
>>> >
>>> >> Thanks Wei-Chiu and Haibo for the feedback!
>>> >>
>>> >> Good thing is that I made the following note a couple of days ago
>>> >> when I looked at the branch diff, so we are on the same page:
>>> >>
>>> >>  496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x gets
>>> TSv2
>>> >> related hbase jars, not the user classpath. Contributed by Varun
>>> Saxena."
>>> >>
>>> >> *YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the
>>> >> comment below:
>>> >> https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649
>>> >>
>>> >>
>>> >> In addition, I will revert https://issues.apache.org/jira/browse/HADOOP-13055 from 3.0.3 since it's a feature.
>>> >>
>>> >>
>>> >> Best,
>>> >>
>>> >> --Yongjun
>>> >>
>>> >> On Tue, May 8, 2018 at 8:57 AM, Haibo Chen <haiboc...@cloudera.com>
>>> >> wrote:
>>> >>
>>> >> > +1 on adding YARN-7190 to Hadoop 3.0.x despite the fact that it is
>>> >> > technically incompatible.
>>> >> > It is critical enough to justify being an exception, IMO.
>>> >> >
>>> >> > Added Rohith and Vrushali
>>> >> >
>>> >> > On Tue, May 8, 2018 at 6:20 AM, Wei-Chiu Chuang <weic...@apache.org>
>>> >> > wrote:
>>> >> >
>>> >> >> Thanks Yongjun for driving 3.0.3 release!
>>> >> >>
>>> >> >> IMHO, could we consider adding YARN-7190
>>> >> >> <https://issues.apache.org/jira/browse/YARN-7190> 

Re: Apache Hadoop 3.0.3 Release plan

2018-05-23 Thread Yongjun Zhang
Thanks Eric. Sounds good. I may try to see if I can do the branching/RC
sooner.

--Yongjun

On Wed, May 23, 2018 at 2:18 PM, Eric Badger <ebad...@oath.com> wrote:

> My thinking is to cut the branch in next couple of days and create RC for
> vote at the end of month.
>   >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and
> vote for RC on May 30th
>   I much prefer to wait to cut the branch until just before the production
> of the release and the vote. With so many branches, we sometimes miss
> putting critical bug fixes in unreleased branches if the branch is cut
> too early.
>
> Echoing Eric Payne, I think we should wait to cut the branch until we are
> actually creating the RC to vote on (i.e. on May 29 or 30 if the vote is to
> be on May 30).
>
> Eric
>
>
>
> On Wed, May 23, 2018 at 4:11 PM, Yongjun Zhang <yzh...@cloudera.com>
> wrote:
>
>> Hi,
>>
>> I have gardened the jiras for 3.0.3, and have the following open issues:
>>
>> https://issues.apache.org/jira/issues/?filter=12343970
>>
>> Two of them are blockers. One (YARN-8346) already has a +1 on the patch;
>> the other (YARN-8108) will take longer to resolve, and it seems we can
>> push it to the next release, given that 3.0.2 also has the issue.
>>
>> My thinking is to cut the branch in next couple of days and create RC for
>> vote at the end of month.
>>
>> Comments are welcome.
>>
>> Thanks,
>>
>> --Yongjun
>>
>> On Tue, May 8, 2018 at 11:40 AM, Vrushali C <vrushalic2...@gmail.com>
>> wrote:
>>
>> > +1 for including the YARN-7190 patch in 3.0.3 release. This is a fix
>> that
>> > will enable HBase to use Hadoop 3.0.x in the production line.
>> >
>> > thanks
>> > Vrushali
>> >
>> >
>> > On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
>> > wrote:
>> >
>> >> Thanks Wei-Chiu and Haibo for the feedback!
>> >>
>> >> Good thing is that I made the following note a couple of days ago
>> >> when I looked at the branch diff, so we are on the same page:
>> >>
>> >>  496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x gets
>> TSv2
>> >> related hbase jars, not the user classpath. Contributed by Varun
>> Saxena."
>> >>
>> >> *YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the
>> >> comment below:
>> >> https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649
>> >>
>> >>
>> >> In addition, I will revert https://issues.apache.org/jira/browse/HADOOP-13055 from 3.0.3 since it's a feature.
>> >>
>> >>
>> >> Best,
>> >>
>> >> --Yongjun
>> >>
>> >> On Tue, May 8, 2018 at 8:57 AM, Haibo Chen <haiboc...@cloudera.com>
>> >> wrote:
>> >>
>> >> > +1 on adding YARN-7190 to Hadoop 3.0.x despite the fact that it is
>> >> > technically incompatible.
>> >> > It is critical enough to justify being an exception, IMO.
>> >> >
>> >> > Added Rohith and Vrushali
>> >> >
>> >> > On Tue, May 8, 2018 at 6:20 AM, Wei-Chiu Chuang <weic...@apache.org>
>> >> > wrote:
>> >> >
>> >> >> Thanks Yongjun for driving 3.0.3 release!
>> >> >>
>> >> >> IMHO, could we consider adding YARN-7190
>> >> >> <https://issues.apache.org/jira/browse/YARN-7190> into the list?
>> >> >> I understand that it is listed as an incompatible change, however,
>> >> because
>> >> >> of this bug, HBase considers the entire Hadoop 3.0.x line not
>> >> production
>> >> >> ready. I feel there's not much point releasing any more 3.0.x
>> releases
>> >> if
>> >> >> downstream projects can't pick it up (given that HBase is one of
>> >> >> the most important projects around Hadoop).
>> >> >>
>> >> >> On Mon, May 7,

Re: Apache Hadoop 3.0.3 Release plan

2018-05-23 Thread Yongjun Zhang
Hi,

I have gardened the jiras for 3.0.3, and have the following open issues:

https://issues.apache.org/jira/issues/?filter=12343970

Two of them are blockers. One (YARN-8346) already has a +1 on the patch;
the other (YARN-8108) will take longer to resolve, and it seems we can push
it to the next release, given that 3.0.2 also has the issue.

My thinking is to cut the branch in next couple of days and create RC for
vote at the end of month.

Comments are welcome.

Thanks,

--Yongjun

On Tue, May 8, 2018 at 11:40 AM, Vrushali C <vrushalic2...@gmail.com> wrote:

> +1 for including the YARN-7190 patch in 3.0.3 release. This is a fix that
> will enable HBase to use Hadoop 3.0.x in the production line.
>
> thanks
> Vrushali
>
>
> On Tue, May 8, 2018 at 10:24 AM, Yongjun Zhang <yzh...@cloudera.com>
> wrote:
>
>> Thanks Wei-Chiu and Haibo for the feedback!
>>
>> Good thing is that I made the following note a couple of days ago when I
>> looked at the branch diff, so we are on the same page:
>>
>>  496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x gets TSv2
>> related hbase jars, not the user classpath. Contributed by Varun Saxena."
>>
>> *YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the comment
>> below:
>> https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649
>>
>>
>> In addition, I will revert https://issues.apache.org/jira/browse/HADOOP-13055 from 3.0.3 since it's a feature.
>>
>>
>> Best,
>>
>> --Yongjun
>>
>> On Tue, May 8, 2018 at 8:57 AM, Haibo Chen <haiboc...@cloudera.com>
>> wrote:
>>
>> > +1 on adding YARN-7190 to Hadoop 3.0.x despite the fact that it is
>> > technically incompatible.
>> > It is critical enough to justify being an exception, IMO.
>> >
>> > Added Rohith and Vrushali
>> >
>> > On Tue, May 8, 2018 at 6:20 AM, Wei-Chiu Chuang <weic...@apache.org>
>> > wrote:
>> >
>> >> Thanks Yongjun for driving 3.0.3 release!
>> >>
>> >> IMHO, could we consider adding YARN-7190
>> >> <https://issues.apache.org/jira/browse/YARN-7190> into the list?
>> >> I understand that it is listed as an incompatible change, however,
>> because
>> >> of this bug, HBase considers the entire Hadoop 3.0.x line not
>> production
>> >> ready. I feel there's not much point releasing any more 3.0.x releases
>> if
>> >> downstream projects can't pick it up (given that HBase is one of the
>> >> most important projects around Hadoop).
>> >>
>> >> On Mon, May 7, 2018 at 1:19 PM, Yongjun Zhang <yzh...@cloudera.com>
>> >> wrote:
>> >>
>> >> > Hi Eric,
>> >> >
>> >> > Thanks for the feedback, good point. I will try to clean up things,
>> then
>> >> > cut branch before the release production and vote.
>> >> >
>> >> > Best,
>> >> >
>> >> > --Yongjun
>> >> >
>> >> > On Mon, May 7, 2018 at 8:39 AM, Eric Payne
>> >> > <eric.payne1...@yahoo.com.invalid> wrote:
>> >> >
>> >> > > >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th)
>> and
>> >> vote
>> >> > > for RC on May 30th
>> >> > > I much prefer to wait to cut the branch until just before the
>> >> production
>> >> > > of the release and the vote. With so many branches, we sometimes
>> miss
>> >> > > putting critical bug fixes in unreleased branches if the branch is
>> cut
>> >> > too
>> >> > > early.
>> >> > >
>> >> > > My 2 cents...
>> >> > > Thanks,
>> >> > > -Eric Payne
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Monday, May 7, 2018, 12:09:00 AM CDT, Yongjun Zhang <
>> >> > > yjzhan...@apache.org> wrote:
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > &g

Re: [ANNOUNCE] Apache Hadoop 2.9.1 Release

2018-05-19 Thread Yongjun Zhang
Thanks for the effort doing the release, Sammi; very nice work!

--Yongjun

On Tue, May 15, 2018 at 10:21 AM, Chen, Sammi <sammi.c...@intel.com> wrote:

> Hello everyone,
>
> I am glad to announce that Apache Hadoop 2.9.1 has been released.
>
> Apache Hadoop 2.9.1 is the next release in the Apache Hadoop 2.9 line. It
> includes 208 bug fixes, improvements, and enhancements since the previous
> Apache Hadoop 2.9.0 release.
>
>  - For major changes included in Hadoop 2.9 line, please refer to Hadoop
> 2.9.1 main page [1].
>  - For more details about fixes and improvements in 2.9.1 release, please
> refer to CHANGES [2] and RELEASENOTES [3].
>  - For download, please go to the download page [4]
>
> Thank you all for contributing to Apache Hadoop 2.9.1.
>
>
> Lastly, thanks to Yongjun Zhang, Junping Du, Andrew Wang, and Chris Douglas
> for your help and support.
>
>
> Bests,
> Sammi Chen
>
> [1] http://hadoop.apache.org/docs/r2.9.1/index.html
> [2] http://hadoop.apache.org/docs/r2.9.1/hadoop-project-dist/
> hadoop-common/release/2.9.1/CHANGES.2.9.1.html
> [3] http://hadoop.apache.org/docs/r2.9.1/hadoop-project-dist/
> hadoop-common/release/2.9.1/RELEASENOTES.2.9.1.html
> [4] http://hadoop.apache.org/releases.html#Download
>
>


Re: Apache Hadoop 3.0.3 Release plan

2018-05-08 Thread Yongjun Zhang
Thanks Wei-Chiu and Haibo for the feedback!

Good thing is that I made the following note a couple of days ago when I
looked at the branch diff, so we are on the same page:

 496dc57 Revert "YARN-7190. Ensure only NM classpath in 2.x gets TSv2
related hbase jars, not the user classpath. Contributed by Varun Saxena."

*YARN-7190 is not in 3.0.2, I will include it in 3.0.3* per the comment
below:
https://issues.apache.org/jira/browse/YARN-7190?focusedCommentId=16457649&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16457649


In addition, I will revert https://issues.apache.org/jira/browse/HADOOP-13055 from 3.0.3 since it's a feature.

Best,

--Yongjun

On Tue, May 8, 2018 at 8:57 AM, Haibo Chen <haiboc...@cloudera.com> wrote:

> +1 on adding YARN-7190 to Hadoop 3.0.x despite the fact that it is
> technically incompatible.
> It is critical enough to justify being an exception, IMO.
>
> Added Rohith and Vrushali
>
> On Tue, May 8, 2018 at 6:20 AM, Wei-Chiu Chuang <weic...@apache.org>
> wrote:
>
>> Thanks Yongjun for driving 3.0.3 release!
>>
>> IMHO, could we consider adding YARN-7190
>> <https://issues.apache.org/jira/browse/YARN-7190> into the list?
>> I understand that it is listed as an incompatible change, however, because
>> of this bug, HBase considers the entire Hadoop 3.0.x line not production
>> ready. I feel there's not much point releasing any more 3.0.x releases if
>> downstream projects can't pick it up (given that HBase is one of the most
>> important projects around Hadoop).
>>
>> On Mon, May 7, 2018 at 1:19 PM, Yongjun Zhang <yzh...@cloudera.com>
>> wrote:
>>
>> > Hi Eric,
>> >
>> > Thanks for the feedback, good point. I will try to clean up things, then
>> > cut branch before the release production and vote.
>> >
>> > Best,
>> >
>> > --Yongjun
>> >
>> > On Mon, May 7, 2018 at 8:39 AM, Eric Payne
>> > <eric.payne1...@yahoo.com.invalid> wrote:
>> >
>> > > >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and
>> vote
>> > > for RC on May 30th
>> > > I much prefer to wait to cut the branch until just before the
>> production
>> > > of the release and the vote. With so many branches, we sometimes miss
>> > > putting critical bug fixes in unreleased branches if the branch is cut
>> > too
>> > > early.
>> > >
>> > > My 2 cents...
>> > > Thanks,
>> > > -Eric Payne
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Monday, May 7, 2018, 12:09:00 AM CDT, Yongjun Zhang <
>> > > yjzhan...@apache.org> wrote:
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Hi All,
>> > >
>> > > >
>> > > We have released Apache Hadoop 3.0.2 in April of this year [1]. Since
>> > then,
>> > > there are quite some commits done to branch-3.0. To further improve
>> the
>> > > quality of release, we plan to do 3.0.3 release now. The focus of
>> 3.0.3
>> > > will be fixing blockers (3), critical bugs (17) and bug fixes (~130),
>> see
>> > > [2].
>> > >
>> > > Usually no new feature should be included for maintenance releases, I
>> > > noticed we have https://issues.apache.org/jira/browse/HADOOP-13055 in
>> > the
>> > > branch classified as new feature. I will talk with the developers to
>> see
>> > if
>> > > we should include it in 3.0.3.
>> > >
>> > > I also noticed that there are more commits in the branch than can be
>> > found
>> > > by query [2], also some commits committed to 3.0.3 do not have their
>> jira
>> > > target release field filled in accordingly. I will go through them to
>> > > update the jira.
>> > >
>> > > >
>> > > We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote
>> > for
>> > > RC on May 30th, targeting for Jun 8th release.
>> > >
>> > > >
>> > > Your insights are welcome.
>> > >
>> > > >
>> > > [1] https://www.mail-archive.com/general@hadoop.apache.org/msg07
>> 790.html
>> > >
>> > > > [2] https://issues.apache.org/jira/issues/?filter=12343874  See
>> Note
>> > > below
>> > > Note: it seems I need an admin change so that I can make the filter
>> > > in [2] public; I'm working on that. For now, you can use the JQL query:
>> > > (project = hadoop OR project = "Hadoop HDFS" OR project = "Hadoop
>> YARN"
>> > OR
>> > > project = "Hadoop Map/Reduce") AND fixVersion in (3.0.3) ORDER BY
>> > priority
>> > > DESC
>> > >
>> > > Thanks and best regards,
>> > >
>> > > --Yongjun
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> A very happy Hadoop contributor
>>
>
>


Re: Apache Hadoop 3.0.3 Release plan

2018-05-07 Thread Yongjun Zhang
Hi Eric,

Thanks for the feedback, good point. I will try to clean things up, then
cut the branch just before producing the release and starting the vote.

Best,

--Yongjun

On Mon, May 7, 2018 at 8:39 AM, Eric Payne <eric.payne1...@yahoo.com.invalid>
wrote:

> >  We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote
> for RC on May 30th
> I much prefer to wait to cut the branch until just before the production
> of the release and the vote. With so many branches, we sometimes miss
> putting critical bug fixes in unreleased branches if the branch is cut too
> early.
>
> My 2 cents...
> Thanks,
> -Eric Payne
>
>
>
>
>
> On Monday, May 7, 2018, 12:09:00 AM CDT, Yongjun Zhang <
> yjzhan...@apache.org> wrote:
>
>
>
>
>
> Hi All,
>
> >
> We have released Apache Hadoop 3.0.2 in April of this year [1]. Since then,
> there are quite some commits done to branch-3.0. To further improve the
> quality of release, we plan to do 3.0.3 release now. The focus of 3.0.3
> will be fixing blockers (3), critical bugs (17) and bug fixes (~130), see
> [2].
>
> Usually no new feature should be included for maintenance releases, I
> noticed we have https://issues.apache.org/jira/browse/HADOOP-13055 in the
> branch classified as new feature. I will talk with the developers to see if
> we should include it in 3.0.3.
>
> I also noticed that there are more commits in the branch than can be found
> by query [2], also some commits committed to 3.0.3 do not have their jira
> target release field filled in accordingly. I will go through them to
> update the jira.
>
> >
> We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote for
> RC on May 30th, targeting for Jun 8th release.
>
> >
> Your insights are welcome.
>
> >
> [1] https://www.mail-archive.com/general@hadoop.apache.org/msg07790.html
>
> > [2] https://issues.apache.org/jira/issues/?filter=12343874  See Note
> below
> Note: it seems I need an admin change so that I can make the filter in [2]
> public; I'm working on that. For now, you can use the JQL query:
> (project = hadoop OR project = "Hadoop HDFS" OR project = "Hadoop YARN" OR
> project = "Hadoop Map/Reduce") AND fixVersion in (3.0.3) ORDER BY priority
> DESC
>
> Thanks and best regards,
>
> --Yongjun
>
>
>


Apache Hadoop 3.0.3 Release plan

2018-05-06 Thread Yongjun Zhang
Hi All,

We released Apache Hadoop 3.0.2 in April of this year [1]. Since then,
quite a few commits have gone into branch-3.0. To further improve the
quality of the release line, we plan to do a 3.0.3 release now. The focus of
3.0.3 will be fixing blockers (3), critical bugs (17), and bug fixes (~130);
see [2].

Usually no new features should be included in maintenance releases. I
noticed we have https://issues.apache.org/jira/browse/HADOOP-13055 in the
branch classified as a new feature; I will talk with the developers to see
whether we should include it in 3.0.3.

I also noticed that there are more commits in the branch than can be found
by query [2], and some commits in 3.0.3 do not have their JIRA target
release field filled in accordingly. I will go through them and update the
JIRAs.

We plan to cut branch-3.0.3 by the coming Wednesday (May 9th) and vote for
an RC on May 30th, targeting a Jun 8th release.

Your insights are welcome.

[1] https://www.mail-archive.com/general@hadoop.apache.org/msg07790.html

[2] https://issues.apache.org/jira/issues/?filter=12343874  See Note below
Note: it seems I need an admin change so that I can make the filter in [2]
public; I'm working on that. For now, you can use the JQL query:
(project = hadoop OR project = "Hadoop HDFS" OR project = "Hadoop YARN" OR
project = "Hadoop Map/Reduce") AND fixVersion in (3.0.3) ORDER BY priority
DESC

Thanks and best regards,

--Yongjun


[jira] [Created] (HDFS-13315) Add a test for the issue reported in HDFS-11481 which is fixed by HDFS-10997.

2018-03-19 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-13315:


 Summary: Add a test for the issue reported in HDFS-11481 which is 
fixed by HDFS-10997.
 Key: HDFS-13315
 URL: https://issues.apache.org/jira/browse/HDFS-13315
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Yongjun Zhang


HDFS-11481 reported that hdfs snapshotDiff /.reserved/raw/... fails on 
snapshottable directories. It turns out that HDFS-10997 fixed the issue as a 
byproduct. This jira is to add a test for the HDFS-11481 issue.
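A minimal sketch of what such a test could look like, assuming the usual MiniDFSCluster test setup ({{dfs}} being the DistributedFileSystem under test); method and path names here are illustrative, not the committed test:

{code}
@Test
public void testSnapshotDiffReportWithReservedRawPath() throws Exception {
  final Path dir = new Path("/dir");
  dfs.mkdirs(dir);
  dfs.allowSnapshot(dir);
  dfs.createSnapshot(dir, "s0");
  DFSTestUtil.createFile(dfs, new Path(dir, "file"), 1024L, (short) 1, 0L);
  dfs.createSnapshot(dir, "s1");
  // Before HDFS-10997, this call failed when given a /.reserved/raw path
  // to a snapshottable directory.
  SnapshotDiffReport report =
      dfs.getSnapshotDiffReport(new Path("/.reserved/raw/dir"), "s0", "s1");
  assertFalse(report.getDiffList().isEmpty());
}
{code}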








[jira] [Created] (HDFS-13115) Handle inode of a given inodeId already deleted

2018-02-06 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-13115:


 Summary: Handle inode of a given inodeId already deleted
 Key: HDFS-13115
 URL: https://issues.apache.org/jira/browse/HDFS-13115
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


In LeaseManager, 
{code}
 private synchronized INode[] getINodesWithLease() {
    List<INode> inodes = new ArrayList<>(leasesById.size());
INode currentINode;
for (long inodeId : leasesById.keySet()) {
  currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
  // A file with an active lease could get deleted, or its
  // parent directories could get recursively deleted.
  if (currentINode != null &&
  currentINode.isFile() &&
  !fsnamesystem.isFileDeleted(currentINode.asFile())) {
inodes.add(currentINode);
  }
}
return inodes.toArray(new INode[0]);
  }
{code}
we can see that, given an {{inodeId}},
{{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The
reason is explained in the comment.

HDFS-12985 root-caused and solved one such case, and we saw that it fixes
some cases, but we are still seeing NullPointerException from FSNamesystem:

{code}
  public long getCompleteBlocksTotal() {
// Calculate number of blocks under construction
long numUCBlocks = 0;
readLock();
try {
  numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
  return getBlocksTotal() - numUCBlocks;
} finally {
  readUnlock();
}
  }
{code}

The exception happens when the inode for the given inodeId has been removed;
see the LeaseManager code below:
{code}
  synchronized long getNumUnderConstructionBlocks() {
assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
  + "acquired before counting under construction blocks";
long numUCBlocks = 0;
for (Long id : getINodeIdWithLeases()) {
  final INodeFile cons = 
fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
  Preconditions.checkState(cons.isUnderConstruction());
  BlockInfo[] blocks = cons.getBlocks();
  if(blocks == null)
continue;
  for(BlockInfo b : blocks) {
if(!b.isComplete())
  numUCBlocks++;
  }
}
LOG.info("Number of blocks under construction: " + numUCBlocks);
return numUCBlocks;
  }
{code}

This jira adds a check for whether the inode has been removed, as a
safeguard, to avoid the NullPointerException.

It looks like after the inodeId is returned by {{getINodeIdWithLeases()}},
the inode gets deleted from the FSDirectory map.

Ideally we should find out who deleted it, like in HDFS-12985.

But it seems reasonable to have a safeguard here, like the other code in the
code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
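A minimal sketch of the proposed safeguard inside the loop of {{getNumUnderConstructionBlocks()}}; the null check and log message are illustrative, not the committed patch:

{code}
for (Long id : getINodeIdWithLeases()) {
  final INode inode = fsnamesystem.getFSDirectory().getInode(id);
  if (inode == null) {
    // The file may have been deleted after its id was collected;
    // skip it instead of hitting a NullPointerException below.
    LOG.warn("Failed to find inode " + id + "; it was likely deleted.");
    continue;
  }
  final INodeFile cons = inode.asFile();
  Preconditions.checkState(cons.isUnderConstruction());
  BlockInfo[] blocks = cons.getBlocks();
  if (blocks == null) {
    continue;
  }
  for (BlockInfo b : blocks) {
    if (!b.isComplete()) {
      numUCBlocks++;
    }
  }
}
{code}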








[jira] [Created] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-02-01 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-13101:


 Summary: Yet another fsimage corruption related to snapshot
 Key: HDFS-13101
 URL: https://issues.apache.org/jira/browse/HDFS-13101
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang


Lately we saw a case similar to HDFS-9406, even though the HDFS-9406 fix is
present, so it's likely another case not covered by that fix. We are currently
trying to collect a good fsimage + editlogs to replay, reproduce, and
investigate.







[jira] [Created] (HDFS-13100) Handle IllegalArgumentException when GETSERVERDEFAULTS is not implemented in webhdfs.

2018-02-01 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-13100:


 Summary: Handle IllegalArgumentException when GETSERVERDEFAULTS
is not implemented in webhdfs.
 Key: HDFS-13100
 URL: https://issues.apache.org/jira/browse/HDFS-13100
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, webhdfs
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


HDFS-12386 added the getserverdefaults call to webhdfs, and expects clusters
that don't support it to throw UnsupportedOperationException. However, in that
case the following handler is called, which throws IllegalArgumentException
instead of UnsupportedOperationException, and the client side fails to deal
with the IllegalArgumentException.

{code}
  public void handle(ChannelHandlerContext ctx, HttpRequest req)
throws IOException, URISyntaxException {
String op = params.op();
HttpMethod method = req.getMethod();
if (PutOpParam.Op.CREATE.name().equalsIgnoreCase(op)
  && method == PUT) {
  onCreate(ctx);
} else if (PostOpParam.Op.APPEND.name().equalsIgnoreCase(op)
  && method == POST) {
  onAppend(ctx);
} else if (GetOpParam.Op.OPEN.name().equalsIgnoreCase(op)
  && method == GET) {
  onOpen(ctx);
} else if(GetOpParam.Op.GETFILECHECKSUM.name().equalsIgnoreCase(op)
  && method == GET) {
  onGetFileChecksum(ctx);
} else if(PutOpParam.Op.CREATE.name().equalsIgnoreCase(op)
&& method == OPTIONS) {
  allowCORSOnCreate(ctx);
} else {
  throw new IllegalArgumentException("Invalid operation " + op);
}
}
{code}

We either need to make the server throw UnsupportedOperationException, or
make the client handle IllegalArgumentException. For backward compatibility
and easier operation in the field, the latter is preferred.
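A rough sketch of the client-side handling, assuming the server's IllegalArgumentException surfaces as a RemoteException on the client; the wrapper and fallback method names are hypothetical:

{code}
private FsServerDefaults getServerDefaultsSafely() throws IOException {
  try {
    return getServerDefaults();  // issues GETSERVERDEFAULTS over webhdfs
  } catch (RemoteException e) {
    if (IllegalArgumentException.class.getName().equals(e.getClassName())) {
      // Older cluster without GETSERVERDEFAULTS support; behave as if the
      // call were simply unsupported and fall back to client-side defaults.
      return getFallbackServerDefaults();  // hypothetical helper
    }
    throw e;
  }
}
{code}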







Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-20 Thread Yongjun Zhang
Congratulations Andrew and all, a great milestone! Thanks Andrew for
driving it!

--Yongjun

On Tue, Dec 19, 2017 at 9:10 AM, Jonathan Kelly 
wrote:

> Thanks, Andrew!
>
> On Mon, Dec 18, 2017 at 4:54 PM Andrew Wang 
> wrote:
>
> > Thanks for the spot, I just pushed a correct tag. I can't delete the bad
> > tag myself, will ask ASF infra for help.
> >
> > On Mon, Dec 18, 2017 at 4:46 PM, Jonathan Kelly 
> > wrote:
> >
> >> Congrats on the huge release!
> >>
> >> I just noticed, though, that the Github repo does not appear to have the
> >> correct tag for 3.0.0. I see a new tag called "rel/release-" that
> points to
> >> the same commit as "release-3.0.0-RC1"
> >> (c25427ceca461ee979d30edd7a4b0f50718e6533). I assume that should have
> >> actually been called "rel/release-3.0.0" to match the pattern for prior
> >> releases.
> >>
> >> Thanks,
> >> Jonathan Kelly
> >>
> >> On Thu, Dec 14, 2017 at 10:45 AM Andrew Wang 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
> >>> (GA).
> >>>
> >>> 3.0.0 GA consists of 302 bug fixes, improvements, and other
> enhancements
> >>> since 3.0.0-beta1. This release marks a point of quality and stability
> >>> for
> >>> the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta
> >>> releases
> >>> are encouraged to upgrade.
> >>>
> >>> Looking back, 3.0.0 GA is the culmination of over a year of work on the
> >>> 3.0.0 line, starting with 3.0.0-alpha1 which was released in September
> >>> 2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.
> >>>
> >>> Users are encouraged to read the overview of major changes in 3.0.0.
> >>> The GA release notes
> >>> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html>
> >>> and changelog
> >>> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html>
> >>> detail the changes since 3.0.0-beta1.
> >>>
> >>> The ASF press release provides additional color and highlights some of
> >>> the
> >>> major features:
> >>>
> >>>
> >>> https://globenewswire.com/news-release/2017/12/14/1261879/0/en/The-Apache-Software-Foundation-Announces-Apache-Hadoop-v3-0-0-General-Availability.html
> >>>
> >>> Let me end by thanking the many, many contributors who helped with this
> >>> release line. We've only had three major releases in Hadoop's 10 year
> >>> history, and this is our biggest major release ever. It's an incredible
> >>> accomplishment for our community, and I'm proud to have worked with all
> >>> of
> >>> you.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>
> >
>


[jira] [Created] (HDFS-12866) Recursive delete of a large directory or snapshot makes namenode unresponsive

2017-11-28 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12866:


 Summary: Recursive delete of a large directory or snapshot makes 
namenode unresponsive
 Key: HDFS-12866
 URL: https://issues.apache.org/jira/browse/HDFS-12866
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Currently file/directory deletion happens in two steps (see
{{FSNamesystem#delete(String src, boolean recursive, boolean logRetryCache)}}):

# Do the following under fsn write lock and release the lock afterwards
** 1.1  recursively traverse the target, collect INodes and all blocks to be 
deleted
** 1.2  delete all INodes
# Delete the blocks to be deleted incrementally, chunk by chunk. That is, in a 
loop, do:   
** acquire fsn write lock,
** delete chunk of blocks
** release fsn write lock

Breaking the deletion into two steps avoids holding the fsn write lock for
too long, which would make the NN unresponsive. However, even with this, when
deleting a large directory or a snapshot with a lot of contents, step 1
itself can take a long time, still holding the fsn write lock for too long
and making the NN unresponsive.

A possible solution would be to add one more sub-step to step 1, and only
hold the fsn write lock in sub-step 1.1:

* 1.1. hold the fsn write lock, disconnect the target to be deleted from its 
parent dir, release the lock
* 1.2 recursively traverse the target, collect INodes and all blocks to be 
deleted
* 1.3  delete all INodes

Then do step 2.

This means any operation on any file/dir needs to check whether an ancestor
has been deleted (i.e., disconnected), similar to what's done in the
FSNamesystem#isFileDeleted method.

I'm throwing the thought out here for further discussion; a rough sketch
follows. Comments and input are welcome.
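A pseudocode-level sketch of the proposed flow; FSNamesystem is abbreviated as fsn, and the helper method names are made up for illustration:

{code}
void deleteLarge(FSNamesystem fsn, INodesInPath iip) throws IOException {
  // Sub-step 1.1: hold the fsn write lock only long enough to detach the
  // subtree, making it invisible to subsequent client operations.
  fsn.writeLock();
  try {
    detachFromParent(iip);
  } finally {
    fsn.writeUnlock();
  }
  // Sub-steps 1.2 + 1.3: traverse, collect, and delete INodes off-lock.
  INode.BlocksMapUpdateInfo toRemove = collectINodesAndBlocks(iip);
  deleteINodes(iip);
  // Step 2 (unchanged): remove the collected blocks incrementally,
  // re-acquiring and releasing the fsn write lock for each chunk.
  removeBlocksChunkByChunk(fsn, toRemove);
}
{code}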











Re: If primary replica is unresponsive, hsync() hangs

2017-09-11 Thread Yongjun Zhang
Thanks for finding the issue, Wei-Chiu.

I agree hsync should handle DN failure similarly to write-pipeline
recovery, as you stated. If it's not doing that, it should be fixed.

--Yongjun

On Mon, Sep 11, 2017 at 10:53 AM, Wei-Chiu Chuang 
wrote:

> Hello my dear HDFS dev colleagues,
>
> It appears that when a dfs client writes and hsync(), and if the primary
> replica (that is, the first DataNode in the write pipeline) is unresponsive
> to the hsync() request, the hsync() would wait at
> DataStreamer#waitForAckedSeqno().
>
> In one scenario, we saw this behavior when the primary DataNode has a flaky
> disk drive controller, and DataNode was thus unable to write back ack to
> client because it was unable to write to the disk successfully. The client
> is a Flume agent and it finally bailed out after 180 seconds.
>
> My question is: why doesn't hsync() replace bad DataNodes in the pipeline
> just like the typical write pipeline failure recovery?
>
> I would like to understand if this is intended before I file a jira and
> post a patch.
>
> Thanks,
> Wei-Chiu
> --
> A very happy Hadoop contributor
>


[jira] [Resolved] (HDFS-12296) Add a field to FsServerDefaults to tell if external attribute provider is enabled

2017-09-08 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-12296.
--
Resolution: Won't Fix

> Add a field to FsServerDefaults to tell if external attribute provider is 
> enabled
> -
>
> Key: HDFS-12296
> URL: https://issues.apache.org/jira/browse/HDFS-12296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>    Reporter: Yongjun Zhang
>    Assignee: Yongjun Zhang
>







[jira] [Resolved] (HDFS-12294) Let distcp bypass external attribute provider when calling getFileStatus etc. at the source cluster

2017-09-08 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-12294.
--
Resolution: Won't Fix

> Let distcp bypass external attribute provider when calling getFileStatus
> etc. at the source cluster
> -
>
> Key: HDFS-12294
> URL: https://issues.apache.org/jira/browse/HDFS-12294
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>    Reporter: Yongjun Zhang
>    Assignee: Yongjun Zhang
>
> This is an alternative solution for HDFS-12202, which proposed introducing a
> new set of APIs with an additional boolean parameter bypassExtAttrProvider,
> to let the NN bypass the external attribute provider in getFileStatus. The
> goal is to prevent distcp from copying attributes from one cluster's external
> attribute provider and saving them to another cluster's fsimage.
> The solution here is, instead of adding a parameter, to encode it in the path
> itself: when calling getFileStatus (and some other calls), the NN will parse
> the path and figure out whether the external attribute provider needs to be
> bypassed. The suggested encoding is to add a prefix to the path before
> calling getFileStatus, e.g. /a/b/c becomes /.reserved/bypassExtAttr/a/b/c.
> The NN will parse the path at the very beginning.
> Thanks much to [~andrew.wang] for this suggestion. The scope of change is 
> smaller and we don't have to change the FileSystem APIs.
>  






[jira] [Created] (HDFS-12404) Name of config introduced by HDFS-12357 needs to be changed from authorization.provider to attribute.provider

2017-09-07 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12404:


 Summary: Name of config introduced by HDFS-12357 needs to be
changed from authorization.provider to attribute.provider
 Key: HDFS-12404
 URL: https://issues.apache.org/jira/browse/HDFS-12404
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang









[jira] [Created] (HDFS-12360) TestLeaseRecoveryStriped.testLeaseRecovery failure

2017-08-26 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12360:


 Summary: TestLeaseRecoveryStriped.testLeaseRecovery failure
 Key: HDFS-12360
 URL: https://issues.apache.org/jira/browse/HDFS-12360
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


TestLeaseRecoveryStriped.testLeaseRecovery failed:

{code}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.TestLeaseRecoveryStriped
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.808 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecoveryStriped
testLeaseRecovery(org.apache.hadoop.hdfs.TestLeaseRecoveryStriped)  Time 
elapsed: 15.509 sec  <<< FAILURE!
java.lang.AssertionError: failed testCase at i=0, blockLengths=[10485760, 
4194304, 6291456, 10485760, 11534336, 11534336, 6291456, 4194304, 3145728]
java.io.IOException: Failed: the number of failed blocks = 4 > the number of 
data blocks = 3
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamers(DFSStripedOutputStream.java:393)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.handleStreamerFailure(DFSStripedOutputStream.java:411)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.flushAllInternals(DFSStripedOutputStream.java:1128)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:628)
at 
org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:564)
at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:79)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
at java.io.DataOutputStream.write(DataOutputStream.java:88)
at 
org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.writePartialBlocks(TestLeaseRecoveryStriped.java:182)
at 
org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.runTest(TestLeaseRecoveryStriped.java:158)
at 
org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.testLeaseRecovery(TestLeaseRecoveryStriped.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.testLeaseRecovery(TestLeaseRecoveryStriped.java:152)

Results :

Failed tests: 
  TestLeaseRecoveryStriped.testLeaseRecovery:152 failed testCase at i=0, 
blockLengths=[10485760, 4194304, 6291456, 10485760, 11534336, 11534336, 
6291

[jira] [Created] (HDFS-12357) Let NameNode bypass external attribute provider for special user

2017-08-25 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12357:


 Summary: Let NameNode bypass external attribute provider for
special user
 Key: HDFS-12357
 URL: https://issues.apache.org/jira/browse/HDFS-12357
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


This is a third proposal to solve the problem described in HDFS-12202.

The problem is, when we do distcp from one cluster to another (or within the
same cluster), in addition to copying file data we copy the metadata from
source to target. If an external attribute provider is enabled, the metadata
may be read from the provider, and thus provider data read from the source
may be saved to the target HDFS.

We want to avoid saving metadata from external provider to HDFS, so we want to 
bypass external provider when doing the distcp (or hadoop fs -cp) operation.

Two alternative approaches were proposed earlier, one in HDFS-12202, the other 
in HDFS-12294. The proposal here is the third one.

The idea is to introduce a new config that specifies a special user (or a
list of users), and let the NN bypass the external provider when the current
user is one of the special users.

If we run applications as a special user and those applications need data
from the external attribute provider, this won't work. So the constraint of
this approach is that the special users should not run applications that
need data from the external provider.

Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn], 
[~manojg] for the discussions in the other jiras. 

I'm creating this one to discuss further.
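A minimal sketch of the idea; the config key, field, and method names below are hypothetical, for discussion only:

{code}
// e.g. dfs.namenode.inode.attributes.provider.bypass.users=distcpUser1,distcpUser2
private Set<String> bypassUsers;  // loaded from the new config key

INodeAttributes getAttributes(String[] pathElements, INodeAttributes inode)
    throws IOException {
  if (attributeProvider == null
      || bypassUsers.contains(NameNode.getRemoteUser().getShortUserName())) {
    return inode;  // serve raw HDFS attributes, bypassing the provider
  }
  return attributeProvider.getAttributes(pathElements, inode);
}
{code}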









[jira] [Resolved] (HDFS-12202) Provide new set of FileSystem API to bypass external attribute provider

2017-08-16 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-12202.
--
Resolution: Won't Fix

> Provide new set of FileSystem API to bypass external attribute provider
> ---
>
> Key: HDFS-12202
> URL: https://issues.apache.org/jira/browse/HDFS-12202
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client
>    Reporter: Yongjun Zhang
>    Assignee: Yongjun Zhang
>
> HDFS client uses 
> {code}
>   /**
>* Return a file status object that represents the path.
>* @param f The path we want information from
>* @return a FileStatus object
>* @throws FileNotFoundException when the path does not exist
>* @throws IOException see specific implementation
>*/
>   public abstract FileStatus getFileStatus(Path f) throws IOException;
>   /**
>* List the statuses of the files/directories in the given path if the path 
> is
>* a directory.
>* 
>* Does not guarantee to return the List of files/directories status in a
>* sorted order.
>* 
>* Will not return null. Expect IOException upon access error.
>* @param f given path
>* @return the statuses of the files/directories in the given patch
>* @throws FileNotFoundException when the path does not exist
>* @throws IOException see specific implementation
>*/
>   public abstract FileStatus[] listStatus(Path f) throws 
> FileNotFoundException,
>  IOException;
> {code}
> to get FileStatus of files.
> When an external attribute provider (INodeAttributeProvider) is enabled for a
> cluster, the external attribute provider is consulted to get back some
> relevant info (including ACL, group, etc.), which is returned in the
> FileStatus. There is a problem here: when we use distcp to copy files from
> srcCluster to tgtCluster, if srcCluster has an external attribute provider
> enabled, the data we copied would contain data from the attribute provider,
> which we may not want.
> Creating this jira to add a new set of interfaces for distcp to use, so that
> distcp can copy HDFS data only and bypass the external attribute provider data.
> The new set API would look like
> {code}
>  /**
>* Return a file status object that represents the path.
>* @param f The path we want information from
>* @param bypassExtAttrProvider if true, bypass external attr provider
>*when it's in use.
>* @return a FileStatus object
>* @throws FileNotFoundException when the path does not exist
>* @throws IOException see specific implementation
>*/
>   public FileStatus getFileStatus(Path f,
>   final boolean bypassExtAttrProvider) throws IOException;
>   /**
>* List the statuses of the files/directories in the given path if the path 
> is
>* a directory.
>* 
>* Does not guarantee to return the List of files/directories status in a
>* sorted order.
>* 
>* Will not return null. Expect IOException upon access error.
>* @param f
>* @param bypassExtAttrProvider if true, bypass external attr provider
>*when it's in use.
>* @return
>* @throws FileNotFoundException
>* @throws IOException
>*/
>   public FileStatus[] listStatus(Path f,
>   final boolean bypassExtAttrProvider) throws FileNotFoundException,
>   IOException;
> {code}
> So when bypassExtAttrProvider is true, external attribute provider will be 
> bypassed.
> Thanks.






[jira] [Created] (HDFS-12296) Add a field to FsServerDefaults to tell if external attribute provider is enabled

2017-08-13 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12296:


 Summary: Add a field to FsServerDefaults to tell if external
attribute provider is enabled
 Key: HDFS-12296
 URL: https://issues.apache.org/jira/browse/HDFS-12296
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang









[jira] [Created] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr

2017-08-12 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12295:


 Summary: NameNode to support file path prefix 
/.reserved/bypassExtAttr
 Key: HDFS-12295
 URL: https://issues.apache.org/jira/browse/HDFS-12295
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs, namenode
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


Let NameNode support the prefix /.reserved/bypassExtAttr, so a client can add
this prefix to a path before calling getFileStatus, e.g. /a/b/c becomes
/.reserved/bypassExtAttr/a/b/c. The NN will parse the path at the very
beginning, and bypass the external attribute provider if the prefix is there.
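
A minimal, self-contained sketch of the prefix handling (class and method
names are illustrative only, not the actual patch):

{code}
public class BypassPrefixParser {
  static final String BYPASS_PREFIX = "/.reserved/bypassExtAttr";

  /** True if the client asked to bypass the external attribute provider. */
  static boolean shouldBypass(String path) {
    return path.equals(BYPASS_PREFIX) || path.startsWith(BYPASS_PREFIX + "/");
  }

  /** Strip the reserved prefix, returning the real path to resolve. */
  static String stripPrefix(String path) {
    if (!shouldBypass(path)) {
      return path;
    }
    String rest = path.substring(BYPASS_PREFIX.length());
    return rest.isEmpty() ? "/" : rest;
  }

  public static void main(String[] args) {
    String in = "/.reserved/bypassExtAttr/a/b/c";
    // Prints: true /a/b/c
    System.out.println(shouldBypass(in) + " " + stripPrefix(in));
  }
}
{code}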








[jira] [Created] (HDFS-12294) Support /.reserved/bypassExtAttr prefix to file path so getFileStatus etc can bypass external attribute provider

2017-08-12 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12294:


 Summary: Support /.reserved/bypassExtAttr prefix to file path so 
getFileStatus etc can bypass external attribute provider
 Key: HDFS-12294
 URL: https://issues.apache.org/jira/browse/HDFS-12294
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


This is an alternative solution for HDFS-12202, which proposed introducing a
new set of APIs with an additional boolean parameter bypassExtAttrProvider, to
let the NN bypass the external attribute provider in getFileStatus. The goal
is to keep distcp from copying attributes from one cluster's external
attribute provider and saving them to another cluster's fsimage.

The solution here is, instead of having an additional parameter, to encode the
parameter in the path itself: when getFileStatus (and some other calls) is
invoked, the NN will parse the path and figure out whether the external
attribute provider needs to be bypassed.

Thanks much to [~andrew.wang] for this suggestion. The scope of the change is
smaller, and we don't have to change the FileSystem APIs.






 






[jira] [Created] (HDFS-12202) Provide new set of FileSystem API to bypass external attribute provider

2017-07-26 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12202:


 Summary: Provide new set of FileSystem API to bypass external 
attribute provider
 Key: HDFS-12202
 URL: https://issues.apache.org/jira/browse/HDFS-12202
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs, hdfs-client
Reporter: Yongjun Zhang


HDFS client uses 

{code}
  /**
   * Return a file status object that represents the path.
   * @param f The path we want information from
   * @return a FileStatus object
   * @throws FileNotFoundException when the path does not exist
   * @throws IOException see specific implementation
   */
  public abstract FileStatus getFileStatus(Path f) throws IOException;

  /**
   * List the statuses of the files/directories in the given path if the path is
   * a directory.
   * 
   * Does not guarantee to return the List of files/directories status in a
   * sorted order.
   * 
   * Will not return null. Expect IOException upon access error.
   * @param f given path
   * @return the statuses of the files/directories in the given patch
   * @throws FileNotFoundException when the path does not exist
   * @throws IOException see specific implementation
   */
  public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException,
 IOException;

{code}
to get FileStatus of files.

When an external attribute provider (INodeAttributeProvider) is enabled for a
cluster, the external attribute provider is consulted to get back some
relevant info (including ACL, group, etc.), which is returned in the
FileStatus.

There is a problem here: when we use distcp to copy files from srcCluster to
tgtCluster, if srcCluster has an external attribute provider enabled, the data
we copied would contain data from the attribute provider, which we may not
want.

Creating this jira to add a new set of interfaces for distcp to use, so that
distcp can copy HDFS data only and bypass the external attribute provider data.

The new set API would look like
{code}
 /**
   * Return a file status object that represents the path.
   * @param f The path we want information from
   * @param bypassExtAttrProvider if true, bypass external attr provider
   *when it's in use.
   * @return a FileStatus object
   * @throws FileNotFoundException when the path does not exist
   * @throws IOException see specific implementation
   */
  public FileStatus getFileStatus(Path f,
  final boolean bypassExtAttrProvider) throws IOException;

  /**
   * List the statuses of the files/directories in the given path if the path is
   * a directory.
   * 
   * Does not guarantee to return the List of files/directories status in a
   * sorted order.
   * 
   * Will not return null. Expect IOException upon access error.
   * @param f
   * @param bypassExtAttrProvider if true, bypass external attr provider
   *when it's in use.
   * @return
   * @throws FileNotFoundException
   * @throws IOException
   */
  public FileStatus[] listStatus(Path f,
  final boolean bypassExtAttrProvider) throws FileNotFoundException,
  IOException;
{code}

So when bypassExtAttrProvider is true, external attribute provider will be 
bypassed.

Thanks.







[jira] [Created] (HDFS-12190) Enable 'hdfs dfs -stat' to display access time

2017-07-21 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12190:


 Summary: Enable 'hdfs dfs -stat' to display access time
 Key: HDFS-12190
 URL: https://issues.apache.org/jira/browse/HDFS-12190
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, shell
Reporter: Yongjun Zhang


"hdfs dfs -stat" currently only can show modification time of a file but not 
access time. Sometimes it's useful to show access time. 
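
For illustration (the %x specifier below is an assumed name for the proposed
behavior, not an existing flag; %y is the current modification-time specifier):

{code}
# Current behavior: %y/%Y expose modification time only.
hdfs dfs -stat "%y" /tmp/test.txt
# Proposed (specifier name assumed): also expose access time.
hdfs dfs -stat "%x" /tmp/test.txt
{code}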








[jira] [Created] (HDFS-12162) Update listStatus document to describe the behavior when the argument is a file

2017-07-19 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12162:


 Summary: Update listStatus document to describe the behavior when 
the argument is a file
 Key: HDFS-12162
 URL: https://issues.apache.org/jira/browse/HDFS-12162
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, httpfs
Reporter: Yongjun Zhang









[jira] [Created] (HDFS-12139) liststatus returns incorrect pathSuffix for path of file

2017-07-13 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-12139:


 Summary: liststatus returns incorrect pathSuffix for path of file
 Key: HDFS-12139
 URL: https://issues.apache.org/jira/browse/HDFS-12139
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


Per the following logs, we can see that liststatus returns the same pathSuffix
"test.txt" for /tmp/yj/yj1 and /tmp/yj/yj1/test.txt, which is wrong. The
pathSuffix for the latter should be empty.

[thost ~]$ hadoop fs -copyFromLocal test.txt /tmp/yj/yj1
[thost ~]$ curl "http://thost.x.y:14000/webhdfs/v1/tmp/yj/yj1?op=LISTSTATUS&user.name=tuser"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"test.txt","type":"FILE","length":16,"owner":"tuser","group":"supergroup","permission":"644","accessTime":157684989,"modificationTime":157685286,"blockSize":134217728,"replication":3}]}}
[thost ~]$ curl "http://thost.x.y:14000/webhdfs/v1/tmp/yj/yj1/test.txt?op=LISTSTATUS&user.name=tuser"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"test.txt","type":"FILE","length":16,"owner":"tuser","group":"supergroup","permission":"644","accessTime":157684989,"modificationTime":157685286,"blockSize":134217728,"replication":3}]}}







[jira] [Created] (HDFS-11976) Examine code base for cases that exception is thrown from finally block and fix it

2017-06-14 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11976:


 Summary: Examine code base for cases that exception is thrown from 
finally block and fix it
 Key: HDFS-11976
 URL: https://issues.apache.org/jira/browse/HDFS-11976
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


If exception X is thrown in the try block, and exception Y is thrown in the
finally block, X will be swallowed.

In addition, a finally block is generally used to ensure resources are
released properly. If we throw an exception from there, some resources may be
leaked, so it's not recommended to throw exceptions from a finally block.

I caught one such case today and reported HDFS-11974; creating this jira as a
master one to catch other similar cases.
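
A tiny self-contained demo of the swallowing behavior described above:

{code}
public class FinallySwallow {
  public static void main(String[] args) {
    try {
      try {
        throw new RuntimeException("X: the real failure");
      } finally {
        // This replaces X entirely; X is never seen by the caller.
        throw new IllegalStateException("Y: thrown from finally");
      }
    } catch (Exception e) {
      System.out.println(e.getMessage()); // prints Y; X is swallowed
    }
  }
}
{code}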







[jira] [Created] (HDFS-11974) Fsimage transfer failed due to socket timeout, but logs don't show that

2017-06-14 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11974:


 Summary: Fsimage transfer failed due to socket timeout, but logs
don't show that
 Key: HDFS-11974
 URL: https://issues.apache.org/jira/browse/HDFS-11974
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


The idea of HDFS-11914 is to add more diagnosis information to understand what
happened when we saw

{code}
WARN org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
File http://x.y.z:50070/imagetransfer?getimage=1&txid=latest received length
xyz is not of the advertised size abc.
{code}

After further study, I realized that the above exception is thrown in the
{{finally}} block of the {{TransferFsImage#receiveFile}} method, so any other
exception thrown in the main code path is not reported, such as a
SocketTimeoutException.

We should include the information of any exception thrown in the main code
path when throwing an exception in the {{finally}} block.
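
A self-contained sketch of one possible fix direction, using addSuppressed so
the root cause (e.g. a SocketTimeoutException) stays visible; this is
illustrative only, not the actual patch:

{code}
public class PreserveRootCause {
  public static void main(String[] args) {
    try {
      Exception primary = null;
      try {
        // Stand-in for the real failure in the main code path.
        throw new java.net.SocketTimeoutException("Read timed out");
      } catch (Exception e) {
        primary = e;
        throw new RuntimeException(e);
      } finally {
        RuntimeException sizeCheck = new RuntimeException(
            "File received length xyz is not of the advertised size abc.");
        if (primary != null) {
          sizeCheck.addSuppressed(primary); // keep the root cause attached
        }
        throw sizeCheck; // would otherwise swallow the timeout entirely
      }
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
      for (Throwable s : e.getSuppressed()) {
        System.out.println("Suppressed: " + s); // the SocketTimeoutException
      }
    }
  }
}
{code}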









[jira] [Created] (HDFS-11938) Logs for KMS delegation token lifecycle

2017-06-06 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11938:


 Summary: Logs for KMS delegation token lifecycle
 Key: HDFS-11938
 URL: https://issues.apache.org/jira/browse/HDFS-11938
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang


We have run into quite a few customer cases about authentication failures
related to KMS delegation tokens. It would be nice to have a log for each
stage of the token's lifecycle:
1. creation
2. renewal
3. removal upon cancellation
4. removal upon expiration
So that when we correlate the logs for the same DT, we can get a good picture
of what's going on, and what could have caused the authentication failure.

The same is applicable to other delegation tokens.

NOTE: when logging info about a delegation token, we don't want to leak the
user's secret info.
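
A minimal sketch of what such lifecycle logging could look like (class and
method names are hypothetical; only identifying metadata is logged, never the
secret):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TokenLifecycleLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(TokenLifecycleLogger.class);

  public void onCreated(long seqNum, String owner, long expiryMs) {
    LOG.info("KMS DT created: seq={}, owner={}, expiry={}",
        seqNum, owner, expiryMs);
  }

  public void onRenewed(long seqNum, long newExpiryMs) {
    LOG.info("KMS DT renewed: seq={}, newExpiry={}", seqNum, newExpiryMs);
  }

  public void onRemoved(long seqNum, String reason) { // "canceled" / "expired"
    LOG.info("KMS DT removed ({}): seq={}", reason, seqNum);
  }
}
{code}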







[jira] [Created] (HDFS-11914) Add more diagnosis info for fsimage transfer failure.

2017-06-02 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11914:


 Summary: Add more diagnosis info for fsimage transfer failure.
 Key: HDFS-11914
 URL: https://issues.apache.org/jira/browse/HDFS-11914
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


Hit an fsimage download problem:

The client tries to download the fsimage, and gets:

 WARN org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
File http://x.y.z:50070/imagetransfer?getimage=1&txid=latest received length
xyz is not of the advertised size abc.

Basically the client does not get enough fsimage data and finishes prematurely
without any exception thrown, until it finds that the size of the data
received is smaller than expected. The client then closes the connection to
the NN, which causes the NN to report

INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Connection closed
by client

This jira is to add some more information to the logs to help debug the
situation. Specifically: report the stack trace when the connection is closed,
how much data has been sent at that point, etc.
 






Re: Jenkins test failure

2017-05-17 Thread Yongjun Zhang
Thanks a lot Akira! Good catch!

--Yongjun

On Wed, May 17, 2017 at 3:46 PM, Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Yongjun,
>
> Jenkins selects the latest attachment for precommit job regardless of the
> type of the attachment.
>
> The workaround is to attach the patch again.
>
> Regards,
> Akira
>
> On 2017/05/17 18:38, Yongjun Zhang wrote:
>
>> Hi,
>>
>> I saw quite a few jenkins test failure for patches uploaded to jira,
>>
>> For example,
>> https://builds.apache.org/job/PreCommit-HADOOP-Build/12347/console
>>
>> apache-yetus-2971eff/yetus-project/pom.xml
>> Modes:  Sentinel  MultiJDK  Jenkins  Robot  Docker  ResetRepo  UnitTests
>> Processing: HADOOP-14407
>> HADOOP-14407 patch is being downloaded at Wed May 17 19:38:33 UTC 2017
>> from
>>   https://issues.apache.org/jira/secure/attachment/12868556/
>> TotalTime-vs-CopyBufferSize.jpg
>> -> Downloaded
>> ERROR: Unsure how to process HADOOP-14407.
>>
>>
>> Wonder if anyone can help?
>>
>>
>> Thanks.
>>
>>
>> --Yongjun
>>
>>


Jenkins test failure

2017-05-17 Thread Yongjun Zhang
Hi,

I saw quite a few Jenkins test failures for patches uploaded to JIRA.

For example,
https://builds.apache.org/job/PreCommit-HADOOP-Build/12347/console

apache-yetus-2971eff/yetus-project/pom.xml
Modes:  Sentinel  MultiJDK  Jenkins  Robot  Docker  ResetRepo  UnitTests
Processing: HADOOP-14407
HADOOP-14407 patch is being downloaded at Wed May 17 19:38:33 UTC 2017 from
  
https://issues.apache.org/jira/secure/attachment/12868556/TotalTime-vs-CopyBufferSize.jpg
-> Downloaded
ERROR: Unsure how to process HADOOP-14407.


Wonder if anyone can help?


Thanks.


--Yongjun


[jira] [Created] (HDFS-11799) Introduce a config to allow setting up write pipeline with single node

2017-05-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11799:


 Summary: Introduce a config to allow setting up write pipeline 
with single node
 Key: HDFS-11799
 URL: https://issues.apache.org/jira/browse/HDFS-11799
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


During pipeline recovery, if not enough DNs can be found and
dfs.client.block.write.replace-datanode-on-failure.best-effort
is enabled, we let the pipeline continue even with a single DN.

Similarly, when we create the write pipeline initially, if for some reason we
can't find enough DNs, we could have a similar config to enable writing with a
single DN.

More study will be done.
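
For illustration, the existing recovery-time switch and a hypothetical new
setup-time switch in hdfs-site.xml (the second property name is an assumption,
not a committed config):

{code}
<!-- Existing: best-effort replacement during pipeline *recovery*. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>

<!-- Hypothetical (name assumed): allow *initial* pipeline setup with one DN. -->
<property>
  <name>dfs.client.block.write.single-datanode-on-setup.enabled</name>
  <value>true</value>
</property>
{code}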
 






[jira] [Created] (HDFS-11706) Enable fallback to regular distcp when distcp failed with snapshot diff

2017-04-26 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11706:


 Summary: Enable fallback to regular distcp when distcp failed with 
snapshot diff
 Key: HDFS-11706
 URL: https://issues.apache.org/jira/browse/HDFS-11706
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang


When snapshot-based distcp (-diff) failed, it used to fall back to regular
distcp. However, the fallback was disabled by HDFS-10313, for a couple of
reasons:

# Safety. For example, if the user passed a wrong parameter to the command
(especially a snapshot name), the sync step could fail.
# -diff doesn't allow the -delete option, which means that even if we fall
back to regular distcp, distcp doesn't know whether -delete should be applied.

There are two possible approaches to solve this problem:
* introduce a new command line switch to tell the fallback run whether to
enable -delete
* let the command line option parser remember whether -delete was passed
initially. If -delete was passed, disable -delete when -diff is passed, then
re-enable -delete on fallback.

This jira is to implement one of these approaches. This applies to -rdiff too.
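
For illustration, the two modes involved (paths are placeholders):

{code}
# Snapshot-diff based sync; -diff requires -update and forbids -delete.
hadoop distcp -update -diff s1 s2 hdfs://src/dir hdfs://tgt/dir

# Regular distcp; here -delete is allowed, so a fallback run needs to know
# whether the user originally wanted deletion semantics.
hadoop distcp -update -delete hdfs://src/dir hdfs://tgt/dir
{code}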







[jira] [Created] (HDFS-11656) RetryInvocationHandler may report ANN as SNN in messages.

2017-04-14 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11656:


 Summary: RetryInvocationHandler may report ANN as SNN in messages.
 Key: HDFS-11656
 URL: https://issues.apache.org/jira/browse/HDFS-11656
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


When multiple threads use the same DFSClient to make RPC calls, they may
report an incorrect NN host name in messages like

 INFO [pool-3-thread-13] retry.RetryInvocationHandler
(RetryInvocationHandler.java:invoke(148)) - Exception while invoking delete of
class ClientNamenodeProtocolTranslatorPB over
hdpb-nn0001.prn.parsec.apple.com/*a.b.c.d*:8020. Trying to fail over
immediately.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
Operation category WRITE is not supported in state standby. Visit
https://s.apache.org/sbnn-error

where *a.b.c.d* is the active NN, which confuses the user into thinking that
failover is not behaving correctly.

The reason is that the ProxyDescriptor data field of RetryInvocationHandler
may be shared by multiple threads that do the RPC calls; the failover done by
one thread may be visible to other threads when reporting the above kind of
message.

As an example,
# multiple threads start with the same SNN to do the call,
# all threads discover that a failover is needed,
# thread X fails over first, and changes the ProxyDescriptor's proxyInfo to
the ANN,
# the other threads report the above message with the proxyInfo changed by
thread X, and so report the ANN instead of the SNN in the message.

Some details:

RetryInvocationHandler does the following when failing over:
{code}
  synchronized void failover(long expectedFailoverCount, Method method,
      int callId) {
    // Make sure that concurrent failed invocations only cause a single
    // actual failover.
    if (failoverCount == expectedFailoverCount) {
      fpp.performFailover(proxyInfo.proxy);
      failoverCount++;
    } else {
      LOG.warn("A failover has occurred since the start of call #" + callId
          + " " + proxyInfo.getString(method.getName()));
    }
    proxyInfo = fpp.getProxy();
  }
{code}
and changed the proxyInfo in the ProxyDescriptor.

While the log method below reports the message with the ProxyDescriptor's proxyInfo:
{code}
  private void log(final Method method, final boolean isFailover,
      final int failovers, final long delay, final Exception ex) {
    ..
    final StringBuilder b = new StringBuilder()
        .append(ex + ", while invoking ")
        .append(proxyDescriptor.getProxyInfo().getString(method.getName()));
    if (failovers > 0) {
      b.append(" after ").append(failovers).append(" failover attempts");
    }
    b.append(isFailover? ". Trying to failover ": ". Retrying ");
    b.append(delay > 0? "after sleeping for " + delay + "ms.": "immediately.");
{code}
and so does the {{handleException}} method:
{code}
    if (LOG.isDebugEnabled()) {
      LOG.debug("Exception while invoking call #" + callId + " "
          + proxyDescriptor.getProxyInfo().getString(method.getName())
          + ". Not retrying because " + retryInfo.action.reason, e);
    }
{code}

And in FailoverProxyProvider:
{code}
  public String getString(String methodName) {
    return proxy.getClass().getSimpleName() + "." + methodName
        + " over " + proxyInfo;
  }

  @Override
  public String toString() {
    return proxy.getClass().getSimpleName() + " over " + proxyInfo;
  }
{code}
 







[jira] [Created] (HDFS-11385) DistCp CopyCommitter should issue warning instead of throw exception when failed to delete file.

2017-02-01 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11385:


 Summary: DistCp CopyCommitter should issue warning instead of 
throw exception when failed to delete file.
 Key: HDFS-11385
 URL: https://issues.apache.org/jira/browse/HDFS-11385
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Reporter: Yongjun Zhang


CopyCommitter#deleteMissing(Configuration conf) does the following in a loop
for all files that need to be deleted:

{code}
if (result) {
  LOG.info("Deleted " + trgtFileStatus.getPath() + " - Missing at source");
  deletedEntries++;
} else {
  throw new IOException("Unable to delete " + trgtFileStatus.getPath());
}
{code}

If for some reason there is a failure to delete a file, an exception is thrown
and no other files will be deleted.

It seems more reasonable to issue a warning here instead of throwing an
exception.
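
A sketch of the proposed behavior (not a committed patch; the failedDeletes
counter is hypothetical):

{code}
if (result) {
  LOG.info("Deleted " + trgtFileStatus.getPath() + " - Missing at source");
  deletedEntries++;
} else {
  // Warn and keep going instead of aborting the whole delete pass.
  LOG.warn("Unable to delete " + trgtFileStatus.getPath()
      + "; continuing with the remaining files");
  failedDeletes++; // hypothetical counter, e.g. for a summary at the end
}
{code}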







Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-24 Thread Yongjun Zhang
Thanks much, Andrew, for the work here!

+1 (binding).

- Downloaded both binary and src tarballs
- Verified md5 checksum and signature for both
- Built from source tarball
- Deployed 2 pseudo clusters, one with the released tarball and the other
  with what I built from source, and did the following on both:
  - Run basic HDFS operations, snapshots and distcp jobs
  - Run pi job
  - Examined HDFS webui, YARN webui.

 Best,

 --Yongjun


On Tue, Jan 24, 2017 at 3:56 PM, Eric Badger 
wrote:

> +1 (non-binding)
> - Verified signatures and md5
> - Built from source
> - Started single-node cluster on my mac
> - Ran some sleep jobs
> Eric
>
> On Tuesday, January 24, 2017 4:32 PM, Yufei Gu 
> wrote:
>
>
>  Hi Andrew,
>
> Thanks for working on this.
>
> +1 (Non-Binding)
>
> 1. Downloaded the binary and verified the md5.
> 2. Deployed it on 3 node cluster with 1 ResourceManager and 2 NodeManager.
> 3. Set YARN to use Fair Scheduler.
> 4. Ran MapReduce jobs Pi
> 5. Verified Hadoop version command output is correct.
>
> Best,
>
> Yufei
>
> On Tue, Jan 24, 2017 at 3:02 AM, Marton Elek 
> wrote:
>
> > > minicluster is kind of weird on filesystems that don't support mixed
> > case, like OS X's default HFS+.
> > >
> > > $ jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | grep -i license
> > > LICENSE.txt
> > > license/
> > > license/LICENSE
> > > license/LICENSE.dom-documentation.txt
> > > license/LICENSE.dom-software.txt
> > > license/LICENSE.sax.txt
> > > license/NOTICE
> > > license/README.dom.txt
> > > license/README.sax.txt
> > > LICENSE
> > > Grizzly_THIRDPARTYLICENSEREADME.txt
> >
> >
> > I added a patch to https://issues.apache.org/jira/browse/HADOOP-14018 to
> > add the missing META-INF/LICENSE.txt to the shaded files.
> >
> > Question: what should be done with the other LICENSE files in the
> > minicluster. Can we just exclude them (from legal point of view)?
> >
> > Regards,
> > Marton
> >
> >
> >
>
>
>


[jira] [Created] (HDFS-11292) log lastWrittenTxId in logSyncAll

2017-01-04 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11292:


 Summary: log lastWrittenTxId in logSyncAll
 Key: HDFS-11292
 URL: https://issues.apache.org/jira/browse/HDFS-11292
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Yongjun Zhang


For the issue reported in HDFS-10943, even after HDFS-7964's fix is included,
the problem still exists; this means there might be some synchronization
issue.

To diagnose that, creating this jira to report the lastWrittenTxId info in the
{{logSyncAll()}} call, such that we can compare it against the error message
reported in HDFS-7964.

Specifically, there are two possibilities for the HDFS-10943 issue:

1. {{logSyncAll()}} (statement A in the code quoted below) doesn't flush all
requested txs for some reason.

2. {{logSyncAll()}} does flush all requested txs, but some new txs sneaked in
between A and B. It's observed that the lastWrittenTxId in B and C are the
same.

This proposed reporting would help confirm whether case 2 is true.

{code}
  public synchronized void endCurrentLogSegment(boolean writeEndTxn) {
    LOG.info("Ending log segment " + curSegmentTxId);
    Preconditions.checkState(isSegmentOpen(),
        "Bad state: %s", state);

    if (writeEndTxn) {
      logEdit(LogSegmentOp.getInstance(cache.get(),
          FSEditLogOpCodes.OP_END_LOG_SEGMENT));
    }
    // always sync to ensure all edits are flushed.
A.  logSyncAll();

B.  printStatistics(true);

    final long lastTxId = getLastWrittenTxId();

    try {
C.    journalSet.finalizeLogSegment(curSegmentTxId, lastTxId);
      editLogStream = null;
    } catch (IOException e) {
      //All journals have failed, it will be handled in logSync.
    }

    state = State.BETWEEN_LOG_SEGMENTS;
  }
{code}







[jira] [Created] (HDFS-11040) Add documentation for HDFS-9820 distcp improvement

2016-10-20 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-11040:


 Summary: Add documentation for HDFS-9820 distcp improvement
 Key: HDFS-11040
 URL: https://issues.apache.org/jira/browse/HDFS-11040
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang









[jira] [Created] (HDFS-10993) rename may fail without a clear message indicating the failure reason.

2016-10-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10993:


 Summary: rename may fail without a clear message indicating the 
failure reason.
 Key: HDFS-10993
 URL: https://issues.apache.org/jira/browse/HDFS-10993
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Currently the FSDirRenameOp#unprotectedRenameTo method looks like:
{code}
  static INodesInPath unprotectedRenameTo(FSDirectory fsd,
      final INodesInPath srcIIP, final INodesInPath dstIIP, long timestamp)
      throws IOException {
    assert fsd.hasWriteLock();
    final INode srcInode = srcIIP.getLastINode();
    try {
      validateRenameSource(fsd, srcIIP);
    } catch (SnapshotException e) {
      throw e;
    } catch (IOException ignored) {
      return null;
    }

    String src = srcIIP.getPath();
    String dst = dstIIP.getPath();
    // validate the destination
    if (dst.equals(src)) {
      return dstIIP;
    }

    try {
      validateDestination(src, dst, srcInode);
    } catch (IOException ignored) {
      return null;
    }

    if (dstIIP.getLastINode() != null) {
      NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
          "failed to rename " + src + " to " + dst + " because destination " +
          "exists");
      return null;
    }
    INode dstParent = dstIIP.getINode(-2);
    if (dstParent == null) {
      NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
          "failed to rename " + src + " to " + dst + " because destination's " +
          "parent does not exist");
      return null;
    }

    fsd.ezManager.checkMoveValidity(srcIIP, dstIIP, src);
    // Ensure dst has quota to accommodate rename
    verifyFsLimitsForRename(fsd, srcIIP, dstIIP);
    verifyQuotaForRename(fsd, srcIIP, dstIIP);

    RenameOperation tx = new RenameOperation(fsd, srcIIP, dstIIP);

    boolean added = false;

    INodesInPath renamedIIP = null;
    try {
      // remove src
      if (!tx.removeSrc4OldRename()) {
        return null;
      }

      renamedIIP = tx.addSourceToDestination();
      added = (renamedIIP != null);
      if (added) {
        if (NameNode.stateChangeLog.isDebugEnabled()) {
          NameNode.stateChangeLog.debug("DIR* FSDirectory" +
              ".unprotectedRenameTo: " + src + " is renamed to " + dst);
        }

        tx.updateMtimeAndLease(timestamp);
        tx.updateQuotasInSourceTree(fsd.getBlockStoragePolicySuite());

        return renamedIIP;
      }
    } finally {
      if (!added) {
        tx.restoreSource();
      }
    }
    NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
        "failed to rename " + src + " to " + dst);
    return null;
  }
{code}

There are several places that return null without a clear message. Though that
seems to be on purpose in the code, it leaves the user to guess what's going
on.

It seems to make sense to log a warning for each failed scenario.







Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-07 Thread Yongjun Zhang
Hi Sangjin,

Thanks a lot for your work here.

My +1 (binding).

- Downloaded both binary and src tarballs
- Verified md5 checksum and signature for both
- Build from source tarball
- Deployed 2 pseudo clusters, one with the released tarball and the other
with what I built from source, and did the following on both:
- Run basic HDFS operations, and distcp jobs
- Run pi job
- Examined HDFS webui, YARN webui.

Best,

--Yongjun



On Fri, Oct 7, 2016 at 5:08 PM, Sangjin Lee  wrote:

> I'm casting my vote: +1 (binding)
>
> Regards,
> Sangjin
>
> On Fri, Oct 7, 2016 at 3:12 PM, Andrew Wang 
> wrote:
>
> > Thanks to Chris and Sangjin for working on this release.
> >
> > +1 binding
> >
> > * Verified signatures
> > * Built from source tarball
> > * Started HDFS and did some basic ops
> >
> > Thanks,
> > Andrew
> >
> > On Fri, Oct 7, 2016 at 2:50 PM, Wangda Tan  wrote:
> >
> > > Thanks Sangjin for cutting this release!
> > >
> > > +1 (Binding)
> > >
> > > - Downloaded binary tar ball and setup a single node cluster.
> > > - Submit a few applications and which can successfully run.
> > >
> > > Thanks,
> > > Wangda
> > >
> > >
> > > On Fri, Oct 7, 2016 at 10:33 AM, Zhihai Xu 
> > wrote:
> > >
> > > > Thanks Sangjin for creating release 2.6.5 RC1.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > * Downloaded and built from source
> > > > * Verified md5 checksums and signature
> > > > * Deployed a pseudo cluster
> > > > * verified basic HDFS operations and Pi job.
> > > > * Did a sanity check for RM and NM UI.
> > > >
> > > > Thanks
> > > > zhihai
> > > >
> > > > On Fri, Oct 7, 2016 at 8:16 AM, Sangjin Lee 
> wrote:
> > > >
> > > > > Thanks Masatake!
> > > > >
> > > > > Today's the last day for this vote, and I'd like to ask you to try
> > out
> > > > the
> > > > > RC and vote on this today. So far there has been no binding vote.
> > > Thanks
> > > > > again.
> > > > >
> > > > > Regards,
> > > > > Sangjin
> > > > >
> > > > > On Fri, Oct 7, 2016 at 6:45 AM, Masatake Iwasaki <
> > > > > iwasak...@oss.nttdata.co.jp> wrote:
> > > > >
> > > > > > +1(non-binding)
> > > > > >
> > > > > > * verified signature and md5.
> > > > > > * built with -Pnative on CentOS6 and OpenJDK7.
> > > > > > * built documentation and skimmed the contents.
> > > > > > * built rpms by bigtop and ran smoke-tests of hdfs, yarn and
> > > mapreduce
> > > > on
> > > > > > 3-node cluster.
> > > > > >
> > > > > > Thanks,
> > > > > > Masatake Iwasaki
> > > > > >
> > > > > > On 10/3/16 09:12, Sangjin Lee wrote:
> > > > > >
> > > > > >> Hi folks,
> > > > > >>
> > > > > >> I have pushed a new release candidate (R1) for the Apache Hadoop
> > > 2.6.5
> > > > > >> release (the next maintenance release in the 2.6.x release
> line).
> > > RC1
> > > > > >> contains fixes to CHANGES.txt, and is otherwise identical to
> RC0.
> > > > > >>
> > > > > >> Below are the details of this release candidate:
> > > > > >>
> > > > > >> The RC is available for validation at:
> > > > > >> http://home.apache.org/~sjlee/hadoop-2.6.5-RC1/.
> > > > > >>
> > > > > >> The RC tag in git is release-2.6.5-RC1 and its git commit is
> > > > > >> e8c9fe0b4c252caf2ebf1464220599650f119997.
> > > > > >>
> > > > > >> The maven artifacts are staged via repository.apache.org at:
> > > > > >> https://repository.apache.org/content/repositories/
> > > > > orgapachehadoop-1050/.
> > > > > >>
> > > > > >> You can find my public key at
> > > > > >> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS.
> > > > > >>
> > > > > >> Please try the release and vote. The vote will run for the
> usual 5
> > > > > days. I
> > > > > >> would greatly appreciate your timely vote. Thanks!
> > > > > >>
> > > > > >> Regards,
> > > > > >> Sangjin
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (HDFS-10961) Flaky test TestSnapshotFileLength.testSnapshotfileLength

2016-10-04 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10961:


 Summary: Flaky test TestSnapshotFileLength.testSnapshotfileLength
 Key: HDFS-10961
 URL: https://issues.apache.org/jira/browse/HDFS-10961
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, hdfs-client
Affects Versions: 3.0
Reporter: Yongjun Zhang


Flaky test TestSnapshotFileLength.testSnapshotfileLength

{code}
Error Message
Unable to close file because the last block does not have enough number of 
replicas.
Stack Trace
java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2630)
at 
org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2592)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2546)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength(TestSnapshotFileLength.java:130)
{code}







[jira] [Created] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed

2016-09-30 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10943:


 Summary: rollEditLog expects empty EditsDoubleBuffer.bufCurrent 
which is not guaranteed
 Key: HDFS-10943
 URL: https://issues.apache.org/jira/browse/HDFS-10943
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


Per the following trace stack:
{code}
FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log 
segment 10562075963, 10562174157 failed for required journal 
(JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, 
0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid 
10562075963))
java.io.IOException: FSEditStream has 49708 bytes still to be flushed and 
cannot be closed.
at 
org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
at 
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
at 
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2016-09-23 21:40:59,618 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
QuorumOutputStream starting at txid 10562075963
{code}

The exception is from  EditsDoubleBuffer
{code}
 public void close() throws IOException {
Preconditions.checkNotNull(bufCurrent);
Preconditions.checkNotNull(bufReady);

int bufSize = bufCurrent.size();
if (bufSize != 0) {
  throw new IOException("FSEditStream has " + bufSize
  + " bytes still to be flushed and cannot be closed.");
}

IOUtils.cleanup(null, bufCurrent, bufReady);
bufCurrent = bufReady = null;
  }
{code}

We can see that FSNamesystem.rollEditLog expects EditsDoubleBuffer.bufCurrent
to be empty.

Edits are recorded via FSEditLog$logSync, which does:
{code}
   * The data is double-buffered within each edit log implementation so that
   * in-memory writing can occur in parallel with the on-disk writing.
   *
   * Each sync occurs in three steps:
   *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
   *  flag.
   *   2. unsynchronized, it flushes the data to storage
   *   3. synchronized, it resets the flag and notifies anyone waiting on the
   *  sync.
   *
   * The lack of synchronization on step 2 allows other threads to continue
   * to write into the memory buffer while the sync is in progress.
   * Because this step is unsynchronized, actions that need to avoid
   * concurrency with sync() should be synchronized and also call
   * waitForSyncToFinish() before assuming they are running alone.
   */
{code}

We can see that step 2 is deliberately not synchronized, to let other threads
write into the memory buffer, presumably EditsDoubleBuffer.bufCurrent. This
means that EditsDoubleBuffer.bufCurrent can be non-empty when logSync is done.

Now if rollEditLog happens at that point, the above exception occurs.

Another interesting observation: the size of the EditsDoubleBuffer can be as
large as "private int outputBufferCapacity = 512 * 1024;", which means a lot
of edits could get buffered before they are flushed to the JNs.

How

[jira] [Created] (HDFS-10942) Incorrect handling of flushing edit logs to JN

2016-09-30 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10942:


 Summary: Incorrect handling of flushing edit logs to JN
 Key: HDFS-10942
 URL: https://issues.apache.org/jira/browse/HDFS-10942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: s, hdfs
Reporter: Yongjun Zhang


We use EditsDoubleBuffer to handle edit logs:
{code}
/**
 * A double-buffer for edits. New edits are written into the first buffer
 * while the second is available to be flushed. Each time the double-buffer
 * is flushed, the two internal buffers are swapped. This allows edits
 * to progress concurrently to flushes without allocating new buffers each
 * time.
 */
{code}

The following code, which flushes the ready buffer, copies the ready buffer to
a local copy, then flushes it:

{code}

QuorumOutputStream (buf in the code below is an instance of EditsDoubleBuffer):

  @Override
  protected void flushAndSync(boolean durable) throws IOException {
int numReadyBytes = buf.countReadyBytes();
if (numReadyBytes > 0) {
  int numReadyTxns = buf.countReadyTxns();
  long firstTxToFlush = buf.getFirstReadyTxId();

  assert numReadyTxns > 0;

  // Copy from our double-buffer into a new byte array. This is for
  // two reasons:
  // 1) The IPC code has no way of specifying to send only a slice of
  //a larger array.
  // 2) because the calls to the underlying nodes are asynchronous, we
  //need a defensive copy to avoid accidentally mutating the buffer
  //before it is sent.
  DataOutputBuffer bufToSend = new DataOutputBuffer(numReadyBytes);
  buf.flushTo(bufToSend);
  assert bufToSend.getLength() == numReadyBytes;
  byte[] data = bufToSend.getData();
  assert data.length == bufToSend.getLength();
{code}

The above call doesn't seem to prevent the original copy of the buffer inside
buf from being swapped by the following method.

{code}
EditsDoubleBuffer:

 public void setReadyToFlush() {
assert isFlushed() : "previous data not flushed yet";
TxnBuffer tmp = bufReady;
bufReady = bufCurrent;
bufCurrent = tmp;
  }

{code}

Though we have some runtime assertions in the code, assertions are not enabled
in production, so the condition expected by an assert may be false at runtime.
This could possibly cause a mess. When a condition is not as expected by the
assertion, it seems a real exception should be thrown instead.

So two issues in summary:
- How we synchronize between the flush and the swap of the two buffers
- Whether we should throw a real exception instead of using an assert that is
normally disabled at runtime (see the sketch below)
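
For the second point, a sketch of replacing the assert (a no-op unless the JVM
runs with -ea) with an explicit check that always fires:

{code}
public void setReadyToFlush() {
  if (!isFlushed()) {
    // Previously: assert isFlushed() : "previous data not flushed yet";
    throw new IllegalStateException("previous data not flushed yet");
  }
  TxnBuffer tmp = bufReady;
  bufReady = bufCurrent;
  bufCurrent = tmp;
}
{code}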








[jira] [Created] (HDFS-10919) Provide admin/debug tool to dump out info of a given block

2016-09-27 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10919:


 Summary: Provide admin/debug tool to dump out info of a given block
 Key: HDFS-10919
 URL: https://issues.apache.org/jira/browse/HDFS-10919
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs
Reporter: Yongjun Zhang


We have fsck to find out the blocks associated with a file, which is nice.
Sometimes, when we see trouble with a specific block, we'd like to collect
info about this block, such as:
* what file this block belongs to,
* where the replicas of this block are located,
* whether the block is EC coded;
* if the block is EC coded, whether it's a data block or a parity (coding)
block,
* if the block is EC coded, what the codec is,
* if the block is EC coded, what the block group is,
* for the block group, what the other blocks are.

Creating this jira to provide such a util, as a dfsadmin or debug tool.
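
For illustration, a hypothetical invocation (subcommand name and flag are
assumptions, nothing is committed yet); it would print the owning file,
replica locations, and EC codec/block-group info:

{code}
hdfs debug blockInfo -blockId blk_1073741825
{code}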

Thanks.








[jira] [Created] (HDFS-10887) Provide admin/debug tool to dump block map

2016-09-22 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10887:


 Summary: Provide admin/debug tool to dump block map
 Key: HDFS-10887
 URL: https://issues.apache.org/jira/browse/HDFS-10887
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


From time to time, when the NN restarts, we see
{code}
"The reported blocks X needs additional Y blocks to reach the threshold 0.9990
of total blocks Z. Safe mode will be turned off automatically."
{code}

We wonder what these blocks that still need block reports are, on what DNs
they could possibly be located, and what happened to those DNs.

This jira is to propose a new admin or debug tool to dump the block map info
for the blocks that have fewer than minRepl replicas.







[jira] [Created] (HDFS-10812) Intermittent failure of TestRBWBlockInvalidation.testRWRInvalidation

2016-08-27 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10812:


 Summary: Intermittent failure of
TestRBWBlockInvalidation.testRWRInvalidation
 Key: HDFS-10812
 URL: https://issues.apache.org/jira/browse/HDFS-10812
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Intermittent failure of TestRBWBlockInvalidation.testRWRInvalidation

{code}
Regression

org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testRWRInvalidation
Failing for the past 1 build (Since Unstable#16553 )
Took 8.9 sec.
Error Message

expected:<[old gs data
new gs data
]> but was:<[]>

Stacktrace

org.junit.ComparisonFailure: expected:<[old gs data
new gs data
]> but was:<[]>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testRWRInvalidation(TestRBWBlockInvalidation.java:225)
{code}

see 
https://builds.apache.org/job/PreCommit-HDFS-Build/16553/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestRBWBlockInvalidation/testRWRInvalidation/






[jira] [Created] (HDFS-10811) Intermittent failure of TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN

2016-08-27 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10811:


 Summary: Intermittent failure of 
TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN
 Key: HDFS-10811
 URL: https://issues.apache.org/jira/browse/HDFS-10811
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Intermittent failure of 
TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN

{code}
Regression

org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN
Failing for the past 1 build (Since Unstable#16553 )
Took 10 min.
Error Message

test timed out after 600000 milliseconds

Stacktrace

java.lang.Exception: test timed out after 600000 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN(TestRBWBlockInvalidation.java:120)

{code}

See:

https://builds.apache.org/job/PreCommit-HDFS-Build/16553/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestRBWBlockInvalidation/testBlockInvalidationWhenRBWReplicaMissedInDN/








[jira] [Resolved] (HDFS-10575) webhdfs fails with filenames including semicolons

2016-08-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10575.
--
Resolution: Duplicate

> webhdfs fails with filenames including semicolons
> -
>
> Key: HDFS-10575
> URL: https://issues.apache.org/jira/browse/HDFS-10575
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Bob Hansen
> Attachments: curl_request.txt, dfs_copyfrom_local_traffic.txt
>
>
> Via webhdfs or native HDFS, we can create files with semicolons in their 
> names:
> {code}
> bhansen@::1 /tmp$ hdfs dfs -copyFromLocal /tmp/data 
> "webhdfs://localhost:50070/foo;bar"
> bhansen@::1 /tmp$ hadoop fs -ls /
> Found 1 items
> -rw-r--r--   2 bhansen supergroup  9 2016-06-24 12:20 /foo;bar
> {code}
> Attempting to fetch the file via webhdfs fails:
> {code}
> bhansen@::1 /tmp$ curl -L 
> "http://localhost:50070/webhdfs/v1/foo%3Bbar?user.name=bhansen=OPEN;
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
>  does not exist: /foo\n\tat 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)\n\tat
>  
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)\n\tat
>  
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)\n"}}
> {code}
> It appears (from the attached TCP dump in curl_request.txt) that the 
> namenode's redirect unescapes the semicolon, and the DataNode's HTTP server 
> is splitting the request at the semicolon, and failing to find the file "foo".
> Interesting side notes:
> * In the attached dfs_copyfrom_local_traffic.txt, you can see the 
> copyFromLocal command writing the data to "foo;bar_COPYING_", which is then 
> redirected and just writes to "foo".  The subsequent rename attempts to 
> rename "foo;bar_COPYING_" to "foo;bar", but has the same parsing bug so 
> effectively renames "foo" to "foo;bar".
> Here is the full range of special characters that we initially started with 
> that led to the minimal reproducer above:
> {code}
> hdfs dfs -copyFromLocal /tmp/data webhdfs://localhost:50070/'~`!@#$%^& 
> ()-_=+|<.>]}",\\\[\{\*\?\;'\''data'
> curl -L 
> "http://localhost:50070/webhdfs/v1/%7E%60%21%40%23%24%25%5E%26+%28%29-_%3D%2B%7C%3C.%3E%5D%7D%22%2C%5C%5B%7B*%3F%3B%27data?user.name=bhansen=OPEN=0;
> {code}
> Thanks to [~anatoli.shein] for making a concise reproducer.






[jira] [Resolved] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10788.
--
Resolution: Duplicate

Thanks guys, I'm marking it as a duplicate of HDFS-9958.


> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?
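
A rough illustration of the requested behavior (hypothetical names; this is not 
the actual NamenodeFsck code) would be to catch the failure per file so fsck 
can name the culprit and honor -delete:

{code}
// Sketch only: guard the per-file check so one inconsistent block does
// not abort the whole scan with a bare "null".
try {
  check(path, fileStatus, res);            // assumed per-file fsck routine
} catch (NullPointerException npe) {
  out.println("fsck hit an internal error on " + path
      + "; blockMap and corruptReplicas map may be inconsistent");
  if (doDelete) {                          // honor -delete, as suggested
    deleteCorruptedFile(path);             // assumed helper
  }
}
{code}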






Re: Issue in handling checksum errors in write pipeline

2016-07-30 Thread Yongjun Zhang
Hi Brahma,

Thanks for reporting the issue.

If your problem is really a network issue, then your proposed solution
sounds reasonable to me, and it's different than what HDFS-6937 intends to
solve. I think we can create a new jira for your issue. Here is why:

HDFS-6937's scenario is that we keep replacing the third node in recovery
without detecting that the middle node is corrupt, so adding a corruption
check for the middle node would solve that issue. In your case, even if we
checked the middle node, it would appear as not corrupt. The problem is
that we don't have a check for network issues (and adding one may not be
feasible here).

On the other hand, if it's not a network issue, then it could be caused by
HDFS-4660, if you don't already have the fix.

Hope my explanation makes sense.

Thanks.

--Yongjun

On Sat, Jul 30, 2016 at 4:03 AM, Brahma Reddy Battula <
brahmareddy.batt...@huawei.com> wrote:

> Hello
>
>
> We came across an issue where a write fails even though 7 DNs are
> available, due to a network fault at one datanode which is LAST_IN_PIPELINE.
> It is similar to HDFS-6937.
>
> Scenario : (DN3 has N/W Fault and Min repl=2).
>
> Write pipeline:
> DN1->DN2->DN3  => DN3 Gives ERROR_CHECKSUM ack. And so DN2 marked as bad
> DN1->DN4-> DN3 => DN3 Gives ERROR_CHECKSUM ack. And so DN4 is marked as bad
> 
> And so on ( all the times DN3 is LAST_IN_PIPELINE) ... Continued till no
> more datanodes to construct the pipeline.
>
> We are thinking of handling it like below:
>
> Instead of throwing an IOException for an ERROR_CHECKSUM ack from
> downstream, we can send back the pipeline ack, and on the client side
> replace both DN2 and DN3 with new nodes, since we can't decide which one
> has the network problem.
>
>
> Please give your views on the possible fix.
>
>
> --Brahma Reddy Battula
>
>
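
To make the proposal concrete, a rough client-side sketch (assumed names, not
actual DataStreamer code) could look like this:

{code}
// Sketch only: on an ERROR_CHECKSUM ack from a downstream node, exclude
// both the reporting node and its upstream neighbor, since the client
// cannot tell which of the two has the faulty network link.
if (reply == Status.ERROR_CHECKSUM) {
  int i = ack.getFirstBadNodeIndex();   // assumed accessor
  failed.add(nodes[i]);                 // DN that reported the error
  if (i > 0) {
    failed.add(nodes[i - 1]);           // its upstream neighbor
  }
  // rebuild the pipeline with replacements for both excluded nodes
  setupPipelineForAppendOrRecovery();
}
{code}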


[jira] [Created] (HDFS-10698) Test org.apache.hadoop.cli.TestHDFSCLI fails in trunk

2016-07-27 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10698:


 Summary: Test org.apache.hadoop.cli.TestHDFSCLI fails in trunk
 Key: HDFS-10698
 URL: https://issues.apache.org/jira/browse/HDFS-10698
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


{code}
Running org.apache.hadoop.cli.TestHDFSCLI
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.887 sec <<< 
FAILURE! - in org.apache.hadoop.cli.TestHDFSCLI
testAll(org.apache.hadoop.cli.TestHDFSCLI)  Time elapsed: 39.697 sec  <<< 
FAILURE!
java.lang.AssertionError: One of the tests failed. See the Detailed results to 
identify the command that failed
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.cli.CLITestHelper.displayResults(CLITestHelper.java:263)
at org.apache.hadoop.cli.CLITestHelper.tearDown(CLITestHelper.java:125)
at org.apache.hadoop.cli.TestHDFSCLI.tearDown(TestHDFSCLI.java:87)


Results :

Failed tests:
  
TestHDFSCLI.tearDown:87->CLITestHelper.tearDown:125->CLITestHelper.displayResults:263
 One of the tests failed. See the Detailed results to identify the command that 
failed

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

{code}







[jira] [Created] (HDFS-10667) Report more accurate info about data corruption location

2016-07-21 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10667:


 Summary: Report more accurate info about data corruption location
 Key: HDFS-10667
 URL: https://issues.apache.org/jira/browse/HDFS-10667
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Reporter: Yongjun Zhang


Per 

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897

The DataNode at 10.6.129.77 reports:

{code}
2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest: 
/10.6.129.77:5080
2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
org.apache.hadoop.fs.ChecksumException: Checksum error: 
DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
at 
org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
Method)
at 
org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Exception for blk_1116167880_42906656
java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
Unexpected checksum mismatch while writing blk_1116167880_42906656 from 
/10.6.134.229:43844
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
{code}

and

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879

{quote}
While verifying only the packet, the position mentioned in the checksum 
exception is relative to the packet buffer offset, not the block offset. So 
the 81920 in the exception is a packet-relative offset.
{quote}

Creating this jira to report more accurate corruption location information: the 
offset in the file, the offset in the block, and the offset in the packet.

See 

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083
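
In other words, converting the packet-relative error position into the other 
two coordinates is simple arithmetic once the packet's and block's start 
offsets are known (a sketch with assumed variable names):

{code}
// Sketch only (assumed names): derive block- and file-relative offsets
// from the packet-relative error position reported today.
long offsetInPacket = errPos;
long offsetInBlock  = packetStartOffsetInBlock + offsetInPacket;
// only derivable where the block-to-file mapping is known:
long offsetInFile   = blockStartOffsetInFile + offsetInBlock;
{code}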







[jira] [Resolved] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption

2016-07-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10587.
--
   Resolution: Duplicate
Fix Version/s: 2.7.1, 2.6.4

Closing this jira as duplicate of HDFS-4660.


> Incorrect offset/length calculation in pipeline recovery causes block 
> corruption
> 
>
> Key: HDFS-10587
> URL: https://issues.apache.org/jira/browse/HDFS-10587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.7.1, 2.6.4
>
> Attachments: HDFS-10587-test.patch, HDFS-10587.001.patch
>
>
> We found incorrect offset and length calculation in pipeline recovery may 
> cause block corruption and results in missing blocks under a very unfortunate 
> scenario. 
> (1) A client established pipeline and started writing data to the pipeline.
> (2) One of the data node in the pipeline restarted, closing the socket, and 
> some written data were unacknowledged.
> (3) Client replaced the failed data node with a new one, initiating block 
> transfer to copy existing data in the block to the new datanode.
> (4) The block is transferred to the new node. Crucially, the entire block, 
> including the unacknowledged data, was transferred.
> (5) The last chunk (512 bytes) was not a full chunk, but the destination 
> still reserved the whole chunk in its buffer, and wrote the entire buffer to 
> disk, therefore some written data is garbage.
> (6) When the transfer was done, the destination data node converted the 
> replica from temporary to rbw, which set its visible length to the length of 
> bytes on disk. That is to say, it thought whatever was transferred was 
> acknowledged. However, the visible length of the replica is different (rounded 
> up to the next multiple of 512) from the source of the transfer. [1]
> (7) Client then truncated the block in the attempt to remove unacknowledged 
> data. However, because the visible length is equivalent of the bytes on disk, 
> it did not truncate unacknowledged data.
> (8) When new data was appended to the destination, it skipped the bytes 
> already on disk. Therefore, whatever was written as garbage was not replaced.
> (9) the volume scanner detected corrupt replica, but due to HDFS-10512, it 
> wouldn’t tell NameNode to mark the replica as corrupt, so the client 
> continued to form a pipeline using the corrupt replica.
> (10) Finally the DN that had the only healthy replica was restarted. NameNode 
> then updated the pipeline to contain only the corrupt replica.
> (11) Client continued to write to the corrupt replica, because neither the 
> client nor the data node itself knew the replica was corrupt. When the 
> restarted datanodes came back, their replicas were stale, even though they 
> were not corrupt. Therefore, none of the replicas was good and up to date.
> The sequence of events was reconstructed based on DataNode/NameNode log and 
> my understanding of code.
> Incidentally, we have observed the same sequence of events on two independent 
> clusters.
> [1]
> The sender has the replica as follows:
> 2016-04-15 22:03:05,066 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes() = 41381376
>   getBytesOnDisk()  = 41381376
>   getVisibleLength()= 41186444
>   getVolume()   = /hadoop-i/data/current
>   getBlockFile()= 
> /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186444
>   bytesOnDisk=41381376
> while the receiver has the replica as follows:
> 2016-04-15 22:03:05,068 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes() = 41186816
>   getBytesOnDisk()  = 41186816
>   getVisibleLength()= 41186816
>   getVolume()   = /hadoop-g/data/current
>   getBlockFile()= 
> /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186816
>   bytesOnDisk=41186816
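
These numbers line up with the rounding described in steps (5)-(7): the 
sender's acknowledged length, rounded up to the next 512-byte chunk boundary, 
is exactly the receiver's on-disk and visible length:

{code}
long bytesAcked = 41186444L;  // sender's bytesAcked
long chunkSize  = 512L;
long roundedUp  = ((bytesAcked + chunkSize - 1) / chunkSize) * chunkSize;
// roundedUp == 41186816, matching the receiver's getNumBytes(),
// getBytesOnDisk() and getVisibleLength() above.
{code}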






[jira] [Created] (HDFS-10652) Add a unit test for HDFS-4660

2016-07-19 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10652:


 Summary: Add a unit test for HDFS-4660
 Key: HDFS-10652
 URL: https://issues.apache.org/jira/browse/HDFS-10652
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs
Reporter: Yongjun Zhang









[jira] [Created] (HDFS-10632) DataXceiver to report the length of the block it's receiving

2016-07-14 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10632:


 Summary: DataXceiver to report the length of the block it's 
receiving
 Key: HDFS-10632
 URL: https://issues.apache.org/jira/browse/HDFS-10632
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang


In DataXceiver#writeBlock:

{code}
LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: "
+ localAddress);
{code}
It would be better for this message to also report the size of the block it's receiving.
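
For illustration, the improved line might look like this (a sketch; it assumes 
the ExtendedBlock is in scope as {{block}}):

{code}
LOG.info("Receiving " + block + " src: " + remoteAddress + " dest: "
    + localAddress + " of size " + block.getNumBytes());
{code}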







[jira] [Created] (HDFS-10624) VolumeScanner to report why a block is found bad

2016-07-13 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10624:


 Summary: VolumeScanner to report why a block is found bad
 Key: HDFS-10624
 URL: https://issues.apache.org/jira/browse/HDFS-10624
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Reporter: Yongjun Zhang


Seeing the following on DN log. 

{code}
2016-04-07 20:27:45,416 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
opWriteBlock BP-1800173197-10.204.68.5-125156296:blk_1170125248_96465013 
received exception java.io.EOFException: Premature EOF: no length prefix 
available
2016-04-07 20:27:45,416 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
rn2-lampp-lapp1115.rno.apple.com:1110:DataXceiver error processing WRITE_BLOCK 
operation  src: /10.204.64.137:45112 dst: /10.204.64.151:1110
java.io.EOFException: Premature EOF: no length prefix available
at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2241)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:738)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
2016-04-07 20:27:46,116 WARN 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
BP-1800173197-10.204.68.5-125156296:blk_1170125248_96458336 on 
/ngs8/app/lampp/dfs/dn
2016-04-07 20:27:46,117 ERROR 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
VolumeScanner(/ngs8/app/lampp/dfs/dn, DS-a14baf2b-a1ef-4282-8d88-3203438e708e) 
exiting because of exception
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
at 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
at 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
at 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
at 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
2016-04-07 20:27:46,118 INFO 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
VolumeScanner(/ngs8/app/lampp/dfs/dn, DS-a14baf2b-a1ef-4282-8d88-3203438e708e) 
exiting.
2016-04-07 20:27:46,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.204.64.151, 
datanodeUuid=6064994a-6769-4192-9377-83f78bd3d7a6, infoPort=0, 
infoSecurePort=1175, ipcPort=1120, 
storageInfo=lv=-56;cid=cluster6;nsid=1112595121;c=0):Failed to transfer 
BP-1800173197-10.204.68.5-125156296:blk_1170125248_96465013 to 
10.204.64.10:1110 got
java.net.SocketException: Original Exception : java.io.IOException: Connection 
reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at 
org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:190)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:585)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:758)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:705)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2154)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(DataNode.java:2884)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.transferBlock(DataXceiver.java:862)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opTransferBlock(Receiver.java:200)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:118

[jira] [Created] (HDFS-10603) Flaky test org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint

2016-07-08 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10603:


 Summary: Flaky test 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint
 Key: HDFS-10603
 URL: https://issues.apache.org/jira/browse/HDFS-10603
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Yongjun Zhang


Test 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint

may fail intermittently as

{code}
---
 T E S T S
---
Running 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.386 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
testWithCheckpoint(org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot)
  Time elapsed: 15.092 sec  <<< ERROR!
java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1363)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2041)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2011)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint(TestOpenFilesWithSnapshot.java:94)


Results :

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 » IO Timed out waiting for 
Min...

Tests run: 7, Failures: 0, Errors: 1, Skipped: 0

{code}








[jira] [Created] (HDFS-10396) Using -diff option with DistCp may get "Comparison method violates its general contract" exception

2016-05-12 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10396:


 Summary: Using -diff option with DistCp may get "Comparison method 
violates its general contract" exception
 Key: HDFS-10396
 URL: https://issues.apache.org/jira/browse/HDFS-10396
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


Using the -diff option gets the following exception, due to a bug in the 
comparison operator:

{code}
16/04/21 14:34:18 WARN tools.DistCp: Failed to use snapshot diff for distcp
java.lang.IllegalArgumentException: Comparison method violates its general 
contract!
at java.util.TimSort.mergeHi(TimSort.java:868)
at java.util.TimSort.mergeAt(TimSort.java:485)
at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
at java.util.TimSort.sort(TimSort.java:223)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at org.apache.hadoop.tools.DistCpSync.moveToTarget(DistCpSync.java:293)
at org.apache.hadoop.tools.DistCpSync.syncDiff(DistCpSync.java:261)
at org.apache.hadoop.tools.DistCpSync.sync(DistCpSync.java:131)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:163)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:432)
16/04/21 14:34:18 ERROR tools.DistCp: Exception encountered 

{code}
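
For background, TimSort throws this when the supplied Comparator violates 
transitivity or antisymmetry. A classic way that happens (illustrative only, 
not the actual DistCpSync comparator) is integer-subtraction overflow:

{code}
// Broken: a - b overflows for large values, so sgn(compare(a, b)) may
// not equal -sgn(compare(b, a)), violating the Comparator contract.
Comparator<Integer> bad  = (a, b) -> a - b;
// Safe:
Comparator<Integer> good = (a, b) -> Integer.compare(a, b);
{code}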








[jira] [Created] (HDFS-10387) DataTransferProtocol#writeBlock missing some javadocs

2016-05-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10387:


 Summary: DataTransferProtocol#writeBlock missing some javadocs
 Key: HDFS-10387
 URL: https://issues.apache.org/jira/browse/HDFS-10387
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


DataTransferProtocol#writeBlock's javadoc is missing the following parameters:
{code}
  final DataChecksum requestedChecksum,
  final CachingStrategy cachingStrategy,
{code}
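
For example, the missing entries could read roughly as follows (the wording is 
illustrative, not taken from a patch):

{code}
 * @param requestedChecksum the checksum algorithm the client requests
 *                          the datanodes to use for this block
 * @param cachingStrategy   the caching strategy (e.g. drop-behind)
 *                          to apply while writing the block
{code}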







[jira] [Created] (HDFS-10386) DataTransferProtocol#writeBlock missing some javadocs

2016-05-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10386:


 Summary: DataTransferProtocol#writeBlock missing some javadocs
 Key: HDFS-10386
 URL: https://issues.apache.org/jira/browse/HDFS-10386
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


DataTransferProtocol#writeBlock's javadoc is missing the following parameters:
{code}
  final DataChecksum requestedChecksum,
  final CachingStrategy cachingStrategy,
{code}







[jira] [Created] (HDFS-10376) setOwner call is not run as the specified user in TestPermission

2016-05-06 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10376:


 Summary: setOwner call is not run as the specified user in 
TestPermission
 Key: HDFS-10376
 URL: https://issues.apache.org/jira/browse/HDFS-10376
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


TestPermission creates a user with the following name and groups:

{code}
 final private static String USER_NAME = "user" + RAN.nextInt();
 final private static String[] GROUP_NAMES = {"group1", "group2"};

   UserGroupInformation userGroupInfo = 
UserGroupInformation.createUserForTesting(USER_NAME, GROUP_NAMES );
  
  FileSystem userfs = DFSTestUtil.getFileSystemAs(userGroupInfo, conf);

  // make sure mkdir of a existing directory that is not owned by 
  // this user does not throw an exception.
  userfs.mkdirs(CHILD_DIR1);
  
{code}

Supposedly 

{code}
 userfs.setOwner(CHILD_FILE3, "foo", "bar");
{code}
will be run as the specified user, but it seems to run as the user executing 
the test (me).

Running as the specified user should cause setOwner to be disallowed, since it 
requires superuser privilege; that is not happening.
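
If the test really needs the call issued as that user, one option (a sketch, 
not necessarily the right fix) is to wrap the call in the UGI's doAs:

{code}
// Sketch only: force the call to carry USER_NAME's credentials.
userGroupInfo.doAs(new java.security.PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    // setOwner requires superuser privilege, so this should now fail
    // with an AccessControlException for the test user.
    userfs.setOwner(CHILD_FILE3, "foo", "bar");
    return null;
  }
});
{code}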

Creating this jira for some investigation to understand whether it's indeed an 
issue.

Thanks.









[jira] [Created] (HDFS-10333) Intermittent org.apache.hadoop.hdfs.TestFileAppend failure in trunk

2016-04-26 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10333:


 Summary: Intermittent org.apache.hadoop.hdfs.TestFileAppend 
failure in trunk
 Key: HDFS-10333
 URL: https://issues.apache.org/jira/browse/HDFS-10333
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Java8 (I used JAVA_HOME=/opt/toolchain/jdk1.8.0_25):

{code}
--
 T E S T S
---
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.hdfs.TestFileAppend
Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 27.75 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestFileAppend
testMultipleAppends(org.apache.hadoop.hdfs.TestFileAppend)  Time elapsed: 3.674 
sec  <<< ERROR!
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:43067,DS-cf80da41-3697-4afa-8f89-93693cd5035d,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:32946,DS-3b08422c-959e-42f0-a624-91b2524c4371,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:43067,DS-cf80da41-3697-4afa-8f89-93693cd5035d,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:32946,DS-3b08422c-959e-42f0-a624-91b2524c4371,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
at 
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1166)
at 
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
at 
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)


{code}

However, when I run with Java 1.7, the test sometimes succeeds and sometimes 
fails with 
{code}
Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 41.32 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestFileAppend
testMultipleAppends(org.apache.hadoop.hdfs.TestFileAppend)  Time elapsed: 9.099 
sec  <<< ERROR!
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:49006,DS-498240fa-d1c7-4ba1-b97e-a1761cbbefa5,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:43097,DS-b83b49ce-fc14-4b9e-a3fc-7df2cd9fc753,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:49006,DS-498240fa-d1c7-4ba1-b97e-a1761cbbefa5,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:43097,DS-b83b49ce-fc14-4b9e-a3fc-7df2cd9fc753,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
at 
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
at 
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
at 
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)

{code}


The failure of this test is intermittent, but it fails pretty often.
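
If the goal were only to keep the append test from dying here, the very 
property named in the exception can be relaxed in the test configuration (a 
sketch; whether that would mask a real bug is the open question):

{code}
Configuration conf = new HdfsConfiguration();
// Never try to replace a failed DN; keep writing to the surviving ones.
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
// Alternatively, keep DEFAULT but tolerate a failed replacement:
// conf.setBoolean(
//     "dfs.client.block.write.replace-datanode-on-failure.best-effort", true);
{code}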









[jira] [Created] (HDFS-10314) Propose a new tool that wraps around distcp to "restore" changes on target cluster

2016-04-19 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10314:


 Summary: Propose a new tool that wraps around distcp to "restore" 
changes on target cluster
 Key: HDFS-10314
 URL: https://issues.apache.org/jira/browse/HDFS-10314
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


HDFS-9820 proposed adding -rdiff switch to distcp, as a reversed operation of 
-diff switch. 

Upon discussion with [~jingzhao], we will introduce a new tool that wraps 
around distcp to achieve the same purpose.

I'm thinking about calling the new tool "rsync", similar to unix/linux command 
"rsync". The "r" here means remote.

The syntax that simulate -rdiff behavior proposed in HDFS-9820 is
 {code}  
rsync  
Pcode}
This command ensure   is newer than .

I think, In the future, we can add another command to have the functionality of 
-diff switch of distcp.
 {code}  
sync  
Pcode}
where   must be older than .

Thanks [~jingzhao].





[jira] [Created] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10313:


 Summary: Distcp does not check the order of snapshot names passed 
to -diff
 Key: HDFS-10313
 URL: https://issues.apache.org/jira/browse/HDFS-10313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Reporter: Yongjun Zhang


This jira is to propose adding a check to distcp: when {{-diff s1 s2}} is 
passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
informative error message.

This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
Thanks Jing.
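
One lightweight way to implement the check (a sketch with assumed names; it 
relies on snapshots being listed under the .snapshot directory, and since 
modification times can tie it only catches the clearly reversed case) is:

{code}
// Hypothetical sketch: abort when s2 is demonstrably older than s1.
FileStatus from = fs.getFileStatus(new Path(dir, ".snapshot/" + s1));
FileStatus to   = fs.getFileStatus(new Path(dir, ".snapshot/" + s2));
if (to.getModificationTime() < from.getModificationTime()) {
  throw new IllegalArgumentException("-diff expects the older snapshot "
      + "first: " + s2 + " appears to predate " + s1);
}
{code}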







[jira] [Created] (HDFS-10263) Reversed snapshot diff report contains incorrect entries

2016-04-05 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10263:


 Summary: Reversed snapshot diff report contains incorrect entries
 Key: HDFS-10263
 URL: https://issues.apache.org/jira/browse/HDFS-10263
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang


Steps to reproduce:

1. Take a snapshot s1 at:

{code}
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/bar
-rw-r--r--   1 yzhang supergroup   1024 2016-04-05 14:48 /target/bar/f1
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/foo
-rw-r--r--   1 yzhang supergroup   1024 2016-04-05 14:48 /target/foo/f1
{code}

2. Make the following change:
{code}
  private int changeData7(Path dir) throws Exception {
final Path foo = new Path(dir, "foo");
final Path foo2 = new Path(dir, "foo2");
final Path foo_f1 = new Path(foo, "f1");
final Path foo2_f2 = new Path(foo2, "f2");
final Path foo2_f1 = new Path(foo2, "f1");
final Path foo_d1 = new Path(foo, "d1");
final Path foo_d1_f3 = new Path(foo_d1, "f3");

int numDeletedAndModified = 0;
dfs.rename(foo, foo2);

dfs.delete(foo2_f1, true);

DFSTestUtil.createFile(dfs, foo_f1, BLOCK_SIZE, DATA_NUM, 0L);
DFSTestUtil.appendFile(dfs, foo_f1, (int) BLOCK_SIZE);
dfs.rename(foo_f1, foo2_f2);
numDeletedAndModified += 1; // "M ./foo"
DFSTestUtil.createFile(dfs, foo_d1_f3, BLOCK_SIZE, DATA_NUM, 0L);
return numDeletedAndModified;
  }
{code}

that results in
{code}
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/bar
-rw-r--r--   1 yzhang supergroup   1024 2016-04-05 14:48 /target/bar/f1
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/foo
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/foo/d1
-rw-r--r--   1 yzhang supergroup   1024 2016-04-05 14:48 /target/foo/d1/f3
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/foo2
-rw-r--r--   1 yzhang supergroup   2048 2016-04-05 14:48 /target/foo2/f2
{code}

3. take snapshot s2 here

4. Do the following to revert the change done in step 2
{code}
 private int revertChangeData7(Path dir) throws Exception {
final Path foo = new Path(dir, "foo");
final Path foo2 = new Path(dir, "foo2");
final Path foo_f1 = new Path(foo, "f1");
final Path foo2_f2 = new Path(foo2, "f2");
final Path foo2_f1 = new Path(foo2, "f1");
final Path foo_d1 = new Path(foo, "d1");
final Path foo_d1_f3 = new Path(foo_d1, "f3");

int numDeletedAndModified = 0;

dfs.delete(foo_d1, true);

dfs.rename(foo2_f2, foo_f1);

dfs.delete(foo, true);

DFSTestUtil.createFile(dfs, foo2_f1, BLOCK_SIZE, DATA_NUM, 0L);
DFSTestUtil.appendFile(dfs, foo2_f1, (int) BLOCK_SIZE);

dfs.rename(foo2,  foo);

return numDeletedAndModified;
  }
{code}
which gives the following result:

{code}
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/bar
-rw-r--r--   1 yzhang supergroup   1024 2016-04-05 14:48 /target/bar/f1
drwxr-xr-x   - yzhang supergroup  0 2016-04-05 14:48 /target/foo
-rw-r--r--   1 yzhang supergroup   2048 2016-04-05 14:48 /target/foo/f1
{code}

4. Take snapshot s3 here.

Below are the diffs between the snapshots:

{code}
s1-s2: Difference between snapshot s1 and snapshot s2 under directory /target:
M   .
+   ./foo
R   ./foo -> ./foo2
M   ./foo
+   ./foo/f2
-   ./foo/f1

s2-s1: Difference between snapshot s2 and snapshot s1 under directory /target:
M   .
-   ./foo
R   ./foo2 -> ./foo
M   ./foo
-   ./foo/f2
+   ./foo/f1

s2-s3: Difference between snapshot s2 and snapshot s3 under directory /target:
M   .
-   ./foo
R   ./foo2 -> ./foo
M   ./foo2
+   ./foo2/f1
-   ./foo2/f2

s3-s2: Difference between snapshot s3 and snapshot s2 under directory /target:
M   .
+   ./foo
R   ./foo -> ./foo2
M   ./foo2
-   ./foo2/f1
+   ./foo2/f2
{code}

The s2-s1 diff is supposed to be the same as s2-s3, because the change 
from s2 to s3 is an exact reversion of the change from s1 to s2. We can see 
that s1 and s3 have the same file structure.

However, the result shown above is not. I expect the following part
{code}
M   ./foo
-   ./foo/f2
+   ./foo/f1
{code}

in s2-s1 diff should be 

{code}
M   ./foo2
+   ./foo2/f1
-   ./foo2/f2
{code}
(same as in s2-s3)

instead.







[jira] [Resolved] (HDFS-10211) Add more info to DelegationTokenIdentifier#toString for better supportability

2016-03-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10211.
--
Resolution: Duplicate

> Add more info to DelegationTokenIdentifier#toString for better supportability
> -
>
> Key: HDFS-10211
> URL: https://issues.apache.org/jira/browse/HDFS-10211
> Project: Hadoop HDFS
>  Issue Type: Improvement
>    Reporter: Yongjun Zhang
>    Assignee: Yongjun Zhang
>
> Base class {{AbstractDelegationTokenIdentifier}} has the following 
> implementation of {{toString()}} method
> {code}
> @Override
>   public String toString() {
> StringBuilder buffer = new StringBuilder();
> buffer
> .append("owner=" + owner + ", renewer=" + renewer + ", realUser="
> + realUser + ", issueDate=" + issueDate + ", maxDate=" + maxDate
> + ", sequenceNumber=" + sequenceNumber + ", masterKeyId="
> + masterKeyId);
> return buffer.toString();
>   }
> {code}
> However, derived class 
> {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}}
> has the following implementation that overrides the base class above:
> {code}
>   @Override
>   public String toString() {
> return getKind() + " token " + getSequenceNumber()
> + " for " + getUser().getShortUserName();
>   }
> {code}
> And when exception is thrown because of token expiration or other reason:
> {code}
> if (info.getRenewDate() < Time.now()) {
>   throw new InvalidToken("token (" + identifier.toString() + ") is 
> expired");
> }
> {code}
> The exception doesn't show the detailed information about the token, like the 
> base class' toString() method returns.
> Creating this jira to change the 
> {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}}
>  implementation to include all the info about the token, as included by the 
> base class.
> This change would help supportability, at the expense of printing a little 
> more information to the log. I hope no code really depends on the output 
> string. 





[jira] [Created] (HDFS-10211) Add more info to DelegationTokenIdentifier#toString for better supportability

2016-03-24 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10211:


 Summary: Add more info to DelegationTokenIdentifier#toString for 
better supportability
 Key: HDFS-10211
 URL: https://issues.apache.org/jira/browse/HDFS-10211
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


Base class {{AbstractDelegationTokenIdentifier}} has the following 
implementation of {{toString()}} method
{code}
@Override
  public String toString() {
StringBuilder buffer = new StringBuilder();
buffer
.append("owner=" + owner + ", renewer=" + renewer + ", realUser="
+ realUser + ", issueDate=" + issueDate + ", maxDate=" + maxDate
+ ", sequenceNumber=" + sequenceNumber + ", masterKeyId="
+ masterKeyId);
return buffer.toString();
  }
{code}

However, derived class 
{{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}}

has the following implementation that overrides the base class above:

{code}
  @Override
  public String toString() {
return getKind() + " token " + getSequenceNumber()
+ " for " + getUser().getShortUserName();
  }
{code}

And when exception is thrown because of token expiration or other reason:
{code}
if (info.getRenewDate() < Time.now()) {
  throw new InvalidToken("token (" + identifier.toString() + ") is 
expired");
}
{code}
The exception doesn't show the detailed information about the token, like the 
base class' toString() method returns.

Creating this jira to change the 
{{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}}
 implementation to include all the info about the token, as included by the 
base class.

This change would help supportability, at the expense of printing a little more 
information to the log. I hope no code really depends on the output string. 
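
A sketch of the proposed override (the getters come from the base class shown 
above):

{code}
@Override
public String toString() {
  return getKind() + " token " + getSequenceNumber()
      + " for " + getUser().getShortUserName()
      + ", owner=" + getOwner()
      + ", renewer=" + getRenewer()
      + ", realUser=" + getRealUser()
      + ", issueDate=" + getIssueDate()
      + ", maxDate=" + getMaxDate()
      + ", masterKeyId=" + getMasterKeyId();
}
{code}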









[jira] [Created] (HDFS-9939) Possible performance improvement by increasing buf size in DecompressorStream in HDFS

2016-03-10 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-9939:
---

 Summary: Possible performance improvement by increasing buf size 
in DecompressorStream in HDFS
 Key: HDFS-9939
 URL: https://issues.apache.org/jira/browse/HDFS-9939
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


See ACCUMULO-2353 for details.

Filing this jira to investigate performance difference and possibly make the 
buf size change accordingly.
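
For context, DecompressorStream's default internal buffer is small; a larger 
one can be requested through the three-argument constructor (a sketch, assuming 
a codec and an open stream are at hand):

{code}
// Sketch only: ask for a 64 KB buffer instead of the small default.
InputStream raw = fs.open(path);
Decompressor decompressor = CodecPool.getDecompressor(codec);
InputStream in = new DecompressorStream(raw, decompressor, 64 * 1024);
{code}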








Re: CHANGES.txt is gone from trunk, branch-2, branch-2.8

2016-03-03 Thread Yongjun Zhang
That's nice, thanks Andrew and Allen.

--Yongjun

On Thu, Mar 3, 2016 at 9:11 PM, Andrew Wang 
wrote:

> Hi all,
>
> With the inclusion of HADOOP-12651 going back to branch-2.8, CHANGES.txt
> and release notes are now generated by Yetus. I've gone ahead and deleted
> the manually updated CHANGES.txt from trunk, branch-2, and branch-2.8
> (HADOOP-11792). Many thanks to Allen for the releasedocmaker.py rewrite,
> and the Yetus integration.
>
> I'll go ahead and update the HowToCommit and HowToRelease wiki pages, but
> at a high-level, this means we no longer need to edit CHANGES.txt on new
> commit, streamlining our commit process. CHANGES.txt updates will still be
> necessary for backports to older release lines like 2.6.x and 2.7.x.
>
> Happy committing!
>
> Best,
> Andrew
>


[jira] [Created] (HDFS-9889) Update balancer/mover document about HDFS-6133 feature

2016-03-02 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-9889:
---

 Summary: Update balancer/mover document about HDFS-6133 feature
 Key: HDFS-9889
 URL: https://issues.apache.org/jira/browse/HDFS-9889
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang








Re: Looking to a Hadoop 3 release

2016-02-19 Thread Yongjun Zhang
Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable
2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:
>
> > Thanks Andrew for driving this. Wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
> it's
> > not an incompatible change, but feel better to be done in the major
> release.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
> > Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes
> with
> > the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> > be beta or GA for some number of months. In the meanwhile, it'd be good
> to
> > keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee  >
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > >   From: Andrew Wang 
> > >  To: "common-...@hadoop.apache.org" 
> > > Cc: "yarn-...@hadoop.apache.org" ; "
> > > mapreduce-...@hadoop.apache.org" ;
> > > hdfs-dev 
> > >  Sent: Thursday, February 18, 2016 4:35 PM
> > >  Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (i.e.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > > If you have changes like this, please set the target version to 3.0.0
> > > and mark them "Incompatible". We can use this JIRA query to track:
> > >
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%2
> > > 0HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%
> > > 3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hado
> > > op%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> > >
> > > There's some release-related stuff that needs to be sorted out
> > > (namely, the new CHANGES.txt and release note generation from Yetus),
> > > but I'd tentatively like to roll the first alpha a month out, so third
> > > week of March.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata 
> > wrote:
> > >
> > > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > > source version to JDK8.
> > > >
> > > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > > not a way of abandoning it.
> > > >
> > > >
> > > >
> > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > > 
> > > > wrote:
> > > > > Hi Raymie,
> > > > >
> > > > > Konst proposed just releasing off of trunk rather than cutting a
> > > > branch-2,
> > > > > and there was general agreement there. So, consider #3 abandoned.
> > > > > 1&2
> > > can
> > > > > be achieved at the same time, we just need to avoid using JDK8
> > > > > language features in trunk so things can be backported.
> > > > >
> > > > > Best,
> > > > > Andrew
> > > > >
> > > > > On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata
> > > > > 
> > > > wrote:
> > > > >
> > > > >> In this (and the related threads), I see the following three
> > > > requirements:
> > > > >>
> > > 

Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-10 Thread Yongjun Zhang
Thanks Junping and Allen.

It'd be nice to have HDFS-9629 but I'm ok with option 2, given the fact
that the issue is not critical (and will be addressed in all future
releases), and the concern Allen raised.

Best,

--Yongjun

On Wed, Feb 10, 2016 at 8:37 AM, Allen Wittenauer  wrote:

>
> > On Feb 9, 2016, at 6:27 PM, Junping Du  wrote:
> >
> > Thanks Yongjun for identifying and proposing this change to 2.6.4. I
> think this is the right thing to do and check for following releases. For
> 2.6.4, it seems unnecessary to create another release candidate for this
> issue as we only kicking off a new RC build when last RC has serious
> problem in functionality. The vote progress is quite smoothly so far, so it
> seems unlikely that we will create a new RC. However, I think there are
> still two options here:
> > Option 1:  in final build, adopt change of HDFS-9629 that only updates
> the footer of Web UI to show year 2016.
> > Option 2: skip HDFS-9629 for 2.6.4 and adopt it later for 2.6.5.
> > I prefer Option 1 as this is a very low risky change without affecting
> any functionality, and we allow non-functional changes (like release date,
> etc.) happen on final build after RC passed. I would like to hear the
> voices in community here before acting for the next step. Thoughts?
> >
>
> I’d think having PMC votes apply to what is not actually the final
> artifact is against the ASF rules.
>
>
>


Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-07 Thread Yongjun Zhang
Thanks Junping again for working on this release.

+1 (binding),

- Downloaded source tarball and binary tarball
- Verified signature and checksum for both source and binary tarballs
- Compiled and built a single node cluster
- Run HDFS shell commands to create files
- Run distcp job between this new cluster and some other cluster with older
release successfully.

BTW, in case other people find any issue or request additional important
fixes to include, such that we need new RC, I'd  suggest to to
include HDFS-9629 together to update to release year of Web UI footer
(currently it's 2014).

Thanks.

--Yongjun

On Sat, Feb 6, 2016 at 11:27 PM, Akira AJISAKA 
wrote:

> +1 (binding)
>
> - Downloaded source tarball and binary tarball
> - Verified signatures and checksums
> - Compiled and built a single node cluster
> - Compiled Hive 1.2.1 and Tez 0.7.0/0.8.2 using Hadoop 2.6.4 pom
> successfully
> - Ran some Hive on Tez/MRv2 queries successfully
>
> Thanks,
> Akira
>
> On 2/4/16 11:30, Ted Yu wrote:
>
>> I modified hbase pom.xml (0.98 branch) to point to staged maven artifacts.
>>
>> All unit tests passed.
>>
>> Cheers
>>
>> On Tue, Feb 2, 2016 at 11:01 PM, Junping Du  wrote:
>>
>> Hi community folks,
>>> I've created a release candidate RC0 for Apache Hadoop 2.6.4 (the
>>> next
>>> maintenance release to follow up 2.6.3.) according to email thread of
>>> release plan 2.6.4 [1]. Below is details of this release candidate:
>>>
>>> The RC is available for validation at:
>>> http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/
>>>
>>> The RC tag in git is: release-2.6.4-RC0
>>>
>>> The maven artifacts are staged via repository.apache.org at:
>>> https://repository.apache.org/content/repositories/orgapachehadoop-1028/

>>>
>>> You can find my public key at:
>>> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
>>>
>>> Please try the release and vote. The vote will run for the usual 5 days.
>>>
>>> Thanks!
>>>
>>>
>>> Cheers,
>>>
>>> Junping
>>>
>>>
>>> [1]: 2.6.4 release plan: http://markmail.org/message/fk3ud3c665lscvx5?
>>>
>>>
>>>
>>
>


[jira] [Created] (HDFS-9764) DistCp doesn't print arg value for -numListstatusThreads

2016-02-04 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-9764:
---

 Summary: DistCp doesn't print arg value for -numListstatusThreads
 Key: HDFS-9764
 URL: https://issues.apache.org/jira/browse/HDFS-9764
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yongjun Zhang
Priority: Minor


numListstatusThreads is missing from the DistCpOptions#toString() code:
{code}
 public String toString() {
return "DistCpOptions{" +
"atomicCommit=" + atomicCommit +
", syncFolder=" + syncFolder +
", deleteMissing=" + deleteMissing +
", ignoreFailures=" + ignoreFailures +
", maxMaps=" + maxMaps +
", sslConfigurationFile='" + sslConfigurationFile + '\'' +
", copyStrategy='" + copyStrategy + '\'' +
", sourceFileListing=" + sourceFileListing +
", sourcePaths=" + sourcePaths +
", targetPath=" + targetPath +
", targetPathExists=" + targetPathExists +
", preserveRawXattrs=" + preserveRawXattrs +
", filtersFile='" + filtersFile + '\'' +
'}';
  }
{code}
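
The fix is presumably a one-line addition to the concatenation above, e.g.:

{code}
", numListstatusThreads=" + numListstatusThreads +
{code}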





