Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-02-28 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/150/

[Feb 28, 2018 2:25:49 AM] (yqlin) HDFS-13194. CachePool permissions incorrectly 
checked. Contributed by
[Feb 28, 2018 9:19:03 PM] (stevel) Revert "HADOOP-15090. Add ADL 
troubleshooting doc."




-1 overall


The following subsystems voted -1:
docker


Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-02-28 Thread sanjay Radia

Andrew, thanks for your response.

1) Wrt NN on top of HDSL. You raised the issue of FSN lock separation. This 
was a key issue we discussed heavily in the past, in the context of “Show the 
community a way to connect the NN into the new block layer”. We heard you 
clearly, thought deeply, and showed how the NN can be put on top of HDSL 
WITHOUT removing the FSN lock. We described this in detail in HDFS-10419 and 
also in the summary of the DISCUSSION thread:
  Milestone 1 (no removal of the FSN lock) gives almost 2x scalability and 
does not require separation of the FSN lock; Milestone 2, which removes the 
FSN lock, gives 2x scalability. 

You have conveniently ignored this. Let me reemphasize: removing the FSN lock 
is not necessary for NN/HDFS to benefit from HDSL, and you get almost the same 
scalability benefit. Hence the FSN lock issue is moot. 

2) You have also conveniently ignored our arguments, stated in the vote and 
discussion thread summary, that there is benefit in keeping HDSL and HDFS 
together:
  A) Side by side usage and resulting operational concerns
>>"In the short term and medium term, the new system and HDFS
>> will be used side-by-side by users. ……  
>> During this time, sharing the DN daemon and admin functions
>> between the two systems is operationally important”

   B) Sharing code 
>>"Need to easily share the block layer code between the two systems
>> when used side-by-side. Areas where sharing code is desired over time: 
>>  - Sharing new block layer’s  new netty based protocol engine
>> for old HDFS DNs (a long time sore issue for HDFS block layer). 
>> - Shallow data copy from old system to new system is practical
>> only if within same project and daemon otherwise have to deal
>> with security setting and coordinations across daemons.
>> Shallow copy is useful as customer migrate from old to new.
>> - Shared disk scheduling in the future"



3) You argue for a separate project from two conflicting angles: (1) separate 
now and merge later, what’s the hurry; (2) keep it separate and focus on 
non-HDFS storage use cases. The HDFS community members built HDSL to address 
HDFS scalability; they were not trying to go after object store users or 
markets (Ceph etc). As explained multiple times, OzoneFS is an intermediate 
step to stabilize HDSL, but one of immediate value for apps such as Hive and 
Spark. So even if there might be value in being separate (your motivation 2) 
and going after new storage use cases, the HDFS community members that built 
HDSL want to focus on improving HDFS; you may not agree with that, but the 
engineers who are writing the code should be able to drive the direction. 
Further, look at the security design we posted: it shows a Hadoop/HDFS focus, 
not a focus on some other object store market. It fits into the Hadoop 
security model, especially supporting the use case of jobs and the resulting 
need to support delegation tokens. 

4) You argue that the HDSL and OzoneFS modules are separate and therefore 
should go as a separate project. Looks like one can’t win here: damned if you 
do and damned if you don’t. In the discussion with the Cloudera team, one of 
the issues raised was that there is a lot of new code and it will destabilize 
HDFS. We explained that we have kept the code in separate modules so that it 
will not impact current HDFS stability, and that features like HDSL’s new 
protocol engine will be plugged into the old HDFS block layer only after 
stabilization. You argue for stability, and hence separate modules, and then 
use that very separation to push HDSL out as a separate project.

sanjay


> On Feb 28, 2018, at 12:10 AM, Andrew Wang  wrote:
> 
> Resending since the formatting was messed up, let's try plain text this
> time:
> 
> Hi Jitendra and all,
> 
> Thanks for putting this together. I caught up on the discussion on JIRA and
> document at HDFS-10419, and still have the same concerns raised earlier
> about merging the Ozone branch to trunk.
> 
> To recap these questions/concerns at a very high level:
> 
> * Wouldn't Ozone benefit from being a separate project?
> * Why should it be merged now?
> 
> I still believe that both Ozone and Hadoop would benefit from Ozone being a
> separate project, and that there is no pressing reason to merge Ozone/HDSL
> now.
> 
> The primary reason I've heard for merging is that Ozone is at a stage
> where it's ready for user feedback. Second, that it needs to be merged
> to start on the NN refactoring for HDFS-on-HDSL.
> 
> First, without HDFS-on-HDSL support, users are testing against the Ozone
> object storage interface. Ozone and HDSL themselves are implemented as
> separate masters and new functionality bolted onto the datanode. It also
> doesn't look like HDFS in terms of API or featureset; yes, it speaks
> FileSystem, but so do many out-of-tree storage systems like S3, Ceph,
> Swift, ADLS etc. Ozone/HDSL does not support popular HDFS features like
> erasure coding, encryption, 

[jira] [Resolved] (HADOOP-15276) branch-2 site not building after ADL troubleshooting doc added

2018-02-28 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15276.
-
   Resolution: Fixed
Fix Version/s: 2.10.0

Fixed by reverting the patch. If someone really wants the troubleshooting doc, 
they can re-open this & submit a patch

> branch-2 site not building after ADL troubleshooting doc added
> --
>
> Key: HADOOP-15276
> URL: https://issues.apache.org/jira/browse/HADOOP-15276
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.10.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 2.10.0
>
>
> Toc error on the ADL troubleshooting doc from HADOOP-15090
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.5:site (default-cli) on project 
> hadoop-azure-datalake: Error parsing 
> 'hadoop-trunk/hadoop-tools/hadoop-azure-datalake/src/site/markdown/troubleshooting_adl.md':
>  line [-1] Error parsing the model: Unable to execute macro in the document: 
> toc -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15276) branch-2 site not building after ADL troubleshooting doc added

2018-02-28 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15276:
---

 Summary: branch-2 site not building after ADL troubleshooting doc 
added
 Key: HADOOP-15276
 URL: https://issues.apache.org/jira/browse/HADOOP-15276
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.10.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Toc error on the ADL troubleshooting doc from HADOOP-15090
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-site-plugin:3.5:site (default-cli) on project 
hadoop-azure-datalake: Error parsing 
'hadoop-trunk/hadoop-tools/hadoop-azure-datalake/src/site/markdown/troubleshooting_adl.md':
 line [-1] Error parsing the model: Unable to execute macro in the document: 
toc -> [Help 1]
{code}
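
For context, Hadoop's site markdown pulls in the table of contents through a 
doxia macro comment of the following shape. Whether this exact line was the 
trigger in the ADL doc is an assumption, but this is the usual form of the 
"toc" macro that the branch-2 maven-site-plugin failed to execute:
{code}
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
{code}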



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15275) Incorrect javadoc in return type of RetryPolicy#shouldRetry

2018-02-28 Thread Nanda kumar (JIRA)
Nanda kumar created HADOOP-15275:


 Summary: Incorrect javadoc in return type of 
RetryPolicy#shouldRetry
 Key: HADOOP-15275
 URL: https://issues.apache.org/jira/browse/HADOOP-15275
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Nanda kumar
Assignee: Nanda kumar


The return type of {{RetryPolicy#shouldRetry}} has been changed from 
{{boolean}} to {{RetryAction}}, but the javadoc is not updated.
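
For reference, a sketch of what the corrected javadoc could look like. The 
signature below follows org.apache.hadoop.io.retry.RetryPolicy (where 
RetryAction is a nested class of the interface); the javadoc wording itself is 
only a suggestion, not the actual patch:
{code}
public interface RetryPolicy {
  /**
   * Determines whether the framework should retry the failed method call.
   * @return a RetryAction describing what to do next, e.g. FAIL, RETRY or
   *         FAILOVER_AND_RETRY (this method formerly returned a boolean).
   */
  RetryAction shouldRetry(Exception e, int retries, int failovers,
      boolean isIdempotentOrAtMostOnce) throws Exception;
}
{code}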



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15274) move hadoop-openstack to slf4j

2018-02-28 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15274:
---

 Summary: move hadoop-openstack to slf4j
 Key: HADOOP-15274
 URL: https://issues.apache.org/jira/browse/HADOOP-15274
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/swift
Reporter: Steve Loughran
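
This is the usual mechanical commons-logging to SLF4J swap. A sketch of the 
pattern; SwiftRestClient is a real class in the module, but the snippet below 
is a generic illustration rather than the actual patch:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SwiftLoggingExample {
  // Before: org.apache.commons.logging.Log obtained via LogFactory.getLog().
  // After: SLF4J, which also brings cheap parameterized logging.
  private static final Logger LOG =
      LoggerFactory.getLogger(SwiftLoggingExample.class);

  void logStatus(String uri, int statusCode) {
    LOG.debug("GET {} returned {}", uri, statusCode);
  }
}
{code}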






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-6473) Add hadoop health check/diagnostics to run from command line, JSP pages, other tools

2018-02-28 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-6473.

Resolution: Won't Fix

We are adding specific diags for specific problems; a generic one is 
unrealistic. I know that now :)

> Add hadoop health check/diagnostics to run from command line, JSP pages, 
> other tools
> 
>
> Key: HADOOP-6473
> URL: https://issues.apache.org/jira/browse/HADOOP-6473
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Steve Loughran
>Priority: Minor
>  Labels: ipv6
>
> If the lifecycle ping() is for short-duration "are we still alive" checks, 
> Hadoop still needs something bigger to check the overall system health. This 
> would be for end users, but also for automated cluster deployment: a 
> complete validation of the cluster.
> It could be a command line tool, and something that runs on different nodes, 
> checked via IPC or JSP. The idea would be to do thorough checks with good 
> diagnostics. Oh, and they should be executable through JUnit too.
> For example:
>  - if running on windows, check that cygwin is on the path; fail with a 
> pointer to a wiki issue if not
>  - datanodes should check that they can create locks on the filesystem, 
> create files, and that timestamps are (roughly) aligned with local time
>  - namenodes should try and create files/locks in the filesystem
>  - task trackers should try and exec() something
>  - run through the classpath and look for problems: duplicate JARs, 
> unsupported java or xerces versions, etc.
> * The number of tests should be extensible; rather than one single class 
> with all the tests, there'd be something separate for name, task, data and 
> job tracker nodes (a hedged sketch of such a check follows below)
> * They can't be in the nodes themselves, as they should be executable even 
> if the nodes don't come up. 
> * Output could be in human readable text or html, and a form that could be 
> processed through hadoop itself in future
> * These tests could have side effects, such as actually trying to submit 
> work to a cluster
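
A hedged sketch of what such an extensible check framework could look like; 
every name below is hypothetical, since the issue was closed without an 
implementation:
{code}
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical probe: throws on failure with a diagnostic message. */
interface HealthCheck {
  String name();
  void run() throws Exception;
}

/** Simplistic example in the spirit of the classpath item above. */
class DuplicateClasspathEntryCheck implements HealthCheck {
  public String name() { return "classpath-duplicate-entries"; }
  public void run() throws Exception {
    String[] entries =
        System.getProperty("java.class.path").split(File.pathSeparator);
    if (Arrays.stream(entries).distinct().count() != entries.length) {
      throw new Exception("duplicate entries found on the classpath");
    }
  }
}

class HealthCheckRunner {
  /** Run every check; keep going on failure so the report is complete. */
  static int runAll(List<HealthCheck> checks) {
    int failures = 0;
    for (HealthCheck c : checks) {
      try {
        c.run();
        System.out.println("[PASS] " + c.name());
      } catch (Exception e) {
        failures++;
        System.out.println("[FAIL] " + c.name() + ": " + e.getMessage());
      }
    }
    return failures;
  }
}
{code}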



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15273) distcp error message on checksum mismatch is misleading when checksum protocol itself is different

2018-02-28 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15273:
---

 Summary: distcp error message on checksum mismatch is misleading 
when checksum protocol itself is different
 Key: HADOOP-15273
 URL: https://issues.apache.org/jira/browse/HADOOP-15273
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools/distcp
Affects Versions: 3.1.0
Reporter: Steve Loughran


When using distcp without {{-skipCRC}}, if there's a checksum mismatch between 
src and dest store types (e.g. HDFS to S3), the error message talks about 
block size, even when it's the underlying checksum protocol itself which is 
the cause of the failure:

bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
(NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)

If the checksum types are fundamentally different, the error message should 
say so.
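
A minimal sketch of the kind of check which could produce a clearer message, 
assuming the copy mapper has both checksums in hand. The FileSystem and 
FileChecksum methods used here are real; the surrounding helper is 
hypothetical:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: separate algorithm mismatch from value mismatch. */
public class ChecksumComparator {
  public static void compare(FileSystem srcFS, Path src,
      FileSystem dstFS, Path dst) throws IOException {
    FileChecksum srcSum = srcFS.getFileChecksum(src);
    FileChecksum dstSum = dstFS.getFileChecksum(dst);
    if (srcSum == null || dstSum == null) {
      return;  // one side cannot produce checksums; nothing to compare
    }
    if (!srcSum.getAlgorithmName().equals(dstSum.getAlgorithmName())) {
      // Different protocols: the block-size hint would be misleading here.
      throw new IOException("Checksum algorithms differ: source uses "
          + srcSum.getAlgorithmName() + ", target uses "
          + dstSum.getAlgorithmName()
          + "; skip checksum checks if the stores cannot share an algorithm");
    }
    if (!srcSum.equals(dstSum)) {
      // Same algorithm, different value: block size is a plausible cause.
      throw new IOException("Checksum mismatch between " + src + " and " + dst
          + "; source and target may differ in block size (use -pb)");
    }
  }
}
{code}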



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-02-28 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/

[Feb 27, 2018 2:48:52 AM] (yqlin) HDFS-13184. RBF: Improve the unit test 
TestRouterRPCClientRetries.
[Feb 27, 2018 3:38:29 PM] (arp) HADOOP-15178. Generalize NetUtils#wrapException 
to handle other
[Feb 27, 2018 4:48:03 PM] (inigoiri) HDFS-13193. Various Improvements for 
BlockTokenSecretManager.
[Feb 27, 2018 4:53:00 PM] (inigoiri) HDFS-13192. Change the code order in 
getFileEncryptionInfo to avoid
[Feb 27, 2018 6:15:43 PM] (arp) HADOOP-14959. 
DelegationTokenAuthenticator.authenticate() to wrap
[Feb 27, 2018 6:18:07 PM] (arp) HDFS-13181. DiskBalancer: Add an configuration 
for valid plan hours .
[Feb 27, 2018 6:27:18 PM] (arp) MAPREDUCE-7061. SingleCluster setup document 
needs to be updated.
[Feb 27, 2018 9:19:16 PM] (wangda) YARN-7893. Document the FPGA isolation 
feature. (Zhankun Tang via
[Feb 27, 2018 9:19:24 PM] (wangda) YARN-7959. Add .vm extension to 
PlacementConstraints.md to ensure proper
[Feb 27, 2018 10:33:57 PM] (billie) YARN-7446. Remove --user flag when running 
privileged mode docker
[Feb 27, 2018 11:28:41 PM] (szetszwo) HDFS-13143. SnapshotDiff - 
snapshotDiffReport might be inconsistent if
[Feb 28, 2018 1:39:02 AM] (inigoiri) HDFS-13199. RBF: Fix the hdfs router page 
missing label icon issue.




-1 overall


The following subsystems voted -1:
findbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose 
internal representation by returning Resource.resources At Resource.java:by 
returning Resource.resources At Resource.java:[line 234] 
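
   This is the classic expose-internal-representation finding. A hedged 
sketch of the generic remedy; the field and method follow the report, but the 
fix shown is illustrative, not the actual YARN patch:
{code}
import java.util.Arrays;

class ResourceInformation { /* stub for illustration */ }

public class Resource {
  private ResourceInformation[] resources;  // assumed initialized elsewhere

  // Flagged: "return resources;" lets callers mutate internal state.
  // Generic remedy: hand out a defensive copy (or an unmodifiable view).
  public ResourceInformation[] getResources() {
    return Arrays.copyOf(resources, resources.length);
  }
}
{code}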

Failed junit tests :

   hadoop.crypto.key.kms.server.TestKMS 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 
   hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.fs.http.server.TestHttpFSServerWebServer 
   hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage 
   hadoop.yarn.client.TestApplicationMasterServiceProtocolForTimelineV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-compile-javac-root.txt
  [280K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/whitespace-eol.txt
  [9.2M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/whitespace-tabs.txt
  [288K]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/xml.txt
  [4.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-common-project_hadoop-kms.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [328K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt
  [24K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [48K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [20K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/706/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [84K]

Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org

[jira] [Created] (HADOOP-15272) Update Guava, see what breaks

2018-02-28 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15272:
---

 Summary: Update Guava, see what breaks
 Key: HADOOP-15272
 URL: https://issues.apache.org/jira/browse/HADOOP-15272
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build
Affects Versions: 3.1.0
Reporter: Steve Loughran


We're still on Guava 11; the last attempt at an update (HADOOP-10101) failed 
to take.

Now that we have better shading, we should try again. I suspect that the YARN 
timeline service is going to be the problem because of its use of HBase. 
That's the price of a loop in the DAG. We cannot keep everything frozen just 
because of that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15271) Remove unicode multibyte characters from JavaDoc

2018-02-28 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created HADOOP-15271:
--

 Summary: Remove unicode multibyte characters from JavaDoc
 Key: HADOOP-15271
 URL: https://issues.apache.org/jira/browse/HADOOP-15271
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: documentation
 Environment: Java 9.0.4, Applied HADOOP-12760 and HDFS-11610
Reporter: Akira Ajisaka


{{mvn package -Pdist,native -Dtar -DskipTests}} fails.
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:3.0.0-M1:jar (module-javadocs) on 
project hadoop-common: MavenReportException: Error while generating Javadoc: 
[ERROR] Exit code: 1 - javadoc: warning - The old Doclet and Taglet APIs in the 
packages
[ERROR] com.sun.javadoc, com.sun.tools.doclets and their implementations
[ERROR] are planned to be removed in a future JDK release. These
[ERROR] components have been superseded by the new APIs in jdk.javadoc.doclet.
[ERROR] Users are strongly recommended to migrate to the new APIs.
[ERROR] 
/home/centos/git/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1652:
 error: unmappable character (0xE2) for encoding US-ASCII
[ERROR]* closed automatically ???these the marked paths will be deleted as 
a result.
[ERROR]   ^
[ERROR] 
/home/centos/git/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1652:
 error: unmappable character (0x80) for encoding US-ASCII
[ERROR]* closed automatically ???these the marked paths will be deleted as 
a result.
[ERROR]^
[ERROR] 
/home/centos/git/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1652:
 error: unmappable character (0x94) for encoding US-ASCII
[ERROR]* closed automatically ???these the marked paths will be deleted as 
a result.
{noformat}
JDK 9 javadoc cannot handle non-ASCII characters, due to 
https://bugs.openjdk.java.net/browse/JDK-8188649.
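
The offending byte sequence (0xE2 0x80 0x94) is the UTF-8 encoding of U+2014 
(em dash). A minimal illustration of the kind of fix, with the 
FileSystem.java comment text reconstructed only approximately from the error 
output:
{code}
// Before: the multibyte dash breaks javadoc with -encoding US-ASCII on JDK 9:
//   * closed automatically \u2014 the marked paths will be deleted as a result.
// After: stick to ASCII (or an HTML entity such as &mdash;, which javadoc
// renders itself):
//   * closed automatically - the marked paths will be deleted as a result.
{code}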



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-02-28 Thread Andrew Wang
Resending since the formatting was messed up, let's try plain text this
time:

Hi Jitendra and all,

Thanks for putting this together. I caught up on the discussion on JIRA and
document at HDFS-10419, and still have the same concerns raised earlier
about merging the Ozone branch to trunk.

To recap these questions/concerns at a very high level:

* Wouldn't Ozone benefit from being a separate project?
* Why should it be merged now?

I still believe that both Ozone and Hadoop would benefit from Ozone being a
separate project, and that there is no pressing reason to merge Ozone/HDSL
now.

The primary reason I've heard for merging is that Ozone is at a stage
where it's ready for user feedback. Second, that it needs to be merged
to start on the NN refactoring for HDFS-on-HDSL.

First, without HDFS-on-HDSL support, users are testing against the Ozone
object storage interface. Ozone and HDSL themselves are implemented as
separate masters and new functionality bolted onto the datanode. It also
doesn't look like HDFS in terms of API or featureset; yes, it speaks
FileSystem, but so do many out-of-tree storage systems like S3, Ceph,
Swift, ADLS etc. Ozone/HDSL does not support popular HDFS features like
erasure coding, encryption, high-availability, snapshots, hflush/hsync (and
thus HBase), or APIs like WebHDFS or NFS. This means that Ozone feels like
a new, different system that could reasonably be deployed and tested
separately from HDFS. It's unlikely to replace many of today's HDFS
deployments, and from what I understand, Ozone was not designed to do this.

Second, the NameNode refactoring for HDFS-on-HDSL by itself is a major
undertaking. The discussion on HDFS-10419 is still ongoing so it’s not
clear what the ultimate refactoring will be, but I do know that the earlier
FSN/BM refactoring during 2.x was very painful (introducing new bugs and
making backports difficult) and probably should have been deferred to a new
major release instead. I think this refactoring is important for the
long-term maintainability of the NN and worth pursuing, but as a Hadoop 4.0
item. Merging HDSL is also not a prerequisite for starting this
refactoring. Really, I see the refactoring as the prerequisite for
HDFS-on-HDSL to be possible.

Finally, I earnestly believe that Ozone/HDSL itself would benefit from
being a separate project. Ozone could release faster and iterate more
quickly if it wasn't hampered by Hadoop's release schedule and security and
compatibility requirements. There are also publicity and community
benefits; it's an opportunity to build a community focused on the novel
capabilities and architectural choices of Ozone/HDSL. There are examples of
other projects that were "incubated" on a branch in the Hadoop repo before
being spun off to great success.

In conclusion, I'd like to see Ozone succeeding and thriving as a separate
project. Meanwhile, we can work on the HDFS refactoring required to
separate the FSN and BM and make it pluggable. At that point (likely in the
Hadoop 4 timeframe), we'll be ready to pursue HDFS-on-HDSL integration.

Best,
Andrew
