[DISCUSS] Guidelines for Code cleanup JIRAs

2020-01-09 Thread epa...@apache.org
There was some discussion on https://issues.apache.org/jira/browse/YARN-9052
about concerns surrounding the costs/benefits of code cleanup JIRAs. This email
is to get the discussion going within a wider audience.

The positive points for code cleanup JIRAs:
- Clean up tech debt
- Make code more readable
- Make code more maintainable
- Make code more performant

The concerns regarding code cleanup JIRAs are as follows:
- If the changes only go into trunk, then contributors and committers trying to
 backport to prior releases will have to create and test multiple patch 
versions.
- Some have voiced concerns that code cleanup JIRAs may not be tested as
  thoroughly as features and bug fixes because functionality is not supposed to
  change.
- Any patches awaiting review that are touching the same code will have to be
  redone, re-tested, and re-reviewed.
- JIRAs that are opened for code cleanup and not worked on right away tend to
  clutter up the JIRA space.

Here are my opinions:
- Code changes of any kind force a non-trivial amount of overhead for other
  developers. For code cleanup JIRAs, sometimes the usability, maintainability,
  and performance gains are worth the overhead (as in the case of YARN-9052).
- Before opening any JIRA, please always consider whether or not the added
  usability will outweigh the added pain you are causing other developers.
- If you believe the benefits outweigh the costs, please backport the changes
  yourself to all active lines. My preference is to port all the way back to 
2.10.
- Please don't run code analysis tools and then open many JIRAs that document
  those findings. That activity puts no thought into the cost-benefit
  analysis.

Thanks everyone. I'm looking forward to your thoughts. I appreciate all you do
for the open source community and it is always a pleasure to work with you.
-Eric Payne

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [DISCUSS] Hadoop 2019 Release Planning

2020-01-09 Thread Steve Loughran
Well volunteered! I will help with the testing

On Mon, Jan 6, 2020 at 10:08 AM Gabor Bota 
wrote:

> I'm interested in doing a release of hadoop.
> The version we need an RM for is 3.1.3, right? What's the target date for that?
>
> Thanks,
> Gabor
>
> On Mon, Jan 6, 2020 at 8:31 AM Akira Ajisaka  wrote:
>
> > Thank you Wangda.
> >
> > Now it's 2020. Let's release Hadoop 3.3.0.
> > I created a wiki page for tracking blocker/critical issues for 3.3.0 and
> > I'll check the issues in the list.
> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.3+Release
> > If you find blocker/critical issues in trunk, please set the target
> version
> > to 3.3.0 for tracking.
> >
> > > We still need RM for 3.3.0 and 3.1.3.
> > I can work as a release manager for 3.3.0. Is there anyone who wants
> > to be an RM?
> >
> > Thanks and regards,
> > Akira
> >
> > On Fri, Aug 16, 2019 at 9:28 PM zhankun tang 
> > wrote:
> >
> > > Thanks Wangda for bringing this up!
> > >
> > > I ran the submarine 0.2.0 release before with a lot of help from folks
> > > especially Sunil. :D
> > > And this time I would like to help release 3.1.4. Thanks!
> > >
> > > BR,
> > > Zhankun
> > >
> > > Hui Fei wrote on Friday, Aug 16, 2019 at 7:19 PM:
> > >
> > > > Hi Wangda,
> > > > Thanks for bringing this up!
> > > > Looking forward to seeing HDFS 3.x widely used, but RollingUpgrade
> > > > is a problem.
> > > > Hope committers watch and review these issues. Thanks:
> > > > https://issues.apache.org/jira/browse/HDFS-13596
> > > > https://issues.apache.org/jira/browse/HDFS-14396
> > > >
> > > > Wangda Tan wrote on Saturday, Aug 10, 2019 at 10:59 AM:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Hope this email finds you well.
> > > > >
> > > > > I want to hear your thoughts about what should be the release plan
> > for
> > > > > 2019.
> > > > >
> > > > > In 2018, we released:
> > > > > - 1 maintenance release of 2.6
> > > > > - 3 maintenance releases of 2.7
> > > > > - 3 maintenance releases of 2.8
> > > > > - 3 releases of 2.9
> > > > > - 4 releases of 3.0
> > > > > - 2 releases of 3.1
> > > > >
> > > > > Total 16 releases in 2018.
> > > > >
> > > > > In 2019, so far we have only two releases:
> > > > > - 1 maintenance release of 3.1
> > > > > - 1 minor release of 3.2.
> > > > >
> > > > > However, the community put a lot of effort into stabilizing
> > > > > features of various release branches.
> > > > > There're:
> > > > > - 217 fixed patches in 3.1.3 [1]
> > > > > - 388 fixed patches in 3.2.1 [2]
> > > > > - 1172 fixed patches in 3.3.0 [3] (OMG!)
> > > > >
> > > > > I think it is time to do maintenance releases of 3.1/3.2 and a
> > > > > minor release for 3.3.0.
> > > > >
> > > > > In addition, I saw community discussion to do a 2.8.6 release for
> > > > security
> > > > > fixes.
> > > > >
> > > > > Any other releases? I think there're release plans for Ozone as
> > > > > well. And please add your thoughts.
> > > > >
> > > > > Volunteers welcome! If you are interested in running a release as
> > > > > Release Manager (or co-Release Manager), please respond to this
> > > > > email thread so we can coordinate.
> > > > >
> > > > > Thanks,
> > > > > Wangda Tan
> > > > >
> > > > > [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
> Fixed
> > > AND
> > > > > fixVersion = 3.1.3
> > > > > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
> Fixed
> > > AND
> > > > > fixVersion = 3.2.1
> > > > > [3] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
> Fixed
> > > AND
> > > > > fixVersion = 3.3.0
> > > > >
> > > >
> > >
> >
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2020-01-09 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1376/

[Jan 8, 2020 2:28:34 AM] (aajisaka) HDFS-15072. HDFS MiniCluster fails to start 
when run in directory path
[Jan 8, 2020 6:31:30 AM] (pjoseph) YARN-10068. Fix TimelineV2Client leaking 
File Descriptors.
[Jan 8, 2020 7:45:39 AM] (github) HDFS-15077. Fix intermittent failure of
[Jan 8, 2020 8:55:17 AM] (rakeshr) HDFS-15080. Fix the issue in reading 
persistent memory cached data with
[Jan 8, 2020 11:25:01 AM] (gabor.bota) HADOOP-16772. Extract version numbers to 
head of pom.xml (addendum)
[Jan 8, 2020 11:32:31 AM] (github) HADOOP-16751. Followup: move java import. 
(#1799)
[Jan 8, 2020 11:46:54 AM] (stevel) HADOOP-16785. Improve wasb and abfs 
resilience on double close() calls.
[Jan 8, 2020 2:28:20 PM] (stevel) HADOOP-16642. ITestDynamoDBMetadataStoreScale 
fails when throttled.
[Jan 8, 2020 5:08:13 PM] (jeagles) MAPREDUCE-7252. Handling 0 progress in 
SimpleExponential task runtime
[Jan 8, 2020 5:29:56 PM] (ericp) YARN-10072: TestCSAllocateCustomResource 
failures. Contributed by Jim
[Jan 8, 2020 7:26:01 PM] (ericp) YARN-7387:




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):

   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 
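The two WorkerId warnings above describe the same missing guard: equals() neither checks for null nor verifies the argument's type before treating it as a WorkerId. A sketch of the null-safe, type-safe pattern FindBugs expects — WorkerId here is an illustrative stand-in, not the actual MaWo class:

```java
import java.util.Objects;

// Illustrative stand-in for the MaWo WorkerId class, showing the guard
// that discharges both FindBugs warnings above.
class WorkerId {
    private final String id;

    WorkerId(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // instanceof is false for null and for any non-WorkerId argument,
        // so this single check covers both warnings before the cast.
        if (!(other instanceof WorkerId)) {
            return false;
        }
        return Objects.equals(id, ((WorkerId) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}
```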

FindBugs :

   module:hadoop-cloud-storage-project/hadoop-cos 
   Redundant nullcheck of dir, which is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:[line 66] 
   org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may 
expose internal representation by returning CosNInputStream$ReadBuffer.buffer 
At CosNInputStream.java:by returning CosNInputStream$ReadBuffer.buffer At 
CosNInputStream.java:[line 87] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, 
byte[]):in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, 
File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long):in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long): new String(byte[]) At 
CosNativeFileSystemStore.java:[line 178] 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, 
String, String, int) may fail to clean up java.io.InputStream Obligation to 
clean up resource created at CosNativeFileSystemStore.java:fail to clean up 
java.io.InputStream Obligation to clean up resource created at 
CosNativeFileSystemStore.java:[line 252] is not discharged 
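The hadoop-cos warnings above flag two recurring Java patterns: String construction that relies on the platform default charset, and an InputStream whose close() may be skipped on an exception path. A hedged, standalone illustration of the fixes (not the actual CosN code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: shows the idioms that discharge the
// "default encoding" and "fail to clean up InputStream" warnings.
class CosnWarningFixes {
    // Fix for "reliance on default encoding": always pass an explicit
    // charset instead of using new String(byte[]).
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Fix for "may fail to clean up java.io.InputStream":
    // try-with-resources guarantees close() even if read() throws.
    static int firstByte(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        }
    }
}
```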

Failed junit tests :

   hadoop.hdfs.TestReconstructStripedFile 
   hadoop.hdfs.server.namenode.TestRedudantBlocks 
   hadoop.hdfs.TestDeadNodeDetection 
   

Re: Reminder: Hadoop Storage Online Meetup tomorrow (Hadoop 2->3 upgrade)

2020-01-09 Thread Wei-Chiu Chuang
Thanks to Wangda's help, I was able to retrieve the recording of this
session.

Please feel free to download the recording at:
https://cloudera.zoom.us/rec/share/7MF_dLX0339OY5391xvkZP8NLrXieaa8gyZK-fYJnUkGOUUXvaUh5cl_6AVYetQl

Non-Mandarin speakers, please send me feedback on what you thought of the
session. I served as the translator this time, and I need your feedback to
improve next time.

On Fri, Jan 3, 2020 at 10:01 PM Wei-Chiu Chuang  wrote:

>
> Hi, it was a well-attended session with more than 40 attendees!
> Thanks Fei Hui for giving us such a great talk.
>
> Here's the summary for your reference.
>
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
> 01/02/2020 Didi talked about their large scale HDFS cluster upgrade
> experience.
>
> Slides:
> https://drive.google.com/open?id=1iwJ1asalYfgnOCBuE-RfeG-NpSocjIcy
>
> Didi studied two upgrade approaches from the community documentation:
> express upgrade and rolling upgrade. Rolling upgrade was selected.
>
> The upgrade involved the HDFS server side only. Clients are still on
> Hadoop 2.7 because applications such as Hive and Spark do not support
> Hadoop 3 yet.
>
> Zookeeper was not upgraded.
>
> Didi practiced upgrade + downgrade more than 10 times before doing it for
> real.
>
> Didi’s largest cluster has 5 federated namespaces, and 10+ thousand nodes.
> The upgrade took a month: JournalNodes took one week, NameNodes two weeks,
> and DataNodes one week.
>
> During upgrade, HDFS does not clean up trash. Because the upgrade window
> was a month long, the trash became a concern, as it could exhaust all
> available space. Didi has a (script?) to clean trash daily.
>
> A problem was encountered which may not be related: clients were
> occasionally unable to close files. Solution: reviewed the DataNode log,
> and found that blocks were not reported in time because deleting blocks
> took too long.
>
> Two parameters were changed to address the issue:
>
> Increase dfs.client.block.write.locateFollowingBlock.retries and
>
> Reduce dfs.block.invalidate.limit (from the default 1000 to 500)
>
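The two parameter changes above would typically land in hdfs-site.xml (the retry setting on the client side, the invalidate limit on the NameNode side). A hedged sketch, not Didi's actual config: 500 is the value the talk stated, while the retry value is a placeholder since only the direction of the change ("increase") was given.

```xml
<!-- Illustrative hdfs-site.xml fragment; not Didi's actual configuration. -->

<!-- Client side: retry longer when the NameNode is slow to confirm the
     previous block before allocating the next one. The value 8 is a
     placeholder; the talk did not state the exact number. -->
<property>
  <name>dfs.client.block.write.locateFollowingBlock.retries</name>
  <value>8</value>
</property>

<!-- NameNode side: send smaller block-deletion batches to each DataNode
     per heartbeat (reduced from the default 1000, per the talk). -->
<property>
  <name>dfs.block.invalidate.limit</name>
  <value>500</value>
</property>
```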
> Didi believes the new upstream change HDFS-14997 can alleviate this issue.
>
> Timeline:
>
> May 2019, verified the plan is good.
>
> July: trial run with a 100-node cluster, completed rolling upgrade
> successfully.
>
> Oct: 300+ node cluster rolling upgrade completed.
>
> Nov: 10-thousand node cluster rolling upgrade completed.
>
> Offline test
>
> Had Spark, Hive and Hadoop full test set. Verified the upgrade/downgrade
> has no impact.
>
> Reviewed the 4000+ patches between Hadoop 2.7 and 3.2, to make sure
> there’s no incompatible changes.
>
> Authored 40+ internal wikis to document the process.
>
> Future:
>
> Didi’s interested in Ozone to address the small file problems.
>
> Want to incorporate the Consistent Read from Standby feature to increase
> NameNode RPC performance.
>
> Finally, DataNode upgrade is hard. Will look into HDFS Maintenance Mode to
> make this easier in the future.
>
> This is a HDFS-only upgrade work. YARN upgrade is planned in the second
> half of 2020. Since the main purpose is to use EC to reduce space usage,
> Didi ported EC client side code to Hadoop 2.7 clients, and these clients
> can read/write EC blocks!
>
>
> On Wed, Jan 1, 2020 at 7:42 PM Wei-Chiu Chuang  wrote:
>
>> Hi,
>> This is a gentle reminder for tomorrow's online meetup. Fei Hui from DiDi
>> is going to give a presentation about DiDi's Hadoop 2 -> Hadoop 3 upgrade
>> experience.
>>
>> We will extend this session to 1 hour. Fei will speak in Mandarin and I
>> will help translate. So non-Mandarin speakers feel free to join!
>>
>> Time/Date:
>> Jan 1 10PM (US west coast PST) / Jan 2 2pm (Beijing, China CST) / Jan 2
>> 11:30am (India, IST) / Jan 2 3pm (Tokyo, Japan, JST)
>>
>> Join Zoom Meeting
>>
>> https://cloudera.zoom.us/j/880548968
>>
>> One tap mobile
>>
>> +16465588656,,880548968# US (New York)
>>
>> +17207072699,,880548968# US
>>
>> Dial by your location
>>
>> +1 646 558 8656 US (New York)
>>
>> +1 720 707 2699 US
>>
>> 877 853 5257 US Toll-free
>>
>> 888 475 4499 US Toll-free
>>
>> Meeting ID: 880 548 968
>> Find your local number: https://zoom.us/u/acaGRDfMVl
>>
>


Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts

2020-01-09 Thread Ayush Saxena
Hi All,
FYI :
We will be going ahead with the present approach and will merge by
tomorrow EOD, considering no one has objections.
Thanks everyone!

-Ayush

> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula  wrote:
> 
> Hi Sree Vaddi, Owen, stack, Duo Zhang,
> 
> We can move forward based on your comments; we are just waiting for your
> reply. I hope all of your comments have been answered. (Unification we
> can consider in a parallel thread, as Vinay mentioned.)
> 
> 
> 
> On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B 
> wrote:
> 
>> Hi Sree,
>> 
>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
>> Project ? Or as a TLP ?
>>> Or as a new project definition ?
>> As already mentioned by Ayush, this will be a subproject of Hadoop.
>> Releases will be voted by Hadoop PMC as per ASF process.
>> 
>> 
>>> The effort to streamline and put in an accepted standard for the
>> dependencies that require shading,
>>> seems beyond the siloed efforts of hadoop, hbase, etc
>> 
>>> I propose, we bring all the decision makers from all these artifacts in
>> one room and decide best course of action.
>> I am looking at: no project should ever have to shade any artifacts
>> except as an absolutely necessary alternative.
>> 
>> This is the ideal proposal for any project. But unfortunately some
>> projects take their own course based on need.
>> 
>> In the current case of protobuf in Hadoop,
>>Protobuf upgrade from 2.5.0 (which is already EOL) was not taken up to
>> avoid downstream failures. Since Hadoop is a platform, its dependencies
>> will get added to downstream projects' classpath. So any change in Hadoop's
>> dependencies will directly affect downstreams. Hadoop strictly follows
>> backward compatibility as far as possible.
>>Though protobuf provides wire compatibility between versions, it doesn't
>> provide compatibility for generated sources.
>>Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
>> technique, Hadoop can internally upgrade to shaded protobuf 3.x and
>> still have 2.5.0 protobuf (deprecated) for downstreams.
>> 
>> This shading is necessary to have both versions of protobuf supported.
>> (2.5.0 (non-shaded) for downstream's classpath and 3.x (shaded) for
>> hadoop's internal usage).
>> And this entire work to be done before 3.3.0 release.
>> 
>> So, though it's ideal to have a common approach for all projects, I
>> suggest for Hadoop we can go ahead as per the current approach.
>> We can also start the parallel effort to address these problems in a
>> separate discussion/proposal. Once the solution is available we can revisit
>> and adopt new solution accordingly in all such projects (ex: HBase, Hadoop,
>> Ratis).
>> 
>> -Vinay
>> 
>>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena  wrote:
>>> 
>>> Hey Sree
>>> 
 apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
 Project ? Or as a TLP ?
 Or as a new project definition ?
 
>>> A sub project of Apache Hadoop, having its own independent release
>> cycles.
>>> May be you can put this into the same column as ozone or as
>>> submarine(couple of months ago).
>>> 
>>> Unifying for all seems interesting, but each project is independent and
>>> has its own limitations and way of thinking. I don't think it would be
>>> an easy task to bring all to the same table and get them to agree on a
>>> common approach.
>>> 
>>> I guess this has been in discussion for quite a while, and there hasn't
>>> been any other alternative suggested. Still, we can hold up for a week;
>>> if someone comes up with a better solution we can adopt it, else we can
>>> continue in the present direction.
>>> 
>>> -Ayush
>>> 
>>> 
>>> 
>>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi wrote:
>>> 
 apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
 Project ? Or as a TLP ?
 Or as a new project definition ?
 
The effort to streamline and put in an accepted standard for the
dependencies that require shading seems beyond the siloed efforts of
hadoop, hbase, etc.
 
I propose we bring all the decision makers from all these artifacts into
one room and decide the best course of action. I am looking at: no project
should ever have to shade any artifacts except as an absolutely necessary
alternative.
 
 
Thank you.
/Sree
 
 
 
On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B <
 vinayakum...@apache.org> wrote:
 
 Hi,
Sorry for the late reply.
>>> To be exact, how can we better use the thirdparty repo? Looking at
>>> HBase as an example, it looks like everything that is known to break a
>>> lot after an update gets shaded into the hbase-thirdparty artifact:
>>> guava, netty, ... etc.
>>> Is the purpose to isolate these naughty dependencies?
Yes, shading is to isolate these naughty dependencies from the downstream
classpath and have independent control over these upgrades without
breaking downstreams.
 
 
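The relocation technique described in the thread above is typically implemented with the maven-shade-plugin. A hedged sketch of the idea: the plugin coordinates are real Maven ones, but the relocation target package is illustrative and not necessarily the actual hadoop-thirdparty layout.

```xml
<!-- Illustrative maven-shade-plugin sketch of package relocation; the
     shadedPattern below is an example, not the verified thirdparty pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Rewrites com.google.protobuf.* references inside the shaded
             jar, so Hadoop's internal protobuf 3.x cannot collide with a
             downstream project's protobuf 2.5.0 on the classpath. -->
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.apache.hadoop.thirdparty.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

This is the same pattern hbase-thirdparty uses for guava and netty: the naughty dependency lives under a project-private package name, so its upgrades stop leaking into downstream classpaths.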

[jira] [Created] (HADOOP-16797) Add dockerfile for ARM builds

2020-01-09 Thread Vinayakumar B (Jira)
Vinayakumar B created HADOOP-16797:
--

 Summary: Add dockerfile for ARM builds
 Key: HADOOP-16797
 URL: https://issues.apache.org/jira/browse/HADOOP-16797
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Vinayakumar B


Similar to x86 docker image in {{dev-support/docker/Dockerfile}},
add one more Dockerfile to support aarch64 builds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
