Agenda & More Information about Hadoop Community Meetup @ Palo Alto, June 26

2019-06-19 Thread Wangda Tan
Hi All,

I want to let you know that we have confirmed most of the agenda for the
Hadoop Community Meetup. It will be a whole-day event.

Agenda and dial-in info are below. Please RSVP
at https://www.meetup.com/Hadoop-Contributors/events/262055924/

Huge thanks to Daniel Templeton, Wei-Chiu Chuang, Christina Vu for helping
with organizing and logistics.

Please help promote the meetup information on Twitter, LinkedIn, etc.
Appreciated!

Best,
Wangda

AM:

9:00: Arrival and check-in

9:30 - 10:15: Talk: Hadoop storage in cloud-native environments
Abstract: Hadoop is a mature storage system, but it was designed years before
the cloud-native movement. Kubernetes and other cloud-native tools are
emerging solutions for containerized environments, but they sometimes require
different approaches. In this presentation we would like to share our
experience running Apache Hadoop Ozone in Kubernetes and its connection points
to other elements of the cloud-native ecosystem. We will compare the benefits
and drawbacks of using Kubernetes and Hadoop storage together and show our
current achievements and future plans.
Speaker: Marton Elek (Cloudera)

10:20 - 11:00: Talk: Selective Wire Encryption In HDFS
Abstract: Wire data encryption is a key component of the Hadoop Distributed
File System (HDFS). However, such encryption enforcement comes in as an
all-or-nothing feature. In our use case at LinkedIn, we would like to
selectively expose fast unencrypted access to fully managed internal clients,
which can be trusted, while exposing only encrypted access to clients outside
of the trusted circle with higher security risks. That way we minimize
performance overhead for trusted internal clients while still securing data
from potential outside threats. Our design extends the HDFS NameNode to run on
multiple ports; connecting to different NameNode ports ends up with different
levels of encryption protection. This protection is then enforced for both
NameNode RPC and the subsequent data transfers to and from DataNodes. This
approach comes with minimal operational and performance overhead.
Speakers: Konstantin Shvachko (LinkedIn), Chen Liang (LinkedIn)

11:10 - 11:55: Talk: YuniKorn: Next Generation Scheduling for YARN and K8s
Abstract: We will talk about our open source work, the YuniKorn scheduler
project (Y for YARN, K for K8s, uni- for Unified). It brings long-wanted
features such as hierarchical queues, fairness between users/jobs/queues, and
preemption to Kubernetes, and it brings service scheduling enhancements to
YARN. Any improvement to this scheduler can benefit both the Kubernetes and
YARN communities.
Speaker: Wangda Tan (Cloudera)

PM:

12:00 - 12:55: Lunch break (provided by Cloudera)

1:00 - 1:25: Talk: YARN Efficiency at Uber
Abstract: We will present the work done at Uber to improve YARN cluster
utilization and job SOA with elastic resource management, low compute workload
on the passive datacenter, preemption, larger containers, etc. We will also go
through the YARN upgrade done in order to adopt new features and talk about
the challenges.
Speakers: Aihua Xu (Uber), Prashant Golash (Uber)

1:30 - 2:10: One more talk

2:20 - 4:00: BoF sessions, breakout sessions, and group discussions: topics
such as JDK 11 support, next releases (2.10.0, 3.3.0, etc.), Hadoop on Cloud,
etc.

4:00: Reception provided by Cloudera

Join Zoom Meeting: https://cloudera.zoom.us/j/116816195
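
Editor's note: as a small illustration of the port-based design described in
the selective wire encryption abstract above, here is a minimal client-side
sketch in Java. The configuration key, host name, and port numbers are
assumptions based on the related upstream HDFS work and may differ from the
actual implementation.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SelectivePortSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative NameNode-side setting (normally in hdfs-site.xml):
        // expose an extra listener whose ingress port is mapped to encrypted
        // ("privacy") SASL protection, while the default port stays unencrypted.
        conf.set("dfs.namenode.rpc-address.auxiliary-ports", "8021"); // assumed key

        // Trusted internal clients keep using the default, unencrypted port ...
        FileSystem internal =
            FileSystem.get(URI.create("hdfs://nn.example.com:8020/"), conf);
        // ... while outside clients connect to the auxiliary port, where both
        // NameNode RPC and the subsequent DataNode transfers are encrypted.
        FileSystem external =
            FileSystem.get(URI.create("hdfs://nn.example.com:8021/"), conf);

        System.out.println(internal.exists(new Path("/")));
        System.out.println(external.exists(new Path("/")));
      }
    }

The point of the design is that clients need no code changes; the security
level they get is decided purely by which NameNode port their URI targets.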


Re: [DISCUSS] A unified and open Hadoop community sync up schedule?

2019-06-19 Thread Lars Francke
Just another thing you might be interested in: The Comdev project recently
voted on creating an iot@apache (exact name and details still unclear)
mailing list for cross-project collaboration.

It's been on my list of things to do to propose the same for "bigdata" (I
know the name sucks but I don't have a better one) as a single place for
projects to coordinate with each other in public without having to subscribe
to each other's mailing lists.

On Wed, Jun 19, 2019 at 6:03 PM Sunil Govindan  wrote:

> Hi Eric
>
> This community meeting will be starting from next week onwards
>
> - Sunil
>
> On Wed, Jun 19, 2019 at 9:14 PM Eric Badger
>  wrote:
>
> > Is there a YARN call today (in 16 minutes)? I saw it on the calendar
> until
> > a few minutes ago.
> >
> > Eric
> >
> > On Tue, Jun 18, 2019 at 11:18 PM Wangda Tan  wrote:
> >
> > > Thanks @Wei-Chiu Chuang  . updated gdoc
> > >
> > > On Tue, Jun 18, 2019 at 7:35 PM Wei-Chiu Chuang 
> > > wrote:
> > >
> > > > Thanks Wangda,
> > > >
> > > > I just like to make a correction -- the .ics calendar file says the
> > first
> > > > Wednesday for HDFS/cloud connector is in Mandarin whereas on the gdoc
> > is
> > > to
> > > > host it on the third Wednesday.
> > > >
> > > > On Tue, Jun 18, 2019 at 5:29 PM Wangda Tan 
> > wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I just updated doc:
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#
> > > > > with
> > > > > dial-in information, notes, etc.
> > > > >
> > > > > Here's a calendar to subscribe:
> > > > >
> > > > >
> > > >
> > >
> >
> https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics
> > > > >
> > > > > I'm thinking to give it a try from next week, any suggestions?
> > > > >
> > > > > Thanks,
> > > > > Wangda
> > > > >
> > > > > On Fri, Jun 14, 2019 at 4:02 PM Wangda Tan 
> > > wrote:
> > > > >
> > > > > > And please let me know if you can help with coordinate logistics
> > > stuff,
> > > > > > cross-checking, etc. Let's spend some time next week to get it
> > > > finalized.
> > > > > >
> > > > > > Thanks,
> > > > > > Wangda
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 4:00 PM Wangda Tan 
> > > > wrote:
> > > > > >
> > > > > >> Hi Folks,
> > > > > >>
> > > > > >> Yufei: Agree with all your opinions.
> > > > > >>
> > > > > >> Anu: it might be more efficient to use Google doc to track
> meeting
> > > > > >> minutes and we can put them together.
> > > > > >>
> > > > > >> I just put the proposal to
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://calendar.google.com/calendar/b/3?cid=aGFkb29wLmNvbW11bml0eS5zeW5jLnVwQGdtYWlsLmNvbQ
> > > > > ,
> > > > > >> you can check if the proposal time works or not. If you agree,
> we
> > > can
> > > > go
> > > > > >> ahead to add meeting link, google doc, etc.
> > > > > >>
> > > > > >> If you want to have edit permissions, please drop a private
> email
> > to
> > > > me
> > > > > >> so I will add you.
> > > > > >>
> > > > > >> We still need more hosts, in each track, ideally we should have
> at
> > > > least
> > > > > >> 3 hosts per track just like HDFS blocks :), please volunteer, so
> > we
> > > > can
> > > > > >> have enough members to run the meeting.
> > > > > >>
> > > > > >> Let's shoot by end of the next week, let's get all logistics
> done
> > > and
> > > > > >> starting community sync up series from the week of Jun 25th.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Wangda
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Wangda
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jun 11, 2019 at 10:23 AM Anu Engineer <
> > > aengin...@cloudera.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> For Ozone, we have started using the Wiki itself as the agenda
> > and
> > > > > after
> > > > > >>> the meeting is over, we convert it into the meeting notes.
> > > > > >>> Here is an example, the project owner can edit and maintain it,
> > it
> > > is
> > > > > >>> like 10 mins work - and allows anyone to add stuff into the
> > agenda
> > > > too.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/2019-06-10+Meeting+notes
> > > > > >>>
> > > > > >>> --Anu
> > > > > >>>
> > > > > >>> On Tue, Jun 11, 2019 at 10:20 AM Yufei Gu <
> flyrain...@gmail.com>
> > > > > wrote:
> > > > > >>>
> > > > >  +1 for this idea. Thanks Wangda for bringing this up.
> > > > > 
> > > > >  Some comments to share:
> > > > > 
> > > > > - Agenda needed to be posted ahead of meeting and welcome
> any
> > > > >  interested
> > > > > party to contribute to topics.
> > > > > - We should encourage more people to attend. That's whole
> > point
> > > > of
> > > > >  the
> > > > > meeting.
> > > > > - Hopefully, this can mitigate the situation that some
> > patches
> > > > are
> > > > > waiting for review for ever, which turns away 

Re: June Hadoop Community Meetup

2019-06-19 Thread Vinod Kumar Vavilapalli
Plus general@ list.

Thanks
+Vinod

> On Jun 4, 2019, at 9:47 PM, Daniel Templeton  
> wrote:
> 
> The meetup page is now live:
> 
>https://www.meetup.com/Hadoop-Contributors/events/262055924
> 
> I'll fill in the agenda details after we get them nailed down.  The meetup 
> will be an all-day event on June 26th with lunch provided and a reception 
> after.  Let me know if there are any questions.
> 
> Hope to see you there!
> Daniel
> 
> On 5/23/19 10:57 AM, Daniel Templeton wrote:
>> Hi, all!  I want to let you know that Cloudera is planning to host a 
>> contributors meetup on June 26 at our Palo Alto headquarters. We're still 
>> working out the details, but we're hoping to follow the format that Oath and 
>> LinkedIn followed during the last two. Please feel free to reach out to me 
>> if you have a topic you'd like to propose for the meetup.  I will also be 
>> reaching out to key folks from the community to solicit ideas.  I will send 
>> out an update with more details when I have more to share.
>> 
>> Thanks!
>> Daniel
> 
> 



Re: [DISCUSS] A unified and open Hadoop community sync up schedule?

2019-06-19 Thread Sunil Govindan
Hi Eric

This community meeting will be starting from next week onwards

- Sunil

On Wed, Jun 19, 2019 at 9:14 PM Eric Badger
 wrote:

> Is there a YARN call today (in 16 minutes)? I saw it on the calendar until
> a few minutes ago.
>
> Eric
>
> On Tue, Jun 18, 2019 at 11:18 PM Wangda Tan  wrote:
>
> > Thanks @Wei-Chiu Chuang  . updated gdoc
> >
> > On Tue, Jun 18, 2019 at 7:35 PM Wei-Chiu Chuang 
> > wrote:
> >
> > > Thanks Wangda,
> > >
> > > I just like to make a correction -- the .ics calendar file says the
> first
> > > Wednesday for HDFS/cloud connector is in Mandarin whereas on the gdoc
> is
> > to
> > > host it on the third Wednesday.
> > >
> > > On Tue, Jun 18, 2019 at 5:29 PM Wangda Tan 
> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I just updated doc:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#
> > > > with
> > > > dial-in information, notes, etc.
> > > >
> > > > Here's a calendar to subscribe:
> > > >
> > > >
> > >
> >
> https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics
> > > >
> > > > I'm thinking to give it a try from next week, any suggestions?
> > > >
> > > > Thanks,
> > > > Wangda
> > > >
> > > > On Fri, Jun 14, 2019 at 4:02 PM Wangda Tan 
> > wrote:
> > > >
> > > > > And please let me know if you can help with coordinate logistics
> > stuff,
> > > > > cross-checking, etc. Let's spend some time next week to get it
> > > finalized.
> > > > >
> > > > > Thanks,
> > > > > Wangda
> > > > >
> > > > > On Fri, Jun 14, 2019 at 4:00 PM Wangda Tan 
> > > wrote:
> > > > >
> > > > >> Hi Folks,
> > > > >>
> > > > >> Yufei: Agree with all your opinions.
> > > > >>
> > > > >> Anu: it might be more efficient to use Google doc to track meeting
> > > > >> minutes and we can put them together.
> > > > >>
> > > > >> I just put the proposal to
> > > > >>
> > > >
> > >
> >
> https://calendar.google.com/calendar/b/3?cid=aGFkb29wLmNvbW11bml0eS5zeW5jLnVwQGdtYWlsLmNvbQ
> > > > ,
> > > > >> you can check if the proposal time works or not. If you agree, we
> > can
> > > go
> > > > >> ahead to add meeting link, google doc, etc.
> > > > >>
> > > > >> If you want to have edit permissions, please drop a private email
> to
> > > me
> > > > >> so I will add you.
> > > > >>
> > > > >> We still need more hosts, in each track, ideally we should have at
> > > least
> > > > >> 3 hosts per track just like HDFS blocks :), please volunteer, so
> we
> > > can
> > > > >> have enough members to run the meeting.
> > > > >>
> > > > >> Let's shoot by end of the next week, let's get all logistics done
> > and
> > > > >> starting community sync up series from the week of Jun 25th.
> > > > >>
> > > > >> Thanks,
> > > > >> Wangda
> > > > >>
> > > > >> Thanks,
> > > > >> Wangda
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Tue, Jun 11, 2019 at 10:23 AM Anu Engineer <
> > aengin...@cloudera.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> For Ozone, we have started using the Wiki itself as the agenda
> and
> > > > after
> > > > >>> the meeting is over, we convert it into the meeting notes.
> > > > >>> Here is an example, the project owner can edit and maintain it,
> it
> > is
> > > > >>> like 10 mins work - and allows anyone to add stuff into the
> agenda
> > > too.
> > > > >>>
> > > > >>>
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/2019-06-10+Meeting+notes
> > > > >>>
> > > > >>> --Anu
> > > > >>>
> > > > >>> On Tue, Jun 11, 2019 at 10:20 AM Yufei Gu 
> > > > wrote:
> > > > >>>
> > > >  +1 for this idea. Thanks Wangda for bringing this up.
> > > > 
> > > >  Some comments to share:
> > > > 
> > > > - Agenda needed to be posted ahead of meeting and welcome any
> > > >  interested
> > > > party to contribute to topics.
> > > > - We should encourage more people to attend. That's whole
> point
> > > of
> > > >  the
> > > > meeting.
> > > > - Hopefully, this can mitigate the situation that some
> patches
> > > are
> > > > waiting for review for ever, which turns away new
> contributors.
> > > > - 30m per session sounds a little bit short, we can try it
> out
> > > and
> > > >  see
> > > > if extension is needed.
> > > > 
> > > >  Best,
> > > > 
> > > >  Yufei
> > > > 
> > > >  `This is not a contribution`
> > > > 
> > > > 
> > > >  On Fri, Jun 7, 2019 at 4:39 PM Wangda Tan 
> > > > wrote:
> > > > 
> > > >  > Hi Hadoop-devs,
> > > >  >
> > > >  > Previous we have regular YARN community sync up (1 hr,
> biweekly,
> > > but
> > > >  not
> > > >  > open to public). Recently because of changes in our schedules,
> > > Less
> > > >  folks
> > > >  > showed up in the sync up for the last several months.
> > > >  >
> > > >  > I saw the K8s community did a pretty good job to run their sig
> > > >  meetings,
> > > >  > 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2019-06-19 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/

[Jun 18, 2019 2:55:56 AM] (weichiu) HADOOP-14807. should prevent the 
possibility of NPE about
[Jun 18, 2019 3:18:53 AM] (weichiu) HDFS-13730. 
BlockReaderRemote.sendReadResult throws NPE. Contributed by
[Jun 18, 2019 4:23:52 AM] (ztang) YARN-9584. Should put initializeProcessTrees 
method call before get pid.
[Jun 18, 2019 4:48:16 AM] (weichiu) HDFS-12770. Add doc about how to disable 
client socket cache.
[Jun 18, 2019 4:51:33 AM] (weichiu) HADOOP-9157. Better option for curl in 
hadoop-auth-examples. Contributed
[Jun 18, 2019 5:45:52 AM] (weichiu) HDFS-14340. Lower the log level when can't 
get postOpAttr. Contributed
[Jun 18, 2019 5:54:21 AM] (iwasakims) YARN-9630. [UI2] Add a link in docs's top 
page
[Jun 18, 2019 5:56:00 AM] (weichiu) HADOOP-15914. hadoop jar command has no 
help argument. Contributed by
[Jun 18, 2019 6:06:02 AM] (weichiu) HDFS-12315. Use Path instead of String to 
check closedFiles set.
[Jun 18, 2019 6:11:25 AM] (weichiu) HDFS-12314. Typo in the
[Jun 18, 2019 6:47:57 AM] (weichiu) HADOOP-16156. [Clean-up] Remove NULL check 
before instanceof and fix
[Jun 18, 2019 6:55:38 AM] (elek) HDDS-1694. TestNodeReportHandler is failing 
with NPE
[Jun 18, 2019 7:35:48 AM] (weichiu) HDFS-14010. Pass correct DF usage to 
ReservedSpaceCalculator builder.
[Jun 18, 2019 3:51:16 PM] (bharat) HDDS-1670. Add limit support to 
/api/containers and /api/containers/{id}
[Jun 18, 2019 4:58:29 PM] (inigoiri) HDFS-14201. Ability to disallow safemode 
NN to become active.
[Jun 18, 2019 5:21:22 PM] (templedf) HDFS-14487. Missing Space in Client Error 
Message (Contributed by Shweta
[Jun 18, 2019 5:51:12 PM] (eyang) YARN-9574. Update 
hadoop-yarn-applications-mawo artifactId to match
[Jun 18, 2019 6:13:13 PM] (bharat) HDDS-1699. Update RocksDB version to 6.0.1 
(#980)
[Jun 18, 2019 7:01:26 PM] (weichiu) HDFS-14078. Admin helper fails to prettify 
NullPointerExceptions.
[Jun 18, 2019 9:44:23 PM] (aengineer) HDDS-1702. Optimize Ozone Recon build 
time (#982)
[Jun 18, 2019 11:08:48 PM] (github) HDDS-1684. OM should create Ratis related 
dirs only if ratis is enabled




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
 
   Unread field:TimelineEventSubDoc.java:[line 56] 
   Unread field:TimelineMetricSubDoc.java:[line 44] 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 

Failed junit tests :

   hadoop.util.TestReadWriteDiskValidator 
   hadoop.metrics2.sink.TestFileSink 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapreduce.v2.app.TestRuntimeEstimators 
   hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis 
   hadoop.ozone.client.rpc.TestOzoneAtRestEncryption 
   hadoop.ozone.client.rpc.TestOzoneRpcClient 
   hadoop.ozone.client.rpc.TestSecureOzoneRpcClient 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/diff-compile-javac-root.txt
  [332K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/diff-checkstyle-root.txt
  [17M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/diff-patch-hadolint.txt
  [8.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1172/artifact/out/diff-patch-pylint.txt
  [120K]

   shellcheck:

   

Re: [DISCUSS] A unified and open Hadoop community sync up schedule?

2019-06-19 Thread Eric Badger
Is there a YARN call today (in 16 minutes)? I saw it on the calendar until
a few minutes ago.

Eric

On Tue, Jun 18, 2019 at 11:18 PM Wangda Tan  wrote:

> Thanks @Wei-Chiu Chuang  . updated gdoc
>
> On Tue, Jun 18, 2019 at 7:35 PM Wei-Chiu Chuang 
> wrote:
>
> > Thanks Wangda,
> >
> > I just like to make a correction -- the .ics calendar file says the first
> > Wednesday for HDFS/cloud connector is in Mandarin whereas on the gdoc is
> to
> > host it on the third Wednesday.
> >
> > On Tue, Jun 18, 2019 at 5:29 PM Wangda Tan  wrote:
> >
> > > Hi Folks,
> > >
> > > I just updated doc:
> > >
> > >
> >
> https://docs.google.com/document/d/1GfNpYKhNUERAEH7m3yx6OfleoF3MqoQk3nJ7xqHD9nY/edit#
> > > with
> > > dial-in information, notes, etc.
> > >
> > > Here's a calendar to subscribe:
> > >
> > >
> >
> https://calendar.google.com/calendar/ical/hadoop.community.sync.up%40gmail.com/public/basic.ics
> > >
> > > I'm thinking to give it a try from next week, any suggestions?
> > >
> > > Thanks,
> > > Wangda
> > >
> > > On Fri, Jun 14, 2019 at 4:02 PM Wangda Tan 
> wrote:
> > >
> > > > And please let me know if you can help with coordinate logistics
> stuff,
> > > > cross-checking, etc. Let's spend some time next week to get it
> > finalized.
> > > >
> > > > Thanks,
> > > > Wangda
> > > >
> > > > On Fri, Jun 14, 2019 at 4:00 PM Wangda Tan 
> > wrote:
> > > >
> > > >> Hi Folks,
> > > >>
> > > >> Yufei: Agree with all your opinions.
> > > >>
> > > >> Anu: it might be more efficient to use Google doc to track meeting
> > > >> minutes and we can put them together.
> > > >>
> > > >> I just put the proposal to
> > > >>
> > >
> >
> https://calendar.google.com/calendar/b/3?cid=aGFkb29wLmNvbW11bml0eS5zeW5jLnVwQGdtYWlsLmNvbQ
> > > ,
> > > >> you can check if the proposal time works or not. If you agree, we
> can
> > go
> > > >> ahead to add meeting link, google doc, etc.
> > > >>
> > > >> If you want to have edit permissions, please drop a private email to
> > me
> > > >> so I will add you.
> > > >>
> > > >> We still need more hosts, in each track, ideally we should have at
> > least
> > > >> 3 hosts per track just like HDFS blocks :), please volunteer, so we
> > can
> > > >> have enough members to run the meeting.
> > > >>
> > > >> Let's shoot by end of the next week, let's get all logistics done
> and
> > > >> starting community sync up series from the week of Jun 25th.
> > > >>
> > > >> Thanks,
> > > >> Wangda
> > > >>
> > > >> Thanks,
> > > >> Wangda
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Jun 11, 2019 at 10:23 AM Anu Engineer <
> aengin...@cloudera.com
> > >
> > > >> wrote:
> > > >>
> > > >>> For Ozone, we have started using the Wiki itself as the agenda and
> > > after
> > > >>> the meeting is over, we convert it into the meeting notes.
> > > >>> Here is an example, the project owner can edit and maintain it, it
> is
> > > >>> like 10 mins work - and allows anyone to add stuff into the agenda
> > too.
> > > >>>
> > > >>>
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/2019-06-10+Meeting+notes
> > > >>>
> > > >>> --Anu
> > > >>>
> > > >>> On Tue, Jun 11, 2019 at 10:20 AM Yufei Gu 
> > > wrote:
> > > >>>
> > >  +1 for this idea. Thanks Wangda for bringing this up.
> > > 
> > >  Some comments to share:
> > > 
> > > - Agenda needed to be posted ahead of meeting and welcome any
> > >  interested
> > > party to contribute to topics.
> > > - We should encourage more people to attend. That's whole point
> > of
> > >  the
> > > meeting.
> > > - Hopefully, this can mitigate the situation that some patches
> > are
> > > waiting for review for ever, which turns away new contributors.
> > > - 30m per session sounds a little bit short, we can try it out
> > and
> > >  see
> > > if extension is needed.
> > > 
> > >  Best,
> > > 
> > >  Yufei
> > > 
> > >  `This is not a contribution`
> > > 
> > > 
> > >  On Fri, Jun 7, 2019 at 4:39 PM Wangda Tan 
> > > wrote:
> > > 
> > >  > Hi Hadoop-devs,
> > >  >
> > >  > Previous we have regular YARN community sync up (1 hr, biweekly,
> > but
> > >  not
> > >  > open to public). Recently because of changes in our schedules,
> > Less
> > >  folks
> > >  > showed up in the sync up for the last several months.
> > >  >
> > >  > I saw the K8s community did a pretty good job to run their sig
> > >  meetings,
> > >  > there's regular meetings for different topics, notes, agenda,
> etc.
> > >  Such as
> > >  >
> > >  >
> > > 
> > >
> >
> https://docs.google.com/document/d/13mwye7nvrmV11q9_Eg77z-1w3X7Q1GTbslpml4J7F3A/edit
> > >  >
> > >  >
> > >  > For Hadoop community, there are less such regular meetings open
> to
> > > the
> > >  > public except for Ozone project and offline meetups or
> > >  Bird-of-Features in
> > >  > Hadoop/DataWorks Summit. Recently we have a 

HDFS-13287 broke the build; reverted

2019-06-19 Thread Wei-Chiu Chuang
Sorry for the cross-post.

If your build breaks, it's on me. I committed HDFS-13287, which
resulted in a compilation error in trunk/branch-3.2/branch-3.1. The commit
has since been reverted.

Thanks Akira for identifying the issue!


Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2019-06-19 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/

[Jun 18, 2019 3:20:34 AM] (weichiu) HDFS-13730. 
BlockReaderRemote.sendReadResult throws NPE. Contributed by
[Jun 18, 2019 7:25:53 AM] (weichiu) HDFS-13770. dfsadmin -report does not 
always decrease "missing blocks
[Jun 18, 2019 9:38:42 PM] (weichiu) HDFS-14101. Random failure of 
testListCorruptFilesCorruptedBlock.




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient 
non-serializable instance field map In GlobalStorageStatistics.java:instance 
field map In GlobalStorageStatistics.java 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.TestRollingUpgrade 
   hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.nodemanager.containermanager.TestContainerManager 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
   hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities 
   hadoop.mapreduce.TestMapReduceLazyOutput 
   hadoop.mapreduce.v2.TestMiniMRProxyUser 
   hadoop.mapreduce.v2.TestUberAM 
   hadoop.mapreduce.v2.TestMRJobsWithHistoryService 
   hadoop.mapred.gridmix.TestLoadJob 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-compile-cc-root-jdk1.8.0_212.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-compile-javac-root-jdk1.8.0_212.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-patch-shellcheck.txt
  [72K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/whitespace-tabs.txt
  [1.2M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/357/artifact/out/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
  [8.0K]
   

[jira] [Created] (HADOOP-16382) clock skew can cause S3Guard to think object metadata is out of date

2019-06-19 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-16382:
---

 Summary: clock skew can cause S3Guard to think object metadata is 
out of date
 Key: HADOOP-16382
 URL: https://issues.apache.org/jira/browse/HADOOP-16382
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


When an S3Guard entry is added for an object, its last-updated flag is taken 
from the local clock: if a getFileStatus is made immediately afterwards, the 
timestamp of the file from the HEAD may be greater than the local time, so the 
DDB entry gets updated.

This happens even if the clocks are *close*. When updating an entry from S3, 
the actual timestamp of the file should be used, not the local clock.
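
Editor's note: a minimal sketch of the failure mode and the proposed
direction, using made-up class and field names rather than the real S3Guard
metadata store API:

    // Illustrative only: these are not the actual S3Guard classes.
    class MetadataEntrySketch {
      long lastUpdatedLocalClock;  // stamped from the local clock when written
      long s3LastModifiedAtWrite;  // Last-Modified observed from S3 when written

      // Behaviour described above: comparing against the local clock stamp means
      // a HEAD whose Last-Modified is slightly ahead of the local clock makes the
      // entry look out of date even though nothing changed.
      boolean looksStaleWithLocalClock(long headLastModified) {
        return headLastModified > lastUpdatedLocalClock;
      }

      // Proposed direction: compare against the object's own timestamp recorded
      // when the entry was written, which is unaffected by local clock skew.
      boolean looksStaleWithObjectTime(long headLastModified) {
        return headLastModified > s3LastModifiedAtWrite;
      }
    }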



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




Re: HDFS Scalability Limit?

2019-06-19 Thread Konstantin Shvachko
Hi Wei-Chiu,

We run Hadoop installations similar to what Kihwal describes. Thanks for
sharing, Kihwal.
Our clusters are also not federated.
So far the growth rate is the same as we reported in our talk last year
(slide #5), 2x per year:
https://www.slideshare.net/Hadoop_Summit/scaling-hadoop-at-linkedin-107176757

We track three main metrics: Total space used, Number of Objects (Files+
blocks), and Number of Tasks per day.
I found that the number of nodes is mostly irrelevant in measuring cluster
size, since the nodes are very diverse in configuration and are constantly
being upgraded, so you may have the same number of nodes but many more drives,
cores, and RAM on each of them - a bigger cluster.

I do not see 200 GB heap size as a limit. We ran Dynamometer experiments
with a bigger heap fitting 1 billion files and blocks. It should be doable,
but we may hit other scalability limits when we get to so many objects. See
Erik's talk discussing the experiments and solutions:
https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dynamometer-and-a-case-study-in-namenode-gc
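
Editor's note: a back-of-envelope check using only the figures quoted in this
thread (500M files at roughly 200 GB of heap, assuming about one block per
file); the resulting bytes-per-object figure is an estimate, not a measured
constant.

    public class NameNodeHeapEnvelope {
      public static void main(String[] args) {
        long files = 500_000_000L;           // figure quoted in this thread
        long blocks = 500_000_000L;          // assumption: ~1 block per file
        long objects = files + blocks;       // "Number of Objects (Files + blocks)"
        double heapBytes = 200.0 * 1024 * 1024 * 1024;  // ~200 GB observed heap

        double bytesPerObject = heapBytes / objects;    // roughly 215 bytes/object
        // Extrapolate to 1B files + 1B blocks, the Dynamometer experiment scale:
        double projectedGb = bytesPerObject * 2_000_000_000L / (1024.0 * 1024 * 1024);

        System.out.printf("~%.0f bytes/object, ~%.0f GB heap at 2B objects%n",
            bytesPerObject, projectedGb);
      }
    }

This lines up with the 200-400 GB heaps mentioned elsewhere in the thread and
is only meant to show the order of magnitude, not a tuning recommendation.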

Well Hadoop scalability has always been a moving target for me. I don't
think you can set it in stone once and for all.

Thanks,
--Konstantin

On Sat, Jun 15, 2019 at 5:20 PM Wei-Chiu Chuang  wrote:

> Thank you, Kihwal for the insightful comments!
>
> As I understand it, Yahoo's ops team has a good control of application
> behavior. I tend to be conservative in terms of number of files and
> heap size. We don't have such luxury, and our customers have a wide
> spectrum of workloads and features (e.g., snapshots, data at-rest
> encryption, Impala).
>
> Yes -- decomm/recomm is a pain, and I am working with my colleague,
> @Stephen
> O'Donnell  , to address this problem. Have you
> tried maintenance mode? It's in Hadoop 2.9 but a number of decomm/recomm
> needs are alleviated by maintenance mode.
>
> I know Twitter is a big user of maintenance mode, and I'm wondering if
> Twitter folks can offer some experience with it at large scale. CDH
> supports maintenance mode, but our users don't seem to be quite familiar
> with it. Are there issues that were dealt with, but not reported in the
> JIRA? Basically, I'd like to know the operational complexity of this
> feature at large scale.
>
> On Thu, Jun 13, 2019 at 4:00 PM Kihwal Lee  .invalid>
> wrote:
>
> > Hi Wei-Chiu,
> >
> > We have experience with 5,000 - 6,000 node clusters.  Although it
> ran/runs
> > fine, any heavy hitter activities such as decommissioning needed to be
> > carefully planned.   In terms of files and blocks, we have multiple
> > clusters running stable with over 500M files and blocks.  Some at over
> 800M
> > with the max heap at 256GB. It can probably go higher, but we haven't
> done
> > performance testing & optimizations beyond 256GB yet.  All our clusters
> are
> > un-federated. Funny how the feature was developed in Yahoo! and ended up
> > not being used here. :)  We have a cluster with about 180PB of
> provisioned
> > space. Many clusters are using over 100PB in their steady state.  We
> don't
> > run datanodes too dense, so can't tell what the per-datanode limit is.
> >
> > Thanks and 73
> > Kihwal
> >
> > On Thu, Jun 13, 2019 at 1:57 PM Wei-Chiu Chuang 
> > wrote:
> >
> >> Hi community,
> >>
> >> I am currently drafting a HDFS scalability guideline doc, and I'd like
> to
> >> understand any data points regarding HDFS scalability limit. I'd like to
> >> share it publicly eventually.
> >>
> >> As an example, through my workplace, and through community chatters, I
> am
> >> aware that HDFS is capable of operating at the following scale:
> >>
> >> Number of DataNodes:
> >> Unfederated: I can reasonably believe a single HDFS NameNode can manage
> up
> >> to 4000 DataNodes. Is there any one who would like to share an even
> larger
> >> cluster?
> >>
> >> Federated: I am aware of one federated HDFS cluster composed of 20,000
> >> DataNodes. JD.com
> >> <
> >>
> https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/64692
> >> >
> >> has a 15K DN cluster and 210PB total capacity. I suspect it's a
> federated
> >> HDFS cluster.
> >>
> >> Number of blocks & files:
> >> 500 million files seems to be the upper limit at this point. At
> >> this
> >> scale NameNode consumes around 200GB heap, and my experience told me any
> >> number beyond 200GB is unstable. But at some point I recalled some one
> >> mentioned a 400GB NN heap.
> >>
> >> Amount of Data:
> >> I am aware a few clusters more than 100PB in size (federated or not) --
> >> Uber, Yahoo Japan, JD.com.
> >>
> >> Number of Volumes in a DataNode:
> >> DataNodes with 24 volumes is known to work reasonably well. If DataNode
> is
> >> used for archival use cases, a DN can have up to 48 volumes. This is
> >> certainly hardware dependent, but if I know where the current limit is,
> I
> >> can start optimizing the software.
> >>
> >> Total disk space:
> >> CDH
> >> <
> >>
>