Re: Java 8 Lambdas
LambdaTestUtils added l-expression based intercept() in HADOOP-13716 in October 2016. Five Years Ago. That was still java-7...we added it knowing what java 8 would bring. There is no way we could go back to not using intercept() in tests.

Since then, some other big l-expression work I've been involved in includes org.apache.hadoop.fs.s3a.Invoker, which lets you execute a remote operation with conversion of AWS SDK exceptions into IOEs, and a retry policy based off those IOEs.

    final String region = invoker.retry("getBucketLocation()", bucketName, true,
        () -> s3.getBucketLocation(bucketName));

That was HADOOP-13786, S3A committers: 2017. Four years ago. Which was after branch-3 was java 8 only.

More recently, if you look closely, the whole org.apache.hadoop.util.functional package is designed to give us basic Functional Programming around IOE-raising code, including our remote iterators, giving us a minimal *and tested* set of transformations we can do with our code.

    public RemoteIterator createLocatedFileStatusIterator(
        RemoteIterator statusIterator) {
      return RemoteIterators.mappingRemoteIterator(
          statusIterator,
          listingOperationCallbacks::toLocatedFileStatus);
    }

This ties in nicely with the duration tracking/IOStatistics code it came in with (HADOOP-17450), so I can evaluate an operation and collect min/mean/max durations of operations, not just log but serialize into the task/job summary files and so get some details on where the bottlenecks are in talking to cloud services.

    final RemoteIterator listing = trackDuration(iostatistics, OP_DIRECTORY_SCAN,
        () -> operations.listStatusIterator(srcDir));

So I'm afraid that I will be carrying on using L-expressions, such as in HADOOP-17511. But I don't expect any of the code there to be backportable to Java 7(*).

At the same time, I'd like to know what the performance impact of us using l-expressions is in terms of cost of allocations of closures, evaluation etc. There's also the *little* detail that stack trace data doesn't get preserved that well. Together that argues against gratuitous use of java streams.

To summarise my PoV, then: Java 8 lambda expressions are an incredible tool which can be used in interesting and innovative ways. Adding retryability, stats gathering and auditing of remote IO are the key ones I've been using them for, in the Hadoop codebase, for 4-5 years.

I'm happy to let someone lay out a style guide on good/bad uses, a "no gratuitous move to streams()" policy, and maybe a designated "No Lambdas here" bit of code. (UGI?)

But a discussion about whether to have them in the code at all? Not only is it too late, I don't see how that can be justified.

-Steve

(*) Having recently been backporting some ABFS code to a branch-3.1 fork, Mockito version downgrading is enough of a blocker on test cases there that the language version is a detail...you won't get that far.
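As a rough illustration of the retry-with-exception-conversion pattern described above, here is a minimal, self-contained sketch. It is not the real org.apache.hadoop.fs.s3a.Invoker: the names RetrySketch and IOCallable, the fixed attempt count, and the use of RuntimeException as a stand-in for SDK failures are all assumptions made for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical sketch only; not the actual Hadoop Invoker class. */
final class RetrySketch {

  /** An operation that may raise an IOException (name invented for this sketch). */
  @FunctionalInterface
  interface IOCallable<T> {
    T call() throws IOException;
  }

  /**
   * Invoke the operation up to maxAttempts times.
   * RuntimeExceptions stand in for SDK failures and are converted to IOEs;
   * FileNotFoundException is treated as unrecoverable and rethrown at once.
   */
  static <T> T retry(String action, int maxAttempts, IOCallable<T> operation)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.call();
      } catch (FileNotFoundException e) {
        // assumed policy: a missing object will not appear on a retry
        throw e;
      } catch (IOException e) {
        last = e;
      } catch (RuntimeException e) {
        // convert the unchecked failure into an IOE, keeping the cause
        last = new IOException(action + " failed: " + e.getMessage(), e);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // The lambda fails twice with an unchecked exception, then succeeds.
    String region = retry("getBucketLocation()", 3, () -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("throttled");
      }
      return "eu-west-1";
    });
    System.out.println(region + " after " + calls[0] + " attempts");
  }
}
```

The point of the lambda here is that the caller passes only the operation itself; the translation and retry policy live in one place and can be tested once.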
Re: [E] Re: Java 8 Lambdas
I just think that we should be cognizant of changes (particularly bug fixes), that will need to be ported to branch-2.10. Since it is still on Java7, anytime you use a lambda in code on trunk, we need to change it for branch-2.10. While not difficult, this is extra work and it increases the differences between branches, which can also cause more conflicts when porting bug fixes back.

On Wed, Apr 28, 2021 at 9:28 PM Ahmed Hussein wrote:

> Thanks Eric for raising this issue!
>
> The debate about lambda is very complicated and won't be resolved any time
> soon.
>
> > I don't personally know a lot about the
> > performance of lambdas and welcome arguments on behalf of why lambdas
>
> No one probably knows :)
> - Lambda performance would depend on the JVM implementation. This changes
>   between releases.
> - Java8+ features forces lambda. For example,
>   ConcurrentHashMap.computeIfAbsent()
>
> I believe that we can transform this discussion into specific action items
> for future commits:
> For instance, a couple of those specifications would be:
> - No refactor just for the sake of using Lambda, unless there is a strong
>   technical justification.
> - Usage of lambda in Unit-tests should be fine. If lambda makes the test
>   more readable, and allows passing method references, then this should
>   make the unit-tests.
> - We put sample code in the "how-to-contribute" to elaborate "capturing Vs
>   non-capturing" lambda expressions and the implications of each type on
>   the performance.
> - Without getting into much detail, IMHO, streams should be committed into
>   the code in exceptional cases. The possibility of executing code in
>   parallel makes debugging a nightmare. i.e., Usage of ForEach needs to be
>   justified, what does it bring to the table?
>
> On Tue, Apr 27, 2021 at 3:07 PM Eric Badger wrote:
>
> > Hello all,
> >
> > I'd like to gauge the community on the usage of lambdas within Hadoop
> > code. I've been reviewing a lot of patches recently that either add or
> > modify lambdas and I'm beginning to think that sometimes we, as a
> > community, are writing lambdas because we can rather than because we
> > should. To me, it seems that lambdas often decrease the readability of
> > the code, making it more difficult to understand. I don't personally
> > know a lot about the performance of lambdas and welcome arguments on
> > behalf of why lambdas should be used. An additional argument is that
> > lambdas aren't available in Java 7, and branch-2.10 currently supports
> > Java 7. So any code going back to branch-2.10 has to be redone upon
> > backporting. Anyway, my main point here is to encourage us to rethink
> > whether we should be using lambdas in any given circumstance just
> > because we can.
> >
> > Eric
> >
> > p.s. I'm also happy to accept this as my personal "old man yells at
> > cloud" issue if everyone else thinks lambdas are the greatest
>
> --
> Best Regards,
>
> *Ahmed Hussein, PhD*
Re: Java 8 Lambdas
Thanks Eric for raising this issue!

The debate about lambda is very complicated and won't be resolved any time soon.

> I don't personally know a lot about the
> performance of lambdas and welcome arguments on behalf of why lambdas

No one probably knows :)
- Lambda performance would depend on the JVM implementation. This changes between releases.
- Java 8+ features force lambdas. For example, ConcurrentHashMap.computeIfAbsent()

I believe that we can transform this discussion into specific action items for future commits.
For instance, a couple of those specifications would be:
- No refactor just for the sake of using Lambda, unless there is a strong technical justification.
- Usage of lambda in Unit-tests should be fine. If lambda makes the test more readable, and allows passing method references, then this should make the unit-tests.
- We put sample code in the "how-to-contribute" to elaborate "capturing Vs non-capturing" lambda expressions and the implications of each type on the performance.
- Without getting into much detail, IMHO, streams should be committed into the code in exceptional cases. The possibility of executing code in parallel makes debugging a nightmare. i.e., Usage of ForEach needs to be justified, what does it bring to the table?

On Tue, Apr 27, 2021 at 3:07 PM Eric Badger wrote:

> Hello all,
>
> I'd like to gauge the community on the usage of lambdas within Hadoop code.
> I've been reviewing a lot of patches recently that either add or modify
> lambdas and I'm beginning to think that sometimes we, as a community, are
> writing lambdas because we can rather than because we should. To me, it
> seems that lambdas often decrease the readability of the code, making it
> more difficult to understand. I don't personally know a lot about the
> performance of lambdas and welcome arguments on behalf of why lambdas
> should be used. An additional argument is that lambdas aren't available in
> Java 7, and branch-2.10 currently supports Java 7. So any code going back
> to branch-2.10 has to be redone upon backporting. Anyway, my main point
> here is to encourage us to rethink whether we should be using lambdas in
> any given circumstance just because we can.
>
> Eric
>
> p.s. I'm also happy to accept this as my personal "old man yells at cloud"
> issue if everyone else thinks lambdas are the greatest

--
Best Regards,

*Ahmed Hussein, PhD*
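A possible starting point for the "capturing vs non-capturing" sample code suggested above. This is only a sketch: the class name LambdaKindsSketch and its methods are invented for illustration, and the allocation behaviour mentioned in the comments is what current HotSpot JVMs typically do, not something the language specification guarantees.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sample; names are illustrative and not taken from Hadoop. */
final class LambdaKindsSketch {

  private final Map<String, Integer> lengths = new ConcurrentHashMap<>();

  /**
   * Non-capturing: the body uses only its own parameter. A JVM is free to
   * reuse one instance for every evaluation (HotSpot typically does).
   */
  static Function<String, Integer> nonCapturing() {
    return s -> s.length();
  }

  /**
   * Capturing: the body closes over 'offset', so an evaluation generally has
   * to produce an object that carries the captured value.
   */
  static Function<String, Integer> capturing(int offset) {
    return s -> s.length() + offset;
  }

  /** A Java 8 API that takes a function: computeIfAbsent with a method reference. */
  int cachedLength(String key) {
    return lengths.computeIfAbsent(key, String::length);
  }

  public static void main(String[] args) {
    System.out.println(nonCapturing().apply("abc"));
    System.out.println(capturing(2).apply("abc"));
    System.out.println(new LambdaKindsSketch().cachedLength("hadoop"));
  }
}
```

Whether the difference matters on a hot path would still need measuring on the target JVM; the sketch only shows how to tell the two kinds apart when reading code.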
Java 8 Lambdas
Hello all,

I'd like to gauge the community on the usage of lambdas within Hadoop code. I've been reviewing a lot of patches recently that either add or modify lambdas and I'm beginning to think that sometimes we, as a community, are writing lambdas because we can rather than because we should. To me, it seems that lambdas often decrease the readability of the code, making it more difficult to understand. I don't personally know a lot about the performance of lambdas and welcome arguments on behalf of why lambdas should be used. An additional argument is that lambdas aren't available in Java 7, and branch-2.10 currently supports Java 7. So any code going back to branch-2.10 has to be redone upon backporting. Anyway, my main point here is to encourage us to rethink whether we should be using lambdas in any given circumstance just because we can.

Eric

p.s. I'm also happy to accept this as my personal "old man yells at cloud" issue if everyone else thinks lambdas are the greatest
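To make the backporting cost concrete, here is a small sketch of the same callback written as a Java 8 lambda and as the anonymous inner class a Java 7 branch would need. The class BackportSketch and its methods are made up for this example and do not come from the Hadoop codebase.

```java
import java.util.concurrent.Callable;

/** Hypothetical example; not taken from any Hadoop patch. */
final class BackportSketch {

  /** Java 8 form, as it might be written on trunk. */
  static Callable<String> java8Style(final String name) {
    return () -> "hello " + name;
  }

  /** Java 7 form: the same logic as an anonymous inner class. */
  static Callable<String> java7Style(final String name) {
    return new Callable<String>() {
      @Override
      public String call() {
        return "hello " + name;
      }
    };
  }

  public static void main(String[] args) throws Exception {
    // Both forms behave the same; only the source differs between branches.
    System.out.println(java8Style("trunk").call());
    System.out.println(java7Style("branch-2.10").call());
  }
}
```

The rewrite is mechanical, but every such change is an extra difference between the branches that a later cherry-pick has to get past.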