[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051036#comment-13051036 ] Hudson commented on HDFS-2058: -- Integrated in Hadoop-Hdfs-trunk #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/699/]) DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050115#comment-13050115 ] Hudson commented on HDFS-2058: -- Integrated in Hadoop-Hdfs-trunk-Commit #746 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/]) DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047049#comment-13047049 ] Todd Lipcon commented on HDFS-2058: --- P.S. don't be alarmed at the size of the patch. If you exclude generated code it's only: 16 files changed, 805 insertions(+), 673 deletions(-) DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047055#comment-13047055 ] Hadoop QA commented on HDFS-2058: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482031/hdfs-2058.txt against trunk revision 1134170. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/760//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/760//console This message is automatically generated. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047215#comment-13047215 ] M. C. Srivas commented on HDFS-2058: I thought Hadoop was standardizing on Avro everywhere. Any reason this work is not in Avro? DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047289#comment-13047289 ] Todd Lipcon commented on HDFS-2058: --- I don't want this JIRA to devolve into a discussion of the merits and demerits of various serialization frameworks. In the past those discussions have been what resulted in us picking _no_ framework instead of just getting it done with _something_. That said, here is my quick summary of why I picked protobufs vs Avro and Thrift: h3. Avro Avro is a fantastic data serialization framework with the following unique features: (a) dynamic schema stored with the data, (b) very compact storage format, (c) a standardized container format (d) Java-based codegen that integrates easily into a build. Features A, B, and C are very good when you want to store a lot of data on disk: it's small, you can read it without knowing what someone else wrote, and it's splittable and compressible in MR. D is great since you don't need to make developers install anything. For the case of the DataTransferProtocol and Hadoop RPC in general, features A through C are less useful. The different parts of HDFS may divolve slightly over time but there's no need to talk to a completely unknown server. Compactness is always a plus, but a 10-20% improvement on compactness of header data only translates to a 1% improvement of compactness on data transfer, since the ratio of data:header is very high. The storage format doesn't help any for RPC -- this is transient. In addition, the dynamic nature of Avro requires the readers and writers know the schema of their peer in order to communicate. This has to be done with a handshake of some kind. It would certainly be possible to implement this, but in order to do it without an extra round trip you need to add schema dictionaries, hashes, etc. Plus, the peer's schema needs to be threaded throughout the places where serialization/deserialization is done. This is possible, but I didn't want to do this work. h3. Thrift vs Protobufs I like Thrift a lot -- in fact I'm a Thrift committer and PMC member. So it might seem strange that I didn't pick Thrift. Here's my thinking: - Thrift and Protobuf are more or less equivalent: tagged serialization, codegen tool written in C++, good language support, mature wire format - Thrift has the plus side that it's a true open source community at the ASF with some committer overlap with the people working on Hadoop - Protobufs has the plus side that, apparently, MR2/YARN has chosen it for their RPC formats. - Protobuf has two nice features that thrift doesn't have yet: 1) when unknown data is read, it is maintained in a map and then put back on the wire if the same object is rewritten. 2) it has a decent plugin system that makes it easy to modify the generated code -- even with a plugin written in python or Java, in theory. These could be implemented in Thrift, but again, I didn't want to take the time. - Thrift's main advantage vs protobufs is a standardized RPC wire format and set of clients/servers. I don't think the Java implementations in Thrift are nearly as mature as the Hadoop RPC stack, and swapping out for entirely new RPC transport is a lot more work than just switching serialization mechanisms. Since we already have a pretty good (albeit nonstandard) RPC stack, this advantage of Thrift is less of a big deal. h3. Conclusions - In the end I was torn between protobufs and Thrift. Mostly since MR2 uses Protobuf already, I just went with it. - I think protobufs is a good choice for wire format serialization. I still think Avro is a good choice for disk storage (eg perhaps using Avro to store a denser machine-readable version of the audit log). These two things can coexist just fine. - There is working code attached to this JIRA. If you disagree with my thinking above, please do not post a comment; post a patch with your serialization of choice, showing that the code is at least as clean, the performance is comparable, and the tests pass. - IMO, there are a lot of interesting things to work on and discuss in Hadoop; this is not one of them. Let's just get something (anything) in there that works and move on with our lives. So, assuming I have support from a couple of committers, I will move forward to clean up this patch. As you can see above, it already works modulo some bug with block token. With some more comments, a little refactoring, and a build target to regenerate code, I think we could commit this. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047305#comment-13047305 ] Suresh Srinivas commented on HDFS-2058: --- This is really cool Todd! If you think this patch is ready for review, I will review it. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047308#comment-13047308 ] Eli Collins commented on HDFS-2058: --- @Todd, this is awesome, complimentary to HADOOP-7347. I agree protobufs is a sound choice, thanks for spelling out your analysis. How about filing jiras for the other HDFS protocols/issues so we get a sense of the scope of the remaining effort? DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047311#comment-13047311 ] Todd Lipcon commented on HDFS-2058: --- bq. If you think this patch is ready for review, I will review it. It's not polished as much as I'd like - eg each of the protobufs should have a nice bit of doc explaining when it's sent, and there's a little refactoring/moving of classes that could be done. But, if you want to review it for the overall content and not polish, it should be fine to start on it now. I can post cleanup as a diff. Or, we could even commit it and then clean up and improve docs as a followup, since let's be honest - it's not that clean as it stands now :) Committing this quicker rather than slower will also save a lot of time. As you know, patches like this fall out of date fast and the merges are a pain. bq. How about filing jiras for the other HDFS protocols/issues so we get a sense of the scope of the remaining effort? Sure. I have an idea what I would tackle next. The nice thing is that this can be done incrementally -- ie it's OK to use the existing RPC for calls to the NN and use protobuf for data transfer. The two protocols are non-overlapping, and there is benefit even at the half done state. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047320#comment-13047320 ] Tsz Wo (Nicholas), SZE commented on HDFS-2058: -- Nico work! DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047328#comment-13047328 ] Todd Lipcon commented on HDFS-2058: --- btw, a lot of the credit for this work goes to Nicholas who has done a lot of good cleanup in DataTransferProtocol over the last few months. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047346#comment-13047346 ] Hadoop QA commented on HDFS-2058: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482078/hdfs-2058.txt against trunk revision 1134170. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 32 javac compiler warnings (more than the trunk's current 31 warnings). -1 findbugs. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/765//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/765//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/765//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/765//console This message is automatically generated. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047370#comment-13047370 ] Todd Lipcon commented on HDFS-2058: --- Cool, thanks Suresh. I'm working on some cleanup as well, I'll merge your changes into mine. A few comments: bq. Should we make DatanodeInfoProto members required As I was doing this I noted that it's kind of silly that we even send DatanodeInfo as part of the pipeline. All of the fields are unused for data transfer, right? All we really need is DatanodeId here, no? bq. Evenutally some of the messages can be common and could be moved out Yep, in my patch I'm doing now, I'm refactoring out things like block/datanode into a different file, since as you said they're not datatransfer specific. bq. I prefer generating DataTransferProtos.java instead of checking it in. I agree long term. In the short term what I'd like to do is check it in but also provide an ant target or shell script that can be used to generate protobufs. This allows new developers to get up to speed quicker without having to set up the full toolchain. Longer term we could have an ant task which downloaded a protoc binary based on current platform and used that, perhaps. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047374#comment-13047374 ] Suresh Srinivas commented on HDFS-2058: --- As I was doing this I noted that it's kind of silly that we even send DatanodeInfo as part of the pipeline. All of the fields are unused for data transfer, right? All we really need is DatanodeId here, no? You are right. But do you think the information we are sending could be useful for the client. I prefer leaving it as it is now. +1 for the change. Since findbugs warnings are for generated code - we should just suppress them in src/test/findbugsExclude.xml. I have not looked at javac warnings. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047403#comment-13047403 ] Hadoop QA commented on HDFS-2058: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482087/HDFS-2058.patch against trunk revision 1134397. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 32 javac compiler warnings (more than the trunk's current 31 warnings). -1 findbugs. The patch appears to introduce 21 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/767//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/767//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/767//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/767//console This message is automatically generated. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047406#comment-13047406 ] Arun C Murthy commented on HDFS-2058: - {quote} Protobuf has two nice features that thrift doesn't have yet: 1) when unknown data is read, it is maintained in a map and then put back on the wire if the same object is rewritten. 2) it has a decent plugin system that makes it easy to modify the generated code – even with a plugin written in python or Java, in theory. These could be implemented in Thrift, but again, I didn't want to take the time. {quote} Similar reasons to choose PB for MRv2. Also, MRv2 has a pluggable layer, so in theory, it could be anything. We choose PB for exact same reasons outlined by Todd. But, I'd encourage keeping serialization and in-memory data-structures separate ala MR-279. Mainly, we need to move ahead with something. +1 DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047514#comment-13047514 ] Hadoop QA commented on HDFS-2058: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482098/hdfs-2058.txt against trunk revision 113. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 31 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/768//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/768//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/768//console This message is automatically generated. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047516#comment-13047516 ] Suresh Srinivas commented on HDFS-2058: --- Comments for the patch in this jira: # Name ProtoUtil.java HDFSProtoUtil.java # With change in build.xml, do you still need to add generated file? Comments: (could be taken care of in a separate jira) # Move ByteBufferOutputStream to common # Create ProtoUtil.java in common and move vint related methods into that. # ExactSizeInputStream constructor needs javadoc. Calling the parameter numBytes as remaining seems more appropriate. # For the tests added instead of catching the exception with expected comments, you could do @Test(expected = EOFException.class) DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047753#comment-13047753 ] Suresh Srinivas commented on HDFS-2058: --- One thing missed - ExactSizeInputStream could move to common as well. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047767#comment-13047767 ] Suresh Srinivas commented on HDFS-2058: --- +1 for the change. it problematic since I want to only be sure that the exception is thrown at that one spot in the test, and not earlier or later than it's supposed to be. Your tests are small enough and the your check is specific enough. So there is not other code that could throw EOFException. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047768#comment-13047768 ] Suresh Srinivas commented on HDFS-2058: --- Your tests are small enough and the your check is specific enough. So there is not other code that could throw EOFException. This is a minor thing. No need to hold up the patch for it. The patch is good to be committed. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047786#comment-13047786 ] Todd Lipcon commented on HDFS-2058: --- bq. This is a minor thing. No need to hold up the patch for it. The patch is good to be committed. Awesome. I'll commit this assuming Hudson comes back green. I'm very excited how fast this went from idea to completion -- 24 hrs! Thanks again. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs
[ https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047804#comment-13047804 ] Hadoop QA commented on HDFS-2058: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482115/hdfs-2058.txt against trunk revision 1134458. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 31 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/771//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/771//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/771//console This message is automatically generated. DataTransfer Protocol using protobufs - Key: HDFS-2058 URL: https://issues.apache.org/jira/browse/HDFS-2058 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira