[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410383#comment-16410383 ] Hudson commented on HDFS-9144: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13869 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13869/]) HDFS-9144. Refactoring libhdfs++ into stateful/ephemeral objects. (james.clampffer: rev d7ecf396c9993c750c64960143bb35379f1c0f64) * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/inputstream.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/libhdfspp/options.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/remote_block_reader_test.cc * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/inputstream_impl.h * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/remote_block_reader_impl.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/CMakeLists.txt * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.h * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/connection/datanodeconnection.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/protobuf.h * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/remote_block_reader.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/hdfs.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/block_reader.h * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filehandle.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/bad_datanode_test.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/mock_connection.h * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/fileinfo.h * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/hdfs_cpp.cc * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/connection/CMakeLists.txt * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/connection/datanodeconnection.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/rpc/rpc_engine.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/CMakeLists.txt * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.cc * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/inputstream_test.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/hdfs_public_api.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/CMakeLists.txt * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/CMakeLists.txt * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/datatransfer.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/asio.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/CMakeLists.txt * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/libhdfspp/hdfs.h * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filehandle.cc * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/async_stream.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/datatransfer_impl.h * (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/block_reader.cc * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/options.cc * (delete) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/hdfs_cpp.h * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/CMakeLists.txt > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task >
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037729#comment-15037729 ] James Clampffer commented on HDFS-9144: --- Thanks for reviewing the rebase [~bobthansen]. I've committed your patch to HDFS-8707. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, > HDFS-9144.HDFS-8707.006.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036541#comment-15036541 ] James Clampffer commented on HDFS-9144: --- And by HDFS-9559 I meant HDFS-9359. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, > HDFS-9144.HDFS-8707.006.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036774#comment-15036774 ] Hadoop QA commented on HDFS-9144: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} | {color:red} HDFS-9144 does not apply to HDFS-8707. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-9144 | | GITHUB PR | https://github.com/apache/hadoop/pull/43 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13740/console | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13740/console | This message was automatically generated. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, > HDFS-9144.HDFS-8707.006.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036692#comment-15036692 ] Bob Hansen commented on HDFS-9144: -- Thanks for rebasing those changes, [~James Clampffer]. +1 on the changes. It looks like the refactored lifecycle management clears up the segfault that has been making test_libhdfs_threaded_hdfspp_test_shim_static segfault. It was segfaulting fairly regularly on my dev machine before applying the current patch, and after applying it, I've run through ~40 iterations of native test without an error. I can't say for sure that it clears up HDFS-9486, but it definitely makes the integration tests more stable on my machine. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, > HDFS-9144.HDFS-8707.006.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022627#comment-15022627 ] James Clampffer commented on HDFS-9144: --- "It's to ensure that the BlockReaderImpl's lifecycle is tied to the request, and won't be freed until the final callback is complete in the read_handler. I suppose it could be a unique_ptr, but I went with shared_ptr for consistency." Cool, I think shared_ptr is fine. I wanted to make sure I wasn't missing something. Thanks for rebasing and the last round of fixes. +1 on the patch At the moment my plan is to commit HDFS-9368, then HDFS-9359 (depends on HDFS-9368 and needs reviewing), and then make sure everything is stable under stress. Once everything seems solid we can add this. Those two patches shouldn't require any further changes to your patch. Seem reasonable? > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022647#comment-15022647 ] Bob Hansen commented on HDFS-9144: -- Absolutely. I'm looking forward to seeing the results of those testing patches under this one. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022116#comment-15022116 ] Hadoop QA commented on HDFS-9144: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s {color} | {color:red} HDFS-9144 does not apply to HDFS-8707. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-9144 | | GITHUB PR | https://github.com/apache/hadoop/pull/43 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13617/console | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13617/console | This message was automatically generated. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022211#comment-15022211 ] Bob Hansen commented on HDFS-9144: -- {quote} -The synchronous FileSystem::Open isn't declared pure virtual, it probably should be. -hdfsConnect looks like it can leak an IoService if FileSystem::New returns null. Unlikely but possible as far as I can tell. {quote} Sorry - editing problems. Both of those issues were addressed in the new patch. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022209#comment-15022209 ] Bob Hansen commented on HDFS-9144: -- [~James Clampffer]: thanks again for taking the time to review this large patch. {quote} -The synchronous FileSystem::Open isn't declared pure virtual, it probably should be. -hdfsConnect looks like it can leak an IoService if FileSystem::New returns null. Unlikely but possible as far as I can tell. -Why is GetRandomClientName declared inline? Is this just to trick the compiler into letting it stay in the header? Probably not a whole lot to gain by inlining a non-leaf function that large; consider moving implementation to a .cc file? -DataNodeConnectionImpl could probably have it's constructor implementation in a .cc file. I think you could set conn_ in the initializer list to get rid of the conn_.reset call in the ctor. {quote} The excessive inlining in the headers were mostly an artifact of refactoring; that's how they were in the original. I've moved them all into .cc files. bq. -BlockReaderImpl has a shared_ptr member, what is this being shared with? Is that just so it can pass a reference to itself to continuations? It's to ensure that the BlockReaderImpl's lifecycle is tied to the request, and won't be freed until the final callback is complete in the read_handler. I suppose it could be a unique_ptr, but I went with shared_ptr for consistency. bq. -DataTransferSaslStream::Handshake still uses a templated callback which I think a lot of this patch is moving away from. Fixing up those little pieces can be deferred to a later jira as it looks like nothing is calling it at the moment. I did as little in the SaslStream as possible, given its uncertain future. We can revisit that when we revisit SaslStream. bq. -Have you been able to run anything against the C++ or C APIs, with and without memcheck, to verify that the refactor didn't introduce any subtle bugs? I did for patch 2; I'll do some more and report if anything comes up. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018351#comment-15018351 ] Bob Hansen commented on HDFS-9144: -- [~James Clampffer]: thanks for the review I have attached another version that is rebased on the DN retry stuff. There were a lot of conflicts if you want to give it another once-over. I will address your other comments in a further patch; you can wait for them to re-review if you would like. Thanks for taking the time to slog through all the changes. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018585#comment-15018585 ] James Clampffer commented on HDFS-9144: --- The rebased changes look good to me. Hopefully the changes due to DN retry weren't too painful. Looking forward to seeing the next patch. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, > HDFS-9144.HDFS-8707.004.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014569#comment-15014569 ] James Clampffer commented on HDFS-9144: --- Some feedback, as well as some questions to make sure I understand everything correctly: -needs to be rebased to head now that dn retry is added -The synchronous FileSystem::Open isn't declared pure virtual, it probably should be. -hdfsConnect looks like it can leak an IoService if FileSystem::New returns null. Unlikely but possible as far as I can tell. -Why is GetRandomClientName declared inline? Is this just to trick the compiler into letting it stay in the header? Probably not a whole lot to gain by inlining a non-leaf function that large; consider moving implementation to a .cc file? -DataNodeConnectionImpl could probably have it's constructor implementation in a .cc file. I think you could set conn_ in the initializer list to get rid of the conn_.reset call in the ctor. -BlockReaderImpl has a shared_ptr member, what is this being shared with? Is that just so it can pass a reference to itself to continuations? -DataTransferSaslStream::Handshake still uses a templated callback which I think a lot of this patch is moving away from. Fixing up those little pieces can be deferred to a later jira as it looks like nothing is calling it at the moment. -Have you been able to run anything against the C++ or C APIs, with and without memcheck, to verify that the refactor didn't introduce any subtle bugs? Overall I think it's a solid set of abstractions. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012038#comment-15012038 ] Suresh Srinivas commented on HDFS-9144: --- bq. I will work very hard in the future to coordinate with the community to scope changes up-front, and get them into bite-sized pieces. [~bobhansen], the expectation is not as severe as bite-sized pieces. Reasonable breakdown of patch to make reviewer's job easy. I think this is something that can be done by an individual contributor (and community may have feedback/ideas on further breakdown). > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009385#comment-15009385 ] Haohui Mai commented on HDFS-9144: -- I'm not yet confident enough to review / commit this change as is given (1) the scope of the changes and (2) there are pending issues still need some discussions. I think in order to move forward we'll need to separate the patch into multiple jiras. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004595#comment-15004595 ] Suresh Srinivas commented on HDFS-9144: --- [~bobhansen], it is very difficult to review a patch that mixes code changes with large scale refactors. Please refactor the code separately. Such code is very easy to review and commit. [~wheat9], given it takes a long time to split this, can the patch be committed as is or with minimal changes? I can help you with the review. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004856#comment-15004856 ] Bob Hansen commented on HDFS-9144: -- [~sureshms] - I will work very hard in the future to coordinate with the community to scope changes up-front, and get them into bite-sized pieces. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004033#comment-15004033 ] Bob Hansen commented on HDFS-9144: -- [~wheat9]: thank you for the salient questions. bq. What is the purpose of having multiple inheritances on FileHandle, FileHandleImpl, CancelHandle, CancelHandleImpl? (I assume you mean multiple layers of inheritance, not multiple inheritance) The intention there was in interface/implementation separation. The primary consumer C++ interface will be in interactions with FileSystem and FileHandle (the renamed InputStream), and I wanted to keep the public surface of those as small as possible. There is no CancelHandleImpl; if you mean Cancelable vs. CancelHandle, the intention is that Cancelable is an interface for something that can be canceled. CancelHandle encapsulates a copyable object that ensures that the lifecycle of the cancelable object does not exceed the lifecycle of the handle. I apologize for the lack of comments in this module. bq. Why having multi-inheritance? The only instance that I recall (correct me if I am wrong) is in the MockDNConnection where I wanted to bridge the (well-written, thank you) MockConnection to the DNConnection abstraction. In a perfect world, I would have refactored the NNCode to also use the AsyncConnection interface and had the base MockConnection simple implement AsyncConnection, but the timeframe for this patch had already dragged on too long. I can add that additional refactoring to this patch, but that doesn't seem to be the way you would like it to go. bq. What is the principle of life cycle management here? Why changing pointers to shared_ptr in RemoteBlockReader? It's deliberately left as normal pointer as continuations do not own the stream, but the State object own it. The move to shared pointers for streams and the NNEngine comes from the uncertain nature of the asio callbacks. It is very hard to know when there may be a queued but unsatisfied call in the asio queue, and we have had numerous instances where we had a segfault because there was a reference to a destroyed instance in a handler. Once we get real integration tests with real servers, I suspect we will be seeing more of them. Using shared pointers ensures that the object will be valid, even if the the operation becomes a no-op (such as the underlying fd being closed). bq. What is the purpose of copying the callback handlers? Obviously, in they need to be copied eventually to keep the references that they contain alive. I made them pass-by-value in the interfaces to make it unambiguous in the interface that the implementation was making copies of the callbacks. If the public interface includes references, there could be uncertainty whether the consumer needed to keep the reference to the handle valid. You and I both know what we eventually make copies deep in the implementation, but kind interface design is to remove the question entirely. It is possible, if there is a _lot_ of state captured directly in the callback, that the copy might involve heap allocation, but in the vast majority of the cases, it is a simple object copy. I don't suspect that that will be on the critical path for performance for a long, long time. bq. What is cancelable? Is a request cancelable or a stream cancelable? Why AsyncStream inherits Cancelable. Cancelable is an aspect that can have its async operations canceled. We have a strong use case for canceling AsyncStream operations in that they can have potentially unbounded latency (if a server has gone cataonic during a client read, for example). The current implementation of DNConnection.cancel is naive but correct in order to respond to your request to show how AsyncReads could be canceled. A more sophisticated approach would support canceling individual reads in addition to all reads on a DNConnection to satisfy different use cases. Again, I apologize for the lack of comments in the Cancelable classes describing their bounds; they were included purely as an example of how cancellation could be implemented with the refactoring. bq. The patch contains many unnecessary changes and can be separated into several patches. There are refactors in NN, DN, various changes from here and there. It needs to be separated, reviewed and applied independently. That would be very difficult. Because the structure of the trait injection and state management were very closely intertwined in the original code, it came naturally to refactor them both together. Approximately a week and a half of engineering time was necessary to get them untangled and correct to this state and are not easily separable as can be seen from the (too verbose, sorry) change lot; can I ask for some flexibility in not requiring substantial additional engineering just to make review slightly smoother? I am
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003043#comment-15003043 ] Haohui Mai commented on HDFS-9144: -- A pretty large patch. Can you explain what you're trying to achieve here? Some more questions: * What is the purpose of having multiple inheritances on {{FileHanlde}}, {{FileHandleImpl}}, {{CancelHandle}}, {{CancelHandleImpl}}? * Why having multi-inheritance? * What is the principle of life cycle management here? Why changing pointers to shared_ptr in {{RemoteBlockReader}}? It's deliberately left as normal pointer as continuations do not own the stream, but the {{State}} object own it. What is the purpose of copying the callback handlers? * What is cancelable? Is a request cancelable or a stream cancelable? Why {{AsyncStream}} inherits {{Cancelable}}. The patch contains many unnecessary changes and can be separated into several patches. There are refactors in NN, DN, various changes from here and there. It needs to be separated, reviewed and applied independently. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002148#comment-15002148 ] Hadoop QA commented on HDFS-9144: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 32s {color} | {color:green} HDFS-8707 passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 26s {color} | {color:red} root in HDFS-8707 failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 18s {color} | {color:red} root in HDFS-8707 failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 0s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 0s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 0s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 0s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 29s {color} | {color:red} root in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 1m 29s {color} | {color:red} root in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 29s {color} | {color:red} root in the patch failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 0s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 3 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 0s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 426 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 10m 57s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 | | JIRA Issue | HDFS-9144 | | GITHUB PR | https://github.com/apache/hadoop/pull/43 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml cc | | uname | Linux be7da899a6d3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build@2/patchprocess/apache-yetus-fa12328/precommit/personality/hadoop.sh | | git revision | HDFS-8707 / 3ce4230 | | compile | https://builds.apache.org/job/PreCommit-HDFS-Build/13485/artifact/patchprocess/branch-compile-root-jdk1.8.0_60.txt | | compile |
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997424#comment-14997424 ] James Clampffer commented on HDFS-9144: --- There's a few things I like about this: -std::function for callbacks makes it a lot easier to figure out what the API expects. -Polymorphism makes things a lot more flexible than templates (and I expect the performance impact to be negligible). -Separating stateful and stateless components makes the interfaces clear and I think will reduce the chances of introducing bugs that assume things won't change in certain places. -Merging FileSystem/InputStream and HadoopFileSystem/FileHandle is a major improvement for maintainability. Waiting on this is starting to block or at least complicate other work so I'd like it to get in soon, it seems like a solid improvement to me. +1 > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993919#comment-14993919 ] James Clampffer commented on HDFS-9144: --- We're using this branch as an opportunity to try using github code reviews for the project. The reason for trying github is the nice code review tools for pull requests where comments can be written inline with the relevant code. Here's the link: https://github.com/bobhansen/hadoop/pull/1/files I still plan on including a higher level summary here once I finish adding inline comments. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993929#comment-14993929 ] Bob Hansen commented on HDFS-9144: -- Until we get the github/apache integration automatic, let's keep Apache JIRA as the sole repository for review. A github pull request can be a great tool for visualizing changes and collecting thoughts (and I would encourage anybody to use it), but let's get all the feedback and discussion here for now. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993935#comment-14993935 ] Bob Hansen commented on HDFS-9144: -- Well, according to https://blogs.apache.org/infra/entry/improved_integration_between_apache_and, if the PR is against the Apache repo and mentions the JIRA number, we have replication both ways. Next time, we'll give that a try. Has anybody used that in the real world and care to comment? > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994465#comment-14994465 ] Bob Hansen commented on HDFS-9144: -- Hm. Very noisy. Probably want a squashed pull next time, but it looks like it is available for using GitHub for review. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994458#comment-14994458 ] ASF GitHub Bot commented on HDFS-9144: -- GitHub user bobhansen opened a pull request: https://github.com/apache/hadoop/pull/43 HDFS-9144: libhdfs++ refactoring Code changes for HDFS-9144 as described in the JIRA. Removing some templates and traits and restructuring the code for more modularity. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bobhansen/hadoop HDFS-9144-merge Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit 1fb1ea527c9b5321e6da6c2543859db2ec3eaf7c Author: Bob HansenDate: 2015-10-22T11:58:41Z Refactored NameNodeConnection commit c6cf5175b9c21561bdcbd22be27f50e22a1d3ebd Author: Bob Hansen Date: 2015-10-22T12:01:36Z Removed fs_ from InputStream commit 8b8190d334224d8acec9a4bef97d5e0226c1045a Author: Bob Hansen Date: 2015-10-22T13:05:53Z Moved GetBlockInfo to NN connection commit 108b54f3079ed21149a59b9222d6d9832ee05d79 Author: Bob Hansen Date: 2015-10-22T13:20:56Z Moved GetBlockLocations to std::function commit 6d112a17048bcec437701b422209641e56f6196e Author: Bob Hansen Date: 2015-10-22T13:48:02Z Added comments commit e57b0ed02e29781f347499f0f3546659870aabab Author: Bob Hansen Date: 2015-10-22T13:52:39Z Stripped whitespace commit c9c82125e8c0b742ee3a70d6fdbdedca180cdd4f Author: Bob Hansen Date: 2015-10-27T16:07:33Z Renamed NameNodeConnection to NameNodeOperations commit 01499b6027ec771ebf04d4723899ee976b2a6044 Author: Bob Hansen Date: 2015-10-27T23:26:26Z Renamed input_stream and asio_continuation commit 02c67837fe832e45286a675f1a27fa29e1b80a9a Author: Bob Hansen Date: 2015-10-27T23:30:44Z Renamed CreatePipeline to Connect commit 5d28d02e1752be74975647f8dc656776ab9e2cbf Author: Bob Hansen Date: 2015-10-27T23:58:18Z Rename async_connect to async_request commit 9d98bf41091c923103cbeeadb5459c3119b50584 Author: Bob Hansen Date: 2015-10-28T13:01:38Z Renamed read_some to read_packet commit 6ced4a97e297ce0e833db8dbd4b38c91c966d71c Author: Bob Hansen Date: 2015-10-28T13:15:50Z Renamed async_request to async_request_block commit f05a771e578969b9b281de4e0c97887f98b0f2cf Author: Bob Hansen Date: 2015-10-28T13:19:09Z Renamed BlockReader::request to request_block commit fcf1585bf67f84ef8c0acc72660d2ad250005e3b Author: Bob Hansen Date: 2015-10-28T19:12:39Z Moved to file_info commit a3fd975285b25a3eae448e5ac46d0118a14d6610 Author: Bob Hansen Date: 2015-10-28T19:16:20Z Made file_info pointers const commit 366f488b8e8364eba3f1966b931216d2bf404ae1 Author: Bob Hansen Date: 2015-10-28T21:37:46Z Refactored DataNodeConnection, etc. commit 418799feb8d12181d9e5bd6b6aa94333bb21e126 Author: Bob Hansen Date: 2015-10-29T13:53:46Z Added shared_ptr to DN_Connection commit f043e154a261e9ff64f1ead450e3a256ecd023a2 Author: Bob Hansen Date: 2015-10-29T15:31:28Z Moved DNConnection into trait commit aea859ff34a6768c7df29ec25f1abd2b92835b9e Author: Bob Hansen Date: 2015-10-29T15:32:12Z Trimmed whitespace commit 55d7b5dcd92b0fd9d0011e97d8f47e78c3316205 Author: Bob Hansen Date: 2015-10-29T17:23:30Z Re-enabled IS tests commit 142efabbda38852b431d94096d6cef69f5c96393 Author: Bob Hansen Date: 2015-10-29T17:31:05Z Cleaned up some tests commit 4bc0f448fe52a762a242428a1331272c9fee3247 Author: Bob Hansen Date: 2015-10-29T21:53:57Z Working on less templates commit dd16d4fa9f08f55f9d4140219471f002eca5a8ed Author: Bob Hansen Date: 2015-10-29T23:28:01Z Compiles! commit 2b14efa8277c66a3e9e0fb67af925501757d39f8 Author: Bob Hansen Date: 2015-10-30T20:46:52Z Fixed DNconnection signature commit 8d143e789a98431f8cd2cb08db37a0a05f4d9c77 Author: Bob Hansen Date: 2015-11-02T16:35:54Z Fixed segfault in ReadData commit b6f5454e626c1caa1b76398c9edf220fc1252be9 Author: Bob Hansen Date: 2015-11-02T18:36:15Z Removed BlockReader callback templates commit 3b5d712b454f5b817c22909bac2f3477a64624fe Author: Bob Hansen Date: 2015-11-02T18:52:16Z Removed last templates from BlockReader commit d9b9241f12a957226df7ccacad07d8e1a0d98cca Author: Bob Hansen Date:
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989813#comment-14989813 ] Bob Hansen commented on HDFS-9144: -- [~wheat9]: I re-named the "object that acts as a clean interface to the NameNode" as "NameNodeOperations," but I'm still not completely satisfied. This object should own both the RPCEngine and the ClientNameNodeProtocol objects and provide a nice, higher-level interface to the underlying operations (see the GetBlockLocations implementation). I considered just pushing the current protobuf-consuming methods into the private space of ClientNameNodeProtocol and adding clean public methods, but ClientNameNodeProtocol is generated code. If you can suggest a better refactoring, I would happily take feedback. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch, > HDFS-9144.HDFS-8707.002.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969191#comment-14969191 ] Bob Hansen commented on HDFS-9144: -- Thanks for the feedback; [~wheat9]: you are correct, the RpcEngine is owned by the NNConnection. By "familiar", I mean an easy path for those who are used to POSIX file semantics: open a file, sequential reads, seeks, close a file. We will, of course, also support parallel asynchronous positional reads, but I think we want a nice clean interface for POSIX-y semantics. Let's break the work down into smaller pieces. I started with abstracting out the NN connection in the attached patch. I will work on abstracting the DN connection next. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9144.HDFS-8707.001.patch > > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967592#comment-14967592 ] Haohui Mai commented on HDFS-9144: -- bq. Stateful quasi-posix API that will be familiar and easy to consume Can you elaborate what do you mean by familiar and easy to consume? Are you talking about the filesystem API in C++1y (https://msdn.microsoft.com/en-us/library/hh874694.aspx)? There's no reference of RPCEngine in DataNodeConnection. I think the plan might be too grand to execute. It needs to be broken down. Maybe it's nice to start with implementing AsyncReadBlockOperation and its cancel() method? > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965491#comment-14965491 ] James Clampffer commented on HDFS-9144: --- This looks like a solid plan to me. > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948661#comment-14948661 ] Bob Hansen commented on HDFS-9144: -- A proposed structure to start: __C-API__ hdfslib-compatible layer that is a thin wrapper around C++-posixish-API __C++-posixish-API__ Stateful quasi-posix API that will be familiar and easy to consume Embodies sane default policies and strategies for common operations implements all asynchronous operations has synchronous helpers for all asynchronous operations Wrapper around functional-API, below FileSystem: constructs with config object open() returns a FileHandle common NN operations holds state for dead DNs shared state and thread-safe (implement single lock for FS?) owns and is wrapper around NameNodeConnection FileHandle: supports implicit position and streaming reads (posixy) stateful and single-threaded with the exception of cancellation method thread-safe cancel() method will cancel any outstanding I/Os and deliver a cancellation error to its continuation implements reliable reads and error recovery Maintains a pointer to the posixy-FileSystem for operations on the dead DN Owns block map Read operation: will pick appropriate DNConnections and Will eventually cache DNConnections __functional-API__ Low-level implementation of composible asynchronous blocks NameNodeConnection: Has all configuration params explicitly passed in/set Owns TCP connection to NN Encapsulates method call to Message construction Refactoring of the current FileSystemImpl object Thread-safe methods May be connected or not DataNodeConnection: Owns TCP connection to the DN Owns RpcEngine Encapsulates method call to Message construction Encapsulates connecting and handshaking to the DN Thread-safe methods May be connected or not AsyncReadBlockOperation: Ephemeral object; performs operation once and is done Takes a DataNodeConnection, block extents as input Connects DataNodeConnection if necessary and makes RPC calls to read data Single-threaded (although wil. have callbacks from asio and will call into consuler handler from asio thread) outside of cancel() method Encapsulation of current InputStreamImpl::AsyncReadBlock method and its associated state PositionalReadOperation: Ephemeral object; performs operation once and is done Owns BlockReadOperation Owns DNConnection Given block map and snapshot of dead DN list, creates a new DN connection and kicks off BlockReadOperation Refactoring of current InputStreamImpl::PositionRead Single-threaded outside of cancel() method Cannot do DNConnection caching Functional convenince object for those not using FileHandle Some retry logic here? > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects
[ https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907970#comment-14907970 ] Bob Hansen commented on HDFS-9144: -- For clarification, these are all asynchronous (or at least primarily asychronous) interfaces, perhaps with a Future-like blocking mechanism > Refactor libhdfs into stateful/ephemeral objects > > > Key: HDFS-9144 > URL: https://issues.apache.org/jira/browse/HDFS-9144 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen > > In discussion for other efforts, we decided that we should separate several > concerns: > * A posix-like FileSystem/FileHandle object (stream-based, positional reads) > * An ephemeral ReadOperation object that holds the state for > reads-in-progress, which consumes > * An immutable FileInfo object which holds the block map and file size (and > other metadata about the file that we assume will not change over the life of > the file) -- This message was sent by Atlassian JIRA (v6.3.4#6332)