[GitHub] orc issue #128: ORC-178 Implement Basic C++ Writer and Writer Option

2017-07-05 Thread xndai
Github user xndai commented on the issue:

https://github.com/apache/orc/pull/128
  
Hi @omalley , please see my replies and the new commit. Let me know if you 
have further questions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread xndai
Github user xndai commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125729396
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -34,15 +34,13 @@
 #include "common/hdfs_configuration.h"
 #include "common/configuration_loader.h"
 
-#define BUF_SIZE 1048576 //1 MB
-
 namespace orc {
 
   class HdfsFileInputStream : public InputStream {
--- End diff --

Does HDFS C++ provide a mocked service interface? My concern is the 
testability of HdfsFileInputStream.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread xndai
Github user xndai commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125728966
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -123,17 +123,21 @@ namespace orc {
   uint64_t length,
   uint64_t offset) override {
 
-  size_t last_bytes_read = 0;
-
   if (!buf) {
 throw ParseError("Buffer is null");
   }
 
   hdfs::Status status;
-  status = file->PositionRead(buf, static_cast(length), 
static_cast(offset), _bytes_read);
-  if(!status.ok()) {
-throw ParseError("Error reading the file: " + status.ToString());
-  }
+  size_t total_bytes_read = 0;
+  size_t last_bytes_read = 0;
+  
+  do {
+status = file->PositionRead(buf, static_cast(length) - 
total_bytes_read, static_cast(offset + total_bytes_read), 
_bytes_read);
--- End diff --

Make sure to wrap on 80 characters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread xndai
Github user xndai commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125728350
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -66,22 +64,22 @@ namespace orc {
 options = config->GetOptions();
   }
   hdfs::IoService * io_service = hdfs::IoService::New();
-  //Wrapping fs into a shared pointer to guarantee deletion
-  std::shared_ptr 
fs(hdfs::FileSystem::New(io_service, "", options));
-  if (!fs) {
+  //Wrapping file_system into a unique pointer to guarantee deletion
+  file_system = 
std::unique_ptr(hdfs::FileSystem::New(io_service, "", 
options));
+  if (!file_system) {
--- End diff --

Unfortunately unique_ptr can be redefined as auto_ptr if platform doesn't 
support unique_ptr (see orc_config.hh). We talked about removing this redefine, 
but it hasn't been done yet. So I'd suggest we stick to file_system.get() != 
nullptr for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread AnatoliShein
Github user AnatoliShein commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125674828
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -121,39 +116,24 @@ namespace orc {
 }
 
 uint64_t getNaturalReadSize() const override {
-  return BUF_SIZE;
+  return 1048576; //1 MB
 }
 
 void read(void* buf,
   uint64_t length,
   uint64_t offset) override {
 
-  ssize_t total_bytes_read = 0;
   size_t last_bytes_read = 0;
 
   if (!buf) {
 throw ParseError("Buffer is null");
   }
 
   hdfs::Status status;
-  char input_buffer[BUF_SIZE];
-  do{
-//Reading file chunks
-status = file->PositionRead(input_buffer, sizeof(input_buffer), 
offset, _bytes_read);
-if(status.ok()) {
-  //Writing file chunks to buf
-  if(total_bytes_read + last_bytes_read >= length){
-memcpy((char*) buf + total_bytes_read, input_buffer, length - 
total_bytes_read);
-break;
-  } else {
-memcpy((char*) buf + total_bytes_read, input_buffer, 
last_bytes_read);
-total_bytes_read += last_bytes_read;
-offset += last_bytes_read;
-  }
-} else {
-  throw ParseError("Error reading the file: " + status.ToString());
-}
-  } while (true);
+  status = file->PositionRead(buf, static_cast(length), 
static_cast(offset), _bytes_read);
--- End diff --

This makes sense. I just added a loop here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread AnatoliShein
Github user AnatoliShein commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125674525
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -34,15 +34,13 @@
 #include "common/hdfs_configuration.h"
 #include "common/configuration_loader.h"
 
-#define BUF_SIZE 1048576 //1 MB
-
 namespace orc {
 
   class HdfsFileInputStream : public InputStream {
--- End diff --

I don't currently have the UTs for this class because to test it we need a 
hadoop cluster, and in order to fire one up we would need to bring the entire 
hadoop tree into ORC (which is actually much larger than ORC), and it can make 
things very heavy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] orc pull request #134: Orc 17

2017-07-05 Thread AnatoliShein
Github user AnatoliShein commented on a diff in the pull request:

https://github.com/apache/orc/pull/134#discussion_r125668140
  
--- Diff: c++/src/OrcHdfsFile.cc ---
@@ -66,22 +64,22 @@ namespace orc {
 options = config->GetOptions();
   }
   hdfs::IoService * io_service = hdfs::IoService::New();
-  //Wrapping fs into a shared pointer to guarantee deletion
-  std::shared_ptr 
fs(hdfs::FileSystem::New(io_service, "", options));
-  if (!fs) {
+  //Wrapping file_system into a unique pointer to guarantee deletion
+  file_system = 
std::unique_ptr(hdfs::FileSystem::New(io_service, "", 
options));
+  if (!file_system) {
--- End diff --

Yes, unique pointer actually supports this notation (thanks to [operator 
bool](http://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---