Hello Pratyush,

Zero-copy read is only possible when the client process attempting the read
is located on the same machine as a DataNode that hosts a replica of the
HDFS block.  The implementation relies on the mmap syscall to map the
underlying block file directly into the client process's address space.
This isn't possible across machine boundaries.

If the client process is not co-located with the block, then it falls back
to a buffer-copying read, using a ByteBufferPool as a factory for producing
buffers to receive copies of the data.  This path does not get the
performance benefit of zero-copy read, but it remains functionally correct.
In practice, it is rare for every block of a multi-block file to be
co-located with a single client, which is why applications should schedule
their work with locality in mind.

When hadoopReadZero fails, it also sets errno.  I suspect that if you were
to check errno, you would find the error is EPROTONOSUPPORT.  LibHDFS uses
this error code when both 1) a zero-copy read is not possible (e.g. the
client is not co-located with the block replica), and 2) the caller has not
provided a ByteBufferPool for the fallback copying implementation.
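
For example, you could confirm this in your program's failure branch (a
minimal sketch; it reuses your readFile and opts variables and needs
#include <errno.h>):

struct hadoopRzBuffer *hbuffer = hadoopReadZero(readFile, opts, 100);
if (!hbuffer) {
    /* libhdfs sets errno when hadoopReadZero fails. */
    fprintf(stderr, "hadoopReadZero failed: errno=%d (%s)\n",
            errno, strerror(errno));
    if (errno == EPROTONOSUPPORT) {
        fprintf(stderr, "zero-copy read not possible and no "
                "ByteBufferPool configured for fallback\n");
    }
    exit(-1);
}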

I suggest adding this to your code:

hadoopRzOptionsSetByteBufferPool(opts, ELASTIC_BYTE_BUFFER_POOL_CLASS);

This function is documented in the header here:

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h#L978-L994

With this in place, your client will fall back to a buffer copy when the
process is not co-located with the HDFS block.  Some of this is
demonstrated in the test suite for LibHDFS zero copy too:

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c
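
Putting it together, a rough sketch based on your second program might look
like this (error handling abbreviated; the 100-byte read length is just
carried over from your example):

struct hadoopRzOptions *opts = hadoopRzOptionsAlloc();
hadoopRzOptionsSetSkipChecksum(opts, 1);
hadoopRzOptionsSetByteBufferPool(opts, ELASTIC_BYTE_BUFFER_POOL_CLASS);

struct hadoopRzBuffer *hbuffer = hadoopReadZero(readFile, opts, 100);
if (!hbuffer) {
    fprintf(stderr, "hadoopReadZero failed: %s\n", strerror(errno));
    exit(-1);
}
/* The returned pointer stays valid until hadoopRzBufferFree is called. */
const void *data = hadoopRzBufferGet(hbuffer);
int32_t len = hadoopRzBufferLength(hbuffer);   /* 0 means end of file */
printf("Bytes read = %" PRId32 "\n", len);

/* Release the buffer back to libhdfs before closing the file. */
hadoopRzBufferFree(readFile, hbuffer);
hadoopRzOptionsFree(opts);

hadoopReadZero will use mmap when it can and fall back to copying into a
pooled buffer otherwise, so the same calling code works in both cases.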

I see you are already configuring the client to skip checksums.  The other
alternative is to use Centralized Cache Management
<https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html>,
which will perform an eager checksum validation and then call mlock to
explicitly pin the block into physical memory on the DataNode host.

Chris Nauroth


On Sun, Oct 24, 2021 at 1:42 PM Pratyush Das <reik...@gmail.com> wrote:

> Hi,
>
> I can successfully load files from HDFS via the C API like -
>
> #include "hdfs.h"
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <stdint.h>
> #include <inttypes.h>
>
> int main(int argc, char **argv) {
>     hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
>     const char* readPath = "/lineitem.tbl";
>     hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
>     if(!readFile) {
>           fprintf(stderr, "Failed to open %s for reading!\n", readPath);
>           exit(-1);
>     }
>     if (!hdfsFileIsOpenForRead(readFile)) {
>         fprintf(stderr, "hdfsFileIsOpenForRead: we just opened a file with
> O_RDONLY, and it did not show up as 'open for read'\n");
>         exit(-1);
>     }
>     int size_in_bytes = hdfsAvailable(fs, readFile);
>     fprintf(stderr, "hdfsAvailable: %d\n", size_in_bytes);
>     char *buffer;
>     buffer = (char*)malloc(sizeof(char)*(size_in_bytes+1));
>     memset(buffer, 0, sizeof(buffer));
>     int num_read_bytes = 0;
>     while (num_read_bytes < size_in_bytes) {
>         int rbytes = hdfsRead(fs, readFile, &buffer[num_read_bytes],
> size_in_bytes);
>         num_read_bytes += rbytes;
>     }
>     printf("%s\n", buffer);
>     printf("Total bytes read = %d\n", num_read_bytes);
>     free(buffer);
>     hdfsCloseFile(fs, readFile);
>     hdfsDisconnect(fs);
> }
>
> and I am able to see all the contents of the file printed out
> successfully.
>
> But when I try to use the zero copy API like -
>
> #include "hdfs.h"
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <stdint.h>
> #include <inttypes.h>
>
> int main(int argc, char **argv) {
>     hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
>     const char* readPath = "/lineitem.tbl";
>     hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
>     if(!readFile) {
>           fprintf(stderr, "Failed to open %s for reading!\n", readPath);
>           exit(-1);
>     }
>     if (!hdfsFileIsOpenForRead(readFile)) {
>         fprintf(stderr, "hdfsFileIsOpenForRead: we just opened a file with
> O_RDONLY, and it did not show up as 'open for read'\n");
>         exit(-1);
>     }
>     int size_in_bytes = hdfsAvailable(fs, readFile);
>     fprintf(stderr, "hdfsAvailable: %d\n", size_in_bytes);
>     struct hadoopRzOptions *opts = NULL;
>     opts = hadoopRzOptionsAlloc();
>     if (!opts) {
>         fprintf(stderr, "Unable to set zero copy options\n");
>         exit(-1);
>     }
>     if (hadoopRzOptionsSetSkipChecksum(opts, 1)) {
>         fprintf(stderr, "Unable to set skip checksum\n");
>         exit(-1);
>     }
>     /*if (hadoopRzOptionsSetByteBufferPool(opts, NULL)) {
>         fprintf(stderr, "Unable to set byte buffer pool\n");
>         exit(-1);
>     }*/
>     struct hadoopRzBuffer *hbuffer = NULL;
>     //hadoopRzBufferFree(readFile, hbuffer);
>     hbuffer = hadoopReadZero(readFile, opts, 100);
>     if (!hbuffer) {
>         fprintf(stderr, "Unable to read zero copy hdfs file\n");
>         exit(-1);
>     }
>     char *buffer; buffer = (char*)malloc(sizeof(char)*(size_in_bytes+1));
>     memset(buffer, 0, sizeof(buffer));
>     buffer = hadoopRzBufferGet(hbuffer);
>     int num_read_bytes = hadoopRzBufferLength(hbuffer);
>     printf("Actual size = %d\n", size_in_bytes);
>     printf("Bytes read = %d\n", num_read_bytes);
>     //printf("%s\n", buffer);
>     //printf("%s\n", buffer[size_in_bytes - 1000]);
>     hdfsCloseFile(fs, readFile);
> }
>
> I get the error - "Unable to read zero copy hdfs file" which means that
> hbuffer didn't read anything in.
>
> Am I doing something incorrectly?
>
> Thank you,
>
> --
> Pratyush Das
>
