Hi,

Is there a way to skip checksum verification while reading a file with Hadoop's C API (libhdfs)?

I have a sample program that just counts the number of spaces in an HDFS
file:

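#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s /path/in/hdfs\n", argv[0]);
    return 1;
  }
  long count = 0;
  long buflen = 2147483645; // Integer.MAX_VALUE - 2
  char path[4096];          // room for the URI prefix plus argv[1]
  snprintf(path, sizeof(path), "hdfs://127.0.0.1:9000%s", argv[1]);
  hdfsFS x2 = hdfsConnect("127.0.0.1", 9000);
  if (!x2) return 1;
  hdfsFile x3 = hdfsOpenFile(x2, path, O_RDONLY, 0, 0, 0);
  hdfsFileInfo *info = hdfsGetPathInfo(x2, path);
  char *buffer = calloc(buflen, sizeof(char));
  if (!x3 || !info || !buffer) {
    fprintf(stderr, "failed to open %s or allocate the buffer\n", path);
    return 1;
  }
  tOffset length = info->mSize;

  tOffset i = 0;
  while (i < length) {
    tOffset diff = length - i;
    tSize toread = diff < buflen ? (tSize)diff : (tSize)buflen;
    // hdfsPread may return fewer bytes than requested, or -1 on error
    tSize nread = hdfsPread(x2, x3, i, buffer, toread);
    if (nread <= 0) break;
    for (tSize j = 0; j < nread; j++) {
      if (buffer[j] == ' ') { // count spaces only
        count++;
      }
    }
    i += nread;
  }
  printf("%ld\n", count);
  free(buffer);
  hdfsFreeFileInfo(info, 1);
  hdfsCloseFile(x2, x3);
  hdfsDisconnect(x2);
  return 0;
}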

Is there a way to skip checksum for the above operation over an HDFS file?

I know that hadoopRzOptionsSetSkipChecksum() can skip checksums, but it
belongs to the zero-copy read API (hadoopReadZero()), which uses different
calls to load the file than hdfsPread().
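
For reference, here is a minimal sketch of what that alternative read path might look like for the same space-counting task. It is not a drop-in answer to the question. The helper names count_spaces() and count_spaces_skip_checksum(), the USE_LIBHDFS build macro, and the 1 MiB chunk size are all assumptions made for illustration. The hadoopRz* calls are the ones declared in hdfs.h, and the HDFS part only compiles when USE_LIBHDFS is defined and libhdfs is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the ' ' bytes in buf[0..len). Pure helper with no HDFS dependency. */
static long count_spaces(const char *buf, size_t len) {
  assert(buf != NULL || len == 0);
  long n = 0;
  for (size_t i = 0; i < len; i++) {
    if (buf[i] == ' ') {
      n++;
    }
  }
  return n;
}

#ifdef USE_LIBHDFS /* hypothetical macro: define it when building against libhdfs */
#include <fcntl.h>
#include "hdfs.h"

/* Count spaces in `path` via the zero-copy API with checksums skipped.
 * Returns the count, or -1 on error. */
static long count_spaces_skip_checksum(hdfsFS fs, const char *path) {
  hdfsFile file = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
  if (!file) return -1;

  struct hadoopRzOptions *opts = hadoopRzOptionsAlloc();
  if (!opts) {
    hdfsCloseFile(fs, file);
    return -1;
  }

  long count = -1;
  /* The fallback buffer pool is used when a true zero-copy read is not possible. */
  if (hadoopRzOptionsSetSkipChecksum(opts, 1) == 0 &&
      hadoopRzOptionsSetByteBufferPool(opts, ELASTIC_BYTE_BUFFER_POOL_CLASS) == 0) {
    count = 0;
    for (;;) {
      struct hadoopRzBuffer *rz = hadoopReadZero(file, opts, 1 << 20);
      if (!rz) { /* read error */
        count = -1;
        break;
      }
      int32_t len = hadoopRzBufferLength(rz);
      const void *data = hadoopRzBufferGet(rz);
      if (len == 0 || !data) { /* end of file */
        hadoopRzBufferFree(file, rz);
        break;
      }
      count += count_spaces(data, (size_t)len);
      hadoopRzBufferFree(file, rz);
    }
  }
  hadoopRzOptionsFree(opts);
  hdfsCloseFile(fs, file);
  return count;
}
#endif
```

Keeping the counting loop in its own function means the same helper can be called on the buffer filled by hdfsPread(), so only the read path differs between the two variants.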

Thanks,

-- 
Pratyush Das
