Thanks in advance for any help. I have been banging my head against the wall on this one all day. When I run the command hadoop fs -put /path/to/input /path/in/hdfs from the command line, the Hadoop shell dutifully copies my entire file correctly, no matter the size.
I wrote a web service client for an external service in Python, and I am simply trying to replicate the same command after retrieving some CSV results from the web service:

    import subprocess

    cmd = ['hadoop', 'fs', '-put', '/path/to/input/', '/path/in/hdfs/']
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         bufsize=256*1024*1024)
    output, errors = p.communicate()
    if p.returncode:
        raise OSError(errors)
    else:
        LOG.info(output)

Without fail, the Hadoop shell only writes the first 4096 bytes of the input file (which, according to the documentation, is the default value for io.file.buffer.size). I have tried almost everything, including adding -Dio.file.buffer.size=XXXXXX where XXXXXX is a really big number, and NOTHING seems to work. Please help!
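
To narrow this down, here is a minimal sketch of the same flow in self-contained form. It assumes the CSV results are written to the local input file by the same script before the upload; the helper names write_results and run_command are hypothetical and only for illustration. The idea is to close the local file and check its size on disk before launching the command. If the size is already 4096 bytes at that point, the truncation would be happening on the Python side (for example, a file handle that was never flushed or closed) rather than in Hadoop. This is a guess to rule out, not a confirmed cause.

```python
import os
import subprocess


def write_results(path, rows):
    """Write rows as CSV lines; the with-block closes (and flushes) the file."""
    with open(path, "w", newline="") as fh:
        for row in rows:
            fh.write(",".join(row) + "\n")
    # Size on disk after the file has been closed.
    return os.path.getsize(path)


def run_command(cmd):
    """Run cmd, return its stdout as text; raise OSError with stderr on failure."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, errors = p.communicate()
    if p.returncode:
        raise OSError(errors.decode(errors="replace"))
    return output.decode(errors="replace")
```

With these in place, the upload would be write_results('/path/to/input', rows) followed by run_command(['hadoop', 'fs', '-put', '/path/to/input', '/path/in/hdfs']), logging the returned size in between to see whether the local file is complete before Hadoop ever reads it.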
