Hi,

The following snippet lets me iterate over each character of a file in HDFS:
    // Opening the file
    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    FileSystem fs = FileSystem.get(conf);
    Path inFile = new Path(args[0]);
    in = fs.open(inFile);

    // Reading the file
    Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    int c = 0;
    while ((c = reader.read()) != -1) {
        System.out.println((char) c);
    }

But I imagine this is probably inefficient because of the BufferedReader. I tried something like:

    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    FileSystem fs = FileSystem.get(conf);
    Path inFile = new Path(args[0]);
    in = fs.open(inFile);

    ByteBuffer x = ByteBuffer.allocate(655360);
    int length = in.read(x);
    while (length > 0) {
        int c = 0;
        while (c < length) {
            System.out.println(x.getChar(c));
            c++;
        }
        x.clear();
        length = in.read(x);
    }

Although this is significantly faster, it does not seem to print the correct characters.

What is the best way to iterate over each character of a file stored in HDFS?

Thanks,
--
Pratyush Das
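
A note on the second snippet: ByteBuffer.getChar(int) takes a byte index and combines the two bytes at that index into one UTF-16 char (big-endian by default), so stepping it one byte at a time over UTF-8 data yields unrelated characters. One possible way to keep the bulk reads and still get correct characters is to decode the buffer with a CharsetDecoder. The following is only a minimal sketch under stated assumptions: the class name Utf8CharIterator and the helpers forEachChar and decodeToString are hypothetical, and an in-memory channel stands in for the HDFS stream so the example runs without Hadoop. Since FSDataInputStream is an InputStream, Channels.newChannel(in) should be usable in its place.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

public class Utf8CharIterator {

    /** Reads UTF-8 bytes in bulk and passes every decoded char to action. */
    static void forEachChar(ReadableByteChannel channel, int bufferSize, IntConsumer action)
            throws IOException {
        if (bufferSize < 4) {
            // A UTF-8 sequence is up to 4 bytes; smaller buffers could stall.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);

        boolean endOfInput = false;
        while (!endOfInput) {
            endOfInput = channel.read(bytes) == -1;
            bytes.flip();
            CoderResult result;
            do {
                result = decoder.decode(bytes, chars, endOfInput);
                if (result.isError()) {
                    result.throwException(); // malformed or unmappable input
                }
                drain(chars, action);
            } while (result.isOverflow());
            // Keep any incomplete multi-byte sequence for the next read.
            bytes.compact();
        }
        while (decoder.flush(chars).isOverflow()) {
            drain(chars, action);
        }
        drain(chars, action);
    }

    /** Hands the buffered chars to action, then resets the buffer. */
    private static void drain(CharBuffer chars, IntConsumer action) {
        chars.flip();
        while (chars.hasRemaining()) {
            action.accept(chars.get());
        }
        chars.clear();
    }

    /** Convenience wrapper for in-memory data (hypothetical helper). */
    static String decodeToString(byte[] utf8, int bufferSize) {
        StringBuilder out = new StringBuilder();
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(utf8))) {
            forEachChar(ch, bufferSize, c -> out.append((char) c));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // getChar(0) merges the bytes 0x61 ('a') and 0x62 ('b') into U+6162.
        char merged = ByteBuffer.wrap(new byte[] {0x61, 0x62}).getChar(0);
        System.out.println(Integer.toHexString(merged));

        // A tiny buffer forces multi-byte sequences to straddle reads.
        byte[] data = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeToString(data, 4));
    }
}
```

The compact() call is the part that matters for correctness: a multi-byte UTF-8 sequence can be split across two reads, and clearing the buffer instead would drop its first bytes. In the loops above, a large buffer size (such as the 655360 used in the second snippet) would normally replace the deliberately tiny value of 4 used in main.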