Hi, I have tried different ways to create a large Hadoop SequenceFile with just one short (<100 bytes) key but one large (>1GB) value (BytesWritable).
First, I tried https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/BigMapOutput.java, which writes multiple random-length keys and values into a large (>3GB) file. It works fine for me out of the box, but it is not what I am trying to do. So I modified it, using the Hadoop 2.2.0 API, to something like:

    Path file = new Path("/input");
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(file),
            SequenceFile.Writer.compression(CompressionType.NONE),
            SequenceFile.Writer.keyClass(BytesWritable.class),
            SequenceFile.Writer.valueClass(BytesWritable.class));

    int numBytesToWrite = fileSizeInMB * 1024 * 1024;
    BytesWritable randomKey = new BytesWritable();
    BytesWritable randomValue = new BytesWritable();
    randomKey.setSize(1);
    randomValue.setSize(numBytesToWrite);
    randomizeBytes(randomValue.getBytes(), 0, randomValue.getLength());
    writer.append(randomKey, randomValue);
    writer.close();

where I keep increasing fileSizeInMB. However, once fileSizeInMB > 700, I get errors like:

    java.lang.NegativeArraySizeException
        at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
        at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
        at com.openresearchinc.hadoop.sequencefile.Util.bigSeqFile(Util.java:229)

I have seen similar errors discussed, but no resolution. Since a Java int can go up to 2^31-1, i.e. about 2GB, it should not fail at 700MB.

I will post other approaches and their exceptions later. Thanks in advance.

-Qiming
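To rule out the SequenceFile writer itself, here is a minimal, self-contained sketch that only exercises BytesWritable.setSize (which is where the stack trace points), with increasing sizes. The class name, loop bounds, and step size are just illustrative, not part of my real code:

    import org.apache.hadoop.io.BytesWritable;

    // Probe BytesWritable.setSize() with increasing sizes to isolate the failure
    // from the SequenceFile writing code. Run with a large heap, e.g. -Xmx4g,
    // so an OutOfMemoryError does not hide the NegativeArraySizeException.
    public class BytesWritableSizeProbe {
        public static void main(String[] args) {
            for (int mb = 100; mb <= 2000; mb += 100) {
                int numBytes = mb * 1024 * 1024; // stays within int range up to 2047 MB
                BytesWritable value = new BytesWritable();
                try {
                    value.setSize(numBytes); // the call that fails in my stack trace
                    System.out.println(mb + " MB: ok");
                } catch (NegativeArraySizeException e) {
                    System.out.println(mb + " MB: NegativeArraySizeException");
                } catch (OutOfMemoryError e) {
                    System.out.println(mb + " MB: OutOfMemoryError (heap too small)");
                    break;
                }
            }
        }
    }

With only hadoop-common on the classpath, this shows the same NegativeArraySizeException around the 700MB mark on my machine, independent of any SequenceFile.Writer usage.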
