Re: can't read the SequenceFile correctly
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:

Hey Tom, I also got burned by this. Why does BytesWritable.getBytes() return non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of function?

It does it because continually resizing the array to the valid length is very expensive. It would be a reasonable patch to add a getValidBytes, but most methods in Java's libraries that take a byte[] are aware of this and let you pass in byte[], offset, and length. So once you realize what the problem is, you can work around it.

-- Owen
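As a rough illustration of the workaround Owen describes, the sketch below passes an oversized buffer plus an explicit offset and length to two standard-library methods. The class name OffsetLengthDemo and its helper methods are hypothetical, and a plain byte[] stands in for the array returned by BytesWritable.getBytes(), with validLength playing the role of getLength().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the buffer below stands in for the array
// returned by BytesWritable.getBytes(), and validLength for getLength().
class OffsetLengthDemo {

    // Decode only the first validLength bytes, ignoring the padding after them.
    public static String decode(byte[] buffer, int validLength) {
        return new String(buffer, 0, validLength, StandardCharsets.UTF_8);
    }

    // Write only the valid range to a stream and return what was written.
    public static byte[] writeValid(byte[] buffer, int validLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(buffer, 0, validLength);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A 16-byte backing array that holds only 5 valid bytes.
        byte[] buffer = new byte[16];
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        int validLength = payload.length;

        System.out.println(decode(buffer, validLength));
        System.out.println(writeValid(buffer, validLength).length);
    }
}
```

Neither call copies the buffer into a right-sized array first, which is why this style avoids the resizing cost mentioned above.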
Re: can't read the SequenceFile correctly
+1 on something like getValidBytes(). Just the existence of this would warn many programmers about getBytes().

Raghu.

Owen O'Malley wrote:

On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:

Hey Tom, I also got burned by this. Why does BytesWritable.getBytes() return non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of function?

It does it because continually resizing the array to the valid length is very expensive. It would be a reasonable patch to add a getValidBytes, but most methods in Java's libraries are aware of this and let you pass in byte[], offset, and length. So once you realize what the problem is, you can work around it.

-- Owen
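A getValidBytes-style helper could be sketched roughly as follows. This is only an illustration of the idea proposed in the thread, not an existing Hadoop method: the class ValidBytes and the method validBytes are hypothetical names, and the helper takes the backing array and the valid length as plain arguments (in a reader loop these would presumably come from BytesWritable.getBytes() and BytesWritable.getLength()).

```java
import java.util.Arrays;

// Hypothetical helper illustrating the getValidBytes() idea from the thread.
class ValidBytes {

    // Return a new array containing only the first `length` bytes of `buffer`.
    public static byte[] validBytes(byte[] buffer, int length) {
        if (length < 0 || length > buffer.length) {
            throw new IllegalArgumentException(
                "length " + length + " is outside the buffer of size " + buffer.length);
        }
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) {
        // Backing array of 6 bytes, of which only the first 3 are valid.
        byte[] buffer = {10, 20, 30, 0, 0, 0};
        byte[] valid = validBytes(buffer, 3);
        System.out.println(Arrays.toString(valid));
    }
}
```

The trade-off is the one Owen points out: each call allocates and copies, so a helper like this is convenient but costs more than passing the buffer with an offset and length.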
Re: can't read the SequenceFile correctly
Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily valid. Use BytesWritable#getLength() to determine how much of the buffer returned by BytesWritable#getBytes() to use.

Tom

On Fri, Feb 6, 2009 at 5:41 AM, Mark Kerzner markkerz...@gmail.com wrote:

Hi,

I have written binary files to a SequenceFile, seemingly successfully, but when I read them back with the code below, after the first few reads I get the same number of bytes for the different files. What could go wrong?

Thank you,
Mark

    reader = new SequenceFile.Reader(fs, path, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    long position = reader.getPosition();
    while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        byte[] fileBytes = ((BytesWritable) value).getBytes();
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, fileBytes.length);
        position = reader.getPosition(); // beginning of next record
    }
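The symptom in the question (the same byte count for different records) can be reproduced with a small simulation of a reused value object. The sketch below is a simplified stand-in, not Hadoop's actual code: ReusedBufferDemo and ReusableBuffer are hypothetical names, and the growth factor of 3/2 is an assumption chosen only to make the effect visible.

```java
// Hypothetical simulation of a value object that is reused across records.
class ReusedBufferDemo {

    // Simplified stand-in for a reusable byte container: the backing array
    // grows when needed but never shrinks; `length` tracks the valid bytes.
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(byte[] data) {
            if (data.length > bytes.length) {
                // Assumed growth policy: allocate 1.5x the requested size.
                bytes = new byte[data.length * 3 / 2];
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            length = data.length;
        }

        byte[] getBytes() { return bytes; }   // whole backing array, padding included
        int getLength() { return length; }    // number of valid bytes
    }

    public static void main(String[] args) {
        ReusableBuffer value = new ReusableBuffer();
        int[] recordSizes = {100, 40, 10};
        for (int size : recordSizes) {
            value.set(new byte[size]);
            System.out.printf("record=%d getBytes().length=%d getLength()=%d%n",
                size, value.getBytes().length, value.getLength());
        }
    }
}
```

In this simulation the backing array stays at 150 bytes for all three records, while getLength() reports 100, 40, and 10. In the reader loop from the question, the corresponding change would be to print ((BytesWritable) value).getLength() rather than fileBytes.length.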
Re: can't read the SequenceFile correctly
Indeed, this was the answer!

Thank you,
Mark

On Fri, Feb 6, 2009 at 4:25 AM, Tom White t...@cloudera.com wrote:

Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily valid. Use BytesWritable#getLength() to determine how much of the buffer returned by BytesWritable#getBytes() to use.

Tom
RE: can't read the SequenceFile correctly
Hey Tom,

I also got burned by this. Why does BytesWritable.getBytes() return non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of function?

Best
Bhupesh

-----Original Message-----
From: Tom White [mailto:t...@cloudera.com]
Sent: Fri 2/6/2009 2:25 AM
To: core-user@hadoop.apache.org
Subject: Re: can't read the SequenceFile correctly

Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily valid. Use BytesWritable#getLength() to determine how much of the buffer returned by BytesWritable#getBytes() to use.

Tom
can't read the SequenceFile correctly
Hi,

I have written binary files to a SequenceFile, seemingly successfully, but when I read them back with the code below, after the first few reads I get the same number of bytes for the different files. What could go wrong?

Thank you,
Mark

    reader = new SequenceFile.Reader(fs, path, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    long position = reader.getPosition();
    while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        byte[] fileBytes = ((BytesWritable) value).getBytes();
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, fileBytes.length);
        position = reader.getPosition(); // beginning of next record
    }