Re: can't read the SequenceFile correctly

2009-02-09 Thread Owen O'Malley


On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:


Hey Tom,

I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes()
kind of function?


It does it because continually resizing the array to the valid  
length is very expensive. It would be a reasonable patch to add a  
getValidBytes, but most methods in Java's libraries are aware of this  
and let you pass in byte[], offset, and length. So once you realize  
what the problem is, you can work around it.


-- Owen
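
A minimal sketch of the trade-off Owen describes, in plain Java. GrowableBytes is a hypothetical stand-in written for this illustration, not Hadoop's BytesWritable: it reuses its backing array across records and grows it only when needed, so the array can be longer than the valid data. The workaround is to pass the array together with an offset and a length to the standard library overloads that accept them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class OffsetLengthDemo {
    // Decode only the valid region via the (byte[], offset, length) overload.
    static String decode(GrowableBytes b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GrowableBytes b = new GrowableBytes();
        b.set("a long first record".getBytes(StandardCharsets.UTF_8));
        b.set("short".getBytes(StandardCharsets.UTF_8)); // reuses the larger array

        // The backing array is larger than the current record.
        System.out.println(b.getBytes().length + " vs " + b.getLength());
        System.out.println(decode(b));

        // Streams accept the same (byte[], offset, length) triple.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(b.getBytes(), 0, b.getLength());
        System.out.println(out.size());
    }
}

// Hypothetical stand-in for a reusable buffer (an assumption for this sketch).
class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Grow the backing array only when the new record does not fit.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length * 3 / 2];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    byte[] getBytes() { return bytes; }  // whole backing array, tail may be stale
    int getLength() { return length; }   // number of valid bytes
}
```

Trimming the array on every set() would instead allocate and copy once per record, which is the cost being avoided.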


Re: can't read the SequenceFile correctly

2009-02-09 Thread Raghu Angadi


+1 on something like getValidBytes(). Just the existence of this would 
warn many programmers about getBytes().


Raghu.





Re: can't read the SequenceFile correctly

2009-02-06 Thread Tom White
Hi Mark,

Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.

Tom

On Fri, Feb 6, 2009 at 5:41 AM, Mark Kerzner markkerz...@gmail.com wrote:
 Hi,

 I have written binary files to a SequenceFile, seemingly successfully, but
 when I read them back with the code below, after the first few reads I get
 the same number of bytes for the different files. What could go wrong?

 Thank you,
 Mark

 reader = new SequenceFile.Reader(fs, path, conf);
 Writable key = (Writable)
     ReflectionUtils.newInstance(reader.getKeyClass(), conf);
 Writable value = (Writable)
     ReflectionUtils.newInstance(reader.getValueClass(), conf);
 long position = reader.getPosition();
 while (reader.next(key, value)) {
     String syncSeen = reader.syncSeen() ? "*" : "";
     byte[] fileBytes = ((BytesWritable) value).getBytes();
     System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen,
         key, fileBytes.length);
     position = reader.getPosition(); // beginning of next record
 }
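
The fix Tom describes can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: validBytes is a hypothetical helper (it is not part of Hadoop) that copies only the first getLength() bytes out of the array returned by getBytes(). The example uses a plain byte array to simulate the buffer so that it runs without Hadoop on the classpath.

```java
import java.util.Arrays;

class ValidBytesDemo {
    // Hypothetical helper: keep only the first `length` bytes of a
    // possibly oversized backing array.
    static byte[] validBytes(byte[] buffer, int length) {
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) {
        // Simulated state after reading a 3-byte record into a buffer that
        // still holds bytes from an earlier, longer record.
        byte[] buffer = {10, 20, 30, 99, 99, 99};
        int length = 3;

        // In the read loop this would be, assuming value is a BytesWritable:
        //   BytesWritable bw = (BytesWritable) value;
        //   byte[] fileBytes = validBytes(bw.getBytes(), bw.getLength());
        byte[] fileBytes = validBytes(buffer, length);

        System.out.println(buffer.length);    // size of the backing array
        System.out.println(fileBytes.length); // size of the record
    }
}
```

If only the record size is needed, as in the printf above, printing getLength() instead of fileBytes.length avoids the copy altogether.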



Re: can't read the SequenceFile correctly

2009-02-06 Thread Mark Kerzner
Indeed, this was the answer!

Thank you,
Mark

 



RE: can't read the SequenceFile correctly

2009-02-06 Thread Bhupesh Bansal
Hey Tom, 

I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of
function?


Best
Bhupesh 







can't read the SequenceFile correctly

2009-02-05 Thread Mark Kerzner
Hi,

I have written binary files to a SequenceFile, seemingly successfully, but
when I read them back with the code below, after the first few reads I get
the same number of bytes for the different files. What could go wrong?

Thank you,
Mark

reader = new SequenceFile.Reader(fs, path, conf);
Writable key = (Writable)
    ReflectionUtils.newInstance(reader.getKeyClass(), conf);
Writable value = (Writable)
    ReflectionUtils.newInstance(reader.getValueClass(), conf);
long position = reader.getPosition();
while (reader.next(key, value)) {
    String syncSeen = reader.syncSeen() ? "*" : "";
    byte[] fileBytes = ((BytesWritable) value).getBytes();
    System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen,
        key, fileBytes.length);
    position = reader.getPosition(); // beginning of next record
}