Re: AWS S3AInputStream questions

2016-08-05 Thread Aaron Fabbri
On Tue, Aug 2, 2016 at 12:17 AM, Mr rty ff  wrote:
>
> Hi, I have a few questions about the implementation of the input stream in S3.
> 1) public synchronized long getPos() throws IOException
> { return (nextReadPos < 0) ? 0 : nextReadPos; }
> Why does it return nextReadPos and not pos?

My understanding is:

seek() is a lazy implementation.  S3AInputStream keeps track of two
seek positions:

1. current position in underlying stream (pos)
2. next position to read (nextReadPos).

If the seek() implementation were eager, not lazy, we could do the seeking when
seek() is called.  In that case, I think we would only need to keep
track of #1 (pos).

Instead we keep track of where the next read() will start, and
lazily do the seek logic when it is actually needed.

getPos() is supposed to return the position of the next read(),
so nextReadPos is the correct value to return.
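
To make the two positions concrete, here is a minimal sketch of the lazy-seek idea. It is not the actual S3AInputStream code: the class name LazySeekSketch, the reopenCount counter, and the byte array standing in for the S3 object are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;

/** Hypothetical illustration of lazy seek; NOT the real S3AInputStream. */
class LazySeekSketch {
    private final byte[] data;            // stands in for the remote object (assumption)
    private ByteArrayInputStream wrapped; // stands in for the underlying stream
    private long pos;                     // position of the underlying stream
    private long nextReadPos;             // where the next read() will start
    int reopenCount;                      // demo-only counter of (re)opens

    LazySeekSketch(byte[] data) {
        this.data = data;
    }

    /** Lazy: only record the target; no work on the underlying stream yet. */
    void seek(long target) {
        if (target < 0 || target > data.length) {
            throw new IllegalArgumentException("seek out of range: " + target);
        }
        nextReadPos = target;
    }

    /** Reports where the next read() will start, not where the stream is. */
    long getPos() {
        return (nextReadPos < 0) ? 0 : nextReadPos;
    }

    int read() {
        lazySeek();
        int b = wrapped.read();
        if (b >= 0) {
            pos++;
            nextReadPos++;
        }
        return b;
    }

    /** The deferred seek: reposition only when a read actually needs it. */
    private void lazySeek() {
        if (wrapped == null || pos != nextReadPos) {
            int offset = (int) nextReadPos;
            wrapped = new ByteArrayInputStream(data, offset, data.length - offset);
            pos = nextReadPos;
            reopenCount++;
        }
    }
}
```

In this sketch, calling seek(3) changes only nextReadPos, so getPos() reports 3 even though the underlying stream has not been touched; the reposition happens on the first read().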

> In the member definition for pos:
> /** This is the public position; the one set in {@link #seek(long)}
>  * and returned in {@link #getPos()}. */

This is probably the source of your confusion.  Looks like this comment should
be changed.  I believe pos is the position of the underlying stream,
not the next read pos. They probably became different when
lazy seek was implemented.

> private long pos;

> 2) seekInStream: in the last lines you have:
> // close the stream; if read the object will be opened at the new pos
> closeStream("seekInStream()", this.requestedStreamLen);
> pos = targetPos;
> Why do you need this line? Shouldn't pos be updated
> with the actual skipped value, as you did here:
> if (skipped > 0) {
>   pos += skipped;

The skipped variable is not in scope at that point.

It is used to keep track of how far the underlying stream actually skipped.

The point of this logic is to balance performance between
(a) always reopening the stream at the newly-seeked position
(b) just reading forward and discarding unneeded bytes

I believe (a) was found to be inefficient in some cases.

This code implements both approaches, depending on how far
forward the seek() is.  The code you are talking about here is
the (a) case where we reopen the stream on next read().

In this case, we just store the desired position (pos) which
will be used in the next call to read() to open the
stream at the offset 'pos' (see call to lazySeek()).
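
A rough sketch of that skip-or-reopen decision follows. This is my own illustration, not the Hadoop source: SeekPolicySketch, SKIP_THRESHOLD, and reopenCount are made-up names, and the byte array stands in for the S3 object.

```java
import java.io.ByteArrayInputStream;

/** Hypothetical skip-or-reopen seek policy; NOT the real S3AInputStream. */
class SeekPolicySketch {
    static final long SKIP_THRESHOLD = 4; // assumed cutoff for "short" forward seeks

    private final byte[] data;            // stands in for the remote object (assumption)
    private ByteArrayInputStream wrapped; // stands in for the underlying stream
    private long pos;                     // position of the underlying stream
    int reopenCount;                      // demo-only counter of (re)opens

    SeekPolicySketch(byte[] data) {
        this.data = data;
        reopen(0);
    }

    private void reopen(long offset) {
        wrapped = new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
        pos = offset;
        reopenCount++;
    }

    void seekInStream(long targetPos) {
        if (targetPos < 0 || targetPos > data.length) {
            throw new IllegalArgumentException("seek out of range: " + targetPos);
        }
        long diff = targetPos - pos;
        if (diff == 0) {
            return; // already there
        }
        if (wrapped != null && diff > 0 && diff <= SKIP_THRESHOLD) {
            // case (b): read forward and discard; trust only what was skipped
            long skipped = wrapped.skip(diff);
            if (skipped > 0) {
                pos += skipped;
            }
            if (pos == targetPos) {
                return;
            }
        }
        // case (a): drop the stream. 'skipped' is out of scope here, and the
        // old stream no longer matters, so record the desired position.
        wrapped = null;
        pos = targetPos;
    }

    int read() {
        if (wrapped == null) {
            reopen(pos); // the stream is reopened at the stored position
        }
        int b = wrapped.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    long getPos() {
        return pos;
    }
}
```

Once the stream has been closed there is nothing left to skip in, so pos only records where the next open should start. That is why the sketch sets it to targetPos rather than adding a skipped count.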




Re: AWS S3AInputStream questions

2016-08-02 Thread Mr rty ff
The message got garbled, so I am trying to send it again.

Hi, I have a few questions about the implementation of the input stream in S3.
From S3AInputStream.java:

1)
public synchronized long getPos() throws IOException {
  return (nextReadPos < 0) ? 0 : nextReadPos;
}
Why does it return nextReadPos and not pos? In the member definition for pos:
/**
 * This is the public position; the one set in {@link #seek(long)}
 * and returned in {@link #getPos()}.
 */
private long pos;

2) seekInStream
In the last lines you have:
// close the stream; if read the object will be opened at the new pos
closeStream("seekInStream()", this.requestedStreamLen);
pos = targetPos;
Why do you need this line? Shouldn't pos be updated with the actual skipped
value, as you did here:
if (skipped > 0) {
  pos += skipped;



Thanks 



  

AWS S3AInputStream questions

2016-08-02 Thread Mr rty ff
Hi I have few questions about implementation of inputstream in S3.

1)
public synchronized long getPos() throws IOException {
  return (nextReadPos < 0) ? 0 : nextReadPos;
}
Why does it return nextReadPos not pos? In memeber definition for pos:
/**
 * This is the public position; the one set in {@link #seek(long)}
 * and returned in {@link #getPos()}.
 */
private long pos;

2) seekInStream
In the last lines you have:
// close the stream; if read the object will be opened at the new pos
closeStream("seekInStream()", this.requestedStreamLen);
pos = targetPos;
Why you need this line? Shouldn`t pos be updated with actual skipped value?
As you did:
if (skipped > 0) {
  pos += skipped;



Thanks