[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-12-23 Thread Deepak Kumar V (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856136#comment-13856136
 ] 

Deepak Kumar V commented on HADOOP-9307:


Doug pointed me here.

I see a similar error while reading an Avro file with a random number of seeks.


Details
=======
Hello,
I have a 340 MB Avro data file that contains records sorted and identified by 
a unique id (duplicate records exist). At the beginning of every unique record a 
synchronization point is created with DataFileWriter.sync(). (I cannot, or do 
not want to, save the sync points, and I do not want to use SortedKeyValueFile as 
the output format for the M/R job.)

There are at least 25k synchronization points in the 340 MB file.

Ex:
Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2


As the records are sorted, a binary search is performed for efficient retrieval, 
using the attached code.

Most of the time the search is successful, but at times the code throws the 
following exception:
--
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at 
org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
--



Questions
1) Is it OK to have 25k sync points in a 300 MB file? Does it cost 
performance while reading?
2) I note down the position that was used to invoke fileReader.sync(mid). If I 
catch the AvroRuntimeException, close and reopen the file, and call sync(mid) 
again, I do not see the exception. Why does Avro throw the exception the first 
time and not later?
3) Is there a limit on the number of times sync() can be invoked?
4) When sync(position) is invoked, is any position with 
0 <= position <= file.size() valid? If yes, why do I see the 
AvroRuntimeException (#2)?

==

Some of the questions are irrelevant here.

As the patch has been committed, which version of hadoop-core will have this 
fix?

 BufferedFSInputStream.read returns wrong results after certain seeks
 

 Key: HADOOP-9307
 URL: https://issues.apache.org/jira/browse/HADOOP-9307
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 1.1.1, 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, 2.1.0-beta, 1.3.0

 Attachments: hadoop-9307-branch-1.txt, hadoop-9307.txt


 After certain sequences of seek/read, BufferedFSInputStream can silently 
 return data from the wrong part of the file. Further description in first 
 comment below.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-12-23 Thread Deepak Kumar V (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856137#comment-13856137
 ] 

Deepak Kumar V commented on HADOOP-9307:


I am using hadoop-core-1.1.2.21



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-12-23 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856168#comment-13856168
 ] 

Harsh J commented on HADOOP-9307:
-

Hello Deepak,

The Fix Versions lists the versions 2.1.0-beta (and onwards) for
Hadoop 2.x, or 1.3.0 (and onwards) for Hadoop 1.x.




-- 
Harsh J




[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658229#comment-13658229
 ] 

Hudson commented on HADOOP-9307:


Integrated in Hadoop-Yarn-trunk #210 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/210/])
HADOOP-9307. BufferedFSInputStream.read returns wrong results after certain 
seeks. Contributed by Todd Lipcon. (Revision 1482377)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1482377
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BufferedFSInputStream.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java


 BufferedFSInputStream.read returns wrong results after certain seeks
 

 Key: HADOOP-9307
 URL: https://issues.apache.org/jira/browse/HADOOP-9307
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 1.1.1, 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: hadoop-9307-branch-1.txt, hadoop-9307.txt


 After certain sequences of seek/read, BufferedFSInputStream can silently 
 return data from the wrong part of the file. Further description in first 
 comment below.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658338#comment-13658338
 ] 

Hudson commented on HADOOP-9307:


Integrated in Hadoop-Hdfs-trunk #1399 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1399/])
HADOOP-9307. BufferedFSInputStream.read returns wrong results after certain 
seeks. Contributed by Todd Lipcon. (Revision 1482377)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1482377
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BufferedFSInputStream.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java




[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658374#comment-13658374
 ] 

Hudson commented on HADOOP-9307:


Integrated in Hadoop-Mapreduce-trunk #1426 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1426/])
HADOOP-9307. BufferedFSInputStream.read returns wrong results after certain 
seeks. Contributed by Todd Lipcon. (Revision 1482377)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1482377
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BufferedFSInputStream.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFileSystem.java




[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-05-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656861#comment-13656861
 ] 

Todd Lipcon commented on HADOOP-9307:
-

Hey Steve. I agree that improving the general cross-filesystem testing is a 
worthy goal. But, this is a simple bug in an existing implementation, and the 
patch adds a specific unit test. Given that this breaks HBase running on the 
local filesystem, I don't think it makes sense to block fixing it on a much 
bigger project like standardizing tests.

 BufferedFSInputStream.read returns wrong results after certain seeks
 

 Key: HADOOP-9307
 URL: https://issues.apache.org/jira/browse/HADOOP-9307
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 1.1.1, 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hadoop-9307.txt


 After certain sequences of seek/read, BufferedFSInputStream can silently 
 return data from the wrong part of the file. Further description in first 
 comment below.



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-05-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656872#comment-13656872
 ] 

Harsh J commented on HADOOP-9307:
-

+1 - The change and the added regression test look good. I tested it without 
the fix as well. Nice find, Todd!



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-04-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627129#comment-13627129
 ] 

Todd Lipcon commented on HADOOP-9307:
-

[~ste...@apache.org], mind taking a look?



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624108#comment-13624108
 ] 

Hadoop QA commented on HADOOP-9307:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577294/hadoop-9307.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2421//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2421//console

This message is automatically generated.



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-02-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578244#comment-13578244
 ] 

Todd Lipcon commented on HADOOP-9307:
-

An example sequence of seeks which returns the wrong data is as follows, 
assuming a 4096-byte buffer:

{code}
seek(0);
readFully(1);
{code}

This primes the buffer. After this, the current state of the buffered stream is 
{{pos=0, count=4096, filepos=4096}}

{code}
seek(2000);
{code}

The seek sees that the required data is already in the buffer, and just sets 
{{pos=2000}}

{code}
readFully(10000);
{code}

This first copies the remaining 2096 bytes from the buffer and sets {{pos=4096}}. 
Then, because 7904 bytes are remaining, and this is larger than the buffer 
size, it copies them directly into the user-supplied output buffer. This leaves 
the state of the stream at {{pos=4096, count=4096, filepos=12000}}

{code}
seek(11000);
{code}

The optimization in BufferedFSInputStream sees that there are 4096 buffered 
bytes, and that this seek is supposedly within the window, assuming that those 
4096 bytes directly precede filepos. So, it erroneously just sets {{pos=3096}}.

The next read will then get the wrong results for the first 1000 bytes -- 
yielding bytes 3096-4096 of the file instead of bytes 11000-12000.
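
The window arithmetic behind that last seek can be checked in isolation. The 
following is only a sketch of the calculation described above, not the actual 
Hadoop source; the class and method names are made up for illustration.

```java
public class SeekWindowTrace {
    /**
     * Hypothetical helper: returns the buffer index an in-buffer seek would
     * choose, assuming the buffered bytes directly precede filepos, or -1 if
     * the target lies outside that assumed window.
     */
    static int inBufferOffset(long target, long filepos, int count) {
        long start = filepos - count; // assumed file offset of buf[0]
        return (target >= start && target < filepos) ? (int) (target - start) : -1;
    }

    public static void main(String[] args) {
        // After priming: count=4096, filepos=4096, so the window is [0, 4096)
        // and the assumption holds.
        System.out.println(inBufferOffset(2000, 4096, 4096));   // 2000

        // After the large read: filepos=12000 but count is still 4096, so the
        // assumed window is [7904, 12000) while the buffer really holds
        // file bytes [0, 4096).
        System.out.println(inBufferOffset(11000, 12000, 4096)); // 3096
    }
}
```

Index 3096 of a buffer that still holds file bytes 0-4095 is file byte 3096, 
which matches the wrong range quoted above.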

 BufferedFSInputStream.read returns wrong results after certain seeks
 

 Key: HADOOP-9307
 URL: https://issues.apache.org/jira/browse/HADOOP-9307
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 1.1.1, 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 After certain sequences of seek/read, BufferedFSInputStream can silently 
 return data from the wrong part of the file. Further description in first 
 comment below.



[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578275#comment-13578275
 ] 

Steve Loughran commented on HADOOP-9307:


Interesting. I saw some quirks with data reads/writes talking to OpenStack 
Swift, but felt that was eventual-consistency related, not buffering. If you 
look in {{FileSystemContractBaseTest}} there's some updated code for creating 
test datasets and comparing byte arrays in files. That comparison code could be 
teased out, and/or a new test added to the contract: if you seek(offset) then 
readFully(bytes[]), you get the data at 
file[offset]...file[offset+bytes.length-1]

Let me add that to my list of things we assume that a filesystem does.
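
A check for that seek-then-readFully expectation might look roughly like the 
sketch below. It is not taken from {{FileSystemContractBaseTest}}: 
java.io.RandomAccessFile on a local temp file stands in for a Hadoop input 
stream, and the class name, method names and test dataset are assumptions made 
for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

public class SeekReadContract {
    /** True if seek(offset) + readFully(len bytes) yields expected[offset .. offset+len-1]. */
    static boolean seekThenReadMatches(RandomAccessFile in, byte[] expected,
                                       int offset, int len) throws IOException {
        byte[] actual = new byte[len];
        in.seek(offset);
        in.readFully(actual);
        return Arrays.equals(actual, Arrays.copyOfRange(expected, offset, offset + len));
    }

    /** Writes a known dataset to a temp file and checks the expectation at one offset. */
    static boolean checkOnTempFile(int offset, int len) {
        byte[] data = new byte[16384];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251); // position-dependent pattern, period 251
        }
        try {
            File f = File.createTempFile("seek-contract", ".bin");
            try {
                Files.write(f.toPath(), data);
                try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
                    return seekThenReadMatches(in, data, offset, len);
                }
            } finally {
                f.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkOnTempFile(11000, 1000));
    }
}
```

The same comparison could be pointed at any seekable stream; only the two calls 
inside seekThenReadMatches would change.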




[jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks

2013-02-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578514#comment-13578514
 ] 

Todd Lipcon commented on HADOOP-9307:
-

Yea, I have a randomized test case that finds this bug within a few seconds - 
basically a copy of one that I wrote for HDFS a couple years ago. Will upload 
it with a bugfix patch hopefully later today, but maybe early next week (pretty 
busy next two days). FWIW the fix is simple -- just need to add {{(this.pos != 
this.count)}} into the condition to run the seek-in-buffer optimization
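
To see what that extra condition changes, here is a small self-contained model 
of a buffered seekable stream. It is a simplified sketch and not the 
BufferedFSInputStream source or the committed patch: the class, its fields and 
the replayed sequence are assumptions based on the description earlier in this 
thread (4096-byte buffer, large reads bypassing the buffer), and the 10000-byte 
length of the second read is inferred from the stated end state filepos=12000.

```java
import java.util.Arrays;

/** Simplified, hypothetical model of a buffered seekable stream (not Hadoop's class). */
public class BufferedSeekModel {
    static final int BUF_SIZE = 4096;

    private final byte[] file;                    // stands in for the underlying file
    private final byte[] buf = new byte[BUF_SIZE];
    private int pos;                              // next unread index in buf
    private int count;                            // number of valid bytes in buf
    private int filepos;                          // position of the underlying stream
    private final boolean guarded;                // apply the (pos != count) check?

    BufferedSeekModel(byte[] file, boolean guarded) {
        this.file = file;
        this.guarded = guarded;
    }

    void seek(int target) {
        int start = filepos - count;              // assumed start of the buffered window
        boolean mayUseBuffer = !guarded || pos != count;
        if (mayUseBuffer && target >= start && target < filepos) {
            pos = target - start;                 // reposition inside the buffer
            return;
        }
        pos = 0;                                  // otherwise invalidate the buffer
        count = 0;
        filepos = target;
    }

    void readFully(byte[] out) {
        int done = 0;
        while (done < out.length) {
            if (pos >= count) {                   // buffer exhausted
                int remaining = out.length - done;
                if (filepos + remaining > file.length) {
                    throw new IllegalArgumentException("read past end of file");
                }
                if (remaining >= BUF_SIZE) {
                    // Large read: bypass the buffer, leaving pos and count stale.
                    System.arraycopy(file, filepos, out, done, remaining);
                    filepos += remaining;
                    return;
                }
                count = Math.min(BUF_SIZE, file.length - filepos); // refill
                System.arraycopy(file, filepos, buf, 0, count);
                filepos += count;
                pos = 0;
            }
            int n = Math.min(count - pos, out.length - done);
            System.arraycopy(buf, pos, out, done, n);
            pos += n;
            done += n;
        }
    }

    /** Replays the seek/read sequence and returns the final 1000 bytes read. */
    static byte[] replay(byte[] file, boolean guarded) {
        BufferedSeekModel in = new BufferedSeekModel(file, guarded);
        in.seek(0);
        in.readFully(new byte[1]);                // primes the buffer
        in.seek(2000);                            // served from the buffer
        in.readFully(new byte[10000]);            // drains the buffer, then bypasses it
        in.seek(11000);                           // the problematic seek
        byte[] out = new byte[1000];
        in.readFully(out);
        return out;
    }

    static byte[] testFile() {
        byte[] f = new byte[16384];
        for (int i = 0; i < f.length; i++) {
            f[i] = (byte) (i % 251);              // position-dependent pattern
        }
        return f;
    }

    public static void main(String[] args) {
        byte[] file = testFile();
        byte[] want = Arrays.copyOfRange(file, 11000, 12000);
        System.out.println("unguarded correct: " + Arrays.equals(replay(file, false), want));
        System.out.println("guarded correct:   " + Arrays.equals(replay(file, true), want));
    }
}
```

In this model the unguarded run returns file bytes 3096-4095 and the guarded run 
returns bytes 11000-11999, because with pos == count the seek falls through to 
invalidating the buffer and repositioning the underlying stream.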
