[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2018-10-26 Thread James Clampffer (JIRA)


 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Parent Issue: HDFS-14032  (was: HDFS-8707)

> hdfsRead read stops at block boundary
> -------------------------------------
>
>         Key: HDFS-10543
>         URL: https://issues.apache.org/jira/browse/HDFS-10543
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>    Reporter: Xiaowei Zhu
>    Assignee: James Clampffer
>    Priority: Major
>     Fix For: HDFS-8707
> Attachments: HDFS-10543.HDFS-8707.000.patch, HDFS-10543.HDFS-8707.001.patch, 
>              HDFS-10543.HDFS-8707.002.patch, HDFS-10543.HDFS-8707.003.patch, 
>              HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
> memset(buf2, 0, (size_t)file_info->mSize);
> int ret = hdfsRead(fs, file, buf2, file_info->mSize);
> delete [] buf2;
> if (ret != file_info->mSize) {
>   std::stringstream ss;
>   ss << "tried to read " << file_info->mSize << " bytes. but read "
>      << ret << " bytes";
>   ReportError(ss.str());
>   hdfsCloseFile(fs, file);
>   continue;
> }
> When run against a ~1.4 GB file, it returns an error like "tried to read 
> 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs against 
> has a block size of 134217728 bytes, so it appears hdfsRead stops at a 
> block boundary. This looks like a regression. We should add a retry to 
> continue reading across blocks for files with multiple blocks.
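For reference, a minimal caller-side sketch of the retry the description asks
for: keep calling hdfsRead until the requested length is satisfied, EOF, or an
error. This is an illustration against the public libhdfs C API only, not code
from the attached patches; ReadFully is a hypothetical helper name.

  #include <algorithm>
  #include <cstdint>
  #include "hdfs.h"  // libhdfs C API: hdfsFS, hdfsFile, hdfsRead, tSize

  // Hypothetical helper: keep reading until 'length' bytes arrive, EOF, or
  // an error. Returns the accumulated byte count, or -1 on error.
  static int64_t ReadFully(hdfsFS fs, hdfsFile file, char *buf, int64_t length) {
    int64_t total = 0;
    while (total < length) {
      // tSize is 32-bit, so cap each request; a short return is not an
      // error (it happens at block boundaries), just ask for the rest.
      tSize chunk = (tSize)std::min<int64_t>(length - total, 1 << 30);
      tSize nread = hdfsRead(fs, file, buf + total, chunk);
      if (nread < 0) return -1;  // read error
      if (nread == 0) break;     // end of file
      total += nread;
    }
    return total;
  }

With such a loop, the reproducer's ret != file_info->mSize check would only
trip on a genuinely short file or a real error, not on a block boundary.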






[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-25 Thread Vinayakumar B (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-10543:
-
Fix Version/s: HDFS-8707




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-24 Thread James Clampffer (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-24 Thread James Clampffer (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Assignee: (was: James Clampffer)




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-23 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Attachment: HDFS-10543.HDFS-8707.004.patch




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-21 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Attachment: HDFS-10543.HDFS-8707.003.patch

Fix whitespace issue.




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-21 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Attachment: HDFS-10543.HDFS-8707.002.patch

HDFS-10543.HDFS-8707.002.patch fixes a regression introduced by moving the 
retry loop into PositionRead: reads once again stopped after a single block.




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-20 Thread James Clampffer (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Status: Patch Available  (was: Open)

Marking as Patch Available to get CI running; I had to assign the issue to 
myself to make that happen.




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-20 Thread James Clampffer (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Flags: Patch




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-20 Thread James Clampffer (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-10543:
---
Flags:   (was: Patch)




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-20 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Attachment: HDFS-10543.HDFS-8707.001.patch

HDFS-10543.HDFS-8707.001.patch moves the retry loop for reading a 
multi-block file into PositionRead.
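A schematic of that structure (an illustration only, not the actual libhdfs++
code; PositionRead's real signature may differ, and read_one_block is a
stand-in for the single-block read primitive):

  #include <cstddef>
  #include <functional>
  #include <sys/types.h>  // ssize_t, off_t

  // Sketch: PositionRead owns the retry loop, re-issuing one-block reads
  // at advancing offsets until the request is satisfied or EOF.
  ssize_t PositionRead(char *buf, size_t count, off_t offset,
                       const std::function<ssize_t(char *, size_t, off_t)> &read_one_block) {
    size_t total = 0;
    while (total < count) {
      ssize_t n = read_one_block(buf + total, count - total, offset + total);
      if (n < 0) return n;               // propagate the error
      if (n == 0) break;                 // end of file
      total += static_cast<size_t>(n);   // short read: block boundary, go again
    }
    return static_cast<ssize_t>(total);
  }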




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-20 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Attachment: HDFS-10543.HDFS-8707.000.patch

The patch fixes hdfsRead returning only the size of the last block read 
rather than the accumulated total, and also fixes a bug where a read whose 
offset falls on the last byte of a block fails with "block not found".
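The first bug matches the retry loops sketched above: the return value must be
the accumulated total, not the count from the last iteration. The second is a
boundary condition in block lookup; a hedged sketch of the correct check
(BlockInfo and FindBlock are invented names, not the patch's actual code):

  #include <cstdint>
  #include <vector>

  struct BlockInfo { int64_t start; int64_t length; };  // hypothetical layout

  // The block containing 'offset' spans [start, start + length). The upper
  // bound is exclusive of start + length but still admits the block's last
  // byte; an off-by-one here makes a read that starts on the last byte of
  // a block fail with "block not found".
  const BlockInfo *FindBlock(const std::vector<BlockInfo> &blocks, int64_t offset) {
    for (const BlockInfo &b : blocks) {
      if (offset >= b.start && offset < b.start + b.length)  // last byte included
        return &b;
    }
    return nullptr;  // caller reports "block not found"
  }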




[jira] [Updated] (HDFS-10543) hdfsRead read stops at block boundary

2016-06-17 Thread Xiaowei Zhu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaowei Zhu updated HDFS-10543:
---
Description: 
Reproducer:

char *buf2 = new char[file_info->mSize];
memset(buf2, 0, (size_t)file_info->mSize);
int ret = hdfsRead(fs, file, buf2, file_info->mSize);
delete [] buf2;
if (ret != file_info->mSize) {
  std::stringstream ss;
  ss << "tried to read " << file_info->mSize << " bytes. but read "
     << ret << " bytes";
  ReportError(ss.str());
  hdfsCloseFile(fs, file);
  continue;
}

When run against a ~1.4 GB file, it returns an error like "tried to read 
146890 bytes. but read 134217728 bytes". The HDFS cluster it runs against 
has a block size of 134217728 bytes, so it appears hdfsRead stops at a 
block boundary. This looks like a regression. We should add a retry to 
continue reading across blocks for files with multiple blocks.

  was:
Reproducer:

char *buf2 = new char[file_info->mSize];
memset(buf2, 0, (size_t)file_info->mSize);
int ret = hdfsRead(fs, file, buf2, file_info->mSize);
delete [] buf2;
if (ret != file_info->mSize) {
  std::stringstream ss;
  ss << "tried to read " << file_info->mSize << " bytes. but read "
     << ret << " bytes";
  ReportError(ss.str());
  hdfsCloseFile(fs, file);
  continue;
}

When run against a ~1.4 GB file, it returns an error like "tried to read 
146890 bytes. but read 134217728 bytes". The HDFS cluster it runs against 
has a block size of 146890 bytes, so it appears hdfsRead stops at a block 
boundary. This looks like a regression. We should add a retry to continue 
reading across blocks for files with multiple blocks.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org