[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-07 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated MAPREDUCE-6729:
-
Assignee: mingleizhang  (was: mingleizhang)

> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Assignee: mingleizhang
>Priority: Minor
>  Labels: performance, test
> Attachments: MR-6729.txt
>
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption because of fs.delete(*,*) by the writeTest method
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-07 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated MAPREDUCE-6729:
-
Assignee: mingleizhang

> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Assignee: mingleizhang
>Priority: Minor
>  Labels: performance, test
> Attachments: MR-6729.txt
>
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption because of fs.delete(*,*) by the writeTest method
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-07 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Attachment: MR-6729.txt

> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
> Attachments: MR-6729.txt
>
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption because of fs.delete(*,*) by the writeTest method
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time consumption 
because of fs.delete(*,*) by the writeTest method
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time consumption 
because of fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption because of fs.delete(*,*) by the writeTest method
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. We need to replace or improve 
this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs);
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time consumption 
because of fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time consumption 
 because of fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption because of fs.delete(*,*)
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time consumption 
 because of fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time because of 
fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time 
> consumption  because of fs.delete(*,*)
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs); // this line of code will cause extra time because of 
fs.delete(*,*)
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. So we need to replace or 
improve this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. So we need to replace 
> or improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs); // this line of code will cause extra time because 
> of fs.delete(*,*)
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput while the files are lots of. We need to replace or improve 
this hack to prevent this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
writes plenty of files to disk or read from, both can cause performance issue 
and imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput for us. We need to replace or improve this hack to prevent 
this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as a distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput while the files are lots of. We need to replace or 
> improve this hack to prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs);
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6729) Hitting performance and error when lots of files to write or read

2016-07-06 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated MAPREDUCE-6729:

Description: 
When doing DFSIO test as distributed i/o benchmark tool. Then especially writes 
plenty of files to disk or read from, both can cause performance issue and 
imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause extra time 
consumption and furthermore cause performance issue, statistical time error and 
imprecise throughput for us. We need to replace or improve this hack to prevent 
this from happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]




  was:
When doing DFSIO test as distributed i/o benchmark tool. Then especially writes 
plenty of files to disk or read from, both can cause performance issue and 
imprecise value in a way. The question is that existing practices needs to 
delete files when before running a job and that will cause time consumption and 
furthermore cause performance issue, statistical time error and imprecise 
throughput for us. We need to replace or improve this hack to prevent this from 
happening in the future.

{code}
public static void testWrite() throws Exception {
FileSystem fs = cluster.getFileSystem();
long tStart = System.currentTimeMillis();
bench.writeTest(fs);
long execTime = System.currentTimeMillis() - tStart;
bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
  }

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
  }
{code} 

[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]





> Hitting performance and error when lots of files to write or read
> -
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, performance, test
>Reporter: mingleizhang
>Priority: Minor
>  Labels: performance, test
>
> When doing DFSIO test as distributed i/o benchmark tool. Then especially 
> writes plenty of files to disk or read from, both can cause performance issue 
> and imprecise value in a way. The question is that existing practices needs 
> to delete files when before running a job and that will cause extra time 
> consumption and furthermore cause performance issue, statistical time error 
> and imprecise throughput for us. We need to replace or improve this hack to 
> prevent this from happening in the future.
> {code}
> public static void testWrite() throws Exception {
> FileSystem fs = cluster.getFileSystem();
> long tStart = System.currentTimeMillis();
> bench.writeTest(fs);
> long execTime = System.currentTimeMillis() - tStart;
> bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
>   }
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
>   }
> {code} 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org