[jira] [Updated] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785

2015-02-09 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-6234:

Summary: TestHighRamJob fails due to the change in MAPREDUCE-5785  (was: 
MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml)

 TestHighRamJob fails due to the change in MAPREDUCE-5785
 

 Key: MAPREDUCE-6234
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, 
 MAPREDUCE-6234.003.patch


 TestHighRamJob fails because of this.
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec  
 FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
 testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  
 Time elapsed: 1.102 sec   FAILURE!
 java.lang.AssertionError: expected:1024 but was:-1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
 {code}
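The assertion `expected:1024 but was:-1` suggests the test now receives -1 where it used to get 1024. The sketch below assumes -1 means "not set", to be derived or replaced by a default, and is not a usable size. It shows the difference between reading the raw value and resolving it. `MemoryMbResolver` and its methods are hypothetical. A plain `Map` stands in for Hadoop's `Configuration`, and only the property name `mapreduce.map.memory.mb` is real.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only: NOT Hadoop's MRJobConfig/JobConf API.
// A plain Map stands in for the job Configuration.
final class MemoryMbResolver {
    // Assumed fallback, matching the 1024 the failing assertion expects.
    static final int DEFAULT_MEMORY_MB = 1024;

    // Raw lookup: a missing key is reported as -1, i.e. "not set".
    static int rawMemoryMb(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        return raw == null ? -1 : Integer.parseInt(raw.trim());
    }

    // Resolved lookup: any non-positive value (missing, -1, 0) falls back to the default.
    static int resolveMemoryMb(Map<String, String> conf, String key) {
        int value = rawMemoryMb(conf, key);
        return value > 0 ? value : DEFAULT_MEMORY_MB;
    }
}
```

With `mapreduce.map.memory.mb` set to -1, `rawMemoryMb` returns -1, which is what the failing assertion saw. `resolveMemoryMb` returns 1024, which is what the assertion expected.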



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml

2015-02-09 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-6234:

Attachment: MAPREDUCE-6234.003.patch

Patch 003 fixes the test failure without changing the value of DEFAULT_*_MEMORY_MB.

 MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
 

 Key: MAPREDUCE-6234
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, 
 MAPREDUCE-6234.003.patch


 TestHighRamJob fails because of this.
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec  
 FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
 testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  
 Time elapsed: 1.102 sec   FAILURE!
 java.lang.AssertionError: expected:1024 but was:-1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
 {code}





[jira] [Commented] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313469#comment-14313469
 ] 

Hadoop QA commented on MAPREDUCE-6174:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697564/MAPREDUCE-6174.v1.txt
  against trunk revision af08425.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warning.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5178//console

This message is automatically generated.

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: MAPREDUCE-6174.v1.txt


 Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
 similar things with regard to IFile streams.
 To make it explicit that InMemoryMapOutput and OnDiskMapOutput are different 
 from third-party implementations, this JIRA will make them subclass a 
 common class (see 
 https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368).
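Here is a minimal sketch of the refactoring described above. It assumes the shared part is "wrap the incoming stream and check how many bytes were consumed", which stands in for the real IFile checksum and decompression handling. All class names are hypothetical and are not the ones introduced by the patch. The parent owns the common stream logic as a template method, and each subclass only decides where the bytes go.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: class names are illustrative, not the ones in the patch.
abstract class StreamedMapOutput {
    // Counts bytes as a subclass consumes them; stands in for the shared
    // IFile stream wrapping (checksums, decompression) both outputs need.
    private static final class CountingStream extends FilterInputStream {
        long count;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Template method: common wrapping and validation happen once, here.
    final void shuffle(InputStream raw, long expectedLength) throws IOException {
        CountingStream in = new CountingStream(raw);
        doShuffle(in);
        if (in.count != expectedLength) {
            throw new IOException("expected " + expectedLength + " bytes, read " + in.count);
        }
    }

    // Subclasses decide only where the bytes end up.
    protected abstract void doShuffle(InputStream in) throws IOException;

    // Copies via read(byte[], int, int) so the counting wrapper sees every byte.
    protected static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
    }
}

final class InMemoryOutputSketch extends StreamedMapOutput {
    final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    @Override protected void doShuffle(InputStream in) throws IOException { copy(in, buffer); }
}

final class OnDiskOutputSketch extends StreamedMapOutput {
    final Path file;
    OnDiskOutputSketch(Path file) { this.file = file; }
    @Override protected void doShuffle(InputStream in) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) { copy(in, out); }
    }
}
```

Because `shuffle` is `final`, a third-party output cannot change the shared stream handling. It can only plug in by extending the parent, which is the explicit boundary the issue asks for.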





[jira] [Created] (MAPREDUCE-6248) Persist DistCp job id in the staging directory

2015-02-09 Thread Jing Zhao (JIRA)
Jing Zhao created MAPREDUCE-6248:


 Summary: Persist DistCp job id in the staging directory
 Key: MAPREDUCE-6248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao


Currently DistCp acts as a tool, and the corresponding MapReduce job is 
created and used inside its {{execute}} method. This makes it difficult for 
external services to query the job's progress and counters. It may be helpful 
to persist the job ID to a file inside its staging directory.
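Here is a minimal sketch of persisting a job ID and reading it back, assuming a plain text file in the staging directory. Several details are assumptions for illustration: the file name `job.id`, the `JobIdFile` helper, and the use of `java.nio` on a local path instead of Hadoop's `FileSystem` API on the real staging directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical sketch: the file name and helper are illustrative, not DistCp's API.
// Uses the local filesystem; a real version would write into the job's staging dir via Hadoop FileSystem.
final class JobIdFile {
    static final String FILE_NAME = "job.id"; // assumed name

    // Write to a temporary file, then rename it into place, so a reader is
    // unlikely to observe a half-written ID (atomicity depends on the filesystem).
    static void persist(Path stagingDir, String jobId) throws IOException {
        Path tmp = stagingDir.resolve(FILE_NAME + ".tmp");
        Files.write(tmp, jobId.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, stagingDir.resolve(FILE_NAME), StandardCopyOption.REPLACE_EXISTING);
    }

    // An external service reads the ID back; empty if the job has not been submitted yet.
    static Optional<String> read(Path stagingDir) throws IOException {
        Path file = stagingDir.resolve(FILE_NAME);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
    }
}
```

Once it has the ID, an external service could look the job up through the normal MapReduce client APIs to get its progress and counters.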





[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.

  was:
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
   Original Estimate: 24h
  Remaining Estimate: 24h

 The constructQuery method in the DBOutputFormat class generates an INSERT 
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this 
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON with the -t option of the DB2 command line processor 
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default setting 
 (OFF), so turning the feature ON makes them error-prone.
 I changed the DBOutputFormat class to check the database product name from 
 the connection object; if it is DB2, it generates the INSERT INTO statement 
 without the semicolon (;).
 This technique is already used in the DBInputFormat class to generate 
 different SELECT statements for Oracle and MySQL databases.





[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.

  was:
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
   Original Estimate: 24h
  Remaining Estimate: 24h

 The constructQuery method in the DBOutputFormat class generates an INSERT 
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this 
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON with the -t option of the DB2 command line processor 
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default setting 
 (OFF), so turning the feature ON makes them error-prone.
 I changed the DBOutputFormat class to check the database product name from 
 the connection object; if it is DB2, it generates the INSERT INTO statement 
 without the semicolon (;).
 This technique is already used in the DBInputFormat class to generate 
 different SELECT statements for Oracle and MySQL databases.





[jira] [Commented] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312642#comment-14312642
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-6244:
---

Cancelling due to the previous comment.

 Hadoop examples when run without an argument, gives ERROR instead of just 
 usage info
 

 Key: MAPREDUCE-6244
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.0, trunk-win, 2.6.0
Reporter: Robert Justice
Assignee: Abhishek Kapoor
Priority: Minor
 Attachments: HADOOP-8834.patch, HADOOP-8834.patch


 The Hadoop sort example should not print an ERROR; it should only display 
 usage information when run with no parameters. 
 {code}
 $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort
 ERROR: Wrong number of parameters: 0 instead of 2.
 sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat 
 output format class] [-outKey output key class] [-outValue output value 
 class] [-totalOrder pcnt num samples max splits] input output
 Generic options supported are
 -conf configuration file                     specify an application configuration file
 -D property=value                            use value for given property
 -fs local|namenode:port                      specify a namenode
 -jt local|jobtracker:port                    specify a job tracker
 -files comma separated list of files         specify comma separated files to be copied to the map reduce cluster
 -libjars comma separated list of jars        specify comma separated jar files to include in the classpath.
 -archives comma separated list of archives   specify comma separated archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]
 {code}
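Here is a minimal sketch of the requested behavior. It ignores option parsing and expects exactly two positional arguments, input and output. With no arguments it prints only the usage text. The ERROR line is reserved for arguments that were supplied but are wrong. `SortCli` is hypothetical and is not the actual Sort example, and the -1 return code on both usage paths is also an assumption.

```java
import java.io.PrintStream;

// Hypothetical sketch of the requested CLI behavior; not the real Sort example code.
final class SortCli {
    static final String USAGE = "sort [-m maps] [-r reduces] [-inFormat input format class] "
        + "[-outFormat output format class] [-outKey output key class] "
        + "[-outValue output value class] [-totalOrder pcnt num samples max splits] input output";

    // Returns an exit code; -1 for both usage paths is an assumption.
    static int run(String[] args, PrintStream out) {
        if (args.length == 0) {
            // Nothing was supplied, so this is a request for help, not an error.
            out.println(USAGE);
            return -1;
        }
        if (args.length != 2) {
            // Arguments were supplied but are wrong: report the error, then the usage.
            out.println("ERROR: Wrong number of parameters: " + args.length + " instead of 2.");
            out.println(USAGE);
            return -1;
        }
        out.println("would sort " + args[0] + " into " + args[1]);
        return 0;
    }
}
```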





[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Attachment: MAPREDUCE-6246.patch

 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
  Labels: DB2, mapreduce
 Fix For: 2.4.1

 Attachments: MAPREDUCE-6246.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 The constructQuery method in the DBOutputFormat class generates an INSERT 
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this 
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON with the -t option of the DB2 command line processor 
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default setting 
 (OFF), so turning the feature ON makes them error-prone.





[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
DBOutputFormat is used to write the output of MapReduce jobs to a database, 
and when used with the DB2 JDBC driver it fails with the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, 
DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at 
com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)

  was:
DBOutputFormat is used to write the output of MapReduce jobs to a database, 
and when used with the DB2 JDBC driver it fails with the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, 
DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at 
com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)

The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
  Labels: DB2, mapreduce
 Fix For: 2.4.1

 Attachments: MAPREDUCE-6246.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 DBOutputFormat is used to write the output of MapReduce jobs to a database, 
 and when used with the DB2 JDBC driver it fails with the following error:
 com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
 SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, 
 DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at 
 com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)





[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures

2015-02-09 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312791#comment-14312791
 ] 

Akira AJISAKA commented on MAPREDUCE-6223:
--

Thanks [~varun_saxena] for updating the patch. +1 pending [~kasha]'s review. 
The findbugs warnings look unrelated to the patch.

 TestJobConf#testNegativeValueForTaskVmem failures
 -

 Key: MAPREDUCE-6223
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
 Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, 
 MAPREDUCE-6223.003.patch


 {code}
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec  
 FAILURE! - in org.apache.hadoop.conf.TestJobConf
 testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf)  Time 
 elapsed: 0.089 sec   FAILURE!
 java.lang.AssertionError: expected:1024 but was:-1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
 {code}





[jira] [Created] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)
ramtin created MAPREDUCE-6246:
-

 Summary: DBOutputFormat.java appending extra semicolon to query 
which is incompatible with DB2
 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
Platform: xSeries, pSeries
Browser: Firefox, IE
Security Settings: No Security, Flat file, LDAP, PAM
File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin


The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.
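Here is a minimal sketch of the product-name check described above. It assumes the product name comes from `connection.getMetaData().getDatabaseProductName()` and that DB2 product names start with "DB2". `InsertQueryBuilder` is hypothetical and is not the code in MAPREDUCE-6246.patch. The column-list and placeholder layout follows the `(...,COUNT) VALUES (?,?);` shape quoted in the SQLCODE=-104 error reported for this issue.

```java
import java.util.Locale;

// Hypothetical sketch of the idea in this issue; not the code in MAPREDUCE-6246.patch.
// productName would come from connection.getMetaData().getDatabaseProductName().
final class InsertQueryBuilder {
    static String constructQuery(String table, String[] fieldNames, String productName) {
        if (fieldNames == null) {
            throw new IllegalArgumentException("Field names may not be null");
        }
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            query.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        query.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ",?"); // one JDBC placeholder per column
        }
        query.append(")");
        // Assumption: DB2 product names start with "DB2"; only they skip the terminator.
        boolean isDb2 = productName != null
            && productName.toUpperCase(Locale.ROOT).startsWith("DB2");
        if (!isDb2) {
            query.append(";");
        }
        return query.toString();
    }
}
```

Comparing the upper-cased product name by prefix follows the same idea as picking a database-specific SELECT for Oracle or MySQL in DBInputFormat. All other databases keep the current statement unchanged.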





[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe

2015-02-09 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312583#comment-14312583
 ] 

Kannan Rajah commented on MAPREDUCE-6237:
-

[~ozawa] Is the patch alright? Is there anything else I need to do to get it committed?

 DBRecordReader is not thread safe
 -

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


 DBInputFormat.createDBRecordReader reuses JDBC connections across instances 
 of DBRecordReader. This is not a good idea; we should create a separate 
 connection for each. If performance is a concern, we should use connection 
 pooling instead.
 I looked at DBOutputFormat.getRecordWriter. It actually creates a new 
 Connection object for each DBRecordWriter. So can we just change 
 DBInputFormat to create a new Connection every time? The connection reuse 
 code was added as part of the fix for the connection leak bug in 
 MAPREDUCE-1443. Is there any reason for caching the connection?
 We observed this issue in a customer setup where data was being read from 
 MySQL using Pig. According to the customer, the query returns two records, 
 which causes Pig to create two instances of DBRecordReader. These two 
 instances share the same database connection. The first DBRecordReader 
 extracts the first record from MySQL just fine, but then closes the shared 
 connection. When the second DBRecordReader runs, it tries to execute a query 
 to retrieve the second record on the closed shared connection, which fails. 
 If we set mapred.map.tasks to 1, the query succeeds.
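The failure mode described above can be reproduced in miniature without a database. In this sketch, `FakeConnection` is a hypothetical stand-in for a JDBC `Connection`, and `RecordReaderSketch` stands in for `DBRecordReader`. As in the report, each reader closes its connection when it finishes. With one shared connection the second reader fails. Giving each reader its own connection, as proposed above, avoids the failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a JDBC Connection.
final class FakeConnection implements AutoCloseable {
    private boolean closed;

    String query(String sql) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return "row for " + sql;
    }

    @Override public void close() { closed = true; }
}

// Hypothetical stand-in for DBRecordReader: reads one split, then closes "its" connection.
final class RecordReaderSketch {
    private final FakeConnection conn;
    RecordReaderSketch(FakeConnection conn) { this.conn = conn; }

    String readSplit(int split) {
        try {
            return conn.query("split " + split);
        } finally {
            conn.close();
        }
    }
}

final class ConnectionReuseDemo {
    // Runs one reader per split, taking each reader's connection from the supplier.
    static List<String> run(int splits, Supplier<FakeConnection> connections) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            results.add(new RecordReaderSketch(connections.get()).readSplit(i));
        }
        return results;
    }
}
```

`run(2, FakeConnection::new)` succeeds for both splits. If you create `shared` once and pass `() -> shared`, the second split throws `IllegalStateException`, mirroring the closed-connection failure in the report.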





[jira] [Resolved] (MAPREDUCE-5381) Support graceful decommission of tasktracker

2015-02-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5381.

Resolution: Won't Fix

Hardly any development is happening in 1.x now. I am closing this in favor of 
YARN's YARN-914. Please reopen if need be.

 Support graceful decommission of tasktracker
 

 Key: MAPREDUCE-5381
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5381
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1
Affects Versions: 1.2.0
Reporter: Luke Lu
Assignee: Binglin Chang
 Attachments: MAPREDUCE-5381-graceful-decomm.v1.patch


 When TaskTrackers (TTs) are decommissioned for non-fault reasons (capacity 
 changes, etc.), it is desirable to minimize the impact on running jobs.
 Currently, if a TT is decommissioned, all tasks running on it must be 
 rescheduled on other TTs. Furthermore, finished map tasks whose map output 
 has not yet been fetched by the job's reducers must be rerun as well.
 We propose a mechanism to optionally decommission a TaskTracker gracefully.
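Here is a minimal sketch of the proposed lifecycle. It assumes a decommissioning TaskTracker stops accepting new tasks. It is released only after its running tasks have finished and every completed map's output has been fetched. `TrackerDecommission` and its methods are hypothetical. A real design would also need a timeout so that a tracker cannot stay in the draining state forever.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of graceful decommissioning; not TaskTracker/JobTracker code.
final class TrackerDecommission {
    enum State { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED }

    private State state = State.IN_SERVICE;
    private final Set<String> runningTasks = new HashSet<>();
    // Finished maps whose output has not yet been fetched by all reducers.
    private final Set<String> unfetchedMapOutputs = new HashSet<>();

    // New work is accepted only while in service.
    boolean assignTask(String taskId) {
        if (state != State.IN_SERVICE) {
            return false;
        }
        runningTasks.add(taskId);
        return true;
    }

    void mapTaskFinished(String taskId) {
        runningTasks.remove(taskId);
        unfetchedMapOutputs.add(taskId);
        maybeFinish();
    }

    void reduceTaskFinished(String taskId) {
        runningTasks.remove(taskId);
        maybeFinish();
    }

    // Simplification: called once all reducers have fetched this map's output.
    void mapOutputFetched(String taskId) {
        unfetchedMapOutputs.remove(taskId);
        maybeFinish();
    }

    void startGracefulDecommission() {
        if (state == State.IN_SERVICE) {
            state = State.DECOMMISSIONING;
        }
        maybeFinish();
    }

    // Drained: nothing running and no map output still needed, so no rework is caused.
    private void maybeFinish() {
        if (state == State.DECOMMISSIONING && runningTasks.isEmpty() && unfetchedMapOutputs.isEmpty()) {
            state = State.DECOMMISSIONED;
        }
    }

    State state() { return state; }
}
```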





[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.

  was:
The constructQuery method in the DBOutputFormat class generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON with the -t option of the DB2 command line processor 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the DBOutputFormat class to check the database product name from 
the connection object; if it is DB2, it generates the INSERT INTO statement 
without the semicolon (;).

This technique is already used in the DBInputFormat class to generate 
different SELECT statements for Oracle and MySQL databases.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
   Original Estimate: 24h
  Remaining Estimate: 24h

 The constructQuery method in the DBOutputFormat class generates an INSERT 
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this 
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON with the -t option of the DB2 command line processor 
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default setting 
 (OFF), so turning the feature ON makes them error-prone.
 I changed the DBOutputFormat class to check the database product name from 
 the connection object; if it is DB2, it generates the INSERT INTO statement 
 without the semicolon (;).
 This technique is already used in the DBInputFormat class to generate 
 different SELECT statements for Oracle and MySQL databases.





[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing connections

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6237:
--
Fix Version/s: 2.6.1

 Multiple mappers with DBInputFormat don't work because of reusing connections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


 DBInputFormat.createDBRecordReader is reusing JDBC connections across 
 instances of DBRecordReader. This is not a good idea; we should be creating a 
 separate connection for each reader. If performance is a concern, we should 
 use connection pooling instead.
 I looked at DBOutputFormat.getRecordWriter. It actually creates a new 
 Connection object for each DBRecordWriter. So can we just change 
 DBInputFormat to create a new Connection every time? The connection reuse 
 code was added as part of the connection leak fix in MAPREDUCE-1443. Is there 
 any reason for caching the connection?
 We observed this issue in a customer setup where they were reading data from 
 MySQL using Pig. According to the customer, the query returns two records, 
 which causes Pig to create two instances of DBRecordReader. These two 
 instances share the same database connection instance. The first 
 DBRecordReader runs and extracts the first record from MySQL just fine, but 
 then closes the shared connection instance. When the second DBRecordReader 
 runs, it tries to execute a query to retrieve the second record on the closed 
 shared connection instance, which fails. If we set mapred.map.tasks to 1, the 
 query succeeds.
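
A minimal, self-contained Java sketch of the failure mode described above and of the connection-per-reader alternative. FakeConnection, SimpleRecordReader, and ConnectionPerReaderDemo are hypothetical stand-ins (no JDBC driver or Hadoop classes are involved), so this illustrates the idea rather than the committed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a JDBC Connection so the sketch runs without a driver.
class FakeConnection implements AutoCloseable {
  private boolean closed;

  String query(String sql) {
    if (closed) {
      throw new IllegalStateException("connection is closed");
    }
    return "row for: " + sql;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical reader: fetches its record, then closes the connection it was
// handed, as the first DBRecordReader in the report does.
class SimpleRecordReader {
  private final FakeConnection connection;
  private final String sql;

  SimpleRecordReader(FakeConnection connection, String sql) {
    this.connection = connection;
    this.sql = sql;
  }

  String readAndClose() {
    try (FakeConnection c = connection) { // close() runs even if query() throws
      return c.query(sql);
    }
  }
}

class ConnectionPerReaderDemo {
  // Creates one reader per split; the supplier decides whether readers share a connection.
  static List<String> run(Supplier<FakeConnection> connections, int splits) {
    List<String> results = new ArrayList<>();
    for (int i = 0; i < splits; i++) {
      SimpleRecordReader reader = new SimpleRecordReader(connections.get(), "split " + i);
      try {
        results.add(reader.readAndClose());
      } catch (IllegalStateException e) {
        results.add("FAILED: " + e.getMessage());
      }
    }
    return results;
  }

  public static void main(String[] args) {
    FakeConnection shared = new FakeConnection();
    // Shared connection: [row for: split 0, FAILED: connection is closed]
    System.out.println(run(() -> shared, 2));
    // One connection per reader: [row for: split 0, row for: split 1]
    System.out.println(run(FakeConnection::new, 2));
  }
}
```

In the shared case the first reader's close() leaves the second reader with a dead connection, matching the report; giving each reader its own connection (or borrowing one from a pool, as the description suggests) avoids this.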



[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6237:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
In the DBOutputFormat class, the constructQuery method generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON for DB2 with the -t option 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

  was:
In the DBOutputFormat class, the constructQuery method generates an INSERT 
INTO statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this 
feature is disabled (OFF) by default in IBM DB2.

It can be turned ON for DB2 with the -t option 
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting 
(OFF), so turning the feature ON makes them error-prone.

I changed the current DBOutputFormat class to check the product name from the 
connection object; if it is DB2, it generates the INSERT INTO command without 
a semicolon (;).

This technique is already used in the DBInputFormat class for generating 
different SELECT statements for Oracle and MySQL databases.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
   Original Estimate: 24h
  Remaining Estimate: 24h


[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312636#comment-14312636
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-6237:
---

Committed this to trunk, branch-2, and branch-2.6. Thanks [~rkannan82] for your 
contribution and thanks [~jira.shegalov] for your review.

[~rkannan82], BTW, would you mind creating a follow-up JIRA to use a thread 
pool based on Gera's suggestion?

 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Updated] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6244:
--
Status: Open  (was: Patch Available)

 Hadoop examples when run without an argument, gives ERROR instead of just 
 usage info
 

 Key: MAPREDUCE-6244
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.6.0, 0.23.0, trunk-win
Reporter: Robert Justice
Assignee: Abhishek Kapoor
Priority: Minor
 Attachments: HADOOP-8834.patch, HADOOP-8834.patch


 The Hadoop sort example should not print an ERROR when run with no 
 parameters; it should only display the usage information.
 {code}
 $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort
 ERROR: Wrong number of parameters: 0 instead of 2.
 sort [-m <maps>] [-r <reduces>] [-inFormat <input format class>] [-outFormat <output format class>] [-outKey <output key class>] [-outValue <output value class>] [-totalOrder <pcnt> <num samples> <max splits>] <input> <output>
 Generic options supported are
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
 -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
 -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
 The general command line syntax is
 bin/hadoop command [genericOptions] [commandOptions]
 {code}
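
The behavior requested above (usage only when there are no arguments, an ERROR line only for a wrong non-zero count) can be sketched as a small argument check. SortUsageDemo and argumentProblem are hypothetical names, the usage string is deliberately abbreviated, and this is not the attached HADOOP-8834.patch.

```java
// Hypothetical argument check for the sort example; not the attached patch.
class SortUsageDemo {
  // Abbreviated, illustrative usage text (the real example prints the full option list).
  static final String USAGE = "sort [options] <input> <output>";

  // Returns the message to print, or null when the positional arguments are acceptable.
  static String argumentProblem(String[] positionalArgs) {
    int count = positionalArgs.length;
    if (count == 2) {
      return null; // input and output given: run the job
    }
    if (count == 0) {
      return USAGE; // no arguments: the user is asking for help, so print usage only
    }
    return "ERROR: Wrong number of parameters: " + count + " instead of 2.\n" + USAGE;
  }

  public static void main(String[] args) {
    String problem = argumentProblem(args);
    System.out.println(problem == null ? "arguments OK" : problem);
  }
}
```

Separating the "no arguments" case keeps the ERROR prefix for genuinely malformed invocations, where it is still useful.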



[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312590#comment-14312590
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-6237:
---

+1, the findbugs warnings look unrelated to your patch. I'll commit this to 
branch-2 and trunk shortly.

 DBRecordReader is not thread safe
 -

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6237:
--
Affects Version/s: 2.6.0
 Hadoop Flags: Reviewed

 DBRecordReader is not thread safe
 -

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Updated] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6237:
--
Summary: Multiple mappers with DBInputFormat don't work because of reusing 
conections  (was: DBRecordReader is not thread safe)

 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Commented] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313278#comment-14313278
 ] 

Hadoop QA commented on MAPREDUCE-6246:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697540/MAPREDUCE-6246.patch
  against trunk revision af08425.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5177//console

This message is automatically generated.

 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
  Labels: DB2, mapreduce
 Fix For: 2.4.1

 Attachments: MAPREDUCE-6246.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 DBOutputFormat is used for writing the output of MapReduce jobs to a 
 database, and when used with the DB2 JDBC drivers it fails with the following 
 error:
 com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
 SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, 
 DRIVER=4.16.53
   at com.ibm.db2.jcc.am.fd.a(fd.java:739)
   at com.ibm.db2.jcc.am.fd.a(fd.java:60)
   at com.ibm.db2.jcc.am.fd.a(fd.java:127)
 In the DBOutputFormat class, the constructQuery method generates an INSERT 
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this 
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON for DB2 with the -t option 
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default setting 
 (OFF), so turning the feature ON makes them error-prone.



[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312817#comment-14312817
 ] 

Hudson commented on MAPREDUCE-6237:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #833 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/833/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/CHANGES.txt


 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-02-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Status: Patch Available  (was: Open)

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 2.6.0, 3.0.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: MAPREDUCE-6174.v1.txt


 Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing 
 similar things with regard to IFile streams.
 To make it explicit that InMemoryMapOutput and OnDiskMapOutput are 
 different from 3rd-party implementations, this JIRA will make them subclass a 
 common class (see 
 https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368)
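
The refactoring described above can be sketched as a template-method parent class: the parent wraps the incoming stream once and each subclass only decides where the bytes go. All class names below are hypothetical, java.util.zip.CheckedInputStream with CRC32 stands in for the IFile checksum stream, and a plain OutputStream stands in for the local file. This illustrates the pattern, not the attached MAPREDUCE-6174.v1.txt patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical common parent: owns the stream wrapping both map outputs share.
abstract class WrappedMapOutputSketch {
  private long checksum;

  // Shared step: wrap the raw shuffle stream once, let the subclass consume it,
  // then record the checksum of everything that was read.
  final void shuffle(InputStream raw, int length) throws IOException {
    CheckedInputStream checked = new CheckedInputStream(raw, new CRC32());
    doShuffle(checked, length);
    checksum = checked.getChecksum().getValue();
  }

  long checksum() {
    return checksum;
  }

  // Subclass-specific step: decides where the bytes end up.
  protected abstract void doShuffle(InputStream in, int length) throws IOException;

  // Copies exactly `length` bytes and fails if the stream ends early.
  protected static void copy(InputStream in, OutputStream out, int length) throws IOException {
    byte[] buf = new byte[4096];
    int remaining = length;
    while (remaining > 0) {
      int n = in.read(buf, 0, Math.min(buf.length, remaining));
      if (n < 0) {
        throw new IOException("stream ended " + remaining + " bytes early");
      }
      out.write(buf, 0, n);
      remaining -= n;
    }
  }
}

// Keeps the map output in memory.
class InMemoryOutputSketch extends WrappedMapOutputSketch {
  private byte[] data = new byte[0];

  @Override
  protected void doShuffle(InputStream in, int length) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream(length);
    copy(in, buffer, length);
    data = buffer.toByteArray();
  }

  byte[] data() {
    return data;
  }
}

// Streams the map output to "disk" (any OutputStream stands in for the file here).
class OnDiskOutputSketch extends WrappedMapOutputSketch {
  private final OutputStream disk;

  OnDiskOutputSketch(OutputStream disk) {
    this.disk = disk;
  }

  @Override
  protected void doShuffle(InputStream in, int length) throws IOException {
    copy(in, disk, length);
    disk.flush();
  }
}
```

Because shuffle() is final in the parent, the stream handling cannot drift between the two subclasses, while a third-party implementation that does not extend the parent remains visibly distinct.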



[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.

2015-02-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-6174:
--
Attachment: MAPREDUCE-6174.v1.txt

[~jira.shegalov], I have uploaded a patch for this issue. Would you please have 
a look?

 Combine common stream code into parent class for InMemoryMapOutput and 
 OnDiskMapOutput.
 ---

 Key: MAPREDUCE-6174
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.0.0, 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: MAPREDUCE-6174.v1.txt


[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2015-02-09 Thread kumar ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312122#comment-14312122
 ] 

kumar ranganathan commented on MAPREDUCE-5903:
--

I am also facing the same exception when enabling LDAP for Windows Active 
Directory in hadoop-2.6.0.

 If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
 phase
 

 Key: MAPREDUCE-5903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
 Environment: hadoop: 2.4.0.2.1.2.0
Reporter: Victor Kim
Priority: Critical
  Labels: shuffle

 I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. 
 Kerberos is enabled, with hdfs, yarn, and mapred principals/keytabs. 
 The ResourceManager and NodeManagers run under the yarn user, using the yarn 
 Kerberos principal. 
 Use case 1: WordCount, submitting the job using the yarn UGI (i.e. the 
 superuser, the one having a Kerberos principal on all boxes). Result: the job 
 completed successfully.
 Use case 2: WordCount, submitting the job using LDAP user impersonation via 
 the yarn UGI. Result: map tasks complete successfully, but the reduce task 
 fails with ShuffleError Caused by: java.io.IOException: Exceeded 
 MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
 The use case with user impersonation used to work on earlier versions, 
 without YARN (with JobTracker/TaskTracker).
 I found a similar issue with Kerberos AUTH involved here: 
 https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
 And here, https://issues.apache.org/jira/browse/MAPREDUCE-4030 is marked as 
 resolved, which is not the case when Kerberos authentication is enabled.
 The exception trace from the YarnChild JVM:
 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
   at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
   at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

2015-02-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-207:
---
Status: Open  (was: Patch Available)

Cancelling patch, as it no longer applies.

 Computing Input Splits on the MR Cluster
 

 Key: MAPREDUCE-207
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
 MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, 
 MAPREDUCE-207.v07.patch


 Instead of computing the input splits as part of job submission, Hadoop could 
 have a separate job task type that computes the input splits, therefore 
 allowing that computation to happen on the cluster.



[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312951#comment-14312951
 ] 

Hudson commented on MAPREDUCE-6237:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #99 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/99/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/CHANGES.txt


 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


 DBInputFormat.createDBRecordReader reuses JDBC connections across instances
 of DBRecordReader. This is not a good idea: we should be creating a separate
 connection for each reader. If performance is a concern, we should be using
 connection pooling instead.
 I looked at DBOutputFormat.getRecordWriter. It actually creates a new
 Connection object for each DBRecordWriter. So can we just change
 DBInputFormat to create a new Connection every time? The connection reuse code
 was added as part of the connection leak fix in MAPREDUCE-1443. Is there any
 reason for caching the connection?
 We observed this issue in a customer setup where they were reading data from
 MySQL using Pig. According to the customer, the query returns two records,
 which causes Pig to create two instances of DBRecordReader. These two
 instances share the same database connection. The first DBRecordReader
 extracts the first record from MySQL just fine, but then closes the shared
 connection. When the second DBRecordReader runs, it tries to execute a query
 to retrieve the second record on the closed shared connection, which fails.
 If we set mapred.map.tasks to 1, the query succeeds.
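
The failure mode described above can be reproduced without a database. This is a minimal sketch, not the MAPREDUCE-6237 patch: `FakeConnection` is a hypothetical stand-in for `java.sql.Connection`, and `FakeRecordReader` is a hypothetical simplified reader that closes its connection when it finishes. Handing every reader the same cached connection makes the second read fail, while giving each reader its own connection lets both succeed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for java.sql.Connection so the sketch runs without a database.
class FakeConnection implements AutoCloseable {
    private boolean closed = false;

    String query(String sql) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return "rows for: " + sql;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical simplified reader: runs one query, then closes its connection,
// mirroring a reader that closes the connection it was given when it is done.
class FakeRecordReader {
    private final FakeConnection conn;

    FakeRecordReader(FakeConnection conn) {
        this.conn = conn;
    }

    String readAndClose(String sql) {
        try {
            return conn.query(sql);
        } finally {
            conn.close();
        }
    }
}

class ConnectionReuseDemo {
    // Runs one reader per split, taking connections from the given factory.
    // Records each read's result, or the error message if the read failed.
    static List<String> runReaders(Supplier<FakeConnection> connectionFactory, int splits) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            FakeRecordReader reader = new FakeRecordReader(connectionFactory.get());
            try {
                results.add(reader.readAndClose("SELECT ... split " + i));
            } catch (IllegalStateException e) {
                results.add("ERROR: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Shared: the factory always returns the same cached connection.
        FakeConnection cached = new FakeConnection();
        System.out.println(runReaders(() -> cached, 2));
        // Per reader: each reader gets a fresh connection.
        System.out.println(runReaders(FakeConnection::new, 2));
    }
}
```

With the shared connection, the second entry is an error because the first reader already closed the connection; with a fresh connection per reader, both reads succeed.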



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)

2015-02-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4413:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

+1 committed to trunk.

Thanks!

 MR lib dir contains jdiff (which is gpl)
 

 Key: MAPREDUCE-4413
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4413
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Nemon Lou
Priority: Critical
 Fix For: 3.0.0

 Attachments: MAPREDUCE-4413.patch, MAPREDUCE-4413.patch


 A tarball built from trunk contains the following:
 ./share/hadoop/mapreduce/lib/jdiff-1.0.9.jar
 jdiff is GPLv2, so we need to exclude it from the build artifact.



[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313047#comment-14313047
 ] 

Hudson commented on MAPREDUCE-6237:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7053/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java


[jira] [Commented] (MAPREDUCE-4413) MR lib dir contains jdiff (which is gpl)

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313044#comment-14313044
 ] 

Hudson commented on MAPREDUCE-4413:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7053/])
MAPREDUCE-4413. MR lib dir contains jdiff (which is gpl) (Nemon Lou via aw) 
(aw: rev aab459c904bf2007c5b230af8c058793935faf89)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-assemblies/src/main/resources/assemblies/hadoop-mapreduce-dist.xml


[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Fix Version/s: 2.4.1
   Labels: DB2 mapreduce  (was: )
   Status: Patch Available  (was: Open)

I changed the DBOutputFormat class to check the database product name from the
connection object; if it is DB2, it generates the INSERT INTO query without the
trailing semicolon (;).

This technique is already used in the DBInputFormat class to generate different
SELECT statements for Oracle and MySQL databases.
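
The product-name check described above can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: `Db2AwareQueryBuilder` is a hypothetical class, and the extra `dbProductName` parameter stands in for the value a real job would obtain from the standard JDBC call `connection.getMetaData().getDatabaseProductName()`. It assumes that DB2 drivers report a product name starting with "DB2" (for example "DB2/LINUXX8664").

```java
import java.util.Locale;

// Hypothetical sketch of the approach described above, not the committed code.
class Db2AwareQueryBuilder {

    // Builds "INSERT INTO table (f1,f2) VALUES (?,?)" and appends the
    // trailing semicolon only when the target database is not DB2.
    static String constructQuery(String table, String[] fieldNames, String dbProductName) {
        if (fieldNames == null) {
            throw new IllegalArgumentException("Field names may not be null");
        }
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            query.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        query.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(i == 0 ? "?" : ",?");
        }
        query.append(")");
        // Assumption: DB2 product names start with "DB2"; compare case-insensitively.
        boolean isDb2 = dbProductName != null
            && dbProductName.toUpperCase(Locale.ROOT).startsWith("DB2");
        if (!isDb2) {
            query.append(";");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"WORD", "COUNT"};
        System.out.println(constructQuery("WORDCOUNT", fields, "MySQL"));
        System.out.println(constructQuery("WORDCOUNT", fields, "DB2/LINUXX8664"));
    }
}
```

Keying the decision on the product name keeps the existing output for other databases unchanged and only drops the terminator for DB2.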

 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
  Labels: mapreduce, DB2
 Fix For: 2.4.1

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the DBOutputFormat class, the constructQuery method generates an INSERT
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON for DB2 with the -t option
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default
 setting (OFF), so turning this feature ON makes them error-prone.



[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
DBOutputFormat is used to write the output of MapReduce jobs to a database. When
used with the DB2 JDBC driver, it fails with the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
  at com.ibm.db2.jcc.am.fd.a(fd.java:739)
  at com.ibm.db2.jcc.am.fd.a(fd.java:60)
  at com.ibm.db2.jcc.am.fd.a(fd.java:127)

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO
statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this feature
is disabled (OFF) by default in IBM DB2.

It can be turned ON for DB2 with the -t option
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting (OFF),
so turning this feature ON makes them error-prone.

  was:
In the DBOutputFormat class, the constructQuery method generates an INSERT INTO
statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this feature
is disabled (OFF) by default in IBM DB2.

It can be turned ON for DB2 with the -t option
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting (OFF),
so turning this feature ON makes them error-prone.


 DBOutputFormat.java appending extra semicolon to query which is incompatible 
 with DB2
 -

 Key: MAPREDUCE-6246
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.4.1
 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
 Platform: xSeries, pSeries
 Browser: Firefox, IE
 Security Settings: No Security, Flat file, LDAP, PAM
 File System: HDFS, GPFS FPO
Reporter: ramtin
Assignee: ramtin
  Labels: DB2, mapreduce
 Fix For: 2.4.1

 Attachments: MAPREDUCE-6246.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 DBOutputFormat is used to write the output of MapReduce jobs to a database.
 When used with the DB2 JDBC driver, it fails with the following error:
 com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
   at com.ibm.db2.jcc.am.fd.a(fd.java:739)
   at com.ibm.db2.jcc.am.fd.a(fd.java:60)
   at com.ibm.db2.jcc.am.fd.a(fd.java:127)
 In the DBOutputFormat class, the constructQuery method generates an INSERT
 INTO statement with a semicolon (;) at the end.
 The semicolon is the ANSI SQL-92 standard statement terminator, but this
 feature is disabled (OFF) by default in IBM DB2.
 It can be turned ON for DB2 with the -t option
 (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
 However, some products are already built on top of this default
 setting (OFF), so turning this feature ON makes them error-prone.



[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-02-09 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated MAPREDUCE-6246:
--
Description: 
DBOutputFormat is used to write the output of MapReduce jobs to a database. When
used with the DB2 JDBC driver, it fails with the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
  at com.ibm.db2.jcc.am.fd.a(fd.java:739)
  at com.ibm.db2.jcc.am.fd.a(fd.java:60)
  at com.ibm.db2.jcc.am.fd.a(fd.java:127)

In the DBOutputFormat class, the constructQuery method generates an INSERT INTO
statement with a semicolon (;) at the end.

The semicolon is the ANSI SQL-92 standard statement terminator, but this feature
is disabled (OFF) by default in IBM DB2.

It can be turned ON for DB2 with the -t option
(http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
However, some products are already built on top of this default setting (OFF),
so turning this feature ON makes them error-prone.

  was:
DBOutputFormat is used to write the output of MapReduce jobs to a database. When
used with the DB2 JDBC driver, it fails with the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
  at com.ibm.db2.jcc.am.fd.a(fd.java:739)
  at com.ibm.db2.jcc.am.fd.a(fd.java:60)
  at com.ibm.db2.jcc.am.fd.a(fd.java:127)


[jira] [Created] (MAPREDUCE-6247) Use DBCP connection pooling in DBInputFormat

2015-02-09 Thread Kannan Rajah (JIRA)
Kannan Rajah created MAPREDUCE-6247:
---

 Summary: Use DBCP connection pooling in DBInputFormat
 Key: MAPREDUCE-6247
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6247
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 2.6.0, 2.5.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
Priority: Minor


As part of MAPREDUCE-6237, we removed caching of the DB connection.
[~jira.shegalov] and [~ozawa] suggested that we use DBCP connection pooling instead.
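
To illustrate the borrow/return idea behind connection pooling, here is a toy pool in plain Java. It is explicitly NOT Apache Commons DBCP and does not use its API; `SimplePool`, `borrow` and `giveBack` are hypothetical names. The point it shows is that a reader that is "done" returns its connection to the pool instead of closing it, so the next reader can reuse it rather than finding it closed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy object pool for illustration only; this is NOT Apache Commons DBCP.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private int created = 0;

    SimplePool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    // Hands out an idle object if one exists; otherwise creates a new one,
    // as long as fewer than maxSize objects have been created in total.
    synchronized T borrow() {
        T obj = idle.pollFirst();
        if (obj != null) {
            return obj;
        }
        if (created < maxSize) {
            created++;
            return factory.get();
        }
        throw new IllegalStateException("pool exhausted");
    }

    // Returns an object to the pool instead of destroying it, so a later
    // borrow() can reuse it. Pooled JDBC connections typically behave this
    // way when the caller invokes close().
    synchronized void giveBack(T obj) {
        idle.addFirst(obj);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        SimplePool<String> pool = new SimplePool<>(() -> "conn-" + counter.incrementAndGet(), 2);
        String first = pool.borrow();   // creates conn-1
        pool.giveBack(first);           // first reader is done with it
        String second = pool.borrow();  // reuses conn-1 instead of opening a new one
        System.out.println(first + " " + second + " created=" + pool.createdCount());
    }
}
```

A production pool such as DBCP also handles validation, eviction of idle connections and blocking when the pool is exhausted; this sketch simply throws instead.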



[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master

2015-02-09 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated MAPREDUCE-6242:

Status: Patch Available  (was: Open)

 Progress report log is incredibly excessive in application master
 -

 Key: MAPREDUCE-6242
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.4.0
Reporter: Jian Fang
Assignee: Varun Saxena
 Attachments: MAPREDUCE-6242.001.patch


 We saw extremely excessive logging in the application master of a
 long-running job with many task attempts. The log write rate reached around
 1 MB/sec in some cases. Most of the log entries came from progress reports
 such as the following:
 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757
 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217
 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143
 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506
 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115
 It looks like the report interval is controlled by PROGRESS_INTERVAL, a
 constant hard-coded to 3 seconds in org.apache.hadoop.mapred.Task. We should
 allow users to set an appropriate progress interval for their applications.
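
A configurable interval could take the shape of a simple throttle that only lets a progress report through when at least the configured interval has passed since the last one. This is a hypothetical sketch: `ProgressThrottle` and `fromProperties` are illustrative names, `java.util.Properties` stands in for a Hadoop `Configuration`, and the property key is whatever the caller passes, not an actual Hadoop configuration name. The 3000 ms default mirrors the hard-coded 3 seconds described above.

```java
import java.util.Properties;

// Hypothetical sketch: allow a progress report only if at least intervalMs
// has elapsed since the previous one that was allowed through.
class ProgressThrottle {
    // Mirrors the hard-coded 3-second PROGRESS_INTERVAL described above.
    static final long DEFAULT_INTERVAL_MS = 3000;

    private final long intervalMs;
    private boolean reportedOnce = false;
    private long lastReportMs;

    ProgressThrottle(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + intervalMs);
        }
        this.intervalMs = intervalMs;
    }

    // Reads the interval from the given (hypothetical) key, falling back to
    // the default when the key is absent.
    static ProgressThrottle fromProperties(Properties props, String key) {
        String raw = props.getProperty(key, Long.toString(DEFAULT_INTERVAL_MS));
        return new ProgressThrottle(Long.parseLong(raw.trim()));
    }

    // Returns true if a report should be sent at time nowMs, and records it.
    synchronized boolean shouldReport(long nowMs) {
        if (!reportedOnce || nowMs - lastReportMs >= intervalMs) {
            reportedOnce = true;
            lastReportMs = nowMs;
            return true;
        }
        return false;
    }
}
```

For example, a job that sets the interval to 10000 ms would send one report at t=0, skip the one at t=3000, and send the next at t=10000, cutting the application master's log volume for that attempt accordingly.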



[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures

2015-02-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313167#comment-14313167
 ] 

Karthik Kambatla commented on MAPREDUCE-6223:
-

Patch looks mostly good to me. Nit: I would leave the test for negative values, 
but update the asserts to reflect the expected behavior. 

 TestJobConf#testNegativeValueForTaskVmem failures
 -

 Key: MAPREDUCE-6223
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
 Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch, 
 MAPREDUCE-6223.003.patch


 {code}
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec  FAILURE! - in org.apache.hadoop.conf.TestJobConf
 testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf)  Time elapsed: 0.089 sec   FAILURE!
 java.lang.AssertionError: expected:1024 but was:-1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
 {code}



[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313096#comment-14313096
 ] 

Hudson commented on MAPREDUCE-6237:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #100 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/100/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java


[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master

2015-02-09 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated MAPREDUCE-6242:

Status: Open  (was: Patch Available)

[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313139#comment-14313139
 ] 

Hudson commented on MAPREDUCE-6237:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #2031 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2031/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java


[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313146#comment-14313146
 ] 

Hudson commented on MAPREDUCE-6237:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2050 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2050/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of 
reusing conections. Contributed by Kannan Rajah. (ozawa: rev 
241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java


[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml

2015-02-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313148#comment-14313148
 ] 

Karthik Kambatla commented on MAPREDUCE-6234:
-

Thanks for working on this, folks. As the description of the config shows, it
is hard to pick a single value for DEFAULT_MAP_MEMORY_MB; 1024 seemed the most
appropriate value since we fall back to it. I like Gera's proposal of adding a
helper method to get the default value; however, I wonder if that would just
translate to calling {{JobConf#getMemoryRequired}} on the default conf.
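
The kind of helper under discussion might look like the sketch below. It is hypothetical and not JobConf's API: `MemoryDefaults.resolveMemoryMb` is an illustrative name, `java.util.Properties` stands in for a Hadoop `Configuration`, and it assumes that a missing, unparsable, or non-positive value (such as the -1 seen in the test failures) should resolve to the 1024 MB fallback mentioned in the comment. Whether the real code should instead derive the value from the heap settings is exactly what the discussion is about, so treat this as one possible shape.

```java
import java.util.Properties;

// Hypothetical helper; not part of JobConf or MRJobConfig.
class MemoryDefaults {
    // The fallback value mentioned in the comment above.
    static final int FALLBACK_MEMORY_MB = 1024;

    // Returns the configured memory in MB, or the fallback when the key is
    // missing, not an integer, or non-positive (for example -1).
    static int resolveMemoryMb(Properties conf, String key) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return FALLBACK_MEMORY_MB;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : FALLBACK_MEMORY_MB;
        } catch (NumberFormatException e) {
            return FALLBACK_MEMORY_MB;
        }
    }
}
```

Centralizing the fallback in one method would let tests such as TestHighRamJob and TestJobConf assert against the same resolved value instead of the raw configured -1.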



 MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
 

 Key: MAPREDUCE-6234
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch


 TestHighRamJob fails because of this.
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec  FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
 testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  Time elapsed: 1.102 sec   FAILURE!
 java.lang.AssertionError: expected:1024 but was:-1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
   at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
 {code}



[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory

2015-02-09 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313414#comment-14313414
 ] 

Jing Zhao commented on MAPREDUCE-6248:
--

Yes, that would actually be even better! I will upload a patch for this later.

 Persist DistCp job id in the staging directory
 --

 Key: MAPREDUCE-6248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently DistCp acts as a tool, and the corresponding MapReduce job is 
 created and used inside its {{execute}} method. This makes it difficult for 
 external services to query the job's progress and counters. It may help to 
 persist the job ID to a file inside its staging directory.
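As a rough illustration of the proposal, the job ID could be written and read back like this. It is a sketch only. Local {{java.nio.file}} paths stand in for the HDFS staging directory. The class name {{JobIdFile}}, the file name {{job.id}} and the sample ID are all hypothetical and are not part of DistCp. Writing to a temporary file and then renaming it is one way to keep a reader from seeing a half-written ID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: local paths stand in for the HDFS staging directory.
class JobIdFile {
    // Assumed marker file name; not defined by DistCp.
    static final String JOB_ID_FILE = "job.id";

    // Writes the ID to a temporary file, then renames it into place so a
    // concurrent reader never sees a partially written ID.
    static Path persist(Path stagingDir, String jobId) throws IOException {
        Files.createDirectories(stagingDir);
        Path tmp = stagingDir.resolve(JOB_ID_FILE + ".tmp");
        Path target = stagingDir.resolve(JOB_ID_FILE);
        Files.write(tmp, jobId.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    // An external service reads the ID back to look up progress and counters.
    static String read(Path stagingDir) throws IOException {
        byte[] bytes = Files.readAllBytes(stagingDir.resolve(JOB_ID_FILE));
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("distcp-staging");
        persist(staging, "job_0000000000000_0001"); // made-up ID
        System.out.println(read(staging)); // prints job_0000000000000_0001
    }
}
```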





[jira] [Commented] (MAPREDUCE-6248) Persist DistCp job id in the staging directory

2015-02-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313411#comment-14313411
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6248:


Why not have a public API in DistCp and use that programmatically instead of 
persisting IDs into files and then reading them?
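For comparison, the public-API alternative could be sketched as follows. All names here are hypothetical. {{SimpleJob}} stands in for a MapReduce {{Job}}, and {{CopyTool}} stands in for DistCp. The sketch only shows that the caller keeps a handle to the submitted job and can poll it directly, with no file to write or parse.

```java
// Hypothetical sketch: SimpleJob stands in for a MapReduce Job and CopyTool
// for DistCp; neither reflects the real classes.
class SimpleJob {
    private final String jobId;
    private float progress;

    SimpleJob(String jobId) { this.jobId = jobId; }

    String getJobId() { return jobId; }

    float getProgress() { return progress; }

    // Simulates the job making progress, capped at 100%.
    void advance(float delta) { progress = Math.min(1.0f, progress + delta); }
}

class CopyTool {
    private SimpleJob job;

    // Submits the job and returns the handle, so callers can poll it directly
    // instead of reading an ID back from a file.
    SimpleJob submit(String jobId) {
        job = new SimpleJob(jobId);
        return job;
    }

    // For callers that did not keep the handle; null before submit() is called.
    SimpleJob getJob() { return job; }

    public static void main(String[] args) {
        CopyTool tool = new CopyTool();
        SimpleJob handle = tool.submit("job_0000000000000_0002"); // made-up ID
        handle.advance(0.5f);
        System.out.println(handle.getJobId() + " " + handle.getProgress()); // prints job_0000000000000_0002 0.5
    }
}
```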

 Persist DistCp job id in the staging directory
 --

 Key: MAPREDUCE-6248
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6248
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently DistCp acts as a tool, and the corresponding MapReduce job is 
 created and used inside its {{execute}} method. This makes it difficult for 
 external services to query the job's progress and counters. It may help to 
 persist the job ID to a file inside its staging directory.





[jira] [Created] (MAPREDUCE-6249) Streaming task will not untar tgz uploaded with -archives

2015-02-09 Thread Liu Xiao (JIRA)
Liu Xiao created MAPREDUCE-6249:
---

 Summary: Streaming task will not untar tgz uploaded with -archives
 Key: MAPREDUCE-6249
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 2.5.2
 Environment: hadoop-2.5.2
hadoop-streaming-2.5.2.jar
Reporter: Liu Xiao


When writing a Hadoop streaming task, I used -archives to upload a tgz from the 
local machine to the HDFS task working directory, but it was not untarred as the 
documentation says. I've searched a lot without any luck.

Here is the command used to start the streaming task with hadoop-2.5.2:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh \
-archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input /test/test.txt \
-output /res/ \
-mapper "sh mapper.sh" \
-reducer cat

And mapper.sh:

cat > /dev/null
ls -l test
exit 0

test.tgz contains two files, test.1.txt and test.2.txt, created with:

echo abcd > test.1.txt
echo efgh > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt

The output from the above task:

lrwxrwxrwx 1 hadoop hadoop 71 Feb  8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

but the desired output would look something like this:

-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb  8 23:25 test.2.txt

So why was test.tgz not untarred automatically as the documentation says? Or is 
there another way to get the tgz untarred?
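{{ls -l test}} lists the symbolic link entry itself, not the contents of the link's target, as the output above shows. A small diagnostic along these lines could be run in the task's working directory to check whether the link resolves to an unpacked directory or to the raw .tgz file. It is a sketch that uses only {{java.nio.file}}. The class name {{LinkInspector}} is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Diagnostic sketch: reports what a (possibly symlinked) path resolves to.
class LinkInspector {
    static String describe(Path link) throws IOException {
        // Files.exists follows symlinks, so a dangling link counts as missing.
        if (!Files.exists(link)) {
            return "missing";
        }
        // Resolve the full symlink chain to the real target.
        Path target = link.toRealPath();
        if (Files.isDirectory(target)) {
            List<String> names = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(target)) {
                for (Path entry : entries) {
                    names.add(entry.getFileName().toString());
                }
            }
            Collections.sort(names);
            return "directory: " + String.join(", ", names);
        }
        return "file: " + target.getFileName();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the "test" link name used in the -archives option above.
        Path link = Paths.get(args.length > 0 ? args[0] : "test");
        System.out.println(describe(link));
    }
}
```

A {{directory: ...}} result means the link points at an unpacked directory. A {{file: test.tgz}} result means it points at the archive itself.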





[jira] [Commented] (MAPREDUCE-6237) Multiple mappers with DBInputFormat don't work because of reusing conections

2015-02-09 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312745#comment-14312745
 ] 

Kannan Rajah commented on MAPREDUCE-6237:
-

Created MAPREDUCE-6247 to track connection pooling.
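A connection pool of the kind being tracked there might look like this in its simplest form. It is a hypothetical, single-threaded sketch and is neither JDBC code nor Hadoop code. Each reader borrows its own connection and returns it when done instead of closing it. Connections are reused, but one reader cannot invalidate another reader's connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical, single-threaded pool sketch; C stands in for a connection type.
class SimplePool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private int created;

    SimplePool(Supplier<C> factory) {
        this.factory = factory;
    }

    // Hands out an idle connection if one exists, otherwise opens a new one.
    C borrow() {
        C conn = idle.pollFirst();
        if (conn == null) {
            conn = factory.get();
            created++;
        }
        return conn;
    }

    // Readers return the connection instead of closing it, so one reader
    // finishing cannot invalidate a connection another reader still holds.
    void release(C conn) {
        idle.addFirst(conn);
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        SimplePool<Object> pool = new SimplePool<>(Object::new);
        Object first = pool.borrow();  // reader 1
        Object second = pool.borrow(); // reader 2 gets its own connection
        pool.release(first);
        pool.release(second);
        Object third = pool.borrow();  // reuses the most recently released one
        System.out.println(pool.createdCount()); // prints 2
        System.out.println(third == second);     // prints true
    }
}
```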

 Multiple mappers with DBInputFormat don't work because of reusing conections
 

 Key: MAPREDUCE-6237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.5.0, 2.6.0
Reporter: Kannan Rajah
Assignee: Kannan Rajah
 Fix For: 2.6.1

 Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
 mapreduce-6237.patch


 DBInputFormat.createDBRecordReader reuses JDBC connections across instances 
 of DBRecordReader. This is not a good idea. We should create a separate 
 connection for each instance. If performance is a concern, then we should use 
 connection pooling instead.
 I looked at DBOutputFormat.getRecordWriter. It actually creates a new 
 Connection object for each DBRecordWriter. So can we just change 
 DBInputFormat to create a new Connection every time? The connection reuse 
 code was added as part of the fix for a connection leak bug in MAPREDUCE-1443. 
 Is there any reason for caching the connection?
 We observed this issue in a customer setup that was reading data from MySQL 
 using Pig. According to the customer, the query returns two records, which 
 causes Pig to create two instances of DBRecordReader. These two instances 
 share the same database connection. The first DBRecordReader extracts the 
 first record from MySQL just fine, but then closes the shared connection. 
 When the second DBRecordReader runs, it tries to execute a query to retrieve 
 the second record on the closed shared connection, which fails. If we set 
 mapred.map.tasks to 1, the query succeeds.
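The failure sequence described above can be reproduced with a toy model. This is not Hadoop code. {{FakeConnection}} and {{FakeRecordReader}} are hypothetical stand-ins for a JDBC {{Connection}} and {{DBRecordReader}}. Following the report, each reader closes its connection once it has read its record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for java.sql.Connection.
class FakeConnection {
    private boolean closed;

    String query(String sql) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return "row for " + sql;
    }

    void close() { closed = true; }
}

// Hypothetical stand-in for DBRecordReader: reads one record, then closes.
class FakeRecordReader {
    private final FakeConnection conn;

    FakeRecordReader(FakeConnection conn) { this.conn = conn; }

    String readAndClose(String sql) {
        try {
            return conn.query(sql);
        } finally {
            conn.close(); // closes whatever connection it was given
        }
    }
}

class ConnectionDemo {
    // Runs two readers in sequence; shareConnection=true reproduces the bug.
    static List<String> run(boolean shareConnection) {
        FakeConnection shared = new FakeConnection();
        List<String> results = new ArrayList<>();
        for (int i = 1; i <= 2; i++) {
            FakeConnection c = shareConnection ? shared : new FakeConnection();
            try {
                results.add(new FakeRecordReader(c).readAndClose("record " + i));
            } catch (IllegalStateException e) {
                results.add("failed: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // prints [row for record 1, failed: connection is closed]
        System.out.println(run(false)); // prints [row for record 1, row for record 2]
    }
}
```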





[jira] [Commented] (MAPREDUCE-6234) TestHighRamJob fails due to the change in MAPREDUCE-5785

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313534#comment-14313534
 ] 

Hadoop QA commented on MAPREDUCE-6234:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12697632/MAPREDUCE-6234.003.patch
  against trunk revision 260b5e3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-gridmix.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5179//console

This message is automatically generated.

 TestHighRamJob fails due to the change in MAPREDUCE-5785
 

 Key: MAPREDUCE-6234
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch, 
 MAPREDUCE-6234.003.patch


 TestHighRamJob fails because of this.
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec
 <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
 testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)
 Time elapsed: 1.102 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1024> but was:<-1>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
   at 
 org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
 {code}


