[jira] [Resolved] (MAPREDUCE-6992) Race for temp dir in LocalDistributedCacheManager.java

2017-10-26 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger resolved MAPREDUCE-6992.

Resolution: Duplicate

I agree; this is a dupe. Thanks!

> Race for temp dir in LocalDistributedCacheManager.java
> --
>
> Key: MAPREDUCE-6992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6992
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Philip Zeyliger
>
> When localizing distributed cache files in "local" mode, 
> LocalDistributedCacheManager.java chooses a "unique" directory based on a 
> millisecond time stamp. When running with some parallelism, two jobs can 
> choose the same directory and collide.
> The error message looks like 
> {code}
> java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp 
> does not exist
> {code}
> I ran into this in Impala's data loading. There, we run a HiveServer2 which 
> executes queries via MapReduce. If multiple queries are submitted 
> simultaneously to the HS2, they conflict on this directory. Googling turned up 
> that StreamSets ran into something very similar; see 
> https://issues.streamsets.com/browse/SDC-5473.
> I believe the buggy code is (link: 
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94)
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
> new AtomicLong(System.currentTimeMillis());
> {code}
> Notably, a similar code path uses an actual random number generator 
> ({{LocalJobRunner.java}}, 
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912).
> {code}
>   public String getStagingAreaDir() throws IOException {
>     Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT,
>         "/tmp/hadoop/mapred/staging"));
>     UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
>     String user;
>     randid = rand.nextInt(Integer.MAX_VALUE);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6992) Race for temp dir in LocalDistributedCacheManager.java

2017-10-26 Thread Philip Zeyliger (JIRA)
Philip Zeyliger created MAPREDUCE-6992:
--

 Summary: Race for temp dir in LocalDistributedCacheManager.java
 Key: MAPREDUCE-6992
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6992
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Philip Zeyliger


When localizing distributed cache files in "local" mode, 
LocalDistributedCacheManager.java chooses a "unique" directory based on a 
millisecond time stamp. When running with some parallelism, two jobs can choose 
the same directory and collide.

The error message looks like 
{code}
java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp does 
not exist
{code}

I ran into this in Impala's data loading. There, we run a HiveServer2 which 
executes queries via MapReduce. If multiple queries are submitted simultaneously 
to the HS2, they conflict on this directory. Googling turned up that StreamSets 
ran into something very similar; see https://issues.streamsets.com/browse/SDC-5473.

I believe the buggy code is (link: 
https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94)
{code}
// Generating unique numbers for FSDownload.
AtomicLong uniqueNumberGenerator =
new AtomicLong(System.currentTimeMillis());
{code}
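To make the failure mode concrete, here is a minimal sketch (not Hadoop code; class and method names are mine): two local-mode jobs that each seed an {{AtomicLong}} from the same wall-clock millisecond derive identical "unique" numbers, and therefore the same {{_tmp}} directory name.

```java
import java.util.concurrent.atomic.AtomicLong;

class TempDirRace {
    // Mirrors the seeding above: each job builds its own generator from
    // the wall clock, so values are only unique within a single process.
    static long firstUniqueNumber(long startMillis) {
        AtomicLong uniqueNumberGenerator = new AtomicLong(startMillis);
        return uniqueNumberGenerator.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two jobs starting in the same millisecond pick the same suffix,
        // e.g. ".../<n>_tmp", and then race on creating/deleting it.
        long sameMillis = 1508958341828L; // hypothetical shared start time
        String dirA = firstUniqueNumber(sameMillis) + "_tmp";
        String dirB = firstUniqueNumber(sameMillis) + "_tmp";
        System.out.println(dirA.equals(dirB)); // prints "true"
    }
}
```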

Notably, a similar code path uses an actual random number generator 
({{LocalJobRunner.java}}, 
https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912).
{code}
  public String getStagingAreaDir() throws IOException {
    Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT,
        "/tmp/hadoop/mapred/staging"));
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    String user;
    randid = rand.nextInt(Integer.MAX_VALUE);
{code}
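One possible direction, sketched under my own assumptions (this is not a committed fix, and the names are illustrative): seed the generator from random bits rather than the wall clock, so two processes started in the same millisecond still diverge.

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicLong;

class UniqueTempNames {
    // SecureRandom here is illustrative; LocalJobRunner's plain Random
    // would serve as well. The point is per-process entropy, not time.
    private static final AtomicLong GEN =
        new AtomicLong(new SecureRandom().nextLong());

    static String nextTmpDir() {
        // Long.toUnsignedString avoids a leading '-' in directory names.
        return Long.toUnsignedString(GEN.incrementAndGet()) + "_tmp";
    }
}
```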






[jira] [Commented] (MAPREDUCE-5653) DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb

2015-03-12 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358974#comment-14358974
 ] 

Philip Zeyliger commented on MAPREDUCE-5653:


You could make an argument that DistCp, as a YARN application, knows better 
than the defaults about how much memory it uses.  I.e., the bug is that 
DistCp isn't setting both intimately related settings 
({{mapred.job.{map|reduce}.memory.mb}} and {{mapreduce.map.java.opts}}) 
rather than just one.  If the defaults in your cluster use a lot of 
memory, and DistCp uses very little (after all, it's copying a buffer around), 
it's wasteful.
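The pairing of those two settings can be sketched as follows. This is a hypothetical helper, using {{java.util.Properties}} as a stand-in for Hadoop's Configuration; the 0.8 heap ratio is illustrative, not a Hadoop default.

```java
import java.util.Properties;

class TaskMemorySettings {
    // Keep the container limit and the task JVM heap in sync, so that
    // overriding one cannot silently diverge from the other.
    static Properties withTaskMemory(Properties conf, String kind, int containerMb) {
        int heapMb = (int) (containerMb * 0.8); // headroom for non-heap memory
        conf.setProperty("mapreduce." + kind + ".memory.mb",
            Integer.toString(containerMb));
        conf.setProperty("mapreduce." + kind + ".java.opts",
            "-Xmx" + heapMb + "m");
        return conf;
    }
}
```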

 DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb
 

 Key: MAPREDUCE-5653
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5653
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 0.23.9, 2.2.0
Reporter: Mithun Radhakrishnan
Assignee: Ratandeep Ratti
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5653.branch-0.23.patch, 
 MAPREDUCE-5653.branch-2.patch, MAPREDUCE-5653.trunk.2.patch, 
 MAPREDUCE-5653.trunk.patch


 When a DistCp job is run through Oozie (through a Java action that launches 
 DistCp), one sees that mapred.child.java.opts as set from the caller is 
 honoured by DistCp. But, DistCp doesn't seem to honour any overrides for 
 configs mapreduce.[map,reduce].memory.mb.
 Problem has been identified. I'll post a patch shortly.





[jira] [Commented] (MAPREDUCE-5653) DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb

2015-03-12 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359045#comment-14359045
 ] 

Philip Zeyliger commented on MAPREDUCE-5653:


Allen, do you think there's more than just this one Xmx passthrough that's 
affecting DistCp?  There's not much smarts it needs: it's not like it's ever 
doing anything besides copying files.

Hadn't seen MAPREDUCE-5785.  Agree that that's an excellent direction.



 DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb
 

 Key: MAPREDUCE-5653
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5653
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 0.23.9, 2.2.0
Reporter: Mithun Radhakrishnan
Assignee: Ratandeep Ratti
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5653.branch-0.23.patch, 
 MAPREDUCE-5653.branch-2.patch, MAPREDUCE-5653.trunk.2.patch, 
 MAPREDUCE-5653.trunk.patch


 When a DistCp job is run through Oozie (through a Java action that launches 
 DistCp), one sees that mapred.child.java.opts as set from the caller is 
 honoured by DistCp. But, DistCp doesn't seem to honour any overrides for 
 configs mapreduce.[map,reduce].memory.mb.
 Problem has been identified. I'll post a patch shortly.





[jira] [Updated] (MAPREDUCE-5577) Allow querying the JobHistoryServer by job arrival time

2013-10-15 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-5577:
---

Description: 
  The JobHistoryServer REST APIs currently allow querying by job submit time 
and finish time.  However, jobs don't necessarily arrive in order of their 
finish time, meaning that a client who wants to stay on top of all completed 
jobs needs to query large time intervals to make sure they're not missing 
anything.  Exposing functionality to allow querying by the time a job lands at 
the JobHistoryServer would allow clients to set the start of their query 
interval to the time of their last query. 

The arrival time of a job would be defined as the time that it lands in the 
done directory and can be picked up using the last modified date on history 
files.


  was:
The JobHistoryServer REST APIs currently allow querying by job submit time and 
finish time.  However, jobs don't necessarily arrive in order of their finish 
time, meaning that a client who wants to stay on top of all completed jobs 
needs to query large time intervals to make sure they're not missing anything.  
Exposing functionality to allow querying by the time a job lands at the 
JobHistoryServer would allow clients to set the start of their query interval 
to the time of their last query. 

The arrival time of a job would be defined as the time that it lands in the 
done directory and can be picked up using the last modified date on history 
files.



 Allow querying the JobHistoryServer by job arrival time
 ---

 Key: MAPREDUCE-5577
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5577.patch


   The JobHistoryServer REST APIs currently allow querying by job submit time 
 and finish time.  However, jobs don't necessarily arrive in order of their 
 finish time, meaning that a client who wants to stay on top of all completed 
 jobs needs to query large time intervals to make sure they're not missing 
 anything.  Exposing functionality to allow querying by the time a job lands 
 at the JobHistoryServer would allow clients to set the start of their query 
 interval to the time of their last query. 
 The arrival time of a job would be defined as the time that it lands in the 
 done directory and can be picked up using the last modified date on history 
 files.
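The arrival-time query described above can be sketched like so (a hypothetical helper, not the JobHistoryServer's actual code): take the last-modified time of each file in the done directory as its arrival time, and return only files newer than the client's previous query time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArrivedSince {
    // Files whose mtime (i.e. arrival in the done dir) is after sinceMillis.
    static List<Path> arrivedSince(Path doneDir, long sinceMillis) throws IOException {
        try (Stream<Path> files = Files.list(doneDir)) {
            return files.filter(p -> {
                try {
                    return Files.getLastModifiedTime(p).toMillis() > sinceMillis;
                } catch (IOException e) {
                    return false; // file vanished between list and stat
                }
            }).collect(Collectors.toList());
        }
    }
}
```

A client would then set the start of its next query interval to the time of this query, never missing a late-arriving job.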





[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2012-11-08 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493443#comment-13493443
 ] 

Philip Zeyliger commented on MAPREDUCE-4469:


If you're looking for the resource usage of a process and its children, look at 
{{man getrusage}}, which includes a flag to get the CPU usage of the children.  
Mind you, you'd need native code to get at it.

 Resource calculation in child tasks is CPU-heavy
 

 Key: MAPREDUCE-4469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 1.0.3
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
 MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch


 In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
 each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
 that it's spending a lot of time looping through all the files in /proc to 
 calculate resource usage.
 As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
 within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
 runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4610) Support deprecated mapreduce.job.counters.limit property in MR2

2012-08-30 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445110#comment-13445110
 ] 

Philip Zeyliger commented on MAPREDUCE-4610:


+1 LGTM.

 Support deprecated mapreduce.job.counters.limit property in MR2
 ---

 Key: MAPREDUCE-4610
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4610
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.0-alpha
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-4610.patch


 The property mapreduce.job.counters.limit was introduced in MAPREDUCE-1943, 
 but the mechanism was changed in MAPREDUCE-901 where the property name was 
 changed to mapreduce.job.counters.max without supporting the old name. We 
 should deprecate but honour the old name to make it easier for folks to move 
 from Hadoop 1 to Hadoop 2.
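The fallback the issue asks for amounts to a lookup order: prefer the new key, fall through to the deprecated one. Hadoop's real mechanism is {{Configuration.addDeprecation}}; the helper below is only an illustrative sketch of that lookup, using a plain map.

```java
import java.util.Map;

class DeprecatedKeys {
    // Read newKey if set, then the deprecated oldKey, then the default.
    static String get(Map<String, String> conf,
                      String newKey, String oldKey, String dflt) {
        if (conf.containsKey(newKey)) return conf.get(newKey);
        if (conf.containsKey(oldKey)) return conf.get(oldKey);
        return dflt;
    }
}
```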



[jira] [Commented] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-16 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085979#comment-13085979
 ] 

Philip Zeyliger commented on MAPREDUCE-279:
---

I will return on the 24th.  For urgent matters, please contact my
teammates or Amr.

Thanks,

-- Philip


 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0

 Attachments: MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
 MR-279_MR_files_to_move.txt, MR-279_MR_files_to_move.txt, 
 MapReduce_NextGen_Architecture.pdf, capacity-scheduler-dark-theme.png, 
 hadoop_contributors_meet_07_01_2011.pdf, 
 multi-column-stable-sort-default-theme.png, post-move.patch, 
 yarn-state-machine.job.dot, yarn-state-machine.job.png, 
 yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
 yarn-state-machine.task.dot, yarn-state-machine.task.png


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution.
 Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Commented] (MAPREDUCE-2803) Separate client and server configs

2011-08-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082803#comment-13082803
 ] 

Philip Zeyliger commented on MAPREDUCE-2803:


I'm a huge +1 (like +1.1 or +1.2) for separating out client and server configs, 
fwiw.  I've seen countless folks (mostly myself, of course) get confused about 
whether a given config is client-side, jobtracker-side, or task-tracker side.  
Since configs aren't going to be compatible anyway, this is a reasonable time 
to try to separate that.



 Separate client and server configs
 --

 Key: MAPREDUCE-2803
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2803
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Luke Lu
 Fix For: 0.23.0


 yarn-{site,default}.xml contains many knobs non-ops users don't need to know 
 (e.g., server principals and keytab locations etc.). It's confusing to users. 
 Let's separate the server config into separate files 
 yarn-server-{site,default}.xml
 yarn common and client configs would remain in yarn-{site,default}.xml and 
 YarnServerConfig shall read both.





[jira] [Commented] (MAPREDUCE-2463) Job History files are not moving to done folder when job history location is hdfs location

2011-05-02 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027564#comment-13027564
 ] 

Philip Zeyliger commented on MAPREDUCE-2463:


I recall MAPREDUCE-2351 changing how this code path worked.  Don't entirely 
remember the details any more...

 Job History files are not moving to done folder when job history location is 
 hdfs location
 --

 Key: MAPREDUCE-2463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2463
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.23.0
Reporter: Devaraj K
Assignee: Devaraj K

 If mapreduce.jobtracker.jobhistory.location is configured as HDFS location 
 then either during initialization of Job Tracker (while moving old job 
 history files) or after completion of the job, history files are not moving 
 to done and giving following exception.
 {code:xml} 
 2011-04-29 15:27:27,813 ERROR 
 org.apache.hadoop.mapreduce.jobhistory.JobHistory: Unable to move history 
 file to DONE folder.
 java.lang.IllegalArgumentException: Wrong FS: 
 hdfs://10.18.52.146:9000/history/job_201104291518_0001_root, expected: 
 file:///
   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:402)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:58)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:419)
   at 
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:294)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
   at 
 org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1516)
   at 
 org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1492)
   at 
 org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1482)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistory.moveToDoneNow(JobHistory.java:348)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistory.access$200(JobHistory.java:61)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistory$1.run(JobHistory.java:439)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 {code} 
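The stack trace above is the classic "Wrong FS" shape: a path with an {{hdfs://}} scheme handed to the local {{RawLocalFileSystem}}, whose check expects {{file:///}}. In Hadoop terms the usual remedy is to resolve the filesystem from the path itself ({{Path.getFileSystem(conf)}}) rather than using the default filesystem; the sketch below only illustrates the scheme mismatch with plain URIs, not the actual fix.

```java
import java.net.URI;

class SchemeCheck {
    // True when the path either has no scheme (relative/local) or its
    // scheme matches the filesystem it is being handed to.
    static boolean schemesMatch(String pathUri, String fsUri) {
        String pathScheme = URI.create(pathUri).getScheme();
        String fsScheme = URI.create(fsUri).getScheme();
        return pathScheme == null || pathScheme.equals(fsScheme);
    }
}
```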



[jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0

2011-03-19 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008882#comment-13008882
 ] 

Philip Zeyliger commented on MAPREDUCE-279:
---

I'm traveling and will return to the office on Monday, March 28th.

For urgent matters, please contact Aparna Ramani.

Thanks!

-- Philip


 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0

 Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
 MR-279_MR_files_to_move.txt


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 



[jira] Created: (MAPREDUCE-2381) JobTracker instrumentation not consistent about error handling

2011-03-13 Thread Philip Zeyliger (JIRA)
JobTracker instrumentation not consistent about error handling
--

 Key: MAPREDUCE-2381
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2381
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
 Attachments: MAPREDUCE-2381.patch.txt

In the current code, if the class specified by the JobTracker instrumentation 
config property is not there, the JobTracker fails to start with a 
ClassNotFound.  If it's there, but it can't load for whatever reason, the 
JobTracker continues with the default.  Having two different error-handling 
routes is a bit confusing; I propose to move one line so that it's consistent.  
(On the TaskTracker instrumentation side, if any of the multiple 
instrumentations aren't available, the default is used.)

The attached patch merely moves a line inside of the try block that's already 
there. 
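The consistent error handling described above can be sketched as follows. The names are illustrative, not the actual JobTracker code; the point is that class lookup and instantiation sit inside the same try block, so a missing class and a class that fails to load both fall back to the default.

```java
interface Instrumentation {
    default void start() {}
}

class DefaultInstrumentation implements Instrumentation {}

class InstrumentationFactory {
    static Instrumentation create(String className) {
        try {
            // Class.forName is inside the try block, so ClassNotFound
            // takes the same fallback path as any other load failure.
            Class<?> c = Class.forName(className);
            return (Instrumentation) c.getDeclaredConstructor().newInstance();
        } catch (Exception | LinkageError e) {
            return new DefaultInstrumentation();
        }
    }
}
```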



[jira] Updated: (MAPREDUCE-2381) JobTracker instrumentation not consistent about error handling

2011-03-13 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-2381:
---

Attachment: MAPREDUCE-2381.patch.txt

 JobTracker instrumentation not consistent about error handling
 --

 Key: MAPREDUCE-2381
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2381
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
 Attachments: MAPREDUCE-2381.patch.txt


 In the current code, if the class specified by the JobTracker instrumentation 
 config property is not there, the JobTracker fails to start with a 
 ClassNotFound.  If it's there, but it can't load for whatever reason, the 
 JobTracker continues with the default.  Having two different error-handling 
 routes is a bit confusing; I propose to move one line so that it's 
 consistent.  (On the TaskTracker instrumentation side, if any of the multiple 
 instrumentations aren't available, the default is used.)
 The attached patch merely moves a line inside of the try block that's already 
 there. 



[jira] Updated: (MAPREDUCE-2381) JobTracker instrumentation not consistent about error handling

2011-03-13 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-2381:
---

Assignee: Philip Zeyliger
  Status: Patch Available  (was: Open)

 JobTracker instrumentation not consistent about error handling
 --

 Key: MAPREDUCE-2381
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2381
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
 Attachments: MAPREDUCE-2381.patch.txt


 In the current code, if the class specified by the JobTracker instrumentation 
 config property is not there, the JobTracker fails to start with a 
 ClassNotFound.  If it's there, but it can't load for whatever reason, the 
 JobTracker continues with the default.  Having two different error-handling 
 routes is a bit confusing; I propose to move one line so that it's 
 consistent.  (On the TaskTracker instrumentation side, if any of the multiple 
 instrumentations aren't available, the default is used.)
 The attached patch merely moves a line inside of the try block that's already 
 there. 



[jira] Updated: (MAPREDUCE-2043) TaskTrackerInstrumentation and JobTrackerInstrumentation should be public

2010-09-01 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-2043:
---

Attachment: MAPREDUCE-2043.patch.txt

Chris and Luke,

Thanks for the context.  I think the experimental developers ought to 
embed/extend Hadoop in a different package than org.apache.hadoop.mapred, so 
there's a reasonable argument for 'public', with the interface caveats.  Agree 
wholeheartedly that this interface should be evolving.  It's proven a 
convenient way, actually, to try some things out.

I've taken Luke's suggestion and added the interface annotations.  I've 
tested this with ant compile-core only (ant compile breaks on mumak in 
contrib).  Attached is that new patch.

-- Philip
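For readers unfamiliar with the annotations mentioned above: the sketch below uses minimal stand-ins of my own, since the real ones are Hadoop's {{InterfaceAudience}} and {{InterfaceStability}} in org.apache.hadoop.classification. The idea is a class that is public-facing yet explicitly allowed to change between releases.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-ins, not Hadoop's actual annotation types.
@Retention(RetentionPolicy.RUNTIME)
@interface Audience { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@interface Stability { String value(); }

// Public so extensions can live outside org.apache.hadoop.mapred,
// but marked Evolving: no compatibility promise across releases.
@Audience("Public")
@Stability("Evolving")
class AnnotatedInstrumentationExample {}
```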

 TaskTrackerInstrumentation and JobTrackerInstrumentation should be public
 -

 Key: MAPREDUCE-2043
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2043
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
 Attachments: MAPREDUCE-2043.patch.txt, MAPREDUCE-2043.patch.txt


 Hadoop administrators can specify classes to be loaded as 
 TaskTrackerInstrumentation and JobTrackerInstrumentation implementations, 
 which, roughly, define listeners on TT and JT events.  Unfortunately, since 
 the class has default access, extending it requires setting the extension's 
 package to org.apache.hadoop.mapred, which seems like poor form.
 I propose we make the two instrumentation classes public, so they can be 
 extended wherever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2043) TaskTrackerInstrumentation and JobTrackerInstrumentation should be public

2010-08-30 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-2043:
---

Attachment: MAPREDUCE-2043.patch.txt

Patch attached.  Here are the changed lines, to save people some clicks:

{noformat}
-class JobTrackerInstrumentation {
+public class JobTrackerInstrumentation {
-class TaskTrackerInstrumentation  {
+public class TaskTrackerInstrumentation  {
{noformat}

 TaskTrackerInstrumentation and JobTrackerInstrumentation should be public
 -

 Key: MAPREDUCE-2043
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2043
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
 Attachments: MAPREDUCE-2043.patch.txt


 Hadoop administrators can specify classes to be loaded as 
 TaskTrackerInstrumentation and JobTrackerInstrumentation implementations, 
 which, roughly, define listeners on TT and JT events.  Unfortunately, since 
 the class has default access, extending it requires setting the extension's 
 package to org.apache.hadoop.mapred, which seems like poor form.
 I propose we make the two instrumentation classes public, so they can be 
 extended wherever.




[jira] Created: (MAPREDUCE-2043) TaskTrackerInstrumentation and JobTrackerInstrumentation should be public

2010-08-30 Thread Philip Zeyliger (JIRA)
TaskTrackerInstrumentation and JobTrackerInstrumentation should be public
-

 Key: MAPREDUCE-2043
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2043
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
 Attachments: MAPREDUCE-2043.patch.txt

Hadoop administrators can specify classes to be loaded as 
TaskTrackerInstrumentation and JobTrackerInstrumentation implementations, 
which, roughly, define listeners on TT and JT events.  Unfortunately, since the 
class has default access, extending it requires setting the extension's package 
to org.apache.hadoop.mapred, which seems like poor form.

I propose we make the two instrumentation classes public, so they can be 
extended wherever.




[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897334#action_12897334
 ] 

Philip Zeyliger commented on MAPREDUCE-1881:


I'll chime in that I'm using the instrumentation classes and find them a useful 
way to listen to some events that are otherwise hard to get at.

 Improve TaskTrackerInstrumentation
 --

 Key: MAPREDUCE-1881
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, 
 mapreduce-1881.patch


 The TaskTrackerInstrumentation class provides a useful way to capture key 
 events at the TaskTracker for use in various reporting tools, but it is 
 currently rather limited, because only one TaskTrackerInstrumentation can be 
 added to a given TaskTracker and this objects receives minimal information 
 about tasks (only their IDs). I propose enhancing the functionality through 
 two changes:
 # Support a comma-separated list of TaskTrackerInstrumentation classes rather 
 than just a single one in the JobConf, and report events to all of them.
 # Make the reportTaskLaunch and reportTaskEnd methods in 
 TaskTrackerInstrumentation receive a reference to a whole Task object rather 
 than just its TaskAttemptID. It might also be useful to make the latter 
 receive the task's final state, i.e. failed, killed, or successful.
 I'm just posting this here to get a sense of whether this is a good idea. If 
 people think it's okay, I will make a patch against trunk that implements 
 these changes.
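Proposal #1 above (a comma-separated list of instrumentations, with events reported to all) is essentially a fan-out. A minimal sketch, where {{Listener}} is a hypothetical simplification of TaskTrackerInstrumentation:

```java
import java.util.List;

interface Listener {
    void reportTaskLaunch(String taskAttemptId);
}

class CompositeListener implements Listener {
    private final List<Listener> delegates;

    CompositeListener(List<Listener> delegates) {
        this.delegates = delegates;
    }

    @Override
    public void reportTaskLaunch(String taskAttemptId) {
        // Every configured instrumentation sees every event.
        for (Listener l : delegates) {
            l.reportTaskLaunch(taskAttemptId);
        }
    }
}
```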




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-03 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895129#action_12895129
 ] 

Philip Zeyliger commented on MAPREDUCE-220:
---

Hi Scott,

You could also reset the counters to 0 when the new task is started (sort of 
like a tare button on a scale).  If 
resourceCalculator.getProcCumulativeCpuTime() were instead 
resourceCalculator.getCumulativeCpuTimeDelta() [cumulative CPU time since the 
last call], you could use counter.incr() for the CPU usage.

It's also worth mentioning that the memory usage here is the last-known memory 
usage value.  It's not byte-seconds (which wouldn't be that useful), nor is it 
maximum memory.  That seems useful, but it's a bit unintuitive.

{noformat}
+long cpuTime = resourceCalculator.getProcCumulativeCpuTime();
+long pMem = resourceCalculator.getProcPhysicalMemorySize();
+long vMem = resourceCalculator.getProcVirtualMemorySize();
+counters.findCounter(TaskCounter.CPU_MILLISECONDS).setValue(cpuTime);
+counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).setValue(pMem);
+counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).setValue(vMem);
{noformat}
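The "tare" idea above can be sketched like this (my own illustrative class, with {{LongSupplier}} standing in for the resource calculator's cumulative read): remember the last reading so each poll yields a delta suitable for counter.incr().

```java
import java.util.function.LongSupplier;

class CpuTimeDelta {
    private final LongSupplier cumulativeCpuMillis;
    private long last;

    CpuTimeDelta(LongSupplier cumulativeCpuMillis) {
        this.cumulativeCpuMillis = cumulativeCpuMillis;
        // "Tare" at task start: the first delta excludes any CPU time
        // accumulated before this task (e.g. under JVM re-use).
        this.last = cumulativeCpuMillis.getAsLong();
    }

    long nextDelta() {
        long now = cumulativeCpuMillis.getAsLong();
        long delta = now - last;
        last = now;
        return delta;
    }
}
```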

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-v1.txt, 
 MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-02 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894821#action_12894821
 ] 

Philip Zeyliger commented on MAPREDUCE-220:
---

Scott,

Quick question: have you tried this patch with JVM re-use enabled?  On my 
quick-reading, this patch doesn't handle that case; I don't know if it's a real 
problem or not.

Cheers,

-- Philip




[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-01-28 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806042#action_12806042
 ] 

Philip Zeyliger commented on MAPREDUCE-1126:


@Scott: the annotations for Input/OutputFormat seem to be misplaced.  It seems 
desirable to be able to write a single Map function that does wordcount on 
Strings, regardless of whether those strings are stored in newline-delimited 
text, sequence files, avro data files, or whatever.  

@Chris: 1) "throwing away all Java type hierarchies".  Only sometimes, no?  
This is only in the case where you explicitly want to do unions (and Java's 
union support is either Object, type hierarchies, or wrappers).  In the typical 
case, your map function operates on SomeSpecificRecordType, outputs 
SomeSpecificMapOutputKey/ValueType, and so forth.  You still get type safety in 
many of the recommended use cases.

 shuffle should use serialization to get comparator
 --

 Key: MAPREDUCE-1126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Reporter: Doug Cutting
Assignee: Aaron Kimball
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
 MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
 MAPREDUCE-1126.patch, MAPREDUCE-1126.patch


 Currently the key comparator is defined as a Java class.  Instead we should 
 use the Serialization API to create key comparators.  This would permit, 
 e.g., Avro-based comparators to be used, permitting efficient sorting of 
 complex data types without having to write a RawComparator in Java.




[jira] Commented: (MAPREDUCE-1368) Vertica adapter doesn't use explicity transactions or report progress

2010-01-11 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798758#action_12798758
 ] 

Philip Zeyliger commented on MAPREDUCE-1368:


Would transactions help you?  A speculative task can show up right after the 
map task decides to commit the transaction, and you're in the same place.

 Vertica adapter doesn't use explicity transactions or report progress
 -

 Key: MAPREDUCE-1368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Omer Trajman
Assignee: Omer Trajman
 Fix For: 0.21.0


 The vertica adapter doesn't use explicit transactions, so speculative tasks 
 can result in duplicate loads.  The JDBC driver supports it so the fix is 
 pretty minor. Also the JDBC driver commits synchronously and the adapter 
 needs to report progress even if it takes longer than the timeout.




[jira] Commented: (MAPREDUCE-1368) Vertica adapter doesn't use explicity transactions or report progress

2010-01-11 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798810#action_12798810
 ] 

Philip Zeyliger commented on MAPREDUCE-1368:


Sorry, I wasn't clear.  I think that even if you had transactions, you could 
still have data inserted twice.  A map task looks like: (1) start map task, (2) 
begin transaction, (3) insert many rows, (4) commit transaction, (5) end map 
task.  If you crash between (4) and (5), MapReduce will schedule another worker.
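To illustrate that window, here is a small sketch (hypothetical classes, not the actual Vertica adapter code) in which an attempt commits its transaction but crashes before step (5), so the framework reruns it and the rows land twice:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the crash window between commit (4) and task end (5) described
// above. The "database" is just a list; runAttempt is hypothetical, not the
// actual Vertica adapter code.
public class CommitWindowDemo {
    static final List<String> table = new ArrayList<>(); // stands in for the DB

    // One attempt: begin txn, insert rows, commit, then possibly crash
    // before the task can report success.
    static boolean runAttempt(List<String> rows, boolean crashAfterCommit) {
        List<String> txn = new ArrayList<>(rows); // (2)-(3) begin + insert
        table.addAll(txn);                        // (4) commit
        if (crashAfterCommit) {
            return false;                         // crash before (5) task end
        }
        return true;                              // (5) task reported success
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b");
        // First attempt commits, then dies before acknowledging.
        boolean ok = runAttempt(rows, true);
        // Framework sees a failed task and schedules another attempt.
        if (!ok) runAttempt(rows, false);
        System.out.println(table); // rows are loaded twice
    }
}
```

Transactions make each attempt atomic, but they cannot make the commit and the task-completion report atomic with each other; that is why duplicate loads remain possible.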





[jira] Commented: (MAPREDUCE-1154) Large-scale, automated test framwork for Map-Reduce

2009-10-25 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769898#action_12769898
 ] 

Philip Zeyliger commented on MAPREDUCE-1154:


I'm vaguely uncomfortable with having a lot of code, even though it's test 
code, woven in via AspectJ.  It seems like it will make it very easy to make 
changes that break the testing code (because the testing code is not visible to 
the regular tools, and is in an unexpected place).  I understand, of course, 
that the build system will check that the weaving can happen, but since these 
tests are inherently large-scale and not run on every Hudson build (or are 
they?), it worries me a bit.

Has anyone done the reverse and unwoven functions from classes?  Seems 
like we could annotate functions with @RemoveInProduction, and then use some 
tool to forcibly remove methods from the resulting .class files.  Still opaque, 
but at least it's clear where the testing code is.
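The @RemoveInProduction idea could look something like this; the annotation and the scan are illustrative (a real stripping tool would rewrite the .class files, e.g. with a bytecode library), not an existing Hadoop facility:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Sketch of the @RemoveInProduction idea: mark test-only methods with an
// annotation, then let a (hypothetical) build step find and strip them from
// production .class files. Only the "find" half is shown, via reflection.
public class RemoveInProductionDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RemoveInProduction { }

    static class TaskTracker {
        void heartbeat() { /* real production logic */ }

        @RemoveInProduction
        void injectFaultForTest() { /* test-only hook */ }
    }

    // What the stripping tool would scan for before rewriting class files.
    static List<String> methodsToStrip(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(RemoveInProduction.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodsToStrip(TaskTracker.class));
    }
}
```

The appeal is that the test hooks live in the same source file as the production code, visible to ordinary tools, while still being removable before deployment.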

If you've already been working with this in AspectJ, I'm curious how the 
experience has been.

-- Philip

 Large-scale, automated test framwork for Map-Reduce
 ---

 Key: MAPREDUCE-1154
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1154
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: test
Reporter: Arun C Murthy
 Fix For: 0.21.0

 Attachments: testing.patch


 HADOOP-6332 proposes a large-scale, automated, junit-based test-framework for 
 Hadoop.
 This jira is meant to track relevant work to Map-Reduce.




[jira] Commented: (MAPREDUCE-989) Allow segregation of DistributedCache for maps and reduces

2009-09-17 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756676#action_12756676
 ] 

Philip Zeyliger commented on MAPREDUCE-989:
---

The use cases definitely make sense. Unpacking archives on setup tasks is often
going to be pointless.

I've been thinking about what a reasonable API for this would be (especially 
after working on MAPREDUCE-476), from the Job submitter's perspective.  One 
thought is:

bq. addCacheFile(URI path, Set<TaskType> tasks, Set<DistributedCacheOptions> 
options);

Where the default for tasks is an ImmutableSet (an EnumSet<TaskType>) containing
MAP and REDUCE. DistributedCacheOptions include
{code}
   ADD_TO_CLASSPATH
   UNARCHIVE
   CREATE_SYMLINK
{code}
The defaults are to not add to classpath, not unarchive, and not create the 
symlink.
(Note that we'd be creating symlinks per-file, instead of globally, which is
currently the only way the option can be set.)

What I like about this is that it replaces 5 methods (addCacheFile,
addCacheArchive, addFileToClassPath, addArchiveToClassPath, createSymlink),
with one method, and doesn't lose much in the way of readability.

You could also use booleans or enums (boolean add_to_classpath, boolean
unarchive, boolean create_symlink), but that is often difficult to read.
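A hedged sketch of what that single-method API might look like; the enums, defaults, and CacheEntry bookkeeping here are illustrative only, not an actual Hadoop API:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed single addCacheFile method. The enums and the
// CacheEntry bookkeeping are illustrative, not Hadoop's actual API.
public class CacheApiSketch {
    enum TaskType { MAP, REDUCE }
    enum DistributedCacheOptions { ADD_TO_CLASSPATH, UNARCHIVE, CREATE_SYMLINK }

    static class CacheEntry {
        final URI path;
        final Set<TaskType> tasks;
        final Set<DistributedCacheOptions> options;
        CacheEntry(URI path, Set<TaskType> tasks,
                   Set<DistributedCacheOptions> options) {
            this.path = path; this.tasks = tasks; this.options = options;
        }
    }

    static final List<CacheEntry> entries = new ArrayList<>();

    // One method replaces addCacheFile/addCacheArchive/addFileToClassPath/
    // addArchiveToClassPath/createSymlink.
    static void addCacheFile(URI path, Set<TaskType> tasks,
                             Set<DistributedCacheOptions> options) {
        entries.add(new CacheEntry(path, tasks, options));
    }

    // Convenience overload with the proposed defaults: MAP and REDUCE,
    // no classpath, no unarchiving, no symlink.
    static void addCacheFile(URI path) {
        addCacheFile(path, EnumSet.of(TaskType.MAP, TaskType.REDUCE),
                     EnumSet.noneOf(DistributedCacheOptions.class));
    }

    public static void main(String[] args) {
        addCacheFile(URI.create("hdfs:///libs/util.jar"),
                     EnumSet.of(TaskType.MAP),
                     EnumSet.of(DistributedCacheOptions.ADD_TO_CLASSPATH));
        addCacheFile(URI.create("hdfs:///data/dict.txt"));
        System.out.println(entries.size() + " cache entries registered");
    }
}
```

EnumSet keeps call sites readable (each option is named) while staying as cheap as a bit mask, which is the readability advantage over boolean parameters.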

On the back-end, you'd need to revisit how the files to be cached are stored.
The current scheme of using
{code}
mapred.cache.archives.timestamps
mapred.cache.localFiles
mapred.job.classpath.files
mapred.job.classpath.archives
mapred.cache.archives
mapred.cache.files
mapred.create.symlink
{code}
probably needs to remain for backwards compatibility, but it would
be great to just stick that into one configuration property:
bq. mapred.filecache = [ { path: ..., tasks: [ MAP, REDUCE ], ... }, ... ]
or, if it's legal
{code}
  mapred.filecache.0 = { path: ..., ... }
  mapred.filecache.1 = ...
  ...
{code}

Thoughts?

 Allow segregation of DistributedCache for maps and reduces
 --

 Key: MAPREDUCE-989
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-989
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Arun C Murthy

 Applications might have differing needs for files in the DistributedCache wrt 
 maps and reduces. We should allow them to specify them separately.




[jira] Commented: (MAPREDUCE-980) Modify JobHistory to use Avro for serialization instead of raw JSON

2009-09-17 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756752#action_12756752
 ] 

Philip Zeyliger commented on MAPREDUCE-980:
---

My experience with generated objects (from a couple of years using protocol 
buffers) is that one ends up wrapping them often (preferably with composition). 
 

The generated class is responsible for serialization and deserialization, and 
the wrapper class is responsible for added logic.  It's hard to make the 
generator do something reasonable for logic (or even inheritance) 
cross-language.  Having a wrapper also allows you to have two ways to use 
something, in two different contexts, where you might want different 
surrounding logic.  (So, if you had an Avro schema for an Event, the code that 
generates the Event might use one wrapper, and the code that consumes it might 
use the raw object, or have a different object.)
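The composition pattern described above can be sketched in a few lines; GeneratedEvent stands in for an Avro- or protobuf-generated class, and the wrapper's logic is purely illustrative:

```java
// Sketch of wrapping a generated class via composition, as described above.
// GeneratedEvent stands in for an Avro- or protobuf-generated class; the
// wrapper and its logic are illustrative.
public class WrapperDemo {

    // What a code generator might emit: dumb fields plus (de)serialization.
    static class GeneratedEvent {
        String type;
        long timestampMs;

        String serialize() { // generated, wire-format concern only
            return type + "," + timestampMs;
        }
    }

    // Hand-written wrapper owning the added logic; it composes rather than
    // extends the generated class, so regeneration cannot break it.
    static class Event {
        private final GeneratedEvent record;

        Event(GeneratedEvent record) { this.record = record; }

        boolean isOlderThan(long nowMs, long maxAgeMs) {
            return nowMs - record.timestampMs > maxAgeMs;
        }

        String toWire() { return record.serialize(); }
    }

    public static void main(String[] args) {
        GeneratedEvent raw = new GeneratedEvent();
        raw.type = "JOB_FINISHED";
        raw.timestampMs = 1000;
        Event e = new Event(raw);
        System.out.println(e.isOlderThan(5000, 2000)); // true
        System.out.println(e.toWire());
    }
}
```

Because the wrapper holds the record rather than subclassing it, the producer and the consumer of the same schema are free to use different wrappers, or none at all.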

 Modify JobHistory to use Avro for serialization instead of raw JSON
 ---

 Key: MAPREDUCE-980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-980
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jothi Padmanabhan
Assignee: Doug Cutting
 Fix For: 0.21.0

 Attachments: MAPREDUCE-980.patch, MAPREDUCE-980.patch, 
 MAPREDUCE-980.patch, MAPREDUCE-980.patch


 MAPREDUCE-157 modifies JobHistory to log events using Json Format.  This can 
 be modified to use Avro instead. 




[jira] Commented: (MAPREDUCE-977) Missing jackson jars from Eclipse template

2009-09-16 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756103#action_12756103
 ] 

Philip Zeyliger commented on MAPREDUCE-977:
---

+1.

I can confirm that without this patch the Eclipse build is broken, and with it, 
it's not broken.

 Missing jackson jars from Eclipse template
 --

 Key: MAPREDUCE-977
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-977
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Tom White
 Fix For: 0.21.0

 Attachments: MAPREDUCE-977.patch







[jira] Updated: (MAPREDUCE-990) Making distributed cache getters in JobContext never return null

2009-09-16 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-990:
--

Attachment: MAPREDUCE-990.patch.txt

This changes javadocs and implementations of accessors to never return null.

I've also made getFileTimestamps and getArchiveTimestamps package private.  
Ideally those interfaces aren't leaked to the user at all--the only person who 
accesses them is the mapreduce framework itself, so they should remain an 
implementation detail.
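The never-return-null accessor pattern is simple; this sketch uses illustrative names, not necessarily the ones in the attached patch:

```java
import java.net.URI;

// Sketch of accessors that never return null, per the patch description.
// The field and method names are illustrative, not those of the patch.
public class NullSafeGetters {
    private static final URI[] EMPTY = new URI[0];

    private URI[] cacheFiles; // may legitimately be unset

    // Returns an empty array instead of null when nothing was configured,
    // so callers can always iterate without a null check.
    public URI[] getCacheFiles() {
        return cacheFiles == null ? EMPTY : cacheFiles.clone();
    }

    public static void main(String[] args) {
        NullSafeGetters ctx = new NullSafeGetters();
        // Safe even when nothing was ever set:
        System.out.println(ctx.getCacheFiles().length); // 0
    }
}
```

Callers can then write `for (URI u : ctx.getCacheFiles())` unconditionally, which is the point of the proposed javadoc/implementation change.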

 Making distributed cache getters in JobContext never return null
 

 Key: MAPREDUCE-990
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-990
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: MAPREDUCE-990.patch.txt


 MAPREDUCE-898 moved distributed cache setters and getters into Job and 
 JobContext.  Since the API is new, I'd like to propose that those getters 
 never return null, but instead always return an array, even if it's empty.
 If people don't like this change, I can instead merely update the javadoc to 
 reflect the fact that null may be returned.




[jira] Updated: (MAPREDUCE-990) Making distributed cache getters in JobContext never return null

2009-09-16 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-990:
--

Status: Patch Available  (was: Open)




[jira] Commented: (MAPREDUCE-990) Making distributed cache getters in JobContext never return null

2009-09-16 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756363#action_12756363
 ] 

Philip Zeyliger commented on MAPREDUCE-990:
---

bq. As javadoc says JobContext is a readonly view of the job provided to tasks. 
get*TimeStamps cannot be package private methods. Framework would call them 
from a different package. 

The framework doesn't currently call them at all.  The framework probably has 
more than the JobContext, so they ought to move somewhere else.  I don't think 
they should be part of the user API--can I delete them entirely until they 
actually get used?

-- Philip




[jira] Moved: (MAPREDUCE-987) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger moved HDFS-621 to MAPREDUCE-987:


Component/s: (was: tools)
 (was: test)
 test
 build
Key: MAPREDUCE-987  (was: HDFS-621)
Project: Hadoop Map/Reduce  (was: Hadoop HDFS)

 Exposing MiniDFS and MiniMR clusters as a single process command-line
 -

 Key: MAPREDUCE-987
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-987
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: build, test
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HDFS-621-0.20-patch, HDFS-621.patch


 It's hard to test non-Java programs that rely on significant mapreduce 
 functionality.  The patch I'm proposing shortly will let you just type 
"bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster" to start a 
 cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number 
 of daemons, etc.  A test that checks how some external process interacts with 
 Hadoop might start minicluster as a subprocess, run through its thing, and 
 then simply kill the java subprocess.
 I've been using just such a system for a couple of weeks, and I like it.  
 It's significantly easier than developing a lot of scripts to start a 
 pseudo-distributed cluster, and then clean up after it.  I figure others 
 might find it useful as well.
 I'm at a bit of a loss as to where to put it in 0.21.  hdfs-with-mr tests 
 have all the required libraries, so I've put it there.  I could conceivably 
 split this into minimr and minihdfs, but it's specifically the fact that 
 they're configured to talk to each other that I like about having them 
 together.  And one JVM is better than two for my test programs.




[jira] Updated: (MAPREDUCE-987) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-987:
--

Attachment: MAPREDUCE-987.patch

Nicholas,

Agreed that circular dependencies are to be avoided.  I've moved this issue 
into MAPREDUCE, and spun up a new patch.

Do we anticipate a world where MR doesn't depend statically on HDFS (i.e., it 
only depends on the FileSystem interfaces)?

-- Philip




[jira] Commented: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-09-13 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754821#action_12754821
 ] 

Philip Zeyliger commented on MAPREDUCE-777:
---

I may be crazy to harp on this, but ClientProtocol still reads as very 
generic to me.  Perhaps JobTrackerClientProtocol, to at least indicate one of 
the components involved?

-- Philip

 A method for finding and tracking jobs from the new API
 ---

 Key: MAPREDUCE-777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-777
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client
Reporter: Owen O'Malley
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: m-777.patch, patch-777-1.txt, patch-777-2.txt, 
 patch-777-3.txt, patch-777-4.txt, patch-777-5.txt, patch-777-6.txt, 
 patch-777-7.txt, patch-777.txt


 We need to create a replacement interface for the JobClient API in the new 
 interface. In particular, the user needs to be able to query and track jobs 
 that were launched by other processes.




[jira] Commented: (MAPREDUCE-973) Move FailJob from examples to test

2009-09-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753976#action_12753976
 ] 

Philip Zeyliger commented on MAPREDUCE-973:
---

If you haven't done this already, I'm happy to move it to test.  Should 
SleepJob move too?

 Move FailJob from examples to test
 

 Key: MAPREDUCE-973
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples, test
Affects Versions: 0.21.0
Reporter: Chris Douglas
 Fix For: 0.21.0


 The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
 should either move to src/test, ideally with a unit test built around it, or 
 be removed.




[jira] Commented: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-09-03 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751289#action_12751289
 ] 

Philip Zeyliger commented on MAPREDUCE-777:
---

Took a quick pass at your patch.  Some comments, mostly documentation-related.

bq. +  static Counters downgrade(org.apache.hadoop.mapreduce.Counters counters) 
{

You might have some JavaDoc for this method.  Also, variables would be clearer 
if everything were old_counter and new_counter, since it's hard to keep track 
what's what.

bq. ClientProtocol

Are we settled on the name ClientProtocol?  It's quite generic sounding, and, 
without the package, hard to decipher.  Since these protocols will be the names 
of the public-ish wire APIs, perhaps JobClientProtocol would be more 
descriptive?

bq. +public class CLI extends Configured implements Tool {

Some of Hadoop uses apache.commons.cli to parse command line arguments.  (And 
there's CLI2 too, referred to in Maven, though I don't see any usages of it.)  
You might consider using a command-line parsing library.

You might also consider splitting up the run() method into separate methods 
(even classes) for each piece of functionality.  This will make it much easier 
to test, and easier to parse, too.


bq. +public interface ClientProtocol extends VersionedProtocol {

In the javadoc here documenting the history of this protocol, you might mention 
the rename. 

bq. Changed protocol to use new api

This is not very descriptive for someone unfamiliar with this ticket.


Cheers,

-- Philip




[jira] Commented: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-08-26 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748223#action_12748223
 ] 

Philip Zeyliger commented on MAPREDUCE-777:
---

Overall, +1 on having this interface!  Some thoughts:

 * Can getReasonForBlackList return an enum?
 * Is there a reason why getJobs returns Job[] and not Collection<Job>?  
 * It seems like people may want to push filters down in getJobs.
 * Instead of get(Map,Reduce,SetupAndCleanup)TaskReports, should that just be a 
getTaskReport(TaskType)?  The number of task types is likely to increase.
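The last suggestion amounts to collapsing the per-type methods into one parameterized method, roughly like this hypothetical sketch (all names are illustrative, not Hadoop's real client API):

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch of replacing getMapTaskReports/getReduceTaskReports/... with a
// single method keyed by TaskType, as suggested above. All names are
// illustrative, not Hadoop's real client API.
public class TaskReportSketch {
    enum TaskType { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP }

    static final Map<TaskType, List<String>> reports =
        new EnumMap<>(TaskType.class);
    static {
        reports.put(TaskType.MAP, List.of("map_0 done", "map_1 running"));
        reports.put(TaskType.REDUCE, List.of("reduce_0 running"));
    }

    // One method instead of one per task type; adding a new TaskType needs
    // no new API surface.
    static List<String> getTaskReports(TaskType type) {
        return reports.getOrDefault(type, List.of());
    }

    public static void main(String[] args) {
        System.out.println(getTaskReports(TaskType.MAP).size());       // 2
        System.out.println(getTaskReports(TaskType.JOB_SETUP).size()); // 0
    }
}
```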

-- Philip




[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Status: Patch Available  (was: Open)

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476-v9.patch, MAPREDUCE-476.patch, 
 v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs) when doing local development.




[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v9.patch

Well-spotted, Tom.  I've restored the missing test.




[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747092#action_12747092
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Failing test is 
org.apache.hadoop.mapred.TestRecoveryManager.testRestartCount.  I think 
that's failing all-over, not just here.

-- Philip

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476-v9.patch, MAPREDUCE-476.patch, 
 v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-23 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Status: Open  (was: Patch Available)

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-23 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Status: Patch Available  (was: Open)

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-23 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v8.patch

The test failure was spurious 
(org.apache.hadoop.mapred.TestRecoveryManager.testRestartCount is failing 
elsewhere also, and has nothing to do with this patch).

But the FindBugs errors were reasonable.  New patch fixes the three reported, 
and pasted in below.

{quote}
Bad practice Warnings
CodeWarning
DP  
org.apache.hadoop.mapreduce.filecache.TaskDistributedCacheManager.makeClassLoader(ClassLoader)
 creates a java.net.URLClassLoader classloader, which should be performed 
within a doPrivileged block


Bug type DP_CREATE_CLASSLOADER_INSIDE_DO_PRIVILEGED (click for details)
In class org.apache.hadoop.mapreduce.filecache.TaskDistributedCacheManager
In method 
org.apache.hadoop.mapreduce.filecache.TaskDistributedCacheManager.makeClassLoader(ClassLoader)
In class java.net.URLClassLoader
At TaskDistributedCacheManager.java:[line 235]
RV  org.apache.hadoop.mapred.TaskRunner.setupWorkDir(JobConf, File) ignores 
exceptional return value of java.io.File.mkdir()


Bug type RV_RETURN_VALUE_IGNORED_BAD_PRACTICE (click for details)
In class org.apache.hadoop.mapred.TaskRunner
In method org.apache.hadoop.mapred.TaskRunner.setupWorkDir(JobConf, File)
Called method java.io.File.mkdir()
At TaskRunner.java:[line 630]
Performance Warnings
CodeWarning
UrF Unread field: 
org.apache.hadoop.mapreduce.filecache.TaskDistributedCacheManager$CacheFile.localClassPath
{quote}
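The first two warnings can be addressed along these lines. This is a hedged sketch, not the actual v8 patch: the class name `FindBugsFixSketch` and method bodies are illustrative, only the FindBugs patterns are taken from the report above.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.security.AccessController;
import java.security.PrivilegedAction;

public class FindBugsFixSketch {

  // DP_CREATE_CLASSLOADER_INSIDE_DO_PRIVILEGED: create the classloader
  // inside a doPrivileged block, so the privileged operation is confined
  // to this method rather than inherited from an arbitrary caller.
  static ClassLoader makeClassLoader(final URL[] urls, final ClassLoader parent) {
    return AccessController.doPrivileged(new PrivilegedAction<ClassLoader>() {
      public ClassLoader run() {
        return new URLClassLoader(urls, parent);
      }
    });
  }

  // RV_RETURN_VALUE_IGNORED_BAD_PRACTICE: check File.mkdir()'s boolean
  // return instead of silently ignoring a failed directory creation.
  static void setupWorkDir(File workDir) throws IOException {
    if (!workDir.mkdir() && !workDir.isDirectory()) {
      throw new IOException("Mkdirs failed to create " + workDir);
    }
  }

  public static void main(String[] args) throws IOException {
    File work = new File(System.getProperty("java.io.tmpdir"),
        "work-" + System.nanoTime());
    setupWorkDir(work);
    System.out.println("created: " + work.isDirectory());
    work.delete();
  }
}
```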

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: v6-to-v7.patch

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v7.patch

Fixing the test failures, or so I hope.

The workDir handling in LocalJobRunner was confused, so I've fixed that.  In 
general, the symlinking stuff is a bit of a complex mess: you can symlink 
things when you do foo#bar, and also, if you set the config variable, 
everything gets symlinked into the local working dir.  This is used by 
streaming.  It seems like the latter layer should be done by streaming on its 
own, but, alas, it may be too late to do that.
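The foo#bar convention means the URI fragment names the symlink created in the task's working directory. A minimal illustration of that naming rule (a sketch only; the class `CacheFragmentSketch` is made up and this is not the actual Hadoop code):

```java
import java.net.URI;

public class CacheFragmentSketch {

  // For a cache file URI like hdfs://nn/cache/foo#bar, the part after
  // '#' is the name the task sees in its working directory; without a
  // fragment, the file's own base name is used.
  static String localLinkName(URI cacheUri) {
    String fragment = cacheUri.getFragment();
    if (fragment != null) {
      return fragment;
    }
    String path = cacheUri.getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  public static void main(String[] args) {
    System.out.println(localLinkName(URI.create("hdfs://nn/cache/foo#bar"))); // bar
    System.out.println(localLinkName(URI.create("hdfs://nn/cache/foo")));     // foo
  }
}
```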

Note that I've attached a v6-to-v7.patch file to make it easier to see what the 
latest changes were.

-- Philip

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Status: Open  (was: Patch Available)

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-903) Adding AVRO jar to eclipse classpath

2009-08-21 Thread Philip Zeyliger (JIRA)
Adding AVRO jar to eclipse classpath


 Key: MAPREDUCE-903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-903
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger


Avro is missing from the eclipse classpath, which caused Eclipse to whine.  
Easy fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-903) Adding AVRO jar to eclipse classpath

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-903:
--

Attachment: MAPREDUCE-903.patch

This does not require a test---it's a configuration change for Eclipse.  It's a 
one-line diff.

 Adding AVRO jar to eclipse classpath
 

 Key: MAPREDUCE-903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-903
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
 Attachments: MAPREDUCE-903.patch


 Avro is missing from the eclipse classpath, which caused Eclipse to whine.  
 Easy fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-903) Adding AVRO jar to eclipse classpath

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-903:
--

Status: Patch Available  (was: Open)

 Adding AVRO jar to eclipse classpath
 

 Key: MAPREDUCE-903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-903
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
 Attachments: MAPREDUCE-903.patch


 Avro is missing from the eclipse classpath, which caused Eclipse to whine.  
 Easy fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-905) Add Eclipse launch tasks for MapReduce

2009-08-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-905:
--

Status: Patch Available  (was: Open)

 Add Eclipse launch tasks for MapReduce
 --

 Key: MAPREDUCE-905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-905
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: Eclipse 3.5
Reporter: Philip Zeyliger
Priority: Minor
 Attachments: MAPREDUCE-905.patch


 This is a revival of HADOOP-5911, but only for the MR project.
 Eclipse has a notion of run configuration, which encapsulates what's needed 
 to run or debug an application. I use this quite a bit to start various 
 Hadoop daemons in debug mode, with breakpoints set, to inspect state and what 
 not.
 This is simply configuration, so no tests are provided.  After running ant 
 eclipse-files and refreshing your project, you should see entries in the 
 Run pulldown.  There's a template for testing a specific test, and also 
 templates to run all the tests, the job tracker, and a task tracker.  It's 
 likely that some parameters need to be further tweaked to have the same 
 behavior as ant test, but for most tests, this works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-14 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743257#action_12743257
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Vinod,

Thanks for updating the patch!  Do you have an update to MAPREDUCE-711 that has 
the package move?  I am trying to apply MAPREDUCE-711-20090709-mapreduce.1.txt 
and MAPREDUCE-476-20090814.1.txt to trunk, and I think there's a mismatch in 
the new filecache package name.

-- Philip

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-v2-vs-v3.patch, 
 MAPREDUCE-476-v2-vs-v3.try2.patch, MAPREDUCE-476-v2-vs-v4.txt, 
 MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, MAPREDUCE-476-v3.try2.patch, 
 MAPREDUCE-476-v4-requires-MR711.patch, MAPREDUCE-476-v5-requires-MR711.patch, 
 MAPREDUCE-476.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-08-12 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742714#action_12742714
 ] 

Philip Zeyliger commented on MAPREDUCE-157:
---

Avro would force you into a schema, and I think having a schema is the only 
way to get stability in the format.  Yes, there's probably overhead, but if 
we're using Avro for other things (i.e., all RPCs), we may as well fix those 
overheads when we get to them.  (It may also be a net win to store the data in 
binary avro format, and write an avrocat to deserialize into text before 
pushing to tools like awk, but I do understand the desire for a text format.)

All that said, you have specific needs in mind here, and I'm mostly waxing 
poetical, so I'll certainly defer.

-- Philip

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan

 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-08-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741630#action_12741630
 ] 

Philip Zeyliger commented on MAPREDUCE-157:
---

Would this be a good place to try the Avro serialization format in Hadoop 
proper?  If text-formatting is desired, AVRO-50 has a text format for Avro, 
which is JSON already.  So you'd basically be implementing the same thing, but 
with the extra context of an Avro schema.

-- Philip

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan

 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-08-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741634#action_12741634
 ] 

Philip Zeyliger commented on MAPREDUCE-157:
---

done




 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan

 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-08-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741641#action_12741641
 ] 

Philip Zeyliger commented on MAPREDUCE-157:
---

Ack, ignore that "done".  Was in the wrong browser tab.

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan

 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-04 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739033#action_12739033
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

bq. I agree. A few of them are used to manage the Configuration object. (In 
my mind, we're serializing and de-serializing a set of requirements for the 
distributed cache into the text configuration, and doing so a bit haphazardly.) 
I was very tempted to remove all the ones that are only meant to be internal, 
but Tom advised me that I need to keep them deprecated for a version. Again, I 
think moving those methods into a more private place is a good task to do along 
with changing how JobClient calls into this stuff.
bq. +1. So are you planning to do in the next version or in this patch itself?

Next version.

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-04 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v5-requires-MR711.patch

Latest patch.

Thanks Vinod for the review!

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-31 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737621#action_12737621
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Hi Vinod,

Thanks for the ping; got distracted by other things.  And thanks again for the 
detailed
review.  My responses are below.  I've generated a patch that shows the 
differences
between v2 and v4, and also the patch, in a state where it still depends on 
MAPREDUCE-711. 
is there anything blocking MAPREDUCE-711 that prevents it from being committed?

Also, sorry about the multiple uploads here.  I had a very clever bug in there
(caused by not thinking enough while resolving a merge conflict) that deleted
the current working directory, recursively, in one of the tests.  (TaskRunner
is hard-coded to delete current working directory, which is ok, since it's
typically a child process; not ok for LocalJobRunner.)

I've run the relevant tests; the full tests take a while, so I'm running those
in the background.

{quote}
$ for i in TestMRWithDistributedCache TestMiniMRLocalFS TestMiniMRDFSCaching 
TestTrackerDistributedCacheManager; do ant test -Dtestcase=$i > test-out-$i && 
echo $i good || echo $i bad; done
TestMRWithDistributedCache good
TestMiniMRLocalFS good
TestMiniMRDFSCaching good
TestTrackerDistributedCacheManager good
{quote}

bq. There is quite a bit of refactoring in this patch, though I find it really 
useful.

Yep.  Having DistributedCache work locally is easy if you refactor the code
a bit, so that's how I went at it.


bq. Please make sure that in the newly added code, lines aren't longer than 80 
characters. For e.g, see DistributedCacheManager.newTaskHandle() method.

A handful of git diff foo..bar | egrep '^\+\+\+|^\+.{80}' has done the trick,
I think.  The tricky bit is always fixing only the lines I've changed,
and not all the lines in a given file, to preserve history and keep
reviewing sane.

bq. Just a thought, can the classes be better renamed to reflect their usage, 
something like TrackerDistributedCacheManager and TaskDistributedCacheManager?

I like those names better; thanks.  Changed.

bq. DistributedCacheManager and DistributedCacheHandle: Explicitly state in 
javadoc that it is not a public interface 

Done.

bq. This class should also have the variable number argument getLocalCache() 
methods so that the corresponding methods in DistributedCache can be 
deprecated. Also, each method in DistributedCache should call the correponding 
method in DistributedCacheManager class.

Don't think I agree here.  We can deprecate the getLocalCache methods 
in DistributedCache right away.  They delegate to each other, and one of them
delegates to TrackerDistributedCacheManager.  Ideally, I'd remove these
altogether --- Hadoop internally does not use these methods with
this patch, and there's no sensible reason why someone else would,
but since it's public, it's getting deprecated.  But it's not
being deprecated with a pointer to use something else; it's getting
deprecated so that you don't use it at all.


bq. DistributedCacheHandle CacheFile.makeCacheFiles()
bq. isClassPath can be renamed to shouldBePutInClasspath
Renamed to shouldBeAddedToClasspath.

bq. paths can be renamed to pathsToPutInClasspath.
Renamed to pathsToBeAddedToClasspath

bq. Use .equals method at +150 if (cacheFile.type == CacheFile.FileType.ARCHIVE)

I believe that technically it doesn't matter.
The JDK implementation of equals() on java.lang.Enum is final, 
and hardcoded to this==other.  This is the only thing that makes
sense, since there's only ever one instance of a given Enum.
I took an inaccurate look at the code base, and == is the more
common option.
{quote}
  # Inaccurate!  Not a static analysis!  Not even close! ;)
  [1]doorstop:hadoop-mapreduce(140142)$ ack '\([a-zA-Z]*\.[a-zA-Z]*\.equals' src | wc -l
  11
  [0]doorstop:hadoop-mapreduce(140143)$ ack '\([a-zA-Z]*\.[a-zA-Z]* ==' src | wc -l
  127
{quote}
If you feel strongly about this, happy to change it, but I think == is
more consistent.
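For reference, the two comparisons are interchangeable for enums, since Enum.equals is final and implemented as identity. A small sketch confirming this (FileType here is a stand-in for the actual CacheFile.FileType, not the real class):

```java
public class EnumCompareSketch {
  enum FileType { REGULAR, ARCHIVE }

  public static void main(String[] args) {
    FileType type = FileType.ARCHIVE;
    // Each enum constant is a singleton, so reference equality and
    // equals() always agree.  == is additionally null-safe and fails
    // to compile if the operands are of unrelated enum types.
    System.out.println(type == FileType.ARCHIVE);       // true
    System.out.println(type.equals(FileType.ARCHIVE));  // true
  }
}
```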

bq. makeCacheFiles: boolean isArchive - FileType fileType

done.

bq. I think it would be cleaner to return target instead of passing it as an 
argument.

Done.

bq. makeCacheFiles() method should be documented

Done.

bq. setup() This method is really useful, avoids a lot of code duplication!

Ok.

bq. Leave localTaskFile writing business back in TaskRunner itself. I think it 
is the task's responsibility, not the DistributedCacheHandle's

Good call; done.

bq. cacheSubdir can better be an argument to setup() method instead of passing 
it to the constructor.

Good idea; done.

bq. getClassPaths() : Document that it has to be called and useful only when 
setup() is already invoked.

Done.  I've made it throw an exception if it's called erroneously, since I could
see that causing trouble for developers.

bq. TaskTracker.initialize() A new DistributedCacheManager is created every 

[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-31 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v4-requires-MR711.patch
MAPREDUCE-476-v2-vs-v4.txt

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http; assume hdfs = default fs = local 
 fs when doing local development).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-30 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737385#action_12737385
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Vinod,

Yes.  I've been hacking away at it today.  Please ignore those last two updated 
diffs: while getting rid of some 80+ character lines, I fumbled some git stuff 
and produced bad patches.  I'll be producing good ones after some more sanity 
checking either late today or tomorrow morning.

-- Philip




[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-30 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737391#action_12737391
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Never mind, trying to rush before leaving the office, and the tests fail here.  
Back tomorrow.




[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-17 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12732547#action_12732547
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

Vinod,

Thanks for your comments and thorough review.  I'll take a closer look over the 
next couple of days and post a new patch.




[jira] Commented: (MAPREDUCE-711) Move Distributed Cache from Common to Map/Reduce

2009-07-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729692#action_12729692
 ] 

Philip Zeyliger commented on MAPREDUCE-711:
---

Cool; I'll produce a new patch once you upload a new one here.  Do consider 
changing the package name from  filecache to distributedcache, since two names 
are more confusing than one.

I think people who depended on the one-jar-to-rule-them-all (the pre-split 
world) will assume that they must depend on all three split jars if they 
don't want to worry about what ended up where.  So I'm not sure you're breaking 
code by moving it into another jar any more than the project split already has.

-- Philip

 Move Distributed Cache from Common to Map/Reduce
 

 Key: MAPREDUCE-711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-711
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Vinod K V
 Attachments: MAPREDUCE-711-20090709-common.txt, 
 MAPREDUCE-711-20090709-mapreduce.1.txt, MAPREDUCE-711-20090709-mapreduce.txt, 
 MAPREDUCE-711-20090710.txt


 Distributed Cache logically belongs as part of map/reduce and not Common.




[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-09 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v2.patch

In light of MAPREDUCE-711, I generated a new patch.  I applied 
MAPREDUCE-711-20090709-mapreduce.1.txt first, so this shouldn't be submitted to 
Hudson until after that gets checked in.  I generated the patch by applying

bq. cat HADOOP-2914-v3.patch | sed -e 's%src/core/%src/java/%g' | sed -e 's%src/mapred/%src/java/%g' | sed -e 's%src/test/core%src/test/mapred%g' | patch -p0

I had to clean up DistributedCache.java a tiny bit (there were 2 rejects) 
because some Javadoc links were removed in the project move; I've reinstated 
them.  (I think they were removed because they pointed to MR from Common, but 
that's no longer an issue with MAPREDUCE-711.)





[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-07-01 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476.patch

Regenerated patch after project split.  I used:

bq. cat /Users/philip/Downloads/HADOOP-2914-v3.patch | sed -e 's,src/mapred/,src/java/,' | patch -p0
