Re: Creating splits/tasks at the client

2006-09-29 Thread Doug Cutting
Benjamin Reed wrote: Split will write the hosts first, so in the JobTracker, when you get the byte array representing the Split, any fields from the subclass will follow the Split serialized bytes. The JobTracker can skip the Type in the bytes representing the serialized Split and then deseriali

Re: Creating splits/tasks at the client

2006-09-29 Thread Doug Cutting
Owen O'Malley wrote: Of course, once we allow user-defined InputSplits we will be back in exactly the same boat of running user-code on the JobTracker, unless we also ship over the preferred hosts for each InputFormat too. So, to entirely avoid user code in the job tracker we'd need a final

Re: Creating splits/tasks at the client

2006-09-29 Thread Benjamin Reed
No, even with user-defined Splits we don't need to use user code in the JobTracker if we make Split a Writable class that has the hosts array. Split will write the hosts first, so in the JobTracker, when you get the byte array representing the Split, any fields from the subclass will follow the S
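
A rough sketch of the layout described above, in case it helps to see it concretely. It is not Hadoop's actual API: SplitSketch, Split, FileRangeSplit, writeExtra and readHosts are made-up names, and plain java.io streams stand in for Writable. The base class always writes the type name and the hosts first, the subclass fields follow, and the tracker side reads only that prefix without ever loading the user's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch; every name here is illustrative, not Hadoop's real API.
public class SplitSketch {

    /** Base split: serializes the type name and preferred hosts before anything else. */
    static class Split {
        String[] hosts = new String[0];

        // final, so a subclass cannot reorder the prefix the tracker relies on
        final void write(DataOutput out) throws IOException {
            out.writeUTF(getClass().getName()); // type tag, skipped by the tracker
            out.writeInt(hosts.length);
            for (String host : hosts) {
                out.writeUTF(host);
            }
            writeExtra(out); // subclass fields follow the hosts
        }

        /** Hook for subclass fields; the base class has none. */
        void writeExtra(DataOutput out) throws IOException {
        }
    }

    /** A user-defined split (assumed example) that appends its own fields. */
    static class FileRangeSplit extends Split {
        final String path;
        final long start;
        final long length;

        FileRangeSplit(String path, long start, long length, String... hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }

        @Override
        void writeExtra(DataOutput out) throws IOException {
            out.writeUTF(path);
            out.writeLong(start);
            out.writeLong(length);
        }
    }

    /** Client side: turn a split into the byte array shipped to the tracker. */
    static byte[] serialize(Split split) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        split.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    /** Tracker side: read only the hosts prefix; the user class is never instantiated. */
    static String[] readHosts(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readUTF(); // skip the type name
        String[] hosts = new String[in.readInt()];
        for (int i = 0; i < hosts.length; i++) {
            hosts[i] = in.readUTF();
        }
        return hosts; // the remaining bytes belong to the subclass and stay unread
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new FileRangeSplit("/data/part-0", 0L, 1024L, "node1", "node2"));
        System.out.println(String.join(",", readHosts(bytes)));
    }
}
```

In this sketch the tracker would hand the whole byte array, untouched, to whichever task runs the split, and only the task would deserialize the subclass fields.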

Re: Creating splits/tasks at the client

2006-09-29 Thread Owen O'Malley
On Sep 29, 2006, at 12:20 AM, Benjamin Reed wrote: Please correct me if I'm reading the code incorrectly, but it seems like submitJob puts the submitted job on the jobInitQueue which is immediately dequeued by the JobInitThread and then initTasks() will get the file splits and create Tasks

Re: Creating splits/tasks at the client

2006-09-29 Thread Benjamin Reed
Please correct me if I'm reading the code incorrectly, but it seems like submitJob puts the submitted job on the jobInitQueue which is immediately dequeued by the JobInitThread and then initTasks() will get the file splits and create Tasks. Thus, it doesn't seem like there is any difference in me

Re: Creating splits/tasks at the client

2006-09-28 Thread Bryan A. P. Pendleton
I'm largely at fault for the "user code running in the JobTracker" that exists. I support this change - but, I might reformulate it. Why not make this a sort of special Job? It can even be formulated roughly like this: input -> map(Job,FilePath) -> reduce(Job,FileSplits) -> SchedulableJob It mi

Re: Creating splits/tasks at the client

2006-09-28 Thread Doug Cutting
Benjamin Reed wrote: One of the things that bothers me about the JobTracker is that it is running user code when it creates the FileSplits. In the long term this puts the JobTracker JVM at risk due to errors in the user code. JVMs are supposed to be able to do this kind of stuff securely. Sti