Hi John,

1) The use case is that I have to process multiple entities in parallel. Each entity is associated with its own data set. The processing involves a few Hive queries that do joins and aggregations, followed by some Python code. My plan is to put the Hive queries and the Python invocation in a shell script, and then invoke that shell script on multiple entities in parallel through a streaming MapReduce job.

2) The shell script is invoked in the mappers of a Hadoop streaming job.

Shirish

On Sat, Apr 16, 2016 at 12:10 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Just out of curiosity, what is the use case behind this?
>
> How do you call the shell script?
>
> > On 16 Apr 2016, at 00:24, Shirish Tatikonda <shirish.tatiko...@gmail.com> wrote:
> >
> > Hello,
> >
> > I am trying to run multiple Hive queries in parallel by submitting them through a MapReduce job.
> > More specifically, I have a map-only Hadoop streaming job where each mapper runs a shell script that does two things: 1) parses input lines obtained via streaming; and 2) submits a very simple Hive query (via hive -e ...) with parameters computed in step 1.
> >
> > Now, when I run the streaming job, the mappers seem to be stuck and I don't know what is going on. When I look at the ResourceManager web UI, I don't see any new MR jobs (which the Hive queries should trigger). I am trying to understand this behavior.
> >
> > This may be a bad idea to begin with, and there may be better ways to accomplish the same task. However, I would like to understand the behavior of such an MR job.
> >
> > Any thoughts?
> >
> > Thank you,
> > Shirish
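For anyone trying to reproduce the setup described in the quoted message, here is a minimal sketch of such a mapper. It is written in Python rather than shell because the actual script was never posted, so it is an illustration, not Shirish's code. It assumes each streaming input line is tab-separated `<entity_id>\t<dt>`; the table name `events` and its columns are hypothetical placeholders. Step 1 parses the line, and step 2 runs `hive -e` with the parameters from step 1.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop streaming mapper: one Hive query per input line.

Assumes input lines of the form "<entity_id>\t<dt>". The table `events`
and its columns are placeholders, not taken from the original thread.
"""
import subprocess
import sys


def parse_line(line):
    """Step 1: split a streaming input line into query parameters."""
    entity_id, dt = line.rstrip("\n").split("\t")[:2]
    return entity_id, dt


def build_hive_command(entity_id, dt):
    """Step 2: build a `hive -e` invocation for a very simple query."""
    # Hypothetical query; values are interpolated directly for brevity only.
    query = (
        "SELECT COUNT(*) FROM events "
        f"WHERE entity_id = '{entity_id}' AND dt = '{dt}'"
    )
    return ["hive", "-e", query]


def main():
    for line in sys.stdin:
        if not line.strip():
            continue  # skip blank lines in the input split
        entity_id, dt = parse_line(line)
        cmd = build_hive_command(entity_id, dt)
        # Blocks until the Hive CLI (and any MR job it launches) finishes.
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Emit key<TAB>value so the streaming framework collects the result.
        sys.stdout.write(f"{entity_id}\t{result.returncode}\n")


if __name__ == "__main__":
    main()
```

Because `subprocess.run` waits for the query to complete, each mapper holds its container for as long as its Hive query (and any MapReduce job that query launches) takes to run.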