Hi Jörn,

1) The use case is that I have to process multiple entities in parallel.
Each entity is associated with its own data set. The processing involves a
few hive queries to do joins and aggregations, followed by some code in
Python. My plan is to put the hive queries and the python invocation in a
shell script, and to run that script on multiple entities in parallel
through a streaming mapreduce job.

2) The shell script is invoked in the mappers of a Hadoop streaming job.
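
To make the setup concrete, here is a rough sketch of the per-entity
wrapper, written in Python rather than shell only for illustration. It is
not the actual script: the file name entity_queries.hql, the hivevar name
"entity", and the assumption that the entity id is the first tab-separated
field of each input line are all placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical streaming mapper: one input line per entity."""
import subprocess
import sys


def build_hive_command(entity_id, query_file="entity_queries.hql"):
    # `hive -f FILE` and `--hivevar name=value` are Hive CLI options;
    # the file name and the variable name here are placeholders.
    return ["hive", "--hivevar", f"entity={entity_id}", "-f", query_file]


def process_entities(lines, run=subprocess.run):
    """Run the Hive step once per non-empty line; yield (entity, return code)."""
    for line in lines:
        # Assumption: the entity id is the first tab-separated field.
        entity_id = line.strip().split("\t")[0]
        if not entity_id:
            continue
        # stdin=DEVNULL keeps the child process from reading the
        # mapper's own input stream.
        result = run(build_hive_command(entity_id), stdin=subprocess.DEVNULL)
        # The Python post-processing for this entity would go here.
        yield entity_id, result.returncode


if __name__ == "__main__":
    for entity_id, code in process_entities(sys.stdin):
        print(f"{entity_id}\t{code}")
```

Each mapper would receive a slice of the entity list on stdin and emit one
"entity<TAB>return code" line per entity.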

Shirish


On Sat, Apr 16, 2016 at 12:10 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Just out of curiosity, what is the use case behind this?
>
> How do you call the shell script?
>
> > On 16 Apr 2016, at 00:24, Shirish Tatikonda
> > <shirish.tatiko...@gmail.com> wrote:
> >
> > Hello,
> >
> > I am trying to run multiple hive queries in parallel by submitting them
> > through a map-reduce job.
> > More specifically, I have a map-only hadoop streaming job where each
> > mapper runs a shell script that does two things -- 1) parses input lines
> > obtained via streaming; and 2) submits a very simple hive query (via
> > hive -e ...) with parameters computed from step-1.
> >
> > Now, when I run the streaming job, the mappers seem to be stuck and I
> > don't know what is going on. When I looked on resource manager web UI, I
> > don't see any new MR Jobs (triggered from the hive query). I am trying
> > to understand this behavior.
> >
> > This may be a bad idea to begin with, and there may be better ways to
> > accomplish the same task. However, I would like to understand the
> > behavior of such a MR job.
> >
> > Any thoughts?
> >
> > Thank you,
> > Shirish
> >
>
