Re: oozie in Shell script

2018-06-25 Thread Sowjanya Kakarala
Hi Peter,
Thanks for your reply and the great suggestions.
However, I don't think a coordinator can help in my use case, because I am
doing some extra logic in the shell script.

So my use case is:
I have a shell script that takes a config file and table names from Postgres
and Hive. The script generates sqoop commands based on the dates sent
through the config file, reads from Postgres, and writes to Hive tables (S3
location). Let's say it will run 900 sqoop commands.
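
A minimal sketch of the kind of generator described above, for illustration
only. It assumes GNU date, one import per day, and a date column to filter
on. The JDBC URL, bucket, table name, and the created_at column are all
hypothetical; the real script and its config format are not shown in this
thread.

```shell
#!/usr/bin/env bash
# Sketch: print one sqoop import command per day in [start, end].
# The JDBC URL, S3 path and created_at column are hypothetical placeholders.

gen_sqoop_cmds() {
  local table="$1" start="$2" end="$3"
  local day="$start" next
  # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
  while [[ ! "$day" > "$end" ]]; do
    next=$(date -u -d "$day +1 day" +%F)   # requires GNU date
    printf 'sqoop import --connect %s --table %s --where "%s" --target-dir %s -m 1\n' \
      "jdbc:postgresql://dbhost/mydb" "$table" \
      "created_at >= '$day' AND created_at < '$next'" \
      "s3://my-bucket/hive/$table/dt=$day"
    day="$next"
  done
}
```

For example, `gen_sqoop_cmds orders 2018-06-01 2018-06-03 > commands.txt`
would write three commands, one per line, which can then be handed to
whatever runs them.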

When I run this script on its own it runs fine, but after some time it fails
with a "5 datanodes exist and no nodes excluded" error. So I was thinking
that if I could send each sqoop job to Oozie, it would only continue once
one job has completed and then take the next, so that I won't hit the
datanodes continuously.

What I can't figure out is how to split the script into individual
commands and send each one to an Oozie workflow.
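
One way this could be sketched is a driver loop that submits one workflow
run per command and waits for it to finish before submitting the next. This
is only a sketch under assumptions: it presumes a workflow whose sqoop
action runs <command>${sqoopCommand}</command>, a job.properties file in the
current directory, and that `oozie job -run` prints "job: <id>" and
`oozie job -info` prints a "Status : ..." line. OOZIE_URL, POLL_SECONDS and
the sqoopCommand property name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run sqoop commands one at a time through Oozie.
# OOZIE_URL, job.properties and the sqoopCommand property are assumptions.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"
POLL_SECONDS="${POLL_SECONDS:-30}"

# Submit one workflow run and print its job ID.
submit_job() {
  local cmd="$1"
  # The sqoop action's <command> takes the arguments without the word "sqoop".
  cmd="${cmd#sqoop }"
  oozie job -oozie "$OOZIE_URL" -config job.properties \
    -D "sqoopCommand=$cmd" -run < /dev/null | awk '/^job:/ {print $2}'
}

# Block until the workflow leaves PREP/RUNNING; succeed only on SUCCEEDED.
wait_for_job() {
  local id="$1" status
  while true; do
    status=$(oozie job -oozie "$OOZIE_URL" -info "$id" < /dev/null |
      awk -F': *' '/^Status/ {print $2; exit}')
    case "$status" in
      SUCCEEDED) return 0 ;;
      PREP|RUNNING) sleep "$POLL_SECONDS" ;;
      *) echo "job $id ended with status: $status" >&2; return 1 ;;
    esac
  done
}

# Read commands (one per line) from stdin; stop at the first failure.
run_all() {
  local cmd id
  while IFS= read -r cmd; do
    id=$(submit_job "$cmd")
    [ -n "$id" ] || { echo "submit failed: $cmd" >&2; return 1; }
    wait_for_job "$id" || return 1
  done
}
```

Usage would be something like `run_all < commands.txt`. Because each run is
awaited before the next is submitted, only one sqoop job is active at a
time, which is the throttling effect described above.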

My second (separate) question:
I have also recently been working with a simple sqoop command in an Oozie
action, and it doesn't work no matter how many configuration settings I add.
Checking yarn.log and oozie.log doesn't help either.

In the Oozie logs:

 SERVER[ip-172-31-33-142.us-west-2.compute.internal] Hadoop command-line
option parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.

2018-06-25 21:00:25,233  WARN JobResourceUploader:171 -
SERVER[ip-172-31-33-142.us-west-2.compute.internal] No job jar file set.  User
classes may not be found. See Job or Job#setJar(String).

2018-06-25 21:00:25,403  WARN MRApps:614 -
SERVER[ip-172-31-33-142.us-west-2.compute.internal] cache file
(mapreduce.job.cache.files)
hdfs://ip-172-31-33-142.us-west-2.compute.internal:8020/user/oozie/share/lib/lib_20180618211518/sqoop/apacheds-i18n-2.0.0-M15.jar
conflicts with cache file (mapreduce.job.cache.files)
hdfs://ip-172-31-33-142.us-west-2.compute.internal:8020/user/oozie/share/lib/lib_20180618211518/oozie/apacheds-i18n-2.0.0-M15.jar
This will be an error in Hadoop 2.0

2018-06-25 21:00:25,403  WARN MRApps:614 -
SERVER[ip-172-31-33-142.us-west-2.compute.internal] cache file
(mapreduce.job.cache.files)
hdfs://ip-172-31-33-142.us-west-2.compute.internal:8020/user/oozie/share/lib/lib_20180618211518/sqoop/apacheds-kerberos-codec-2.0.0-M15.jar
conflicts with cache file (mapreduce.job.cache.files)
hdfs://ip-172-31-33-142.us-west-2.compute.internal:8020/user/oozie/share/lib/lib_20180618211518/oozie/apacheds-kerberos-codec-2.0.0-M15.jar
This will be an error in Hadoop 2.0

In the RM logs:
I don't see anything helpful beyond the standard logs.

If any of this makes sense, please shed some light.
Thanks for your help.

On Mon, Jun 25, 2018 at 4:31 PM, Peter Cseh 
wrote:

> Hey,
>
> I don't think I understand the whole picture, but in general:
> - try to use sqoop action for executing sqoop commands
> - for date-related scheduling, use coordinators. They can handle catch-up
> and other stuff for you
> - try to split your shell script into atomic steps and use the action-data
> field to communicate between them
>
> Hope it helps,
> gp
>
> On Thu, Jun 21, 2018 at 9:00 PM Sowjanya Kakarala 
> wrote:
>
> > Hi Guys,
> >
> > I am trying to build a workflow that gets commands from a shell
> > script: the Oozie job should complete one sqoop command and then take
> > the next command from the same shell script.
> >
> > For example:
> > my shell script builds a sqoop command and loops automatically from the
> > start date, running one command after another until the given end date.
> > Each time a sqoop command is generated, I want to call Oozie and run
> > it, repeating until the shell script hits the end date.
> >
> > I saw examples of the other way around, but that is not what I want.
> >
> > Is it possible? Any suggestions will help.
> >
> > Thanks
> > Sowjanya
> >
>
>
> --
> *Peter Cseh *| Software Engineer
> cloudera.com <https://www.cloudera.com>
>
>


Re: oozie in Shell script

2018-06-25 Thread Peter Cseh
Hey,

I don't think I understand the whole picture, but in general:
- try to use sqoop action for executing sqoop commands
- for date-related scheduling, use coordinators. They can handle catch-up
and other stuff for you
- try to split your shell script into atomic steps and use the action-data
field to communicate between them
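
As a rough illustration of the first and third suggestions, the sketch below
writes a two-step workflow.xml: a shell action that prints a key=value pair
on stdout with <capture-output/>, followed by a sqoop action that reads that
value through wf:actionData. The node names, gen_cmd.sh, the runDate
parameter, and the sqoopCommand key are hypothetical, and the schema
versions may need adjusting for a given Oozie release.

```shell
#!/usr/bin/env bash
# Sketch: write a two-step Oozie workflow into the given directory.
# Node names, gen_cmd.sh, runDate and the sqoopCommand key are hypothetical.
write_workflow() {
  local dir="$1"
  # Quoted 'EOF' keeps ${...} literal so Oozie, not the shell, expands it.
  cat > "$dir/workflow.xml" <<'EOF'
<workflow-app name="sqoop-by-date" xmlns="uri:oozie:workflow:0.5">
  <start to="gen-cmd"/>

  <!-- Step 1: shell action that prints "sqoopCommand=import ..." on stdout -->
  <action name="gen-cmd">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>gen_cmd.sh</exec>
      <argument>${runDate}</argument>
      <file>gen_cmd.sh#gen_cmd.sh</file>
      <capture-output/>
    </shell>
    <ok to="run-sqoop"/>
    <error to="fail"/>
  </action>

  <!-- Step 2: sqoop action that runs the command captured in step 1 -->
  <action name="run-sqoop">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>${wf:actionData('gen-cmd')['sqoopCommand']}</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF
}
```

Calling `write_workflow .` produces the file locally; it would still need to
be uploaded to HDFS next to gen_cmd.sh. One caveat, as far as I know: the
<command> element is split on whitespace, so arguments that contain spaces
(a --where clause, for instance) may need separate <arg> elements instead.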

Hope it helps,
gp

On Thu, Jun 21, 2018 at 9:00 PM Sowjanya Kakarala 
wrote:

> Hi Guys,
>
> I am trying to build a workflow that gets commands from a shell
> script: the Oozie job should complete one sqoop command and then take
> the next command from the same shell script.
>
> For example:
> my shell script builds a sqoop command and loops automatically from the
> start date, running one command after another until the given end date.
> Each time a sqoop command is generated, I want to call Oozie and run
> it, repeating until the shell script hits the end date.
>
> I saw examples of the other way around, but that is not what I want.
>
> Is it possible? Any suggestions will help.
>
> Thanks
> Sowjanya
>


-- 
*Peter Cseh *| Software Engineer
cloudera.com <https://www.cloudera.com>



oozie in Shell script

2018-06-21 Thread Sowjanya Kakarala
Hi Guys,

I am trying to build a workflow that gets commands from a shell
script: the Oozie job should complete one sqoop command and then take
the next command from the same shell script.

For example:
my shell script builds a sqoop command and loops automatically from the
start date, running one command after another until the given end date.
Each time a sqoop command is generated, I want to call Oozie and run
it, repeating until the shell script hits the end date.

I saw examples of the other way around, but that is not what I want.

Is it possible? Any suggestions will help.

Thanks
Sowjanya