Re: How to build oozie 4.2.0 with hadoop-2.7.1

2016-05-27 Thread Peter Cseh
Hi,

Can you provide the output of the build and the command you've run to build
oozie?
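
For reference, a typical build command looks something like this (a sketch;
adjust the flags and versions to your environment):

    mvn clean package assembly:single -DskipTests -Puber -Dhadoop.version=2.7.1
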
Thank you

Peter

On Fri, May 27, 2016 at 9:35 AM, rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Any help??
>
> On Wed, May 25, 2016 at 5:05 PM, rammohan ganapavarapu <
> rammohanga...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to build oozie 2.4.0 with hadoop 2.7.1. The build is successful,
> > but it doesn't have hadoop-2 under hadooplibs to create the war file. Can
> > someone help me?
> >
> > Ram
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie error with pig

2016-06-22 Thread Peter Cseh
1309706-oozie-oozi-C]
> ACTION[0002430-160613201309706-oozie-oozi-C@1]
> [0002430-160613201309706-oozie-oozi-C@1]::CoordActionInputCheck:: Missing
> deps:
> 2016-06-21 19:33:34,619  WARN ParameterVerifier:546 -
> SERVER[hadoop.oss.ads] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[0002430-160613201309706-oozie-oozi-C]
> ACTION[0002430-160613201309706-oozie-oozi-C@1] The application does not
> define formal parameters in its XML definition
> 2016-06-21 19:33:54,696  INFO CoordActionUpdateXCommand:543 -
> SERVER[hadoop.oss.ads] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[0002430-160613201309706-oozie-oozi-C]
> ACTION[0002430-160613201309706-oozie-oozi-C@1] Updating Coordintaor
> action id :0002430-160613201309706-oozie-oozi-C@1 status  to KILLED,
> pending = 0
>




-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: ShareLibService Error

2016-06-15 Thread Peter Cseh
Based on that e-mail thread:

"An update of the library from 2.1 to 2.4 into
oozie_webapp_folder/WEB-INF/lib solved my problem."

Try copying the jar there and restart Oozie afterwards.
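
For example (paths are illustrative; adjust to your install):

    cp commons-io-2.4.jar $OOZIE_HOME/oozie-server/webapps/oozie/WEB-INF/lib/
    $OOZIE_HOME/bin/oozied.sh stop && $OOZIE_HOME/bin/oozied.sh start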

gp



On Wed, Jun 15, 2016 at 10:35 PM, rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Peter,
>
> What is the processes to upgrade common-io-lib? i already have ( manually
> copied) that jar under $OOZIE_HOME/lib dir but still having issues.
>
> Thanks,
> Ram
>
> On Wed, Jun 15, 2016 at 12:10 PM, Peter Cseh <gezap...@cloudera.com>
> wrote:
>
> > You could try updating the common-io library to 2.4, that helped a
> similar
> > issue before:
> >
> >
> http://mail-archives.apache.org/mod_mbox/oozie-user/201507.mbox/%3CCALBGZ8o4n27S8w6fn3HFxfzJmZbA9Gsz71Ewg%2Br6XEFCZTFpPQ%40mail.gmail.com%3E
> >
> > gp
> >
> > On Wed, Jun 15, 2016 at 8:54 PM, rammohan ganapavarapu <
> > rammohanga...@gmail.com> wrote:
> >
> > > I am getting this for update.
> > >
> > > oozie  admin -sharelibupdate  -oozie http://localhost:11000/oozie
> > > Error: HTTP error code: 500 : Internal Server Error
> > >
> > > and log from oozie.log
> > >
> > >
> > > 2016-06-15 18:54:04,974  WARN AuthenticationFilter:509 -
> > SERVER[localhost]
> > > AuthenticationToken ignored:
> > > org.apache.hadoop.security.authentication.util.SignerException: Invalid
> > > signature
> > >
> > > On Wed, Jun 15, 2016 at 5:28 AM, Peter Cseh <gezap...@cloudera.com>
> > wrote:
> > >
> > > > Hi Ram,
> > > >
> > > > Have you told oozie to update the sharelibs with the command:
> > > >
> > > > oozie  admin -sharelibupdate -oozie http://localhost:11000/oozie
> > > >
> > > >
> > > > BRs
> > > >
> > > > gezapeti
> > > >
> > > > On Wed, Jun 15, 2016 at 1:06 AM, rammohan ganapavarapu <
> > > > rammohanga...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am getting this error in oozie-error.log, i have created sharelib
> > > using
> > > > > setup script it got successful but when i do shareliblist i am
> > getting
> > > > > empty output but i can list it from hdfs FS, any help?
> > > > >
> > > > >
> > > > > 1. ./oozie-setup.sh sharelib create -fs hdfs://localhost:8020
> > > > > the destination path for sharelib is:
> > > > > /user/hadoop/share/lib/lib_20160614225830
> > > > >
> > > > >
> > > > >  2. oozie  admin -shareliblist -oozie http://localhost:11000/oozie
> > > > > [Available ShareLib]
> > > > >
> > > > > 3. hadoop fs -ls /uap/oozie/share/lib/lib_20160614164826/
> > > > > Found 10 items
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/distcp
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/hcatalog
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/hive
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/hive2
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/mapreduce-streaming
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/oozie
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/pig
> > > > > -rw-r--r--   3 hadoop hadoop   1361 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/sharelib.properties
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/spark
> > > > > drwxr-xr-x   - hadoop hadoop  0 2016-06-14 16:48
> > > > > /uap/oozie/share/lib/lib_20160614164826/sqoop
> > > > > [hadoop@eqp049wo bin]$
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > org.apache.oozie.

Re: Workflow submission time

2016-06-27 Thread Peter Cseh
Hi Pierre,

There was a bugfix around submitting fork jobs which parallelized job
submission:
https://issues.apache.org/jira/browse/OOZIE-2345

But the issue you've reported is known and not resolved yet:
https://issues.apache.org/jira/browse/OOZIE-1978

I could not find a description of a workaround, but using one sub-workflow
per fork may help, as the validation of the XML is the slow part.
Best regards,
Peter

On Mon, Jun 27, 2016 at 4:22 PM, Pierre Villard <pierre.villard...@gmail.com
> wrote:

> Hi guys,
>
> I am trying to submit workflows with around 50 actions. However depending
> of how the workflow is defined and the number of actions, the time needed
> by Oozie to accept the workflow may change a lot (I am not talking about
> the execution time of actions, I’m really talking about the time needed
> between the moment I launch the command line 'job –run' and the moment I
> get back the prompt and my job ID).
>
> The submission time also seems to exponentially depend of the number of
> forks in the workflow (5 forks : few seconds, 6 forks : 1 minute, 7 forks :
> 10 minutes, 8 forks : one hour).
>
> I was expecting to have workflows with a higher number of actions. Is it a
> known issue? Is there some tuning to perform? are there workarounds? should
> I use sub-workflows?
>
> Thanks for your help,
> Best regards,
> Pierre
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Workflow submission time

2016-06-27 Thread Peter Cseh
Hi Pierre,

Now that you've mentioned it I've found that you can disable fork-join
validation at workflow and application level:
https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.1.5_Fork_and_Join_Control_Nodes

"By default, Oozie performs some validation that any forking in a workflow
is valid and won't lead to any incorrect behavior or instability. However,
if Oozie is preventing a workflow from being submitted and you are very
certain that it should work, you can disable forkjoin validation so that
Oozie will accept the workflow. To disable this validation just for a
specific workflow, simply set *oozie.wf.validate.ForkJoin* to false in the
job.properties file. To disable this validation for all workflows, simply
set *oozie.validate.ForkJoin* to false in the oozie-site.xml file.
Disabling this validation is determined by the AND of both of these
properties, so it will be disabled if either or both are set to false and
only enabled if both are set to true (or not specified)."
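
For the single-workflow variant, that is one line in the job.properties:

    oozie.wf.validate.ForkJoin=false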

You can limit the number of concurrent actions by submitting them into a
dedicated YARN queue and configuring the scheduler accordingly.

BRs,
Peter

On Mon, Jun 27, 2016 at 5:22 PM, Pierre Villard <pierre.villard...@gmail.com
> wrote:

> Hi Peter,
>
> Thanks a lot for your answer, useful references to the JIRAs!
> I'll try to have a look at the code and see if this can be improved.
>
> Out of curiosity, what is the process covered by 'validation of the XML'? I
> am asking because, when doing 'oozie validate' command, it is OK very
> quickly.
>
> Is there a way to "deactivate" this validation part?
>
> In my specific use-case, I could use one single fork/join, the thing is
> that if I take that route, I'd like to be able to limit the number of
> concurrent actions that can run in parallel from the fork. Is it something
> we can do?
>
> Thanks,
> Pierre.
>
>
>
>
>
>
> 2016-06-27 17:01 GMT+02:00 Peter Cseh <gezap...@cloudera.com>:
>
> > Hi Pierre,
> >
> > There was a bugfix around submitting fork jobs which parallelized job
> > submission:
> > https://issues.apache.org/jira/browse/OOZIE-2345
> >
> > But the issue you've reported is known and not resolved yet:
> > https://issues.apache.org/jira/browse/OOZIE-1978
> >
> > I could not find a workaround description, but one sub-workflow per fork
> > may help as the validation of the xml is the slow part.
> > Best regards,
> > Peter
> >
> > On Mon, Jun 27, 2016 at 4:22 PM, Pierre Villard <
> > pierre.villard...@gmail.com
> > > wrote:
> >
> > > Hi guys,
> > >
> > > I am trying to submit workflows with around 50 actions. However
> depending
> > > of how the workflow is defined and the number of actions, the time
> needed
> > > by Oozie to accept the workflow may change a lot (I am not talking
> about
> > > the execution time of actions, I’m really talking about the time needed
> > > between the moment I launch the command line 'job –run' and the moment
> I
> > > get back the prompt and my job ID).
> > >
> > > The submission time also seems to exponentially depend of the number of
> > > forks in the workflow (5 forks : few seconds, 6 forks : 1 minute, 7
> > forks :
> > > 10 minutes, 8 forks : one hour).
> > >
> > > I was expecting to have workflows with a higher number of actions. Is
> it
> > a
> > > known issue? Is there some tuning to perform? are there workarounds?
> > should
> > > I use sub-workflows?
> > >
> > > Thanks for your help,
> > > Best regards,
> > > Pierre
> > >
> >
> >
> >
> > --
> > Peter Cseh
> > Software Engineer
> > <http://www.cloudera.com>
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Extended attributes

2017-01-17 Thread Peter Cseh
I don't know how long it will take to review and submit OOZIE-1770.
Feel free to open an issue to bump the default Hadoop version and add a
patch to it.
I'm sure that will be submitted earlier.
Gp


On Jan 17, 2017 17:07, "Artem Ervits" <artemerv...@gmail.com> wrote:

> do you know if it's a long wait? I'd like to contribute a patch soon.
>
> On Mon, Jan 16, 2017 at 3:00 AM, Peter Cseh <gezap...@cloudera.com> wrote:
>
> > Hi,
> >
> > the patch for OOZIE-1770
> > <https://issues.apache.org/jira/browse/OOZIE-1770> changes
> > the hadoop version to 2.6.0.
> > We're working on getting it to the master branch.
> >
> > gp
> >
> >
> > On Mon, Jan 16, 2017 at 2:37 AM, Artem Ervits <dbis...@gmail.com> wrote:
> >
> > > Hello all, I'm trying to extend FS action functionality to extended
> > > attributes and I realized that this functionality is not exposed in
> > Hadoop
> > > 2.4. All major distros moved to 2.6.1 at the least, what are the plans
> to
> > > migrate these libraries?
> > >
> >
> >
> >
> > --
> > Peter Cseh
> > Software Engineer
> > <http://www.cloudera.com>
> >
>


Re: Extended attributes

2017-01-16 Thread Peter Cseh
Hi,

the patch for OOZIE-1770
<https://issues.apache.org/jira/browse/OOZIE-1770> changes
the hadoop version to 2.6.0.
We're working on getting it to the master branch.

gp


On Mon, Jan 16, 2017 at 2:37 AM, Artem Ervits <dbis...@gmail.com> wrote:

> Hello all, I'm trying to extend FS action functionality to extended
> attributes and I realized that this functionality is not exposed in Hadoop
> 2.4. All major distros moved to 2.6.1 at the least, what are the plans to
> migrate these libraries?
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie sharelib does not exist

2017-02-26 Thread Peter Cseh
Hi,

Which user are you running the Oozie server as?

oozie-setup.sh uploads the sharelib to the folder
/user/${userName}/share/lib and Oozie looks for it there, but if the Oozie
server is run by another user, that can cause issues.
You should probably check your HDFS to locate the Oozie sharelib after
upload.
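
For example (assuming the default layout):

    hdfs dfs -ls /user/<user-running-the-oozie-server>/share/lib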

gp

On Sun, Feb 26, 2017 at 6:50 PM, Pushkar.Gujar <pushkarvgu...@gmail.com>
wrote:

> I am setting up oozie for first time. After all the setup is done, when I
> ran example workflows, they failed. And -errorlog shows following message
>
> org.apache.oozie.action.ActionExecutorException:
> File /user/userName/share/lib does not exist
>
> I did run the below command to create sharelib while doing setup and I can
> see the sharelib with all the jars in hdfs webconsole-
>
> ./oozie-setup.sh sharelib create -fs hdfs://localhost:9000
>
> but when I verify the available sharelib using
>
> oozie admin -shareliblist
>
> it shows only [Availble Sharelib] message with no libs actually listed.
>
> Any idea, what can be the issue?
>
> Thank you,
> Pushkar
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie sharelib does not exist

2017-02-26 Thread Peter Cseh
Can you share the output of the "oozie admin -sharelibupdate" command?
Thanks
gp

On Sun, Feb 26, 2017 at 10:17 PM, Pushkar.Gujar <pushkarvgu...@gmail.com>
wrote:

> I am using same username to run hadoop as well as oozie server. I can see
> all the jar files created under share folder /user/pushkargujar/share/lib/
> lib_
>
>
>
>
>
> Thank you,
> *Pushkar Gujar*
>
>
> On Sun, Feb 26, 2017 at 3:24 PM, Peter Cseh <gezap...@cloudera.com> wrote:
>
>> HI,
>>
>> What's the user you're running the oozie server with?
>>
>> oozie-setup.sh uploads the sharelib to the folder
>> /user/${userName}/share/lib and Oozie looks for it there, but if the oozie
>> server is run by an other user, that can cause issues.
>> You should probably check your hdfs to locate the Oozie sharelib after
>> upload
>>
>> gp
>>
>> On Sun, Feb 26, 2017 at 6:50 PM, Pushkar.Gujar <pushkarvgu...@gmail.com>
>> wrote:
>>
>> > I am setting up oozie for first time. After all the setup is done, when
>> I
>> > ran example workflows, they failed. And -errorlog shows following
>> message
>> >
>> > org.apache.oozie.action.ActionExecutorException:
>> > File /user/userName/share/lib does not exist
>> >
>> > I did run the below command to create sharelib while doing setup and I
>> can
>> > see the sharelib with all the jars in hdfs webconsole-
>> >
>> > ./oozie-setup.sh sharelib create -fs hdfs://localhost:9000
>> >
>> > but when I verify the available sharelib using
>> >
>> > oozie admin -shareliblist
>> >
>> > it shows only [Availble Sharelib] message with no libs actually listed.
>> >
>> > Any idea, what can be the issue?
>> >
>> > ​Thank you,
>> > Pushkar
>> >
>>
>>
>>
>> --
>> Peter Cseh
>> Software Engineer
>> <http://www.cloudera.com>
>>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: oozie execute shell (containing hive or sqoop command)

2016-09-06 Thread Peter Cseh
Hi,
you may use the Sqoop action to do the import:
https://oozie.apache.org/docs/4.2.0/DG_SqoopActionExtension.html
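
A minimal sketch of such an action (connection string and table name are
placeholders):

    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://db.example.com/mydb --table MY_TABLE --target-dir /user/${wf:user()}/MY_TABLE -m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>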

gp

On Tue, Sep 6, 2016 at 1:51 PM, wangwei <963906...@qq.com> wrote:

> Hi,
>  I have a scenario where a lot of tables need to be imported from MySQL
> with Sqoop, so I need to put the Sqoop commands in a shell script to loop
> through all the tables.
>   It still shows the same error.
>
>
>
>
> -- Original Message --
> From: "satish saley" <satishsale...@gmail.com>;
> Sent: Tuesday, September 6, 2016, 7:21 PM
> To: "user" <user@oozie.apache.org>;
>
> Subject: Re: oozie execute shell (containing hive or sqoop command)
>
>
>
> Hi,
> For hive scripts, use hive-action. It would be easier for others to follow
> the pipeline and to debug, since oozie will show the hive job url directly
> in the UI.
>
> https://oozie.apache.org/docs/4.2.0/DG_HiveActionExtension.html
> https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html
>
> On Tue, Sep 6, 2016 at 3:21 AM, wangwei <963906...@qq.com> wrote:
>
> > Hi:
> >  my shell content: hive.sh
> >   #!/bin/bash
> >   hive -e "select count(*) from test;"
> >  my workflow content: workflow.xml
> >
> > The following error occurred:
> >
> > How can I solve this? Please help.
> >
> >
> >
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Reply: Launcher exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

2016-09-11 Thread Peter Cseh
Hi,
I've just created a build from Oozie's master branch and checked: the
hive-exec jar is needed and is included in the hive2 action's sharelib:


> GezapetiMBP:bin gezapeti$ hdfs dfs -ls share/lib/lib_20160911145035/hive2
> Found 48 items
> ...
> -rw-r--r--   3 gezapeti supergroup 115618 2016-09-11 14:50
> share/lib/lib_20160911145035/hive2/hive-beeline-0.13.1.jar
> -rw-r--r--   3 gezapeti supergroup   15141449 2016-09-11 14:50
> share/lib/lib_20160911145035/hive2/hive-exec-0.13.1.jar

...

I don't know what the situation is for other Hive versions.

I've compiled Oozie with:
mvn clean package assembly:single -DskipTests -Puber -Dhadoop.version=2.4.0

Created sharelib using:

./oozie-setup.sh sharelib create -fs hdfs://localhost:9000 -locallib
${OOZIE_SRC_DIR}/sharelib/target/oozie-sharelib-4.3.0-SNAPSHOT/share/

If -locallib is not specified, the oozie-sharelib-version.tar.gz from
Oozie's home dir (defined by oozie.home.dir property) will be used.

Make sure that the sharelib version matches your Oozie and Hive versions.

Hope it helps,

gp




Re: Unable to configure custom mapreduce.job.queuename in Oozie shell action

2016-09-11 Thread Peter Cseh
Hi,

Can you submit jobs into the specified queues outside Oozie?
Is there any reason Pig Action
<https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.3_Pig_Action>
can't
be used instead of Shell Action to submit Pig jobs?
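
If you switch, the queue can go into the Pig action's own configuration, for
example (action and script names are placeholders):

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>mapreduce</value>
                </property>
            </configuration>
            <script>myscript.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>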

Thanks,
Peter



On Sat, Sep 10, 2016 at 7:37 PM, Bijoy Deb <bijoy.comput...@gmail.com>
wrote:

> Hi,
> I have configured 2 new queues 'launcher' and 'mapreduce' apart from
> 'default' in Capacity scheduler in yarn-site.
> Now, I am trying to submit a Pig job via *Oozie Shell action* into those
> queues queues such that the oozie launcher job goes into 'launcher' and pig
> job into 'mapreduce' queue.On submitting the oozie workflow I can see that
> my launcher vjob successfully runs in 'launcher' queue,but the
> Pig/mapreduce job is still running in default queue.
>
> Below is the snippet that I added to my workflow.xml for above:
>
> <configuration>
>     <property>
>         <name>oozie.launcher.mapreduce.job.queuename</name>
>         <value>launcher</value>
>     </property>
>     <property>
>         <name>mapreduce.job.queuename</name>
>         <value>mapreduce</value>
>     </property>
> </configuration>
>
> I also tried using 'mapred.job.queue.name' instead of
> 'mapreduce.job.queuename' above and also by passing it as
> -Dmapreduce.job.queuename=mapreduce while submitting the oozie job via
> Oozie cli.But in every case the Pig/mapreduce job goes into 'default'
> queue.
>
> Am I doing anything wrong here, or is it that the Oozie Shell action doesn't
> support the 'mapreduce.job.queuename' setting?
>
> I am using MR2, Hadoop version 2.7.1, oozie version 4.2.0.
>
> Any help would be really appreciated.
>
> Thanks
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Launcher exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

2016-09-09 Thread Peter Cseh
Hi,

It looks like the org.apache.hadoop.hive.conf.HiveConf class is missing
from the classpath when the action runs.
Can you check the output of "oozie admin -shareliblist hive2" and verify
that the hive-exec.jar is there?
You can find more information about the sharelib and how to install it here
<http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>.

Peter

On Fri, Sep 9, 2016 at 6:35 AM, Huang Meilong <ims...@outlook.com> wrote:

> Hello,
>
>
> I'm using oozie 4.2.0 on a HA cluster, I got a launcher exception when I
> run the oozie example app hive2:
>
>
> 2016-09-09 10:10:44,927  WARN Hive2ActionExecutor:523 -
> SERVER[emr-header-1.cluster-500031470] USER[oozie] GROUP[-] TOKEN[]
> APP[hive2-wf] JOB[004-160908174647725-oozie-oozi-W] ACTION[004-
> 160908174647725-oozie-oozi-W@hive2-node] Launcher ERROR, reason: Main
> class [org.apache.oozie.action.hadoop.Hive2Main], main() threw exception,
> org/apache/hadoop/hive/conf/HiveConf
> 2016-09-09 10:10:44,931  WARN Hive2ActionExecutor:523 -
> SERVER[emr-header-1.cluster-500031470] USER[oozie] GROUP[-] TOKEN[]
> APP[hive2-wf] JOB[004-160908174647725-oozie-oozi-W] ACTION[004-
> 160908174647725-oozie-oozi-W@hive2-node] Launcher exception:
> org/apache/hadoop/hive/conf/HiveConf
> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
> at org.apache.oozie.action.hadoop.Hive2Main.runBeeline(Hive2Main.java:240)
> at org.apache.oozie.action.hadoop.Hive2Main.run(Hive2Main.java:223)
> at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
> at org.apache.oozie.action.hadoop.Hive2Main.main(Hive2Main.java:56)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.oozie.action.hadoop.LauncherMapper.map(
> LauncherMapper.java:236)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.LocalContainerLauncher$
> EventHandler.runSubtask(LocalContainerLauncher.java:380)
> at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(
> LocalContainerLauncher.java:301)
> at org.apache.hadoop.mapred.LocalContainerLauncher$
> EventHandler.access$200(LocalContainerLauncher.java:187)
> at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(
> LocalContainerLauncher.java:230)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.
> HiveConf
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 21 more
>
> job definition:
>
>
> <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
>     <start to="hive2-node"/>
>
>     <action name="hive2-node">
>         <hive2 xmlns="uri:oozie:hive2-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <jdbc-url>${jdbcURL}</jdbc-url>
>             <script>script.q</script>
>             <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
>             <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>
>         </hive2>
>         <ok to="end"/>
>         <error to="fail"/>
>     </action>
>
>     <kill name="fail">
>         <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <end name="end"/>
> </workflow-app>
>
>
> job configuration:
>
>
> <configuration>
>   <property>
>     <name>examplesRoot</name>
>     <value>examples</value>
>   </property>
>   <property>
>     <name>oozie.wf.application.path</name>
>     <value>hdfs://emr-cluster/user/oozie/examples/apps/hive2</value>
>   </property>
>   <property>
>     <name>oozie.use.system.libpath</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>queueName</name>
>     <value>default</value>
>   </property>
>   <property>
>     <name>jdbcURL</name>
>     <value>jdbc:hive2://localhost:1/default</value>
>   </property>
>   <property>
>     <name>user.name</name>
>     <value>oozie</value>
>   </property>
>   <property>
>     <name>jobTracker</name>
>     <value>rm1,rm2</value>
>   </property>
>   <property>
>     <name>mapreduce.job.user.name</name>
>     <value>oozie</value>
>   </property>
>   <property>
>     <name>nameNode</name>
>     <value>hdfs://emr-cluster</value>
>   </property>
> </configuration>
>
>
>
> How can I fix this error, can you give me a hand, thanks in advance!
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie Hive2 Action with Kerberos security and HS2 HTTP transport mode

2016-08-29 Thread Peter Cseh
Have you tried including the principal and the auth path
<https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-UsingKerberoswithaPre-AuthenticatedSubject>
in the jdbc url?
Beeline needs that, so it has to be included in the jdbc-url field of the
action too.
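
Something like this (host, port and principal taken from your configuration
below):

    jdbc:hive2://myhiveserver:10001/;sasl.qop=auth-conf;transportMode=http;httpPath=cliservice;principal=hive/myhiveserver@mydomain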

Gp

On Thu, Aug 25, 2016 at 5:14 PM, Jiri Kaplan <jiri.kap...@software.dell.com>
wrote:

> Hi,
>
>
>
> I’d like to ask for a help with Oozie Hive2 action on HDP-2.3.4.0 cluster
> with Oozie 4.2.0.2.3 installed and with enabled security over Kerberos.
> Oozie job always ends up with following exception: HiveSQLException:
> Delegation token only supported over kerberos authentication. We have
> HiveServer2 configured with hive.server2.transport.mode=http,
> hive.server2.thrift.http.path=cliservice and
> hive.server2.thrift.http.port=10001. I'm not sure if I do something wrong
> or if this configuration is even supported but when we switch back HS2
> transport mode to binary it works. Any kind of help is welcome.
>
>
>
> Exception stack trace (from HS2 log):
>
> 2016-08-25 11:01:23,337 ERROR [HiveServer2-HttpHandler-Pool: Thread-38]:
> thrift.ThriftCLIService (ThriftCLIService.java:GetDelegationToken(237)) -
> Error obtaining delegation token
>
> org.apache.hive.service.cli.HiveSQLException: Delegation token only
> supported over kerberos authentication
>
> at org.apache.hive.service.auth.HiveAuthFactory.
> getDelegationToken(HiveAuthFactory.java:283)
>
> at org.apache.hive.service.cli.session.HiveSessionImplwithUGI.
> getDelegationToken(HiveSessionImplwithUGI.java:192)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:497)
>
> at org.apache.hive.service.cli.session.HiveSessionProxy.
> invoke(HiveSessionProxy.java:78)
>
> at org.apache.hive.service.cli.session.HiveSessionProxy.
> access$000(HiveSessionProxy.java:36)
>
> at org.apache.hive.service.cli.session.HiveSessionProxy$1.
> run(HiveSessionProxy.java:63)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1657)
>
> at org.apache.hive.service.cli.session.HiveSessionProxy.
> invoke(HiveSessionProxy.java:59)
>
> at com.sun.proxy.$Proxy20.getDelegationToken(Unknown Source)
>
> at org.apache.hive.service.cli.CLIService.getDelegationToken(
> CLIService.java:484)
>
> at org.apache.hive.service.cli.thrift.ThriftCLIService.
> GetDelegationToken(ThriftCLIService.java:231)
>
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$
> GetDelegationToken.getResult(TCLIService.java:1573)
>
> at org.apache.hive.service.cli.thrift.TCLIService$Processor$
> GetDelegationToken.getResult(TCLIService.java:1558)
>
> at org.apache.thrift.ProcessFunction.process(
> ProcessFunction.java:39)
>
> at org.apache.thrift.TBaseProcessor.process(
> TBaseProcessor.java:39)
>
> at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
>
> at org.apache.hive.service.cli.thrift.ThriftHttpServlet.
> doPost(ThriftHttpServlet.java:171)
>
>
>
> Here is my workflow.xml content:
>
> <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
>
>   <global>
>     <job-tracker>myrmaddress:8050</job-tracker>
>     <name-node>hdfs://mynnaddress:8020/</name-node>
>   </global>
>
>   <credentials>
>     <credential name="hive2-cred" type="hive2">
>       <property>
>         <name>hive2.jdbc.url</name>
>         <value>jdbc:hive2://myhiveserver:10001/;sasl.qop=auth-conf;transportMode=http;httpPath=cliservice</value>
>       </property>
>       <property>
>         <name>hive2.server.principal</name>
>         <value>hive/myhiveserver@mydomain</value>
>       </property>
>     </credential>
>   </credentials>
>
>   <start to="hive2-node"/>
>
>   <action name="hive2-node" cred="hive2-cred">
>     <hive2 xmlns="uri:oozie:hive2-action:0.1">
>       <jdbc-url>jdbc:hive2://myhiveserver:10001/;sasl.qop=auth-conf;transportMode=http;httpPath=cliservice</jdbc-url>
>       <script>script.hql</script>
>     </hive2>
>     <ok to="end"/>
>     <error to="fail"/>
>   </action>
>
>   <kill name="fail">
>     <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>   </kill>
>
>   <end name="end"/>
>
> </workflow-app>
>
>
>
> *Jiří Kaplan*
> Software Developer
>
> *Dell** | *R Database Management, EMEA
>
> [image: dell_software]
>
>
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Problems with properties in Java action

2016-11-11 Thread Peter Cseh
Hey,

If you use the oozie.launcher prefix on a property, that property will be
applied to the launcher MR job that launches the action.
In your example, if you set mapreduce.job.complete.cancel.delegation.tokens
to false in the action's configuration, then Sqoop will get this property,
but the launcher mapper job won't. If you set
oozie.launcher.mapreduce.job.complete.cancel.delegation.tokens to false
there, then the launcher job gets it.
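
A sketch of the two variants side by side (values illustrative):

    <configuration>
        <!-- applied to the launcher MR job -->
        <property>
            <name>oozie.launcher.mapreduce.job.complete.cancel.delegation.tokens</name>
            <value>false</value>
        </property>
        <!-- passed to the action's own configuration, e.g. Sqoop -->
        <property>
            <name>mapreduce.job.complete.cancel.delegation.tokens</name>
            <value>false</value>
        </property>
    </configuration>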

I hope this helps,
gp

On Mon, Nov 7, 2016 at 11:38 AM, Андрей Ривкин <amriv...@gmail.com> wrote:

> Maybe you could point me to some books, or to where in the documentation I
> can read about this?
>
> Regards,
> Andrey
>
> 2016-11-02 10:02 GMT+03:00 Андрей Ривкин <amriv...@gmail.com>:
>
> > Hello everyone,
> >
> > Could somebody explain me how properties in oozie work?
> >
> > I've got java action on CDH 5.3 which generates a lot Sqoop actions and
> we
> > have problem with delegation tokens (https://issues.apache.org/jir
> > a/browse/YARN-2964). This job works for 20 mins and always fails with
> > delegation token not found in cache when tring to agregate logs. So we
> even
> > can't see logs.
> >
> > I've tried to set "mapreduce.job.complete.cancel.delegation.tokens" to
> > false in java action and in whole workflow, but it didn't set.
> > Then I've tried to set some custom propertie, for examle
> > "some.custom.property=true", but it also didn't set.
> > Then I've tried to change some oozie property, for example
> > "oozie.launcher.mapreduce.map.memory.mb" and it worked.
> >
> > So I'm confused how oozie is working with job properties.
> >
> > I'm checking all properties in JobHistory -> configuration.
> >
> >
> >
> > Regards,
> > Andrey
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Spark-action failed launching MRAppMaster

2016-10-18 Thread Peter Cseh
INFO [IPC Server listener on 45951]
> org.apache.hadoop.ipc.Server: IPC Server listener on 45951: starting
>
> 2016-10-17 23:23:08,276 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.client.MRClientService:
> Instantiated MRClientService at ip-10-0-1-143/10.0.1.143:45951
>
> 2016-10-17 23:23:08,327 INFO [main] org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
>
> 2016-10-17 23:23:08,368 INFO [main] org.apache.hadoop.http.HttpServer:
> Added global filter 'safety' (class=org.apache.hadoop.http.
> HttpServer$QuotingInputFilter)
>
> 2016-10-17 23:23:08,371 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster:
> Error starting MRAppMaster
>
> java.lang.NoSuchMethodError: org.apache.hadoop.yarn.webapp.
> util.WebAppUtils.getProxyHostsAndPortsForAmFilter(Lorg/apache/hadoop/conf/
> Configuration;)Ljava/util/List;
>
> at org.apache.hadoop.yarn.server.webproxy.amfilter.
> AmFilterInitializer.initFilter(AmFilterInitializer.java:40)
>
> at org.apache.hadoop.http.HttpServer.(HttpServer.java:272)
>
> at org.apache.hadoop.yarn.webapp.WebApps$Builder$2.(
> WebApps.java:222)
>
> at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.
> java:219)
>
> at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.
> serviceStart(MRClientService.java:136)
>
> at org.apache.hadoop.service.AbstractService.start(
> AbstractService.java:193)
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
> serviceStart(MRAppMaster.java:1058)
>
> at org.apache.hadoop.service.AbstractService.start(
> AbstractService.java:193)
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(
> MRAppMaster.java:1445)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1491)
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
> initAndStartAppMaster(MRAppMaster.java:1441)
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(
> MRAppMaster.java:1374)
>
> 2016-10-17 23:23:08,374 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster:
> MRAppMaster received a signal. Signaling RMCommunicator and
> JobHistoryEventHandler.
>
> 2016-10-17 23:23:08,374 WARN [Thread-1] 
> org.apache.hadoop.util.ShutdownHookManager:
> ShutdownHook 'MRAppMasterShutdownHook' failed,
> java.lang.NullPointerException
>
> java.lang.NullPointerException
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$
> ContainerAllocatorRouter.setSignalled(MRAppMaster.java:827)
>
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$
> MRAppMasterShutdownHook.run(MRAppMaster.java:1395)
>
> at org.apache.hadoop.util.ShutdownHookManager$1.run(
> ShutdownHookManager.java:54)
>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Getting Permission denied to run shell action

2016-10-21 Thread Peter Cseh
Hi Ram,

Can the yarn/oozie user write to /data/tmp?

Check the permissions and ownership of /data/tmp.
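
For example:

    ls -ld /data /data/tmp /data/tmp/nm-local-dir

Every directory on that path must be traversable (the x bit) by the user
the NodeManager runs as.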
Hope it helps
Gp


On Fri, Oct 21, 2016 at 1:32 AM, rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> I have changed the yarn.nodemanager.local-dirs from default /tmp/ to
> /data/tmp dir and from then i am getting this error
>
>
>
> Cannot run program "test.sh" (in directory
> "/data/tmp/nm-local-dir/usercache/hadoop/appcache/
> application_1476931300239_0003/container_1476931300239_0003_01_02"):
> error=13, Permission denied
>
> Property in yarn-site.xml
>
> ${hadoop.tmp.dir}=/data/tmp/
> 
>  yarn.nodemanager.local-dirs
>  ${hadoop.tmp.dir}/nm-local-dir
> 
>
> Not sure why I am getting this error; I am submitting the job with the same
> user as the directory owner. I have changed ${hadoop.tmp.dir} back to /tmp/,
> but Oozie still says /data/* permission denied. I am not sure what's going
> on here; can someone help me figure it out?
>
> Thanks,
> Ram
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: how to execute HDFS file actions from shell action on Kerberized Cluster

2016-11-25 Thread Peter Cseh
Hi,

The best way is to split up the shell script and use FS action
<https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.4_Fs_HDFS_action>s,
Hive action
<https://oozie.apache.org/docs/4.2.0/DG_HiveActionExtension.html>s and
other specific actions in the workflow.
This way you can define the credentials
<https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html> and let
Oozie handle the authentication for you.
If you want to do it all in a shell script, you will have to make sure the
keytab is accessible on all machines in the cluster and handle
authentication from the shell script by yourself.
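
As a sketch of the first approach, an FS action covers HDFS-side operations
like move, delete, mkdir, chmod and touchz without any extra Kerberos
handling (paths below are illustrative; note that there is no FS-action
equivalent of copying to the local filesystem like "hadoop fs -get"):

    <action name="fs-node">
        <fs>
            <mkdir path="${nameNode}/user/abc/archive"/>
            <move source="${nameNode}/user/abc/d.txt" target="${nameNode}/user/abc/archive/d.txt"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>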

BRs
gp



On Thu, Nov 24, 2016 at 10:13 PM, Aniruddh Sharma <asharma...@gmail.com>
wrote:

> Hello
>
> I know if one has to execute hive action from shell , then one can do
> something like this
>  hive -e "SET mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_
> LOCATION;
> select * from test"
>
>
>  My requirement is to execute hdfs fs actions from shell.
> for example "hadoop fs -get /user/abc/d.txt"
>
> But it fails because of Kerberos. How I can use HADOOP_TOKEN_FILE_LOCATION
> to authenticate for HDFS file actions ?
>
> Thanks and Regards
> Aniruddh
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Does Oozie support run sparkR with spark action?

2016-11-11 Thread Peter Cseh
Hi,

This exception is caused by a missing jar on the classpath.
The needed jars should be added to the classpath of the Oozie action. This
blogpost
<http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
describes several ways to do it.

I've never tried to run a SparkR application from Oozie. I guess it can be
done, but in the current state it needs some manual work:

According to Spark <https://github.com/apache/spark/tree/master/R>, the
SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should
also be set for the job.
$SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and
you could add the SparkR files to the Spark sharelib to make them available
in the action.
It's not guaranteed to work after these steps, but there's a chance. I
would be delighted to hear about the result if you have the time to try to
make this work.

Thanks,
gp


On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
wrote:

> Hi:
> I have an issue with oozie run sparkR, could you please help me?
> I try to run sparkR job through oozie in yarn-client mode. And I have
> installed R package in all my nodes.
>
> job.properties is like:
> nameNode=hdfs://XXX:8020
> jobTracker=XXX:8050
> master=yarn-client
> queueName=default
> oozie.use.system.libpath=true
> oozie.wf.application.path=/user/oozie/measurecountWF
>
> The workflow is like:
> <workflow-app xmlns="uri:oozie:workflow:0.5" name="measurecountWF">
>     <global>
>         <configuration>
>             <property>
>                 <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>                 <value>SPARK_HOME=</value>
>             </property>
>         </configuration>
>     </global>
>     <start to="spark-node"/>
>     <action name="spark-node">
>         <spark xmlns="uri:oozie:spark-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <master>${master}</master>
>             <name>measurecountWF</name>
>             <jar>measurecount.R</jar>
>             <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
>         </spark>
>         <ok to="end"/>
>         <error to="fail"/>
>     </action>
>     <kill name="fail">
>         <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <end name="end"/>
> </workflow-app>
>
> It failed with class not found exception.
>
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> in stage 0.0 (TID 3, ): java.lang.ClassNotFoundException:
> com.cloudant.spark.common.JsonStoreRDDPartition
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.serializer.JavaDeserializationStream$$
> anon$1.resolveClass(JavaSerializer.scala:68)
> at java.io.ObjectInputStream.readNonProxyDesc(
> ObjectInputStream.java:1613)
> at java.io.ObjectInputStream.readClassDesc(
> ObjectInputStream.java:1518)
> at java.io.ObjectInputStream.readOrdinaryObject(
> ObjectInputStream.java:1774)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
> Calls: sql -> callJMethod -> invokeJava
> Execution halted
> Intercepting System.exit(1)
>
> Does oozie support running sparkR in a spark action? Or should we only wrap
> it in an ssh action?
>
> Thanks a lot
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Does Oozie support run sparkR with spark action?

2016-11-29 Thread Peter Cseh
Hi,

I'm glad that you could make Spark R work.
Thank you for sharing the solution with us!

gp


On Tue, Nov 29, 2016 at 2:15 AM, Dongying Jiao <pineapple...@gmail.com>
wrote:

> Hi:
> Spark R can be run in an oozie spark action. I tried to run the simple spark
> R script under the spark examples folder, and it was successful.
> After setting up the R environment in your cluster, you only need to put
> spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder.
> Below is the workflow I use for yarn cluster mode.
> <action name="spark-node">
>     <spark xmlns="uri:oozie:spark-action:0.1">
>         <job-tracker>${jobTracker}</job-tracker>
>         <name-node>${nameNode}</name-node>
>         <master>${master}</master>
>         <name>sparkRtest</name>
>         <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
>         <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
>     </spark>
>     <ok to="end"/>
>     <error to="fail"/>
> </action>
> Thanks
>
>
> 2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:
>
> > Hi Peter:
> > Thank you very much for your reply.
> > I will have a try and tell you the result.
> >
> > 2016-11-12 5:02 GMT+08:00 Peter Cseh <gezap...@cloudera.com>:
> >
> >> Hi,
> >>
> >> This exception is caused by a missing jar on the classpath.
> >> The needed jars  should be added to the classpath in Oozie action. This
> >> blogpost
> >> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharel
> >> ib-in-apache-oozie-cdh-5/>describes
> >> several ways to do it.
> >>
> >> I've never tried to run a SparkR application from Oozie. I guess it can
> be
> >> done, but in the current state it need some manual work:
> >>
> >> According to Spark <https://github.com/apache/spark/tree/master/R>, the
> >> SparkR libraries should be under  $SPARK_HOME/R/lib, and $R_HOME should
> be
> >> also set for the job.
> >> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482,
> and
> >> you could add the SparkR stuff to Spark sharelib to make it available in
> >> the action.
> >> It's not guarantied that it will work after these steps, but there's a
> >> chance. I would be delighted to hear about the result if you have the
> time
> >> to try to make this work.
> >>
> >> Thanks,
> >> gp
> >>
> >>
> >> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
> >> wrote:
> >>
> >> > Hi:
> >> > I have an issue with oozie run sparkR, could you please help me?
> >> > I try to run sparkR job through oozie in yarn-client mode. And I have
> >> > installed R package in all my nodes.
> >> >
> >> > job.properties is like:
> >> > nameNode=hdfs://XXX:8020
> >> > jobTracker=XXX:8050
> >> > master=yarn-client
> >> > queueName=default
> >> > oozie.use.system.libpath=true
> >> > oozie.wf.application.path=/user/oozie/measurecountWF
> >> >
> >> > The workflow is like:
> >> > 
> >> > 
> >> > 
> >> > 
> >> > oozie.launcher.yarn.
> >> app.mapreduce.am.env
> >> > SPARK_HOME=
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > ${jobTracker}
> >> > ${nameNode}
> >> > ${master}
> >> > measurecountWF
> >> > measurecount.R
> >> >  --conf spark.driver.extraJavaOptions=
> >> > 
> >> >  
> >> > 
> >> >   
> >> >   
> >> >   
> >> > Workflow failed, error
> >> > message[${wf:errorMessage(wf:lastErrorNode())}]
> >> > 
> >> >   
> >> >   
> >> > 
> >> >
> >> > It failed with class not found exception.
> >> >
> >> > org.apache.spark.SparkException: Job aborted due to stage failure:
> >> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> >> > in stage 0.0 (TID 3, ): java.lang.ClassNotFoundException:
> >> > com.cloudant.spark.common.JsonStoreRDDPartition
> >> > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> > at java.lang.ClassLoa

Re: Welcome new Oozie Committers - Abhishek Bafna and Satish Saley

2017-01-06 Thread Peter Cseh
Congrats to both of you!
Thanks for the past and future contributions!


On Fri, Jan 6, 2017 at 6:35 PM, Venkat Ranganathan <
vranganat...@hortonworks.com> wrote:

> Congratulations Abhishek and Satish
>
> Venkat
>
> On 1/6/17, 9:25 AM, "Robert Kanter" <rkan...@cloudera.com> wrote:
>
> Hi everyone,
>
> It is my pleasure to announce that Oozie PMC has invited Abhishek
> and Satish to become Oozie committers and they have both accepted our
> invitation.
>
> Please join me congratulating them.
> Congrats!
>
>
> - Robert, on behalf of the Oozie PMC
>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Question about spark-bagel in oozie 4.3.0

2016-12-06 Thread Peter Cseh
Hi,

Is the spark-bagel module there if  4.3.0 is compiled with the spark-2
profile active?
Removing the jar from the sharelib should not cause issues inside Oozie.
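
For example (path pattern illustrative; run a sharelib update afterwards):

    hdfs dfs -rm /user/oozie/share/lib/lib_*/spark/spark-bagel*.jar
    oozie admin -sharelibupdate -oozie http://localhost:11000/oozie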
BRs
gp



On Tue, Dec 6, 2016 at 4:23 AM, Dongying Jiao <pineapple...@gmail.com>
wrote:

> Hi:
> I noticed oozie 4.3.0 added the spark-bagel lib to the spark sharelib,
> compared to oozie 4.2.0.
> I found that this module is deprecated and superseded by GraphX on the
> spark official site; why do we add this deprecated component since GraphX
> is already in the sharelib?
>
> And I want to use spark 2.0 for oozie 4.3.0, as there is no spark-bagel
> module in spark 2.X, is there any risk if I delete this module in oozie
> 4.3.0?
>
> Thanks very much
>
> Best Regards,
> Dongying Jiao
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Welcome new Oozie Committers - Peter Bacsko and Peter Cseh

2017-03-22 Thread Peter Cseh
Thanks everyone! :)


On Tue, Mar 21, 2017 at 11:43 PM, satish saley <satishsale...@gmail.com>
wrote:

>
> Congrats to both.
>
>
>
> On Tuesday, March 21, 2017 3:37 PM, Attila Sasvari <
> asasv...@cloudera.com> wrote:
>
>
>  Congratulations!
>
> On Tue, Mar 21, 2017 at 3:36 PM, Abhishek Bafna <bafna.i...@gmail.com>
> wrote:
>
> > Congrats (Peter)^2.
> >
> > > On Mar 21, 2017, at 6:58 AM, goun na <gou...@gmail.com> wrote:
> > >
> > > Congrats!
> > >
> > > 2017-03-21 7:37 GMT+09:00 Robert Kanter <rkan...@apache.org>:
> > >
> > >> Hi everyone,
> > >>
> > >> It is my pleasure to announce that the Oozie PMC has invited
> > >> Peter Bacsko and Peter (Geza) Cseh to become Oozie committers
> > >> and they have both accepted our invitation.
> > >>
> > >> Please join me congratulating them.
> > >> Congrats!
> > >>
> > >>
> > >> - Robert, on behalf of the Oozie PMC
> > >>
> >
> >
>
>
>
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: workflow is not running

2017-04-20 Thread Peter Cseh
Hi Hitesh,
Please check the yarn logs for the application job_1492491380035_0002 for
the root cause of the failure.
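
For example (the application id is the job id with the "application_"
prefix):

    yarn logs -applicationId application_1492491380035_0002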

Thanks,
gp


On Tue, Apr 18, 2017 at 7:25 AM, Hitesh Goyal <hitesh.go...@nlpcaptcha.com>
wrote:

> Hi Satish,
> Here is the job info
>
> Workflow Name : oozie_test.py
> App Path  : emr
> Status: KILLED
> Run   : 0
> User  : hadoop
> Group : -
> Created   : 2017-04-18 05:08 GMT
> Started   : 2017-04-18 05:08 GMT
> Last Modified : 2017-04-18 05:09 GMT
> Ended : 2017-04-18 05:09 GMT
> CoordAction ID: -
>
> Actions
> ------------------------------------------------------------------------------------------------
> ID                                            Status  Ext ID                  Ext Status     Err Code
> ------------------------------------------------------------------------------------------------
> 000-170418050029006-oozie-oozi-W@:start:      OK      -                       OK             -
> 000-170418050029006-oozie-oozi-W@spark-node   ERROR   job_1492491380035_0002  FAILED/KILLED  JA018
> 000-170418050029006-oozie-oozi-W@hitesh       OK      -                       OK             E0729
> ------------------------------------------------------------------------------------------------
>
>
> -Original Message-
> From: satish saley [mailto:satishsale...@gmail.com]
> Sent: Monday, April 17, 2017 7:12 PM
> To: user@oozie.apache.org
> Subject: Re: workflow is not running
>
> it is showing the following . See image as attached file.
> Hi Hitesh,Could you please attach the image?
>
> On Monday, April 17, 2017 3:10 AM, Hitesh Goyal <
> hitesh.go...@nlpcaptcha.com> wrote:
>
>
>   Hi team, I am new to oozie. I have
> started using it as follows: I have created a job.properties file and a
> workflow.xml file. I am running oozie on an AWS EMR cluster. My python script
> is at ~/emr/lib/my_python.py. I am running it via the command: oozie job
> --oozie http://ip-10-163-125-124.ap-southeast-1.compute.internal:
> 11000/oozie -config ~/emr/job.properties -run. When I check the info of the
> job via: oozie job info -020-170417052643712-oozie-oozi-W, it is
> showing the following. See the image as an attached file. Please help me.
> Thanks & Regards, Hitesh Goyal Cont. No.:- 9996588220
>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Job Submission to Cross Cluster Fails

2017-03-08 Thread Peter Cseh
Hi,

Have you tried running dryruns to see the exact configurations the
workflows and the actions are getting from Hue?
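
For example (the URL is your Oozie endpoint; -dryrun resolves and validates
the job without running it):

    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -dryrun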
Oozie is designed to work against multiple clusters, but I don't know if it
can be done via Hue.

gp

On Mon, Mar 6, 2017 at 8:49 PM, mdk-swandha <dipeshsoftw...@gmail.com>
wrote:

> Hi,
>
> I have the following configuration:
>
> Cluster 1:
> Job Submission through Hue (hue1) to Cluster 1 (NN1 and RM1) via Oozie
> works.
>
> Cluster 2:
> Job Submission through Hue (hue2) to Cluster 2 (NN2 and RM2) via Oozie
> works.
>
> As Oozie is cluster agnostic - now I have tweaked my Hue code to submit job
> to cross cluster i.e. from Hue1 to Cluster2 (NN2 and RM2). In the workspace
> folder I do see job.properties with modified NN and JT (RM)
>
> Job is going to RM2 but it is failing there with the following warnings and
> error
>
> Warning:
>
> org.apache.hadoop.conf.Configuration: job.xml:an attempt to override
> final parameter: fs.defaultFS;  Ignoring.
> org.apache.hadoop.conf.Configuration: job.xml:an attempt to override
> final parameter: fs.defaultFS;  Ignoring.
>
>
> java.io.FileNotFoundException: File does not exist:
> hdfs://cluster1-nn1:8020/user/user1/.staging/job_1488573599716_0001/job.
> splitmetainfo
>
>
> org.apache.hadoop.security.token.SecretManager$InvalidToken):
> appattempt_1488573599716_0001_01 not found in
> AMRMTokenSecretManager.
>
>
> Also it is trying to connect to RM1 as it shows the following message in
> the log
>
>
> INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to
> ResourceManager at RM1/xx.xx.xx.xx:8030
>
>
> I have modified following in Oozie-site.xml
>
>
> oozie.service.ProxyUserService.proxyuser.hue.groups ==> *
>
> oozie.service.ProxyUserService.proxyuser.hue.hosts ==> *
>
>
> I would appreciate if someone can shed some light on how to make this work.
>
>
> Do I need to route job to cluster specific Oozie server?
>
>
> Why does job.xml (sent to the RM via Oozie) have Cluster-1's NN and RM
> information (if that is the case)?
>
>
> Do I require any other configuration setting to enable this cross
> cluster job submission work?
>
>
> Thanks.
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: A problem with an oozie workflow shell action

2017-05-09 Thread Peter Cseh
xternal ID [job_1494042098609_0071]
> 2017-05-09 19:51:06,611  WARN ShellActionExecutor:523 - SERVER[tw-master]
> USER[root] GROUP[-] TOKEN[] APP[SHELL_AGAIN] 
> JOB[127-170506114458023-oozie-oozi-W]
> ACTION[127-170506114458023-oozie-oozi-W@shell-077a] Launcher ERROR,
> reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code
> [1]
> 2017-05-09 19:51:06,711  INFO ActionEndXCommand:520 - SERVER[tw-master]
> USER[root] GROUP[-] TOKEN[] APP[SHELL_AGAIN] 
> JOB[127-170506114458023-oozie-oozi-W]
> ACTION[127-170506114458023-oozie-oozi-W@shell-077a] ERROR is
> considered as FAILED for SLA
> 2017-05-09 19:51:06,853  INFO ActionStartXCommand:520 - SERVER[tw-master]
> USER[root] GROUP[-] TOKEN[] APP[SHELL_AGAIN] 
> JOB[127-170506114458023-oozie-oozi-W]
> ACTION[127-170506114458023-oozie-oozi-W@Kill] Start action
> [127-170506114458023-oozie-oozi-W@Kill] with user-retry state :
> userRetryCount [0], userRetryMax [0], userRetryInterval [10]
> 2017-05-09 19:51:06,864  INFO ActionStartXCommand:520 - SERVER[tw-master]
> USER[root] GROUP[-] TOKEN[] APP[SHELL_AGAIN] 
> JOB[127-170506114458023-oozie-oozi-W]
> ACTION[127-170506114458023-oozie-oozi-W@Kill]
> [***127-170506114458023-oozie-oozi-W@Kill***]Action status=DONE
> 2017-05-09 19:51:06,864  INFO ActionStartXCommand:520 - SERVER[tw-master]
> USER[root] GROUP[-] TOKEN[] APP[SHELL_AGAIN] 
> JOB[127-170506114458023-oozie-oozi-W]
> ACTION[127-170506114458023-oozie-oozi-W@Kill]
> [***127-170506114458023-oozie-oozi-W@Kill***]Action updated in DB!
> 2017-05-09 19:51:06,994  INFO WorkflowNotificationXCommand:520 -
> SERVER[tw-master] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[127-170506114458023-oozie-oozi-W] ACTION[127-
> 170506114458023-oozie-oozi-W@Kill] No Notification URL is defined.
> Therefore nothing to notify for job 127-170506114458023-oozie-
> oozi-W@Kill
> 2017-05-09 19:51:06,994  INFO WorkflowNotificationXCommand:520 -
> SERVER[tw-master] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[127-170506114458023-oozie-oozi-W] ACTION[127-
> 170506114458023-oozie-oozi-W@shell-077a] No Notification URL is defined.
> Therefore nothing to notify for job 127-170506114458023-oozie-
> oozi-W@shell-077a
> 2017-05-09 19:51:06,995  INFO WorkflowNotificationXCommand:520 -
> SERVER[tw-master] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[127-170506114458023-oozie-oozi-W] ACTION[] No Notification URL is
> defined. Therefore nothing to notify for job 127-170506114458023-oozie-
> oozi-W
> 3. I don't know what to do about "ERROR is considered as FAILED
> for SLA". Is there something wrong with my permissions, or is this solution
> not supported?
> Thanks
>
> Best Regards,
>  Hollis
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


OYA! - no more LauncherMappers

2017-05-26 Thread Peter Cseh
Hi everyone,

OOZIE-1770 - Create Oozie Application Master for YARN is committed to
master!
I would like to thank the effort of everybody who was involved in the
design, the development or provided feedback in Jira or on ReviewBoard.

This is a big change in how Oozie works and there is still a lot to do: you
can check out OOZIE-2889 for details.

Thanks again everyone!
gp


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie Job + Version Control Tool

2017-05-30 Thread Peter Cseh
> > <configuration>
> >     <property>
> >         <name>parentWorkflowAppPath</name>
> >         <value>${wf:appPath()}</value>
> >     </property>
> >     <property>
> >         <name>kafkaConfigFilePath</name>
> >         <value>${wf:appPath()}/load/kafka.properties</value>
> >     </property>
> >     <property>
> >         <name>hqlDeltaTransformationPath</name>
> >         <value>load/hql-delta-transform-${wf:id()}.hql</value>
> >     </property>
> > </configuration>
> >
> >
> >
> > Sorry if I'm blabbing
> > /Pelle
> >
> > On Thu, Oct 20, 2016 at 3:52 AM, goun na <gou...@gmail.com> wrote:
> >
> > > Per Ullberg, a snippet of pom.xml would help us. :)
> > > Thanks,
> > >
> > > 2016-10-20 3:36 GMT+09:00 Per Ullberg <per.ullb...@klarna.com>:
> > >
> > > > @goun na: we keep one coordinator per (zip|war|jar)
> > > >
> > > > @shiva: I'm happy to share, but it's hard to know what you're in need
> > of.
> > > > Ask and I will try to answer :)
> > > >
> > > > /Pelle
> > > >
> > > >
> > > > On Wednesday, October 19, 2016, Shiva Ramagopal <tr.s...@gmail.com>
> > > wrote:
> > > >
> > > > > Per,
> > > > >
> > > > > Your approach seems very interesting. Could you elaborate more on
> > your
> > > > > approach?
> > > > >
> > > > > Thanks,
> > > > > Shiva
> > > > >
> > > > > On Wed, Oct 19, 2016 at 2:19 PM, Per Ullberg <
> per.ullb...@klarna.com
> > > > > <javascript:;>> wrote:
> > > > >
> > > > > > We package our oozie jobs with maven and release artifacts to
> > nexus.
> > > We
> > > > > > keep the version number as part of the coordinator name. That way
> > we
> > > > have
> > > > > > full traceability between code base and running coordinators.
> > > > > >
> > > > > > regards
> > > > > > /Pelle
> > > > > >
> > > > > > On Wed, Oct 19, 2016 at 10:05 AM, Abhishek Bafna <
> > > bafna.i...@gmail.com
> > > > > <javascript:;>>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Oozie does not have version control for jobs. When you submit a
> > > > > > > workflow/coordinator/bundle to oozie, it stores it into DB uses
> > it
> > > > from
> > > > > > > there for further execution.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Abhishek
> > > > > > > > On Oct 19, 2016, at 1:16 PM, goun na <gou...@gmail.com
> > > > > <javascript:;>> wrote:
> > > > > > > >
> > > > > > > > Hi users,
> > > > > > > >
> > > > > > > > What is the best to manage Oozie jobs? Is there a built-in
> > > version
> > > > > > > control
> > > > > > > > feature?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Goun Na
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > *Per Ullberg*
> > > > > > Data Vault Tech Lead
> > > > > > Odin Uppsala
> > > > > > +46 701612693 <+46+701612693>
> > > > > >
> > > > > > Klarna AB (publ)
> > > > > > Sveavägen 46, 111 34 Stockholm
> > > > > > Tel: +46 8 120 120 00 <+46812012000>
> > > > > > Reg no: 556737-0431
> > > > > > klarna.com
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > *Per Ullberg*
> > > > Data Vault Tech Lead
> > > > Odin Uppsala
> > > > +46 701612693 <+46+701612693>
> > > >
> > > > Klarna AB (publ)
> > > > Sveavägen 46, 111 34 Stockholm
> > > > Tel: +46 8 120 120 00 <+46812012000>
> > > > Reg no: 556737-0431
> > > > klarna.com
> > > >
> > >
> >
> >
> >
> > --
> >
> > *Per Ullberg*
> > Data Vault Tech Lead
> > Odin Uppsala
> > +46 701612693 <+46+701612693>
> >
> > Klarna AB (publ)
> > Sveavägen 46, 111 34 Stockholm
> > Tel: +46 8 120 120 00 <+46812012000>
> > Reg no: 556737-0431
> > klarna.com
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie Sqoop Wallet Specification

2017-06-14 Thread Peter Cseh
Hey,

This might be related to whitespaces:
https://oozie.apache.org/docs/4.3.0/DG_SqoopActionExtension.html
"The Sqoop command can be specified either using the command element or
multiple arg elements.

When using the command element, Oozie will split the command on every space
into multiple arguments.

When using the arg elements, Oozie will pass each argument value as an
argument to Sqoop.

The arg variant should be used when there are spaces within a single
argument."

If you check the launcher logs, I'm pretty sure that Sqoop gets a messed up
version of the parameter list.

Also, you should set Java properties via the <configuration> section of the
workflow.xml.

Options starting with "-D" in the command line won't take effect, as Sqoop
runs in the same JVM as the launcher.

Please check out https://cwiki.apache.org/confluence/display/OOZIE/Cookbooks
for several examples (Not for Sqoop, sorry for that)

I hope this helps

gp

On Tue, Jun 13, 2017 at 11:17 PM, Arun Selvan <asel...@clarityinsights.com>
wrote:

> Hi Team,
>
> I'm invoking Sqoop via Oozie with the Optionsfile. Now I need to use Wallet
> file for Authentication. I'm able to use wallet file in command line. But
> when I gave those args in Oozie Sqoop Action, I'm getting Unknown Host
> specified Error. PFB the command used in oozie Sqoop Action.
>
>  import -Dmapreduce.job.quenename=a
> -Dmapred.map.child.java.opts='-Doracle.net.tns_admin=.
> -Doracle.net.wallet_location=. -Dyarn.app.mapreduce.am.staging-dir=//dev'
> -files cwallet.sso,ewallet.p12,sqlnet.ora,tnsnames.ora -libjars
> oraclepki.jar,osdt_cert.jar,osdt_core.jar --options-file
> Options_File
> ${walletLocation}/cwallet.sso#cwallet.sso
> ${walletLocation}/ewallet.p12#ewallet.p12
> ${walletLocation}/sqlnet.ora#sqlnet.ora
> ${walletLocation}/tnsnames.ora#tnsnames.ora
>
> Kindly help.
>
> Thanks,
> Arun
>
> --
>
> *Clarity Solution Group is now Clarity Insights. Check out our website
> using the link below. *
> *ClarityInsights.com* <http://ClarityInsights.com>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Error in running coord-input-logic example

2017-09-08 Thread Peter Cseh
Hi,

Can you attach a full stack trace, the coordinator example and the command
you're executing when getting the error?
Thanks
gp

On Fri, Sep 8, 2017 at 1:22 PM 罗 辉  wrote:

> hi there
>
>   I got a problem running the coord-input-logic example. I've
> modified namenode and jobtracker accordingly in the job.properties, and also
> put the files into the right path in HDFS. However, an error takes place as
> shown in the attached photo. The error code is E0701, and it says that it
> cannot find the declaration of element 'coordinator-app'.
>
>  I didn't make any changes in the coordinator.xml and workflow.xml for
> this example.
>
>   Any idea is welcome, thank you.
>
>
>


Re: Set SMTP settings in workflow

2017-11-11 Thread Peter Cseh
Hey Jan!

The SMTP access data and the "from" field come from the
ConfigurationService directly, so you can't overwrite them from the
workflows:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/email/EmailActionExecutor.java#L185-L188
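
For reference, these settings live in oozie-site.xml on the server side; a
minimal sketch with placeholder values:

<property>
    <name>oozie.email.smtp.host</name>
    <value>smtp.example.com</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@example.com</value>
</property>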

I don't see any reason why we couldn't add the possibility to set the
"from" address as well in the emailAction just like it's possible to set
the "cc" and the "bcc" fields.
Feel free to file a Jira for this.
Best,
gp

On Sat, Nov 11, 2017 at 12:16 PM, Jan Hentschel <
jan.hentsc...@ultratendency.com> wrote:

> Hello,
>
>
>
> I’m currently trying to create a workflow, which has an email action
> sending an HDFS file as an attachment. I didn’t configure the SMTP settings
> in the oozie-site to prevent a restart of the service. Instead I tried to
> set them directly in the workflow, but wasn’t able to do that.
>
>
>
> Before opening a ticket to also make this available in the workflow
> configuration, I wanted to make sure that there’s no other way than
> configuring the SMTP settings in the oozie-site. Setting some settings
> directly in the workflow would have the advantage to use a different FROM
> address for different workflows instead of one central address.
>
>
>
> Thanks for your help.
>
>
>
> Best, Jan
>
>
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Ozzie Spark 2.X action

2017-10-25 Thread Peter Cseh
Hey!

You could compile Oozie using the spark-2 profile and it should work fine.
You may even overwrite the versions it's pulling in:
https://github.com/apache/oozie/blob/master/pom.xml#L1956-L1964
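
A build invocation would look something like this (a sketch; the exact Spark
version to pin with -Dspark.version is up to you):

bin/mkdistro.sh -Pspark-2 -DskipTests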

gp

On Wed, Oct 25, 2017 at 7:32 AM Aravindakshan Srinivasan
 wrote:

> Team,
> Does Oozie's Spark action work with Spark 2.X? If yes, do we need the
> Oozie 4.3 Spark action for this or does Oozie 4.2 work for  Spark 2.X as
> well?
> Thanks a bunch,
> Aravind
>


Re: Oozie email action truncating the string containing newline chars

2018-05-08 Thread Peter Cseh
Hey!
The action output is treated and parsed as a serialized
java.util.Properties object.
You can easily test out what works for you by creating a Properties object
and fooling around with it.

E.g. something like
SHELL_OUTPUT='John,28,1,0 \
Jack,32,0,15 \
Mary,45,23,12 \
Jill,33,12,55'

should work fine.
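
A quick local check of how Properties parses that value (a sketch; the single
quotes above belong to the shell, not to the properties format):

import java.io.StringReader;
import java.util.Properties;

public class ActionDataCheck {
    public static void main(String[] args) throws Exception {
        // A trailing backslash is a line continuation in the Properties
        // format, so all four records end up in one logical value
        String actionData = "SHELL_OUTPUT=John,28,1,0 \\\n"
                + "Jack,32,0,15 \\\n"
                + "Mary,45,23,12 \\\n"
                + "Jill,33,12,55";
        Properties props = new Properties();
        props.load(new StringReader(actionData));
        // Prints: John,28,1,0 Jack,32,0,15 Mary,45,23,12 Jill,33,12,55
        System.out.println(props.getProperty("SHELL_OUTPUT"));
    }
}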

gp

On Tue, May 8, 2018 at 10:14 PM, Buntu Dev <buntu...@gmail.com> wrote:

> I've this output captured from the shell action:
>
> SHELL_OUTPUT='John,28,1,0
> Jack,32,0,15
> Mary,45,23,12
> Jill,33,12,55'
>
> The email action uses this captured output in the body like this:
>
> Data: ${ wf:actionData('shell-c23f')['SHELL_OUTPUT'] }
>
> But the email received seems to be truncated and only sending the first
> line instead of the complete string, for example:
>
> Data: John,28,1,0
>
> How do I go about making sure the complete string is part of the body of
> the email?
>
> Thanks!
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: How To Set Environment Variables For Spark Action Script From XML Definition

2018-05-14 Thread Peter Cseh
Hi!

There is no easy and straightforward way of doing this for the Spark
action, but you can take advantage of the fact that Oozie 4.1.0 uses
MapReduce to launch Spark.
Just put "mapred.map.child.env" in the action configuration using the
format k1=v1,k2=v2. EL functions should also work here.
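
A sketch of what that looks like inside the spark action's configuration (the
variable name here is made up for illustration):

<configuration>
    <property>
        <name>mapred.map.child.env</name>
        <value>OOZIE_WORKFLOW_ID=${wf:id()}</value>
    </property>
</configuration>

The script can then call os.getenv("OOZIE_WORKFLOW_ID") as planned.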

Gp


On Thu, May 10, 2018 at 6:39 PM, Richard Primera <
richard.prim...@woombatcg.com> wrote:

> Greetings,
>
> How can I set an environment variable to be accessible from either a .jar
> or .py script launched via a spark action?
>
> The idea is to set the environment variable with the output of the EL
> function ${wf:id()} from within the XML workflow definition, something
> along these lines:
>
> script.py
>
> OOZIE_WORKFLOW_ID=${wf:id()}
>
> And then have the ability to do wf_id = os.getenv("OOZIE_WORKFLOW_ID")
> from the script without having to pass them as command line arguments. The
> thing about command line arguments is that they don't scale as well because
> they rely on a specific ordering or some custom parsing implementation.
> This can be done easily it seems with a shell action, but I've been unable
> to find a similar straightforward way of doing it for a spark action.
>
> Oozie Version: 4.1.0-cdh5.12.1
>
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Oozie Variable Substitution In XML's Workflow Name

2018-05-14 Thread Peter Cseh
Hey!

This should be possible as of
https://issues.apache.org/jira/browse/OOZIE-637.
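
If it works as intended, a parameterized name would look something like this,
with the variables supplied at submission time:

<workflow-app name="${applicationName}_${source}" xmlns="uri:oozie:workflow:0.5">
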
I haven't tried to do so, but please file a bug ticket with a reproduction
case attached so it is easier to fix.

Thanks!

On Thu, May 10, 2018 at 8:59 PM, Richard Primera <
richard.prim...@woombatcg.com> wrote:

> Greetings,
>
> I have a workflow definition where I would like to generate the name of
> the workflow dynamically. Say you have a workflow that can operate on
> hundreds of different sources. It would be beneficial if one could set the
> name of the workflow to be $applicationName_$source instead of simply
> $applicationName. In this case I thought that a simple variable
> substitution would've worked, meaning:
>
> 
>
> Could then be set to be:
>
> > <workflow-app name=${dynamically_generated_wf_name} xmlns="uri:oozie:workflow:0.5">
>
> Or
> > <workflow-app name="${dynamically_generated_wf_name}" xmlns="uri:oozie:workflow:0.5">
>
>
> However this doesn't work. In the first case launching the workflow simply
> fails, I assume due to the missing quotes in the name parameter inside the
> > <workflow-app> element. In the second case the workflow name appears as the
> literal string ${dynamically_generated_wf_name}, so it's obvious that
> variable substitution is not being performed in that element.
>
> The obvious approach to this would be to manually do the substitution on
> the XML template with a script and then place that in the HDFS path where
> the XML file normally resides, however this approach implies more work if
> the actual workflow is to be launched many times in parallel for different
> parameters and with different names. In that case one would have to place
> multiple XML files in different locations in the HDFS which becomes a bit
> of a pain. At this moment I'm waiting for a more elegant approach but I've
> failed to come to it on my own, so I decided to reach out to other oozie
> users out there and see what comes up.
>
> Thanks in advance.
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Spark 2.3 in oozie

2018-05-15 Thread Peter Cseh
Oozie has a spark-2 profile that is currently hard-coded to Spark 2.1:
https://github.com/apache/oozie/blob/master/pom.xml#L1983
I'm sure if you overwrite the -Dspark.version and compile Oozie that way it
will work.
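
For example, something along these lines (a sketch; later messages in the
archive confirm the Scala binary version override is also needed for 2.3.0):

bin/mkdistro.sh -Pspark-2 -Dspark.version=2.3.0 -Dspark.scala.binary.version=2.11 -DskipTests
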
gp


On Tue, May 15, 2018 at 5:07 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Hello,
>
> Does oozie supports spark 2.3? Or will it even care of the spark version
>
> I want to use spark action
>
>
>
> Thanks,
> Purna
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Oozie for spark jobs without Hadoop

2018-05-19 Thread Peter Cseh
Wow, great work!
Can you please summarize the required steps? This would be useful for
others so we probably should add it to our documentation.
Thanks in advance!
Peter

On Fri, May 18, 2018 at 11:33 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> I got this fixed by setting jetty_opts with proxy values.
>
> Thanks Peter!!
>
> On Thu, May 17, 2018 at 4:05 PM purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Ok I fixed this by adding aws keys in oozie
>>
>> But I’m getting below error
>>
>> I have tried setting proxy in core-site.xml but no luck
>>
>>
>> 2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
>> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
>> JOB[000-180517144113498-oozie-xjt0-C] ACTION[000-180517144113498
>> -oozie-xjt0-C@2] org.apache.oozie.service.HadoopAccessorException:
>> E0902: Exception occurred: [doesBucketExist on cmsegmentation-qa:
>> com.amazonaws.SdkClientException: Unable to execute HTTP request:
>> Connect to mybucket.s3.amazonaws.com:443
>> [mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]
>>
>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>> occurred: [doesBucketExist on cmsegmentation-qa: 
>> com.amazonaws.SdkClientException:
>> Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
>> [mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]
>>
>> at org.apache.oozie.service.HadoopAccessorService.
>> createFileSystem(HadoopAccessorService.java:630)
>>
>> at org.apache.oozie.service.HadoopAccessorService.
>> createFileSystem(HadoopAccessorService.java:594)
>> at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
>>
>> But now I’m getting this error
>>
>>
>>
>> On Thu, May 17, 2018 at 2:53 PM purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>> Ok I got past this error
>>>
>>> By rebuilding oozie with -Dhttpclient.version=4.5.5
>>> -Dhttpcore.version=4.4.9
>>>
>>> now getting this error
>>>
>>>
>>>
>>> ACTION[000-180517144113498-oozie-xjt0-C@1] 
>>> org.apache.oozie.service.HadoopAccessorException:
>>> E0902: Exception occurred: [doesBucketExist on 
>>> mybucketcom.amazonaws.AmazonClientException:
>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>> EnvironmentVariableCredentialsProvider 
>>> SharedInstanceProfileCredentialsProvider
>>> : com.amazonaws.SdkClientException: Unable to load credentials from
>>> service endpoint]
>>>
>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>> occurred: [doesBucketExist on cmsegmentation-qa: 
>>> com.amazonaws.AmazonClientException:
>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>> EnvironmentVariableCredentialsProvider 
>>> SharedInstanceProfileCredentialsProvider
>>> : com.amazonaws.SdkClientException: Unable to load credentials from
>>> service endpoint]
>>>
>>> On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Peter,
>>>>
>>>> Also When I submit a job with new http client jar, I get
>>>>
>>>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>>>> server. No of retries = 1. Exception = Could not authenticate,
>>>> Authentication failed, status: 500, message: Server Error```
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ok I have tried this
>>>>>
>>>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>>>> stops loading.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Purna,
>>>>>>
>>>>

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread Peter Cseh
Purna,

Based on
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
you should try to go for s3a.
You'll have to include the aws-sdk as well if I see it correctly:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
Also, the property names are slightly different so you'll have to change
the example I've given.
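
With s3a the credential properties would look along these lines (a sketch;
the bracketed values are placeholders):

<property>
    <name>fs.s3a.access.key</name>
    <value>[YOURKEYID]</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>[YOURKEY]</value>
</property>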



On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Peter,
>
> I’m using latest oozie 5.0.0 and I have tried below changes but no luck
>
> Is this for s3 or s3a ?
>
> I’m using s3 but if this is for s3a do you know which jar I need to
> include I mean Hadoop-aws jar or any other jar if required
>
> Hadoop-aws-2.8.3.jar is what I’m using
>
> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> Ok, I've found it:
>>
>> If you are using 4.3.0 or newer this is the part which checks for
>> dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/
>> main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>> It passes the coordinator action's configuration and even does
>> impersonation to check for the dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/
>> main/java/org/apache/oozie/coord/input/logic/
>> CoordInputLogicEvaluatorPhaseOne.java#L159
>>
>> Have you tried the following in the coordinator xml:
>>
>> <action>
>>   <workflow>
>>     <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>     <configuration>
>>       <property>
>>         <name>fs.s3.awsAccessKeyId</name>
>>         <value>[YOURKEYID]</value>
>>       </property>
>>       <property>
>>         <name>fs.s3.awsSecretAccessKey</name>
>>         <value>[YOURKEY]</value>
>>       </property>
>>     </configuration>
>>   </workflow>
>> </action>
>>
>> Based on the source this should be able to poll s3 periodically.
>>
>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>>
>>> I have tried with coordinator's configuration too but no luck ☹️
>>>
>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com>
>>> wrote:
>>>
>>>> Great progress there purna! :)
>>>>
>>>> Have you tried adding these properties to the coordinator's
>>>> configuration? We usually use the action config to build up the connection
>>>> to the distributed file system.
>>>> Although I'm not sure we're using these when polling the dependencies
>>>> for coordinators, but I'm excited about you trying to make it work!
>>>>
>>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>>> check the code in more depth first.
>>>> gp
>>>>
>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I got rid of this error by adding
>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>
>>>>> But I’m getting below error now
>>>>>
>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>
>>>>> I have tried adding AWS access ,secret keys in
>>>>>
>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I have tried this, just added s3 instead of *:
>>>>>>
>>>>>> <property>
>>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>     <value>hdfs,hftp,webhdfs,s3</value>
>>>>>> </property>
>>>>>>
>>>>>>
>>>>>> Getting below error
>>>>>>
>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.getClass(
>>>>>> Configuration.java:2369)
>>>>>>

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems
properly. The relevant entry, with its default value, looks like this:

<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs,hftp,webhdfs</value>
    <description>Enlist the different filesystems supported for federation.
    If wildcard "*" is specified, then ALL file schemes will be allowed.</description>
</property>

For testing purposes it's ok to put * in there in oozie-site.xml

On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Peter,
>
> I have tried to specify dataset with uri starting with s3://, s3a:// and
> s3n:// and I am getting exception
>
>
>
> Exception occurred:E0904: Scheme [s3] not supported in uri
> [s3://mybucket/input.data] Making the job failed
>
> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
> supported in uri [s3:// mybucket /input.data]
>
> at
> org.apache.oozie.service.URIHandlerService.getURIHandler(
> URIHandlerService.java:185)
>
> at
> org.apache.oozie.service.URIHandlerService.getURIHandler(
> URIHandlerService.java:168)
>
> at
> org.apache.oozie.service.URIHandlerService.getURIHandler(
> URIHandlerService.java:160)
>
> at
> org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(
> CoordCommandUtils.java:465)
>
> at
> org.apache.oozie.command.coord.CoordCommandUtils.
> separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>
> at
> org.apache.oozie.command.coord.CoordCommandUtils.
> materializeInputDataEvents(CoordCommandUtils.java:731)
>
> at
> org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(
> CoordCommandUtils.java:546)
>
> at
> org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
> mand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>
> at
> org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
> mand.materialize(CoordMaterializeTransitionXCommand.java:362)
>
> at
> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
> MaterializeTransitionXCommand.java:73)
>
> at
> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
> MaterializeTransitionXCommand.java:29)
>
> at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(
> CallableQueueService.java:181)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> Is S3 support specific to CDH distribution or should it work in Apache
> Oozie as well? I’m not using CDH yet so
>
> On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:
>
> > I think it should be possible for Oozie to poll S3. Check out this
> > <
> > https://www.cloudera.com/documentation/enterprise/5-9-
> x/topics/admin_oozie_s3.html
> > >
> > description on how to make it work in jobs, something similar should work
> > on the server side as well
> >
> > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com>
> > wrote:
> >
> > > Thanks Andras,
> > >
> > > Also I also would like to know if oozie supports Aws S3 as input events
> > to
> > > poll for a dependency file before kicking off a spark action
> > >
> > >
> > > For example: I don’t want to kick off a spark action until a file is
> > > arrived on a given AWS s3 location
> > >
> > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
> andras.pi...@cloudera.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Oozie needs HDFS to store workflow, coordinator, or bundle
> definitions,
> > > as
> > > > well as sharelib files in a safe, distributed and scalable way. Oozie
> > > needs
> > > > YARN to run almost all of its actions, Spark action being no
> exception.
> > > >
> > > > At the moment it's not feasible to install Oozie without those Hadoop
> > > > components. How to install Oozie please *find here
> > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
> > > >
> > > > Regards,
> > > >
> > > > Andras
> > > >
> > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
> > purna2prad...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Would like

Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
The version of the xml schema has nothing to do with the version of the
component you're using.

Thanks for verifying that -Dspark.scala.binary.verstion=2.11 is required
for compilation with Spark 2.3.0

Oozie does not pull in Spark's Kubernetes artifact.
To make it part of the Oozie Spark sharelib you'll have to include the
spark-kubernetes.jar
<https://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-kubernetes_2.11%7C2.3.0%7Cjar>
in
the sharelib/spark/pom.xml as a compile-time dependency.
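
Something like the following block (a sketch matching the artifact linked
above, assuming the spark.version and spark.scala.binary.version properties
from the spark-2 profile):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-kubernetes_${spark.scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>compile</scope>
</dependency>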

gp

On Tue, May 15, 2018 at 9:04 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> I’m able to compile successfully after adding this override option
>
> -Dspark.scala.binary.version=2.11
>
> Dspark.version = 2.3.0
>
> But when I’m running a spark action with spark-pi example jar against
> Kubernetes master I’m getting below error in stderr log
>
>
> *Error:Could not load KUBERNETES classes.This copy of spark may not have
> been compiled with Kubernetes support*
>
> Below is my workflow.xml
>
> <spark xmlns="uri:oozie:spark-action:1.0">
>     <resource-manager>${resourceManager}</resource-manager>
>     <name-node>${nameNode}</name-node>
>     <master>k8s://<***.com></master>
>     <name>Python-Spark-Pi</name>
>     <jar>spark-examples_2.11-2.3.0.jar</jar>
>     <spark-opts>--class org.apache.spark.examples.SparkPi
>         --conf spark.executor.instances=2
>         --conf spark.kubernetes.namespace=spark
>         --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
>         --conf spark.kubernetes.container.image=artifactory.cloud.capitalone.com/kubespark/spark-quantum:v2.3.0
>         --conf spark.kubernetes.node.selector.node-role.kubernetes.io/worker=true
>         --conf spark.kubernetes.driver.label.application=is1-driver
>         --conf spark.kubernetes.executor.label.application=is1-executor
>         local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar</spark-opts>
> </spark>
>
>
> Is this because of uri:oozie:spark-action:1.0 in the spark xml tag? Does it
> need to be spark-action:2.0 as I’m using spark 2.3?
>
>
> Please suggest!
>
>
> On Tue, May 15, 2018 at 12:43 PM Peter Cseh <gezap...@cloudera.com> wrote:
>
> > I think the error is related to the Scala version being present in the
> > artifact name.
> > I'll take a look at this tomorrow.
> > Gp
> >
> > On Tue, May 15, 2018, 18:28 Artem Ervits <artemerv...@gmail.com> wrote:
> >
> > > Did you run
> > > mvn clean install first on the parent directory?
> > >
> > > On Tue, May 15, 2018, 11:35 AM purna pradeep <purna2prad...@gmail.com>
> > > wrote:
> > >
> > > > Thanks peter,
> > > >
> > > > I have tried changing Dspark.version to 2.3.0 and compiled oozie I’m
> > > > getting below error from oozie examples
> > > >
> > > >
> > > > *ERROR] Failed to execute goal on project oozie-examples: Could not
> > > resolve
> > > > dependencies for project org.apache.oozie:oozie-examples:jar:5.0.0:
> > Could
> > > > not find artifact org.apache.spark:spark-core_2.10:jar:2.3.0 in
> > > resolution
> > > > *
> > > >
> > > > On Tue, May 15, 2018 at 11:14 AM Peter Cseh <gezap...@cloudera.com>
> > > wrote:
> > > >
> > > > > Oozie has a spark-2 profile that is currently hard-coded to Spark
> > 2.1:
> > > > > https://github.com/apache/oozie/blob/master/pom.xml#L1983
> > > > > I'm sure if you overwrite the -Dspark.version and compile Oozie
> that
> > > way
> > > > it
> > > > > will work.
> > > > > gp
> > > > >
> > > > >
> > > > > On Tue, May 15, 2018 at 5:07 PM, purna pradeep <
> > > purna2prad...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Does oozie supports spark 2.3? Or will it even care of the spark
> > > > version
> > > > > >
> > > > > > I want to use spark action
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Purna
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Peter Cseh | Software Engineer
> > > > > cloudera.com <https://www.cloudera.com>
> > > > > --
> > > > >
> > > >
> > >
> >
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
I think it should be possible for Oozie to poll S3. Check out this
<https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
description on how to make it work in jobs, something similar should work
on the server side as well

On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Thanks Andras,
>
> Also I also would like to know if oozie supports Aws S3 as input events to
> poll for a dependency file before kicking off a spark action
>
>
> For example: I don’t want to kick off a spark action until a file is
> arrived on a given AWS s3 location
>
> On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com>
> wrote:
>
> > Hi,
> >
> > Oozie needs HDFS to store workflow, coordinator, or bundle definitions,
> as
> > well as sharelib files in a safe, distributed and scalable way. Oozie
> needs
> > YARN to run almost all of its actions, Spark action being no exception.
> >
> > At the moment it's not feasible to install Oozie without those Hadoop
> > components. How to install Oozie please *find here
> > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
> >
> > Regards,
> >
> > Andras
> >
> > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <purna2prad...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Would like to know if I can use sparkaction in oozie without having
> > Hadoop
> > > cluster?
> > >
> > > I want to use oozie to schedule spark jobs on Kubernetes cluster
> > >
> > > I’m a beginner in oozie
> > >
> > > Thanks
> > >
> >
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Spark 2.3 in oozie

2018-05-15 Thread Peter Cseh
I think the error is related to the Scala version being present in the
artifact name.
I'll take a look at this tomorrow.
Gp

On Tue, May 15, 2018, 18:28 Artem Ervits <artemerv...@gmail.com> wrote:

> Did you run
> mvn clean install first on the parent directory?
>
> On Tue, May 15, 2018, 11:35 AM purna pradeep <purna2prad...@gmail.com>
> wrote:
>
> > Thanks peter,
> >
> > I have tried changing Dspark.version to 2.3.0 and compiled oozie I’m
> > getting below error from oozie examples
> >
> >
> > *ERROR] Failed to execute goal on project oozie-examples: Could not
> resolve
> > dependencies for project org.apache.oozie:oozie-examples:jar:5.0.0: Could
> > not find artifact org.apache.spark:spark-core_2.10:jar:2.3.0 in
> resolution
> > *
> >
> > On Tue, May 15, 2018 at 11:14 AM Peter Cseh <gezap...@cloudera.com>
> wrote:
> >
> > > Oozie has a spark-2 profile that is currently hard-coded to Spark 2.1:
> > > https://github.com/apache/oozie/blob/master/pom.xml#L1983
> > > I'm sure if you overwrite the -Dspark.version and compile Oozie that
> way
> > it
> > > will work.
> > > gp
> > >
> > >
> > > On Tue, May 15, 2018 at 5:07 PM, purna pradeep <
> purna2prad...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > Does oozie supports spark 2.3? Or will it even care of the spark
> > version
> > > >
> > > > I want to use spark action
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Purna
> > > >
> > >
> > >
> > >
> > > --
> > > Peter Cseh | Software Engineer
> > > cloudera.com <https://www.cloudera.com>
> > > --
> > >
> >
>


Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
That's strange, this exception should not happen in that case.
Can you check the server logs for messages like this?
LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
LOG.info("Loaded default urihandler {0}",
defaultHandler.getClass().getName());
Thanks

On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> This is what I already have in my oozie-site.xml
>
> <property>
>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>     <value>*</value>
> </property>
>
> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems
>> properly. The relevant entry, with its default value, looks like this:
>>
>> <property>
>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>     <value>hdfs,hftp,webhdfs</value>
>>     <description>Enlist the different filesystems supported for federation.
>>     If wildcard "*" is specified, then ALL file schemes will be allowed.</description>
>> </property>
>>
>> For testing purposes it's ok to put * in there in oozie-site.xml
>>
>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>> > Peter,
>> >
>> > I have tried to specify dataset with uri starting with s3://, s3a:// and
>> > s3n:// and I am getting exception
>> >
>> >
>> >
>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>> > [s3://mybucket/input.data] Making the job failed
>> >
>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
>> > supported in uri [s3:// mybucket /input.data]
>> >
>> > at
>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>> > URIHandlerService.java:185)
>> >
>> > at
>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>> > URIHandlerService.java:168)
>> >
>> > at
>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>> > URIHandlerService.java:160)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(
>> > CoordCommandUtils.java:465)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordCommandUtils.
>> > separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordCommandUtils.
>> > materializeInputDataEvents(CoordCommandUtils.java:731)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordCommandUtils.
>> materializeOneInstance(
>> > CoordCommandUtils.java:546)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>> > mand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>> >
>> > at
>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>> > mand.materialize(CoordMaterializeTransitionXCommand.java:362)
>> >
>> > at
>> > org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>> > MaterializeTransitionXCommand.java:73)
>> >
>> > at
>> > org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>> > MaterializeTransitionXCommand.java:29)
>> >
>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>> >
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >
>> > at
>> > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(
>> > CallableQueueService.java:181)
>> >
>> > at
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(
>> > ThreadPoolExecutor.java:1149)
>> >
>> > at
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> > ThreadPoolExecutor.java:624)
>> >
>> > at java.lang.Thread.run(Thread.java:748)
>> >
>> >
>> >
>> > Is S3 support specific to CDH distribution or should it work in Apache
>> > Oozie as well? I’m not using CDH yet so
>> >
>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com>
>> wrote:
>> >
>> > > I think it should be possible for Oozie to poll S3. Check out this
>> > > <
>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>> > x/topics/admin_oozie_s3.html
>> > > >
>> > > description on how to make it work in jobs, something similar should
>> work
>> > > on the server side as well
>> > >
>> > > On Tue, May 15, 2018 at 4:4

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread Peter Cseh
Can you try configuring the access keys via environment variables in the
server?
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_environment_variables
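
That would mean exporting the standard AWS variables in the environment the
Oozie server is started from, e.g. in conf/oozie-env.sh (values are
placeholders):

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret key>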

It's possible that we don't propagate the coordinator action's
configuration properly to the polling code.

On Thu, May 17, 2018 at 8:53 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Ok I got past this error
>
> By rebuilding oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
>
> now getting this error
>
>
>
> ACTION[000-180517144113498-oozie-xjt0-C@1] 
> org.apache.oozie.service.HadoopAccessorException:
> E0902: Exception occurred: [doesBucketExist on 
> mybucketcom.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.SdkClientException: Unable to load credentials from
> service endpoint]
>
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on cmsegmentation-qa: 
> com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.SdkClientException: Unable to load credentials from
> service endpoint]
>
> On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>>
>> Peter,
>>
>> Also When I submit a job with new http client jar, I get
>>
>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>> server. No of retries = 1. Exception = Could not authenticate,
>> Authentication failed, status: 500, message: Server Error```
>>
>>
>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>> Ok I have tried this
>>>
>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>> stops loading.
>>>
>>>
>>>
>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com>
>>> wrote:
>>>
>>>> Purna,
>>>>
>>>> Based on https://hadoop.apache.org/docs/stable/hadoop-aws/tools/
>>>> hadoop-aws/index.html#S3 you should try to go for s3a.
>>>> You'll have to include the aws-jdk as well if I see it correctly:
>>>> https://hadoop.apache.org/docs/stable/hadoop-
>>>> aws/tools/hadoop-aws/index.html#S3A
>>>> Also, the property names are slightly different so you'll have to
>>>> change the example I've given.
>>>>
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no
>>>>> luck
>>>>>
>>>>> Is this for s3 or s3a ?
>>>>>
>>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>>
>>>>> Hadoop-aws-2.8.3.jar is what I’m using
>>>>>
>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Ok, I've found it:
>>>>>>
>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>> dependencies:
>>>>>> https://github.com/apache/oozie/blob/master/core/src/
>>>>>> main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-
>>>>>> L926
>>>>>> It passes the coordinator action's configuration and even does
>>>>>> impersonation to check for the dependencies:
>>>>>> https://github.com/apache/oozie/blob/master/core/src/
>>>>>> main/java/org/apache/oozie/coord/input/logic/
>>>>>> CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>
>>>>>> Have you tried the following in the coordinator xml:
>>>>>>
>>>>>> <action>
>>>>>>   <workflow>
>>>>>>     <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>>     <configuration>
>>>>>>       <property>
>>>>>>         <name>fs.s3.awsAccessKey

Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
Wow, that's great news!

Can I ask you to summarize the steps necessary to make this happen? It
would be good to see everything together - also, it would probably help
others as well.

Thank you for sharing your struggles - and solutions as well!

Peter

On Wed, May 16, 2018 at 10:49 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Thanks Peter!
>
> I’m able to run spark pi example on Kubernetes cluster from oozie after
> this change
>
> On Wed, May 16, 2018 at 10:27 AM Peter Cseh <gezap...@cloudera.com> wrote:
>
> > The version of the xml schema has nothing to do with the version of the
> > component you're using.
> >
> > Thanks for verifying that -Dspark.scala.binary.verstion=2.11 is required
> > for compilation with Spark 2.3.0
> >
> > Oozie does not pull in Spark's Kubernetes artifact.
> > To make it part of the Oozie Spark sharelib you'll have to include the
> > spark-kubernetes.jar
> > <
> > https://search.maven.org/#artifactdetails%7Corg.apache.
> spark%7Cspark-kubernetes_2.11%7C2.3.0%7Cjar
> > >
> > in
> > the sharelib/spark/pom.xml as a compile-time dependency.
> >
> > gp
> >
> > On Tue, May 15, 2018 at 9:04 PM, purna pradeep <purna2prad...@gmail.com>
> > wrote:
> >
> > > I’m able to compile successfully after adding this override option
> > >
> > > -Dspark.scala.binary.version=2.11
> > >
> > > Dspark.version = 2.3.0
> > >
> > > But when I’m running a spark action with spark-pi example jar against
> > > Kubernetes master I’m getting below error in stderr log
> > >
> > >
> > > *Error:Could not load KUBERNETES classes.This copy of spark may not
> have
> > > been compiled with Kubernetes support*
> > >
> > > Below is my workflow.xml
> > >
> > > <spark xmlns="uri:oozie:spark-action:1.0">
> > >     <resource-manager>${resourceManager}</resource-manager>
> > >     <name-node>${nameNode}</name-node>
> > >     <master>k8s://<***.com></master>
> > >     <name>Python-Spark-Pi</name>
> > >     <jar>spark-examples_2.11-2.3.0.jar</jar>
> > >     <spark-opts>--class org.apache.spark.examples.SparkPi
> > >         --conf spark.executor.instances=2
> > >         --conf spark.kubernetes.namespace=spark
> > >         --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
> > >         --conf spark.kubernetes.container.image=artifactory.cloud.capitalone.com/kubespark/spark-quantum:v2.3.0
> > >         --conf spark.kubernetes.node.selector.node-role.kubernetes.io/worker=true
> > >         --conf spark.kubernetes.driver.label.application=is1-driver
> > >         --conf spark.kubernetes.executor.label.application=is1-executor
> > >         local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar</spark-opts>
> > > </spark>
> > >
> > >
> > > Is this because of uri:oozie:spark-action:1.0 in the spark xml tag? Does
> > > it need to be spark-action:2.0 as I’m using spark 2.3?
> > >
> > >
> > > Please suggest!
> > >
> > >
> > > On Tue, May 15, 2018 at 12:43 PM Peter Cseh <gezap...@cloudera.com>
> > wrote:
> > >
> > > > I think the error is related to the Scala version being present in
> the
> > > > artifact name.
> > > > I'll take a look at this tomorrow.
> > > > Gp
> > > >
> > > > On Tue, May 15, 2018, 18:28 Artem Ervits <artemerv...@gmail.com>
> > wrote:
> > > >
> > > > > Did you run
> > > > > mvn clean install first on the parent directory?
> > > > >
> > > > > On Tue, May 15, 2018, 11:35 AM purna pradeep <
> > purna2prad...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks peter,
> > > > > >
> > > > > > I have tried changing Dspark.version to 2.3.0 and compiled oozie
> > I’m
> > > > > > getting below error from oozie examples
> > > > > >
> > > > > >
> > > > > > *ERROR] Failed to execute goal on project oozie-examples: Could
> not
> > > > > resolve
> > > > > > dependencies 

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
Great progress there purna! :)

Have you tried adding these properties to the coordinator's configuration?
We usually use the action config to build up the connection to the distributed
file system.
Although I'm not sure we're using these when polling the dependencies for
coordinators, but I'm excited about you trying to make it work!

I'll get back with a - hopefully - more helpful answer soon, I have to
check the code in more depth first.
gp

On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Peter,
>
> I got rid of this error by adding
> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>
> But I’m getting below error now
>
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
> Key must be specified by setting the fs.s3.awsAccessKeyId and
> fs.s3.awsSecretAccessKey properties (respectively)
>
> I have tried adding AWS access ,secret keys in
>
> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>
>
>
>
> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>>
>> I have tried this, just added s3 instead of *:
>>
>> <property>
>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>     <value>hdfs,hftp,webhdfs,s3</value>
>> </property>
>>
>>
>> Getting below error
>>
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>> at org.apache.hadoop.conf.Configuration.getClass(
>> Configuration.java:2369)
>>
>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
>> FileSystem.java:2793)
>>
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(
>> FileSystem.java:2810)
>>
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
>> FileSystem.java:2849)
>>
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>
>> at org.apache.oozie.service.HadoopAccessorService$5.run(
>> HadoopAccessorService.java:625)
>>
>> at org.apache.oozie.service.HadoopAccessorService$5.run(
>> HadoopAccessorService.java:623
>>
>>
>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>> This is what is in the logs
>>>
>>> 2018-05-16 14:06:13,500  INFO URIHandlerService:520 - SERVER[localhost]
>>> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>
>>> 2018-05-16 14:06:13,501  INFO URIHandlerService:520 - SERVER[localhost]
>>> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>
>>>
>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com>
>>> wrote:
>>>
>>>> That's strange, this exception should not happen in that case.
>>>> Can you check the server logs for messages like this?
>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>> LOG.info("Loaded default urihandler {0}",
>>>> defaultHandler.getClass().getName());
>>>> Thanks
>>>>
>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com
>>>> > wrote:
>>>>
>>>>> This is what I already have in my oozie-site.xml
>>>>>
>>>>> <property>
>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>     <value>*</value>
>>>>> </property>
>>>>>
>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems
>>>>>> properly. The relevant entry, with its default value, looks like this:
>>>>>>
>>>>>> <property>
>>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>     <value>hdfs,hftp,webhdfs</value>
>>>>>>     <description>Enlist the different filesystems supported for federation.
>>>>>>     If wildcard "*" is specified, then ALL file schemes will be allowed.</description>
>>>>>> </property>
>>>>>>
>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>
>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>> purna2prad...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Peter,
>>>>>> >
>

Re: Reg: Oozie 4.1.0 having jdbc error

2018-06-11 Thread Peter Cseh
Hey!

Oozie 4.3.1 and newer releases are compiled against Hadoop 2.6.0.
Why is the Hadoop jar you're referring to 2.4.0? Also, which Hadoop version
are you compiling Oozie, and the code you're submitting to Oozie, against?

There might be different versions in play on the Oozie classpath and that
can cause issues like this.
gp

On Thu, Jun 7, 2018 at 3:39 PM, Jaboy Mathai 
wrote:

> Hi Peter,
>
> Good Day!
>
> I managed to build oozie 4.3.1 locally and tried to run job. Oozie starts
> up
> fine. But I get below exception now in the oozie logs.
>
> 2018-06-06 00:08:45,862  INFO ActionStartXCommand:520 -
> SERVER[svdt5neonhadoop01.safaricom.net] USER[hadoop] GROUP[-] TOKEN[]
> APP[SAFARICOM_KENYA_CBS_RECHARGE] JOB[003-180605235623225-
> oozie-hado-W]
> ACTION[003-180605235623225-oozie-hado-W@GetProcessedFileDetails_
> Recharge_infile]
> Start action
> [003-180605235623225-oozie-hado-W@GetProcessedFileDetails_
> Recharge_infile]
> with user-retry state : userRetryCount [0], userRetryMax [0],
> userRetryInterval [10]
> 2018-06-06 00:08:45,990 ERROR ActionStartXCommand:517 -
> SERVER[svdt5neonhadoop01.safaricom.net] USER[hadoop] GROUP[-] TOKEN[]
> APP[SAFARICOM_KENYA_CBS_RECHARGE] JOB[003-180605235623225-
> oozie-hado-W]
> ACTION[003-180605235623225-oozie-hado-W@GetProcessedFileDetails_
> Recharge_infile]
> Error,
> java.lang.NoSuchMethodError:
> org.apache.hadoop.yarn.util.timeline.TimelineUtils.
> buildTimelineTokenService(Lorg/apache/hadoop/conf/
> Configuration;)Lorg/apache/hadoop/io/Text;
> at
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.
> serviceInit(YarnClientImpl.java:166)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(
> ResourceMgrDelegate.java:102)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:96)
> at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:112)
> at
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(
> YarnClientProtocolProvider.java:34)
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>
>
> I have checked for the jar file having this method and found that the class
> 'org.apache.hadoop.yarn.util.timeline.TimelineUtils' is present in the jar
> file 'hadoop-yarn-common-2.4.0.jar', but could not find a jar for
> 'org.apache.hadoop.yarn.util.timeline.TimelineUtils.buildTimelineTokenService',
> hence I don't know which would be the dependent jar file here.
>
> [hadoop@svdt5neonhadoop01 lib]$
> [hadoop@svdt5neonhadoop01 lib]$ egrep -i
> org.apache.hadoop.yarn.util.timeline.TimelineUtils *
> Binary file hadoop-yarn-common-2.4.0.jar matches
> [hadoop@svdt5neonhadoop01 lib]$
> [hadoop@svdt5neonhadoop01 lib]$ egrep -i
> org.apache.hadoop.yarn.util.timeline.TimelineUtils.
> buildTimelineTokenService
> *
> [hadoop@svdt5neonhadoop01 lib]$
> [hadoop@svdt5neonhadoop01 lib]$ ls |wc -l
> 160
> [hadoop@svdt5neonhadoop01 lib]$ pwd
> /usr/local/oozie/lib
> [hadoop@svdt5neonhadoop01 lib]$
>
>
> Please help !
>
> Thanks & Regards,
> Jaboy Mathai
>
>
>
>
> -Original Message-
> From: Peter Cseh [mailto:gezap...@cloudera.com]
> Sent: 01 June 2018 15:48
> To: user@oozie.apache.org
> Cc: Saurabh Kumar; Balakrishnan Nagiah; Vinod Rajasekharan; Shivam Garg
> Subject: Re: Reg: Oozie 4.1.0 having jdbc error
>
> I'm not aware of a binary distribution of Apache Oozie available to
> download.
> There are modified versions here and there (e.g. CDH's Oozie can be
> installed via rpm:
> https://www.cloudera.com/documentation/enterprise/5-14-
> x/topics/cdh_ig_cdh5_install.html
> , but there are others available as well).
>
> Peter
>
> On Fri, Jun 1, 2018 at 11:53 AM, Jaboy Mathai 
> wrote:
>
> > Dear Peter,
> >
> > Thanks for your reply. I am using the same Maven 3.5.3 but the java
> > version is '1.7.0_67'. Will try to build version 4.3.1 , but I am not
> > sure because we don’t have local environment(having RHEL) with
> > internet connection as you pointed. That’s why, I requested for a
> > build, that may be readily available.
> > Let me try building v.4.3.1 and will let you know.
> >
> > Best Regards
> > Jaboy
> >
> > -Original Message-
> > From: Peter Cseh [mailto:gezap...@cloudera.com]
> > Sent: 01 June 2018 14:49
> > To:

Re: Delegation Token Expiring

2018-06-14 Thread Peter Cseh
Hey!

Yes, there were some issues with expiring tokens in a non-kerberized
environment. They should be fixed in Oozie 5.0, where we don't get tokens if
they are not required.
By setting the renewer to "yarn" we allow the ResourceManager to renew our
delegation tokens.
This blogpost explains a lot about this issue:
https://blog.cloudera.com/blog/2017/12/hadoop-delegation-tokens-explained/

Hope it helps,
gp

On Wed, Jun 13, 2018 at 5:24 PM Daminato,Josh
 wrote:

> We recently ran into an issue where our launcher task attempted to kick
> off map reduce jobs after the delegation token provided by Oozie had
> expired.
>
> We found that we could increase
> 'yarn.resourcemanager.delegation.token.renew-interval', but we also started
> thinking that maybe it made sense for Oozie itself to renew these tokens.
>
> Oozie is already monitoring the Java actions that it kicks off, so we
> thought why not have it also keep the delegation tokens that it provided to
> that action alive while the action is still running.
>
> We are currently running without Kerberos enabled, and on 4.1.0 version of
> Oozie.
>
> I fiddled around with renewing the token programmatically in the launcher
> task, and was able to get it working by pretending to be the fake 'oozie mr
> token' user that Oozie sets as the renewer in an insecure cluster. But
> switching to that user to renew a delegation token is a hack.
>
> I also experimented briefly on a cluster with Kerberos enabled, and I
> found that Oozie set 'yarn' as the renewer of the 'RM_DELEGATION_TOKEN'.
> Not sure why this is. Will the resource manager renew this token?
>
>
> Curious on anyones thoughts about Oozie automagically renewing the
> delegation tokens that it passes to Java actions while the actions are
> still running.
>
>
> Thanks,
> Josh
>
>
>
>
> CONFIDENTIALITY NOTICE This message and any included attachments are from
> Cerner Corporation and are intended only for the addressee. The information
> contained in this message is confidential and may constitute inside or
> non-public information under international, federal, or state securities
> laws. Unauthorized forwarding, printing, copying, distribution, or use of
> such information is strictly prohibited and may be unlawful. If you are not
> the addressee, please promptly delete this message and notify the sender of
> the delivery error by e-mail or you may call Cerner's corporate offices in
> Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Reg: Oozie 4.1.0 having jdbc error

2018-06-01 Thread Peter Cseh
> or switch firewalls expected).
>
>
>
> Checked for iptables and firewalls – no firewalls are running in the
> cluster.
>
>
>
> *Please note:* Same job (workflow.xml) is running fine in one of our
> other cluster with below versions:
>
>
>
> Oozie 4.1.0  , RHEL 7.3   and database in mariadb 5.5.41.
>
>
>
>
>
> *Attaching the logs having oozie startup and the job run log having the
> error, with this e-mail.*
>
>
>
> *Is there any issue with oozie 4.1.0 working in RHEL 7.5 ?*
>
>
>
> Tried to install oozie 4.3.0 but I am stuck as per below screenshot, when
> I try to run below command:
>
>
>
> # ./mkdistro.sh -DskipTests
>
> I am able to wget the apache-17.pom from the linux shell prompt but the
> build process fails. We have added the required proxy to access the
> internet. I don’t know if any proxy to be added so that the build process
> comes to know about the internet availability through proxy.
>
>
>
> Please suggest a solution with the shortest approach possible to solve
> this. Will appreciate if someone have the oozie 4.3.0 for RHEL 7.5 readily
> available and could share.
>
>
>
> Below is the oozie related settings:
>
>
>
>
>
>
>
>
>
> WARN: Use of this script is deprecated; use 'oozied.sh stop' instead
>
>
>
> Setting OOZIE_HOME:  /usr/local/oozie
>
> Setting OOZIE_CONFIG:/usr/local/oozie/conf
>
> Sourcing:/usr/local/oozie/conf/oozie-env.sh
>
>   setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
>
>   setting OOZIE_BASE_URL="http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_
> HTTP_PORT}/oozie"
>
> Setting OOZIE_CONFIG_FILE:   oozie-site.xml
>
> Setting OOZIE_DATA:  /usr/local/oozie/data
>
> Setting OOZIE_LOG:   /usr/local/oozie/logs
>
> Setting OOZIE_LOG4J_FILE:oozie-log4j.properties
>
> Setting OOZIE_LOG4J_RELOAD:  10
>
> Setting OOZIE_HTTP_HOSTNAME: svdt5neonhadoop01.safaricom.net
>
> Setting OOZIE_HTTP_PORT: 11000
>
> Setting OOZIE_ADMIN_PORT: 11001
>
> Setting OOZIE_HTTPS_PORT: 11443
>
> Using   OOZIE_BASE_URL:  http://:/oozie
>
> Setting CATALINA_BASE:   /usr/local/oozie/oozie-server
>
> Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hadoop/.keystore
>
> Setting OOZIE_HTTPS_KEYSTORE_PASS: password
>
> Setting OOZIE_INSTANCE_ID:   svdt5neonhadoop01.safaricom.net
>
> Setting CATALINA_OUT:/usr/local/oozie/logs/catalina.out
>
> Setting CATALINA_PID:/usr/local/oozie/oozie-server/temp/oozie.pid
>
>
>
> Using   CATALINA_OPTS:-Xmx1024m -Dderby.stream.error.file=/
> usr/local/oozie/logs/derby.log
>
> Adding to CATALINA_OPTS: -Doozie.home.dir=/usr/local/oozie
> -Doozie.config.dir=/usr/local/oozie/conf -Doozie.log.dir=/usr/local/oozie/logs
> -Doozie.data.dir=/usr/local/oozie/data -Doozie.instance.id=svdt5neonh
> adoop01.safaricom.net -Doozie.config.file=oozie-site.xml
> -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10
> -Doozie.http.hostname=svdt5neonhadoop01.safaricom.net
> -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443
> -Doozie.base.url=http://:/oozie 
> -Doozie.https.keystore.file=/home/hadoop/.keystore
> -Doozie.https.keystore.pass=password -Djava.library.path=
>
> Using CATALINA_BASE:   /usr/local/oozie/oozie-server
>
> Using CATALINA_HOME:   /usr/local/oozie/oozie-server
>
> Using CATALINA_TMPDIR: /usr/local/oozie/oozie-server/temp
>
> Using JRE_HOME:/usr/java/jdk1.7.0_67
>
> Using CLASSPATH:   /usr/local/oozie/oozie-server/bin/bootstrap.jar
>
> Using CATALINA_PID:/usr/local/oozie/oozie-server/temp/oozie.pid
>
>
>
>
>
> Thanks & Regards,
>
> *Jaboy Mathai*
>
>
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: oozie in Shell script

2018-06-25 Thread Peter Cseh
Hey,

I don't think I understand the whole picture, but in general:
- try to use sqoop action for executing sqoop commands
- for date-related scheduling, use coordinators. They can handle catch-up
and other stuff for you
- try to split your shell script into atomic steps and use the action-data
field to communicate between them (see the sketch below)
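
As a rough sketch of the action-data idea (the action name, script name and
property key below are made up, not taken from your setup):

    <action name="prepare-dates">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>prepare-dates.sh</exec>
            <file>prepare-dates.sh</file>
            <capture-output/>
        </shell>
        <ok to="sqoop-step"/>
        <error to="fail"/>
    </action>

If prepare-dates.sh prints a line like startDate=2018-06-01 to stdout, the
next action can pick it up as ${wf:actionData('prepare-dates')['startDate']}.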

Hope it helps,
gp

On Thu, Jun 21, 2018 at 9:00 PM Sowjanya Kakarala 
wrote:

> Hi Guys,
>
> I am trying to build a workflow, which should get commands from a shell
> script and oozie job has to complete that sqoop command and then take the
> other command from same shell script.
>
> for example:
> my shell script have sqoop command and automatically looped over, from
> start date within it after completing one after other till given end date,
> when the sqoop command is getting generated at that point I wanted to call
> oozie and run that until the shell script hits end date.
>
> I saw examples the other way, but it is not what i wanted.
>
> Is it possible? any suggestions will help.
>
> Thanks
> Sowjanya
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Oozie Presentation - New Features Overview and Ambari GUI

2018-04-30 Thread Peter Cseh
Thanks Clay for sharing these!

On Sun, Apr 29, 2018 at 8:44 PM, Clay B. <c...@clayb.net> wrote:

> Hi Oozie Users,
>
> Recently the ever productive Artem Ervits[1] and I presented at DataWorks
> Summit Berlin on some of the recent Oozie community work. We focused on new
> features to Oozie and the Ambari Workflow Manager for building and managing
> Oozie workflows all within a GUI.
>
> To see our slides, please see: http://bit.ly/DataWorks_Breathing_New_Life_into_Oozie
>
> Also, if of interest, please see more from my past year's presentations:
> * HBase and Oozie[2] (specifically ideas around using HBase delegation
>   tokens in Java actions and using Oozie as a controlled privilege
>   escalation for HBase export snapshot)
> * Continuous delivery with Oozie (particularly ideas around
>   OOZIE-2877)[3].
>
> Cheers,
> Clay
>
> [1]: Artem Ervits:
> * JIRAs: http://bit.ly/artems_oozie_jiras
> * LinkedIn: https://twitter.com/dbist/status/987160309264801792
> * Tweet of the event: https://twitter.com/dbist/status/987160309264801792
> (This is only half the audience and does not include the folks standing
> too!)
>
> [2]: DataWorks Summit San Jose 2017: "Multitenancy At Bloomberg - HBase
> and Oozie":
> Slides: http://bit.ly/DataWorks_Multitenancy_at_Bloomberg
> Video: https://www.youtube.com/watch?v=iPCA1ZTitQk
>
> [3]: Apache: Big Data North America 2017: "Cluster Continuous Delivery
> with Oozie":
> Slides: http://bit.ly/ApacheCon_Cluster_Continuous_Delivery_with_Oozie
>



-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: HDP3.0: oozie cannot start

2018-07-30 Thread Peter Cseh
es.java:305)
> ... 26 more
>
>
>
> My oozie settings which worked in HDP2.6 is:
>
> {
>   "oozie-site": {
> "properties": {
>   "oozie.service.JPAService.jdbc.username": "admin",
>   "oozie.service.JPAService.jdbc.password": "%SERVICE_PASSWORD%",
>   "oozie.email.from.address": "had...@something.com",
>   "oozie.email.smtp.auth": "Yes",
>   "oozie.email.smtp.host": "localhost",
>   "oozie.email.smtp.username": "admin",
>   "oozie.email.smtp.password": "%EMAIL_PASSWORD%"
> }
>   }
> }
>
> Any idea? Thanks for any hints.
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: HDP3.0: oozie cannot start

2018-08-13 Thread Peter Cseh
Thanks Lian for posting the resolution!
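
For anyone outside Ambari, the same switch in plain oozie-site.xml form would
presumably look like this:

    <property>
        <name>oozie.service.JPAService.validate.db.connection</name>
        <value>false</value>
    </property>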


On Thu, Aug 9, 2018 at 6:06 PM Lian Jiang  wrote:

> By reading the 4.3.1 oozie source code, I solved the problem by adding
> "oozie.service.JPAService.validate.db.connection": "false". Thanks.
>
> On Wed, Aug 8, 2018 at 3:49 PM, Lian Jiang  wrote:
>
> > Thanks Peter. I attached sanitized
> /etc/oozie/3.0.0.0-1634/0/oozie-site.xml.
> > Please let me know if you see any issue.
> >
> > On Mon, Jul 30, 2018 at 1:23 AM, Peter Cseh
>  > > wrote:
> >
> >> Hi!
> >>
> >> I'm not familiar with the details of HDP, but based on the error message
> >> your xml is not well formatted.
> >>
> >> Based on the code:
> >> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/JPAService.java#L198
> >> the issue is in your driver classname or url for the jdbc.
> >> Please check the generated xml files, or share them after removing
> >> confidential information.
> >>
> >> All the best,
> >> gp
> >>
> >>
> >> On Sun, Jul 29, 2018 at 7:24 PM Lian Jiang 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am installing HDP3.0 using ambari 2.7. OOZIE failed to start due to
> >> below
> >> > error:
> >> >
> >> >
> >> > 2018-07-29 06:21:27,402  WARN ConfigurationService:523 - SERVER[
> >> > prod-namenode.subnet1.hadoop.oraclevcn.com] Invalid configuration
> >> defined,
> >> > [credentialStoreClassPath]
> >> > 2018-07-29 06:21:27,406  WARN Services:523 - SERVER[
> >> > prod-namenode.subnet1.hadoop.oraclevcn.com] System ID [oozie-oozi]
> >> exceeds
> >> > maximum length [10], trimming
> >> > 2018-07-29 06:21:27,631  WARN ConfigUtils:523 - SERVER[
> >> > prod-namenode.subnet1.hadoop.oraclevcn.com] Using a deprecated
> >> > configuration property
> >> > [oozie.service.AuthorizationService.security.enabled], should use
> >> > [oozie.service.AuthorizationService.authorization.enabled].  Please
> >> delete
> >> > the deprecated property in order for the new property to take effect.
> >> > 2018-07-29 06:21:28,551 FATAL Services:514 - SERVER[
> >> > prod-namenode.subnet1.hadoop.oraclevcn.com] Runtime Exception during
> >> > Services Load. Check your list of 'oozie.services' or
> >> 'oozie.services.ext'
> >> > 2018-07-29 06:21:28,558 FATAL Services:514 - SERVER[
> >> > prod-namenode.subnet1.hadoop.oraclevcn.com] E0103: Could not load
> >> service
> >> > classes, Unmatched braces in the pattern.
> >> > org.apache.oozie.service.ServiceException: E0103: Could not load
> >> service
> >> > classes, Unmatched braces in the pattern.
> >> > at
> >> > org.apache.oozie.service.Services.loadServices(Services.java:309)
> >> > at org.apache.oozie.service.Services.init(Services.java:213)
> >> > at
> >> >
> >> > org.apache.oozie.servlet.ServicesLoader.contextInitialized(S
> >> ervicesLoader.java:46)
> >> > at
> >> >
> >> > org.apache.catalina.core.StandardContext.listenerStart(Stand
> >> ardContext.java:4276)
> >> > at
> >> > org.apache.catalina.core.StandardContext.start(StandardConte
> >> xt.java:4779)
> >> > at
> >> >
> >> > org.apache.catalina.core.ContainerBase.addChildInternal(Cont
> >> ainerBase.java:803)
> >> > at
> >> >
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:780)
> >> > at
> >> > org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
> >> > at
> >> >
> >> > org.apache.catalina.startup.HostConfig.deployDescriptor(Host
> >> Config.java:676)
> >> > at
> >> >
> >> > org.apache.catalina.startup.HostConfig.deployDescriptors(Hos
> >> tConfig.java:602)
> >> > at
> >> > org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:503)
> >> > at
> >> > org.apache.catalina.startup.HostConfig.start(HostConfig.java:1322)
> >> > at
> >> > org.apache.catalina.startup.HostConfig.lifecycleEvent(HostCo
> >> nfig.java:325)
> >> > at
> >> >
> >> > or

Re: Oozie 5.0.0 Launcher Out of Memory error

2018-08-30 Thread Peter Cseh
Hey Suresh!

The old way of setting the memory limits via mapreduce properties should
work after OOZIE-2896 <https://issues.apache.org/jira/browse/OOZIE-2896>.
See that Jira for some details.
However the new - and preferred - way of doing so is to add a launcher
configuration to the action like:

    <launcher>
        <memory.mb>4096</memory.mb>
    </launcher>
This should go between the name-node and the job-xml tags, or it can go into
the workflow's <global> section to apply to every action in there.
See the common schema in the Workflow Functional Specification
<https://oozie.apache.org/docs/5.0.0/WorkflowFunctionalSpec.html> for
some details.
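
For placement, here's a minimal sketch (the action name and properties are
placeholders, and I'm assuming the 1.0 action schema that ships with Oozie 5):

    <action name="my-spark-job">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <launcher>
                <memory.mb>4096</memory.mb>
            </launcher>
            ...
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>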

The third option is to increase the default by setting the
oozie.launcher.default.memory.mb in the oozie-site.xml.

Hope it helps,
gp

On Wed, Aug 29, 2018 at 10:02 PM Suresh V  wrote:

> We recently launched an EMR cluster with latest version of Oozie that is
> Oozie 5.0.0.
>
> We understand the Oozie launcher is no longer a mapreduce job in Yarn.
> We see that it shows up as 'Oozie launcher' in the Yarn UI.
>
> Our workflow has a Spark and a Sqoop job, and the launcher is failing its
> first attempt with an out-of-memory error, causing it to make multiple
> attempts.
> Please advise where we can set the memory limits for Oozie launcher, in
> order to work around this error?
>
> AM Container for appattempt_1534385557019_0066_01 exited with exitCode:
> -104
> Failing this attempt.Diagnostics: Container
> [pid=19109,containerID=container_1534385557019_0066_01_01] is running
> beyond physical memory limits. Current usage: 2.2 GB of 2 GB physical
> memory used; 9.9 GB of 10 GB virtual memory used. Killing container.
>
>
> Thank you
> Suresh.
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: Oozie web console - links to job

2018-03-27 Thread Peter Cseh
Hey!

I've never used Oozie 3.3 so I'm not familiar with what changed between 3.3
and 4.3. Can you file a Jira with screenshots and REST calls if
possible?
We should track down what's changed here.

gp

On Tue, Mar 27, 2018 at 11:48 AM, Łukasz Kulawczuk <
l.kulawc...@ipipan.waw.pl> wrote:

> I have successfully upgraded from Oozie 3.3 to Oozie 4.3.
>
> In the previous version, when the launcher finished, the links in the web
> console were updated so that they pointed to the actual job. After the
> upgrade the links are not updated, so I have to manually search for the job
> in the yarn console to check the counters and logs of the job.
>
> Is this a feature or a bug? Is there an option to restore previous
> behavior? If so, how to do this?
>
>


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Re: Oozie 5.1.0 release and plans for the year

2018-10-04 Thread Peter Cseh
Sure, go ahead Andras. Thanks for stepping up!
Unfortunately I have too much on my plate currently to manage the release.
I'll happily be part of its testing though!

gp

On Wed, Oct 3, 2018 at 4:17 PM Andras Piros
 wrote:

> Hi everyone,
>
> Gp, I can take over 5.1.0 release management. AFAIK we're almost a go for
> RC0.
>
> Regards,
>
> Andras
>
> On Tue, Jun 19, 2018 at 10:32 PM Artem Ervits 
> wrote:
>
> > +1 on the plans, I won't get to the jiras I'm assigned to until next
> week.
> > Don't want to be a blocker.
> >
> > On Tue, Jun 19, 2018, 4:48 AM Andras Piros  wrote:
> >
> > > Good idea Gp!
> > >
> > > Thanks for volunteering as the release manager for 5.1.0. I can see a few
> > > blockers
> > > <https://issues.apache.org/jira/issues/?jql=project%20%3D%20OOZIE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%205.1.0%20AND%20priority%20%3D%20Blocker%20ORDER%20BY%20%20%20priority%20DESC%2C%20updated%20DESC>
> > > for 5.1.0 now, of which OOZIE-3178
> > > <https://issues.apache.org/jira/browse/OOZIE-3178> seems to be a real one.
> > >
> > > Regards,
> > >
> > > Andras
> > >
> > > On Tue, Jun 19, 2018 at 12:50 PM Gézapeti Cseh 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Now that OOZIE-2339 is in (with other fixes as well), I think it would
> > > > be nice to release it as part of Oozie 5.1.0 so more people will try it
> > > > out before we jump in to do the coordinator/bundle part as well.
> > > >
> > > > Also, we're planning to pick up on new action types with Andras and Peter
> > > > and probably will do a bunch of new releases as they are ready later this
> > > > year. The git action is the closest one, but there are others in the
> > > > pipeline like Maven and callback.
> > > >
> > > > Is anyone aware of issues we should wait for before starting work on
> > > > 5.1.0?
> > > > If nobody else does, I can volunteer to be the release manager for it
> > and
> > > > start the branching process in a week or so.
> > > >
> > > > thanks
> > > > gp
> > > >
> > >
> >
>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
 sure, but the above code indeed returns the FileSystem
> eventually complains "WRONG FS" in my case, and the above commit changes
> the "jobConf" from the createJobConf to createConfiguration.
>
> So my question here: do you think it is the above change causing my
> issue? If so, I believe there is a reason for the above commit, but is
> there also a solution for my use case?
>
> Thanks
>
> Yong
>
>

-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
--


Re: oozie 5.0.0 on AWS EMR

2019-03-25 Thread Peter Cseh
Hi Yong,
The usage of local filesystems is strictly prohibited in Oozie 5.0.
I'd guess you have hdfs://somenode as fs.defaultFS and you're providing
the S3 credentials for the job only.
I'll try to carve out some time to reproduce and fix this, but I can't
promise you anything soon due to other priorities.
Once we have the reproduction steps, we should file a Jira for this.
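
One thing worth double-checking in the meantime: the S3 scheme has to be
whitelisted in oozie-site.xml for app paths on S3 to be accepted at all
(the value below is only an example):

    <property>
        <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
        <value>hdfs,s3,s3a</value>
    </property>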

gp

On Mon, Mar 25, 2019 at 8:34 PM  wrote:

> Hi Yong
>
> Have you also tried s3a in place of s3?
>
>
> -
> Suresh.
>
>
> > On Mar 25, 2019, at 2:03 PM, Peter Cseh 
> wrote:
> >
> > Hey Yong,
> >
> > Thanks for reporting this issue!
> > If I see correctly, your Oozie is set up to talk to a HDFS instance and
> to
> > S3 as well. This is not a scenario I'm too familiar with.
> > Could you give us some easy-to-follow steps to reproduce this?
> > Thanks
> > gp
> >
> >> On Thu, Mar 21, 2019 at 11:13 PM Daniel Zhang 
> wrote:
> >>
> >> Hi, oozier:
> >>
> >> Since 5.15.0, AWS EMR ships with Oozie 5.0.0, upgraded from oozie
> >> 4.3.
> >>
> >> We found out one nice feature was broken for us on Oozie 5.0.0,
> >> unfortunately.
> >>
> >> On Oozie 4.3, we put our oozie applications in one S3 bucket, as our
> >> release repository, and in the oozie application properties file, we
> just
> >> use as following:
> >>
> >> appBaseDir=${s3.app.bucket}/oozieJobs/${appName}
> >>
> >> And the oozie 4.3 runtime will load all the application code from S3, and
> >> still use the oozie sharelib from HDFS for us, and the whole application
> >> workflow works perfectly.
> >>
> >> After EMR 5.15.0, it upgrades to Oozie 5.0.0, and we cannot use S3 as
> our
> >> application repository anymore. The same application will WORK fine if
> the
> >> application is stored in HDFS. But if stored in S3, we got the following
> >> error message:
> >>
> >> Caused by: org.apache.oozie.workflow.WorkflowException: E0712: Could not
> >> create lib paths list for application
> >> [s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml], Wrong FS:
> >> hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib,
> expected:
> >> s3://bucket-name
> >>at
> >>
> org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:258)
> >>at org.apache.oozie.command.wf
> >> .SubmitXCommand.execute(SubmitXCommand.java:168)
> >>... 36 more
> >> Caused by: java.lang.IllegalArgumentException: Wrong FS:
> >> hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib,
> expected:
> >> s3://bucket-name
> >>at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
> >>at
> >> org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
> >>at
> >>
> com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
> >>at
> >>
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:740)
> >>at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
> >>at
> >> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:347)
> >>at
> >>
> org.apache.oozie.service.WorkflowAppService.getLibFiles(WorkflowAppService.java:301)
> >>at
> >>
> org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:202)
> >>... 37 more
> >>
> >> It looks like if we configure the APP path in S3 via
> >> appBaseDir=${s3.app.bucket}/oozieJobs/${appName}, Oozie 5.0 will complain
> >> that it cannot load the sharelib any more from the HDFS URI, even though
> >> all the sharelib files are indeed stored in the correct HDFS location as
> >> specified in the error message.
> >>
> >> With this error message, I found out the following commit in the Oozie
> 5.0
> >>
> >>
> https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109
> >>
> >> Since the error comes from the FileSystem in
> >> core/src/main/java/org/apache/oozie/service/WorkflowAppService.java<
> >>
> https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109
> >,
> >> so I think MAYBE above commit causing it?
> 

Re: Oozie 5.2.0 Build Fail

2020-02-03 Thread Peter Cseh
Hi!

Oozie's master is compiling with Hive 1.2.2 by default.
It looks like we haven't been keeping up with our dependencies, and when
Hive changed to Log4j 2 it broke our build.
I've managed to fix the distro build by changing the order of dependencies
in the core/pom.xml. I've put log4j-related entries to the top. (See
attached file)
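
Since the attachment may not survive the list archive, the change was along
these lines in core/pom.xml (a sketch of the reordering, not the exact file):

    <dependencies>
        <!-- log4j 1.x moved to the top so its classes win over the
             Log4j 2 artifacts pulled in transitively by Hive -->
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </dependency>
        ...
        <!-- hive and the remaining dependencies follow -->
    </dependencies>
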
gp



On Sat, Feb 1, 2020 at 7:04 PM Kaden Cho  wrote:

> I tried to build Oozie on:
>
> - Debian GNU/Linux 8 (jessie)
> - Java 1.8.0_221
> - Maven 3.6.3
>
> with 'bin/mkdistro.sh -P uber -DskipTests -Dhadoop.version=2.7.4
> -Dhive.version=2.1.1 -e'
>
> but I failed with the error like following:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.7.0:testCompile
> (default-testCompile) on project oozie-core: Compilation failure:
> Compilation failure:
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[213,47] cannot find symbol
> [ERROR]   symbol:   method getLevel()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[214,32] cannot find symbol
> [ERROR]   symbol:   method getMessage()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[215,82] cannot find symbol
> [ERROR]   symbol:   method getLoggerName()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[221,47] cannot find symbol
> [ERROR]   symbol:   method getLevel()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[222,32] cannot find symbol
> [ERROR]   symbol:   method getMessage()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[231,32] cannot find symbol
> [ERROR]   symbol:   method getMessage()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java:[240,32] cannot find symbol
> [ERROR]   symbol:   method getMessage()
> [ERROR]   location: variable logEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java:[848,48] cannot find symbol
> [ERROR]   symbol:   method getLevel()
> [ERROR]   location: variable firstLogEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java:[849,33] cannot find symbol
> [ERROR]   symbol:   method getMessage()
> [ERROR]   location: variable firstLogEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] /tmp/oozie-5.2.0/core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java:[850,79] cannot find symbol
> [ERROR]   symbol:   method getLoggerName()
> [ERROR]   location: variable firstLogEntry of type org.apache.log4j.spi.LoggingEvent
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the command
> [ERROR]   mvn <args> -rf :oozie-core
>
> ERROR, Oozie distro creation failed
>
>
> Any idea?
>


-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--


Re: Passing variables from dataset to workflow in Coordinator

2020-08-17 Thread Peter Cseh
Hey!

Unfortunately I can't recall a way to do this more easily.
But yeah, it would make sense to have a function for this.
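
The least redundant workaround I can think of is to derive the parts from the
nominal time instead of the dataset URI, e.g. in the coordinator's workflow
configuration (assuming the instance you care about tracks the nominal time):

    <property>
        <name>year</name>
        <value>${coord:formatTime(coord:nominalTime(), 'yyyy')}</value>
    </property>
    <property>
        <name>month</name>
        <value>${coord:formatTime(coord:nominalTime(), 'MM')}</value>
    </property>
    <property>
        <name>day</name>
        <value>${coord:formatTime(coord:nominalTime(), 'dd')}</value>
    </property>

If you need the current(-1) instance specifically, coord:dateOffset can shift
the base date before formatting.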

gp

On Mon, Aug 10, 2020 at 5:27 PM Lars Francke  wrote:

> Hi,
>
> I have a simple coordinator with a single dataset:
>
>
> <uri-template>${hadoop_nameNode}/${coord_stagingFolder}/${YEAR}/${MONTH}/${DAY}</uri-template>
>
> I also have a corresponding input-event:
>
> <data-in name="foo_event" dataset="...">
>   <instance>${coord:current(-1)}</instance>
> </data-in>
>
> Now in my workflow I need to pass in the ${YEAR}, MONTH and DAY variables
> as properties.
> So far we've always used "fake" <data-out> entries and basically template our
> variables that way (so used data-out things that don't actually correspond
> to a folder anywhere).
>
> What's the correct way of accessing the data we need?
>
> We can also parse the output of ${coord:dataIn('foo_event')} but that seems
> a bit redundant.
> I hope I'm missing something.
>
> Thank you for your help!
>
> Cheers,
> Lars
>


-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--


Re: org.apache.hadoop.security.AccessControlException:Client cannot authenticate via:[TOKEN,KERBEROS]

2020-07-14 Thread Peter Cseh
Hey,
I can't see any of the images you've posted. Can you change them to code
snippets?
Thank you
gp

On Tue, Jul 14, 2020 at 5:28 PM qq <987626...@qq.com> wrote:

> Hello:
>
> The following error occurred while running the MapReduce task using Oozie:
>
>
> The cluster environment information is as follows:
> oozie version is 5.2.0
> hadoop version is 3.2.1
> the authentication mode is kerberos.
>
> When I deleted the code in the red box, the problem was solved.
>
> Can anyone tell me other ways to solve this problem?
>
> Thanks.
> I am looking forward to your reply!
>


-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--


Re: Oozie(5.2.0)get jhs delegation token occured exception in updateCredentials

2020-11-09 Thread Peter Cseh
Have you set up your kerberos configuration correctly?
Is the JobHistoryServer URL configured? Is JHS up?
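
For reference, the JHS token fetch relies on the usual mapred-site.xml
entries, something like the following (host and realm are placeholders):

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop301:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.principal</name>
        <value>jhs/_HOST@EXAMPLE.COM</value>
    </property>
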
gp

On Mon, Nov 9, 2020 at 9:47 AM 泳...  wrote:

> hi:
>
>
> When I use Oozie (5.2.0) I encountered the following problem.
> Please give me some advice, thanks!
>
>
>
> Operation Environment:
> Apache Oozie 5.2.0
> Apache Hadoop 3.1.1 with Kerberos authentication
>
> Running Map-reduce example action: GET JHS_DELEGATION_TOKEN FAIL
> 2020-11-09 10:24:27,193 DEBUG JHSCredentials:526 - SERVER[hadoop301]
> USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf]
> JOB[000-201109102404406-oozie-hado-W]
> ACTION[000-201109102404406-oozie-hado-W@shell-node] exception in
> updateCredentials
> java.io.IOException: DestHost:destPort hadoop301:10020 ,
> LocalHost:localPort hadoop301.bonc.com/172.16.13.11:0. Failed on local
> exception: java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot
> authenticate via:[TOKEN, KERBEROS]
>
>
>
>
> If I modify this part of the code, the shell action runs normally, but the
> Map-reduce action still fails.
>  org.apache.oozie.action.hadoop.JavaActionExecutor
> ...
>  private void addHadoopCredentialPropertiesToActionConf(Map<String, CredentialsProperties> credentialsProperties) {
>      LOG.info("Adding default credentials for action: hdfs, yarn and jhs");
>      addHadoopCredentialProperties(credentialsProperties, CredentialsProviderFactory.HDFS);
>      addHadoopCredentialProperties(credentialsProperties, CredentialsProviderFactory.YARN);
>      //addHadoopCredentialProperties(credentialsProperties, CredentialsProviderFactory.JHS);
>  }
> ...



-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--


Re: RPC response exceeds max length

2021-05-07 Thread Peter Cseh
Hey Aravind,

This is an HDFS-level property that you can set in the hdfs-site.xml or
core-site.xml of your HDFS config.
See the details here:
https://stackoverflow.com/questions/53633054/oozie-ja009-rpc-response-exceeds-maximum-data-length
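
If it's the same problem as in that thread, the knob is
ipc.maximum.data.length; the default is 64 MB and the value below doubles it:

    <property>
        <name>ipc.maximum.data.length</name>
        <value>134217728</value>
    </property>
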
Hope it helps
gp

On Wed, May 5, 2021 at 10:53 PM Aravind Srinivasan
 wrote:

> Team,
>
> Any idea what could be causing this error on a simple Hive action in Oozie?
>
> JA009: RPC response exceeds maximum data length
>
> Stack trace attached.
>
> Thanks,
> Aravind
>
>
>

-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--


Re: recovering from el expression failure

2021-02-16 Thread Peter Cseh
Unfortunately I don't think so. It's not possible to define an <error>
transition inside the decision or the switch node.
It would be a nice feature though.
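
One workaround that comes to mind: do the existence check in an action that
does have an <error> transition, for example a shell action that runs
hdfs dfs -test -e and captures its result (all names below are made up):

    <action name="check-path">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>check-path.sh</exec>
            <file>check-path.sh</file>
            <capture-output/>
        </shell>
        <ok to="decision-node"/>
        <error to="use-default"/>
    </action>
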
gp

On Mon, Feb 15, 2021 at 9:46 PM jelmer  wrote:

> no
>
> On Mon, 15 Feb 2021 at 21:30, Trevor Grayson 
> wrote:
>
> > unsubscribe
> >
> > On Mon, Feb 15, 2021 at 9:57 AM jelmer  wrote:
> >
> > > Hi,
> > >
> > > In my workflow I have a switch that looks like this
> > >
> > > <switch>
> > >     <case to="...">${fs:exists('hdfs://host/path')}</case>
> > >     <default to="..."/>
> > > </switch>
> > >
> > > However the exists el expression can throw an exception when the
> namenode
> > > is not whiteliisted in oozie.
> > >
> > > Unfortunately the whitelist is not under my control, and when this
> > > exception is raised I would like it to use the default transition.
> > >
> > > Is it possible to do this ?
> > >
> >
>


-- 
Peter Cseh | Software Engineer, Cloudera Search
cloudera.com <https://www.cloudera.com>
--