Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
Apache releases are officially source-only releases. Some projects do provide 
binary jars for convenience, but that depends on the project. Additionally, 
these are available only for “releases” and not for each and every 
patch applied on a branch. 

In your case, the only option is to download the source code for the release in 
question ( http://tez.apache.org/releases/index.html ). Download the patch file 
from JIRA and apply the patch against the source code. Build the source and 
deploy as explained in my previous email.
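
A rough sketch of that check/apply/build sequence as a small Python helper, in case it helps. The directory name apache-tez-0.7.0-src and the file name TEZ-XXXX.patch are placeholders, not real artifacts. The strip level (-p1 here) is an assumption: it depends on how the patch on the JIRA was generated, and some patches need -p0. The Maven flags are the ones I would expect from the build instructions in the source tree; please check them against the release you downloaded.

```python
import subprocess
from pathlib import Path


def build_patch_commands(patch_file, strip=1):
    """Return the commands to check, apply, and build a patch, in order."""
    apply_cmd = ["patch", f"-p{strip}", "-i", str(patch_file)]
    return [
        # Check that the patch applies cleanly without modifying any files.
        apply_cmd + ["--dry-run"],
        # Apply the patch for real.
        apply_cmd,
        # Rebuild; flags are an assumption, verify against the build docs.
        ["mvn", "clean", "package", "-DskipTests=true", "-Dmaven.javadoc.skip=true"],
    ]


def apply_and_build(src_dir, patch_file, strip=1, dry_run=True):
    """Print each command; run it inside src_dir only when dry_run is False."""
    for cmd in build_patch_commands(Path(patch_file).resolve(), strip):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=src_dir, check=True)


if __name__ == "__main__":
    # Placeholder names; by default this only prints the commands.
    apply_and_build("apache-tez-0.7.0-src", "TEZ-XXXX.patch")
```

Calling apply_and_build(..., dry_run=False) runs the three steps in the source directory and stops at the first one that fails, so a patch that does not apply cleanly never reaches the build.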

If you are willing to live with the downtime, you can rebuild the code and 
replace the current jars/tarball in place. New locations are needed if you 
want to do a live/rolling upgrade with no downtime. The new-location approach 
also lets you do some testing to verify the correctness of your newly applied 
patches before switching over. 

Additionally, for cases like this, feel free to ask/push the project community 
to make a new 0.7.1 release to make your life a bit simpler. 0.7.1 has 
been pending on our plate for quite some time and we have been a bit lax about 
making a release for it. 

thanks
— Hitesh




Re: Apply patches to Apache Tez

2016-04-06 Thread Sam Joe
Hi Hitesh,

That surely helps!

However, how do I apply the .patch file to existing releases? For example,
Tez 0.7.0 has a bug which has been fixed through a JIRA with a .patch file
provided. No new set of jars is provided.

How do I apply that .patch file to my existing setup of jars?

Appreciate your help and time.

Thanks,
Joel



Re: Apply patches to Apache Tez

2016-04-06 Thread Hitesh Shah
Every component has a different approach to how it is deployed/upgraded. 

I can cover how you can go about patching Tez on an existing production system. 
The steps should be similar to those described in INSTALL.md in the source tree, 
with a few minor gotchas to be aware of:

   - Deploying Tez has 2 aspects: 
     - installing the client jars on the local filesystem which can then be 
       added to the class paths of various components such as Hive/Pig, etc 
       that use Tez. These components need the tez-api, tez-common, 
       tez-mapreduce, tez-runtime-library jars in their classpath for the most 
       part ( this set is bundled as tez-minimal tarball in tez-dist when you 
       build Tez ). The classpath manipulation is usually done by adding the 
       tez jars to HADOOP_CLASSPATH.
     - installing the tez tarball on HDFS and configuring the configs to point 
       to the location of the tez tarball on HDFS. 

Usually most bugs/patches tend to get applied to tez-dag and 
tez-runtime-internals so for the most part you will likely only need to patch 
the tez tarball. If you are moving to a new version, both the client side and 
HDFS tarball need to be upgraded as there is an in-built check to ensure that 
both sides are consistent/compatible.  

To upgrade client side jars, it should be a simple matter of installing the new 
jars in an appropriate location and modifying HADOOP_CLASSPATH to point to the 
new location. Likewise for the tez tarball - upload the new tarball to a new 
location and modify configs to point to the new location. The exact steps would 
be the following: 
   1) Upload new tez tarball to new location on HDFS
   2) Back up tez configs to a new tez config dir and modify tez.lib.uris to 
      point to the new tarball location
   3) Install new tez client side jars.
   4) Update HADOOP_CLASSPATH to contain location of new tez client jars as 
      well as new tez config dir
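
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: all paths are hypothetical, the helper names are made up for this example, the config dir is assumed to contain a tez-site.xml, and the HADOOP_CLASSPATH layout (config dir, jars dir, jars dir/lib) should be checked against INSTALL.md for your version. Step 3 (unpacking the client jars into the new local directory) is left out. The HDFS commands are only built, not executed.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_tez_lib_uris(tez_site, new_uri):
    """Point tez.lib.uris in a tez-site.xml file at new_uri, adding it if absent."""
    # Note: ElementTree drops XML comments when the file is rewritten.
    tree = ET.parse(str(tez_site))
    root = tree.getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "tez.lib.uris":
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = new_uri
            break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "tez.lib.uris"
        ET.SubElement(prop, "value").text = new_uri
    tree.write(str(tez_site), encoding="utf-8", xml_declaration=True)


def prepare_upgrade(tarball, hdfs_dir, old_conf, new_conf, new_jars):
    """Cover steps 1, 2 and 4; returns (upload commands, HADOOP_CLASSPATH value)."""
    hdfs_uri = f"{hdfs_dir}/{Path(tarball).name}"
    # Step 1: commands to upload the new tarball to a NEW HDFS location.
    upload = [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        ["hadoop", "fs", "-put", str(tarball), hdfs_uri],
    ]
    # Step 2: copy the configs to a new dir and repoint tez.lib.uris there,
    # leaving the old config dir untouched for jobs that still use it.
    shutil.copytree(old_conf, new_conf)
    set_tez_lib_uris(Path(new_conf) / "tez-site.xml", hdfs_uri)
    # Step 4: classpath value containing the new config dir and new client jars.
    classpath = f"{new_conf}:{new_jars}/*:{new_jars}/lib/*"
    return upload, classpath
```

Because the old config dir and the old tarball are never modified, anything still running with the old HADOOP_CLASSPATH keeps resolving the old bits, and only sessions that export the returned classpath value pick up the patched build.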

What the above does is ensure that existing jobs do not start failing 
while things are being upgraded. As long as the old tarball is not 
deleted while old jobs are running, existing jobs should not fail. New jobs 
submitted with the new HADOOP_CLASSPATH will pick up the newly deployed bits. 

Hope that helps
— Hitesh





Apply patches to Apache Tez

2016-04-06 Thread Sam Joe
Hi,

How do you apply patches to Tez or any other Hadoop component? For example
if there is a bug in the existing classes used in a Hadoop component and
it's resolved in a Jira, how do you apply that patch to the existing
on-premise Hadoop setup? I think we should use Git but don't know the exact
steps to do that. Please help.


Thanks,
Sam