Re: Apply patches to Apache Tez
On Apr 6, 2016, at 11:41 AM, Sam Joe wrote:
> How do I apply that .patch file to my existing setup of jars?

Apache releases are officially source-only releases. Some projects also provide binary jars for convenience, but that depends on the project. In addition, binaries are only published for releases, not for every patch committed to a branch.

In your case, the only option is to download the source code for the release in question ( http://tez.apache.org/releases/index.html ), download the patch file from JIRA, and apply the patch to that source code. Then build the source and deploy it as explained in my previous email.

If you are willing to live with some downtime, you can rebuild the code and replace the current jars/tarball in place. If you want a live/rolling upgrade with no downtime, you need new locations instead. The new-location approach also lets you test and verify that your newly applied patches behave correctly before you switch over.

Also, for cases like this, feel free to ask/push the project community to make a 0.7.1 release to make your life a bit simpler. 0.7.1 has been pending on our plate for quite some time and we have been a bit lax about releasing it.

thanks
— Hitesh
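The patch-and-rebuild steps described above can be sketched in Python as follows. This is a minimal illustration, not Tez tooling. It assumes the release tarball is named apache-tez-<version>-src.tar.gz and unpacks to a directory of the same name, and that the JIRA patch is a git-style diff, which is why it uses -p1 (older patches may need -p0). The helper names, the TEZ-XXXX.patch file name and the tez-dist/target artifact paths are all hypothetical. By default the commands are only printed; nothing runs unless `execute=True` is passed.

```python
import os
import shlex
import subprocess


def patch_and_build_steps(version: str, patch_file: str, strip: int = 1) -> list[list[str]]:
    """Return argv lists that unpack a Tez source release, apply a JIRA patch and build it.

    Assumption: the tarball is apache-tez-<version>-src.tar.gz and unpacks into
    a directory with the same name (minus the .tar.gz suffix).
    """
    src_dir = f"apache-tez-{version}-src"
    # patch -d changes into src_dir first, so give it an absolute patch path.
    patch_path = os.path.abspath(patch_file)
    return [
        ["tar", "-xzf", f"{src_dir}.tar.gz"],
        # Dry run first: a patch that does not apply cleanly fails before any file changes.
        ["patch", "-d", src_dir, f"-p{strip}", "--dry-run", "-i", patch_path],
        ["patch", "-d", src_dir, f"-p{strip}", "-i", patch_path],
        # Skip tests and javadoc to speed up the build; drop these flags to run the tests.
        ["mvn", "-f", os.path.join(src_dir, "pom.xml"), "clean", "package",
         "-DskipTests=true", "-Dmaven.javadoc.skip=true"],
    ]


def expected_artifacts(version: str) -> dict[str, str]:
    """Assumed output paths: full tarball for HDFS, minimal tarball for client jars."""
    target = os.path.join(f"apache-tez-{version}-src", "tez-dist", "target")
    return {
        "hdfs_tarball": os.path.join(target, f"tez-{version}.tar.gz"),
        "client_tarball": os.path.join(target, f"tez-{version}-minimal.tar.gz"),
    }


def run_steps(steps: list[list[str]], execute: bool = False) -> None:
    """Print each command; run it only when execute is True."""
    for argv in steps:
        print("$", shlex.join(argv))
        if execute:
            subprocess.run(argv, check=True)  # raises and stops at the first failing step


if __name__ == "__main__":
    # TEZ-XXXX.patch is a placeholder for the patch file downloaded from JIRA.
    run_steps(patch_and_build_steps("0.7.0", "TEZ-XXXX.patch"))
    print(expected_artifacts("0.7.0"))
```

Because `check=True` raises on a non-zero exit status, a patch that fails the dry run stops the sequence before the real `patch` or the build is attempted.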
Re: Apply patches to Apache Tez
On Wed, Apr 6, 2016 at 2:28 PM, Hitesh Shah wrote:
> Every component has a different approach to how it is deployed/upgraded.

Hi Hitesh,

That surely helps!

However, how do I apply the .patch file to existing releases? For example, Tez 0.7.0 has a bug that has been fixed in a JIRA, with a .patch file attached, but no new jars are provided.

How do I apply that .patch file to my existing setup of jars?

Appreciate your help and time.

Thanks,
Joel
Re: Apply patches to Apache Tez
On Apr 6, 2016, at 10:34 AM, Sam Joe wrote:
> How do you apply patches to Tez or any other Hadoop component?

Every component has a different approach to how it is deployed/upgraded. I can cover how you can go about patching Tez on an existing production system. The steps should be similar to those described in INSTALL.md in the source tree, with a few minor gotchas to be aware of.

Deploying Tez has 2 aspects:
- Installing the client jars on the local filesystem so that they can be added to the classpaths of the various components that use Tez, such as Hive and Pig. For the most part these components need the tez-api, tez-common, tez-mapreduce and tez-runtime-library jars in their classpath (this set is bundled as the tez-minimal tarball in tez-dist when you build Tez). The classpath is usually set up by adding the Tez jars to HADOOP_CLASSPATH.
- Installing the Tez tarball on HDFS and setting the Tez configs to point to the location of that tarball on HDFS.

Most bug fixes/patches tend to be applied to tez-dag and tez-runtime-internals, so for the most part you will likely only need to patch the Tez tarball. If you are moving to a new version, both the client side and the HDFS tarball need to be upgraded, as there is a built-in check to ensure that both sides are consistent/compatible.

To upgrade the client-side jars, it should be simple to install the new jars in an appropriate location and modify HADOOP_CLASSPATH to point to the new location. Likewise for the Tez tarball: upload the new tarball to a new location and modify the configs to point to it. The exact steps would be the following:
1) Upload the new Tez tarball to a new location on HDFS.
2) Copy the existing Tez configs into a new Tez config dir and, in that copy, modify tez.lib.uris to point to the new tarball location.
3) Install the new Tez client-side jars.
4) Update HADOOP_CLASSPATH to contain the location of the new Tez client jars as well as the new Tez config dir.

This ensures that existing jobs do not start failing midway while things are being upgraded. As long as the old tarball is not deleted while old jobs are still running, existing jobs should not fail. New jobs submitted with the new HADOOP_CLASSPATH will pick up the newly deployed bits.

Hope that helps
— Hitesh
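The four upgrade steps above can be sketched in Python as follows. This is a hedged illustration, not part of Tez. The function names and every path are hypothetical, and the `hdfs dfs` and `tar` commands are only built, not run. The classpath layout (`<jars dir>/*` plus `<jars dir>/lib/*`) assumes the minimal tarball is unpacked as-is. ElementTree drops the XML declaration and any comments when it rewrites tez-site.xml.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def hdfs_upload_cmds(local_tarball: str, hdfs_dir: str) -> list[list[str]]:
    """Step 1: upload to a NEW HDFS dir; the old tarball stays for running jobs."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_tarball, hdfs_dir],
    ]


def set_tez_lib_uris(tez_site_xml: str, new_uri: str) -> str:
    """Return tez-site.xml content with tez.lib.uris pointing at new_uri."""
    root = ET.fromstring(tez_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "tez.lib.uris":
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = new_uri
            break
    else:
        # Property missing: add it rather than silently keeping the old location.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "tez.lib.uris"
        ET.SubElement(prop, "value").text = new_uri
    return ET.tostring(root, encoding="unicode")


def clone_conf_dir(old_conf_dir: str, new_conf_dir: str, new_uri: str) -> None:
    """Step 2: copy the existing Tez config dir, then repoint tez.lib.uris in the copy."""
    shutil.copytree(old_conf_dir, new_conf_dir)  # raises if new_conf_dir already exists
    site = os.path.join(new_conf_dir, "tez-site.xml")
    with open(site, encoding="utf-8") as f:
        updated = set_tez_lib_uris(f.read(), new_uri)
    with open(site, "w", encoding="utf-8") as f:
        f.write(updated)


def install_client_cmds(minimal_tarball: str, jars_dir: str) -> list[list[str]]:
    """Step 3: unpack the minimal (client) tarball into a new local directory."""
    return [["mkdir", "-p", jars_dir], ["tar", "-xzf", minimal_tarball, "-C", jars_dir]]


def new_hadoop_classpath(conf_dir: str, jars_dir: str, existing: str = "") -> str:
    """Step 4: new config dir and client jars first, then any unrelated entries."""
    parts = [conf_dir, f"{jars_dir}/*", f"{jars_dir}/lib/*"]
    if existing:
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    # Every path here is a hypothetical example.
    for argv in hdfs_upload_cmds("tez-0.7.0.tar.gz", "/apps/tez-0.7.0-patched"):
        print(" ".join(argv))
    for argv in install_client_cmds("tez-0.7.0-minimal.tar.gz", "/opt/tez-0.7.0-patched"):
        print(" ".join(argv))
    print("export HADOOP_CLASSPATH="
          + new_hadoop_classpath("/etc/tez/conf-patched", "/opt/tez-0.7.0-patched"))
```

The new entries go first on HADOOP_CLASSPATH because the JVM uses the first matching class it finds on the classpath. Any `existing` value passed in should no longer list the old Tez jars or the old config dir.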
Apply patches to Apache Tez
Hi,

How do you apply patches to Tez or any other Hadoop component? For example, if there is a bug in the existing classes used in a Hadoop component and it's resolved in a JIRA, how do you apply that patch to an existing on-premise Hadoop setup? I think we should use Git, but I don't know the exact steps to do that. Please help.

Thanks,
Sam