[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210869#comment-17210869 ]

Hyukjin Kwon commented on HIVE-16391:
-------------------------------------

SPARK-20202 is resolved now. Spark no longer uses the Hive 1.2 fork and does not need a 1.2.x release, so I am tentatively resolving this ticket.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-16391
>                 URL: https://issues.apache.org/jira/browse/HIVE-16391
>             Project: Hive
>          Issue Type: Task
>          Components: Build Infrastructure
>    Affects Versions: 1.2.2
>            Reporter: Reynold Xin
>            Assignee: Saisai Shao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.3
>
>         Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the
> only change in the fork is to work around the issue that Hive publishes only
> two sets of jars: one set with no dependencies declared, and another with all
> the dependencies included in the published uber jar. That is to say, Hive
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked
> Hive.
> The change in the forked version is recorded here:
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes, but those have all become
> unnecessary.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
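To make the packaging gap in the description concrete, here is a hedged sketch of the two ways a downstream POM could consume the Hive 1.2 artifacts it describes. The coordinates are illustrative; the mechanism (a classifier jar reuses the main artifact's POM) is standard Maven behavior, and the example dependency names are taken from later comments in this thread, not from the actual Hive 1.2 POM:

```xml
<!-- Option 1: the default hive-exec artifact. It resolves to the shaded
     uber jar, which bundles its dependencies inside the jar instead of
     declaring them in the POM. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.2</version>
</dependency>

<!-- Option 2: the "core" classifier. This is a thin jar, but a classifier
     shares the main artifact's POM, so its real dependencies (e.g. kryo,
     protobuf-java) are not declared and every consumer must re-declare
     them by hand. This is the gap the ticket describes: no published jar
     carries a correct dependency list. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.2</version>
  <classifier>core</classifier>
</dependency>
```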
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012436#comment-17012436 ]

Hive QA commented on HIVE-16391:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12927784/HIVE-16391.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20138/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20138/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20138/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '
+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.051
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-20138/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.054
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at f8e583f HIVE-22709: NullPointerException during query compilation after HIVE-22578 (Jason Dere, reviewed by Prasanth Jayachandran)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at f8e583f HIVE-22709: NullPointerException during query compilation after HIVE-22578 (Jason Dere, reviewed by Prasanth Jayachandran)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.816
+ rm -rf ../yetus_PreCommit-HIVE-Build-20138
+ mkdir ../yetus_PreCommit-HIVE-Build-20138
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-20138
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20138/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
Trying to apply the patch with -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-20138
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12927784 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519931#comment-16519931 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

Gentle ping [~hagleitn]: could you please help review the currently proposed patch and suggest the next step? Thanks a lot.
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514621#comment-16514621 ]

Hive QA commented on HIVE-16391:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12927784/HIVE-16391.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11814/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11814/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11814/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '
+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:25.805
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-11814/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:25.809
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 73ee8a1 HIVE-19837: Setting to have different default location for external tables (Jason Dere, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing itests/${project.basedir}/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 73ee8a1 HIVE-19837: Setting to have different default location for external tables (Jason Dere, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:27.106
+ rm -rf ../yetus_PreCommit-HIVE-Build-11814
+ mkdir ../yetus_PreCommit-HIVE-Build-11814
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-11814
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11814/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-11814
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12927784 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511851#comment-16511851 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

I see. I can keep the "core" classifier and use another name for the new artifact; will update the patch. [~owen.omalley], could you please help review this patch, since you created the Spark JIRA, or point to someone in the Hive community who could review it? Thanks a lot.
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511747#comment-16511747 ]

Marcelo Vanzin commented on HIVE-16391:
---------------------------------------

It would be good to get comments from people on the Hive side here... Your patch is removing the "hive-exec:core" artifact, right? And replacing it with "hive-exec-core", which is also a bit different. So technically it's a breaking change, even if I think it's more correct than the current status.
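The breaking change Marcelo describes can be sketched in consumer coordinates (versions are illustrative). Moving from a classifier on `hive-exec` to a separate `hive-exec-core` artifactId means every downstream POM that used the classifier must be rewritten, since Maven fails resolution outright on a coordinate that no longer exists:

```xml
<!-- Before the patch: the thin jar is selected via a classifier
     on the hive-exec artifact. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.2</version>
  <classifier>core</classifier>
</dependency>

<!-- After the patch, as described in this thread: a separate artifactId,
     so the same dependency would have to be rewritten as -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec-core</artifactId>
  <version>1.2.3</version>
</dependency>
```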
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506103#comment-16506103 ]

Hive QA commented on HIVE-16391:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11619/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11619/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11619/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch was found in seen patch url's cache and a test was probably run already on it. Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12926720 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506017#comment-16506017 ]

Steve Loughran commented on HIVE-16391:
---------------------------------------

I'm pleased to see the kryo version stuff isn't an issue any more... what do the Hive team have to say here?
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505607#comment-16505607 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

Any comment, [~vanzin] [~ste...@apache.org]?
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504514#comment-16504514 ]

Hive QA commented on HIVE-16391:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11584/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '
+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:20.640
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-11584/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:20.642
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at cfd5734 HIVE-19503: Create a test that checks for dropPartitions with directSql (Peter Vary, reviewed by Vihang Karajgaonkar)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at cfd5734 HIVE-19503: Create a test that checks for dropPartitions with directSql (Peter Vary, reviewed by Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:21.662
+ rm -rf ../yetus_PreCommit-HIVE-Build-11584
+ mkdir ../yetus_PreCommit-HIVE-Build-11584
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-11584
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11584/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:648
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:648
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-11584
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12926720 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503188#comment-16503188 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

Uploaded a new patch [^HIVE-16391.1.patch] that uses the solution Marcelo mentioned: simply add two new Maven modules and rename the original "hive-exec" module. One added module is a new "hive-exec" that stays compatible with existing Hive consumers; the other added module, "hive-exec-spark", is specifically for Spark.
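The restructuring described in this comment might look roughly like the following in the parent POM. This is a hedged sketch: the module directory names (`ql`, `ql-shaded`, `ql-spark`) are assumptions for illustration; only the artifact roles "hive-exec", a renamed core module, and "hive-exec-spark" are named in this thread.

```xml
<!-- Parent POM sketch: the single shaded hive-exec module becomes three. -->
<modules>
  <!-- Original module, renamed: builds the thin jar with its real
       dependencies declared in the POM. -->
  <module>ql</module>
  <!-- New module: depends on the thin jar, shades the dependency set,
       and publishes the result under the old "hive-exec" name so that
       existing consumers keep working unchanged. -->
  <module>ql-shaded</module>
  <!-- New module: a Spark-oriented "hive-exec-spark" variant that shades
       only the conflict-prone libraries discussed in this thread
       (kryo, protobuf-java). -->
  <module>ql-spark</module>
</modules>
```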
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502976#comment-16502976 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

[~vanzin], one problem with your proposed solution: the hive-exec test jar is no longer valid, because we changed the artifact name for the current "hive-exec" POM. This might affect users who rely on that test jar.
[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502756#comment-16502756 ]

Saisai Shao commented on HIVE-16391:
------------------------------------

{quote}The problem with that is that it changes the meaning of Hive's artifacts, so anybody currently importing hive-exec would see a breakage, and that's probably not desired.{quote}

This might not be acceptable to the Hive community, because it will break current users, as you mentioned. As [~joshrosen] noted, Spark wants a hive-exec jar that shades kryo and protobuf-java, not a pure non-shaded jar.

{quote}Another option is to change the artifact name of the current "hive-exec" pom. Then you'd publish the normal jar under the new artifact name, then have a separate module that imports that jar, shades dependencies, and publishes the result as "hive-exec". That would maintain compatibility with existing artifacts.{quote}

I can try this approach, but it is not a small change for Hive, and I'm not sure the Hive community will accept it (at least for branch 1.2).
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502129#comment-16502129 ] Steve Loughran commented on HIVE-16391: --- bq. The problem with that is that it changes the meaning of Hive's artifacts, so anybody currently importing hive-exec would see a breakage, and that's probably not desired. Probably true. Obviously, it's up to the Hive team, but yes, the "purist" approach is unshaded with a shaded option. One issue I recall from building that 1.2.1-spark JAR was that a very small bit of the Hive API used by Spark passed kryo objects around. Shading alone wasn't enough; we had to tweak the Hive source to import the previous kryo package so that everything was in sync. If that is now fixed through API changes or spark/hive version changes, life is simpler. Ideally: an API which doesn't pass shaded classes around. Where do things stand there?
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502035#comment-16502035 ] Marcelo Vanzin commented on HIVE-16391: --- bq. I'm not sure if there's a way to publish two pom files mapping to two different shaded jars I'm pretty sure that's not possible, unless they are two separate modules. I think the proper fix would be to change "hive-exec" to be the "normal" jar, with the pom published with all dependencies. Then you could have a different, shaded jar published with a classifier (or a separate module for that, if a separate pom is desired). The problem with that is that it changes the meaning of Hive's artifacts, so anybody currently importing hive-exec would see a breakage, and that's probably not desired. Another option is to change the artifact name of the current "hive-exec" pom. Then you'd publish the normal jar under the new artifact name, then have a separate module that imports that jar, shades dependencies, and publishes the result as "hive-exec". That would maintain compatibility with existing artifacts. But all that assumes that what Spark wants is the non-shaded hive-exec jar. Historically Hive and Spark have had different versions of a few shared libraries, and that approach might actually not work. For example, Kryo used to be different (not sure now). In that case, what Spark would really need is an even more shaded version of Hive, where all conflicting dependencies have been relocated in the hive-exec jar.
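The separate-module option described above could look roughly like the following pom. This is only a sketch: the {{hive-exec-unshaded}} artifact name and the relocation prefix are illustrative assumptions, not anything from Hive's actual build.

```xml
<!-- Hypothetical pom for a separate shading module: it imports the
     renamed plain jar and republishes a shaded result as "hive-exec",
     preserving compatibility for existing consumers. Names here are
     illustrative only. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.3</version>
  <dependencies>
    <!-- the plain, non-shaded jar published under a new artifact name -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec-unshaded</artifactId>
      <version>1.2.3</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
              <relocations>
                <!-- relocate conflicting classes into a private namespace -->
                <relocation>
                  <pattern>com.esotericsoftware.kryo</pattern>
                  <shadedPattern>org.apache.hive.com.esotericsoftware.kryo</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```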
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501667#comment-16501667 ] Saisai Shao commented on HIVE-16391: I see, thanks. Will upload the patch.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501662#comment-16501662 ] Rui Li commented on HIVE-16391: --- [~jerryshao], assigning this to you so you should have permission to upload.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501561#comment-16501561 ] Saisai Shao commented on HIVE-16391: It seems I don't have permission to upload a file.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501534#comment-16501534 ] Steve Loughran commented on HIVE-16391: --- Hive generally uses .patch files attached to the JIRA.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501415#comment-16501415 ] Saisai Shao commented on HIVE-16391: I'm not sure whether submitting a PR is the right way to get a review in the Hive community; waiting for feedback.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501408#comment-16501408 ] ASF GitHub Bot commented on HIVE-16391: --- GitHub user jerryshao opened a pull request: https://github.com/apache/hive/pull/364 HIVE-16391: Add a new classifier for hive-exec to be used by Spark This fix adds a new classifier for the hive-exec artifact (`core-spark`), specifically for use by Spark. Details in [SPARK-20202](https://issues.apache.org/jira/browse/SPARK-20202). The original hive-exec packages many transitive dependencies into its shaded jar without relocation, which causes conflicts in Spark. Spark only needs protobuf and kryo relocated, so this proposes a new classifier to generate a new artifact only for Spark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/hive 1.2-spark-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/364.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #364 commit bb27b260d82fa0a77d9fea3c123f2af8f1ea88aa Author: jerryshao Date: 2018-06-05T06:59:37Z HIVE-16391: Add a new classifier for hive-exec to be used by Spark
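A classified artifact like the one this PR describes could be produced by an additional maven-shade-plugin execution along these lines. This is a hedged sketch, not the actual patch: the execution id and the relocation prefixes are assumptions.

```xml
<!-- Sketch of an extra shade execution attaching a "core-spark"
     classified artifact that relocates only kryo and protobuf.
     The shadedPattern prefixes are assumptions. -->
<execution>
  <id>core-spark</id>
  <phase>package</phase>
  <goals><goal>shade</goal></goals>
  <configuration>
    <!-- attach the shaded jar under a classifier instead of
         replacing the main artifact -->
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>core-spark</shadedClassifierName>
    <relocations>
      <relocation>
        <pattern>com.esotericsoftware.kryo</pattern>
        <shadedPattern>org.apache.hive.com.esotericsoftware.kryo</shadedPattern>
      </relocation>
      <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.apache.hive.com.google.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</execution>
```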
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501285#comment-16501285 ] Saisai Shao commented on HIVE-16391: Hi [~joshrosen] I'm trying to make the Hive changes you mentioned above using the new classifier {{core-spark}}. I found one problem with releasing two shaded jars (one is hive-exec, the other is hive-exec-core-spark). The published pom file is still the dependency-reduced pom for hive-exec, so when Spark uses the hive-exec-core-spark jar, it has to explicitly declare all of hive-exec's transitive dependencies. I'm not sure whether there's a way to publish two pom files mapping to two different shaded jars, or whether it is acceptable for Spark to explicitly declare all the transitive dependencies, like with the {{core}} classifier you used before?
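For context, the dependency-reduced POM behavior discussed above comes from a single maven-shade-plugin flag. This fragment just illustrates the relevant setting; whether flipping it is acceptable for the existing hive-exec artifact is exactly the compatibility question raised in this thread.

```xml
<!-- maven-shade-plugin configuration fragment.
     true (the default) publishes a POM with the bundled dependencies
     stripped out; false keeps the full dependency list in the
     published POM. -->
<configuration>
  <createDependencyReducedPom>false</createDependencyReducedPom>
</configuration>
```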
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385539#comment-16385539 ] Saisai Shao commented on HIVE-16391: Hi all, do we have any progress on this? Spark currently uses the forked Hive 1.2.1.spark2, which blocks Hadoop 3.0 support (SPARK-18673). We could patch the forked Hive 1.2.1.spark2 to support Hadoop 3, but the proper solution seems to be to maintain this in Hive as discussed (SPARK-20202) and fix it in the Hive community.
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032497#comment-16032497 ] Josh Rosen commented on HIVE-16391: --- I tried to see whether Spark can consume existing Hive 1.2.1 artifacts, but it looks like neither the regular nor {{core}} hive-exec artifacts can work: * We can't use the regular Hive uber-JAR artifacts because they include many transitive dependencies but do not relocate those dependencies' classes into a private namespace, so this will cause multiple versions of the same class to be included on the classpath. To see this, note the long list of artifacts at https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml#L685 but only one relocation pattern (for Kryo). * We can't use the {{core}}-classified artifact: ** We actually need Kryo to be shaded in {{hive-exec}} because Spark now uses Kryo 3 (which is needed by Chill 0.8.x, which is needed for Scala 2.12) while Hive uses Kryo 2. ** In addition, I think that Spark needs to shade Hive's {{com.google.protobuf:protobuf-java}} dependency. ** The published {{hive-exec}} POM is a "dependency-reduced" POM which doesn't declare {{hive-exec}}'s transitive dependencies. To see this, compare the declared dependencies in the published POM in Maven Central (http://central.maven.org/maven2/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.pom) to the dependencies in the source repo's POM: https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml. The lack of declared dependencies creates an additional layer of pain for us when consuming the {{core}} JAR because we now have to shoulder the burden of declaring explicit dependencies on {{hive-exec}}'s transitive dependencies (since they're no longer bundled in an uber JAR when we use the {{core}} JAR), making it harder to use tools like Maven's {{dependency:tree}} to help us spot potential dependency conflicts.
Spark's current custom Hive fork is effectively making three changes compared to Hive 1.2.1 in order to work around the above problems plus some legacy issues which are no longer relevant: * Remove the shading/bundling of most non-Hive classes, with the exception of Kryo and Protobuf. This has the effect of making the published POM non-dependency-reduced, easing the dependency management story in Spark's POMs, while still ensuring that we relocate classes that conflict with Spark. * Package the hive-shims into the hive-exec JAR. I don't think that this is strictly necessary. * Downgrade Kryo to 2.21. This isn't necessary anymore: there was an earlier time where we purposely _unshaded_ Kryo and pinned Hive's version to match Spark's. The only reason this change is present today was to minimize the diff between versions 1 and 2 of Spark's Hive fork. For the full details, see https://github.com/apache/hive/compare/release-1.2.1...JoshRosen:release-1.2.1-spark2, which compares the current Version 2 of our Hive fork to stock Hive 1.2.1. Maven classifiers do not allow the declaration of different dependencies for artifacts depending on their classifiers, so if we wanted to publish a {{hive-exec core}}-like artifact which declares its transitive dependencies then this would need to be done under a new Maven artifact name or new version (e.g. Hive 1.2.2-spark). That said, proper declaration of transitive dependencies isn't a hard blocker for us: a long, long, long time ago, I think that Spark may have actually built with a stock {{core}} artifact and explicitly declared the transitive deps, so if we've handled that dependency declaration before then we can do it again at the cost of some pain in the future if we want to bump to Hive 2.x. Therefore, I think the minimal change needed in Hive's build is to add a new classifier, say {{core-spark}}, which behaves like {{core}} except that it shades and relocates Kryo and Protobuf.
If this artifact existed then I think Spark could use that classified artifact, declare an explicit dependency on the shim artifacts (assuming Kryo and Protobuf don't need to be shaded there) and explicitly pull in all of {{hive-exec}}'s transitive dependencies. This avoids the need to publish separate _versions_ for Spark: instead, Spark would just consume a differently-packaged/differently-classified version of a stock Hive release. If we go with this latter approach, then I guess Hive would need to publish 1.2.3 or 1.2.2.1 in order to introduce the new classified artifact. Does this sound like a reasonable approach? Or would it make more sense to have a separate Hive branch and versioning scheme for Spark (e.g. {{branch-1.2-spark}} and Hive {{1.2.1-spark}})? I lean towards the former approach (releasing 1.2.3 with an additional Spark-specific classifier), especially if we want to fix bugs or make functional / non-packaging changes later down the road (I think [~ste...@apache.org] had a few changes / fixes
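Under that approach, Spark's consumer-side declarations might look roughly like this. The version number and the particular transitive dependencies shown are illustrative assumptions only; the real list would come from hive-exec's full dependency tree.

```xml
<!-- Hypothetical Spark-side dependencies on the classified artifact.
     Because the published POM is dependency-reduced, each transitive
     dependency must be declared explicitly; only two examples are shown. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.3</version>
  <classifier>core-spark</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-shims</artifactId>
  <version>1.2.3</version>
</dependency>
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
```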
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962887#comment-15962887 ] Edward Capriolo commented on HIVE-16391: It is good to have someone submit a clean patch against trunk, and also one backported to your target version 1.2.2. I roughly get the use case, but you might want to state more specifically which submodules you are depending on and which dependencies are problematic. Potentially include the output of: mvn dependency:tree
[ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961920#comment-15961920 ] Edward Capriolo commented on HIVE-16391: Looking this over. This fork has 225 commits with 1265 files changed. As you mentioned, the fork includes other fixes that are now unnecessary. I think to move forward it would be good if someone submitted a patch/branch with only the changes needed. Does anyone wish to mark themselves as the assignee and do this work?