[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2020-10-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210869#comment-17210869
 ] 

Hyukjin Kwon commented on HIVE-16391:
-------------------------------------

SPARK-20202 is resolved now. Spark no longer uses the Hive 1.2 fork and no 
longer needs a 1.2.x release. I am tentatively resolving this ticket.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2020-01-09 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012436#comment-17012436
 ] 

Hive QA commented on HIVE-16391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12927784/HIVE-16391.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20138/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20138/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20138/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.051
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-20138/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.054
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at f8e583f HIVE-22709: NullPointerException during query 
compilation after HIVE-22578 (Jason Dere, reviewed by Prasanth Jayachandran)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at f8e583f HIVE-22709: NullPointerException during query 
compilation after HIVE-22578 (Jason Dere, reviewed by Prasanth Jayachandran)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2020-01-10 04:01:13.816
+ rm -rf ../yetus_PreCommit-HIVE-Build-20138
+ mkdir ../yetus_PreCommit-HIVE-Build-20138
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-20138
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20138/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Trying to apply the patch with -p0
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
Trying to apply the patch with -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-20138
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12927784 - PreCommit-HIVE-Build

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.

[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519931#comment-16519931
 ] 

Saisai Shao commented on HIVE-16391:


Gentle ping [~hagleitn]: would you please help review the currently proposed 
patch and suggest the next step? Thanks a lot.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514621#comment-16514621
 ] 

Hive QA commented on HIVE-16391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12927784/HIVE-16391.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/11814/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11814/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11814/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:25.805
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-11814/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:25.809
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 73ee8a1 HIVE-19837: Setting to have different default location 
for external tables (Jason Dere, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing itests/${project.basedir}/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 73ee8a1 HIVE-19837: Setting to have different default location 
for external tables (Jason Dere, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-06-16 02:04:27.106
+ rm -rf ../yetus_PreCommit-HIVE-Build-11814
+ mkdir ../yetus_PreCommit-HIVE-Build-11814
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-11814
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11814/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:671
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-11814
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12927784 - PreCommit-HIVE-Build

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.

[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-13 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511851#comment-16511851
 ] 

Saisai Shao commented on HIVE-16391:


I see. I can keep the "core" classifier and use another name for the new one. 
Will update the patch.

[~owen.omalley], would you please help review this patch, since you created a 
related Spark JIRA? Or could you point to someone in the Hive community who 
could review it? Thanks a lot.

 

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-13 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511747#comment-16511747
 ] 

Marcelo Vanzin commented on HIVE-16391:
----------------------------------------

It would be good to get comments from people on the Hive side here...

Your patch is removing the "hive-exec:core" artifact, right? And replacing it 
with "hive-exec-core", which is also a bit different. So technically it's a 
breaking change, even though I think it's more correct than the current state.
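
Concretely, the consumer-side difference would look roughly like this (the 
versions shown are illustrative):

{code:xml}
<!-- Before: the shaded "core" jar is a classifier on the hive-exec artifact -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.2</version>
  <classifier>core</classifier>
</dependency>

<!-- After: a separate hive-exec-core artifact with its own POM -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec-core</artifactId>
  <version>1.2.3</version>
</dependency>
{code}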

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506103#comment-16506103
 ] 

Hive QA commented on HIVE-16391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/11619/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11619/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11619/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12926720 - PreCommit-HIVE-Build

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-08 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506017#comment-16506017
 ] 

Steve Loughran commented on HIVE-16391:
----------------------------------------

I'm pleased to see the Kryo version stuff isn't an issue any more... what does 
the Hive team have to say here?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505607#comment-16505607
 ] 

Saisai Shao commented on HIVE-16391:


Any comment [~vanzin] [~ste...@apache.org]?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504514#comment-16504514
 ] 

Hive QA commented on HIVE-16391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12926720/HIVE-16391.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/11584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11584/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:20.640
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-11584/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:20.642
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at cfd5734 HIVE-19503: Create a test that checks for dropPartitions 
with directSql (Peter Vary, reviewed by Vihang Karajgaonkar)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at cfd5734 HIVE-19503: Create a test that checks for dropPartitions 
with directSql (Peter Vary, reviewed by Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-06-07 10:41:21.662
+ rm -rf ../yetus_PreCommit-HIVE-Build-11584
+ mkdir ../yetus_PreCommit-HIVE-Build-11584
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-11584
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11584/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/pom.xml: does not exist in index
error: a/ql/pom.xml: does not exist in index
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:648
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: pom.xml:44
Falling back to three-way merge...
Applied patch to 'pom.xml' cleanly.
error: patch failed: ql/pom.xml:648
Falling back to three-way merge...
Applied patch to 'ql/pom.xml' with conflicts.
U ql/pom.xml
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-11584
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12926720 - PreCommit-HIVE-Build

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.

[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503188#comment-16503188
 ] 

Saisai Shao commented on HIVE-16391:


Uploaded a new patch [^HIVE-16391.1.patch] to use the solution mentioned by 
Marcelo.

It simply adds two new Maven modules and renames the original "hive-exec" 
module. One added module is the new "hive-exec", which stays compatible with 
existing Hive; the other added module, "hive-exec-spark", is specifically for 
Spark.
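
As a rough sketch only (the layout here is an assumption for illustration, not 
necessarily what the patch does), the parent POM would gain entries along 
these lines:

{code:xml}
<!-- Hypothetical module layout: the renamed real module plus two shading modules -->
<modules>
  <module>ql</module>               <!-- builds the unshaded classes under a new artifact name -->
  <module>hive-exec</module>        <!-- shades/bundles dependencies exactly as today's hive-exec -->
  <module>hive-exec-spark</module>  <!-- shades only Kryo and Protobuf, for Spark -->
</modules>
{code}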

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502976#comment-16502976
 ] 

Saisai Shao commented on HIVE-16391:


[~vanzin], one problem with your proposed solution: the hive-exec test jar is 
no longer valid, because we changed the artifact name of the current 
"hive-exec" pom. This might affect users who rely on that test jar.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502756#comment-16502756
 ] 

Saisai Shao commented on HIVE-16391:


{quote}The problem with that is that it changes the meaning of Hive's 
artifacts, so anybody currently importing hive-exec would see a breakage, and 
that's probably not desired.
{quote}
 
This might not be acceptable to the Hive community, because it will break 
current users, as you mentioned.

As [~joshrosen] mentioned, Spark wants a hive-exec jar which shades Kryo and 
protobuf-java, not a purely non-shaded jar.
{quote}Another option is to change the artifact name of the current "hive-exec" 
pom. Then you'd publish the normal jar under the new artifact name, then have a 
separate module that imports that jar, shades dependencies, and publishes the 
result as "hive-exec". That would maintain compatibility with existing 
artifacts.
{quote}
I can try this approach, but it is not a small change for Hive, and I'm not 
sure whether the Hive community will accept such an approach (at least for 
branch 1.2).

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502129#comment-16502129
 ] 

Steve Loughran commented on HIVE-16391:
----------------------------------------

bq. The problem with that is that it changes the meaning of Hive's artifacts, 
so anybody currently importing hive-exec would see a breakage, and that's 
probably not desired.

probably true.

Obviously, it's up to the Hive team, but yes, the "purist" approach is an 
unshaded artifact with a shaded option.

One issue I recall from building that 1.2.1-spark JAR was that a very small bit 
of the Hive API used by Spark passed Kryo objects around. Shading alone wasn't 
enough; we had to tweak the Hive source to import the matching Kryo package so 
that it all stayed in sync. If that is now fixed through API changes or 
Spark/Hive version changes, life is simpler. Ideally: an API which doesn't 
pass shaded classes around.

Where do things stand there?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502035#comment-16502035
 ] 

Marcelo Vanzin commented on HIVE-16391:
----------------------------------------

bq. I'm not sure if there's a way to publish two pom files mapping to two 
different shaded jars

I'm pretty sure that's not possible, unless they are two separate modules.

I think the proper fix would be to change "hive-exec" to be the "normal" jar, 
with the pom published with all dependencies. Then you could have a different, 
shaded jar published with a classifier (or a separate module for that, if a 
separate pom is desired).

The problem with that is that it changes the meaning of Hive's artifacts, so 
anybody currently importing hive-exec would see a breakage, and that's probably 
not desired.

Another option is to change the artifact name of the current "hive-exec" pom. 
Then you'd publish the normal jar under the new artifact name, then have a 
separate module that imports that jar, shades dependencies, and publishes the 
result as "hive-exec". That would maintain compatibility with existing 
artifacts.
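
A minimal sketch of that second option (all names here are illustrative 
assumptions, not from an actual patch): a separate module that depends on the 
renamed artifact, shades it, and republishes the result under the old 
"hive-exec" coordinates.

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical shading module: republishes the renamed artifact as hive-exec -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive</artifactId>
    <version>1.2.3-SNAPSHOT</version>
  </parent>
  <artifactId>hive-exec</artifactId>

  <dependencies>
    <dependency>
      <!-- assumed new name for the module holding the real code -->
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec-core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <!-- bundle and relocate the same dependencies today's hive-exec bundles -->
              <relocations>
                <relocation>
                  <pattern>com.esotericsoftware.kryo</pattern>
                  <shadedPattern>org.apache.hive.com.esotericsoftware.kryo</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
{code}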

But all that assumes that what Spark wants is the non-shaded hive-exec jar. 
Historically Hive and Spark have had different dependencies for a few 
libraries, and that approach might actually not work. For example, Kryo used to 
be different (not sure now). In that case, what Spark would really need is an 
even more shaded version of Hive, where all conflicting dependencies have been 
relocated in the hive-exec jar.


> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501667#comment-16501667
 ] 

Saisai Shao commented on HIVE-16391:


I see, thanks. Will upload the patch.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501662#comment-16501662
 ] 

Rui Li commented on HIVE-16391:
----------------------------------------

[~jerryshao], I'm assigning this to you, so you should now have permission to 
upload.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501561#comment-16501561
 ] 

Saisai Shao commented on HIVE-16391:


It seems I don't have permission to upload a file.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501534#comment-16501534
 ] 

Steve Loughran commented on HIVE-16391:
----------------------------------------

Hive generally uses .patch files attached to the JIRA.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501415#comment-16501415
 ] 

Saisai Shao commented on HIVE-16391:


I'm not sure whether submitting a PR is the right way to get a review in the 
Hive community; waiting for feedback.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501408#comment-16501408
 ] 

ASF GitHub Bot commented on HIVE-16391:
----------------------------------------

GitHub user jerryshao opened a pull request:

https://github.com/apache/hive/pull/364

HIVE-16391: Add a new classifier for hive-exec to be used by Spark

This fix adds a new classifier for the hive-exec artifact (`core-spark`), 
which is specifically used by Spark. Details in 
[SPARK-20202](https://issues.apache.org/jira/browse/SPARK-20202).

This is because the original hive-exec packages many transitive dependencies 
into the shaded jar without relocation, which causes conflicts in Spark. Spark 
only needs protobuf and kryo relocated, so this proposes adding a new 
classifier to generate a new artifact only for Spark.
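
A minimal sketch of what such an extra shade execution in ql/pom.xml could 
look like; the include list and relocation patterns here are illustrative 
assumptions (see the pull request for the actual configuration):

{code:xml}
<!-- Sketch: second shade execution producing the core-spark classified jar -->
<execution>
  <id>build-exec-core-spark</id>
  <phase>package</phase>
  <goals>
    <goal>shade</goal>
  </goals>
  <configuration>
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>core-spark</shadedClassifierName>
    <artifactSet>
      <includes>
        <!-- like "core": no uber jar, only the pieces Spark needs relocated -->
        <include>com.esotericsoftware.kryo:kryo</include>
        <include>com.google.protobuf:protobuf-java</include>
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <pattern>com.esotericsoftware.kryo</pattern>
        <shadedPattern>org.apache.hive.com.esotericsoftware.kryo</shadedPattern>
      </relocation>
      <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.apache.hive.com.google.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</execution>
{code}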



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/hive 1.2-spark-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #364


commit bb27b260d82fa0a77d9fea3c123f2af8f1ea88aa
Author: jerryshao 
Date:   2018-06-05T06:59:37Z

HIVE-16391: Add a new classifier for hive-exec to be used by Spark




> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>  Labels: pull-request-available
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501285#comment-16501285
 ] 

Saisai Shao commented on HIVE-16391:


Hi [~joshrosen], I'm trying to make the Hive changes you mentioned above, 
using the new classifier {{core-spark}}. I found one problem with releasing 
two shaded jars (one hive-exec, the other hive-exec-core-spark): the published 
pom is still the dependency-reduced pom for hive-exec, so when Spark uses the 
hive-exec-core-spark jar, it has to explicitly declare all of hive-exec's 
transitive dependencies.

I'm not sure whether there's a way to publish two pom files mapping to two 
different shaded jars, or whether it is acceptable for Spark to explicitly 
declare all the transitive dependencies, as with the {{core}} classifier you 
used before?
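
For background: the dependency-reduced POM is produced by the 
maven-shade-plugin per module, not per classified artifact, which is why the 
two shaded jars cannot each carry their own pom. The plugin flag that controls 
it (a real shade-plugin option, shown here in isolation):

{code:xml}
<!-- In the module's shade plugin configuration; defaults to true. The reduced
     POM replaces the module's published POM, so it applies to every artifact
     (main and classified) that the module publishes. -->
<configuration>
  <createDependencyReducedPom>true</createDependencyReducedPom>
</configuration>
{code}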

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-03-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385539#comment-16385539
 ] 

Saisai Shao commented on HIVE-16391:


Hi all,

Do we have any progress on this? Spark currently uses the forked Hive 
1.2.1.spark2, which blocks Hadoop 3.0 support (SPARK-18673). We could patch 
the forked Hive 1.2.1.spark2 to support Hadoop 3, but the proper solution 
seems to be to maintain this in Hive, as discussed in SPARK-20202, and fix it 
in the Hive community.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2017-05-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032497#comment-16032497
 ] 

Josh Rosen commented on HIVE-16391:
----------------------------------------

I tried to see whether Spark can consume existing Hive 1.2.1 artifacts, but it 
looks like neither the regular nor {{core}} hive-exec artifacts can work:

* We can't use the regular Hive uber-JAR artifacts because they include many 
transitive dependencies but do not relocate those dependencies' classes into a 
private namespace, so this will cause multiple versions of the same class to be 
included on the classpath. To see this, note the long list of artifacts at 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml#L685 but there is 
only one relocation pattern (for Kryo).
* We can't use the {{core}}-classified artifact:
** We actually need Kryo to be shaded in {{hive-exec}} because Spark now uses 
Kryo 3 (which is needed by Chill 0.8.x, which is needed for Scala 2.12) while 
Hive uses Kryo 2.
** In addition, I think that Spark needs to shade Hive's 
{{com.google.protobuf:protobuf-java}} dependency.
** The published {{hive-exec}} POM is a "dependency-reduced" POM which doesn't 
declare {{hive-exec}}'s transitive dependencies. To see this, compare the 
declared dependencies in the published POM in Maven Central 
(http://central.maven.org/maven2/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.pom)
to the dependencies in the source repo's POM: 
https://github.com/apache/hive/blob/release-1.2.1/ql/pom.xml. The lack of 
declared dependencies creates an additional layer of pain for us when consuming 
the {{core}} JAR because we now have to shoulder the burden of declaring 
explicit dependencies on {{hive-exec}}'s transitive dependencies (since they're 
no longer bundled in an uber JAR when we use the {{core}} JAR), making it 
harder to use tools like Maven's {{dependency:tree}} to help us spot potential 
dep. conflicts.

Spark's current custom Hive fork effectively makes three changes compared to 
Hive 1.2.1 in order to work around the above problems, plus some legacy issues 
which are no longer relevant:

* Remove the shading/bundling of most non-Hive classes, with the exception of 
Kryo and Protobuf. This has the effect of making the published POM 
non-dependency-reduced, easing the dep. management story in Spark's POMs, while 
still ensuring that we relocate classes that conflict with Spark.
* Package the hive-shims into the hive-exec JAR. I don't think that this is 
strictly necessary.
* Downgrade Kryo to 2.21. This isn't necessary anymore: there was an earlier 
time when we purposely _unshaded_ Kryo and pinned Hive's version to match 
Spark's. The only reason this change is still present today is to minimize the 
diff between versions 1 and 2 of Spark's Hive fork.

For the full details, see 
https://github.com/apache/hive/compare/release-1.2.1...JoshRosen:release-1.2.1-spark2,
 which compares the current Version 2 of our Hive fork to stock Hive 1.2.1.

Maven classifiers do not allow the declaration of different dependencies for 
artifacts depending on their classifiers, so if we wanted to publish a 
{{hive-exec core}}-like artifact which declares its transitive dependencies 
then this would need to be done under a new Maven artifact name or new version 
(e.g. Hive 1.2.2-spark).

That said, proper declaration of transitive dependencies isn't a hard blocker 
for us: a long, long, long time ago, I think that Spark may have actually built 
with a stock {{core}} artifact and explicitly declared the transitive deps, so 
if we've handled that dependency declaration before then we can do it again at 
the cost of some pain in the future if we want to bump to Hive 2.x.

Therefore, I think the minimal change needed in Hive's build is to add a new 
classifier, say {{core-spark}}, which behaves like {{core}} except that it 
shades and relocates Kryo and Protobuf. If this artifact existed then I think 
Spark could use that classified artifact, declare an explicit dependency on the 
shim artifacts (assuming Kryo and Protobuf don't need to be shaded there) and 
explicitly pull in all of {{hive-exec}}'s transitive dependencies. This avoids 
the need to publish separate _versions_ for Spark: instead, Spark would just 
consume a differently-packaged/differently-classified version of a stock Hive 
release.
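
Under that approach, Spark's POM would look roughly like the following sketch 
(the version and the set of explicitly declared transitive dependencies are 
illustrative):

{code:xml}
<!-- Consume the classified artifact... -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.3</version>
  <classifier>core-spark</classifier>
</dependency>
<!-- ...plus explicit declarations of its transitive deps, e.g. the shims -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-shims</artifactId>
  <version>1.2.3</version>
</dependency>
{code}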

If we go with this latter approach, then I guess Hive would need to publish 
1.2.3 or 1.2.2.1 in order to introduce the new classified artifact.

Does this sound like a reasonable approach? Or would it make more sense to have 
a separate Hive branch and versioning scheme for Spark (e.g. 
{{branch-1.2-spark}} and Hive {{1.2.1-spark}})? I lean towards the former 
approach (releasing 1.2.3 with an additional Spark-specific classifier), 
especially if we want to fix bugs or make functional / non-packaging changes 
later down the road (I think [~ste...@apache.org] had a few changes / fixes 

[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2017-04-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962887#comment-15962887
 ] 

Edward Capriolo commented on HIVE-16391:


It would be good to have someone submit a clean patch against trunk, and also 
one backported to your target version 1.2.2. I roughly get the use case, but 
you might want to state more specifically which submodules you depend on and 
what the problematic dependencies are.

Potentially include the output of: mvn dependency:tree



> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2017-04-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961920#comment-15961920
 ] 

Edward Capriolo commented on HIVE-16391:


Looking this over. This fork has 225 commits with 1265 files changed. As you 
mentioned, the fork includes other fixes that are now unnecessary. To move 
forward, it would be good if someone submitted a patch/branch with only the 
changes needed. Does anyone wish to mark themselves as the assignee and do 
this work?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)