Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Denys Kuzmenko
Would we fix the problem by relocating just guava and joda-time? 
Here is how it's done in Impala:
https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L70-L77
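
For readers unfamiliar with relocation, here is a minimal sketch of what relocating only those two libraries could look like with the maven-shade-plugin. It is not copied from the Hive or Impala build: the `org.apache.hive.shaded` prefix is a hypothetical name chosen for illustration, and a real configuration would also need the artifact include/exclude rules of the module it lives in.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Guava: rewrite com.google.common.* into a private namespace.
               The target prefix below is an assumption, not Hive's actual one. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.shaded.com.google.common</shadedPattern>
          </relocation>
          <!-- Joda-Time: same idea for org.joda.time.* -->
          <relocation>
            <pattern>org.joda.time</pattern>
            <shadedPattern>org.apache.hive.shaded.org.joda.time</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With relocations like these, the bundled copies of the two libraries no longer share package names with whatever versions a downstream project puts on its own classpath.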
 


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Sourabh Badhya
+1. Multiple projects will benefit from this.

Thanks Simhadri for driving this discussion.

Regards,
Sourabh Badhya


Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Stamatis Zampetakis
I shared the reasons behind the removal of the jar and my concerns around
bringing it back. I'm still not convinced that it's needed but if the rest
of the community feels that it's the right path forward then I am ok with
this.

Best,
Stamatis

On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena  wrote:

> Stamatis,
> Isn't the removal itself an incompatible change? There are a lot of
> projects using it & we suddenly removed a jar because there were some
> people not sure how to properly use it and were complaining about it.
>
> What about the projects which are now stuck? Reading the thread at [1],
> there were promises made that everything would be relocated and sorted
> before the release, but we couldn't do it. AFAIK it isn't a trivial task to
> relocate all the dependencies.
>
> As I see, @Chao Sun even raised concerns [2] that the removal just
> blocks downstream projects from upgrading, and it got countered with the
> claim that the folks chasing the removal would help get all the dependencies
> relocated or solve the issues for downstream. I think no one volunteered.
>
> I would recommend either:
> * Best case, we relocate all the dependencies present in hive-exec, not
> just one or two. Somebody volunteers to raise one PR relocating "all",
> we commit that, and we should be sorted.
> * Restore the core jar, because a lot of projects depend on it and the
> removal itself was incompatible. I don't think the removal had a clear
> community agreement; it was a conditional agreement whose conditions I
> don't think were met, so we should roll back.
>
> On a lighter note, we might release with some 5000+ commits, with best
> performance or so, but if nobody is able to consume those release bits, I
> think those efforts are just wasted. Eventually people will just
> stick to their older versions and not even try to upgrade, & we will be
> releasing for nobody or maybe for a few folks who have only Hive in
> their stack (I don't know if there are folks like that). No matter how good
> a product is, if people don't use it, it is gonna die :-(
>
>
> I think we have a ticket which talks about relocating all dependencies. I
> agree we should drop the core jar for sure, since it leads to all the problems
> Stamatis mentioned, but let's restore the core jar & drop it when
> that relocation ticket is resolved. Does that sound convincing, or even
> worth a thought?
>
> btw. having jars with a set of dependencies shaded and other ones unshaded
> is done in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster),
> & such problems keep coming from users, e.g. [3]
>
> Anyone else, any thoughts?
>
> -Ayush
>
> [1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
> [2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
> [3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x
>
>
>
> On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis 
> wrote:
>
>> Hey Simhadri, thanks for starting this discussion.
>>
>> Maven has many limitations when it comes to publishing multiple
>> artifacts from the same module. In most cases, the end result is
>> broken and hard to use. The pom file that is published for a given
>> module is not able to describe correctly all artifacts of the module
>> and that's why there is one main artifact for every module; dependency
>> declarations are usually correct for the main artifact but are not
>> representative for the rest.
>>
>> For example, end-users who consume the hive-exec core artifact tend to
>> think that Maven will automatically resolve all transitive
>> dependencies and things will work as usual, which is not the case. In
>> the past, this kind of assumption created a lot of confusion among
>> consumers of the hive-exec core jar, with tickets and open debates that
>> spanned multiple months. The discussions even reached a point
>> where people requested certain features of Hive to be reverted in
>> order to rectify some things around transitive dependencies and the
>> core jar.
>>
>> I think we should stick to the usual maven convention and just publish
>> one artifact for each module. Adding back and claiming to support the
>> "core" jar is a step backwards that just postpones the real problems
>> that we need to tackle.
>>
>> Furthermore, I don't think that the hive-exec module was ever meant to
>> be used as a dependency. This is mainly an application module and not
>> a library module and that's why shading takes place. Clearly some
>> parts from hive-exec could be considered to become a library and that
>> would be a promising direction going forward (splitting hive-exec into
>> other modules) but a bit outside the scope of the current discussion.
>>
>> From the issues outlined above the only actionable item that I see
>> concerns the joda library so we could try to simply relocate it if it
>> is causing issues.
>>
>> Finally, if someone wants to create a jar with specific contents from
>> the hive-exec module it is rather easy to do