Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Kent Yao
+1

Kent Yao

On 2024/04/30 09:07:21 Yuming Wang wrote:
> +1
> 
> On Tue, Apr 30, 2024 at 3:31 PM Ye Xianjin  wrote:
> 
> > +1
> > Sent from my iPhone
> >
> > On Apr 30, 2024, at 3:23 PM, DB Tsai  wrote:
> >
> > 
> > +1
> >
> > On Apr 29, 2024, at 8:01 PM, Wenchen Fan  wrote:
> >
> > 
> > To add more color:
> >
> > Spark data source table and Hive Serde table are both stored in the Hive
> > metastore and keep the data files in the table directory. The only
> > difference is they have different "table provider", which means Spark will
> > use different reader/writer. Ideally the Spark native data source
> > reader/writer is faster than the Hive Serde ones.
> >
> > What's more, the default format of Hive Serde is text. I don't think
> > people want to use text format tables in production. Most people will add
> > `STORED AS parquet` or `USING parquet` explicitly. By setting this config
> > to false, we have a more reasonable default behavior: creating Parquet
> > tables (or whatever is specified by `spark.sql.sources.default`).
> >
> > On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:
> >
> >> @Mich Talebzadeh  there seems to be a
> >> misunderstanding here. The Spark native data source table is still stored
> >> in the Hive metastore, it's just that Spark will use a different (and
> >> faster) reader/writer for it. `hive-site.xml` should work as it is today.
> >>
> >> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon 
> >> wrote:
> >>
> >>> +1
> >>>
> >>> It's a legacy conf that we should eventually remove. Spark
> >>> should create Spark tables by default, not Hive tables.
> >>>
> >>> Mich, for your workload, you can simply switch that conf off if it
> >>> concerns you. We also enabled ANSI (which you agreed on). It's a bit
> >>> awkward to stop in the middle for this compatibility reason while making
> >>> Spark sound. The compatibility has been tested in production for a long
> >>> time so I don't see any particular issue about the compatibility case you
> >>> mentioned.
> >>>
> >>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <
> >>> mich.talebza...@gmail.com> wrote:
> >>>
> 
>  Hi @Wenchen Fan 
> 
>  Thanks for your response. I believe we have not had enough time to
>  "DISCUSS" this matter.
> 
>  Currently in order to make Spark take advantage of Hive, I create a
>  soft link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is
>  3.1.1
> 
>   /opt/spark/conf/hive-site.xml ->
>  /data6/hduser/hive-3.1.1/conf/hive-site.xml
> 
>  This works fine for me in my lab. So in the future if we opt to use the
>  setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
>  not be a need for this logical link.?
>  On the face of it, this looks fine but in real life it may require a
>  number of changes to the old scripts. Hence my concern.
>  As a matter of interest has anyone liaised with the Hive team to ensure
>  they have introduced the additional changes you outlined?
> 
>  HTH
> 
>  Mich Talebzadeh,
>  Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>  London
>  United Kingdom
> 
> 
> view my Linkedin profile
>  
> 
> 
>   https://en.everybodywiki.com/Mich_Talebzadeh
> 
> 
> 
>  *Disclaimer:* The information provided is correct to the best of my
>  knowledge but of course cannot be guaranteed . It is essential to note
>  that, as with any advice, quote "one test result is worth one-thousand
>  expert opinions (Werner
>  Von Braun
>  )".
> 
> 
>  On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:
> 
> > @Mich Talebzadeh  thanks for sharing your
> > concern!
> >
> > Note: creating Spark native data source tables is usually Hive
> > compatible as well, unless we use features that Hive does not support
> > (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
> > create Spark native table in this case, instead of creating Hive table 
> > and
> > fail.
> >
> > On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Cheng Pan
> >>
> >> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
> >> wrote:
> >> >
> >> > +1
> >> >
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Books (Learning Spark, High Performance Spark, etc.):
> >> https://amzn.to/2MaRAG9
> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> >
> >> >
> >> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh 
> >> wrote:
> >> >>
> >> >> +1
> >> >>
> >> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun <
> >> dongj...@apache.org> 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Yuming Wang
+1

On Tue, Apr 30, 2024 at 3:31 PM Ye Xianjin  wrote:

> +1
> Sent from my iPhone
>
> On Apr 30, 2024, at 3:23 PM, DB Tsai  wrote:
>
> 
> +1
>
> On Apr 29, 2024, at 8:01 PM, Wenchen Fan  wrote:
>
> 
> To add more color:
>
> Spark data source table and Hive Serde table are both stored in the Hive
> metastore and keep the data files in the table directory. The only
> difference is they have different "table provider", which means Spark will
> use different reader/writer. Ideally the Spark native data source
> reader/writer is faster than the Hive Serde ones.
>
> What's more, the default format of Hive Serde is text. I don't think
> people want to use text format tables in production. Most people will add
> `STORED AS parquet` or `USING parquet` explicitly. By setting this config
> to false, we have a more reasonable default behavior: creating Parquet
> tables (or whatever is specified by `spark.sql.sources.default`).
>
> On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:
>
>> @Mich Talebzadeh  there seems to be a
>> misunderstanding here. The Spark native data source table is still stored
>> in the Hive metastore, it's just that Spark will use a different (and
>> faster) reader/writer for it. `hive-site.xml` should work as it is today.
>>
>> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
> >>> It's a legacy conf that we should eventually remove. Spark
> >>> should create Spark tables by default, not Hive tables.
> >>>
> >>> Mich, for your workload, you can simply switch that conf off if it
> >>> concerns you. We also enabled ANSI (which you agreed on). It's a bit
> >>> awkward to stop in the middle for this compatibility reason while making
> >>> Spark sound. The compatibility has been tested in production for a long
>>> time so I don't see any particular issue about the compatibility case you
>>> mentioned.
>>>
>>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>

 Hi @Wenchen Fan 

 Thanks for your response. I believe we have not had enough time to
 "DISCUSS" this matter.

 Currently in order to make Spark take advantage of Hive, I create a
 soft link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is
 3.1.1

  /opt/spark/conf/hive-site.xml ->
 /data6/hduser/hive-3.1.1/conf/hive-site.xml

 This works fine for me in my lab. So in the future if we opt to use the
 setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
 not be a need for this logical link.?
 On the face of it, this looks fine but in real life it may require a
 number of changes to the old scripts. Hence my concern.
 As a matter of interest has anyone liaised with the Hive team to ensure
 they have introduced the additional changes you outlined?

 HTH

 Mich Talebzadeh,
 Technologist | Architect | Data Engineer  | Generative AI | FinCrime
 London
 United Kingdom


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* The information provided is correct to the best of my
 knowledge but of course cannot be guaranteed . It is essential to note
 that, as with any advice, quote "one test result is worth one-thousand
 expert opinions (Werner
 Von Braun
 )".


 On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:

> @Mich Talebzadeh  thanks for sharing your
> concern!
>
> Note: creating Spark native data source tables is usually Hive
> compatible as well, unless we use features that Hive does not support
> (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
> create Spark native table in this case, instead of creating Hive table and
> fail.
>
> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>> Cheng Pan
>>
>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
>> wrote:
>> >
>> > +1
>> >
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >
>> >
>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun <
>> dongj...@apache.org> wrote:
>> >> >
>> >> > I'll start with my +1.
>> >> >
>> >> > Dongjoon.
>> >> >
>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> >> > > Please vote on SPARK-46122 to set
>> spark.sql.legacy.createHiveTableByDefault
>> >> > > to `false` by default. The technical scope is defined in the
>> 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Nimrod Ofek
+1 (non-binding)

P.S.
How do I become a binding voter?

Thanks,
Nimrod

On Tue, Apr 30, 2024 at 10:53 AM Ye Xianjin  wrote:

> +1
> Sent from my iPhone
>
> On Apr 30, 2024, at 3:23 PM, DB Tsai  wrote:
>
> 
> +1
>
> On Apr 29, 2024, at 8:01 PM, Wenchen Fan  wrote:
>
> 
> To add more color:
>
> Spark data source table and Hive Serde table are both stored in the Hive
> metastore and keep the data files in the table directory. The only
> difference is they have different "table provider", which means Spark will
> use different reader/writer. Ideally the Spark native data source
> reader/writer is faster than the Hive Serde ones.
>
> What's more, the default format of Hive Serde is text. I don't think
> people want to use text format tables in production. Most people will add
> `STORED AS parquet` or `USING parquet` explicitly. By setting this config
> to false, we have a more reasonable default behavior: creating Parquet
> tables (or whatever is specified by `spark.sql.sources.default`).
>
> On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:
>
>> @Mich Talebzadeh  there seems to be a
>> misunderstanding here. The Spark native data source table is still stored
>> in the Hive metastore, it's just that Spark will use a different (and
>> faster) reader/writer for it. `hive-site.xml` should work as it is today.
>>
>> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
> >>> It's a legacy conf that we should eventually remove. Spark
> >>> should create Spark tables by default, not Hive tables.
> >>>
> >>> Mich, for your workload, you can simply switch that conf off if it
> >>> concerns you. We also enabled ANSI (which you agreed on). It's a bit
> >>> awkward to stop in the middle for this compatibility reason while making
> >>> Spark sound. The compatibility has been tested in production for a long
>>> time so I don't see any particular issue about the compatibility case you
>>> mentioned.
>>>
>>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>

 Hi @Wenchen Fan 

 Thanks for your response. I believe we have not had enough time to
 "DISCUSS" this matter.

 Currently in order to make Spark take advantage of Hive, I create a
 soft link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is
 3.1.1

  /opt/spark/conf/hive-site.xml ->
 /data6/hduser/hive-3.1.1/conf/hive-site.xml

 This works fine for me in my lab. So in the future if we opt to use the
 setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
 not be a need for this logical link.?
 On the face of it, this looks fine but in real life it may require a
 number of changes to the old scripts. Hence my concern.
 As a matter of interest has anyone liaised with the Hive team to ensure
 they have introduced the additional changes you outlined?

 HTH

 Mich Talebzadeh,
 Technologist | Architect | Data Engineer  | Generative AI | FinCrime
 London
 United Kingdom


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* The information provided is correct to the best of my
 knowledge but of course cannot be guaranteed . It is essential to note
 that, as with any advice, quote "one test result is worth one-thousand
 expert opinions (Werner
 Von Braun
 )".


 On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:

> @Mich Talebzadeh  thanks for sharing your
> concern!
>
> Note: creating Spark native data source tables is usually Hive
> compatible as well, unless we use features that Hive does not support
> (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
> create Spark native table in this case, instead of creating Hive table and
> fail.
>
> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>> Cheng Pan
>>
>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
>> wrote:
>> >
>> > +1
>> >
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >
>> >
>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun <
>> dongj...@apache.org> wrote:
>> >> >
>> >> > I'll start with my +1.
>> >> >
>> >> > Dongjoon.
>> >> >
>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> >> > > Please vote on SPARK-46122 to set
>> spark.sql.legacy.createHiveTableByDefault
>> >> > > to `false` by 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread XiDuo You
+1

Dongjoon Hyun  于2024年4月27日周六 03:50写道:
>
> Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
> to `false` by default. The technical scope is defined in the following PR.
>
> - DISCUSSION: https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> - PR: https://github.com/apache/spark/pull/46207
>
> The vote is open until April 30th 1AM (PST) and passes
> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
>
> Thank you in advance.
>
> Dongjoon

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Ye Xianjin
+1
Sent from my iPhone

On Apr 30, 2024, at 3:23 PM, DB Tsai  wrote:

+1

On Apr 29, 2024, at 8:01 PM, Wenchen Fan  wrote:

To add more color:

Spark data source table and Hive Serde table are both stored in the Hive
metastore and keep the data files in the table directory. The only
difference is they have different "table provider", which means Spark will
use different reader/writer. Ideally the Spark native data source
reader/writer is faster than the Hive Serde ones.

What's more, the default format of Hive Serde is text. I don't think
people want to use text format tables in production. Most people will add
`STORED AS parquet` or `USING parquet` explicitly. By setting this config
to false, we have a more reasonable default behavior: creating Parquet
tables (or whatever is specified by `spark.sql.sources.default`).

On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:

@Mich Talebzadeh there seems to be a misunderstanding here. The Spark
native data source table is still stored in the Hive metastore, it's just
that Spark will use a different (and faster) reader/writer for it.
`hive-site.xml` should work as it is today.

On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon  wrote:

+1

It's a legacy conf that we should eventually remove. Spark should create
Spark tables by default, not Hive tables.

Mich, for your workload, you can simply switch that conf off if it
concerns you. We also enabled ANSI (which you agreed on). It's a bit
awkward to stop in the middle for this compatibility reason while making
Spark sound. The compatibility has been tested in production for a long
time so I don't see any particular issue about the compatibility case you
mentioned.

On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh  wrote:

Hi @Wenchen Fan

Thanks for your response. I believe we have not had enough time to
"DISCUSS" this matter.

Currently in order to make Spark take advantage of Hive, I create a soft
link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is 3.1.1

 /opt/spark/conf/hive-site.xml -> /data6/hduser/hive-3.1.1/conf/hive-site.xml

This works fine for me in my lab. So in the future if we opt to use the
setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
not be a need for this logical link?
On the face of it, this looks fine but in real life it may require a
number of changes to the old scripts. Hence my concern.
As a matter of interest has anyone liaised with the Hive team to ensure
they have introduced the additional changes you outlined?

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

   view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

 Disclaimer: The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".

On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:

@Mich Talebzadeh thanks for sharing your concern!

Note: creating Spark native data source tables is usually Hive compatible
as well, unless we use features that Hive does not support (TIMESTAMP NTZ,
ANSI INTERVAL, etc.). I think it's a better default to create Spark native
table in this case, instead of creating Hive table and fail.

On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:

+1 (non-binding)

Thanks,
Cheng Pan

On Sat, Apr 27, 2024 at 9:29 AM Holden Karau  wrote:
>
> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
>>
>> +1
>>
>> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:
>> >
>> > I'll start with my +1.
>> >
>> > Dongjoon.
>> >
>> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> > > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
>> > > to `false` by default. The technical scope is defined in the following PR.
>> > >
>> > > - DISCUSSION:
>> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>> > > - PR: https://github.com/apache/spark/pull/46207
>> > >
>> > > The vote is open until April 30th 1AM (PST) and passes
>> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> > >
>> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
>> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
>> > >
>> > > Thank you in advance.
>> > >
>> > > Dongjoon
>> > >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>> 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread DB Tsai
+1

On Apr 29, 2024, at 8:01 PM, Wenchen Fan  wrote:

To add more color:

Spark data source table and Hive Serde table are both stored in the Hive
metastore and keep the data files in the table directory. The only
difference is they have different "table provider", which means Spark will
use different reader/writer. Ideally the Spark native data source
reader/writer is faster than the Hive Serde ones.

What's more, the default format of Hive Serde is text. I don't think
people want to use text format tables in production. Most people will add
`STORED AS parquet` or `USING parquet` explicitly. By setting this config
to false, we have a more reasonable default behavior: creating Parquet
tables (or whatever is specified by `spark.sql.sources.default`).

On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:

@Mich Talebzadeh there seems to be a misunderstanding here. The Spark
native data source table is still stored in the Hive metastore, it's just
that Spark will use a different (and faster) reader/writer for it.
`hive-site.xml` should work as it is today.

On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon  wrote:

+1

It's a legacy conf that we should eventually remove. Spark should create
Spark tables by default, not Hive tables.

Mich, for your workload, you can simply switch that conf off if it
concerns you. We also enabled ANSI (which you agreed on). It's a bit
awkward to stop in the middle for this compatibility reason while making
Spark sound. The compatibility has been tested in production for a long
time so I don't see any particular issue about the compatibility case you
mentioned.

On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh  wrote:

Hi @Wenchen Fan

Thanks for your response. I believe we have not had enough time to
"DISCUSS" this matter.

Currently in order to make Spark take advantage of Hive, I create a soft
link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is 3.1.1

 /opt/spark/conf/hive-site.xml -> /data6/hduser/hive-3.1.1/conf/hive-site.xml

This works fine for me in my lab. So in the future if we opt to use the
setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
not be a need for this logical link?
On the face of it, this looks fine but in real life it may require a
number of changes to the old scripts. Hence my concern.
As a matter of interest has anyone liaised with the Hive team to ensure
they have introduced the additional changes you outlined?

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

   view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

 Disclaimer: The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".

On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:

@Mich Talebzadeh thanks for sharing your concern!

Note: creating Spark native data source tables is usually Hive compatible
as well, unless we use features that Hive does not support (TIMESTAMP NTZ,
ANSI INTERVAL, etc.). I think it's a better default to create Spark native
table in this case, instead of creating Hive table and fail.

On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:

+1 (non-binding)

Thanks,
Cheng Pan

On Sat, Apr 27, 2024 at 9:29 AM Holden Karau  wrote:
>
> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
>>
>> +1
>>
>> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:
>> >
>> > I'll start with my +1.
>> >
>> > Dongjoon.
>> >
>> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> > > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
>> > > to `false` by default. The technical scope is defined in the following PR.
>> > >
>> > > - DISCUSSION:
>> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>> > > - PR: https://github.com/apache/spark/pull/46207
>> > >
>> > > The vote is open until April 30th 1AM (PST) and passes
>> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> > >
>> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
>> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
>> > >
>> > > Thank you in advance.
>> > >
>> > > Dongjoon
>> > >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe e-mail: 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Wenchen Fan
To add more color:

Spark data source table and Hive Serde table are both stored in the Hive
metastore and keep the data files in the table directory. The only
difference is they have different "table provider", which means Spark will
use different reader/writer. Ideally the Spark native data source
reader/writer is faster than the Hive Serde ones.

What's more, the default format of Hive Serde is text. I don't think people
want to use text format tables in production. Most people will add `STORED
AS parquet` or `USING parquet` explicitly. By setting this config to false,
we have a more reasonable default behavior: creating Parquet tables (or
whatever is specified by `spark.sql.sources.default`).
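
To make the difference concrete, a minimal sketch, assuming a spark-shell
session from a Spark build with Hive support; the table names are made up and
the conf is toggled per session purely for illustration:

    // Legacy behavior (conf = true): a bare CREATE TABLE becomes a Hive SerDe
    // table, whose default storage format is TextFile.
    spark.sql("SET spark.sql.legacy.createHiveTableByDefault=true")
    spark.sql("CREATE TABLE legacy_demo (id INT, name STRING)")
    spark.sql("DESCRIBE EXTENDED legacy_demo").show(truncate = false)  // Provider row: hive

    // New default (conf = false): the same statement creates a Spark native
    // data source table using spark.sql.sources.default (Parquet unless overridden).
    spark.sql("SET spark.sql.legacy.createHiveTableByDefault=false")
    spark.sql("CREATE TABLE native_demo (id INT, name STRING)")
    spark.sql("DESCRIBE EXTENDED native_demo").show(truncate = false)  // Provider row: parquet

    // An explicit clause keeps full control either way.
    spark.sql("CREATE TABLE serde_demo (id INT, name STRING) STORED AS parquet")
    spark.sql("CREATE TABLE ds_demo (id INT, name STRING) USING parquet")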

On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan  wrote:

> @Mich Talebzadeh  there seems to be a
> misunderstanding here. The Spark native data source table is still stored
> in the Hive metastore, it's just that Spark will use a different (and
> faster) reader/writer for it. `hive-site.xml` should work as it is today.
>
> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon  wrote:
>
>> +1
>>
>> It's a legacy conf that we should eventually remove. Spark should
>> create Spark tables by default, not Hive tables.
>>
>> Mich, for your workload, you can simply switch that conf off if it
>> concerns you. We also enabled ANSI (which you agreed on). It's a bit
>> awkward to stop in the middle for this compatibility reason while making
>> Spark sound. The compatibility has been tested in production for a long
>> time so I don't see any particular issue about the compatibility case you
>> mentioned.
>>
>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>> Hi @Wenchen Fan 
>>>
>>> Thanks for your response. I believe we have not had enough time to
>>> "DISCUSS" this matter.
>>>
>>> Currently in order to make Spark take advantage of Hive, I create a soft
>>> link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is 3.1.1
>>>
>>>  /opt/spark/conf/hive-site.xml ->
>>> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>>>
>>> This works fine for me in my lab. So in the future if we opt to use the
>>> setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
>>> not be a need for this logical link.?
>>> On the face of it, this looks fine but in real life it may require a
>>> number of changes to the old scripts. Hence my concern.
>>> As a matter of interest has anyone liaised with the Hive team to ensure
>>> they have introduced the additional changes you outlined?
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>>> London
>>> United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed . It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions (Werner
>>> Von Braun
>>> )".
>>>
>>>
>>> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:
>>>
 @Mich Talebzadeh  thanks for sharing your
 concern!

 Note: creating Spark native data source tables is usually Hive
 compatible as well, unless we use features that Hive does not support
 (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
 create Spark native table in this case, instead of creating Hive table and
 fail.

 On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:

> +1 (non-binding)
>
> Thanks,
> Cheng Pan
>
> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
> wrote:
> >
> > +1
> >
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> >
> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh 
> wrote:
> >>
> >> +1
> >>
> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
> wrote:
> >> >
> >> > I'll start with my +1.
> >> >
> >> > Dongjoon.
> >> >
> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> >> > > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> >> > > to `false` by default. The technical scope is defined in the
> following PR.
> >> > >
> >> > > - DISCUSSION:
> >> > >
> https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> >> > > - PR: https://github.com/apache/spark/pull/46207
> >> > >
> >> > > The vote is open until April 30th 1AM (PST) and passes
> >> > > if a majority 

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Wenchen Fan
@Mich Talebzadeh  there seems to be a
misunderstanding here. The Spark native data source table is still stored
in the Hive metastore, it's just that Spark will use a different (and
faster) reader/writer for it. `hive-site.xml` should work as it is today.
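
As a rough sketch of that point (assuming a standalone Scala application and a
Spark build with Hive support): the metastore connection comes from
enableHiveSupport() plus a hive-site.xml on the classpath, such as the
symlinked one described later in this thread, independent of which table
provider a bare CREATE TABLE ends up using:

    import org.apache.spark.sql.SparkSession

    // Picks up /opt/spark/conf/hive-site.xml (or an equivalent on the classpath)
    // and connects to the existing Hive metastore.
    val spark = SparkSession.builder()
      .appName("metastore-check")
      .enableHiveSupport()
      .getOrCreate()

    // Databases and tables registered in the Hive metastore remain visible,
    // whether their provider is "hive" or a Spark data source such as Parquet.
    spark.sql("SHOW DATABASES").show()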

On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon  wrote:

> +1
>
> It's a legacy conf that we should eventually remove. Spark should
> create Spark tables by default, not Hive tables.
>
> Mich, for your workload, you can simply switch that conf off if it
> concerns you. We also enabled ANSI (which you agreed on). It's a bit
> awkward to stop in the middle for this compatibility reason while making
> Spark sound. The compatibility has been tested in production for a long
> time so I don't see any particular issue about the compatibility case you
> mentioned.
>
> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh 
> wrote:
>
>>
>> Hi @Wenchen Fan 
>>
>> Thanks for your response. I believe we have not had enough time to
>> "DISCUSS" this matter.
>>
>> Currently in order to make Spark take advantage of Hive, I create a soft
>> link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is 3.1.1
>>
>>  /opt/spark/conf/hive-site.xml ->
>> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>>
>> This works fine for me in my lab. So in the future if we opt to use the
>> setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
>> not be a need for this logical link.?
>> On the face of it, this looks fine but in real life it may require a
>> number of changes to the old scripts. Hence my concern.
>> As a matter of interest has anyone liaised with the Hive team to ensure
>> they have introduced the additional changes you outlined?
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner
>> Von Braun
>> )".
>>
>>
>> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:
>>
>>> @Mich Talebzadeh  thanks for sharing your
>>> concern!
>>>
>>> Note: creating Spark native data source tables is usually Hive
>>> compatible as well, unless we use features that Hive does not support
>>> (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
>>> create Spark native table in this case, instead of creating Hive table and
>>> fail.
>>>
>>> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
>>>
 +1 (non-binding)

 Thanks,
 Cheng Pan

 On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
 wrote:
 >
 > +1
 >
 > Twitter: https://twitter.com/holdenkarau
 > Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9
 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
 >
 >
 > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh 
 wrote:
 >>
 >> +1
 >>
 >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
 wrote:
 >> >
 >> > I'll start with my +1.
 >> >
 >> > Dongjoon.
 >> >
 >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
 >> > > Please vote on SPARK-46122 to set
 spark.sql.legacy.createHiveTableByDefault
 >> > > to `false` by default. The technical scope is defined in the
 following PR.
 >> > >
 >> > > - DISCUSSION:
 >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
 >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
 >> > > - PR: https://github.com/apache/spark/pull/46207
 >> > >
 >> > > The vote is open until April 30th 1AM (PST) and passes
 >> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1
 votes.
 >> > >
 >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
 default
 >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
 because ...
 >> > >
 >> > > Thank you in advance.
 >> > >
 >> > > Dongjoon
 >> > >
 >> >
 >> >
 -
 >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >> >
 >>
 >> -
 >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >>

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Hyukjin Kwon
+1

It's a legacy conf that we should eventually remove. Spark should
create Spark tables by default, not Hive tables.

Mich, for your workload, you can simply switch that conf off if it concerns
you. We also enabled ANSI (which you agreed on). It's a bit awkward
to stop in the middle for this compatibility reason while making Spark
sound. The compatibility has been tested in production for a long time so I
don't see any particular issue about the compatibility case you mentioned.
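
For workloads that still want the old behavior after the default flips, a
sketch of the opt-out (the paths, values, and table name below are
illustrative; any one of the options is enough):

    // 1) Cluster-wide, in $SPARK_HOME/conf/spark-defaults.conf:
    //      spark.sql.legacy.createHiveTableByDefault  true
    // 2) Per application, at submit time:
    //      spark-submit --conf spark.sql.legacy.createHiveTableByDefault=true ...
    // 3) Per session, from code (spark is the active SparkSession):
    spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "true")

    // After this, a bare CREATE TABLE again produces a Hive SerDe (TextFile)
    // table, matching the pre-change behavior.
    spark.sql("CREATE TABLE legacy_style (id INT, name STRING)")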

On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh 
wrote:

>
> Hi @Wenchen Fan 
>
> Thanks for your response. I believe we have not had enough time to
> "DISCUSS" this matter.
>
> Currently in order to make Spark take advantage of Hive, I create a soft
> link in $SPARK_HOME/conf. FYI, my spark version is 3.4.0 and Hive is 3.1.1
>
>  /opt/spark/conf/hive-site.xml ->
> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>
> This works fine for me in my lab. So in the future if we opt to use the
> setting "spark.sql.legacy.createHiveTableByDefault" to False, there will
> not be a need for this logical link.?
> On the face of it, this looks fine but in real life it may require a
> number of changes to the old scripts. Hence my concern.
> As a matter of interest has anyone liaised with the Hive team to ensure
> they have introduced the additional changes you outlined?
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  Von
> Braun )".
>
>
> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:
>
>> @Mich Talebzadeh  thanks for sharing your
>> concern!
>>
>> Note: creating Spark native data source tables is usually Hive compatible
>> as well, unless we use features that Hive does not support (TIMESTAMP NTZ,
>> ANSI INTERVAL, etc.). I think it's a better default to create Spark native
>> table in this case, instead of creating Hive table and fail.
>>
>> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> >
>>> >
>>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
>>> wrote:
>>> >> >
>>> >> > I'll start with my +1.
>>> >> >
>>> >> > Dongjoon.
>>> >> >
>>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>>> >> > > Please vote on SPARK-46122 to set
>>> spark.sql.legacy.createHiveTableByDefault
>>> >> > > to `false` by default. The technical scope is defined in the
>>> following PR.
>>> >> > >
>>> >> > > - DISCUSSION:
>>> >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>>> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>>> >> > > - PR: https://github.com/apache/spark/pull/46207
>>> >> > >
>>> >> > > The vote is open until April 30th 1AM (PST) and passes
>>> >> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >> > >
>>> >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
>>> default
>>> >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
>>> because ...
>>> >> > >
>>> >> > > Thank you in advance.
>>> >> > >
>>> >> > > Dongjoon
>>> >> > >
>>> >> >
>>> >> >
>>> -
>>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >> >
>>> >>
>>> >> -
>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-28 Thread Mich Talebzadeh
Hi @Wenchen Fan 

Thanks for your response. I believe we have not had enough time to
"DISCUSS" this matter.

Currently, in order to make Spark take advantage of Hive, I create a soft
link in $SPARK_HOME/conf. FYI, my Spark version is 3.4.0 and Hive is 3.1.1:

 /opt/spark/conf/hive-site.xml ->
/data6/hduser/hive-3.1.1/conf/hive-site.xml

This works fine for me in my lab. So in the future, if we opt to set
"spark.sql.legacy.createHiveTableByDefault" to false, will there no longer
be a need for this logical link?
On the face of it, this looks fine, but in real life it may require a number
of changes to old scripts. Hence my concern.
As a matter of interest, has anyone liaised with the Hive team to ensure
they have introduced the additional changes you outlined?

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Sun, 28 Apr 2024 at 09:34, Wenchen Fan  wrote:

> @Mich Talebzadeh  thanks for sharing your
> concern!
>
> Note: creating Spark native data source tables is usually Hive compatible
> as well, unless we use features that Hive does not support (TIMESTAMP NTZ,
> ANSI INTERVAL, etc.). I think it's a better default to create Spark native
> table in this case, instead of creating Hive table and fail.
>
> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>> Cheng Pan
>>
>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
>> wrote:
>> >
>> > +1
>> >
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >
>> >
>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
>> >>
>> >> +1
>> >>
>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
>> wrote:
>> >> >
>> >> > I'll start with my +1.
>> >> >
>> >> > Dongjoon.
>> >> >
>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> >> > > Please vote on SPARK-46122 to set
>> spark.sql.legacy.createHiveTableByDefault
>> >> > > to `false` by default. The technical scope is defined in the
>> following PR.
>> >> > >
>> >> > > - DISCUSSION:
>> >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>> >> > > - PR: https://github.com/apache/spark/pull/46207
>> >> > >
>> >> > > The vote is open until April 30th 1AM (PST) and passes
>> >> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >> > >
>> >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
>> default
>> >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
>> because ...
>> >> > >
>> >> > > Thank you in advance.
>> >> > >
>> >> > > Dongjoon
>> >> > >
>> >> >
>> >> > -
>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >> >
>> >>
>> >> -
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-28 Thread Wenchen Fan
@Mich Talebzadeh  thanks for sharing your
concern!

Note: creating Spark native data source tables is usually Hive compatible
as well, unless we use features that Hive does not support (TIMESTAMP NTZ,
ANSI INTERVAL, etc.). I think it's a better default to create Spark native
table in this case, instead of creating Hive table and fail.
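
As a rough illustration of that compatibility boundary (the table names are
made up, and TIMESTAMP_NTZ assumes a Spark version that has the type, e.g.
3.4+):

    // A plain Parquet data source table uses only types Hive understands, so
    // its schema can be registered in the Hive metastore in a Hive-compatible way.
    spark.sql("CREATE TABLE sales (id BIGINT, amount DECIMAL(10,2), ts TIMESTAMP) USING parquet")

    // A type Hive has no equivalent for (here TIMESTAMP_NTZ; ANSI interval types
    // are similar) forces Spark to persist the schema in a Spark-specific form:
    // Spark still reads the table, but Hive tooling will not understand its schema.
    spark.sql("CREATE TABLE events (id BIGINT, local_ts TIMESTAMP_NTZ) USING parquet")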

On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan  wrote:

> +1 (non-binding)
>
> Thanks,
> Cheng Pan
>
> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau 
> wrote:
> >
> > +1
> >
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> >
> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
> >>
> >> +1
> >>
> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
> wrote:
> >> >
> >> > I'll start with my +1.
> >> >
> >> > Dongjoon.
> >> >
> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> >> > > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> >> > > to `false` by default. The technical scope is defined in the
> following PR.
> >> > >
> >> > > - DISCUSSION:
> >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> >> > > - PR: https://github.com/apache/spark/pull/46207
> >> > >
> >> > > The vote is open until April 30th 1AM (PST) and passes
> >> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >> > >
> >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
> default
> >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
> because ...
> >> > >
> >> > > Thank you in advance.
> >> > >
> >> > > Dongjoon
> >> > >
> >> >
> >> > -
> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Cheng Pan
+1 (non-binding)

Thanks,
Cheng Pan

On Sat, Apr 27, 2024 at 9:29 AM Holden Karau  wrote:
>
> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:
>>
>> +1
>>
>> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:
>> >
>> > I'll start with my +1.
>> >
>> > Dongjoon.
>> >
>> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>> > > Please vote on SPARK-46122 to set 
>> > > spark.sql.legacy.createHiveTableByDefault
>> > > to `false` by default. The technical scope is defined in the following 
>> > > PR.
>> > >
>> > > - DISCUSSION:
>> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>> > > - PR: https://github.com/apache/spark/pull/46207
>> > >
>> > > The vote is open until April 30th 1AM (PST) and passes
>> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> > >
>> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
>> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because 
>> > > ...
>> > >
>> > > Thank you in advance.
>> > >
>> > > Dongjoon
>> > >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Zhou Jiang
+1 (non-binding)

On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > to `false` by default. The technical scope is defined in the following
> PR.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > - PR: https://github.com/apache/spark/pull/46207
> >
> > The vote is open until April 30th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
*Zhou JIANG*


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Mich Talebzadeh
-1 for me

Do not change spark.sql.legacy.createHiveTableByDefault because:

   1. We have not had enough time to "DISCUSS" this matter. The discussion
   thread was opened almost 24 hours ago.
   2. Compatibility: Changing the default behavior could potentially break
   existing workflows or pipelines that rely on the current behavior. Many
   users may have scripts or applications that expect Hive tables to be
   created by default, and altering this behavior without careful
   consideration could lead to unexpected issues.
   3. User Experience: For users who are familiar with the current
   behavior, having Hive tables created by default may be more intuitive and
   convenient. Changing the default behavior could require users to modify
   their scripts or workflows, leading to confusion and productivity loss.
   4. Flexibility: Retaining the option to create Hive tables by default
   allows users to leverage the features and optimizations provided by the
   Hive metastore. While Spark native tables may offer certain advantages,
   there are use cases where Hive tables are preferred, such as integration
   with the existing Hive ecosystems or compatibility with other tools as I
   brought up in the "DISCUSS" thread.
   5. Many users have built workflows, scripts, or applications based on
   this behaviour, and any changes to it could impact their ability to
   effectively use Spark SQL in their data processing pipelines.


IMO, these reasons warrant carefully evaluating the impact of changing the
default behaviour.

Mich Talebzadeh
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Fri, 26 Apr 2024 at 20:06, L. C. Hsieh  wrote:

> +1
>
> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
> wrote:
> >
> > I'll start with my +1.
> >
> > Dongjoon.
> >
> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > > to `false` by default. The technical scope is defined in the following
> PR.
> > >
> > > - DISCUSSION:
> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > > - PR: https://github.com/apache/spark/pull/46207
> > >
> > > The vote is open until April 30th 1AM (PST) and passes
> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
> default
> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> > >
> > > Thank you in advance.
> > >
> > > Dongjoon
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Holden Karau
+1

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh  wrote:

> +1
>
> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun 
> wrote:
> >
> > I'll start with my +1.
> >
> > Dongjoon.
> >
> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > > to `false` by default. The technical scope is defined in the following
> PR.
> > >
> > > - DISCUSSION:
> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > > - PR: https://github.com/apache/spark/pull/46207
> > >
> > > The vote is open until April 30th 1AM (PST) and passes
> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by
> default
> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> > >
> > > Thank you in advance.
> > >
> > > Dongjoon
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread L. C. Hsieh
+1

On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:
>
> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
> > to `false` by default. The technical scope is defined in the following PR.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > - PR: https://github.com/apache/spark/pull/46207
> >
> > The vote is open until April 30th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Gengliang Wang
+1

On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun  wrote:

> I'll start with my +1.
>
> Dongjoon.
>
> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> > Please vote on SPARK-46122 to set
> spark.sql.legacy.createHiveTableByDefault
> > to `false` by default. The technical scope is defined in the following
> PR.
> >
> > - DISCUSSION:
> > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> > - PR: https://github.com/apache/spark/pull/46207
> >
> > The vote is open until April 30th 1AM (PST) and passes
> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because
> ...
> >
> > Thank you in advance.
> >
> > Dongjoon
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
I'll start with my +1.

Dongjoon.

On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
> to `false` by default. The technical scope is defined in the following PR.
> 
> - DISCUSSION:
> https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> - PR: https://github.com/apache/spark/pull/46207
> 
> The vote is open until April 30th 1AM (PST) and passes
> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
> 
> Thank you in advance.
> 
> Dongjoon
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
to `false` by default. The technical scope is defined in the following PR.

- DISCUSSION:
https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
- JIRA: https://issues.apache.org/jira/browse/SPARK-46122
- PR: https://github.com/apache/spark/pull/46207

The vote is open until April 30th 1AM (PST) and passes
if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
[ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...

Thank you in advance.

Dongjoon