Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
Thanks for the explanation. Regards Asif On Tue, Jan 28, 2025 at 10:00 AM Herman van Hovell wrote: > There are many factors: > >- Typically it is a race between multiple PRs, where they all pass CI >without the other changes, and get merged at the same time. >- Differences between

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Herman van Hovell
There are many factors: - Typically it is a race between multiple PRs, where they all pass CI without the other changes, and get merged at the same time. - Differences between (the nightly job and the PR job) environments (e.g. size of the machine) can also cause these issues. - In
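
The first factor above (several PRs that each pass CI on their own but break master once merged together) can be sketched with a toy model. Everything here is hypothetical and for illustration only: `Patch`, `apply_patches`, `passes_ci`, and the names `helper`, `helper_v2`, `site1`, `site2` are not taken from the Spark code base or its CI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A hypothetical PR, expressed as a diff against the base branch."""
    remove_defs: frozenset = frozenset()
    add_defs: frozenset = frozenset()
    set_calls: tuple = ()  # (call_site, callee) pairs


def apply_patches(state, *patches):
    """Apply diffs to (defs, calls) the way a clean textual merge would."""
    defs, calls = set(state[0]), dict(state[1])
    for p in patches:
        defs -= p.remove_defs
        defs |= p.add_defs
        calls.update(p.set_calls)
    return defs, calls


def passes_ci(state):
    """Toy CI check: every call site must refer to an existing definition."""
    defs, calls = state
    return all(callee in defs for callee in calls.values())


BASE = ({"helper"}, {"site1": "helper"})

# PR A renames `helper` and updates the only caller it knows about.
PR_A = Patch(
    remove_defs=frozenset({"helper"}),
    add_defs=frozenset({"helper_v2"}),
    set_calls=(("site1", "helper_v2"),),
)

# PR B adds a new caller of the old name; it was tested without PR A.
PR_B = Patch(set_calls=(("site2", "helper"),))

only_a = passes_ci(apply_patches(BASE, PR_A))        # passes alone
only_b = passes_ci(apply_patches(BASE, PR_B))        # passes alone
merged = passes_ci(apply_patches(BASE, PR_A, PR_B))  # fails once both land
```

Neither diff touches the same lines, so the merge is clean, yet the combined state has a call to a definition that no longer exists.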

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
I am genuinely curious how commits that reliably fail the build end up in master. Is there some race window where two PRs that conflict in terms of logic tend to mess up the final state in master? I have seen in the past few months, while syncing up my open PRs, f

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Dongjoon Hyun
Did you see the PR, Martin? SBT is also broken like the following and we've been waiting for actions over two days on the original PR. $ build/sbt clean "catalyst/testOnly org.apache.spark.sql.catalyst.encoders.EncoderResolutionSuite" ... [info] *** 1 SUITE ABORTED *** [error] Error during tests:

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Martin Grund
Would it not have been mindful to wait for the original author to investigate the PR and do a forward fix instead of reverting such a big change? Since this was only blocking the Maven test we could have waited probably a few more days without any issues. On Mon, Jan 27, 2025 at 8:32 PM Dongjoon H

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Dongjoon Hyun
This is reverted from branch-4.0 via the following. - https://github.com/apache/spark/pull/49696 Revert "[SPARK-49700][CONNECT][SQL] Unified Scala Interface for Connect and Classic" Dongjoon. On 2025/01/26 16:58:45 Dongjoon Hyun wrote: > Thank you! > > Dongjoon > > On Sat, Jan 25, 2025 at 20:

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-26 Thread Dongjoon Hyun
Thank you! Dongjoon On Sat, Jan 25, 2025 at 20:01 Yang Jie wrote: > I reported a test issue that is suspected to be related to this pr: > > - https://github.com/apache/spark/pull/48818/files#r1929652392 > > and it seems to be causing the failure of the Maven daily test. > > Thanks, > Jie Yang >

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-25 Thread Yang Jie
I reported a test issue that is suspected to be related to this pr: - https://github.com/apache/spark/pull/48818/files#r1929652392 and it seems to be causing the failure of the Maven daily test. Thanks, Jie Yang On 2025/01/24 20:24:57 Dongjoon Hyun wrote: > Hi, All. > > SPARK-49700 landed one

RE: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-23 Thread Balaji Sudharsanam V
Do we have a Java Client for Spark Connect which is something like PySpark? From: Mich Talebzadeh Sent: 22 January 2025 15:05 To: Hyukjin Kwon Cc: Martin Grund ; Holden Karau ; Dongjoon Hyun ; dev Subject: [EXTERNAL] Re: FYI: A Hallucination about Spark Connect Stability in Spark 4 CI

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Mich Talebzadeh
A broken CI is really an operational aspect, albeit in this case it was quite temporary. We should put that aside and move on, as 1) the product is sound and 2) Spark Connect is strategic for the future of Spark. HTH Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Hyukjin Kwon
While it might be a bit too much to talk about its stability, it is true that the CI dedicated for Spark Connect compat was broken there for a couple of weeks, and the errors from the tests look confusing. I agree that tests and builds could be one of the easiest measurements to tell the state of a

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Martin Grund
I'm very confused about how we use stability in CI as a measure to discuss the strategy of a particular feature, particularly because we call these "hallucinations." From real-world experience, I can say that we have thousands of clients using Spark Connect across many different versions in our i

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Jules Damji
Thanks for update and looking into it. Excuse the thumb typos On Tue, 21 Jan 2025 at 4:09 PM, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in development. > I

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Ángel
I'm passionate about and have lots of experience fixing OOMs. Contact me if you need some help. On Wed, 22 Jan 2025, 1:10, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that i

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
Thank you, Hyukjin! Dongjoon On Tue, Jan 21, 2025 at 16:10 Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in development. > I made an umbrella JIRA (https://issue

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Just a quick note on that: the major reason is 1. OOM we should figure out and fix the CI environment. 2. structured streaming test failure that is still in development. I made an umbrella JIRA (https://issues.apache.org/jira/browse/SPARK-50907), and I will work there. Should be easier to look at w

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Let me take a look. shouldn't be a major issue. On Wed, 22 Jan 2025 at 08:31, Mich Talebzadeh wrote: > As discussed on a thread over the weekend, we agreed among us including > Matei on a shift towards a more stable and version-independent APIs. > Spark Connect IMO is a key enabler of this shi

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Mich Talebzadeh
As discussed on a thread over the weekend, we agreed among ourselves, including Matei, on a shift towards more stable and version-independent APIs. Spark Connect IMO is a key enabler of this shift, allowing users and developers to build applications and libraries that are more resilient to changes in Sp

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
To be clear, (1) is `PySpark 4.0 Client` + `Spark 4.0 Server`, which is more severe. And, your point matches with (2) exactly. Thank you for your reply, Holden. Dongjoon. On 2025/01/21 22:38:20 Holden Karau wrote: > Interesting. So given one of the features of Spark connect should be > simpler

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Holden Karau
Interesting. So given one of the features of Spark connect should be simpler migrations we should (in my mind) only declare it stable once we’ve gone through two releases where the previous client + its code can talk to the new server. Twitter: https://twitter.com/holdenkarau Fight Health Insuranc

Re: [FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-20 Thread Dongjoon Hyun
Thank you, Paddy. Dongjoon. On Mon, Jan 20, 2025 at 2:32 AM Paddy Xu wrote: > I have worked on tests related to “interrupt”. Not sure about SPARK-50888: > > My findings: > 1. These test failures only occur in the GitHub CI. > 2. The failure is due to the thread pool we created in CI having on

RE: [FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-20 Thread Paddy Xu
I have worked on tests related to “interrupt”. Not sure about SPARK-50888: My findings: 1. These test failures only occur in the GitHub CI. 2. The failure is due to the thread pool we created in CI having only two threads, while our tests require three concurrent threads to run. To workaround th
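
The second finding (a pool with only two threads, while the tests need three concurrent threads) can be reproduced in isolation. The sketch below is a generic illustration, not the Spark Connect test code: the function name `all_tasks_rendezvous`, the barrier-based rendezvous, and the 0.5 s timeout are assumptions chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_tasks_rendezvous(pool_size: int, tasks: int = 3, timeout: float = 0.5) -> bool:
    """Return True only if `tasks` tasks manage to run at the same time."""
    # Every task waits here until all `tasks` participants have arrived.
    barrier = threading.Barrier(tasks)

    def task() -> bool:
        try:
            barrier.wait(timeout=timeout)
            return True
        except threading.BrokenBarrierError:
            # Not enough tasks arrived in time: the pool was too small.
            return False

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(task) for _ in range(tasks)]
        return all(f.result() for f in futures)


enough_threads = all_tasks_rendezvous(pool_size=3)
too_few_threads = all_tasks_rendezvous(pool_size=2)
```

With three workers all tasks meet at the barrier; with two, the third task cannot start until one of the first two gives up, so the rendezvous times out. A real test in that situation would hang or fail in a similar way, depending on how it waits.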

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-27 Thread Hussein Awala
>> > Date: Friday, April 26, 2024 15:05 >> > To: Xinrong Meng >> > Cc: Dongjoon Hyun , "dev@spark.apache.org" < >> dev@spark.apache.org> >> > Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 >> > >> > >> > >> > +1

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread John Zhuge
+1 On Fri, Apr 26, 2024 at 8:41 AM Kent Yao wrote: > +1 > > yangjie01 wrote on Fri, Apr 26, 2024 at 17:16: > > > > +1 > > > > > > > > From: Ruifeng Zheng > > Date: Friday, April 26, 2024 15:05 > > To: Xinrong Meng > > Cc: Dongjoon Hyun , "dev@spark

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread Kent Yao
+1 yangjie01 wrote on Fri, Apr 26, 2024 at 17:16: > > +1 > > > > From: Ruifeng Zheng > Date: Friday, April 26, 2024 15:05 > To: Xinrong Meng > Cc: Dongjoon Hyun , "dev@spark.apache.org" > > Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 > > > > +1 > > >

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread yangjie01
+1 From: Ruifeng Zheng Date: Friday, April 26, 2024 15:05 To: Xinrong Meng Cc: Dongjoon Hyun , "dev@spark.apache.org" Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 +1 On Fri, Apr 26, 2024 at 10:26 AM Xinrong Meng wrote: +1 On Thu, Apr 25, 2024 at 2:08

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Ruifeng Zheng
+1 On Fri, Apr 26, 2024 at 10:26 AM Xinrong Meng wrote: > +1 > > On Thu, Apr 25, 2024 at 2:08 PM Holden Karau > wrote: > >> +1 >> >> Twitter: https://twitter.com/holdenkarau >> Books (Learning Spark, High Performance Spark, etc.): >> https://amzn.to/2MaRAG9 >> YouTube

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Denny Lee
+1 (non-binding) On Thu, Apr 25, 2024 at 19:26 Xinrong Meng wrote: > +1 > > On Thu, Apr 25, 2024 at 2:08 PM Holden Karau > wrote: > >> +1 >> >> Twitter: https://twitter.com/holdenkarau >> Books (Learning Spark, High Performance Spark, etc.): >> https://amzn.to/2MaRAG9

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Xinrong Meng
+1 On Thu, Apr 25, 2024 at 2:08 PM Holden Karau wrote: > +1 > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > > > On Thu, Apr

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread L. C. Hsieh
+1 On Thu, Apr 25, 2024 at 11:19 AM Maciej wrote: > > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 4/25/24 6:21 PM, Reynold Xin wrote: > > +1 > > On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale > wrote: >> >> +1 >> >> On Thu, Apr 25, 2024

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Thu, Apr 25, 2024 at 11:18 AM Maciej wrote: > +1 > > Best regards, > Maciej Szy

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 4/25/24 6:21 PM, Reynold Xin wrote: +1 On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale wrote: +1 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: FYI, there is a proposal to drop

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Reynold Xin
+1 On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale wrote: > +1 > > On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun > wrote: > >> FYI, there is a proposal to drop Python 3.8 because its EOL is October >> 2024. >> >> https://github.com/apache/spark/pull/46228 >> [SPARK-47993][PYTHON] Drop Python 3.8

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Santosh Pingale
+1 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: > FYI, there is a proposal to drop Python 3.8 because its EOL is October > 2024. > > https://github.com/apache/spark/pull/46228 > [SPARK-47993][PYTHON] Drop Python 3.8 > > Since it's still alive and there will be an overlap between the lifecy

Re: [FYI] SPARK-45981: Improve Python language test coverage

2023-12-02 Thread Hyukjin Kwon
Awesome! On Sat, Dec 2, 2023 at 2:33 PM Dongjoon Hyun wrote: > Hi, All. > > As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community > starts to have test coverage for all supported Python versions from Today. > > - https://github.com/apache/spark/actions/runs/7061665420 > > Her

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-16 Thread Wenchen Fan
Great job! On Sat, Nov 13, 2021 at 11:18 AM Hyukjin Kwon wrote: > Awesome! > > On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > >> Thank you! Great job! >> >> Xiao >> >> >> On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan >> wrote: >> >>> >>> Nice job ! >>> There are some nice API's which

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Hyukjin Kwon
Awesome! On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > Thank you! Great job! > > Xiao > > > On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan > wrote: > >> >> Nice job ! >> There are some nice API's which should be interesting to explore with JDK >> 17 :-) >> >> Regards. >> Mridul >> >> O

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Xiao Li
Thank you! Great job! Xiao On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan wrote: > > Nice job ! > There are some nice API's which should be interesting to explore with JDK > 17 :-) > > Regards. > Mridul > > On Fri, Nov 12, 2021 at 7:08 PM Yuming Wang wrote: > >> Cool, thank you Dongjoon.

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Mridul Muralidharan
Nice job ! There are some nice API's which should be interesting to explore with JDK 17 :-) Regards. Mridul On Fri, Nov 12, 2021 at 7:08 PM Yuming Wang wrote: > Cool, thank you Dongjoon. > > On Sat, Nov 13, 2021 at 4:09 AM shane knapp ☠ wrote: > >> woot! nice work everyone! :) >> >> On Fri,

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Yuming Wang
Cool, thank you Dongjoon. On Sat, Nov 13, 2021 at 4:09 AM shane knapp ☠ wrote: > woot! nice work everyone! :) > > On Fri, Nov 12, 2021 at 11:37 AM Dongjoon Hyun > wrote: > >> Hi, All. >> >> Apache Spark community has been working on Java 17 support under the >> following JIRA. >> >> https

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread shane knapp ☠
woot! nice work everyone! :) On Fri, Nov 12, 2021 at 11:37 AM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark community has been working on Java 17 support under the > following JIRA. > > https://issues.apache.org/jira/browse/SPARK-33772 > > As of today, Apache Spark starts to have daily

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-09 Thread Hyukjin Kwon
To share the update about GitHub Actions: - I am informed that having more resources is now being discussed at the org level. Hopefully the situation gets better. - Duplicated workflows will be canceled to save resources, see https://github.com/apache/spark/pull/31104 cc @Holden Karau @Xiao Li too

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Hyukjin Kwon
Regarding GitHub resources for the ASF repos, I contacted GitHub a few days ago to address the issue. This is not a repo-level problem cc @Sean Owen. The ASF organisation on GitHub already has too many repos, and we should have a way to increase the limit, or set a separate limit specifically for the

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
no, i don't think that'd be a good idea... adding additional dependencies to our cluster won't scale one bit. On Fri, Jan 8, 2021 at 2:16 PM Dongjoon Hyun wrote: > BTW, Shane, do you think we can utilize some of UCB machines as GitHub > Action runners? > > Bests, > Dongjoon. > > On Fri, Jan 8,

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
hmm, the ubuntu16 machines are acting up. i pinned the sbt master builds to ubuntu20 and they're happily building while i investigate wtf is up. On Fri, Jan 8, 2021 at 2:15 PM Dongjoon Hyun wrote: > The followings? > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Dongjoon Hyun
BTW, Shane, do you think we can utilize some of UCB machines as GitHub Action runners? Bests, Dongjoon. On Fri, Jan 8, 2021 at 2:14 PM Dongjoon Hyun wrote: > The followings? > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/18

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Dongjoon Hyun
The followings? https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/1836/console https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/1887/console On Fri, Jan 8, 2021 at 2:13 P

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
> > 1. Jenkins machines start to fail with the following recently. > (master branch) > > Python versions prior to 3.6 are not supported. > Build step 'Execute shell' marked build as failure > > examples please? -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research /

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-12-01 Thread Ryan Blue
Wenchen, could you start a new thread? Many people have probably already muted this one, and it isn't really on topic. The question that needs to be discussed is whether this is a safe change for the 3.1 release, and reusing an old thread is not a great way to get people's attention about somethin

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-12-01 Thread Wenchen Fan
I'm reviving this thread because this feature was reverted before the 3.0 release, and now we are trying to add it back since the CREATE TABLE syntax is unified. The benefits are pretty clear: CREATE TABLE by default (without USING or STORED AS) should create native tables that work best with Spar

Re: [FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-10-07 Thread Dongjoon Hyun
There is a new thread here. https://lists.apache.org/thread.html/ra2418b75ac276861a598e7ec943750e2b038c2f8ba49f41db57e5ae9%40%3Cdev.spark.apache.org%3E Could you share your use case of Hive 1.2 here, Koert? Bests, Dongjoon. On Wed, Oct 7, 2020 at 1:04 PM Koert Kuipers wrote: > i am a little c

Re: [FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-10-07 Thread Koert Kuipers
i am a little confused about this. i assumed spark would no longer make a distribution with hive 1.x, but the hive-1.2 profile remains. yet i see the hive-1.2 profile has been removed from pom.xml? On Wed, Sep 23, 2020 at 6:58 PM Dongjoon Hyun wrote: > Hi, All. > > Since Apache Spark 3.0.0, Apa

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-04 Thread Dongjoon Hyun
Sure, German. Please add your comment on that issue. If possible, please provide a reproducible example which you did. On Sat, Oct 3, 2020 at 10:31 PM German Schiavon wrote: > Hi! > I just run to this same issue while testing k8s in local mode > > https://issues.apache.org/jira/browse/SPARK-3

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-03 Thread German Schiavon
Hi! I just ran into this same issue while testing k8s in local mode https://issues.apache.org/jira/browse/SPARK-31800 Note that the title shouldn't be "Unable to disable Kerberos when submitting jobs to Kubernetes" (based on the comments) but something more related to the spark.kubernetes.fil

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-29 Thread Dongjoon Hyun
Thank you! Bests, Dongjoon On Mon, Sep 28, 2020 at 8:07 PM Dr. Kent Yao wrote: > Thanks, Dongjon, > >I pined two long-standing issues to the umbrella. > > > >https://issues.apache.org/jira/browse/SPARK-28895 > >https://issues.apache.org/jira/browse/SPARK-28992 > > > >This helps

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-28 Thread Dr. Kent Yao
Thanks, Dongjoon, I pinned two long-standing issues to the umbrella. https://issues.apache.org/jira/browse/SPARK-28895 https://issues.apache.org/jira/browse/SPARK-28992 This helps fix some problems of running Spark applications on k8s with HDFS. Hoping to get them reviewed soon. Best

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
I agree it sucks. We started with some decision that might have made sense back in 2013 (let's use Hive as the default source, and guess what, pick the slowest possible serde by default). We are paying that debt ever since. Thanks for bringing this thread up though. We don't have a clear solutio

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
Technically, I have suffered from (1) `CREATE TABLE` due to its many differences for a long time (since 2017). So, I had a wrong assumption about the implication of that "(2) FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel in the si

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
You are joking when you said "informed widely and discussed in many ways twice", right? This thread doesn't even talk about char/varchar: https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E (Yes it talked about changing the

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
+1 for Wenchen's suggestion. I believe that the difference and effects are informed widely and discussed in many ways twice. First, this was shared last December. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06 https://lists.apache.org/thread.htm

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Wenchen Fan
OK let me put a proposal here: 1. Permanently ban CHAR for native data source tables, and only keep it for Hive compatibility. It's OK to forget about padding like what Snowflake and MySQL have done. But it's hard for Spark to require consistent behavior about CHAR type in all data sources. Since

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Michael Armbrust
> > What I'd oppose is to just ban char for the native data sources, and do > not have a plan to address this problem systematically. > +1 > Just forget about padding, like what Snowflake and MySQL have done. > Document that char(x) is just an alias for string. And then move on. Almost > no work

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Maryann Xue
It would be super weird not to support VARCHAR as a SQL engine. Banning CHAR is probably fine, as its semantics are genuinely confusing. We can issue a warning when parsing VARCHAR with a limit and suggest the usage of String instead. On Tue, Mar 17, 2020 at 10:27 AM Wenchen Fan wrote: > I agree th

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Wenchen Fan
I agree that Spark can define the semantics of CHAR(x) differently from the SQL standard (no padding), and ask the data sources to follow it. But the problem is, some data sources may not be able to skip padding, like the Hive serde table. On the other hand, it's easier to require padding for CHAR(

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
I don’t think I can recall any usages of type CHAR in any situation. Really, its only use (on any traditional SQL database) would be when you *want* a fixed-width character column that has been right-padded with spaces. On 17 Mar 2020, at 12:13 pm, Reynold Xin wro
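
The fixed-width, right-padded behaviour described above, and the "no padding" alternative discussed elsewhere in this thread, can be contrasted with a toy model. This is only a sketch: the function names are hypothetical, it does not reflect how Spark or any particular database implements CHAR, and it ignores the comparison rules that SQL engines apply to padded values.

```python
def store_char_padded(value: str, width: int) -> str:
    """Model CHAR(width) with right-padding: stored values always have length `width`."""
    if len(value) > width:
        raise ValueError(f"value longer than CHAR({width})")
    return value.ljust(width)  # pad on the right with spaces


def store_char_unpadded(value: str, width: int) -> str:
    """Model the 'no padding' behaviour: only the length limit is enforced."""
    if len(value) > width:
        raise ValueError(f"value longer than CHAR({width})")
    return value


padded = store_char_padded("ab", 3)
unpadded = store_char_unpadded("ab", 3)
```

Reading the same column back therefore yields `"ab "` (length 3) under the padded model and `"ab"` (length 2) under the unpadded one, which is the kind of silent difference in returned values that the thread is arguing about.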

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
For sure. There's another reason I feel char is not that important and it's more important to be internally consistent (e.g. all data sources supporting it with the same behavior, vs. one data source doing one thing and another doing the other). char was created at a time when cpu was slow and storag

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Thank you for sharing and confirming. We had better consider all heterogeneous customers in the world. And, I also have experiences with the non-negligible cases in on-prem. Bests, Dongjoon. On Mon, Mar 16, 2020 at 5:42 PM Reynold Xin wrote: > −User > > char barely showed up (honestly negligib

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
−User char barely showed up (honestly negligible). I was comparing select vs select. On Mon, Mar 16, 2020 at 5:40 PM, Dongjoon Hyun < dongjoon.h...@gmail.com > wrote: > > Ur, are you comparing the number of SELECT statement with TRIM and CREATE > statements with `CHAR`? > > > I looked up our

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Ur, are you comparing the number of SELECT statement with TRIM and CREATE statements with `CHAR`? > I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. We need to discuss more about what to do. This thread is what I

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
BTW I'm not opposing us sticking to SQL standard (I'm in general for it). I was merely pointing out that if we deviate away from SQL standard in any way we are considered "wrong" or "incorrect". That argument itself is flawed when plenty of other popular database systems also deviate away from t

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun < dongjoon.h...@gmail.com > wrote: > > Thank you, Stephen and Reynold. > > > To Reynold. > > > The way I see

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Thank you, Stephen and Reynold. To Reynold. The way I see the following is a little different. > CHAR is an undocumented data type without clearly defined semantics. Let me describe it from an Apache Spark user's point of view. Apache Spark started to claim `HiveContext` (and `hql/hiveql` function)

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
Hi there, I’m kind of new around here, but I have had experience with all of the so-called “big iron” databases such as Oracle, IBM DB2 and Microsoft SQL Server as well as PostgreSQL. They all support the notion of “ANSI padding” for CHAR columns - which means that such columns are always

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I haven't spent enough time thinking about it to give a strong opinion, but this is of course very different from TRIM. TRIM is a publicly documented function with two arguments, and we silently swapped the two arguments. And trim is also quite commonly used from a long time ago. CHAR is an un

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Hi, Reynold. (And +Michael Armbrust) If you think so, do you think it's okay that we change the return value silently? Then, I'm wondering why we reverted `TRIM` functions then? > Are we sure "not padding" is "incorrect"? Bests, Dongjoon. On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta wrote

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Reynold Xin
Are we sure "not padding" is "incorrect"? I don't know whether ANSI SQL actually requires padding, but plenty of databases don't actually pad. https://docs.snowflake.net/manuals/sql-reference/data-types-text.html ( https://docs.snowflake.net/manuals/sql-reference/data-types-text.html#:~:text=CH

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Dongjoon Hyun
Hi, Reynold. Please see the following for the context. https://issues.apache.org/jira/browse/SPARK-31136 "Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax" I raised the above issue according to the new rubric, and the banning was the proposed alternative to reduce th

Re: FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Reynold Xin
I don’t understand this change. Wouldn’t this “ban” confuse the hell out of both new and old users? For old users, their old code that was working for char(3) would now stop working. For new users, depending on whether the underlying metastore char(3) is either supported but different from ansi S

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-02 Thread Wenchen Fan
Thanks for cleaning this up! On Sun, Feb 2, 2020 at 2:08 PM Xiao Li wrote: > Thanks! Dongjoon. > > Xiao > > On Sat, Feb 1, 2020 at 5:15 PM Hyukjin Kwon wrote: > >> Thanks Dongjoon. >> >> On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: >> >>> Hi, All. >>> >>> From Today, we have `branch-3.0` as

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Xiao Li
Thanks! Dongjoon. Xiao On Sat, Feb 1, 2020 at 5:15 PM Hyukjin Kwon wrote: > Thanks Dongjoon. > > On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > >> Hi, All. >> >> From Today, we have `branch-3.0` as a tool of `Feature Freeze`. >> >> https://github.com/apache/spark/tree/branch-3.0 >> >> A

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Hyukjin Kwon
Thanks Dongjoon. On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > Hi, All. > > From Today, we have `branch-3.0` as a tool of `Feature Freeze`. > > https://github.com/apache/spark/tree/branch-3.0 > > All open JIRA issues whose type is `Improvement` or `New Feature` and had > `3.0.0` as a `Ta

Re: [FYI] SBT Build Failure

2020-01-18 Thread Manu Zhang
Thanks Dongjoon for the information and the fix in https://github.com/apache/spark/pull/27242 On Fri, Jan 17, 2020 at 6:58 AM Sean Owen wrote: > Ah. The Maven build already long since points at https:// for > resolution for security. I tried just overriding the resolver for the > SBT build, but

Re: [FYI] SBT Build Failure

2020-01-16 Thread Sean Owen
Ah. The Maven build already long since points at https:// for resolution for security. I tried just overriding the resolver for the SBT build, but it doesn't seem to work. I don't understand the SBT build well enough to debug right now. I think it's possible to override resolvers with local config

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Takeshi Yamamuro
Oh, looks nice. Thanks for the sharing, Dongjoon Bests, Takeshi On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun wrote: > Hi, All. > > I want to share the following change to the community. > > SPARK-30098 Use default datasource as provider for CREATE TABLE syntax > > This is merged today and n

Re: FYI - filed bunch of issues for flaky tests in recent CI builds

2019-09-18 Thread Gabor Somogyi
Had a look at the Kafka test (SPARK-29136) and commented. BR, G On Wed, Sep 18, 2019 at 7:54 AM Jungtaek Lim wrote: > Hi devs, > > I've found bunch of test failures (intermittently) in both CI build for > master branch as well as PR builder (o

Re: FYI

2018-05-29 Thread Jonathan Coveney
Nooo On Wed, May 30, 2018 at 13:17 eric xu wrote: > unsubscribe > > Sent from my iPhone >

Re: FYI - Kafka's built-in performance test tool

2017-05-31 Thread Ofir Manor
Hi, sorry for that, I sent my original email to this list by mistake (gmail autocomplete fooled me), the page I linked isn't open to the public. Anyway, since you are interested, here are the sample commands and output from a VirtualBox image on my laptop. 1. Create a topic kafka-topics.sh --create

Re: FYI - Kafka's built-in performance test tool

2017-05-31 Thread 郭健
It seems to be an internal page so I cannot access it: Your email address doesn't have access to equalum.atlassian.net From: Ofir Manor Date: Friday, May 26, 2017 01:12 To: dev Subject: FYI - Kafka's built-in performance test tool comes with source code. Some basic results from the VM, * Write every second 50

Re: FYI: i've doubled the jenkins executors for every build node

2014-09-29 Thread shane knapp
yeah, this is why i'm gonna keep a close eye on things this week... as for VMs vs containers, please do the latter more than the former. one of our longer-term plans here at the lab is to move most of our jenkins infra to VMs, and running tests w/nested VMs is Bad[tm]. On Mon, Sep 29, 2014 at 2:

Re: FYI: i've doubled the jenkins executors for every build node

2014-09-29 Thread Reynold Xin
Thanks. We might see more failures due to contention on resources. Fingers crossed ... At some point it might make sense to run the tests in a VM or container. On Mon, Sep 29, 2014 at 2:20 PM, shane knapp wrote: > we were running at 8 executors per node, and BARELY even stressing the > machine

Re: FYI: jenkins systems patched to fix bash exploit

2014-09-26 Thread shane knapp
> > > we're not running bash.x86_64 0:4.1.2-15.el6_5.2 on all of our systems. > > s/not/now :)

Re: FYI -- javax.servlet dependency issue workaround

2014-05-28 Thread Sean Owen
This class was introduced in Servlet 3.0. We have in the dependency tree some references to Servlet 2.5 and Servlet 3.0. The latter is a superset of the former. So we standardized on depending on Servlet 3.0. At least, that seems to have been successful in the Maven build, but this is just evidenc

Re: FYI -- javax.servlet dependency issue workaround

2014-05-27 Thread Prashant Sharma
Also, just for the sake of completeness: sometimes the desired dependency might be an older version, in which case even if you include it as above it may get evicted (sbt's default conflict-manager strategy is to choose the latest version). So, to further ensure that it does get included, we can
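
The eviction behaviour mentioned above (the latest version wins by default) can be sketched as a small model. This is not sbt or Ivy code: `resolve_latest` and the module names `example:lib` and `example:other` are hypothetical, and the version comparison assumes purely numeric, dot-separated versions, which is simpler than what real resolvers do.

```python
def resolve_latest(requested):
    """Pick one version per module, keeping the highest and evicting the rest.

    `requested` is an iterable of (module, version) pairs, where versions are
    dot-separated integers such as "2.5" or "3.0.1" (an assumption of this sketch).
    """
    chosen = {}
    for module, version in requested:
        key = tuple(int(part) for part in version.split("."))
        # Keep this version only if it is newer than what we have so far.
        if module not in chosen or key > chosen[module][0]:
            chosen[module] = (key, version)
    return {module: version for module, (_, version) in chosen.items()}


resolved = resolve_latest([
    ("example:lib", "2.5"),    # the older version someone explicitly wanted
    ("example:lib", "3.0"),    # pulled in transitively by another dependency
    ("example:other", "1.0"),
])
```

Under this strategy the explicitly requested 2.5 is dropped in favour of 3.0, which is why simply adding the older dependency is not enough to keep it on the classpath.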