Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Felix Cheung
+1 From: Denny Lee Sent: Monday, April 1, 2024 10:06:14 AM To: Hussein Awala Cc: Chao Sun ; Hyukjin Kwon ; Mridul Muralidharan ; dev Subject: Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect) +1 (non-binding) On Mon, Apr 1, 2024 at 9:24 AM Hussein

Re: Why are hash functions seeded with 42?

2022-09-30 Thread Felix Cheung
+1 to doc, seed argument would be great if possible From: Sean Owen Sent: Monday, September 26, 2022 5:26:26 PM To: Nicholas Gustafson Cc: dev Subject: Re: Why are hash functions seeded with 42? Oh yeah I get why we love to pick 42 for random things. I'm

Fwd: CRAN submission SparkR 3.2.0

2021-10-20 Thread Felix Cheung
-- Forwarded message - From: Gregor Seyer Date: Wed, Oct 20, 2021 at 4:42 AM Subject: Re: CRAN submission SparkR 3.2.0 To: Felix Cheung , CRAN < cran-submissi...@r-project.org> Thanks, Please add \value to .Rd files regarding exported methods and explain the functions r

Re: CRAN package SparkR

2021-08-31 Thread Felix Cheung
ser' confirmation when we > install.spark? > IIRC, the auto installation is only triggered by interactive shell so > getting user's confirmation should be fine. > > On Fri, Jun 18, 2021 at 2:54 AM, Felix Cheung wrote: > >> Any suggestion or comment on this? They are going to re

Re: CRAN package SparkR

2021-06-17 Thread Felix Cheung
Any suggestion or comment on this? They are going to remove the package by 6-28 Seems to me if we have a switch to opt in to install (and not by default on), or prompt the user in interactive session, should be good as user confirmation. On Sun, Jun 13, 2021 at 11:25 PM Felix Cheung wrote

Fwd: CRAN package SparkR

2021-06-14 Thread Felix Cheung
, 2021 at 10:19 PM Subject: CRAN package SparkR To: Felix Cheung CC: Dear maintainer, Checking this apparently creates the default directory as per #' @param localDir a local directory where Spark is installed. The directory contains #' version-specific folders of Spark

Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Felix Cheung
Welcome! From: Driesprong, Fokko Sent: Friday, March 26, 2021 1:25:33 PM To: Matei Zaharia Cc: Spark Dev List Subject: Re: Welcoming six new Apache Spark committers Well deserved all! Welcome! On Fri, Mar 26, 2021 at 21:21, Matei Zaharia wrote

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-05 Thread Felix Cheung
Congrats and thanks! From: Hyukjin Kwon Sent: Wednesday, March 3, 2021 4:09:23 PM To: Dongjoon Hyun Cc: Gabor Somogyi ; Jungtaek Lim ; angers zhu ; Wenchen Fan ; Kent Yao ; Takeshi Yamamuro ; dev ; user @spark Subject: Re: [ANNOUNCE] Announcing Apache Spark

Re: Recovering SparkR on CRAN?

2020-12-30 Thread Felix Cheung
consider > dropping it as Dongjoon initially pointed out. > > On Wed, Dec 30, 2020 at 1:59 PM, Felix Cheung wrote: > >> Ah, I don’t recall actually - maybe it was just missed? >> >> The last message I had, was in June when it was broken by R 4.0.1, which >> was fixed. >

Re: Recovering SparkR on CRAN?

2020-12-29 Thread Felix Cheung
-31918 and > https://issues.apache.org/jira/browse/SPARK-32073. > I wonder why other releases were not uploaded yet. Do you guys know any > context or if there is a standing issue on this, @Felix Cheung > or @Shivaram Venkataraman > ? > > On Wed, Dec 23, 2020 at 11:21 AM, Mridul Mu

Re: Recovering SparkR on CRAN?

2020-12-22 Thread Felix Cheung
Ok - it took many years to get it first published, so it was hard to get there. On Tue, Dec 22, 2020 at 5:45 PM Hyukjin Kwon wrote: > Adding @Shivaram Venkataraman and @Felix > Cheung FYI > > On Wed, Dec 23, 2020 at 9:22 AM, Michael Heuer wrote: > >> Anecdotally, as a projec

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
So IMO maintaining outside in a separate repo is going to be harder. That was why I asked. From: Maciej Szymkiewicz Sent: Tuesday, August 4, 2020 12:59 PM To: Sean Owen Cc: Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; Spark Dev List Subject: Re

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
What would be the reason for separate git repo? From: Hyukjin Kwon Sent: Monday, August 3, 2020 1:58:55 AM To: Maciej Szymkiewicz Cc: Driesprong, Fokko ; Holden Karau ; Spark Dev List Subject: Re: [PySpark] Revisiting PySpark type annotations Okay, seems like

Re: Exposing Spark parallelized directory listing & non-locality listing in core

2020-07-22 Thread Felix Cheung
+1 From: Holden Karau Sent: Wednesday, July 22, 2020 10:49:49 AM To: Steve Loughran Cc: dev Subject: Re: Exposing Spark parallelized directory listing & non-locality listing in core Wonderful. To be clear the patch is more to start the discussion about how we

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Felix Cheung
Welcome! From: Nick Pentreath Sent: Tuesday, July 14, 2020 10:21:17 PM To: dev Cc: Dilip Biswal ; Jungtaek Lim ; huaxin gao Subject: Re: Welcoming some new Apache Spark committers Congratulations and welcome as Apache Spark committers! On Wed, 15 Jul 2020 at

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-05 Thread Felix Cheung
I think pluggable storage in shuffle is essential for k8s GA From: Holden Karau Sent: Monday, June 29, 2020 9:33 AM To: Maxim Gekk Cc: Dongjoon Hyun; dev Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020) Should we also consider the shuffle service

Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there’s a few things that I want to highlight for you, the members. Yes, the CFP

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Felix Cheung
Congrats From: Jungtaek Lim Sent: Thursday, June 18, 2020 8:18:54 PM To: Hyukjin Kwon Cc: Mridul Muralidharan ; Reynold Xin ; dev ; user Subject: Re: [ANNOUNCE] Apache Spark 3.0.0 Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19,

Re: More publicly documenting the options under spark.sql.*

2020-01-16 Thread Felix Cheung
I think it’s a good idea From: Hyukjin Kwon Sent: Wednesday, January 15, 2020 5:49:12 AM To: dev Cc: Sean Owen ; Nicholas Chammas Subject: Re: More publicly documenting the options under spark.sql.* Resending to the dev list for archive purpose: I think

Re: Enabling fully disaggregated shuffle on Spark

2019-11-20 Thread Felix Cheung
; Christopher Crosbie ; Griselda Cuevas ; Holden Karau ; Mayank Ahuja ; Kalyan Sivakumar ; alfo...@fb.com ; Felix Cheung ; Matt Cheah ; Yifei Huang (PD) Subject: Re: Enabling fully disaggregated shuffle on Spark That sounds great! On Wed, Nov 20, 2019 at 9:02 AM John Zhuge mailto:jzh

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Felix Cheung
Just to add - hive 1.2 fork is definitely not more stable. We know of a few critical bug fixes that we cherry picked into a fork of that fork to maintain ourselves. From: Dongjoon Hyun Sent: Wednesday, November 20, 2019 11:07:47 AM To: Sean Owen Cc: dev

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Felix Cheung
1000% with Steve, the org.spark-project hive 1.2 will need a solution. It is old and rather buggy, and it’s been *years*. I think we should decouple the hive change from everything else if people are concerned? From: Steve Loughran Sent: Sunday, November 17, 2019

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Felix Cheung
this is about test description and not test file name right? if yes I don’t see a problem. From: Hyukjin Kwon Sent: Thursday, November 14, 2019 6:03:02 PM To: Shixiong(Ryan) Zhu Cc: dev ; Felix Cheung ; Shivaram Venkataraman Subject: Re: Adding JIRA ID

Re: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-11 Thread Felix Cheung
+1 From: Thomas graves Sent: Wednesday, September 4, 2019 7:24:26 AM To: dev Subject: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling Hey everyone, I'd like to call for a vote on SPARK-27495 SPIP: Support Stage level

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-09-08 Thread Felix Cheung
I’d prefer strict mode and fail fast (analysis check) Also I like what Alastair suggested about standard clarification. I think we can re-visit this proposal and restart the vote From: Ryan Blue Sent: Friday, September 6, 2019 5:28 PM To: Alastair Green Cc:

Re: maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Felix Cheung
(Hmm, what is spark-...@apache.org?) From: Sean Owen Sent: Tuesday, September 3, 2019 11:58:30 AM To: Xiao Li Cc: Tom Graves ; spark-...@apache.org Subject: Re: maven 3.6.1 removed from apache maven repo It's because build/mvn only queries ASF mirrors, and

Re: Design review of SPARK-28594

2019-09-01 Thread Felix Cheung
I did review it and solving this problem makes sense. I will comment in the JIRA. From: Jungtaek Lim Sent: Sunday, August 25, 2019 3:34:22 PM To: dev Subject: Design review of SPARK-28594 Hi devs, I have been working on designing SPARK-28594 [1] (though I've

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-30 Thread Felix Cheung
+1 Run tests, R tests, r-hub Debian, Ubuntu, mac, Windows From: Hyukjin Kwon Sent: Wednesday, August 28, 2019 9:14 PM To: Takeshi Yamamuro Cc: dev; Dongjoon Hyun Subject: Re: [VOTE] Release Apache Spark 2.4.4 (RC3) +1 (from the last blocker PR) On Aug 29, 2019

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Felix Cheung
That’s great! From: ☼ R Nair Sent: Saturday, August 24, 2019 10:57:31 AM To: Dongjoon Hyun Cc: dev@spark.apache.org ; user @spark/'user @spark'/spark users/user@spark Subject: Re: JDK11 Support in Apache Spark Finally!!! Congrats On Sat, Aug 24, 2019, 11:11

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Felix Cheung
+1 Glad to see the progress in this space - it’s been more than a year since the original discussion and effort started. From: Yinan Li Sent: Monday, June 17, 2019 7:14:42 PM To: rb...@netflix.com Cc: Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
How about pyArrow? From: Holden Karau Sent: Friday, June 14, 2019 11:06:15 AM To: Felix Cheung Cc: Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas Are there other Python

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
So to be clear: the min version check is 0.23 and the Jenkins test is 0.24. I’m ok with this. I hope someone will test 0.23 on releases though before we sign off? From: shane knapp Sent: Friday, June 14, 2019 10:23:56 AM To: Bryan Cutler Cc: Dongjoon Hyun; Holden Karau;

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
. From: shane knapp Sent: Friday, May 31, 2019 7:38:10 PM To: Denny Lee Cc: Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark Hamstra; Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fan; Xiangrui Meng; dev; user Subject: Re: Should python-2 be supported in Spark 3.0? +1000

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on website > Spark website and state that Python 2 is deprecated in Spark 3.0 I suspect people will then ask when is Spark 3.0 coming out then. Might need to provide some clarity on that. From: Reynold Xin Sent:

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-27 Thread Felix Cheung
+1 I’d prefer to see more of the end goal and how that could be achieved (such as ETL or SPARK-24579). However given the rounds and months of discussions we have come down to just the public API. If the community thinks a new set of public API is maintainable, I don’t see any problem with

Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could df.filter(col("c") === "c1").write.partitionBy("c").save It could run into a data skew problem but might work for you From: Burak Yavuz Sent: Tuesday, May 7, 2019 9:35:10 AM To: Shubham Chaurasia Cc: dev; u...@spark.apache.org Subject: Re: Static

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-05 Thread Felix Cheung
I ran basic tests on R, r-hub etc. LGTM. +1 (limited - I didn’t get to run other usual tests) From: Sean Owen Sent: Wednesday, May 1, 2019 2:21 PM To: Xiao Li Cc: dev@spark.apache.org Subject: Re: [VOTE] Release Apache Spark 2.4.3 +1 from me. There is little

Re: [VOTE] Release Apache Spark 2.4.2

2019-05-01 Thread Felix Cheung
Just my 2c: if there is a known security issue, we should fix it rather than waiting to learn whether it actually affects Spark by having it found by a black hat, or worse. I don’t think any of us want to see Spark in the news for this reason. From: Sean Owen

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Felix Cheung
+1 R tests, package tests on r-hub. Manually check commits under R, doc etc From: Sean Owen Sent: Saturday, April 20, 2019 11:27 AM To: Wenchen Fan Cc: Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.4.2 +1 from me too. It seems like there is

Re: Spark 2.4.2

2019-04-18 Thread Felix Cheung
Re shading - same argument I’ve made earlier today in a PR... (Context- in many cases Spark has light or indirect dependencies but bringing them into the process breaks users code easily) From: Michael Heuer Sent: Thursday, April 18, 2019 6:41 AM To: Reynold

Re: Dataset schema incompatibility bug when reading column partitioned data

2019-04-13 Thread Felix Cheung
I kinda agree it is confusing when a parameter is not used... From: Ryan Blue Sent: Thursday, April 11, 2019 11:07:25 AM To: Bruce Robbins Cc: Dávid Szakállas; Spark Dev List Subject: Re: Dataset schema incompatibility bug when reading column partitioned data

ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-13 Thread Felix Cheung
Hi Spark community! As you know, ApacheCon NA 2019 is coming this Sept and its CFP is now open! This is an important milestone as we celebrate 20 years of ASF. We have tracks like Big Data and Machine Learning among many others. Please submit your talks/thoughts/challenges/learnings here:

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Felix Cheung
To: Bryan Cutler Cc: Felix Cheung; Hyukjin Kwon; dev Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276] i'm not opposed to 3.6 at all. On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler <cutl...@gmail.com> wrote: PyArrow dropping Python 3.4 was mainly due to support goin

Re: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions!

2019-03-29 Thread Felix Cheung
Definitely the part on the PR. Thanks! From: shane knapp Sent: Thursday, March 28, 2019 11:19 AM To: dev; Stavros Kontopoulos Subject: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions! https://spark.apache.org/developer-tools.html search

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
+1 build source R tests R package CRAN check locally, r-hub From: d_t...@apple.com on behalf of DB Tsai Sent: Wednesday, March 27, 2019 11:31 AM To: dev Subject: [VOTE] Release Apache Spark 2.4.1 (RC9) Please vote on releasing the following candidate as Apache

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
(I think the .invalid is added by the list server) Personally I’d rather everyone just +1 or -1, and shouldn’t add binding or not. It’s really the responsibility of the RM to confirm if a vote is binding. Mistakes have been made otherwise. From: Marcelo Vanzin

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
3.4 is end of life but 3.5 is not. From your link we expect to release Python 3.5.8 around September 2019. From: shane knapp Sent: Thursday, March 28, 2019 7:54 PM To: Hyukjin Kwon Cc: Bryan Cutler; dev; Felix Cheung Subject: Re: Upgrading minimal PyArrow

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
Shane is also correct in that newer versions of pyarrow have stopped support for Python 3.4, so we should probably have Jenkins test against 2.7 and 3.5. On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin <r...@databricks.com> wrote: +1 on doing this in 3.0. On Mon, Mar 25, 2019 at 9:31 PM,

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Felix Cheung
I’m +1 if 3.0 From: Sean Owen Sent: Monday, March 25, 2019 6:48 PM To: Hyukjin Kwon Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276] I don't know a lot about Arrow here, but seems

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Felix Cheung
Reposting for shane here [SPARK-27178] https://github.com/apache/spark/commit/342e91fdfa4e6ce5cc3a0da085d1fe723184021b Is problematic too and it’s not in the rc8 cut https://github.com/apache/spark/commits/branch-2.4 (Personally I don’t want to delay 2.4.1 either..)

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread Felix Cheung
There is SPARK-26604 we are looking into From: Saisai Shao Sent: Wednesday, March 6, 2019 6:05 PM To: shane knapp Cc: Stavros Kontopoulos; Sean Owen; DB Tsai; Spark dev list; d_t...@apple.com Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2) Do we have other

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
To: Xiangrui Meng Cc: Felix Cheung; Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling I think treating SPIPs as this high-level takes away much of the point of VOTEing on them. I'm not sure that's even what Reynold is suggesting elsewhere

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
this. From: Sean Owen Sent: Sunday, March 3, 2019 8:15 AM To: Felix Cheung Cc: Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling I'm for this in general, at least a +0. I do think this has

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
I’m very hesitant with this. I don’t want to vote -1, because I personally think it’s important to do, but I’d like to see more discussion points addressed and not voting completely on the spirit of it. First, SPIP doesn’t match the format of SPIP proposed and agreed on. (Maybe this is a

Re: SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
+1 on mesos - what Sean says From: Andrew Melo Sent: Friday, March 1, 2019 9:19 AM To: Xingbo Jiang Cc: Sean Owen; Xiangrui Meng; dev Subject: Re: SPIP: Accelerator-aware Scheduling Hi, On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang wrote: > > Hi Sean, > > To

Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread Felix Cheung
I hear three topics in this thread 1. I don’t think we should remove string. Column and string can both be “type safe”. And I would agree we don’t *need* to break API compatibility here. 2. Gaps in python API. Extending on #1, definitely we should be consistent and add string as param where it

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread Felix Cheung
I merged the fix to 2.4. From: Felix Cheung Sent: Wednesday, February 20, 2019 9:34 PM To: DB Tsai; Spark dev list Cc: Cesar Delgado Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2) Could you hold for a bit - I have one more fix to get

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread Felix Cheung
Could you hold for a bit - I have one more fix to get in From: d_t...@apple.com on behalf of DB Tsai Sent: Wednesday, February 20, 2019 12:25 PM To: Spark dev list Cc: Cesar Delgado Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2) Okay. Let's fail rc2, and

Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-19 Thread Felix Cheung
+1 From: Ryan Blue Sent: Tuesday, February 19, 2019 9:34 AM To: Jamison Bennett Cc: dev Subject: Re: [VOTE] SPIP: Identifiers for multi-catalog Spark +1 On Tue, Feb 19, 2019 at 8:41 AM Jamison Bennett wrote: +1 (non-binding) Jamison Bennett Cloudera

Re: Missing SparkR in CRAN

2019-02-19 Thread Felix Cheung
, Jan 25, 2019 at 1:41 AM Felix Cheung <felixche...@apache.org> wrote: Yes it was discussed on dev@. We are waiting for 2.3.3 to release to resubmit. On Thu, Jan 24, 2019 at 5:33 AM Hyukjin Kwon <gurwls...@gmail.com> wrote: Hi all, I happened to find SparkR is missing

Re: Vectorized R gapply[Collect]() implementation

2019-02-10 Thread Felix Cheung
This is super awesome! From: Shivaram Venkataraman Sent: Saturday, February 9, 2019 8:33 AM To: Hyukjin Kwon Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman Subject: Re: Vectorized R gapply[Collect]() implementation Those speedups

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-10 Thread Felix Cheung
Zhuge Sent: Saturday, February 9, 2019 6:25 PM To: Felix Cheung Cc: Takeshi Yamamuro; Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2) Not me. I am running zulu8, maven, and hadoop-2.7. On Sat, Feb 9, 2019 at 5:42 PM Felix Cheung <felixcheun...@hotmail.com> wrote: On

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-09 Thread Felix Cheung
-integration-tests` for the JDBC integration tests. I run these tests, and then I checked if they are passed. On Sat, Feb 9, 2019 at 5:26 PM Herman van Hovell <her...@databricks.com> wrote: I count 2 binding votes :)... On Fri, Feb 8, 2019 at 22:36, Felix Cheung mailto:feli

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-08 Thread Felix Cheung
For this case I’d agree with Ryan. I haven’t followed this thread and the details of the change since it’s way too much for me to consume “in my free time” (which is 0 nowadays) but I’m pretty sure the existing behavior works for us and very likely we don’t want it to change because of some

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Felix Cheung
Nope, still only 1 binding vote ;) From: Mark Hamstra Sent: Friday, February 8, 2019 7:30 PM To: Marcelo Vanzin Cc: Takeshi Yamamuro; Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2) There are 2. C'mon Marcelo, you can make it 3! On Fri, Feb

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Felix Cheung
Likely need a shim (which we should have anyway) because of namespace/import changes. I’m huge +1 on this. From: Hyukjin Kwon Sent: Monday, February 4, 2019 12:27 PM To: Xiao Li Cc: Sean Owen; Felix Cheung; Ryan Blue; Marcelo Vanzin; Yuming Wang; dev Subject

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-01 Thread Felix Cheung
What’s the update and next step on this? We have real users getting blocked by this issue. From: Xiao Li Sent: Wednesday, January 16, 2019 9:37 AM To: Ryan Blue Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev Subject: Re: [DISCUSS

Re: Missing SparkR in CRAN

2019-01-24 Thread Felix Cheung
Yes it was discussed on dev@. We are waiting for 2.3.3 to release to resubmit. On Thu, Jan 24, 2019 at 5:33 AM Hyukjin Kwon wrote: > Hi all, > > I happened to find SparkR is missing in CRAN. See > https://cran.r-project.org/web/packages/SparkR/index.html > > I remember I saw some threads about

Re: Make proactive check for closure serializability optional?

2019-01-21 Thread Felix Cheung
Agreed on the pros / cons, esp driver could be the data science notebook. Is it worthwhile making it configurable? From: Sean Owen Sent: Monday, January 21, 2019 10:42 AM To: Reynold Xin Cc: dev Subject: Re: Make proactive check for closure serializability

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-20 Thread Felix Cheung
+1 My focus is on R (sorry, couldn’t cross validate what Sean is seeing) tested: reviewed doc R package test win-builder, r-hub Tarball/package signature From: Takeshi Yamamuro Sent: Thursday, January 17, 2019 6:49 PM To: Spark dev list Subject: [VOTE]

Re: [DISCUSS] Identifiers with multi-catalog support

2019-01-20 Thread Felix Cheung
+1 I like Ryan’s last mail. Thank you for putting it clearly (should be a spec/SPIP!) I agree and understand the need for a 3 part id. However I don’t think we should make the assumption that it must be or can only be as long as 3 parts. Once the catalog is identified (i.e. the first part), the catalog

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
of it) from the spark core project.. From: Xiao Li Sent: Tuesday, January 15, 2019 10:03 AM To: Felix Cheung Cc: rb...@netflix.com; Yuming Wang; dev Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4 Let me take my words back. To read/write a table, Spark users do

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
And we are super 100% dependent on Hive... From: Ryan Blue Sent: Tuesday, January 15, 2019 9:53 AM To: Xiao Li Cc: Yuming Wang; dev Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4 How do we know that most Spark users are not using Hive? I wouldn't be

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
Resolving https://issues.apache.org/jira/browse/HIVE-16391 means to keep Spark on Hive 1.2? I’m not sure that is reducing dependency on Hive - Hive is still there and it’s a very old Hive. IMO it is increasing the risk the longer we keep on this. (And it’s been years) Looking at the two PR.

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-13 Thread Felix Cheung
13, 2019 5:45 AM To: Felix Cheung Cc: Dongjoon Hyun; dev Subject: Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ? Will do. Er, maybe add Shane here too -- should we disable this docs job? are these docs used, and is there much value in nightly snapshots of the whole site? On Sat, Jan

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-12 Thread Felix Cheung
These get “published” by doc nightly build from riselab Jenkins... From: Dongjoon Hyun Sent: Saturday, January 12, 2019 4:32 PM To: Sean Owen Cc: dev Subject: Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ? +1 for removing old docs there. It seems

Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
Awesome Shane! From: shane knapp Sent: Sunday, January 6, 2019 11:38 AM To: Felix Cheung Cc: Dongjoon Hyun; Wenchen Fan; dev Subject: Re: Spark Packaging Jenkins noted. i like the idea of building (but not signing) the release and will update the job(s

Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
https://spark.apache.org/release-process.html Look for do-release-docker.sh script From: Felix Cheung Sent: Sunday, January 6, 2019 11:17 AM To: Dongjoon Hyun; Wenchen Fan Cc: dev; shane knapp Subject: Re: Spark Packaging Jenkins The release process doc should

Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
The release process doc should have been updated on this - as mentioned we do not use Jenkins for release signing (take this offline if further discussion is needed) The release build on Jenkins can still be useful for pre-validating the release build process (without actually signing it)

Re: Apache Spark 2.2.3 ?

2019-01-02 Thread Felix Cheung
+1 on 2.2.3 of course From: Dongjoon Hyun Sent: Wednesday, January 2, 2019 12:21 PM To: Saisai Shao Cc: Xiao Li; Felix Cheung; Sean Owen; dev Subject: Re: Apache Spark 2.2.3 ? Thank you for swift feedbacks and Happy New Year. :) For 2.2.3 release on next week

Re: Apache Spark 2.2.3 ?

2019-01-01 Thread Felix Cheung
Speaking of, it’s been 3 months since 2.3.2... (Sept 2018) And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4 branch is stabilizing? From: Sean Owen Sent: Tuesday, January 1, 2019 8:30 PM To: Dongjoon Hyun Cc: dev Subject: Re: Apache Spark 2.2.3

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-11 Thread Felix Cheung
I opened a PR on the vignettes fix to skip eval. From: Shivaram Venkataraman Sent: Wednesday, November 7, 2018 7:26 AM To: Felix Cheung Cc: Sean Owen; Shivaram Venkataraman; Wenchen Fan; Matei Zaharia; dev Subject: Re: [CRAN-pretest-archived] CRAN submission

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-10 Thread Felix Cheung
Considering the timing for Spark 3.0, > deprecating lower versions, bumping up R to 3.4 might be reasonable > option. > > Adding Shane as well. > > If we ended up with not upgrading it, I will forward this email to CRAN > sysadmin to discuss further anyway. > > > &

Re: DataSourceV2 capability API

2018-11-09 Thread Felix Cheung
One question is where will the list of capability strings be defined? From: Ryan Blue Sent: Thursday, November 8, 2018 2:09 PM To: Reynold Xin Cc: Spark Dev List Subject: Re: DataSourceV2 capability API Yes, we currently use traits that have methods. Something

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Felix Cheung
Very cool! From: Hyukjin Kwon Sent: Thursday, November 8, 2018 10:29 AM To: dev Subject: Arrow optimization in conversion from R DataFrame to Spark DataFrame Hi all, I am trying to introduce R Arrow optimization by reusing PySpark Arrow optimization. It

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Felix Cheung
_20181105_165757/Windows/00check.log > and > https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log, > the tests run in 1s. > On Tue, Nov 6, 2018 at 1:29 PM Felix Cheung wrote: > > > > I’d rather not mess with 2.4.0 at this point. On CRA

Re: Test and support only LTS JDK release?

2018-11-06 Thread Felix Cheung
Is there a list of LTS release that I can reference? From: Ryan Blue Sent: Tuesday, November 6, 2018 1:28 PM To: sn...@snazy.de Cc: Spark Dev List; cdelg...@apple.com Subject: Re: Test and support only LTS JDK release? +1 for supporting LTS releases. On Tue,

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread Felix Cheung
So to clarify, only scala 2.12 is supported in Spark 3? From: Ryan Blue Sent: Tuesday, November 6, 2018 1:24 PM To: d_t...@apple.com Cc: Sean Owen; Spark Dev List; cdelg...@apple.com Subject: Re: Make Scala 2.12 as default Scala version in Spark 3.0 +1 to Scala

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
. Need to investigate but worst case test_package can run with 0 test. From: Sean Owen Sent: Tuesday, November 6, 2018 10:51 AM To: Shivaram Venkataraman Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR

Re: Java 11 support

2018-11-06 Thread Felix Cheung
+1 for Spark 3, definitely Thanks for the updates From: Sean Owen Sent: Tuesday, November 6, 2018 9:11 AM To: Felix Cheung Cc: dev Subject: Re: Java 11 support I think that Java 9 support basically gets Java 10, 11 support. But the jump from 8 to 9

Java 11 support

2018-11-06 Thread Felix Cheung
Speaking of, can we work to support Java 11? That will fix all the problems below. From: Felix Cheung Sent: Tuesday, November 6, 2018 8:57 AM To: Wenchen Fan Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman Subject: Re: [CRAN-pretest-archived

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
We have not been able to publish to CRAN for quite some time (since 2.3.0 was archived - the cause is Java 11) I think it’s ok to announce the release of 2.4.0 From: Wenchen Fan Sent: Tuesday, November 6, 2018 8:51 AM To: Felix Cheung Cc: Matei Zaharia; Sean

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
some ideas. Matei > On Nov 5, 2018, at 9:09 PM, Felix Cheung wrote: > > I don’t know what the cause is yet. > > The test should be skipped because of this check > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21 > > And this >

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Felix Cheung
: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", "fit", formula, The earlier release was archived because of Java 11+ too so this unfortunately isn’t new. From: Sean Owen Sent: Monday, November 5, 2018 7:22 PM To: Felix Cheung

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Felix Cheung
FYI. SparkR submission failed. It seems to detect Java 11 correctly with vignettes but not skipping tests as would be expected. Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics: Java version 8 is required for this package; found version: 11.0.1 Execution halted *

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-01 Thread Felix Cheung
Thanks for bringing this up, and much appreciated that you keep on top of this at all times. Can upgrading R fix the issue? Is this perhaps not necessarily malformed but some new format for new versions? Anyway we should consider upgrading the R version if that fixes the problem. As an

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Felix Cheung
+1 Checked R doc and all R API changes From: Denny Lee Sent: Wednesday, October 31, 2018 9:13 PM To: Chitral Verma Cc: Wenchen Fan; dev@spark.apache.org Subject: Re: [VOTE] SPARK 2.4.0 (RC5) +1 On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Felix Cheung
Yes please! From: Ryan Blue Sent: Thursday, October 25, 2018 1:10 PM To: Spark Dev List Subject: DataSourceV2 hangouts sync Hi everyone, There's been some great discussion for DataSourceV2 in the last few months, but it has been difficult to resolve some of

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Felix Cheung
I’m in favor of it. If you check the PR it’s a few isolated script changes and all test-only changes. Should have low impact on release but much better integration test coverage. From: Erik Erlandson Sent: Tuesday, October 16, 2018 8:20 AM To: dev Subject:

Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-07 Thread Felix Cheung
Jars and libraries only accessible locally at the driver is fairly limited? Don’t you want the same on all executors? From: Yinan Li Sent: Friday, October 5, 2018 11:25 AM To: Stavros Kontopoulos Cc: rve...@dotnetrdf.org; dev Subject: Re: [DISCUSS][K8S] Local
